IDENTIFYING GENE PATHWAYS ASSOCIATED WITH CANCER CHARACTERISTICS VIA SPARSE STATISTICAL METHODS
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of pathway activities from microarray gene expression data based on Sparse Probabilistic Principal Component Analysis (SPPCA). A pathway activity logistic regression model is then formulated for the cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for parameter estimation and derive a model selection criterion for choosing the tuning parameters of the model. The proposed method can also reverse-engineer gene networks from the multiple identified pathways, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through the analysis of breast cancer gene expression data.
Existing System:
To handle the large number of biological elements in high-throughput data such as microarray data, various gene set-based methods have been proposed and successfully applied. The key idea of gene set-based methods is to evaluate the enrichment of significant genes in a prescribed gene set, which makes the results more biologically interpretable.
However, almost all existing gene set-based methods rely on the results of analyzing individual genes, e.g., statistical hypothesis testing for each gene, so the effect of a gene set can be assessed only through gene enrichment.
Furthermore, the effects of multiple gene sets, or gene pathways, are rarely discovered. Since combinations of multiple mutations in the genome cause complex diseases such as cancer, the effects of multiple gene pathways are essential for understanding cancer systems.
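The enrichment idea behind the existing gene set-based methods can be illustrated with a one-sided hypergeometric test, which is one common way to score the overlap between a list of significant genes and a prescribed gene set. The sketch below is written in Python with the standard library only; the function name, its parameters, and the counts in the usage line are hypothetical and are not taken from any particular existing tool.

```python
from math import comb

def enrichment_p_value(n_genes, n_in_set, n_significant, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_genes:       total number of genes measured on the array
    n_in_set:      number of those genes that belong to the gene set
    n_significant: number of genes called significant by per-gene tests
    n_overlap:     number of significant genes that fall in the gene set
    """
    total = comb(n_genes, n_significant)
    upper = min(n_in_set, n_significant)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20000 genes, a 100-gene set, 500 significant genes,
# 10 of which fall in the set.
p = enrichment_p_value(20000, 100, 500, 10)
print(f"enrichment p-value: {p:.3g}")
```

Note that the test takes the per-gene significance calls as its input, which is exactly the dependence on individual-gene analysis described above.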
Proposed System:
We propose a statistical method to uncover gene pathways that are critical in determining cancer heterogeneity. We develop a pathway activity logistic regression analysis for binary cancer phenotypes.
Our proposed statistical method consists of two stages. First, the activities of pathways are estimated from gene expression data by Sparse Probabilistic Principal Component Analysis (SPPCA). By considering the relationship between SPPCA and graphical Gaussian models, we can reverse-engineer gene networks that are strongly associated with the cancer phenotype of interest.
In this first stage we also find an informative subset of genes on each pathway, which can improve the power of gene set-based methods. Second, we construct a logistic regression model based on the pathway activities and the corresponding cancer phenotype.
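As a rough illustration of the first stage, the sketch below computes one activity score per sample for a single pathway, using a sparse leading principal component. It is not SPPCA: it swaps in a simple soft-thresholded power iteration as a stand-in, and it does not cover the graphical Gaussian model step. The function names, the `shrink` parameter, and the toy expression values are all assumptions made for the example. Python is used for brevity.

```python
import math

def center_columns(X):
    """Subtract each gene's mean so that every column has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def soft_threshold(v, t):
    """Shrink v toward zero by t; values with |v| <= t become zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def sparse_loading(Xc, shrink=0.2, n_iter=200):
    """Thresholded power iteration on Xc^T Xc (a stand-in, not SPPCA).

    shrink is a hypothetical tuning knob: entries smaller than
    shrink * (largest entry) are set to zero in every iteration.
    """
    n, p = len(Xc), len(Xc[0])
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(row[j] * w[j] for j in range(p)) for row in Xc]
        v = [sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(p)]
        t = shrink * max(abs(x) for x in v)
        v = [soft_threshold(x, t) for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        w = [x / norm for x in v]
    return w

def pathway_activity(X, shrink=0.2):
    """Return (gene loadings, per-sample activity) for one pathway."""
    Xc = center_columns(X)
    w = sparse_loading(Xc, shrink)
    activity = [sum(row[j] * w[j] for j in range(len(w))) for row in Xc]
    return w, activity

# Toy data: 6 samples (rows) x 4 genes (columns) of one pathway.
# Genes 0 and 1 vary together; genes 2 and 3 are nearly constant.
expression = [
    [2.0, 1.9, 0.1, 0.0],
    [-2.0, -2.1, 0.0, 0.1],
    [1.0, 1.1, -0.1, 0.0],
    [-1.0, -0.9, 0.0, -0.1],
    [3.0, 3.0, 0.1, 0.1],
    [-3.0, -3.0, -0.1, -0.1],
]
loading, activity = pathway_activity(expression)
print("loadings:", loading)
print("activities:", activity)
```

Genes whose loading is driven to zero drop out of the activity score, which mirrors the idea of finding an informative subset of genes on a pathway. Repeating the computation for each pathway yields one activity column per pathway for the second stage.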
The logistic regression models are estimated using an L1-type regularization method, the elastic net, in order to discover a set of pathway activities strongly associated with cancer phenotypes. To optimize the tuning parameters of our method, we use a modified version of the Akaike information criterion (AIC).
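The second stage can be sketched in the same spirit: an elastic net logistic regression on pathway activities, fitted here by proximal gradient descent, with the tuning parameter chosen from a small grid. Two assumptions should be stated loudly. First, the fitting algorithm is a generic choice for the example, not necessarily the one used by the authors. Second, the source does not spell out its modified AIC, so the sketch uses the ordinary AIC with the number of nonzero coefficients as the degrees of freedom. All names, the toy data, and the grid are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_elastic_net_logistic(A, y, lam, alpha=0.5, n_iter=2000):
    """Minimise mean logistic loss + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).

    A: rows of pathway activities, y: 0/1 phenotype labels.
    The intercept b0 is not penalised.
    """
    n, q = len(A), len(A[0])
    # Upper bound on the Lipschitz constant of the smooth part of the objective.
    lipschitz = 0.25 * sum(1.0 + sum(a * a for a in row) for row in A) / n
    lipschitz += lam * (1.0 - alpha)
    step = 1.0 / lipschitz
    b0, b = 0.0, [0.0] * q
    for _ in range(n_iter):
        resid = [sigmoid(b0 + sum(r[j] * b[j] for j in range(q))) - y[i]
                 for i, r in enumerate(A)]
        grad = [sum(resid[i] * A[i][j] for i in range(n)) / n
                + lam * (1.0 - alpha) * b[j] for j in range(q)]
        b0 -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding (the L1 proximal map).
        b = [soft_threshold(b[j] - step * grad[j], step * lam * alpha)
             for j in range(q)]
    return b0, b

def log_likelihood(A, y, b0, b):
    total = 0.0
    for row, yi in zip(A, y):
        z = b0 + sum(a * bj for a, bj in zip(row, b))
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def aic(A, y, b0, b):
    """Plain AIC (assumption): df = intercept + number of nonzero coefficients."""
    k = 1 + sum(1 for bj in b if bj != 0.0)
    return -2.0 * log_likelihood(A, y, b0, b) + 2.0 * k

def select_by_aic(A, y, lambdas, alpha=0.5):
    """Fit once per candidate lambda and keep the fit with the smallest AIC."""
    fits = [(lam,) + fit_elastic_net_logistic(A, y, lam, alpha) for lam in lambdas]
    return min(fits, key=lambda f: aic(A, y, f[1], f[2]))

# Toy data: 8 samples x 2 pathway activities, binary phenotype.
activities = [[2.0, 0.3], [1.5, -0.2], [1.0, 0.1], [0.5, -0.4],
              [-0.5, 0.4], [-1.0, -0.1], [-1.5, 0.2], [-2.0, -0.3]]
phenotype = [1, 1, 1, 0, 1, 0, 0, 0]
best_lam, best_b0, best_b = select_by_aic(activities, phenotype,
                                          [1.0, 0.3, 0.1], alpha=0.9)
print("selected lambda:", best_lam, "coefficients:", best_b)
```

Pathways whose coefficient is shrunk exactly to zero are treated as unrelated to the phenotype; the remaining pathways are the selected ones.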
Software Requirements:
.Net
Front End – ASP.Net
Language – C#.Net
Back End – SQL Server
Windows XP
Hardware Requirements:
RAM : 512 MB
Hard Disk : 80 GB
Processor : Pentium IV