Results 1 - 20 of 26
1.
Mol Psychiatry ; 28(5): 2018-2029, 2023 May.
Article in English | MEDLINE | ID: mdl-36732587

ABSTRACT

Seven Tesla magnetic resonance spectroscopy (7T MRS) offers a precise measurement of metabolic levels in the human brain via a non-invasive approach. Studying longitudinal changes in brain metabolites could help evaluate the characteristics of disease over time. This approach may also shed light on how the age of study participants and duration of illness may influence these metabolites. This study used 7T MRS to investigate longitudinal patterns of brain metabolites in young adulthood in both healthy controls and patients. A four-year longitudinal cohort with 38 patients with first episode psychosis (onset within 2 years) and 48 healthy controls was used to examine 10 brain metabolites in 5 brain regions associated with the pathophysiology of psychosis in a comprehensive manner. Both patients and controls were found to have significant longitudinal reductions in glutamate in the anterior cingulate cortex (ACC). Only patients were found to have a significant decrease over time in γ-aminobutyric acid, N-acetyl aspartate, myo-inositol, total choline, and total creatine in the ACC. Together we highlight the ACC with dynamic changes in several metabolites in early-stage psychosis, in contrast to the other 4 brain regions that also are known to play roles in psychosis. Meanwhile, glutathione was uniquely found to have a near zero annual percentage change in both patients and controls in all 5 brain regions during a four-year follow-up in young adulthood. Given that a reduction of the glutathione in the ACC has been reported as a feature of treatment-refractory psychosis, this observation further supports the potential of glutathione as a biomarker for this subset of patients with psychosis.


Subjects
Glutamine, Psychotic Disorders, Humans, Young Adult, Adult, Glutamine/metabolism, Psychotic Disorders/metabolism, Brain/metabolism, Glutamic Acid/metabolism, Gyrus Cinguli/metabolism, Aspartic Acid/metabolism, Glutathione/metabolism
2.
iScience ; 26(3): 106108, 2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36852282

ABSTRACT

Many gene signatures have been developed by applying machine learning (ML) to omics profiles; however, their clinical utility is often hindered by limited interpretability and unstable performance. Here, we show the importance of embedding prior biological knowledge in the decision rules yielded by ML approaches to build robust classifiers. We tested this by applying different ML algorithms on gene expression data to predict three difficult cancer phenotypes: bladder cancer progression to muscle-invasive disease, response to neoadjuvant chemotherapy in triple-negative breast cancer, and prostate cancer metastatic progression. We developed two sets of classifiers: mechanistic, by restricting the training to features capturing specific biological mechanisms; and agnostic, in which the training did not use any a priori biological information. Mechanistic models had a similar or better testing performance than their agnostic counterparts, with enhanced interpretability. Our findings support the use of biological constraints to develop robust gene signatures with high translational potential.

3.
PLoS Comput Biol ; 17(6): e1008944, 2021 06.
Article in English | MEDLINE | ID: mdl-34115745

ABSTRACT

Cancer cells display massive dysregulation of key regulatory pathways due to now well-catalogued mutations and other DNA-related aberrations. Moreover, enormous heterogeneity has been commonly observed in the identity, frequency and location of these aberrations across individuals with the same cancer type or subtype, and this variation naturally propagates to the transcriptome, resulting in myriad types of dysregulated gene expression programs. Many have argued that a more integrative and quantitative analysis of heterogeneity of DNA and RNA molecular profiles may be necessary for designing more systematic explorations of alternative therapies and improving predictive accuracy. We introduce a representation of multi-omics profiles which is sufficiently rich to account for observed heterogeneity and support the construction of quantitative, integrated, metrics of variation. Starting from the network of interactions existing in Reactome, we build a library of "paired DNA-RNA aberrations" that represent prototypical and recurrent patterns of dysregulation in cancer; each two-gene "Source-Target Pair" (STP) consists of a "source" regulatory gene and a "target" gene whose expression is plausibly "controlled" by the source gene. The STP is then "aberrant" in a joint DNA-RNA profile if the source gene is DNA-aberrant (e.g., mutated, deleted, or duplicated), and the downstream target gene is "RNA-aberrant", meaning its expression level is outside the normal, baseline range. With M STPs, each sample profile has exactly one of the 2^M possible configurations. We concentrate on subsets of STPs, and the corresponding reduced configurations, by selecting tissue-dependent minimal coverings, defined as the smallest family of STPs with the property that every sample in the considered population displays at least one aberrant STP within that family. These minimal coverings can be computed with integer programming.
Given such a covering, a natural measure of cross-sample diversity is the extent to which the particular aberrant STPs composing a covering vary from sample to sample; this variability is captured by the entropy of the distribution over configurations. We apply this program to data from TCGA for six distinct tumor types (breast, prostate, lung, colon, liver, and kidney cancer). This enables an efficient simplification of the complex landscape observed in cancer populations, resulting in the identification of novel signatures of molecular alterations which are not detected with frequency-based criteria. Estimates of cancer heterogeneity across tumor phenotypes reveal a stable pattern: entropy increases with disease severity. This framework is then well-suited to accommodate the expanding complexity of cancer genomes and epigenomes emerging from large consortia projects.
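
The covering and entropy ideas in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract states that minimal coverings are computed with integer programming, whereas the code below swaps in an exhaustive search that is only practical for a handful of STPs. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def minimal_covering(aberrant, n_stps):
    """Return a smallest family of STP indices covering every sample.

    aberrant: list of sets, one per sample, holding the indices of the
    STPs that are aberrant in that sample. Brute force over families of
    increasing size (an assumption made for clarity; the abstract uses
    integer programming instead).
    """
    for size in range(1, n_stps + 1):
        for family in combinations(range(n_stps), size):
            chosen = set(family)
            if all(sample & chosen for sample in aberrant):
                return family
    return None  # at least one sample has no aberrant STP at all


def configuration_entropy(aberrant, family):
    """Shannon entropy (bits) of the binary configurations over `family`."""
    configs = Counter(
        tuple(int(j in sample) for j in family) for sample in aberrant
    )
    n = len(aberrant)
    return -sum((c / n) * log2(c / n) for c in configs.values())


# Toy population: four samples, three candidate STPs (hypothetical data).
samples = [{0, 2}, {1}, {2}, {1, 2}]
cover = minimal_covering(samples, n_stps=3)
entropy = configuration_entropy(samples, cover)
print(cover, entropy)
```

With M STPs in the covering there are at most 2^M distinct configurations, so the entropy is bounded above by M bits.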


Subjects
Neoplasm DNA/genetics, Neoplasms/genetics, Neoplasm RNA/genetics, Computational Biology/methods, Gene Regulatory Networks, Humans, Mutation
4.
PLoS One ; 16(4): e0249002, 2021.
Article in English | MEDLINE | ID: mdl-33819273

ABSTRACT

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from the Cancer Genome Atlas.
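
A minimal sketch of the ternary digitization described above, written in Python rather than R and not using the divergence package's API. It assumes the baseline range of each feature is simply the minimum and maximum observed in the baseline samples; the package's actual estimate of the baseline support may differ. All names and values are hypothetical.

```python
def ternary_divergence(profile, baseline):
    """Digitize one omics profile against a baseline population.

    profile:  list of feature values for a single sample.
    baseline: list of baseline profiles (same feature order).
    Returns one code per feature: -1 below the baseline range,
    +1 above it, 0 inside it. The min/max range is an assumption
    made for this illustration.
    """
    codes = []
    for j, value in enumerate(profile):
        column = [row[j] for row in baseline]
        low, high = min(column), max(column)
        if value < low:
            codes.append(-1)
        elif value > high:
            codes.append(1)
        else:
            codes.append(0)
    return codes


# Three baseline (e.g. normal tissue) profiles over three features.
baseline = [[1.0, 5.0, 3.0], [2.0, 6.0, 2.5], [1.5, 5.5, 3.5]]
tumor = [0.2, 5.8, 9.0]
codes = ternary_divergence(tumor, baseline)
print(codes)
```

A binary code follows by taking the absolute value of each entry, which records only whether a feature is divergent and not the direction.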


Subjects
Genomics/methods, Software, Genetic Databases, Humans, Neoplasms/genetics
5.
Metabolites ; 11(1)2020 Dec 30.
Article in English | MEDLINE | ID: mdl-33396819

ABSTRACT

Cancer cells are adept at reprogramming energy metabolism, and the precise manifestation of this metabolic reprogramming exhibits heterogeneity across individuals (and from cell to cell). In this study, we analyzed the metabolic differences between interpersonal heterogeneous cancer phenotypes. We used divergence analysis on gene expression data of 1156 breast normal and tumor samples from The Cancer Genome Atlas (TCGA) and integrated this information with a genome-scale reconstruction of human metabolism to generate personalized, context-specific metabolic networks. Using this approach, we classified the samples into four distinct groups based on their metabolic profiles. Enrichment analysis of the subsystems indicated that amino acid metabolism, fatty acid oxidation, citric acid cycle, androgen and estrogen metabolism, and reactive oxygen species (ROS) detoxification distinguished these four groups. Additionally, we developed a workflow to identify potential drugs that can selectively target genes associated with the reactions of interest. MG-132 (a proteasome inhibitor) and OSU-03012 (a celecoxib derivative) were the top-ranking drugs identified from our analysis and known to have anti-tumor activity. Our approach has the potential to provide mechanistic insights into cancer-specific metabolic dependencies, ultimately enabling the identification of potential drug targets for each patient independently, contributing to a rational personalized medicine approach.

6.
Proc Natl Acad Sci U S A ; 117(2): 857-864, 2020 01 14.
Article in English | MEDLINE | ID: mdl-31882448

ABSTRACT

Cancer is driven by the sequential accumulation of genetic and epigenetic changes in oncogenes and tumor suppressor genes. The timing of these events is not well understood. Moreover, it is currently unknown why the same driver gene change appears as an early event in some cancer types and as a later event, or not at all, in others. These questions have become even more topical with the recent progress brought by genome-wide sequencing studies of cancer. Focusing on mutational events, we provide a mathematical model of the full process of tumor evolution that includes different types of fitness advantages for driver genes and carrying-capacity considerations. The model is able to recapitulate a substantial proportion of the observed cancer incidence in several cancer types (colorectal, pancreatic, and leukemia) and inherited conditions (Lynch and familial adenomatous polyposis), by changing only 2 tissue-specific parameters: the number of stem cells in a tissue and its cell division frequency. The model sheds light on the evolutionary dynamics of cancer by suggesting a generalized early onset of tumorigenesis followed by slow mutational waves, in contrast to previous conclusions. Formulas and estimates are provided for the fitness increases induced by driver mutations, often much larger than previously described, and highly tissue dependent. Our results suggest a mechanistic explanation for why the selective fitness advantage introduced by specific driver genes is tissue dependent.


Subjects
Carcinogenesis/genetics, Genetic Models, Neoplasms/classification, Adenomatous Polyposis Coli/genetics, Aged, Cell Division, Colorectal Neoplasms/genetics, Hereditary Nonpolyposis Colorectal Neoplasms, Humans, Middle Aged, Mutation, Neoplasms/genetics, Oncogenes/genetics
7.
Proc Natl Acad Sci U S A ; 115(18): 4545-4552, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29666255

ABSTRACT

Data collected from omics technologies have revealed pervasive heterogeneity and stochasticity of molecular states within and between phenotypes. A prominent example of such heterogeneity occurs between genome-wide mRNA, microRNA, and methylation profiles from one individual tumor to another, even within a cancer subtype. However, current methods in bioinformatics, such as detecting differentially expressed genes or CpG sites, are population-based and therefore do not effectively model intersample diversity. Here we introduce a unified theory to quantify sample-level heterogeneity that is applicable to a single omics profile. Specifically, we simplify an omics profile to a digital representation based on the omics profiles from a set of samples from a reference or baseline population (e.g., normal tissues). The state of any subprofile (e.g., expression vector for a subset of genes) is said to be "divergent" if it lies outside the estimated support of the baseline distribution and is consequently interpreted as "dysregulated" relative to that baseline. We focus on two cases: single features (e.g., individual genes) and distinguished subsets (e.g., regulatory pathways). Notably, since the divergence analysis is at the individual sample level, dysregulation can be analyzed probabilistically; for example, one can estimate the probability that a gene or pathway is divergent in some population. Finally, the reduction in complexity facilitates a more "personalized" and biologically interpretable analysis of variation, as illustrated by experiments involving tissue characterization, disease detection and progression, and disease-pathway associations.


Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Precision Medicine/methods, Computational Biology/statistics & numerical data, Statistical Data Interpretation, Genetic Databases, Gene Expression Profiling/statistics & numerical data, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/statistics & numerical data, Humans, MicroRNAs/genetics, Neoplasms/genetics, Proteomics/methods
8.
Bioinformatics ; 34(11): 1859-1867, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29342249

ABSTRACT

Motivation: Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches. Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and benchmark SEVA's performance against EBSeq, DiffSplice and rMATS that model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data. Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg. Contact: bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Alternative Splicing, Neoplasms/genetics, Protein Isoforms/genetics, RNA Sequence Analysis/methods, Software, Computational Biology/methods, Neoplastic Gene Expression Regulation, Head and Neck Neoplasms/genetics, Humans, Genetic Models
9.
Hum Genet ; 134(5): 479-95, 2015 May.
Article in English | MEDLINE | ID: mdl-25381197

ABSTRACT

Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda (in particular, predicting disease phenotypes, progression and treatment response for individuals) requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning biomarkers, metabolism, cell signaling, network inference and tumorigenesis.


Subjects
Computational Biology/methods, Statistical Data Interpretation, Gene Regulatory Networks/genetics, Neoplasms/genetics, Phenotype, Systems Biology/methods, Translational Biomedical Research/methods, Tumor Biomarkers, Carcinogenesis/genetics, Humans, Metabolic Networks and Pathways/genetics, Metabolic Networks and Pathways/physiology, Mutation/genetics, Neoplasms/pathology, Signal Transduction/genetics, Signal Transduction/physiology, Translational Biomedical Research/trends
10.
Bioinformatics ; 31(2): 273-4, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25262153

ABSTRACT

k-Top Scoring Pairs (kTSP) is a classification method for prediction from high-throughput data based on a set of paired measurements. Each of the two possible orderings of a pair of measurements (e.g. a reversal in the expression of two genes) is associated with one of two classes. The kTSP prediction rule is the aggregation of voting among such individual two-feature decision rules based on order switching. kTSP, like its predecessor, Top Scoring Pair (TSP), is a parameter-free classifier relying only on ranking of a small subset of features, rendering it robust to noise and potentially easy to interpret in biological terms. In contrast to TSP, kTSP has comparable accuracy to standard genomics classification techniques, including Support Vector Machines and Prediction Analysis for Microarrays. Here, we describe 'switchBox', an R package for kTSP-based prediction. AVAILABILITY: The 'switchBox' package is freely available from Bioconductor: http://www.bioconductor.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
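
The voting rule described in this abstract can be sketched in a few lines. This is an illustration in Python and does not use or reproduce the switchBox API. The gene names, the expression values, and the convention that a pair votes for class 1 when its first gene is more highly expressed than its second are all assumptions made for the example.

```python
def ktsp_predict(sample, pairs):
    """Predict a binary class by majority vote over gene pairs.

    sample: dict mapping gene name -> expression value.
    pairs:  list of (gene_a, gene_b); a pair votes for class 1 when
            expression(gene_a) > expression(gene_b), else for class 0.
    Use an odd number of pairs so that ties cannot occur.
    """
    votes_for_1 = sum(1 for a, b in pairs if sample[a] > sample[b])
    return 1 if votes_for_1 > len(pairs) / 2 else 0


# Hypothetical three-pair classifier and one sample.
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]
sample = {"G1": 8.1, "G2": 3.0, "G3": 2.2, "G4": 7.5, "G5": 6.0, "G6": 1.0}
predicted = ktsp_predict(sample, pairs)

# Only the orderings matter, so any increasing transform of the
# expression values leaves the prediction unchanged.
rescaled = {gene: 2.0 * value + 10.0 for gene, value in sample.items()}
predicted_rescaled = ktsp_predict(rescaled, pairs)
print(predicted, predicted_rescaled)
```

The second call shows why such rank-based rules are insensitive to monotone normalization of a sample's measurements.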


Subjects
Algorithms, Tumor Biomarkers/genetics, Breast Neoplasms/classification, Computational Biology/methods, Gene Expression Profiling/methods, Local Neoplasm Recurrence/diagnosis, Breast Neoplasms/genetics, Female, Neoplastic Gene Expression Regulation, Humans, Local Neoplasm Recurrence/genetics, Support Vector Machine
11.
Cancer Inform ; 13(Suppl 5): 61-7, 2014.
Article in English | MEDLINE | ID: mdl-25392694

ABSTRACT

Analysis of gene sets can implicate activity in signaling pathways that is responsible for cancer initiation and progression, but is not discernible from the analysis of individual genes. Multiple methods and software packages have been developed to infer pathway activity from expression measurements for the set of genes targeted by that pathway. Broadly, three major methodologies have been proposed: over-representation, enrichment, and differential variability. Both over-representation and enrichment analyses are effective techniques to infer differentially regulated pathways from gene sets with relatively consistent differentially expressed (DE) genes. Specifically, these algorithms aggregate statistics from each gene in the pathway. However, they overlook multivariate patterns related to gene interactions and variations in expression. Therefore, the analysis of differential variability of multigene expression patterns can be essential to pathway inference in cancers. The corresponding methodologies and software packages for such multivariate variability analysis of pathways are reviewed here. We also introduce a new, computationally efficient algorithm, expression variation analysis (EVA), which has been implemented along with a previously proposed algorithm, Differential Rank Conservation (DIRAC), in an open source R package, gene set regulation (GSReg). EVA inferred pathways similar to those inferred by DIRAC at reduced computational cost. Moreover, EVA also inferred dysregulated pathways different from those identified by enrichment analysis.

12.
PLoS One ; 9(10): e110840, 2014.
Article in English | MEDLINE | ID: mdl-25330348

ABSTRACT

BACKGROUND: The biomarker discovery field is replete with molecular signatures that have not translated into the clinic despite ostensibly promising performance in predicting disease phenotypes. One widely cited reason is lack of classification consistency, largely due to failure to maintain performance from study to study. This failure is widely attributed to variability in data collected for the same phenotype among disparate studies, due to technical factors unrelated to phenotypes (e.g., laboratory settings resulting in "batch-effects") and non-phenotype-associated biological variation in the underlying populations. These sources of variability persist in new data collection technologies. METHODS: Here we quantify the impact of these combined "study-effects" on a disease signature's predictive performance by comparing two types of validation methods: ordinary randomized cross-validation (RCV), which extracts random subsets of samples for testing, and inter-study validation (ISV), which excludes an entire study for testing. Whereas RCV hardwires an assumption of training and testing on identically distributed data, this key property is lost in ISV, yielding systematic decreases in performance estimates relative to RCV. Measuring the RCV-ISV difference as a function of number of studies quantifies influence of study-effects on performance. RESULTS: As a case study, we gathered publicly available gene expression data from 1,470 microarray samples of 6 lung phenotypes from 26 independent experimental studies and 769 RNA-seq samples of 2 lung phenotypes from 4 independent studies. We find that the RCV-ISV performance discrepancy is greater in phenotypes with few studies, and that the ISV performance converges toward RCV performance as data from additional studies are incorporated into classification. 
CONCLUSIONS: We show that by examining how fast ISV performance approaches RCV as the number of studies is increased, one can estimate when "sufficient" diversity has been achieved for learning a molecular signature likely to translate without significant loss of accuracy to new clinical settings.
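
The difference between the two validation schemes comes down to how test sets are drawn. The sketch below builds the splits only and does not train any classifier; the function names and the study labels are hypothetical.

```python
import random


def rcv_folds(n_samples, n_folds, seed=0):
    """Randomized cross-validation: shuffle sample indices, then deal
    them into folds, ignoring which study each sample came from."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def isv_folds(study_labels):
    """Inter-study validation: each test fold is one entire study, so
    the classifier is always tested on a study it never saw."""
    studies = sorted(set(study_labels))
    return {
        study: [i for i, label in enumerate(study_labels) if label == study]
        for study in studies
    }


# Nine samples drawn from three studies (hypothetical labels).
study_labels = ["A", "A", "A", "B", "B", "C", "C", "C", "C"]
random_folds = rcv_folds(len(study_labels), n_folds=3)
held_out = isv_folds(study_labels)
print(random_folds)
print(held_out)
```

In the randomized scheme a test fold usually mixes samples from studies that also appear in training, which is why it tends to give more optimistic performance estimates than holding out a whole study.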


Subjects
Tumor Biomarkers/biosynthesis, Gene Expression Profiling/methods, Neoplastic Gene Expression Regulation, Oligonucleotide Array Sequence Analysis/methods, Adenocarcinoma/genetics, Adenocarcinoma/pathology, Adenocarcinoma of Lung, Squamous Cell Carcinoma/genetics, Squamous Cell Carcinoma/pathology, Humans, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Phenotype, Chronic Obstructive Pulmonary Disease/genetics, Chronic Obstructive Pulmonary Disease/pathology, RNA Sequence Analysis, Support Vector Machine
13.
PLoS Comput Biol ; 9(7): e1003148, 2013.
Article in English | MEDLINE | ID: mdl-23935471

ABSTRACT

We utilized abundant transcriptomic data for the primary classes of brain cancers to study the feasibility of separating all of these diseases simultaneously based on molecular data alone. These signatures were based on a new method reported herein--Identification of Structured Signatures and Classifiers (ISSAC)--that resulted in a brain cancer marker panel of 44 unique genes. Many of these genes have established relevance to the brain cancers examined herein, with others having known roles in cancer biology. Analyses on large-scale data from multiple sources must deal with significant challenges associated with heterogeneity between different published studies, for it was observed that the variation among individual studies often had a larger effect on the transcriptome than did phenotype differences, as is typical. For this reason, we restricted ourselves to studying only cases where we had at least two independent studies performed for each phenotype, and also reprocessed all the raw data from the studies using a unified pre-processing pipeline. We found that learning signatures across multiple datasets greatly enhanced reproducibility and accuracy in predictive performance on truly independent validation sets, even when keeping the size of the training set the same. This was most likely due to the meta-signature encompassing more of the heterogeneity across different sources and conditions, while amplifying signal from the repeated global characteristics of the phenotype. When molecular signatures of brain cancers were constructed from all currently available microarray data, 90% phenotype prediction accuracy, or the accuracy of identifying a particular brain cancer from the background of all phenotypes, was found. Looking forward, we discuss our approach in the context of the eventual development of organ-specific molecular signatures from peripheral fluids such as the blood.


Subjects
Brain Neoplasms/genetics, Transcriptome, Tumor Biomarkers/metabolism, Brain Neoplasms/metabolism, Computational Biology, Humans, Reproducibility of Results
14.
BMC Genomics ; 14: 336, 2013 May 17.
Article in English | MEDLINE | ID: mdl-23682826

ABSTRACT

BACKGROUND: A small number of prognostic and predictive tests based on gene expression are currently offered as reference laboratory tests. In contrast to such success stories, a number of flaws and errors have recently been identified in other genomic-based predictors and the success rate for developing clinically useful genomic signatures is low. These errors have led to widespread concerns about the protocols for conducting and reporting of computational research. As a result, a need has emerged for a template for reproducible development of genomic signatures that incorporates full transparency, data sharing and statistical robustness. RESULTS: Here we present the first fully reproducible analysis of the data used to train and test MammaPrint, an FDA-cleared prognostic test for breast cancer based on a 70-gene expression signature. We provide all the software and documentation necessary for researchers to build and evaluate genomic classifiers based on these data. As an example of the utility of this reproducible research resource, we develop a simple prognostic classifier that uses only 16 genes from the MammaPrint signature and is equally accurate in predicting 5-year disease free survival. CONCLUSIONS: Our study provides a prototypic example for reproducible development of computational algorithms for learning prognostic biomarkers in the era of personalized medicine.


Subjects
Breast Neoplasms/diagnosis, Breast Neoplasms/genetics, Computational Biology/methods, Gene Expression Profiling, Cohort Studies, Humans, Prognosis, Reproducibility of Results, Software
15.
Sci Transl Med ; 4(158): 158rv11, 2012 Oct 31.
Article in English | MEDLINE | ID: mdl-23115356

ABSTRACT

Because of the inherent complexity of coupled nonlinear biological systems, the development of computational models is necessary for achieving a quantitative understanding of their structure and function in health and disease. Statistical learning is applied to high-dimensional biomolecular data to create models that describe relationships between molecules and networks. Multiscale modeling links networks to cells, organs, and organ systems. Computational approaches are used to characterize anatomic shape and its variations in health and disease. In each case, the purposes of modeling are to capture all that we know about disease and to develop improved therapies tailored to the needs of individuals. We discuss advances in computational medicine, with specific examples in the fields of cancer, diabetes, cardiology, and neurology. Advances in translating these computational methods to the clinic are described, as well as challenges in applying models for improving patient health.


Subjects
Computational Biology/methods, Medicine/methods, Theoretical Models, Computer Simulation, Humans
16.
Article in English | MEDLINE | ID: mdl-20855924

ABSTRACT

Protein signaling networks play a central role in transcriptional regulation and the etiology of many diseases. Statistical methods, particularly Bayesian networks, have been widely used to model cell signaling, mostly for model organisms and with focus on uncovering connectivity rather than inferring aberrations. Extensions to mammalian systems have not yielded compelling results, due likely to greatly increased complexity and limited proteomic measurements in vivo. In this study, we propose a comprehensive statistical model that is anchored to a predefined core topology, has a limited complexity due to parameter sharing and uses microarray data of mRNA transcripts as the only observable components of signaling. Specifically, we account for cell heterogeneity and a multilevel process, representing signaling as a Bayesian network at the cell level, modeling measurements as ensemble averages at the tissue level, and incorporating patient-to-patient differences at the population level. Motivated by the goal of identifying individual protein abnormalities as potential therapeutical targets, we applied our method to the RAS-RAF network using a breast cancer study with 118 patients. We demonstrated rigorous statistical inference, established reproducibility through simulations and the ability to recover receptor status from available microarray data.


Subjects
Bayes Theorem, Cell Communication/physiology, Computational Biology/methods, Biological Models, Signal Transduction, Algorithms, Artificial Intelligence, Computer Simulation, Gene Expression Profiling, Humans, Genetic Hybridization, Oligonucleotide Array Sequence Analysis, Messenger RNA/genetics, Messenger RNA/metabolism
17.
PLoS Comput Biol ; 6(5): e1000792, 2010 May 27.
Article in English | MEDLINE | ID: mdl-20523739

ABSTRACT

A powerful way to separate signal from noise in biology is to convert the molecular data from individual genes or proteins into an analysis of comparative biological network behaviors. One of the limitations of previous network analyses is that they do not take into account the combinatorial nature of gene interactions within the network. We report here a new technique, Differential Rank Conservation (DIRAC), which permits one to assess these combinatorial interactions to quantify various biological pathways or networks in a comparative sense, and to determine how they change in different individuals experiencing the same disease process. This approach is based on the relative expression values of participating genes, i.e., the ordering of expression within network profiles. DIRAC provides quantitative measures of how network rankings differ either among networks for a selected phenotype or among phenotypes for a selected network. We examined disease phenotypes including cancer subtypes and neurological disorders and identified networks that are tightly regulated, as defined by high conservation of transcript ordering. Interestingly, we observed a strong trend to looser network regulation in more malignant phenotypes and later stages of disease. At a sample level, DIRAC can detect a change in ranking between phenotypes for any selected network. Variably expressed networks represent statistically robust differences between disease states and serve as signatures for accurate molecular classification, validating the information about expression patterns captured by DIRAC. Importantly, DIRAC can be applied not only to transcriptomic data, but to any ordinal data type.
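
One way to make "conservation of transcript ordering" concrete is sketched below. It is an interpretation of the abstract and not the published implementation: each sample's network profile is reduced to its pairwise orderings, a phenotype template is taken as the majority ordering for each gene pair, and conservation is the average agreement of samples with that template. The function names and toy profiles are hypothetical.

```python
from itertools import combinations


def pairwise_orderings(profile):
    """Boolean vector: is gene i expressed below gene j, for all i < j?"""
    return [
        profile[i] < profile[j]
        for i, j in combinations(range(len(profile)), 2)
    ]


def rank_template(profiles):
    """Majority ordering of each gene pair across a phenotype's samples."""
    orderings = [pairwise_orderings(p) for p in profiles]
    n = len(orderings)
    return [sum(column) > n / 2 for column in zip(*orderings)]


def rank_conservation(profiles):
    """Mean fraction of gene pairs on which samples match the template."""
    template = rank_template(profiles)
    scores = []
    for profile in profiles:
        observed = pairwise_orderings(profile)
        matches = sum(o == t for o, t in zip(observed, template))
        scores.append(matches / len(template))
    return sum(scores) / len(scores)


# Two hypothetical phenotypes measured on the same three-gene network.
tight = [[1.0, 2.0, 3.0], [1.1, 2.5, 3.2], [0.9, 2.1, 4.0]]
loose = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0]]
tight_index = rank_conservation(tight)
loose_index = rank_conservation(loose)
print(tight_index, loose_index)
```

A value near 1 corresponds to a tightly regulated network in the sense used above, and lower values to looser regulation.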


Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Gene Regulatory Networks, Cluster Analysis, Factual Databases, Humans, Neoplasms/genetics, Phenotype, Reproducibility of Results, Signal Transduction
18.
Technol Cancer Res Treat ; 9(2): 149-59, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20218737

ABSTRACT

The enormous amount of biomolecule measurement data generated from high-throughput technologies has brought an increased need for computational tools in biological analyses. Such tools can enhance our understanding of human health and genetic diseases, such as cancer, by accurately classifying phenotypes, detecting the presence of disease, discriminating among cancer sub-types, predicting clinical outcomes, and characterizing disease progression. In the case of gene expression microarray data, standard statistical learning methods have been used to identify classifiers that can accurately distinguish disease phenotypes. However, these mathematical prediction rules are often highly complex, and they lack the convenience and simplicity desired for extracting underlying biological meaning or transitioning into the clinic. In this review, we survey a powerful collection of computational methods for analyzing transcriptomic microarray data that address these limitations. Relative Expression Analysis (RXA) is based only on the relative orderings among the expressions of a small number of genes. Specifically, we provide a description of the first and simplest example of RXA, the K-TSP classifier, which is based on K pairs of genes; the case K = 1 is the TSP classifier. Given their simplicity and ease of biological interpretation, as well as their invariance to data normalization and parameter-fitting, these classifiers have been widely applied in aiding molecular diagnostics in a broad range of human cancers. We review several studies which demonstrate accurate classification of disease phenotypes (e.g., cancer vs. normal), cancer subclasses (e.g., AML vs. ALL, GIST vs. LMS), disease outcomes (e.g., metastasis, survival), and diverse human pathologies assayed through blood-borne leukocytes.
The studies presented demonstrate that RXA, specifically the TSP and K-TSP classifiers, is a promising new class of computational methods for analyzing high-throughput data, and has the potential to significantly contribute to molecular cancer diagnosis and prognosis.
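
A pair-based classifier of this kind can be sketched in a few lines. This is a simplified illustration, not the reviewed implementation: it assumes each gene pair is scored by how differently "gene i < gene j" occurs in the two classes, that the K best non-overlapping pairs are kept, and that prediction is a majority vote over those pairs. Tie-breaking between equally scored pairs is left out, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def pair_scores(X, y):
    """Score every gene pair by the class difference in how often x[i] < x[j]."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        freq = []
        for label in (0, 1):
            rows = [x for x, lab in zip(X, y) if lab == label]
            freq.append(sum(x[i] < x[j] for x in rows) / len(rows))
        # The final flag records whether "gene i < gene j" points to class 1.
        scored.append((abs(freq[1] - freq[0]), i, j, freq[1] > freq[0]))
    return sorted(scored, key=lambda t: t[0], reverse=True)


def fit_ktsp(X, y, k=1):
    """Keep the k top-scoring pairs that share no genes (k = 1 gives TSP)."""
    chosen, used = [], set()
    for _, i, j, less_means_class1 in pair_scores(X, y):
        if i in used or j in used:
            continue
        chosen.append((i, j, less_means_class1))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def predict(pairs, x):
    """Majority vote over the selected pairs; an odd k avoids tied votes."""
    votes = sum((x[i] < x[j]) == flag for i, j, flag in pairs)
    return 1 if votes * 2 > len(pairs) else 0


# Hypothetical data: genes 0 and 1 reverse order between classes; genes 2 and 3 are noise.
X = [
    [8, 1, 10, 0.7],
    [9, 2, 5, 12],
    [8, 1, 0.5, 4],
    [1, 8, 5, 12],
    [2, 9, 0.5, 4],
    [1, 8, 10, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

model = fit_ktsp(X, y, k=1)
```

The rule depends only on within-sample comparisons, so any monotone rescaling of a sample leaves the prediction unchanged, which is the normalization invariance the abstract refers to.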


Subjects
Gene Expression Profiling/methods, Molecular Biology/methods, Neoplasms/diagnosis, Neoplasms/genetics, Oligonucleotide Array Sequence Analysis/methods, Algorithms, Gene Expression, Humans, Automated Pattern Recognition/methods, Prognosis
19.
Article in English | MEDLINE | ID: mdl-19964680

ABSTRACT

The computational identification from global data sets of stable and predictive patterns of gene and protein relative expression reversals offers a simple, yet powerful approach to target therapies for personalized medicine and to identify pathways that are disease-perturbed. We previously utilized this approach to identify a molecular classifier with near 100% accuracy for differentiating gastrointestinal stromal tumor (GIST) and leiomyosarcoma (LMS), two cancers that have very similar histopathology, but require very different treatments. Differential Rank Conservation (DIRAC) is a novel approach for studying gene ordering within pathways and is based on the relative expression ranks of participating genes. DIRAC provides quantitative measures of how pathway rankings differ both within and between phenotypes. DIRAC between pathways in a selected phenotype contrasts the scenarios where either (i) pathways are ranked similarly in all samples; or (ii) the ordering of pathway genes is highly varied. We examined gene expression in GIST and LMS tumor profiles and identified pathways that appear to be tightly regulated based on high conservation of gene ordering. The second form of DIRAC manifests as a change in ranking (i.e., shuffling) between phenotypes for a selected pathway. These variably expressed pathways serve as signatures for molecular classification, and the ability to accurately classify microarray samples provided strong validation for the pathway-level expression differences identified by DIRAC.
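
The second use described above, telling phenotypes apart by how a pathway's gene ordering shuffles, might look roughly like the following. This is an assumed, simplified rule: a sample is assigned to whichever phenotype's majority-ordering template it agrees with more. The phenotype labels "A" and "B", the function names, and the data are placeholders, not values from the study.

```python
from itertools import combinations


def ordering(profile):
    """One boolean per gene pair (i, j), i < j: is gene i below gene j?"""
    return [profile[i] < profile[j]
            for i, j in combinations(range(len(profile)), 2)]


def template(samples):
    """Majority ordering per gene pair within one phenotype (ties become False)."""
    columns = zip(*(ordering(s) for s in samples))
    return [sum(col) * 2 > len(samples) for col in columns]


def agreement(profile, tmpl):
    """Fraction of gene pairs in this sample ordered as in the template."""
    return sum(a == b for a, b in zip(ordering(profile), tmpl)) / len(tmpl)


def classify(profile, samples_a, samples_b):
    """Assign the sample to the phenotype whose template it matches better."""
    score_a = agreement(profile, template(samples_a))
    score_b = agreement(profile, template(samples_b))
    return "A" if score_a >= score_b else "B"  # ties go to "A" arbitrarily


# Hypothetical pathway of three genes whose ordering reverses between phenotypes.
phenotype_a = [[1, 2, 3], [1, 3, 4], [2, 3, 5]]
phenotype_b = [[3, 2, 1], [5, 3, 1], [4, 2, 1]]

label_1 = classify([0.5, 0.9, 2.0], phenotype_a, phenotype_b)
label_2 = classify([9, 4, 2], phenotype_a, phenotype_b)
```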


Subjects
Gene Expression Profiling/methods, Neoplastic Gene Expression Regulation, Biological Models, Neoplasm Proteins/metabolism, Sarcoma/metabolism, Signal Transduction, Computer Simulation, Humans
20.
BMC Bioinformatics ; 10: 256, 2009 Aug 20.
Article in English | MEDLINE | ID: mdl-19695104

ABSTRACT

BACKGROUND: A major challenge in computational biology is to extract knowledge about the genetic nature of disease from high-throughput data. However, an important obstacle to both biological understanding and clinical applications is the "black box" nature of the decision rules provided by most machine learning approaches, which usually involve many genes combined in a highly complex fashion. Achieving biologically relevant results argues for a different strategy. A promising alternative is to base prediction entirely upon the relative expression ordering of a small number of genes. RESULTS: We present a three-gene version of "relative expression analysis" (RXA), a rigorous and systematic comparison with earlier approaches in a variety of cancer studies, a clinically relevant application to predicting germline BRCA1 mutations in breast cancer and a cross-study validation for predicting ER status. In the BRCA1 study, RXA yields high accuracy with a simple decision rule: in tumors carrying mutations, the expression of a "reference gene" falls between the expression of two differentially expressed genes, PPP1CB and RNF14. An analysis of the protein-protein interactions among the triplet of genes and BRCA1 suggests that the classifier has a biological foundation. CONCLUSION: RXA has the potential to identify genomic "marker interactions" with plausible biological interpretation and direct clinical applicability. It provides a general framework for understanding the roles of the genes involved in decision rules, as illustrated for the difficult and clinically relevant problem of identifying BRCA1 mutation carriers.
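
The three-gene decision rule in the RESULTS can be written as a single comparison. The abstract names neither the reference gene nor which of PPP1CB and RNF14 is the higher one, so this sketch assumes that "falls between" holds for either ordering of the two; the function name and the expression values are hypothetical.

```python
def predicts_brca1_mutation(ppp1cb, reference, rnf14):
    """True when the reference gene's expression lies strictly between the other two.

    Assumption: either ordering of PPP1CB and RNF14 counts as "between".
    """
    low, high = sorted((ppp1cb, rnf14))
    return low < reference < high


# Hypothetical expression values, for illustration only.
carrier_like = predicts_brca1_mutation(ppp1cb=2.0, reference=3.0, rnf14=5.0)
non_carrier_like = predicts_brca1_mutation(ppp1cb=2.0, reference=6.0, rnf14=5.0)
```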


Subjects
Tumor Biomarkers/genetics, Breast Neoplasms/genetics, Computational Biology/methods, Neoplastic Gene Expression Regulation, BRCA1 Genes, Female, Gene Expression Profiling, Humans