1.
EMBO J ; 40(7): e105846, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33469951

ABSTRACT

Protein termini are determinants of protein stability. Proteins bearing degradation signals, or degrons, at their amino- or carboxyl-termini are eliminated by the N- or C-degron pathways, respectively. We aimed to elucidate the function of C-degron pathways and to unveil how normal proteomes are exempt from C-degron pathway-mediated destruction. Our data reveal that C-degron pathways remove mislocalized cellular proteins and cleavage products of deubiquitinating enzymes. Furthermore, the C-degron and N-degron pathways cooperate in protein removal. Proteome analysis revealed a shortfall in normal proteins targeted by C-degron pathways, but not of defective proteins, suggesting proteolysis-based immunity as a constraint for protein evolution/selection. Our work highlights the importance of protein termini for protein quality surveillance, and the relationship between the functional proteome and protein degradation pathways.


Subject(s)
Proteolysis , Ubiquitination , Amino Acid Motifs , Tumor Cell Line , HEK293 Cells , Humans , Protein Transport , Proteome/chemistry , Proteome/metabolism , Cytokine Receptors/metabolism
2.
BMC Bioinformatics ; 25(1): 209, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38867193

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (sc-RNASeq) data illuminate transcriptomic heterogeneity but also possess a high level of noise, abundant missing entries and sometimes inadequate or no cell type annotations at all. Bulk-level gene expression data lack direct information of cell population composition but are more robust and complete and often better annotated. We propose a modeling framework to integrate bulk-level and single-cell RNASeq data to address the deficiencies and leverage the mutual strengths of each type of data and enable a more comprehensive inference of their transcriptomic heterogeneity. Contrary to the standard approaches of factorizing the bulk-level data with one algorithm and (for some methods) treating single-cell RNASeq data as references to decompose bulk-level data, we employed multiple deconvolution algorithms to factorize the bulk-level data, constructed the probabilistic graphical models of cell-level gene expressions from the decomposition outcomes, and compared the log-likelihood scores of these models in single-cell data. We term this framework backward deconvolution as inference operates from coarse-grained bulk-level data to fine-grained single-cell data. As the abundant missing entries in sc-RNASeq data have a significant effect on log-likelihood scores, we also developed a criterion for inclusion or exclusion of zero entries in log-likelihood score computation. RESULTS: We selected nine deconvolution algorithms and validated backward deconvolution in five datasets. In the in-silico mixtures of mouse sc-RNASeq data, the log-likelihood scores of the deconvolution algorithms were strongly anticorrelated with their errors of mixture coefficients and cell type specific gene expression signatures. In the true bulk-level mouse data, the sample mixture coefficients were unknown but the log-likelihood scores were strongly correlated with accuracy rates of inferred cell types. 
In the data of autism spectrum disorder (ASD) and normal controls, we found that ASD brains possessed higher fractions of astrocytes and lower fractions of NRGN-expressing neurons than normal controls. In datasets of breast cancer and low-grade gliomas (LGG), we compared the log-likelihood scores of three simple hypotheses about the gene expression patterns of the cell types underlying the tumor subtypes. The model that tumors of each subtype were dominated by one cell type persistently outperformed an alternative model that each cell type had elevated expression in one gene group and tumors were mixtures of those cell types. Superiority of the former model is also supported by comparing the real breast cancer sc-RNASeq clusters with those generated by simulated sc-RNASeq data. CONCLUSIONS: The results indicate that backward deconvolution serves as a sensible model selection tool for deconvolution algorithms and facilitates discerning hypotheses about cell type compositions underlying heterogeneous specimens such as tumors.


Subject(s)
Algorithms , RNA Sequence Analysis , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , RNA Sequence Analysis/methods , Transcriptome/genetics , Humans , Gene Expression Profiling/methods , Animals , Mice , Single-Cell Gene Expression Analysis
3.
Am J Hum Genet ; 106(3): 371-388, 2020 03 05.
Article in English | MEDLINE | ID: mdl-32142644

ABSTRACT

The population of the United States is shaped by centuries of migration, isolation, growth, and admixture between ancestors of global origins. Here, we assemble a comprehensive view of recent population history by studying the ancestry and population structure of more than 32,000 individuals in the US using genetic, ancestral birth origin, and geographic data from the National Geographic Genographic Project. We identify migration routes and barriers that reflect historical demographic events. We also uncover the spatial patterns of relatedness in subpopulations through the combination of haplotype clustering, ancestral birth origin analysis, and local ancestry inference. Examples of these patterns include substantial substructure and heterogeneity in Hispanics/Latinos, isolation-by-distance in African Americans, elevated levels of relatedness and homozygosity in Asian immigrants, and fine-scale structure in European descents. Taken together, our results provide detailed insights into the genetic structure and demographic history of the diverse US population.


Subject(s)
Emigration and Immigration , Population Genetics , Haplotypes , Cluster Analysis , Demography , Humans , United States
4.
BMC Bioinformatics ; 20(1): 145, 2019 Mar 18.
Article in English | MEDLINE | ID: mdl-30885118

ABSTRACT

BACKGROUND: Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data have been proposed, yet none of them explicitly captures combinatorial relations of feature scores from multiple platforms. RESULTS: We propose multivariate GSEA (MGSEA) to capture combinatorial relations of gene set enrichment among multiple platform features. MGSEA successfully captures designed feature relations from simulated data. By applying it to the scores of delineating breast cancer and glioblastoma multiforme (GBM) subtypes from The Cancer Genome Atlas (TCGA) datasets of CNV, DNA methylation and mRNA expressions, we find that breast cancer and GBM data yield both similar and distinct outcomes. Among the enriched functional categories, subtype-specific biomarkers are dominated by mRNA expression in many functional categories in both cancer types and also by CNV in many functional categories in breast cancer. The enriched functional categories belonging to distinct combinatorial patterns are involved in different oncogenic processes: cell proliferation (such as cell cycle control, estrogen responses, MYC and E2F targets) for mRNA expression in breast cancer, invasion and metastasis (such as cell adhesion and epithelial-mesenchymal transition (EMT)) for CNV in breast cancer, and diverse processes (such as immune and inflammatory responses, cell adhesion, angiogenesis, and EMT) for mRNA expression in GBM. These observations persist in two external datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for breast cancer and Repository for Molecular Brain Neoplasia Data (REMBRANDT) for GBM) and are consistent with knowledge of cancer subtypes. We further compare the characteristics of MGSEA with several extensions of GSEA and point out the pros and cons of each method. CONCLUSIONS: We demonstrated the utility of MGSEA by inferring the combinatorial relations of multiple platforms for cancer subtype delineation in three multi-OMIC datasets: TCGA, METABRIC and REMBRANDT. The inferred combinatorial patterns are consistent with the current knowledge and also reveal novel insights about cancer subtypes. MGSEA can be further applied to any genotype-phenotype association problems with multimodal OMIC data.
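For readers unfamiliar with the canonical, one-dimensional GSEA that this abstract takes as its starting point, the following is a minimal sketch of the unweighted running-sum enrichment score. It is not the MGSEA method described above, and the function name `enrichment_score` and the toy gene labels are illustrative assumptions.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) enrichment score.

    ranked_genes: gene identifiers sorted from highest to lowest feature score.
    gene_set: identifiers belonging to the functional category being tested.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set)
    n_hit = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up at each set member, step down at every other gene.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d", "e", "f"]  # hypothetical gene labels
    print(enrichment_score(ranking, {"a", "b"}))  # set concentrated at the top
    print(enrichment_score(ranking, {"e", "f"}))  # set concentrated at the bottom
```

A positive score indicates that the set members cluster near the top of the ranking, and a negative score that they cluster near the bottom. Significance testing by permutation is omitted here.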


Subject(s)
Brain Neoplasms/genetics , Breast Neoplasms/genetics , Glioblastoma/genetics , Tumor Biomarkers/genetics , Cell Proliferation , DNA Methylation , Genetic Databases , Epithelial-Mesenchymal Transition , Neoplastic Gene Expression Regulation , Humans , Theoretical Models , Multivariate Analysis
5.
J Theor Biol ; 474: 88-102, 2019 08 07.
Article in English | MEDLINE | ID: mdl-31077681

ABSTRACT

Despite recent advances in targeted drugs and immunotherapy, cancer remains "the emperor of all maladies" due to the almost inevitable emergence of resistance. Drug resistance is thought to be driven by genetic alterations and/or dynamic plasticity that deregulate pathway activities and regulatory programs of a highly heterogeneous tumour. In this study, we propose a modelling framework to simulate population dynamics of heterogeneous tumour cells with reversible drug resistance. Drug sensitivity of a tumour cell is determined by its internal states, which are demarcated by coordinated activities of multiple interconnected oncogenic pathways. Transitions between cellular states depend on the effects of targeted drugs and regulatory relations between the pathways. Under this framework, we build a simple model to capture drug resistance characteristics of BRAF-mutant melanoma, where two cell states are determined by two mutually inhibitory - main and alternative - pathways. We assume that cells with an activated main pathway are proliferative yet sensitive to the BRAF inhibitor, and cells with an activated alternative pathway are quiescent but resistant to the drug. We describe a dynamical process of tumour growth under various drug regimens using the explicit solutions of mean-field equations. Based on these solutions, we compare efficacy of three treatment strategies from simulated data: static treatments with continuous and constant dosages, periodic treatments with regular intermittent active phases and drug holidays, and treatments derived from optimal control theory (OCT). Periodic treatments outperform static treatments by a considerable margin, while treatments based on OCT outperform the best periodic treatment. Our results provide insights regarding optimal cancer treatment modalities for heterogeneous tumours, and may guide the development of optimal therapeutic strategies to circumvent plastic drug resistance. They can also be used to evaluate the efficacy of suboptimal treatments that may account for side effects of the treatment and the cost of its application.
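The two-state picture in this abstract (proliferative, drug-sensitive cells versus quiescent, resistant cells, with reversible switching) can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not the authors' model: the equations, the rate names (`growth`, `kill`, `to_resistant`, `to_sensitive`), their values, and the use of forward-Euler integration instead of explicit solutions are all hypothetical choices made for readability.

```python
def periodic_schedule(period, on_fraction):
    """Return u(t): 1.0 during the active phase of each cycle, 0.0 on drug holiday."""
    def dose(t):
        return 1.0 if (t % period) < on_fraction * period else 0.0
    return dose


def simulate_tumour(drug_schedule, t_end=100.0, dt=0.01,
                    growth=0.05, kill=0.15,
                    to_resistant=0.02, to_sensitive=0.01,
                    s0=1.0, r0=0.0):
    """Integrate a hypothetical two-state mean-field model with forward Euler.

        dS/dt = (growth - kill*u) * S - to_resistant*u * S + to_sensitive*(1-u) * R
        dR/dt = to_resistant*u * S - to_sensitive*(1-u) * R

    S: proliferative, drug-sensitive cells; R: quiescent, resistant cells;
    u = drug_schedule(t) in [0, 1] is the (assumed) normalized drug level.
    Returns the final (S, R).
    """
    s, r = s0, r0
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        u = drug_schedule(i * dt)
        switch_out = to_resistant * u * s        # S -> R, assumed drug-induced
        switch_back = to_sensitive * (1.0 - u) * r  # R -> S, assumed to occur off drug
        ds = (growth - kill * u) * s - switch_out + switch_back
        dr = switch_out - switch_back
        s += dt * ds
        r += dt * dr
    return s, r


if __name__ == "__main__":
    regimens = {
        "no drug": lambda t: 0.0,
        "continuous": lambda t: 1.0,
        "periodic": periodic_schedule(period=20.0, on_fraction=0.5),
    }
    for name, schedule in regimens.items():
        s, r = simulate_tumour(schedule)
        print(f"{name:>10}: sensitive={s:.4f} resistant={r:.4f} total={s + r:.4f}")
```

Which regimen performs best in this toy depends entirely on the assumed rates, so it should not be read as reproducing the comparison reported in the abstract.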


Subject(s)
Neoplasm Drug Resistance , Melanoma , Biological Models , Mutation , Protein Kinase Inhibitors/therapeutic use , Proto-Oncogene Proteins B-raf , Humans , Melanoma/drug therapy , Melanoma/enzymology , Melanoma/genetics , Melanoma/pathology , Proto-Oncogene Proteins B-raf/antagonists & inhibitors , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins B-raf/metabolism
6.
PLoS Comput Biol ; 13(2): e1005367, 2017 02.
Article in English | MEDLINE | ID: mdl-28178267

ABSTRACT

Ambiguity in genetic codes exists in cases where certain stop codons are alternatively used to encode non-canonical amino acids. In selenoprotein transcripts, the UGA codon may either represent a translation termination signal or a selenocysteine (Sec) codon. Translating UGA to Sec requires selenium and specialized Sec incorporation machinery such as the interaction between the SECIS element and SBP2 protein, but how these factors quantitatively affect alternative assignments of UGA has not been fully investigated. We developed a model simulating the UGA decoding process. Our model is based on the following assumptions: (1) charged Sec-specific tRNAs (Sec-tRNASec) and release factors compete for a UGA site, (2) Sec-tRNASec abundance is limited by the concentrations of selenium and Sec-specific tRNA (tRNASec) precursors, and (3) all synthesis reactions follow first-order kinetics. We demonstrated that this model captured two prominent characteristics observed from experimental data. First, UGA to Sec decoding increases with elevated selenium availability, but saturates under high selenium supply. Second, the efficiency of Sec incorporation is reduced with increasing selenoprotein synthesis. We measured the expressions of four selenoprotein constructs and estimated their model parameters. Their inferred Sec incorporation efficiencies did not correlate well with their SECIS-SBP2 binding affinities, suggesting the existence of additional factors determining the hierarchy of selenoprotein synthesis under selenium deficiency. This model provides a framework to systematically study the interplay of factors affecting the dual definitions of a genetic codon.
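The three assumptions listed in this abstract (competition at the UGA site, Sec-tRNASec limited by selenium and tRNASec precursors, first-order kinetics) can be sketched as a small steady-state calculation. This is a hypothetical illustration rather than the published model: the rate constants `k_charge`, `k_decay`, `k_sec`, `k_rf` and their default values are invented for the example.

```python
def charged_sec_trna(selenium, trna_total, k_charge=1.0, k_decay=1.0):
    """Steady-state level T of charged Sec-tRNASec.

    Assumes first-order charging and decay with a fixed tRNASec pool:
        dT/dt = k_charge * selenium * (trna_total - T) - k_decay * T = 0
    which gives T = trna_total * k_charge*selenium / (k_charge*selenium + k_decay).
    """
    return trna_total * k_charge * selenium / (k_charge * selenium + k_decay)


def sec_incorporation_fraction(selenium, trna_total=1.0, release_factor=1.0,
                               k_sec=1.0, k_rf=1.0):
    """Fraction of UGA encounters decoded as Sec rather than as a stop.

    Charged Sec-tRNASec and release factor compete for the site, each with a
    first-order rate proportional to its abundance (an assumed rate law).
    """
    sec_rate = k_sec * charged_sec_trna(selenium, trna_total)
    stop_rate = k_rf * release_factor
    return sec_rate / (sec_rate + stop_rate)


if __name__ == "__main__":
    for selenium in (0.0, 0.1, 1.0, 10.0, 100.0):
        fraction = sec_incorporation_fraction(selenium)
        print(f"selenium={selenium:6.1f}  Sec fraction={fraction:.3f}")
```

Under these assumptions the Sec fraction rises with selenium and then levels off once the tRNASec pool is fully charged, which matches the first qualitative behaviour the abstract describes. The second behaviour (reduced efficiency with increasing selenoprotein synthesis) would need an additional consumption term that this sketch omits.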


Subject(s)
Initiator Codon/genetics , Terminator Codon/genetics , Genetic Models , Proteins/genetics , Selenocysteine/genetics , Selenoproteins/genetics , Computer Simulation , Protein Biosynthesis/genetics , Selenoproteins/biosynthesis , RNA Sequence Analysis/methods
7.
Nucleic Acids Res ; 41(19): 8803-21, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23907387

ABSTRACT

Glioblastoma multiforme (GBM) is the most common and malignant primary brain tumor in adults. Decades of investigations and the recent effort of the Cancer Genome Atlas (TCGA) project have mapped many molecular alterations in GBM cells. Alterations on DNAs may dysregulate gene expressions and drive malignancy of tumors. It is thus important to uncover causal and statistical dependency between 'effector' molecular aberrations and 'target' gene expressions in GBMs. A rich collection of prior studies attempted to combine copy number variation (CNV) and mRNA expression data. However, systematic methods to integrate multiple types of cancer genomic data-gene mutations, single nucleotide polymorphisms, CNVs, DNA methylations, mRNA and microRNA expressions and clinical information-are relatively scarce. We proposed an algorithm to build 'association modules' linking effector molecular aberrations and target gene expressions and applied the module-finding algorithm to the integrated TCGA GBM data sets. The inferred association modules were validated by six tests using external information and datasets of central nervous system tumors: (i) indication of prognostic effects among patients; (ii) coherence of target gene expressions; (iii) retention of effector-target associations in external data sets; (iv) recurrence of effector molecular aberrations in GBM; (v) functional enrichment of target genes; and (vi) co-citations between effectors and targets. Modules associated with well-known molecular aberrations of GBM-such as chromosome 7 amplifications, chromosome 10 deletions, EGFR and NF1 mutations-passed the majority of the validation tests. Furthermore, several modules associated with less well-reported molecular aberrations-such as chromosome 11 CNVs, CD40, PLXNB1 and GSTM1 methylations, and mir-21 expressions-were also validated by external information. 
In particular, modules constituting trans-acting effects with chromosome 11 CNVs and cis-acting effects with chromosome 10 CNVs manifested strong negative and positive associations with survival times in brain tumors. By aligning the information of association modules with the established GBM subclasses based on transcription or methylation levels, we found each subclass possessed multiple concurrent molecular aberrations. Furthermore, the joint molecular characteristics derived from 16 association modules had prognostic power not explained away by the strong biomarker of CpG island methylator phenotypes. Functional and survival analyses indicated that immune/inflammatory responses and epithelial-mesenchymal transitions were among the most important determining processes of prognosis. Finally, we demonstrated that certain molecular aberrations uniquely recurred in GBM but were relatively rare in non-GBM glioma cells. These results justify the utility of an integrative analysis on cancer genomes and provide testable characterizations of driver aberration events in GBM.


Subject(s)
Brain Neoplasms/genetics , Chromosome Aberrations , Glioblastoma/genetics , Algorithms , Brain Neoplasms/mortality , Chromosome Deletion , DNA Copy Number Variations , DNA Methylation , Epithelial-Mesenchymal Transition , Neurofibromatosis 1 Genes , Human Genome , Genomics/methods , Glioblastoma/classification , Glioblastoma/mortality , Humans , MicroRNAs/metabolism , Mutation , Prognosis , Cellular Retinol-Binding Proteins/genetics , Survival Analysis , Genetic Transcription
8.
Nucleic Acids Res ; 41(4): 2105-20, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23303791

ABSTRACT

Diverse life forms are driven by the evolution of gene regulatory programs including changes in regulator proteins and cis-regulatory elements. Alterations of cis-regulatory elements are likely to dominate the evolution of the gene regulatory networks, as they are subjected to smaller selective constraints compared with proteins and hence may evolve quickly to adapt to the environment. Prior studies on cis-regulatory element evolution focus primarily on sequence substitutions of known transcription factor-binding motifs. However, evolutionary models for the dynamics of motif occurrence are relatively rare, and comprehensive characterization of the evolution of all possible motif sequences has not been pursued. In the present study, we propose an algorithm to estimate the strength of purifying selection of a motif sequence based on an evolutionary model capturing the birth and death of motif occurrences on promoters. We term this measure the 'evolutionary retention coefficient', as it is related to yet distinct from the canonical definition of selection coefficient in population genetics. Using this algorithm, we estimate and report the evolutionary retention coefficients of all possible 10-nucleotide sequences from the aligned promoter sequences of 27,748 orthologous gene families in 34 mammalian species. Intriguingly, the evolutionary retention coefficients of motifs are intimately associated with their functional relevance. Top-ranking motifs (sorted by evolutionary retention coefficients) are significantly enriched with transcription factor-binding sequences according to the curated knowledge from the TRANSFAC database and the ChIP-seq data generated from the ENCODE Consortium. Moreover, genes harbouring high-scoring motifs on their promoters retain significantly coherent expression profiles, and those genes are over-represented in the functional classes involved in gene regulation. The validation results reveal the dependencies between natural selection and functions of cis-regulatory elements and shed light on the evolution of gene regulatory networks.


Subject(s)
Algorithms , Nucleotide Motifs , Genetic Promoter Regions , Animals , Binding Sites , Molecular Evolution , Genes , Humans , Mice , DNA Sequence Analysis , Transcription Factors/metabolism
9.
Proc Natl Acad Sci U S A ; 109(36): 14586-91, 2012 Sep 04.
Article in English | MEDLINE | ID: mdl-22891318

ABSTRACT

Cancers are heterogeneous and genetically unstable. Current practice of personalized medicine tailors therapy to heterogeneity between cancers of the same organ type. However, it does not yet systematically address heterogeneity at the single-cell level within a single individual's cancer or the dynamic nature of cancer due to genetic and epigenetic change as well as transient functional changes. We have developed a mathematical model of personalized cancer therapy incorporating genetic evolutionary dynamics and single-cell heterogeneity, and have examined simulated clinical outcomes. Analyses of an illustrative case and a virtual clinical trial of over 3 million evaluable "patients" demonstrate that augmented (and sometimes counterintuitive) nonstandard personalized medicine strategies may lead to superior patient outcomes compared with the current personalized medicine approach. Current personalized medicine matches therapy to a tumor molecular profile at diagnosis and at tumor relapse or progression, generally focusing on the average, static, and current properties of the sample. Nonstandard strategies also consider minor subclones, dynamics, and predicted future tumor states. Our methods allow systematic study and evaluation of nonstandard personalized medicine strategies. These findings may, in turn, suggest global adjustments and enhancements to translational oncology research paradigms.


Subject(s)
Genetic Epigenesis , Molecular Evolution , Biological Models , Neoplasms/drug therapy , Neoplasms/genetics , Precision Medicine/methods , Systems Biology/methods , Computer Simulation , Humans , Precision Medicine/trends
10.
Biol Open ; 11(6), 2022 06 15.
Article in English | MEDLINE | ID: mdl-35665803

ABSTRACT

Despite the remarkable progress in probing tumor transcriptomic heterogeneity by single-cell RNA sequencing (sc-RNAseq) data, several gaps exist in prior studies. Tumor heterogeneity is frequently mentioned but not quantified. Clustering analyses typically target cells rather than genes, and differential levels of transcriptomic heterogeneity of gene clusters are not characterized. Relations between gene clusters inferred from multiple datasets remain less explored. We provided a series of quantitative methods to analyze cancer sc-RNAseq data. First, we proposed two quantitative measures to assess intra-tumoral heterogeneity/homogeneity. Second, we established a hierarchy of gene clusters from sc-RNAseq data, devised an algorithm to reduce the gene cluster hierarchy to a compact structure, and characterized the gene clusters with functional enrichment and heterogeneity. Third, we developed an algorithm to align the gene cluster hierarchies from multiple datasets to a small number of meta gene clusters. By applying these methods to nine cancer sc-RNAseq datasets, we discovered that cancer cell transcriptomes were more homogeneous within tumors than the accompanying normal cells. Furthermore, many gene clusters from the nine datasets were aligned to two large meta gene clusters, which had high and low heterogeneity and were enriched with distinct functions. Finally, we found the homogeneous meta gene cluster retained stronger expression coherence and associations with survival times in bulk level RNAseq data than the heterogeneous meta gene cluster, yet the combinatorial expression patterns of breast cancer subtypes in bulk level data were not preserved in single-cell data. The inference outcomes derived from nine cancer sc-RNAseq datasets provide insights about the contributing factors for transcriptomic heterogeneity of cancer cells and complex relations between bulk level and single-cell RNAseq data. 
They demonstrate the utility of our methods to enable a comprehensive characterization of co-expressed gene clusters in a wide range of sc-RNAseq data in cancers and beyond.


Subject(s)
Breast Neoplasms , Transcriptome , Algorithms , Breast Neoplasms/genetics , Cluster Analysis , Female , Humans , Multigene Family
11.
Sci Rep ; 12(1): 10490, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35729235

ABSTRACT

Protein complexes are the fundamental units of many biological functions. Despite their many advantages, one major adverse impact of protein complexes is accumulations of unassembled subunits that may disrupt other processes or exert cytotoxic effects. Synthesis of excess subunits can be inhibited via negative feedback control or they can be degraded more efficiently than assembled subunits, with this latter being termed cooperative stability. Whereas controlled synthesis of complex subunits has been investigated extensively, how cooperative stability acts in complex formation remains largely unexplored. To fill this knowledge gap, we have built quantitative models of heteromeric complexes with or without cooperative stability and compared their behaviours in the presence of synthesis rate variations. A system displaying cooperative stability is robust against synthesis rate variations as it retains high dimer/monomer ratios across a broad range of parameter configurations. Moreover, cooperative stability can alleviate the constraint of limited supply of a given subunit and makes complex abundance more responsive to unilateral upregulation of another subunit. We also conducted an in silico experiment to comprehensively characterize and compare four types of circuits that incorporate combinations of negative feedback control and cooperative stability in terms of eight systems characteristics pertaining to optimality, robustness and controllability. Intriguingly, though individual circuits prevailed for distinct characteristics, the system with cooperative stability alone achieved the most balanced performance across all characteristics. Our study provides theoretical justification for the contribution of cooperative stability to natural biological systems and represents a guideline for designing synthetic complex formation systems with desirable characteristics.

12.
PLOS Digit Health ; 1(12): e0000151, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36812605

ABSTRACT

Cancer cells harbor molecular alterations at all levels of information processing. Genomic/epigenomic and transcriptomic alterations are inter-related between genes, within and across cancer types and may affect clinical phenotypes. Despite abundant prior studies integrating cancer multi-omics data, none of them organizes these associations in a hierarchical structure or validates the discoveries in extensive external data. We infer this Integrated Hierarchical Association Structure (IHAS) from the complete data of The Cancer Genome Atlas (TCGA) and compile a compendium of cancer multi-omics associations. Intriguingly, diverse alterations on genomes/epigenomes from multiple cancer types impact transcriptions of 18 Gene Groups. Half of them are further reduced to three Meta Gene Groups enriched with (1) immune and inflammatory responses, (2) embryonic development and neurogenesis, (3) cell cycle process and DNA repair. Over 80% of the clinical/molecular phenotypes reported in TCGA are aligned with the combinatorial expressions of Meta Gene Groups, Gene Groups, and other IHAS subunits. Furthermore, IHAS derived from TCGA is validated in more than 300 external datasets including multi-omics measurements and cellular responses upon drug treatments and gene perturbations in tumors, cancer cell lines, and normal tissues. To sum up, IHAS stratifies patients in terms of molecular signatures of its subunits, selects targeted genes or drugs for precision cancer therapy, and demonstrates that associations between survival times and transcriptional biomarkers may vary with cancer types. This rich information is critical for the diagnosis and treatment of cancers.

13.
Sci Rep ; 11(1): 17741, 2021 09 07.
Article in English | MEDLINE | ID: mdl-34493766

ABSTRACT

Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome, hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci possess locational concentration in the genome and functional enrichment. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expressions among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data predict well the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci to investigate human evolution.
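The statement that a PCA projection is a weighted sum of allele counts over loci can be made concrete with a short NumPy sketch. It runs on synthetic genotypes for two hypothetical populations and ranks loci by absolute PC loading. That ranking is only a simple proxy chosen for illustration and is not the necessary-informative-loci algorithm the abstract proposes. All function names and parameters here are assumptions.

```python
import numpy as np


def simulate_two_populations(n_per_pop=50, n_loci=200, n_diff=20, seed=0):
    """Toy genotypes (0/1/2 minor-allele counts) for two hypothetical populations."""
    rng = np.random.default_rng(seed)
    freq_a = np.full(n_loci, 0.2)
    freq_b = freq_a.copy()
    freq_b[:n_diff] = 0.8  # only the first n_diff loci differ between populations
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_loci))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_loci))
    return np.vstack([pop_a, pop_b])


def pca_project(genotypes, n_components=2):
    """Return (projections, loadings) from an SVD of the column-centered matrix."""
    x = np.asarray(genotypes, dtype=float)
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T     # shape: (loci, components)
    projections = centered @ loadings  # each PC score is a weighted sum over loci
    return projections, loadings


def top_loading_loci(loadings, component=0, n_loci=5):
    """Indices of the loci with the largest absolute weight on one component."""
    order = np.argsort(-np.abs(loadings[:, component]))
    return order[:n_loci]


if __name__ == "__main__":
    genotypes = simulate_two_populations()
    projections, loadings = pca_project(genotypes)
    print("mean PC1, population A:", projections[:50, 0].mean())
    print("mean PC1, population B:", projections[50:, 0].mean())
    print("highest-weight loci on PC1:", top_loading_loci(loadings))
```

In this toy setting the first component separates the two simulated populations, and the highest-weight loci fall among those simulated with different allele frequencies.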


Subject(s)
Molecular Evolution , Human Genome , Genotype , Human Migration , Algorithms , Datasets as Topic , Gene Expression , Population Genetics , Humans , Single Nucleotide Polymorphism/genetics , Principal Component Analysis , Racial Groups/genetics
14.
BMC Bioinformatics ; 11: 495, 2010 Oct 06.
Article in English | MEDLINE | ID: mdl-20925909

ABSTRACT

BACKGROUND: Cancer is a complex disease where various types of molecular aberrations drive the development and progression of malignancies. Large-scale screenings of multiple types of molecular aberrations (e.g., mutations, copy number variations, DNA methylations, gene expressions) become increasingly important in the prognosis and study of cancer. Consequently, a computational model integrating multiple types of information is essential for the analysis of the comprehensive data. RESULTS: We propose an integrated modeling framework to identify the statistical and putative causal relations of various molecular aberrations and gene expressions in cancer. To reduce spurious associations among the massive number of probed features, we sequentially applied three layers of logistic regression models with increasing complexity and uncertainty regarding the possible mechanisms connecting molecular aberrations and gene expressions. Layer 1 models associate gene expressions with the molecular aberrations on the same loci. Layer 2 models associate expressions with the aberrations on different loci but have known mechanistic links. Layer 3 models associate expressions with nonlocal aberrations which have unknown mechanistic links. We applied the layered models to the integrated datasets of NCI-60 cancer cell lines and validated the results with large-scale statistical analysis. Furthermore, we discovered/reaffirmed the following prominent links: (1) Protein expressions are generally consistent with mRNA expressions. (2) Several gene expressions are modulated by composite local aberrations. For instance, CDKN2A expressions are repressed by either frame-shift mutations or DNA methylations. (3) Amplification of chromosome 6q in leukemia elevates the expression of MYB, and the downstream targets of MYB on other chromosomes are up-regulated accordingly. 
(4) Amplification of chromosome 3p and hypo-methylation of PAX3 together elevate MITF expression in melanoma, which up-regulates the downstream targets of MITF. (5) Mutations of TP53 are negatively associated with its direct target genes. CONCLUSIONS: The analysis results on NCI-60 data justify the utility of the layered models for the incoming flow of cancer genomic data. Experimental validations on selected prominent links and application of the layered modeling framework to other integrated datasets will be carried out subsequently.


Subject(s)
Computational Biology/methods , Neoplasms/genetics , Cell Line, Tumor , DNA Methylation , Databases, Genetic , Genes, p16 , Humans , Neoplasms/metabolism , RNA, Messenger/metabolism , United States
15.
PLoS Comput Biol ; 5(1): e1000274, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19180177

ABSTRACT

Complex phenotypes such as the transformation of a normal population of cells into cancerous tissue result from a series of molecular triggers gone awry. We describe a method that searches for a genetic network consistent with expression changes observed under the knock-down of a set of genes that share a common role in the cell, such as a disease phenotype. The method extends the Nested Effects Model of Markowetz et al. (2005) by using a probabilistic factor graph to search for a network representing interactions among these silenced genes. The method also expands the network by attaching new genes at specific downstream points, providing candidates for subsequent perturbations to further characterize the pathway. We investigated an extension provided by the factor graph approach in which the model distinguishes between inhibitory and stimulatory interactions. We found that the extension yielded significant improvements in recovering the structure of simulated and Saccharomyces cerevisiae networks. We applied the approach to discover a signaling network among genes involved in a human colon cancer cell invasiveness pathway. The method predicts several genes with new roles in the invasiveness process. We knocked down two genes identified by our approach and found that both knock-downs produce loss of invasive potential in a colon cancer cell line. Nested effects models may be a powerful tool for inferring regulatory connections and genes that operate in normal and disease-related processes.
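
The nesting idea behind nested effects models can be illustrated with a minimal sketch. It assumes noise-free data in which each knocked-down gene is summarized by the set of downstream genes whose expression changed; the function name `nested_edges` and the gene and effect labels are hypothetical. It is a deterministic simplification, not the probabilistic factor graph search described in the abstract.

```python
def nested_edges(effects):
    """Return (upstream, downstream) pairs inferred from nested effect sets.

    `effects` maps each knocked-down gene to the set of downstream genes
    whose expression changed. Gene A is placed upstream of gene B when the
    effects of B form a proper subset of the effects of A.
    This is an illustrative simplification with no noise model.
    """
    edges = []
    for a, effects_a in effects.items():
        for b, effects_b in effects.items():
            # `<` on sets tests for a proper subset
            if a != b and effects_b < effects_a:
                edges.append((a, b))
    return sorted(edges)


# Hypothetical knock-down results, for illustration only
example_effects = {
    "A": {"e1", "e2", "e3"},
    "B": {"e2", "e3"},
    "C": {"e3"},
}
example_edges = nested_edges(example_effects)
```

With real data the subset relations are rarely exact, which is why a probabilistic treatment such as the one in the abstract is needed.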


Subject(s)
Colonic Neoplasms/genetics , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Gene Expression Regulation , Gene Regulatory Networks/physiology , Gene Silencing , Saccharomyces cerevisiae/genetics , Algorithms , Colonic Neoplasms/pathology , Computer Simulation , Data Interpretation, Statistical , HT29 Cells , Humans , Models, Genetic , Neoplasm Invasiveness , Normal Distribution , Oligonucleotide Array Sequence Analysis , Signal Transduction
16.
FASEB J ; 22(8): 2605-22, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18434431

ABSTRACT

Cancer is a complex process in which the abnormalities of many genes appear to be involved. The combinatorial patterns of gene mutations may reveal the functional relations between genes and pathways in tumorigenesis as well as identify targets for treatment. We examined the patterns of somatic mutations of cancers from the Catalog of Somatic Mutations in Cancer (COSMIC), a large-scale database curated by the Wellcome Trust Sanger Institute. The frequently mutated genes are well-known oncogenes and tumor suppressors that are involved in generic processes of cell-cycle control, signal transduction, and stress responses. These "signatures" of gene mutations are heterogeneous when the cancers from different tissues are compared. Mutations in genes functioning in different pathways can occur in the same cancer (i.e., co-occur), whereas mutations in genes functioning in the same pathway rarely occur in the same sample. This observation supports the view of tumorigenesis as derived from a process like Darwinian evolution. However, certain combinatorial mutational patterns violate these simple rules and demonstrate tissue-specific variations. For instance, mutations of genes in the Ras and Wnt pathways tend to co-occur in the large intestine but are mutually exclusive in cancers of the pancreas. The relationships between mutations in different samples of a cancer can also reveal the temporal orders of mutational events. In addition, the observed mutational patterns suggest candidates for new cosequencing targets that can either reveal novel patterns or validate the predictions deduced from existing patterns. These combinatorial mutational patterns provide guiding information for the ongoing cancer genome projects.
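
One common way to quantify whether two genes co-occur or are mutually exclusive across samples is a one-sided Fisher exact test on a 2x2 table of mutation counts. The sketch below is an assumption about how such a test could be set up, not the procedure used in the study; the function name `cooccurrence_pvalues` and the example counts are hypothetical. It uses only the Python standard library (`math.comb`, Python 3.8 or later).

```python
from math import comb


def cooccurrence_pvalues(both, only_a, only_b, neither):
    """One-sided Fisher exact p-values for a 2x2 table of mutation counts.

    Returns (p_cooccur, p_exclusive):
      p_cooccur   = P(overlap >= observed) under independence
      p_exclusive = P(overlap <= observed) under independence
    """
    n = both + only_a + only_b + neither
    a_total = both + only_a  # samples with gene A mutated
    b_total = both + only_b  # samples with gene B mutated
    denom = comb(n, b_total)

    def prob(k):
        # Hypergeometric probability that exactly k samples carry both mutations
        return comb(a_total, k) * comb(n - a_total, b_total - k) / denom

    lowest = max(0, a_total + b_total - n)   # smallest possible overlap
    highest = min(a_total, b_total)          # largest possible overlap
    p_cooccur = sum(prob(k) for k in range(both, highest + 1))
    p_exclusive = sum(prob(k) for k in range(lowest, both + 1))
    return p_cooccur, p_exclusive


# Hypothetical counts: two genes that are never mutated in the same sample
p_co, p_ex = cooccurrence_pvalues(both=0, only_a=10, only_b=10, neither=5)
```

A small `p_exclusive` suggests the two genes are mutated together less often than independence would predict, and a small `p_cooccur` suggests the opposite. When many gene pairs are screened, the p-values would need a multiple-testing correction.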


Subject(s)
Mutation , Neoplasms/genetics , Cell Cycle/genetics , Cell Line, Tumor , Databases, Genetic , Genes, Tumor Suppressor , Humans , Models, Genetic , Neoplasms/etiology , Oncogenes , Organ Specificity , Signal Transduction/genetics , Stress, Physiological/genetics
17.
PLoS One ; 14(8): e0221703, 2019.
Article in English | MEDLINE | ID: mdl-31437254

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0185475.].

18.
PLoS Comput Biol ; 3(11): e211, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17983264

ABSTRACT

Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.


Subject(s)
Evolution, Molecular , Models, Chemical , Protein Structure, Tertiary/genetics , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein/methods , Amino Acid Sequence , Base Sequence , Computer Simulation , Models, Molecular , Molecular Sequence Data , Protein Conformation , Proteins/ultrastructure , Sequence Alignment/methods , Structure-Activity Relationship
19.
Sci Rep ; 8(1): 11456, 2018 07 30.
Article in English | MEDLINE | ID: mdl-30061703

ABSTRACT

Most cancer driver genes are involved in generic cellular processes such as DNA repair, cell proliferation and cell adhesion, yet their mutations are often confined to specific cancer types. To resolve this paradox, we explained mutation frequencies of selected genes across tumor types with four features in the corresponding normal tissues from cancer-free subjects: mRNA expression and chromatin accessibility of mutated genes, mRNA expressions of their neighbors in curated pathways and the protein-protein interaction network. Encouragingly, these transcriptomic/epigenomic features in normal tissues were closely associated with mutational/functional characteristics in tumors. First, chromatin accessibility was a necessary but not sufficient condition for frequent mutations. Second, variations of mutation frequencies in selected genes across tissue types were significantly associated with all four features. Third, the genes possessing significant associations between mutation frequency variations and pathway gene expression were enriched with documented cancer genes. We further proposed a novel bivariate gene set enrichment analysis and confirmed that the pathway gene expression was the dominant factor in cancer gene enrichment. These findings shed light on the functional roles of genes in normal tissues in shaping the mutational landscape during tumor genome evolution.


Subject(s)
Epigenesis, Genetic , Mutation/genetics , Neoplasms/genetics , Transcriptome/genetics , Chromatin/metabolism , Genes, Neoplasm , Humans , Mutation Rate , Organ Specificity/genetics
20.
PLoS One ; 12(10): e0185475, 2017.
Article in English | MEDLINE | ID: mdl-28981547

ABSTRACT

The large volume of gene expression data poses a major challenge for the discovery of Gene Regulatory Networks (GRNs). For network reconstruction and the investigation of regulatory relations, it is desirable to ensure directness of links between genes on a map, infer their directionality and explore candidate biological functions from high-throughput transcriptomic data. To address these problems, we introduce a Boolean Function Network (BFN) model based on techniques of hidden Markov model (HMM), likelihood ratio test and Boolean logic functions. BFN consists of two consecutive tests to establish links between pairs of genes and check their directness. We evaluate the performance of BFN through the application to S. cerevisiae time course data. BFN produces regulatory relations that are consistent with the succession of cell cycle phases. Furthermore, it also improves sensitivity and specificity when compared with alternative methods of genetic network reverse engineering. Moreover, we demonstrate that BFN can provide proper resolution for GO enrichment of gene sets. Finally, the Boolean functions discovered by BFN can provide useful insights for the identification of control mechanisms of regulatory processes, which is a distinctive advantage of the proposed approach. In combination with low computational complexity, BFN can serve as an efficient screening tool to reconstruct gene relations at the whole-genome level. In addition, the BFN approach is also applicable to a wide range of time course datasets.


Subject(s)
Gene Regulatory Networks , Genes, Fungal , Saccharomyces cerevisiae/metabolism , Gene Expression Regulation , Likelihood Functions , Markov Chains , Transcription, Genetic