Results 1 - 11 of 11
1.
PLoS Comput Biol ; 17(2): e1008720, 2021 02.
Article in English | MEDLINE | ID: mdl-33630864

ABSTRACT

Increased availability of drug response and genomics data for many tumor cell lines has accelerated the development of pan-cancer prediction models of drug response. However, it is unclear how much between-tissue differences in drug response and molecular characteristics may contribute to pan-cancer predictions. Also unknown is whether the performance of pan-cancer models could vary by cancer type. Here, we built a series of pan-cancer models using two datasets containing 346 and 504 cell lines, each with MEK inhibitor (MEKi) response and mRNA expression, point mutation, and copy number variation data, and found that, while the tissue-level drug responses are accurately predicted (between-tissue ρ = 0.88-0.98), only 5 of 10 cancer types showed successful within-tissue prediction performance (within-tissue ρ = 0.11-0.64). Between-tissue differences make substantial contributions to the performance of pan-cancer MEKi response predictions, as exclusion of between-tissue signals leads to a decrease in Spearman's ρ from a range of 0.43-0.62 to 0.30-0.51. In practice, joint analysis of multiple cancer types usually has a larger sample size, hence greater power, than analysis of a single cancer type, and we observe that the higher accuracy of pan-cancer prediction of MEKi response is almost entirely due to this sample size advantage. Success of pan-cancer prediction reveals how drug response in different cancers may invoke shared regulatory mechanisms despite tissue-specific routes of oncogenesis, yet predictions in different cancer types require flexible incorporation of between-cancer and within-cancer signals. As most datasets in genome sciences contain multiple levels of heterogeneity, careful parsing of group characteristics and within-group, individual variation is essential when making robust inference.
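The between-tissue versus within-tissue distinction above can be made concrete with a small sketch. Assuming per-cell-line observed and predicted responses labelled by tissue, between-tissue ρ is taken here as the Spearman correlation of tissue-level mean responses, within-tissue ρ as the correlation inside each tissue, and "excluding between-tissue signal" as subtracting each tissue's mean before a pooled correlation. These definitions and the function names (between_within, pooled_without_between) are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # undefined if either input is constant


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def group_by_tissue(tissue, observed, predicted):
    groups = defaultdict(list)
    for t, o, p in zip(tissue, observed, predicted):
        groups[t].append((o, p))
    return groups


def between_within(tissue, observed, predicted):
    """Return (between-tissue rho, {tissue: within-tissue rho})."""
    groups = group_by_tissue(tissue, observed, predicted)
    obs_means = [mean(o for o, _ in g) for g in groups.values()]
    pred_means = [mean(p for _, p in g) for g in groups.values()]
    between = spearman(obs_means, pred_means)
    within = {
        t: spearman([o for o, _ in g], [p for _, p in g])
        for t, g in groups.items()
    }
    return between, within


def pooled_without_between(tissue, observed, predicted):
    """Pooled rho after removing each tissue's mean from both variables."""
    groups = group_by_tissue(tissue, observed, predicted)
    obs_c, pred_c = [], []
    for g in groups.values():
        mo = mean(o for o, _ in g)
        mp = mean(p for _, p in g)
        obs_c += [o - mo for o, _ in g]
        pred_c += [p - mp for _, p in g]
    return spearman(obs_c, pred_c)
```

On a toy set of three tissues whose mean responses are ranked perfectly (between-tissue ρ = 1) but whose within-tissue ordering is imperfect, the pooled ρ of about 0.98 falls to about 0.90 once tissue means are removed, a miniature of the kind of drop the abstract describes.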


Subject(s)
Antineoplastic Agents/pharmacology , Drug Screening Assays, Antitumor , Neoplasms/drug therapy , Algorithms , Area Under Curve , Cell Line, Tumor , DNA Copy Number Variations , Enzyme Inhibitors/pharmacology , Gene Dosage , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genomics , Humans , MAP Kinase Kinase 1/antagonists & inhibitors , Machine Learning , Point Mutation , Polymorphism, Single Nucleotide , RNA/genetics , RNA/metabolism , RNA, Messenger/metabolism , Regression Analysis
2.
Proc Natl Acad Sci U S A ; 116(6): 2344-2353, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30674669

ABSTRACT

Plant specialized metabolism (SM) enzymes produce lineage-specific metabolites with important ecological, evolutionary, and biotechnological implications. Using Arabidopsis thaliana as a model, we identified distinguishing characteristics of SM and GM (general metabolism, traditionally referred to as primary metabolism) genes through a detailed study of features including duplication pattern, sequence conservation, transcription, protein domain content, and gene network properties. Analysis of multiple sets of benchmark genes revealed that SM genes tend to be tandemly duplicated, coexpressed with their paralogs, narrowly expressed at lower levels, less conserved, and less well connected in gene networks relative to GM genes. Although the values of each of these features significantly differed between SM and GM genes, any single feature was ineffective at predicting SM from GM genes. Using machine learning methods to integrate all features, a prediction model was established with a true positive rate of 87% and a true negative rate of 71%. In addition, 86% of known SM genes not used to create the machine learning model were predicted. We also demonstrated that the model could be further improved when we distinguished between SM, GM, and junction genes responsible for reactions shared by SM and GM pathways, indicating that topological considerations may further improve the SM prediction model. Application of the prediction model led to the identification of 1,220 A. thaliana genes with previously unknown functions, each assigned a confidence measure called an SM score, providing a global estimate of SM gene content in a plant genome.
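The true positive and true negative rates quoted above are the fractions of SM and GM benchmark genes, respectively, that the model labels correctly. A minimal sketch, assuming string labels "SM" and "GM" and a hypothetical helper name tpr_tnr:

```python
def tpr_tnr(y_true, y_pred, positive="SM"):
    """True positive rate (sensitivity) and true negative rate (specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 4 SM and 4 GM benchmark genes.
truth = ["SM"] * 4 + ["GM"] * 4
calls = ["SM", "SM", "SM", "GM", "GM", "SM", "SM", "GM"]
print(tpr_tnr(truth, calls))  # (0.75, 0.5)
```

Reporting both rates matters because a model that calls every gene SM would reach a true positive rate of 1.0 with a true negative rate of 0.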

3.
BMC Genomics ; 21(1): 159, 2020 Feb 13.
Article in English | MEDLINE | ID: mdl-32054475

ABSTRACT

BACKGROUND: Gene expression is regulated by DNA-binding transcription factors (TFs). Together with their target genes, these factors and their interactions collectively form a gene regulatory network (GRN), which is responsible for producing patterns of transcription, including cyclical processes such as genome replication and cell division. However, identifying how this network regulates the timing of these patterns, including important interactions and regulatory motifs, remains a challenging task. RESULTS: We employed four in vivo and in vitro regulatory data sets to investigate the regulatory basis of expression timing and phase-specific patterns of cell-cycle expression in Saccharomyces cerevisiae. Specifically, we considered interactions based on direct binding between TF and target gene, indirect effects of TF deletion on gene expression, and computational inference. We found that the source of regulatory information significantly impacts the accuracy and completeness of recovering known cell-cycle expressed genes. The best approach involved combining TF-target and TF-TF interaction features from multiple datasets in a single model. In addition, TFs important to multiple phases of cell-cycle expression also have the greatest impact on individual phases. Important TFs regulating a cell-cycle phase also tend to form modules in the GRN, including two sub-modules composed entirely of unannotated cell-cycle regulators (STE12-TEC1 and RAP1-HAP1-MSN4). CONCLUSION: Our findings illustrate the importance of integrating both multiple omics data and regulatory motifs in order to understand the significance of regulatory interactions involved in timing gene expression. This integrated approach allowed us to recover both known cell-cycle interactions and the overall pattern of phase-specific expression across the cell cycle better than any single data set.
Likewise, by looking at regulatory motifs in the form of TF-TF interactions, we identified sets of TFs whose co-regulation of target genes was important for cell-cycle expression, even when regulation by individual TFs was not. Overall, this demonstrates the power of integrating multiple data sets and models of interaction in order to understand the regulatory basis of established biological processes and their associated gene regulatory networks.
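One way to read "TF-target and TF-TF interaction features" is as a per-gene feature vector with one entry per TF (does it regulate the gene?) plus one entry per TF pair (do both regulate it?). The sketch below builds such vectors; the pairwise-AND encoding and the function name regulatory_features are assumptions for illustration, not the paper's exact feature definition.

```python
from itertools import combinations


def regulatory_features(targets_by_tf, genes):
    """Binary TF and TF-TF co-regulation features for each gene.

    targets_by_tf maps a TF name to the set of genes it regulates
    (e.g. from binding or deletion data). Returns (feature_names, rows).
    """
    tfs = sorted(targets_by_tf)
    pairs = list(combinations(tfs, 2))
    names = tfs + [f"{a}&{b}" for a, b in pairs]
    rows = {}
    for gene in genes:
        single = [int(gene in targets_by_tf[tf]) for tf in tfs]
        # A pair feature is 1 only when both TFs regulate the gene.
        paired = [int(gene in targets_by_tf[a] and gene in targets_by_tf[b])
                  for a, b in pairs]
        rows[gene] = single + paired
    return names, rows


# Toy regulatory data; the gene names are placeholders.
targets = {"STE12": {"g1", "g2"}, "TEC1": {"g1", "g3"}, "RAP1": {"g2"}}
names, rows = regulatory_features(targets, ["g1", "g2", "g3"])
print(names)       # ['RAP1', 'STE12', 'TEC1', 'RAP1&STE12', 'RAP1&TEC1', 'STE12&TEC1']
print(rows["g1"])  # [0, 1, 1, 0, 0, 1]
```

Pair features let a model weight co-regulation even when neither TF alone is informative, which is the kind of effect the abstract attributes to TF-TF motifs.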


Subject(s)
Gene Expression Regulation, Fungal , Gene Regulatory Networks , Genes, cdc , Genomics , Saccharomyces cerevisiae/genetics , Computational Biology/methods , Genomics/methods , Machine Learning , Protein Binding , Protein Interaction Mapping , Protein Interaction Maps , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism
4.
Breast Cancer Res Treat ; 179(2): 337-347, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31655920

ABSTRACT

PURPOSE: There is a need for biomarkers of drug efficacy for targeted therapies in triple-negative breast cancer (TNBC). As a step toward this, we identify multi-omic molecular determinants of anti-TNBC efficacy in cell lines for a panel of oncology drugs. METHODS: Using 23 TNBC cell lines, drug sensitivity scores (DSS3) were determined using a panel of investigational drugs and drugs approved for other indications. Molecular readouts were generated for each cell line using RNA sequencing, RNA targeted panels, DNA sequencing, and functional proteomics. DSS3 values were correlated with molecular readouts using an FDR-corrected significance cutoff of p* < 0.05 and yielded molecular determinant panels that predict anti-TNBC efficacy. RESULTS: Six molecular determinant panels were obtained from 12 drugs we prioritized based on their efficacy. Determinant panels were largely devoid of DNA mutations of the targeted pathway. Molecular determinants were obtained by correlating DSS3 with molecular readouts. We found that co-inhibiting molecular correlate pathways leads to robust synergy across many cell lines. CONCLUSIONS: These findings demonstrate an integrated method to identify biomarkers of drug efficacy in TNBC where DNA predictions correlate poorly with drug response. Our work outlines a framework for the identification of novel molecular determinants and optimal companion drugs for combination therapy based on these correlates.
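The abstract does not name its FDR procedure. Assuming the common Benjamini-Hochberg step-up method, and assuming p* denotes the resulting adjusted p-value, the correction applied to many DSS3-readout correlations can be sketched as follows; the function name bh_adjust and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from correlating DSS3 with four readouts.
p = [0.01, 0.04, 0.03, 0.005]
q = bh_adjust(p)  # approximately [0.02, 0.04, 0.04, 0.02]
significant = [qi < 0.05 for qi in q]
```

Each adjusted value is the smallest m * p / rank over that p-value and all larger ones, so the procedure controls the expected fraction of false discoveries among the readouts declared significant.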


Subject(s)
Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/etiology , Antineoplastic Agents/therapeutic use , Antineoplastic Combined Chemotherapy Protocols/adverse effects , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Cell Line, Tumor , Computational Biology/methods , Dose-Response Relationship, Drug , Drug Resistance, Neoplasm/genetics , Drug Screening Assays, Antitumor , Female , Gene Expression Profiling , Humans , Mutation , Proteomics , Treatment Outcome , Triple Negative Breast Neoplasms/metabolism
5.
Mol Biol Evol ; 35(6): 1422-1436, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29554332

ABSTRACT

With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning models using Arabidopsis thaliana as a model that accurately distinguish functional sequences (benchmark protein-coding and RNA genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.


Subject(s)
DNA, Intergenic , Genome Size , Genome, Plant , Models, Genetic , RNA, Untranslated , DNA Methylation , Machine Learning , Magnoliopsida , Phenotype , Transcription, Genetic
6.
Mol Biol Evol ; 34(7): 1788-1798, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28398576

ABSTRACT

The human genome is dominated by large tracts of DNA with extensive biochemical activity but no known function. In particular, it is well established that transcriptional activities are not restricted to known genes. However, whether this intergenic transcription represents activity with functional significance or noise is under debate, highlighting the need for an effective method of defining functional genomic regions. Moreover, these discoveries raise the question whether genomic regions can be defined as functional based solely on the presence of biochemical activities, without considering evolutionary (conservation) and genetic (effects of mutations) evidence. Here, computational models integrating genetic, evolutionary, and biochemical evidence are established that provide reliable predictions of human protein-coding and RNA genes. Importantly, in addition to sequence conservation, biochemical features allow accurate predictions of genic sequences with phenotypic evidence under strong purifying selection, suggesting that they can be used as an alternative measure of selection. Moreover, 18.5% of annotated noncoding RNAs exhibit higher degrees of similarity to phenotype genes and, thus, are likely functional. However, 64.5% of noncoding RNAs appear to belong to a sequence class of their own, and the remaining 17% are more similar to pseudogenes and random intergenic sequences that may represent noisy transcription.


Subject(s)
Computational Biology/methods , DNA, Intergenic/genetics , Sequence Analysis, DNA/methods , Animals , Biological Evolution , Computer Simulation , Conserved Sequence/genetics , Evolution, Molecular , Genome, Human , Genomics/methods , Humans , Pseudogenes/genetics , RNA , RNA, Untranslated , Selection, Genetic , Transcription, Genetic
7.
Plant Cell ; 27(8): 2133-47, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26286535

ABSTRACT

Essential genes represent critical cellular components whose disruption results in lethality. Characteristics shared among essential genes have been uncovered in fungal and metazoan model systems. However, features associated with plant essential genes are largely unknown and the full set of essential genes remains to be discovered in any plant species. Here, we show that essential genes in Arabidopsis thaliana have distinct features useful for constructing within- and cross-species prediction models. Essential genes in A. thaliana are often single copy or derived from older duplications, highly and broadly expressed, slow evolving, and highly connected within molecular networks compared with genes with nonlethal mutant phenotypes. These gene features allowed the application of machine learning methods that predicted known lethal genes as well as an additional 1970 likely essential genes without documented phenotypes. Prediction models from A. thaliana could also be applied to predict Oryza sativa and Saccharomyces cerevisiae essential genes. Importantly, successful predictions drew upon many features, while any single feature was not sufficient. Our findings show that essential genes can be distinguished from genes with nonlethal phenotypes using features that are similar across kingdoms and indicate the possibility for translational application of our approach to species without extensive functional genomic and phenomic resources.


Subject(s)
Arabidopsis/genetics , Genes, Lethal/genetics , Genes, Plant/genetics , Mutation , Evolution, Molecular , Gene Dosage , Gene Expression Regulation, Plant , Gene Ontology , Genes, Essential/genetics , Oryza/genetics , Phenotype , Saccharomyces cerevisiae , Species Specificity , Support Vector Machine
8.
Nat Neurosci ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977887

ABSTRACT

Coughing is a respiratory behavior that plays a crucial role in protecting the respiratory system. Here we show that the nucleus of the solitary tract (NTS) in mice contains heterogeneous neuronal populations that differentially control breathing. Within these subtypes, activation of tachykinin 1 (Tac1)-expressing neurons triggers specific respiratory behaviors that, as revealed by our detailed characterization, are cough-like behaviors. Chemogenetic silencing or genetic ablation of Tac1 neurons inhibits cough-like behaviors induced by tussive challenges. These Tac1 neurons receive synaptic inputs from the bronchopulmonary chemosensory and mechanosensory neurons in the vagal ganglion and coordinate medullary regions to control distinct aspects of cough-like defensive behaviors. We propose that these Tac1 neurons in the NTS are a key component of the airway-vagal-brain neural circuit that controls cough-like defensive behaviors in mice and that they coordinate the downstream modular circuits to elicit the sequential motor pattern of forceful expiratory responses.

9.
NAR Genom Bioinform ; 2(3): lqaa049, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33575601

ABSTRACT

Plants respond to their environment by dynamically modulating gene expression. A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic and synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility and histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress. These findings demonstrate how in silico approaches can improve our understanding of the complex codes regulating response to combined stress and help us identify prime targets for future characterization.
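The F-measure reported above is the harmonic mean of precision and recall for the predicted response class. A minimal sketch, assuming binary labels where 1 marks genes with the target response pattern (for example, synergistic) and a hypothetical helper f_measure; the median across several per-pattern models is then an ordinary median.

```python
from statistics import median


def f_measure(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy predictions from three hypothetical per-pattern models.
scores = [
    f_measure([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]),  # precision = recall = 2/3
    f_measure([1, 1, 0, 0], [1, 1, 0, 0]),              # perfect
    f_measure([1, 0, 0, 0], [0, 1, 0, 0]),              # no true positives
]
print(median(scores))  # about 0.667
```

Because the F-measure ignores true negatives, it is less flattering than accuracy when the positive response class is small relative to the background genes.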

10.
Sci Rep ; 9(1): 12122, 2019 08 20.
Article in English | MEDLINE | ID: mdl-31431676

ABSTRACT

Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.
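The AUC-ROC of 0.94 can be read as the probability that a randomly chosen phenotype gene receives a higher model score than a randomly chosen pseudogene. A sketch of that rank-based (Mann-Whitney) computation, with ties counted as one half; the function name and the toy scores are illustrative, not taken from the study.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for phenotype genes (positives) and pseudogenes.
phenotype = [0.9, 0.8, 0.4]
pseudo = [0.7, 0.3, 0.2]
print(roc_auc(phenotype, pseudo))  # 8 of 9 pairs ordered correctly
```

The quadratic loop is fine for illustration; for genome-scale sets, ranking the pooled scores once gives the same value in O(n log n).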


Subject(s)
DNA, Intergenic , Genes, Plant , Poaceae/genetics , Transcription, Genetic , Biological Evolution , Genome, Plant , Machine Learning , Models, Genetic , Phenotype , Pseudogenes , Species Specificity
11.
Plant Methods ; 11: 10, 2015.
Article in English | MEDLINE | ID: mdl-25774204

ABSTRACT

BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.
