Search | BVS CLAP/SMR-OPS/OMS

1.

Transcriptional and epigenetic dynamics during specification of human embryonic stem cells.

Gifford, Casey A; Ziller, Michael J; Gu, Hongcang; Trapnell, Cole; Donaghey, Julie; Tsankov, Alexander; Shalek, Alex K; Kelley, David R; Shishkin, Alexander A; Issner, Robbyn; Zhang, Xiaolan; Coyne, Michael; Fostel, Jennifer L; Holmes, Laurie; Meldrim, Jim; Guttman, Mitchell; Epstein, Charles; Park, Hongkun; Kohlbacher, Oliver; Rinn, John; Gnirke, Andreas; Lander, Eric S; Bernstein, Bradley E; Meissner, Alexander.

Cell ; 153(5): 1149-63, 2013 May 23.

Article in English | MEDLINE | ID: mdl-23664763

ABSTRACT

Differentiation of human embryonic stem cells (hESCs) provides a unique opportunity to study the regulatory mechanisms that facilitate cellular transitions in a human context. To that end, we performed comprehensive transcriptional and epigenetic profiling of populations derived through directed differentiation of hESCs representing each of the three embryonic germ layers. Integration of whole-genome bisulfite sequencing, chromatin immunoprecipitation sequencing, and RNA sequencing reveals unique events associated with specification toward each lineage. Lineage-specific dynamic alterations in DNA methylation and H3K4me1 are evident at putative distal regulatory elements that are frequently bound by pluripotency factors in the undifferentiated hESCs. In addition, we identified germ-layer-specific H3K27me3 enrichment at sites exhibiting high DNA methylation in the undifferentiated state. A better understanding of these initial specification events will facilitate identification of deficiencies in current approaches, leading to more faithful differentiation strategies as well as providing insights into the rewiring of human regulatory programs during cellular transitions.

Subject(s)

Embryonic Stem Cells/metabolism , Epigenesis, Genetic , Transcription, Genetic , Acetylation , Cell Differentiation , Chromatin/chemistry , Chromatin/metabolism , DNA Methylation , Enhancer Elements, Genetic , Histones/metabolism , Humans , Methylation

2.

scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks.

Yuan, Han; Kelley, David R.

Nat Methods ; 19(9): 1088-1096, 2022 09.

Article in English | MEDLINE | ID: mdl-35941239

ABSTRACT

Single-cell assay for transposase-accessible chromatin using sequencing (scATAC) shows great promise for studying cellular heterogeneity in epigenetic landscapes, but there remain important challenges in the analysis of scATAC data due to the inherent high dimensionality and sparsity. Here we introduce scBasset, a sequence-based convolutional neural network method to model scATAC data. We show that by leveraging the DNA sequence information underlying accessibility peaks and the expressiveness of a neural network model, scBasset achieves state-of-the-art performance across a variety of tasks on scATAC and single-cell multiome datasets, including cell clustering, scATAC profile denoising, data integration across assays and transcription factor activity inference.

Subject(s)

Chromatin Immunoprecipitation Sequencing , Chromatin , Chromatin/genetics , Epigenomics , Neural Networks, Computer , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Transposases/genetics

3.

Semisupervised adversarial neural networks for single-cell classification.

Kimmel, Jacob C; Kelley, David R.

Genome Res ; 31(10): 1781-1793, 2021 10.

Article in English | MEDLINE | ID: mdl-33627475

ABSTRACT

Annotating cell identities is a common bottleneck in the analysis of single-cell genomics experiments. Here, we present scNym, a semisupervised, adversarial neural network that learns to transfer cell identity annotations from one experiment to another. scNym takes advantage of information in both labeled data sets and new, unlabeled data sets to learn rich representations of cell identity that enable effective annotation transfer. We show that scNym effectively transfers annotations across experiments despite biological and technical differences, achieving performance superior to existing methods. We also show that scNym models can synthesize information from multiple training and target data sets to improve performance. We show that in addition to high accuracy, scNym models are well calibrated and interpretable with saliency methods.

Subject(s)

Neural Networks, Computer

4.

Effective gene expression prediction from sequence by integrating long-range interactions.

Avsec, Ziga; Agarwal, Vikram; Visentin, Daniel; Ledsam, Joseph R; Grabska-Barwinska, Agnieszka; Taylor, Kyle R; Assael, Yannis; Jumper, John; Kohli, Pushmeet; Kelley, David R.

Nat Methods ; 18(10): 1196-1203, 2021 10.

Article in English | MEDLINE | ID: mdl-34608324

ABSTRACT

How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer-promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.

Subject(s)

DNA/genetics , Databases, Genetic , Epigenesis, Genetic , Gene Expression Regulation , Machine Learning , Nerve Net , Animals , Cell Line , Genome , Genomics/methods , Humans , Mice , Quantitative Trait Loci

5.

Predicting 3D genome folding from DNA sequence with Akita.

Fudenberg, Geoff; Kelley, David R; Pollard, Katherine S.

Nat Methods ; 17(11): 1111-1117, 2020 11.

Article in English | MEDLINE | ID: mdl-33046897

ABSTRACT

In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Cohesin and CTCF (CCCTC-binding factor) are key regulators; perturbing the levels of either greatly disrupts genome-wide folding as assayed by chromosome conformation capture methods. Still, how a given DNA sequence encodes a particular locus-specific folding pattern remains unknown. Here we present a convolutional neural network, Akita, that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of an orientation-specific grammar for CTCF binding sites. Akita learns predictive nucleotide-level features of genome folding, revealing effects of nucleotides beyond the core CTCF motif. Once trained, Akita enables rapid in silico predictions. Accounting for this, we demonstrate how Akita can be used to perform in silico saturation mutagenesis, interpret eQTLs, make predictions for structural variants and probe species-specific genome folding. Collectively, these results enable decoding genome function from sequence through structure.

Subject(s)

CCCTC-Binding Factor/genetics , Cell Cycle Proteins/genetics , Chromosomal Proteins, Non-Histone/genetics , DNA-Binding Proteins/genetics , Genome, Human , Neural Networks, Computer , Sequence Analysis, DNA/methods , Gene Expression Regulation , Humans , Models, Genetic , Cohesins
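The in silico saturation mutagenesis mentioned in the abstract amounts to scoring every possible single-nucleotide substitution of an input sequence with a trained predictor. A minimal sketch, assuming a generic scalar-valued predictor: `gc_fraction` below is a hypothetical stand-in, not Akita. A real folding model returns a contact map, which would first have to be reduced to a single number (for example, a mean squared difference from the reference prediction) before variants can be ranked.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def saturation_mutagenesis(
    seq: str, predict: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score every single-nucleotide substitution of `seq`.

    Returns a dict mapping (position, alt_base) to predict(mutant) - predict(seq).
    Assumes `seq` contains only uppercase A/C/G/T.
    """
    ref_score = predict(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = predict(mutant) - ref_score
    return effects


def gc_fraction(seq: str) -> float:
    """Hypothetical stand-in predictor: fraction of G/C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    effects = saturation_mutagenesis("ACGT", gc_fraction)
    print(len(effects))  # 3 substitutions per position -> 12
```

Because the loop makes 3L predictor calls for a sequence of length L, the speed of a trained model is what makes this exhaustive scan practical.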

6.

Murine single-cell RNA-seq reveals cell-identity- and tissue-specific trajectories of aging.

Kimmel, Jacob C; Penland, Lolita; Rubinstein, Nimrod D; Hendrickson, David G; Kelley, David R; Rosenthal, Adam Z.

Genome Res ; 29(12): 2088-2103, 2019 12.

Article in English | MEDLINE | ID: mdl-31754020

ABSTRACT

Aging is a pleiotropic process affecting many aspects of mammalian physiology. Mammals are composed of distinct cell type identities and tissue environments, but the influence of these cell identities and environments on the trajectory of aging in individual cells remains unclear. Here, we performed single-cell RNA-seq on >50,000 individual cells across three tissues in young and old mice to allow for direct comparison of aging phenotypes across cell types. We found transcriptional features of aging common across many cell types, as well as features of aging unique to each type. Leveraging matrix factorization and optimal transport methods, we found that both cell identities and tissue environments exert influence on the trajectory and magnitude of aging, with cell identity influence predominating. These results suggest that aging manifests with unique directionality and magnitude across the diverse cell identities in mammals.

Subject(s)

Aging , RNA-Seq , Sequence Analysis, RNA , Single-Cell Analysis , Aging/genetics , Aging/metabolism , Animals , Male , Mice

7.

Sequential regulatory activity prediction across chromosomes with convolutional neural networks.

Kelley, David R; Reshef, Yakir A; Bileschi, Maxwell; Belanger, David; McLean, Cory Y; Snoek, Jasper.

Genome Res ; 28(5): 739-750, 2018 05.

Article in English | MEDLINE | ID: mdl-29588361

ABSTRACT

Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.

Subject(s)

Chromosomes/genetics , Computational Biology/methods , Neural Networks, Computer , Regulatory Sequences, Nucleic Acid/genetics , Animals , Epigenomics/methods , Gene Expression Profiling/methods , Gene Expression Regulation , Genomics/methods , Humans , Machine Learning , Models, Genetic , Polymorphism, Single Nucleotide , Promoter Regions, Genetic/genetics

8.

Author Correction: scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks.

Yuan, Han; Kelley, David R.

Nat Methods ; 20(1): 162, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36564580

9.

Cross-species regulatory sequence activity prediction.

Kelley, David R.

PLoS Comput Biol ; 16(7): e1008050, 2020 07.

Article in English | MEDLINE | ID: mdl-32687525

ABSTRACT

Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held out and variant sequences. We further demonstrate a novel and powerful approach to apply mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease.

Subject(s)

Gene Expression Regulation , Genetic Variation , Machine Learning , Algorithms , Animals , Computational Biology , Databases, Genetic , Epigenomics , Genome, Human , Genomics , Hepatocytes/metabolism , Humans , Mice , Models, Genetic , Models, Statistical , Mutation , Neural Networks, Computer , Quantitative Trait Loci , Sequence Analysis, DNA , Software , Species Specificity

10.

Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks.

Kelley, David R; Snoek, Jasper; Rinn, John L.

Genome Res ; 26(7): 990-9, 2016 07.

Article in English | MEDLINE | ID: mdl-27197224

ABSTRACT

The complex language of eukaryotic gene expression remains incompletely understood. Despite the importance suggested by many noncoding variants statistically associated with human disease, nearly all such variants have unknown mechanisms. Here, we address this challenge using an approach based on a recent machine learning advance: deep convolutional neural networks (CNNs). We introduce the open source package Basset to apply CNNs to learn the functional activity of DNA sequences from genomics data. We trained Basset on a compendium of accessible genomic sites mapped in 164 cell types by DNase-seq, and demonstrate greater predictive accuracy than previous methods. Basset predictions for the change in accessibility between variant alleles were far greater for genome-wide association study (GWAS) SNPs that are likely to be causal relative to nearby SNPs in linkage disequilibrium with them. With Basset, a researcher can perform a single sequencing assay in their cell type of interest and simultaneously learn that cell's chromatin accessibility code and annotate every mutation in the genome with its influence on present accessibility and latent potential for accessibility. Thus, Basset offers a powerful computational approach to annotate and interpret the noncoding genome.

Subject(s)

Models, Genetic , Sequence Analysis, DNA , Base Sequence , Binding Sites , Consensus Sequence , Humans , Linkage Disequilibrium , Molecular Sequence Annotation , Neural Networks, Computer , Polymorphism, Single Nucleotide , Support Vector Machine
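The input representation and the variant score described in the Basset abstract can be illustrated in a few lines. A minimal sketch, assuming a hypothetical single-filter model (`toy_accessibility`, with a hand-set GATA-matching filter and bias) in place of Basset's trained multi-layer CNN: the sequence is one-hot encoded, the filter is scanned across it, the best match is max-pooled and passed through a sigmoid, and a variant is scored as the predicted accessibility of the alternate allele minus that of the reference allele.

```python
import math
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as one 4-element indicator vector per base (A, C, G, T order)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def toy_accessibility(seq: str, filt: List[List[float]], bias: float = -2.0) -> float:
    """Hypothetical one-filter 'CNN': convolve, max-pool over positions, sigmoid."""
    x = one_hot(seq)
    width = len(filt)
    if len(x) < width:
        raise ValueError("sequence is shorter than the filter")
    scores = [
        sum(filt[j][k] * x[i + j][k] for j in range(width) for k in range(4))
        for i in range(len(x) - width + 1)
    ]
    return 1.0 / (1.0 + math.exp(-(max(scores) + bias)))


def variant_effect(seq: str, pos: int, alt: str, filt: List[List[float]]) -> float:
    """Predicted accessibility of the alternate allele minus that of the reference."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return toy_accessibility(alt_seq, filt) - toy_accessibility(seq, filt)


# Hypothetical hand-set filter: +1 for the base matching "GATA" at each offset, -1 otherwise.
GATA_FILTER = [[1.0 if base == m else -1.0 for base in BASES] for m in "GATA"]

if __name__ == "__main__":
    # Disrupting the G of the GATA site lowers the toy prediction (negative effect).
    print(variant_effect("TTGATATT", 2, "C", GATA_FILTER))
```

In a trained network the filters are learned from data and many of them feed further layers, but the scoring logic, comparing predictions for the two alleles, is the same idea.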

11.

Long noncoding RNAs regulate adipogenesis.

Sun, Lei; Goff, Loyal A; Trapnell, Cole; Alexander, Ryan; Lo, Kinyui Alice; Hacisuleyman, Ezgi; Sauvageau, Martin; Tazon-Vega, Barbara; Kelley, David R; Hendrickson, David G; Yuan, Bingbing; Kellis, Manolis; Lodish, Harvey F; Rinn, John L.

Proc Natl Acad Sci U S A ; 110(9): 3387-92, 2013 Feb 26.

Article in English | MEDLINE | ID: mdl-23401553

ABSTRACT

The prevalence of obesity has led to a surge of interest in understanding the detailed mechanisms underlying adipocyte development. Many protein-coding genes, mRNAs, and microRNAs have been implicated in adipocyte development, but the global expression patterns and functional contributions of long noncoding RNA (lncRNA) during adipogenesis have not been explored. Here we profiled the transcriptome of primary brown and white adipocytes, preadipocytes, and cultured adipocytes and identified 175 lncRNAs that are specifically regulated during adipogenesis. Many lncRNAs are adipose-enriched, strongly induced during adipogenesis, and bound at their promoters by key transcription factors such as peroxisome proliferator-activated receptor γ (PPARγ) and CCAAT/enhancer-binding protein α (CEBPα). RNAi-mediated loss of function screens identified functional lncRNAs with varying impact on adipogenesis. Collectively, we have identified numerous lncRNAs that are functionally required for proper adipogenesis.

Subject(s)

Adipogenesis/genetics , RNA, Long Noncoding/metabolism , Animals , Gene Expression Profiling , Gene Expression Regulation , Gene Knockdown Techniques , Information Theory , Male , Mice , Oligonucleotide Array Sequence Analysis , Open Reading Frames/genetics , Phenotype , RNA, Long Noncoding/genetics , Reproducibility of Results , Transcriptome/genetics

12.

Assemblathon 1: a competitive assessment of de novo short read assembly methods.

Earl, Dent; Bradnam, Keith; St John, John; Darling, Aaron; Lin, Dawei; Fass, Joseph; Yu, Hung On Ken; Buffalo, Vince; Zerbino, Daniel R; Diekhans, Mark; Nguyen, Ngan; Ariyaratne, Pramila Nuwantha; Sung, Wing-Kin; Ning, Zemin; Haimel, Matthias; Simpson, Jared T; Fonseca, Nuno A; Birol, Inanç; Docking, T Roderick; Ho, Isaac Y; Rokhsar, Daniel S; Chikhi, Rayan; Lavenier, Dominique; Chapuis, Guillaume; Naquin, Delphine; Maillet, Nicolas; Schatz, Michael C; Kelley, David R; Phillippy, Adam M; Koren, Sergey; Yang, Shiaw-Pyng; Wu, Wei; Chou, Wen-Chi; Srivastava, Anuj; Shaw, Timothy I; Ruby, J Graham; Skewes-Cox, Peter; Betegon, Miguel; Dimon, Michelle T; Solovyev, Victor; Seledtsov, Igor; Kosarev, Petr; Vorobyev, Denis; Ramirez-Gonzalez, Ricardo; Leggett, Richard; MacLean, Dan; Xia, Fangfang; Luo, Ruibang; Li, Zhenyu; Xie, Yinlong.

Genome Res ; 21(12): 2224-41, 2011 Dec.

Article in English | MEDLINE | ID: mdl-21926179

ABSTRACT

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.

Subject(s)

Genome/physiology , Genomics/methods , Sequence Analysis, DNA/methods
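Assembly contiguity is commonly summarized with N50-style statistics. The Assemblathon 1 abstract does not name the exact contiguity metrics used, so the following is a minimal sketch of two common ones, assumed here for illustration: N50, measured against the total assembly length, and NG50, measured against a known or estimated genome size (which a simulated benchmark with a known answer makes available). The helper names are hypothetical.

```python
from typing import Iterable


def _length_at_half(lengths: Iterable[int], target_total: float) -> int:
    """Return the contig length at which the running sum of lengths, taken longest
    first, first reaches half of `target_total`; return 0 if it never does."""
    half = target_total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # convention chosen here: the assembly covers less than half the target


def n50(contig_lengths: Iterable[int]) -> int:
    """N50: half of the total assembly length lies in contigs at least this long."""
    lengths = list(contig_lengths)
    if not lengths:
        raise ValueError("contig_lengths must be non-empty")
    return _length_at_half(lengths, sum(lengths))


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """NG50: like N50, but measured against the genome size instead of assembly size."""
    return _length_at_half(contig_lengths, genome_size)


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]  # toy contig lengths (bp), total 280
    print(n50(contigs))        # 80
    print(ng50(contigs, 400))  # 50
```

NG50 is the fairer comparison across assemblies of different total size, since an assembly that simply omits hard regions can inflate its N50.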

13.

Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering.

Kelley, David R; Liu, Bo; Delcher, Arthur L; Pop, Mihai; Salzberg, Steven L.

Nucleic Acids Res ; 40(1): e9, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22102569

ABSTRACT

Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome. In this work, we developed a metagenomics gene prediction system Glimmer-MG that achieves significantly greater accuracy than previous systems via novel approaches to a number of important prediction subtasks. First, we introduce the use of phylogenetic classifications of the sequences to model parameterization. We also cluster the sequences, grouping together those that likely originated from the same organism. Analogous to iterative schemes that are useful for whole genomes, we retrain our models within each cluster on the initial gene predictions before making final predictions. Finally, we model both insertion/deletion and substitution sequencing errors using a different approach than previous software, allowing Glimmer-MG to change coding frame or pass through stop codons by predicting an error. In a comparison among multiple gene finding methods, Glimmer-MG makes the most sensitive and precise predictions on simulated and real metagenomes for all read lengths and error rates tested.

Subject(s)

Metagenomics/methods , Sequence Analysis, DNA , Software , Cluster Analysis , Gastrointestinal Tract/microbiology , Genes , Humans , Metagenome , Phylogeny

14.

Rewriting regulatory DNA to dissect and reprogram gene expression.

Martyn, Gabriella E; Montgomery, Michael T; Jones, Hank; Guo, Katherine; Doughty, Benjamin R; Linder, Johannes; Chen, Ziwei; Cochran, Kelly; Lawrence, Kathryn A; Munson, Glen; Pampari, Anusri; Fulco, Charles P; Kelley, David R; Lander, Eric S; Kundaje, Anshul; Engreitz, Jesse M.

bioRxiv ; 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-38187584

ABSTRACT

Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). 
Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.

15.

An encyclopedia of enhancer-gene regulatory interactions in the human genome.

Gschwind, Andreas R; Mualim, Kristy S; Karbalayghareh, Alireza; Sheth, Maya U; Dey, Kushal K; Jagoda, Evelyn; Nurtdinov, Ramil N; Xi, Wang; Tan, Anthony S; Jones, Hank; Ma, X Rosa; Yao, David; Nasser, Joseph; Avsec, Ziga; James, Benjamin T; Shamim, Muhammad S; Durand, Neva C; Rao, Suhas S P; Mahajan, Ragini; Doughty, Benjamin R; Andreeva, Kalina; Ulirsch, Jacob C; Fan, Kaili; Perez, Elizabeth M; Nguyen, Tri C; Kelley, David R; Finucane, Hilary K; Moore, Jill E; Weng, Zhiping; Kellis, Manolis; Bassik, Michael C; Price, Alkes L; Beer, Michael A; Guigó, Roderic; Stamatoyannopoulos, John A; Lieberman Aiden, Erez; Greenleaf, William J; Leslie, Christina S; Steinmetz, Lars M; Kundaje, Anshul; Engreitz, Jesse M.

bioRxiv ; 2023 Nov 13.

Article in English | MEDLINE | ID: mdl-38014075

ABSTRACT

Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.

16.

The genetic and biochemical determinants of mRNA degradation rates in mammals.

Agarwal, Vikram; Kelley, David R.

Genome Biol ; 23(1): 245, 2022 11 23.

Article in English | MEDLINE | ID: mdl-36419176

ABSTRACT

BACKGROUND: Degradation rate is a fundamental aspect of mRNA metabolism, and the factors governing it remain poorly characterized. Understanding the genetic and biochemical determinants of mRNA half-life would enable more precise identification of variants that perturb gene expression through post-transcriptional gene regulatory mechanisms. RESULTS: We establish a compendium of 39 human and 27 mouse transcriptome-wide mRNA decay rate datasets. A meta-analysis of these data identified a prevalence of technical noise and measurement bias, induced partially by the underlying experimental strategy. Correcting for these biases allowed us to derive more precise, consensus measurements of half-life which exhibit enhanced consistency between species. We trained substantially improved statistical models based upon genetic and biochemical features to better predict half-life and characterize the factors molding it. Our state-of-the-art model, Saluki, is a hybrid convolutional and recurrent deep neural network which relies only upon an mRNA sequence annotated with coding frame and splice sites to predict half-life (r=0.77). The key novel principle learned by Saluki is that the spatial positioning of splice sites, codons, and RNA-binding motifs within an mRNA is strongly associated with mRNA half-life. Saluki predicts the impact of RNA sequences and genetic mutations therein on mRNA stability, in agreement with functional measurements derived from massively parallel reporter assays. CONCLUSIONS: Our work produces a more robust ground truth for transcriptome-wide mRNA half-lives in mammalian cells. Using these revised measurements, we trained Saluki, a model that is over 50% more accurate in predicting half-life from sequence than existing models. Saluki succinctly captures many of the known determinants of mRNA half-life and can be rapidly deployed to predict the functional consequences of arbitrary mutations in the transcriptome.

Subject(s)

Mammals , RNA Stability , Humans , Animals , Mice , Mammals/genetics , RNA, Messenger/genetics , Transcriptome , Biological Assay

17.

Novel insights from a multiomics dissection of the Hayflick limit.

Chan, Michelle; Yuan, Han; Soifer, Ilya; Maile, Tobias M; Wang, Rebecca Y; Ireland, Andrea; O'Brien, Jonathon J; Goudeau, Jérôme; Chan, Leanne J G; Vijay, Twaritha; Freund, Adam; Kenyon, Cynthia; Bennett, Bryson D; McAllister, Fiona E; Kelley, David R; Roy, Margaret; Cohen, Robert L; Levinson, Arthur D; Botstein, David; Hendrickson, David G.

Elife ; 11, 2022 02 04.

Article in English | MEDLINE | ID: mdl-35119359

ABSTRACT

The process wherein dividing cells exhaust proliferative capacity and enter into replicative senescence has become a prominent model for cellular aging in vitro. Despite decades of study, this cellular state is not fully understood in culture and much less so during aging. Here, we revisit Leonard Hayflick's original observation of replicative senescence in WI-38 human lung fibroblasts using a battery of modern techniques including RNA-seq, single-cell RNA-seq, proteomics, metabolomics, and ATAC-seq. We find evidence that the transition to a senescent state manifests early, increases gradually, and corresponds to a concomitant global increase in DNA accessibility in nucleolar and lamin associated domains. Furthermore, we demonstrate that senescent WI-38 cells acquire a striking resemblance to myofibroblasts in a process similar to the epithelial to mesenchymal transition (EMT) that is regulated by YAP1/TEAD1 and TGF-β2. Lastly, we show that verteporfin inhibition of YAP1/TEAD1 activity in aged WI-38 cells robustly attenuates this gene expression program.

Subject(s)

Cellular Senescence , Epithelial-Mesenchymal Transition , Aged , Aging/physiology , Cell Line , Cellular Senescence/genetics , Fibroblasts/metabolism , Humans

18.

The landscape of alternative polyadenylation in single cells of the developing mouse embryo.

Agarwal, Vikram; Lopez-Darwin, Sereno; Kelley, David R; Shendure, Jay.

Nat Commun ; 12(1): 5101, 2021 08 24.

Article in English | MEDLINE | ID: mdl-34429411

ABSTRACT

3' untranslated regions (3' UTRs) post-transcriptionally regulate mRNA stability, localization, and translation rate. While 3'-UTR isoforms have been globally quantified in limited cell types using bulk measurements, their differential usage among cell types during mammalian development remains poorly characterized. In this study, we examine a dataset comprising ~2 million nuclei spanning E9.5-E13.5 of mouse embryonic development to quantify transcriptome-wide changes in alternative polyadenylation (APA). We observe a global lengthening of 3' UTRs across embryonic stages in all cell types, although we detect shorter 3' UTRs in hematopoietic lineages and longer 3' UTRs in neuronal cell types within each stage. An analysis of RNA-binding protein (RBP) dynamics identifies ELAV-like family members, which are concomitantly induced in neuronal lineages and developmental stages experiencing 3'-UTR lengthening, as putative regulators of APA. By measuring 3'-UTR isoforms in an expansive single cell dataset, our work provides a transcriptome-wide and organism-wide map of the dynamic landscape of alternative polyadenylation during mammalian organogenesis.

Subject(s)

Embryonic Development/genetics , Embryonic Development/physiology , Polyadenylation , 3' Untranslated Regions , Animals , Gene Expression Regulation, Developmental , Mice , NIH 3T3 Cells , Neurons/metabolism , Organogenesis , Protein Isoforms , RNA Stability , RNA-Binding Proteins/metabolism , Transcriptome

19.

Differentiation reveals latent features of aging and an energy barrier in murine myogenesis.

Kimmel, Jacob C; Yi, Nelda; Roy, Margaret; Hendrickson, David G; Kelley, David R.

Cell Rep ; 35(4): 109046, 2021 04 27.

Article in English | MEDLINE | ID: mdl-33910007

ABSTRACT

Skeletal muscle experiences a decline in lean mass and regenerative potential with age, in part due to intrinsic changes in progenitor cells. However, it remains unclear how age-related changes in progenitors manifest across a differentiation trajectory. Here, we perform single-cell RNA sequencing (RNA-seq) on muscle mononuclear cells from young and aged mice and profile muscle stem cells (MuSCs) and fibro-adipose progenitors (FAPs) after differentiation. Differentiation increases the magnitude of age-related change in MuSCs and FAPs, but it also masks a subset of age-related changes present in progenitors. Using a dynamical systems approach and RNA velocity, we find that aged MuSCs follow the same differentiation trajectory as young cells but stall in differentiation near a commitment decision. Our results suggest that differentiation reveals latent features of aging and that fate commitment decisions are delayed in aged myogenic cells in vitro.

Subject(s)

Aging/genetics , Muscle Development/genetics , Animals , Cell Differentiation , Cells, Cultured , Mice

20.

Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs.

Wang, Qingbo S; Kelley, David R; Ulirsch, Jacob; Kanai, Masahiro; Sadhuka, Shuvom; Cui, Ran; Albors, Carlos; Cheng, Nathan; Okada, Yukinori; Aguet, Francois; Ardlie, Kristin G; MacArthur, Daniel G; Finucane, Hilary K.

Nat Commun ; 12(1): 3394, 2021 06 07.

Article in English | MEDLINE | ID: mdl-34099641

ABSTRACT

The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants' effects on gene expression in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors thus have lacked large gold standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.

Subject(s)

Chromosome Mapping/methods , Computational Biology/methods , Quantitative Trait Loci , Supervised Machine Learning , Adult , Cohort Studies , Datasets as Topic , Gene Expression Profiling , Humans , Polymorphism, Single Nucleotide
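How a functional score such as EMS can act as a fine-mapping prior can be sketched under a strong simplifying assumption: each locus has exactly one causal variant. In that case a variant's posterior inclusion probability (PIP) is its prior weight times its Bayes factor, renormalized over the locus, i.e. PIP_i = π_i·BF_i / Σ_j π_j·BF_j. This is a minimal sketch, not the paper's fine-mapping procedure; the function name and the example Bayes factors and prior weights are hypothetical.

```python
from typing import List, Sequence


def pips_single_causal(bayes_factors: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Posterior inclusion probabilities assuming exactly one causal variant per locus.

    bayes_factors: per-variant Bayes factors for association with expression.
    priors: non-negative per-variant prior weights (e.g. derived from a functional
        score); they need not sum to 1 because the result is renormalized.
    """
    if len(bayes_factors) != len(priors):
        raise ValueError("bayes_factors and priors must have the same length")
    weights = [bf * p for bf, p in zip(bayes_factors, priors)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one variant needs a positive prior and Bayes factor")
    return [w / total for w in weights]


if __name__ == "__main__":
    bfs = [10.0, 10.0]  # two hypothetical variants with identical association evidence
    print(pips_single_causal(bfs, [1.0, 1.0]))  # uniform prior splits the evidence: [0.5, 0.5]
    print(pips_single_causal(bfs, [3.0, 1.0]))  # a functional prior breaks the tie: [0.75, 0.25]
```

The example shows why a functional prior helps: when association statistics alone cannot distinguish variants (as for variants in strong linkage disequilibrium), the prior shifts posterior mass toward the variant predicted to be functional.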