Search | Biblioteca Virtual en Salud Odontología. Uruguay

Predicting Splicing from Primary Sequence with Deep Learning.

Jaganathan, Kishore; Kyriazopoulou Panagiotopoulou, Sofia; McRae, Jeremy F; Darbandi, Siavash Fazel; Knowles, David; Li, Yang I; Kosmicki, Jack A; Arbelaez, Juan; Cui, Wenwu; Schwartz, Grace B; Chow, Eric D; Kanterakis, Efstathios; Gao, Hong; Kia, Amirali; Batzoglou, Serafim; Sanders, Stephan J; Farh, Kyle Kai-How.

Cell ; 176(3): 535-548.e24, 2019 Jan 24.

Article in English | MEDLINE | ID: mdl-30661751

ABSTRACT

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.

Subject(s)

Forecasting/methods; RNA Precursors/genetics; RNA Splicing/genetics; Algorithms; Alternative Splicing/genetics; Autistic Disorder/genetics; Deep Learning; Exons/genetics; Humans; Intellectual Disability/genetics; Introns/genetics; Neural Networks, Computer; RNA Precursors/metabolism; RNA Splice Sites/genetics; RNA Splice Sites/physiology

Resolving the full spectrum of human genome variation using Linked-Reads.

Marks, Patrick; Garcia, Sarah; Barrio, Alvaro Martinez; Belhocine, Kamila; Bernate, Jorge; Bharadwaj, Rajiv; Bjornson, Keith; Catalanotti, Claudia; Delaney, Josh; Fehr, Adrian; Fiddes, Ian T; Galvin, Brendan; Heaton, Haynes; Herschleb, Jill; Hindson, Christopher; Holt, Esty; Jabara, Cassandra B; Jett, Susanna; Keivanfar, Nikka; Kyriazopoulou-Panagiotopoulou, Sofia; Lek, Monkol; Lin, Bill; Lowe, Adam; Mahamdallie, Shazia; Maheshwari, Shamoni; Makarewicz, Tony; Marshall, Jamie; Meschi, Francesca; O'Keefe, Christopher J; Ordonez, Heather; Patel, Pranav; Price, Andrew; Royall, Ariel; Ruark, Elise; Seal, Sheila; Schnall-Levin, Michael; Shah, Preyas; Stafford, David; Williams, Stephen; Wu, Indira; Xu, Andrew Wei; Rahman, Nazneen; MacArthur, Daniel; Church, Deanna M.

Genome Res ; 29(4): 635-645, 2019 Apr.

Article in English | MEDLINE | ID: mdl-30894395

ABSTRACT

Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long range information while maintaining the advantages of short reads. Starting from ~1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.

Subject(s)

Genome-Wide Association Study/methods; Polymorphism, Genetic; Whole Genome Sequencing/methods; Cell Line; Genome, Human; Humans; Intercellular Signaling Peptides and Proteins; Membrane Proteins/genetics; Survival of Motor Neuron 1 Protein/genetics; Survival of Motor Neuron 2 Protein/genetics

Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements.

Kundaje, Anshul; Kyriazopoulou-Panagiotopoulou, Sofia; Libbrecht, Max; Smith, Cheryl L; Raha, Debasish; Winters, Elliott E; Johnson, Steven M; Snyder, Michael; Batzoglou, Serafim; Sidow, Arend.

Genome Res ; 22(9): 1735-47, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22955985

ABSTRACT

Gene regulation at functional elements (e.g., enhancers, promoters, insulators) is governed by an interplay of nucleosome remodeling, histone modifications, and transcription factor binding. To enhance our understanding of gene regulation, the ENCODE Consortium has generated a wealth of ChIP-seq data on DNA-binding proteins and histone modifications. We additionally generated nucleosome positioning data on two cell lines, K562 and GM12878, by MNase digestion and high-depth sequencing. Here we relate 14 chromatin signals (12 histone marks, DNase, and nucleosome positioning) to the binding sites of 119 DNA-binding proteins across a large number of cell lines. We developed a new method for unsupervised pattern discovery, the Clustered AGgregation Tool (CAGT), which accounts for the inherent heterogeneity in signal magnitude, shape, and implicit strand orientation of chromatin marks. We applied CAGT on a total of 5084 data set pairs to obtain an exhaustive catalog of high-resolution patterns of histone modifications and nucleosome positioning signals around bound transcription factors. Our analyses reveal extensive heterogeneity in how histone modifications are deposited, and how nucleosomes are positioned around binding sites. With the exception of the CTCF/cohesin complex, asymmetry of nucleosome positioning is predominant. Asymmetry of histone modifications is also widespread, for all types of chromatin marks examined, including promoter, enhancer, elongation, and repressive marks. The fine-resolution signal shapes discovered by CAGT unveiled novel correlation patterns between chromatin marks, nucleosome positioning, and sequence content. Meta-analyses of the signal profiles revealed a common vocabulary of chromatin signals shared across multiple cell lines and binding proteins.

Subject(s)

Chromatin Assembly and Disassembly; Genetic Heterogeneity; Regulatory Sequences, Nucleic Acid; Binding Sites/genetics; Cell Line; Cluster Analysis; Computational Biology/methods; Humans; K562 Cells; Nucleosomes/genetics; Nucleosomes/metabolism; Protein Binding; Software; Transcription Initiation Site

Reconstruction of genealogical relationships with applications to Phase III of HapMap.

Kyriazopoulou-Panagiotopoulou, Sofia; Kashef Haghighi, Dorna; Aerni, Sarah J; Sundquist, Andreas; Bercovici, Sivan; Batzoglou, Serafim.

Bioinformatics ; 27(13): i333-41, 2011 Jul 01.

Article in English | MEDLINE | ID: mdl-21685089

ABSTRACT

MOTIVATION: Accurate inference of genealogical relationships between pairs of individuals is paramount in association studies, forensics and evolutionary analyses of wildlife populations. Current methods for relationship inference consider only a small set of close relationships and have limited to no power to distinguish between relationships with the same number of meioses separating the individuals under consideration (e.g. aunt-niece versus niece-aunt or first cousins versus great aunt-niece). RESULTS: We present CARROT (ClAssification of Relationships with ROTations), a novel framework for relationship inference that leverages linkage information to differentiate between rotated relationships, that is, between relationships with the same number of common ancestors and the same number of meioses separating the individuals under consideration. We demonstrate that CARROT clearly outperforms existing methods on simulated data. We also applied CARROT on four populations from Phase III of the HapMap Project and detected previously unreported pairs of third- and fourth-degree relatives. AVAILABILITY: Source code for CARROT is freely available at http://carrot.stanford.edu. CONTACT: sofiakp@stanford.edu.

Subject(s)

Algorithms; Genealogy and Heraldry; Animals; Humans; Markov Chains

Haplotyping germline and cancer genomes with high-throughput linked-read sequencing.

Zheng, Grace X Y; Lau, Billy T; Schnall-Levin, Michael; Jarosz, Mirna; Bell, John M; Hindson, Christopher M; Kyriazopoulou-Panagiotopoulou, Sofia; Masquelier, Donald A; Merrill, Landon; Terry, Jessica M; Mudivarti, Patrice A; Wyatt, Paul W; Bharadwaj, Rajiv; Makarewicz, Anthony J; Li, Yuan; Belgrader, Phillip; Price, Andrew D; Lowe, Adam J; Marks, Patrick; Vurens, Gerard M; Hardenbol, Paul; Montesclaros, Luz; Luo, Melissa; Greenfield, Lawrence; Wong, Alexander; Birch, David E; Short, Steven W; Bjornson, Keith P; Patel, Pranav; Hopmans, Erik S; Wood, Christina; Kaur, Sukhvinder; Lockwood, Glenn K; Stafford, David; Delaney, Joshua P; Wu, Indira; Ordonez, Heather S; Grimes, Susan M; Greer, Stephanie; Lee, Josephine Y; Belhocine, Kamila; Giorda, Kristina M; Heaton, William H; McDermott, Geoffrey P; Bent, Zachary W; Meschi, Francesca; Kondov, Nikola O; Wilson, Ryan; Bernate, Jorge A; Gauby, Shawn.

Nat Biotechnol ; 34(3): 303-11, 2016 Mar.

Article in English | MEDLINE | ID: mdl-26829319

ABSTRACT

Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.

Subject(s)

Haplotypes/genetics; High-Throughput Nucleotide Sequencing/methods; Neoplasms/genetics; Sequence Analysis, DNA/methods; DNA/genetics; Genome, Human; Genomic Structural Variation; Germ Cells; Humans; Nucleic Acid Conformation; Oncogene Proteins, Fusion/genetics; Polymorphism, Single Nucleotide

Extensive sequencing of seven human genomes to characterize benchmark reference materials.

Zook, Justin M; Catoe, David; McDaniel, Jennifer; Vang, Lindsay; Spies, Noah; Sidow, Arend; Weng, Ziming; Liu, Yuling; Mason, Christopher E; Alexander, Noah; Henaff, Elizabeth; McIntyre, Alexa B R; Chandramohan, Dhruva; Chen, Feng; Jaeger, Erich; Moshrefi, Ali; Pham, Khoa; Stedman, William; Liang, Tiffany; Saghbini, Michael; Dzakula, Zeljko; Hastie, Alex; Cao, Han; Deikus, Gintaras; Schadt, Eric; Sebra, Robert; Bashir, Ali; Truty, Rebecca M; Chang, Christopher C; Gulbahce, Natali; Zhao, Keyan; Ghosh, Srinka; Hyland, Fiona; Fu, Yutao; Chaisson, Mark; Xiao, Chunlin; Trow, Jonathan; Sherry, Stephen T; Zaranek, Alexander W; Ball, Madeleine; Bobe, Jason; Estep, Preston; Church, George M; Marks, Patrick; Kyriazopoulou-Panagiotopoulou, Sofia; Zheng, Grace X Y; Schnall-Levin, Michael; Ordonez, Heather S; Mudivarti, Patrice A; Giorda, Kristina.

Sci Data ; 3: 160025, 2016 Jun 07.

Article in English | MEDLINE | ID: mdl-27271295

ABSTRACT

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

Subject(s)

Benchmarking; Genome, Human; Exome; Genomics; Humans; INDEL Mutation

Extensive variation in chromatin states across humans.

Kasowski, Maya; Kyriazopoulou-Panagiotopoulou, Sofia; Grubert, Fabian; Zaugg, Judith B; Kundaje, Anshul; Liu, Yuling; Boyle, Alan P; Zhang, Qiangfeng Cliff; Zakharia, Fouad; Spacek, Damek V; Li, Jingjing; Xie, Dan; Olarerin-George, Anthony; Steinmetz, Lars M; Hogenesch, John B; Kellis, Manolis; Batzoglou, Serafim; Snyder, Michael.

Science ; 342(6159): 750-2, 2013 Nov 08.

Article in English | MEDLINE | ID: mdl-24136358

ABSTRACT

The majority of disease-associated variants lie outside protein-coding regions, suggesting a link between variation in regulatory regions and disease predisposition. We studied differences in chromatin states using five histone modifications, cohesin, and CTCF in lymphoblastoid lines from 19 individuals of diverse ancestry. We found extensive signal variation in regulatory regions, which often switch between active and repressed states across individuals. Enhancer activity is particularly diverse among individuals, whereas gene expression remains relatively stable. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Overall, our results provide insights into chromatin variation among humans.

Subject(s)

Chromatin/genetics; Chromatin/metabolism; Gene Expression Regulation; Genetic Predisposition to Disease/genetics; Binding Sites; CCCTC-Binding Factor; Cell Cycle Proteins/genetics; Cell Cycle Proteins/metabolism; Cell Line, Tumor; Chromosomal Proteins, Non-Histone/genetics; Chromosomal Proteins, Non-Histone/metabolism; Enhancer Elements, Genetic/genetics; Genetic Variation; Histones/genetics; Histones/metabolism; Humans; Repressor Proteins/genetics; Repressor Proteins/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism; Cohesins

