Results 1 - 20 of 39
1.
Nature ; 608(7922): 353-359, 2022 08.
Article in English | MEDLINE | ID: mdl-35922509

ABSTRACT

Regulation of transcript structure generates transcript diversity and plays an important role in human disease [1-7]. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure [8-16]. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.


Subjects
Alleles; Gene Expression Profiling; Organ Specificity; RNA-Seq; Transcriptome; Alternative Splicing/genetics; Cell Line; Datasets as Topic; Genotype; Heterogeneous-Nuclear Ribonucleoproteins/deficiency; Heterogeneous-Nuclear Ribonucleoproteins/genetics; Humans; Organ Specificity/genetics; Polypyrimidine Tract-Binding Protein/deficiency; Polypyrimidine Tract-Binding Protein/genetics; Reproducibility of Results; Transcriptome/genetics
2.
Am J Hum Genet ; 111(5): 863-876, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38565148

ABSTRACT

Copy number variants (CNVs) are significant contributors to the pathogenicity of rare genetic diseases and, with new innovative methods, can now reliably be identified from exome sequencing. Challenges remain in accurate classification of CNV pathogenicity. CNV calling using GATK-gCNV was performed on exomes from a cohort of 6,633 families (15,759 individuals) with heterogeneous phenotypes and variable prior genetic testing collected at the Broad Institute Center for Mendelian Genomics of the Genomics Research to Elucidate the Genetics of Rare Diseases consortium and analyzed using the seqr platform. The addition of CNV detection to exome analysis identified causal CNVs for 171 families (2.6%). The estimated sizes of CNVs ranged from 293 bp to 80 Mb. The causal CNVs consisted of 140 deletions, 15 duplications, 3 suspected complex structural variants (SVs), 3 insertions, and 10 complex SVs, the latter two groups being identified by orthogonal confirmation methods. To classify CNV variant pathogenicity, we used the 2020 American College of Medical Genetics and Genomics/ClinGen CNV interpretation standards and developed additional criteria to evaluate allelic and functional data as well as variants on the X chromosome to further advance the framework. We interpreted 151 CNVs as likely pathogenic/pathogenic and 20 CNVs as high-interest variants of uncertain significance. Calling CNVs from existing exome data increases the diagnostic yield for individuals undiagnosed after standard testing approaches, providing a higher-resolution alternative to arrays at a fraction of the cost of genome sequencing. Our improvements to the classification approach advance the systematic framework to assess the pathogenicity of CNVs.


Subjects
DNA Copy Number Variations; Exome Sequencing; Exome; Rare Diseases; Humans; DNA Copy Number Variations/genetics; Rare Diseases/genetics; Rare Diseases/diagnosis; Exome/genetics; Male; Female; Cohort Studies; Genetic Testing/methods
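
The counts quoted in the abstract of result 2 can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in that abstract; the variable names are illustrative and are not taken from the study or from any of its tools.

```python
# Figures quoted in the abstract of result 2 (Am J Hum Genet 2024).
families_total = 6633
causal_cnvs = {
    "deletions": 140,
    "duplications": 15,
    "suspected complex SVs": 3,
    "insertions": 3,
    "complex SVs": 10,
}
classified = {"likely pathogenic/pathogenic": 151, "high-interest VUS": 20}

# The per-class counts should add up to the number of families with a causal CNV.
total_causal = sum(causal_cnvs.values())

# Diagnostic yield as a percentage of all families in the cohort.
yield_pct = 100 * total_causal / families_total

print(total_causal, sum(classified.values()), round(yield_pct, 1))  # 171 171 2.6
```

Both breakdowns sum to the 171 families reported, and 171 of 6,633 families rounds to the stated 2.6%.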
3.
N Engl J Med ; 391(16): 1511-1518, 2024 Oct 24.
Article in English | MEDLINE | ID: mdl-39442041

ABSTRACT

CHASERR encodes a human long noncoding RNA (lncRNA) adjacent to CHD2, a coding gene in which de novo loss-of-function variants cause developmental and epileptic encephalopathy. Here, we report our findings in three unrelated children with a syndromic, early-onset neurodevelopmental disorder, each of whom had a de novo deletion in the CHASERR locus. The children had severe encephalopathy, shared facial dysmorphisms, cortical atrophy, and cerebral hypomyelination - a phenotype that is distinct from the phenotypes of patients with CHD2 haploinsufficiency. We found that the CHASERR deletion results in increased CHD2 protein abundance in patient-derived cell lines and increased expression of the CHD2 transcript in cis. These findings indicate that CHD2 has bidirectional dosage sensitivity in human disease, and we recommend that other lncRNA-encoding genes be evaluated, particularly those upstream of genes associated with mendelian disorders. (Funded by the National Human Genome Research Institute and others.).


Subjects
Neurodevelopmental Disorders; RNA, Long Noncoding; Child, Preschool; Female; Humans; Infant; Male; Brain/pathology; Brain/diagnostic imaging; Brain/metabolism; DNA-Binding Proteins/analysis; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Gene Deletion; Haploinsufficiency; Neurodevelopmental Disorders/diagnosis; Neurodevelopmental Disorders/genetics; Neurodevelopmental Disorders/pathology; Phenotype; RNA, Long Noncoding/genetics; Sequence Deletion
4.
Genome Res ; 33(12): 2029-2040, 2023 12 27.
Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.


Subjects
Genomics; Nanopores; INDEL Mutation; Whole Genome Sequencing
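
The abstract of result 4 reports accuracy as an F1 score, which is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition; the function name and the counts are hypothetical and chosen for illustration only, not taken from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for a variant call set."""
    if true_pos == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # fraction of calls that are real
    recall = true_pos / (true_pos + false_neg)     # fraction of real variants called
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not from the study): precision 0.6, recall 0.5.
example = f1_score(true_pos=60, false_pos=40, false_neg=60)
print(f"F1 = {example:.3f}")  # F1 = 0.545
```

Because the harmonic mean is pulled toward the lower of the two values, a call set needs both reasonable precision and reasonable recall to clear a threshold such as the 0.5 mentioned in the abstract.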
5.
Nat Methods ; 20(4): 559-568, 2023 04.
Article in English | MEDLINE | ID: mdl-36959322

ABSTRACT

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.


Subjects
Deep Learning; Software; Humans; Genotype; Cues; Genomic Structural Variation; Genome, Human
6.
Nature ; 586(7828): 292-298, 2020 10.
Article in English | MEDLINE | ID: mdl-32999459

ABSTRACT

The RecQ DNA helicase WRN is a synthetic lethal target for cancer cells with microsatellite instability (MSI), a form of genetic hypermutability that arises from impaired mismatch repair [1-4]. Depletion of WRN induces widespread DNA double-strand breaks in MSI cells, leading to cell cycle arrest and/or apoptosis. However, the mechanism by which WRN protects MSI-associated cancers from double-strand breaks remains unclear. Here we show that TA-dinucleotide repeats are highly unstable in MSI cells and undergo large-scale expansions, distinct from previously described insertion or deletion mutations of a few nucleotides [5]. Expanded TA repeats form non-B DNA secondary structures that stall replication forks, activate the ATR checkpoint kinase, and require unwinding by the WRN helicase. In the absence of WRN, the expanded TA-dinucleotide repeats are susceptible to cleavage by the MUS81 nuclease, leading to massive chromosome shattering. These findings identify a distinct biomarker that underlies the synthetic lethal dependence on WRN, and support the development of therapeutic agents that target WRN for MSI-associated cancers.


Subjects
DNA Breaks, Double-Stranded; DNA Repeat Expansion/genetics; Dinucleotide Repeats/genetics; Neoplasms/genetics; Werner Syndrome Helicase/metabolism; Ataxia Telangiectasia Mutated Proteins/metabolism; Cell Line, Tumor; Chromosomes, Human/genetics; Chromosomes, Human/metabolism; Chromothripsis; DNA Cleavage; DNA Replication; DNA-Binding Proteins/metabolism; Endodeoxyribonucleases/metabolism; Endonucleases/metabolism; Genomic Instability; Humans; Recombinases/metabolism
7.
Proc Natl Acad Sci U S A ; 120(11): e2212270120, 2023 03 14.
Article in English | MEDLINE | ID: mdl-36877833

ABSTRACT

Social media platforms have recently been heavily moderated to prevent the spread of online hate speech, which is usually rich in toxic words and is directed toward an individual or a community. Owing to such heavy moderation, newer and more subtle techniques are being deployed. One of the most striking among these is fear speech. Fear speech, as the name suggests, attempts to incite fear about a target community. Although subtle, it might be highly effective, often pushing communities toward a physical conflict. Therefore, understanding its prevalence in social media is of paramount importance. This article presents a large-scale study to understand the prevalence of 400K fear speech and over 700K hate speech posts collected from Gab.com. Remarkably, users posting large amounts of fear speech accrue more followers and occupy more central positions in social networks than users posting large amounts of hate speech. They can also reach out to benign users more effectively than hate speech users through replies, reposts, and mentions. This connects to the fact that, unlike hate speech, fear speech has almost zero toxic content, making it look plausible. Moreover, while fear speech topics mostly portray a community as a perpetrator using a (fake) chain of argumentation, hate speech topics hurl direct multitarget insults, thus pointing to why general users could be more gullible to fear speech. Our findings extend to other platforms (Twitter and Facebook) and thus necessitate using sophisticated moderation policies and mass awareness to combat fear speech.


Subjects
Social Media; Humans; Speech; Fear; Fertility; Hate
8.
Genome Res ; 32(3): 569-582, 2022 03.
Article in English | MEDLINE | ID: mdl-35074858

ABSTRACT

Genomic databases of allele frequency are extremely helpful for evaluating clinical variants of unknown significance; however, until now, databases such as the Genome Aggregation Database (gnomAD) have focused on nuclear DNA and have ignored the mitochondrial genome (mtDNA). Here, we present a pipeline to call mtDNA variants that addresses three technical challenges: (1) detecting homoplasmic and heteroplasmic variants, present, respectively, in all or a fraction of mtDNA molecules; (2) circular mtDNA genome; and (3) misalignment of nuclear sequences of mitochondrial origin (NUMTs). We observed that mtDNA copy number per cell varied across gnomAD cohorts and influenced the fraction of NUMT-derived false-positive variant calls, which can account for the majority of putative heteroplasmies. To avoid false positives, we excluded contaminated samples, cell lines, and samples prone to NUMT misalignment due to few mtDNA copies. Furthermore, we report variants with heteroplasmy ≥10%. We applied this pipeline to 56,434 whole-genome sequences in the gnomAD v3.1 database that includes individuals of European (58%), African (25%), Latino (10%), and Asian (5%) ancestry. Our gnomAD v3.1 release contains population frequencies for 10,850 unique mtDNA variants at more than half of all mtDNA bases. Importantly, we report frequencies within each nuclear ancestral population and mitochondrial haplogroup. Homoplasmic variants account for most variant calls (98%) and unique variants (85%). We observed that 1/250 individuals carry a pathogenic mtDNA variant with heteroplasmy above 10%. These mtDNA population allele frequencies are freely accessible and will aid in diagnostic interpretation and research studies.


Subjects
DNA, Mitochondrial; Genome, Mitochondrial; Cell Nucleus/genetics; DNA, Mitochondrial/genetics; Gene Frequency; Genome; Humans; Mitochondria/genetics; Sequence Analysis, DNA
10.
Genome Res ; 30(8): 1154-1169, 2020 08.
Article in English | MEDLINE | ID: mdl-32817236

ABSTRACT

The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read-derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.


Subjects
Genome, Protozoan/genetics; High-Throughput Nucleotide Sequencing/methods; Mutation/genetics; Plasmodium falciparum/genetics; Whole Genome Sequencing/methods; Algorithms; Base Sequence; Genetic Variation/genetics; Sequence Alignment; Sequence Analysis, DNA/methods; Software
11.
Nature ; 506(7487): 185-90, 2014 Feb 13.
Article in English | MEDLINE | ID: mdl-24463508

ABSTRACT

Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.


Subjects
Multifactorial Inheritance/genetics; Mutation/genetics; Schizophrenia/genetics; Autistic Disorder/genetics; Calcium Channels/genetics; Cytoskeletal Proteins/genetics; DNA Copy Number Variations/genetics; Disks Large Homolog 4 Protein; Female; Fragile X Mental Retardation Protein/metabolism; Genome-Wide Association Study; Humans; Intellectual Disability/genetics; Intracellular Signaling Peptides and Proteins/genetics; Male; Membrane Proteins/genetics; Nerve Tissue Proteins/genetics; Receptors, N-Methyl-D-Aspartate/genetics
12.
Bioinformatics ; 34(15): 2556-2565, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29554215

ABSTRACT

Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. Results: We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Availability and implementation: Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Data Visualization; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Humans; Klebsiella pneumoniae/genetics
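
The limitation described in the abstract of result 12, that a plain de Bruijn graph does not retain long-range information from the reads, can be seen in a toy example. The sketch below uses one common formulation in which nodes are (k-1)-mers and each k-mer contributes an edge; it is an illustration only, not the McCortex implementation, and it does not show the link annotations that the LdBG adds. The function name and the reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k over the read; each k-mer becomes one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two hypothetical reads that share the middle k-mer "CTG".
reads = ["ACTGA", "GCTGT"]
graph = build_de_bruijn(reads, k=3)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In the resulting graph the node `TG` has two successors (`GA` and `GT`) and the node `CT` has two predecessors (`AC` and `GC`), so the graph alone also permits the walks spelling ACTGT and GCTGA, which appear in neither read. Recording which entry into a shared region pairs with which exit is the kind of long-range connectivity the abstract says is stored as annotations on top of the graph.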
13.
Nature ; 485(7397): 242-5, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495311

ABSTRACT

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, is consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.


Subjects
Autistic Disorder/genetics; DNA-Binding Proteins/genetics; Exons/genetics; Genetic Predisposition to Disease/genetics; Mutation/genetics; Transcription Factors/genetics; Case-Control Studies; Exome/genetics; Family Health; Humans; Models, Genetic; Multifactorial Inheritance/genetics; Phenotype; Poisson Distribution; Protein Interaction Maps
14.
PLoS Genet ; 9(4): e1003443, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23593035

ABSTRACT

We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analysis of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.


Subjects
Child Development Disorders, Pervasive/genetics; Exome; Genome-Wide Association Study; Case-Control Studies; Child; Child Development Disorders, Pervasive/physiopathology; Genetic Predisposition to Disease; Genetic Variation; Humans; Population Control; Sequence Analysis, DNA; Software
15.
Nat Biotechnol ; 42(4): 582-586, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37291427

ABSTRACT

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.


Subjects
High-Throughput Nucleotide Sequencing; RNA Isoforms; DNA, Complementary/genetics; RNA Isoforms/genetics; High-Throughput Nucleotide Sequencing/methods; Protein Isoforms/genetics; Sequence Analysis, RNA/methods; Transcriptome; Gene Expression Profiling/methods; RNA/genetics
16.
Nat Commun ; 15(1): 32, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38167262

ABSTRACT

Single-cell transcriptomics has become the definitive method for classifying cell types and states, and can be augmented with genotype information to improve cell lineage identification. Due to constraints of short-read sequencing, current methods to detect natural genetic barcodes often require cumbersome primer panels and early commitment to targets. Here we devise a flexible long-read sequencing workflow and analysis pipeline, termed nanoranger, that starts from intermediate single-cell cDNA libraries to detect cell lineage-defining features, including single-nucleotide variants, fusion genes, isoforms, sequences of chimeric antigen and TCRs. Through systematic analysis of these classes of natural 'barcodes', we define the optimal targets for nanoranger, namely those loci close to the 5' end of highly expressed genes with transcript lengths shorter than 4 kb. As proof-of-concept, we apply nanoranger to longitudinal tracking of subclones of acute myeloid leukemia (AML) and describe the heterogeneous isoform landscape of thousands of marrow-infiltrating immune cells. We propose that enhanced cellular genotyping using nanoranger can improve the tracking of single-cell tumor and immune cell co-evolution.


Subjects
High-Throughput Nucleotide Sequencing; Leukemia, Myeloid, Acute; Humans; Genotype; High-Throughput Nucleotide Sequencing/methods; Leukemia, Myeloid, Acute/genetics; Leukemia, Myeloid, Acute/pathology; Phenotype; Gene Expression Profiling/methods
17.
bioRxiv ; 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39185246

ABSTRACT

Single-cell RNA-seq (scRNA-seq) is emerging as a powerful tool for understanding gene function across diverse cells. Recently, this has included the use of allele-specific expression (ASE) analysis to better understand how variation in the human genome affects RNA expression at the single-cell level. We reasoned that, because intronic reads are more prevalent in single-nucleus RNA-seq (snRNA-seq) and introns are under lower purifying selection and thus enriched for genetic variants, snRNA-seq should facilitate single-cell analysis of ASE. Here we demonstrate how experimental and computational choices can improve the results of allelic imbalance analysis. We explore how experimental choices, such as RNA source, read length, sequencing depth, genotyping, etc., impact the power of ASE-based methods. We developed a new suite of computational tools to process and analyze scRNA-seq and snRNA-seq for ASE. As hypothesized, we extracted more ASE information from reads in intronic regions than those in exonic regions and show how read length can be set to increase power. Additionally, hybrid selection improved our power to detect allelic imbalance in genes of interest. We also explored methods to recover allele-specific isoform expression levels from both long- and short-read snRNA-seq. To further investigate ASE in the context of human disease, we applied our methods to a Parkinson's disease cohort of 94 individuals and show that ASE analysis had more power than eQTL analysis to identify significant SNP/gene pairs in our direct comparison of the two methods. Overall, we provide an end-to-end experimental and computational approach for future studies.
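
Allelic imbalance, as discussed in the abstract of result 17, is often introduced with a simple test of whether reads covering a heterozygous site depart from a 50:50 reference-to-alternate ratio. The sketch below is a textbook exact binomial test written with the Python standard library; it is not the authors' tool suite, and the function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial test of ref:alt read counts against a 50:50 ratio."""
    n = ref_reads + alt_reads
    if n == 0:
        # No informative reads: no evidence of imbalance.
        return 1.0
    observed = comb(n, ref_reads)
    # Under p = 0.5 each outcome k has probability comb(n, k) / 2**n, so the
    # p-value sums every outcome that is no more likely than the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


# Hypothetical counts at one heterozygous site.
print(allelic_imbalance_pvalue(5, 5))   # balanced: 1.0
print(allelic_imbalance_pvalue(10, 0))  # fully skewed: 0.001953125
```

A test like this gains power as more reads overlap heterozygous sites, which is consistent with the abstract's point that read source, read length and sequencing depth affect ASE-based methods. Real analyses typically also account for reference-mapping bias and overdispersion, which this sketch ignores.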

18.
medRxiv ; 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38496558

ABSTRACT

Genes encoding long non-coding RNAs (lncRNAs) comprise a large fraction of the human genome, yet haploinsufficiency of a lncRNA has not been shown to cause a Mendelian disease. CHASERR is a highly conserved human lncRNA adjacent to CHD2, a coding gene in which de novo loss-of-function variants cause developmental and epileptic encephalopathy. Here we report three unrelated individuals each harboring an ultra-rare heterozygous de novo deletion in the CHASERR locus. We report similarities in severe developmental delay, facial dysmorphisms, and cerebral dysmyelination in these individuals, distinguishing them from the phenotypic spectrum of CHD2 haploinsufficiency. We demonstrate reduced CHASERR mRNA expression and corresponding increased CHD2 mRNA and protein in whole blood and patient-derived cell lines, specifically increased expression of the CHD2 allele in cis with the CHASERR deletion, as predicted from a prior mouse model of Chaserr haploinsufficiency. We show for the first time that de novo structural variants facilitated by Alu-mediated non-allelic homologous recombination led to deletion of a non-coding element (the lncRNA CHASERR) to cause a rare syndromic neurodevelopmental disorder. We also demonstrate that CHD2 has bidirectional dosage sensitivity in human disease. This work highlights the need to carefully evaluate other lncRNAs, particularly those upstream of genes associated with Mendelian disorders.

19.
Hum Mol Genet ; 20(7): 1285-9, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21212097

ABSTRACT

Exome sequencing is a powerful tool for discovery of Mendelian disease genes. Previously, we reported a novel locus for autosomal recessive non-syndromic mental retardation (NSMR) in a consanguineous family [Nolan, D.K., Chen, P., Das, S., Ober, C. and Waggoner, D. (2008) Fine mapping of a locus for nonsyndromic mental retardation on chromosome 19p13. Am. J. Med. Genet. A, 146A, 1414-1422]. Using linkage and homozygosity mapping, we previously localized the gene to chromosome 19p13. The parents of this sibship were recently included in an exome sequencing project. Using a series of filters, we narrowed the putative causal mutation to a single variant site that segregated with NSMR: the mutation was homozygous in five affected siblings but in none of eight unaffected siblings. This mutation causes a substitution of a leucine for a highly conserved proline at amino acid 182 in TECR (trans-2,3-enoyl-CoA reductase), a synaptic glycoprotein. Our results reveal the value of massively parallel sequencing for identification of novel disease genes that could not be found using traditional approaches and identify only the seventh causal mutation for autosomal recessive NSMR.


Subjects
Chromosomes, Human, Pair 19/genetics; Genetic Diseases, Inborn/genetics; Intellectual Disability/genetics; Membrane Glycoproteins/genetics; Mutation; Oxidoreductases/genetics; Synaptic Membranes/genetics; Female; Genetic Diseases, Inborn/enzymology; Humans; Intellectual Disability/enzymology; Male; Membrane Glycoproteins/metabolism; Oxidoreductases/metabolism; Pedigree; Synaptic Membranes/enzymology
20.
N Engl J Med ; 363(23): 2220-7, 2010 Dec 02.
Article in English | MEDLINE | ID: mdl-20942659

ABSTRACT

We sequenced all protein-coding regions of the genome (the "exome") in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.).


Subjects
Angiopoietins/genetics; Codon, Nonsense; Hypobetalipoproteinemias/genetics; Angiopoietin-Like Protein 3; Angiopoietin-Like Proteins; Cholesterol, HDL/blood; Cholesterol, HDL/genetics; Cholesterol, LDL/blood; Cholesterol, LDL/genetics; DNA Mutational Analysis; Female; Genetic Linkage; Humans; Male; Pedigree