1.
medRxiv ; 2024 Feb 07.
Article En | MEDLINE | ID: mdl-38496558

Genes encoding long non-coding RNAs (lncRNAs) comprise a large fraction of the human genome, yet haploinsufficiency of a lncRNA has not been shown to cause a Mendelian disease. CHASERR is a highly conserved human lncRNA adjacent to CHD2, a coding gene in which de novo loss-of-function variants cause developmental and epileptic encephalopathy. Here we report three unrelated individuals each harboring an ultra-rare heterozygous de novo deletion in the CHASERR locus. We report similarities in severe developmental delay, facial dysmorphisms, and cerebral dysmyelination in these individuals, distinguishing them from the phenotypic spectrum of CHD2 haploinsufficiency. We demonstrate reduced CHASERR mRNA expression and correspondingly increased CHD2 mRNA and protein in whole blood and patient-derived cell lines (specifically, increased expression of the CHD2 allele in cis with the CHASERR deletion), as predicted from a prior mouse model of Chaserr haploinsufficiency. We show for the first time that de novo structural variants facilitated by Alu-mediated non-allelic homologous recombination led to deletion of a non-coding element (the lncRNA CHASERR) to cause a rare syndromic neurodevelopmental disorder. We also demonstrate that CHD2 has bidirectional dosage sensitivity in human disease. This work highlights the need to carefully evaluate other lncRNAs, particularly those upstream of genes associated with Mendelian disorders.

2.
Nat Commun ; 15(1): 32, 2024 01 02.
Article En | MEDLINE | ID: mdl-38167262

Single-cell transcriptomics has become the definitive method for classifying cell types and states, and can be augmented with genotype information to improve cell lineage identification. Due to constraints of short-read sequencing, current methods to detect natural genetic barcodes often require cumbersome primer panels and early commitment to targets. Here we devise a flexible long-read sequencing workflow and analysis pipeline, termed nanoranger, that starts from intermediate single-cell cDNA libraries to detect cell lineage-defining features, including single-nucleotide variants, fusion genes, isoforms, and sequences of chimeric antigen receptors and TCRs. Through systematic analysis of these classes of natural 'barcodes', we define the optimal targets for nanoranger, namely those loci close to the 5' end of highly expressed genes with transcript lengths shorter than 4 kb. As proof-of-concept, we apply nanoranger to longitudinal tracking of subclones of acute myeloid leukemia (AML) and describe the heterogeneous isoform landscape of thousands of marrow-infiltrating immune cells. We propose that enhanced cellular genotyping using nanoranger can improve the tracking of single-cell tumor and immune cell co-evolution.


High-Throughput Nucleotide Sequencing , Leukemia, Myeloid, Acute , Humans , Genotype , High-Throughput Nucleotide Sequencing/methods , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , Phenotype , Gene Expression Profiling/methods
3.
Nat Biotechnol ; 42(4): 582-586, 2024 Apr.
Article En | MEDLINE | ID: mdl-37291427

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.


High-Throughput Nucleotide Sequencing , RNA Isoforms , DNA, Complementary/genetics , RNA Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Sequence Analysis, RNA/methods , Transcriptome , Gene Expression Profiling/methods , RNA/genetics
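MAS-ISO-seq concatenates many cDNAs into one long molecule so that a single long read carries several transcripts. As a rough illustration only, the sketch below joins cDNAs between an ordered set of distinct adapters and then splits an array read back apart by searching for those adapters in order. The adapter sequences and the `concatenate`/`segment` helpers are hypothetical, not the published MAS-ISO-seq adapters or software, and the exact string search assumes error-free reads; real segmentation has to tolerate sequencing errors.

```python
# Hypothetical adapter sequences for illustration; real MAS-ISO-seq adapters differ.
ADAPTERS = ["AAAACCCC", "GGGGTTTT", "ACACACAC", "TGTGTGTG"]


def concatenate(cdnas, adapters=ADAPTERS):
    """Build an array molecule: adapter0 cDNA0 adapter1 cDNA1 ... adapterN."""
    if len(cdnas) > len(adapters) - 1:
        raise ValueError("need one more adapter than cDNAs")
    parts = []
    for adapter, cdna in zip(adapters, cdnas):
        parts.append(adapter)
        parts.append(cdna)
    parts.append(adapters[len(cdnas)])  # closing adapter after the last cDNA
    return "".join(parts)


def segment(array_read, adapters=ADAPTERS):
    """Split an array read back into cDNAs by locating adapters in their known order."""
    segments = []
    pos = array_read.find(adapters[0])
    if pos < 0:
        return segments
    pos += len(adapters[0])
    for adapter in adapters[1:]:
        nxt = array_read.find(adapter, pos)
        if nxt < 0:  # no further adapter: the array ended or the read is truncated
            break
        segments.append(array_read[pos:nxt])
        pos = nxt + len(adapter)
    return segments
```

Using a different adapter at each position means the order of segments is recoverable from the adapters alone, which is the property the round trip `segment(concatenate(cdnas)) == cdnas` relies on.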
4.
Genome Biol Evol ; 14(9)2022 09 06.
Article En | MEDLINE | ID: mdl-35929770

The brown bear (Ursus arctos) is the second largest and most widespread extant terrestrial carnivore on Earth and has recently emerged as a medical model for human metabolic diseases. Here, we report a fully phased chromosome-level assembly of a male North American brown bear built by combining Pacific Biosciences (PacBio) HiFi data and publicly available Hi-C data. The final genome size is 2.47 gigabases (Gb), with scaffold and contig N50 lengths of 70.08 and 43.94 megabases (Mb), respectively. Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis revealed that 94.5% of single-copy orthologs from Mammalia were present in the genome (the highest of any ursid genome to date). Repetitive elements accounted for 44.48% of the genome, and a total of 20,480 protein-coding genes were identified. Whole-genome alignment showed that the brown bear genome is highly syntenic with that of the polar bear, and our phylogenetic analysis of 7,246 single-copy orthologs supports the currently proposed species tree for Ursidae. This highly contiguous genome assembly will support future research on both the evolutionary history of the bear family and the physiological mechanisms behind hibernation, the latter of which has broad medical implications.


Ursidae , Animals , Chromosomes , Genome , Haplotypes , Humans , Phylogeny , Ursidae/genetics
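The contiguity figures above are N50 values: the length L such that contigs (or scaffolds) of length at least L together cover at least half of the assembly. A minimal sketch of that standard definition follows; the lengths in the usage line are toy numbers, not data from the brown bear assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running total first reaches half the total.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding on total / 2
            return length
```

For example, `n50([100, 80, 60, 40, 20])` returns 80: the total is 300, and the two longest contigs (100 + 80 = 180) are the first to cover at least 150.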
5.
Genome Res ; 30(8): 1154-1169, 2020 08.
Article En | MEDLINE | ID: mdl-32817236

The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently infeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read-derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.


Genome, Protozoan/genetics , High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Plasmodium falciparum/genetics , Whole Genome Sequencing/methods , Algorithms , Base Sequence , Genetic Variation/genetics , Sequence Alignment , Sequence Analysis, DNA/methods , Software
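A multisample colored de Bruijn graph records, for every k-mer, which samples ("colors") contain it. One question such an index makes cheap is which k-mers occur in a progeny sample but in neither parent: a de novo point mutation creates up to k of them. The sketch below illustrates only that idea under simplifying assumptions (error-free reads, forward strand only, no graph traversal); it is not Corticall's algorithm, and the function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All length-k substrings of seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_colored_index(samples, k):
    """Map each k-mer to the set of sample names ("colors") whose reads contain it."""
    colors = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for km in kmers(read, k):
                colors[km].add(name)
    return colors


def novel_kmers(colors, child, parents):
    """k-mers present in the child but absent from every parent."""
    parent_set = set(parents)
    return {km for km, owners in colors.items()
            if child in owners and not (owners & parent_set)}
```

With parents carrying `ACGTACGT` and a child carrying `ACGTTCGT` (one substitution), k = 4 yields exactly four novel k-mers, `CGTT`, `GTTC`, `TTCG` and `TCGT`: one for each window that overlaps the mutated base.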
6.
Bioinformatics ; 34(15): 2556-2565, 2018 08 01.
Article En | MEDLINE | ID: mdl-29554215

Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. Results: We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Availability and implementation: Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary information: Supplementary data are available at Bioinformatics online.


Data Visualization , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Humans , Klebsiella pneumoniae/genetics
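The Motivation paragraph notes that a de Bruijn graph has a single parameter k and discards long-range information. A minimal sketch of a plain (unlinked) k-mer graph makes this concrete. Nodes are k-mers, and an edge u to v is recorded whenever v follows u in a read. This is an illustrative toy that ignores reverse complements and coverage, not McCortex's implementation.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Return a de Bruijn graph as {k-mer: set of successor k-mers}.

    Consecutive k-mers in a read overlap by k - 1 bases (u[1:] == v[:-1]).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            u, v = read[i:i + k], read[i + 1:i + 1 + k]
            graph[u].add(v)
        if len(read) >= k:
            graph.setdefault(read[-k:], set())  # keep the final k-mer as a node
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, where read context is lost."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)
```

With reads `AAGTCGCC` and `TTGTCGAA` and k = 3, the shared repeat `GTCG` (longer than k) merges both reads into the node `TCG`, which then branches to `CGC` and `CGA`. The graph therefore also admits the paths `AAGTCGAA` and `TTGTCGCC`, which no read supports. An annotation in the spirit of the LdBG would record, per read, which branch was taken at `TCG`, removing that ambiguity.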
7.
Eur J Hum Genet ; 25(2): 227-233, 2017 02.
Article En | MEDLINE | ID: mdl-27876817

Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father's age at conception and the number of DNMs in female offspring's X chromosome, consistent with previous literature reports.


Genome-Wide Association Study/methods , Germ-Line Mutation , Pedigree , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Adult , Child , Chromosomes, Human, X/genetics , Exome , Female , Genotype , Humans , Male , Models, Genetic
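The PBT abstract describes scoring a candidate de novo mutation by combining per-member genotype likelihoods, the trio relationships and a mutation-rate prior. The sketch below is a deliberately simplified single-site version of that kind of calculation, not PBT's actual model. It assumes one biallelic autosomal site, Hardy-Weinberg priors on parental genotypes with an assumed alternate-allele frequency, and an independent per-transmission mutation probability `mu` (the defaults are illustrative). It also defines "de novo" as any configuration that is impossible under error-free Mendelian transmission.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles at the site


def transmit_alt_prob(parent_gt, mu):
    """P(parent transmits an alt allele), allowing the allele to mutate with prob mu."""
    p = parent_gt / 2
    return p * (1 - mu) + (1 - p) * mu


def child_gt_prob(child_gt, mother_gt, father_gt, mu):
    """P(child genotype | parental genotypes) under Mendelian transmission plus mutation."""
    pm = transmit_alt_prob(mother_gt, mu)
    pf = transmit_alt_prob(father_gt, mu)
    probs = {0: (1 - pm) * (1 - pf), 1: pm * (1 - pf) + (1 - pm) * pf, 2: pm * pf}
    return probs[child_gt]


def hwe_prior(gt, alt_freq):
    """Hardy-Weinberg prior on a parental genotype (assumption for this sketch)."""
    q = alt_freq
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}[gt]


def de_novo_posterior(lik_mother, lik_father, lik_child, mu=1e-8, alt_freq=1e-3):
    """Posterior probability that the trio data require a new mutation.

    Each lik_* maps genotype -> P(reads | genotype) for that family member.
    """
    total = 0.0
    de_novo = 0.0
    for m, f, c in product(GENOTYPES, repeat=3):
        joint = (hwe_prior(m, alt_freq) * hwe_prior(f, alt_freq)
                 * child_gt_prob(c, m, f, mu)
                 * lik_mother[m] * lik_father[f] * lik_child[c])
        total += joint
        if child_gt_prob(c, m, f, 0.0) == 0.0:  # impossible without a mutation
            de_novo += joint
    return de_novo / total
```

When both parents are confidently homozygous reference and the child is confidently heterozygous (likelihood ratios of about 1e20), the posterior is close to 1. If the child's evidence for the alternate allele is weak, the small prior `mu` dominates and the posterior falls near 0. This matches the abstract's point that candidates come with a probability that can be used to rank them for validation.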
8.
Curr Protoc Bioinformatics ; 43: 11.10.1-11.10.33, 2013.
Article En | MEDLINE | ID: mdl-25431634

This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.


Genetic Variation , Genome, Human , Software , Calibration , Databases, Genetic , Haploidy , Haplotypes/genetics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Sequence Alignment
9.
Sci Transl Med ; 4(138): 138ra78, 2012 Jun 13.
Article En | MEDLINE | ID: mdl-22700954

The translation of "next-generation" sequencing directly to the clinic is still being assessed, but for genetic diseases it has the potential to reduce costs, improve accuracy, and point to unsuspected yet treatable conditions. To study its capability in the clinic, we performed whole-exome sequencing in 118 probands with a diagnosis of a pediatric-onset neurodevelopmental disease in which most known causes had been excluded. Twenty-two genes not previously identified as disease-causing were identified in this study (19% of the cohort), further establishing exome sequencing as a useful tool for gene discovery. New genes identified included EXOC8 in Joubert syndrome and GFM2 in a patient with microcephaly, simplified gyral pattern, and insulin-dependent diabetes. Exome sequencing uncovered 10 probands (8% of the cohort) with mutations in genes known to cause a disease different from the initial diagnosis. Upon further medical evaluation, these mutations were found to account for each proband's disease, leading to a change in diagnosis, some of which led to changes in patient management. Our data provide proof of principle that genomic strategies are useful in clarifying diagnosis in a proportion of patients with neurodevelopmental disorders.


Exome/genetics , Female , Humans , Male , Mutation , Pedigree , Sequence Analysis, DNA , Vesicular Transport Proteins/genetics
10.
Nat Genet ; 43(7): 712-4, 2011 Jun 12.
Article En | MEDLINE | ID: mdl-21666693

J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline. Diverse studies have supported Haldane's contention of a higher average mutation rate in the male germline in a variety of mammals, including humans. Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.


Family , Genetic Variation , Genome, Human , Germ-Line Mutation/genetics , Chromosome Mapping , DNA Mutational Analysis , Female , Humans , Male , Polymerase Chain Reaction
11.
Nat Genet ; 43(5): 491-8, 2011 May.
Article En | MEDLINE | ID: mdl-21478889

Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.


Genetic Variation , Genotype , Sequence Analysis, DNA/methods , Data Interpretation, Statistical , Databases, Nucleic Acid , Exons , Genetics, Population/methods , Genetics, Population/statistics & numerical data , Genome, Human , Humans , Polymorphism, Single Nucleotide , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Sequence Analysis, DNA/statistics & numerical data , Software
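Step (iii), base quality score recalibration, compares the quality a sequencer reports with the error rate actually observed at sites assumed not to be true variants. The sketch below shows only the core arithmetic for a single covariate (the reported quality). It converts each bin's empirical mismatch rate back to a Phred score, Q = -10 log10(p_error). It is an illustrative assumption, not the GATK implementation, which also models covariates such as read group, machine cycle and sequence context, and uses its own smoothing. The +1/+2 pseudocounts here are a simple choice of this sketch.

```python
import math
from collections import defaultdict


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality."""
    return -10 * math.log10(p_error)


def recalibration_table(observations):
    """Map reported quality -> empirical quality.

    observations: iterable of (reported_quality, is_mismatch) pairs, taken from
    bases at positions assumed NOT to be real variants (e.g. known sites masked).
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for q, mismatch in observations:
        counts[q][0] += int(mismatch)
        counts[q][1] += 1
    table = {}
    for q, (errs, total) in counts.items():
        # Pseudocounts (assumption) avoid log10(0) for bins with no mismatches.
        empirical_error = (errs + 1) / (total + 2)
        table[q] = round(phred(empirical_error), 1)
    return table
```

For instance, if 998 bases reported at Q30 include 9 mismatches, the estimated error rate is 10/1000 = 0.01, so those bases are recalibrated to Q20. The instrument was overconfident by a factor of ten in error probability.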
12.
N Engl J Med ; 363(23): 2220-7, 2010 Dec 02.
Article En | MEDLINE | ID: mdl-20942659

We sequenced all protein-coding regions of the genome (the "exome") in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.).


Angiopoietins/genetics , Codon, Nonsense , Hypobetalipoproteinemias/genetics , Angiopoietin-Like Protein 3 , Angiopoietin-like Proteins , Cholesterol, HDL/blood , Cholesterol, HDL/genetics , Cholesterol, LDL/blood , Cholesterol, LDL/genetics , DNA Mutational Analysis , Female , Genetic Linkage , Humans , Male , Pedigree
...