Search | VHL Regional Portal

1.

Placental somatic mutation in human stillbirth and live birth: A pilot case-control study of paired placental, fetal, and maternal whole genomes.

Wallace, Amelia D; Blue, Nathan R; Morgan, Terry; Workalemahu, Tsegaselassie; Silver, Robert M; Quinlan, Aaron R.

Placenta ; 154: 137-144, 2024 Jun 22.

Article in English | MEDLINE | ID: mdl-38972082

ABSTRACT

INTRODUCTION: A high frequency of single nucleotide somatic mutations in the placenta has been recently described, but its relationship to placental dysfunction is unknown. METHODS: We performed a pilot case-control study using paired fetal, maternal, and placental samples collected from healthy live birth controls (n = 10), live births with fetal growth restriction (FGR) due to placental insufficiency (n = 7), and stillbirths with FGR and placental insufficiency (n = 11). We quantified single nucleotide and structural somatic variants using bulk whole genome sequencing (30-60X coverage) in four biopsies from each placenta. We also assessed their association with clinical and histological evidence of placental dysfunction. RESULTS: Seventeen pregnancies had sufficiently high-quality placental, fetal, and maternal DNA for analysis. Each placenta had a median of 473 variants (range 111-870), with 95 % arising in just one biopsy within each placenta. In controls, live births with FGR, and stillbirths, the median variant counts per placenta were 514 (IQR 381-779), 582 (450-735), and 338 (245-441), respectively. After adjusting for depth of sequencing coverage and gestational age at birth, the somatic mutation burden was similar between groups (FGR live births vs. controls, adjusted diff. 59, 95 % CI -218 to +336; stillbirths vs controls, adjusted diff. -34, -351 to +419), and with no association with placental dysfunction (p = 0.7). DISCUSSION: We confirmed the high prevalence of somatic mutation in the human placenta and conclude that the placenta is highly clonal. We were not able to identify any relationship between somatic mutation burden and clinical or histologic placental insufficiency.

2.

Improved characterization of single-cell RNA-seq libraries with paired-end avidity sequencing.

Chamberlin, John T; Gillen, Austin E; Quinlan, Aaron R.

bioRxiv ; 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-39026715

ABSTRACT

Prevailing poly(dT)-primed 3' single-cell RNA-seq protocols generate barcoded cDNA fragments containing the reverse transcriptase priming site, which is expected to be the poly(A) tail or a genomic adenine homopolymer. Direct sequencing across this priming site was historically difficult because of DNA sequencing errors induced by the homopolymeric primer at the 'barcode' end. Here, we evaluate the capability of "avidity base chemistry" DNA sequencing from Element Biosciences to sequence through this homopolymer accurately, and the impact of the additional cDNA sequence on read alignment and precise quantification of polyadenylation site usage. We find that the Element Aviti instrument sequences through the thymine homopolymer into the subsequent cDNA sequence without detectable loss of accuracy. The resulting paired-end alignments enable direct and independent assignment of reads to polyadenylation sites, which bypasses complexities and limitations of conventional approaches but does not consistently improve read mapping rates compared to single-end alignment. We also characterize low-level artifacts and arrive at an adjusted adapter trimming and alignment workflow that significantly improves the alignment of sequence data from Element and Illumina, particularly in the context of extended read lengths. Our analyses confirm that Element avidity sequencing is an effective alternative to Illumina sequencing for standard single-cell RNA-seq, particularly for polyadenylation site analyses but do not rule out the potential for similar performance from other emerging platforms.

3.

Epistasis between mutator alleles contributes to germline mutation spectrum variability in laboratory mice.

Sasani, Thomas A; Quinlan, Aaron R; Harris, Kelley.

Elife ; 12, 2024 Feb 21.

Article in English | MEDLINE | ID: mdl-38381482

ABSTRACT

Maintaining germline genome integrity is essential and enormously complex. Although many proteins are involved in DNA replication, proofreading, and repair, mutator alleles have largely eluded detection in mammals. DNA replication and repair proteins often recognize sequence motifs or excise lesions at specific nucleotides. Thus, we might expect that the spectrum of de novo mutations - the frequencies of C>T, A>G, etc. - will differ between genomes that harbor either a mutator or wild-type allele. Previously, we used quantitative trait locus mapping to discover candidate mutator alleles in the DNA repair gene Mutyh that increased the C>A germline mutation rate in a family of inbred mice known as the BXDs (Sasani et al., 2022, Ashbrook et al., 2021). In this study we developed a new method to detect alleles associated with mutation spectrum variation and applied it to mutation data from the BXDs. We discovered an additional C>A mutator locus on chromosome 6 that overlaps Ogg1, a DNA glycosylase involved in the same base-excision repair network as Mutyh (David et al., 2007). Its effect depends on the presence of a mutator allele near Mutyh, and BXDs with mutator alleles at both loci have greater numbers of C>A mutations than those with mutator alleles at either locus alone. Our new methods for analyzing mutation spectra reveal evidence of epistasis between germline mutator alleles and may be applicable to mutation data from humans and other model organisms.

Subject(s)

Epistasis, Genetic , Germ-Line Mutation , Humans , Animals , Mice , Alleles , Mutation , Chromosome Mapping , Mammals

4.

Effects of parental age and polymer composition on short tandem repeat de novo mutation rates.

Goldberg, Michael E; Noyes, Michelle D; Eichler, Evan E; Quinlan, Aaron R; Harris, Kelley.

Genetics ; 226(4), 2024 04 03.

Article in English | MEDLINE | ID: mdl-38298127

ABSTRACT

Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than polymerase slippage in replicating progenitor cells. These results echo the recent finding that DNA damage in oocytes is a significant source of de novo single nucleotide variants and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to known hotspots of oocyte mutagenesis, nor are postzygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at G/C-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and contradict prior attribution of replication slippage as the primary mechanism of STR mutagenesis.

Subject(s)

Microsatellite Repeats , Mutation Rate , Humans , Female , Child , Mutation , Parents , Meiosis , Nucleotides

5.

Differences in molecular sampling and data processing explain variation among single-cell and single-nucleus RNA-seq experiments.

Chamberlin, John T; Lee, Younghee; Marth, Gabor T; Quinlan, Aaron R.

Genome Res ; 34(2): 179-188, 2024 03 20.

Article in English | MEDLINE | ID: mdl-38355308

ABSTRACT

A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length-based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.

Subject(s)

Cell Nucleus , RNA Precursors , Humans , Animals , Mice , RNA-Seq , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Cell Nucleus/genetics , Gene Expression Profiling/methods , Single-Cell Analysis

6.

Characterization and visualization of tandem repeats at genome scale.

Dolzhenko, Egor; English, Adam; Dashnow, Harriet; De Sena Brandine, Guilherme; Mokveld, Tom; Rowell, William J; Karniski, Caitlin; Kronenberg, Zev; Danzi, Matt C; Cheung, Warren A; Bi, Chengpeng; Farrow, Emily; Wenger, Aaron; Chua, Khi Pin; Martínez-Cerdeño, Verónica; Bartley, Trevor D; Jin, Peng; Nelson, David L; Zuchner, Stephan; Pastinen, Tomi; Quinlan, Aaron R; Sedlazeck, Fritz J; Eberle, Michael A.

Nat Biotechnol ; 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38168995

ABSTRACT

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
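The Mendelian concordance figure quoted in the abstract above ("allowing a single repeat unit difference") can be illustrated with a minimal sketch. This is not TRGT's code: the function names, the representation of each genotype as a pair of allele lengths in base pairs, and the per-locus `unit_len` parameter are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

Genotype = Tuple[int, int]  # hypothetical representation: two allele lengths in bp


def is_concordant(child: Genotype, father: Genotype, mother: Genotype, unit_len: int) -> bool:
    """True if one child allele matches a paternal allele and the other a
    maternal allele, each within one repeat unit (unit_len bp)."""
    def close(a: int, b: int) -> bool:
        return abs(a - b) <= unit_len

    c1, c2 = child
    return any(
        (close(c1, f) and close(c2, m)) or (close(c2, f) and close(c1, m))
        for f in father
        for m in mother
    )


def concordance_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype, int]]) -> float:
    """Fraction of (child, father, mother, unit_len) records that are concordant."""
    results = [is_concordant(c, f, m, u) for c, f, m, u in trios]
    return sum(results) / len(results) if results else 0.0


# Illustrative values only (not data from the study).
example_trios = [
    ((30, 45), (30, 33), (42, 60), 3),  # 45 vs 42 differs by one 3-bp unit
    ((30, 90), (30, 33), (42, 60), 3),  # 90 matches no parental allele
]
print(concordance_rate(example_trios))
```

The tolerance is applied per allele, so a locus counts as concordant only if both child alleles can be assigned to different parents.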

7.

Genome sequencing of Pakistani families with male infertility identifies deleterious genotypes in SPAG6, CCDC9, TKTL1, TUBA3C, and M1AP.

Khan, Muhammad Riaz; Akbari, Arvand; Nicholas, Thomas J; Castillo-Madeen, Helen; Ajmal, Muhammad; Haq, Taqweem Ul; Laan, Maris; Quinlan, Aaron R; Ahuja, Jasvinder S; Shah, Aftab Ali; Conrad, Donald F.

Andrology ; 2023 Dec 10.

Article in English | MEDLINE | ID: mdl-38073178

ABSTRACT

BACKGROUND: There are likely to be hundreds of monogenic forms of human male infertility. Whole genome sequencing (WGS) is the most efficient way to make progress in mapping the causative genetic variants, and ultimately improve clinical management of the disease in each patient. Recruitment of consanguineous families is an effective approach to ascertain the genetic forms of many diseases. OBJECTIVES: To apply WGS to large consanguineous families with likely hereditary male infertility and identify potential genetic causes. MATERIALS AND METHODS: We recruited seven large families with clinically diagnosed male infertility from rural Pakistan, including five with a history of consanguinity. We generated WGS data on 26 individuals (3-5 per family) and analyzed the resulting data with a computational pipeline to identify potentially causal single nucleotide variants, indels, and copy number variants. RESULTS: We identified plausible genetic causes in five of the seven families, including a homozygous 10 kb deletion of exon 2 in a well-established male infertility gene (M1AP), and biallelic missense substitutions (SPAG6, CCDC9, TUBA3C) and an in-frame hemizygous deletion (TKTL1) in genes with emerging relevance. DISCUSSION AND CONCLUSION: The rate of genetic findings using the current approach (71%) was much higher than what we recently achieved using whole-exome sequencing (WES) of unrelated singleton cases (20%). Furthermore, we identified a pathogenic single-exon deletion in M1AP that would be undetectable by WES. Screening more families with WGS, especially in underrepresented populations, will further reveal the types of variants underlying male infertility and accelerate the use of genetics in patient management.

8.

Epistasis between mutator alleles contributes to germline mutation spectra variability in laboratory mice.

Sasani, Thomas A; Quinlan, Aaron R; Harris, Kelley.

bioRxiv ; 2023 Nov 14.

Article in English | MEDLINE | ID: mdl-37162999

ABSTRACT

Maintaining germline genome integrity is essential and enormously complex. Although many proteins are involved in DNA replication, proofreading, and repair [1], mutator alleles have largely eluded detection in mammals. DNA replication and repair proteins often recognize sequence motifs or excise lesions at specific nucleotides. Thus, we might expect that the spectrum of de novo mutations - the frequencies of C>T, A>G, etc. - will differ between genomes that harbor either a mutator or wild-type allele. Previously, we used quantitative trait locus mapping to discover candidate mutator alleles in the DNA repair gene Mutyh that increased the C>A germline mutation rate in a family of inbred mice known as the BXDs [2,3]. In this study we developed a new method to detect alleles associated with mutation spectrum variation and applied it to mutation data from the BXDs. We discovered an additional C>A mutator locus on chromosome 6 that overlaps Ogg1, a DNA glycosylase involved in the same base-excision repair network as Mutyh [4]. Its effect depended on the presence of a mutator allele near Mutyh, and BXDs with mutator alleles at both loci had greater numbers of C>A mutations than those with mutator alleles at either locus alone. Our new methods for analyzing mutation spectra reveal evidence of epistasis between germline mutator alleles and may be applicable to mutation data from humans and other model organisms.

9.

Whole-genome sequencing analysis in families with recurrent pregnancy loss: A pilot study.

Workalemahu, Tsegaselassie; Avery, Cecile; Lopez, Sarah; Blue, Nathan R; Wallace, Amelia; Quinlan, Aaron R; Coon, Hilary; Warner, Derek; Varner, Michael W; Branch, D Ware; Jorde, Lynn B; Silver, Robert M.

PLoS One ; 18(2): e0281934, 2023.

Article in English | MEDLINE | ID: mdl-36800380

ABSTRACT

One to two percent of couples suffer recurrent pregnancy loss and over 50% of the cases are unexplained. Whole genome sequencing (WGS) analysis has the potential to identify previously unrecognized causes of pregnancy loss, but few studies have been performed, and none have included DNA from families including parents, losses, and live births. We conducted a pilot WGS study in three families with unexplained recurrent pregnancy loss, including parents, healthy live births, and losses, which included an embryonic loss (<10 weeks' gestation), fetal deaths (10-20 weeks' gestation) and stillbirths (≥ 20 weeks' gestation). We used the Illumina platform for WGS and state-of-the-art protocols to identify single nucleotide variants (SNVs) following various modes of inheritance. We identified 87 SNVs involving 75 genes in embryonic loss (n = 1), 370 SNVs involving 228 genes in fetal death (n = 3), and 122 SNVs involving 122 genes in stillbirth (n = 2). Of these, 22 de novo, 6 inherited autosomal dominant and an X-linked recessive SNVs were pathogenic (probability of being loss-of-function intolerant >0.9), impacting known genes (e.g., DICER1, FBN2, FLT4, HERC1, and TAOK1) involved in embryonic/fetal development and congenital abnormalities. Further, we identified inherited missense compound heterozygous SNVs impacting genes (e.g., VWA5B2) in two fetal death samples. The variants were not identified as compound heterozygous SNVs in live births and population controls, providing evidence for haplosufficient genes relevant to pregnancy loss. In this pilot study, we provide evidence for de novo and inherited SNVs relevant to pregnancy loss. Our findings provide justification for conducting WGS using larger numbers of families and warrant validation by targeted sequencing to ascertain causal variants. Elucidating genes causing pregnancy loss may facilitate the development of risk stratification strategies and novel therapeutics.

Subject(s)

Abortion, Habitual , Pregnancy , Female , Humans , Pilot Projects , Abortion, Habitual/genetics , Stillbirth/genetics , Stillbirth/epidemiology , Live Birth , Protein Serine-Threonine Kinases , Ribonuclease III , DEAD-box RNA Helicases

10.

Random allelic expression in the adult human body.

Kravitz, Stephanie N; Ferris, Elliott; Love, Michael I; Thomas, Alun; Quinlan, Aaron R; Gregg, Christopher.

Cell Rep ; 42(1): 111945, 2023 01 31.

Article in English | MEDLINE | ID: mdl-36640362

ABSTRACT

Genes are typically assumed to express both parental alleles similarly, yet cell lines show random allelic expression (RAE) for many autosomal genes that could shape genetic effects. Thus, understanding RAE in human tissues could improve our understanding of phenotypic variation. Here, we develop a methodology to perform genome-wide profiling of RAE and biallelic expression in GTEx datasets for 832 people and 54 tissues. We report 2,762 autosomal genes with some RAE properties similar to randomly inactivated X-linked genes. We found that RAE is associated with rapidly evolving regions in the human genome, adaptive signaling processes, and genes linked to age-related diseases such as neurodegeneration and cancer. We define putative mechanistic subtypes of RAE distinguished by gene overlaps on sense and antisense DNA strands, aggregation in clusters near telomeres, and increased regulatory complexity and inputs compared with biallelic genes. We provide foundations to study RAE in human phenotypes, evolution, and disease.

Subject(s)

Chromosomes , Human Body , Humans , Adult , Alleles , Phenotype , Cell Line

11.

Effects of parental age and polymer composition on short tandem repeat de novo mutation rates.

Goldberg, Michael E; Noyes, Michelle D; Eichler, Evan E; Quinlan, Aaron R; Harris, Kelley.

bioRxiv ; 2023 Dec 23.

Article in English | MEDLINE | ID: mdl-38187618

ABSTRACT

Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than the classical mechanism of polymerase slippage in replicating progenitor cells. These results also echo the recent finding that DNA damage in quiescent oocytes is a significant source of de novo SNVs and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to previously discovered hotspots of oocyte mutagenesis, nor are post-zygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on DNM rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at GC-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and are especially surprising considering the prior belief in replication slippage as the dominant mechanism of STR mutagenesis.

12.

Extensive Recombination-driven Coronavirus Diversification Expands the Pool of Potential Pandemic Pathogens.

Goldstein, Stephen A; Brown, Joe; Pedersen, Brent S; Quinlan, Aaron R; Elde, Nels C.

Genome Biol Evol ; 14(12), 2022 12 08.

Article in English | MEDLINE | ID: mdl-36477201

ABSTRACT

The ongoing SARS-CoV-2 pandemic is the third zoonotic coronavirus identified in the last 20 years. Enzootic and epizootic coronaviruses of diverse lineages also pose a significant threat to livestock, as most recently observed for virulent strains of porcine epidemic diarrhea virus (PEDV) and swine acute diarrhea-associated coronavirus (SADS-CoV). Unique to RNA viruses, coronaviruses encode a proofreading exonuclease (ExoN) that lowers point mutation rates to increase the viability of large RNA virus genomes, which comes with the cost of limiting virus adaptation via point mutation. This limitation can be overcome by high rates of recombination that facilitate rapid increases in genetic diversification. To compare the dynamics of recombination between related sequences, we developed an open-source computational workflow (IDPlot) that bundles nucleotide identity, recombination, and phylogenetic analysis into a single pipeline. We analyzed recombination dynamics among three groups of coronaviruses with noteworthy impacts on human health and agriculture: SARSr-CoV, Betacoronavirus-1, and SADSr-CoV. We found that all three groups undergo recombination with highly diverged viruses from undersampled or unsampled lineages, including in typically highly conserved regions of the genome. In several cases, no parental origin of recombinant regions could be found in genetic databases, demonstrating our shallow characterization of coronavirus diversity and expanding the genetic pool that may contribute to future zoonotic events. Our results also illustrate the limitations of current sampling approaches for anticipating zoonotic threats to human and animal health.

Subject(s)

COVID-19 , SARS-CoV-2 , Animals , Humans , Phylogeny , SARS-CoV-2/genetics , Swine

13.

STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci.

Dashnow, Harriet; Pedersen, Brent S; Hiatt, Laurel; Brown, Joe; Beecroft, Sarah J; Ravenscroft, Gianina; LaCroix, Amy J; Lamont, Phillipa; Roxburgh, Richard H; Rodrigues, Miriam J; Davis, Mark; Mefford, Heather C; Laing, Nigel G; Quinlan, Aaron R.

Genome Biol ; 23(1): 257, 2022 12 14.

Article in English | MEDLINE | ID: mdl-36517892

ABSTRACT

Expansions of short tandem repeats (STRs) cause many rare diseases. Expansion detection is challenging with short-read DNA sequencing data since supporting reads are often mapped incorrectly. Detection is particularly difficult for "novel" STRs, which include new motifs at known loci or STRs absent from the reference genome. We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. STRling is sensitive to known STR disease loci, has a low false discovery rate, and resolves novel STR expansions to base-pair position accuracy. It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling .

Subject(s)

High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Sequence Analysis, DNA
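The k-mer counting idea described in the STRling abstract above can be illustrated with a minimal sketch. This is not STRling's implementation: the function names are hypothetical, and the rule used here (collapse each k-mer to its smallest rotation, then report the most frequent one and the fraction of windows it covers) is an assumption chosen to show why repeat-rich reads stand out. Reverse complements are ignored for brevity.

```python
from collections import Counter
from typing import Optional, Tuple


def canonical_rotation(motif: str) -> str:
    """Lexicographically smallest rotation, so CAG, AGC and GCA count as one motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def dominant_motif(read: str, k: int) -> Tuple[Optional[str], float]:
    """Return the most frequent rotation-collapsed k-mer in the read and the
    fraction of all k-mer windows it accounts for."""
    n_windows = len(read) - k + 1
    if k <= 0 or n_windows <= 0:
        return None, 0.0
    counts = Counter(canonical_rotation(read[i:i + k]) for i in range(n_windows))
    motif, n = counts.most_common(1)[0]
    return motif, n / n_windows


# Illustrative reads only (not data from the study).
repeat_read = "CAG" * 10
mixed_read = "ACGTTGCAAGCTTAGGCATC"
print(dominant_motif(repeat_read, 3))
print(dominant_motif(mixed_read, 3)[1] < 0.5)
```

A read dominated by a single motif yields a fraction near 1, whereas ordinary sequence yields a low fraction; a threshold on that fraction is one simple way to flag candidate STR reads.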

14.

Combining genetic constraint with predictions of alternative splicing to prioritize deleterious splicing in rare disease studies.

Cormier, Michael J; Pedersen, Brent S; Bayrak-Toydemir, Pinar; Quinlan, Aaron R.

BMC Bioinformatics ; 23(1): 482, 2022 Nov 14.

Article in English | MEDLINE | ID: mdl-36376793

ABSTRACT

BACKGROUND: Despite numerous molecular and computational advances, roughly half of patients with a rare disease remain undiagnosed after exome or genome sequencing. A particularly challenging barrier to diagnosis is identifying variants that cause deleterious alternative splicing at intronic or exonic loci outside of canonical donor or acceptor splice sites. RESULTS: Several existing tools predict the likelihood that a genetic variant causes alternative splicing. We sought to extend such methods by developing a new metric that aids in discerning whether a genetic variant leads to deleterious alternative splicing. Our metric combines genetic variation in the Genome Aggregate Database with alternative splicing predictions from SpliceAI to compare observed and expected levels of splice-altering genetic variation. We infer genic regions with significantly less splice-altering variation than expected to be constrained. The resulting model of regional splicing constraint captures differential splicing constraint across gene and exon categories, and the most constrained genic regions are enriched for pathogenic splice-altering variants. Building from this model, we developed ConSpliceML. This ensemble machine learning approach combines regional splicing constraint with multiple per-nucleotide alternative splicing scores to guide the prediction of deleterious splicing variants in protein-coding genes. ConSpliceML more accurately distinguishes deleterious and benign splicing variants than state-of-the-art splicing prediction methods, especially in "cryptic" splicing regions beyond canonical donor or acceptor splice sites. CONCLUSION: Integrating a model of genetic constraint with annotations from existing alternative splicing tools allows ConSpliceML to prioritize potentially deleterious splice-altering variants in studies of rare human diseases.

Subject(s)

Alternative Splicing , Rare Diseases , Humans , Rare Diseases/genetics , RNA Splicing , Introns , Exons , Mutation , RNA Splice Sites

15.

Annotation of structural variants with reported allele frequencies and related metrics from multiple datasets using SVAFotate.

Nicholas, Thomas J; Cormier, Michael J; Quinlan, Aaron R.

BMC Bioinformatics ; 23(1): 490, 2022 Nov 16.

Article in English | MEDLINE | ID: mdl-36384437

ABSTRACT

BACKGROUND: Identification of deleterious genetic variants using DNA sequencing data relies on increasingly detailed filtering strategies to isolate the small subset of variants that are more likely to underlie a disease phenotype. Datasets reflecting population allele frequencies of different types of variants serve as powerful filtering tools, especially in the context of rare disease analysis. While such population-scale allele frequency datasets now exist for structural variants (SVs), it remains a challenge to match SV calls between multiple datasets, thereby complicating estimates of a putative SV's population allele frequency. RESULTS: We introduce SVAFotate, a software tool that enables the annotation of SVs with variant allele frequency and related information from existing SV datasets. As a result, VCF files annotated by SVAFotate offer a variety of metrics to aid in the stratification of SVs as common or rare in the broader human population. CONCLUSIONS: Here we demonstrate the use of SVAFotate in the classification of SVs with regards to their population frequency and illustrate how SVAFotate's annotations can be used to filter and prioritize SVs. Lastly, we detail how best to utilize these SV annotations in the analysis of genetic variation in studies of rare disease.

Subject(s)

Gene Frequency , High-Throughput Nucleotide Sequencing , Software , Humans , Rare Diseases

16.

Poxviruses capture host genes by LINE-1 retrotransposition.

Fixsen, Sarah M; Cone, Kelsey R; Goldstein, Stephen A; Sasani, Thomas A; Quinlan, Aaron R; Rothenburg, Stefan; Elde, Nels C.

Elife ; 11, 2022 09 07.

Article in English | MEDLINE | ID: mdl-36069526

ABSTRACT

Horizontal gene transfer (HGT) provides a major source of genetic variation. Many viruses, including poxviruses, encode genes with crucial functions directly gained by gene transfer from hosts. The mechanism of transfer to poxvirus genomes is unknown. Using genome analysis and experimental screens of infected cells, we discovered a central role for Long Interspersed Nuclear Element-1 retrotransposition in HGT to virus genomes. The process recapitulates processed pseudogene generation, but with host messenger RNA directed into virus genomes. Intriguingly, hallmark features of retrotransposition appear to favor virus adaption through rapid duplication of captured host genes on arrival. Our study reveals a previously unrecognized conduit of genetic traffic with fundamental implications for the evolution of many virus classes and their hosts.

Subject(s)

Poxviridae , Viruses , Evolution, Molecular , Gene Transfer, Horizontal , Phylogeny , Poxviridae/genetics , RNA, Messenger , Viruses/genetics , Retroelements

17.

Author Correction: Searching thousands of genomes to classify somatic and novel structural variants using STIX.

Chowdhury, Murad; Pedersen, Brent S; Sedlazeck, Fritz J; Quinlan, Aaron R; Layer, Ryan M.

Nat Methods ; 19(6): 770, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35618956

18.

Searching thousands of genomes to classify somatic and novel structural variants using STIX.

Chowdhury, Murad; Pedersen, Brent S; Sedlazeck, Fritz J; Quinlan, Aaron R; Layer, Ryan M.

Nat Methods ; 19(4): 445-448, 2022 04.

Article in English | MEDLINE | ID: mdl-35396485

ABSTRACT

Structural variants are associated with cancers and developmental disorders, but challenges with estimating population frequency remain a barrier to prioritizing mutations over inherited variants. In particular, variability in variant calling heuristics and filtering limits the use of current structural variant catalogs. We present STIX, a method that, instead of relying on variant calls, indexes and searches the raw alignments from thousands of samples to enable more comprehensive allele frequency estimation.

Subject(s)

Genome , Genomic Structural Variation , Neoplasms , Algorithms , Genomic Structural Variation/genetics , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Software

19.

Comprehensive variant calling from whole-genome sequencing identifies a complex inversion that disrupts ZFPM2 in familial congenital diaphragmatic hernia.

Nicholas, Thomas J; Al-Sweel, Najla; Farrell, Andrew; Mao, Rong; Bayrak-Toydemir, Pinar; Miller, Christine E; Bentley, Dawn; Palmquist, Rachel; Moore, Barry; Hernandez, Edgar J; Cormier, Michael J; Fredrickson, Eric; Noble, Katherine; Rynearson, Shawn; Holt, Carson; Karren, Mary Anne; Bonkowsky, Joshua L; Tristani-Firouzi, Martin; Yandell, Mark; Marth, Gabor; Quinlan, Aaron R; Brunelli, Luca; Toydemir, Reha M; Shayota, Brian J; Carey, John C; Boyden, Steven E; Malone Jenkins, Sabrina.

Mol Genet Genomic Med ; 10(4): e1888, 2022 04.

Article in English | MEDLINE | ID: mdl-35119225

ABSTRACT

BACKGROUND: Genetic disorders contribute to significant morbidity and mortality in critically ill newborns. Despite advances in genome sequencing technologies, a majority of neonatal cases remain unsolved. Complex structural variants (SVs) often elude conventional genome sequencing variant calling pipelines and will explain a portion of these unsolved cases. METHODS: As part of the Utah NeoSeq project, we used a research-based, rapid whole-genome sequencing (WGS) protocol to investigate the genomic etiology for a newborn with a left-sided congenital diaphragmatic hernia (CDH) and cardiac malformations, whose mother also had a history of CDH and atrial septal defect. RESULTS: Using both a novel, alignment-free and traditional alignment-based variant callers, we identified a maternally inherited complex SV on chromosome 8, consisting of an inversion flanked by deletions. This complex inversion, further confirmed using orthogonal molecular techniques, disrupts the ZFPM2 gene, which is associated with both CDH and various congenital heart defects. CONCLUSIONS: Our results demonstrate that complex structural events, which often are unidentifiable or not reported by clinically validated testing procedures, can be discovered and accurately characterized with conventional, short-read sequencing and underscore the utility of WGS as a first-line diagnostic tool.

Subject(s)

Hernias, Diaphragmatic, Congenital , DNA-Binding Proteins/genetics , Genomics , Hernias, Diaphragmatic, Congenital/genetics , Humans , Infant, Newborn , Transcription Factors/genetics , Whole Genome Sequencing/methods

20.

trfermikit: a tool to discover VNTR-associated deletions.

McHale, Peter; Quinlan, Aaron R.

Bioinformatics ; 38(5): 1231-1234, 2022 02 07.

Article in English | MEDLINE | ID: mdl-34864893

ABSTRACT

SUMMARY: We present trfermikit, a software tool designed to detect deletions larger than 50 bp occurring in Variable Number Tandem Repeats using Illumina DNA sequencing reads. In such regions, it achieves a better tradeoff between sensitivity and false discovery than a state-of-the-art structural variation caller, Manta, and complements it by recovering a significant number of deletions that Manta missed. trfermikit is based upon the fermikit pipeline, which performs read assembly, maps the assembly to the reference genome and calls variants from the alignment. AVAILABILITY AND IMPLEMENTATION: https://github.com/petermchale/trfermikit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome , Software , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
