1.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35753701

ABSTRACT

Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
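
As a rough illustration of how precision and sensitivity can be computed for deletion calls against a complete gold standard, the sketch below matches each call to at most one true deletion. The matching rule (at least 50% reciprocal overlap), the interval representation and the function names are assumptions made for this example; the abstract does not state how the study matched calls to the gold standard.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed representation.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions between a and b."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def evaluate(calls: List[Interval], truth: List[Interval],
             min_ro: float = 0.5) -> Tuple[float, float]:
    """Return (precision, sensitivity) for deletion calls against a truth set.

    Each true deletion can be matched by at most one call. The 0.5 threshold
    is an illustrative assumption, not a value taken from the study.
    """
    matched = set()
    true_positives = 0
    for call in calls:
        for i, true_sv in enumerate(truth):
            if i not in matched and reciprocal_overlap(call, true_sv) >= min_ro:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    truth = [("chr1", 100, 200), ("chr1", 500, 900), ("chr2", 100, 300)]
    calls = [("chr1", 110, 210), ("chr1", 1000, 1100)]
    print(evaluate(calls, truth))
```

Reporting both numbers requires a complete truth set: without it, an unmatched call cannot be counted as a false positive, which is why the abstract stresses the completeness of its gold standard.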


Subject(s)
Benchmarking , Genome, Human , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , Whole Genome Sequencing/methods
2.
PLoS Comput Biol ; 18(12): e1010788, 2022 12.
Article in English | MEDLINE | ID: mdl-36516232

ABSTRACT

To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1×) and short fragments (<80 bps), which preclude the effective application of standard CNV detection software to ancient genomes. Here we present CONGA, tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbps with F-scores >0.75 at ≥1×, and distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin, but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44× to 26× (median 4×) and average read lengths of 52-121 bps (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity measured using SNPs and CONGA-genotyped deletions are highly correlated. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.


Subject(s)
DNA Copy Number Variations , Genomics , Humans , DNA Copy Number Variations/genetics , Genotype , Genomics/methods , Genome, Human/genetics , Genetics, Population , Polymorphism, Single Nucleotide/genetics
3.
Bioinformatics ; 35(20): 3923-3930, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30937433

ABSTRACT

MOTIVATION: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions; likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve the calling accuracy of simpler variants. RESULTS: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods into our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, using 30× coverage, TARDIS achieved 96% sensitivity with only a 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate of our approach for the prediction of tandem, direct and inverted interspersed segmental duplications on CHM1 (<5% for the top 50 predictions). AVAILABILITY AND IMPLEMENTATION: TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Segmental Duplications, Genomic , Algorithms , Genome, Human , Genomics , Humans , Software
4.
Methods ; 129: 3-7, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28583483

ABSTRACT

Structural variations (SV) are broadly defined as genomic alterations that affect >50bp of DNA, and they have been shown to have a significant effect on evolution and disease. The advent of high throughput sequencing (HTS) technologies and the ability to perform whole genome sequencing (WGS) make it feasible to study these variants in depth. However, discovery of all forms of SV using WGS has proven to be challenging, as the short reads produced by the predominant HTS platforms (<200bp for current technologies) and the fact that most genomes include large amounts of repeats make it very difficult to unambiguously map and accurately characterize such variants. Furthermore, existing tools for SV discovery are primarily developed for only a few of the SV types, which may have conflicting sequence signatures (i.e. read pairs, read depth, split reads) with other, untargeted SV classes. Here we introduce a new framework, Tardis, which combines multiple read signatures into a single package to characterize most SV types simultaneously, while preventing such conflicts. Tardis also has a modular structure that makes it easy to extend for the discovery of additional forms of SV.


Subject(s)
Genomic Structural Variation/genetics , Genomics , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing/trends , Humans , Sequence Analysis, DNA , Whole Genome Sequencing
5.
bioRxiv ; 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38659906

ABSTRACT

Structural variants (SVs) contribute significantly to human genetic diversity and disease [1-4]. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution [5-7]. Here we leveraged nanopore sequencing [8] to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies [3,4]. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions [9,10] of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.

6.
Genome Biol ; 24(1): 78, 2023 04 17.
Article in English | MEDLINE | ID: mdl-37069665

ABSTRACT

BACKGROUND: Changes in microbial community composition as a function of human health and disease states have sparked remarkable interest in the human gut microbiome. However, establishing reproducible insights into the determinants of microbial succession in disease has been a formidable challenge. RESULTS: Here we use fecal microbiota transplantation (FMT) as an in natura experimental model to investigate the association between metabolic independence and resilience in stressed gut environments. Our genome-resolved metagenomics survey suggests that FMT serves as an environmental filter that favors populations with higher metabolic independence, the genomes of which encode complete metabolic modules to synthesize critical metabolites, including amino acids, nucleotides, and vitamins. Interestingly, we observe higher completion of the same biosynthetic pathways in microbes enriched in IBD patients. CONCLUSIONS: These observations suggest a general mechanism that underlies changes in diversity in perturbed gut environments and reveal taxon-independent markers of "dysbiosis" that may explain why widespread yet typically low-abundance members of healthy gut microbiomes can dominate under inflammatory conditions without any causal association with disease.


Subject(s)
Gastrointestinal Microbiome , Microbiota , Humans , Fecal Microbiota Transplantation , Metagenomics , Amino Acids , Feces
7.
Open Res Eur ; 2: 100, 2022.
Article in English | MEDLINE | ID: mdl-37829208

ABSTRACT

A major challenge in zooarchaeology is to morphologically distinguish closely related species' remains, especially using small bone fragments. Shotgun sequencing aDNA from archeological remains and comparative alignment to the candidate species' reference genomes will only apply when reference nuclear genomes of comparable quality are available, and may still fail when coverages are low. Here, we propose an alternative method, MTaxi, that uses highly accessible mitochondrial DNA (mtDNA) to distinguish between pairs of closely related species from ancient DNA sequences. MTaxi utilises mtDNA transversion-type substitutions between pairs of candidate species, assigns reads to either species, and performs a binomial test to determine the sample taxon. We tested MTaxi on sheep/goat and horse/donkey data, between which zooarchaeological classification can be challenging in ways that epitomise our case. The method performed efficiently on simulated ancient genomes down to 0.3x mitochondrial coverage for both sheep/goat and horse/donkey, with no false positives. Trials on n=18 ancient sheep/goat samples and n=10 horse/donkey samples of known species identity also yielded 100% accuracy. Overall, MTaxi provides a straightforward approach to classify closely related species that are difficult to distinguish through zooarchaeological methods using low coverage aDNA data, especially when similar quality reference genomes are unavailable. MTaxi is freely available at https://github.com/goztag/MTaxi.
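
The final step described above, deciding the sample taxon from reads assigned to either species, can be sketched with an exact binomial test. This is a minimal illustration and not the MTaxi implementation: the null probability of 0.5, the two-sided test, the 0.05 significance level and all function names are assumptions made for the example.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    observing k successes in n trials under success probability p.
    """
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float comparison
    return min(1.0, sum(q for q in pmf if q <= threshold))


def classify(reads_a: int, reads_b: int, alpha: float = 0.05) -> str:
    """Assign a sample to species A or B from per-species read counts.

    reads_a and reads_b are hypothetical counts of reads carrying the
    species-A or species-B allele at diagnostic transversion sites.
    """
    p_value = binomial_two_sided_p(reads_a, reads_a + reads_b)
    if p_value >= alpha:
        return "undetermined"
    return "species_a" if reads_a > reads_b else "species_b"


if __name__ == "__main__":
    print(classify(18, 2))  # strongly skewed towards species A
    print(classify(5, 5))   # balanced counts, no call
```

Restricting the counts to transversion-type sites, as the abstract describes, is presumably meant to limit the influence of post-mortem damage, which in ancient DNA mostly produces transition-type misincorporations.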

8.
F1000Res ; 10: 246, 2021.
Article in English | MEDLINE | ID: mdl-34621504

ABSTRACT

In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.


Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Genome, Viral , Humans , Vertebrates
10.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Subject(s)
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA