1.
Genome Res ; 33(6): 907-922, 2023 06.
Article in English | MEDLINE | ID: mdl-37433640

ABSTRACT

Approximately 13% of the human genome, at certain motifs, has the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.


Subject(s)
Z-Form DNA, Nanopores, Humans, Nucleotide Motifs, DNA Sequence Analysis, DNA/genetics, Base Composition, High-Throughput Nucleotide Sequencing
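
The abstract of record 1 mentions a probabilistic approach for estimating false positives as a function of sample size and variant frequency, but does not spell it out. The following is only an illustrative sketch of that kind of calculation, not the authors' method. It assumes a simple binomial model in which each read at a site is independently wrong with probability `error_rate`, and a variant is called when at least `min_alt_reads` reads support it; all function and parameter names are hypothetical.

```python
from math import comb


def prob_false_call(depth, error_rate, min_alt_reads):
    """Probability that sequencing errors alone produce a variant call.

    Assumed model (not from the source): the number of erroneous reads at a
    site is Binomial(depth, error_rate), and a call needs >= min_alt_reads.
    """
    p_below = sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


def expected_false_positives(n_sites, n_samples, depth, error_rate, min_alt_reads):
    """Expected number of error-driven calls over all sites and samples."""
    return n_sites * n_samples * prob_false_call(depth, error_rate, min_alt_reads)


if __name__ == "__main__":
    # Compare a baseline error rate with a hypothetical elevated rate at a motif.
    for rate in (0.001, 0.005):
        fp = expected_false_positives(
            n_sites=10_000, n_samples=100, depth=10, error_rate=rate, min_alt_reads=2
        )
        print(f"error rate {rate}: expected false positives = {fp:.1f}")
```

Under this model the expected count grows linearly with the number of samples and sites, which is one way to see why low read depth and rare-variant scoring are sensitive to locally elevated error rates.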
2.
NAR Genom Bioinform ; 3(1): lqab019, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33817639

ABSTRACT

Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one.
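
The overlap idea in record 2 can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes paired-end reads of equal length from a fragment shorter than twice the read length, that the overlap length is already known, and that errors are rare enough for each disagreement to reflect a single error in one of the two mates. The function names are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlap_mismatches(read1, read2, overlap):
    """Count disagreements in the region covered by both mates.

    The last `overlap` bases of read1 cover the same fragment positions as
    the first `overlap` bases of the reverse complement of read2.
    """
    tail = read1[-overlap:]
    head = revcomp(read2)[:overlap]
    return sum(a != b for a, b in zip(tail, head))


def estimate_error_rate(pairs):
    """Per-base error estimate from (read1, read2, overlap) tuples.

    Each overlapping position is observed twice, so the denominator is
    twice the number of overlapping positions.
    """
    mismatches = sum(count_overlap_mismatches(r1, r2, ov) for r1, r2, ov in pairs)
    bases = sum(2 * ov for _, _, ov in pairs)
    return mismatches / bases if bases else 0.0


if __name__ == "__main__":
    pairs = [
        ("ACGTACG", "GTACGTA", 4),  # mates agree across the overlap
        ("ACGTACG", "GTACGTC", 4),  # one base in read2 disagrees
    ]
    print(estimate_error_rate(pairs))
```

Because the estimate needs no reference genome, only the reads themselves, a calculation of this kind can be applied retroactively to existing datasets, which is the property the abstract emphasizes.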

3.
NAR Genom Bioinform ; 3(1): lqab014, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33709076

ABSTRACT

[This corrects the article DOI: 10.1093/nargab/lqab002.]

4.
NAR Genom Bioinform ; 3(1): lqab002, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33575654

ABSTRACT

Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications to maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides different summary data that categorize the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
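
To make the terms "family" and "duplex consensus sequence" in record 4 concrete, here is a minimal sketch of consensus building in the general style of duplex sequencing. It is not the toolset described in the abstract. It assumes the reads in a family are already aligned, equal in length, and oriented to the same strand; the thresholds `min_reads` and `min_frac` are hypothetical defaults.

```python
from collections import Counter


def single_strand_consensus(reads, min_reads=3, min_frac=0.7):
    """Majority-vote consensus for one strand of a family.

    Returns None when the family is too small; positions without a clear
    majority become 'N'.
    """
    if len(reads) < min_reads:
        return None
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_frac else "N")
    return "".join(consensus)


def duplex_consensus(forward_reads, reverse_reads, **thresholds):
    """Keep a base only where both strand consensuses agree."""
    fwd = single_strand_consensus(forward_reads, **thresholds)
    rev = single_strand_consensus(reverse_reads, **thresholds)
    if fwd is None or rev is None:
        return None  # the molecule is lost: one strand lacks a usable family
    return "".join(a if a == b else "N" for a, b in zip(fwd, rev))


if __name__ == "__main__":
    forward = ["ACGT", "ACGT", "ACGT", "ACCT"]  # one read carries an error
    reverse = ["ACGT", "ACGT", "ACGT"]
    print(duplex_consensus(forward, reverse))
```

The `None` return path shows where data loss arises in a scheme like this: any molecule whose reads fall below the family-size threshold on either strand yields no duplex consensus at all.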

5.
PLoS Biol ; 18(7): e3000745, 2020 07.
Article in English | MEDLINE | ID: mdl-32667908

ABSTRACT

Mutations create genetic variation for other evolutionary forces to operate on and cause numerous genetic diseases. Nevertheless, how de novo mutations arise remains poorly understood. Progress in the area is hindered by the fact that error rates of conventional sequencing technologies (1 in 100 or 1,000 base pairs) are several orders of magnitude higher than de novo mutation rates (1 in 10,000,000 or 100,000,000 base pairs per generation). Moreover, previous analyses of germline de novo mutations examined pedigrees (and not germ cells) and thus were likely affected by selection. Here, we applied highly accurate duplex sequencing to detect low-frequency, de novo mutations in mitochondrial DNA (mtDNA) directly from oocytes and from somatic tissues (brain and muscle) of 36 mice from two independent pedigrees. We found mtDNA mutation frequencies 2- to 3-fold higher in 10-month-old than in 1-month-old mice, demonstrating mutation accumulation during the period of only 9 mo. Mutation frequencies and patterns differed between germline and somatic tissues and among mtDNA regions, suggestive of distinct mutagenesis mechanisms. Additionally, we discovered a more pronounced genetic drift of mitochondrial genetic variants in the germline of older versus younger mice, arguing for mtDNA turnover during oocyte meiotic arrest. Our study deciphered for the first time the intricacies of germline de novo mutagenesis using duplex sequencing directly in oocytes, which provided unprecedented resolution and minimized selection effects present in pedigree studies. Moreover, our work provides important information about the origins and accumulation of mutations with aging/maturation and has implications for delayed reproduction in modern human societies. Furthermore, the duplex sequencing method we optimized for single cells opens avenues for investigating low-frequency mutations in other studies.


Subject(s)
Aging/genetics, Mammals/genetics, Mitochondria/genetics, Mutation/genetics, Oocytes/metabolism, Organ Specificity/genetics, Animals, DNA Mutational Analysis, Mitochondrial DNA/genetics, Female, Gene Frequency/genetics, Genetic Drift, Germ Cells/metabolism, Inheritance Patterns/genetics, Logistic Models, Male, Mice, Genetic Models, Mutation Rate, Nucleotides/genetics, Pedigree
6.
BMC Bioinformatics ; 21(1): 96, 2020 Mar 04.
Article in English | MEDLINE | ID: mdl-32131723

ABSTRACT

BACKGROUND: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away. RESULTS: In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correction of such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost effective. CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.


Subject(s)
User-Computer Interface, Algorithms, DNA/chemistry, DNA/metabolism, Humans, Sequence Alignment, DNA Sequence Analysis
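
The idea in record 6 of "reuniting" singleton reads with their families by correcting tag errors can be sketched as follows. This is not the algorithm used by Du Novo 2.0, whose details the abstract does not give. It is a simplified illustration that assumes fixed-length tags, substitution errors only, and a conservative rule of reassigning a singleton only when exactly one multi-read family lies within the allowed Hamming distance. All names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_singleton_tags(tags, max_dist=1):
    """Map each observed tag to the tag of the family it should join.

    Tags seen more than once are treated as established families. A tag seen
    once is reassigned only if a single family is within `max_dist`;
    ambiguous or isolated singletons keep their own tag.
    """
    counts = Counter(tags)
    families = [tag for tag, n in counts.items() if n > 1]
    mapping = {}
    for tag, n in counts.items():
        if n > 1:
            mapping[tag] = tag
            continue
        nearby = [fam for fam in families if hamming(tag, fam) <= max_dist]
        mapping[tag] = nearby[0] if len(nearby) == 1 else tag
    return mapping


if __name__ == "__main__":
    observed = ["AAAA", "AAAA", "AAAT", "CCCC"]
    for tag, family in correct_singleton_tags(observed).items():
        print(tag, "->", family)
```

Leaving ambiguous singletons untouched trades some recovered reads for a lower risk of merging reads from different molecules. The all-pairs comparison here is quadratic in the number of tags, so a real dataset would need an indexed or alignment-based search instead.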
7.
J Virol ; 93(1), 2019 01 01.
Article in English | MEDLINE | ID: mdl-30305356

ABSTRACT

Only a few RNA viruses have been discovered from archaeological samples, the oldest dating from about 750 years ago. Using ancient maize cobs from Antelope House, Arizona, dating from ca. 1,000 CE, we discovered a novel plant virus with a double-stranded RNA genome. The virus is a member of the family Chrysoviridae, whose members infect plants and fungi in a persistent manner. The extracted double-stranded RNA from 312 maize cobs was converted to cDNA, and sequences were determined using an Illumina HiSeq 2000. Assembled contigs from many samples showed similarity to Anthurium mosaic-associated virus and Persea americana chrysovirus, putative species in the Chrysovirus genus, and nearly complete genomes were found in three ancient maize samples. We named this new virus Zea mays chrysovirus 1. Using specific primers, we were able to recover sequences of a closely related virus from modern maize and obtained the nearly complete sequences of the three genomic RNAs. Comparing the nucleotide sequences of the three genomic RNAs of the modern and ancient viruses showed 98, 96.7, and 97.4% identities, respectively. Hence, in 1,000 years of maize cultivation, this virus has undergone about 3% divergence. IMPORTANCE: A virus related to plant chrysoviruses was found in numerous ancient samples of maize, with nearly complete genomes in three samples. The age of the ancient samples (i.e., about 1,000 years old) was confirmed by carbon dating. Chrysoviruses are persistent plant viruses. They infect their hosts from generation to generation by transmission through seeds and can remain in their hosts for very long time periods. When modern corn samples were analyzed, a closely related chrysovirus was found with only about 3% divergence from the ancient sequences. This virus represents the oldest known plant virus.


Subject(s)
Geologic Sediments/virology, Plant Viruses/classification, Double-Stranded RNA/genetics, Zea mays/virology, Arizona, Molecular Evolution, Genome Size, High-Throughput Nucleotide Sequencing, Phylogeny, Plant Viruses/isolation & purification, RNA Viruses/genetics, DNA Sequence Analysis, RNA Sequence Analysis
8.
Genome Biol ; 17(1): 180, 2016 08 26.
Article in English | MEDLINE | ID: mdl-27566673

ABSTRACT

Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show that the approach performs well on simulated data and precisely reproduces previously published results, and we apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex.


Subject(s)
Mitochondrial DNA/genetics, High-Throughput Nucleotide Sequencing, Single Nucleotide Polymorphism/genetics, Software, Genomics, Humans, DNA Sequence Analysis/methods
9.
Proc Natl Acad Sci U S A ; 111(43): 15474-9, 2014 Oct 28.
Article in English | MEDLINE | ID: mdl-25313049

ABSTRACT

The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother-child pairs of European ancestry (a total of 156 samples, each sequenced at ∼20,000× per site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-associated heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the effective size of the germ-line mtDNA bottleneck at only ∼30-35 (interquartile range from 9 to 141). Accounting for heteroplasmies, we estimated the mtDNA germ-line mutation rate at 1.3 × 10⁻⁸ (interquartile range from 4.2 × 10⁻⁹ to 4.1 × 10⁻⁸) mutations per site per year, an order of magnitude higher than for nuclear DNA. Notably, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. This study also took advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. Our results can be used to predict the transmission of disease-causing mtDNA variants and illuminate evolutionary dynamics of the mitochondrial genome.


Subject(s)
Mitochondrial DNA/genetics, Germ Cells/metabolism, Inheritance Patterns/genetics, Maternal Age, Age Factors, Child, Disease/genetics, Female, Gene Frequency/genetics, Humans, INDEL Mutation/genetics, Reproducibility of Results, DNA Sequence Analysis
10.
Genome Biol ; 15(2): 403, 2014 Feb 20.
Article in English | MEDLINE | ID: mdl-25001293

ABSTRACT

The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it also revoked the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.


Subject(s)
Computational Biology, Internet, Software, Science
11.
Biotechniques ; 56(3): 134-141, 2014.
Article in English | MEDLINE | ID: mdl-24641477

ABSTRACT

Polymorphism discovery is a routine application of next-generation sequencing technology where multiple samples are sent to a service provider for library preparation, subsequent sequencing, and bioinformatic analyses. The decreasing cost and advances in multiplexing approaches have made it possible to analyze hundreds of samples at a reasonable cost. However, because of the manual steps involved in the initial processing of samples and handling of sequencing equipment, cross-contamination remains a significant challenge. It is especially problematic in cases where polymorphism frequencies do not adhere to diploid expectation, for example, heterogeneous tumor samples, organellar genomes, as well as during bacterial and viral sequencing. In these instances, low levels of contamination may be readily mistaken for polymorphisms, leading to false results. Here we describe practical steps designed to reliably detect contamination and uncover its origin, and also provide new, Galaxy-based, readily accessible computational tools and workflows for quality control. All results described in this report can be reproduced interactively on the web as described at http://usegalaxy.org/contamination.


Subject(s)
DNA Contamination, DNA Sequence Analysis/methods, Sequence Analysis/methods, Mitochondrial DNA/chemistry, Mitochondrial DNA/genetics, Internet, Genetic Polymorphism, Reproducibility of Results