Results 1 - 20 of 98
1.
Nature ; 630(8016): 401-411, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38811727

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility¹. The X chromosome is vital for reproduction and cognition². Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.


Subject(s)
Hominidae , X Chromosome , Y Chromosome , Animals , Female , Male , Gorilla gorilla/genetics , Hominidae/genetics , Hominidae/classification , Hylobatidae/genetics , Pan paniscus/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Pongo pygmaeus/genetics , Telomere/genetics , X Chromosome/genetics , Y Chromosome/genetics , Evolution, Molecular , DNA Copy Number Variations/genetics , Humans , Endangered Species , Reference Standards
2.
Trends Genet ; 39(2): 109-124, 2023 02.
Article in English | MEDLINE | ID: mdl-36604282

ABSTRACT

In addition to the canonical right-handed double helix, other DNA structures, termed 'non-B DNA', can form in genomes across the tree of life. Non-B DNA regulates multiple cellular processes, including replication and transcription, yet its presence is associated with elevated mutagenicity and genome instability. These discordant cellular roles fuel the enormous potential of non-B DNA to drive genomic and phenotypic evolution. Here we discuss recent studies establishing non-B DNA structures as novel functional elements subject to natural selection, affecting the evolution of transposable elements (TEs), and specifying centromeres. By highlighting the contributions of non-B DNA to repeated evolution and adaptation to changing environments, we conclude that evolutionary analyses should include a perspective of not only DNA sequence, but also its structure.


Subject(s)
DNA Transposable Elements , Genomics , Humans , DNA Transposable Elements/genetics , Base Sequence , Genomic Instability/genetics , Evolution, Molecular
3.
Genome Res ; 33(6): 907-922, 2023 06.
Article in English | MEDLINE | ID: mdl-37433640

ABSTRACT

Approximately 13% of the human genome consists of motifs with the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might have elevated error rates at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types except Z-DNA in Illumina and HiFi, but only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.


Subject(s)
DNA, Z-Form , Nanopores , Humans , Nucleotide Motifs , Sequence Analysis, DNA , DNA/genetics , Base Composition , High-Throughput Nucleotide Sequencing
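Entry 3 mentions a probabilistic approach for estimating false positives at non-B motifs from sample size and variant frequency, but the abstract does not give the model. The sketch below is only a simplified illustration under stated assumptions: errors occur independently across reads at a fixed per-base rate, every error is treated as supporting the same alternative allele, a variant is called when at least a chosen number of reads support it, and samples are independent. The function names and parameter values are hypothetical, not the authors' implementation.

```python
from math import comb


def prob_false_call(depth: int, error_rate: float, min_alt_reads: int) -> float:
    """Probability that sequencing errors alone give >= min_alt_reads supporting reads.

    Assumes the error count X at a site follows Binomial(depth, error_rate) and,
    as a simplification, that all errors support the same alternative allele.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    return sum(
        comb(depth, k) * error_rate**k * (1.0 - error_rate) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


def prob_false_in_any_sample(per_sample_prob: float, n_samples: int) -> float:
    """Probability that a site shows a spurious call in at least one of n independent samples."""
    return 1.0 - (1.0 - per_sample_prob) ** n_samples


if __name__ == "__main__":
    # Hypothetical rates: a motif with a 10-fold higher error rate than background.
    for label, rate in (("background", 0.001), ("non-B motif", 0.01)):
        p = prob_false_call(depth=10, error_rate=rate, min_alt_reads=2)
        print(f"{label}: per-sample {p:.2e}, any of 100 samples {prob_false_in_any_sample(p, 100):.3f}")
```

Even with this crude model, low read depth and many samples both inflate the chance of a spurious rare variant at error-prone motifs, which matches the direction of the abstract's conclusion.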
4.
Proc Natl Acad Sci U S A ; 119(15): e2118740119, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35394879

ABSTRACT

Mutations in mitochondrial DNA (mtDNA) contribute to multiple diseases. However, how new mtDNA mutations arise and accumulate with age remains understudied because of the high error rates of current sequencing technologies. Duplex sequencing reduces error rates by several orders of magnitude via independently tagging and analyzing each of the two template DNA strands. Here, using duplex sequencing, we obtained high-quality mtDNA sequences for somatic tissues (liver and skeletal muscle) and single oocytes of 30 unrelated rhesus macaques, from 1 to 23 y of age. Sequencing single oocytes minimized effects of natural selection on germline mutations. In total, we identified 17,637 tissue-specific de novo mutations. Their frequency increased ∼3.5-fold in liver and ∼2.8-fold in muscle over the ∼20 y assessed. Mutation frequency in oocytes increased ∼2.5-fold until the age of 9 y, but did not increase after that, suggesting that oocytes of older animals maintain the quality of their mtDNA. We found the light-strand origin of replication (OriL) to be a hotspot for mutation accumulation with aging in liver. Indeed, the 33-nucleotide-long OriL harbored 12 variant hotspots, 10 of which likely disrupt its hairpin structure and affect replication efficiency. Moreover, in somatic tissues, protein-coding variants were subject to positive selection (potentially mitigating toxic effects of mitochondrial activity), the strength of which increased with the number of macaques harboring variants. Our work illuminates the origins and accumulation of somatic and germline mtDNA mutations with aging in primates and has implications for delayed reproduction in modern human societies.


Subject(s)
Aging , Mitochondria , Mutation , Oocytes , Animals , DNA, Mitochondrial/genetics , DNA, Mitochondrial/metabolism , Humans , Macaca mulatta/genetics , Mitochondria/genetics , Oocytes/metabolism
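Entry 4 summarizes how duplex sequencing suppresses errors: reads descending from each of the two template strands are analyzed independently, and only changes supported by both strands are trusted. A minimal per-position sketch of that idea follows, assuming the reads of each strand family are already grouped by tag, oriented the same way, and aligned to equal length. The 0.7 agreement threshold and the function names are illustrative assumptions, not parameters from the study.

```python
from collections import Counter


def strand_consensus(reads: list[str], min_fraction: float = 0.7) -> str:
    """Per-position majority base across the reads of one strand family.

    Emits 'N' where no base reaches min_fraction of the reads.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(strand_a: list[str], strand_b: list[str], min_fraction: float = 0.7) -> str:
    """Keep a base only where both single-strand consensuses agree and neither is 'N'."""
    a = strand_consensus(strand_a, min_fraction)
    b = strand_consensus(strand_b, min_fraction)
    if len(a) != len(b):
        raise ValueError("strand consensuses differ in length")
    return "".join(x if x == y and x != "N" else "N" for x, y in zip(a, b))
```

An error confined to one strand family yields 'N' instead of a false variant, which is the intuition behind the large error-rate reduction described in the abstract.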
5.
Genome Res ; 31(7): 1136-1149, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34187812

ABSTRACT

Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures forming at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet, G4s may also hinder replication, transcription, and translation and may increase genome instability and mutation rates. Therefore, depending on their genomic location, thermostability, and functionality, G4 loci might evolve under different selective pressures, which has never been investigated. Here we conducted the first genome-wide analysis of G4 distribution, thermostability, and selection. We found an overrepresentation, high thermostability, and purifying selection for G4s within genic components in which they are expected to be functional: promoters, CpG islands, and 5' and 3' UTRs. A similar pattern was observed for G4s within replication origins, enhancers, eQTLs, and TAD boundary regions, strongly suggesting their functionality. In contrast, G4s on the nontranscribed strand of exons were underrepresented, were unstable, and evolved neutrally. In general, G4s on the nontranscribed strand of genic components had lower density and were less stable than those on the transcribed strand, suggesting that the former are avoided at the RNA level. Across the genome, purifying selection was stronger at stable G4s. Our results suggest that purifying selection preserves the sequences of functional G4s, whereas nonfunctional G4s are too costly to be tolerated in the genome. Thus, G4s are emerging as fundamental, functional genomic elements.
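Entry 5 analyzes G4s forming at G-rich motifs. A widely used way to locate candidate G4 motifs is a regular expression requiring four runs of at least three guanines separated by loops of 1 to 7 nucleotides; a C-rich match on the given strand indicates a G4 motif on the opposite strand. The sketch assumes that consensus pattern; the study may define motifs (and thermostability) differently, and `find_g4_motifs` is a hypothetical name.

```python
import re

# Consensus G4 motif G{3+} N{1-7} G{3+} N{1-7} G{3+} N{1-7} G{3+} (an assumption here).
G4_PLUS = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# A C-rich match on this strand corresponds to a G-rich motif on the complementary strand.
G4_MINUS = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, strand) of non-overlapping candidate G4 motifs, 0-based half-open."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    print(find_g4_motifs("ttGGGTTAGGGTTAGGGTTAGGGaa"))  # [(2, 23, '+')]
```

Separating the two strands matters for the analysis described above, since G4s on the transcribed and nontranscribed strands behaved differently.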

6.
PLoS Biol ; 18(7): e3000745, 2020 07.
Article in English | MEDLINE | ID: mdl-32667908

ABSTRACT

Mutations create genetic variation for other evolutionary forces to operate on and cause numerous genetic diseases. Nevertheless, how de novo mutations arise remains poorly understood. Progress in the area is hindered by the fact that error rates of conventional sequencing technologies (1 in 100 or 1,000 base pairs) are several orders of magnitude higher than de novo mutation rates (1 in 10,000,000 or 100,000,000 base pairs per generation). Moreover, previous analyses of germline de novo mutations examined pedigrees (and not germ cells) and thus were likely affected by selection. Here, we applied highly accurate duplex sequencing to detect low-frequency, de novo mutations in mitochondrial DNA (mtDNA) directly from oocytes and from somatic tissues (brain and muscle) of 36 mice from two independent pedigrees. We found mtDNA mutation frequencies 2- to 3-fold higher in 10-month-old than in 1-month-old mice, demonstrating mutation accumulation over a period of only 9 months. Mutation frequencies and patterns differed between germline and somatic tissues and among mtDNA regions, suggestive of distinct mutagenesis mechanisms. Additionally, we discovered a more pronounced genetic drift of mitochondrial genetic variants in the germline of older versus younger mice, arguing for mtDNA turnover during oocyte meiotic arrest. Our study deciphered for the first time the intricacies of germline de novo mutagenesis using duplex sequencing directly in oocytes, which provided unprecedented resolution and minimized selection effects present in pedigree studies. Moreover, our work provides important information about the origins and accumulation of mutations with aging/maturation and has implications for delayed reproduction in modern human societies. Furthermore, the duplex sequencing method we optimized for single cells opens avenues for investigating low-frequency mutations in other studies.


Subject(s)
Aging/genetics , Mammals/genetics , Mitochondria/genetics , Mutation/genetics , Oocytes/metabolism , Organ Specificity/genetics , Animals , DNA Mutational Analysis , DNA, Mitochondrial/genetics , Female , Gene Frequency/genetics , Genetic Drift , Germ Cells/metabolism , Inheritance Patterns/genetics , Logistic Models , Male , Mice , Models, Genetic , Mutation Rate , Nucleotides/genetics , Pedigree
7.
J Hered ; 114(1): 35-43, 2023 03 16.
Article in English | MEDLINE | ID: mdl-36146896

ABSTRACT

The Javan gibbon, Hylobates moloch, is an endangered gibbon species restricted to the forest remnants of western and central Java, Indonesia, and one of the rarest of the Hylobatidae family. Hylobatids consist of 4 genera (Hoolock, Hylobates, Symphalangus, and Nomascus) that are characterized by different numbers of chromosomes, ranging from 38 to 52. The underlying cause of this karyotype plasticity is not entirely understood, at least in part because of the limited availability of genomic data. Here we present the first scaffold-level assembly for H. moloch using a combination of whole-genome Illumina short reads, 10X Chromium linked reads, PacBio, and Oxford Nanopore long reads and proximity-ligation data. This Hylobates genome represents a valuable new resource for comparative genomics studies in primates.


Subject(s)
Genome , Hylobates , Animals , Hylobates/genetics , Forests , Endangered Species , Indonesia
8.
Nucleic Acids Res ; 49(3): 1497-1516, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33450015

ABSTRACT

Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications for evolution and genetic diseases.


Subject(s)
DNA/chemistry , Genetic Variation , Genome, Human , Animals , Genetic Loci , Humans , Mutation Rate , Polymorphism, Single Nucleotide , Pongo pygmaeus
9.
Proc Natl Acad Sci U S A ; 117(42): 26273-26280, 2020 10 20.
Article in English | MEDLINE | ID: mdl-33020265

ABSTRACT

The mammalian male-specific Y chromosome plays a critical role in sex determination and male fertility. However, because of its repetitive and haploid nature, it is frequently absent from genome assemblies and remains enigmatic. The Y chromosomes of great apes represent a particular puzzle: their gene content is more similar between human and gorilla than between human and chimpanzee, even though human and chimpanzee share a more recent common ancestor. To solve this puzzle, here we constructed a dataset including Ys from all extant great ape genera. We generated assemblies of bonobo and orangutan Ys from short and long sequencing reads and aligned them with the publicly available human, chimpanzee, and gorilla Y assemblies. Analyzing this dataset, we found that the genus Pan, which includes chimpanzee and bonobo, experienced accelerated substitution rates. Pan also exhibited elevated gene death rates. These observations are consistent with high levels of sperm competition in Pan. Furthermore, we inferred that the great ape common ancestor already possessed multicopy sequences homologous to most human and chimpanzee palindromes. Nonetheless, each species also acquired distinct ampliconic sequences. We also detected increased chromatin contacts between and within palindromes (from Hi-C data), likely facilitating gene conversion and structural rearrangements. Our results highlight the dynamic mode of Y chromosome evolution and open avenues for studies of male-specific dispersal in endangered great ape species.


Subject(s)
Hominidae/genetics , Y Chromosome/genetics , Animals , Biological Evolution , Evolution, Molecular , Gene Conversion , Gorilla gorilla/genetics , Humans , Pan paniscus/genetics , Pan troglodytes/genetics , Pongo/genetics , Sequence Analysis, DNA
10.
PLoS Genet ; 15(9): e1008369, 2019 09.
Article in English | MEDLINE | ID: mdl-31525193

ABSTRACT

The Y chromosome harbors nine multi-copy ampliconic gene families expressed exclusively in testis. The gene copies within each family are >99% identical to each other, which poses a major challenge in evaluating their copy number. Recent studies demonstrated high variation in Y ampliconic gene copy number among humans. However, how this variation affects expression levels in human testis remains understudied. Here we developed a novel computational tool, Ampliconic Copy Number Estimator (AmpliCoNE), that utilizes read sequencing depth information to estimate Y ampliconic gene copy number per family. We applied this tool to whole-genome sequencing data of 149 men with matched testis expression data whose samples are part of the Genotype-Tissue Expression (GTEx) project. We found that the Y ampliconic gene families with low copy number in humans were deleted or pseudogenized in non-human great apes, suggesting relaxation of functional constraints. Among the Y ampliconic gene families, higher copy number leads to higher expression. Within the Y ampliconic gene families, copy number does not influence gene expression; rather, a high tolerance for variation in gene expression was observed in testis of presumably healthy men. No differences in gene expression levels were found among major Y haplogroups. Age positively correlated with expression levels of the HSFY and PRY gene families in the African subhaplogroup E1b, but not in the European subhaplogroups R1b and I1. We also found that expression of five Y ampliconic gene families is coordinated with that of their non-Y (i.e. X or autosomal) homologs. Indeed, five ampliconic gene families had consistently lower expression levels when compared to their non-Y homologs, suggesting dosage regulation, while the HSFY family had higher expression levels than its X homolog and thus lacked dosage regulation.


Subject(s)
Chromosomes, Human, Y/genetics , Genes, Y-Linked/genetics , Sequence Analysis, DNA/methods , Animals , Chromosomes, Human, Y/physiology , DNA Copy Number Variations/genetics , Databases, Genetic , Dosage Compensation, Genetic/genetics , Dosage Compensation, Genetic/physiology , Epigenesis, Genetic/genetics , Gene Dosage/genetics , Gene Expression/genetics , Gene Expression Regulation/genetics , Genes, Y-Linked/physiology , Heat Shock Transcription Factors/genetics , Heat Shock Transcription Factors/metabolism , Humans , Male , Multigene Family/genetics , Testis/metabolism
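Entry 10 describes AmpliCoNE, which estimates Y ampliconic gene copy number from read depth. The core intuition can be sketched as a depth ratio: reads from every copy of a family pile onto sites specific to that family, so their depth relative to single-copy (haploid) Y regions approximates the number of copies. This is a simplified, hypothetical estimator, not AmpliCoNE's code; it omits the site selection and corrections a real tool would need (for example, for GC bias and mappability).

```python
from statistics import mean


def estimate_copy_number(family_site_depths: list[float], single_copy_depths: list[float]) -> float:
    """Hypothetical depth-ratio estimate of the number of copies in one gene family.

    family_site_depths: read depth at positions assumed unique to the family
        (reads from every copy map there).
    single_copy_depths: read depth at positions assumed single-copy on the Y
        (one copy per haploid Y), used as the per-copy baseline.
    """
    if not family_site_depths or not single_copy_depths:
        raise ValueError("both depth lists must be non-empty")
    baseline = mean(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return mean(family_site_depths) / baseline


if __name__ == "__main__":
    # Made-up depths: about 4x the single-copy baseline suggests about 4 copies.
    print(round(estimate_copy_number([60, 64, 56], [15, 16, 14]), 2))
```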
11.
Proc Natl Acad Sci U S A ; 116(50): 25172-25178, 2019 12 10.
Article in English | MEDLINE | ID: mdl-31757848

ABSTRACT

Heteroplasmy, the presence of multiple mitochondrial DNA (mtDNA) haplotypes in an individual, can lead to numerous mitochondrial diseases. The presentation of such diseases depends on the frequency of the heteroplasmic variant in tissues, which, in turn, depends on the dynamics of mtDNA transmissions during germline and somatic development. Thus, understanding and predicting these dynamics between generations and within individuals is medically relevant. Here, we study patterns of heteroplasmy in 2 tissues from each of 345 humans in 96 multigenerational families, each with at least 2 siblings (a total of 249 mother-child transmissions). This experimental design has allowed us to estimate the timing of mtDNA mutations, drift, and selection with unprecedented precision. Our results are remarkably concordant between 2 complementary population-genetic approaches. We find evidence for a severe germline bottleneck (7-10 mtDNA segregating units) that occurs independently in different oocyte lineages from the same mother, while somatic bottlenecks are less severe. We demonstrate that divergence between mother and offspring increases with the mother's age at childbirth, likely due to continued drift of heteroplasmy frequencies in oocytes under meiotic arrest. We show that this period is also accompanied by mutation accumulation leading to more de novo mutations in children born to older mothers. We show that heteroplasmic variants at intermediate frequencies can segregate for many generations in the human population, despite the strong germline bottleneck. We show that selection acts during germline development to keep the frequency of putatively deleterious variants from rising. Our findings have important applications for clinical genetics and genetic counseling.


Subject(s)
DNA, Mitochondrial/genetics , Germ Cells/cytology , Maternal Age , Mitochondrial Diseases/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Female , Population Genetics , Human Genetics , Humans , Male , Middle Aged , Mitochondria/genetics , Pedigree , Young Adult
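Entry 11 reports a severe germline bottleneck of about 7 to 10 mtDNA segregating units. Under a simple binomial-sampling model (an assumption here, not the study's population-genetic methods), passing a heteroplasmy frequency p through a bottleneck of N units gives an offspring frequency with mean p and variance p(1 - p)/N, so a smaller N produces larger mother-to-child shifts. The sketch computes that variance and checks it by simulation; the function names and the example values are hypothetical.

```python
import random


def bottleneck_variance(p: float, n_units: int) -> float:
    """Variance of offspring heteroplasmy after one binomial bottleneck of n_units."""
    return p * (1.0 - p) / n_units


def simulate_transmissions(p: float, n_units: int, n_children: int, seed: int = 1) -> list[float]:
    """Simulated offspring heteroplasmy frequencies.

    Each child receives n_units segregating units drawn independently, each
    carrying the variant with probability p (the mother's frequency).
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n_units)) / n_units
        for _ in range(n_children)
    ]


if __name__ == "__main__":
    from statistics import pvariance

    p, n = 0.5, 8  # hypothetical mother frequency; N within the reported 7-10 range
    freqs = simulate_transmissions(p, n, n_children=20000)
    print(bottleneck_variance(p, n), round(pvariance(freqs), 4))
```

With N = 8 and p = 0.5, the standard deviation of the child's frequency is about 0.18, which illustrates why such a narrow bottleneck can shift heteroplasmy substantially in a single generation.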
12.
Mol Biol Evol ; 37(12): 3576-3600, 2020 12 16.
Article in English | MEDLINE | ID: mdl-32722770

ABSTRACT

Long INterspersed Elements-1 (L1s) constitute >17% of the human genome and still actively transpose in it. Characterizing L1 transposition across the genome is critical for understanding genome evolution and somatic mutations. However, to date, L1 insertion and fixation patterns have not been studied comprehensively. To fill this gap, we investigated three genome-wide data sets of L1s that integrated at different evolutionary times: 17,037 de novo L1s (from an L1 insertion cell-line experiment conducted in-house), and 1,212 polymorphic and 1,205 human-specific L1s (from public databases). We characterized 49 genomic features (proxying chromatin accessibility, transcriptional activity, replication, recombination, etc.) in the ±50 kb flanks of these elements. These features were contrasted between the three L1 data sets and L1-free regions using state-of-the-art Functional Data Analysis statistical methods, which treat high-resolution data as mathematical functions. Our results indicate that de novo, polymorphic, and human-specific L1s are surrounded by different genomic features acting at specific locations and scales. This led to an integrative model of L1 transposition, according to which L1s preferentially integrate into open-chromatin regions enriched in non-B DNA motifs, whereas they are fixed in regions largely free of purifying selection, i.e., depleted of genes and of the most conserved noncoding elements. Intriguingly, our results suggest that L1 insertions modify the local genomic landscape by extending CpG methylation and increasing mononucleotide microsatellite density. Altogether, our findings substantially facilitate understanding of L1 integration and fixation preferences, pave the way for uncovering their role in aging and cancer, and inform their use as mutagenesis tools in genetic studies.


Subject(s)
Biological Evolution , DNA Transposable Elements , Genome, Human , Long Interspersed Nucleotide Elements , Models, Genetic , Humans , Mutagenesis, Insertional
13.
Genome Res ; 28(12): 1767-1778, 2018 12.
Article in English | MEDLINE | ID: mdl-30401733

ABSTRACT

DNA conformation may deviate from the classical B-form in ∼13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict, and validate experimentally, non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations.


Subject(s)
DNA/chemistry , Genomics , High-Throughput Nucleotide Sequencing , Nucleic Acid Conformation , Sequence Analysis, DNA , DNA Replication , G-Quadruplexes , Genomics/methods , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Kinetics , Mutation , Nucleotide Motifs , Reproducibility of Results , Sequence Analysis, DNA/methods
14.
Nat Rev Genet ; 16(4): 213-23, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25732611

ABSTRACT

The variation in local rates of mutations can affect both the evolution of genes and their function in normal and cancer cells. Deciphering the molecular determinants of this variation will be aided by the elucidation of distinct types of mutations, as they differ in regional preferences and in associations with genomic features. Chromatin organization contributes to regional variation in mutation rates, but its contribution differs among mutation types. In both germline and somatic mutations, base substitutions are more abundant in regions of closed chromatin, perhaps reflecting error accumulation late in replication. By contrast, a distinctive mutational state with very high levels of insertions and deletions (indels) and substitutions is enriched in regions of open chromatin. These associations indicate an intricate interplay between the nucleotide sequence of DNA and its dynamic packaging into chromatin, and have important implications for current biomedical research. This Review focuses on recent studies showing associations between chromatin state and mutation rates, including pairwise and multivariate investigations of germline and somatic (particularly cancer) mutations.


Subject(s)
Chromatin Assembly and Disassembly/genetics , Genetic Variation , Genome , Mutation Rate , Animals , Evolution, Molecular , Humans
15.
BMC Bioinformatics ; 21(1): 96, 2020 Mar 04.
Article in English | MEDLINE | ID: mdl-32131723

ABSTRACT

BACKGROUND: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away. RESULTS: In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correction of such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost effective. CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and Github: https://github.com/galaxyproject/dunovo.


Subject(s)
User-Computer Interface , Algorithms , DNA/chemistry , DNA/metabolism , Humans , Sequence Alignment , Sequence Analysis, DNA
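Entry 15 describes "reuniting" singleton reads whose duplex tags carry PCR or sequencing errors with their families. One simple way to illustrate the idea is a hypothetical greedy scheme (not Du Novo 2.0's algorithm): treat the most abundant tags as true barcodes and reassign any rarer tag that lies within a small Hamming distance of one of them. The function names and the one-mismatch default are assumptions.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_tags(tags: list[str], max_mismatches: int = 1) -> dict[str, str]:
    """Map every observed tag to a representative tag.

    Tags are visited from most to least abundant; a tag within max_mismatches of an
    existing representative is merged into it, otherwise it becomes a new representative.
    """
    mapping: dict[str, str] = {}
    representatives: list[str] = []
    for tag, _count in Counter(tags).most_common():
        for rep in representatives:
            if hamming(tag, rep) <= max_mismatches:
                mapping[tag] = rep
                break
        else:
            representatives.append(tag)
            mapping[tag] = tag
    return mapping


def family_sizes(tags: list[str], max_mismatches: int = 1) -> Counter:
    """Number of reads per corrected tag family."""
    mapping = correct_tags(tags, max_mismatches)
    return Counter(mapping[t] for t in tags)
```

The sketch ignores a duplex-specific detail: the two halves of a tag appear in swapped order on reads from the opposite strand, so a real pipeline must also pair each tag with its swapped counterpart.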
16.
Mol Biol Evol ; 36(11): 2415-2431, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31273383

ABSTRACT

Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions.
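Entry 16 quantifies satellite repeat density from unassembled reads. As a minimal sketch (an assumption, not the study's pipeline), the bases covered by perfect tandem arrays of one motif, counting all of its rotations and its reverse complement, can be tallied per megabase of read sequence. The 4-copy minimum and the function names are illustrative.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_variants(motif: str) -> list[str]:
    """All cyclic rotations of the motif and of its reverse complement."""
    variants = set()
    for m in (motif, revcomp(motif)):
        for i in range(len(m)):
            variants.add(m[i:] + m[:i])
    return sorted(variants)


def satellite_density(reads: list[str], motif: str, min_copies: int = 4) -> tuple[int, float]:
    """Return (bp in perfect tandem arrays of motif, bp per Mb of read sequence)."""
    # One alternative per rotation, e.g. "(?:AATGG){4,}", so arrays in any phase are found.
    alternatives = "|".join(f"(?:{re.escape(v)}){{{min_copies},}}" for v in motif_variants(motif))
    pattern = re.compile(alternatives)
    covered = sum(m.end() - m.start() for read in reads for m in pattern.finditer(read.upper()))
    total = sum(len(r) for r in reads)
    return covered, (covered / total * 1e6 if total else 0.0)


if __name__ == "__main__":
    reads = ["AATGG" * 10 + "ACGTACGTAC", "CCATT" * 5]  # made-up reads
    print(satellite_density(reads, "AATGG"))
```

Counting only perfect copies, as here, undercounts arrays in error-prone reads; that limitation is what noise-tolerant tools address.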

17.
Trends Genet ; 33(4): 266-282, 2017 04.
Article in English | MEDLINE | ID: mdl-28236503

ABSTRACT

Hundreds of vertebrate genomes have been sequenced and assembled to date. However, most sequencing projects have ignored the sex chromosomes unique to the heterogametic sex (Y and W), which are known as sex-limited chromosomes (SLCs). Indeed, haploid and repetitive Y chromosomes in species with male heterogamety (XY), and W chromosomes in species with female heterogamety (ZW), are difficult to sequence and assemble. Nevertheless, obtaining their sequences is important for understanding the intricacies of vertebrate genome function and evolution. Recent progress has been made towards the adaptation of next-generation sequencing (NGS) techniques to deciphering SLC sequences. We review here currently available methodology and results with regard to SLC sequencing and assembly. We focus on vertebrates, but bring in some examples from other taxa.


Subject(s)
Evolution, Molecular , Sex Chromosomes/genetics , Sex Determination Processes , Y Chromosome/genetics , Animals , Female , Genome , High-Throughput Nucleotide Sequencing , Male
18.
Bioinformatics ; 35(22): 4809-4811, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31290946

ABSTRACT

SUMMARY: Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response. AVAILABILITY AND IMPLEMENTATION: NCRF is implemented in C, supported by several python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Genome, Human , Humans , Nanopores , Sequence Analysis, DNA , Software , Tandem Repeat Sequences
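Entry 18 introduces NCRF for finding tandem repeats of specified motifs in error-prone long reads. NCRF itself is implemented in C; the Python sketch below only illustrates the underlying problem with a simpler, hypothetical method: the minimum edit distance between a read segment and a perfect tandem array of the motif, allowing the array to start at any rotation of the motif. A low distance relative to segment length suggests the segment is a noisy copy of the repeat.

```python
def repeat_edit_distance(segment: str, motif: str) -> int:
    """Minimum edit distance between segment and some prefix of a perfect tandem
    array of motif, trying every rotation of the motif as the starting phase."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # Consuming J reference bases needs at least J - len(segment) deletions, so any
    # alignment using more than 2*len(segment) bases costs more than the trivial
    # all-substitution alignment; a reference of >= 2x segment length suffices.
    copies = 2 * len(segment) // len(motif) + 2
    best = len(segment)  # upper bound: substitutions against the first len(segment) bases
    for r in range(len(motif)):
        ref = (motif[r:] + motif[:r]) * copies
        prev = list(range(len(ref) + 1))  # DP row for the empty segment prefix
        for i in range(1, len(segment) + 1):
            cur = [i] + [0] * len(ref)
            for j in range(1, len(ref) + 1):
                cost = 0 if segment[i - 1] == ref[j - 1] else 1
                cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            prev = cur
        best = min(best, min(prev))  # free end gap: the segment may end anywhere in ref
    return best


def repeat_identity(segment: str, motif: str) -> float:
    """1 - edit distance / segment length (0.0 for an empty segment)."""
    return 1.0 - repeat_edit_distance(segment, motif) / len(segment) if segment else 0.0


if __name__ == "__main__":
    print(repeat_identity("AATGGAATCGAATGG", "AATGG"))  # one substitution in 15 bases
```

This brute-force dynamic program is quadratic per rotation and is meant only to make the scoring idea concrete for short segments.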
19.
Bioinformatics ; 35(17): 3211-3213, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30668667

ABSTRACT

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.
BMC Genomics ; 20(1): 641, 2019 Aug 09.
Article in English | MEDLINE | ID: mdl-31399045

ABSTRACT

BACKGROUND: Although the Y chromosome plays an important role in male sex determination and fertility, it is currently understudied due to its haploid and repetitive nature. Methods to isolate Y-specific contigs from a whole-genome assembly broadly fall into two categories. The first involves retrieving Y-contigs using proportion sharing with a female, but such a strategy is prone to false positives in the absence of a high-quality, complete female reference. A second strategy uses the ratio of depth of coverage from male and female reads to select Y-contigs, but such a method requires high-depth sequencing of a female and cannot utilize existing female references. RESULTS: We develop a k-mer based method called DiscoverY, which combines proportion sharing with female with depth of coverage from male reads to classify contigs as Y-chromosomal. We evaluate the performance of DiscoverY on human and gorilla genomes, across different sequencing platforms including Illumina, 10X, and PacBio. In the cases where the male and female data are of high quality, DiscoverY has a high precision and recall and outperforms existing methods. For cases when a high quality female reference is not available, we quantify the effect of using draft reference or even just raw sequencing reads from a female. CONCLUSION: DiscoverY is an effective method to isolate Y-specific contigs from a whole-genome assembly. However, regions homologous to the X chromosome remain difficult to detect.


Subject(s)
Chromosomes, Human, Y/genetics , Sequence Analysis, DNA/methods , Female , Haploidy , Humans , Male , Sequence Analysis, DNA/economics , Time Factors
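Entry 20 describes DiscoverY, a k-mer based method that classifies contigs as Y-chromosomal by combining k-mers shared with a female with male read depth. The sketch below covers only the proportion-sharing half, under stated assumptions: canonical k-mers (a k-mer and its reverse complement count as one), a default k of 25, and a hypothetical cutoff of 10% shared k-mers. It is not DiscoverY's code, and the male-depth component is omitted.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        out.add(min(kmer, revcomp(kmer)))
    return out


def female_sharing(contig: str, female_kmers: set[str], k: int) -> float:
    """Fraction of the contig's canonical k-mers also present in the female k-mer set."""
    kmers = canonical_kmers(contig, k)
    return len(kmers & female_kmers) / len(kmers) if kmers else 0.0


def classify_y_contigs(
    contigs: dict[str, str], female_seqs: list[str], k: int = 25, max_sharing: float = 0.1
) -> dict[str, bool]:
    """True for contigs whose female k-mer sharing is at or below max_sharing (candidate Y).

    Contigs shorter than k have no k-mers and are left out of the result.
    """
    female_kmers: set[str] = set()
    for s in female_seqs:
        female_kmers |= canonical_kmers(s, k)
    return {
        name: female_sharing(seq, female_kmers, k) <= max_sharing
        for name, seq in contigs.items()
        if len(seq) >= k
    }
```

As the abstract notes, sharing alone struggles with Y regions homologous to the X, because their k-mers also occur in female data; the depth component is meant to help there.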