Results 1 - 5 of 5
1.
Mol Biol Evol ; 37(12): 3576-3600, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-32722770

ABSTRACT

Long INterspersed Elements-1 (L1s) constitute >17% of the human genome and still actively transpose in it. Characterizing L1 transposition across the genome is critical for understanding genome evolution and somatic mutations. However, to date, L1 insertion and fixation patterns have not been studied comprehensively. To fill this gap, we investigated three genome-wide data sets of L1s that integrated at different evolutionary times: 17,037 de novo L1s (from an L1 insertion cell-line experiment conducted in-house), and 1,212 polymorphic and 1,205 human-specific L1s (from public databases). We characterized 49 genomic features (proxies for chromatin accessibility, transcriptional activity, replication, recombination, etc.) in the ±50 kb flanks of these elements. These features were contrasted between the three L1 data sets and L1-free regions using state-of-the-art Functional Data Analysis statistical methods, which treat high-resolution data as mathematical functions. Our results indicate that de novo, polymorphic, and human-specific L1s are surrounded by different genomic features acting at specific locations and scales. This led to an integrative model of L1 transposition, according to which L1s preferentially integrate into open-chromatin regions enriched in non-B DNA motifs, whereas they are fixed in regions largely free of purifying selection (depleted of genes and of noncoding most conserved elements). Intriguingly, our results suggest that L1 insertions modify the local genomic landscape by extending CpG methylation and increasing mononucleotide microsatellite density. Altogether, our findings substantially advance the understanding of L1 integration and fixation preferences, pave the way for uncovering their role in aging and cancer, and inform their use as mutagenesis tools in genetic studies.


Subject(s)
Biological Evolution , DNA Transposable Elements , Genome, Human , Long Interspersed Nucleotide Elements , Models, Genetic , Humans , Mutagenesis, Insertional
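The flank analysis in entry 1 can be pictured with a minimal Python sketch. It assumes a single chromosome, 0-based coordinates, and point-like features (for example, motif start positions); the function name `flank_profile` and the 10-kb bin size are hypothetical choices for illustration, not taken from the study. It only builds the per-bin density curve around insertion sites; the Functional Data Analysis tests the authors applied to such curves are not implemented here.

```python
import bisect
from typing import List, Sequence


def flank_profile(
    sites: Sequence[int],
    feature_positions: Sequence[int],
    flank: int = 50_000,
    bin_size: int = 10_000,
) -> List[float]:
    """Mean number of feature positions per bin in [site - flank, site + flank),
    averaged over all insertion sites (hypothetical helper; one chromosome, 0-based)."""
    if (2 * flank) % bin_size:
        raise ValueError("2 * flank must be a multiple of bin_size")
    n_bins = 2 * flank // bin_size
    totals = [0] * n_bins
    feats = sorted(feature_positions)
    for site in sites:
        lo, hi = site - flank, site + flank
        # Binary search finds the features inside the window without a full scan.
        first = bisect.bisect_left(feats, lo)
        last = bisect.bisect_left(feats, hi)
        for pos in feats[first:last]:
            totals[(pos - lo) // bin_size] += 1
    if not sites:
        return [0.0] * n_bins
    return [t / len(sites) for t in totals]


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    print(flank_profile([100_000, 300_000], [60_000, 310_000]))
    # prints [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

Running the same function on randomly placed L1-free control sites yields a baseline curve, and the two curves are the kind of objects the abstract describes contrasting.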
2.
Mol Biol Evol ; 36(11): 2415-2431, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31273383

ABSTRACT

Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions.
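To make "density of satellite repeats" in entry 2 concrete, here is a hedged Python sketch that measures, for one known motif, the kilobases of perfect tandem arrays (on either strand) per megabase of sequencing reads. The three-copy minimum and the function names `satellite_density` and `revcomp` are assumptions for illustration; the study's own repeat-detection tools and thresholds are not reproduced here, and this sketch does not discover motifs de novo.

```python
import re
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def satellite_density(reads: Iterable[str], motif: str, min_copies: int = 3) -> float:
    """Kilobases of perfect tandem arrays of `motif` (or its reverse complement)
    per megabase of read sequence (hypothetical helper)."""
    motif = motif.upper()
    variants = sorted({motif, revcomp(motif)})
    # e.g. (?:AATGG){3,}|(?:CCATT){3,}: at least `min_copies` back-to-back copies
    regex = re.compile("|".join(f"(?:{re.escape(m)}){{{min_copies},}}" for m in variants))
    repeat_bp = total_bp = 0
    for read in reads:
        read = read.upper()
        total_bp += len(read)
        repeat_bp += sum(len(hit.group(0)) for hit in regex.finditer(read))
    # kb per Mb equals 1,000 x (repeat bases / total bases)
    return 1_000 * repeat_bp / total_bp if total_bp else 0.0


if __name__ == "__main__":
    # A toy 100-bp read carrying four copies of (AATGG).
    print(satellite_density(["AATGG" * 4 + "C" * 80], "AATGG"))  # prints 200.0
```

Comparing such densities across individuals is one simple way to obtain the per-species profiles the abstract uses for classification, under the assumptions above.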

3.
Nat Ecol Evol ; 3(2): 213-222, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30643241

ABSTRACT

To function properly, mitochondria utilize products of 37 mitochondrial and >1,000 nuclear genes, which should be compatible with each other. Discordance between mitochondrial and nuclear genetic ancestry could contribute to phenotypic variation in admixed populations. Here, we explored potential mitonuclear incompatibility in six admixed human populations from the Americas: African Americans, African Caribbeans, Colombians, Mexicans, Peruvians and Puerto Ricans. By comparing nuclear versus mitochondrial ancestry in these populations, we first show that mitochondrial DNA (mtDNA) copy number decreases with increasing discordance between nuclear and mtDNA ancestry. The direction of this effect is consistent across mtDNA haplogroups of different geographic origins. This observation indicates suboptimal regulation of mtDNA replication when its components are encoded by nuclear and mtDNA genes with different ancestry. Second, while most populations analysed exhibit no such trend, in African Americans and Puerto Ricans, we find a significant enrichment of ancestry at nuclear-encoded mitochondrial genes towards the source populations contributing the most prevalent mtDNA haplogroups (African and Native American, respectively). This possibly reflects compensatory effects of selection in recovering mitonuclear interactions optimized in the source populations. Our results provide evidence of mitonuclear interactions in human admixed populations and we discuss their implications for human health and disease.


Subject(s)
Cell Nucleus/genetics , DNA, Mitochondrial/genetics , Genetic Variation , Caribbean Region , Colombia , DNA Copy Number Variations , Humans , Mexico , Peru , Puerto Rico , United States
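The first analysis in entry 3 relates mtDNA copy number to mitonuclear ancestry discordance. Below is a minimal Python sketch. It assumes discordance is defined as 1 minus an individual's nuclear ancestry fraction from the source population of their mtDNA haplogroup, and it uses a plain least-squares slope with no covariates; the study's actual definition and model may differ. The class and function names are hypothetical, and the toy values are invented for illustration, not data from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Individual:
    nuclear_ancestry: Dict[str, float]  # genome-wide nuclear ancestry fractions
    mt_ancestry: str                    # source population of the mtDNA haplogroup
    mtdna_copy_number: float


def discordance(ind: Individual) -> float:
    """1 minus the nuclear ancestry fraction matching the mtDNA source (0 = concordant)."""
    return 1.0 - ind.nuclear_ancestry.get(ind.mt_ancestry, 0.0)


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of ys on xs (single predictor, no covariates)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Invented toy values, not data from the study.
    people = [
        Individual({"African": 1.0}, "African", 300.0),
        Individual({"African": 0.5, "European": 0.5}, "African", 250.0),
        Individual({"European": 1.0}, "African", 200.0),
    ]
    x = [discordance(p) for p in people]
    y = [p.mtdna_copy_number for p in people]
    print(ols_slope(x, y))  # prints -100.0
```

A negative slope here corresponds to the direction of the effect reported in the abstract: copy number falling as discordance rises.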
4.
Mol Biol Evol ; 31(7): 1816-32, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24809961

ABSTRACT

The integration and fixation preferences of DNA transposons, one of the major classes of eukaryotic transposable elements, have never been evaluated comprehensively on a genome-wide scale. Here, we present a detailed study of the distribution of DNA transposons in the human and bat genomes. We studied three groups of DNA transposons that integrated at different evolutionary times: 1) ancient (>40 My) and currently inactive human elements, 2) younger (<40 My) bat elements, and 3) ex vivo integrations of piggyBat and Sleeping Beauty elements in HeLa cells. Although the distribution of ex vivo elements reflected integration preferences, the distribution of human and (to a lesser extent) bat elements was also affected by selection. We used regression techniques (linear, negative binomial, and logistic regression models with multiple predictors) applied to 20-kb and 1-Mb windows to investigate how the genomic landscape in the vicinity of DNA transposons contributes to their integration and fixation. Our models indicate that the genomic landscape explains 16-79% of the variability in the genome-wide distribution of DNA transposons. Importantly, we not only confirmed previously identified predictors (e.g., DNA conformation and recombination hotspots) but also identified several novel predictors (e.g., signatures of double-strand breaks and telomere hexamers). Ex vivo integrations showed a bias toward actively transcribed regions. Older DNA transposons were located in genomic regions scarce in most conserved elements, likely reflecting purifying selection. Our study highlights how DNA transposons are integral to the evolution of bat and human genomes, and has implications for the development of DNA transposon assays for gene therapy and mutagenesis applications.


Subject(s)
Chiroptera/genetics , DNA Transposable Elements , Evolution, Molecular , Animals , Genetic Variation , Genome , HeLa Cells , Humans , Models, Genetic , Mutagenesis, Insertional , Regression Analysis
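The window-based design in entry 4 starts by tabulating each fixed-size window. This Python sketch, offered under stated assumptions, builds the per-window response variables (an element count for count models such as negative binomial regression, and a 0/1 presence flag for logistic regression) plus one example predictor, GC fraction. The names `build_windows` and `Window` and the choice of GC content as the predictor are illustrative only; the study used many predictors and both 20-kb and 1-Mb windows, and the regression fit itself would be done with a statistics package such as statsmodels, which is not shown.

```python
from typing import List, NamedTuple, Sequence


class Window(NamedTuple):
    start: int    # 0-based window start
    count: int    # elements starting in the window (response for count models)
    present: int  # 1 if count > 0 else 0 (response for logistic regression)
    gc: float     # example predictor: GC fraction among A/C/G/T bases


def build_windows(chrom_seq: str, te_starts: Sequence[int], size: int = 20_000) -> List[Window]:
    """Tabulate non-overlapping windows of `size` bp along one chromosome (hypothetical helper)."""
    n = len(chrom_seq)
    counts = [0] * ((n + size - 1) // size)  # the last window may be shorter
    for pos in te_starts:
        if 0 <= pos < n:  # ignore coordinates outside the sequence
            counts[pos // size] += 1
    windows = []
    for i, c in enumerate(counts):
        seg = chrom_seq[i * size:(i + 1) * size].upper()
        acgt = sum(seg.count(b) for b in "ACGT")  # ignores N and other ambiguity codes
        gc = (seg.count("G") + seg.count("C")) / acgt if acgt else float("nan")
        windows.append(Window(i * size, c, int(c > 0), gc))
    return windows


if __name__ == "__main__":
    # Toy 20-bp "chromosome" with 10-bp windows.
    for w in build_windows("GC" * 5 + "AT" * 5, [0, 3, 15, 25], size=10):
        print(w)
```

Each `Window` row then becomes one observation in the regression table; switching `size` between 20,000 and 1,000,000 reproduces the two scales mentioned in the abstract.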
5.
Biotechniques ; 56(3): 134-141, 2014.
Article in English | MEDLINE | ID: mdl-24641477

ABSTRACT

Polymorphism discovery is a routine application of next-generation sequencing technology, in which multiple samples are sent to a service provider for library preparation, sequencing, and bioinformatic analysis. Decreasing costs and advances in multiplexing have made it possible to analyze hundreds of samples at a reasonable cost. However, because of the manual steps involved in the initial processing of samples and the handling of sequencing equipment, cross-contamination remains a significant challenge. It is especially problematic when polymorphism frequencies do not adhere to diploid expectations, for example, in heterogeneous tumor samples and organellar genomes, as well as in bacterial and viral sequencing. In these instances, low levels of contamination may be readily mistaken for polymorphisms, leading to false results. Here we describe practical steps designed to reliably detect contamination and uncover its origin, and also provide new, readily accessible Galaxy-based computational tools and workflows for quality control. All results described in this report can be reproduced interactively on the web as described at http://usegalaxy.org/contamination.


Subject(s)
DNA Contamination , Sequence Analysis, DNA/methods , Sequence Analysis/methods , DNA, Mitochondrial/chemistry , DNA, Mitochondrial/genetics , Internet , Polymorphism, Genetic , Reproducibility of Results
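One way to detect cross-contamination and point to its origin, in the spirit of entry 5, is sketched below in Python: flag sites where a sample's second most common base sits at a low frequency, then ask which other sample in the same run carries those bases as its major alleles. This is a hypothetical illustration of the idea, not the Galaxy tools or workflows the authors provide; the 1% to 20% frequency band, the data layout, and all function names are assumptions.

```python
from typing import Dict, Optional, Tuple

Counts = Dict[int, Dict[str, int]]  # position -> {base: read count}


def major_allele(site: Dict[str, int]) -> str:
    """Most common base at a site."""
    return max(site, key=lambda base: site[base])


def low_freq_minor_alleles(sample: Counts, lo: float = 0.01, hi: float = 0.2) -> Dict[int, str]:
    """Positions whose second most common base has a frequency within [lo, hi]."""
    hits: Dict[int, str] = {}
    for pos, site in sample.items():
        total = sum(site.values())
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        if total and len(ranked) > 1:
            base, n = ranked[1]
            if lo <= n / total <= hi:
                hits[pos] = base
    return hits


def likely_source(sample: Counts, others: Dict[str, Counts],
                  lo: float = 0.01, hi: float = 0.2) -> Optional[Tuple[str, float]]:
    """Other sample whose major alleles best explain `sample`'s low-frequency
    minor alleles, with the fraction of shared sites that match (None if no evidence)."""
    minor = low_freq_minor_alleles(sample, lo, hi)
    best: Optional[Tuple[str, float]] = None
    for name, other in others.items():
        shared = [pos for pos in minor if pos in other]
        if not shared:
            continue
        frac = sum(major_allele(other[pos]) == minor[pos] for pos in shared) / len(shared)
        if best is None or frac > best[1]:
            best = (name, frac)
    return best


if __name__ == "__main__":
    # Toy allele counts for illustration only.
    s1 = {100: {"A": 95, "G": 5}, 200: {"C": 90, "T": 10}, 300: {"T": 100}}
    s2 = {100: {"G": 50}, 200: {"T": 40}}
    s3 = {100: {"A": 50}, 200: {"C": 40}}
    print(likely_source(s1, {"S2": s2, "S3": s3}))  # prints ('S2', 1.0)
```

On real data, base-quality filtering and a check that the flagged minor alleles are not recurrent sequencing errors would be needed before trusting such a match.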