1.
Bioinform Adv ; 4(1): vbae098, 2024.
Article in English | MEDLINE | ID: mdl-39006965

ABSTRACT

Summary: We developed loco-pipe, a Snakemake pipeline that seamlessly streamlines a set of essential population genomic analyses for low-coverage whole genome sequencing (lcWGS) data. loco-pipe is highly automated, easily customizable, massively parallelized, and thus is a valuable tool for both new and experienced users of lcWGS. Availability and implementation: loco-pipe is published under the GPLv3. It is freely available on GitHub (github.com/sudmantlab/loco-pipe) and archived on Zenodo (doi.org/10.5281/zenodo.10425920).

2.
bioRxiv ; 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38370750

ABSTRACT

The adoption of agriculture, first documented ~12,000 years ago in the Fertile Crescent, triggered a rapid shift toward starch-rich diets in human populations. Amylase genes facilitate starch digestion, and increased salivary amylase copy number has been observed in some modern human populations with high starch intake, though evidence of recent selection is lacking. Here, using 52 long-read diploid assemblies and short-read data from ~5,600 contemporary and ancient humans, we resolve the diversity, evolutionary history, and selective impact of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in populations with agricultural subsistence compared to fishing, hunting, and pastoral groups. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each exhibit multiple duplications/deletions with mutation rates >10,000-fold the SNP mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome graph-based approach to infer structural haplotypes across thousands of humans, we identify extensively duplicated haplotypes present at higher frequencies in modern-day populations with traditionally agricultural diets. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (i.e., haplotypes with more amylase gene copies than the ancestral haplotype) have increased in frequency more than seven-fold over the last 12,000 years, providing evidence for recent selection in West Eurasians. Together, our study highlights the potential impacts of the agricultural revolution on human genomes and the importance of long-read sequencing in identifying signatures of selection at structurally complex loci.

3.
Mol Ecol Resour ; 22(5): 1678-1692, 2022 Jul.
Article in English | MEDLINE | ID: mdl-34825778

ABSTRACT

Over the past few decades, there has been an explosion in the amount of publicly available sequencing data. This opens new opportunities for combining data sets to achieve unprecedented sample sizes, spatial coverage or temporal replication in population genomic studies. However, a common concern is that nonbiological differences between data sets may generate patterns of variation in the data that can confound real biological patterns, a problem known as batch effects. In this paper, we compare two batches of low-coverage whole genome sequencing (lcWGS) data generated from the same populations of Atlantic cod (Gadus morhua). First, we show that with a "batch-effect-naive" bioinformatic pipeline, batch effects systematically biased our genetic diversity estimates, population structure inference and selection scans. We then demonstrate that these batch effects resulted from multiple technical differences between our data sets, including the sequencing chemistry (four-channel vs. two-channel), sequencing run, read type (single-end vs. paired-end), read length (125 vs. 150 bp), DNA degradation level (degraded vs. well preserved) and sequencing depth (0.8× vs. 0.3× on average). Lastly, we illustrate that a set of simple bioinformatic strategies (such as different read trimming and single nucleotide polymorphism filtering) can be used to detect batch effects in our data and substantially mitigate their impact. We conclude that combining data sets remains a powerful approach as long as batch effects are explicitly accounted for. We focus on lcWGS data in this paper, which may be particularly vulnerable to certain causes of batch effects, but many of our conclusions also apply to other sequencing strategies.


Subject(s)
Gadus morhua, Metagenomics, Animals, Gadus morhua/genetics, Genome, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Single Nucleotide Polymorphism, DNA Sequence Analysis, Whole Genome Sequencing/methods
4.
Mol Ecol ; 30(23): 5966-5993, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34250668

ABSTRACT

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and nonmodel species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analysed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency, genetic diversity, and linkage disequilibrium estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in nonmodel species, and discuss current limitations and future perspectives for lcWGS-based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.
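The idea of accounting for genotype uncertainty instead of calling genotypes can be illustrated with a minimal sketch. It is not taken from the paper or from any of the software packages it reviews: the function names, the fixed sequencing error rate of 0.01, the Poisson depth model, and the Hardy-Weinberg prior are all assumptions made for illustration. The sketch computes per-sample genotype likelihoods from read counts at one biallelic site and estimates the population allele frequency with a simple EM loop.

```python
import math
import random


def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Likelihood of the observed reads for genotypes with 0, 1, 2 alternate alleles.

    The binomial coefficient is omitted because it is constant across genotypes.
    The error rate is an illustrative assumption.
    """
    likes = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternate allele.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likes.append((p_alt ** n_alt) * ((1 - p_alt) ** n_ref))
    return likes


def em_allele_frequency(likelihoods, n_iter=200, tol=1e-8):
    """EM estimate of the alternate allele frequency, assuming Hardy-Weinberg proportions."""
    f = 0.2  # arbitrary starting value
    for _ in range(n_iter):
        expected_alt = 0.0
        for l0, l1, l2 in likelihoods:
            # Posterior genotype weights under the current frequency estimate.
            post = (l0 * (1 - f) ** 2, l1 * 2 * f * (1 - f), l2 * f ** 2)
            expected_alt += (post[1] + 2 * post[2]) / sum(post)
        new_f = expected_alt / (2 * len(likelihoods))
        converged = abs(new_f - f) < tol
        f = new_f
        if converged:
            break
    return f


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_reads(n_samples, mean_depth, true_freq, rng, error=0.01):
    """Simulate (n_ref, n_alt) read counts per diploid sample at one site."""
    counts = []
    for _ in range(n_samples):
        g = (rng.random() < true_freq) + (rng.random() < true_freq)
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        depth = poisson(rng, mean_depth)
        n_alt = sum(rng.random() < p_alt for _ in range(depth))
        counts.append((depth - n_alt, n_alt))
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    reads = simulate_reads(n_samples=400, mean_depth=2.0, true_freq=0.3, rng=rng)
    estimate = em_allele_frequency([genotype_likelihoods(r, a) for r, a in reads])
    print(f"estimated allele frequency: {estimate:.3f}")
```

Samples with zero reads contribute flat likelihoods and therefore fall back on the prior, which is how this kind of estimator tolerates very low per-sample depth. Changing `n_samples` and `mean_depth` while holding their product fixed gives a rough way to explore the trade-off between sample number and depth that the abstract describes.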


Subject(s)
Metagenomics, Single Nucleotide Polymorphism, Gene Frequency, Genomics, Genotype, High-Throughput Nucleotide Sequencing, Single Nucleotide Polymorphism/genetics, Whole Genome Sequencing