1.
Am J Hum Genet ; 108(11): 2037-2051, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34626535

ABSTRACT

Anatomically modern humans evolved around 300 thousand years ago in Africa. They started to appear in the fossil record outside of Africa as early as 100 thousand years ago, although other hominins existed throughout Eurasia much earlier. Recently, several studies argued in favor of a single out-of-Africa event for modern humans on the basis of whole-genome sequence analyses. However, the single out-of-Africa model contrasts with some findings from the fossil record, which support two out-of-Africa events, and with uniparental data, which suggest a back-to-Africa movement. Here, we used a deep-learning approach coupled with approximate Bayesian computation and sequential Monte Carlo to revisit these hypotheses from the whole-genome sequence perspective. Our results support the back-to-Africa model over other alternatives. We estimated that two sequential separations between African and out-of-Africa populations occurred around 60-90 thousand years ago, separated by 13-15 thousand years. One of the populations resulting from the more recent split has replaced the older West African population to a large extent, while the other has founded the out-of-Africa populations.


Subject(s)
Deep Learning , Evolution, Molecular , Africa , Algorithms , Bayes Theorem , Fossils , Genetic Variation , Humans , Monte Carlo Method , Whole Genome Sequencing/methods
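The abstract above couples deep learning with approximate Bayesian computation (ABC) and sequential Monte Carlo. As a rough illustration of the ABC idea only, the sketch below implements plain rejection ABC on a toy problem (inferring the mean of a Gaussian). It is not the authors' pipeline: it has no neural network and no sequential Monte Carlo refinement, and every name in it (`abc_rejection`, `toy_example`, the prior range, the tolerance) is an assumption made for illustration.

```python
import random
import statistics
from typing import Callable, List


def abc_rejection(
    observed: float,
    simulate: Callable[[float, random.Random], float],
    prior_sample: Callable[[random.Random], float],
    epsilon: float,
    n_draws: int,
    rng: random.Random,
) -> List[float]:
    """Keep prior draws whose simulated summary lies within epsilon of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a parameter from the prior
        simulated = simulate(theta, rng)   # simulate a summary statistic
        if abs(simulated - observed) <= epsilon:
            accepted.append(theta)         # accepted draws approximate the posterior
    return accepted


def toy_example(seed: int = 0) -> List[float]:
    """Toy setup (hypothetical): unknown Gaussian mean, known sd = 1."""
    rng = random.Random(seed)

    def simulate(theta: float, r: random.Random) -> float:
        # Summary statistic: mean of 50 simulated observations.
        return statistics.fmean(r.gauss(theta, 1.0) for _ in range(50))

    def prior_sample(r: random.Random) -> float:
        return r.uniform(0.0, 10.0)        # flat prior on [0, 10]

    return abc_rejection(3.0, simulate, prior_sample, 0.2, 5000, rng)
```

Calling `toy_example()` returns the accepted parameter values, whose spread approximates the posterior around the observed summary of 3.0. In a demographic setting, `simulate` would be replaced by a coalescent simulator and the scalar summary by genome-wide summary statistics.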
2.
PLoS Comput Biol ; 19(10): e1011584, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37903158

ABSTRACT

Applications of generative models for genomic data have gained significant momentum in the past few years, with scopes ranging from data characterization to generation of genomic segments and functional sequences. In our previous study, we demonstrated that generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be used to create novel high-quality artificial genomes (AGs) which can preserve the complex characteristics of real genomes such as population structure, linkage disequilibrium and selection signals. However, a major drawback of these models is scalability, since the large feature space of genome-wide data increases computational complexity vastly. To address this issue, we implemented a novel convolutional Wasserstein GAN (WGAN) model along with a novel conditional RBM (CRBM) framework for generating AGs with high SNP number. These networks implicitly learn the varying landscape of haplotypic structure in order to capture complex correlation patterns along the genome and generate a wide diversity of plausible haplotypes. We performed comparative analyses to assess both the quality of these generated haplotypes and the amount of possible privacy leakage from the training data. As the importance of genetic privacy becomes more prevalent, the need for effective privacy protection measures for genomic data increases. We used generative neural networks to create large artificial genome segments which possess many characteristics of real genomes without substantial privacy leakage from the training dataset. In the near future, with further improvements in haplotype quality and privacy preservation, large-scale artificial genome databases can be assembled to provide easily accessible surrogates of real databases, allowing researchers to conduct studies with diverse genomic data within a safe ethical framework in terms of donor privacy.


Subject(s)
Genomics , Learning , Databases, Factual , Haplotypes , Neural Networks, Computer
3.
PLoS Genet ; 17(2): e1009303, 2021 02.
Article in English | MEDLINE | ID: mdl-33539374

ABSTRACT

Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although they would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with none to little privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we showed that imputation quality for low frequency alleles can be improved by data augmentation to reference panels with AGs and that the RBM latent space provides a relevant encoding of the data, hence allowing further exploration of the reference dataset and features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.


Subject(s)
Computer Simulation , Genome, Human , Machine Learning , Population/genetics , Algorithms , Alleles , Chromosomes, Human, Pair 15/genetics , Databases, Factual , Databases, Genetic , Deep Learning , HapMap Project , Humans , Markov Chains , Neural Networks, Computer , Polymorphism, Single Nucleotide
4.
Mol Biol Evol ; 36(8): 1628-1642, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30952160

ABSTRACT

Genetic variation in contemporary South Asian populations follows a northwest to southeast decreasing cline of shared West Eurasian ancestry. A growing body of ancient DNA evidence is being used to build increasingly more realistic models of demographic changes in the last few thousand years. Through high-quality modern genomes, these models can be tested for gene and genome level deviations. Using local ancestry deconvolution and masking, we reconstructed population-specific surrogates of the two main ancestral components for more than 500 samples from 25 South Asian populations and showed our approach to be robust via coalescent simulations. Our f3 and f4 statistics-based estimates reveal that the reconstructed haplotypes are good proxies for the source populations that admixed in the area and point to complex interpopulation relationships within the West Eurasian component, compatible with multiple waves of arrival, as opposed to a simpler one wave scenario. Our approach also provides reliable local haplotypes for future downstream analyses. As one such example, the local ancestry deconvolution in South Asians reveals opposite selective pressures on two pigmentation genes (SLC45A2 and SLC24A5) that are common or fixed in West Eurasians, suggesting post-admixture purifying and positive selection signals, respectively.


Subject(s)
Genome, Human , Genomics/methods , Adaptation, Biological , Demography , Haplotypes , Humans , India , Pakistan , Phylogeography , Polymorphism, Single Nucleotide , Principal Component Analysis , Selection, Genetic
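The abstract above relies on f3 and f4 statistics. A minimal sketch of their textbook definitions from per-SNP allele frequencies follows. It assumes frequencies are already estimated, omits the finite-sample bias correction and block-jackknife standard errors used in practice, and does not reproduce the study's own software; the function and argument names are hypothetical.

```python
from typing import Sequence


def _check_same_length(*freqs: Sequence[float]) -> int:
    """Return the shared number of SNPs, or raise if inputs disagree or are empty."""
    n = len(freqs[0])
    if n == 0 or any(len(f) != n for f in freqs):
        raise ValueError("all frequency vectors must be non-empty and equally long")
    return n


def f3(target: Sequence[float], source1: Sequence[float], source2: Sequence[float]) -> float:
    """f3(target; source1, source2): mean of (c - a) * (c - b) over SNPs.

    A clearly negative value is evidence that the target is admixed
    between populations related to the two sources.
    """
    n = _check_same_length(target, source1, source2)
    return sum((c - a) * (c - b) for c, a, b in zip(target, source1, source2)) / n


def f4(pop_a: Sequence[float], pop_b: Sequence[float],
       pop_c: Sequence[float], pop_d: Sequence[float]) -> float:
    """f4(A, B; C, D): mean of (a - b) * (c - d) over SNPs.

    Values near zero are expected when (A, B) and (C, D) form clades
    with no gene flow between them.
    """
    n = _check_same_length(pop_a, pop_b, pop_c, pop_d)
    return sum((a - b) * (c - d) for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)) / n
```

For instance, a target whose frequencies sit midway between two diverged sources, as in `f3([0.5, 0.5], [0.0, 0.2], [1.0, 0.8])`, yields a negative f3.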
5.
Annu Rev Biomed Data Sci ; 6: 173-189, 2023 08 10.
Article in English | MEDLINE | ID: mdl-37137168

ABSTRACT

Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.


Subject(s)
Genomics , Biological Evolution , Genomics/methods , Deep Learning
6.
Genome Biol Evol ; 13(4), 2021 04 05.
Article in English | MEDLINE | ID: mdl-33638983

ABSTRACT

Detecting natural selection signals in admixed populations can be problematic since the source of the signal typically dates back prior to the admixture event. On the one hand, it is now possible to study various source populations before a particular admixture thanks to the developments in ancient DNA (aDNA) in the last decade. However, aDNA availability is limited to certain geographical regions and the sample sizes and quality of the data might not be sufficient for selection analysis in many cases. In this study, we explore possible ways to improve detection of pre-admixture signals in admixed populations using a local ancestry inference approach. We used masked haplotypes for population branch statistic (PBS) and full haplotypes constructed following our approach from Yelmen et al. (2019) for cross-population extended haplotype homozygosity (XP-EHH), utilizing forward simulations to test the power of our analysis. The PBS results on simulated data showed that using masked haplotypes obtained from ancestry deconvolution instead of the admixed population might improve detection quality. On the other hand, XP-EHH results using the admixed population were better compared with the local ancestry method. We additionally report correlation for XP-EHH scores between source and admixed populations, suggesting that haplotype-based approaches must be used cautiously for recently admixed populations. Finally, we performed PBS on real South Asian populations masked with local ancestry deconvolution and report here the first possible selection signals on the autochthonous South Asian component of contemporary South Asian populations.


Subject(s)
Selection, Genetic , Asian People/genetics , Computer Simulation , Haplotypes , Humans , Polymorphism, Single Nucleotide
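The population branch statistic (PBS) used in the abstract above is commonly computed from three pairwise FST values, each first converted to a divergence time with T = -ln(1 - FST). A minimal sketch of that standard formula follows. It assumes the FST values are already estimated (for example from masked haplotypes, as in the study) and is not the authors' code; the function and argument names are hypothetical.

```python
import math


def fst_to_branch_length(fst: float) -> float:
    """Convert a pairwise FST into a divergence time: T = -ln(1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return -math.log(1.0 - fst)


def pbs(fst_target_ref: float, fst_target_out: float, fst_ref_out: float) -> float:
    """Population branch statistic for the target population.

    PBS = (T_target,ref + T_target,out - T_ref,out) / 2, i.e. the length of
    the branch leading to the target in a three-population tree.
    """
    t_tr = fst_to_branch_length(fst_target_ref)
    t_to = fst_to_branch_length(fst_target_out)
    t_ro = fst_to_branch_length(fst_ref_out)
    return (t_tr + t_to - t_ro) / 2.0
```

Large PBS values at a locus indicate allele-frequency change specific to the target lineage, which is why the statistic is used as a scan for positive selection.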
7.
Genome Biol Evol ; 13(4), 2021 04 05.
Article in English | MEDLINE | ID: mdl-33585906

ABSTRACT

Contemporary individuals are a combination of genetic fragments inherited from ancestors belonging to multiple populations, as the result of migration and admixture. Isolating and characterizing these layers is crucial to understanding the genetic history of a given population. Ancestry deconvolution approaches make use of a large number of source individuals, which constrains the performance of local ancestry inference when only a few genomes are available from a given population. Here we present WINC, a local ancestry framework derived from the combination of ChromoPainter and NNLS approaches, as a method to retrieve local genetic assignments when only a few reference individuals are available. The framework is aided by a score assignment based on source differentiation to maximize the number of sequences retrieved and is capable of retrieving accurate ancestry assignments when only two individuals per source population are used.


Subject(s)
Chromosome Painting/methods , Genomics , Humans , Inheritance Patterns , Least-Squares Analysis , Software
8.
Sci Rep ; 9(1): 18811, 2019 12 11.
Article in English | MEDLINE | ID: mdl-31827175

ABSTRACT

Genomic signatures of Eurasian origin in contemporary Ethiopians have been reported by several authors and are estimated to have arrived in the area from 3000 years ago. Several studies reported plausible source populations for such a signature, using haplotype-based methods on modern data or single-site methods on modern or ancient data. These studies did not reach a consensus and suggested an Anatolian or Sardinia-like proxy, broadly Levantine or Neolithic Levantine as possible sources. We demonstrate, however, that the deeply divergent, autochthonous African component, which accounts for ~50% of most contemporary Ethiopian genomes, affects the overall allele frequency spectrum to an extent that makes it hard to control for it and, at the same time, to discern between subtly different, yet important, Eurasian sources (such as Anatolian or Levant Neolithic ones). Here we re-assess patterns of allele sharing between the Eurasian component of Ethiopians (here called "NAF" for Non African) and ancient and modern proxies. Our results unveil a genomic legacy that may connect the Eurasian genetic component of contemporary Ethiopians with Sea People and with population movements that affected the Mediterranean area and the Levant after the fall of the Minoan civilization.


Subject(s)
Black People/genetics , Genetic Variation , Genome, Human , Human Migration , Asian People/genetics , Ethiopia , Gene Frequency , Genetics, Population , Genomics , Humans , Phylogeny