Search | VHL Regional Portal

New deep learning-based methods for visualizing ecosystem properties using environmental DNA metabarcoding data.

Lamperti, Letizia; Sanchez, Théophile; Si Moussi, Sara; Mouillot, David; Albouy, Camille; Flück, Benjamin; Bruno, Morgane; Valentini, Alice; Pellissier, Loïc; Manel, Stéphanie.

Mol Ecol Resour ; 23(8): 1946-1958, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37702270

ABSTRACT

Environmental DNA (eDNA) metabarcoding provides an efficient approach for documenting biodiversity patterns in marine and terrestrial ecosystems. The complexity of these data prevents current methods from extracting and analyzing all the relevant ecological information they contain, and new methods may provide better dimensionality reduction and clustering. Here we present two new deep learning-based methods that combine different types of neural networks (NNs) to ordinate eDNA samples and visualize ecosystem properties in a two-dimensional space: the first is based on variational autoencoders and the second on deep metric learning. The strength of our new methods lies in the combination of two inputs: the number of sequences found for each molecular operational taxonomic unit (MOTU) detected and their corresponding nucleotide sequence. Using three different datasets, we show that our methods accurately represent several biodiversity indicators in a two-dimensional latent space: MOTU richness per sample, sequence α-diversity per sample, Jaccard's and sequence β-diversity between samples. We show that our nonlinear methods are better at extracting features from eDNA datasets while avoiding the major biases associated with eDNA. Our methods outperform traditional dimension reduction methods such as Principal Component Analysis, t-distributed Stochastic Neighbour Embedding, Nonmetric Multidimensional Scaling and Uniform Manifold Approximation and Projection for dimension reduction. Our results suggest that NNs provide a more efficient way of extracting structure from eDNA metabarcoding data, thereby improving their ecological interpretation and thus biodiversity monitoring.

Subject(s)

DNA, Environmental , Deep Learning , Ecosystem , DNA Barcoding, Taxonomic/methods , Environmental Monitoring/methods , Biodiversity
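Two of the biodiversity indicators named in the abstract above, MOTU richness per sample and Jaccard's β-diversity between samples, can be computed directly from a table of read counts. The following is a minimal sketch in plain Python, assuming a hypothetical toy table; the sample names, MOTU names, and counts are illustrative and do not come from the paper, and the helper functions are not the authors' code.

```python
# Hypothetical toy data: read counts per MOTU for three eDNA samples.
# All names and numbers are illustrative assumptions.
counts = {
    "sample_A": {"motu_1": 120, "motu_2": 0, "motu_3": 15, "motu_4": 3},
    "sample_B": {"motu_1": 80, "motu_2": 42, "motu_3": 0, "motu_4": 7},
    "sample_C": {"motu_1": 0, "motu_2": 0, "motu_3": 9, "motu_4": 0},
}


def detected_motus(sample_counts):
    """Return the set of MOTUs with at least one read in a sample."""
    return {motu for motu, n_reads in sample_counts.items() if n_reads > 0}


def motu_richness(sample_counts):
    """MOTU richness: the number of distinct MOTUs detected in a sample."""
    return len(detected_motus(sample_counts))


def jaccard_dissimilarity(counts_a, counts_b):
    """Jaccard dissimilarity between two samples (presence/absence only)."""
    a, b = detected_motus(counts_a), detected_motus(counts_b)
    union = a | b
    if not union:
        # Two empty samples: treat them as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


richness = {name: motu_richness(c) for name, c in counts.items()}
beta_ab = jaccard_dissimilarity(counts["sample_A"], counts["sample_B"])
beta_bc = jaccard_dissimilarity(counts["sample_B"], counts["sample_C"])

print(richness)
print(beta_ab, beta_bc)
```

An ordination method represents these indicators well when distances between samples in the two-dimensional space track pairwise dissimilarities such as `beta_ab`.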

dnadna: a deep learning framework for population genetics inference.

Sanchez, Théophile; Bray, Erik Madison; Jobic, Pierre; Guez, Jérémy; Letournel, Anne-Catherine; Charpiat, Guillaume; Cury, Jean; Jay, Flora.

Bioinformatics ; 39(1), 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36445000

ABSTRACT

MOTIVATION: We present dnadna, a flexible Python-based software package for deep learning inference in population genetics. It is task-agnostic and aims at facilitating the development, reproducibility, dissemination and re-usability of neural networks designed for population genetic data.

RESULTS: dnadna defines multiple user-friendly workflows. First, users can implement new architectures and tasks while benefiting from dnadna's utility functions, training procedure and test environment, which saves time and decreases the likelihood of bugs. Second, the implemented networks can be re-optimized based on user-specified training sets and/or tasks. Newly implemented architectures and pre-trained networks are easily shareable with the community for further benchmarking or other applications. Finally, users can apply pre-trained networks to predict evolutionary history from alternative real or simulated genetic datasets, without requiring extensive knowledge of deep learning or of coding in general. dnadna comes with a peer-reviewed, exchangeable neural network that allows demographic inference from SNP data and can be used directly or retrained to solve other tasks. Toy networks are also available to ease exploration of the software, and we expect that the range of available architectures will keep expanding thanks to community contributions.

AVAILABILITY AND IMPLEMENTATION: dnadna is a Python (≥3.7) package; its repository is available at gitlab.com/mlgenetics/dnadna and its associated documentation at mlgenetics.gitlab.io/dnadna/.

Subject(s)

Deep Learning , Reproducibility of Results , Neural Networks, Computer , Software , Genetics, Population

Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation.

Sanchez, Théophile; Cury, Jean; Charpiat, Guillaume; Jay, Flora.

Mol Ecol Resour ; 21(8): 2645-2660, 2021 Nov.

Article in English | MEDLINE | ID: mdl-32644216

ABSTRACT

Over the past decades, simulation-based likelihood-free inference methods have enabled researchers to address numerous population genetics problems. As the richness and amount of simulated and real genetic data keep increasing, the field has a strong opportunity to tackle tasks that current methods struggle to solve. However, high data dimensionality forces most methods to summarize large genomic data sets into a relatively small number of handcrafted features (summary statistics). Here, we propose an alternative to summary statistics, based on the automatic extraction of relevant information using deep learning techniques. Specifically, we design artificial neural networks (ANNs) that take as input single nucleotide polymorphic sites (SNPs) found in individuals sampled from a single population and infer the past effective population size history. First, we provide guidelines to construct artificial neural networks that comply with the intrinsic properties of SNP data, such as invariance to permutation of haplotypes, long-range interactions between SNPs and variable genomic length. Thanks to a Bayesian hyperparameter optimization procedure, we evaluate the performance of multiple networks and compare them to well-established methods like Approximate Bayesian Computation (ABC). Even without the expert knowledge of summary statistics, our approach compares fairly well to an ABC approach based on handcrafted features. Furthermore, we show that combining deep learning and ABC can improve performance while taking advantage of both frameworks. Finally, we apply our approach to reconstruct the effective population size history of cattle breed populations.
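The invariance to permutation of haplotypes mentioned above can be illustrated with a small sketch. One common way to obtain it, assumed here and not confirmed by the abstract as the authors' design, is to apply the same function to every haplotype and then pool with a symmetric operation such as the mean. The weights, matrix sizes, and function names below are hypothetical.

```python
import math
import random


def haplotype_features(haplotype, weights):
    """Map one haplotype (a list of 0/1 alleles) to a feature vector.

    The same weights are shared across all haplotypes.
    """
    return [
        math.tanh(sum(w * allele for w, allele in zip(row, haplotype)))
        for row in weights
    ]


def invariant_embedding(snp_matrix, weights):
    """Average per-haplotype features; the mean ignores row order."""
    per_haplotype = [haplotype_features(h, weights) for h in snp_matrix]
    n = len(per_haplotype)
    # math.fsum returns a correctly rounded sum, independent of order.
    return [math.fsum(column) / n for column in zip(*per_haplotype)]


rng = random.Random(0)
# Toy SNP matrix: 5 haplotypes (rows) by 6 SNPs (columns).
snp_matrix = [[rng.randint(0, 1) for _ in range(6)] for _ in range(5)]
# Toy shared weights producing 3 features per haplotype.
weights = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(3)]

shuffled = snp_matrix[:]
rng.shuffle(shuffled)  # reorder haplotypes only; SNP columns are untouched

original = invariant_embedding(snp_matrix, weights)
permuted = invariant_embedding(shuffled, weights)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(original, permuted)))
```

Because the pooling runs over haplotypes only, the order of SNPs along the genome still matters, which is what allows a network built this way to model interactions between sites.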

Subject(s)

Deep Learning , Models, Genetic , Animals , Bayes Theorem , Cattle , Computer Simulation , Genetics, Population , Population Density
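For readers unfamiliar with Approximate Bayesian Computation, the idea of simulation-based, likelihood-free inference can be sketched with the simplest variant, rejection ABC. The example below assumes a toy Gaussian simulator and a single summary statistic; it is not a coalescent model and not the ABC procedure used in the paper, and every name and parameter value is hypothetical.

```python
import random


def simulate_summary(theta, rng, n_obs=50):
    """Toy simulator: mean of n_obs Gaussian draws centred on theta."""
    draws = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
    return sum(draws) / n_obs


def abc_rejection(observed, rng, n_draws=5000, tolerance=0.1, prior=(0.0, 10.0)):
    """Keep prior draws whose simulated summary is close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(*prior)  # sample a candidate from a uniform prior
        if abs(simulate_summary(theta, rng) - observed) < tolerance:
            accepted.append(theta)  # candidate is compatible with the data
    return accepted


rng = random.Random(42)
observed = simulate_summary(4.0, rng)  # pseudo-observed data, true theta = 4.0
accepted = abc_rejection(observed, rng)
posterior_mean = sum(accepted) / len(accepted)

print(len(accepted), posterior_mean)
```

The accepted values approximate the posterior distribution of `theta`. In this framing, a handcrafted summary statistic plays the role of `simulate_summary`'s output, and the abstract's proposal amounts to replacing or complementing such statistics with features learned by a neural network.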
