ImaGene: a convolutional neural network to quantify natural selection from genomic data.
Torada, Luis; Lorenzon, Lucrezia; Beddis, Alice; Isildak, Ulas; Pattini, Linda; Mathieson, Sara; Fumagalli, Matteo.
Affiliation
  • Torada L; Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY, UK.
  • Lorenzon L; Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY, UK; Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133, Italy.
  • Beddis A; Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY, UK.
  • Isildak U; Department of Biological Sciences, Middle East Technical University, METU Üniversiteler Mah. Dumlupinar Blv. No:1, Ankara, 06800 Çankaya, Turkey.
  • Pattini L; Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, Milan, 20133, Italy.
  • Mathieson S; Department of Computer Science, Swarthmore College, 500 College Ave, Swarthmore, 19081, PA, USA.
  • Fumagalli M; Department of Life Sciences, Silwood Park campus, Imperial College London, Buckhurst Road, Ascot, SL5 7PY, UK.
BMC Bioinformatics; 20(Suppl 9): 337, 2019 Nov 22.
Article in English | MEDLINE | ID: mdl-31757205
BACKGROUND: The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An evolutionary framework offers an alternative to classic association studies for determining such genetic bases. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, identifying selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, which results in a loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, that applies convolutional neural networks to population genomic data for the detection and quantification of natural selection.
RESULTS: ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network which is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how misspecification of the demographic model used to produce training data can influence the quantification of positive selection. We finally illustrate an approach to estimate the selection coefficient, a continuous variable, using multiclass classification techniques.
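The image representation, the row and column sorting, and the use of multiclass output to estimate a continuous variable can be illustrated with a minimal NumPy sketch. It does not reproduce ImaGene's code or API: the function names, the black/white encoding of a biallelic 0/1 haplotype matrix, the frequency-based sort orders, and the probability-weighted mean used as a point estimate are all assumptions made for illustration.

```python
import numpy as np

def to_image(haplotypes):
    """Encode a 0/1 haplotype matrix (rows = sampled haplotypes,
    columns = polymorphic sites) as a single-channel image.
    Assumed encoding: allele 0 -> 0 (black), allele 1 -> 255 (white)."""
    h = np.asarray(haplotypes, dtype=np.uint8)
    return h * 255

def sort_rows(image):
    """Group identical rows and place the most frequent haplotype first."""
    rows, counts = np.unique(image, axis=0, return_counts=True)
    order = np.argsort(-counts, kind="stable")
    return np.repeat(rows[order], counts[order], axis=0)

def sort_columns(image):
    """Order columns (sites) by decreasing frequency of allele 1."""
    freq = image.mean(axis=0)
    return image[:, np.argsort(-freq, kind="stable")]

def expected_value(class_probs, class_values):
    """Point estimate of a continuous parameter from multiclass output:
    the probability-weighted mean of the values assigned to each class."""
    p = np.asarray(class_probs, dtype=float)
    return float(np.dot(p / p.sum(), class_values))

# Hypothetical toy data: 4 haplotypes, 3 sites.
hap = np.array([[1, 0, 0],
                [0, 1, 1],
                [0, 1, 1],
                [0, 1, 0]])
image = sort_columns(sort_rows(to_image(hap)))

# Hypothetical classifier output over three selection-coefficient classes.
estimate = expected_value([0.1, 0.2, 0.7], [0, 200, 400])
```

Sorting gives the network a consistent ordering of haplotypes and sites, since the order of rows in a sample carries no biological meaning. In this sketch the class values 0, 200 and 400 are placeholders; any binning of the selection coefficient could be substituted.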
CONCLUSIONS: While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Selection, Genetic / Software / Neural Networks, Computer / Genomics / Databases, Genetic Study type: Prognostic_studies Limits: Humans Language: En Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2019 Document type: Article Country of publication: United Kingdom
