Search | BVS IEC

An atlas of genetic scores to predict multi-omic traits.

Xu, Yu; Ritchie, Scott C; Liang, Yujian; Timmers, Paul R H J; Pietzner, Maik; Lannelongue, Loïc; Lambert, Samuel A; Tahir, Usman A; May-Wilson, Sebastian; Foguet, Carles; Johansson, Åsa; Surendran, Praveen; Nath, Artika P; Persyn, Elodie; Peters, James E; Oliver-Williams, Clare; Deng, Shuliang; Prins, Bram; Luan, Jian'an; Bomba, Lorenzo; Soranzo, Nicole; Di Angelantonio, Emanuele; Pirastu, Nicola; Tai, E Shyong; van Dam, Rob M; Parkinson, Helen; Davenport, Emma E; Paul, Dirk S; Yau, Christopher; Gerszten, Robert E; Mälarstig, Anders; Danesh, John; Sim, Xueling; Langenberg, Claudia; Wilson, James F; Butterworth, Adam S; Inouye, Michael.

Nature ; 616(7955): 123-131, 2023 04.

Article in English | MEDLINE | ID: mdl-36991119

ABSTRACT

The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics [1]. Here we examine a large cohort (the INTERVAL study [2]; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank [3] to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal (https://www.omicspred.org/) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.

Subjects

Coronary Artery Disease, Multiomics, Humans, Coronary Artery Disease/genetics, Coronary Artery Disease/metabolism, Metabolomics/methods, Phenotype, Proteomics/methods, Machine Learning, Black or African American/genetics, Asian/genetics, European People/genetics, United Kingdom, Datasets as Topic, Internet, Reproducibility of Results, Cohort Studies, Proteome/analysis, Proteome/metabolism, Metabolome, Plasma/metabolism, Factual Databases
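
As a rough illustration of the two ideas named in the abstract above, a genetic score can be thought of as a weighted sum of allele dosages, and a Bonferroni adjustment divides the significance level by the number of tests. The sketch below is not the authors' pipeline: the function names, variant identifiers, weights and the significance level of 0.05 are all hypothetical, chosen only to show the arithmetic.

```python
def genetic_score(weights, dosages):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    Only variants present in both mappings contribute; how missing
    variants should be handled is an assumption made for this sketch.
    """
    shared = sorted(weights.keys() & dosages.keys())
    return sum(weights[v] * dosages[v] for v in shared)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni adjustment."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical per-variant weights for one molecular trait.
example_weights = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.05}
# Hypothetical genotype of one individual, as effect-allele dosages.
example_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = genetic_score(example_weights, example_dosages)
# 17,227 is the number of traits reported in the abstract; 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 17227)
```

In practice the weights would come from a trained model and the score would be computed over many more variants; the portal mentioned in the abstract is the place to look for the actual scores.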

Pitfalls of machine learning models for protein-protein interaction networks.

Lannelongue, Loïc; Inouye, Michael.

Bioinformatics ; 40(2)2024 02 01.

Article in English | MEDLINE | ID: mdl-38200587

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between algorithms that remain unexplained. RESULTS: To better understand the underlying inference mechanisms that underpin these models, we designed an open-source framework for benchmarking that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use it to shed light on the impact of network topology and how different algorithms deal with highly connected proteins. By studying functional genomics-based and sequence-based models on human PPIs, we show their complementarity as the former performs best on lone proteins while the latter specializes in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results between both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a principled foundation for future construction, comparison, and application of PPI networks. AVAILABILITY AND IMPLEMENTATION: The code and data are available on GitHub: https://github.com/Llannelongue/B4PPI.

Subjects

Protein Interaction Maps, Saccharomyces cerevisiae, Humans, Protein Interaction Maps/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Reproducibility of Results, Proteins/metabolism, Algorithms, Machine Learning, Protein Interaction Mapping/methods

The Carbon Footprint of Bioinformatics.

Grealey, Jason; Lannelongue, Loïc; Saw, Woei-Yuh; Marten, Jonathan; Méric, Guillaume; Ruiz-Carmona, Sergio; Inouye, Michael.

Mol Biol Evol ; 39(3)2022 03 02.

Article in English | MEDLINE | ID: mdl-35143670

ABSTRACT

Bioinformatic research relies on large-scale computational infrastructures which have a nonzero carbon footprint but so far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and simple software upgrades could make it greener, for example, upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm's greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.

Subjects

Carbon Footprint, Computational Biology, Algorithms, Genome-Wide Association Study, Software

Ten simple rules to make your computing more environmentally sustainable.

Lannelongue, Loïc; Grealey, Jason; Bateman, Alex; Inouye, Michael.

PLoS Comput Biol ; 17(9): e1009324, 2021 09.

Article in English | MEDLINE | ID: mdl-34543272

Subjects

Computers, Conservation of Natural Resources, Guidelines as Topic, Information Technology, Carbon Dioxide/analysis, Climate Change, Electronic Waste, Humans, Recycling

Environmental Impacts of Machine Learning Applications in Protein Science.

Lannelongue, Loïc; Inouye, Michael.

Cold Spring Harb Perspect Biol ; 15(12)2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38040454

ABSTRACT

Computing tools and machine learning models play an increasingly important role in biology and are now an essential part of discoveries in protein science. The growing energy needs of modern algorithms have raised concerns in the computational science community in light of the climate emergency. In this work, we summarize the different ways in which protein science can negatively impact the environment and we present the carbon footprint of some popular protein algorithms: molecular simulations, inference of protein-protein interactions, and protein structure prediction. We show that large deep learning models such as AlphaFold and ESMFold can have carbon footprints reaching over 100 tonnes of CO2e in some cases. The magnitude of these impacts highlights the importance of monitoring and mitigating them, and we list actions scientists can take to achieve more sustainable protein computational science.

Subjects

Carbon Footprint, Machine Learning, Algorithms, Proteins

How to estimate carbon footprint when training deep learning models? A guide and review.

Bouza, Lucía; Bugeau, Aurélie; Lannelongue, Loïc.

Environ Res Commun ; 5(11): 115014, 2023 Nov 01.

Article in English | MEDLINE | ID: mdl-38022395

ABSTRACT

Machine learning and deep learning models have become essential in the recent fast development of artificial intelligence in many sectors of society. It is now widely acknowledged that the development of these models has an environmental cost that has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we propose a comprehensive introduction and comparison of these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary and the technical requirements for each tool. We compare the energy consumption estimated by each tool on two deep neural networks for image processing and on different types of servers. From these experiments, we provide some advice for better choosing the right tool and infrastructure.

GREENER principles for environmentally sustainable computational science.

Lannelongue, Loïc; Aronson, Hans-Erik G; Bateman, Alex; Birney, Ewan; Caplan, Talia; Juckes, Martin; McEntyre, Johanna; Morris, Andrew D; Reilly, Gerry; Inouye, Michael.

Nat Comput Sci ; 3(6): 514-521, 2023 Jun.

Article in English | MEDLINE | ID: mdl-38177425

ABSTRACT

The carbon footprint of scientific computing is substantial, but environmentally sustainable computational science (ESCS) is a nascent field with many opportunities to thrive. To realize the immense green opportunities and continued, yet sustainable, growth of computer science, we must take a coordinated approach to our current challenges, including greater awareness and transparency, improved estimation and wider reporting of environmental impacts. Here, we present a snapshot of where ESCS stands today and introduce the GREENER set of principles, as well as guidance for best practices moving forward.

Green Algorithms: Quantifying the Carbon Footprint of Computation.

Lannelongue, Loïc; Grealey, Jason; Inouye, Michael.

Adv Sci (Weinh) ; 8(12): 2100707, 2021 06.

Article in English | MEDLINE | ID: mdl-34194954

ABSTRACT

Climate change is profoundly affecting nearly all aspects of life on earth, including human societies, economies, and health. Various human activities are responsible for significant greenhouse gas (GHG) emissions, including data centers and other sources of large-scale computation. Although many important scientific milestones are achieved thanks to the development of high-performance computing, the resultant environmental impact is underappreciated. In this work, a methodological framework to estimate the carbon footprint of any computational task in a standardized and reliable way is presented and metrics to contextualize GHG emissions are defined. A freely available online tool, Green Algorithms (www.green-algorithms.org) is developed, which enables a user to estimate and report the carbon footprint of their computation. The tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of hardware configurations. Finally, the GHG emissions of algorithms used for particle physics simulations, weather forecasts, and natural language processing are quantified. Taken together, this study develops a simple generalizable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with recommendations to minimize unnecessary CO2 emissions, the authors hope to raise awareness and facilitate greener computation.
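
The abstract above does not spell out the estimation framework, so the following is only a minimal sketch of one common way to decompose such an estimate: energy used (runtime times power draw of cores and memory, scaled by a data-centre overhead factor) multiplied by the carbon intensity of the electricity. The function name, parameter names and all numeric values are assumptions for illustration, not figures from the paper or the Green Algorithms tool.

```python
def carbon_footprint_gco2e(
    runtime_h,
    n_cores,
    core_power_w,
    core_usage,
    memory_gb,
    memory_power_w_per_gb,
    pue,
    carbon_intensity_g_per_kwh,
):
    """Estimate the footprint of one computation in grams of CO2 equivalent.

    pue is the data-centre overhead factor (power usage effectiveness);
    core_usage is the fraction of the cores' power actually drawn (0 to 1).
    """
    # Power drawn by the processing cores and by memory, in watts.
    power_w = n_cores * core_power_w * core_usage + memory_gb * memory_power_w_per_gb
    # Energy in kWh, including the data-centre overhead.
    energy_kwh = runtime_h * power_w * pue / 1000
    # Convert energy to emissions with the local carbon intensity.
    return energy_kwh * carbon_intensity_g_per_kwh


# Illustrative inputs only: a 10-hour job on 8 fully used cores with 32 GB of memory.
estimate = carbon_footprint_gco2e(
    runtime_h=10,
    n_cores=8,
    core_power_w=10.0,
    core_usage=1.0,
    memory_gb=32,
    memory_power_w_per_gb=0.4,
    pue=1.5,
    carbon_intensity_g_per_kwh=400.0,
)
```

With these made-up inputs the estimate comes to roughly 557 gCO2e. A decomposition like this needs only the runtime and a hardware description, which is consistent with the abstract's point that the estimate does not interfere with existing code.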

Gene Regulatory Networks to Explain Coronary Artery Disease Heritability.

Inouye, Michael; Lannelongue, Loïc.

J Am Coll Cardiol ; 73(23): 2958-2960, 2019 06 18.

Article in English | MEDLINE | ID: mdl-31196452

Subjects

Coronary Artery Disease, Gene Regulatory Networks, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans
