Pesquisa | BVS Integralidade em Saúde

1.

Evidence of epistasis in regions of long-range linkage disequilibrium across five complex diseases in the UK Biobank and eMERGE datasets.

Singhal, Pankhuri; Veturi, Yogasudha; Dudek, Scott M; Lucas, Anastasia; Frase, Alex; van Steen, Kristel; Schrodi, Steven J; Fasel, David; Weng, Chunhua; Pendergrass, Rion; Schaid, Daniel J; Kullo, Iftikhar J; Dikilitas, Ozan; Sleiman, Patrick M A; Hakonarson, Hakon; Moore, Jason H; Williams, Scott M; Ritchie, Marylyn D; Verma, Shefali S.

Am J Hum Genet ; 110(4): 575-591, 2023 04 06.

Artigo em Inglês | MEDLINE | ID: mdl-37028392

RESUMO

Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for invesigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.

Assuntos

Epistasia Genética , Estudo de Associação Genômica Ampla , Desequilíbrio de Ligação/genética , Genótipo , Bancos de Espécimes Biológicos , Reino Unido , Polimorfismo de Nucleotídeo Único/genética

2.

Electronic health records and polygenic risk scores for predicting disease risk.

Li, Ruowang; Chen, Yong; Ritchie, Marylyn D; Moore, Jason H.

Nat Rev Genet ; 21(8): 493-502, 2020 08.

Artigo em Inglês | MEDLINE | ID: mdl-32235907

RESUMO

Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction.

Assuntos

Registros Eletrônicos de Saúde , Estudos de Associação Genética , Predisposição Genética para Doença , Herança Multifatorial , Algoritmos , Biologia Computacional/métodos , Estudo de Associação Genômica Ampla , Genômica/métodos , Genótipo , Humanos , Fenótipo , Reprodutibilidade dos Testes , Medição de Risco , Fatores de Risco

3.

KRAGEN: a knowledge graph-enhanced RAG framework for biomedical problem solving using large language models.

Matsumoto, Nicholas; Moran, Jay; Choi, Hyunjun; Hernandez, Miguel E; Venkatesan, Mythreye; Wang, Paul; Moore, Jason H.

Bioinformatics ; 40(6)2024 Jun 03.

Artigo em Inglês | MEDLINE | ID: mdl-38830083

RESUMO

MOTIVATION: Answering and solving complex problems using a large language model (LLM) given a certain domain such as biomedicine is a challenging task that requires both factual consistency and logic, and LLMs often suffer from some major limitations, such as hallucinating false or irrelevant information, or being influenced by noisy data. These issues can compromise the trustworthiness, accuracy, and compliance of LLM-generated text and insights. RESULTS: Knowledge Retrieval Augmented Generation ENgine (KRAGEN) is a new tool that combines knowledge graphs, Retrieval Augmented Generation (RAG), and advanced prompting techniques to solve complex problems with natural language. KRAGEN converts knowledge graphs into a vector database and uses RAG to retrieve relevant facts from it. KRAGEN uses advanced prompting techniques: namely graph-of-thoughts (GoT), to dynamically break down a complex problem into smaller subproblems, and proceeds to solve each subproblem by using the relevant knowledge through the RAG framework, which limits the hallucinations, and finally, consolidates the subproblems and provides a solution. KRAGEN's graph visualization allows the user to interact with and evaluate the quality of the solution's GoT structure and logic. AVAILABILITY AND IMPLEMENTATION: KRAGEN is deployed by running its custom Docker containers. KRAGEN is available as open-source from GitHub at: https://github.com/EpistasisLab/KRAGEN.

Assuntos

Software , Processamento de Linguagem Natural , Resolução de Problemas , Algoritmos , Armazenamento e Recuperação da Informação/métodos , Humanos , Biologia Computacional/métodos , Bases de Dados Factuais

4.

Artificial intelligence: revolutionizing cardiology with large language models.

Boonstra, Machteld J; Weissenbacher, Davy; Moore, Jason H; Gonzalez-Hernandez, Graciela; Asselbergs, Folkert W.

Eur Heart J ; 45(5): 332-345, 2024 Feb 01.

Artigo em Inglês | MEDLINE | ID: mdl-38170821

RESUMO

Natural language processing techniques are having an increasing impact on clinical care from patient, clinician, administrator, and research perspective. Among others are automated generation of clinical notes and discharge letters, medical term coding for billing, medical chatbots both for patients and clinicians, data enrichment in the identification of disease symptoms or diagnosis, cohort selection for clinical trial, and auditing purposes. In the review, an overview of the history in natural language processing techniques developed with brief technical background is presented. Subsequently, the review will discuss implementation strategies of natural language processing tools, thereby specifically focusing on large language models, and conclude with future opportunities in the application of such techniques in the field of cardiology.

Assuntos

Inteligência Artificial , Cardiologia , Humanos , Processamento de Linguagem Natural , Alta do Paciente

5.

Aliro: an automated machine learning tool leveraging large language models.

Choi, Hyunjun; Moran, Jay; Matsumoto, Nicholas; Hernandez, Miguel E; Moore, Jason H.

Bioinformatics ; 39(10)2023 10 03.

Artigo em Inglês | MEDLINE | ID: mdl-37796839

RESUMO

MOTIVATION: Biomedical and healthcare domains generate vast amounts of complex data that can be challenging to analyze using machine learning tools, especially for researchers without computer science training. RESULTS: Aliro is an open-source software package designed to automate machine learning analysis through a clean web interface. By infusing the power of large language models, the user can interact with their data by seamlessly retrieving and executing code pulled from the large language model, accelerating automated discovery of new insights from data. Aliro includes a pre-trained machine learning recommendation system that can assist the user to automate the selection of machine learning algorithms and its hyperparameters and provides visualization of the evaluated model and data. AVAILABILITY AND IMPLEMENTATION: Aliro is deployed by running its custom Docker containers. Aliro is available as open-source from GitHub at: https://github.com/EpistasisLab/Aliro.

Assuntos

Algoritmos , Software , Aprendizado de Máquina , Idioma

6.

Ten simple rules for managing laboratory information.

Berezin, Casey-Tyler; Aguilera, Luis U; Billerbeck, Sonja; Bourne, Philip E; Densmore, Douglas; Freemont, Paul; Gorochowski, Thomas E; Hernandez, Sarah I; Hillson, Nathan J; King, Connor R; Köpke, Michael; Ma, Shuyi; Miller, Katie M; Moon, Tae Seok; Moore, Jason H; Munsky, Brian; Myers, Chris J; Nicholas, Dequina A; Peccoud, Samuel J; Zhou, Wen; Peccoud, Jean.

PLoS Comput Biol ; 19(12): e1011652, 2023 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-38060459

RESUMO

Information is the cornerstone of research, from experimental (meta)data and computational processes to complex inventories of reagents and equipment. These 10 simple rules discuss best practices for leveraging laboratory information management systems to transform this large information load into useful scientific findings.

7.

Preference matrix guided sparse canonical correlation analysis for mining brain imaging genetic associations in Alzheimer's disease.

Sha, Jiahang; Bao, Jingxuan; Liu, Kefei; Yang, Shu; Wen, Zixuan; Wen, Junhao; Cui, Yuhan; Tong, Boning; Moore, Jason H; Saykin, Andrew J; Davatzikos, Christos; Long, Qi; Shen, Li.

Methods ; 218: 27-38, 2023 10.

Artigo em Inglês | MEDLINE | ID: mdl-37507059

RESUMO

Investigating the relationship between genetic variation and phenotypic traits is a key issue in quantitative genetics. Specifically for Alzheimer's disease, the association between genetic markers and quantitative traits remains vague while, once identified, will provide valuable guidance for the study and development of genetics-based treatment approaches. Currently, to analyze the association of two modalities, sparse canonical correlation analysis (SCCA) is commonly used to compute one sparse linear combination of the variable features for each modality, giving a pair of linear combination vectors in total that maximizes the cross-correlation between the analyzed modalities. One drawback of the plain SCCA model is that the existing findings and knowledge cannot be integrated into the model as priors to help extract interesting correlations as well as identify biologically meaningful genetic and phenotypic markers. To bridge this gap, we introduce preference matrix guided SCCA (PM-SCCA) that not only takes priors encoded as a preference matrix but also maintains computational simplicity. A simulation study and a real-data experiment are conducted to investigate the effectiveness of the model. Both experiments demonstrate that the proposed PM-SCCA model can capture not only genotype-phenotype correlation but also relevant features effectively.

Assuntos

Doença de Alzheimer , Neuroimagem , Humanos , Neuroimagem/métodos , Análise de Correlação Canônica , Algoritmos , Doença de Alzheimer/diagnóstico por imagem , Doença de Alzheimer/genética , Encéfalo , Imageamento por Ressonância Magnética

8.

Novel EDGE encoding method enhances ability to identify genetic interactions.

Hall, Molly A; Wallace, John; Lucas, Anastasia M; Bradford, Yuki; Verma, Shefali S; Müller-Myhsok, Bertram; Passero, Kristin; Zhou, Jiayan; McGuigan, John; Jiang, Beibei; Pendergrass, Sarah A; Zhang, Yanfei; Peissig, Peggy; Brilliant, Murray; Sleiman, Patrick; Hakonarson, Hakon; Harley, John B; Kiryluk, Krzysztof; Van Steen, Kristel; Moore, Jason H; Ritchie, Marylyn D.

PLoS Genet ; 17(6): e1009534, 2021 06.

Artigo em Inglês | MEDLINE | ID: mdl-34086673

RESUMO

Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.

Assuntos

Modelos Genéticos , Catarata/genética , Conjuntos de Dados como Assunto , Diabetes Mellitus Tipo 2/genética , Frequência do Gene , Estudo de Associação Genômica Ampla , Glaucoma/genética , Humanos , Hipertensão/genética , Degeneração Macular/genética , Fenótipo , Polimorfismo de Nucleotídeo Único

9.

The Alzheimer's Knowledge Base: A Knowledge Graph for Alzheimer Disease Research.

Romano, Joseph D; Truong, Van; Kumar, Rachit; Venkatesan, Mythreye; Graham, Britney E; Hao, Yun; Matsumoto, Nick; Li, Xi; Wang, Zhiping; Ritchie, Marylyn D; Shen, Li; Moore, Jason H.

J Med Internet Res ; 26: e46777, 2024 Apr 18.

Artigo em Inglês | MEDLINE | ID: mdl-38635981

RESUMO

BACKGROUND: As global populations age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture relationships central to the disease's etiology and response to drugs. OBJECTIVE: We designed the Alzheimer's Knowledge Base (AlzKB) to alleviate this need by providing a comprehensive knowledge representation of AD etiology and candidate therapeutics. METHODS: We designed the AlzKB as a large, heterogeneous graph knowledge base assembled using 22 diverse external data sources describing biological and pharmaceutical entities at different levels of organization (eg, chemicals, genes, anatomy, and diseases). AlzKB uses a Web Ontology Language 2 ontology to enforce semantic consistency and allow for ontological inference. We provide a public version of AlzKB and allow users to run and modify local versions of the knowledge base. RESULTS: AlzKB is freely available on the web and currently contains 118,902 entities with 1,309,527 relationships between those entities. To demonstrate its value, we used graph data science and machine learning to (1) propose new therapeutic targets based on similarities of AD to Parkinson disease and (2) repurpose existing drugs that may treat AD. For each use case, AlzKB recovers known therapeutic associations while proposing biologically plausible new ones. CONCLUSIONS: AlzKB is a new, publicly available knowledge resource that enables researchers to discover complex translational associations for AD drug discovery. Through 2 use cases, we show that it is a valuable tool for proposing novel therapeutic hypotheses based on public biomedical knowledge.

Assuntos

Doença de Alzheimer , Humanos , Doença de Alzheimer/tratamento farmacológico , Doença de Alzheimer/genética , Reconhecimento Automatizado de Padrão , Bases de Conhecimento , Aprendizado de Máquina , Conhecimento

10.

Artificial Intelligence and Technology Collaboratories: Innovating aging research and Alzheimer's care.

Abadir, Peter; Oh, Esther; Chellappa, Rama; Choudhry, Niteesh; Demiris, George; Ganesan, Deepak; Karlawish, Jason; Marlin, Benjamin; Li, Rose M; Dehak, Najim; Arbaje, Alicia; Unberath, Mathias; Cudjoe, Thomas; Chute, Christopher; Moore, Jason H; Phan, Phillip; Samus, Quincy; Schoenborn, Nancy L; Battle, Alexis; Walston, Jeremy D.

Alzheimers Dement ; 20(4): 3074-3079, 2024 04.

Artigo em Inglês | MEDLINE | ID: mdl-38324244

RESUMO

This perspective outlines the Artificial Intelligence and Technology Collaboratories (AITC) at Johns Hopkins University, University of Pennsylvania, and University of Massachusetts, highlighting their roles in developing AI-based technologies for older adult care, particularly targeting Alzheimer's disease (AD). These National Institute on Aging (NIA) centers foster collaboration among clinicians, gerontologists, ethicists, business professionals, and engineers to create AI solutions. Key activities include identifying technology needs, stakeholder engagement, training, mentoring, data integration, and navigating ethical challenges. The objective is to apply these innovations effectively in real-world scenarios, including in rural settings. In addition, the AITC focuses on developing best practices for AI application in the care of older adults, facilitating pilot studies, and addressing ethical concerns related to technology development for older adults with cognitive impairment, with the ultimate aim of improving the lives of older adults and their caregivers. HIGHLIGHTS: Addressing the complex needs of older adults with Alzheimer's disease (AD) requires a comprehensive approach, integrating medical and social support. Current gaps in training, techniques, tools, and expertise hinder uniform access across communities and health care settings. Artificial intelligence (AI) and digital technologies hold promise in transforming care for this demographic. Yet, transitioning these innovations from concept to marketable products presents significant challenges, often stalling promising advancements in the developmental phase. The Artificial Intelligence and Technology Collaboratories (AITC) program, funded by the National Institute on Aging (NIA), presents a viable model. These Collaboratories foster the development and implementation of AI methods and technologies through projects aimed at improving care for older Americans, particularly those with AD, and promote the sharing of best practices in AI and technology integration. Why Does This Matter? The National Institute on Aging (NIA) Artificial Intelligence and Technology Collaboratories (AITC) program's mission is to accelerate the adoption of artificial intelligence (AI) and new technologies for the betterment of older adults, especially those with dementia. By bridging scientific and technological expertise, fostering clinical and industry partnerships, and enhancing the sharing of best practices, this program can significantly improve the health and quality of life for older adults with Alzheimer's disease (AD).

Assuntos

Doença de Alzheimer , Isotiocianatos , Estados Unidos , Humanos , Idoso , Doença de Alzheimer/terapia , Inteligência Artificial , Gerociência , Qualidade de Vida , Tecnologia

11.

Genetic heterogeneity: Challenges, impacts, and methods through an associative lens.

Woodward, Alexa A; Urbanowicz, Ryan J; Naj, Adam C; Moore, Jason H.

Genet Epidemiol ; 46(8): 555-571, 2022 12.

Artigo em Inglês | MEDLINE | ID: mdl-35924480

RESUMO

Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial to pursuing the goals of precision medicine, for discovering novel disease biomarkers, and for identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.

Assuntos

Heterogeneidade Genética , Medicina de Precisão , Humanos , Medicina de Precisão/métodos , Aprendizado de Máquina , Fenótipo

12.

PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods.

Romano, Joseph D; Le, Trang T; La Cava, William; Gregg, John T; Goldberg, Daniel J; Chakraborty, Praneel; Ray, Natasha L; Himmelstein, Daniel; Fu, Weixuan; Moore, Jason H.

Bioinformatics ; 38(3): 878-880, 2022 01 12.

Artigo em Inglês | MEDLINE | ID: mdl-34677586

RESUMO

MOTIVATION: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. RESULTS: This release of PMLB (Penn Machine Learning Benchmarks) provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community. AVAILABILITY AND IMPLEMENTATION: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.

Assuntos

Benchmarking , Software , Aprendizado de Máquina , Modelos Estatísticos

13.

Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease.

van Smeden, Maarten; Heinze, Georg; Van Calster, Ben; Asselbergs, Folkert W; Vardas, Panos E; Bruining, Nico; de Jaegere, Peter; Moore, Jason H; Denaxas, Spiros; Boulesteix, Anne Laure; Moons, Karel G M.

Eur Heart J ; 43(31): 2921-2930, 2022 08 14.

Artigo em Inglês | MEDLINE | ID: mdl-35639667

RESUMO

The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of such AI-based prediction model tools and software in cardiovascular patient care, the cardiovascular researcher and healthcare professional are challenged to understand the opportunities as well as the limitations of the AI-based predictions. In this article, we present 12 critical questions for cardiovascular health professionals to ask when confronted with an AI-based prediction model. We aim to support medical professionals to distinguish the AI-based prediction models that can add value to patient care from the AI that does not.

Assuntos

Inteligência Artificial , Doenças Cardiovasculares , Pessoal de Saúde , Humanos , Software

14.

The Translational Machine: A novel machine-learning approach to illuminate complex genetic architectures.

Askland, Kathleen D; Strong, David; Wright, Marvin N; Moore, Jason H.

Genet Epidemiol ; 45(5): 485-536, 2021 07.

Artigo em Inglês | MEDLINE | ID: mdl-33942369

RESUMO

The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components. First, replicable but flexible feature engineering procedures translate genome-scale data into biologically informative features that appropriately contextualize simple variant calls/genotypes within biological and functional contexts. Second, model-free, nonparametric ML-based feature filtering procedures empirically reduce dimensionality and noise of both original genotype calls and engineered features. Third, a powerful ML algorithm for feature selection is used to differentiate risk variant contributions across variant frequency and functional prediction spectra. The TM simultaneously evaluates potential contributions of variants operative under polygenic and heterogeneous models of genetic architecture. Our TM enables integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to geneset-/pathways-based and collapsing methods, but overcomes some of these methods' limitations. The full TM pipeline is executed in R. Our approach and initial findings from its application to a whole-exome schizophrenia case-control data set are presented. These TM procedures extend the findings of the primary investigation and yield novel results.

Assuntos

Aprendizado de Máquina , Modelos Genéticos , Algoritmos , Genômica , Genótipo , Humanos

15.

The promise of automated machine learning for the genetic analysis of complex traits.

Manduchi, Elisabetta; Romano, Joseph D; Moore, Jason H.

Hum Genet ; 141(9): 1529-1544, 2022 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-34713318

RESUMO

The genetic analysis of complex traits has been dominated by parametric statistical methods due to their theoretical properties, ease of use, computational efficiency, and intuitive interpretation. However, there are likely to be patterns arising from complex genetic architectures which are more easily detected and modeled using machine learning methods. Unfortunately, selecting the right machine learning algorithm and tuning its hyperparameters can be daunting for experts and non-experts alike. The goal of automated machine learning (AutoML) is to let a computer algorithm identify the right algorithms and hyperparameters thus taking the guesswork out of the optimization process. We review the promises and challenges of AutoML for the genetic analysis of complex traits and give an overview of several approaches and some example applications to omics data. It is our hope that this review will motivate studies to develop and evaluate novel AutoML methods and software in the genetics and genomics space. The promise of AutoML is to enable anyone, regardless of training or expertise, to apply machine learning as part of their genetic analysis strategy.

Assuntos

Aprendizado de Máquina , Herança Multifatorial , Algoritmos , Genômica/métodos , Humanos , Software

16.

treeheatr: an R package for interpretable decision tree visualizations.

Le, Trang T; Moore, Jason H.

Bioinformatics ; 37(2): 282-284, 2021 04 19.

Artigo em Inglês | MEDLINE | ID: mdl-32702108

RESUMO

SUMMARY: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students' understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. AVAILABILITY AND IMPLEMENTATION: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.

Assuntos

Aprendizado de Máquina , Software , Árvores de Decisões , Humanos

17.

Evaluating recommender systems for AI-driven biomedical informatics.

La Cava, William; Williams, Heather; Fu, Weixuan; Vitale, Steve; Srivatsan, Durga; Moore, Jason H.

Bioinformatics ; 37(2): 250-256, 2021 04 19.

Artigo em Inglês | MEDLINE | ID: mdl-32766825

RESUMO

MOTIVATION: Many researchers with domain expertise are unable to easily apply machine learning (ML) to their bioinformatics data due to a lack of ML and/or coding expertise. Methods that have been proposed thus far to automate ML mostly require programming experience as well as expert knowledge to tune and apply the algorithms correctly. Here, we study a method of automating biomedical data science using a web-based AI platform to recommend model choices and conduct experiments. We have two goals in mind: first, to make it easy to construct sophisticated models of biomedical processes; and second, to provide a fully automated AI agent that can choose and conduct promising experiments for the user, based on the user's experiments as well as prior knowledge. To validate this framework, we conduct an experiment on 165 classification problems, comparing to state-of-the-art, automated approaches. Finally, we use this tool to develop predictive models of septic shock in critical care patients. RESULTS: We find that matrix factorization-based recommendation systems outperform metalearning methods for automating ML. This result mirrors the results of earlier recommender systems research in other domains. The proposed AI is competitive with state-of-the-art automated ML methods in terms of choosing optimal algorithm configurations for datasets. In our application to prediction of septic shock, the AI-driven analysis produces a competent ML model (AUROC 0.85±0.02) that performs on par with state-of-the-art deep learning results for this task, with much less computational effort. AVAILABILITY AND IMPLEMENTATION: PennAI is available free of charge and open-source. It is distributed under the GNU public license (GPL) version 3. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Algoritmos , Aprendizado de Máquina , Humanos , Informática

18.

Estimating prevalence of human traits among populations from polygenic risk scores.

Graham, Britney E; Plotkin, Brian; Muglia, Louis; Moore, Jason H; Williams, Scott M.

Hum Genomics ; 15(1): 70, 2021 12 13.

Artigo em Inglês | MEDLINE | ID: mdl-34903281

RESUMO

The genetic basis of phenotypic variation across populations has not been well explained for most traits. Several factors may cause disparities, from variation in environments to divergent population genetic structure. We hypothesized that a population-level polygenic risk score (PRS) can explain phenotypic variation among geographic populations based solely on risk allele frequencies. We applied a population-specific PRS (psPRS) to 26 populations from the 1000 Genomes to four phenotypes: lactase persistence (LP), melanoma, multiple sclerosis (MS) and height. Our models assumed additive genetic architecture among the polymorphisms in the psPRSs, as is convention. Linear psPRSs explained a significant proportion of trait variance ranging from 0.32 for height in men to 0.88 for melanoma. The best models for LP and height were linear, while those for melanoma and MS were nonlinear. As not all variants in a PRS may confer similar, or even any, risk among diverse populations, we also filtered out SNPs to assess whether variance explained was improved using psPRSs with fewer SNPs. Variance explained usually improved with fewer SNPs in the psPRS and was as high as 0.99 for height in men using only 548 of the initial 4208 SNPs. That reducing SNPs improves psPRSs performance may indicate that missing heritability is partially due to complex architecture that does not mandate additivity, undiscovered variants or spurious associations in the databases. We demonstrated that PRS-based analyses can be used across diverse populations and phenotypes for population prediction and that these comparisons can identify the universal risk variants.

Assuntos

Herança Multifatorial , Polimorfismo de Nucleotídeo Único , Estudo de Associação Genômica Ampla , Humanos , Herança Multifatorial/genética , Fenótipo , Polimorfismo de Nucleotídeo Único/genética , Prevalência , Fatores de Risco

19.

Automating Predictive Toxicology Using ComptoxAI.

Romano, Joseph D; Hao, Yun; Moore, Jason H; Penning, Trevor M.

Chem Res Toxicol ; 35(8): 1370-1382, 2022 08 15.

Artigo em Inglês | MEDLINE | ID: mdl-35819939

RESUMO

ComptoxAI is a new data infrastructure for computational and artificial intelligence research in predictive toxicology. Here, we describe and showcase ComptoxAI's graph-structured knowledge base in the context of three real-world use-cases, demonstrating that it can rapidly answer complex questions about toxicology that are infeasible using previous technologies and data resources. These use-cases each demonstrate a tool for information retrieval from the knowledge base being used to solve a specific task: The "shortest path" module is used to identify mechanistic links between perfluorooctanoic acid (PFOA) exposure and nonalcoholic fatty liver disease; the "expand network" module identifies communities that are linked to dioxin toxicity; and the quantitative structure-activity relationship (QSAR) dataset generator predicts pregnane X receptor agonism in a set of 4,021 pesticide ingredients. The contents of ComptoxAI's source data are rigorously aggregated from a diverse array of public third-party databases, and ComptoxAI is designed as a free, public, and open-source toolkit to enable diverse classes of users including biomedical researchers, public health and regulatory officials, and the general public to predict toxicology of unknowns and modes of action.

Assuntos

Biologia Computacional , Toxicologia , Inteligência Artificial , Bases de Dados Factuais , Relação Quantitativa Estrutura-Atividade

20.

SurvMaximin: Robust federated approach to transporting survival risk prediction models.

Wang, Xuan; Zhang, Harrison G; Xiong, Xin; Hong, Chuan; Weber, Griffin M; Brat, Gabriel A; Bonzel, Clara-Lea; Luo, Yuan; Duan, Rui; Palmer, Nathan P; Hutch, Meghan R; Gutiérrez-Sacristán, Alba; Bellazzi, Riccardo; Chiovato, Luca; Cho, Kelly; Dagliati, Arianna; Estiri, Hossein; García-Barrio, Noelia; Griffier, Romain; Hanauer, David A; Ho, Yuk-Lam; Holmes, John H; Keller, Mark S; Klann MEng, Jeffrey G; L'Yi, Sehi; Lozano-Zahonero, Sara; Maidlow, Sarah E; Makoudjou, Adeline; Malovini, Alberto; Moal, Bertrand; Moore, Jason H; Morris, Michele; Mowery, Danielle L; Murphy, Shawn N; Neuraz, Antoine; Yuan Ngiam, Kee; Omenn, Gilbert S; Patel, Lav P; Pedrera-Jiménez, Miguel; Prunotto, Andrea; Jebathilagam Samayamuthu, Malarkodi; Sanz Vidorreta, Fernando J; Schriver, Emily R; Schubert, Petra; Serrano-Balazote, Pablo; South, Andrew M; Tan, Amelia L M; Tan, Byorn W L; Tibollo, Valentina; Tippmann, Patric.

J Biomed Inform ; 134: 104176, 2022 10.

Artigo em Inglês | MEDLINE | ID: mdl-36007785

RESUMO

OBJECTIVE: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. MATERIALS AND METHODS: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. RESULTS: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. CONCLUSIONS: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.

Assuntos

Algoritmos , Registros Eletrônicos de Saúde , Humanos , Privacidade , Modelos de Riscos Proporcionais , Análise de Sobrevida

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa