Results 1 - 20 of 33
1.
Nat Rev Genet ; 21(8): 493-502, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32235907

ABSTRACT

Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction.


Subjects
Electronic Health Records , Genetic Association Studies , Genetic Predisposition to Disease , Multifactorial Inheritance , Algorithms , Computational Biology/methods , Genome-Wide Association Study , Genomics/methods , Genotype , Humans , Phenotype , Reproducibility of Results , Risk Assessment , Risk Factors
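
The aggregation of variants under a polygenic model, as described in the abstract above, is commonly computed as a weighted sum of risk-allele counts. The following is only a minimal sketch of that idea, not the procedure of the Review itself; the variant identifiers, weights, and the function name `polygenic_risk_score` are hypothetical and chosen for illustration.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele dosages over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared)


# Hypothetical per-variant effect sizes (for example, log odds ratios from a GWAS).
weights = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.20}
# Hypothetical genotype of one individual: risk-allele counts (0, 1 or 2).
dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0}

score = polygenic_risk_score(dosages, weights)
print(f"polygenic risk score: {score:.2f}")
```

Variants missing from either input are skipped here; a real pipeline would also need to handle strand alignment, imputation quality, and ancestry-specific weights, none of which this sketch attempts.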
2.
Cell ; 147(7): 1498-510, 2011 Dec 23.
Article in English | MEDLINE | ID: mdl-22196727

ABSTRACT

Numerous chromatin regulators are required for embryonic stem (ES) cell self-renewal and pluripotency, but few have been studied in detail. Here, we examine the roles of several chromatin regulators whose loss affects the pluripotent state of ES cells. We find that Mbd3 and Brg1 antagonistically regulate a common set of genes by regulating promoter nucleosome occupancy. Furthermore, both Mbd3 and Brg1 play key roles in the biology of 5-hydroxymethylcytosine (5hmC): Mbd3 colocalizes with Tet1 and 5hmC in vivo, Mbd3 knockdown preferentially affects expression of 5hmC-marked genes, Mbd3 localization is Tet1-dependent, and Mbd3 preferentially binds to 5hmC relative to 5-methylcytosine in vitro. Finally, both Mbd3 and Brg1 are themselves required for normal levels of 5hmC in vivo. Together, our results identify an effector for 5hmC, and reveal that control of gene expression by antagonistic chromatin regulators is a surprisingly common regulatory strategy in ES cells.


Subjects
Cytosine/analogs & derivatives , DNA-Binding Proteins/metabolism , Embryonic Stem Cells/metabolism , Mi-2 Nucleosome Remodeling and Deacetylase Complex/metabolism , Transcription Factors/metabolism , 5-Methylcytosine/analogs & derivatives , Animals , Chromatin Assembly and Disassembly , Cytosine/metabolism , DNA Helicases/metabolism , DNA-Binding Proteins/genetics , Gene Knockdown Techniques , Humans , Mice , Nuclear Proteins/metabolism , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , RNA Polymerase II/metabolism
3.
Cell ; 143(7): 1084-96, 2010 Dec 23.
Article in English | MEDLINE | ID: mdl-21183072

ABSTRACT

Epigenetic information can be inherited through the mammalian germline and represents a plausible transgenerational carrier of environmental information. To test whether transgenerational inheritance of environmental information occurs in mammals, we carried out an expression profiling screen for genes in mice that responded to paternal diet. Offspring of males fed a low-protein diet exhibited elevated hepatic expression of many genes involved in lipid and cholesterol biosynthesis and decreased levels of cholesterol esters, relative to the offspring of males fed a control diet. Epigenomic profiling of offspring livers revealed numerous modest (∼20%) changes in cytosine methylation depending on paternal diet, including reproducible changes in methylation over a likely enhancer for the key lipid regulator Ppara. These results, in conjunction with recent human epidemiological data, indicate that parental diet can affect cholesterol and lipid metabolism in offspring and define a model system to study environmental reprogramming of the heritable epigenome.


Subjects
DNA Methylation , Diet, Protein-Restricted , Genomic Imprinting , Lipid Metabolism , Animals , Biosynthetic Pathways , Cholesterol/biosynthesis , Cytosine/metabolism , Gene Expression Profiling , Gene Expression Regulation, Developmental , Humans , Liver/metabolism , Male , Mice
4.
Nat Rev Genet ; 16(2): 85-97, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25582081

ABSTRACT

Recent technological advances have expanded the breadth of available omic data, from whole-genome sequencing data, to extensive transcriptomic, methylomic and metabolomic data. A key goal of analyses of these data is the identification of effective models that predict phenotypic traits and outcomes, elucidating important biomarkers and generating important insights into the genetic underpinnings of the heritability of complex traits. There is still a need for powerful and advanced analysis strategies to fully harness the utility of these comprehensive high-throughput data, identifying true associations and reducing the number of false associations. In this Review, we explore the emerging approaches for data integration - including meta-dimensional and multi-staged analyses - which aim to deepen our understanding of the role of genetics and genomics in complex outcomes. With the use and further development of these approaches, an improved understanding of the relationship between genomic variation and human phenotypes may be revealed.


Subjects
Data Interpretation, Statistical , Genetic Variation , Genotype , Inheritance Patterns/physiology , Models, Biological , Phenotype , Systems Biology/methods , Humans , Meta-Analysis as Topic
5.
Pharmacogenomics J ; 19(2): 178-190, 2019 Apr.
Article in English | MEDLINE | ID: mdl-29795408

ABSTRACT

Identifying genetic variants associated with chemotherapeutic-induced toxicity is an important step towards personalized treatment of cancer patients. However, annotating and interpreting the associated genetic variants remains challenging because each associated variant is a surrogate for many other variants in the same region. The issue is further complicated when investigating patterns of associated variants with multiple drugs. In this study, we used biological knowledge to annotate and compare genetic variants associated with cellular sensitivity to mechanistically distinct chemotherapeutic drugs, including platinating agents (cisplatin, carboplatin), capecitabine, cytarabine, and paclitaxel. The most significantly associated SNPs from genome-wide association studies of cellular sensitivity to each drug in lymphoblastoid cell lines derived from populations of European (CEU) and African (YRI) descent were analyzed for their enrichment in biological pathways and processes. We annotated genetic variants using higher-level biological annotations in efforts to group variants into more interpretable biological modules. Using the higher-level annotations, we observed distinct biological modules associated with cell line populations as well as classes of chemotherapeutic drugs. We also integrated genetic variants and gene expression variables to build predictive models for chemotherapeutic drug cytotoxicity and prioritized the network models based on the enrichment of DNA regulatory data. Several biological annotations, often encompassing different SNPs, were replicated in independent datasets. By using biological knowledge and DNA regulatory information, we propose a novel approach for jointly analyzing genetic variants associated with multiple chemotherapeutic drugs.


Subjects
Genetic Variation/genetics , Genome-Wide Association Study/methods , Neoplasms/drug therapy , Pharmacogenetics/methods , Black People/genetics , Capecitabine/adverse effects , Capecitabine/therapeutic use , Carboplatin/adverse effects , Carboplatin/therapeutic use , Cell Line , Cisplatin/adverse effects , Cisplatin/therapeutic use , Gene Expression Regulation, Neoplastic/drug effects , Genome, Human/genetics , Humans , Molecular Sequence Annotation , Neoplasms/genetics , Paclitaxel/adverse effects , Paclitaxel/therapeutic use , Polymorphism, Single Nucleotide/genetics , White People/genetics
6.
J Biomed Inform ; 56: 220-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26048077

ABSTRACT

Evaluation of survival models to predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty directly predicting survival due to the characteristics of censored observations and the fact that the predictive power depends on the threshold used to set two classes. In contrast, the traditional Cox regression approach has some drawbacks in the sense that it does not allow for the identification of interactions between genomic features, which could have key roles associated with cancer prognosis. In addition, data integration is regarded as one of the important issues in improving the predictive power of survival models since cancer could be caused by multiple alterations through meta-dimensional genomic data including genome, epigenome, transcriptome, and proteome. Here we have proposed a new integrative framework designed to perform these three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival. In order to predict censored survival time, martingale residuals were calculated as a new continuous outcome and a new fitness function used by the grammatical evolution neural network (GENN) based on mean absolute difference of martingale residuals was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data including copy number, gene expression, DNA methylation, and protein expression data in breast cancer retrieved from The Cancer Genome Atlas (TCGA). On the basis of the results from breast cancer dataset, we were able to identify interactions not only within a single dimension of genomic data but also between meta-dimensional omics data that are associated with survival. 
Notably, the predictive power of our best meta-dimensional model was 73% which outperformed all of the other models conducted based on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease and the high levels of genomic diversity within/between breast tumors could affect the risk of therapeutic responses and disease progression. Thus, identifying interactions within/between meta-dimensional omics data associated with survival in breast cancer is expected to deliver direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.


Subjects
Breast Neoplasms/mortality , Data Collection , Medical Informatics/methods , Survival Analysis , Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Computational Biology/methods , Computer Simulation , DNA Methylation , Disease Progression , Epigenomics , Female , Gene Expression Profiling , Genome, Human , Genomics , Humans , Models, Statistical , Neural Networks, Computer , Prognosis , Proportional Hazards Models , Proteome , Software , Transcriptome , Treatment Outcome
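
The abstract above turns censored survival times into a continuous outcome through martingale residuals and scores candidate models by a mean absolute difference. As a rough illustration only, the sketch below computes null-model martingale residuals (event indicator minus the Nelson-Aalen cumulative hazard at each subject's own time) and a mean-absolute-difference score. The covariate-free null model, the function names, and the toy data are all assumptions made here; the paper's actual residual model and GENN fitness function are not specified in the abstract.

```python
from typing import Dict, List, Sequence


def martingale_residuals(times: Sequence[float],
                         events: Sequence[int]) -> List[float]:
    """Null-model martingale residuals: event flag minus Nelson-Aalen hazard."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    cumulative: Dict[float, float] = {}
    running = 0.0
    for t in event_times:
        deaths = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        at_risk = sum(1 for ti in times if ti >= t)
        running += deaths / at_risk  # Nelson-Aalen increment at time t
        cumulative[t] = running

    def hazard_at(t: float) -> float:
        # Cumulative hazard at the last event time not exceeding t.
        value = 0.0
        for event_time in event_times:
            if event_time > t:
                break
            value = cumulative[event_time]
        return value

    return [d - hazard_at(t) for t, d in zip(times, events)]


def mean_absolute_difference(predicted: Sequence[float],
                             observed: Sequence[float]) -> float:
    """Average absolute gap between predicted and observed residuals."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Toy data (hypothetical): follow-up times and event flags (1 = event, 0 = censored).
times = [2, 3, 5, 7]
events = [1, 0, 1, 1]
residuals = martingale_residuals(times, events)
baseline_error = mean_absolute_difference([0.0] * len(residuals), residuals)
```

Under this null model the residuals sum to zero, which is a quick sanity check on any implementation; a model-based version would scale each subject's cumulative hazard by a fitted risk score.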
7.
Pac Symp Biocomput ; 29: 650-653, 2024.
Article in English | MEDLINE | ID: mdl-38160314

ABSTRACT

The following sections are included: Introduction to the workshop; Workshop Presenters.

8.
Res Sq ; 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38496631

ABSTRACT

Background: Preeclampsia (PE) is a severe pregnancy complication characterized by hypertension and end-organ damage such as proteinuria. PE poses a significant threat to women's long-term health, including an increased risk of cardiovascular and renal diseases. Most previous studies have been hypothesis-based, potentially overlooking certain significant complications. This study conducts a comprehensive, non-hypothesis-based analysis of PE-complicated diagnoses after pregnancies using multiple large-scale electronic health records (EHR) datasets. Method: From the University of Michigan (UM) Healthcare System, we collected 4,348 PE patients for the cases and 27,377 patients with pregnancies not complicated by PE or related conditions for the controls. We first conducted a non-hypothesis-based analysis to identify any long-term adverse health conditions associated with PE using logistic regression with adjustments to demographics, social history, and medical history. We confirmed the identified complications with UK Biobank data which contain 443 PE cases and 14,870 non-PE controls. We then conducted a survival analysis on complications that exhibited significance in more than 5 consecutive years post-PE. We further examined the potential racial disparities of identified complications between Caucasian and African American patients. Findings: Uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity exhibited significantly increased risks whereas hypothyroidism showed decreased risks, in 5 consecutive years after PE in the UM discovery data. UK Biobank data confirmed the increased risks of uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity. 
Further survival analysis using UM data indicated significantly increased risks in uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity, and significantly decreased risks in hypothyroidism. There exist racial differences in the risks of developing hypertension and hypothyroidism after PE. PE protects against hypothyroidism in African American postpartum women but not Caucasians; it also increases the risks of uncomplicated hypertension but less severely in African American postpartum women as compared to Caucasians. Interpretation: This study addresses the lack of a comprehensive examination of PE's long-term effects utilizing large-scale EHR and advanced statistical methods. Our findings underscore the need for long-term monitoring and interventions for women with a history of PE, emphasizing the importance of personalized postpartum care. Notably, the racial disparities observed in the impact of PE on hypertension and hypothyroidism highlight the necessity of tailored aftercare based on race.

9.
Article in English | MEDLINE | ID: mdl-38723657

ABSTRACT

The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.

10.
medRxiv ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38405849

ABSTRACT

Background: Preeclampsia (PE) is a severe pregnancy complication characterized by hypertension and end-organ damage such as proteinuria. PE poses a significant threat to women's long-term health, including an increased risk of cardiovascular and renal diseases. Most previous studies have been hypothesis-based, potentially overlooking certain significant complications. This study conducts a comprehensive, non-hypothesis-based analysis of PE-complicated diagnoses after pregnancies using multiple large-scale electronic health records (EHR) datasets. Method: From the University of Michigan (UM) Healthcare System, we collected 4,348 PE patients for the cases and 27,377 patients with pregnancies not complicated by PE or related conditions for the controls. We first conducted a non-hypothesis-based analysis to identify any long-term adverse health conditions associated with PE using logistic regression with adjustments to demographics, social history, and medical history. We confirmed the identified complications with UK Biobank data which contain 443 PE cases and 14,870 non-PE controls. We then conducted a survival analysis on complications that exhibited significance in more than 5 consecutive years post-PE. We further examined the potential racial disparities of identified complications between Caucasian and African American patients. Findings: Uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity exhibited significantly increased risks whereas hypothyroidism showed decreased risks, in 5 consecutive years after PE in the UM discovery data. UK Biobank data confirmed the increased risks of uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity. 
Further survival analysis using UM data indicated significantly increased risks in uncomplicated hypertension, complicated diabetes, congestive heart failure, renal failure, and obesity, and significantly decreased risks in hypothyroidism. There exist racial differences in the risks of developing hypertension and hypothyroidism after PE. PE protects against hypothyroidism in African American postpartum women but not Caucasians; it also increases the risks of uncomplicated hypertension but less severely in African American postpartum women as compared to Caucasians. Interpretation: This study addresses the lack of a comprehensive examination of PE's long-term effects utilizing large-scale EHR and advanced statistical methods. Our findings underscore the need for long-term monitoring and interventions for women with a history of PE, emphasizing the importance of personalized postpartum care. Notably, the racial disparities observed in the impact of PE on hypertension and hypothyroidism highlight the necessity of tailored aftercare based on race.

11.
medRxiv ; 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38260403

ABSTRACT

Genome-wide association studies (GWAS) have been instrumental in identifying genetic associations for various diseases and traits. However, uncovering genetic underpinnings among traits beyond univariate phenotype associations remains a challenge. Multi-phenotype associations (MPA), or genetic pleiotropy, offer important insights into shared genes and pathways among traits, enhancing our understanding of genetic architectures of complex diseases. GWAS of biobank-linked electronic health record (EHR) data are increasingly being utilized to identify MPA among various traits and diseases. However, methodologies that can efficiently take advantage of distributed EHR to detect MPA are still lacking. Here, we introduce mixWAS, a novel algorithm that efficiently and losslessly integrates multiple EHRs via summary statistics, allowing the detection of MPA among mixed phenotypes while accounting for heterogeneities across EHRs. Simulations demonstrate that mixWAS outperforms the widely used MPA detection method, Phenome-wide association study (PheWAS), across diverse scenarios. Applying mixWAS to data from seven EHRs in the US, we identified 4,534 MPA among blood lipids, BMI, and circulatory diseases. Validation in independent EHR data from the UK confirmed 97.7% of the associations. mixWAS fundamentally improves the detection of MPA and is available as free, open-source software.

12.
J Am Med Inform Assoc ; 31(4): 809-819, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38065694

ABSTRACT

OBJECTIVES: COVID-19, since its emergence in December 2019, has globally impacted research. Over 360 000 COVID-19-related manuscripts have been published on PubMed and preprint servers like medRxiv and bioRxiv, with preprints comprising about 15% of all manuscripts. Yet, the role and impact of preprints on COVID-19 research and evidence synthesis remain uncertain. MATERIALS AND METHODS: We propose a novel data-driven method for assigning weights to individual preprints in systematic reviews and meta-analyses. This weight termed the "confidence score" is obtained using the survival cure model, also known as the survival mixture model, which takes into account the time elapsed between posting and publication of a preprint, as well as metadata such as the number of first 2-week citations, sample size, and study type. RESULTS: Using 146 preprints on COVID-19 therapeutics posted from the beginning of the pandemic through April 30, 2021, we validated the confidence scores, showing an area under the curve of 0.95 (95% CI, 0.92-0.98). Through a use case on the effectiveness of hydroxychloroquine, we demonstrated how these scores can be incorporated practically into meta-analyses to properly weigh preprints. DISCUSSION: It is important to note that our method does not aim to replace existing measures of study quality but rather serves as a supplementary measure that overcomes some limitations of current approaches. CONCLUSION: Our proposed confidence score has the potential to improve systematic reviews of evidence related to COVID-19 and other clinical conditions by providing a data-driven approach to including unpublished manuscripts.


Subjects
COVID-19 , Humans , Systematic Reviews as Topic , Research Design , PubMed , Pandemics
13.
J Am Med Inform Assoc ; 31(6): 1303-1312, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38713006

ABSTRACT

OBJECTIVES: Racial disparities in kidney transplant access and posttransplant outcomes exist between non-Hispanic Black (NHB) and non-Hispanic White (NHW) patients in the United States, with the site of care being a key contributor. Using multi-site data to examine the effect of site of care on racial disparities, the key challenge is the dilemma in sharing patient-level data due to regulations for protecting patients' privacy. MATERIALS AND METHODS: We developed a federated learning framework, named dGEM-disparity (decentralized algorithm for Generalized linear mixed Effect Model for disparity quantification). Consisting of 2 modules, dGEM-disparity first provides accurately estimated common effects and calibrated hospital-specific effects by requiring only aggregated data from each center and then adopts a counterfactual modeling approach to assess whether the graft failure rates differ if NHB patients had been admitted at transplant centers in the same distribution as NHW patients were admitted. RESULTS: Utilizing United States Renal Data System data from 39 043 adult patients across 73 transplant centers over 10 years, we found that if NHB patients had followed the distribution of NHW patients in admissions, there would be 38 fewer deaths or graft failures per 10 000 NHB patients (95% CI, 35-40) within 1 year of receiving a kidney transplant on average. DISCUSSION: The proposed framework facilitates efficient collaborations in clinical research networks. Additionally, the framework, by using counterfactual modeling to calculate the event rate, allows us to investigate contributions to racial disparities that may occur at the level of site of care. CONCLUSIONS: Our framework is broadly applicable to other decentralized datasets and disparities research related to differential access to care. Ultimately, our proposed framework will advance equity in human health by identifying and addressing hospital-level racial disparities.


Subjects
Algorithms , Black or African American , Healthcare Disparities , Kidney Transplantation , White People , Humans , United States , Healthcare Disparities/ethnology , Adult , Male , Female , Graft Rejection/ethnology , Middle Aged
14.
Pac Symp Biocomput ; 28: 546-548, 2023.
Article in English | MEDLINE | ID: mdl-36541009

ABSTRACT

The primary efforts of disease and epidemiological research can be divided into two areas: identifying the causal mechanisms and utilizing important variables for risk prediction. The latter is generally perceived as a more obtainable goal due to the vast number of readily available tools and the faster pace of obtaining results. However, the lower barrier of entry in risk prediction means that it is easy to make predictions, yet it is incredibly more difficult to make sound predictions. As an ever-growing amount of data is being generated, developing risk prediction models and turning them into clinically actionable findings is crucial as the next step. However, there are still sizable gaps before risk prediction models can be implemented clinically. While clinicians are eager to embrace new ways to improve patients' care, they are overwhelmed by a plethora of prediction methods. Thus, the next generation of prediction models will need to shift from making simple predictions towards interpretable, equitable, explainable and, ultimately, causal predictions.


Subjects
Computational Biology , Humans , Risk Assessment , Patient Care , Forecasting
15.
BioData Min ; 16(1): 20, 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37443040

ABSTRACT

The introduction of large language models (LLMs) that allow iterative "chat" in late 2022 is a paradigm shift that enables generation of text often indistinguishable from that written by humans. LLM-based chatbots have immense potential to improve academic work efficiency, but the ethical implications of their fair use and inherent bias must be considered. In this editorial, we discuss this technology from the academic's perspective with regard to its limitations and utility for academic writing, education, and programming. We end with our stance with regard to using LLMs and chatbots in academia, which is summarized as (1) we must find ways to effectively use them, (2) their use does not constitute plagiarism (although they may produce plagiarized text), (3) we must quantify their bias, (4) users must be cautious of their poor accuracy, and (5) the future is bright for their application to research and as an academic tool.

16.
Sci Rep ; 13(1): 19078, 2023 Nov 04.
Article in English | MEDLINE | ID: mdl-37925516

ABSTRACT

In response to the escalating global obesity crisis and its associated health and financial burdens, this paper presents a novel methodology for analyzing longitudinal weight loss data and assessing the effectiveness of financial incentives. Drawing from the Keep It Off trial, a three-arm randomized controlled study with 189 participants, we examined the potential impact of financial incentives on weight loss maintenance. Some participants choose not to weigh themselves because of small weight changes or weight gains, a common phenomenon in many weight-loss studies. As a result, traditional methods such as the Generalized Estimating Equations (GEE) method tend to overestimate the effect size because they assume that data are missing completely at random. To address this challenge, we proposed a framework that can identify evidence of data missing not at random and conduct bias correction using the estimating equation derived from pairwise composite likelihood. By analyzing the Keep It Off data, we found that the data in this trial are most likely characterized by non-random missingness. Notably, we also found that the enrollment time (i.e., duration time) was positively associated with weight loss maintenance after adjusting for baseline participant characteristics (e.g., age, sex). Moreover, the lottery-based intervention was found to be more effective in weight loss maintenance compared with the direct payment intervention, though the difference was not statistically significant. This framework's significance extends beyond weight loss research, offering a semi-parametric approach to assess missing data mechanisms and robustly explore associations between exposures (e.g., financial incentives) and key outcomes (e.g., weight loss maintenance). 
In essence, the proposed methodology provides a powerful toolkit for analyzing real-world longitudinal data, particularly in scenarios with data missing not at random, enriching comprehension of intricate dataset dynamics.


Subjects
Research Design , Weight Loss , Humans , Bias , Longitudinal Studies , Self Report , Randomized Controlled Trials as Topic
17.
bioRxiv ; 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36711526

ABSTRACT

Background: Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and optimize parameters and pre-processing steps. Although machine learning approaches have been shown to greatly assist in optimization and data processing, applying them to QTL analysis and GWAS is challenging due to the complexity of large, heterogeneous datasets. Here, we describe proof-of-concept for an automated machine learning approach, AutoQTL, with the ability to automate many complex decisions related to analysis of complex traits and generate diverse solutions to describe relationships that exist in genetic data. Results: Using a dataset of 18 putative QTL from a large-scale GWAS of body mass index in the laboratory rat, Rattus norvegicus, AutoQTL captures the phenotypic variance explained under a standard additive model while also providing evidence of non-additive effects including deviations from additivity and 2-way epistatic interactions from simulated data via multiple optimal solutions. Additionally, feature importance metrics provide different insights into the inheritance models and predictive power of multiple GWAS-derived putative QTL. Conclusions: This proof-of-concept illustrates that automated machine learning techniques can be applied to genetic data and have the potential to detect both additive and non-additive effects via various optimal solutions and feature importance metrics. In the future, we aim to expand AutoQTL to accommodate omics-level datasets with intelligent feature selection strategies.

18.
BioData Min ; 16(1): 14, 2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37038201

ABSTRACT

BACKGROUND: Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and optimize parameters and pre-processing steps. Although machine learning approaches have been shown to greatly assist in optimization and data processing, applying them to QTL analysis and GWAS is challenging due to the complexity of large, heterogeneous datasets. Here, we describe proof-of-concept for an automated machine learning approach, AutoQTL, with the ability to automate many complicated decisions related to analysis of complex traits and generate solutions to describe relationships that exist in genetic data. RESULTS: Using a publicly available dataset of 18 putative QTL from a large-scale GWAS of body mass index in the laboratory rat, Rattus norvegicus, AutoQTL captures the phenotypic variance explained under a standard additive model. AutoQTL also detects evidence of non-additive effects including deviations from additivity and 2-way epistatic interactions in simulated data via multiple optimal solutions. Additionally, feature importance metrics provide different insights into the inheritance models and predictive power of multiple GWAS-derived putative QTL. CONCLUSIONS: This proof-of-concept illustrates that automated machine learning techniques can complement standard approaches and have the potential to detect both additive and non-additive effects via various optimal solutions and feature importance metrics. In the future, we aim to expand AutoQTL to accommodate omics-level datasets with intelligent feature selection and feature engineering strategies.

19.
Nat Commun ; 12(1): 168, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33420026

ABSTRACT

Increasingly, clinical phenotypes with matched genetic data from biobank-linked electronic health records (EHRs) have been used for pleiotropy analyses. Thus far, pleiotropy analysis using individual-level EHR data has been limited to data from one site. However, it is desirable to integrate EHR data from multiple sites to improve the detection power and generalizability of the results. Due to privacy concerns, individual-level patient data are not easily shared across institutions. As a result, we introduce Sum-Share, a method designed to efficiently integrate EHR and genetic data from multiple sites to perform pleiotropy analysis. Sum-Share requires only summary-level data and one round of communication from each site, yet it produces test statistics identical to those obtained from pooled individual-level data. Consequently, Sum-Share can achieve lossless integration of multiple datasets. Using real EHR data from eMERGE, Sum-Share is able to identify 1734 potential pleiotropic SNPs for five cardiovascular diseases.


Subjects
Electronic Health Records/statistics & numerical data , Genetic Pleiotropy , Communication , Databases, Factual , Genome-Wide Association Study/statistics & numerical data , Humans , Models, Biological , Phenotype , Polymorphism, Single Nucleotide , Privacy
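
The abstract above claims that summary-level data from each site can reproduce the result of pooling individual-level data. The sketch below is not the Sum-Share algorithm; it only illustrates the general principle in the simplest setting, a one-covariate least-squares fit, where each site shares five sums and the combined sums give exactly the pooled estimate. All names (`site_summary`, `combine`, `fit_line`) and the toy data are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

# Per-site summary: n, sum(x), sum(y), sum(x*x), sum(x*y)
Summary = Tuple[float, float, float, float, float]


def site_summary(x: Sequence[float], y: Sequence[float]) -> Summary:
    """Sufficient statistics for simple linear regression at one site."""
    return (len(x), sum(x), sum(y),
            sum(a * a for a in x),
            sum(a * b for a, b in zip(x, y)))


def combine(summaries: Iterable[Summary]) -> Summary:
    """Add the per-site statistics component by component."""
    return tuple(sum(parts) for parts in zip(*summaries))


def fit_line(summary: Summary) -> Tuple[float, float]:
    """Least-squares slope and intercept computed from the summary alone."""
    n, sx, sy, sxx, sxy = summary
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data held at two hypothetical sites; the raw values never leave a site.
site_a = ([0, 1, 2], [1, 3, 5])
site_b = ([3, 4], [7, 10])

shared = combine([site_summary(*site_a), site_summary(*site_b)])
pooled = site_summary(site_a[0] + site_b[0], site_a[1] + site_b[1])

slope, intercept = fit_line(shared)
```

Because the five sums are additive across sites, one round of communication is enough in this toy case; methods for multiple phenotypes and non-linear models need richer summaries than this example shows.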
20.
AMIA Annu Symp Proc ; 2020: 1383-1391, 2020.
Article in English | MEDLINE | ID: mdl-33936514

ABSTRACT

Large-scale biobank cohorts coupled with electronic health records offer unprecedented opportunities to study genotype-phenotype relationships. Genome-wide association studies uncovered disease-associated loci through univariate methods, with the focus on one trait at a time. With genetic variants being identified for thousands of traits, researchers found that 90% of human genetic loci are associated with more than one trait, highlighting the ubiquity of pleiotropy. Recently, multivariate methods have been proposed to effectively identify pleiotropy. However, the statistical performance in natural biomedical data, which often have unbalanced case-control sample sizes, is largely unknown. In this work, we designed 21 scenarios of real-data-informed simulations to thoroughly evaluate the statistical characteristics of univariate and multivariate methods. Our results can serve as a reference guide for the application of multivariate methods. We also investigated potential pleiotropy across type II diabetes, Alzheimer's disease, atherosclerosis of arteries, depression, and atherosclerotic heart disease in the UK Biobank.


Subjects
Biological Specimen Banks , Biostatistics , Case-Control Studies , Computer Simulation , Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Humans , Multivariate Analysis , Phenotype , Sample Size , United Kingdom