Results 1 - 20 of 24
1.
Patterns (N Y) ; 5(6): 100982, 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-39005490

ABSTRACT

Phenome-wide association studies (PheWASs) serve as a way of documenting the relationship between genotypes and multiple phenotypes, helping to uncover unexplored genotype-phenotype associations (known as pleiotropy). In addition, Mendelian randomization (MR) can be harnessed to make causal statements about a pair of phenotypes by comparing their genetic architecture. Thus, approaches that automate both PheWASs and MR can enhance biobank-scale analyses, circumventing the need for multiple tools by providing a comprehensive, end-to-end tool to drive scientific discovery. To this end, we present PYPE, a Python pipeline for running, visualizing, and interpreting PheWASs. PYPE utilizes input genotype or phenotype files to automatically estimate associations between the chosen independent variables and phenotypes. PYPE can also produce a variety of visualizations and can be used to identify nearby genes and functional consequences of significant associations. Finally, PYPE can identify possible causal relationships between phenotypes using MR under a variety of causal effect modeling scenarios.
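
To make the idea of a phenome-wide scan concrete, the following is a minimal sketch and not PYPE's implementation or API, since the abstract does not describe its statistical models. It assumes one binary carrier indicator per individual, binary phenotypes, a 2x2 chi-square test in place of a regression model, and a Bonferroni threshold; every function and variable name is hypothetical.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def association_test(carrier, case):
    """Pearson chi-square test on the 2x2 table of carrier status vs. phenotype."""
    a = sum(1 for g, y in zip(carrier, case) if g and y)
    b = sum(1 for g, y in zip(carrier, case) if g and not y)
    c = sum(1 for g, y in zip(carrier, case) if not g and y)
    d = sum(1 for g, y in zip(carrier, case) if not g and not y)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a margin is empty, so the table carries no information
    stat = n * (a * d - b * c) ** 2 / denom
    return chi2_1df_pvalue(stat)


def phewas_scan(carrier, phenotypes, alpha=0.05):
    """Test one variant against every phenotype; flag Bonferroni-significant hits."""
    threshold = alpha / len(phenotypes)
    results = []
    for name, case in phenotypes.items():
        p = association_test(carrier, case)
        results.append((name, p, p < threshold))
    return sorted(results, key=lambda r: r[1])
```

A real pipeline would typically adjust for covariates such as age, sex, and ancestry components, which this sketch deliberately omits.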

2.
Genes (Basel) ; 15(5)2024 05 12.
Article in English | MEDLINE | ID: mdl-38790246

ABSTRACT

Mitochondrial DNA (mtDNA) exhibits distinct characteristics distinguishing it from the nuclear genome, necessitating specific analytical methods in genetic studies. This comprehensive review explores the complex role of mtDNA in a variety of genetic studies, including genome-wide, epigenome-wide, and phenome-wide association studies, with a focus on its implications for human traits and diseases. Here, we discuss the structure and gene-encoding properties of mtDNA, along with the influence of environmental factors and epigenetic modifications on its function and variability. Particularly significant are the challenges posed by mtDNA's high mutation rate, heteroplasmy, and copy number variations, and their impact on disease susceptibility and population genetic analyses. The review also highlights recent advances in methodological approaches that enhance our understanding of mtDNA associations, advocating for refined genetic research techniques that accommodate its complexities. By providing a comprehensive overview of the intricacies of mtDNA, this paper underscores the need for an integrated approach to genetic studies that considers the unique properties of mitochondrial genetics. Our findings aim to inform future research and encourage the development of innovative methodologies to better interpret the broad implications of mtDNA in human health and disease.


Subjects
Mitochondrial DNA, Humans, Mitochondrial DNA/genetics, DNA Copy Number Variations, Genetic Epigenesis, Genome-Wide Association Study/methods, Heteroplasmy/genetics, Mitochondria/genetics, Genetic Predisposition to Disease
3.
J Transl Med ; 22(1): 366, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632662

ABSTRACT

BACKGROUND: Early-onset prostate cancer (EOPC, ≤ 55 years) is a unique clinical entity harboring high genetic risk, but the majority of EOPC patients still miss the opportunity for early detection and thus suffer an unfavorable prognosis. A refined understanding of age-based polygenic risk scores (PRSs) for prostate cancer (PCa) would be essential for personalized risk stratification. METHODS: We included 167,517 male participants [4882 cases including 205 EOPC and 4677 late-onset PCa (LOPC)] from UK Biobank. A General-, an EOPC- and an LOPC-PRS were derived from age-specific genome-wide association studies. Weighted Cox proportional hazard models were applied to estimate the risk of PCa associated with PRSs. The discriminatory capability of the PRSs was validated using time-dependent receiver operating characteristic (ROC) curves with an additional 4238 males from PLCO and TCGA. Phenome-wide association studies underlying Mendelian Randomization were conducted to discover phenotypes linked to EOPC. RESULTS: The 269-PRS calculated via well-established risk variants was more strongly associated with risk of EOPC [hazard ratio (HR) = 2.35, 95% confidence interval (CI) 1.99-2.78] than LOPC (HR = 1.95, 95% CI 1.89-2.01; I² = 79%). EOPC-PRS was strongly associated with EOPC risk (HR = 4.70, 95% CI 3.98-5.54) but not with LOPC (HR = 0.98, 95% CI 0.96-1.01), while LOPC-PRS had similar risk estimates for EOPC and LOPC (I² = 0%). In particular, EOPC-PRS showed the best discriminatory capability for EOPC (area under the ROC curve = 0.613). Among the phenomic factors for PCa deposited in the ProAP platform (Prostate cancer Age-based PheWAS; https://mulongdu.shinyapps.io/proap ), EOPC was preferentially associated with PCa family history while LOPC was more related to environmental and lifestyle exposures. CONCLUSIONS: This study comprehensively profiled the distinct genetic and phenotypic architecture of EOPC. The EOPC-PRS may optimize risk estimation of PCa in young males, particularly those without family history, thus providing guidance for precision population stratification.
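
The I² values in this abstract quantify how much two hazard ratios differ beyond chance. The sketch below shows one standard way to compute such a statistic, assuming the usual Cochran's Q definition and standard errors recovered from the reported 95% confidence intervals on the log scale; the abstract does not state how the authors computed it, and all function names are hypothetical.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * z)


def i_squared(estimates):
    """I^2 from (effect, standard_error) pairs, via Cochran's Q."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Hazard ratios for the 269-variant PRS quoted in the abstract.
eopc = log_hr_and_se(2.35, 1.99, 2.78)
lopc = log_hr_and_se(1.95, 1.89, 2.01)
heterogeneity = i_squared([eopc, lopc])
```

Because the inputs are rounded published values, the result should only be expected to land near the reported figure, not to match it digit for digit.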


Subjects
Genetic Risk Score, Prostatic Neoplasms, Humans, Male, Genome-Wide Association Study, Cohort Studies, Risk Factors, Genetic Predisposition to Disease
4.
J Am Med Inform Assoc ; 31(4): 846-854, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38263490

ABSTRACT

IMPORTANCE: Knowledge gained from cohort studies has dramatically advanced both public and precision health. The All of Us Research Program seeks to enroll 1 million diverse participants who share multiple sources of data, providing unique opportunities for research. It is important to understand the phenomic profiles of its participants to conduct research in this cohort. OBJECTIVES: More than 280 000 participants have shared their electronic health records (EHRs) in the All of Us Research Program. We aim to understand the phenomic profiles of this cohort through comparisons with those in the US general population and a well-established nation-wide cohort, UK Biobank, and to test whether association results of selected commonly studied diseases in the All of Us cohort were comparable to those in UK Biobank. MATERIALS AND METHODS: We included participants with EHRs in All of Us and participants with health records from UK Biobank. The estimates of prevalence of diseases in the US general population were obtained from the Global Burden of Diseases (GBD) study. We conducted phenome-wide association studies (PheWAS) of 9 commonly studied diseases in both cohorts. RESULTS: This study included 287 012 participants from the All of Us EHR cohort and 502 477 participants from the UK Biobank. A total of 314 diseases curated by the GBD were evaluated in All of Us, 80.9% (N = 254) of which were more common in All of Us than in the US general population [prevalence ratio (PR) >1.1, P < 2 × 10⁻⁵]. Among 2515 diseases and phenotypes evaluated in both All of Us and UK Biobank, 85.6% (N = 2152) were more common in All of Us (PR >1.1, P < 2 × 10⁻⁵). The Pearson correlation coefficients of effect sizes from PheWAS between All of Us and UK Biobank were 0.61, 0.50, 0.60, 0.57, 0.40, 0.53, 0.46, 0.47, and 0.24 for ischemic heart diseases, lung cancer, chronic obstructive pulmonary disease, dementia, colorectal cancer, lower back pain, multiple sclerosis, lupus, and cystic fibrosis, respectively. DISCUSSION: Despite the differences in prevalence of diseases in All of Us compared to the US general population or the UK Biobank, our study supports that All of Us can facilitate rapid investigation of a broad range of diseases. CONCLUSION: Most diseases were more common in All of Us than in the general US population or the UK Biobank. Results of disease-disease association tests from All of Us are comparable to those estimated in another well-studied national cohort.
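
The two summary quantities used in this abstract, the prevalence ratio and the Pearson correlation of effect sizes, can be sketched as follows. This is an illustration with hypothetical function names and made-up inputs, not the study's code, and it omits the significance testing behind the reported P values.

```python
import math


def prevalence_ratio(cases_a, n_a, cases_b, n_b):
    """Ratio of disease prevalence in cohort A to that in cohort B."""
    return (cases_a / n_a) / (cases_b / n_b)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In the study's setting, `xs` and `ys` would be the per-phenotype effect sizes for one index disease estimated separately in each cohort, so a coefficient near 1 means the two cohorts rank comorbid phenotypes similarly.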


Subjects
Phenomics, Population Health, Humans, Biological Specimen Banks, UK Biobank, Phenotype, United Kingdom/epidemiology
5.
Genome Med ; 15(1): 103, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38037155

ABSTRACT

Gain-of-function (GOF) variants give rise to increased/novel protein functions whereas loss-of-function (LOF) variants lead to diminished protein function. Experimental approaches for identifying GOF and LOF are generally slow and costly, whilst available computational methods have not been optimized to discriminate between GOF and LOF variants. We have developed LoGoFunc, a machine learning method for predicting pathogenic GOF, pathogenic LOF, and neutral genetic variants, trained on a broad range of gene-, protein-, and variant-level features describing diverse biological characteristics. LoGoFunc outperforms other tools trained solely to predict pathogenicity for identifying pathogenic GOF and LOF variants and is available at https://itanlab.shinyapps.io/goflof/ .


Subjects
Genome, Proteins, Humans, Machine Learning
6.
Int J Mol Sci ; 24(15)2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37569411

ABSTRACT

Rheumatoid arthritis (RA) is a systemic disease characterized by non-infectious inflammation of the joints and surrounding tissues, which can cause severe health problems, affect the patient's daily life, and even cause death. RA can be clinically diagnosed by the presence of the blood serological markers rheumatoid factor (RF) and anti-cyclic citrullinated peptide antibody (anti-CCP). However, about 20% of RA patients exhibit negative results for both markers, which makes RA diagnosis difficult and, therefore, may delay effective treatment. Previous studies found some evidence that human leukocyte antigen (HLA)-related genes might be the susceptibility genes for RA and that their polymorphisms might contribute to variation in susceptibility and disease severity. This study aimed to identify genetic polymorphisms in the genomes of RA patients and their effects on the patients' serological markers, RF and anti-CCP. A total of 4580 patients' electronic medical records from 1992 to 2020 were retrieved from the China Medical University Hospital database. The most representative single-nucleotide polymorphisms (SNPs) were identified through a genome-wide association study (GWAS) followed by enzyme-linked immunosorbent assay (ELISA) validation using the blood from 30 additional RA patients. The results showed significant associations on chromosome 6, with rs9270481 being the most significant locus, which maps to the location of the HLA-DRB1 gene. Further, patients with the CC genotype at this locus were more likely to exhibit negative results for RF and anti-CCP than those with the TT genotype. The C allele was also more likely to be associated with negative results for RF and anti-CCP. The results demonstrated that a genetic polymorphism at rs9270481 affected the expression of RF and anti-CCP in RA patients, which might indicate the necessity to develop a personalized treatment plan for each individual patient based on the genetic profile.

7.
Front Nutr ; 10: 1108477, 2023.
Article in English | MEDLINE | ID: mdl-37063319

ABSTRACT

Background: Circulating vitamin D has been associated with multiple clinical diseases in observational studies, but the association was inconsistent due to the presence of confounders. We conducted a bidirectional Mendelian randomization (MR) study to explore the health atlas of vitamin D across many clinical traits and evaluate their causal association. Methods: Based on a large-scale genome-wide association study (GWAS), single nucleotide polymorphism (SNP) instruments for circulating 25-hydroxyvitamin D (25OHD) from 443,734 Europeans and the corresponding effects on 10 clinical diseases and 42 clinical traits in the European population were used to conduct a bidirectional two-sample Mendelian randomization study. Within the framework of Mendelian randomization analysis, inverse-variance weighting (IVW), weighted median, weighted mode, and MR-Egger regression were performed to explore the causal effects and pleiotropy. Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) was conducted to uncover and exclude pleiotropic SNPs. Results: The results revealed that genetically decreased vitamin D was inversely related to the estimated BMD (β = -0.029 g/cm², p = 0.027), TC (β = -0.269 mmol/L, p = 0.006), TG (β = -0.208 mmol/L, p = 0.002), and pulse pressure (β = -0.241 mmHg, p = 0.043), while positively associated with lymphocyte count (β = 0.037%, p = 0.015). The results did not reveal any causal association of vitamin D with clinical diseases. Conversely, genetically protected CKD was significantly associated with increased vitamin D (β = 0.056, p = 2.361 × 10⁻²⁶). Conclusion: The putative causal effects of circulating vitamin D on estimated bone mass, plasma triglyceride, and total cholesterol were uncovered, but not on clinical diseases. Vitamin D may be linked to clinical disease by affecting health-related metabolic markers.
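
Of the MR estimators listed in this abstract, inverse-variance weighting is the simplest to write down. The sketch below assumes the standard fixed-effect IVW formula applied to per-SNP summary statistics (SNP-exposure effects, SNP-outcome effects, and outcome standard errors); it is an illustration with hypothetical names, not the authors' analysis code, and it does not cover the weighted median, weighted mode, MR-Egger, or MR-PRESSO steps.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate and its standard error.

    Equivalent to a weighted regression of SNP-outcome effects on SNP-exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)
```

If every SNP's outcome effect is exactly half its exposure effect, the estimate is 0.5 regardless of the weights, which is a quick sanity check on any implementation.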

8.
Genome Med ; 14(1): 104, 2022 Sep 09.
Article in English | MEDLINE | ID: mdl-36085083

ABSTRACT

BACKGROUND: Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative, an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N = 36,736). METHODS: We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. RESULTS: We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals' SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value = 2.32 × 10⁻¹⁶, EAA p-value = 6.73 × 10⁻¹¹). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. CONCLUSIONS: Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.


Subjects
Electronic Health Records, Public Health, Asian People, Biological Specimen Banks, Genomics, Humans
9.
Front Med (Lausanne) ; 9: 830621, 2022.
Article in English | MEDLINE | ID: mdl-35991636

ABSTRACT

Excess thyroid hormone, as in hyperthyroidism, has complex metabolic effects and is associated with various cardiovascular risk factors. Previous candidate gene studies have indicated that genetic variants may contribute to this variable response. Electronic medical record (EMR) biobanks containing clinical and genomic data on large numbers of individuals have great potential to inform our understanding of how disease comorbidities develop. In this study, we combined EMR-derived phenotypes and genotype information to conduct a genome-wide analysis of hyperthyroidism in a 35,009-patient cohort in Taiwan. Diagnostic codes were used to identify 2,767 patients with hyperthyroidism. Our genome-wide association study (GWAS) identified 44 novel genomic risk markers in 10 loci on chromosomes 2, 6, and 14 (P < 5 × 10⁻¹⁴), including CTLA4, HCP5, HLA-B, POU5F1, CCHCR1, HLA-DRA, HLA-DRB9, TSHR, RPL17P3, and CEP128. We further conducted a comorbidity analysis of our results, and the data revealed a strong correlation between thyroid storm and stroke in patients with hyperthyroidism. In this study, we demonstrated the application of PheWAS using large EMR biobanks to inform the understanding of comorbidity development in hyperthyroidism patients. Our data suggest significant common genetic risk factors in patients with hyperthyroidism. Additionally, our results show that sex, body mass index (BMI), and thyroid storm are associated with an increased risk of stroke in subjects with hyperthyroidism.

10.
Nutrients ; 14(14)2022 Jul 19.
Article in English | MEDLINE | ID: mdl-35889900

ABSTRACT

Alcohol consumption is associated with the development of cardiovascular diseases, cancer, and liver disease. The biological mechanisms are still largely unclear. Here, we aimed to use an agnostic approach to identify phenotypes mediating the effect of alcohol on various diseases. METHODS: We performed an agnostic association analysis between alcohol consumption (red and white wine, beer/cider, fortified wine, and spirits) and over 7800 phenotypes from the UK Biobank comprising 223,728 participants. We performed Mendelian randomisation analysis to infer causality. We additionally performed a phenome-wide association analysis and a mediation analysis with alcohol consumption as exposure, phenotypes in a causal relationship with alcohol consumption as mediators, and various diseases as the outcome. RESULTS: Of 45 phenotypes in association with alcohol consumption, 20 were in a causal relationship with alcohol consumption. Gamma glutamyltransferase (GGT; β = 9.44; 95% CI = 5.94, 12.93; Pfdr = 9.04 × 10⁻⁷), mean sphered cell volume (β = 0.189; 95% CI = 0.11, 0.27; Pfdr = 1.00 × 10⁻⁴), mean corpuscular volume (β = 0.271; 95% CI = 0.19, 0.35; Pfdr = 7.09 × 10⁻¹⁰) and mean corpuscular haemoglobin (β = 0.278; 95% CI = 0.19, 0.36; Pfdr = 1.60 × 10⁻⁶) demonstrated the strongest causal relationships. We also identified GGT and physical inactivity as mediators in the pathway between alcohol consumption, liver cirrhosis and alcohol dependence. CONCLUSION: Our study provides evidence of causality between alcohol consumption and 20 phenotypes and a mediation effect for physical activity on health consequences of alcohol consumption.
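
The Pfdr values above are p-values adjusted for the false discovery rate across many tested phenotypes. The abstract does not name the adjustment procedure; the sketch below assumes the widely used Benjamini-Hochberg step-up method, with a hypothetical function name, purely to show what such an adjustment does.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With thousands of phenotypes, the adjusted value of the smallest p-value is roughly that p-value multiplied by the number of tests, which is why only very strong signals survive.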


Subjects
Alcohol Drinking, Biological Specimen Banks, Alcohol Drinking/adverse effects, Alcohol Drinking/genetics, Alcoholism, Genome-Wide Association Study, Humans, Mendelian Randomization Analysis, Single Nucleotide Polymorphism, United Kingdom/epidemiology
11.
Front Endocrinol (Lausanne) ; 13: 842673, 2022.
Article in English | MEDLINE | ID: mdl-35321340

ABSTRACT

Hyperthyroidism is a prevalent endocrine disorder, and genetics play a major role in the development of thyroid-associated diseases. In particular, the inheritance of HLA has been demonstrated to induce the highest susceptibility to Graves' disease (GD). However, thus far, no studies have reported the contribution of HLA to the development of GD and the complications that follow. Thus, in the present study, to the best of our knowledge, for the first time, a powerful imputation method, HIBAG, was used to predict the HLA subtypes among populations with available genome-wide SNP array data from the China Medical University Hospital (CMUH). The disease status was extracted from the CMUH electronic medical records; a total of 2,998 subjects with GD were identified as the cases to be tested and 29,083 subjects without any diagnosis of thyroid disorders were randomly selected as the controls. A total of 12 HLA class I genotypes (HLA-A*02:07-*11:01, HLA-B*40:01-*46:01 and *46:01-*46:01, and HLA-C*01:02-*01:02, *01:02-*03:04, and *01:02-*07:02) and 17 HLA class II genotypes (HLA-DPA1*02:02-*02:02, HLA-DPB1*02:01-*05:01, *02:02-*05:01, and *04:01-*05:01, HLA-DQA1*03:02, HLA-DRB1*09:01-*15:01, and *09:01-*09:01) were found to be associated with GD in the Taiwanese population. Moreover, the HLA subtypes HLA-A*11:01, HLA-B*46:01, HLA-DPA1*01:03, and HLA-DPB1*05:01 were found to be associated with heart disease, stroke, diabetes, and hypertension among subjects with GD. Our data suggest that several HLA alleles are markedly associated with GD and its comorbidities, including heart disease, hypertension, and diabetes.


Subjects
Graves Disease, Heart Diseases, Hypertension, Alleles, Electronic Health Records, Electronics, Graves Disease/epidemiology, Graves Disease/genetics, HLA-A Antigens/genetics, HLA-B Antigens/genetics, Humans, Hypertension/genetics
12.
Trends Genet ; 38(4): 353-363, 2022 04.
Article in English | MEDLINE | ID: mdl-34991903

ABSTRACT

In the 10 years since their introduction, phenome-wide association studies (PheWAS) have uncovered novel genotype-phenotype relationships. Along the way, PheWAS have evolved in many aspects as a study design with the expanded availability of large data repositories with genome-wide data linked to detailed phenotypic data. Advancement in methods, including algorithms, software, and publicly available integrated resources, makes it feasible to more fully realize the potential of PheWAS, overcoming the previous computational and analytical limitations. We review here the most recent improvements and notable applications of PheWAS from the second half of the decade since their inception. We also note the challenges that remain embedded along the entire PheWAS analytical pipeline that necessitate further development of tools and resources to further advance the understanding of the complex genetic architecture underlying human diseases and traits.


Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Algorithms, Phenotype, Software
13.
Genome Med ; 13(1): 172, 2021 10 29.
Article in English | MEDLINE | ID: mdl-34715901

ABSTRACT

BACKGROUND: Deletions and duplications of the multigenic 16p11.2 and 22q11.2 copy number variant (CNV) regions are associated with brain-related disorders including schizophrenia, intellectual disability, obesity, bipolar disorder, and autism spectrum disorder (ASD). The contribution of individual CNV genes to each of these identified phenotypes is unknown, as well as the contribution of these CNV genes to other potentially subtler health implications for carriers. Hypothesizing that DNA copy number exerts most effects via impacts on RNA expression, we attempted a novel in silico fine-mapping approach in non-CNV carriers using both GWAS and biobank data. METHODS: We first asked whether gene expression level in any individual gene in the CNV region alters risk for a known CNV-associated behavioral phenotype(s). Using transcriptomic imputation, we performed association testing for CNV genes within large genotyped cohorts for schizophrenia, IQ, BMI, bipolar disorder, and ASD. Second, we used a biobank containing electronic health data to compare the medical phenome of CNV carriers to controls within 700,000 individuals in order to investigate the full spectrum of health effects of the CNVs. Third, we used genotypes for over 48,000 individuals within the biobank to perform phenome-wide association studies between imputed expressions of individual 16p11.2 and 22q11.2 genes and over 1500 health traits. RESULTS: Using large genotyped cohorts, we found individual genes within 16p11.2 associated with schizophrenia (TMEM219, INO80E, YPEL3), BMI (TMEM219, SPN, TAOK2, INO80E), and IQ (SPN), using conditional analysis to identify upregulation of INO80E as the driver of schizophrenia, and downregulation of SPN and INO80E as increasing BMI. We identified both novel and previously observed over-represented traits within the electronic health records of 16p11.2 and 22q11.2 CNV carriers. 
In the phenome-wide association study, we found seventeen significant gene-trait pairs, including psychosis (NPIPB11, SLX1B) and mood disorders (SCARF2), and overall enrichment of mental traits. CONCLUSIONS: Our results demonstrate how integration of genetic and clinical data aids in understanding CNV gene function and implicates pleiotropy and multigenicity in CNV biology.


Subjects
Autistic Disorder/genetics, Chromosome Deletion, Chromosome Disorders, Human Chromosome Pair 16/genetics, DNA Copy Number Variations, DiGeorge Syndrome/genetics, Transcriptome, Autism Spectrum Disorder/genetics, Genotype, Humans, Intellectual Disability/genetics, Phenotype, Psychotic Disorders/genetics, Class F Scavenger Receptors/genetics, Schizophrenia/genetics, Tumor Suppressor Proteins/genetics
14.
Am J Hum Genet ; 108(11): 2099-2111, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34678161

ABSTRACT

The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784-12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.


Subjects
Delivery of Health Care/organization & administration, Genetic Predisposition to Disease, Liver Diseases/genetics, ATP-Binding Cassette Transporter Subfamily B/genetics, Electronic Health Records, Haplotypes, Heterozygote, Hispanic or Latino/genetics, Homozygote, Humans, Puerto Rico
15.
Front Genet ; 12: 707836, 2021.
Article in English | MEDLINE | ID: mdl-34394194

ABSTRACT

Repurposing is an increasingly attractive method within the field of drug development for its efficiency at identifying new therapeutic opportunities among approved drugs at greatly reduced cost and time compared with more traditional methods. Repurposing has generated significant interest in the realm of rare disease treatment as an innovative strategy for finding ways to manage these complex conditions. The selection of which agents should be tested in which conditions is currently informed by both human and machine discovery, yet the appropriate balance between these approaches, including the role of artificial intelligence (AI), remains a significant topic of discussion in drug discovery for rare diseases and other conditions. Our drug repurposing team at Vanderbilt University Medical Center synergizes machine learning techniques like phenome-wide association study (a powerful regression method for generating hypotheses about new indications for an approved drug) with the knowledge and creativity of scientific, legal, and clinical domain experts. While our computational approaches generate drug repurposing hits with a high probability of success in a clinical trial, human knowledge remains essential for the hypothesis creation, interpretation, and "go-no go" decisions with which machines continue to struggle. Here, we reflect on our experience synergizing AI and human knowledge toward realizable patient outcomes, providing case studies from our portfolio that inform how we balance human knowledge and machine intelligence for drug repurposing in rare disease.

16.
Front Genet ; 12: 682638, 2021.
Article in English | MEDLINE | ID: mdl-34211504

ABSTRACT

With the advances in genotyping technologies and electronic health records (EHRs), large biobanks have been great resources to identify novel genetic associations and gene-environment interactions on a genome-wide and even a phenome-wide scale. To date, several phenome-wide association studies (PheWAS) have been performed on biobank data, which provide comprehensive insights into many aspects of human genetics and biology. Although inspiring, PheWAS on large-scale biobank data encounters new challenges including computational burden, unbalanced phenotypic distribution, and genetic relatedness. In this paper, we first discuss these new challenges and their potential impact on data analysis. Then, we summarize approaches that are scalable and robust in GWAS and PheWAS. This review can serve as a practical guide for geneticists, epidemiologists, and other medical researchers to identify genetic variations associated with health-related phenotypes in large-scale biobank data analysis. Meanwhile, it can also help statisticians to gain a comprehensive and up-to-date understanding of the current technical tool development.

17.
Am J Hum Genet ; 108(5): 825-839, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33836139

ABSTRACT

In genome-wide association studies, ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, because of the lack of analysis tools, methods designed for binary or quantitative traits are commonly used inappropriately to analyze categorical phenotypes. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, proportional odds logistic mixed model (POLMM). POLMM is computationally efficient to analyze large datasets with hundreds of thousands of samples, can control type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than alternative methods. In contrast, the standard linear mixed model approaches cannot control type I error rates for rare variants when the phenotypic distribution is unbalanced, although they performed well when testing common variants. We applied POLMM to 258 ordinal categorical phenotypes on array genotypes and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which, 424 variants (7.2%) are rare variants with MAF < 0.01.
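
The proportional odds structure that POLMM builds on can be illustrated with its fixed-effect core. The sketch below assumes one common parameterization, logit P(Y <= j) = cutpoint_j - eta, where eta is the linear predictor from covariates and genotype; it leaves out the random effect and the score test that make POLMM a mixed model, and the function names are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(cutpoints, eta):
    """P(Y = j) for each ordinal level, given increasing cutpoints and predictor eta."""
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities
```

Under this parameterization a larger eta shifts probability toward the higher categories, while the same eta applies at every cutpoint, which is the proportional odds assumption.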


Subjects
Computer Simulation, Genome-Wide Association Study, Genetic Models, Phenotype, Biological Specimen Banks, Child, Female, Humans, Male, Research Design, United Kingdom
18.
Am J Hum Genet ; 107(5): 815-836, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32991828

ABSTRACT

To facilitate scientific collaboration on polygenic risk scores (PRSs) research, we created an extensive PRS online repository for 35 common cancer traits integrating freely available genome-wide association studies (GWASs) summary statistics from three sources: published GWASs, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWASs. Our framework condenses these summary statistics into PRSs using various approaches such as linkage disequilibrium pruning/p value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRSs in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRSs. We expect this integrated platform to accelerate PRS-related cancer research.
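
One of the PRS construction approaches mentioned here combines linkage disequilibrium filtering with a p-value threshold. The sketch below uses greedy clumping (keep the most significant variant, drop later variants in high LD with any kept one) as a simple stand-in for that family of methods, followed by the usual weighted sum of allele dosages. It is not the Cancer-PRSweb pipeline; the variant names, thresholds, and data layout are all hypothetical.

```python
def clump_and_threshold(variants, r2, p_max=5e-8, r2_max=0.1):
    """Select variants passing p_max, greedily removing those in LD with a kept one.

    variants: dict mapping variant name -> (effect_size, p_value)
    r2: dict mapping frozenset({name_a, name_b}) -> pairwise r^2 (missing means 0)
    """
    candidates = sorted(
        (name for name in variants if variants[name][1] <= p_max),
        key=lambda name: variants[name][1],
    )
    kept = []
    for name in candidates:
        if all(r2.get(frozenset((name, k)), 0.0) <= r2_max for k in kept):
            kept.append(name)
    return kept


def polygenic_score(dosages, variants, kept):
    """Weighted sum of allele dosages (0, 1, or 2) over the selected variants."""
    return sum(variants[name][0] * dosages.get(name, 0) for name in kept)
```

Varying `p_max` and `r2_max` and picking the combination with the best predictive performance corresponds to the data-adaptive thresholds the abstract refers to.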


Subjects
Biological Specimen Banks/statistics & numerical data , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Multifactorial Inheritance , Neoplasms/genetics , Adult , Aged , Female , Genome-Wide Association Study , Humans , Internet , Linkage Disequilibrium , Male , Middle Aged , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/epidemiology , Phenotype , Quantitative Trait, Heritable , Risk Factors , United Kingdom/epidemiology , United States/epidemiology
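
The p value thresholding step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the Cancer-PRSweb pipeline: it assumes the variants have already been pruned for linkage disequilibrium, and the variant identifiers, effect sizes, p values, and function name are hypothetical.

```python
def prs_p_threshold(summary_stats, dosages, p_threshold):
    """Sum effect size times allele dosage over variants passing the p value cutoff.

    summary_stats maps variant id -> (effect size, p value) from a GWAS.
    dosages maps variant id -> effect-allele count (0, 1, or 2) for one person.
    """
    score = 0.0
    for variant, (beta, p_value) in summary_stats.items():
        if p_value <= p_threshold and variant in dosages:
            score += beta * dosages[variant]
    return score


# Hypothetical, already LD-pruned summary statistics.
summary_stats = {
    "rs_a": (0.20, 1e-9),
    "rs_b": (-0.10, 3e-4),
    "rs_c": (0.05, 0.2),
}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score_strict = prs_p_threshold(summary_stats, person, p_threshold=5e-8)
score_lenient = prs_p_threshold(summary_stats, person, p_threshold=1e-3)
```

A fixed threshold corresponds to calling the function once; a data-adaptive threshold would repeat the call over a grid of cutoffs and keep the one that predicts best in held-out data.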
19.
Mol Genet Genomic Med ; 8(10): e1456, 2020 10.
Article in English | MEDLINE | ID: mdl-32869547

ABSTRACT

BACKGROUND: Genetics is well suited to interpreting pathogenesis and revealing gene functions. The past decade has witnessed unprecedented progress in genetics, particularly in the genome-wide identification of disorder variants through Genome-Wide Association Studies (GWAS) and Phenome-Wide Association Studies (PheWAS). However, using GWAS/PheWAS-derived data to elucidate pathogenesis remains a great challenge. METHODS: In this study, we used HotNet2, a heat diffusion-based systems genetics algorithm, to calculate networks for disease genes obtained from GWAS and PheWAS, in an attempt to gain deeper insight into disease pathogenesis at the molecular level. RESULTS: Through HotNet2 calculation, significant networks for 202 (GWAS) and 167 (PheWAS) types of diseases were identified and evaluated. The GWAS-derived disease networks exhibit stronger biomedical relevance than their PheWAS counterparts. Therefore, the GWAS-derived networks were used for pathogenesis interpretation by integrating the accumulated biomedical information. As a result, the pathogenesis of 64 diseases was elucidated in terms of mutation-caused abnormal transcriptional regulation, and 47 diseases were preliminarily interpreted in terms of mutation-caused changes in protein-protein interactions. In addition, 3,802 genes (including 46 genes of unknown function) were assigned new functions from disease network information, some of which were validated through mouse gene knockout experiments. CONCLUSIONS: The systems genetics algorithm HotNet2 can efficiently establish genotype-phenotype links at the level of biological networks. Compared with the original GWAS/PheWAS results, HotNet2-calculated disease-gene associations have stronger biomedical significance and hence provide better interpretations of the pathogenesis of genome-wide variants, as well as new insights into gene functions. These results are also helpful in drug development.


Subjects
Gene Regulatory Networks , Genetic Diseases, Inborn/genetics , Genome-Wide Association Study/methods , Molecular Sequence Annotation/methods , Protein Interaction Maps , Algorithms , Animals , Humans , Mice , Mice, Inbred C57BL , Protein Conformation
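
To give a feel for the heat diffusion idea behind the algorithm in the abstract above, the sketch below spreads "heat" from a scored gene across a small interaction network. It is a simplified diffusion with restart, written under the assumption that each node passes its heat equally to its neighbours and retains a fraction beta of its starting heat; it is not the HotNet2 software, and the toy network, gene labels, and parameter values are hypothetical.

```python
def diffuse_heat(neighbors, initial_heat, beta=0.4, iterations=200):
    """Iterate heat sharing on an undirected graph until it settles.

    neighbors maps node -> list of adjacent nodes (each edge listed both ways).
    initial_heat maps node -> starting score (for example, an association signal).
    beta is the fraction of the starting heat re-injected at every step.
    """
    heat = dict(initial_heat)
    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            # Each neighbour splits its current heat evenly among its own edges.
            incoming = sum(heat[nb] / len(neighbors[nb]) for nb in neighbors[node])
            updated[node] = (1.0 - beta) * incoming + beta * initial_heat[node]
        heat = updated
    return heat


# Hypothetical four-gene path network: A - B - C - D, with all heat starting on A.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
initial_heat = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
steady_heat = diffuse_heat(neighbors, initial_heat)
```

Genes close to the scored gene end up hotter than distant ones, which is the property a network method can use to group a significant gene with its interaction partners.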
20.
Stat Methods Med Res ; 29(2): 455-465, 2020 02.
Article in English | MEDLINE | ID: mdl-30943854

ABSTRACT

Electronic medical records data are valuable resources for discovery research. They contain detailed phenotypic information on individual patients, opening opportunities for simultaneously studying multiple phenotypes. A useful tool for such simultaneous assessment is the phenome-wide association study, which relates a genomic or biological marker of interest to a wide spectrum of disease phenotypes, typically defined by diagnostic billing codes. One challenge arises when the biomarker of interest is expensive to measure on the entire electronic medical record cohort. Performing a phenome-wide association study based on supervised estimation using only subjects who have marker measurements may yield limited power. In this paper, we focus on the setting where the marker is measured on a small fraction of the patients while a few surrogate markers, such as historical measurements of the biomarker, are available on a large number of patients. We propose an efficient semi-supervised estimation procedure to estimate the covariance between the biomarker and the billing code, leveraging the surrogate marker information. We employ surrogate marker values to impute the missing outcome via a two-step semi-non-parametric approach and demonstrate that our proposed estimator is always more efficient than the supervised counterpart without requiring the imputation model to be correct. We illustrate the proposed procedure by assessing the association between C-reactive protein and some inflammatory diseases in an electronic medical record study of inflammatory bowel disease performed with the Partners HealthCare electronic medical record database, where C-reactive protein was measured for only a small fraction of the patients due to budget constraints.


Subjects
Data Interpretation, Statistical , Electronic Health Records , Genome-Wide Association Study , Algorithms , Bias , Biomarkers , Inflammatory Bowel Diseases