Results 1 - 20 of 104
1.
PLoS Genet ; 17(6): e1009534, 2021 06.
Article in English | MEDLINE | ID: mdl-34086673

ABSTRACT

Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). 
We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.
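
The encodings compared in this abstract differ only in the numeric value given to the heterozygous genotype. The following is a minimal sketch, assuming genotypes are stored as minor-allele counts (0, 1, 2), that all encodings are scaled so the homozygous-minor genotype equals 1, and that the EDGE heterozygous value `alpha` has already been estimated from the data (the abstract does not give the estimator). The names `TRADITIONAL`, `encode`, and `alpha` are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Values for (homozygous major, heterozygous, homozygous minor).
# Scaling so that homozygous minor = 1 is an assumption of this sketch.
TRADITIONAL: Dict[str, Tuple[float, float, float]] = {
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
    "recessive": (0.0, 0.0, 1.0),
}


def encode(minor_allele_counts: List[int], scheme: str = "additive",
           alpha: float = 0.5) -> List[float]:
    """Map minor-allele counts (0, 1, 2) to encoded values.

    For scheme="edge", the heterozygous value is the data-driven `alpha`
    (hypothetical parameter name); other schemes use fixed values.
    """
    if scheme == "edge":
        values = (0.0, alpha, 1.0)
    else:
        values = TRADITIONAL[scheme]
    return [values[count] for count in minor_allele_counts]


genotypes = [0, 1, 2, 1]
print(encode(genotypes, "recessive"))
print(encode(genotypes, "edge", alpha=0.25))
```

With `alpha` equal to 0.5, 1, or 0 the flexible encoding reduces to the additive, dominant, or recessive encoding, which is why a single data-driven value can stand in for testing every combination of encodings.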


Subjects
Models, Genetic; Cataract/genetics; Datasets as Topic; Diabetes Mellitus, Type 2/genetics; Gene Frequency; Genome-Wide Association Study; Glaucoma/genetics; Humans; Hypertension/genetics; Macular Degeneration/genetics; Phenotype; Polymorphism, Single Nucleotide
2.
BMC Med Inform Decis Mak ; 22(1): 23, 2022 01 28.
Article in English | MEDLINE | ID: mdl-35090449

ABSTRACT

INTRODUCTION: Currently, one of the commonly used methods for disseminating electronic health record (EHR)-based phenotype algorithms is providing a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which leads to ambiguity and vagueness. METHODS: This study examines incidents of under-specification that occurred during the implementation of 34 narrative phenotyping algorithms in the electronic Medical Record and Genomics (eMERGE) network. We reviewed the online communication history between algorithm developers and implementers within the Phenotype Knowledge Base (PheKB) platform, where questions could be raised and answered regarding the intended implementation of a phenotype algorithm. RESULTS: We developed a taxonomy of under-specification categories via an iterative review process between two groups of annotators. Under-specifications that lead to ambiguity and vagueness were consistently found across narrative phenotype algorithms developed by all involved eMERGE sites. DISCUSSION AND CONCLUSION: Our findings highlight that under-specification is an impediment to the accuracy and efficiency of the implementation of current narrative phenotyping algorithms, and we propose approaches for mitigating these issues and improved methods for disseminating EHR phenotyping algorithms.


Subjects
Algorithms; Electronic Health Records; Genomics; Humans; Knowledge Bases; Phenotype
3.
Genes Immun ; 20(7): 555-565, 2019 09.
Article in English | MEDLINE | ID: mdl-30459343

ABSTRACT

Resting-state white blood cell (WBC) count is a marker of inflammation and immune system health. There is evidence that WBC count is not fixed over time and that there is heterogeneity in WBC trajectory that is associated with morbidity and mortality. Latent class mixed modeling (LCMM) is a method that can identify unobserved heterogeneity in longitudinal data and attempts to classify individuals into groups based on a linear model of repeated measurements. We applied LCMM to repeated WBC count measures derived from electronic medical records of participants of the National Human Genome Research Institute (NHGRI) electronic MEdical Record and GEnomics (eMERGE) network study, revealing two WBC count trajectory phenotypes. Advancing these phenotypes to GWAS, we found genetic associations between trajectory class membership and regions on chromosome 1p34.3 and chromosome 11q13.4. The chromosome 1 region contains CSF3R, which encodes the granulocyte colony-stimulating factor receptor. This protein is a major factor in neutrophil stimulation and proliferation. The association on chromosome 11 contains the genes RNF169 and XRRA1, both involved in the regulation of double-strand break DNA repair.


Subjects
Leukocyte Count/methods; Leukocytes/classification; Adult; Aged; Databases, Genetic; Electronic Health Records; Female; Genome-Wide Association Study; Humans; Latent Class Analysis; Male; Middle Aged; Phenotype; Polymorphism, Single Nucleotide/genetics; Proteins/genetics; Receptors, Colony-Stimulating Factor/genetics; Ubiquitin-Protein Ligases/genetics
4.
Circulation ; 138(17): 1839-1849, 2018 10 23.
Article in English | MEDLINE | ID: mdl-29703846

ABSTRACT

BACKGROUND: Coronary heart disease (CHD) is a leading cause of death globally. Although therapy with statins decreases circulating levels of low-density lipoprotein cholesterol and the incidence of CHD, additional events occur despite statin therapy in some individuals. The genetic determinants of this residual cardiovascular risk remain unknown. METHODS: We performed a 2-stage genome-wide association study of CHD events during statin therapy. We first identified 3099 cases who experienced CHD events (defined as acute myocardial infarction or the need for coronary revascularization) during statin therapy and 7681 controls without CHD events during comparable intensity and duration of statin therapy from 4 sites in the Electronic Medical Records and Genomics Network. We then sought replication of candidate variants in another 160 cases and 1112 controls from a fifth Electronic Medical Records and Genomics site, which joined the network after the initial genome-wide association study. Finally, we performed a phenome-wide association study for other traits linked to the most significant locus. RESULTS: The meta-analysis identified 7 single nucleotide polymorphisms at a genome-wide level of significance within the LPA/PLG locus associated with CHD events on statin treatment. The most significant association was for an intronic single nucleotide polymorphism within LPA/PLG (rs10455872; minor allele frequency, 0.069; odds ratio, 1.58; 95% confidence interval, 1.35-1.86; P=2.6×10⁻¹⁰). In the replication cohort, rs10455872 was also associated with CHD events (odds ratio, 1.71; 95% confidence interval, 1.14-2.57; P=0.009).
The association of this single nucleotide polymorphism with CHD events was independent of statin-induced change in low-density lipoprotein cholesterol (odds ratio, 1.62; 95% confidence interval, 1.17-2.24; P=0.004) and persisted in individuals with low-density lipoprotein cholesterol ≤70 mg/dL (odds ratio, 2.43; 95% confidence interval, 1.18-4.75; P=0.015). A phenome-wide association study supported the effect of this region on coronary heart disease and did not identify noncardiovascular phenotypes. CONCLUSIONS: Genetic variations at the LPA locus are associated with CHD events during statin therapy independently of the extent of low-density lipoprotein cholesterol lowering. This finding provides support for exploring strategies targeting circulating concentrations of lipoprotein(a) to reduce CHD events in patients receiving statins.
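
The odds ratios in this abstract are reported with 95% confidence intervals, and such an interval carries enough information to recover an approximate standard error and P value. The following is a rough sketch, assuming a Wald-type interval that is symmetric on the log-odds scale (an assumption; the abstract does not say how the intervals were computed), applied to the replication-cohort estimate. The function name `p_from_or_ci` is illustrative.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.96):
    """Recover SE, z statistic, and two-sided P from an OR and its 95% CI.

    Assumes the CI is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Replication cohort: odds ratio 1.71, 95% CI 1.14-2.57.
se, z, p = p_from_or_ci(1.71, 1.14, 2.57)
print(f"SE(log OR)={se:.3f}  z={z:.2f}  P={p:.4f}")
```

Under these assumptions the recovered P value lands close to the reported P=0.009; small differences are expected because the published interval is rounded to two decimals.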


Subjects
Coronary Disease/genetics; Coronary Disease/prevention & control; Dyslipidemias/drug therapy; Dyslipidemias/genetics; Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use; Lipoprotein(a)/genetics; Polymorphism, Single Nucleotide; Case-Control Studies; Coronary Disease/blood; Coronary Disease/diagnosis; Databases, Genetic; Dyslipidemias/blood; Dyslipidemias/diagnosis; Electronic Health Records; Gene Frequency; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects; Phenotype; Risk Assessment; Risk Factors; Time Factors; Treatment Outcome
5.
Circulation ; 138(22): 2469-2481, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.


Subjects
Biomarkers/blood; Carotid Artery Diseases/diagnosis; Genome-Wide Association Study; Proteome/analysis; Adult; Aged; Aged, 80 and over; Carotid Artery Diseases/genetics; Female; Genotype; Humans; Lectins, C-Type/analysis; Male; Middle Aged; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Proteomics; Receptor, Platelet-Derived Growth Factor beta/blood
6.
Circ Res ; 120(2): 341-353, 2017 Jan 20.
Article in English | MEDLINE | ID: mdl-27899403

ABSTRACT

RATIONALE: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA. OBJECTIVE: To identify additional AAA risk loci using data from all available genome-wide association studies. METHODS AND RESULTS: Through a meta-analysis of 6 genome-wide association study data sets and a validation study totaling 10 204 cases and 107 766 controls, we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches, we observed no new associations between the lead AAA single nucleotide polymorphisms and coronary artery disease, blood pressure, lipids, or diabetes mellitus. Network analyses identified ERG, IL6R, and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9. CONCLUSIONS: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits suggesting that traditional cardiovascular risk factor management may only have limited value in preventing the progression of aneurysmal disease.


Subjects
Aortic Aneurysm, Abdominal/diagnosis; Aortic Aneurysm, Abdominal/genetics; Genetic Loci/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Aortic Aneurysm, Abdominal/epidemiology; Genetic Predisposition to Disease/epidemiology; Genetic Variation/genetics; Genome-Wide Association Study/trends; Humans
7.
J Biomed Inform ; 94: 103185, 2019 06.
Article in English | MEDLINE | ID: mdl-31028874

ABSTRACT

OBJECTIVE: To develop machine learning models for classifying the severity of opioid overdose events from clinical data. MATERIALS AND METHODS: Opioid overdoses were identified by diagnoses codes from the Marshfield Clinic population and assigned a severity score via chart review to form a gold standard set of labels. Three primary feature sets were constructed from disparate data sources surrounding each event and used to train machine learning models for phenotyping. RESULTS: Random forest and penalized logistic regression models gave the best performance with cross-validated mean areas under the ROC curves (AUCs) for all severity classes of 0.893 and 0.882 respectively. Features derived from a common data model outperformed features collected from disparate data sources for the same cohort of patients (AUCs 0.893 versus 0.837, p value = 0.002). The addition of features extracted from free text to machine learning models also increased AUCs from 0.827 to 0.893 (p value < 0.0001). Key word features extracted using natural language processing (NLP) such as 'Narcan' and 'Endotracheal Tube' are important for classifying overdose event severity. CONCLUSION: Random forest models using features derived from a common data model and free text can be effective for classifying opioid overdose events.


Subjects
Analgesics, Opioid/administration & dosage; Drug Overdose; Machine Learning; Phenotype; Electronic Health Records; Humans; Severity of Illness Index
8.
J Biomed Inform ; 96: 103253, 2019 08.
Article in English | MEDLINE | ID: mdl-31325501

ABSTRACT

BACKGROUND: Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process. METHODS: Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model across their electronic health record (EHR)-linked DNA biobanks. Two previously implemented eMERGE phenotypes were converted to OMOP and implemented across the network. RESULTS: It was feasible to implement the common data model across sites, with laboratory data producing the greatest challenge due to local encoding. Sites were then able to execute the OMOP phenotype in less than one day, as opposed to weeks of effort to manually implement an eMERGE phenotype in their bespoke research EHR databases. Of the sites that could compare the current OMOP phenotype implementation with the original eMERGE phenotype implementation, specific agreement ranged from 100% to 43%, with disagreements due to the original phenotype, the OMOP phenotype, changes in data, and issues in the databases. Using the OMOP query as a standard comparison revealed differences in the original implementations despite starting from the same definitions, code lists, flowcharts, and pseudocode. CONCLUSION: Using a common data model can dramatically speed phenotype implementation at the cost of having to populate that data model, though this will produce a net benefit as the number of phenotype implementations increases. Inconsistencies among the implementations of the original queries point to a potential benefit of using a common data model so that actual phenotype code and logic can be shared, mitigating human error in reinterpretation of a narrative phenotype definition.


Subjects
Attention Deficit Disorder with Hyperactivity/diagnosis; Databases, Factual; Diabetes Mellitus, Type 2/diagnosis; Electronic Health Records; Data Collection; Humans; Medical Informatics; National Human Genome Research Institute (U.S.); Observational Studies as Topic; Outcome Assessment, Health Care; Phenotype; Research Design; Software; United States
9.
PLoS Genet ; 12(9): e1006186, 2016 09.
Article in English | MEDLINE | ID: mdl-27623284

ABSTRACT

Primary open angle glaucoma (POAG) is a complex disease and is one of the leading causes of blindness worldwide. Genome-wide association studies have successfully identified several common variants associated with glaucoma; however, most of these variants only explain a small proportion of the genetic risk. Apart from the standard approach to identify main effects of variants across the genome, it is believed that gene-gene interactions can help elucidate part of the missing heritability by allowing for the test of interactions between genetic variants to mimic the complex nature of biology. To explain the etiology of glaucoma, we first performed a genome-wide association study (GWAS) on glaucoma case-control samples obtained from electronic medical records (EMR) to establish the utility of EMR data in detecting non-spurious and relevant associations; this analysis was aimed at confirming already known associations with glaucoma and validating the EMR-derived glaucoma phenotype. Our findings from GWAS suggest consistent evidence of several known associations in POAG. We then performed an interaction analysis for variants found to be marginally associated with glaucoma (SNPs with main effect p-value <0.01) and observed interesting findings in the electronic MEdical Records and GEnomics (eMERGE) Network dataset. Genes from the top epistatic interactions from eMERGE data (likelihood ratio test [LRT] p-value <1e-05) were then tested for replication in the NEIGHBOR consortium dataset. To replicate our findings, we performed a gene-based SNP-SNP interaction analysis in NEIGHBOR and observed significant gene-gene interactions (p-value <0.001) among the top 17 gene-gene models identified in the discovery phase.
Variants from gene-gene interaction analysis that we found to be associated with POAG explain 3.5% of additional genetic variance in eMERGE dataset above what is explained by the SNPs in genes that are replicated from previous GWAS studies (which was only 2.1% variance explained in eMERGE dataset); in the NEIGHBOR dataset, adding replicated SNPs from gene-gene interaction analysis explain 3.4% of total variance whereas GWAS SNPs alone explain only 2.8% of variance. Exploring gene-gene interactions may provide additional insights into many complex traits when explored in properly designed and powered association studies.


Subjects
Epistasis, Genetic; Glaucoma, Open-Angle/genetics; Polymorphism, Single Nucleotide; Case-Control Studies; Female; Genome-Wide Association Study; Humans; Male; Phenotype
10.
Am J Respir Crit Care Med ; 195(4): 456-463, 2017 Feb 15.
Article in English | MEDLINE | ID: mdl-27611488

ABSTRACT

RATIONALE: Despite significant advances in knowledge of the genetic architecture of asthma, specific contributors to the variability in the burden between populations remain uncovered. OBJECTIVES: To identify additional genetic susceptibility factors of asthma in European American and African American populations. METHODS: A phenotyping algorithm mining electronic medical records was developed and validated to recruit cases with asthma and control subjects from the Electronic Medical Records and Genomics network. Genome-wide association analyses were performed in pediatric and adult asthma cases and control subjects with European American and African American ancestry followed by metaanalysis. Nominally significant results were reanalyzed conditioning on allergy status. MEASUREMENTS AND MAIN RESULTS: The validation of the algorithm yielded an average of 95.8% positive predictive values for both cases and control subjects. The algorithm accrued 21,644 subjects (65.83% European American and 34.17% African American). We identified four novel population-specific associations with asthma after metaanalyses: loci 6p21.31, 9p21.2, and 10q21.3 in the European American population, and the PTGES gene in African Americans. TEK at 9p21.2, which encodes TIE2, has been shown to be involved in remodeling the airway wall in asthma, and the association remained significant after conditioning by allergy. PTGES, which encodes the prostaglandin E synthase, has also been linked to asthma, where deficient prostaglandin E2 synthesis has been associated with airway remodeling. CONCLUSIONS: This study adds to understanding of the genetic architecture of asthma in European Americans and African Americans and reinforces the need to study populations of diverse ethnic backgrounds to identify shared and unique genetic predictors of asthma.


Subjects
Asthma/genetics; Black or African American/genetics; Electronic Health Records/statistics & numerical data; Genetic Predisposition to Disease/genetics; Prostaglandin-E Synthases/genetics; White People/genetics; Adolescent; Adult; Airway Remodeling/genetics; Airway Remodeling/immunology; Algorithms; Asthma/ethnology; Child; Child, Preschool; Data Mining/methods; Female; Genetic Predisposition to Disease/ethnology; Genome-Wide Association Study; Humans; Male; Meta-Analysis as Topic; Phenotype; Prevalence; United States
11.
BMC Infect Dis ; 16(1): 684, 2016 11 17.
Article in English | MEDLINE | ID: mdl-27855652

ABSTRACT

BACKGROUND: Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is one of the most common causes of skin and soft tissue infections in the United States, and a variety of genetic host factors are suspected to be risk factors for recurrent infection. Based on the CDC definition, we have developed and validated an electronic health record (EHR)-based CA-MRSA phenotype algorithm utilizing both structured and unstructured data. METHODS: The algorithm was validated at three eMERGE consortium sites, and positive predictive value, negative predictive value, and sensitivity were calculated. The algorithm was then run and data collected across seven total sites. The resulting data were used in a GWAS analysis. RESULTS: Across seven sites, the CA-MRSA phenotype algorithm identified a total of 349 cases and 7761 controls among the genotyped European and African American biobank populations. PPV ranged from 68 to 100% for cases and 96 to 100% for controls; sensitivity ranged from 94 to 100% for cases and 75 to 100% for controls. Frequency of cases in the populations varied widely by site. There were no plausible GWAS-significant (p < 5E-8) findings. CONCLUSIONS: Differences in EHR data representation and screening patterns across sites may have affected identification of cases and controls and accounted for varying frequencies across sites. Future work identifying these patterns is necessary.
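
The validation measures named in this abstract follow directly from a chart-review confusion matrix. The following is a minimal sketch using the standard definitions; the counts are hypothetical and are not taken from the study, and the function name `validation_metrics` is illustrative.

```python
from typing import Dict


def validation_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard validation measures for a phenotype algorithm.

    tp/fp: algorithm-positive records confirmed / refuted by chart review.
    tn/fn: algorithm-negative records confirmed / refuted by chart review.
    """
    return {
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),  # share of true cases that were found
    }


# Hypothetical chart-review counts for one site.
metrics = validation_metrics(tp=47, fp=3, tn=96, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on how common cases are at a site, the wide variation in case frequency noted in the results would be expected to shift these values between sites even if the algorithm behaved identically.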


Subjects
Algorithms; Electronic Health Records; Genome-Wide Association Study/methods; Methicillin-Resistant Staphylococcus aureus; Phenotype; Staphylococcal Infections/diagnosis; Adult; Case-Control Studies; Community-Acquired Infections/diagnosis; Community-Acquired Infections/genetics; Female; Genetic Predisposition to Disease; Humans; Male; Risk Factors; Sensitivity and Specificity; Staphylococcal Infections/genetics; United States
12.
J Med Genet ; 52(4): 282-8, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25587064

ABSTRACT

BACKGROUND: Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis. METHODS: Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). RESULTS: We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data. CONCLUSIONS: These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders.
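
The inheritance patterns mentioned in this abstract can be illustrated at a single site with a parent-child trio. The sketch below is not SeqHBase code: it is a simplified, assumption-laden illustration that works on VCF-style genotype strings, ignores coverage and genotype quality, and leaves out compound heterozygosity (which requires pairing two variants within the same gene). The function names are hypothetical.

```python
def alt_count(genotype: str) -> int:
    """Count non-reference alleles in a VCF-style genotype such as '0/1' or '1|1'."""
    alleles = genotype.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele != "0")


def classify_trio(child: str, father: str, mother: str) -> str:
    """Label one variant site by its apparent inheritance pattern in a trio."""
    if any("." in gt for gt in (child, father, mother)):
        return "missing genotype"
    c, f, m = alt_count(child), alt_count(father), alt_count(mother)
    if c == 0:
        return "no variant in child"
    if f == 0 and m == 0:
        return "de novo"
    if c == 2 and f >= 1 and m >= 1:
        return "inherited homozygous"
    if c == 1:
        return "inherited heterozygous"
    return "inconsistent with simple inheritance"


print(classify_trio("0/1", "0/0", "0/0"))
print(classify_trio("1/1", "0/1", "0|1"))
```

In practice a call of "de novo" is only credible when both parents have adequate read depth at the site, which is presumably why the toolset takes per-site coverage as an input alongside the variant calls.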


Subjects
Genomics/methods; Sequence Analysis, DNA/methods; Software; Datasets as Topic; Exome; Genome, Human; Humans; Mutation
13.
Hum Genet ; 134(6): 659-69, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25893794

ABSTRACT

Genetic methods can complement epidemiological surveys and clinical registries in determining prevalence of monogenic autosomal recessive diseases. Several large population-based genetic databases, such as the NHLBI GO Exome Sequencing Project, are now publicly available. By assuming Hardy-Weinberg equilibrium, the frequency of individuals homozygous in the general population for a particular pathogenic allele can be directly calculated from a sample of chromosomes where some harbor the pathogenic allele. Further assuming that the penetrance of the pathogenic allele(s) is known, the prevalence of recessive phenotypes can be determined. Such work can inform public health efforts for rare recessive diseases. A Bayesian estimation procedure has yet to be applied to the problem of estimating disease prevalence from large population-based genetic data. A Bayesian framework is developed to derive the posterior probability density of monogenic, autosomal recessive phenotypes. Explicit equations are presented for the credible intervals of these disease prevalence estimates. A primary impediment to performing accurate disease prevalence calculations is the determination of truly pathogenic alleles. This issue is discussed, but in many instances remains a significant barrier to investigations solely reliant on statistical interrogation; functional studies can provide important information for solidifying evidence of variant pathogenicity. We also discuss several challenges to these efforts, including the population structure in the sample of chromosomes, the treatment of allelic heterogeneity, and reduced penetrance of pathogenic variants. To illustrate the application of these methods, we utilized recently published genetic data collected on a large sample from the Schmiedeleut Hutterites. We estimate prevalence and calculate 95% credible intervals for 13 autosomal recessive diseases using these data.
In addition, the Bayesian estimation procedure is applied to data from a central European study of hereditary fructose intolerance. The methods described herein show a viable path to robustly estimating both the expected prevalence of autosomal recessive phenotypes and corresponding credible intervals using population-based genetic databases that have recently become available. As these genetic databases increase in number and size with the advent of cost-effective next-generation sequencing, we anticipate that these methods and approaches may be helpful in recessive disease prevalence calculations, potentially impacting public health management, health economic analyses, and treatment of rare diseases.
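
The calculation described in this abstract can be approximated numerically. The sketch below assumes a single pathogenic allele seen `k` times among `n` sampled chromosomes, a uniform Beta(1, 1) prior on the allele frequency q (an assumption; the abstract does not state the prior), Hardy-Weinberg equilibrium so that prevalence is penetrance × q², and Monte Carlo sampling in place of the explicit credible-interval equations the paper presents. All names and counts are illustrative.

```python
import random
from typing import Tuple


def prevalence_credible_interval(k: int, n: int, penetrance: float = 1.0,
                                 level: float = 0.95, draws: int = 100_000,
                                 seed: int = 0) -> Tuple[float, float, float]:
    """Return (lower, median, upper) for recessive disease prevalence.

    Posterior for allele frequency q under a Beta(1, 1) prior is
    Beta(k + 1, n - k + 1); each draw is mapped to penetrance * q**2.
    """
    rng = random.Random(seed)
    samples = sorted(
        penetrance * rng.betavariate(k + 1, n - k + 1) ** 2
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    median = samples[draws // 2]
    return lower, median, upper


# Hypothetical sample: 10 pathogenic alleles among 1000 chromosomes.
lower, median, upper = prevalence_credible_interval(k=10, n=1000)
print(f"prevalence per 100,000: {median * 1e5:.1f} "
      f"(95% credible interval {lower * 1e5:.1f} to {upper * 1e5:.1f})")
```

Squaring q makes the interval for prevalence markedly asymmetric, which is one reason a posterior-based interval is more informative here than a symmetric error bar around the point estimate (k/n)².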


Subjects
Databases, Genetic; Genes, Recessive; Genetic Diseases, Inborn/genetics; Models, Genetic; Alleles; Animals; Bayes Theorem; Genetic Diseases, Inborn/epidemiology; Genetics, Population; Humans; Prevalence
15.
Hum Genet ; 133(1): 95-109, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24026423

ABSTRACT

Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable and identification of genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized genetic variants associated with MPV and PLT using functional, pathway and disease enrichment analyses; we assessed pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomic network had data for PLT and 6,291 participants had data for MPV. We identified five chromosomal regions associated with PLT and eight associated with MPV at genome-wide significance (P < 5E-8). In addition, we replicated 20 SNPs [out of 56 SNPs (α: 0.05/56 = 9E-4)] influencing PLT and 22 SNPs [out of 29 SNPs (α: 0.05/29 = 2E-3)] influencing MPV in a published meta-analysis of GWAS of PLT and MPV. While our GWAS did not find any new associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development, and platelet production and hemostasis. 
PheWAS using a single-SNP Bonferroni correction for 1,368 diagnoses (0.05/1368 = 3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune, and hematologic disorders. We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
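
The significance thresholds quoted in this abstract are Bonferroni corrections: the family-wise level of 0.05 divided by the number of tests. A short worked example reproduces them; the function name and labels are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Numbers of tests as stated in the abstract.
test_families = [
    ("PLT replication SNPs", 56),
    ("MPV replication SNPs", 29),
    ("PheWAS diagnoses", 1368),
]
for label, n_tests in test_families:
    print(f"{label}: 0.05/{n_tests} = {bonferroni_threshold(0.05, n_tests):.2e}")
```

The computed thresholds agree with the values quoted in the abstract up to rounding or truncation of the last digit.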


Subjects
Genetic Pleiotropy , Genome-Wide Association Study/methods , Mean Platelet Volume , Platelet Count , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Cardiovascular Diseases/genetics , Chromosomes, Human/genetics , Female , Genetic Loci , Hemostasis , Humans , Male , Meta-Analysis as Topic , Middle Aged , Phenotype , Thrombopoiesis/genetics
16.
Am J Hum Genet ; 89(4): 529-42, 2011 Oct 07.
Article in English | MEDLINE | ID: mdl-21981779

ABSTRACT

We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10^-9). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10^-6). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10^-5), nodular (OR = 0.76, p = 3.1 × 10^-5) and multinodular (OR = 0.69, p = 3.9 × 10^-5) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10^-3), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10^-13), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.


Subjects
Forkhead Transcription Factors/genetics , Hypothyroidism/genetics , Aged , Algorithms , Female , Genetic Markers , Genetic Variation , Genome , Genome-Wide Association Study , Genotype , Humans , Male , Medical Records Systems, Computerized , Middle Aged , Phenotype , Predictive Value of Tests
17.
Mol Vis ; 20: 1281-95, 2014.
Article in English | MEDLINE | ID: mdl-25352737

ABSTRACT

PURPOSE: Cataract is the leading cause of blindness in the world, and in the United States accounts for approximately 60% of Medicare costs related to vision. The purpose of this study was to identify genetic markers for age-related cataract through a genome-wide association study (GWAS). METHODS: In the electronic medical records and genomics (eMERGE) network, we ran an electronic phenotyping algorithm on individuals in each of five sites with electronic medical records linked to DNA biobanks. We performed a GWAS using 530,101 SNPs from the Illumina 660W-Quad in a total of 7,397 individuals (5,503 cases and 1,894 controls). We also performed an age-at-diagnosis case-only analysis. RESULTS: We identified several statistically significant associations with age-related cataract (45 SNPs) as well as age at diagnosis (44 SNPs). The 45 SNPs associated with cataract at p < 1 × 10^-5 are in several interesting genes, including ALDOB, MAP3K1, and MEF2C. All have potential biologic relationships with cataracts. CONCLUSIONS: This is the first genome-wide association study of age-related cataract, and several regions of interest have been identified. The eMERGE network has pioneered the exploration of genomic associations in biobanks linked to electronic health records, and this study is another example of the utility of such resources. Explorations of age-related cataract including validation and replication of the association results identified herein are needed in future studies.


Subjects
Cataract/genetics , Electronic Health Records/statistics & numerical data , Fructose-Bisphosphate Aldolase/genetics , Genetic Predisposition to Disease , MAP Kinase Kinase Kinase 1/genetics , Polymorphism, Single Nucleotide , Age Factors , Aged , Aged, 80 and over , Algorithms , Cataract/pathology , Databases, Nucleic Acid , Female , Genetic Markers , Genome, Human , Genome-Wide Association Study , Health Care Costs , Humans , MEF2 Transcription Factors/genetics , Male , Middle Aged , Quantitative Trait Loci , United States
18.
J Biomed Inform ; 52: 260-70, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25048351

ABSTRACT

OBJECTIVE: Electronic health records (EHRs) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly inter-dependent records that include biological, anatomical, physiological, and behavioral observations. They comprise a patient's clinical phenome, where each patient has thousands of date-stamped records distributed across many relational tables. Development of EHR computer-based phenotyping algorithms requires time and medical insight from clinical experts, who most often can review only a small patient subset representative of the total EHR records to identify phenotype features. In this research we evaluate whether relational machine learning (ML) using inductive logic programming (ILP) can contribute to addressing these issues as a viable approach for EHR-based phenotyping. METHODS: Two relational learning ILP approaches and three well-known WEKA (Waikato Environment for Knowledge Analysis) implementations of non-relational approaches (PART, J48, and JRIP) were used to develop models for nine phenotypes. International Classification of Diseases, Ninth Revision (ICD-9) coded EHR data were used to select training cohorts for the development of each phenotypic model. Accuracy, precision, recall, F-Measure, and Area Under the Receiver Operating Characteristic (AUROC) curve statistics were measured for each phenotypic model based on independent manually verified test cohorts. A two-sided binomial distribution test (sign test) compared the five ML approaches across phenotypes for statistical significance. RESULTS: We developed an approach to automatically label training examples using ICD-9 diagnosis codes for the ML approaches being evaluated. Nine phenotypic models for each ML approach were evaluated, resulting in better overall model performance in AUROC using ILP when compared to PART (p=0.039), J48 (p=0.003) and JRIP (p=0.003). DISCUSSION: ILP has the potential to improve phenotyping by independently delivering clinically expert interpretable rules for phenotype definitions, or intuitive phenotypes to assist experts. CONCLUSION: Relational learning using ILP offers a viable approach to EHR-driven phenotyping.
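
The abstract above compares methods across nine phenotypes with a two-sided binomial (sign) test. The following is a minimal sketch of how such a test can be computed, assuming ties are excluded and each phenotype counts as one paired comparison. The function name `sign_test_two_sided` and the win counts in the usage lines are illustrative assumptions; the abstract does not report per-phenotype wins.

```python
from math import comb


def sign_test_two_sided(wins: int, n: int) -> float:
    """Two-sided sign test p-value under H0: P(win) = 0.5 (ties excluded)."""
    if n <= 0 or not 0 <= wins <= n:
        raise ValueError("require n > 0 and 0 <= wins <= n")
    # Use the larger of wins/losses so the tail is always the upper one.
    k = max(wins, n - wins)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Doubling the tail can exceed 1 when wins and losses are balanced.
    return min(1.0, 2 * upper_tail)


# Hypothetical win counts out of nine paired comparisons (not from the source).
for wins in (8, 9):
    print(f"{wins}/9 wins: p = {sign_test_two_sided(wins, 9):.4f}")
```

With nine comparisons, 8 wins gives p = 20/512 and 9 wins gives p = 2/512, which shows how few paired phenotypes are needed before a sign test reaches conventional significance.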


Subjects
Artificial Intelligence , Data Mining/methods , Electronic Health Records/classification , Algorithms , Databases, Factual , Humans
19.
J Biomed Inform ; 51: 280-6, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24960203

ABSTRACT

BACKGROUND: Design patterns, in the context of software development and ontologies, provide generalized approaches and guidance for solving commonly occurring problems or addressing common situations, typically informed by intuition, heuristics, and experience. While the biomedical literature contains broad coverage of specific phenotype algorithm implementations, no work to date has attempted to generalize common approaches into design patterns, which may then be distributed to the informatics community to efficiently develop more accurate phenotype algorithms. METHODS: Using phenotyping algorithms stored in the Phenotype KnowledgeBase (PheKB), we conducted an independent iterative review to identify recurrent elements within the algorithm definitions. We extracted and generalized recurrent elements in these algorithms into candidate patterns. The authors then assessed the candidate patterns for validity by group consensus, and annotated them with attributes. RESULTS: A total of 24 electronic Medical Records and Genomics (eMERGE) phenotypes available in PheKB as of 1/25/2013 were downloaded and reviewed. From these, a total of 21 phenotyping patterns were identified, which are available as an online data supplement. CONCLUSIONS: Repeatable patterns within phenotyping algorithms exist, and when codified and cataloged may help to educate both experienced and novice algorithm developers. The dissemination and application of these patterns have the potential to decrease the time to develop algorithms, while improving portability and accuracy.


Subjects
Algorithms , Biological Ontologies , Data Mining/methods , Electronic Health Records/classification , Genomics/classification , Natural Language Processing , Pattern Recognition, Automated/methods , Data Curation/methods , Electronic Health Records/organization & administration , Genomics/organization & administration , Phenotype
20.
Genet Med ; 15(10): 772-8, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24071798

ABSTRACT

Genetic testing has had limited impact on routine clinical care. Widespread adoption of electronic health records presents a promising means of disseminating genetic testing into diverse care settings. Practical challenges to integration of genomic data into electronic health records include size and complexity of genetic test results, inadequate use of standards for clinical and genetic data, and limitations in electronic health record capacity to store and analyze genetic data. Related challenges include uncertainty in the interpretation of regulatory requirements for return of results, and privacy concerns specific to genetic testing. Successful integration of genomic data may require significant redesign of existing electronic health record systems.


Subjects
Electronic Health Records , Genetic Testing , Genomics , Genetic Privacy , Genetics, Medical , Humans , Information Storage and Retrieval