1.
PLoS One ; 18(5): e0283553, 2023.
Article in English | MEDLINE | ID: mdl-37196047

ABSTRACT

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our algorithm showed a significant improvement in patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis of the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, we showcased that heterogeneous EHR data can be mapped through an integrative analytical pipeline and reveal significant genotype-phenotype associations with clinical interpretation. CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.


Subjects
Diverticular Diseases; Diverticulitis; Diverticulum; Humans; Electronic Health Records; Genome-Wide Association Study/methods; Natural Language Processing; Phenotype; Algorithms; Polymorphism, Single Nucleotide
2.
Methods Inf Med ; 61(1-02): 11-18, 2022 05.
Article in English | MEDLINE | ID: mdl-34991173

ABSTRACT

OBJECTIVE: Natural language processing (NLP) systems convert unstructured text into analyzable data. Here, we describe the performance measures of NLP to capture granular details on nodules from thyroid ultrasound (US) reports and reveal critical issues with reporting language. METHODS: We iteratively developed NLP tools using clinical Text Analysis and Knowledge Extraction System (cTAKES) and thyroid US reports from 2007 to 2013. We incorporated nine nodule features for NLP extraction. Next, we evaluated the precision, recall, and accuracy of our NLP tools using a separate set of US reports from an academic medical center (A) and a regional health care system (B) during the same period. Two physicians manually annotated each test-set report. A third physician then adjudicated discrepancies. The adjudicated "gold standard" was then used to evaluate NLP performance on the test set. RESULTS: A total of 243 thyroid US reports contained 6,405 data elements. Inter-annotator agreement for all elements was 91.3%. Compared with the gold standard, overall recall of the NLP tool was 90%. NLP recall for thyroid lobe or isthmus characteristics was: laterality 96% and size 95%. NLP accuracy for nodule characteristics was: laterality 92%, size 92%, calcifications 76%, vascularity 65%, echogenicity 62%, contents 76%, and borders 40%. NLP recall for presence or absence of lymphadenopathy was 61%. Reporting style accounted for 18% of errors. For example, the word "heterogeneous" interchangeably referred to nodule contents or echogenicity. While nodule dimensions and laterality were often described, US reports described contents, echogenicity, vascularity, calcifications, borders, and lymphadenopathy only 46, 41, 17, 15, 9, and 41% of the time, respectively. Most nodule characteristics were equally likely to be described at hospital A compared with hospital B. CONCLUSIONS: NLP can automate extraction of critical information from thyroid US reports. However, ambiguous and incomplete reporting language hinders performance of NLP systems regardless of institutional setting. Standardized or synoptic thyroid US reports could improve NLP performance.
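
The evaluation against an adjudicated gold standard described above can be sketched in a few lines. This is a minimal illustration only, assuming each data element is reduced to a present/absent flag; the function names and the toy labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # element found by the NLP tool and present in the gold standard
    fp: int  # element found by the NLP tool but absent from the gold standard
    fn: int  # element missed by the NLP tool but present in the gold standard
    tn: int  # element absent in both


def tally(gold, predicted):
    """Compare gold-standard and NLP flags (booleans) element by element."""
    pairs = list(zip(gold, predicted))
    return Counts(
        tp=sum(g and p for g, p in pairs),
        fp=sum((not g) and p for g, p in pairs),
        fn=sum(g and (not p) for g, p in pairs),
        tn=sum((not g) and (not p) for g, p in pairs),
    )


def precision(c):
    return c.tp / (c.tp + c.fp)


def recall(c):
    return c.tp / (c.tp + c.fn)


def accuracy(c):
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


# Toy labels, invented for illustration only.
gold = [True, True, True, False, False]
nlp = [True, True, False, True, False]
counts = tally(gold, nlp)
```

Recall answers "how much of what the annotators marked did the tool find", while precision answers "how much of what the tool reported was correct"; the abstract reports both kinds of figure per feature.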


Subjects
Lymphadenopathy; Natural Language Processing; Humans; Thyroid Gland/diagnostic imaging
3.
Front Genet ; 10: 511, 2019.
Article in English | MEDLINE | ID: mdl-31249589

ABSTRACT

Uterine fibroids affect up to 77% of women by menopause and account for up to $34 billion in healthcare costs each year. Although fibroid risk is heritable, genetic risk for fibroids is not well understood. We conducted a two-stage case-control meta-analysis of genetic variants in European and African ancestry women with and without fibroids classified by a previously published algorithm requiring pelvic imaging or confirmed diagnosis. Women from seven electronic Medical Records and Genomics (eMERGE) network sites (3,704 imaging-confirmed cases and 5,591 imaging-confirmed controls) and women of African and European ancestry from UK Biobank (UKB, 5,772 cases and 61,457 controls) were included in the discovery genome-wide association study (GWAS) meta-analysis. Variants showing evidence of association in Stage I GWAS (P < 1 × 10⁻⁵) were targeted in an independent replication sample of African and European ancestry individuals from the UKB (Stage II) (12,358 cases and 138,477 controls). Logistic regression models were fit with genetic markers imputed to a 1000 Genomes reference and adjusted for principal components for each race- and site-specific dataset, followed by fixed-effects meta-analysis. Final analysis with 21,804 cases and 205,525 controls identified 326 genome-wide significant variants in 11 loci, with three novel loci at chromosome 1q24 (sentinel-SNP rs14361789; P = 4.7 × 10⁻⁸), chromosome 16q12.1 (sentinel-SNP rs4785384; P = 1.5 × 10⁻⁹) and chromosome 20q13.1 (sentinel-SNP rs6094982; P = 2.6 × 10⁻⁸). Our statistically significant findings further support previously reported loci including SNPs near WT1, TNRC6B, SYNE1, BET1L, and CDC42/WNT4. We report evidence of ancestry-specific findings for sentinel-SNP rs10917151 in the CDC42/WNT4 locus (P = 1.76 × 10⁻²⁴). Ancestry-specific effect-estimates for rs10917151 were in opposite directions (P-Het-between-groups = 0.04) for predominantly African (OR = 0.84) and predominantly European women (OR = 1.16). Genetically-predicted gene expression of several genes including LUZP1 in vagina (P = 4.6 × 10⁻⁸), OBFC1 in esophageal mucosa (P = 8.7 × 10⁻⁸), NUDT13 in multiple tissues including subcutaneous adipose tissue (P = 3.3 × 10⁻⁶), and HEATR3 in skeletal muscle tissue (P = 5.8 × 10⁻⁶) was associated with fibroids. The finding for HEATR3 was supported by SNP-based summary Mendelian randomization analysis. Our study suggests that fibroid risk variants act through regulatory mechanisms affecting gene expression and comprise alleles that are both ancestry-specific and shared across continental ancestries.
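
The fixed-effects meta-analysis step named in this abstract is commonly carried out by inverse-variance weighting of per-dataset log odds ratios. The sketch below shows that standard calculation under the assumption that each site contributes one effect estimate and its standard error; the function name and the two example estimates are hypothetical, and the abstract does not state which software or exact estimator the authors used.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects pooling.

    betas: per-dataset effect estimates (for example, log odds ratios)
    standard_errors: matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Invented per-site estimates, for illustration only.
pooled_beta, pooled_se, z, p = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
pooled_or = math.exp(pooled_beta)  # back-transform a log odds ratio
```

A fixed-effects model assumes every dataset estimates the same underlying effect, which is why opposite-direction estimates across ancestries, as reported for rs10917151, are examined with a separate heterogeneity test.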

4.
Sci Rep ; 9(1): 6077, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30988330

ABSTRACT

Benign prostatic hyperplasia (BPH) results in a significant public health burden due to the morbidity caused by the disease and many of the available remedies. As many as 70% of men over 70 will develop BPH. Few studies have been conducted to discover the genetic determinants of BPH risk. Understanding the biological basis for this condition may provide necessary insight for development of novel pharmaceutical therapies or risk prediction. We have evaluated SNP-based heritability of BPH in two cohorts and conducted a genome-wide association study (GWAS) of BPH risk using 2,656 cases and 7,763 controls identified from the Electronic Medical Records and Genomics (eMERGE) network. SNP-based heritability estimates suggest that roughly 60% of the phenotypic variation in BPH is accounted for by genetic factors. We used logistic regression to model BPH risk as a function of principal components of ancestry, age, and imputed genotype data, with meta-analysis performed using METAL. The top result was on chromosome 22 in SYN3 at rs2710383 (p-value = 4.6 × 10⁻⁷; Odds Ratio = 0.69, 95% confidence interval = 0.55-0.83). Other suggestive signals were near genes GLGC, UNCA13, SORCS1 and between BTBD3 and SPTLC3. We also evaluated genetically-predicted gene expression in prostate tissue. The most significant result was with increasing predicted expression of ETV4 (chr17; p-value = 0.0015). Overexpression of this gene has been associated with poor prognosis in prostate cancer. In conclusion, although there were no genome-wide significant variants identified for BPH susceptibility, we present evidence supporting the heritability of this phenotype, have identified suggestive signals, and evaluated the association between BPH and genetically-predicted gene expression in prostate.


Subjects
Genetic Predisposition to Disease; Inheritance Patterns; Prostatic Hyperplasia/genetics; Aged; Aged, 80 and over; Biomarkers/metabolism; Case-Control Studies; Electronic Health Records/statistics & numerical data; Gene Expression Profiling; Genome-Wide Association Study; Genotyping Techniques; Humans; Male; Middle Aged; Polymorphism, Single Nucleotide; Prostate/pathology; Prostatic Hyperplasia/epidemiology; Prostatic Hyperplasia/pathology
5.
Genes Immun ; 20(7): 555-565, 2019 09.
Article in English | MEDLINE | ID: mdl-30459343

ABSTRACT

Resting-state white blood cell (WBC) count is a marker of inflammation and immune system health. There is evidence that WBC count is not fixed over time and there is heterogeneity in WBC trajectory that is associated with morbidity and mortality. Latent class mixed modeling (LCMM) is a method that can identify unobserved heterogeneity in longitudinal data and attempts to classify individuals into groups based on a linear model of repeated measurements. We applied LCMM to repeated WBC count measures derived from electronic medical records of participants of the National Human Genome Research Institute (NHGRI) electronic MEdical Record and GEnomics (eMERGE) network study, revealing two WBC count trajectory phenotypes. Advancing these phenotypes to GWAS, we found genetic associations between trajectory class membership and regions on chromosome 1p34.3 and chromosome 11q13.4. The chromosome 1 region contains CSF3R, which encodes the granulocyte colony-stimulating factor receptor. This protein is a major factor in neutrophil stimulation and proliferation. The associated region on chromosome 11 contains the genes RNF169 and XRRA1, both involved in the regulation of double-strand break DNA repair.


Subjects
Leukocyte Count/methods; Leukocytes/classification; Adult; Aged; Databases, Genetic; Electronic Health Records; Female; Genome-Wide Association Study; Humans; Latent Class Analysis; Male; Middle Aged; Phenotype; Polymorphism, Single Nucleotide/genetics; Proteins/genetics; Receptors, Colony-Stimulating Factor/genetics; Ubiquitin-Protein Ligases/genetics
6.
Circulation ; 138(22): 2469-2481, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β. CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
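
The abstract reports associations at a false discovery rate of q<0.1 but does not name the procedure used. As an illustration only, the sketch below uses the Benjamini-Hochberg step-up procedure, one widely used way to control the false discovery rate; treating it as the study's method is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.1):
    """Return the indices of hypotheses rejected at FDR level q.

    Benjamini-Hochberg step-up rule: sort the p-values, find the largest
    rank k with p_(k) <= q * k / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Invented p-values, for illustration only.
rejected = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], q=0.1)
none_rejected = benjamini_hochberg([0.5, 0.6], q=0.1)
```

With tens of thousands of protein-phenotype pairs, a false discovery rate criterion is less stringent than the Bonferroni thresholds the abstract applies at the validation stage, which suits a discovery screen followed by confirmation.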


Subjects
Biomarkers/blood; Carotid Artery Diseases/diagnosis; Genome-Wide Association Study; Proteome/analysis; Adult; Aged; Aged, 80 and over; Carotid Artery Diseases/genetics; Female; Genotype; Humans; Lectins, C-Type/analysis; Male; Middle Aged; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Proteomics; Receptor, Platelet-Derived Growth Factor beta/blood
7.
J Am Med Inform Assoc ; 25(11): 1540-1546, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30124903

ABSTRACT

Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm shows PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.


Subjects
Algorithms; Electronic Health Records; Phenotype; Prostatic Hyperplasia/diagnosis; Data Warehousing; Databases, Factual; Genomics; Humans; Male; Organizational Case Studies; Prostatic Hyperplasia/genetics
8.
Article in English | MEDLINE | ID: mdl-29706685

ABSTRACT

Improved prediction of the "most harmful" breast cancers that cause the most substantive morbidity and mortality would enable physicians to target more intense screening and preventive measures at those women who have the highest risk; however, such prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHR variables in the "most harmful" breast cancer risk prediction. We identified 794 subjects who had breast cancer with primary non-benign tumors with their earliest diagnosis on or after 1/1/2004 from an existing personalized medicine data repository, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising 6 components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. We observed that the AUCs of Ridge-LR and Lasso-LR models were 0.818 and 0.839 respectively. For both the Ridge-LR and Lasso-LR models, the predictive performance of the full set of EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, providing the possibility to personalize care for those women at the highest risk in clinical practice.

9.
AMIA Annu Symp Proc ; 2018: 1253-1262, 2018.
Article in English | MEDLINE | ID: mdl-30815167

ABSTRACT

The predictive capability of combining demographic risk factors, germline genetic variants, and mammogram abnormality features for breast cancer risk prediction is poorly understood. We evaluated the predictive performance of combinations of demographic risk factors, high risk single nucleotide polymorphisms (SNPs), and mammography features for women recommended for breast biopsy in a retrospective case-control study (n = 768) with four logistic regression models. The AUC of the baseline demographic features model was 0.580. Both genetic variants and mammography abnormality features augmented the performance of the baseline model: demographics + SNP (AUC = 0.668), demographics + mammography (AUC = 0.702). Finally, we found that the demographics + SNP + mammography model (AUC = 0.753) had the greatest predictive power, with a significant performance improvement over the other models. The combination of demographic risk factors, genetic variants and imaging features improves breast cancer risk prediction over prior methods utilizing only a subset of these features.


Subjects
Breast Neoplasms; Mammography; Risk Assessment/methods; Adult; Biopsy; Breast/pathology; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Case-Control Studies; Female; Humans; Logistic Models; Parity; Polymorphism, Single Nucleotide; Pregnancy; ROC Curve; Retrospective Studies; Risk Factors
10.
Proc SPIE Int Soc Opt Eng ; 9787, 2016 Feb 27.
Article in English | MEDLINE | ID: mdl-27279675

ABSTRACT

Technological advances in genome-wide association studies (GWAS) have engendered optimism that we have entered a new age of precision medicine, in which the risk of breast cancer can be predicted on the basis of a person's genetic variants. The goal of this study is to evaluate the discriminatory power of common genetic variants in breast cancer risk estimation. We conducted a retrospective case-control study drawing from an existing personalized medicine data repository. We collected variables that predict breast cancer risk: 153 high-frequency/low-penetrance genetic variants, reflecting the state-of-the-art GWAS on breast cancer, mammography descriptors and BI-RADS assessment categories in the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We trained and tested naïve Bayes models by using these predictive variables. We generated ROC curves and used the area under the ROC curve (AUC) to quantify predictive performance. We found that genetic variants achieved comparable predictive performance to BI-RADS assessment categories in terms of AUC (0.650 vs. 0.659, p-value = 0.742), but significantly lower predictive performance than the combination of BI-RADS assessment categories and mammography descriptors (0.650 vs. 0.751, p-value < 0.001). A better understanding of the relative predictive capability of genetic variants and mammography data may help clinicians and patients make appropriate decisions about breast cancer screening, prevention, and treatment in the era of precision medicine.

11.
J Mach Learn Res ; 17, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28559747

ABSTRACT

Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop novel penalized methods to improve breast cancer risk prediction by leveraging structure information in electronic health records. We conducted a retrospective case-control study, garnering 49 mammography descriptors and 77 high-frequency/low-penetrance single-nucleotide polymorphisms (SNPs) from an existing personalized medicine data repository. Structured mammography reports and breast imaging features have long been part of a standard electronic health record (EHR), and genetic markers likely will be in the near future. Lasso and its variants are widely used approaches to integrated learning and feature selection, and our methodological contribution is to incorporate the dependence structure among the features into these approaches. More specifically, we propose a new methodology by combining group penalty and $\ell_p$ (1 ≤ p ≤ 2) fusion penalty to improve breast cancer risk prediction, taking into account structure information in mammography descriptors and SNPs. We demonstrate that our method provides benefits that are both statistically significant and potentially significant to people's lives.

12.
Acad Radiol ; 23(1): 62-9, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26514439

ABSTRACT

RATIONALE AND OBJECTIVES: The discovery of germline genetic variants associated with breast cancer has engendered interest in risk stratification for improved, targeted detection and diagnosis. However, there has yet to be a comparison of the predictive ability of these genetic variants with mammography abnormality descriptors. MATERIALS AND METHODS: Our institutional review board-approved, Health Insurance Portability and Accountability Act-compliant study utilized a personalized medicine registry in which participants consented to provide a DNA sample and to participate in longitudinal follow-up. In our retrospective, age-matched, case-control study of 373 cases and 395 controls who underwent breast biopsy, we collected risk factors selected a priori based on the literature, including demographic variables based on the Gail model, common germline genetic variants, and diagnostic mammography findings according to the Breast Imaging Reporting and Data System (BI-RADS). We developed predictive models using logistic regression to determine the predictive ability of (1) demographic variables, (2) 10 selected genetic variants, or (3) mammography BI-RADS features. We evaluated each model in turn by calculating a risk score for each patient using 10-fold cross-validation, used this risk estimate to construct receiver operating characteristic (ROC) curves, and compared the area under the ROC curve (AUC) of each using the DeLong method. RESULTS: The performance of the regression model using demographic risk factors was not statistically different from the model using genetic variants (P = 0.9). The model using mammography features (AUC = 0.689) was superior to both the demographic model (AUC = 0.598; P < 0.001) and the genetic model (AUC = 0.601; P < 0.001). CONCLUSIONS: BI-RADS features exceeded the ability of demographic and 10 selected germline genetic variants to predict breast cancer in women recommended for biopsy.
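
The AUC values compared in this abstract can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes that quantity directly from two lists of scores, counting ties as one half. It is an illustration with invented scores and a hypothetical function name; it does not implement the cross-validation or the DeLong comparison mentioned in the abstract.

```python
def auc_from_scores(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    case-control pairs; a tied pair contributes 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented risk scores, for illustration only.
example_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the reported values of roughly 0.60 for the demographic and genetic models and 0.689 for the mammography model in context.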


Subjects
Breast Neoplasms/diagnostic imaging; Breast/pathology; Adult; Aged; Aged, 80 and over; Biopsy/methods; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Epidemiologic Methods; Female; Genes, BRCA1; Genes, BRCA2; Humans; Mammography/methods; Middle Aged; Polymorphism, Single Nucleotide/genetics; United States; Young Adult
13.
AMIA Jt Summits Transl Sci Proc ; 2015: 107-11, 2015.
Article in English | MEDLINE | ID: mdl-26306250

ABSTRACT

Recent large-scale genome-wide association studies (GWAS) have identified a number of genetic variants associated with breast cancer which showed great potential for clinical translation, especially in breast cancer diagnosis via mammograms. However, the amount of interaction between these genetic variants and mammographic features that can be leveraged for personalized diagnosis remains unknown. Our study utilizes germline genetic variants and mammographic features that we collected in a breast cancer case-control study. By computing the conditional mutual information between the genetic variants and mammographic features given the breast cancer status, we identified six interaction pairs which elevate breast cancer risk and five interaction pairs which reduce breast cancer risk.
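
The conditional mutual information used in this abstract measures how much two variables tell us about each other once a third is known: I(X; Y | Z) = Σ p(x, y, z) log [ p(z) p(x, y, z) / ( p(x, z) p(y, z) ) ]. The sketch below estimates it from empirical counts of discrete observations. The function name and the toy data are hypothetical; in the study's setting X would be a genetic variant, Y a mammographic feature, and Z the breast cancer status.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from paired discrete samples."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), count in n_xyz.items():
        # The sample size cancels inside the logarithm, leaving raw counts.
        ratio = count * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)])
        cmi += (count / n) * math.log2(ratio)
    return cmi


# Toy data: X determines Y within every stratum of Z.
dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1])
# Toy data: X and Y are independent within the single stratum of Z.
independent = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```

Plug-in estimates like this one are biased upward in small samples, so in practice a permutation test or similar check is usually needed before calling an interaction real.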

14.
Proc SPIE Int Soc Opt Eng ; 9416, 2015 Feb 21.
Article in English | MEDLINE | ID: mdl-27095854

ABSTRACT

Combining imaging and genetic information to predict disease presence and behavior is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics techniques have not been established. We aim to develop a clinical decision framework based on utility analysis to assess prediction models for breast cancer. Our data come from a retrospective case-control study that collected Gail model risk factors, genetic variants (single nucleotide polymorphisms, SNPs), and mammographic features in the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We first constructed three logistic regression models built on different sets of predictive features: (1) Gail, (2) Gail+SNP, and (3) Gail+SNP+BI-RADS. Then, we generated ROC curves for the three models. After we assigned utility values for each category of findings (true negative, false positive, false negative and true positive), we pursued optimal operating points on ROC curves to achieve maximum expected utility (MEU) of breast cancer diagnosis. We used McNemar's test to compare the predictive performance of the three models. We found that SNPs and BI-RADS features augmented the baseline Gail model in terms of the area under the ROC curve (AUC) and MEU. SNPs improved sensitivity of the Gail model (0.276 vs. 0.147) and reduced specificity (0.855 vs. 0.912). When additional mammographic features were added, sensitivity increased to 0.457 and specificity to 0.872. SNPs and mammographic features played a significant role in breast cancer risk estimation (p-value < 0.001). Our decision framework comprising utility analysis and McNemar's test provides a novel framework to evaluate prediction models in the realm of radiogenomics.
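
McNemar's test, used above to compare models, looks only at the patients on whom two models disagree. The sketch below computes the exact (binomial) two-sided version. The abstract does not say whether the exact or the chi-square form was used, so this choice is an assumption, and the function name and counts are hypothetical.

```python
import math


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: patients classified correctly by model A only
    c: patients classified correctly by model B only
    Under the null hypothesis the discordant pairs split 50/50, so the
    smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Invented discordant counts, for illustration only.
p_one_sided_split = mcnemar_exact(0, 5)
p_even_split = mcnemar_exact(3, 3)
```

Because the test is defined on paired classifications, each model must first be turned into a yes/no decision rule, for example by fixing a threshold at its chosen operating point on the ROC curve.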

15.
J Med Imaging (Bellingham) ; 2(4): 041005, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26835489

ABSTRACT

Combining imaging and genetic information to predict disease presence and progression is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics have not been well established. We aim to develop a decision framework based on utility analysis to assess predictive models for breast cancer diagnosis. We garnered Gail risk factors, single nucleotide polymorphisms (SNPs), and mammographic features from a retrospective case-control study. We constructed three logistic regression models built on different sets of predictive features: (1) Gail, (2) Gail + Mammo, and (3) Gail + Mammo + SNP. Then we generated receiver operating characteristic (ROC) curves for three models. After we assigned utility values for each category of outcomes (true negatives, false positives, false negatives, and true positives), we pursued optimal operating points on ROC curves to achieve maximum expected utility of breast cancer diagnosis. We performed McNemar's test based on threshold levels at optimal operating points, and found that SNPs and mammographic features played a significant role in breast cancer risk estimation. Our study comprising utility analysis and McNemar's test provides a decision framework to evaluate predictive models in breast cancer risk estimation.
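
Choosing an operating point by maximum expected utility, as described above, amounts to weighting the four possible outcomes by their probabilities at each point on the ROC curve and keeping the point with the highest total. The sketch below shows one way to write that down. The utility values, prevalence, and ROC points are invented for illustration, and the function names are hypothetical; the abstract does not report the values the authors assigned.

```python
def expected_utility(fpr, tpr, prevalence, utilities):
    """Expected utility of operating at one (FPR, TPR) point on a ROC curve."""
    diseased = tpr * utilities["TP"] + (1.0 - tpr) * utilities["FN"]
    healthy = fpr * utilities["FP"] + (1.0 - fpr) * utilities["TN"]
    return prevalence * diseased + (1.0 - prevalence) * healthy


def best_operating_point(roc_points, prevalence, utilities):
    """Return the (FPR, TPR) pair that maximizes expected utility."""
    return max(
        roc_points,
        key=lambda point: expected_utility(point[0], point[1], prevalence, utilities),
    )


# Invented ROC points, prevalence, and utilities, for illustration only.
roc_points = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]
utilities = {"TP": 1.0, "TN": 1.0, "FP": 0.0, "FN": 0.0}
chosen = best_operating_point(roc_points, prevalence=0.5, utilities=utilities)
```

Making a missed cancer (FN) much more costly than an unnecessary biopsy (FP) pushes the chosen point toward higher sensitivity, which is the kind of trade-off this framework makes explicit.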

16.
Pac Symp Biocomput ; : 200-11, 2014.
Article in English | MEDLINE | ID: mdl-24297547

ABSTRACT

Environment-wide association studies (EWAS) provide a way to uncover the environmental mechanisms involved in complex traits in a high-throughput manner. Genome-wide association studies have led to the discovery of genetic variants associated with many common diseases but do not take into account the environmental component of complex phenotypes. This EWAS assesses the comprehensive association between environmental variables and the outcome of type 2 diabetes (T2D) in the Marshfield Personalized Medicine Research Project Biobank (Marshfield PMRP). We sought replication in two National Health and Nutrition Examination Surveys (NHANES). The Marshfield PMRP currently uses four tools for measuring environmental exposures and outcome traits: 1) the PhenX Toolkit, which includes standardized exposure and phenotypic measures across several domains, 2) the Diet History Questionnaire (DHQ), a food frequency questionnaire, 3) the Measurement of a Person's Habitual Physical Activity, which scores the level of an individual's physical activity, and 4) electronic health records (EHR), in which validated algorithms establish T2D case-control status. Using PLATO software, 314 environmental variables were tested for association with T2D using logistic regression, adjusting for sex, age, and BMI in over 2,200 European Americans. When available, similar variables were tested with the same methods and adjustment in samples from NHANES III and NHANES 1999-2002. Twelve and 31 associations were identified in the Marshfield samples at p<0.01 and p<0.05, respectively. Seven and 13 measures replicated in at least one of the NHANES at p<0.01 and p<0.05, respectively, with the same direction of effect. The most significant environmental exposures associated with T2D status included decreased alcohol use as well as increased smoking exposure in childhood and adulthood. The results demonstrate the utility of the EWAS method and survey tools for identifying environmental components of complex diseases like type 2 diabetes. These high-throughput and comprehensive investigation methods can easily be applied to investigate the relation between environmental exposures and multiple phenotypes in future analyses.


Subjects
Diabetes Mellitus, Type 2/etiology; Environment; Biological Specimen Banks; Computational Biology; Diet Records; Environmental Exposure; Female; Gene-Environment Interaction; Humans; Male; Motor Activity; Nutrition Surveys; Phenotype; Precision Medicine; Software; Wisconsin
17.
Hum Genet ; 133(1): 95-109, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24026423

ABSTRACT

Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable and identification of genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized genetic variants associated with MPV and PLT using functional, pathway and disease enrichment analyses; we assessed pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomic network had data for PLT and 6,291 participants had data for MPV. We identified five chromosomal regions associated with PLT and eight associated with MPV at genome-wide significance (P < 5E-8). In addition, we replicated 20 SNPs [out of 56 SNPs (α: 0.05/56 = 9E-4)] influencing PLT and 22 SNPs [out of 29 SNPs (α: 0.05/29 = 2E-3)] influencing MPV in a published meta-analysis of GWAS of PLT and MPV. While our GWAS did not find any new associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development, and platelet production and hemostasis. 
PheWAS using a single-SNP Bonferroni correction for 1,368 diagnoses (0.05/1368 = 3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune, and hematologic disorders. We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
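
The significance thresholds quoted in this abstract follow from a Bonferroni correction: a family-wise α of 0.05 divided by the number of tests. A minimal Python sketch of that arithmetic, using the test counts stated in the abstract (the function name `bonferroni_threshold` is illustrative and not from the study):

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Test counts taken from the abstract: 56 PLT SNPs, 29 MPV SNPs, 1,368 diagnoses.
for label, n in [("PLT replication", 56),
                 ("MPV replication", 29),
                 ("PheWAS diagnoses", 1368)]:
    print(f"{label}: 0.05/{n} = {bonferroni_threshold(n):.2e}")
```

The computed values agree with the abstract's reported thresholds to within rounding.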


Subjects
Genetic Pleiotropy , Genome-Wide Association Study/methods , Mean Platelet Volume , Platelet Count , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Cardiovascular Diseases/genetics , Chromosomes, Human/genetics , Female , Genetic Loci , Hemostasis , Humans , Male , Meta-Analysis as Topic , Middle Aged , Phenotype , Thrombopoiesis/genetics
18.
Article in English | MEDLINE | ID: mdl-25717406

ABSTRACT

Recent large-scale genome-wide association studies (GWAS) have identified a number of new genetic variants associated with breast cancer. However, the degree to which these genetic variants improve breast cancer diagnosis in concert with mammography remains unknown. We conducted a case-control study and collected mammography features and 77 genetic variants that reflect state-of-the-art GWAS findings on breast cancer. A naïve Bayes model was developed on the mammography features and these genetic variants. We observed that the incorporation of the genetic variants significantly improved breast cancer diagnosis based on mammographic findings.

19.
AMIA Annu Symp Proc ; 2014: 1228-37, 2014.
Article in English | MEDLINE | ID: mdl-25954434

ABSTRACT

The goal of this study was to compare the value of mammographic features and genetic variants for breast cancer risk prediction with Bayesian reasoning and information theory. We conducted a retrospective case-control study, collecting mammographic findings and high-frequency/low-penetrance genetic variants from an existing personalized medicine data repository. We trained and tested Bayesian networks for mammographic findings and genetic variants respectively. We found that mammographic findings had a higher discriminative ability than genetic variants for improving breast cancer risk prediction in terms of the area under the ROC curve. We compared the value of each mammographic feature and genetic variant for breast cancer risk prediction in terms of mutual information, with and without consideration of interactions of those risk factors. We also identified the interactions between mammographic features and genetic variants in an attempt to prioritize mammographic features and genetic variants to efficiently predict the risk of breast cancer.
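
Mutual information, the measure this abstract uses to rank features, quantifies how much knowing a feature reduces uncertainty about the outcome. The sketch below is a generic plug-in estimate for discrete variables, not the study's implementation; the function name and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: a feature that fully determines the outcome,
# and one that is independent of it.
outcome = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], outcome))  # fully informative feature
print(mutual_information([0, 1, 0, 1], outcome))  # independent feature
```

A feature with higher mutual information with case status would rank higher under this criterion; accounting for interactions between features, as the abstract describes, would need a conditional or joint variant of the same quantity.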


Subjects
Breast Neoplasms/genetics , Mammography , Polymorphism, Single Nucleotide , Risk Assessment , Adult , Aged , Aged, 80 and over , Area Under Curve , Bayes Theorem , Breast Neoplasms/diagnostic imaging , False Positive Reactions , Female , Humans , Information Theory , Middle Aged , ROC Curve
20.
AMIA Annu Symp Proc ; 2013: 876-85, 2013.
Article in English | MEDLINE | ID: mdl-24551380

ABSTRACT

Several recent genome-wide association studies have identified genetic variants associated with breast cancer. However, how much these genetic variants may help advance breast cancer risk prediction based on other clinical features, like mammographic findings, is unknown. We conducted a retrospective case-control study, collecting mammographic findings and high-frequency/low-penetrance genetic variants from an existing personalized medicine data repository. A Bayesian network was developed using Tree Augmented Naive Bayes (TAN) by training on the mammographic findings, with and without the 22 genetic variants collected. We analyzed the predictive performance using the area under the ROC curve, and found that the genetic variants significantly improved breast cancer risk prediction on mammograms. We also identified the interaction effect between the genetic variants and collected mammographic findings in an attempt to link genotype to mammographic phenotype to better understand disease patterns, mechanisms, and/or natural history.
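
The area under the ROC curve used here to compare models can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that pairwise definition follows; it is a generic calculation rather than the study's code, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count one half)."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((p > q) + 0.5 * (p == q) for p in cases for q in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks from two models on the same four subjects.
labels = [0, 0, 1, 1]
without_variants = [0.1, 0.4, 0.35, 0.8]
with_variants = [0.1, 0.3, 0.35, 0.8]
print(roc_auc(labels, without_variants), roc_auc(labels, with_variants))
```

Comparing the two values mirrors the with/without-variants comparison in the abstract, although a real analysis would also need a significance test for the difference.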


Subjects
Bayes Theorem , Breast Neoplasms/genetics , Mammography , Risk Assessment/methods , Breast Neoplasms/diagnostic imaging , Case-Control Studies , Female , Genetic Predisposition to Disease , Genotype , Humans , Neural Networks, Computer , Polymorphism, Single Nucleotide , ROC Curve