Results 1 - 5 of 5
1.
PLoS One ; 18(5): e0283553, 2023.
Article in English | MEDLINE | ID: mdl-37196047

ABSTRACT

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our algorithm significantly improved patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), identifying up to 3.5-fold more patients than the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis in the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, this work showed that heterogeneous EHR data can be mapped through an integrative analytical pipeline to reveal significant genotype-phenotype associations with clinical interpretation. CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.
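The abstract reports positive predictive values (PPVs) of at least 0.94 for the phenotyping algorithm. As a reminder of what that metric measures, here is a minimal sketch in Python. The function name and the chart-review counts are hypothetical and do not come from the study.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Return the fraction of algorithm-flagged patients confirmed as true cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no patients were flagged")
    return true_positives / flagged


# Hypothetical chart review: 100 flagged patients, 94 confirmed on manual review.
ppv = positive_predictive_value(true_positives=94, false_positives=6)
print(f"PPV = {ppv:.2f}")
```

PPV describes only the flagged patients. It says nothing about cases the algorithm missed, which is why the abstract reports the increase in identified patients separately.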


Subject(s)
Diverticular Diseases , Diverticulitis , Diverticulum , Humans , Electronic Health Records , Genome-Wide Association Study/methods , Natural Language Processing , Phenotype , Algorithms , Polymorphism, Single Nucleotide
2.
Genet Med ; 21(9): 2135-2144, 2019 09.
Article in English | MEDLINE | ID: mdl-30890783

ABSTRACT

PURPOSE: To provide a validated method to confidently identify exon-containing copy-number variants (CNVs), with a low false discovery rate (FDR), in targeted sequencing data from a clinical laboratory, with particular focus on single-exon CNVs. METHODS: DNA sequence coverage data are normalized within each sample, and exonic CNVs are then identified in a batch of samples when the target log2 ratio of the sample to the batch median exceeds defined thresholds. The quality of exonic CNV calls is assessed by C-scores (Z-like scores) using thresholds derived from gold standard samples and simulation studies. We integrate an ExonQC threshold to lower FDR and compare performance with alternative software (VisCap). RESULTS: Thirteen CNVs were used as a truth set to validate Atlas-CNV and compare it with VisCap. We demonstrated FDR reduction in validation, simulation, and 10,926 eMERGESeq samples without sensitivity loss. Sixty-four multiexon and 29 single-exon CNVs with high C-scores were assessed by Multiplex Ligation-dependent Probe Amplification (MLPA). CONCLUSION: Atlas-CNV is validated as a method to identify exonic CNVs in targeted sequencing data generated in the clinical laboratory. The ExonQC and C-score assignment can reduce FDR (identification of targets with high variance) and improve calling accuracy of single-exon CNVs, respectively. We propose guidelines and criteria to identify high-confidence single-exon CNVs.
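The calling step described in METHODS (within-sample normalization, then a log2 ratio of each target against the batch median) can be sketched roughly as follows. This is an illustration, not the Atlas-CNV implementation. Median scaling as the normalization, the threshold values, and all names are assumptions. The C-score and ExonQC filters are not modeled.

```python
import math
from statistics import median


def normalize_within_sample(coverage):
    """Scale each target's coverage by the sample's median (assumed normalization)."""
    sample_median = median(coverage.values())
    if sample_median <= 0:
        raise ValueError("sample median coverage must be positive")
    return {target: depth / sample_median for target, depth in coverage.items()}


def call_exonic_cnvs(batch, loss_threshold=-0.6, gain_threshold=0.4):
    """Flag targets whose log2(sample / batch median) crosses a threshold.

    `batch` maps sample name -> {target name -> raw coverage}.
    The threshold defaults are illustrative, not values from the paper.
    """
    normalized = {name: normalize_within_sample(cov) for name, cov in batch.items()}
    targets = list(next(iter(normalized.values())))
    calls = []
    for target in targets:
        batch_median = median(normalized[name][target] for name in normalized)
        for name in normalized:
            value = normalized[name][target]
            if value <= 0 or batch_median <= 0:
                continue  # log2 is undefined; skipped in this sketch
            log2_ratio = math.log2(value / batch_median)
            if log2_ratio <= loss_threshold:
                calls.append((name, target, "loss", log2_ratio))
            elif log2_ratio >= gain_threshold:
                calls.append((name, target, "gain", log2_ratio))
    return calls


# Hypothetical batch: four flat samples and one with half coverage on exon2.
batch = {f"s{i}": {"exon1": 100, "exon2": 100, "exon3": 100} for i in range(1, 5)}
batch["s5"] = {"exon1": 100, "exon2": 50, "exon3": 100}
print(call_exonic_cnvs(batch))
```

Comparing against the batch median rather than a fixed reference lets systematic per-target capture bias cancel out, provided most samples in the batch are copy-neutral at that target.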


Subject(s)
DNA Copy Number Variations/genetics , Exons/genetics , Genome, Human/genetics , Software , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
3.
Angiology ; 68(4): 322-329, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27436494

ABSTRACT

Inflammation plays a pivotal role in peripheral artery disease (PAD). Cellular adhesion proteins mediate the interaction of leukocytes with endothelial cells during inflammation. To determine the association of cellular adhesion molecules with ankle-brachial index (ABI) and ABI category (≤1.0 vs >1.0) in a diverse population, 15 adhesion proteins were measured in the Multi-Ethnic Study of Atherosclerosis (MESA). To assess multivariable associations of each protein with ABI and ABI category, linear and logistic regression were used, respectively. Among 2364 participants, 23 presented with poorly compressible arteries (ABI > 1.4) and were excluded, and 261 had ABI ≤ 1.0. Adjusting for traditional risk factors, elevated levels of soluble P-selectin, hepatocyte growth factor, and secretory leukocyte protease inhibitor were associated with lower ABI (P = .0004, .001, and .002, respectively). For each standard deviation increase in protein level, we found 26%, 20%, and 19% greater odds of the lower ABI category (P = .001, .01, and .02, respectively). Further investigation into the adhesion pathway may shed new light on biological mechanisms implicated in PAD.
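The "greater odds per standard deviation" figures are odds ratios from logistic regression, where the odds ratio equals the exponential of the fitted coefficient. The short sketch below converts the reported percentages back to log-odds coefficients. The helper name is hypothetical, and the coefficients are derived from the rounded percentages in the abstract, not taken from the paper.

```python
import math


def log_odds_per_sd(percent_greater_odds: float) -> float:
    """Convert 'X% greater odds per SD' into a logistic-regression coefficient."""
    odds_ratio = 1.0 + percent_greater_odds / 100.0
    return math.log(odds_ratio)


# Percentages as reported for the three proteins, in the order listed above.
for protein, pct in [("soluble P-selectin", 26),
                     ("hepatocyte growth factor", 20),
                     ("secretory leukocyte protease inhibitor", 19)]:
    print(f"{protein}: OR = {1 + pct / 100:.2f}, beta = {log_odds_per_sd(pct):.3f}")
```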


Subject(s)
Ankle Brachial Index , Atherosclerosis/blood , Cell Adhesion Molecules/blood , Peripheral Vascular Diseases/blood , Aged , Atherosclerosis/ethnology , Biomarkers/blood , Enzyme-Linked Immunosorbent Assay , Female , Humans , Inflammation/blood , Inflammation/ethnology , Male , Middle Aged , Peripheral Vascular Diseases/ethnology , Risk Factors
4.
Genome Med ; 7(1): 67, 2015.
Article in English | MEDLINE | ID: mdl-26221186

ABSTRACT

BACKGROUND: In an effort to return actionable results from variant data to electronic health records (EHRs), participants in the Electronic Medical Records and Genomics (eMERGE) Network are being sequenced with the targeted Pharmacogenomics Research Network sequence platform (PGRNseq). This cost-effective, highly scalable, and highly accurate platform was created to explore rare variation in 84 key pharmacogenetic genes with strong drug phenotype associations. METHODS: To return Clinical Laboratory Improvement Amendments (CLIA) results to our participants at the Group Health Cooperative, we sequenced the DNA of 900 participants (61% female) with non-CLIA biobanked samples. We then selected 450 of those to be re-consented, to redraw blood, and ultimately to validate CLIA variants in anticipation of returning the results to the participant and EHR. These 450 were selected using an algorithm we designed to harness data from self-reported race, diagnosis and procedure codes, medical notes, laboratory results, and variant-level bioinformatics to ensure selection of an informative sample. We annotated the multi-sample variant call format by a combination of SeattleSeq and SnpEff tools, with additional custom variables including evidence from ClinVar, OMIM, HGMD, and prior clinical associations. RESULTS: We focused our analyses on 27 actionable genes, largely driven by the Clinical Pharmacogenetics Implementation Consortium. We derived a ranking system based on the total number of coding variants per participant (75.2±14.7), and the number of coding variants with high or moderate impact (11.5±3.9). Notably, we identified 11 stop-gained (1%) and 519 missense (20%) variants out of a total of 1785 in these 27 genes. Finally, we prioritized variants to be returned to the EHR with prior clinical evidence of pathogenicity or annotated as stop-gain for the following genes: CACNA1S and RYR1 (malignant hyperthermia); SCN5A, KCNH2, and RYR2 (arrhythmia); and LDLR (high cholesterol). CONCLUSIONS: The incorporation of genetics into the EHR for clinical decision support is a complex undertaking for many reasons, including lack of prior consent for return of results, lack of biospecimens collected in a CLIA environment, and EHR integration. Our study design accounts for these hurdles and is an example of a pilot system that can be utilized before expanding to an entire health system.

5.
BMC Proc ; 3 Suppl 7: S12, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20017985

ABSTRACT

BACKGROUND: There is a long-established association between rheumatoid arthritis (RA) and HLA-DRbeta1. The shared epitope (SE) allele is an indicator of the presence of any of the HLA-DRbeta1 alleles associated with RA. Autoantibodies are also associated with RA, specifically rheumatoid factor IgM (RFUW) and anti-cyclic citrullinated peptide (anti-CCP). METHODS: Using the Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium genome-wide association data, we sought to find non-HLA-DRbeta1 genetic associations by stratifying across SE status, and using the continuous biomarker phenotypes of RFUW and anti-CCP. To evaluate the binary RA phenotype, we applied the recently developed FP test and compared it to logistic regression or a genotype count-based test. We adjusted for multiple testing using the Bonferroni correction, the Q value approach, or permutation-based p-values. A case-only analysis of the biomarkers RFUW and anti-CCP used linear regression and ANOVAs. RESULTS: The initial genome-wide association analysis using all cases and controls provides substantial evidence of an association on chromosomes 9 and 2 within the immune system-related gene UBXD2. In SE-positive subjects, many single-nucleotide polymorphisms were significant, including some on chromosome 6. Because there were very few SE-negative cases, we had limited power to detect associations in SE-negative subjects. We were also unable to find genetic associations with either RFUW or anti-CCP. CONCLUSION: Our analyses have confirmed previous findings for genes PTPN22 and C5. We also identified a novel candidate gene on chromosome 2, UBXD2. The results suggest that the FP test may be more powerful than the genotype count-based test.
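Of the multiple-testing adjustments named in METHODS, the Bonferroni correction is the simplest to illustrate: each p-value is compared against the family-wise alpha divided by the number of tests. The sketch below is a generic Python illustration. The SNP labels and p-values are invented and do not come from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return {test name: True/False} using the Bonferroni-adjusted threshold."""
    n_tests = len(p_values)
    if n_tests == 0:
        raise ValueError("at least one p-value is required")
    threshold = alpha / n_tests  # per-test significance level
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical p-values for four SNPs; the adjusted threshold is 0.05 / 4 = 0.0125.
example = {"snp_a": 1e-8, "snp_b": 0.03, "snp_c": 0.001, "snp_d": 0.2}
print(bonferroni_significant(example))
```

Bonferroni controls the family-wise error rate and becomes conservative when tests are correlated, as neighboring SNPs often are. That is one reason to consider Q value and permutation-based alternatives alongside it.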
