Results 1 - 14 of 14
1.
Nat Commun ; 15(1): 989, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38307861

ABSTRACT

Proteogenomics studies generate hypotheses on protein function and provide genetic evidence for drug target prioritization. Most previous work has been conducted using affinity-based proteomics approaches. These technologies face challenges, such as uncertainty regarding target identity, non-specific binding, and handling of variants that affect epitope affinity binding. Mass spectrometry-based proteomics can overcome some of these challenges. Here we report a pQTL study using the Proteograph™ Product Suite workflow (Seer, Inc.) where we quantify over 18,000 unique peptides from nearly 3000 proteins in more than 320 blood samples from a multi-ethnic cohort in a bottom-up, peptide-centric, mass spectrometry-based proteomics approach. We identify 184 protein-altering variants in 137 genes that are significantly associated with their corresponding variant peptides, confirming target specificity of co-associated affinity binders, identifying putatively causal cis-encoded proteins and providing experimental evidence for their presence in blood, including proteins that may be inaccessible to affinity-based proteomics.


Subject(s)
Proteogenomics , Proteomics , Humans , Proteomics/methods , Mass Spectrometry/methods , Proteins/analysis , Peptides/analysis , Proteogenomics/methods , Mutant Proteins
2.
bioRxiv ; 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38260620

ABSTRACT

Alzheimer's disease (AD) and related dementias (ADRD) are complex diseases with multiple pathophysiological drivers that determine clinical symptomology and disease progression. These diseases develop insidiously over time, through many pathways and disease mechanisms, and continue to have a huge societal impact for affected individuals and their families. While emerging blood-based biomarkers, such as plasma p-tau181 and p-tau217, accurately detect Alzheimer neuropathology and are associated with faster cognitive decline, the full extent of plasma proteomic changes in ADRD remains unknown. Earlier detection and better classification of the different subtypes may provide opportunities for earlier, more targeted interventions, and perhaps a higher likelihood of successful therapeutic development. In this study, we aim to leverage unbiased mass spectrometry proteomics to identify novel, blood-based biomarkers associated with cognitive decline. 1,786 plasma samples from 1,005 patients were collected over 12 years from participants in the Massachusetts Alzheimer's Disease Research Center Longitudinal Cohort Study. Patient metadata includes demographics, final diagnoses, and clinical dementia rating (CDR) scores taken concurrently. The Proteograph™ Product Suite (Seer, Inc.) and liquid-chromatography mass-spectrometry (LC-MS) analysis were used to process the plasma samples in this cohort and generate unbiased proteomics data. Data-independent acquisition (DIA) mass spectrometry results yielded 36,259 peptides and 4,007 protein groups. Linear mixed effects models revealed 138 differentially abundant proteins between AD and healthy controls. Machine learning classification models for AD diagnosis identified potential candidate biomarkers including MBP, BGLAP, and APOD. Cox regression models were created to determine the association of proteins with disease progression and suggest CLNS1A, CRISPLD2, and GOLPH3 as targets of further investigation as potential biomarkers.
The Proteograph workflow provided deep, unbiased coverage of the plasma proteome at a speed that enabled a cohort study of almost 1,800 samples, which is the largest, deep, unbiased proteomics study of ADRD conducted to date.

3.
Nat Genet ; 54(9): 1275-1283, 2022 09.
Article in English | MEDLINE | ID: mdl-36038634

ABSTRACT

Genome-wide association studies (GWASs) have identified hundreds of loci associated with Crohn's disease (CD). However, as with all complex diseases, robust identification of the genes dysregulated by noncoding variants typically driving GWAS discoveries has been challenging. Here, to complement GWASs and better define actionable biological targets, we analyzed sequence data from more than 30,000 patients with CD and 80,000 population controls. We directly implicate ten genes in general onset CD for the first time to our knowledge via association to coding variation, four of which lie within established CD GWAS loci. In nine instances, a single coding variant is significantly associated, and in the tenth, ATG4C, we see additionally a significantly increased burden of very rare coding variants in CD cases. In addition to reiterating the central role of innate and adaptive immune cells as well as autophagy in CD pathogenesis, these newly associated genes highlight the emerging role of mesenchymal cells in the development and maintenance of intestinal inflammation.


Subject(s)
Crohn Disease , Crohn Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide/genetics
4.
PLoS Genet ; 18(3): e1010105, 2022 03.
Article in English | MEDLINE | ID: mdl-35324888

ABSTRACT

We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits, ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Biological Specimen Banks , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Phenotype , Risk Factors , United Kingdom
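The rank correlation reported in the abstract above (Spearman's ρ between the number of selected variants and incremental predictive performance) can be illustrated with a minimal sketch. The function names and the toy numbers are hypothetical and are not taken from the paper; the code only shows the general definition of Spearman's ρ as the Pearson correlation of average ranks.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data (one entry per trait), for illustration only.
n_variants = [12, 150, 40, 900, 5]
incremental_r2 = [0.002, 0.015, 0.020, 0.060, 0.001]
rho = spearman_rho(n_variants, incremental_r2)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the statistic is insensitive to the very different scales of variant counts and predictive performance, which is presumably why a rank-based measure suits this comparison.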
5.
Am J Hum Genet ; 108(12): 2354-2367, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34822764

ABSTRACT

Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.


Subject(s)
Genetic Variation , Genome-Wide Association Study , Models, Genetic , Bayes Theorem , Female , Humans , Male , Phenotype
7.
Eur J Hum Genet ; 29(7): 1071-1081, 2021 07.
Article in English | MEDLINE | ID: mdl-33558700

ABSTRACT

Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual's genetic risk for a trait across DeGAs components, with examples for body mass index (BMI) and myocardial infarction (heart attack) in 337,151 white British individuals in the UK Biobank, with replication in a further set of 25,486 non-British white individuals. We find that BMI polygenic risk factorizes into components related to fat-free mass, fat mass, and overall health indicators like physical activity. Most individuals with high dPRS for BMI have strong contributions from both a fat-mass component and a fat-free mass component, whereas a few "outlier" individuals have strong contributions from only one of the two components. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.


Subject(s)
Genetic Association Studies , Genetic Predisposition to Disease , Multifactorial Inheritance , Quantitative Trait, Heritable , Algorithms , Biological Specimen Banks , Databases, Genetic , Genetic Association Studies/methods , Genome-Wide Association Study , Humans , Models, Genetic , Phenotype , Population Surveillance , Reproducibility of Results , Risk Assessment , Risk Factors , United Kingdom/epidemiology
8.
Nat Genet ; 53(2): 185-194, 2021 02.
Article in English | MEDLINE | ID: mdl-33462484

ABSTRACT

Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.


Subject(s)
Biomarkers/blood , Biomarkers/urine , HLA Antigens/genetics , Proteins/genetics , Biological Specimen Banks , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , DNA Copy Number Variations , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Genetic Pleiotropy , Humans , Linkage Disequilibrium , Liver-Specific Organic Anion Transporter 1/genetics , Mendelian Randomization Analysis , Polymorphism, Single Nucleotide , Renal Insufficiency, Chronic , Serine Endopeptidases/genetics , United Kingdom
9.
Circ Genom Precis Med ; 13(6): e003014, 2020 12.
Article in English | MEDLINE | ID: mdl-33125279

ABSTRACT

BACKGROUND: The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. METHODS: From a sample of 34 287 white British ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac magnetic resonance imaging sequences of the aortic valve. Aortic valve area measurements were submitted to genome-wide association testing, followed by polygenic risk scoring and phenome-wide screening, to identify genetic comorbidities. RESULTS: A genome-wide association study of aortic valve area in these UK Biobank participants showed 3 significant associations, indexed by rs71190365 (chr13:50764607, DLEU1, P=1.8×10⁻⁹), rs35991305 (chr12:94191968, CRADD, P=3.4×10⁻⁸), and chr17:45013271:C:T (GOSR2, P=5.6×10⁻⁸). Replication on an independent set of 8145 unrelated European ancestry participants showed consistent effect sizes in all 3 loci, although rs35991305 did not meet nominal significance. We constructed a polygenic risk score for aortic valve area, which in a separate cohort of 311 728 individuals without imaging demonstrated that smaller aortic valve area is predictive of increased risk for aortic valve disease (odds ratio, 1.14; P=2.3×10⁻⁶). After excluding subjects with a medical diagnosis of aortic valve stenosis (remaining n=308 683 individuals), phenome-wide association of >10 000 traits showed multiple links between the polygenic score for aortic valve disease and key health-related comorbidities involving the cardiovascular system and autoimmune disease. Genetic correlation analysis supports a shared genetic etiology between aortic valve area and birth weight along with other cardiovascular conditions.
CONCLUSIONS: These results illustrate the use of automated phenotyping of cardiac imaging data from the general population to investigate the genetic etiology of aortic valve disease, perform clinical prediction, and uncover new clinical and genetic correlates of cardiac anatomy.


Subject(s)
Aortic Valve/diagnostic imaging , Biological Specimen Banks , Cardiovascular Diseases/diagnostic imaging , Cardiovascular Diseases/genetics , Genome-Wide Association Study , Magnetic Resonance Imaging , Adult , Aged , Aortic Valve/pathology , Aortic Valve Stenosis/diagnostic imaging , Aortic Valve Stenosis/genetics , Comorbidity , Female , Genome, Human , Humans , Male , Middle Aged , Multifactorial Inheritance/genetics , Phenomics , Phenotype , Survival Analysis , United Kingdom
10.
PLoS One ; 15(6): e0234647, 2020.
Article in English | MEDLINE | ID: mdl-32569327

ABSTRACT

Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another.


Subject(s)
Data Mining , Narrative Medicine , Software , Animals , Automation , Databases as Topic , Humans , Reproducibility of Results , Species Specificity
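The abstract above reports classification quality as a weighted F1 score. As a minimal sketch of what such a metric computes, the code below derives per-class F1 and averages it with weights proportional to each class's share of the true labels. This weighting scheme is an assumption (the abstract does not spell out the exact averaging), and the function names and toy labels are hypothetical.

```python
def f1_per_class(y_true, y_pred, labels):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 when the class never occurs."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def weighted_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores, weighting each class by its support."""
    scores = f1_per_class(y_true, y_pred, labels)
    n = len(y_true)
    return sum(scores[label] * y_true.count(label) / n for label in labels)

# Hypothetical toy labels for two top-level categories, for illustration only.
y_true = ["neoplasia", "neoplasia", "other", "other"]
y_pred = ["neoplasia", "other", "other", "other"]
score = weighted_f1(y_true, y_pred, ["neoplasia", "other"])
print(f"weighted F1 = {score:.3f}")
```

Weighting by support keeps rare categories from dominating the average, at the cost of hiding poor performance on them; an unweighted macro average would make the opposite trade-off.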
11.
PLoS Genet ; 16(5): e1008682, 2020 05.
Article in English | MEDLINE | ID: mdl-32369491

ABSTRACT

Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (β = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10⁻⁹ and 1.07 × 10⁻¹³ for corneal compensated and Goldmann-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10⁻¹² for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldmann-correlated and corneal compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.


Subject(s)
Angiopoietin-like Proteins/genetics , Glaucoma/genetics , Glaucoma/prevention & control , Intraocular Pressure/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Angiopoietin-Like Protein 7 , Biological Specimen Banks/statistics & numerical data , Case-Control Studies , Cohort Studies , Female , Finland/epidemiology , Gene Frequency , Genetic Predisposition to Disease , Genetics, Population , Genome-Wide Association Study , Glaucoma/epidemiology , Humans , Loss of Function Mutation/genetics , Male , Middle Aged , Mutation, Missense , United Kingdom/epidemiology
12.
Radiol Artif Intell ; 2(2): e190065, 2020 Mar 18.
Article in English | MEDLINE | ID: mdl-32280948

ABSTRACT

PURPOSE: To develop an automated model for staging knee osteoarthritis severity from radiographs and to compare its performance to that of musculoskeletal radiologists. MATERIALS AND METHODS: Radiographs from the Osteoarthritis Initiative staged by a radiologist committee using the Kellgren-Lawrence (KL) system were used. Before using the images as input to a convolutional neural network model, they were standardized and augmented automatically. The model was trained with 32 116 images, tuned with 4074 images, evaluated with a 4090-image test set, and compared to two individual radiologists using a 50-image test subset. Saliency maps were generated to reveal features used by the model to determine KL grades. RESULTS: With committee scores used as ground truth, the model had an average F1 score of 0.70 and an accuracy of 0.71 for the full test set. For the 50-image subset, the best individual radiologist had an average F1 score of 0.60 and an accuracy of 0.60; the model had an average F1 score of 0.64 and an accuracy of 0.66. Cohen weighted κ between the committee and model was 0.86, comparable to intraexpert repeatability. Saliency maps identified sites of osteophyte formation as influential to predictions. CONCLUSION: An end-to-end interpretable model that takes full radiographs as input and predicts KL scores with state-of-the-art accuracy, performs as well as musculoskeletal radiologists, and does not require manual image preprocessing was developed. Saliency maps suggest the model's predictions were based on clinically relevant information. Supplemental material is available for this article. © RSNA, 2020.
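The agreement statistic in the abstract above (Cohen weighted κ between committee and model) can be sketched as follows. The abstract does not say which weighting scheme was used, so the quadratic weights here are an assumption, and the function name and toy grades are hypothetical. Grades are coded 0 to 4, matching the five Kellgren-Lawrence levels.

```python
def weighted_kappa(rater_a, rater_b, n_classes, power=2):
    """Cohen's weighted kappa for ordinal labels coded 0..n_classes-1.

    power=2 gives quadratic disagreement weights, power=1 gives linear ones.
    """
    n = len(rater_a)
    # Observed joint distribution of the two raters' grades.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (abs(i - j) / (n_classes - 1)) ** power
            weighted_observed += weight * observed[i][j]
            # Expected disagreement if the raters graded independently.
            weighted_expected += weight * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected

# Hypothetical toy grades, for illustration only.
committee = [0, 1, 2, 3, 4, 2, 3, 1]
model = [0, 1, 2, 3, 4, 3, 2, 1]
kappa = weighted_kappa(committee, model, n_classes=5)
print(f"quadratic weighted kappa = {kappa:.2f}")
```

Unlike plain accuracy, a weighted κ penalizes a grade-0 versus grade-4 disagreement more heavily than a one-grade miss, which is one reason a model can show moderate accuracy alongside high κ on an ordinal scale.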

13.
Hum Mol Genet ; 28(R2): R162-R169, 2019 11 21.
Article in English | MEDLINE | ID: mdl-31363759

ABSTRACT

Complex diseases such as inflammatory bowel disease (IBD), which consists of ulcerative colitis and Crohn's disease, are a significant medical burden: 70 000 new cases of IBD are diagnosed in the United States annually. In this review, we examine the history of genetic variant discovery in complex disease with a focus on IBD. We cover methods that have been applied to microsatellite, common variant, targeted resequencing and whole-exome and -genome data, specifically focusing on the progression of technologies towards rare-variant discovery. The inception of these methods combined with better availability of population level variation data has led to rapid discovery of IBD-causative and/or -associated variants at over 200 loci; over time, these methods have grown exponentially in both power and ascertainment to detect rare variation. We highlight rare-variant discoveries critical to the elucidation of the pathogenesis of IBD, including those in NOD2, IL23R, CARD9, RNF186 and ADCY7. We additionally identify the major areas of rare-variant discovery that will evolve in the coming years. A better understanding of the genetic basis of IBD and other complex diseases will lead to improved diagnosis, prognosis, treatment and surveillance.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Inflammatory Bowel Diseases/genetics , Asian People/genetics , Asian People/statistics & numerical data , Case-Control Studies , Genetic Linkage , Genome-Wide Association Study/history , Genome-Wide Association Study/statistics & numerical data , History, 20th Century , History, 21st Century , Humans , Inflammatory Bowel Diseases/history , Models, Statistical , Polymorphism, Single Nucleotide , Receptors, Interleukin/genetics , White People/genetics , White People/statistics & numerical data , Exome Sequencing/statistics & numerical data
14.
Pac Symp Biocomput ; 22: 521-532, 2017.
Article in English | MEDLINE | ID: mdl-27897003

ABSTRACT

Autism has been shown to have a major genetic risk component; in families, documented autism has repeatedly been shown to be passed down through generations. While inherited risk plays an important role in the autistic nature of children, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novo variants are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.


Subject(s)
Autistic Disorder/genetics , Germ-Line Mutation , Autistic Disorder/pathology , Computational Biology , Databases, Genetic , Electrical Synapses/genetics , Electrical Synapses/pathology , Female , Gene Regulatory Networks , Genetic Predisposition to Disease , Humans , Male , Models, Genetic , Models, Neurological
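The abstract above describes a Bonferroni-significant enrichment of de novo variants in a gene set. One common way to test this kind of enrichment is a hypergeometric upper-tail test followed by a Bonferroni threshold. The sketch below assumes that approach, which the abstract does not confirm as the authors' method; all names and counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value passes the Bonferroni-adjusted threshold alpha / n_tests."""
    return p_value < alpha / n_tests

# Hypothetical counts, for illustration only:
# 20,000 background genes, a 50-gene set, 300 genes carrying de novo variants,
# 6 of which fall in the set, with 1,000 gene sets tested in total.
p = hypergeom_upper_tail(k=6, N=20000, K=50, n=300)
print(f"p = {p:.3g}, significant: {bonferroni_significant(p, n_tests=1000)}")
```

Dividing the significance level by the number of gene sets tested controls the family-wise error rate, so a result that survives it is unlikely to be an artifact of testing many sets.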