1.
PLOS Glob Public Health ; 4(6): e0003204, 2024.
Article in English | MEDLINE | ID: mdl-38833495

ABSTRACT

Cardiovascular diseases (CVDs) are responsible for a large proportion of premature deaths in low- and middle-income countries. Early CVD detection and intervention is critical in these populations, yet many existing CVD risk scores require a physical examination or lab measurements, which can be challenging in such health systems due to limited accessibility. We investigated the potential to use photoplethysmography (PPG), a sensing technology available on most smartphones that can potentially enable large-scale screening at low cost, for CVD risk prediction. We developed a deep learning PPG-based CVD risk score (DLS) to predict the probability of having major adverse cardiovascular events (MACE: non-fatal myocardial infarction, stroke, and cardiovascular death) within ten years, given only age, sex, smoking status and PPG as predictors. We compare the DLS with the office-based refit-WHO score, which adopts the shared predictors from WHO and Globorisk scores (age, sex, smoking status, height, weight and systolic blood pressure) but refitted on the UK Biobank (UKB) cohort. All models were trained on a development dataset (141,509 participants) and evaluated on a geographically separate test (54,856 participants) dataset, both from UKB. DLS's C-statistic (71.1%, 95% CI 69.9-72.4) is non-inferior to office-based refit-WHO score (70.9%, 95% CI 69.7-72.2; non-inferiority margin of 2.5%, p<0.01) in the test dataset. The calibration of the DLS is satisfactory, with a 1.8% mean absolute calibration error. Adding DLS features to the office-based score increases the C-statistic by 1.0% (95% CI 0.6-1.4). DLS predicts ten-year MACE risk comparable with the office-based refit-WHO score. Interpretability analyses suggest that the DLS-extracted features are related to PPG waveform morphology and are independent of heart rate. 
Our study provides a proof-of-concept and suggests the potential of PPG-based strategies for community-based primary prevention in resource-limited regions.
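
To make the C-statistic reported above more concrete, here is a minimal sketch of Harrell's concordance index for right-censored follow-up data. The function name and the toy inputs are hypothetical, pairs with tied follow-up times are simply skipped, and nothing here reproduces the study's own evaluation code.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the participant
    with the earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (a simplification) and pairs whose earlier
        # time is censored, because their ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

With follow-up times `[2, 4, 6, 8]`, event indicators `[1, 1, 0, 1]` and predicted risks `[0.9, 0.3, 0.5, 0.1]`, five pairs are comparable and four are concordant, so the index is 0.8.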

2.
medRxiv ; 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38562791

ABSTRACT

Electronic health records, biobanks, and wearable biosensors contain multiple high-dimensional clinical data (HDCD) modalities (e.g., ECG, photoplethysmography (PPG), and MRI) for each individual. Access to multimodal HDCD provides a unique opportunity for genetic studies of complex traits because different modalities relevant to a single physiological system (e.g., circulatory system) encode complementary and overlapping information. We propose a novel multimodal deep learning method, M-REGLE, for discovering genetic associations from a joint representation of multiple complementary HDCD modalities. We showcase the effectiveness of this model by applying it to several cardiovascular modalities. M-REGLE jointly learns a lower-dimensional representation (i.e., latent factors) of multimodal HDCD using a convolutional variational autoencoder, performs genome-wide association studies (GWAS) on each latent factor, then combines the results to study the genetics of the underlying system. To validate the advantages of M-REGLE and multimodal learning, we apply it to common cardiovascular modalities (PPG and ECG), and compare its results to unimodal learning methods in which representations are learned from each data modality separately, but the downstream genetic analyses are performed on the combined unimodal representations. M-REGLE identifies 19.3% more loci on the 12-lead ECG dataset, 13.0% more loci on the ECG lead I + PPG dataset, and its genetic risk score significantly outperforms the unimodal risk score at predicting cardiac phenotypes, such as atrial fibrillation (Afib), in multiple biobanks.

3.
medRxiv ; 2023 Apr 29.
Article in English | MEDLINE | ID: mdl-37162978

ABSTRACT

Background: Spirometry measures lung function by selecting the best of multiple efforts meeting pre-specified quality control (QC), and reporting two key metrics: forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC). We hypothesize that discarded submaximal and QC-failing data meaningfully contribute to the prediction of airflow obstruction and all-cause mortality. Methods: We evaluated volume-time spirometry data from the UK Biobank. We identified "best" spirometry efforts as those passing QC with the maximum FVC. "Discarded" efforts were either submaximal or failed QC. To create a combined representation of lung function we implemented a contrastive learning approach, Spirogram-based Contrastive Learning Framework (Spiro-CLF), which utilized all recorded volume-time curves per participant and applied different transformations (e.g. flow-volume, flow-time). In a held-out 20% testing subset we applied the Spiro-CLF representation of a participant's overall lung function to 1) binary predictions of FEV1/FVC < 0.7 and FEV1 Percent Predicted (FEV1PP) < 80%, indicative of airflow obstruction, and 2) Cox regression for all-cause mortality. Findings: We included 940,705 volume-time curves from 352,684 UK Biobank participants with 2-3 spirometry efforts per individual (66.7% with 3 efforts) and at least one QC-passing spirometry effort. Of all spirometry efforts, 24.1% failed QC and 37.5% were submaximal. Spiro-CLF prediction of FEV1/FVC < 0.7 utilizing discarded spirometry efforts had an Area under the Receiver Operating Characteristics (AUROC) of 0.981 (0.863 for FEV1PP prediction). Incorporating discarded spirometry efforts in all-cause mortality prediction was associated with a concordance index (c-index) of 0.654, which exceeded the c-indices from FEV1 (0.590), FVC (0.559), or FEV1/FVC (0.599) from each participant's single best effort. 
Interpretation: A contrastive learning model using raw spirometry curves can accurately predict lung function using submaximal and QC-failing efforts. This model also has superior prediction of all-cause mortality compared to standard lung function measurements. Funding: MHC is supported by NIH R01HL137927, R01HL135142, HL147148, and HL089856. BDH is supported by NIH K08HL136928, U01 HL089856, and an Alpha-1 Foundation Research Grant. DH is supported by NIH 2T32HL007427-41. EKS is supported by NIH R01 HL152728, R01 HL147148, U01 HL089856, R01 HL133135, P01 HL132825, and P01 HL114501. PJC is supported by NIH R01HL124233 and R01HL147326. SPB is supported by NIH R01HL151421 and UH3HL155806. TY, FH, and CYM are employees of Google LLC.

4.
medRxiv ; 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37163049

ABSTRACT

High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way.
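
For readers unfamiliar with polygenic risk scores, a PRS is commonly computed as a weighted sum of allele dosages. The sketch below shows only that generic definition. The function name, dosages and weights are hypothetical, and the abstract does not describe how REGLE derives its weights.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 per variant).

    `weights` are per-variant effect sizes, for example GWAS betas.
    Both inputs are illustrative placeholders, not values from the study.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))
```

For example, dosages `[2, 1, 0]` with weights `[0.5, -0.2, 0.1]` give a score of 0.8. In practice scores are usually standardized within a cohort before they are compared across individuals.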

5.
Nat Genet ; 55(5): 787-795, 2023 05.
Article in English | MEDLINE | ID: mdl-37069358

ABSTRACT

Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms and use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls, and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and also identifies 67 new loci. Lastly, our method provides a general framework to use ML methods and medical-record-based labels that does not require domain knowledge or expert curation to improve disease prediction and genomic discovery for drug design.


Subjects
Deep Learning, Chronic Obstructive Pulmonary Disease, Humans, Genome-Wide Association Study/methods, Chronic Obstructive Pulmonary Disease/genetics, Genetic Loci, Single Nucleotide Polymorphism/genetics
6.
Elife ; 12, 2023 04 17.
Article in English | MEDLINE | ID: mdl-36975205

ABSTRACT

Biological age, distinct from an individual's chronological age, has been studied extensively through predictive aging clocks. However, these clocks have limited accuracy in short time-scales. Here we trained deep learning models on fundus images from the EyePACS dataset to predict individuals' chronological age. Our retinal aging clock, 'eyeAge', predicted chronological age more accurately than other aging clocks (mean absolute error of 2.86 and 3.30 years on quality-filtered data from EyePACS and UK Biobank, respectively). Additionally, eyeAge was independent of blood marker-based measures of biological age, maintaining an all-cause mortality hazard ratio of 1.026 even when adjusted for phenotypic age. The individual-specific nature of eyeAge was reinforced via multiple GWAS hits in the UK Biobank cohort. The top GWAS locus was further validated via knockdown of the fly homolog, Alk, which slowed age-related decline in vision in flies. This study demonstrates the potential utility of a retinal aging clock for studying aging and age-related diseases and quantitatively measuring aging on very short time-scales, opening avenues for quick and actionable evaluation of gero-protective therapeutics.


Subjects
Aging, Genome-Wide Association Study, Humans, Preschool Child, Aging/genetics, Retina, Fundus Oculi, Diagnostic Imaging, Genetic Epigenesis
7.
Nat Biotechnol ; 41(2): 232-238, 2023 02.
Article in English | MEDLINE | ID: mdl-36050551

ABSTRACT

Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained for the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.
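
The Q values quoted above are Phred-scaled quality scores, where Q = -10 log10(p) and p is the error probability. A minimal conversion sketch (the function names are hypothetical):

```python
import math


def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score implied by an error probability in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)
```

Under this scale Q20, Q30 and Q40 correspond to one error in 100, 1,000 and 10,000 bases. Likewise, the improvement in assembly base accuracy from Q43 to Q45 corresponds to the per-base error rate dropping from roughly 5.0e-5 to roughly 3.2e-5.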


Subjects
High-Throughput Nucleotide Sequencing, DNA Sequence Analysis
8.
Nat Commun ; 13(1): 241, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35017556

ABSTRACT

Genome-wide association studies (GWASs) examine the association between genotype and phenotype while adjusting for a set of covariates. Although the covariates may have non-linear or interactive effects, due to the challenge of specifying the model, GWAS often neglect such terms. Here we introduce DeepNull, a method that identifies and adjusts for non-linear and interactive covariate effects using a deep neural network. In analyses of simulated and real data, we demonstrate that DeepNull maintains tight control of the type I error while increasing statistical power by up to 20% in the presence of non-linear and interactive effects. Moreover, in the absence of such effects, DeepNull incurs no loss of power. When applied to 10 phenotypes from the UK Biobank (n = 370K), DeepNull discovered more hits (+6%) and loci (+7%), on average, than conventional association analyses, many of which are biologically plausible or have previously been reported. Finally, DeepNull improves upon linear modeling for phenotypic prediction (+23% on average).
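
The idea of adjusting for non-linear covariate effects can be illustrated with a toy simulation. This sketch assumes that the adjustment works by predicting the phenotype from covariates alone and adding that prediction as an extra covariate in the association model. It is not DeepNull itself: a quadratic least-squares fit stands in for the deep neural network, the prediction is made in-sample rather than on held-out folds, and all names and simulated values are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients via the normal equations."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(X, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]


def residual_variance(X, y):
    """Unexplained variance of y after a linear fit on the columns of X."""
    fitted = predict(X, ols(X, y))
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return rss / (len(y) - len(X[0]))


# Simulated cohort: the phenotype depends on genotype and on age squared.
rng = random.Random(0)
n = 500
age = [rng.uniform(-1, 1) for _ in range(n)]
geno = [rng.choice([0, 1, 2]) for _ in range(n)]
pheno = [0.1 * g + 2.0 * a * a + rng.gauss(0, 0.5) for g, a in zip(geno, age)]

# Stand-in for the neural network: a quadratic fit on covariates only.
null_X = [[1.0, a, a * a] for a in age]
null_pred = predict(null_X, ols(null_X, pheno))

# Conventional model (linear in age) versus the model with the extra covariate.
baseline_var = residual_variance([[1.0, g, a] for g, a in zip(geno, age)], pheno)
adjusted_var = residual_variance(
    [[1.0, g, a, p] for g, a, p in zip(geno, age, null_pred)], pheno
)
```

In this simulation the adjusted model leaves less unexplained phenotypic variance than the linear-only model, which is the mechanism by which this kind of adjustment can increase power to detect the genotype effect.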


Subjects
Genome-Wide Association Study/methods, Phenotype, Computer Simulation, Linear Models, Research Design
9.
Commun Biol ; 4(1): 1269, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34741098

ABSTRACT

There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated whole genome sequencing data at intermediate (15×) coverage for 2,294 individuals with large amounts of Sub-Saharan African ancestry, predominantly Atlantic African admixed with varying amounts of European and American ancestry. We performed extensive comparisons of variant callers, phasing algorithms, and variant filtration on these data to construct a high quality imputation panel containing data from 2,269 unrelated individuals. With the exception of the TOPMed imputation server (which notably cannot be downloaded), our panel substantially outperformed other available panels when imputing African American individuals. The raw sequencing data, variant calls and imputation panel for this cohort are all freely available via dbGaP and should prove an invaluable resource for further study of admixed African genetics.


Subjects
Human Genome, Genotype, Adult, Black or African American, Aged, Aged 80 and over, Humans, Middle Aged, United States, Whole Genome Sequencing, Young Adult
10.
Am J Hum Genet ; 108(7): 1217-1230, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34077760

ABSTRACT

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci lie near genes involved in neuronal and synaptic biology or harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.


Subjects
Machine Learning, Optic Disk/anatomy & histology, Datasets as Topic, Fluorescein Angiography, Genome-Wide Association Study, Open-Angle Glaucoma/diagnostic imaging, Humans, Anatomic Models, Optic Disk/diagnostic imaging, Phenotype, Risk Assessment
11.
Ann Neurol ; 90(1): 76-88, 2021 07.
Article in English | MEDLINE | ID: mdl-33938021

ABSTRACT

OBJECTIVE: The aim of this study was to search for genes/variants that modify the effect of LRRK2 mutations in terms of penetrance and age-at-onset of Parkinson's disease. METHODS: We performed the first genomewide association study of penetrance and age-at-onset of Parkinson's disease in LRRK2 mutation carriers (776 cases and 1,103 non-cases at their last evaluation). Cox proportional hazard models and linear mixed models were used to identify modifiers of penetrance and age-at-onset of LRRK2 mutations, respectively. We also investigated whether a polygenic risk score derived from a published genomewide association study of Parkinson's disease was able to explain variability in penetrance and age-at-onset in LRRK2 mutation carriers. RESULTS: A variant located in the intronic region of CORO1C on chromosome 12 (rs77395454; p value = 2.5E-08, beta = 1.27, SE = 0.23, risk allele: C) met genomewide significance for the penetrance model. Co-immunoprecipitation analyses of LRRK2 and CORO1C supported an interaction between these 2 proteins. A region on chromosome 3, within a previously reported linkage peak for Parkinson's disease susceptibility, showed suggestive associations in both models (penetrance top variant: p value = 1.1E-07; age-at-onset top variant: p value = 9.3E-07). A polygenic risk score derived from publicly available Parkinson's disease summary statistics was a significant predictor of penetrance, but not of age-at-onset. INTERPRETATION: This study suggests that variants within or near CORO1C may modify the penetrance of LRRK2 mutations. In addition, common Parkinson's disease associated variants collectively increase the penetrance of LRRK2 mutations. ANN NEUROL 2021;90:82-94.


Subjects
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics, Parkinson Disease/genetics, Aged, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Genotype, Humans, Male, Middle Aged, Mutation, Penetrance
12.
Bioinformatics ; 36(24): 5582-5589, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33399819

ABSTRACT

MOTIVATION: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. RESULTS: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. AVAILABILITY AND IMPLEMENTATION: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
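
As an illustration of the Mendelian-consistency metric mentioned above, the following sketch checks whether a child's genotype at a biallelic autosomal site can be explained by one allele from each parent. Genotypes are encoded as alternate-allele counts (0, 1 or 2). The function names are hypothetical, and the sketch ignores de novo mutations, sex chromosomes and missing calls.

```python
def transmissible_alleles(genotype):
    """Alternate-allele counts (0 or 1) that a parent can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child genotype is one paternal plus one maternal allele."""
    return any(
        f + m == child
        for f in transmissible_alleles(father)
        for m in transmissible_alleles(mother)
    )


def mendelian_violation_rate(trio_genotypes):
    """Fraction of (father, mother, child) genotype triples that are inconsistent."""
    trios = list(trio_genotypes)
    if not trios:
        raise ValueError("no trio genotypes given")
    violations = sum(not is_mendelian_consistent(*t) for t in trios)
    return violations / len(trios)
```

For instance, a heterozygous child of two homozygous-reference parents, `(0, 0, 1)`, counts as a violation, whereas `(0, 2, 1)` is consistent. A lower violation rate across many sites suggests a more accurate callset.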

13.
Nat Biotechnol ; 37(5): 561-566, 2019 05.
Article in English | MEDLINE | ID: mdl-30936564

ABSTRACT

Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.


Subjects
Benchmarking, Computational Biology/trends, Human Genome/genetics, Genomics/trends, Genetic Variation/genetics, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation/genetics, Single Nucleotide Polymorphism, Software/trends
14.
Bioinformatics ; 35(21): 4389-4391, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30916319

ABSTRACT

SUMMARY: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome. AVAILABILITY AND IMPLEMENTATION: GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics, Software, Human Genome, Humans
15.
Nat Biotechnol ; 36(10): 983-987, 2018 11.
Article in English | MEDLINE | ID: mdl-30247488

ABSTRACT

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.


Subjects
Human Genome, Mammals/genetics, Computer Neural Networks, Single Nucleotide Polymorphism, Animals, DNA Mutational Analysis, Genomics, Genotype, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation, DNA Sequence Analysis, Software
16.
Genome Res ; 28(5): 739-750, 2018 05.
Article in English | MEDLINE | ID: mdl-29588361

ABSTRACT

Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.


Subjects
Chromosomes/genetics, Computational Biology/methods, Computer Neural Networks, Nucleic Acid Regulatory Sequences/genetics, Animals, Epigenomics/methods, Gene Expression Profiling/methods, Gene Expression Regulation, Genomics/methods, Humans, Machine Learning, Genetic Models, Single Nucleotide Polymorphism, Genetic Promoter Regions/genetics
17.
Sci Transl Med ; 8(322): 322ra9, 2016 Jan 20.
Article in English | MEDLINE | ID: mdl-26791950

ABSTRACT

More than 100,000 genetic variants are reported to cause Mendelian disease in humans, but the penetrance (the probability that a carrier of the purported disease-causing genotype will indeed develop the disease) is generally unknown. We assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30 times more common in the population than expected on the basis of genetic prion disease prevalence. Although some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1 to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, a finding that supports the safety of therapeutic suppression of prion protein expression.
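
One way to see how case and population frequencies relate to penetrance is Bayes' rule: P(disease | genotype) = P(disease) × P(genotype | disease) / P(genotype). The sketch below applies only that identity. The function name and the numbers in the usage note are hypothetical, and the abstract does not state that this exact estimator was used.

```python
def lifetime_risk(baseline_risk, carrier_freq_cases, carrier_freq_population):
    """Penetrance estimate from Bayes' rule.

    baseline_risk: lifetime disease risk in the general population, P(D)
    carrier_freq_cases: carrier frequency among cases, P(G | D)
    carrier_freq_population: carrier frequency in the population, P(G)
    """
    if carrier_freq_population <= 0:
        raise ValueError("population carrier frequency must be positive")
    risk = baseline_risk * carrier_freq_cases / carrier_freq_population
    # Sampling noise in the inputs can push the estimate above 1.
    return min(risk, 1.0)
```

With a hypothetical baseline risk of 0.0002, a variant carried by 1% of cases and 0.01% of the population would have an estimated lifetime risk of 0.02. A variant that is common in the population relative to cases therefore cannot be highly penetrant.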


Subjects
Penetrance, Prion Diseases/genetics, Case-Control Studies, Cohort Studies, Genetic Predisposition to Disease, Humans, Mutation/genetics, Prions/genetics, Risk Factors
18.
Lancet Neurol ; 14(10): 1002-9, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26271532

ABSTRACT

BACKGROUND: Accurate diagnosis and early detection of complex diseases, such as Parkinson's disease, has the potential to be of great benefit for researchers and clinical practice. We aimed to create a non-invasive, accurate classification model for the diagnosis of Parkinson's disease, which could serve as a basis for future disease prediction studies in longitudinal cohorts. METHODS: We developed a model for disease classification using data from the Parkinson's Progression Marker Initiative (PPMI) study for 367 patients with Parkinson's disease and phenotypically typical imaging data and 165 controls without neurological disease. Olfactory function, genetic risk, family history of Parkinson's disease, age, and gender were algorithmically selected by stepwise logistic regression as significant contributors to our classifying model. We then tested the model with data from 825 patients with Parkinson's disease and 261 controls from five independent cohorts with varying recruitment strategies and designs: the Parkinson's Disease Biomarkers Program (PDBP), the Parkinson's Associated Risk Study (PARS), 23andMe, the Longitudinal and Biomarker Study in PD (LABS-PD), and the Morris K Udall Parkinson's Disease Research Center of Excellence cohort (Penn-Udall). Additionally, we used our model to investigate patients who had imaging scans without evidence of dopaminergic deficit (SWEDD). FINDINGS: In the population from PPMI, our initial model correctly distinguished patients with Parkinson's disease from controls at an area under the curve (AUC) of 0·923 (95% CI 0·900-0·946) with high sensitivity (0·834, 95% CI 0·711-0·883) and specificity (0·903, 95% CI 0·824-0·946) at its optimum AUC threshold (0·655). All Hosmer-Lemeshow simulations suggested that when parsed into random subgroups, the subgroup data matched that of the overall cohort. 
External validation showed good classification of Parkinson's disease, with AUCs of 0·894 (95% CI 0·867-0·921) in the PDBP cohort, 0·998 (0·992-1·000) in PARS, 0·955 (no 95% CI available) in 23andMe, 0·929 (0·896-0·962) in LABS-PD, and 0·939 (0·891-0·986) in the Penn-Udall cohort. Four of 17 SWEDD participants who our model classified as having Parkinson's disease converted to Parkinson's disease within 1 year, whereas only one of 38 SWEDD participants who were not classified as having Parkinson's disease underwent conversion (test of proportions, p=0·003). INTERPRETATION: Our model provides a potential new approach to distinguish participants with Parkinson's disease from controls. If the model can also identify individuals with prodromal or preclinical Parkinson's disease in prospective cohorts, it could facilitate identification of biomarkers and interventions. FUNDING: National Institute on Aging, National Institute of Neurological Disorders and Stroke, and the Michael J Fox Foundation.


Subjects
Statistical Models, Parkinson Disease/diagnosis, Aged, Cohort Studies, Disease Progression, Female, Humans, Male, Middle Aged, Parkinson Disease/genetics, Prodromal Symptoms
19.
Mol Biol Evol ; 31(8): 2212-22, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24784137

ABSTRACT

Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases, but IBD detection accuracy in nonsimulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. data set. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false-positive rate over 67% for 2-4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. Nearly all false positives arose from the allowance of haplotype switch errors when detecting IBD, a necessity for retrieving long (>6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that scores IBD segments proportional to the number of switch errors they contain. Applying HaploScore filtering to the IBD data at a precision of 0.8 produced a 13-fold increase in recall when compared with length-based filtering. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative data sources using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.
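
The abstract does not spell out how familial relationships were used to flag false positives. One plausible check, sketched here purely as an assumption, is that a segment a child shares IBD with an unrelated individual should also be shared between that individual and at least one of the child's parents. The function names, coordinate units and the `min_fraction` threshold are all hypothetical.

```python
def overlap_fraction(segment, others):
    """Fraction of `segment` (start, end) covered by the union of `others`."""
    start, end = segment
    if end <= start:
        raise ValueError("segment must have positive length")
    covered = 0.0
    cursor = start
    for s, e in sorted(others):
        # Clip each interval to the segment and to what is not yet counted.
        s, e = max(s, cursor), min(e, end)
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


def is_supported_by_parent(child_segment, father_segments, mother_segments,
                           min_fraction=0.8):
    """True if either parent's IBD segments with the same individual cover
    at least `min_fraction` of the child's segment."""
    best = max(
        overlap_fraction(child_segment, father_segments),
        overlap_fraction(child_segment, mother_segments),
    )
    return best >= min_fraction
```

Under this assumption, a child segment at `(10, 20)` is supported when the father shares `(9, 21)` with the same individual, and is flagged as a likely false positive when the parents only share `(10, 12)` and `(18, 20)`.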


Subjects
Computational Biology/methods, White People/genetics, Computer Simulation, Population Genetics, Human Genome, Haplotypes, Humans, Pedigree
20.
Science ; 343(6167): 189-193, 2014 Jan 10.
Article in English | MEDLINE | ID: mdl-24336570

ABSTRACT

Tumor recurrence is a leading cause of cancer mortality. Therapies for recurrent disease may fail, at least in part, because the genomic alterations driving the growth of recurrences are distinct from those in the initial tumor. To explore this hypothesis, we sequenced the exomes of 23 initial low-grade gliomas and recurrent tumors resected from the same patients. In 43% of cases, at least half of the mutations in the initial tumor were undetected at recurrence, including driver mutations in TP53, ATRX, SMARCA4, and BRAF; this suggests that recurrent tumors are often seeded by cells derived from the initial tumor at a very early stage of their evolution. Notably, tumors from 6 of 10 patients treated with the chemotherapeutic drug temozolomide (TMZ) followed an alternative evolutionary path to high-grade glioma. At recurrence, these tumors were hypermutated and harbored driver mutations in the RB (retinoblastoma) and Akt-mTOR (mammalian target of rapamycin) pathways that bore the signature of TMZ-induced mutagenesis.


Subjects
Alkylating Antineoplastic Agents/adverse effects, Brain Neoplasms/drug therapy, Brain Neoplasms/pathology, Dacarbazine/analogs & derivatives, Glioma/drug therapy, Glioma/pathology, Local Neoplasm Recurrence/chemically induced, Local Neoplasm Recurrence/genetics, Alkylating Antineoplastic Agents/therapeutic use, Brain/drug effects, Brain/pathology, Brain Neoplasms/genetics, DNA Helicases/genetics, DNA Mutational Analysis, Dacarbazine/adverse effects, Dacarbazine/therapeutic use, Glioma/genetics, Humans, Mutagenesis/drug effects, Neoplasm Grading, Local Neoplasm Recurrence/drug therapy, Nuclear Proteins/genetics, Proto-Oncogene Proteins B-raf/genetics, Proto-Oncogene Proteins c-akt/genetics, TOR Serine-Threonine Kinases/genetics, Temozolomide, Transcription Factors/genetics, Tumor Suppressor Protein p53/genetics, X-linked Nuclear Protein