1.
Am J Hum Genet ; 108(7): 1217-1230, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34077760

ABSTRACT

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci lie near genes involved in neuronal and synaptic biology or harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.


Subject(s)
Machine Learning , Optic Disk/anatomy & histology , Datasets as Topic , Fluorescein Angiography , Genome-Wide Association Study , Glaucoma, Open-Angle/diagnostic imaging , Humans , Models, Anatomic , Optic Disk/diagnostic imaging , Phenotype , Risk Assessment
2.
Ann Neurol ; 90(1): 76-88, 2021 07.
Article in English | MEDLINE | ID: mdl-33938021

ABSTRACT

OBJECTIVE: The aim of this study was to search for genes/variants that modify the effect of LRRK2 mutations in terms of penetrance and age-at-onset of Parkinson's disease. METHODS: We performed the first genomewide association study of penetrance and age-at-onset of Parkinson's disease in LRRK2 mutation carriers (776 cases and 1,103 non-cases at their last evaluation). Cox proportional hazard models and linear mixed models were used to identify modifiers of penetrance and age-at-onset of LRRK2 mutations, respectively. We also investigated whether a polygenic risk score derived from a published genomewide association study of Parkinson's disease was able to explain variability in penetrance and age-at-onset in LRRK2 mutation carriers. RESULTS: A variant located in the intronic region of CORO1C on chromosome 12 (rs77395454; p value = 2.5E-08, beta = 1.27, SE = 0.23, risk allele: C) met genomewide significance for the penetrance model. Co-immunoprecipitation analyses of LRRK2 and CORO1C supported an interaction between these 2 proteins. A region on chromosome 3, within a previously reported linkage peak for Parkinson's disease susceptibility, showed suggestive associations in both models (penetrance top variant: p value = 1.1E-07; age-at-onset top variant: p value = 9.3E-07). A polygenic risk score derived from publicly available Parkinson's disease summary statistics was a significant predictor of penetrance, but not of age-at-onset. INTERPRETATION: This study suggests that variants within or near CORO1C may modify the penetrance of LRRK2 mutations. In addition, common Parkinson's disease associated variants collectively increase the penetrance of LRRK2 mutations. ANN NEUROL 2021;90:82-94.


Subject(s)
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Parkinson Disease/genetics , Aged , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Male , Middle Aged , Mutation , Penetrance
3.
Nature ; 498(7453): 241-5, 2013 Jun 13.
Article in English | MEDLINE | ID: mdl-23739326

ABSTRACT

Previous investigations of the core gene regulatory circuitry that controls the pluripotency of embryonic stem (ES) cells have largely focused on the roles of transcription, chromatin and non-coding RNA regulators. Alternative splicing represents a widely acting mode of gene regulation, yet its role in regulating ES-cell pluripotency and differentiation is poorly understood. Here we identify the muscleblind-like RNA binding proteins, MBNL1 and MBNL2, as conserved and direct negative regulators of a large program of cassette exon alternative splicing events that are differentially regulated between ES cells and other cell types. Knockdown of MBNL proteins in differentiated cells causes switching to an ES-cell-like alternative splicing pattern for approximately half of these events, whereas overexpression of MBNL proteins in ES cells promotes differentiated-cell-like alternative splicing patterns. Among the MBNL-regulated events is an ES-cell-specific alternative splicing switch in the forkhead family transcription factor FOXP1 that controls pluripotency. Consistent with a central and negative regulatory role for MBNL proteins in pluripotency, their knockdown significantly enhances the expression of key pluripotency genes and the formation of induced pluripotent stem cells during somatic cell reprogramming.


Subject(s)
Alternative Splicing , Cellular Reprogramming , DNA-Binding Proteins/metabolism , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , RNA-Binding Proteins/metabolism , Alternative Splicing/genetics , Amino Acid Motifs , Animals , Cell Differentiation/genetics , Cell Line , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/deficiency , DNA-Binding Proteins/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Forkhead Transcription Factors/metabolism , Gene Knockdown Techniques , HEK293 Cells , HeLa Cells , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Kinetics , Mice , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , Repressor Proteins/metabolism
4.
Genome Res ; 24(11): 1774-86, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25258385

ABSTRACT

Alternative splicing (AS) of precursor RNAs is responsible for greatly expanding the regulatory and functional capacity of eukaryotic genomes. Of the different classes of AS, intron retention (IR) is the least well understood. In plants and unicellular eukaryotes, IR is the most common form of AS, whereas in animals, it is thought to represent the least prevalent form. Using high-coverage poly(A)(+) RNA-seq data, we observe that IR is surprisingly frequent in mammals, affecting transcripts from as many as three-quarters of multiexonic genes. A highly correlated set of cis features comprising an "IR code" reliably discriminates retained from constitutively spliced introns. We show that IR acts widely to reduce the levels of transcripts that are less or not required for the physiology of the cell or tissue type in which they are detected. This "transcriptome tuning" function of IR acts through both nonsense-mediated mRNA decay and nuclear sequestration and turnover of IR transcripts. We further show that IR is linked to a cross-talk mechanism involving localized stalling of RNA polymerase II (Pol II) and reduced availability of spliceosomal components. Collectively, the results implicate a global checkpoint-type mechanism whereby reduced recruitment of splicing components coupled to Pol II pausing underlies widespread IR-mediated suppression of inappropriately expressed transcripts.


Subject(s)
Alternative Splicing , Introns/genetics , Mammals/genetics , Transcriptome/genetics , 3T3 Cells , Animals , Cell Differentiation/genetics , Cell Line , Cell Line, Tumor , Cells, Cultured , Evolution, Molecular , HeLa Cells , Humans , K562 Cells , Mammals/classification , Mice , Models, Genetic , Organ Specificity , Principal Component Analysis , RNA Polymerase II/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Species Specificity , Vertebrates/classification , Vertebrates/genetics
5.
BMC Genomics ; 17(1): 787, 2016 10 07.
Article in English | MEDLINE | ID: mdl-27717327

ABSTRACT

BACKGROUND: Alternative mRNA splicing is critical to proteomic diversity and tissue and species differentiation. Exclusion of cassette exons, also called exon skipping, is the most common type of alternative splicing in mammals. RESULTS: We present a computational model that predicts absolute (though not tissue-differential) percent-spliced-in of cassette exons more accurately than previous models, despite not using any 'hand-crafted' biological features such as motif counts. We achieve nearly identical performance using only the conservation score (mammalian phastCons) of each splice junction normalized by average conservation over 100 bp of the corresponding flanking intron, demonstrating that conservation is an unexpectedly powerful indicator of alternative splicing patterns. Using this method, we provide evidence that intronic splicing regulation occurs predominantly within 100 bp of the alternative splice sites and that conserved elements in this region are, as expected, functioning as splicing regulators. We show that among conserved cassette exons, increased conservation of flanking introns is associated with reduced inclusion. We also propose a new definition of intronic splicing regulatory elements (ISREs) that is independent of conservation, and show that most ISREs do not match known binding sites or splicing factors despite being predictive of percent-spliced-in. CONCLUSIONS: These findings suggest that one mechanism for the evolutionary transition from constitutive to alternative splicing is the emergence of cis-acting splicing inhibitors. The association of our ISREs with differences in splicing suggests the existence of novel RNA-binding proteins and/or novel splicing roles for known RNA-binding proteins.


Subject(s)
Alternative Splicing , Evolution, Molecular , Models, Biological , Animals , Area Under Curve , Brain/metabolism , Exons , Gene Expression Regulation , Humans , Introns , Organ Specificity/genetics , RNA Splice Sites , Regulatory Sequences, Nucleic Acid
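
The normalized conservation feature described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the input layout (a list of per-base phastCons scores), and the choice of a simple ratio as the normalization are all assumptions; the abstract only states that the junction score is normalized by the average conservation over 100 bp of the flanking intron.

```python
from typing import Sequence


def normalized_junction_conservation(
    junction_score: float,
    flanking_intron_scores: Sequence[float],
    window: int = 100,
) -> float:
    """Divide a splice-junction conservation score by the mean conservation
    of the adjacent intronic window (hypothetical helper; ratio is assumed)."""
    # Use at most `window` bases of the flanking intron, nearest the junction first.
    flank = list(flanking_intron_scores[:window])
    if not flank:
        raise ValueError("flanking intron window is empty")
    mean_flank = sum(flank) / len(flank)
    if mean_flank == 0:
        raise ValueError("mean flanking conservation is zero; ratio undefined")
    return junction_score / mean_flank


# Illustrative values only (not data from the study).
example = normalized_junction_conservation(0.9, [0.3] * 100)
```

A value above 1 would indicate a junction more conserved than its intronic surroundings under this assumed definition.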
6.
Nat Genet ; 55(5): 787-795, 2023 05.
Article in English | MEDLINE | ID: mdl-37069358

ABSTRACT

Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms and use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls, and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and also identifies 67 new loci. Lastly, our method provides a general framework to use ML methods and medical-record-based labels that does not require domain knowledge or expert curation to improve disease prediction and genomic discovery for drug design.


Subject(s)
Deep Learning , Pulmonary Disease, Chronic Obstructive , Humans , Genome-Wide Association Study/methods , Pulmonary Disease, Chronic Obstructive/genetics , Genetic Loci , Polymorphism, Single Nucleotide/genetics
7.
Nat Commun ; 13(1): 241, 2022 01 11.
Article in English | MEDLINE | ID: mdl-35017556

ABSTRACT

Genome-wide association studies (GWASs) examine the association between genotype and phenotype while adjusting for a set of covariates. Although the covariates may have non-linear or interactive effects, due to the challenge of specifying the model, GWAS often neglect such terms. Here we introduce DeepNull, a method that identifies and adjusts for non-linear and interactive covariate effects using a deep neural network. In analyses of simulated and real data, we demonstrate that DeepNull maintains tight control of the type I error while increasing statistical power by up to 20% in the presence of non-linear and interactive effects. Moreover, in the absence of such effects, DeepNull incurs no loss of power. When applied to 10 phenotypes from the UK Biobank (n = 370K), DeepNull discovered more hits (+6%) and loci (+7%), on average, than conventional association analyses, many of which are biologically plausible or have previously been reported. Finally, DeepNull improves upon linear modeling for phenotypic prediction (+23% on average).


Subject(s)
Genome-Wide Association Study/methods , Phenotype , Computer Simulation , Linear Models , Research Design
8.
Nat Commun ; 12(1): 160, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33420020

ABSTRACT

We trained and validated risk prediction models for the three major types of skin cancer, basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and melanoma, on a cross-sectional and longitudinal dataset of 210,000 consented research participants who responded to an online survey covering personal and family history of skin cancer, skin susceptibility, and UV exposure. We developed a primary disease risk score (DRS) that combined all 32 identified genetic and non-genetic risk factors. Top percentile DRS was associated with an up to 13-fold increase (odds ratio per standard deviation increase >2.5) in the risk of developing skin cancer relative to the middle DRS percentile. To derive lifetime risk trajectories for the three skin cancers, we developed a second, age-independent disease score, called DRSA. Using incident cases, we demonstrated that DRSA could be used in early detection programs for identifying high-risk asymptomatic individuals and predicting when they are likely to develop skin cancer. High DRSA scores were associated not only with earlier disease diagnosis (by up to 14 years), but also with more severe and recurrent forms of skin cancer.


Subject(s)
Carcinoma, Basal Cell/epidemiology , Carcinoma, Squamous Cell/epidemiology , Melanoma/epidemiology , Models, Statistical , Neoplasm Recurrence, Local/epidemiology , Skin Neoplasms/epidemiology , Adult , Aged , Aged, 80 and over , Carcinoma, Basal Cell/etiology , Carcinoma, Basal Cell/pathology , Carcinoma, Squamous Cell/etiology , Cross-Sectional Studies , Datasets as Topic , Direct-To-Consumer Screening and Testing/statistics & numerical data , Female , Follow-Up Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Incidence , Longitudinal Studies , Male , Medical History Taking , Melanoma/etiology , Melanoma/pathology , Middle Aged , Neoplasm Recurrence, Local/etiology , Neoplasm Recurrence, Local/pathology , Odds Ratio , Prospective Studies , Risk Assessment/methods , Risk Factors , Skin/pathology , Skin/radiation effects , Skin Neoplasms/etiology , Skin Neoplasms/pathology , Surveys and Questionnaires/statistics & numerical data , Ultraviolet Rays/adverse effects , White People/genetics
9.
Bioinformatics ; 25(12): i268-75, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19477998

ABSTRACT

MOTIVATION: Picking peaks from experimental NMR spectra is a key unsolved problem for automated NMR protein structure determination. Such a process is a prerequisite for resonance assignment, nuclear Overhauser enhancement (NOE) distance restraint assignment, and structure calculation tasks. Manual or semi-automatic peak picking, which is currently the predominant approach in NMR labs, is tedious, time consuming and costly. RESULTS: We introduce new ideas, including noise-level estimation, component forming and sub-division, singular value decomposition (SVD)-based peak picking and peak pruning and refinement. PICKY is developed as an automated peak picking method. Unlike previous research on peak picking, we provide a systematic study of the proposed method. PICKY is tested on 32 real 2D and 3D spectra of eight target proteins, and achieves an average of 88% recall and 74% precision. PICKY is efficient: it takes on average 15.7 s to process an NMR spectrum. More important than these numbers, PICKY actually works in practice. We feed peak lists generated by PICKY to IPASS for resonance assignment, feed the IPASS assignment to SPARTA for fragment generation, and feed SPARTA fragments to FALCON for structure calculation. This results in high-resolution structures of several proteins, for example, TM1112, at 1.25 Å. AVAILABILITY: PICKY is available upon request. The peak lists of PICKY can be easily loaded by SPARKY to enable a better interactive strategy for rapid peak picking.


Subject(s)
Nuclear Magnetic Resonance, Biomolecular/methods , Proteins/chemistry , Software , Algorithms , Pattern Recognition, Automated
10.
Nat Med ; 26(6): 869-877, 2020 06.
Article in English | MEDLINE | ID: mdl-32461697

ABSTRACT

Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns [5-8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9], 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work [10], confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.


Subject(s)
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics , Loss of Function Mutation/genetics , Adult , Aged , Aged, 80 and over , Biological Specimen Banks , Cell Line , Embryonic Stem Cells/metabolism , Female , Gain of Function Mutation/genetics , Heterozygote , Humans , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism , Longevity/genetics , Lymphocytes/metabolism , Male , Middle Aged , Myocytes, Cardiac/metabolism , Parkinson Disease/drug therapy , Parkinson Disease/genetics , Phenotype
11.
NPJ Parkinsons Dis ; 5: 4, 2019.
Article in English | MEDLINE | ID: mdl-30937360

ABSTRACT

In order to systematically describe the Parkinson's disease phenome, we performed a series of 832 cross-sectional case-control analyses in a large database. Responses to 832 online survey-based phenotypes including diseases, medications, and environmental exposures were analyzed in 23andMe research participants. For each phenotype, survey respondents were used to construct a cohort of Parkinson's disease cases and age-matched and sex-matched controls, and an association test was performed using logistic regression. Cohorts included a median of 3899 Parkinson's disease cases and 49,808 controls, all of European ancestry. Highly correlated phenotypes were removed and the novelty of each significant association was systematically assessed (assigned to one of four categories: known, likely, unclear, or novel). Parkinson's disease diagnosis was associated with 122 phenotypes. We replicated 27 known associations and found 23 associations with a strong a priori link to a known association. We discovered 42 associations that have not previously been reported. Migraine, obsessive-compulsive disorder, and seasonal allergies were associated with Parkinson's disease and tend to occur decades before the typical age of diagnosis for Parkinson's disease. The phenotypes that currently comprise the Parkinson's disease phenome have mostly been explored in relatively small purpose-built studies. Using a single large dataset, we have successfully reproduced many of these established associations and have extended the Parkinson's disease phenome by discovering novel associations. Our work paves the way for studies of these associated phenotypes that explore shared molecular mechanisms with Parkinson's disease, infer causal relationships, and improve our ability to identify individuals at high-risk of Parkinson's disease.

12.
Nat Commun ; 10(1): 690, 2019 02 11.
Article in English | MEDLINE | ID: mdl-30741935

ABSTRACT

The correspondence between cerebral glucose metabolism (indexing energy utilization) and synchronous fluctuations in blood oxygenation (indexing neuronal activity) is relevant for neuronal specialization and is affected by brain disorders. Here, we define novel measures of relative power (rPWR, extent of concurrent energy utilization and activity) and relative cost (rCST, extent that energy utilization exceeds activity), derived from FDG-PET and fMRI. We show that resting-state networks have distinct energetic signatures and that the brain can be classified into major bilateral segments based on rPWR and rCST. While medial-visual and default-mode networks have the highest rPWR, frontoparietal networks have the highest rCST. rPWR and rCST estimates are generalizable to other indexes of energy supply and neuronal activity, and are sensitive to neurocognitive effects of acute and chronic alcohol exposure. rPWR and rCST are informative metrics for characterizing brain pathology and alternative energy use, and may provide new multimodal biomarkers of neuropsychiatric disorders.


Subject(s)
Brain Chemistry/physiology , Brain Mapping , Brain/physiology , Glucose/metabolism , Adult , Biomarkers/metabolism , Brain/pathology , Female , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging/methods , Male , Middle Aged , Multimodal Imaging , Nerve Net/physiology , Neurons/metabolism , Positron-Emission Tomography , Young Adult
14.
NPJ Genom Med ; 1: 160271-1602710, 2016 Aug 03.
Article in English | MEDLINE | ID: mdl-27525107

ABSTRACT

De novo mutations (DNMs) are important in Autism Spectrum Disorder (ASD), but so far analyses have mainly been on the ~1.5% of the genome encoding genes. Here, we performed whole genome sequencing (WGS) of 200 ASD parent-child trios and characterized germline and somatic DNMs. We confirmed that the majority of germline DNMs (75.6%) originated from the father, and these increased significantly with paternal age only (p = 4.2 × 10⁻¹⁰). However, when clustered DNMs (those within 20 kb) were found in ASD, not only did they mostly originate from the mother (p = 7.7 × 10⁻¹³), but they could also be found adjacent to de novo copy number variations (CNVs) where the mutation rate was significantly elevated (p = 2.4 × 10⁻²⁴). By comparing DNMs detected in controls, we found a significant enrichment of predicted damaging DNMs in ASD cases (p = 8.0 × 10⁻⁹; OR = 1.84), of which 15.6% (p = 4.3 × 10⁻³) and 22.5% (p = 7.0 × 10⁻⁵) were in the non-coding or genic non-coding regions, respectively. The non-coding elements most enriched for DNMs were untranslated regions of genes, boundaries involved in exon-skipping and DNase I hypersensitive regions. Using microarrays and a novel outlier detection test, we also found aberrant methylation profiles in 2/185 (1.1%) of ASD cases. These same individuals carried independently identified DNMs in the ASD risk and epigenetic genes DNMT3A and ADNP. Our data begin to characterize different genome-wide DNMs and highlight the contribution of non-coding variants to the etiology of ASD.

15.
NPJ Genom Med ; 1, 2016 Jan 13.
Article in English | MEDLINE | ID: mdl-28567303

ABSTRACT

The standard of care for first-tier clinical investigation of the etiology of congenital malformations and neurodevelopmental disorders is chromosome microarray analysis (CMA) for copy number variations (CNVs), often followed by gene(s)-specific sequencing searching for smaller insertion-deletions (indels) and single nucleotide variant (SNV) mutations. Whole genome sequencing (WGS) has the potential to capture all classes of genetic variation in one experiment; however, the diagnostic yield for mutation detection of WGS compared to CMA, and other tests, needs to be established. In a prospective study we utilized WGS and comprehensive medical annotation to assess 100 patients referred to a paediatric genetics service and compared the diagnostic yield versus standard genetic testing. WGS identified genetic variants meeting clinical diagnostic criteria in 34% of cases, representing a 4-fold increase in diagnostic rate over CMA (8%) (p-value = 1.42e-05) alone and >2-fold increase in CMA plus targeted gene sequencing (13%) (p-value = 0.0009). WGS identified all rare clinically significant CNVs that were detected by CMA. In 26 patients, WGS revealed indel and missense mutations presenting in a dominant (63%) or a recessive (37%) manner. We found four subjects with mutations in at least two genes associated with distinct genetic disorders, including two cases harboring a pathogenic CNV and SNV. When considering medically actionable secondary findings in addition to primary WGS findings, 38% of patients would benefit from genetic counseling. Clinical implementation of WGS as a primary test will provide a higher diagnostic yield than conventional genetic testing and potentially reduce the time required to reach a genetic diagnosis.
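
The fold increases quoted in the abstract above follow directly from the reported diagnostic rates. The short check below is only a worked-arithmetic sketch; the helper name `fold_increase` is hypothetical, and the rates (34%, 8%, 13%) are taken from the abstract.

```python
def fold_increase(new_rate: float, baseline_rate: float) -> float:
    """Ratio of two diagnostic rates (hypothetical helper for illustration)."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return new_rate / baseline_rate


# Diagnostic rates as reported in the abstract.
wgs_rate = 0.34            # whole genome sequencing
cma_rate = 0.08            # chromosome microarray analysis alone
cma_targeted_rate = 0.13   # CMA plus targeted gene sequencing

wgs_vs_cma = fold_increase(wgs_rate, cma_rate)                    # about 4.25
wgs_vs_cma_targeted = fold_increase(wgs_rate, cma_targeted_rate)  # about 2.6
```

Both ratios are consistent with the "4-fold" and ">2-fold" figures stated in the text.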

16.
Nat Biotechnol ; 33(8): 831-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26213851

ABSTRACT

Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.


Subject(s)
Computational Biology/methods , DNA-Binding Proteins/chemistry , RNA-Binding Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Position-Specific Scoring Matrices
17.
G3 (Bethesda) ; 5(11): 2453-61, 2015 Sep 16.
Article in English | MEDLINE | ID: mdl-26384369

ABSTRACT

Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes.


Subject(s)
DiGeorge Syndrome/complications , Genome, Human , Schizophrenia/genetics , Adolescent , Adult , Case-Control Studies , DiGeorge Syndrome/genetics , Female , Humans , Male , Middle Aged , RNA, Long Noncoding/genetics , RNA-Binding Proteins/genetics , Schizophrenia/epidemiology
18.
Science ; 347(6218): 1254806, 2015 Jan 09.
Article in English | MEDLINE | ID: mdl-25525159

ABSTRACT

To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.


Subject(s)
Artificial Intelligence , Child Development Disorders, Pervasive/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Genome-Wide Association Study/methods , Molecular Sequence Annotation/methods , Muscular Atrophy, Spinal/genetics , RNA Splicing/genetics , Adaptor Proteins, Signal Transducing/genetics , Computer Simulation , DNA/genetics , Exons/genetics , Genetic Code , Genetic Markers , Genetic Variation , Humans , Introns/genetics , Models, Genetic , MutL Protein Homolog 1 , Mutation, Missense , Nuclear Proteins/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci , RNA Splice Sites/genetics , RNA-Binding Proteins/genetics
19.
Nat Genet ; 46(7): 742-7, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24859339

ABSTRACT

A universal challenge in genetic studies of autism spectrum disorders (ASDs) is determining whether a given DNA sequence alteration will manifest as disease. Among different population controls, we observed, for specific exons, an inverse correlation between exon expression level in brain and burden of rare missense mutations. For genes that harbor de novo mutations predicted to be deleterious, we found that specific critical exons were significantly enriched in individuals with ASD relative to their siblings without ASD (P < 1.13 × 10⁻³⁸; odds ratio (OR) = 2.40). Furthermore, our analysis of genes with high exonic expression in brain and low burden of rare mutations demonstrated enrichment for known ASD-associated genes (P < 3.40 × 10⁻¹¹; OR = 6.08) and ASD-relevant fragile-X protein targets (P < 2.91 × 10⁻¹⁵⁷; OR = 9.52). Our results suggest that brain-expressed exons under purifying selection should be prioritized in genotype-phenotype studies for ASD and related neurodevelopmental conditions.


Subject(s)
Brain/metabolism , Child Development Disorders, Pervasive/genetics , Exons/genetics , Mutation, Missense/genetics , Adolescent , Adult , Brain/pathology , Case-Control Studies , Child, Preschool , Female , Gene Regulatory Networks , Genetic Predisposition to Disease , Humans , Infant , Male , Phenotype , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction
20.
Algorithms Mol Biol ; 8(1): 5, 2013 Feb 25.
Article in English | MEDLINE | ID: mdl-23442792

ABSTRACT

Previous studies show that the same type of bond lengths and angles fit Gaussian distributions well with small standard deviations on high resolution protein structure data. The mean values of these Gaussian distributions have been widely used as ideal bond lengths and angles in bioinformatics. However, we are not aware of any research done to evaluate how accurately we can model protein structures with dihedral angles and ideal bond lengths and angles. Here, we introduce the protein structure idealization problem. We focus on the protein backbone structure idealization. We describe a fast O(nm/ε) dynamic programming algorithm to find an idealized protein backbone structure that is approximately optimal according to our scoring function. The scoring function evaluates not only the free energy, but also the similarity with the target structure. Thus, the idealized protein structures found by our algorithm are guaranteed to be protein-like and close to the target protein structure. We have implemented our protein structure idealization algorithm and idealized the high resolution protein structures with low sequence identities of the CULLPDB_PC30_RES1.6_R0.25 data set. We demonstrate that idealized backbone structures always exist with small changes and significantly better free energy. We also applied our algorithm to refine protein pseudo-structures determined in NMR experiments.
