ABSTRACT
Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms and use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls, and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and also identifies 67 new loci. Lastly, our method provides a general framework to use ML methods and medical-record-based labels that does not require domain knowledge or expert curation to improve disease prediction and genomic discovery for drug design.
Subjects
Deep Learning, Chronic Obstructive Pulmonary Disease, Humans, Genome-Wide Association Study/methods, Chronic Obstructive Pulmonary Disease/genetics, Genetic Loci, Single Nucleotide Polymorphism/genetics
ABSTRACT
Genome-wide association studies (GWASs) examine the association between genotype and phenotype while adjusting for a set of covariates. Although the covariates may have non-linear or interactive effects, due to the challenge of specifying the model, GWAS often neglect such terms. Here we introduce DeepNull, a method that identifies and adjusts for non-linear and interactive covariate effects using a deep neural network. In analyses of simulated and real data, we demonstrate that DeepNull maintains tight control of the type I error while increasing statistical power by up to 20% in the presence of non-linear and interactive effects. Moreover, in the absence of such effects, DeepNull incurs no loss of power. When applied to 10 phenotypes from the UK Biobank (n = 370K), DeepNull discovered more hits (+6%) and loci (+7%), on average, than conventional association analyses, many of which are biologically plausible or have previously been reported. Finally, DeepNull improves upon linear modeling for phenotypic prediction (+23% on average).
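The adjustment idea described in this abstract can be sketched roughly as follows. This is not the DeepNull implementation: a cubic polynomial fit stands in for the deep neural network, the flexible model's prediction is added as one extra covariate in an ordinary least squares association test (one plausible way to adjust), and all data, effect sizes, and names (`assoc_fit`, `nonlinear_pred`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data; every value here is an illustrative assumption.
age = rng.uniform(-1.0, 1.0, size=n)                     # standardized covariate
genotype = rng.binomial(2, 0.3, size=n).astype(float)    # allele counts 0/1/2
# Phenotype with a small genetic effect and a non-linear (quadratic) age effect.
phenotype = 0.1 * genotype + 2.0 * age**2 + rng.normal(0.0, 1.0, size=n)


def assoc_fit(y, g, covariates):
    """Return (effect estimate, standard error) for g in an OLS fit of y on [1, g, covariates]."""
    X = np.column_stack([np.ones_like(g), g] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov_beta[1, 1])


# Stand-in for the neural network: a cubic polynomial of the covariate.
coeffs = np.polyfit(age, phenotype, deg=3)
nonlinear_pred = np.polyval(coeffs, age)

# Conventional linear adjustment versus adjustment that also includes the prediction.
beta_linear, se_linear = assoc_fit(phenotype, genotype, [age])
beta_adjusted, se_adjusted = assoc_fit(phenotype, genotype, [age, nonlinear_pred])
```

In this toy setting the extra covariate absorbs the quadratic age effect, so the standard error of the genotype effect shrinks. A real analysis would generate the prediction out of sample (for example by cross-validation) so that the flexible model does not overfit the phenotype.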
Subjects
Genome-Wide Association Study/methods, Phenotype, Computer Simulation, Linear Models, Research Design
ABSTRACT
Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci lie near genes involved in neuronal and synaptic biology or harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.
Subjects
Machine Learning, Optic Disk/anatomy & histology, Datasets as Topic, Fluorescein Angiography, Genome-Wide Association Study, Open-Angle Glaucoma/diagnostic imaging, Humans, Anatomic Models, Optic Disk/diagnostic imaging, Phenotype, Risk Assessment
ABSTRACT
OBJECTIVE: The aim of this study was to search for genes/variants that modify the effect of LRRK2 mutations in terms of penetrance and age-at-onset of Parkinson's disease. METHODS: We performed the first genomewide association study of penetrance and age-at-onset of Parkinson's disease in LRRK2 mutation carriers (776 cases and 1,103 non-cases at their last evaluation). Cox proportional hazard models and linear mixed models were used to identify modifiers of penetrance and age-at-onset of LRRK2 mutations, respectively. We also investigated whether a polygenic risk score derived from a published genomewide association study of Parkinson's disease was able to explain variability in penetrance and age-at-onset in LRRK2 mutation carriers. RESULTS: A variant located in the intronic region of CORO1C on chromosome 12 (rs77395454; p value = 2.5E-08, beta = 1.27, SE = 0.23, risk allele: C) met genomewide significance for the penetrance model. Co-immunoprecipitation analyses of LRRK2 and CORO1C supported an interaction between these 2 proteins. A region on chromosome 3, within a previously reported linkage peak for Parkinson's disease susceptibility, showed suggestive associations in both models (penetrance top variant: p value = 1.1E-07; age-at-onset top variant: p value = 9.3E-07). A polygenic risk score derived from publicly available Parkinson's disease summary statistics was a significant predictor of penetrance, but not of age-at-onset. INTERPRETATION: This study suggests that variants within or near CORO1C may modify the penetrance of LRRK2 mutations. In addition, common Parkinson's disease associated variants collectively increase the penetrance of LRRK2 mutations. ANN NEUROL 2021;90:82-94.
Subjects
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics, Parkinson Disease/genetics, Aged, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Genotype, Humans, Male, Middle Aged, Mutation, Penetrance
ABSTRACT
We trained and validated risk prediction models for the three major types of skin cancer, namely basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and melanoma, on a cross-sectional and longitudinal dataset of 210,000 consented research participants who responded to an online survey covering personal and family history of skin cancer, skin susceptibility, and UV exposure. We developed a primary disease risk score (DRS) that combined all 32 identified genetic and non-genetic risk factors. Top percentile DRS was associated with an up to 13-fold increase (odds ratio per standard deviation increase >2.5) in the risk of developing skin cancer relative to the middle DRS percentile. To derive lifetime risk trajectories for the three skin cancers, we developed a second, age-independent disease score, called DRSA. Using incident cases, we demonstrated that DRSA could be used in early detection programs for identifying high-risk asymptomatic individuals and predicting when they are likely to develop skin cancer. High DRSA scores were associated not only with earlier disease diagnosis (by up to 14 years), but also with more severe and recurrent forms of skin cancer.
Subjects
Basal Cell Carcinoma/epidemiology, Squamous Cell Carcinoma/epidemiology, Melanoma/epidemiology, Statistical Models, Local Neoplasm Recurrence/epidemiology, Skin Neoplasms/epidemiology, Adult, Aged, Aged 80 and over, Basal Cell Carcinoma/etiology, Basal Cell Carcinoma/pathology, Squamous Cell Carcinoma/etiology, Cross-Sectional Studies, Datasets as Topic, Direct-to-Consumer Screening and Testing/statistics & numerical data, Female, Follow-Up Studies, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Incidence, Longitudinal Studies, Male, Medical History Taking, Melanoma/etiology, Melanoma/pathology, Middle Aged, Local Neoplasm Recurrence/etiology, Local Neoplasm Recurrence/pathology, Odds Ratio, Prospective Studies, Risk Assessment/methods, Risk Factors, Skin/pathology, Skin/radiation effects, Skin Neoplasms/etiology, Skin Neoplasms/pathology, Surveys and Questionnaires/statistics & numerical data, Ultraviolet Rays/adverse effects, White People/genetics
ABSTRACT
Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns [5-8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9], 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work [10], confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.
Subjects
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/genetics, Loss of Function Mutation/genetics, Adult, Aged, Aged 80 and over, Biological Specimen Banks, Cell Line, Embryonic Stem Cells/metabolism, Female, Gain of Function Mutation/genetics, Heterozygote, Humans, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors, Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism, Longevity/genetics, Lymphocytes/metabolism, Male, Middle Aged, Cardiac Myocytes/metabolism, Parkinson Disease/drug therapy, Parkinson Disease/genetics, Phenotype
ABSTRACT
To systematically describe the Parkinson's disease phenome, we performed a series of 832 cross-sectional case-control analyses in a large database. Responses to 832 online survey-based phenotypes including diseases, medications, and environmental exposures were analyzed in 23andMe research participants. For each phenotype, survey respondents were used to construct a cohort of Parkinson's disease cases and age-matched and sex-matched controls, and an association test was performed using logistic regression. Cohorts included a median of 3,899 Parkinson's disease cases and 49,808 controls, all of European ancestry. Highly correlated phenotypes were removed and the novelty of each significant association was systematically assessed (assigned to one of four categories: known, likely, unclear, or novel). Parkinson's disease diagnosis was associated with 122 phenotypes. We replicated 27 known associations and found 23 associations with a strong a priori link to a known association. We discovered 42 associations that have not previously been reported. Migraine, obsessive-compulsive disorder, and seasonal allergies were associated with Parkinson's disease and tend to occur decades before the typical age of diagnosis for Parkinson's disease. The phenotypes that currently comprise the Parkinson's disease phenome have mostly been explored in relatively small purpose-built studies. Using a single large dataset, we have successfully reproduced many of these established associations and have extended the Parkinson's disease phenome by discovering novel associations. Our work paves the way for studies of these associated phenotypes that explore shared molecular mechanisms with Parkinson's disease, infer causal relationships, and improve our ability to identify individuals at high risk of Parkinson's disease.
ABSTRACT
The correspondence between cerebral glucose metabolism (indexing energy utilization) and synchronous fluctuations in blood oxygenation (indexing neuronal activity) is relevant for neuronal specialization and is affected by brain disorders. Here, we define novel measures of relative power (rPWR, extent of concurrent energy utilization and activity) and relative cost (rCST, extent that energy utilization exceeds activity), derived from FDG-PET and fMRI. We show that resting-state networks have distinct energetic signatures and that the brain can be classified into major bilateral segments based on rPWR and rCST. While medial-visual and default-mode networks have the highest rPWR, frontoparietal networks have the highest rCST. rPWR and rCST estimates are generalizable to other indexes of energy supply and neuronal activity, and are sensitive to neurocognitive effects of acute and chronic alcohol exposure. rPWR and rCST are informative metrics for characterizing brain pathology and alternative energy use, and may provide new multimodal biomarkers of neuropsychiatric disorders.
Subjects
Brain Chemistry/physiology, Brain Mapping, Brain/physiology, Glucose/metabolism, Adult, Biomarkers/metabolism, Brain/pathology, Female, Humans, Computer-Assisted Image Processing, Magnetic Resonance Imaging/methods, Male, Middle Aged, Multimodal Imaging, Nerve Net/physiology, Neurons/metabolism, Positron-Emission Tomography, Young Adult
ABSTRACT
BACKGROUND: Alternative mRNA splicing is critical to proteomic diversity and tissue and species differentiation. Exclusion of cassette exons, also called exon skipping, is the most common type of alternative splicing in mammals. RESULTS: We present a computational model that predicts absolute (though not tissue-differential) percent-spliced-in of cassette exons more accurately than previous models, despite not using any 'hand-crafted' biological features such as motif counts. We achieve nearly identical performance using only the conservation score (mammalian phastCons) of each splice junction normalized by average conservation over 100 bp of the corresponding flanking intron, demonstrating that conservation is an unexpectedly powerful indicator of alternative splicing patterns. Using this method, we provide evidence that intronic splicing regulation occurs predominantly within 100 bp of the alternative splice sites and that conserved elements in this region are, as expected, functioning as splicing regulators. We show that among conserved cassette exons, increased conservation of flanking introns is associated with reduced inclusion. We also propose a new definition of intronic splicing regulatory elements (ISREs) that is independent of conservation, and show that most ISREs do not match known binding sites or splicing factors despite being predictive of percent-spliced-in. CONCLUSIONS: These findings suggest that one mechanism for the evolutionary transition from constitutive to alternative splicing is the emergence of cis-acting splicing inhibitors. The association of our ISREs with differences in splicing suggests the existence of novel RNA-binding proteins and/or novel splicing roles for known RNA-binding proteins.
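The normalized conservation feature mentioned in this abstract could be computed along the following lines. This is a minimal sketch under stated assumptions: the abstract does not say whether the normalization is a ratio or a difference, so a ratio is assumed here, and the function name, argument names, and the convention that flanking scores are ordered from the splice site outward are all hypothetical.

```python
from statistics import fmean


def normalized_junction_conservation(junction_score, flank_scores, window=100):
    """Junction conservation divided by the mean conservation of the flanking intron window.

    junction_score: conservation score (e.g. phastCons) at the splice junction.
    flank_scores: per-base scores of the flanking intron, ordered from the
        splice site outward (an assumed convention).
    window: number of intronic bases to average over (100 bp in the abstract).
    """
    window_scores = list(flank_scores)[:window]
    if not window_scores:
        raise ValueError("flank_scores must contain at least one value")
    mean_flank = fmean(window_scores)
    if mean_flank == 0:
        raise ValueError("mean flanking conservation is zero; ratio is undefined")
    return junction_score / mean_flank


# Toy input: a junction twice as conserved as its flanking intron.
example = normalized_junction_conservation(0.8, [0.4] * 100)
```

Values above 1 indicate a junction that is more conserved than its intronic surroundings, and values below 1 indicate the opposite.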
Subjects
Alternative Splicing, Molecular Evolution, Biological Models, Animals, Area Under Curve, Brain/metabolism, Exons, Gene Expression Regulation, Humans, Introns, Organ Specificity/genetics, RNA Splice Sites, Nucleic Acid Regulatory Sequences
ABSTRACT
De novo mutations (DNMs) are important in Autism Spectrum Disorder (ASD), but so far analyses have mainly focused on the ~1.5% of the genome encoding genes. Here, we performed whole genome sequencing (WGS) of 200 ASD parent-child trios and characterized germline and somatic DNMs. We confirmed that the majority of germline DNMs (75.6%) originated from the father, and these increased significantly with paternal age only (p=4.2×10⁻¹⁰). However, when clustered DNMs (those within 20kb) were found in ASD, not only did they mostly originate from the mother (p=7.7×10⁻¹³), but they could also be found adjacent to de novo copy number variations (CNVs) where the mutation rate was significantly elevated (p=2.4×10⁻²⁴). By comparing with DNMs detected in controls, we found a significant enrichment of predicted damaging DNMs in ASD cases (p=8.0×10⁻⁹; OR=1.84), of which 15.6% (p=4.3×10⁻³) and 22.5% (p=7.0×10⁻⁵) were in the non-coding or genic non-coding, respectively. The non-coding elements most enriched for DNM were untranslated regions of genes, boundaries involved in exon-skipping and DNase I hypersensitive regions. Using microarrays and a novel outlier detection test, we also found aberrant methylation profiles in 2/185 (1.1%) of ASD cases. These same individuals carried independently identified DNMs in the ASD-risk and epigenetic genes DNMT3A and ADNP. Our data begin to characterize different genome-wide DNMs and highlight the contribution of non-coding variants to the etiology of ASD.
ABSTRACT
The standard of care for first-tier clinical investigation of the etiology of congenital malformations and neurodevelopmental disorders is chromosome microarray analysis (CMA) for copy number variations (CNVs), often followed by gene(s)-specific sequencing searching for smaller insertion-deletions (indels) and single nucleotide variant (SNV) mutations. Whole genome sequencing (WGS) has the potential to capture all classes of genetic variation in one experiment; however, the diagnostic yield for mutation detection of WGS compared to CMA, and other tests, needs to be established. In a prospective study we utilized WGS and comprehensive medical annotation to assess 100 patients referred to a paediatric genetics service and compared the diagnostic yield versus standard genetic testing. WGS identified genetic variants meeting clinical diagnostic criteria in 34% of cases, representing a 4-fold increase in diagnostic rate over CMA alone (8%; p-value = 1.42e-05) and a >2-fold increase over CMA plus targeted gene sequencing (13%; p-value = 0.0009). WGS identified all rare clinically significant CNVs that were detected by CMA. In 26 patients, WGS revealed indel and missense mutations presenting in a dominant (63%) or a recessive (37%) manner. We found four subjects with mutations in at least two genes associated with distinct genetic disorders, including two cases harboring a pathogenic CNV and SNV. When considering medically actionable secondary findings in addition to primary WGS findings, 38% of patients would benefit from genetic counseling. Clinical implementation of WGS as a primary test will provide a higher diagnostic yield than conventional genetic testing and potentially reduce the time required to reach a genetic diagnosis.
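The fold increases quoted in this abstract follow directly from the reported diagnostic yields. The short sketch below simply reproduces that arithmetic; the variable and function names are illustrative, and the p-values are not recomputed because the abstract does not name the statistical test used.

```python
def fold_increase(new_rate, baseline_rate):
    """Ratio of two diagnostic yields."""
    return new_rate / baseline_rate


# Diagnostic yields reported in the abstract (fractions of 100 patients).
wgs_yield = 0.34
cma_yield = 0.08
cma_plus_targeted_yield = 0.13

fold_vs_cma = fold_increase(wgs_yield, cma_yield)                        # about 4.25
fold_vs_cma_targeted = fold_increase(wgs_yield, cma_plus_targeted_yield)  # about 2.6
```

These ratios match the "4-fold" and ">2-fold" descriptions in the text.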
ABSTRACT
Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes.
Subjects
DiGeorge Syndrome/complications, Human Genome, Schizophrenia/genetics, Adolescent, Adult, Case-Control Studies, DiGeorge Syndrome/genetics, Female, Humans, Male, Middle Aged, Long Noncoding RNA/genetics, RNA-Binding Proteins/genetics, Schizophrenia/epidemiology
ABSTRACT
Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.
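To make the "mutation map" idea concrete, the sketch below scores a sequence with a single position weight matrix and then records how the score changes under every possible single-base substitution. It is not DeepBind: a plain PWM scorer stands in for the trained network, and the matrix values, the example sequence, and the function names (`score_sequence`, `mutation_map`) are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def score_sequence(pwm, seq):
    """Best window score of seq under a PWM (rows: motif positions; columns: A, C, G, T)."""
    width = pwm.shape[0]
    idx = [BASES.index(b) for b in seq]
    scores = [
        sum(pwm[j, idx[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    ]
    return max(scores)


def mutation_map(pwm, seq):
    """Return a 4 x len(seq) array of score changes for each single-base substitution."""
    base_score = score_sequence(pwm, seq)
    delta = np.zeros((4, len(seq)))
    for pos in range(len(seq)):
        for b, base in enumerate(BASES):
            mutated = seq[:pos] + base + seq[pos + 1:]
            delta[b, pos] = score_sequence(pwm, mutated) - base_score
    return delta


# Hypothetical log-odds PWM for a 3-bp motif that favors "GAT".
toy_pwm = np.array([
    [-1.0, -1.0, 2.0, -1.0],   # position 1 prefers G
    [2.0, -1.0, -1.0, -1.0],   # position 2 prefers A
    [-1.0, -1.0, -1.0, 2.0],   # position 3 prefers T
])
toy_map = mutation_map(toy_pwm, "CCGATCC")
```

Each column of `toy_map` corresponds to a sequence position and each row to a substituted base, so strongly negative entries mark substitutions that disrupt the best-matching site.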
Subjects
Computational Biology/methods, DNA-Binding Proteins/chemistry, RNA-Binding Proteins/chemistry, Protein Sequence Analysis/methods, Software, Position-Specific Scoring Matrices
ABSTRACT
To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.
Subjects
Artificial Intelligence, Pervasive Child Development Disorders/genetics, Hereditary Nonpolyposis Colorectal Neoplasms/genetics, Genome-Wide Association Study/methods, Molecular Sequence Annotation/methods, Spinal Muscular Atrophy/genetics, RNA Splicing/genetics, Signal Transducing Adaptor Proteins/genetics, Computer Simulation, DNA/genetics, Exons/genetics, Genetic Code, Genetic Markers, Genetic Variation, Humans, Introns/genetics, Genetic Models, MutL Protein Homolog 1, Missense Mutation, Nuclear Proteins/genetics, Single Nucleotide Polymorphism, Quantitative Trait Loci, RNA Splice Sites/genetics, RNA-Binding Proteins/genetics
ABSTRACT
Alternative splicing (AS) of precursor RNAs is responsible for greatly expanding the regulatory and functional capacity of eukaryotic genomes. Of the different classes of AS, intron retention (IR) is the least well understood. In plants and unicellular eukaryotes, IR is the most common form of AS, whereas in animals, it is thought to represent the least prevalent form. Using high-coverage poly(A)+ RNA-seq data, we observe that IR is surprisingly frequent in mammals, affecting transcripts from as many as three-quarters of multiexonic genes. A highly correlated set of cis features comprising an "IR code" reliably discriminates retained from constitutively spliced introns. We show that IR acts widely to reduce the levels of transcripts that are less or not required for the physiology of the cell or tissue type in which they are detected. This "transcriptome tuning" function of IR acts through both nonsense-mediated mRNA decay and nuclear sequestration and turnover of IR transcripts. We further show that IR is linked to a cross-talk mechanism involving localized stalling of RNA polymerase II (Pol II) and reduced availability of spliceosomal components. Collectively, the results implicate a global checkpoint-type mechanism whereby reduced recruitment of splicing components coupled to Pol II pausing underlies widespread IR-mediated suppression of inappropriately expressed transcripts.
Subjects
Alternative Splicing, Introns/genetics, Mammals/genetics, Transcriptome/genetics, 3T3 Cells, Animals, Cell Differentiation/genetics, Cell Line, Tumor Cell Line, Cultured Cells, Molecular Evolution, HeLa Cells, Humans, K562 Cells, Mammals/classification, Mice, Genetic Models, Organ Specificity, Principal Component Analysis, RNA Polymerase II/metabolism, RNA Precursors/genetics, RNA Precursors/metabolism, Reverse Transcriptase Polymerase Chain Reaction, Species Specificity, Vertebrates/classification, Vertebrates/genetics
ABSTRACT
A universal challenge in genetic studies of autism spectrum disorders (ASDs) is determining whether a given DNA sequence alteration will manifest as disease. Among different population controls, we observed, for specific exons, an inverse correlation between exon expression level in brain and burden of rare missense mutations. For genes that harbor de novo mutations predicted to be deleterious, we found that specific critical exons were significantly enriched in individuals with ASD relative to their siblings without ASD (P < 1.13 × 10⁻³⁸; odds ratio (OR) = 2.40). Furthermore, our analysis of genes with high exonic expression in brain and low burden of rare mutations demonstrated enrichment for known ASD-associated genes (P < 3.40 × 10⁻¹¹; OR = 6.08) and ASD-relevant fragile-X protein targets (P < 2.91 × 10⁻¹⁵⁷; OR = 9.52). Our results suggest that brain-expressed exons under purifying selection should be prioritized in genotype-phenotype studies for ASD and related neurodevelopmental conditions.
Subjects
Brain/metabolism, Pervasive Child Development Disorders/genetics, Exons/genetics, Missense Mutation/genetics, Adolescent, Adult, Brain/pathology, Case-Control Studies, Preschool Child, Female, Gene Regulatory Networks, Genetic Predisposition to Disease, Humans, Infant, Male, Phenotype, Messenger RNA/genetics, Real-Time Polymerase Chain Reaction, Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Previous investigations of the core gene regulatory circuitry that controls the pluripotency of embryonic stem (ES) cells have largely focused on the roles of transcription, chromatin and non-coding RNA regulators. Alternative splicing represents a widely acting mode of gene regulation, yet its role in regulating ES-cell pluripotency and differentiation is poorly understood. Here we identify the muscleblind-like RNA binding proteins, MBNL1 and MBNL2, as conserved and direct negative regulators of a large program of cassette exon alternative splicing events that are differentially regulated between ES cells and other cell types. Knockdown of MBNL proteins in differentiated cells causes switching to an ES-cell-like alternative splicing pattern for approximately half of these events, whereas overexpression of MBNL proteins in ES cells promotes differentiated-cell-like alternative splicing patterns. Among the MBNL-regulated events is an ES-cell-specific alternative splicing switch in the forkhead family transcription factor FOXP1 that controls pluripotency. Consistent with a central and negative regulatory role for MBNL proteins in pluripotency, their knockdown significantly enhances the expression of key pluripotency genes and the formation of induced pluripotent stem cells during somatic cell reprogramming.
Subjects
Alternative Splicing, Cellular Reprogramming, DNA-Binding Proteins/metabolism, Embryonic Stem Cells/cytology, Embryonic Stem Cells/metabolism, RNA-Binding Proteins/metabolism, Alternative Splicing/genetics, Amino Acid Motifs, Animals, Cell Differentiation/genetics, Cell Line, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/deficiency, DNA-Binding Proteins/genetics, Fibroblasts/cytology, Fibroblasts/metabolism, Forkhead Transcription Factors/metabolism, Gene Knockdown Techniques, HEK293 Cells, HeLa Cells, Humans, Induced Pluripotent Stem Cells/cytology, Induced Pluripotent Stem Cells/metabolism, Kinetics, Mice, RNA-Binding Proteins/chemistry, RNA-Binding Proteins/genetics, Repressor Proteins/metabolism
ABSTRACT
Previous studies show that the same type of bond lengths and angles fit Gaussian distributions well with small standard deviations on high resolution protein structure data. The mean values of these Gaussian distributions have been widely used as ideal bond lengths and angles in bioinformatics. However, we are not aware of any research done to evaluate how accurately we can model protein structures with dihedral angles and ideal bond lengths and angles. Here, we introduce the protein structure idealization problem. We focus on the protein backbone structure idealization. We describe a fast O(nm/ε) dynamic programming algorithm to find an idealized protein backbone structure that is approximately optimal according to our scoring function. The scoring function evaluates not only the free energy, but also the similarity with the target structure. Thus, the idealized protein structures found by our algorithm are guaranteed to be protein-like and close to the target protein structure. We have implemented our protein structure idealization algorithm and idealized the high resolution protein structures with low sequence identities of the CULLPDB_PC30_RES1.6_R0.25 data set. We demonstrate that idealized backbone structures always exist with small changes and significantly better free energy. We also applied our algorithm to refine protein pseudo-structures determined in NMR experiments.