Results 1 - 20 of 32
1.
Nucleic Acids Res ; 52(D1): D1143-D1154, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38183205

ABSTRACT

Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
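Besides raw model scores, CADD distributes PHRED-like scaled scores that express a variant's rank among all scored substitutions: a scaled score of 10 marks the top 10%, 20 the top 1%, 30 the top 0.1%. The sketch below applies that rank-to-score transformation, -10 * log10(rank / n), to a toy list of raw scores. The function name and inputs are hypothetical; the real reference set of all possible substitutions and CADD's handling of ties are not modeled.

```python
import math


def phred_scale(raw_scores):
    """Map raw scores to PHRED-like scaled scores: -10 * log10(rank / n).

    Rank 1 is the highest raw score, so the top-ranked variant receives the
    largest scaled value. Ties are broken by input order (a simplification).
    """
    n = len(raw_scores)
    # Indices sorted from highest to lowest raw score
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10.0 * math.log10(rank / n)
    return scaled
```

With 100 toy variants whose raw scores are 0 to 99, the top-ranked variant gets a scaled score of 20 and the tenth-ranked gets 10, matching the "top 1%" and "top 10%" reading of the scale.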


Subjects
Genetic Variation , Genome, Human , Machine Learning , Software , Nucleotides , Humans
2.
BMC Bioinformatics ; 23(Suppl 2): 154, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36510125

ABSTRACT

BACKGROUND: Cis-regulatory regions (CRRs) are non-coding regions of the DNA that finely control the spatio-temporal pattern of transcription; they are involved in a wide range of pivotal processes such as the development of specific cell lines/tissues and the dynamic cell response to physiological stimuli. Recent studies showed that genetic variants occurring in CRRs are strongly correlated with pathogenicity or deleteriousness. Considering the central role of CRRs in the regulation of physiological and pathological conditions, the correct identification of CRRs and of their tissue-specific activity status through Machine Learning methods plays a major role in dissecting the impact of genetic variants on human diseases. Unfortunately, the problem is still open, though some promising results have already been reported by (deep) machine-learning based methods that predict active promoters and enhancers in specific tissues or cell lines by encoding epigenetic or spectral features directly extracted from DNA sequences. RESULTS: We present the experiments we performed to compare two Deep Neural Networks, a Feed-Forward Neural Network model working on epigenomic features and a Convolutional Neural Network model working only on genomic sequence, targeted to the identification of enhancer and promoter activity in specific cell lines. While performing experiments to understand how the experimental setup influences the prediction performance of the methods, we particularly focused on (1) automatic model selection performed by Bayesian optimization and (2) exploring different data rebalancing setups for reducing the negative effects of class imbalance.
CONCLUSIONS: Results show that (1) automatic model selection by Bayesian optimization improves the quality of the learner; (2) data rebalancing considerably impacts the prediction performance of the models; test set rebalancing may provide over-optimistic results and should therefore be applied cautiously; (3) despite working only on sequence data, convolutional models obtain performance close to that of feed-forward models working on epigenomic information, which suggests that sequence data also carries informative content for CRR-activity prediction. We therefore suggest combining both models/data types in future work.
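The warning that test set rebalancing can give over-optimistic results suggests a simple discipline: rebalance only the training split and evaluate on a test split that keeps the natural class ratio. The sketch below does this with random downsampling of inactive (negative) regions. The function names, the 1:1 ratio, and the downsampling strategy are assumptions for illustration; the specific rebalancing setups compared in the study are not reproduced here.

```python
import random


def downsample_negatives(examples, ratio=1.0, seed=0):
    """Randomly drop negatives so that #negatives <= ratio * #positives.

    examples: list of (features, label) pairs, label 1 = active, 0 = inactive.
    """
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    keep = min(len(neg), int(ratio * len(pos)))
    balanced = pos + rng.sample(neg, keep)
    rng.shuffle(balanced)
    return balanced


def split_and_rebalance(examples, test_fraction=0.2, ratio=1.0, seed=0):
    """Hold out a test split first, then rebalance only the training split."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # The test split keeps its natural (imbalanced) class ratio
    return downsample_negatives(train, ratio, seed), test
```

Splitting before rebalancing keeps the held-out data representative of the imbalance a deployed model would face, which is the point the conclusions make about over-optimistic estimates.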


Subjects
Deep Learning , Humans , Bayes Theorem , Regulatory Sequences, Nucleic Acid , Neural Networks, Computer , Machine Learning
3.
Am J Hum Genet ; 105(3): 631-639, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31353024

ABSTRACT

Notch signaling is an established developmental pathway for brain morphogenesis. Given that Delta-like 1 (DLL1) is a ligand for the Notch receptor and that a few individuals with developmental delay, intellectual disability, and brain malformations have microdeletions encompassing DLL1, we hypothesized that insufficiency of DLL1 causes a human neurodevelopmental disorder. We performed exome sequencing in individuals with neurodevelopmental disorders. The cohort was identified using known Matchmaker Exchange nodes such as GeneMatcher. This method identified 15 individuals from 12 unrelated families with heterozygous pathogenic DLL1 variants (nonsense, missense, splice site, and one whole gene deletion). The most common features in our cohort were intellectual disability, autism spectrum disorder, seizures, variable brain malformations, muscular hypotonia, and scoliosis. We did not identify an obvious genotype-phenotype correlation. Analysis of one splice site variant showed an in-frame insertion of 12 bp. In conclusion, heterozygous DLL1 pathogenic variants cause a variable neurodevelopmental phenotype and multi-systemic features. The clinical and molecular data support haploinsufficiency as a mechanism for the pathogenesis of this DLL1-related disorder and affirm the importance of DLL1 in human brain development.


Subjects
Calcium-Binding Proteins/genetics , Haploinsufficiency , Membrane Proteins/genetics , Neurodevelopmental Disorders/genetics , Cohort Studies , Female , Humans , Ligands , Male , Pedigree , Exome Sequencing
4.
Hum Mutat ; 40(9): 1280-1291, 2019 09.
Article in English | MEDLINE | ID: mdl-31106481

ABSTRACT

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse data types and cell types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.
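Measuring reporter expression relative to plasmid DNA amounts to an RNA/DNA ratio per construct, and a variant's effect can then be read as the change in that ratio versus the reference sequence. The sketch below uses a log2 ratio of summed barcode counts with a pseudocount; the log2 scale, the pseudocount of 1, and the function names are assumptions for illustration, not the challenge's exact normalization.

```python
import math


def log2_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Log2 ratio of summed reporter RNA counts to summed plasmid DNA counts."""
    return math.log2((sum(rna_counts) + pseudocount) / (sum(dna_counts) + pseudocount))


def variant_effect(ref, alt, pseudocount=1.0):
    """Difference in log2 activity between variant and reference constructs.

    ref, alt: (rna_counts, dna_counts) tuples of per-barcode counts.
    Negative values mean the substitution lowers reporter expression.
    """
    return log2_activity(*alt, pseudocount) - log2_activity(*ref, pseudocount)
```

For example, a reference construct with an RNA/DNA ratio of 2 (log2 activity 1) and a variant with a ratio of 0.5 (log2 activity -1) gives an effect of -2, i.e. a fourfold drop in reporter expression.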


Subjects
DNA/chemistry , Epigenomics/methods , Point Mutation , Binding Sites , Cell Line , Chromatin/genetics , DNA/metabolism , Enhancer Elements, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Promoter Regions, Genetic , Transcription Factors/metabolism
5.
Am J Hum Genet ; 99(3): 595-606, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27569544

ABSTRACT

The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease.
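The headline figure (causal variant ranked as the top candidate in 77% of simulated genomes) is a top-1 rate over simulated genomes with a known causal variant. The sketch below computes that metric using a deliberately simplified ranking: a hypothetical allele-frequency cutoff of 0.01 followed by sorting on a single pathogenicity score. Genomiser's actual combination of regulatory annotation, topological domains, and phenotypic relevance is not modeled, and the field names are assumptions.

```python
def rank_candidates(variants, max_af=0.01):
    """Drop common variants, then sort the rest by score (highest first).

    variants: list of dicts with 'id', 'score' and 'af' (population allele
    frequency). The cutoff and single-score ranking are simplifications.
    """
    rare = [v for v in variants if v["af"] <= max_af]
    return sorted(rare, key=lambda v: v["score"], reverse=True)


def top1_rate(genomes):
    """Fraction of simulated genomes whose known causal variant is ranked first.

    genomes: list of (variants, causal_id) pairs.
    """
    hits = 0
    for variants, causal_id in genomes:
        ranked = rank_candidates(variants)
        if ranked and ranked[0]["id"] == causal_id:
            hits += 1
    return hits / len(genomes)
```

In a toy run with two genomes, one where a higher-scoring common variant is filtered out so the causal variant comes first and one where another rare variant outscores it, the rate is 0.5.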


Subjects
Algorithms , Genetic Diseases, Inborn/genetics , Genome, Human/genetics , Mutation/genetics , Gene Frequency , Genome-Wide Association Study , Humans , Machine Learning , Open Reading Frames/genetics , Phenotype , Point Mutation/genetics
6.
J Transl Med ; 16(1): 23, 2018 02 06.
Article in English | MEDLINE | ID: mdl-29409514

ABSTRACT

BACKGROUND: Cancer vaccines can effectively establish clinically relevant tumor immunity. Novel sequencing approaches rapidly identify the mutational fingerprint of tumors, thus allowing personalized tumor vaccines to be generated within a few weeks of diagnosis. Here, we report the case of a 62-year-old patient receiving a four-peptide vaccine targeting the only two mutations of his pancreatic tumor, identified via exome sequencing. METHODS: Vaccination started during chemotherapy in second complete remission and continued monthly thereafter. We tracked IFN-γ+ T cell responses against vaccine peptides in peripheral blood after 12, 17 and 34 vaccinations by analyzing T-cell receptor (TCR) repertoire diversity and epitope-binding regions of peptide-reactive T-cell lines and clones. By restricting analysis to sorted IFN-γ-producing T cells we could assure epitope specificity, functionality, and TH1 polarization. RESULTS: A peptide-specific T-cell response against three of the four vaccine peptides could be detected sequentially. Molecular TCR analysis revealed a broad vaccine-reactive TCR repertoire with clones of discernible specificity. Four identical or convergent TCR sequences could be identified at more than one time point, indicating persistence of vaccine-reactive T cells over time. One dominant TCR expressing a dual TCRVα chain could be found in three T-cell clones. The observed T-cell responses possibly contributed to clinical outcome: the patient is alive 6 years after initial diagnosis and has been in complete remission for 4 years. CONCLUSIONS: Therapeutic vaccination with a neoantigen-derived four-peptide vaccine resulted in a diverse and long-lasting immune response against these targets, which was associated with prolonged clinical remission. These data warrant confirmation in a larger proof-of-concept clinical trial.


Subjects
CD4-Positive T-Lymphocytes/immunology , Cancer Vaccines/immunology , Carcinoma, Pancreatic Ductal/therapy , Epitopes/immunology , Monitoring, Immunologic , Pancreatic Neoplasms/therapy , Receptors, Antigen, T-Cell, alpha-beta/genetics , Vaccines, Subunit/immunology , Amino Acid Sequence , Carcinoma, Pancreatic Ductal/blood , Carcinoma, Pancreatic Ductal/immunology , Carcinoma, Pancreatic Ductal/secondary , Humans , Male , Middle Aged , Pancreatic Neoplasms/blood , Pancreatic Neoplasms/immunology , Pancreatic Neoplasms/secondary , Peptides/chemistry , Peptides/immunology , Treatment Outcome , Vaccination
7.
BMC Bioinformatics ; 18(1): 449, 2017 Oct 12.
Article in English | MEDLINE | ID: mdl-29025394

ABSTRACT

BACKGROUND: The prediction of human gene-abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene-disease associations has been widely investigated, the related problem of gene-phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though no HPO term associations are known for most human genes and despite the increasing application of the HPO to relevant medical problems. Moreover, most of the methods proposed in the literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. RESULTS: We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a "flat" learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. CONCLUSIONS: Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve HPO term predictions, and make them consistent, starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository.
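Under the true path rule, a gene annotated to an HPO term is implicitly annotated to all of that term's ancestors, so a consistent predictor should never score a term higher than any of its parents. The sketch below shows one simple top-down correction over a DAG: visit terms parents-first and cap each term's flat score at the minimum of its parents' corrected scores. It is an illustrative sketch of the idea on a toy graph with hypothetical term names, not necessarily either of the two ensemble methods in the paper, and it does not reflect the API of their R package.

```python
from graphlib import TopologicalSorter


def enforce_true_path_rule(scores, parents):
    """Top-down correction so that no term scores higher than any parent.

    scores: dict term -> flat predicted score for one gene.
    parents: dict term -> list of parent terms (roots map to []).
    """
    # static_order() yields every node after all of its predecessors (parents)
    order = TopologicalSorter(parents).static_order()
    corrected = {}
    for term in order:
        ps = parents.get(term, [])
        if ps:
            corrected[term] = min(scores[term], min(corrected[p] for p in ps))
        else:
            corrected[term] = scores[term]
    return corrected
```

Because the flat scores are only ever lowered, the correction can sit on top of any flat learner, which mirrors the modular two-step design the abstract describes.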


Subjects
Algorithms , Biological Ontologies , Area Under Curve , Genetic Association Studies , Humans , Molecular Sequence Annotation , Phenotype , ROC Curve
8.
Hum Mutat ; 37(4): 359-63, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26820108

ABSTRACT

Strømme syndrome was first described by Strømme et al. (1993) in siblings presenting with "apple peel" type intestinal atresia, ocular anomalies and microcephaly. The etiology remains unknown to date. We describe the long-term clinical follow-up data for the original pair of siblings as well as two previously unreported siblings with a severe phenotype overlapping that of the Strømme syndrome including fetal autopsy results. Using family-based whole-exome sequencing, we identified truncating mutations in the centrosome gene CENPF in the two nonconsanguineous Caucasian sibling pairs. Compound heterozygous inheritance was confirmed in both families. Recently, mutations in this gene were shown to cause a fetal lethal phenotype, the phenotype and functional data being compatible with a human ciliopathy [Waters et al., 2015]. We show for the first time that Strømme syndrome is an autosomal-recessive disease caused by mutations in CENPF that can result in a wide phenotypic spectrum.


Subjects
Chromosomal Proteins, Non-Histone/genetics , Ciliopathies/diagnosis , Ciliopathies/genetics , Eye Abnormalities/diagnosis , Eye Abnormalities/genetics , Intestinal Atresia/diagnosis , Intestinal Atresia/genetics , Microcephaly/diagnosis , Microcephaly/genetics , Microfilament Proteins/genetics , Mutation , Adult , DNA Mutational Analysis , Facies , Female , Follow-Up Studies , Genes, Recessive , Genetic Association Studies , Heterozygote , Humans , Male , Pedigree , Phenotype , Siblings , Young Adult
9.
BMC Cancer ; 15: 773, 2015 Oct 24.
Article in English | MEDLINE | ID: mdl-26498442

ABSTRACT

BACKGROUND: Splenic marginal zone lymphoma (SMZL) is an indolent B-cell non-Hodgkin lymphoma and represents the most common primary malignancy of the spleen. Its precise molecular pathogenesis is still unknown and specific molecular markers for diagnosis or possible targets for causal therapies are lacking. METHODS: We performed whole exome sequencing (WES) and copy number analysis from laser-microdissected tumor cells of two primary SMZL discovery cases. Selected somatic single nucleotide variants (SNVs) were analyzed using pyrosequencing and Sanger sequencing in an independent validation cohort. RESULTS: Overall, 25 nonsynonymous somatic SNVs were identified, including known mutations in the NOTCH2 and MYD88 genes. Twenty-three of the mutations have not been associated with SMZL before. Many of these seem to be subclonal. Screening of 24 additional SMZL cases for mutations at the positions found mutated by the WES approach revealed no recurrent mutations in ZNF608 and PDE10A, whereas the MYD88 L265P missense mutation was identified in 15% of cases. An analysis of the NOTCH2 PEST domain and the whole coding region of the transcription factor SMYD1 in eight cases identified no additional case with a NOTCH2 mutation, but two additional cases with SMYD1 alterations. CONCLUSIONS: In this first WES approach from microdissected SMZL tissue we confirmed known mutations and discovered new somatic variants. Recurrence of MYD88 mutations in SMZL was validated, but NOTCH2 PEST domain mutations were relatively rare (10% of cases). Recurrent mutations in the transcription factor SMYD1 have not been described in SMZL before and warrant further investigation.


Subjects
Exome/genetics , Lymphoma, B-Cell, Marginal Zone/genetics , Mutation , Neoplasm Proteins/genetics , Splenic Neoplasms/genetics , Biomarkers, Tumor/genetics , DNA Mutational Analysis , Female , Humans , Male , Microarray Analysis , Middle Aged , Polymorphism, Single Nucleotide , Transcription Factors/genetics
10.
bioRxiv ; 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36945371

ABSTRACT

The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200-nucleotide cores function as non-cell-type-specific 'on switches' providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
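Testing each sequence in both orientations yields a forward and a reverse activity per element, and a strand orientation effect shows up as a gap between the two. The sketch below summarizes that gap as the mean absolute forward-minus-reverse difference per element class. The input layout, the use of log2 activities, and the choice of summary statistic are assumptions for illustration; the study's own statistical test is not reproduced.

```python
from statistics import mean


def orientation_effects(measurements):
    """Mean absolute forward-vs-reverse activity difference per element class.

    measurements: list of (element_class, fwd_log2_activity, rev_log2_activity)
    tuples, e.g. element_class in {"promoter", "enhancer"}.
    """
    by_class = {}
    for cls, fwd, rev in measurements:
        by_class.setdefault(cls, []).append(abs(fwd - rev))
    return {cls: mean(diffs) for cls, diffs in by_class.items()}
```

On toy data where promoters differ by 1 to 2 log2 units between orientations and enhancers by about 0.2, the summary reproduces the qualitative pattern described above: larger orientation effects for promoters than for enhancers.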
