Results 1 - 20 of 27
1.
Orphanet J Rare Dis ; 17(1): 436, 2022 12 14.
Article in English | MEDLINE | ID: mdl-36517834

ABSTRACT

INTRODUCTION: Rare disease patient data are typically sensitive, present in multiple registries controlled by different custodians, and non-interoperable. Making these data Findable, Accessible, Interoperable, and Reusable (FAIR) for humans and machines at source enables federated discovery and analysis across data custodians. This facilitates accurate diagnosis, optimal clinical management, and personalised treatments. In Europe, twenty-four European Reference Networks (ERNs) work on rare disease registries in different clinical domains. The process and the implementation choices for making data FAIR ('FAIRification') differ among ERN registries. For example, registries use different software systems and are subject to different legal regulations. To support the ERNs in making informed decisions and to harmonise FAIRification, the FAIRification steward team was established to work as liaisons between ERNs and researchers from the European Joint Programme on Rare Diseases. RESULTS: The FAIRification steward team inventoried the FAIRification challenges of the ERN registries and, together with the stakeholders involved, proposed solutions to address them. Ninety-eight FAIRification challenges from 24 ERNs' registries were collected and categorised into "training" (31), "community" (9), "modelling" (12), "implementation" (26), and "legal" (20). After curating and aggregating highly similar challenges, 41 unique FAIRification challenges remained. The two categories with the most challenges were "training" (15) and "implementation" (9), followed by "community" (7), and then "modelling" (5) and "legal" (5). To address all challenges, eleven types of solutions were proposed. Among them, the provision of guidelines and the organisation of training activities, which ranged from less-technical "coffee-rounds" to technical workshops and from informal FAIR Games to formal hackathons, resolved the "training" challenges. Obtaining implementation support from technical experts was the solution type for tackling the "implementation" challenges. CONCLUSION: This work shows that a dedicated team of FAIR data stewards is an asset for harmonising the various processes of making data FAIR in a large organisation with multiple stakeholders. Additionally, multi-levelled training activities are required to accommodate the diverse needs of the ERNs. Finally, the lessons learned from the experience of the FAIRification steward team described in this paper may help to increase FAIR awareness and provide insights into FAIRification challenges and solutions of rare disease registries.


Subjects
Rare Diseases , Software , Humans , Europe , Rare Diseases/therapy , Registries
2.
Sci Data ; 9(1): 169, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35418585

ABSTRACT

The genomes of thousands of individuals are profiled within Dutch healthcare and research each year. However, this valuable genomic data, associated clinical data and consent are captured in different ways and stored across many systems and organizations. This makes it difficult to discover rare disease patients, reuse data for personalized medicine and establish research cohorts based on specific parameters. FAIR Genomes aims to enable NGS data reuse by developing metadata standards for the data descriptions needed to FAIRify genomic data while also addressing ELSI issues. We developed a semantic schema of essential data elements harmonized with international FAIR initiatives. The FAIR Genomes schema v1.1 contains 110 elements in 9 modules. It reuses common ontologies such as NCIT, DUO and EDAM, only introducing new terms when necessary. The schema is represented by a YAML file that can be transformed into templates for data entry software (EDC) and programmatic interfaces (JSON, RDF) to ease genomic data sharing in research and healthcare. The schema, documentation and MOLGENIS reference implementation are available at https://fairgenomes.org .
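To illustrate the idea of deriving data-entry templates from a modular schema file, here is a minimal Python sketch. It assumes the schema has already been parsed into a dictionary; the layout, module names, and element names below are placeholders invented for illustration and are not taken from the FAIR Genomes schema itself.

```python
import json

# Hypothetical, simplified stand-in for a parsed schema file.
# Module and element names are placeholders, not taken from FAIR Genomes.
schema = {
    "modules": [
        {"name": "module_a", "elements": [
            {"name": "element_1", "ontology": "NCIT"},
            {"name": "element_2", "ontology": "DUO"},
        ]},
        {"name": "module_b", "elements": [
            {"name": "element_3", "ontology": "EDAM"},
        ]},
    ]
}


def to_entry_template(schema):
    """Build an empty data-entry record: one key per module, one field per element."""
    return {
        module["name"]: {element["name"]: None for element in module["elements"]}
        for module in schema["modules"]
    }


template = to_entry_template(schema)
print(json.dumps(template, indent=2))
```

The same traversal could in principle emit other targets (for example RDF triples); JSON is used here only because it needs no third-party library.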


Subjects
High-Throughput Nucleotide Sequencing , Metadata , Delivery of Health Care , Genomics , Humans , Software
3.
J Biomed Semantics ; 13(1): 9, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35292119

ABSTRACT

BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements - including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. RESULTS: Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding nor expertise in Linked Data or FAIR. CONCLUSIONS: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.


Subjects
Common Data Elements , Rare Diseases , Humans , Registries , Semantics , Workflow
4.
Front Genet ; 13: 824510, 2022.
Article in English | MEDLINE | ID: mdl-35299955

ABSTRACT

Background: In the molecular genetic diagnostics of Mendelian disorders, solutions are needed for the major challenge of dealing with the large number of variants of uncertain significance (VUSs) identified using next-generation sequencing (NGS). Recently, promising approaches have been reported that use constraint metrics to calculate case excess scores (CE), etiological fractions (EF), and gnomAD-derived constraint scores, which estimate the likelihood that rare variants in specific genes or regions are pathogenic. Our objective is to study the usability of these constraint data for variant interpretation in a diagnostic setting, using our cardiomyopathy cohort. Methods and Results: Patients (N = 2002) referred for clinical genetic diagnostics underwent NGS testing of 55-61 genes associated with cardiomyopathies. Previously classified likely pathogenic (LP) and pathogenic (P) variants were used to validate the use of data from CE, EF, and gnomAD constraint analyses for (re)classification of associated variant types in specific cardiomyopathy subtype-related genes. The classifications were corroborated in 94% (354/378) of cases. Next, we reclassified 23 unique VUSs to LP, increasing the diagnostic yield by 1.2%. In addition, 106 unique VUSs (5.3% of patients) were prioritized for co-segregation or functional analyses. Conclusions: Our analysis confirms that the use of constraint metrics data can improve variant interpretation, and we therefore recommend using constraint scores on other cohorts and disorders and including them in variant interpretation protocols.

5.
Int J Mol Sci ; 22(22)2021 Nov 12.
Article in English | MEDLINE | ID: mdl-34830104

ABSTRACT

Epidermolysis bullosa (EB) is a group of genetic skin conditions characterized by abnormal skin (and mucosal) fragility caused by pathogenic variants in various genes. The disease severity ranges from early childhood mortality in the most severe types to occasional acral blistering in the mildest types. The subtype and severity of EB are linked to the gene involved and the specific variants in that gene, which also determine its mode of inheritance. Current treatment is mainly focused on symptomatic relief such as wound care and blister prevention, because truly curative treatment options are still at the preclinical stage. Given the current level of understanding, the broad spectrum of genes and variants underlying EB makes it impossible to develop a single treatment strategy for all patients. It is likely that many different variant-specific treatment strategies will be needed to ultimately treat all patients. Antisense-oligonucleotide (ASO)-mediated exon skipping aims to counteract pathogenic sequence variants by restoring the open reading frame through the removal of the mutant exon from the pre-messenger RNA. This should lead to the restored production of the protein absent in the affected skin and, consequently, improvement of the phenotype. Several preclinical studies have demonstrated that exon skipping can restore protein production in vitro, in skin equivalents, and in skin grafts derived from EB-patient skin cells, indicating that ASO-mediated exon skipping could be a viable strategy as a topical or systemic treatment. The potential value of exon skipping for EB is supported by a study showing reduced phenotypic severity in patients who carry variants that result in natural exon skipping. In this article, we review the substantial progress made on exon skipping for EB in the past 15 years and highlight the opportunities and current challenges of this RNA-based therapy approach. In addition, we present a prioritization strategy for the development of exon skipping based on genomic information of all EB-involved genes.


Subjects
Epidermolysis Bullosa , Exons , Fibroblasts/immunology , Mutation , Antisense Oligonucleotides , Skin/immunology , Epidermolysis Bullosa/genetics , Epidermolysis Bullosa/immunology , Epidermolysis Bullosa/therapy , Humans , Antisense Oligonucleotides/genetics , Antisense Oligonucleotides/therapeutic use
6.
Front Pediatr ; 9: 600556, 2021.
Article in English | MEDLINE | ID: mdl-34136434

ABSTRACT

Background: Genetic disorders are a substantial cause of infant morbidity and mortality and are frequently suspected in neonatal intensive care units (NICUs). Non-specific clinical presentation or limitations to physical examination can result in a plethora of genetic testing techniques, without clear strategies on test ordering. Here, we review our 2-year experience with rapid genetic testing of NICU patients in order to provide such recommendations. Methods: We retrospectively included all patients admitted to the NICU who received clinical genetic consultation and genetic testing in our university hospital. We documented reasons for referral for genetic consultation, presenting phenotypes, differential diagnoses, genetic testing requested and their outcomes, as well as the consequences of each (rapid) genetic diagnostic approach. We calculated diagnostic yield and turnaround times (TATs). Results: Of 171 included infants who received genetic consultation, 140 underwent genetic testing. As a result of testing as first tier, 13/14 patients received a genetic diagnosis from QF-PCR; 14/115 from SNP-array; 12/89 from NGS testing, of whom 4/46 were diagnosed with a small gene panel and 8/43 with a large OMIM-morbid based gene panel. Subsequent secondary or tertiary analysis and/or additional testing resulted in five more diagnoses. TATs ranged from 1 day (QF-PCR) to a median of 14 days for NGS and SNP-array testing, with increasing TAT in particular when many consecutive tests were performed. Incidental findings were detected in 5/140 tested patients (3.6%). Conclusion: We recommend implementing a broad NGS gene panel in combination with CNV calling as the first tier of genetic testing for NICU patients given the often unspecific phenotypes of ill infants and the high yield of this large panel.
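The per-test diagnostic yields implied by the counts in this abstract can be recomputed with a few lines of Python. This is only an arithmetic sketch: yield is taken here as diagnosed divided by tested, which is an assumption about how the figures should be read, and the dictionary labels are ours.

```python
# Counts as reported in the abstract: (diagnosed, tested) per first-tier test.
first_tier = {
    "QF-PCR": (13, 14),
    "SNP-array": (14, 115),
    "NGS (all panels)": (12, 89),
    "NGS small panel": (4, 46),
    "NGS large OMIM-morbid panel": (8, 43),
}


def diagnostic_yield(diagnosed, tested):
    """Return the share of tested patients with a diagnosis, as a percentage."""
    return 100.0 * diagnosed / tested


yields = {test: diagnostic_yield(d, n) for test, (d, n) in first_tier.items()}

# Incidental findings: 5 of 140 tested patients.
incidental_rate = diagnostic_yield(5, 140)

for test, pct in yields.items():
    print(f"{test}: {pct:.1f}%")
print(f"Incidental findings: {incidental_rate:.1f}%")
```

Under this reading, the large panel has roughly twice the yield of the small panel, which is consistent with the recommendation in the conclusion.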

7.
Sci Rep ; 11(1): 10606, 2021 05 19.
Article in English | MEDLINE | ID: mdl-34012022

ABSTRACT

Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All used code and machine learning models are available at GitHub and Zenodo.


Subjects
Alleles , Gene Expression Regulation , Machine Learning , DNA Sequence Analysis , Bias , Feasibility Studies , Human Genome , Humans , Genetic Models , Single Nucleotide Polymorphism/genetics , ROC Curve
8.
Genome Med ; 12(1): 75, 2020 08 24.
Article in English | MEDLINE | ID: mdl-32831124

ABSTRACT

Exome sequencing is now mainstream in clinical practice. However, identification of pathogenic Mendelian variants remains time-consuming, in part because the limited accuracy of current computational prediction methods requires manual classification by experts. Here we introduce CAPICE, a new machine-learning-based method for prioritizing pathogenic variants, including SNVs and short InDels. CAPICE outperforms the best general (CADD, GAVIN) and consequence-type-specific (REVEL, ClinPred) computational prediction methods, for both rare and ultra-rare variants. CAPICE is easily added to diagnostic pipelines as a pre-computed score file or command-line software, or through the online MOLGENIS web service with API. CAPICE is free and open source (LGPLv3) and can be downloaded at https://github.com/molgenis/capice .


Subjects
Computational Biology/methods , Exome , Genetic Variation , Software , Gene Frequency , Genetic Association Studies/methods , Humans , INDEL Mutation , Machine Learning , Molecular Diagnostic Techniques , Molecular Sequence Annotation , Single Nucleotide Polymorphism , ROC Curve , Reproducibility of Results
9.
Adv Genet (Hoboken) ; 1(1): e10023, 2020 Dec.
Article in English | MEDLINE | ID: mdl-36619248

ABSTRACT

Despite an explosive growth of next-generation sequencing data, genome diagnostics only provides a molecular diagnosis to a minority of patients. Software tools that prioritize genes based on patient symptoms using known gene-disease associations may complement variant filtering and interpretation to increase chances of success. However, many of these tools cannot be used in practice because they are embedded within variant prioritization algorithms, or exist as remote services that cannot be relied upon or are unacceptable because of legal/ethical barriers. In addition, many tools are not designed for command-line usage or are closed-source, abandoned, or unavailable. We present Variant Interpretation using Biomedical literature Evidence (VIBE), a tool to prioritize disease genes based on Human Phenotype Ontology codes. VIBE is a locally installed executable that ensures operational availability and is built upon DisGeNET-RDF, a comprehensive knowledge platform containing gene-disease associations mostly from literature and variant-disease associations mostly from curated source databases. VIBE's command-line interface and output are designed for easy incorporation into bioinformatic pipelines that annotate and prioritize variants for further clinical interpretation. We evaluate VIBE in a benchmark based on 305 patient cases alongside seven other tools. Our results demonstrate that VIBE offers consistent performance with few cases missed, but we also find high complementarity among all tested tools. VIBE is a powerful, free, open source and locally installable solution for prioritizing genes based on patient symptoms. Project source code, documentation, benchmark and executables are available at https://github.com/molgenis/vibe.

11.
Nat Commun ; 10(1): 2837, 2019 06 28.
Article in English | MEDLINE | ID: mdl-31253775

ABSTRACT

The diagnostic yield of exome and genome sequencing remains low (8-70%), due to incomplete knowledge on the genes that cause disease. To improve this, we use RNA-seq data from 31,499 samples to predict which genes cause specific disease phenotypes, and develop GeneNetwork Assisted Diagnostic Optimization (GADO). We show that this unbiased method, which does not rely upon specific knowledge on individual genes, is effective in both identifying previously unknown disease gene associations, and flagging genes that have previously been incorrectly implicated in disease. GADO can be run on www.genenetwork.nl by supplying HPO-terms and a list of genes that contain candidate variants. Finally, applying GADO to a cohort of 61 patients for whom exome-sequencing analysis had not resulted in a genetic diagnosis, yields likely causative genes for ten cases.


Subjects
Gene Expression Regulation/physiology , Genetic Predisposition to Disease , RNA Sequence Analysis/methods , Transcriptome , Nucleic Acid Databases , Humans , Genetic Models , Principal Component Analysis , Software , User-Computer Interface
12.
Bioinformatics ; 35(6): 1076-1078, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30165396

ABSTRACT

MOTIVATION: The volume and complexity of biological data are increasing rapidly. Many clinical professionals and biomedical researchers without a bioinformatics background are generating big '-omics' data, but do not always have the tools to manage, process or publicly share these data. RESULTS: Here we present MOLGENIS Research, an open-source web-application to collect, manage, analyze, visualize and share large and complex biomedical datasets, without the need for advanced bioinformatics skills. AVAILABILITY AND IMPLEMENTATION: MOLGENIS Research is freely available (open source software). It can be installed from source code (see http://github.com/molgenis), downloaded as a precompiled WAR file (for your own server), set up inside a Docker container (see http://molgenis.github.io), or requested as a Software-as-a-Service subscription. For a public demo instance and complete installation instructions see http://molgenis.org/research.


Subjects
Computational Biology , Software , Algorithms , Genome , Genomics
13.
PLoS One ; 13(8): e0203078, 2018.
Article in English | MEDLINE | ID: mdl-30161220

ABSTRACT

AIMS: Likely pathogenic/pathogenic variants in genes encoding desmosomal proteins play an important role in the pathophysiology of arrhythmogenic right ventricular cardiomyopathy (ARVC). However, for a substantial proportion of ARVC patients, the genetic substrate remains unknown. We hypothesized that plectin, a cytolinker protein encoded by the PLEC gene, could play a role in ARVC because it has been proposed to link the desmosomal protein desmoplakin to the cytoskeleton and therefore has a potential function in the desmosomal structure. METHODS: We screened PLEC in 359 ARVC patients and compared the frequency of rare coding PLEC variants (minor allele frequency [MAF] <0.001) between patients and controls. To assess the frequency of rare variants in the control population, we evaluated the rare coding variants (MAF <0.001) found in the European cohort of the Exome Aggregation Database. We further evaluated plectin localization by immunofluorescence in a subset of patients with and without a PLEC variant. RESULTS: Forty ARVC patients carried one or more rare PLEC variants (11%, 40/359). However, rare variants also seem to occur frequently in the control population (18%, 4754/26197 individuals). Nor did we find a difference in the prevalence of rare PLEC variants in ARVC patients with or without a desmosomal likely pathogenic/pathogenic variant (14% versus 8%, respectively). However, immunofluorescence analysis did show decreased plectin junctional localization in myocardial tissue from 5 ARVC patients with PLEC variants. CONCLUSIONS: Although PLEC has been hypothesized as a promising candidate gene for ARVC, our current study did not show an enrichment of rare PLEC variants in ARVC patients compared to controls and therefore does not support a major role for PLEC in this disorder. Although rare PLEC variants were associated with abnormal localization in cardiac tissue, the confluence of data does not support a role for plectin abnormalities in ARVC development.
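The case-versus-control comparison reported in this abstract can be checked with a short Python sketch. The pooled two-proportion z statistic below is our own choice of illustration; the abstract does not say which statistical test the authors used, so treat the statistic as an assumption and only the counts as sourced.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return both proportions and the pooled two-proportion z statistic."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1, p2, (p1 - p2) / se


# Counts from the abstract: carriers of rare PLEC variants / individuals screened.
p_cases, p_controls, z = two_proportion_z(40, 359, 4754, 26197)

print(f"ARVC patients: {100 * p_cases:.1f}%")
print(f"Controls:      {100 * p_controls:.1f}%")
print(f"z statistic:   {z:.2f}")
```

A negative z here means the carrier frequency is lower in patients than in controls, which matches the conclusion that rare PLEC variants are not enriched in ARVC patients.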


Subjects
Arrhythmogenic Right Ventricular Dysplasia/genetics , Arrhythmogenic Right Ventricular Dysplasia/metabolism , Plectin/genetics , Plectin/metabolism , Arrhythmogenic Right Ventricular Dysplasia/pathology , Cohort Studies , Gene Frequency , Genetic Variation , Heterozygote , Humans , Myocardium/metabolism , Myocardium/pathology , White People/genetics
14.
Hum Mutat ; 39(3): 333-344, 2018 03.
Article in English | MEDLINE | ID: mdl-29266534

ABSTRACT

Microvillus inclusion disease (MVID) is a rare but fatal autosomal recessive congenital diarrheal disorder caused by MYO5B mutations. In 2013, we launched an open-access registry for MVID patients and their MYO5B mutations (www.mvid-central.org). Since then, additional unique MYO5B mutations have been identified in MVID patients, but also in non-MVID patients. Animal models have been generated that formally prove the causality between MYO5B and MVID. Importantly, mutations in two other genes, STXBP2 and STX3, have since been associated with variants of MVID, shedding new light on the pathogenesis of this congenital diarrheal disorder. Here, we review these additional genes and their mutations. Furthermore, we discuss recent data from cell studies that indicate that the three genes are functionally linked and, therefore, may constitute a common disease mechanism that unifies a subset of phenotypically linked congenital diarrheal disorders. We present new data based on patient material to support this. To congregate existing and future information on MVID geno-/phenotypes, we have updated and expanded the MVID registry to include all currently known MVID-associated gene mutations, their demonstrated or predicted functional consequences, and associated clinical information.


Subjects
Diarrhea/congenital , Diarrhea/genetics , Genetic Predisposition to Disease , Munc18 Proteins/genetics , Mutation/genetics , Myosin Type V/genetics , Qa-SNARE Proteins/genetics , Animals , Humans
15.
PLoS One ; 12(2): e0171324, 2017.
Article in English | MEDLINE | ID: mdl-28192439

ABSTRACT

In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the "ideal" genotype and identify "best-matched" labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a "data cleaning" step before standard data analysis.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Single Nucleotide Polymorphism , Quantitative Trait Loci/genetics , Animals , Computer Simulation , Genomics/methods , Genotype , Humans , Phenotype , Reproducibility of Results
16.
Genome Biol ; 18(1): 6, 2017 01 16.
Article in English | MEDLINE | ID: mdl-28093075

ABSTRACT

We present Gene-Aware Variant INterpretation (GAVIN), a new method that accurately classifies variants for clinical diagnostic purposes. Classifications are based on gene-specific calibrations of allele frequencies from the ExAC database, likely variant impact using SnpEff, and estimated deleteriousness based on CADD scores for >3000 genes. In a benchmark on 18 clinical gene sets, we achieve a sensitivity of 91.4% and a specificity of 76.9%. This accuracy is unmatched by 12 other tools. We provide GAVIN as an online MOLGENIS service to annotate VCF files and as an open source executable for use in bioinformatic pipelines. It can be found at http://molgenis.org/gavin .
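For readers less familiar with the two benchmark metrics quoted above, the following Python sketch shows the standard definitions of sensitivity and specificity. The confusion-matrix counts are invented for illustration and are not GAVIN's benchmark data.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly pathogenic variants that were classified as pathogenic."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly benign variants that were classified as benign."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
tp, fn = 90, 10   # pathogenic variants: detected vs missed
tn, fp = 75, 25   # benign variants: correctly cleared vs wrongly flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A classifier tuned for diagnostics often accepts lower specificity in exchange for higher sensitivity, since a missed pathogenic variant is usually costlier than an extra variant sent for expert review.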


Subjects
Computational Biology/methods , Genetic Variation , Software , Gene Frequency , Genetic Association Studies/methods , Genome-Wide Association Study/methods , Humans
17.
Bioinformatics ; 32(14): 2176-83, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153686

ABSTRACT

MOTIVATION: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. RESULTS: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. AVAILABILITY AND IMPLEMENTATION: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. CONTACT: m.a.swertz@rug.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
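The kind of transformation algorithm mentioned in this abstract (unit conversion feeding a derived variable such as BMI) might look roughly like the Python sketch below. It is not output of MOLGENIS/connect: the function names, the unit labels, and the source attributes are assumptions made for illustration.

```python
# Conversion factors to SI units (pound and inch values are exact by definition).
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}
TO_METRES = {"m": 1.0, "cm": 0.01, "in": 0.0254}


def to_kg(value, unit):
    """Convert a weight in the given unit to kilograms."""
    return value * TO_KILOGRAMS[unit]


def to_m(value, unit):
    """Convert a height in the given unit to metres."""
    return value * TO_METRES[unit]


def bmi(weight, weight_unit, height, height_unit):
    """Body mass index in kg/m^2 from source attributes in arbitrary units."""
    return to_kg(weight, weight_unit) / to_m(height, height_unit) ** 2


# Two hypothetical source records describing roughly the same person.
print(round(bmi(70, "kg", 175, "cm"), 1))
print(round(bmi(154.3, "lb", 68.9, "in"), 1))
```

Keeping the unit tables separate from the formula means a new source with, say, heights in feet needs one extra table entry rather than a new algorithm.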


Subjects
Biological Specimen Banks , Computational Biology/methods , Phenotype , Software , Algorithms , Biological Ontologies , Humans
18.
Genome Med ; 7(1): 30, 2015.
Article in English | MEDLINE | ID: mdl-25954321

ABSTRACT

BACKGROUND: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq samples in the public domain, we here studied to what extent eQTLs and ASE effects can be identified when using public RNA-seq data while deriving the genotypes from the RNA-sequencing reads themselves. METHODS: We downloaded the raw reads for all available human RNA-seq datasets. Using these reads we performed gene expression quantification. All samples were jointly normalized and subjected to a strict quality control. We also derived genotypes using the RNA-seq reads and used imputation to infer non-coding variants. This allowed us to perform eQTL mapping and ASE analyses jointly on all samples that passed quality control. Our results were validated using samples for which DNA-seq genotypes were available. RESULTS: 4,978 public human RNA-seq runs, representing many different tissues and cell-types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols are limited. In a joint analysis on the 1,262 samples with high quality genotypes, we identified cis-eQTL effects for 8,034 unique genes (at a false discovery rate ≤0.05). eQTL mapping on individual tissues revealed that a limited number of samples already suffice to identify tissue-specific eQTLs for known disease-associated genetic variants. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels. CONCLUSIONS: By deriving and imputing genotypes from RNA-seq data, it is possible to identify both eQTLs and ASE effects. Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become especially relevant for studying the effects of tissue-specific and rare pathogenic genetic variants to aid clinical interpretation of exome and genome sequencing.

19.
Hum Mutat ; 36(7): 712-9, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25871441

ABSTRACT

Next-generation sequencing in clinical diagnostics is providing valuable genomic variant data, which can be used to support healthcare decisions. In silico tools to predict pathogenicity are crucial to assess such variants and we have evaluated a new tool, Combined Annotation Dependent Depletion (CADD), and its classification of gene variants in Lynch syndrome by using a set of 2,210 DNA mismatch repair gene variants. These had already been classified by experts from InSiGHT's Variant Interpretation Committee. Overall, we found CADD scores do predict pathogenicity (Spearman's ρ = 0.595, P < 0.001). However, we discovered 31 major discrepancies between the InSiGHT classification and the CADD scores; these were explained in favor of the expert classification using population allele frequencies, cosegregation analyses, disease association studies, or a second-tier test. Of 751 variants that could not be clinically classified by InSiGHT, CADD indicated that 47 variants were worth further study to confirm their putative pathogenicity. We demonstrate CADD is valuable in prioritizing variants in clinically relevant genes for further assessment by expert classification teams.
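The rank correlation quoted in this abstract (Spearman's ρ) can be computed as the Pearson correlation of average ranks. The Python sketch below uses only the standard library; the class labels and scores are toy values invented for illustration, not InSiGHT or CADD data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: ordinal expert classes (1 = benign ... 5 = pathogenic) and scores.
expert_class = [1, 2, 3, 4, 5]
toy_scores = [2.1, 8.4, 12.0, 22.5, 31.0]
print(spearman_rho(expert_class, toy_scores))
```

Because only ranks enter the calculation, the coefficient measures whether higher scores go with more severe classes, without assuming that the relationship is linear.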


Subjects
Computational Biology , DNA Mismatch Repair , Genetic Variation , Molecular Models , Hereditary Nonpolyposis Colorectal Neoplasms/genetics , Genetic Association Studies , High-Throughput Nucleotide Sequencing , Humans , Software
20.
J Am Med Inform Assoc ; 22(1): 65-75, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361575

ABSTRACT

OBJECTIVE: Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure, is arduous and time consuming. MATERIALS AND METHODS: To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. RESULTS: We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. CONCLUSIONS: BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app from http://www.github.com/molgenis, with a demo available at http://www.biobankconnect.org.
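The evaluation measures named in this abstract, precision at rank 1 and recall at rank 10, follow a common pattern that the Python sketch below illustrates for a single query. The definitions are the standard ones and may differ in detail from how the authors aggregated over their 32 desired elements; the element names are invented.

```python
def precision_at_k(ranked, relevant, k):
    """Share of the top-k ranked candidates that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Share of all relevant candidates that appear in the top k."""
    found = {item for item in ranked[:k] if item in relevant}
    return len(found) / len(relevant)


# Hypothetical shortlist for one desired element, best match first.
ranked = ["height_cm", "weight_kg", "body_height", "hip_circumference"]
relevant = {"height_cm", "body_height"}  # matches a curator would accept

print(precision_at_k(ranked, relevant, 1))  # is the top hit relevant?
print(recall_at_k(ranked, relevant, 1))     # how much is found at rank 1?
print(recall_at_k(ranked, relevant, 3))     # how much is found by rank 3?
```

In a full evaluation these per-query values would typically be averaged over all desired elements.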


Subjects
Abstracting and Indexing , Biological Ontologies , Computational Biology , Datasets as Topic , Software , Humans , Systems Integration , User-Computer Interface