Results 1 - 8 of 8
1.
Bioinformatics ; 35(6): 1076-1078, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30165396

ABSTRACT

MOTIVATION: The volume and complexity of biological data are increasing rapidly. Many clinical professionals and biomedical researchers without a bioinformatics background are generating big '-omics' data, but do not always have the tools to manage, process or publicly share these data. RESULTS: Here we present MOLGENIS Research, an open-source web application to collect, manage, analyze, visualize and share large and complex biomedical datasets, without the need for advanced bioinformatics skills. AVAILABILITY AND IMPLEMENTATION: MOLGENIS Research is freely available (open source software). It can be installed from source code (see http://github.com/molgenis), downloaded as a precompiled WAR file (for your own server), set up inside a Docker container (see http://molgenis.github.io), or requested as a Software-as-a-Service subscription. For a public demo instance and complete installation instructions, see http://molgenis.org/research.


Subject(s)
Computational Biology , Software , Algorithms , Genome , Genomics
2.
Hum Mutat ; 40(12): 2230-2238, 2019 12.
Article in English | MEDLINE | ID: mdl-31433103

ABSTRACT

Each year diagnostic laboratories in the Netherlands profile thousands of individuals for heritable disease using next-generation sequencing (NGS). This requires pathogenicity classification of millions of DNA variants on the standard 5-tier scale. To reduce time spent on data interpretation and increase data quality and reliability, the nine Dutch labs decided to publicly share their classifications. Variant classifications of nearly 100,000 unique variants were catalogued and compared in a centralized MOLGENIS database. Variants classified by more than one center were labeled as "consensus" when classifications agreed, and shared internationally with LOVD and ClinVar. When classifications opposed (LB/B vs. LP/P), they were labeled "conflicting", while other nonconsensus observations were labeled "no consensus". We assessed our classifications using the InterVar software to compare to ACMG 2015 guidelines, showing 99.7% overall consistency with only 0.3% discrepancies. Differences in classifications between Dutch labs or between Dutch labs and ACMG were mainly present in genes with low penetrance or for late onset disorders and highlight limitations of the current 5-tier classification system. The data sharing boosted the quality of DNA diagnostics in Dutch labs, an initiative we hope will be followed internationally. Recently, a positive match with a case from outside our consortium resulted in a more definite disease diagnosis.
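The labeling rule described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the consortium's implementation: the tier codes B, LB, VUS, LP and P stand for the standard 5-tier scale, "agreed" is assumed here to mean that every lab reported the identical tier, and the function name and the "single lab" label for variants seen by only one center are hypothetical.

```python
from typing import Iterable

# Tier codes of the 5-tier scale: benign, likely benign, VUS, likely pathogenic, pathogenic.
BENIGN = {"B", "LB"}
PATHOGENIC = {"LP", "P"}


def consensus_label(classifications: Iterable[str]) -> str:
    """Label one variant from the tiers reported by the labs that classified it.

    Assumption: "consensus" requires identical tiers from all labs.
    "single lab" is a hypothetical label for variants seen by one center only.
    """
    labels = [c.strip().upper() for c in classifications]
    if len(labels) < 2:
        return "single lab"
    distinct = set(labels)
    if len(distinct) == 1:
        return "consensus"
    # Opposing classifications: at least one (likely) benign and one (likely) pathogenic.
    if distinct & BENIGN and distinct & PATHOGENIC:
        return "conflicting"
    return "no consensus"


if __name__ == "__main__":
    print(consensus_label(["LP", "LP", "LP"]))
    print(consensus_label(["LB", "P"]))
    print(consensus_label(["VUS", "LP"]))
```

Under the strict-identity assumption, a variant reported as B by one lab and LB by another falls into "no consensus"; a looser rule that merges B with LB and LP with P would label it "consensus" instead.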


Subject(s)
Genetic Diseases, Inborn/diagnosis , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Information Dissemination/methods , Data Accuracy , Databases, Genetic , Genetic Diseases, Inborn/genetics , Guidelines as Topic , Humans , Laboratories , Netherlands , Sequence Analysis, DNA
3.
Bioinformatics ; 33(22): 3627-3634, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-29036577

ABSTRACT

MOTIVATION: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. RESULTS: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers upload their desired study variables; BiobankUniverse then automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks/data elements matching their research. They can also curate matches and define personalized data-universes. AVAILABILITY AND IMPLEMENTATION: BiobankUniverse is available at http://biobankuniverse.com or can be downloaded as part of the open source MOLGENIS suite at http://github.com/molgenis/molgenis. CONTACT: m.a.swertz@rug.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Databases, Factual , Software , Algorithms
4.
Bioinformatics ; 32(14): 2176-83, 2016 07 15.
Article in English | MEDLINE | ID: mdl-27153686

ABSTRACT

MOTIVATION: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. RESULTS: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. AVAILABILITY AND IMPLEMENTATION: Source code, binaries and documentation are available as open source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. CONTACT: m.a.swertz@rug.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
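The three kinds of transformation named in this abstract (unit conversion, categorical value matching, and a derived value such as BMI) can be illustrated with a small hand-written Python sketch. It is not output of MOLGENIS/connect and does not use its algorithm format; the source attribute names (`height_cm`, `weight_kg`, `sex`), the target fields, and the category mapping are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from locally coded source values to target categories.
SEX_MAP: Dict[str, str] = {"m": "male", "male": "male", "f": "female", "female": "female"}


def height_cm_to_m(height_cm: float) -> float:
    """Unit conversion: centimetres to metres."""
    return height_cm / 100.0


def map_category(value: object, mapping: Dict[str, str]) -> Optional[str]:
    """Categorical value matching; returns None when the value is not in the mapping."""
    return mapping.get(str(value).strip().lower())


def bmi(weight_kg: float, height_m: float) -> float:
    """Derived value: body mass index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def transform(record: Dict[str, object]) -> Dict[str, object]:
    """Map one source record (hypothetical attribute names) onto a common target schema."""
    height_m = height_cm_to_m(float(record["height_cm"]))
    return {
        "sex": map_category(record["sex"], SEX_MAP),
        "bmi": round(bmi(float(record["weight_kg"]), height_m), 1),
    }


if __name__ == "__main__":
    print(transform({"height_cm": 180, "weight_kg": 81, "sex": "M"}))
```

Returning `None` for unmapped categories, rather than guessing, leaves those values visible for the manual curation step that the semi-automatic workflow implies.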


Subject(s)
Biological Specimen Banks , Computational Biology/methods , Phenotype , Software , Algorithms , Biological Ontologies , Humans
5.
Hum Mutat ; 36(4): 403-10, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25676813

ABSTRACT

Arrhythmogenic cardiomyopathy (ACM) is an inherited cardiac disease characterized by myocardial atrophy, fibro-fatty replacement, and a high risk of ventricular arrhythmias that lead to sudden death. In 2009, genetic data from 57 publications were collected in the arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVD/C) Genetic Variants Database (freeware available at http://www.arvcdatabase.info), which comprised 481 variants in eight ACM-associated genes. In recent years, deep genetic sequencing has increased our knowledge of the genetics of ACM, revealing a large spectrum of nucleotide variations for which pathogenicity needs to be assessed. As of April 20, 2014, we have updated the ARVD/C database to contain more than 1,400 variants in 12 ACM-related genes (PKP2, DSP, DSC2, DSG2, JUP, TGFB3, TMEM43, LMNA, DES, TTN, PLN, CTNNA3) as reported in more than 160 references. Of these, only 411 nucleotide variants have been reported as pathogenic, whereas the significance of the other approximately 1,000 variants is still unknown. This comprehensive collection of ACM genetic data represents a valuable source of information on the spectrum of ACM-associated genes and aims to facilitate the interpretation of genetic data and genetic counseling.


Subject(s)
Arrhythmogenic Right Ventricular Dysplasia/genetics , Databases, Genetic , Genetic Variation , Desmosomes/genetics , Genetic Association Studies , Humans , Registries
6.
Int J Neonatal Screen ; 10(1)2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38535124

ABSTRACT

In this study, we compare next-generation sequencing (NGS) approaches (targeted panel (tNGS), whole exome sequencing (WES), and whole genome sequencing (WGS)) for application in newborn screening (NBS). DNA was extracted from dried blood spots (DBS) from 50 patients with genetically confirmed inherited metabolic disorders (IMDs) and 50 control samples. One hundred IMD-related genes were analyzed. Two data-filtering strategies were applied: one to detect only (likely) pathogenic ((L)P) variants, and one to detect (L)P variants in combination with variants of unknown significance (VUS). The variants were filtered and interpreted, defining true/false positives (TP/FP) and true/false negatives (TN/FN). The variant filtering strategies were assessed in a background cohort (BC) of 4833 individuals. Reliable results were obtained within 5 days. TP results (47 patient samples) for tNGS, WES, and WGS were 33, 31, and 30, respectively, using the (L)P filtering, and 40, 40, and 38, respectively, when including VUS. FN results were 11, 13, and 14, respectively, when excluding VUS, and 4, 4, and 6, respectively, when including VUS. The remaining FN were mainly samples with a homozygous VUS. All controls were TN. Three BC individuals showed a homozygous (L)P variant, all related to a variable, mild phenotype. The use of NGS-based workflows in NBS seems promising, although more knowledge of data handling, automated variant interpretation, and costs is needed before implementation.

7.
Genome Med ; 12(1): 75, 2020 08 24.
Article in English | MEDLINE | ID: mdl-32831124

ABSTRACT

Exome sequencing is now mainstream in clinical practice. However, identification of pathogenic Mendelian variants remains time-consuming, in part because the limited accuracy of current computational prediction methods requires manual classification by experts. Here we introduce CAPICE, a new machine-learning-based method for prioritizing pathogenic variants, including SNVs and short InDels. CAPICE outperforms the best general (CADD, GAVIN) and consequence-type-specific (REVEL, ClinPred) computational prediction methods, for both rare and ultra-rare variants. CAPICE is easily added to diagnostic pipelines as a pre-computed score file or command-line software, or via the online MOLGENIS web service with API. CAPICE is free and open source (LGPLv3) and can be downloaded at https://github.com/molgenis/capice .


Subject(s)
Computational Biology/methods , Exome , Genetic Variation , Software , Gene Frequency , Genetic Association Studies/methods , Humans , INDEL Mutation , Machine Learning , Molecular Diagnostic Techniques , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , ROC Curve , Reproducibility of Results
8.
Article in English | MEDLINE | ID: mdl-26385205

ABSTRACT

There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching of original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology). This data curation is usually a time-consuming process performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon-delimited format) to a target coding system (uploaded in Excel spreadsheet, OWL ontology web language or OBO open biomedical ontologies format). It then semi-automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA's applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects.
Database URL: http://molgenis.org/sorta or as an open source download from http://www.molgenis.org/wiki/SORTA.
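The n-gram part of the shortlisting step described in this abstract can be illustrated with a short Python sketch. It is not SORTA's code and leaves out the Lucene component and the learning from expert choices: the similarity measure used here (a Dice coefficient over character bigrams) is an assumption chosen for simplicity, and the function names, codes and labels in the example are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def ngrams(text: str, n: int = 2) -> Set[str]:
    """Character n-grams of a lowercased string, padded with start/end markers."""
    padded = f"^{text.strip().lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over n-gram sets: 2 * |A & B| / (|A| + |B|)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def shortlist(value: str, codes: Dict[str, str], top: int = 3) -> List[Tuple[float, str, str]]:
    """Rank candidate codes for one free-text value, best match first."""
    scored = [(dice(value, label), code, label) for code, label in codes.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Hypothetical target coding system: code -> label.
    target = {"A1": "running", "A2": "swimming", "A3": "cycling"}
    for score, code, label in shortlist("runing", target):
        print(f"{code}\t{label}\t{score:.2f}")
```

Character n-grams tolerate the spelling variants common in free-text entries, which is why a misspelled value such as "runing" still ranks its intended label first; a human curator would then confirm or reject the top candidates.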


Subject(s)
Biological Ontologies , Data Curation/methods , Databases, Factual , Knowledge Bases , Software , Animals , Humans