Búsqueda | Portal de Búsqueda de la BVS

MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.

Pang, Chao; van Enckevort, David; de Haan, Mark; Kelpin, Fleur; Jetten, Jonathan; Hendriksen, Dennis; de Boer, Tommy; Charbon, Bart; Winder, Erwin; van der Velde, K Joeri; Doiron, Dany; Fortier, Isabel; Hillege, Hans; Swertz, Morris A.

Bioinformatics ; 32(14): 2176-83, 2016 07 15.

Artículo en Inglés | MEDLINE | ID: mdl-27153686

RESUMEN

MOTIVATION: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. RESULTS: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human-experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. AVAILABILITY AND IMPLEMENTATION: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect CONTACT: : m.a.swertz@rug.nl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Asunto(s)

Bancos de Muestras Biológicas , Biología Computacional/métodos , Fenotipo , Programas Informáticos , Algoritmos , Ontologías Biológicas , Humanos

Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration.

Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A.

BMC Res Notes ; 7: 901, 2014 Dec 11.

Artículo en Inglés | MEDLINE | ID: mdl-25495213

RESUMEN

BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference. FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the used strands. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 & Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used as Java 'Genotype-IO' API. All software is open source under license LGPLv3 and available from http://www.molgenis.org/systemsgenetics. CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.

Asunto(s)

Biología Computacional/métodos , Estudio de Asociación del Genoma Completo/estadística & datos numéricos , Genómica/estadística & datos numéricos , Polimorfismo de Nucleótido Simple , Genotipo , Humanos , Internet , Desequilibrio de Ligamiento , Reproducibilidad de los Resultados

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA