Pesquisa | BVS Doenças Infecciosas e Parasitárias

MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.

Pang, Chao; van Enckevort, David; de Haan, Mark; Kelpin, Fleur; Jetten, Jonathan; Hendriksen, Dennis; de Boer, Tommy; Charbon, Bart; Winder, Erwin; van der Velde, K Joeri; Doiron, Dany; Fortier, Isabel; Hillege, Hans; Swertz, Morris A.

Bioinformatics ; 32(14): 2176-83, 2016 07 15.

Artigo em Inglês | MEDLINE | ID: mdl-27153686

RESUMO

MOTIVATION: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. RESULTS: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human-experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. AVAILABILITY AND IMPLEMENTATION: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect CONTACT: : m.a.swertz@rug.nl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Bancos de Espécimes Biológicos , Biologia Computacional/métodos , Fenótipo , Software , Algoritmos , Ontologias Biológicas , Humanos

Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration.

Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A.

BMC Res Notes ; 7: 901, 2014 Dec 11.

Artigo em Inglês | MEDLINE | ID: mdl-25495213

RESUMO

BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference. FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the used strands. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 & Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used as Java 'Genotype-IO' API. All software is open source under license LGPLv3 and available from http://www.molgenis.org/systemsgenetics. CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.

Assuntos

Biologia Computacional/métodos , Estudo de Associação Genômica Ampla/estatística & dados numéricos , Genômica/estatística & dados numéricos , Polimorfismo de Nucleotídeo Único , Genótipo , Humanos , Internet , Desequilíbrio de Ligação , Reprodutibilidade dos Testes

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA