Search | BVS Aleitamento Materno

Insane in the vembrane: filtering and transforming VCF/BCF files.

Hartmann, Till; Schröder, Christopher; Kuthe, Elias; Lähnemann, David; Köster, Johannes.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36519840

ABSTRACT

SUMMARY: We present vembrane as a command line variant call format (VCF)/binary call format (BCF) filtering tool that consolidates and extends the filtering functionality of previous software to meet any imaginable filtering use case. Vembrane exposes the VCF/BCF file type specification and its unofficial extensions by the annotation tools VEP and SnpEff as Python data structures. vembrane filter enables filtration by Python expressions, requiring only basic knowledge of the Python programming language. vembrane table allows users to generate tables from subsets of annotations or functions thereof. Finally, vembrane is fast because it uses pysam and relies on lazy evaluation. AVAILABILITY AND IMPLEMENTATION: Source code and installation instructions are available at github.com/vembrane/vembrane (doi: 10.5281/zenodo.7003981). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
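
The idea of filtering variant records with a Python expression, evaluated lazily, can be illustrated with a minimal sketch. This is not vembrane's implementation or API: the helper names `parse_records` and `filter_records`, the simplified tab-separated parsing, and the sample records are all assumptions made for illustration, and only the standard library is used instead of pysam.

```python
from typing import Dict, Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Lazily yield one dict per VCF data line (hypothetical, simplified parser)."""
    for line in lines:
        # Skip meta-information and header lines, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")[:8]
        chrom, pos, _id, ref, alt, qual, _filter, info = fields
        info_map: Dict[str, object] = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            if not value:
                # INFO flags carry no value.
                info_map[key] = True
            else:
                try:
                    info_map[key] = float(value)
                except ValueError:
                    info_map[key] = value
        yield {
            "CHROM": chrom,
            "POS": int(pos),
            "REF": ref,
            "ALT": alt,
            "QUAL": None if qual == "." else float(qual),
            "INFO": info_map,
        }


def filter_records(lines: Iterable[str], expression: str) -> Iterator[Dict[str, object]]:
    """Yield only the records for which the Python expression is truthy."""
    code = compile(expression, "<filter>", "eval")
    for record in parse_records(lines):
        # The record's fields act as local variables of the expression.
        # Emptying __builtins__ limits the namespace; it is NOT a security sandbox.
        if eval(code, {"__builtins__": {}}, record):
            yield record


# Made-up sample data for illustration only.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=20",
    "chr1\t200\t.\tC\tT\t10\tPASS\tDP=35",
    "chr2\t300\t.\tG\tA\t60\tPASS\tDP=5",
]
kept = list(filter_records(vcf_lines, 'QUAL >= 30 and INFO["DP"] > 10'))
print([record["POS"] for record in kept])
```

Because both functions are generators, no record is parsed or evaluated until the consumer asks for it, which is the sense of lazy evaluation assumed in this sketch.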

Subjects

Genetic Variation , Mental Disorders , Humans , Software , Programming Languages

A detailed comparison of analysis processes for MCC-IMS data in disease classification-Automated methods can replace manual peak annotations.

Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven; Rahnenführer, Jörg.

PLoS One ; 12(9): e0184321, 2017.

Article in English | MEDLINE | ID: mdl-28910313

ABSTRACT

MOTIVATION: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column-ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. METHOD: We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. RESULTS: The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology.
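
To make the peak identification step more concrete, here is a minimal sketch of a generic local-maxima peak picker on a two-dimensional intensity matrix. It is not the authors' LM implementation, whose details the abstract does not give: the function name `local_maxima`, the 8-neighbour comparison, the intensity threshold, and the toy matrix are assumptions for illustration, with rows and columns standing in for the two measurement axes.

```python
from typing import List, Tuple


def local_maxima(matrix: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) positions that exceed all of their neighbours.

    A cell counts as a peak if its value is at least `threshold` and strictly
    greater than every cell in its surrounding 3x3 window (hypothetical rule).
    """
    peaks: List[Tuple[int, int]] = []
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    for r in range(rows):
        for c in range(cols):
            value = matrix[r][c]
            if value < threshold:
                continue
            # Collect the up to eight neighbours, clipping at the matrix border.
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Made-up intensities for illustration only.
spectrum = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.4],
    [0.0, 0.1, 0.4, 0.8],
]
peaks = local_maxima(spectrum, threshold=0.5)
print(peaks)
```

In a full pipeline of the kind the abstract describes, positions found this way would then be clustered across measurements and passed to a classifier; those steps are left out here.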

Subjects

Automation, Laboratory/methods , Data Curation , Models, Theoretical , Spectrum Analysis , Breath Tests/instrumentation , Breath Tests/methods , Humans , Spectrum Analysis/instrumentation , Spectrum Analysis/methods
