
simpiTB - a pipeline designed to extract meaningful information from whole genome sequencing data of Mycobacterium tuberculosis complex, allows to combine genomic, phylogenetic and clustering analyses in existing SITVIT databases.

Couvin, David; Stattner, Erick; Segretier, Wilfried; Cazenave, Damien; Rastogi, Nalin.

Infect Genet Evol ; 113: 105466, 2023 09.

Article En | MEDLINE | ID: mdl-37331497

Data obtained from new sequencing technologies are evolving rapidly, leading to the development of specific bioinformatic tools, pipelines and software. Several algorithms and tools are now available for better identification and description of Mycobacterium tuberculosis complex (MTBC) isolates worldwide. Our approach consists of applying existing methods to analyze DNA sequencing data (from FASTA or FASTQ files) and attempting to extract meaningful information that would facilitate the identification, understanding and management of MTBC isolates (taking into account both whole genome sequencing and classical genotyping data). The aim of this study is to propose an analysis pipeline that could simplify MTBC data analysis by providing different ways to interpret genomic or genotyping information based on existing tools. Furthermore, we propose a "reconciledTB" list that links results obtained directly from whole genome sequencing (WGS) data with results obtained from classical genotyping analysis (data inferred from SpoTyping and MIRUReader). The generated data visualization graphics and trees provide additional elements to better understand and establish associations in information-overlap analyses. Additionally, comparison between data entered in an international genotyping database (SITVITEXTEND) and the data obtained from the pipeline not only provides meaningful information but also suggests that simpiTB could be suitable for integrating new data into specific TB genotyping databases.

Mycobacterium tuberculosis, Tuberculosis, Humans, Tuberculosis/microbiology, Phylogeny, Genomics, Whole Genome Sequencing/methods
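The "reconciledTB" idea of cross-checking WGS-derived results against classical genotyping results can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the `Isolate` record, the `reconcile` function and the simplified family-to-lineage table (EAI, Beijing, CAS and the Euro-American families LAM, Haarlem and X mapped to lineages 1 to 4) are hypothetical and are not taken from simpiTB, whose actual reconciledTB list may be organized differently.

```python
from dataclasses import dataclass

# Hypothetical, simplified spoligotype-family-to-lineage table used only for
# this sketch; it is NOT the reconciledTB list shipped with simpiTB.
FAMILY_TO_LINEAGE = {
    "EAI": "L1",
    "Beijing": "L2",
    "CAS": "L3",
    "LAM": "L4",
    "Haarlem": "L4",
    "X": "L4",
}


@dataclass
class Isolate:
    name: str
    wgs_lineage: str   # lineage called directly from WGS data
    spol_family: str   # family inferred from an (in silico) spoligotype


def reconcile(isolates):
    """Return (name, wgs_lineage, expected_lineage, status) for each isolate."""
    rows = []
    for iso in isolates:
        expected = FAMILY_TO_LINEAGE.get(iso.spol_family)
        if expected is None:
            status = "unresolved"   # family not covered by the table
        elif expected == iso.wgs_lineage:
            status = "concordant"
        else:
            status = "discordant"
        rows.append((iso.name, iso.wgs_lineage, expected, status))
    return rows


samples = [
    Isolate("S1", "L2", "Beijing"),
    Isolate("S2", "L4", "EAI"),
    Isolate("S3", "L4", "T"),
]
for row in reconcile(samples):
    print(row)
```

Reporting "unresolved" separately from "discordant" keeps families that the table does not cover (here the heterogeneous T family) from being counted as genuine conflicts between the two data sources.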

KaruBioNet: a network and discussion group for a better collaboration and structuring of bioinformatics in Guadeloupe (French West Indies).

Couvin, David; Dereeper, Alexis; Meyer, Damien F; Noroy, Christophe; Gaete, Stanie; Bhakkan, Bernard; Poullet, Nausicaa; Gaspard, Sarra; Bezault, Etienne; Marcelino, Isabel; Pruneau, Ludovic; Segretier, Wilfried; Stattner, Erick; Cazenave, Damien; Garnier, Maëlle; Pot, Matthieu; Tressières, Benoît; Deloumeaux, Jacqueline; Breurec, Sébastien; Ferdinand, Séverine; Gonzalez-Rizzo, Silvina; Reynaud, Yann.

Bioinform Adv ; 2(1): vbac010, 2022.

Article En | MEDLINE | ID: mdl-36699379

Summary: Sequencing and other biological data are now more widely available and at a lower cost. Shared tools and strategies are needed to analyze the huge amount of heterogeneous data generated by many research teams and devices. Bioinformatics is a growing field in the global scientific community. This multidisciplinary field provides a large number of tools and methods that can be used to conduct scientific studies more strategically. Coordinated actions and collaborations are needed to find more innovative and accurate methods for a better understanding of real-life data. A wide variety of organizations contribute to KaruBioNet in Guadeloupe (French West Indies), a Caribbean archipelago. The purpose of this group is to foster collaboration and mutual aid among people from different disciplines using a 'one health' approach, for a better understanding and surveillance of human, plant and animal health and diseases. The KaruBioNet network particularly aims to help researchers in their studies related to 'omics' data, as well as with more general aspects of biological data analysis. This transdisciplinary network is a platform for discussion, sharing, training and support among scientists interested in bioinformatics and related fields. Starting from a small archipelago in the Caribbean, we envision facilitating exchanges with other Caribbean partners in the future, knowing that the Caribbean is a region with significant biodiversity that should be preserved and protected. Joining forces with other Caribbean countries or territories would strengthen the impact of scientific collaboration in the region. Information related to this network can be found at: http://www.pasteur-guadeloupe.fr/karubionet.html. Furthermore, a dedicated 'Galaxy KaruBioNet' platform is available at: http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html.
Availability and implementation: Information about KaruBioNet is available at: http://www.pasteur-guadeloupe.fr/karubionet.html. Contact: dcouvin@pasteur-guadeloupe.fr. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

DISGROU: an algorithm for discontinuous subgroup discovery.

Eugenie, Reynald; Stattner, Erick.

PeerJ Comput Sci ; 7: e512, 2021.

Article En | MEDLINE | ID: mdl-33987462

In this paper, we focus on the problem of searching for subgroups in numerical data. This approach aims to identify subsets of objects, called subgroups, that exhibit interesting characteristics compared to the average, according to a quality measure calculated on a target variable. In this article, we present DISGROU, a new approach that identifies subgroups whose attribute intervals may be discontinuous. Unlike the main algorithms in the field, the originality of our proposal lies in the way it breaks down attribute intervals during the subgroup search phase. The basic assumption of our approach is that allowing the attribute ranges that define the groups to be disjoint can improve the quality of the identified subgroups. Indeed, traditional methods in the field search for subgroups only over continuous intervals, which yields subgroups defined over wider intervals that contain irrelevant objects degrading the quality function. Another advantage of our approach is that it does not require prior discretization of the attributes, since it works directly on the numerical attributes. The efficiency of our proposal is demonstrated first by comparing its results with those of two reference algorithms in the field, and then by applying it to a case study.
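The basic assumption behind DISGROU, that disjoint attribute ranges can yield higher-quality subgroups than a single continuous interval, can be checked on a toy example. The sketch below is not the DISGROU algorithm: it only scores two hand-picked subgroups with a mean-shift quality measure, (|S|/|D|)^a × (mean(S) − mean(D)), chosen here as an assumption because it is a common measure for numerical targets. The data, the exponent a = 0.5 and the helper names `quality` and `select` are hypothetical.

```python
from statistics import mean


def quality(subgroup_y, all_y, a=0.5):
    """Mean-shift quality: (|S|/|D|)^a * (mean(S) - mean(D)).

    An assumed, commonly used measure for numerical targets; the measure
    used by DISGROU may differ.
    """
    if not subgroup_y:
        return float("-inf")
    coverage = len(subgroup_y) / len(all_y)
    return coverage ** a * (mean(subgroup_y) - mean(all_y))


def select(xs, ys, intervals):
    """Target values of objects whose attribute lies in ANY closed interval.

    Several (low, high) pairs describe a discontinuous condition; the raw
    numerical attribute is used directly, with no prior discretization.
    """
    return [y for x, y in zip(xs, ys)
            if any(lo <= x <= hi for lo, hi in intervals)]


# Hypothetical toy data: one numerical attribute x and a numerical target y.
xs = list(range(1, 11))
ys = [10, 10, 0, 0, 10, 10, 0, 0, 0, 0]

continuous = quality(select(xs, ys, [(1, 6)]), ys)
disjoint = quality(select(xs, ys, [(1, 2), (5, 6)]), ys)
print(f"[1, 6]          -> {continuous:.3f}")
print(f"[1, 2] U [5, 6] -> {disjoint:.3f}")
```

On this toy data the union [1, 2] U [5, 6] leaves out the low-target objects at x = 3 and x = 4, so it scores higher than the covering interval [1, 6] even though it covers fewer objects.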

Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families.

Couvin, David; Segretier, Wilfried; Stattner, Erick; Rastogi, Nalin.

Database (Oxford) ; 2020, 2020 12 15.

Article En | MEDLINE | ID: mdl-33320180

Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for identifying MTBC lineages using classical genotyping methods, such as mycobacterial interspersed repetitive units-variable number of tandem DNA repeats (MIRU-VNTR) and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned either by manual curation/expertise or using an in-house algorithm. In this study, we present two complementary data-driven approaches allowing fast and precise family prediction from spoligotyping patterns. The first is based on data transformation and the use of decision tree classifiers. The second, in contrast, searches for a set of simple rules using binary masks through a specifically designed evolutionary algorithm. Comparison with the three main approaches in the field highlighted the good performance of our contributions and a significant runtime gain. Finally, we propose the 'SpolLineages' software tool (https://github.com/dcouvin/SpolLineages), which implements these approaches for identifying MTBC spoligotype families.

Mycobacterium tuberculosis, Tuberculosis, Computational Biology, Genotype, Genotyping Techniques, Humans, Mycobacterium tuberculosis/genetics, Software, Tuberculosis/genetics
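Spoligotype patterns are commonly written either as 43 binary spacer calls or as a 15-digit octal code (14 groups of three spacers, followed by spacer 43 on its own). The Python sketch below shows that data transformation together with a rule expressed as a binary mask, in the spirit of the mask-based approach described in the SpolLineages abstract. The helper names (`to_octal`, `to_int`, `spacer_mask`, `matches`) and the example rule (spacers 1 to 34 all absent, a simplified Beijing-like signature) are hypothetical illustrations, not rules or code taken from SpolLineages.

```python
def to_octal(pattern: str) -> str:
    """Convert a 43-character spoligotype ('1' = spacer present) to its
    15-digit octal code: 14 groups of three spacers, then spacer 43 alone."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters of '0'/'1'")
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    digits.append(pattern[42])
    return "".join(digits)


def to_int(pattern: str) -> int:
    """Encode the pattern as an integer; spacer 1 is the most significant bit."""
    return int(pattern, 2)


def spacer_mask(first: int, last: int) -> int:
    """Bit mask selecting spacers first..last (1-based, inclusive)."""
    mask = 0
    for s in range(first, last + 1):
        mask |= 1 << (43 - s)   # spacer s sits at bit 43 - s
    return mask


def matches(pattern: int, mask: int, expected: int) -> bool:
    """Binary-mask rule: the spacers selected by `mask` must equal `expected`."""
    return (pattern & mask) == expected


# Hypothetical rule: spacers 1-34 absent (illustrative only).
rule_mask, rule_value = spacer_mask(1, 34), 0

beijing_like = "0" * 34 + "1" * 9
print(to_octal(beijing_like))                                # 000000000003771
print(matches(to_int(beijing_like), rule_mask, rule_value))  # True
print(matches(to_int("1" * 43), rule_mask, rule_value))      # False
```

Testing a rule costs one AND and one comparison per pattern, which is one plausible reason why mask-based rules can be evaluated quickly over large databases; the evolutionary search over candidate masks that the abstract describes is not attempted here.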