Search | PAHO/WHO Uruguay

Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders.

Ferré, Quentin; Chèneby, Jeanne; Puthier, Denis; Capponi, Cécile; Ballester, Benoît.

BMC Bioinformatics ; 22(1): 460, 2021 Sep 25.

Article in English | MEDLINE | ID: mdl-34563116

ABSTRACT

BACKGROUND: Accurate identification of Transcriptional Regulator binding locations is essential for analysis of genomic regions, including Cis Regulatory Elements (CREs). The customary NGS approaches, predominantly ChIP-seq, can be obscured by data anomalies and biases which are difficult to detect without supervision. RESULTS: Here, we develop a method to leverage the usual combinations between many experimental series to mark such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions' representations with multiview convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CREs according to group completeness. It is then applied to the ReMap database's large volume of curated ChIP-seq data. We show that peaks lacking known biological correlators are singled out and less confirmed in real data. We propose normalization approaches useful in interpreting black-box models. CONCLUSION: Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems, and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak.

Subject(s)

Chromatin Immunoprecipitation Sequencing; Genomics; Regulatory Sequences, Nucleic Acid

OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning.

Ferré, Quentin; Capponi, Cécile; Puthier, Denis.

NAR Genom Bioinform ; 3(4): lqab114, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34988437

ABSTRACT

Most epigenetic marks, such as Transcriptional Regulators or histone marks, are biological objects known to work together in n-wise complexes. A suitable way to infer such functional associations between them is to study the overlaps of the corresponding genomic regions. However, the problem of the statistical significance of n-wise overlaps of genomic features is seldom tackled, which prevents rigorous studies of n-wise interactions. We introduce OLOGRAM-MODL, which considers overlaps between n ≥ 2 sets of genomic regions, and computes their statistical mutual enrichment by Monte Carlo fitting of a Negative Binomial distribution, resulting in more finely resolved P-values. An optional machine learning method is proposed to find complexes of interest, using a new itemset mining algorithm based on dictionary learning which is robust to the noise inherent to biological assays. The overall approach is implemented through an easy-to-use command-line interface for workflow integration, and a visual tree-based representation of the results suited for explainability. The viability of the method is experimentally studied using both artificial and biological data. This approach is accessible through the command-line interface of the pygtftk toolkit, available on Bioconda and from https://github.com/dputhier/pygtftk.
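
The idea of fitting a Negative Binomial distribution to Monte Carlo shuffles can be illustrated with a minimal sketch. It is not the OLOGRAM-MODL implementation and does not use the pygtftk API: every function name below is hypothetical, it handles only two region sets on a single chromosome, it shuffles by uniform random placement (an assumed simplification), and it fits the distribution by the method of moments.

```python
import math
import random
from statistics import mean, variance

Region = tuple[int, int]  # half-open interval [start, end) on one chromosome


def count_overlaps(a: list[Region], b: list[Region]) -> int:
    """Number of (a_i, b_j) pairs sharing at least one base pair."""
    return sum(1 for s1, e1 in a for s2, e2 in b if s1 < e2 and s2 < e1)


def shuffle_regions(regions: list[Region], chrom_len: int, rng: random.Random) -> list[Region]:
    """Assumed null model: each region is re-placed uniformly, keeping its length."""
    out = []
    for s, e in regions:
        start = rng.randrange(0, chrom_len - (e - s) + 1)
        out.append((start, start + (e - s)))
    return out


def fit_negative_binomial(samples: list[int]) -> tuple[float, float]:
    """Method-of-moments fit of NB(r, p), with mean r(1-p)/p and variance r(1-p)/p^2."""
    m = max(mean(samples), 1e-9)
    v = variance(samples)
    if v <= m:  # NB needs variance > mean; nudge towards the Poisson limit
        v = m * 1.0001
    return m * m / (v - m), m / v


def nb_log_pmf(k: int, r: float, p: float) -> float:
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


def nb_upper_tail(observed: int, r: float, p: float) -> float:
    """P(X >= observed) for X ~ NB(r, p)."""
    if observed <= 0:
        return 1.0
    if observed <= r * (1 - p) / p:  # at or below the mean: use 1 - CDF
        lower = sum(math.exp(nb_log_pmf(k, r, p)) for k in range(observed))
        return max(0.0, 1.0 - lower)
    total, k = 0.0, observed  # above the mean the terms decrease, so sum the tail
    while True:
        term = math.exp(nb_log_pmf(k, r, p))
        total += term
        if term <= total * 1e-15:
            return total
        k += 1


def overlap_enrichment(a, b, chrom_len, n_shuffles=200, seed=0):
    """Return (observed overlaps, mean overlaps under the null, fitted p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    null = [count_overlaps(a, shuffle_regions(b, chrom_len, rng))
            for _ in range(n_shuffles)]
    r, p = fit_negative_binomial(null)
    return observed, mean(null), nb_upper_tail(observed, r, p)


# Toy data: 20 regions in `a`, each overlapped by a shifted copy in `b`.
a = [(i * 5000, i * 5000 + 100) for i in range(20)]
b = [(s + 50, e + 50) for s, e in a]
observed, expected, p_value = overlap_enrichment(a, b, chrom_len=100_000)
print(observed, round(expected, 2), p_value)
```

The fitted distribution is what allows a p-value smaller than 1/n_shuffles to be reported: an empirical p-value taken directly from 200 shuffles cannot go below 1/200, whereas the tail of the fitted Negative Binomial can.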

ISYMOD: a knowledge warehouse for the identification, assembly and analysis of bacterial integrated systems.

Chabalier, Julie; Capponi, Cécile; Quentin, Yves; Fichant, Gwennaele.

Bioinformatics ; 21(7): 1246-56, 2005 Apr 01.

Article in English | MEDLINE | ID: mdl-15531617

ABSTRACT

MOTIVATION: Complex biological functions emerge from interactions between proteins in stable supra-molecular assemblies and/or through transitory contacts. Most of the time, the protein partners of the assemblies are composed of one or several domains which exhibit different biochemical functions. Thus the study of cellular processes requires the identification of different functional units and their integration in an interaction network; such complexes are referred to as integrated systems. To exploit the growing volume of released data efficiently, automated bioinformatics strategies are needed to identify, reconstruct and model such systems. For that purpose, we have developed a knowledge warehouse dedicated to the representation and acquisition of bacterial integrated systems involved in the exchange of the bacterial cell with its environment. RESULTS: ISYMOD is a knowledge warehouse that consistently integrates in the same environment the data and the methods used for their acquisition. This is achieved through the construction of (1) a domain knowledge base (DKB) devoted to the storage of the knowledge about the systems, their functional specificities, their partners and how they are related and (2) a methodological knowledge base (MKB) which depicts the task layout used to identify and reconstruct functional integrated systems. Instantiation of the DKB is obtained by solving the tasks of the MKB, whereas some tasks need instances of the DKB to be solved. AROM, an object-based knowledge representation system, has been used to design the DKB, and its task manager, AROMTasks, to develop the MKB. In this study, two integrated systems, ABC transporters and two-component systems, both involved in adaptation processes of a bacterial cell to its biotope, have been used to evaluate the feasibility of the approach.

Subject(s)

ATP-Binding Cassette Transporters/metabolism; Algorithms; Databases, Genetic; Escherichia coli/metabolism; Gene Expression Profiling/methods; Gene Expression Regulation, Bacterial/physiology; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; ATP-Binding Cassette Transporters/genetics; Artificial Intelligence; Database Management Systems; Escherichia coli/genetics; Models, Biological; Oligonucleotide Array Sequence Analysis/methods; Systems Integration
