A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data.

Hulot, Audrey; Laloë, Denis; Jaffrézic, Florence

Affiliation

Hulot A; Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France. audrey.hulot@outlook.fr.
Laloë D; Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, 75005, Paris, France.
Jaffrézic F; Université Paris-Saclay, UVSQ, Inserm, Infection et inflammation, 78180, Montigny-le-Bretonneux, France.

BMC Bioinformatics; 22(1): 392, 2021 Aug 04.

Article in English | MEDLINE | ID: mdl-34348641

ABSTRACT

BACKGROUND:

Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous, and it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps in studying grouping patterns and interactions between entities. The question we aim to answer in this paper is how to integrate such representations.

RESULTS:

To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies essentially on two steps: the first step projects the representations into a common coordinate system; the second step then uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step: the projection step is achieved by retrieving a distance matrix for each representation and then applying multidimensional scaling (MDS) to obtain a new set of coordinates from all the pairwise distances. The integration step is then achieved by applying a multiple factor analysis (MFA) to the multiple tables of new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, which are traditionally used to answer the same question.
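The two-step procedure can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: trees are assumed to be given as linkage matrices in the `scipy.cluster.hierarchy.linkage` convention and turned into cophenetic distances, networks are assumed to be unweighted adjacency matrices turned into shortest-path distances, MDS is the classical (Torgerson) variant, and MFA is reduced to its core step (weight each table by the inverse of its first singular value, then run a global PCA). All function names are hypothetical, and the RV coefficient is added only as one common way to compare two projected tables.

```python
import numpy as np


def cophenetic_from_linkage(Z, n):
    """Cophenetic distances from a linkage matrix (scipy convention assumed).

    Row k of Z is (cluster_a, cluster_b, merge_height, size); the merged
    cluster receives index n + k. Two leaves are as far apart as the height
    at which they first end up in the same cluster.
    """
    members = {i: [i] for i in range(n)}
    D = np.zeros((n, n))
    for k, (a, b, height, _size) in enumerate(Z):
        left, right = members.pop(int(a)), members.pop(int(b))
        for i in left:
            for j in right:
                D[i, j] = D[j, i] = height
        members[n + k] = left + right
    return D


def shortest_path_distances(adj):
    """All-pairs shortest paths (Floyd-Warshall); 0 off the diagonal = no edge.

    The network is assumed to be connected, otherwise some distances stay inf.
    """
    D = np.where(np.asarray(adj, dtype=float) > 0, adj, np.inf).astype(float)
    np.fill_diagonal(D, 0.0)
    for k in range(len(D)):
        D = np.minimum(D, D[:, [k]] + D[[k], :])  # allow paths through node k
    return D


def classical_mds(D, tol=1e-9):
    """Torgerson MDS: coordinates whose Euclidean distances approximate D."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centred Gram matrix
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1]               # largest eigenvalues first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > tol                            # drop null/negative axes
    return eigvec[:, keep] * np.sqrt(eigval[keep])


def mfa(tables):
    """Core of MFA: centre each table, divide it by its first singular value,
    concatenate the tables column-wise and run a PCA via SVD."""
    weighted = []
    for X in tables:
        Xc = X - X.mean(axis=0)
        weighted.append(Xc / np.linalg.svd(Xc, compute_uv=False)[0])
    U, s, _Vt = np.linalg.svd(np.hstack(weighted), full_matrices=False)
    return U * s, s ** 2 / np.sum(s ** 2)          # global scores, inertia share


def rv_coefficient(X, Y):
    """RV coefficient between two tables describing the same individuals."""
    Sx = (X - X.mean(axis=0)) @ (X - X.mean(axis=0)).T
    Sy = (Y - Y.mean(axis=0)) @ (Y - Y.mean(axis=0)).T
    return np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))


# Toy example on 4 entities: a hierarchical clustering and a path network.
Z = np.array([[0, 1, 1.0, 2], [2, 3, 2.0, 2], [4, 5, 3.0, 4]])
tree_D = cophenetic_from_linkage(Z, 4)
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
net_D = shortest_path_distances(path)

tables = [classical_mds(tree_D), classical_mds(net_D)]  # projection step
scores, inertia = mfa(tables)                           # integration step
print(scores.shape, rv_coefficient(*tables))
```

Because both toy distance matrices happen to be Euclidean (an ultrametric and distances along a line), classical MDS reproduces them exactly here; for non-Euclidean distances the negative eigenvalues are simply discarded in this sketch, which is only one of several possible choices.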

CONCLUSION:

Our approach is evaluated on simulated data and used to analyze two real-world data sets: first, we compare several clusterings of different cell types obtained from a single-cell transcriptomics data set of mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.

Subjects

Computational Biology; Neoplasm Recurrence, Local; Animals; Cluster Analysis; Computer Simulation; Humans; Mice; Proteins

Keywords

Clustering; Data integration; MDS; MFA; Network

Database: MEDLINE | Main subject: Computational Biology / Neoplasm Recurrence, Local | Study type: Prognostic_studies | Limits: Animals / Humans | Language: English | Publication year: 2021 | Document type: Article