Systematic comparison of ranking aggregation methods for gene lists in experimental results.

Wang, Bo; Law, Andy; Regan, Tim; Parkinson, Nicholas; Cole, Joby; Russell, Clark D; Dockrell, David H; Gutmann, Michael U; Baillie, J Kenneth

Affiliation

Wang B; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
Law A; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
Regan T; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
Parkinson N; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
Cole J; University of Sheffield, Sheffield S10 2NT, UK.
Russell CD; Centre for Inflammation Research, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK.
Dockrell DH; Centre for Inflammation Research, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK.
Gutmann MU; School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK.
Baillie JK; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.

Bioinformatics; 38(21): 4927-4933, 2022 Oct 31.

Article in English | MEDLINE | ID: mdl-36094347

ABSTRACT

MOTIVATION: A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Because the properties of a dataset can influence the performance of an algorithm, a ranking aggregation method should be evaluated on the relevant type of data before it is used. Such evaluation on gene lists is usually based on simulated data because real data lack a known truth. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.

RESULTS: In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore how the aggregation methods perform under scenarios that emulate real genomic data, with varying heterogeneity of quality, varying noise levels and a mix of ranked and unranked lists drawn from 20 000 possible entities. In addition to the evaluation with simulated data, a comparison was performed using real genomic data on the SARS-CoV-2 virus, cancer (non-small cell lung cancer) and bacteria (macrophage apoptosis). We summarize the results of our evaluation in a simple flowchart for selecting a ranking aggregation method, and in an automated implementation that uses the meta-analysis by information content (MAIC) algorithm to infer heterogeneity of data quality across input datasets.

AVAILABILITY AND IMPLEMENTATION: The code for generating simulated data and running the edited versions of the algorithms is available at https://github.com/baillielab/comparison_of_RA_methods. Code to perform an optimal selection of methods based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded here: https://github.com/baillielab/maic. An online service for running MAIC is available at https://baillielab.net/maic.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
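
As a rough illustration of what "ranking aggregation" over a mix of ranked and unranked gene lists can look like, the sketch below uses a simple Borda-style score. It is not the MAIC algorithm and is not taken from the repositories above; the function name `borda_aggregate`, the tie-handling rule for unranked lists, and the placeholder gene names are all assumptions made for this example.

```python
from collections import defaultdict


def borda_aggregate(ranked_lists, unranked_lists=(), universe_size=20000):
    """Borda-style aggregation of gene lists (illustrative sketch only).

    ranked_lists:   iterable of lists, best gene first, no duplicates.
    unranked_lists: iterable of collections with no internal order.
    universe_size:  assumed number of possible entities; sets the top score.
    """
    scores = defaultdict(float)

    for genes in ranked_lists:
        for position, gene in enumerate(genes):
            # Top gene earns universe_size points, the next one point fewer, ...
            scores[gene] += universe_size - position

    for genes in unranked_lists:
        members = set(genes)
        if not members:
            continue
        # Assumption: unranked members are treated as tied, so each receives
        # the mean score of the top len(members) positions.
        tied_score = universe_size - (len(members) - 1) / 2
        for gene in members:
            scores[gene] += tied_score

    # Highest total first; gene name breaks ties for a deterministic order.
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical input: two ranked lists and one unranked list.
ranked = [["GENE_A", "GENE_B", "GENE_C"], ["GENE_B", "GENE_A", "GENE_D"]]
unranked = [{"GENE_A", "GENE_D"}]
consensus = borda_aggregate(ranked, unranked, universe_size=10)
```

Genes absent from a list simply receive no points from it, which is one of several possible conventions. A score of this kind weights every input list equally, so it does not account for the heterogeneity of list quality that the abstract highlights.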

Subjects

COVID-19; Carcinoma, Non-Small-Cell Lung; Lung Neoplasms; Humans; Algorithms; Carcinoma, Non-Small-Cell Lung/genetics; COVID-19/genetics; Lung Neoplasms/genetics; Reproducibility of Results; SARS-CoV-2; Meta-Analysis as Topic

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Carcinoma, Non-Small-Cell Lung / COVID-19 / Lung Neoplasms | Type of study: Systematic_reviews | Limits: Humans | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2022 | Document type: Article
