Systematic comparison of ranking aggregation methods for gene lists in experimental results.
Wang, Bo; Law, Andy; Regan, Tim; Parkinson, Nicholas; Cole, Joby; Russell, Clark D; Dockrell, David H; Gutmann, Michael U; Baillie, J Kenneth.
Affiliation
  • Wang B; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
  • Law A; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
  • Regan T; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
  • Parkinson N; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
  • Cole J; University of Sheffield, Sheffield S10 2NT, UK.
  • Russell CD; Centre for Inflammation Research, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK.
  • Dockrell DH; Centre for Inflammation Research, The Queen's Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK.
  • Gutmann MU; School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK.
  • Baillie JK; Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK.
Bioinformatics; 38(21): 4927-4933, 2022-10-31.
Article in En | MEDLINE | ID: mdl-36094347
ABSTRACT
MOTIVATION:

A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Because the properties of a dataset can influence the performance of an algorithm, a ranking aggregation method should be evaluated on the specific type of data before it is used. Such evaluation on gene lists is usually based on a simulated database because real data lack a known truth. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.

RESULTS:

In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore the performance of the aggregation methods under conditions emulating common scenarios in real genomic data, with varying heterogeneity of quality, noise levels and mixes of unranked and ranked data, using 20 000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (non-small cell lung cancer) and bacteria (macrophage apoptosis) was performed. We summarize the results of our evaluation in a simple flowchart to select a ranking aggregation method, and in an automated implementation using the meta-analysis by information content (MAIC) algorithm to infer heterogeneity of data quality across input datasets.

AVAILABILITY AND IMPLEMENTATION:

The code for simulated data generation and for running the edited versions of the algorithms is available at https://github.com/baillielab/comparison_of_RA_methods. Code to perform an optimal selection of methods based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded from https://github.com/baillielab/maic. An online service for running MAIC is available at https://baillielab.net/maic.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
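
As an illustration of what combining ranked and unranked gene lists can look like, the following is a minimal sketch of a Borda-style score aggregation. It is not the MAIC algorithm and not necessarily one of the methods compared in the study; the function name `borda_aggregate`, the scoring rule for unranked lists (every gene receives the average score of the list) and the example gene lists are all assumptions made for illustration only.

```python
from collections import defaultdict


def borda_aggregate(ranked_lists, unranked_lists=()):
    """Combine gene lists into one consensus ranking (illustrative only).

    ranked_lists: lists of gene names, most strongly implicated first.
    unranked_lists: collections of gene names with no internal order.
    """
    scores = defaultdict(float)

    # Ranked list of length n: the top gene earns n points, the last earns 1.
    for genes in ranked_lists:
        n = len(genes)
        for position, gene in enumerate(genes):
            scores[gene] += n - position

    # Unranked list of length n: assumption, every gene earns the
    # average Borda score (n + 1) / 2, as if all genes were tied.
    for genes in unranked_lists:
        unique_genes = set(genes)
        n = len(unique_genes)
        for gene in unique_genes:
            scores[gene] += (n + 1) / 2

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Hypothetical input lists, not taken from the study's data.
ranked = [
    ["ACE2", "TMPRSS2", "IFITM3"],
    ["TMPRSS2", "ACE2", "CTSL"],
]
unranked = [{"ACE2", "CTSL"}]

print(borda_aggregate(ranked, unranked))
# prints ['ACE2', 'TMPRSS2', 'CTSL', 'IFITM3']
```

A scheme like this weights every input list equally, whereas the abstract highlights heterogeneity of quality across datasets as a key feature of real data; that is the aspect the authors' MAIC implementation is described as inferring.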
  • Full text: 1
  • Database: MEDLINE
  • Main subject: Carcinoma, Non-Small-Cell Lung / COVID-19 / Lung Neoplasms
  • Type of study: Systematic reviews
  • Limits: Humans
  • Language: En
  • Journal: Bioinformatics
  • Journal subject: Medical Informatics
  • Year: 2022
  • Type: Article
  • Affiliation country: United Kingdom
