A comprehensive evaluation of connectivity methods for L1000 data.

Lin, Kequan; Li, Lu; Dai, Yifei; Wang, Huili; Teng, Shuaishuai; Bao, Xilinqiqige; Lu, Zhi John; Wang, Dong

Affiliations

Lin K; School of Life Sciences, Tsinghua University, Beijing 100084, China.
Li L; School of Life Sciences, Tsinghua University, Beijing 100084, China.
Dai Y; School of Medicine, Tsinghua University, Beijing 100084, China.
Wang H; School of Medicine, Tsinghua University, Beijing 100084, China.
Teng S; School of Medicine, Tsinghua University, Beijing 100084, China.
Bao X; International Mongolian Hospital of Inner Mongolia, Hohhot 010065, China.
Lu ZJ; School of Life Sciences, Tsinghua University, Beijing 100084, China.
Wang D; Center of Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China.

Brief Bioinform; 21(6): 2194-2205, 2020 Dec 01.

Article in English | MEDLINE | ID: mdl-31774912

ABSTRACT

Methods for evaluating similarities between the gene expression profiles of different perturbagens are key to understanding the mechanisms of action (MoAs) of unknown compounds and to finding new indications for existing drugs. The L1000-based next-generation Connectivity Map (CMap) dataset is more than a thousand-fold scale-up of the CMap pilot dataset. Although several systematic evaluations have assessed the accuracy of these methods on the CMap pilot data, their performance needs to be re-evaluated on the L1000 data. Here, using the drug-drug similarities from the Drug Repurposing Hub database as a benchmark standard, we evaluated six popular published methods for their performance in predicting drug-drug relationships, measured by the partial area under the receiver operating characteristic (ROC) curve at false positive rates of 0.001, 0.005 and 0.01 (AUC0.001, AUC0.005 and AUC0.01). The similarity-scoring algorithm ZhangScore was generally superior to the other methods and exhibited the highest accuracy at gene signature sizes ranging from 10 to 200. Further, we tested these methods with an experimentally derived gene signature related to estrogen in breast cancer cells, and the results confirmed that ZhangScore was more accurate than the other methods. Moreover, based on the ZhangScore results for the gene signature of TOP2A knockdown, we identified, in addition to well-known TOP2A inhibitors, a number of potential inhibitors, at least two of which have been the subject of previous investigation. Our study provides potential guidelines for researchers choosing a suitable connectivity method. The six connectivity methods used in this report have been implemented in an R package (https://github.com/Jasonlinchina/RCSM).
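
The partial-AUC metric named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the RCSM package: the function names `roc_points` and `partial_auc` and the toy data are hypothetical, and the sketch returns the raw area under the ROC curve between a false positive rate of 0 and `max_fpr`, so its maximum value is `max_fpr`. The abstract does not say whether the reported AUC0.001, AUC0.005 and AUC0.01 values were rescaled, so no normalization is assumed here.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return ROC curve points (fpr, tpr), sweeping the threshold from high to low.

    labels: 1 for a true drug-drug pair in the benchmark, 0 otherwise.
    Tied scores are consumed together so they yield a single ROC point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def partial_auc(scores: Sequence[float], labels: Sequence[int], max_fpr: float) -> float:
    """Raw (unnormalized) area under the ROC curve for FPR in [0, max_fpr]."""
    if not 0.0 < max_fpr <= 1.0:
        raise ValueError("max_fpr must be in (0, 1]")
    fpr, tpr = roc_points(scores, labels)
    area = 0.0
    for k in range(1, len(fpr)):
        x0, y0, x1, y1 = fpr[k - 1], tpr[k - 1], fpr[k], tpr[k]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the curve at the FPR cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area


if __name__ == "__main__":
    # Toy data (made up): connectivity scores and benchmark labels for five drug pairs.
    toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 0, 0]
    for cutoff in (0.001, 0.005, 0.01):
        print(cutoff, partial_auc(toy_scores, toy_labels, cutoff))
```

Restricting the area to very small false positive rates rewards methods that place true drug-drug pairs at the very top of the ranking, which is the region a researcher screening a large compound library would actually inspect.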

Subjects

Computational Biology; Drug Repositioning; Gene Expression Profiling; Algorithms; Computational Biology/methods; Databases, Factual; Gene Expression Profiling/methods; Pilot Projects; Transcriptome

Keywords

L1000; ZhangScore; connectivity map; connectivity methods; drug repurposing; partial area under the ROC

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Computational Biology / Gene Expression Profiling / Drug Repositioning | Study type: Prognostic_studies | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Publication year: 2020 | Document type: Article | Affiliation country: China
