Feature weighted models to address lineage dependency in drug-resistance prediction from Mycobacterium tuberculosis genome sequences.
Billows, Nina; Phelan, Jody E; Xia, Dong; Peng, Yonghong; Clark, Taane G; Chang, Yu-Mei.
Affiliations
  • Billows N; Department of Comparative Biomedical Sciences, Royal Veterinary College, London, United Kingdom.
  • Phelan JE; Alan Turing Institute, British Library, London, United Kingdom.
  • Xia D; Department of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom.
  • Peng Y; Department of Comparative Biomedical Sciences, Royal Veterinary College, London, United Kingdom.
  • Clark TG; Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom.
  • Chang YM; Department of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom.
Bioinformatics; 39(7), 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37428143
ABSTRACT
MOTIVATION: Tuberculosis (TB) is caused by members of the Mycobacterium tuberculosis complex (MTBC), which has a strain- or lineage-based clonal population structure. The evolution of drug resistance in the MTBC threatens the successful treatment and eradication of TB. Machine learning approaches are increasingly being adopted to predict drug resistance from whole genome sequences and to characterize the underlying mutations. However, such approaches may not generalize well in clinical practice because of confounding by the population structure of the MTBC.

RESULTS:

To investigate how population structure affects machine learning prediction, we compared three approaches to reducing lineage dependency in random forest (RF) models: stratification, feature selection, and feature weighting. All RF models achieved moderate to high performance (area under the ROC curve range: 0.60-0.98). Performance was higher for first-line than for second-line drugs, but varied depending on the lineages included in the training dataset. Lineage-specific models generally had higher sensitivity than global models, which may be underpinned by strain-specific drug-resistance mutations or by sampling effects. Applying feature weights or feature selection reduced lineage dependency in the models while giving performance comparable to that of unweighted RF models.

AVAILABILITY AND IMPLEMENTATION: https://github.com/NinaMercedes/RF_lineages
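The abstract does not state how the feature weights are derived. As a hedged illustration only, the sketch below assumes one plausible scheme: each binary mutation feature receives the weight 1 - V, where V is Cramér's V between that mutation and lineage, and the candidate features considered at each tree split are then drawn with probability proportional to weight, so mutations that merely mark a lineage are rarely offered to the tree. The function names (`cramers_v_binary`, `lineage_feature_weights`, `sample_split_candidates`), the weight formula, the `floor` value and the toy data are all hypothetical and are not taken from the authors' repository. The R package ranger exposes a comparable mechanism through its `split.select.weights` argument.

```python
import random


def cramers_v_binary(feature, lineages):
    """Cramér's V between a 0/1 mutation feature and a categorical lineage label."""
    n = len(feature)
    levels = sorted(set(lineages))
    # Observed counts for the 2 x k contingency table (feature value x lineage).
    table = {(f, lin): 0 for f in (0, 1) for lin in levels}
    for f, lin in zip(feature, lineages):
        table[(f, lin)] += 1
    row_totals = {f: sum(table[(f, lin)] for lin in levels) for f in (0, 1)}
    col_totals = {lin: sum(table[(f, lin)] for f in (0, 1)) for lin in levels}
    # Pearson chi-squared statistic, skipping cells with zero expected count.
    chi2 = 0.0
    for f in (0, 1):
        for lin in levels:
            expected = row_totals[f] * col_totals[lin] / n
            if expected > 0:
                chi2 += (table[(f, lin)] - expected) ** 2 / expected
    k = min(2, len(levels)) - 1  # min(rows, cols) - 1
    return (chi2 / (n * k)) ** 0.5 if k > 0 else 0.0


def lineage_feature_weights(X, lineages, floor=0.01):
    """Hypothetical weighting: 1 - V per feature, never below `floor`."""
    weights = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        weights.append(max(1.0 - cramers_v_binary(column, lineages), floor))
    return weights


def sample_split_candidates(weights, mtry, rng):
    """Draw up to `mtry` distinct feature indices, probability proportional to weight."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(min(mtry, len(remaining))):
        total = sum(weights[i] for i in remaining)
        r = rng.random() * total
        acc = 0.0
        for pos, i in enumerate(remaining):
            acc += weights[i]
            if r < acc:
                chosen.append(remaining.pop(pos))
                break
        else:
            chosen.append(remaining.pop())  # guard against floating-point round-off
    return chosen


# Toy, made-up data: 4 isolates x 3 mutations (1 = mutation present).
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
lineages = ["L1", "L1", "L2", "L2"]

weights = lineage_feature_weights(X, lineages)
print(weights)  # [0.01, 1.0, 1.0]: mutation 0 perfectly tracks lineage
rng = random.Random(0)
print(sample_split_candidates(weights, mtry=2, rng=rng))
```

The floor keeps lineage-linked mutations available to the trees at a low rate instead of discarding them outright, which is the practical difference between down-weighting and the feature-selection approach also compared in the abstract.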
Subjects

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Tuberculosis / Tuberculosis, Multidrug-Resistant / Mycobacterium tuberculosis | Study type: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: English | Journal: Bioinformatics | Journal subject: Medical Informatics | Publication year: 2023 | Document type: Article | Country of affiliation: United Kingdom
