Employing phylogenetic tree shape statistics to resolve the underlying host population structure.
Kayondo, Hassan W; Ssekagiri, Alfred; Nabakooza, Grace; Bbosa, Nicholas; Ssemwanga, Deogratius; Kaleebu, Pontiano; Mwalili, Samuel; Mango, John M; Leigh Brown, Andrew J; Saenz, Roberto A; Galiwango, Ronald; Kitayimbwa, John M.
Affiliation
  • Kayondo HW; Institute of Basic Sciences, Technology and Innovation (PAUSTI), Pan African University, Nairobi, Kenya. whkayondo@gmail.com.
  • Ssekagiri A; Department of Mathematics, Makerere University, Kampala, Uganda. whkayondo@gmail.com.
  • Nabakooza G; Uganda Virus Research Institute (UVRI), Entebbe, Uganda.
  • Bbosa N; Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda.
  • Ssemwanga D; Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda.
  • Kaleebu P; UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Entebbe, Uganda.
  • Mwalili S; Centre for Computational Biology, Uganda Christian University, Mukono, Uganda.
  • Mango JM; Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda.
  • Leigh Brown AJ; Uganda Virus Research Institute (UVRI), Entebbe, Uganda.
  • Saenz RA; Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda.
  • Galiwango R; Uganda Virus Research Institute (UVRI), Entebbe, Uganda.
  • Kitayimbwa JM; Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda.
BMC Bioinformatics ; 22(1): 546, 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-34758743
ABSTRACT

BACKGROUND:

Host population structure is a key determinant of pathogen and infectious disease transmission patterns. Pathogen phylogenetic trees are useful tools to reveal the population structure underlying an epidemic. Determining whether a population is structured or not is useful in informing the type of phylogenetic methods to be used in a given study. We employ tree statistics derived from phylogenetic trees and machine learning classification techniques to reveal an underlying population structure.

RESULTS:

In this paper, we simulate phylogenetic trees from both structured and non-structured host populations. We compute eight statistics for the simulated trees: the number of cherries; the Sackin, Colless and total cophenetic indices; ladder length; maximum depth; maximum width; and width-to-depth ratio. Based on the estimated tree statistics, we classify the simulated trees as coming from either a non-structured or a structured population using decision tree (DT), K-nearest neighbor (KNN) and support vector machine (SVM) classifiers. We incorporate the basic reproductive number (R0) in our tree simulation procedure. Sensitivity analysis is done to investigate whether the classifiers are robust to different choices of model parameters and to tree size. Cross-validated results for the area under the curve (AUC) of receiver operating characteristic (ROC) curves yield mean values of over 0.9 for most of the classification models.
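As an illustration of the kind of tree statistics listed above, the following minimal sketch computes four of them (number of cherries, Sackin index, Colless index and maximum depth) for a rooted binary tree. The nested-tuple tree representation, the function name `tree_shape_stats` and the two toy trees are assumptions made for this example; they are not taken from the paper, and the authors' exact definitions (for instance of maximum depth) may differ.

```python
# Assumed representation: a leaf is any non-tuple label,
# an internal node is a 2-tuple (left_subtree, right_subtree).

def is_leaf(node):
    return not isinstance(node, tuple)

def tree_shape_stats(tree):
    """Return a dict of simple shape statistics for a rooted binary tree."""
    stats = {"cherries": 0, "sackin": 0, "colless": 0, "max_depth": 0}

    def visit(node, depth):
        # Returns the number of leaves below `node` and updates `stats`.
        if is_leaf(node):
            stats["sackin"] += depth          # Sackin: sum of leaf depths
            stats["max_depth"] = max(stats["max_depth"], depth)
            return 1
        left, right = node
        n_left = visit(left, depth + 1)
        n_right = visit(right, depth + 1)
        if n_left == 1 and n_right == 1:
            stats["cherries"] += 1            # internal node with two leaf children
        stats["colless"] += abs(n_left - n_right)  # Colless: imbalance per node
        return n_left + n_right

    stats["n_leaves"] = visit(tree, 0)
    return stats

# Toy trees (hypothetical): a fully balanced and a ladder-like tree on 4 leaves.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

print(tree_shape_stats(balanced))
print(tree_shape_stats(caterpillar))
```

On these toy inputs the ladder-like tree has a larger Sackin and Colless index and fewer cherries than the balanced one, which is the sort of contrast a classifier can exploit when the statistics are used as features.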

CONCLUSIONS:

Our classification procedure distinguishes well between trees from structured and non-structured populations using the classifiers, the two-sample Kolmogorov-Smirnov, Cucconi and Podgor-Gastwirth tests, and box plots. SVM models were more robust to changes in model parameters and tree size than the KNN and DT classifiers. Our classification procedure was applied to real-world data, and the structured population was revealed with a high accuracy of [Formula see text] using the SVM-polynomial classifier.
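To make the two-sample Kolmogorov-Smirnov comparison mentioned above concrete, here is a minimal sketch of the test statistic: the largest gap between the empirical distribution functions of two samples. The function name `ks_statistic` and the input values are invented for illustration (they are not data from the study), and the sketch returns only the statistic, not a p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)   # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)   # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Made-up values of one tree statistic for two hypothetical groups of trees.
group_one = [1, 2, 3, 4]
group_two = [3, 4, 5, 6]
print(ks_statistic(group_one, group_two))
```

A statistic near 0 indicates that the two samples have similar distributions for that tree statistic, while a value near 1 indicates almost complete separation.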

Full text: 1 Database: MEDLINE Main subject: Support Vector Machine / Machine Learning Study type: Prognostic_studies Language: En Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2021 Document type: Article Country of affiliation: Kenya
