Results 1 - 4 of 4
1.
Genomics ; 113(4): 1802-1815, 2021 07.
Article in English | MEDLINE | ID: mdl-33862184

ABSTRACT

Despite decades of research and advancements in diagnostics and treatment, tuberculosis remains a major public health concern. New computational methods are needed to interrogate the intersection of host and bacterial genomes. Paired host genotype data and infecting bacterial isolate information were analysed for associations using a multinomial logistic regression framework implemented in SNPTest. A cohort of 853 admixed South African participants and a Ghanaian cohort of 1359 participants were included. Two directly genotyped variants, namely rs529920 and rs41472447, were identified in the Ghanaian cohort as being statistically significantly associated with risk for infection with strains of different members of the Mycobacterium tuberculosis complex (MTBC). Thus, a multinomial logistic regression using paired host-pathogen data may prove valuable for investigating the complex relationships driving infectious disease.
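
As an illustration of the multinomial (baseline-category) logistic model named in the abstract, the following minimal sketch converts a genotype dosage into probabilities over several outcome categories, such as infecting lineages. It is not SNPTest's implementation; the function name `multinomial_probs` and all coefficient values are hypothetical and chosen only for demonstration.

```python
import math

def multinomial_probs(genotype, intercepts, slopes):
    """Return P(category k | genotype) for a baseline-category logit model.

    Category 0 is the reference, so its linear predictor is fixed at 0.
    intercepts[k-1] and slopes[k-1] belong to non-reference category k.
    """
    scores = [0.0] + [a + b * genotype for a, b in zip(intercepts, slopes)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients for two non-reference categories (illustrative only).
probs = multinomial_probs(genotype=1, intercepts=[-0.5, -1.0], slopes=[0.3, 0.0])
print(probs)
```

In this parameterisation, the exponential of a slope is the relative risk ratio per additional allele copy for that category versus the reference category.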


Subjects
Mycobacterium tuberculosis; Tuberculosis; Genome-Wide Association Study; Genotype; Ghana/epidemiology; Humans; Phenotype; South Africa; Tuberculosis/genetics; Tuberculosis/microbiology
2.
AMIA Annu Symp Proc ; 2021: 891-899, 2021.
Article in English | MEDLINE | ID: mdl-35309001

ABSTRACT

The persistence and emergence of new multi-drug resistant Mycobacterium tuberculosis (M. tb) strains continue to advance the devastating tuberculosis (TB) epidemic. Robust systems are needed to accurately and rapidly perform drug-resistance profiling, and machine learning (ML) methods combined with genomic sequence data may provide novel insights into drug-resistance mechanisms. Using 372 M. tb isolates, the combined utility of ML and bioinformatics to perform drug-resistance profiling is demonstrated. SNPs, InDels, and dinucleotide frequencies are explored as input features for three ML models, namely Decision Trees, Random Forest, and eXtreme Gradient Boosting (XGBoost). Using SNPs and InDels, all three models performed equally well, yielding 99% accuracy, 97% recall, and a 99% F1-score. Using dinucleotide frequencies, the XGBoost algorithm was superior, with 97% accuracy, 94% recall, and a 97% F1-score. This study validates the use of variants and presents dinucleotide features as another effective feature encoding method for ML-based phenotype classification.
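
For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, recall, and F1-score can be computed for a binary resistant/susceptible label. The abstract does not state how the scores were averaged across drugs or classes, so the binary setup, the function name `classification_metrics`, and the toy labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels (1 = resistant, 0 = susceptible); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall is the metric most sensitive to missed resistant isolates, which may be why it is reported alongside accuracy.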


Subjects
Antitubercular Agents; Drug Resistance, Multiple, Bacterial; Machine Learning; Mycobacterium tuberculosis; Tuberculosis; Antitubercular Agents/pharmacology; Antitubercular Agents/therapeutic use; Drug Resistance, Multiple, Bacterial/genetics; Humans; Mycobacterium tuberculosis/drug effects; Mycobacterium tuberculosis/genetics; Tuberculosis/drug therapy
3.
IEEE Access ; 8: 195263-195273, 2020.
Article in English | MEDLINE | ID: mdl-34976561

ABSTRACT

The world is grappling with the COVID-19 pandemic caused by the 2019 novel SARS-CoV-2. To better understand this novel virus and its relationship with other pathogens, new methods for analyzing the genome are required. In this study, intrinsic dinucleotide genomic signatures were analyzed for whole genome sequence data of eight pathogenic species, including SARS-CoV-2. The genome sequences were transformed into dinucleotide relative frequencies and classified using the extreme gradient boosting (XGBoost) model. The classification models were trained to a) distinguish between the sequences of all eight species and b) distinguish between sequences of SARS-CoV-2 that originate from different geographic regions. Our method attained 100% in all performance metrics and for all tasks in the eight-species classification problem. Moreover, the models achieved 67% balanced accuracy for the task of classifying the SARS-CoV-2 sequences into the six continental regions and achieved 86% balanced accuracy for the task of classifying SARS-CoV-2 samples as either originating from Asia or not. Analysis of the dinucleotide genomic profiles of the eight species revealed a similarity between the SARS-CoV-2 and MERS-CoV viral sequences. Further analysis of SARS-CoV-2 viral sequences from the six continents revealed that samples from Oceania had the highest frequency of TT dinucleotides as well as the lowest CG frequency compared to the other continents. The dinucleotide signatures of AC, AG, CA, CT, GA, GT, TC, and TG were well conserved across most genomes, while the frequencies of other dinucleotide signatures varied considerably. Altogether, the results from this study demonstrate the utility of dinucleotide relative frequencies for discriminating and identifying similar species.
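
The transformation of a genome sequence into dinucleotide frequencies can be sketched as follows. The abstract does not give the exact definition used, so this example assumes the simplest one: the count of each overlapping dinucleotide divided by the total number of valid dinucleotides. The function name `dinucleotide_frequencies` and the short test sequence are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"
# All 16 possible dinucleotides, in a fixed order usable as feature columns.
DINUCLEOTIDES = ["".join(pair) for pair in product(ALPHABET, repeat=2)]

def dinucleotide_frequencies(sequence):
    """Map a DNA sequence to 16 normalised overlapping-dinucleotide frequencies."""
    seq = sequence.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: counts[d] / total for d in DINUCLEOTIDES}

# Toy sequence for demonstration; a real input would be a whole genome.
features = dinucleotide_frequencies("ATGCGCGTTA")
print(features["CG"], features["TT"])
```

The resulting 16-value vector has the same length for every genome, which makes it convenient as input to a tabular classifier such as a gradient-boosted tree model.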

4.
Front Genet ; 10: 34, 2019.
Article in English | MEDLINE | ID: mdl-30804980

ABSTRACT

Genotype imputation is a powerful tool for increasing statistical power in an association analysis. Meta-analysis of multiple study datasets also requires a substantial overlap of SNPs for a successful association analysis, which can be achieved by imputation. Quality of imputed datasets is largely dependent on the software used, as well as the reference populations chosen. The accuracy of imputation of available reference populations has not been tested for the five-way admixed South African Colored (SAC) population. In this study, imputation results obtained using three freely accessible methods were evaluated for accuracy and quality. We show that the African Genome Resource is the best reference panel for imputation of missing genotypes in samples from the SAC population, implemented via the freely accessible Sanger Imputation Server.
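
One common way to score imputation accuracy is genotype concordance: the fraction of masked genotypes that are recovered correctly. The abstract does not specify which accuracy and quality measures were used, so the sketch below is an assumption for illustration only; the function name `genotype_concordance` and the toy genotype calls are hypothetical.

```python
def genotype_concordance(true_genotypes, imputed_genotypes):
    """Fraction of sites where the imputed genotype equals the true genotype.

    Genotypes are coded as alternate-allele counts (0, 1 or 2).
    Sites where either call is missing (None) are excluded.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    compared = 0
    matches = 0
    for truth, imputed in zip(true_genotypes, imputed_genotypes):
        if truth is None or imputed is None:
            continue
        compared += 1
        if truth == imputed:
            matches += 1
    return matches / compared if compared else float("nan")

# Toy calls at six sites; not data from the study.
truth = [0, 1, 2, 1, None, 0]
imputed = [0, 1, 1, 1, 2, 0]
concordance = genotype_concordance(truth, imputed)
print(concordance)
```

Concordance computed this way can be compared across reference panels by masking the same set of genotypes before each imputation run.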
