Results 1 - 20 of 31
1.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36637196

ABSTRACT

MOTIVATION: The phylogenetic signal of structural variation informs a more comprehensive understanding of evolution. As (near-)complete genome assembly becomes more commonplace, the next methodological challenge for inferring genome rearrangement trees is the identification of syntenic blocks of orthologous sequences. In this article, we studied 94 reference quality genomes of primarily Mycobacterium tuberculosis (Mtb) isolates as a benchmark to evaluate these methods. The clonal nature of Mtb evolution, the manageable genome sizes, along with substantial levels of structural variation make this an ideal benchmarking dataset. RESULTS: We tested several methods for detecting homology and obtaining syntenic blocks and two methods for inferring phylogenies from them, then compared the resulting trees to the standard method's tree, inferred from nucleotide substitutions. We found that, not only the choice of methods, but also their parameters can impact results, and that the tree inference method had less impact than the block determination method. Interestingly, a rearrangement tree based on blocks from the Cactus whole-genome aligner was fully compatible with the highly supported branches of the substitution-based tree, enabling the combination of the two into a high-resolution supertree. Overall, our results indicate that accurate trees can be inferred using genome rearrangements, but the choice of the methods for inferring homology requires care. AVAILABILITY AND IMPLEMENTATION: Analysis scripts and code written for this study are available at https://gitlab.com/LPCDRP/rearrangement-homology.pub and https://gitlab.com/LPCDRP/syntement. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Mycobacterium tuberculosis , Phylogeny , Mycobacterium tuberculosis/genetics , Genome , Synteny
2.
Drug Resist Updat ; 68: 100959, 2023 05.
Article in English | MEDLINE | ID: mdl-37043916

ABSTRACT

Here, we describe a clinical case of pyrazinamide-resistant (PZA-R) tuberculosis (TB) reported as PZA-susceptible (PZA-S) by common molecular diagnostics. Phenotypic susceptibility testing (pDST) indicated PZA-R TB. Targeted Sanger sequencing reported wild-type PncA, indicating PZA-S TB. Whole Genome Sequencing (WGS) by PacBio and IonTorrent both detected deletion of a large portion of pncA, indicating PZA-R. Importantly, both WGS methods showed deletion of part of the primer region targeted by Sanger sequencing. Repeating Sanger sequencing from a culture in the presence of PZA returned no result, revealing that 1) two minority susceptible subpopulations had vanished, and 2) the PZA-R majority subpopulation harboring the pncA deletion could not be amplified by Sanger primers, and was thus obscured by the amplification process. This case demonstrates how a small susceptible subpopulation can entirely obscure majority resistant populations from targeted molecular diagnostics and falsely imply homogeneous susceptibility, leading to incorrect diagnosis. To our knowledge, this is the first report of a minority susceptible subpopulation masking a majority resistant population, causing targeted molecular diagnostics to call false susceptibility. The consequence of such genomic events is not limited to PZA. This phenomenon can impact molecular diagnostics' sensitivity whenever the resistance-conferring mutation is not fully within primer-targeted regions. This can be caused by structural changes of genomic context with phenotypic consequence as we report here, or by uncommon mechanisms of resistance. Such false susceptibility calls promote suboptimal treatment and spread of strains that challenge targeted molecular diagnostics. This motivates development of molecular diagnostics unreliant on primer conservation, and impels frequent WGS surveillance for variants that evade prevailing molecular diagnostics.


Subject(s)
Mycobacterium tuberculosis , Tuberculosis, Multidrug-Resistant , Humans , Pyrazinamide/pharmacology , Pyrazinamide/therapeutic use , Antitubercular Agents/pharmacology , Antitubercular Agents/therapeutic use , Mycobacterium tuberculosis/genetics , Pathology, Molecular , Amidohydrolases/genetics , Amidohydrolases/therapeutic use , Microbial Sensitivity Tests , Tuberculosis, Multidrug-Resistant/diagnosis , Tuberculosis, Multidrug-Resistant/drug therapy , Tuberculosis, Multidrug-Resistant/genetics , Mutation
3.
Antimicrob Agents Chemother ; 66(6): e0207521, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35532237

ABSTRACT

Point mutations in the rrs gene and the eis promoter are known to confer resistance to the second-line injectable drugs (SLIDs) amikacin (AMK), capreomycin (CAP), and kanamycin (KAN). While mutations in these canonical genes confer the majority of SLID resistance, alternative mechanisms of resistance are not uncommon and threaten effective treatment decisions when using conventional molecular diagnostics. In total, 1,184 clinical Mycobacterium tuberculosis isolates from 7 countries were studied for genomic markers associated with phenotypic resistance. The markers rrs:A1401G and rrs:G1484T were associated with resistance to all three SLIDs, and three known markers in the eis promoter (eis:G-10A, eis:C-12T, and eis:C-14T) were similarly associated with kanamycin resistance (KAN-R). Among 325, 324, and 270 AMK-R, CAP-R, and KAN-R isolates, 274 (84.3%), 250 (77.2%), and 249 (92.3%) harbored canonical mutations, respectively. Thirteen isolates harbored more than one canonical mutation. Canonical mutations did not account for 103 of the phenotypically resistant isolates. A genome-wide association study identified three genes and promoters with mutations that, on aggregate, were associated with unexplained resistance to at least one SLID. Our analysis associated whiB7 5'-untranslated-region mutations with KAN resistance, supporting clinical relevance for this previously demonstrated mechanism of KAN resistance. We also provide evidence for the novel association of CAP resistance with the promoter of the Rv2680-Rv2681 operon, which encodes an exoribonuclease that may influence the binding of CAP to the ribosome. Aggregating mutations by gene can provide additional insight and therefore is recommended for identifying rare mechanisms of resistance when individual mutations carry insufficient statistical power.


Subject(s)
Drug Resistance, Multiple, Bacterial , Mycobacterium tuberculosis , Amikacin/pharmacology , Antitubercular Agents/pharmacology , Capreomycin/pharmacology , Drug Resistance, Multiple, Bacterial/genetics , Genetic Markers , Genome-Wide Association Study , Kanamycin/pharmacology , Microbial Sensitivity Tests , Mutation , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/genetics
4.
Article in English | MEDLINE | ID: mdl-33722890

ABSTRACT

Pyrazinamide (PZA) is a widely used antitubercular chemotherapeutic. Typically, PZA resistance (PZA-R) emerges in Mycobacterium tuberculosis strains with existing resistance to isoniazid and rifampin (i.e., multidrug resistance [MDR]) and is conferred by loss-of-function pncA mutations that inhibit conversion to its active form, pyrazinoic acid (POA). PZA-R departing from this canonical scenario is poorly understood. Here, we genotyped pncA and purported alternative PZA-R genes (panD, rpsA, and clpC1) with long-read sequencing of 19 phenotypically PZA-monoresistant isolates collected in Sweden and compared their phylogenetic and genomic characteristics to a large set of MDR PZA-R (MDRPZA-R) isolates. We report the first association of ClpC1 mutations with PZA-R in clinical isolates, in the ClpC1 promoter (clpC1p-138) and the N terminus of ClpC1 (ClpC1Val63Ala). Mutations have emerged in both these regions under POA selection in vitro, and the N-terminal region of ClpC1 has been implicated further, through its POA-dependent efficacy in PanD proteolysis. ClpC1Val63Ala mutants spanned 4 Indo-Oceanic sublineages. Indo-Oceanic isolates invariably harbored ClpC1Val63Ala and were starkly overrepresented (odds ratio [OR] = 22.2, P < 0.00001) among PZA-monoresistant isolates (11/19) compared to MDRPZA-R isolates (5/80). The genetic basis of Indo-Oceanic isolates' overrepresentation in PZA-monoresistant tuberculosis (TB) remains undetermined, but substantial circumstantial evidence suggests that ClpC1Val63Ala confers low-level PZA resistance. Our findings highlight ClpC1 as potentially clinically relevant for PZA-R and reinforce the importance of genetic background in the trajectory of resistance development.


Subject(s)
Mycobacterium tuberculosis , Tuberculosis, Multidrug-Resistant , Amidohydrolases/genetics , Antitubercular Agents/pharmacology , Antitubercular Agents/therapeutic use , Drug Resistance, Bacterial/genetics , Humans , Microbial Sensitivity Tests , Mutation , Mycobacterium tuberculosis/genetics , Phylogeny , Pyrazinamide/pharmacology , Sweden , Tuberculosis, Multidrug-Resistant/drug therapy
5.
BMC Genomics ; 18(1): 302, 2017 04 17.
Article in English | MEDLINE | ID: mdl-28415976

ABSTRACT

BACKGROUND: The genetic basis of virulence in Mycobacterium tuberculosis has been investigated through genome comparisons of virulent (H37Rv) and attenuated (H37Ra) sister strains. Such analysis, however, relies heavily on the accuracy of the sequences. While the H37Rv reference genome has had several corrections to date, that of H37Ra is unmodified since its original publication. RESULTS: Here, we report the assembly and finishing of the H37Ra genome from single-molecule, real-time (SMRT) sequencing. Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies. PE_PPE family genes, which are intractable to commonly-used sequencing platforms because of their repetitive and GC-rich nature, are overrepresented in the set of genes in which all reported H37Ra-specific variants are contradicted. Further, one of the sequencing errors in H37Ra masks a true variant in common with the clinical strain CDC1551 which, when considered in the context of previous work, corresponds to a sequencing error in the H37Rv reference genome. CONCLUSIONS: Our results constrain the set of genomic differences possibly affecting virulence by more than half, which focuses laboratory investigation on pertinent targets and demonstrates the power of SMRT sequencing for producing high-quality reference genomes.


Subject(s)
Mycobacterium tuberculosis/genetics , Virulence/genetics , Bacterial Proteins/genetics , DNA Copy Number Variations , DNA Methylation , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Genome, Bacterial , Mutation , Promoter Regions, Genetic , Quinone Reductases/genetics , Sequence Analysis, DNA
6.
Antimicrob Agents Chemother ; 59(9): 5267-77, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26077261

ABSTRACT

Pyrazinamide (PZA) is an important first-line drug in the treatment of tuberculosis (TB) and of significant interest to the HIV-infected community due to the prevalence of TB-HIV coinfection in some regions of the world. The mechanism of resistance to PZA is unlike that of any other anti-TB drug. The gene pncA, encoding pyrazinamidase (PZase), is associated with resistance to PZA. However, because single mutations in PZase have a low prevalence, the individual sensitivities are low. Hundreds of distinct mutations in the enzyme have been associated with resistance, while some only appear in susceptible isolates. This makes interpretation of molecular testing difficult and often leads to the simplification that any PZase mutation causes resistance. This systematic review reports a comprehensive global list of mutations observed in PZase and its promoter region in clinical strains, their phenotypic association, their global frequencies and diversity, the method of phenotypic determination, their MIC values when given, and the method of MIC determination and assesses the strength of the association between mutations and phenotypic resistance to PZA. In this systematic review, we report global statistics for 641 mutations in 171 (of 187) codons from 2,760 resistant strains and 96 mutations from 3,329 susceptible strains reported in 61 studies. For diagnostics, individual mutations (or any subset) were not sufficiently sensitive. Assuming similar error profiles of the 5 phenotyping platforms included in this study, the entire enzyme and its promoter provide a combined estimated sensitivity of 83%. This review highlights the need for identification of an alternative mechanism(s) of resistance, at least for the unexplained 17% of cases.


Subject(s)
Amidohydrolases/genetics , Amidohydrolases/metabolism , Antitubercular Agents/pharmacology , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/enzymology , Pyrazinamide/pharmacology , Drug Resistance, Bacterial/genetics , Microbial Sensitivity Tests , Mutation , Mycobacterium tuberculosis/genetics
7.
J Clin Microbiol ; 52(3): 781-9, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24353002

ABSTRACT

Molecular diagnostic methods based on the detection of mutations conferring drug resistance are promising technologies for rapidly detecting multidrug-/extensively drug-resistant tuberculosis (M/XDR TB), but large studies of mutations as markers of resistance are rare. The Global Consortium for Drug-Resistant TB Diagnostics analyzed 417 Mycobacterium tuberculosis isolates from multinational sites with a high prevalence of drug resistance to determine the sensitivities and specificities of mutations associated with M/XDR TB to inform the development of rapid diagnostic methods. We collected M/XDR TB isolates from regions of high TB burden in India, Moldova, the Philippines, and South Africa. The isolates underwent standardized phenotypic drug susceptibility testing (DST) to isoniazid (INH), rifampin (RIF), moxifloxacin (MOX), ofloxacin (OFX), amikacin (AMK), kanamycin (KAN), and capreomycin (CAP) using MGIT 960 and WHO-recommended critical concentrations. Eight genes (katG, inhA, rpoB, gyrA, gyrB, rrs, eis, and tlyA) were sequenced using Sanger sequencing. Three hundred seventy isolates were INHr, 356 were RIFr, 292 were MOXr/OFXr, 230 were AMKr, 219 were CAPr, and 286 were KANr. Four single nucleotide polymorphisms (SNPs) in katG/inhA had a combined sensitivity of 96% and specificities of 97 to 100% for the detection of INHr. Eleven SNPs in rpoB had a combined sensitivity of 98% for RIFr. Eight SNPs in gyrA codons 88 to 94 had sensitivities of 90% for MOXr/OFXr. The rrs 1401/1484 SNPs had 89 to 90% sensitivity for detecting AMKr/CAPr but 71% sensitivity for KANr. Adding eis promoter SNPs increased the sensitivity to 93% for detecting AMKr and to 91% for detecting KANr. Approximately 30 SNPs in six genes predicted clinically relevant XDR-TB phenotypes with 90 to 98% sensitivity and almost 100% specificity.
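The combined sensitivity and specificity figures quoted in this abstract can be illustrated with a minimal Python sketch. It assumes an isolate is called resistant when it carries any marker in a chosen set. The marker labels and the eight isolates below are hypothetical and are not data from the study.

```python
def combined_marker_performance(isolates, markers):
    """Return (sensitivity, specificity) for a set of resistance markers.

    `isolates` is a list of (mutations, phenotypically_resistant) pairs,
    where `mutations` is a set of marker labels. An isolate is predicted
    resistant if it carries at least one marker from `markers`.
    """
    tp = fn = tn = fp = 0
    for mutations, resistant in isolates:
        predicted = bool(markers & mutations)
        if resistant and predicted:
            tp += 1
        elif resistant:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical isolates: (mutations carried, phenotypic resistance)
isolates = [
    ({"rrs_1401"}, True),
    ({"eis_promoter"}, True),
    (set(), True),
    ({"rrs_1401", "eis_promoter"}, True),
    (set(), False),
    (set(), False),
    ({"eis_promoter"}, False),
    (set(), False),
]

print(combined_marker_performance(isolates, {"rrs_1401"}))
print(combined_marker_performance(isolates, {"rrs_1401", "eis_promoter"}))
```

In this toy data, adding the second marker raises sensitivity from 0.5 to 0.75 while lowering specificity from 1.0 to 0.75, which is the trade-off a marker panel has to balance.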


Subject(s)
Antitubercular Agents/pharmacology , Drug Resistance, Multiple, Bacterial , Extensively Drug-Resistant Tuberculosis/diagnosis , Molecular Diagnostic Techniques/methods , Mycobacterium tuberculosis/genetics , Point Mutation , Antitubercular Agents/therapeutic use , Bacterial Proteins/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Extensively Drug-Resistant Tuberculosis/microbiology , Genotype , Humans , India , Microbial Sensitivity Tests/methods , Moldova , Mycobacterium tuberculosis/isolation & purification , Phenotype , Philippines , Polymorphism, Single Nucleotide , Sensitivity and Specificity , Sequence Analysis, DNA , South Africa
8.
Front Microbiol ; 14: 1265390, 2023.
Article in English | MEDLINE | ID: mdl-38260909

ABSTRACT

Background: Rifampicin (RIF) is a key first-line drug used to treat tuberculosis, a primarily pulmonary disease caused by Mycobacterium tuberculosis. RIF resistance is caused by mutations in rpoB, at the cost of slower growth and reduced transcription efficiency. Antibiotic resistance to RIF is prevalent despite this fitness cost. Compensatory mutations in rpoABC genes have been shown to alleviate the fitness cost of rpoB:S450L, explaining how RIF resistant strains harboring this mutation can spread so rapidly. Unfortunately, the full set of RIF compensatory mutations is still unknown, particularly those compensating for rarer RIF resistance mutations. Objectives: We performed an association study on a globally representative set of 4,309 whole genome sequenced clinical M. tuberculosis isolates to identify novel putative compensatory mutations, determine the prevalence of known and previously reported putative compensatory mutations, and determine which RIF resistance markers associate with these compensatory mutations. Results and conclusions: Of the 1,079 RIF resistant isolates, 638 carried previously reported putative and high-probability compensatory mutations. Our strict criteria identified 46 additional mutations in rpoABC for which no strong prior evidence of their compensatory role exists. Of these, 35 have previously been reported. As such, our independent corroboration adds to the mounting evidence that these 35 also carry a compensatory role. The remaining 11 are novel putative compensatory markers, reported here for the first time. Six of these 11 novel putative compensatory mutations had two or more mutation events. Most compensatory mutations appear to be specifically compensating for the fitness loss due to rpoB:S450L. However, an outbreak of 22 closely related isolates each carried three rpoB mutations, the rare RIFR markers D435G and L452P and the putative compensatory mutation I1106T. This suggests compensation may require specific combinations of rpoABC mutations. Here, we report only mutations that met our very strict criteria. It is highly likely that many additional rpoABC mutations compensate for rare resistance-causing mutations and therefore did not carry the statistical power to be reported here. These findings aid in the identification of RIF resistant M. tuberculosis strains with restored fitness, which pose a greater risk of causing resistant outbreaks.

9.
Nucleic Acids Res ; 38(Database issue): D633-9, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19755503

ABSTRACT

Annotating the function of all human genes is a critical, yet formidable, challenge. Current gene annotation efforts focus on centralized curation resources, but it is increasingly clear that this approach does not scale with the rapid growth of the biomedical literature. The Gene Wiki utilizes an alternative and complementary model based on the principle of community intelligence. Directly integrated within the online encyclopedia, Wikipedia, the goal of this effort is to build a gene-specific review article for every gene in the human genome, where each article is collaboratively written, continuously updated and community reviewed. Previously, we described the creation of Gene Wiki 'stubs' for approximately 9000 human genes. Here, we describe ongoing systematic improvements to these articles to increase their utility. Moreover, we retrospectively examine the community usage and improvement of the Gene Wiki, providing evidence of a critical mass of users and editors. Gene Wiki articles are freely accessible within the Wikipedia web site, and additional links and information are available at http://en.wikipedia.org/wiki/Portal:Gene_Wiki.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Access to Information , Computational Biology/trends , Databases, Protein , Genetics , Humans , Information Storage and Retrieval/methods , Internet , Models, Genetic , Protein Interaction Mapping , Software
10.
BMC Bioinformatics ; 12: 467, 2011 Dec 07.
Article in English | MEDLINE | ID: mdl-22151536

ABSTRACT

BACKGROUND: Simultaneous measurement of gene expression on a genomic scale can be accomplished using microarray technology or by sequencing based methods. Researchers who perform high throughput gene expression assays often deposit their data in public databases, but heterogeneity of measurement platforms leads to challenges for the combination and comparison of data sets. Researchers wishing to perform cross platform normalization face two major obstacles. First, a choice must be made about which method or methods to employ. Nine are currently available, and no rigorous comparison exists. Second, software for the selected method must be obtained and incorporated into a data analysis workflow. RESULTS: Using two publicly available cross-platform testing data sets, cross-platform normalization methods are compared based on inter-platform concordance and on the consistency of gene lists obtained with transformed data. Scatter and ROC-like plots are produced and new statistics based on those plots are introduced to measure the effectiveness of each method. Bootstrapping is employed to obtain distributions for those statistics. The consistency of platform effects across studies is explored theoretically and with respect to the testing data sets. CONCLUSIONS: Our comparisons indicate that four methods, DWD, EB, GQ, and XPN, are generally effective, while the remaining methods do not adequately correct for platform effects. Of the four successful methods, XPN generally shows the highest inter-platform concordance when treatment groups are equally sized, while DWD is most robust to differently sized treatment groups and consistently shows the smallest loss in gene detection. We provide an R package, CONOR, capable of performing the nine cross-platform normalization methods considered. The package can be downloaded at http://alborz.sdsu.edu/conor and is available from CRAN.


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Algorithms , Genome , Humans , Male , Spermatozoa/metabolism
11.
Microb Genom ; 7(3), 2021 03.
Article in English | MEDLINE | ID: mdl-33502304

ABSTRACT

Whole-genome sequencing (WGS) is fundamental to Mycobacterium tuberculosis basic research and many clinical applications. Coverage across Illumina-sequenced M. tuberculosis genomes is known to vary with sequence context, but this bias is poorly characterized. Here, through a novel application of phylogenomics that distinguishes genuine coverage bias from deletions, we discern Illumina 'blind spots' in the M. tuberculosis reference genome for seven sequencing workflows. We find blind spots to be widespread, affecting 529 genes, and provide their exact coordinates, enabling salvage of unaffected regions. Fifty-seven pe/ppe genes (the primary families assumed to exhibit Illumina bias) lack blind spots entirely, while the remaining pe/ppe genes account for 55.1 % of blind spots. Surprisingly, we find coverage bias persists in homopolymers as short as 6 bp, shorter tracts than previously reported. While G+C-rich regions challenge all Illumina sequencing workflows, a modified Nextera library preparation that amplifies DNA with a high-fidelity polymerase markedly attenuates coverage bias in G+C-rich and homopolymeric sequences, expanding the 'Illumina-sequenceable' genome. Through these findings, and by defining workflow-specific exclusion criteria, we spotlight effective strategies for handling bias in M. tuberculosis Illumina WGS. This empirical analysis framework may be used to systematically evaluate coverage bias in other species using existing sequencing data.
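As a rough illustration of the homopolymer observation above, the following Python sketch lists single-base runs of at least 6 bp in a sequence. It is not the study's phylogenomic method for detecting blind spots. The function name, the coordinate convention (0-based, half-open) and the example sequence are assumptions made for this example.

```python
import re


def homopolymer_tracts(sequence, min_length=6):
    """Return (start, end, base) for each homopolymer run of at least
    `min_length` bases, using 0-based, half-open coordinates."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One base followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(sequence.upper())]


# Made-up sequence: a 6 bp G run, a 5 bp T run and a 7 bp A run.
example = "AC" + "G" * 6 + "T" * 5 + "A" * 7 + "C"
print(homopolymer_tracts(example))
```

With the default threshold, the 5 bp T run is skipped and the G and A runs are reported, so the coordinates could be intersected with gene annotations to flag regions that merit a closer look at coverage.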


Subject(s)
High-Throughput Nucleotide Sequencing/standards , Mycobacterium tuberculosis/genetics , Tuberculosis/microbiology , Bias , Gene Library , Genome, Bacterial , High-Throughput Nucleotide Sequencing/methods , Humans , Mycobacterium tuberculosis/classification , Mycobacterium tuberculosis/isolation & purification , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Workflow
12.
mBio ; 13(1): e0043921, 2022 02 22.
Article in English | MEDLINE | ID: mdl-35100871

ABSTRACT

Pyrazinamide (PZA) plays a crucial role in first-line tuberculosis drug therapy. Unlike other antimicrobial agents, PZA is active against Mycobacterium tuberculosis only at low pH. The basis for this conditional drug susceptibility remains undefined. In this study, we utilized a genome-wide approach to interrogate potentiation of PZA action. We found that mutations in numerous genes involved in central metabolism as well as cell envelope maintenance and stress response are associated with PZA resistance. Further, we demonstrate that constitutive activation of the cell envelope stress response can drive PZA susceptibility independent of environmental pH. Consequently, exposure to peptidoglycan synthesis inhibitors, such as beta-lactams and d-cycloserine, potentiates PZA action through triggering this response. These findings illuminate a regulatory mechanism for conditional PZA susceptibility and reveal new avenues for enhancing potency of this important drug through targeting activation of the cell envelope stress response. IMPORTANCE For decades, pyrazinamide has served as a cornerstone of tuberculosis therapy. Unlike any other antitubercular drug, pyrazinamide requires an acidic environment to exert its action. Despite its importance, the driver of this conditional susceptibility has remained unknown. In this study, a genome-wide approach revealed that pyrazinamide action is governed by the cell envelope stress response. This observation was validated by orthologous approaches that demonstrate that a central player of this response, SigE, is both necessary and sufficient for potentiation of pyrazinamide action. Moreover, constitutive activation of this response through deletion of the anti-sigma factor gene rseA or exposure of bacilli to drugs that target the cell wall was found to potently drive pyrazinamide susceptibility independent of environmental pH. These findings force a paradigm shift in our understanding of pyrazinamide action and open new avenues for improving diagnostic and therapeutic tools for tuberculosis.


Subject(s)
Mycobacterium tuberculosis , Tuberculosis , Humans , Pyrazinamide/therapeutic use , Mycobacterium tuberculosis/genetics , Amidohydrolases/metabolism , Antitubercular Agents/pharmacology , Tuberculosis/microbiology , Mutation , Microbial Sensitivity Tests
13.
mSystems ; 6(6): e0067321, 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34726489

ABSTRACT

Accurate and timely functional genome annotation is essential for translating basic pathogen research into clinically impactful advances. Here, through literature curation and structure-function inference, we systematically update the functional genome annotation of Mycobacterium tuberculosis virulent type strain H37Rv. First, we systematically curated annotations for 589 genes from 662 publications, including 282 gene products absent from leading databases. Second, we modeled 1,711 underannotated proteins and developed a semiautomated pipeline that captured shared function between 400 protein models and structural matches of known function on Protein Data Bank, including drug efflux proteins, metabolic enzymes, and virulence factors. In aggregate, these structure- and literature-derived annotations update 940/1,725 underannotated H37Rv genes and generate hundreds of functional hypotheses. Retrospectively applying the annotation to a recent whole-genome transposon mutant screen provided missing function for 48% (13/27) of underannotated genes altering antibiotic efficacy and 33% (23/69) required for persistence during mouse tuberculosis (TB) infection. Prospective application of the protein models enabled us to functionally interpret novel laboratory generated pyrazinamide (PZA)-resistant mutants of unknown function, which implicated the emerging coenzyme A depletion model of PZA action in the mutants' PZA resistance. Our findings demonstrate the functional insight gained by integrating structural modeling and systematic literature curation, even for widely studied microorganisms. Functional annotations and protein structure models are available at https://tuberculosis.sdsu.edu/H37Rv in human- and machine-readable formats. IMPORTANCE Mycobacterium tuberculosis, the primary causative agent of tuberculosis, kills more humans than any other infectious bacterium. Yet 40% of its genome is functionally uncharacterized, leaving much about the genetic basis of its resistance to antibiotics, capacity to withstand host immunity, and basic metabolism yet undiscovered. Irregular literature curation for functional annotation contributes to this gap. We systematically curated functions from literature and structural similarity for over half of poorly characterized genes, expanding the functionally annotated Mycobacterium tuberculosis proteome. Applying this updated annotation to recent in vivo functional screens added functional information to dozens of clinically pertinent proteins described as having unknown function. Integrating the annotations with a prospective functional screen identified new mutants resistant to a first-line TB drug, supporting an emerging hypothesis for its mode of action. These improvements in functional interpretation of clinically informative studies underscore the translational value of this functional knowledge. Structure-derived annotations identify hundreds of high-confidence candidates for mechanisms of antibiotic resistance, virulence factors, and basic metabolism and other functions key in clinical and basic tuberculosis research. More broadly, they provide a systematic framework for improving prokaryotic reference annotations.

14.
Elife ; 9, 2020 10 27.
Article in English | MEDLINE | ID: mdl-33107429

ABSTRACT

This study assembles DNA adenine methylomes for 93 Mycobacterium tuberculosis complex (MTBC) isolates from seven lineages paired with fully-annotated, finished, de novo assembled genomes. Integrative analysis yielded four key results. First, methyltransferase allele-methylome mapping corrected methyltransferase variant effects previously obscured by reference-based variant calling. Second, heterogeneity analysis of partially active methyltransferase alleles revealed that intracellular stochastic methylation generates a mosaic of methylomes within isogenic cultures, which we formalize as 'intercellular mosaic methylation' (IMM). Mutation-driven IMM was nearly ubiquitous in the globally prominent Beijing sublineage. Third, promoter methylation is widespread and associated with differential expression in the ΔhsdM transcriptome, suggesting promoter HsdM-methylation directly influences transcription. Finally, comparative and functional analyses identified 351 sites hypervariable across isolates and numerous putative regulatory interactions. This multi-omic integration revealed features of methylomic variability in clinical isolates and provides a rational basis for hypothesizing the functions of DNA adenine methylation in MTBC physiology and adaptive evolution.


Subject(s)
Adenine/metabolism , DNA Methylation , Epigenome , Genetic Variation , Mycobacterium tuberculosis/genetics , Mutation , Mycobacterium tuberculosis/metabolism
15.
BMC Genet ; 10: 28, 2009 Jun 19.
Article in English | MEDLINE | ID: mdl-19545374

ABSTRACT

BACKGROUND: The Isolation by Distance Web Service (IBDWS) is a user-friendly web interface for analyzing patterns of isolation by distance in population genetic data. IBDWS enables researchers to perform a variety of statistical tests such as Mantel tests and reduced major axis regression (RMA), and returns vector based graphs. The more than 60 citations since 2005 confirm the popularity and utility of this website. Despite its usefulness, the data sets with over 65 populations can take hours or days to complete due to the computational intensity of the statistical tests. This is especially troublesome for web-based software analysis, since users tend to expect real-time results on the order of seconds, or at most, minutes. Moreover, as genetic data continue to increase and diversify, so does the demand for more processing power. In order to increase the speed and efficiency of IBDWS, we first determined which aspects of the code were most time consuming and whether they might be amenable to improvements by parallelization or algorithmic optimization. RESULTS: Runtime tests uncovered two areas of IBDWS that consumed significant amounts of time: randomizations within the Mantel test and the RMA calculations. We found that these sections of code could be restructured and parallelized to improve efficiency. The code was first optimized by combining two similar randomization routines, implementing a Fisher-Yates shuffling algorithm, and then parallelizing those routines. Tests of the parallelization and Fisher-Yates algorithmic improvements were performed on a variety of data sets ranging from 10 to 150 populations. All tested algorithms showed runtime reductions and a very close fit to the predicted speedups based on time-complexity calculations. In the case of 150 populations with 10,000 randomizations, data were analyzed 23 times faster. 
CONCLUSION: Since the implementation of the new algorithms in late 2007, datasets have continued to increase substantially in size and many exceed the largest population sizes we used in our test sets. The fact that the website has continued to work well in "real-world" tests and receives a considerable number of new citations provides the strongest testimony to the effectiveness of our improvements. However, we expect that we will soon need to increase the number of nodes in our cluster significantly as dataset sizes continue to expand. The parallel implementation can be found at http://ibdws.sdsu.edu/.
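
The randomization step named in this abstract can be illustrated with a minimal Python sketch. It is not the IBDWS code: the function names (`fisher_yates_shuffle`, `mantel_test`), the use of Pearson correlation as the test statistic, the one-sided p-value, and the serial loop are all assumptions made for illustration. It shows only how a Fisher-Yates shuffle can drive the permutations of a Mantel test.

```python
import random
from math import sqrt


def fisher_yates_shuffle(items, rng):
    """Shuffle a list in place in O(n); every permutation is equally likely."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        items[i], items[j] = items[j], items[i]
    return items


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal entries of a symmetric matrix after relabeling rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, randomizations=1000, seed=0):
    """Return (observed r, one-sided permutation p-value).

    Hypothetical helper: permutes the population labels of one distance
    matrix and counts how often the permuted correlation reaches the
    observed one.
    """
    rng = random.Random(seed)
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    hits = 0
    for _ in range(randomizations):
        perm = fisher_yates_shuffle(list(range(n)), rng)
        if pearson(upper_triangle(genetic, perm), geo) >= observed:
            hits += 1
    return observed, (hits + 1) / (randomizations + 1)


# Toy data (invented): four populations placed on a line.
positions = [0.0, 1.0, 3.0, 7.0]
distances = [[abs(a - b) for b in positions] for a in positions]
r, p = mantel_test(distances, distances, randomizations=200)
```

Each iteration of the loop is independent of the others, which is why this kind of routine lends itself to the parallelization the abstract describes; the sketch itself stays serial.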


Subject(s)
Computational Biology/methods , Internet , User-Computer Interface , Algorithms , Genetics, Population , Software
16.
Nucleic Acids Res ; 35(1): 21-34, 2007.
Article in English | MEDLINE | ID: mdl-17148477

ABSTRACT

In animals, most small nuclear RNAs (snRNAs) are synthesized by RNA polymerase II (Pol II), but U6 snRNA is synthesized by RNA polymerase III (Pol III). In Drosophila melanogaster, the promoters for the Pol II-transcribed snRNA genes consist of an approximately 21 bp PSEA and an approximately 8 bp PSEB. U6 genes utilize a PSEA but have a TATA box instead of the PSEB. The PSEAs of the two classes of genes bind the same protein complex, DmSNAPc. However, the PSEAs that recruit Pol II and Pol III differ in sequence at a few nucleotide positions that play an important role in determining RNA polymerase specificity. We have now performed a bioinformatic analysis to examine the conservation and divergence of the snRNA gene promoter elements in other species of insects. The 5' half of the PSEA is well-conserved, but the 3' half is divergent. Moreover, within each species positions exist where the PSEAs of the Pol III-transcribed genes differ from those of the Pol II-transcribed genes. Interestingly, the specific positions vary among species. Nevertheless, we speculate that these nucleotide differences within the 3' half of the PSEA act similarly to induce conformational alterations in DNA-bound SNAPc that result in RNA polymerase specificity.


Subject(s)
Evolution, Molecular , Genes, Insect , Promoter Regions, Genetic , RNA Polymerase III/metabolism , RNA Polymerase II/metabolism , RNA, Small Nuclear/genetics , Animals , Anopheles/genetics , Base Sequence , Bees/genetics , Bombyx/genetics , Conserved Sequence , Drosophila/genetics , Drosophila melanogaster/genetics , Molecular Sequence Data , Sequence Alignment , Substrate Specificity
17.
Brain Connect ; 9(8): 604-612, 2019 10.
Article in English | MEDLINE | ID: mdl-31328535

ABSTRACT

Machine learning techniques have been implemented to reveal brain features that distinguish people with autism spectrum disorders (ASDs) from typically developing (TD) peers. However, it remains unknown whether different neuroimaging modalities are equally informative for diagnostic classification. We combined anatomical magnetic resonance imaging (aMRI), diffusion weighted imaging (DWI), and functional connectivity MRI (fcMRI) using conditional random forest (CRF) for supervised learning to compare how informative each modality was in diagnostic classification. In-house data (N = 93) included 47 TD and 46 ASD participants, matched on age, motion, and nonverbal IQ. Four main analyses consistently indicated that fcMRI variables were significantly more informative than anatomical variables from aMRI and DWI. This was found (1) when the top 100 variables from CRF (run separately in each modality) were combined for multimodal CRF; (2) when only 19 top variables reaching >67% accuracy in each modality were combined in multimodal CRF; and (3) when the large number of initial variables (before dimension reduction) potentially biasing comparisons in favor of fcMRI was reduced using a less granular region of interest scheme. Consistent superiority of fcMRI was even found (4) when 100 variables per modality were randomly selected, removing any such potential bias. Greater informative value of functional than anatomical modalities may relate to the nature of fcMRI data, reflecting more closely behavioral condition, which is also the basis of diagnosis, whereas brain anatomy may be more reflective of neurodevelopmental history.


Subject(s)
Autism Spectrum Disorder/diagnostic imaging , Brain/diagnostic imaging , Magnetic Resonance Imaging , Adolescent , Autism Spectrum Disorder/physiopathology , Brain/physiopathology , Child , Cohort Studies , Connectome , Diagnosis, Computer-Assisted , Female , Humans , Male , Neural Pathways/diagnostic imaging , Neural Pathways/physiopathology , Supervised Machine Learning
19.
Sci Rep ; 9(1): 4474, 2019 03 14.
Article in English | MEDLINE | ID: mdl-30872748

ABSTRACT

Tuberculosis (TB) represents a significant challenge to public health authorities, especially with the emergence of drug-resistant (DR) and multidrug-resistant (MDR) isolates of Mycobacterium tuberculosis. We sought to examine the genomic variations among recently isolated strains of M. tuberculosis in two closely related countries with different population demography in the Middle East. Clinical isolates of M. tuberculosis from both Egypt and Saudi Arabia were subjected to phenotypic and genotypic analysis on gene and genome-wide levels. Isolates with MDR phenotypes were highly prevalent in Egypt (up to 35%) despite its relatively stable population structure (sympatric pattern). MDR-TB isolates were not identified among the isolates from Saudi Arabia despite its active guest worker program (allopatric pattern). However, tuberculosis isolates from Saudi Arabia, where lineage 4 was more prevalent (>65%), showed more diversity than isolates from Egypt, where lineage 3 was the most prevalent (>75%). Phylogenetic and molecular dating analyses indicated that lineages from Egypt had diverged recently (~78 years), whereas those from Saudi Arabia had diverged over 200 years ago. Interestingly, DR isolates did not appear to cluster together or spread more widely than drug-sensitive isolates, suggesting poor treatment as the main cause for emergence of drug resistance rather than greater virulence or greater capacity to persist.


Subject(s)
Drug Resistance, Bacterial , Mycobacterium tuberculosis/classification , Tuberculosis, Multidrug-Resistant/epidemiology , Whole Genome Sequencing/methods , Adolescent , Adult , Aged , Child , Child, Preschool , Egypt/epidemiology , Female , Humans , Infant , Male , Middle Aged , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/isolation & purification , Phylogeny , Prevalence , Saudi Arabia/epidemiology , Tuberculosis, Multidrug-Resistant/microbiology , Young Adult
20.
BMC Bioinformatics ; 9: 232, 2008 May 08.
Article in English | MEDLINE | ID: mdl-18466625

ABSTRACT

BACKGROUND: Utilization of alternative initiation sites for protein translation directed by non-AUG codons in mammalian mRNAs is observed with increasing frequency. Alternative initiation sites are utilized for the synthesis of important regulatory proteins that control distinct biological functions. It is, therefore, of high significance to define the parameters that allow accurate bioinformatic prediction of alternative translation initiation sites (aTIS). This study has investigated 5'-UTR regions of mRNAs to define consensus sequence properties and structural features that allow identification of alternative initiation sites for protein translation. RESULTS: Bioinformatic evaluation of 5'-UTR sequences of mammalian mRNAs was conducted for classification and identification of alternative translation initiation sites for a group of mRNA sequences that have been experimentally demonstrated to utilize alternative non-AUG initiation sites for protein translation. These are represented by the codons CUG, GUG, UUG, AUA, and ACG for aTIS. The first phase of this bioinformatic analysis implements a classification tree that evaluated 5'-UTRs for unique consensus sequence features near the initiation codon, characteristics of 5'-UTR nucleotide sequences, and secondary structural features in a decision tree that categorizes mRNAs into those with potential aTIS, and those without. The second phase addresses identification of the aTIS codon and its location. Critical parameters of 5'-UTRs were assessed by an Artificial Neural Network (ANN) for identification of the aTIS codon and its location. ANNs have previously been used for the purpose of AUG start site prediction and are applicable in complex. 
ANN analyses demonstrated that multiple properties were required for predicting aTIS codons; these properties included unique consensus nucleotide sequences at positions -7 and -6 combined with positions -3 and +4, 5'-UTR length, ORF length, predicted secondary structures, free energy features, upstream AUGs, and G/C ratio. Importantly, combined results of the classification tree and the ANN analyses provided highly accurate bioinformatic predictions of alternative translation initiation sites. CONCLUSION: This study has defined the unique properties of 5'-UTR sequences of mRNAs for successful bioinformatic prediction of alternative initiation sites utilized in protein translation. The ability to define aTIS through the described bioinformatic analyses can be of high importance for genomic analyses to provide full predictions of translated mammalian and human gene products required for cellular functions in health and disease.
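
Several of the 5'-UTR properties listed in this abstract can be computed directly from a sequence, as in the Python sketch below. It is not the authors' pipeline: the function name `utr_features`, the dictionary keys, and the choice to report G/C as a fraction of the 5'-UTR are assumptions made for illustration. Positions follow the usual convention in which +1 is the first nucleotide of the candidate initiation codon and -1 is the nucleotide immediately upstream, with no position 0. Secondary-structure and free-energy features are omitted because they require a folding tool.

```python
def utr_features(mrna, start):
    """Collect simple sequence features around a candidate initiation codon.

    mrna  : transcript sequence (DNA or RNA letters)
    start : 0-based index of the first nucleotide of the candidate codon
    """
    mrna = mrna.upper().replace("T", "U")
    utr = mrna[:start]

    def base_at(pos):
        # Map a relative position (+1 = first codon base, no position 0)
        # to a 0-based index; return None when it falls off the sequence.
        idx = start + pos if pos < 0 else start + pos - 1
        return mrna[idx] if 0 <= idx < len(mrna) else None

    # Count upstream AUGs, allowing overlapping matches.
    upstream_augs = sum(1 for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG")

    # Assumption: "G/C ratio" is taken here as the G+C fraction of the 5'-UTR.
    gc_fraction = (utr.count("G") + utr.count("C")) / len(utr) if utr else 0.0

    return {
        "codon": mrna[start:start + 3],
        "pos_-7": base_at(-7),
        "pos_-6": base_at(-6),
        "pos_-3": base_at(-3),
        "pos_+4": base_at(4),
        "utr_length": len(utr),
        "upstream_augs": upstream_augs,
        "gc_fraction": gc_fraction,
    }


# Invented toy transcript: a 13-nt 5'-UTR, a CUG candidate codon, then GAA.
features = utr_features("GCAUGCCGCCACC" + "CUG" + "GAA", start=13)
```

A dictionary like this could serve as one input row for a classification tree or neural network of the kind the abstract describes, although the actual feature encoding used in the study is not given here.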


Subject(s)
5' Untranslated Regions/genetics , Models, Genetic , RNA Splice Sites/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Transcription Initiation Site , Animals , Computational Biology/methods , Computer Simulation , Humans