Results 1-20 of 59
1.
Bioinformatics ; 39(12), 2023 12 01.
Article in English | MEDLINE | ID: mdl-38039142

ABSTRACT

MOTIVATION: Microbial sequences generated from clinical samples are often contaminated with human host sequences that must be removed for ethical and legal reasons. Care must be taken to excise host sequences without inadvertently removing target microbial sequences to the detriment of downstream analyses such as variant calling and de novo assembly. RESULTS: To facilitate accurate host decontamination of both short and long sequencing reads, we developed Hostile, a tool capable of accurate host read removal using a laptop. We demonstrate that our approach removes at least 99.6% of real human reads and retains at least 99.989% of simulated bacterial reads. Using Hostile with a masked reference genome further increases bacterial read retention (≥99.997%) with negligible (≤0.001%) reduction in human read removal performance. Compared with an existing tool, Hostile removes 21%-23% more human short reads and 21-43 times fewer bacterial reads, typically in less time. AVAILABILITY AND IMPLEMENTATION: Hostile is implemented as an MIT-licensed Python package available from https://github.com/bede/hostile together with supplementary material.
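The two benchmark figures reported here (share of real human reads removed, share of simulated bacterial reads retained) are simple proportions over reads whose origin is known. A minimal Python sketch of how such figures can be computed, assuming the true origin of every read is known and the tool's output has been reduced to a set of discarded read IDs; the function name and read IDs are hypothetical, and this is not code from Hostile.

```python
def decontamination_metrics(origins: dict[str, str], removed: set[str]) -> tuple[float, float]:
    """Return (% of human reads removed, % of bacterial reads retained).

    origins: read ID -> "human" or "bacterial" (hypothetical ground-truth labels)
    removed: IDs of the reads the host-removal step discarded
    """
    human = [r for r, o in origins.items() if o == "human"]
    bacterial = [r for r, o in origins.items() if o == "bacterial"]
    human_removed = sum(r in removed for r in human)
    bacterial_kept = sum(r not in removed for r in bacterial)
    return 100 * human_removed / len(human), 100 * bacterial_kept / len(bacterial)


origins = {"h1": "human", "h2": "human", "b1": "bacterial", "b2": "bacterial",
           "b3": "bacterial", "b4": "bacterial"}
print(decontamination_metrics(origins, removed={"h1", "h2", "b4"}))  # (100.0, 75.0)
```

Both numbers are needed: a tool can trivially reach 100% human removal by discarding everything, which the retention figure would expose.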


Subject(s)
Decontamination , Software , Humans , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing , Genome , Bacteria/genetics
2.
PLoS Biol ; 19(11): e3001421, 2021 11.
Article in English | MEDLINE | ID: mdl-34752446

ABSTRACT

The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function and even anthropogenic activities such as the widespread use of antimicrobials. However, these data consist of genomes assembled with different tools and levels of quality checking, and of large volumes of completely unprocessed raw sequence data. In both cases, considerable computational effort is required before biological questions can be addressed. Here, we assembled and characterised 661,405 bacterial genomes retrieved from the European Nucleotide Archive (ENA) in November of 2018 using a uniform standardised approach. Of these, 311,006 did not previously have an assembly. We produced a searchable COmpact Bit-sliced Signature (COBS) index, facilitating the easy interrogation of the entire dataset for a specific sequence (e.g., gene, mutation, or plasmid). Additional MinHash and pp-sketch indices support genome-wide comparisons and estimations of genomic distance. Combined, this resource will allow data to be easily subset and searched, phylogenetic relationships between genomes to be quickly elucidated, and hypotheses rapidly generated and tested. We believe that this combination of uniform processing and variety of search/filter functionalities will make this a resource of very wide utility. In terms of diversity within the data, a breakdown of the 639,981 high-quality genomes emphasised the uneven species composition of the ENA/public databases, with just 20 of the total 2,336 species making up 90% of the genomes. The overrepresented species tend to be acute/common human pathogens, aligning with research priorities at different levels from individual interests to funding bodies and national and global public health agencies.
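The MinHash indices mentioned above estimate genomic distance from small "sketches" of each genome's k-mer content. A minimal Python sketch of the general technique, assuming forward-strand k-mers only (real tools typically use canonical k-mers), blake2b as the hash function, and a bottom-s sketch; the distance transform is the one popularised by the Mash tool. This is illustrative and not the code used to build the resource.

```python
import hashlib
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All k-mers of seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq: str, k: int = 21, s: int = 1000) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest 64-bit hash values of the k-mer set."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmer_set(seq, k)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int = 1000) -> float:
    """Fraction of the s smallest hashes of the union that occur in both sketches."""
    union_bottom = sorted(set(a) | set(b))[:s]
    shared = set(a) & set(b)
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def mash_distance(j: float, k: int = 21) -> float:
    """Mash-style distance D = -ln(2j / (1 + j)) / k; infinite if nothing is shared."""
    return math.inf if j == 0 else -math.log(2 * j / (1 + j)) / k


# Hypothetical toy "genomes": genome_b differs from genome_a in a 7-bp window.
genome_a = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGCATCGGATCCA"
genome_b = genome_a[:30] + "TTGACCA" + genome_a[37:]
j = jaccard_estimate(minhash_sketch(genome_a, k=7), minhash_sketch(genome_b, k=7))
print(j, mash_distance(j, k=7))
```

The appeal for a collection of this size is that each sketch has a fixed size s regardless of genome length, so pairwise comparisons stay cheap.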


Subject(s)
Bacteria/genetics , Biodiversity , DNA, Bacterial/genetics , Data Curation , Base Sequence , Drug Resistance, Bacterial/genetics , Species Specificity
3.
Bioinformatics ; 38(12): 3291-3293, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35551365

ABSTRACT

SUMMARY: Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matching the SARS-CoV-2 genome. Peak RAM usage is typically below 10 MB, and runtime less than 1 min. We show that by excluding the polyA tail from the viral reference, ReadItAndKeep prevents bleed-through of human reads, whereas mapping to the human genome lets some reads escape. We believe our test approach (including all possible reads from the human genome, human samples from each of the 26 populations in the 1000 genomes data and a diverse set of SARS-CoV-2 genomes) will also be useful for others. AVAILABILITY AND IMPLEMENTATION: ReadItAndKeep is implemented in C++, released under the MIT license, and available from https://github.com/GenomePathogenAnalysisService/read-it-and-keep. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
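Why a poly(A) tail in the viral reference lets human reads through can be shown with a toy filter. The sketch below swaps in simple exact k-mer matching (not ReadItAndKeep's actual method, which the abstract does not detail): a read is kept only if enough of its k-mers occur in the reference. The values of k, min_fraction, min_run and the toy sequences are all hypothetical.

```python
def trim_polya(reference: str, min_run: int = 10) -> str:
    """Remove a trailing poly(A) run of at least min_run bases from the reference."""
    stripped = reference.rstrip("A")
    return stripped if len(reference) - len(stripped) >= min_run else reference


def kmer_index(reference: str, k: int) -> set[str]:
    """Set of all k-mers in the reference."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}


def keep_read(read: str, index: set[str], k: int, min_fraction: float = 0.5) -> bool:
    """Keep a read only if at least min_fraction of its k-mers occur in the reference."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not read_kmers:
        return False
    hits = sum(km in index for km in read_kmers)
    return hits / len(read_kmers) >= min_fraction


# Toy data (hypothetical): a short "viral" reference ending in a poly(A) tail.
reference = "ACGTGCTAGCTAGGATCCGT" + "A" * 20
full_index = kmer_index(reference, k=5)
trimmed_index = kmer_index(trim_polya(reference), k=5)
human_like_read = "A" * 15  # stands in for a poly(A)-rich human read
print(keep_read(human_like_read, full_index, k=5))     # True: bleeds through
print(keep_read(human_like_read, trimmed_index, k=5))  # False: filtered out
```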


Subject(s)
COVID-19 , Software , Humans , Sequence Analysis, DNA , SARS-CoV-2/genetics , Decontamination , High-Throughput Nucleotide Sequencing , Genome, Human
4.
N Engl J Med ; 379(15): 1403-1415, 2018 10 11.
Article in English | MEDLINE | ID: mdl-30280646

ABSTRACT

BACKGROUND: The World Health Organization recommends drug-susceptibility testing of Mycobacterium tuberculosis complex for all patients with tuberculosis to guide treatment decisions and improve outcomes. Whether DNA sequencing can be used to accurately predict profiles of susceptibility to first-line antituberculosis drugs has not been clear. METHODS: We obtained whole-genome sequences and associated phenotypes of resistance or susceptibility to the first-line antituberculosis drugs isoniazid, rifampin, ethambutol, and pyrazinamide for isolates from 16 countries across six continents. For each isolate, mutations associated with drug resistance and drug susceptibility were identified across nine genes, and individual phenotypes were predicted unless mutations of unknown association were also present. To identify how whole-genome sequencing might direct first-line drug therapy, complete susceptibility profiles were predicted. These profiles were predicted to be susceptible to all four drugs (i.e., pansusceptible) if they were predicted to be susceptible to isoniazid and to the other drugs or if they contained mutations of unknown association in genes that affect susceptibility to the other drugs. We simulated the way in which the negative predictive value changed with the prevalence of drug resistance. RESULTS: A total of 10,209 isolates were analyzed. The largest proportion of phenotypes was predicted for rifampin (9660 [95.4%] of 10,130) and the smallest was predicted for ethambutol (8794 [89.8%] of 9794). Resistance to isoniazid, rifampin, ethambutol, and pyrazinamide was correctly predicted with 97.1%, 97.5%, 94.6%, and 91.3% sensitivity, respectively, and susceptibility to these drugs was correctly predicted with 99.0%, 98.8%, 93.6%, and 96.8% specificity. Of the 7516 isolates with complete phenotypic drug-susceptibility profiles, 5865 (78.0%) had complete genotypic predictions, among which 5250 profiles (89.5%) were correctly predicted. Among the 4037 phenotypic profiles that were predicted to be pansusceptible, 3952 (97.9%) were correctly predicted. CONCLUSIONS: Genotypic predictions of the susceptibility of M. tuberculosis to first-line drugs were found to be correlated with phenotypic susceptibility to these drugs. (Funded by the Bill and Melinda Gates Foundation and others.)
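The study simulated how the negative predictive value (NPV) changes with the prevalence of resistance. The underlying relationship is Bayes' rule; a minimal Python sketch, treating "resistant" as the positive class and assuming sensitivity and specificity stay fixed as prevalence varies. It uses the isoniazid figures from the abstract as illustrative inputs and is not the authors' simulation.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: P(truly susceptible | predicted susceptible).

    "Positive" means resistant, so prevalence is the prevalence of resistance.
    """
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Isoniazid figures reported in the abstract: 97.1% sensitivity, 99.0% specificity.
for prevalence in (0.01, 0.05, 0.10, 0.30):
    print(f"resistance prevalence {prevalence:.0%}: NPV {npv(0.971, 0.990, prevalence):.4f}")
```

As resistance becomes more common, a fixed false-negative rate produces a larger share of missed resistant isolates among "susceptible" predictions, so NPV falls.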


Subject(s)
Antitubercular Agents/pharmacology , Drug Resistance, Bacterial/genetics , Genome, Bacterial , Mycobacterium tuberculosis/genetics , Tuberculosis/drug therapy , Whole Genome Sequencing , Antitubercular Agents/therapeutic use , Ethambutol/pharmacology , Genotype , Humans , Isoniazid/pharmacology , Microbial Sensitivity Tests , Mutation , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/isolation & purification , Phenotype , Pyrazinamide/pharmacology , Rifampin/pharmacology , Tuberculosis/microbiology
5.
Cardiol Young ; 31(2): 229-232, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33153502

ABSTRACT

BACKGROUND: A 10% prevalence of intracranial aneurysms in patients with coarctation of the aorta has been described in a few studies. Our objective is to describe the rate of intracranial aneurysm detection in patients with coarctation of the aorta in the current era. We hypothesise that, with earlier detection and coarctation of the aorta intervention, the rate of intracranial aneurysm is lower than previously reported and screening imaging may only be warranted in older patients or patients with certain risk factors. METHODS: This is a retrospective study of 102 patients aged 13 years and older with coarctation who underwent brain computed tomography angiography, magnetic resonance imaging (MRI), or magnetic resonance angiography between January, 2000 and February, 2018. RESULTS: The median age of coarctation repair was 4.4 months (2 days-47 years) and the initial repair was primarily surgical (90.2%). There were 11 former smokers, 4 current smokers, and 13 patients with ongoing hypertension. Imaging modalities included computed tomography angiography (13.7%), MRI (41.2%), and magnetic resonance angiography (46.1%), performed at a median age of 33.3 years, 22.4 years, and 25 years, respectively. There were 42 studies performed for screening, 48 studies performed for neurologic symptoms, and 12 studies performed for both screening and symptoms. There were no intracranial aneurysms detected in this study. CONCLUSIONS: These results suggest that the rate of intracranial aneurysms may be lower than previously reported and larger studies should explore the risk of intracranial aneurysms in coarctation of the aorta in the current era.
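The study detected no aneurysms in 102 patients, compared with the roughly 10% prevalence cited from earlier work. One standard way to quantify what a zero count is compatible with (a calculation not reported in the abstract) is the exact one-sided upper confidence bound, which the "rule of three" (3/n) approximates. A minimal Python sketch, assuming independent patients and imaging that detects every aneurysm present; the latter is optimistic given the mix of modalities used.

```python
def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper confidence limit for a proportion when 0 of n are positive.

    Solves (1 - p) ** n = alpha for p.
    """
    return 1 - alpha ** (1 / n)


n = 102  # patients in the study, none with a detected intracranial aneurysm
print(f"exact 95% upper bound: {zero_event_upper_bound(n):.1%}")  # 2.9%
print(f"rule of three (3/n):   {3 / n:.1%}")                       # 2.9%
```

Under those assumptions the data are compatible with a prevalence of up to about 3%, which is consistent with, though not proof of, the authors' conclusion that the current rate may be lower than previously reported.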


Subject(s)
Aortic Coarctation , Intracranial Aneurysm , Adult , Aged , Aorta , Aortic Coarctation/diagnostic imaging , Aortic Coarctation/epidemiology , Humans , Intracranial Aneurysm/diagnostic imaging , Intracranial Aneurysm/epidemiology , Prevalence , Retrospective Studies
6.
Plant J ; 100(6): 1148-1162, 2019 12.
Article in English | MEDLINE | ID: mdl-31436867

ABSTRACT

Terpenes are important compounds in plant trophic interactions. A meta-analysis of GC-MS data from a diverse range of apple (Malus × domestica) genotypes revealed that apple fruit produces a range of terpene volatiles, with the predominant terpene being the acyclic branched sesquiterpene (E,E)-α-farnesene. Four quantitative trait loci (QTLs) for α-farnesene production in ripe fruit were identified in a segregating 'Royal Gala' (RG) × 'Granny Smith' (GS) population with one major QTL on linkage group 10 co-locating with the MdAFS1 (α-farnesene synthase-1) gene. Three of the four QTLs were derived from the GS parent, which was consistent with GC-MS analysis of headspace and solvent-extracted terpenes showing that cold-treated GS apples produced higher levels of (E,E)-α-farnesene than RG. Transgenic RG fruit downregulated for MdAFS1 expression produced significantly lower levels of (E,E)-α-farnesene. To evaluate the role of (E,E)-α-farnesene in fungal pathogenesis, MdAFS1 RNA interference transgenic fruit and RG controls were inoculated with three important apple post-harvest pathogens [Colletotrichum acutatum, Penicillium expansum and Neofabraea alba (synonym Phlyctema vagabunda)]. From results obtained over four seasons, we demonstrate that reduced (E,E)-α-farnesene is associated with decreased disease initiation rates of all three pathogens. In each case, the infection rate was significantly reduced 7 days post-inoculation, although the size of successful lesions was comparable with infections on control fruit. These results indicate that (E,E)-α-farnesene production is likely to be an important factor involved in fungal pathogenesis in apple fruit.


Subject(s)
Fruit/genetics , Fruit/metabolism , Gene Expression Regulation, Plant , Malus/genetics , Malus/metabolism , Plant Diseases/immunology , Sesquiterpenes/metabolism , Colletotrichum/pathogenicity , Disease Resistance , Down-Regulation , Fungi/pathogenicity , Gas Chromatography-Mass Spectrometry , Genetic Linkage , Genotype , Penicillium/pathogenicity , Plant Diseases/microbiology , Plant Proteins/genetics , Plant Proteins/metabolism , Plants, Genetically Modified/genetics , Quantitative Trait Loci , RNA Interference/immunology , Terpenes/metabolism
7.
BMC Bioinformatics ; 19(1): 26, 2018 01 30.
Article in English | MEDLINE | ID: mdl-29382321

ABSTRACT

BACKGROUND: Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. It is unknown to what extent draft assemblies lead to errors in such analysis. RESULTS: We fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated by up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests raise questions about the robustness of previous synteny analysis when gold-standard genome sequences remain limited. In addition, assembly scaffolding using a reference-guided approach with a closely related species may result in chimeric scaffolds with inflated assembly metrics if a true evolutionary relationship was overlooked. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous. CONCLUSIONS: Our results show that a minimum N50 of 1 Mb is required for robust downstream synteny analysis, which emphasizes the importance of gold-standard genomes to the science community, and should be achieved given the current progress in sequencing technology.
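The conclusion sets a minimum N50 of 1 Mb. N50 is the length of the contig at which the running total of contig lengths, summed from the longest contig downwards, first reaches half the assembly size. A minimal Python sketch of that standard definition; the contig lengths are hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length of the contig at which the cumulative length, summing from the longest
    contig downwards, first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("assembly has no contigs")


MIN_N50 = 1_000_000  # 1 Mb threshold from the abstract's conclusion
draft = [800_000, 700_000, 600_000, 500_000, 400_000]  # hypothetical contig lengths
print(n50(draft), n50(draft) >= MIN_N50)
```

Here the two longest contigs reach 1.5 Mb, exactly half of the 3.0 Mb assembly, so N50 is 700 kb and this draft would fall short of the suggested threshold.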


Subject(s)
Genome , Genomics/methods , Algorithms , Animals , Caenorhabditis elegans/genetics , Nematoda/genetics
8.
Nucleic Acids Res ; 44(D1): D604-9, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578596

ABSTRACT

The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease.


Subject(s)
Databases, Nucleic Acid , Genome, Microbial , Metagenomics , Disease , Gastrointestinal Tract/microbiology , Gastrointestinal Tract/virology , Genes, Microbial , Humans , Internet , Metagenomics/standards , Microbiota , Molecular Sequence Annotation , Reference Standards , Sequence Analysis, DNA
9.
Plant J ; 82(6): 937-950, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25904040

ABSTRACT

Phenylpropenes, such as eugenol and trans-anethole, are important aromatic compounds that determine flavour and aroma in many herbs and spices. Some apple varieties produce fruit with a highly desirable spicy/aromatic flavour that has been attributed to the production of estragole, a methylated phenylpropene. To elucidate the molecular basis for estragole production and its contribution to ripe apple flavour and aroma we characterised a segregating population from a Royal Gala (RG, estragole producer) × Granny Smith (GS, non-producer) apple cross. Two quantitative trait loci (QTLs; accounting for 9.2 and 24.8% of the variation) on linkage group (LG) 1 and LG2 were identified that co-located with seven candidate genes for phenylpropene O-methyltransferases (MdoOMT1-7). Of these genes, only expression of MdoOMT1 on LG1 increased strongly with ethylene and could be correlated with increasing estragole production in ripening RG fruit. Transient over-expression in tobacco showed that MdoOMT1 utilised a range of phenylpropene substrates and catalysed the conversion of chavicol to estragole. Royal Gala carried two alleles (MdoOMT1a, MdoOMT1b) whilst GS appeared to be homozygous for MdoOMT1b. MdoOMT1a showed a higher affinity and catalytic efficiency towards chavicol than MdoOMT1b, which could account for the phenotypic variation at the LG1 QTL. Multiple transgenic RG lines with reduced MdoOMT1 expression produced lower levels of methylated phenylpropenes, including estragole and methyleugenol. Differences in fruit aroma could be perceived in these fruit, compared with controls, by sensory analysis. Together these results indicate that MdoOMT1 is required for the production of methylated phenylpropenes in apple and that phenylpropenes including estragole may contribute to ripe apple fruit aroma.


Subject(s)
Anisoles/metabolism , Fruit/metabolism , Malus/metabolism , Methyltransferases/metabolism , Plant Proteins/genetics , Allylbenzene Derivatives , Ethylenes/metabolism , Eugenol/analogs & derivatives , Eugenol/metabolism , Fruit/genetics , Gene Expression Regulation, Plant , Isoenzymes/genetics , Isoenzymes/metabolism , Malus/genetics , Methyltransferases/genetics , Molecular Sequence Data , Odorants , Phylogeny , Plant Proteins/metabolism , Plants, Genetically Modified , Quantitative Trait Loci
10.
Bioinformatics ; 31(22): 3691-3, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26198102

ABSTRACT

UNLABELLED: A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors. AVAILABILITY AND IMPLEMENTATION: Roary is implemented in Perl and is freely available under an open source GPLv3 license from http://sanger-pathogens.github.io/Roary CONTACT: roary@sanger.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
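Once genes have been clustered across isolates, splitting them into core and accessory genes is a question of how many isolates carry each cluster. A minimal Python sketch of only that final step, working on a presence/absence table; the clustering itself (the expensive part Roary performs) is omitted, the 99% threshold is an illustrative parameter rather than a statement of Roary's settings, and the data are hypothetical.

```python
def classify_genes(presence: dict[str, set[str]], n_isolates: int,
                   core_threshold: float = 0.99) -> tuple[list[str], list[str]]:
    """Split gene clusters into core and accessory genes.

    presence: gene cluster -> set of isolate IDs in which it was found
    A cluster is "core" if it is found in at least core_threshold of the isolates.
    """
    core, accessory = [], []
    for gene, carriers in presence.items():
        target = core if len(carriers) / n_isolates >= core_threshold else accessory
        target.append(gene)
    return sorted(core), sorted(accessory)


# Hypothetical presence/absence data for three isolates.
presence = {
    "gyrA": {"iso1", "iso2", "iso3"},
    "rpoB": {"iso1", "iso2", "iso3"},
    "blaTEM": {"iso1"},
}
print(classify_genes(presence, n_isolates=3))  # (['gyrA', 'rpoB'], ['blaTEM'])
```

Passing the isolate count explicitly, rather than inferring it from the table, avoids miscounting isolates in which no clusters were recorded.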


Subject(s)
Genome, Bacterial , Prokaryotic Cells/metabolism , Software , Computer Simulation , Databases, Genetic , Salmonella typhi/genetics
11.
Bioinformatics ; 31(14): 2374-6, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25725497

ABSTRACT

MOTIVATION: An accurate genome assembly from short read sequencing data is critical for downstream analysis, for example allowing investigation of variants within a sequenced population. However, assembling sequencing data from virus samples, especially RNA viruses, into a genome sequence is challenging due to the combination of viral population diversity and extremely uneven read depth caused by amplification bias in the inevitable reverse transcription and polymerase chain reaction amplification process of current methods. RESULTS: We developed a new de novo assembler called IVA (Iterative Virus Assembler) designed specifically for read pairs sequenced at highly variable depth from RNA virus samples. We tested IVA on datasets from 140 sequenced samples from human immunodeficiency virus-1 or influenza-virus-infected people and demonstrated that IVA outperforms all other virus de novo assemblers. AVAILABILITY AND IMPLEMENTATION: The software runs under Linux, has the GPLv3 licence and is freely available from http://sanger-pathogens.github.io/iva
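IVA is described as an iterative de novo assembler. The general idea of growing a contig iteratively from overlapping reads can be shown with a toy greedy extender. This Python sketch is not IVA's algorithm: it uses exact suffix-prefix overlaps and ignores sequencing errors, read pairing, viral diversity and the uneven depth the abstract highlights. Function name, parameters and sequences are hypothetical.

```python
def extend_right(seed: str, reads: list[str], min_overlap: int = 5, max_rounds: int = 1000) -> str:
    """Toy greedy contig extension using exact suffix-prefix overlaps.

    Each round, every read is tested for its longest prefix (at least min_overlap bases
    and shorter than the read) that matches the end of the contig; the longest resulting
    overhang is appended. Stops when no read extends the contig.
    """
    contig = seed
    for _ in range(max_rounds):
        best = ""
        for read in reads:
            for olen in range(min(len(read) - 1, len(contig)), min_overlap - 1, -1):
                if contig.endswith(read[:olen]):
                    if len(read) - olen > len(best):
                        best = read[olen:]
                    break  # longest overlap for this read found
        if not best:
            break
        contig += best
    return contig


genome = "AACCGGTTACGATCGTAGCA"  # hypothetical target sequence
reads = [genome[5:15], genome[10:20]]
print(extend_right(genome[:10], reads) == genome)  # True
```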


Subject(s)
Genome, Viral , HIV-1/genetics , Influenza A virus/genetics , Influenza B virus/genetics , RNA Viruses/genetics , Sequence Analysis, DNA/methods , Software , HIV Infections/genetics , HIV Infections/virology , HIV-1/isolation & purification , High-Throughput Nucleotide Sequencing/methods , Humans , Influenza A virus/isolation & purification , Influenza B virus/isolation & purification , Influenza, Human/genetics , Influenza, Human/virology , Polymerase Chain Reaction/methods
12.
Plant J ; 78(6): 903-15, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24661745

ABSTRACT

The 'fruity' attributes of ripe apples (Malus × domestica) arise from our perception of a combination of volatile ester compounds. Phenotypic variability in ester production was investigated using a segregating population from a 'Royal Gala' (RG; high ester production) × 'Granny Smith' (GS; low ester production) cross, as well as in transgenic RG plants in which expression of the alcohol acyl transferase 1 (AAT1) gene was reduced. In the RG × GS population, 46 quantitative trait loci (QTLs) for the production of esters and alcohols were identified on 15 linkage groups (LGs). The major QTL for 35 individual compounds was positioned on LG2 and co-located with AAT1. Multiple AAT1 gene variants were identified in RG and GS, but only two (AAT1-RGa and AAT1-GSa) were functional. AAT1-RGa and AAT1-GSa were both highly expressed in the cortex and skin of ripe fruit, but AAT1 protein was observed mainly in the skin. Transgenic RG lines with specifically reduced AAT1 expression showed reduced levels of most key esters in ripe fruit. Differences in the ripe fruit aroma could be perceived by sensory analysis. The transgenic lines also showed altered ratios of biosynthetic precursor alcohols and aldehydes, and expression of a number of ester biosynthetic genes increased, presumably in response to the increased substrate pool. These results indicate that the AAT1 locus is critical for the biosynthesis of esters contributing to a 'ripe apple' flavour.


Subject(s)
Acetyltransferases/genetics , Esters/metabolism , Malus/genetics , Plant Proteins/genetics , Quantitative Trait Loci , Acetyltransferases/metabolism , Acetyltransferases/physiology , Chromosome Mapping , Down-Regulation , Genetic Association Studies , Genetic Linkage , Genetic Variation , Malus/metabolism , Molecular Sequence Data , Plant Proteins/metabolism , Plant Proteins/physiology , Plants, Genetically Modified/metabolism
13.
BMC Biol ; 12: 86, 2014 Oct 30.
Article in English | MEDLINE | ID: mdl-25359557

ABSTRACT

BACKGROUND: Rodent malaria parasites (RMP) are used extensively as models of human malaria. Draft RMP genomes have been published for Plasmodium yoelii, P. berghei ANKA (PbA) and P. chabaudi AS (PcAS). Although the availability of these genomes made a significant impact on recent malaria research, these genomes were highly fragmented and were annotated with little manual curation. The fragmented nature of the genomes has hampered genome-wide analysis of Plasmodium gene regulation and function. RESULTS: We have greatly improved the genome assemblies of PbA and PcAS, newly sequenced the virulent parasite P. yoelii YM genome, sequenced additional RMP isolates/lines and have characterized genotypic diversity within RMP species. We have produced RNA-seq data and utilised it to improve gene-model prediction and to provide quantitative genome-wide data on gene expression. Comparison of the RMP genomes with the genome of the human malaria parasite P. falciparum and RNA-seq mapping permitted gene annotation at base-pair resolution. Full-length chromosomal annotation permitted a comprehensive classification of all subtelomeric multigene families including the 'Plasmodium interspersed repeat genes' (pir). Phylogenetic classification of the pir family, combined with pir expression patterns, indicates functional diversification within this family. CONCLUSIONS: Complete RMP genomes, RNA-seq and genotypic diversity data are excellent and important resources for gene-function and post-genomic analyses and to better interrogate Plasmodium biology. Genotypic diversity between P. chabaudi isolates makes this species an excellent parasite to study genotype-phenotype relationships. The improved classification of multigene families will enhance studies on the role of (variant) exported proteins in virulence and immune evasion/modulation.


Subject(s)
Gene Expression , Genome, Protozoan , Plasmodium falciparum/genetics , Plasmodium/classification , Base Sequence , Chromosome Mapping , Gene Expression Regulation , Genotype , Molecular Sequence Data , Multigene Family , Plasmodium/genetics , Plasmodium falciparum/classification , RNA, Protozoan/genetics , Sequence Analysis, RNA , Transcriptome/genetics
14.
Lancet Microbe ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38851206

ABSTRACT

BACKGROUND: The antibiotic bedaquiline is a key component of new WHO regimens for drug-resistant tuberculosis; however, predicting bedaquiline resistance from bacterial genotypes remains challenging. We aimed to understand the genetic mechanisms of bedaquiline resistance by analysing Mycobacterium tuberculosis isolates from South Africa. METHODS: For this genomic analysis, we conducted whole-genome sequencing of Mycobacterium tuberculosis samples collected at two referral laboratories in Cape Town and Johannesburg, covering regions of South Africa with a high prevalence of tuberculosis. We used the tool ARIBA to measure the status of predefined genes that are associated with bedaquiline resistance. To produce a broad genetic landscape of M tuberculosis in South Africa, we extended our analysis to include all publicly available isolates from the European Nucleotide Archive, including isolates obtained by the CRyPTIC consortium, for which minimum inhibitory concentrations of bedaquiline were available. FINDINGS: Between Jan 10, 2019, and July 22, 2020, we sequenced 505 M tuberculosis isolates from 461 patients. Of the 64 isolates with mutations within the mmpR5 regulatory gene, we found that 53 (83%) had independent acquisition of 31 different mutations, with a particular enrichment of truncated MmpR5 in bedaquiline-resistant isolates resulting from either frameshift mutations or the introduction of an insertion element. Truncations occurred across three M tuberculosis lineages and were present in 66% of bedaquiline-resistant isolates. Although the distributions overlapped, the median minimum inhibitory concentration of bedaquiline was 0·25 mg/L (IQR 0·12-0·25) in mmpR5-disrupted isolates, compared with 0·06 mg/L (0·03-0·06) in wild-type M tuberculosis. INTERPRETATION: Reduction in the susceptibility of M tuberculosis to bedaquiline has evolved repeatedly across the phylogeny. In our data, we see no evidence that this reduction has led to the spread of a successful strain in South Africa. Binary phenotyping based on the bedaquiline breakpoint might be inappropriate to monitor resistance to this drug. We recommend the use of minimum inhibitory concentrations in addition to MmpR5 truncation screening to identify moderate increases in resistance to bedaquiline. FUNDING: US Centers for Disease Control and Prevention.
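The recommended MmpR5 truncation screening rests on detecting a premature stop, which frameshifts often introduce. A minimal Python sketch that flags a premature in-frame stop codon in an assembled coding sequence, assuming the sequence starts in frame at the start codon, that the wild-type length in codons is known, and that the usual bacterial stop codons apply. It does not model insertion elements that disrupt the gene without creating a stop, it is not how ARIBA reports gene status, and the toy sequence is hypothetical (not the real mmpR5 sequence).

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str) -> Optional[int]:
    """Codon index of the first in-frame stop codon, or None if there is none."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def is_truncated(cds: str, reference_codons: int) -> bool:
    """True if a stop codon appears before the position of the wild-type stop codon.

    reference_codons: length of the wild-type coding sequence in codons, stop included.
    """
    stop = first_stop(cds)
    return stop is not None and stop < reference_codons - 1


wild_type = "ATGCTAAGCCCGTGA"               # hypothetical 5-codon gene (ATG ... TGA)
frameshift = wild_type[:3] + wild_type[4:]  # 1-bp deletion shifts the frame: ATG TAA ...
print(is_truncated(wild_type, 5), is_truncated(frameshift, 5))  # False True
```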

15.
bioRxiv ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38746185

ABSTRACT

The SARS-CoV-2 genome occupies a unique place in infection biology: it is the most highly sequenced genome on earth (making up over 20% of public sequencing datasets) with fine-scale information on sampling date and geography, and has been subject to unprecedentedly intense analysis. As a result, these phylogenetic data are an incredibly valuable resource for science and public health. However, the vast majority of the data was sequenced by tiling amplicons across the full genome, with amplicon schemes that changed over the pandemic as mutations in the viral genome interacted with primer binding sites. In combination with the disparate set of genome assembly workflows and lack of consistent quality control (QC) processes, the current genomes have many systematic errors that have evolved with the virus and amplicon schemes. These errors have significant impacts on the phylogeny, and therefore over the last few years, many thousands of hours of researchers' time have been spent in "eyeballing" trees, looking for artefacts, and then patching the tree. Given the huge value of this dataset, we therefore set out to reprocess the complete set of public raw sequence data in a rigorous amplicon-aware manner, and build a cleaner phylogeny. Here we provide a global tree of 3,960,704 samples, built from a consistently assembled set of high-quality consensus sequences from all available public data as of March 2023, viewable at https://viridian.taxonium.org. Each genome was constructed using a novel assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed specifically to process amplicon sequence data, eliminating artefactual errors and masking the genome at low-quality positions. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny. Phase 2 of our project will address the fact that the data in the public archives is heavily geographically biased towards the Global North. We have therefore contributed new raw data to ENA/SRA from many countries including Ghana, Thailand, Laos, Sri Lanka, India, Argentina and Singapore. We will incorporate these, along with all public raw data submitted between March 2023 and the current day, into an updated set of assemblies and phylogeny. We hope the tree, consensus sequences and Viridian will be a valuable resource for researchers.
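Masking low-quality positions means replacing unreliable consensus bases with the ambiguity code N rather than emitting a possibly wrong base. A minimal Python sketch, using per-position read depth as a stand-in for "quality"; the abstract does not say which criteria Viridian applies, the depth threshold of 20 is hypothetical, and this is not Viridian's code.

```python
def mask_low_depth(consensus: str, depth: list[int], min_depth: int = 20,
                   mask_char: str = "N") -> str:
    """Replace consensus bases supported by fewer than min_depth reads with mask_char."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(base if d >= min_depth else mask_char
                   for base, d in zip(consensus, depth))


print(mask_low_depth("ACGTAC", [50, 5, 30, 0, 20, 19]))  # ANGNAN
```

Masking trades completeness for correctness: an N is treated as missing data by most phylogenetic tools, whereas a miscalled base can create a spurious mutation on the tree.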

16.
PLoS Pathog ; 7(9): e1002219, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21909270

ABSTRACT

Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.


Subject(s)
Plants/parasitology , Tylenchida/genetics , Amino Acid Sequence , Animals , Cell Wall/metabolism , Cellulases/genetics , Cellulases/metabolism , Evolution, Molecular , Lysosomes/genetics , Lysosomes/metabolism , Molecular Sequence Data , Neuropeptides/biosynthesis , Peptide Hydrolases/genetics , Tylenchida/growth & development
17.
Lancet Microbe ; 4(5): e358-e368, 2023 05.
Article in English | MEDLINE | ID: mdl-37003285

ABSTRACT

BACKGROUND: Bedaquiline is a core drug for the treatment of multidrug-resistant tuberculosis; however, the understanding of resistance mechanisms is poor, which is hampering rapid molecular diagnostics. Some bedaquiline-resistant mutants are also cross-resistant to clofazimine. To decipher bedaquiline and clofazimine resistance determinants, we combined experimental evolution, protein modelling, genome sequencing, and phenotypic data. METHODS: For this in-vitro and in-silico data analysis, we used a novel in-vitro evolutionary model using subinhibitory drug concentrations to select bedaquiline-resistant and clofazimine-resistant mutants. We determined bedaquiline and clofazimine minimum inhibitory concentrations and did Illumina and PacBio sequencing to characterise selected mutants and establish a mutation catalogue. This catalogue also includes phenotypic and genotypic data of a global collection of more than 14 000 clinical Mycobacterium tuberculosis complex isolates, and publicly available data. We investigated variants implicated in bedaquiline resistance by protein modelling and dynamic simulations. FINDINGS: We discerned 265 genomic variants implicated in bedaquiline resistance, with 250 (94%) variants affecting the transcriptional repressor (Rv0678) of the MmpS5-MmpL5 efflux system. We identified 40 new variants in vitro, and a new bedaquiline resistance mechanism caused by a large-scale genomic rearrangement. Additionally, we identified in vitro 15 (7%) of 208 mutations found in clinical bedaquiline-resistant isolates. From our in-vitro work, we detected 14 (16%) of 88 mutations so far identified as being associated with clofazimine resistance and also seen in clinically resistant strains, and catalogued 35 new mutations. Structural modelling of Rv0678 showed four major mechanisms of bedaquiline resistance: impaired DNA binding, reduction in protein stability, disruption of protein dimerisation, and alteration in affinity for its fatty acid ligand. INTERPRETATION: Our findings advance the understanding of drug resistance mechanisms in M tuberculosis complex strains. We have established an extended mutation catalogue, comprising variants implicated in resistance and susceptibility to bedaquiline and clofazimine. Our data emphasise that genotypic testing can delineate clinical isolates with borderline phenotypes, which is essential for the design of effective treatments. FUNDING: Leibniz ScienceCampus Evolutionary Medicine of the Lung, Deutsche Forschungsgemeinschaft, Research Training Group 2501 TransEvo, Rhodes Trust, Stanford University Medical Scientist Training Program, National Institute for Health and Care Research Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Bill & Melinda Gates Foundation, Wellcome Trust, and Marie Sklodowska-Curie Actions.


Subject(s)
Clofazimine , Mycobacterium tuberculosis , Clofazimine/pharmacology , Clofazimine/therapeutic use , Mycobacterium tuberculosis/genetics , Antitubercular Agents/pharmacology , Antitubercular Agents/therapeutic use , Diarylquinolines/pharmacology , Diarylquinolines/therapeutic use
18.
BMC Genomics ; 13: 4, 2012 Jan 04.
Article in English | MEDLINE | ID: mdl-22216965

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) play key roles in regulating post-transcriptional gene expression and are essential for development in the free-living nematode Caenorhabditis elegans and in higher organisms. Whether microRNAs are involved in regulating developmental programs of parasitic nematodes is currently unknown. Here we describe the miRNA repertoire of two important parasitic nematodes as an essential first step in addressing this question. RESULTS: The small RNAs from larval and adult stages of two parasitic species, Brugia pahangi and Haemonchus contortus, were identified using deep-sequencing and bioinformatic approaches. Comparative analysis against known miRNA sequences reveals that the majority of these miRNAs are novel. Some novel miRNAs are abundantly expressed and display developmental regulation, suggesting important functional roles. Despite the lack of conservation in the miRNA repertoire, genomic positioning of certain miRNAs within or close to specific coding genes is remarkably conserved across diverse species, indicating selection for these associations. Endogenous small-interfering RNAs and Piwi-interacting (pi)RNAs, which regulate gene and transposon expression, were also identified. piRNAs are expressed in adult stage H. contortus, supporting a conserved role in germline maintenance in some parasitic nematodes. CONCLUSIONS: This in-depth comparative analysis of nematode miRNAs reveals the high level of divergence across species and identifies novel sequences potentially involved in development. Expression of novel miRNAs may reflect adaptations to different environments and lifestyles. Our findings provide a detailed foundation for further study of the evolution and function of miRNAs within nematodes and for identifying potential targets for intervention.


Subject(s)
Brugia pahangi/genetics , Genetic Variation , Genome, Helminth/genetics , Haemonchus/genetics , MicroRNAs/genetics , Animals , Brugia pahangi/growth & development , Caenorhabditis elegans/genetics , Caenorhabditis elegans/growth & development , Cluster Analysis , Computational Biology , Genes, Helminth , Haemonchus/growth & development , Larva/genetics , Larva/metabolism , MicroRNAs/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Sequence Analysis, RNA
19.
PLoS One ; 17(3): e0264492, 2022.
Article in English | MEDLINE | ID: mdl-35271613

ABSTRACT

Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science and engineering workflows used in research remain poorly documented and often unavailable, involving ad hoc scripts and manual steps, hindering reproducibility and stifling progress. We introduce Sim2Ls (pronounced simtools) and the Sim2L Python library that allow developers to create and share end-to-end computational workflows with well-defined and verified inputs and outputs. The Sim2L library makes Sim2Ls, their requirements, and their services discoverable, verifies inputs and outputs, and automatically stores results in a globally accessible simulation cache and results database. This simulation ecosystem is available in nanoHUB, an open platform that also provides publication services for Sim2Ls, a computational environment for developers and users, and the hardware to execute runs and store results at no cost. We exemplify the use of Sim2Ls using two applications and discuss best practices towards FAIR simulation workflows and associated data.


Subject(s)
Data Management , Ecosystem , Computer Simulation , Reproducibility of Results , Software , Workflow
20.
IEEE Access ; 10: 54301-54312, 2022.
Article in English | MEDLINE | ID: mdl-37309510

ABSTRACT

Hearing loss is a common problem affecting the quality of life for thousands of people. However, many individuals with hearing loss are dissatisfied with the quality of modern hearing aids. Amplification is the main method of compensating for hearing loss in modern hearing aids. One common amplification technique is dynamic range compression, which maps audio signals onto a person's hearing range using an amplification curve. Because of the frequency-dependent nature of the human cochlea, compression is often performed independently in different frequency bands. This paper presents a real-time multirate multiband amplification system for hearing aids, which includes a multirate channelizer for separating an audio signal into eleven standard audiometric frequency bands, and an automatic gain control system for accurate control of the steady-state and dynamic behavior of audio compression as specified by ANSI standards. The spectral channelizer offers high frequency resolution with a low latency of 5.4 ms and about a 14× improvement in complexity over a baseline design. Our automatic gain control includes a closed-form solution for satisfying any designated attack and release times for any desired compression parameters. The increased frequency resolution and precise gain adjustment allow our system to more accurately fulfill audiometric hearing aid prescriptions.
