1.
bioRxiv ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38746185

ABSTRACT

The SARS-CoV-2 genome occupies a unique place in infection biology - it is the most highly sequenced genome on earth (making up over 20% of public sequencing datasets) with fine scale information on sampling date and geography, and has been subject to unprecedented intense analysis. As a result, these phylogenetic data are an incredibly valuable resource for science and public health. However, the vast majority of the data was sequenced by tiling amplicons across the full genome, with amplicon schemes that changed over the pandemic as mutations in the viral genome interacted with primer binding sites. In combination with the disparate set of genome assembly workflows and lack of consistent quality control (QC) processes, the current genomes have many systematic errors that have evolved with the virus and amplicon schemes. These errors have significant impacts on the phylogeny, and therefore over the last few years, many thousands of hours of researchers' time have been spent in "eyeballing" trees, looking for artefacts, and then patching the tree. Given the huge value of this dataset, we therefore set out to reprocess the complete set of public raw sequence data in a rigorous amplicon-aware manner, and build a cleaner phylogeny. Here we provide a global tree of 3,960,704 samples, built from a consistently assembled set of high quality consensus sequences from all available public data as of March 2023, viewable at https://viridian.taxonium.org. Each genome was constructed using a novel assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed specifically to process amplicon sequence data, eliminating artefactual errors and masking the genome at low quality positions. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny. Phase 2 of our project will address the fact that the data in the public archives is heavily geographically biased towards the Global North. We have therefore contributed new raw data to ENA/SRA from many countries including Ghana, Thailand, Laos, Sri Lanka, India, Argentina and Singapore. We will incorporate these, along with all public raw data submitted between March 2023 and the current day, into an updated set of assemblies and phylogeny. We hope the tree, consensus sequences and Viridian will be a valuable resource for researchers.

2.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38039142

ABSTRACT

MOTIVATION: Microbial sequences generated from clinical samples are often contaminated with human host sequences that must be removed for ethical and legal reasons. Care must be taken to excise host sequences without inadvertently removing target microbial sequences to the detriment of downstream analyses such as variant calling and de novo assembly. RESULTS: To facilitate accurate host decontamination of both short and long sequencing reads, we developed Hostile, a tool capable of accurate host read removal using a laptop. We demonstrate that our approach removes at least 99.6% of real human reads and retains at least 99.989% of simulated bacterial reads. Using Hostile with a masked reference genome further increases bacterial read retention (≥99.997%) with negligible (≤0.001%) reduction in human read removal performance. Compared with an existing tool, Hostile removes 21%-23% more human short reads and 21-43 times fewer bacterial reads, typically in less time. AVAILABILITY AND IMPLEMENTATION: Hostile is implemented as an MIT-licensed Python package available from https://github.com/bede/hostile together with supplementary material.


Subject(s)
Decontamination , Software , Humans , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing , Genome , Bacteria/genetics
3.
Sci Rep ; 13(1): 8319, 2023 05 23.
Article in English | MEDLINE | ID: mdl-37221274

ABSTRACT

Asthma development and exacerbation are linked to respiratory virus infections. There is limited information regarding the presence of viruses during non-exacerbation/infection periods. We investigated the nasopharyngeal/nasal virome during an asymptomatic period, in a subset of 21 healthy and 35 asthmatic preschool children from the Predicta cohort. Using metagenomics, we described the virome ecology and the cross-species interactions within the microbiome. The virome was dominated by eukaryotic viruses, while prokaryotic viruses (bacteriophages) were independently observed with low abundance. Rhinovirus B species consistently dominated the virome in asthma. Anelloviridae were the most abundant and rich family in both health and asthma. However, their richness and alpha diversity were increased in asthma, along with the co-occurrence of different Anellovirus genera. Bacteriophages were richer and more diverse in healthy individuals. Unsupervised clustering identified three virome profiles that were correlated to asthma severity and control and were independent of treatment, suggesting a link between the respiratory virome and asthma. Finally, we observed different cross-species ecological associations in the healthy versus the asthmatic virus-bacterial interactome, and an expanded interactome of eukaryotic viruses in asthma. Upper respiratory virome "dysbiosis" appears to be a novel feature of pre-school asthma during asymptomatic/non-infectious states and merits further investigation.


Subject(s)
Anelloviridae , Asthma , Bacteriophages , Child , Humans , Child, Preschool , Eukaryota , Virome , Eukaryotic Cells , Asymptomatic Diseases
4.
Commun Biol ; 6(1): 459, 2023 04 26.
Article in English | MEDLINE | ID: mdl-37100852

ABSTRACT

The origin of embryo implantation in mammals ~148 million years ago was a dramatic shift in reproductive strategy, yet the molecular changes that established mammal implantation are largely unknown. Although progesterone receptor signalling predates the origin of mammals and is highly conserved in, and critical for, successful mammal pregnancy, it alone cannot explain the origin and subsequent diversity of implantation strategies throughout the placental mammal radiation. MiRNAs are known to be flexible and dynamic regulators with a well-established role in the pathophysiology of mammal placenta. We propose that a dynamic core microRNA (miRNA) network originated early in placental mammal evolution, responds to conserved mammal pregnancy cues (e.g. progesterone), and facilitates species-specific responses. Here we identify 13 miRNA gene families that arose at the origin of placental mammals and were subsequently retained in all descendant lineages. The expression of these miRNAs in response to early pregnancy molecules is regulated in a species-specific manner in endometrial epithelia of species with extreme implantation strategies (i.e. bovine and human). Furthermore, this set of miRNAs preferentially targets proteins under positive selective pressure on the ancestral eutherian lineage. Discovery of this core embryo implantation toolkit and specifically adapted proteins helps explain the origin and evolution of implantation in mammals.


Subject(s)
MicroRNAs , Placenta , Pregnancy , Humans , Cattle , Animals , Female , Placenta/metabolism , Eutheria/genetics , Embryo Implantation/genetics , Mammals/genetics , MicroRNAs/genetics , MicroRNAs/metabolism , Genomics
5.
Allergy ; 78(5): 1258-1268, 2023 05.
Article in English | MEDLINE | ID: mdl-36595290

ABSTRACT

BACKGROUND: From early life, respiratory viruses are implicated in the development, exacerbation and persistence of respiratory conditions such as asthma. Complex dynamics between microbial communities and host immune responses shape immune maturation and homeostasis, influencing health outcomes. We evaluated the hypothesis that the respiratory virome is linked to systemic immune responses, using peripheral blood and nasopharyngeal swab samples from preschool-age children in the PreDicta cohort. METHODS: Peripheral blood mononuclear cells from 51 children (32 asthmatics and 19 healthy controls) participating in the 2-year multinational PreDicta cohort were cultured with bacterial (Bacterial-DNA, LPS) or viral (R848, Poly:IC, RV) stimuli. Supernatants were analysed by Luminex for the presence of 22 relevant cytokines. Virome composition was obtained using untargeted high throughput sequencing of nasopharyngeal samples. The metagenomic data were used for the characterization of virome profiles and the presence of key viral families (Picornaviridae, Anelloviridae, Siphoviridae). These were correlated to cytokine secretion patterns, identified through hierarchical clustering and principal component analysis. RESULTS: High spontaneous cytokine release was associated with increased presence of Prokaryotic virome profiles and reduced presence of Eukaryotic and Anellovirus profiles. Antibacterial responses did not correlate with specific viral families or virome profile; however, low antiviral responders had more Prokaryotic and fewer Eukaryotic virome profiles. Anelloviruses and Anellovirus-dominated profiles were equally distributed among immune response clusters. The presence of Picornaviridae and Siphoviridae was associated with low interferon-λ responses. Asthma or allergy did not modify these correlations. CONCLUSION: Antiviral cytokine responses at a systemic level reflect the upper airway virome composition. Individuals with low innate interferon responses have higher abundance of Picornaviruses (mostly Rhinoviruses) and bacteriophages. Bacteriophages, particularly Siphoviridae, appear to be sensitive sensors of host antimicrobial capacity, while Anelloviruses are not correlated with TLR-induced immune responses.


Subject(s)
Antiviral Agents , Asthma , Child, Preschool , Child , Humans , Virome , Leukocytes, Mononuclear , Interferons , Immunity
6.
Lancet Digit Health ; 4(10): e705-e716, 2022 10.
Article in English | MEDLINE | ID: mdl-36038496

ABSTRACT

BACKGROUND: Direct evaluation of vascular inflammation in patients with COVID-19 would facilitate more efficient trials of new treatments and identify patients at risk of long-term complications who might respond to treatment. We aimed to develop a novel artificial intelligence (AI)-assisted image analysis platform that quantifies cytokine-driven vascular inflammation from routine CT angiograms, and sought to validate its prognostic value in COVID-19. METHODS: For this prospective outcomes validation study, we developed a radiotranscriptomic platform that uses RNA sequencing data from human internal mammary artery biopsies to develop novel radiomic signatures of vascular inflammation from CT angiography images. We then used this platform to train a radiotranscriptomic signature (C19-RS), derived from the perivascular space around the aorta and the internal mammary artery, to best describe cytokine-driven vascular inflammation. The prognostic value of C19-RS was validated externally in 435 patients (331 from study arm 3 and 104 from study arm 4) admitted to hospital with or without COVID-19, undergoing clinically indicated pulmonary CT angiography, in three UK National Health Service (NHS) trusts (Oxford, Leicester, and Bath). We evaluated the diagnostic and prognostic value of C19-RS for death in hospital due to COVID-19, did sensitivity analyses based on dexamethasone treatment, and investigated the correlation of C19-RS with systemic transcriptomic changes. FINDINGS: Patients with COVID-19 had higher C19-RS than those without (adjusted odds ratio [OR] 2·97 [95% CI 1·43-6·27], p=0·0038), and those infected with the B.1.1.7 (alpha) SARS-CoV-2 variant had higher C19-RS values than those infected with the wild-type SARS-CoV-2 variant (adjusted OR 1·89 [95% CI 1·17-3·20] per SD, p=0·012). 
C19-RS had prognostic value for in-hospital mortality in COVID-19 in two testing cohorts (high [≥6·99] vs low [<6·99] C19-RS; hazard ratio [HR] 3·31 [95% CI 1·49-7·33], p=0·0033; and 2·58 [1·10-6·05], p=0·028), adjusted for clinical factors, biochemical biomarkers of inflammation and myocardial injury, and technical parameters. The adjusted HR for in-hospital mortality was 8·24 (95% CI 2·16-31·36, p=0·0019) in patients who received no dexamethasone treatment, but 2·27 (0·69-7·55, p=0·18) in those who received dexamethasone after the scan, suggesting that vascular inflammation might have been a therapeutic target of dexamethasone in COVID-19. Finally, C19-RS was strongly associated (r=0·61, p=0·00031) with a whole blood transcriptional module representing dysregulation of coagulation and platelet aggregation pathways. INTERPRETATION: Radiotranscriptomic analysis of CT angiography scans introduces a potentially powerful new platform for the development of non-invasive imaging biomarkers. Application of this platform in routine CT pulmonary angiography scans done in patients with COVID-19 produced the radiotranscriptomic signature C19-RS, a marker of cytokine-driven inflammation driving systemic activation of coagulation and responsible for adverse clinical outcomes, which predicts in-hospital mortality and might allow targeted therapy. FUNDING: Engineering and Physical Sciences Research Council, British Heart Foundation, Oxford BHF Centre of Research Excellence, Innovate UK, NIHR Oxford Biomedical Research Centre, Wellcome Trust, Onassis Foundation.


Subject(s)
COVID-19 , SARS-CoV-2 , Angiography , Artificial Intelligence , COVID-19/diagnostic imaging , Cytokines , Humans , Inflammation/diagnostic imaging , Prospective Studies , State Medicine , Tomography, X-Ray Computed
7.
Cell ; 185(12): 2116-2131.e18, 2022 06 09.
Article in English | MEDLINE | ID: mdl-35662412

ABSTRACT

Highly transmissible Omicron variants of SARS-CoV-2 currently dominate globally. Here, we compare neutralization of Omicron BA.1, BA.1.1, and BA.2. BA.2 RBD has slightly higher ACE2 affinity than BA.1 and slightly reduced neutralization by vaccine serum, possibly associated with its increased transmissibility. Neutralization differences between sub-lineages for mAbs (including therapeutics) mostly arise from variation in residues bordering the ACE2 binding site; however, more distant mutations S371F (BA.2) and R346K (BA.1.1) markedly reduce neutralization by therapeutic antibody Vir-S309. In-depth structure-and-function analyses of 27 potent RBD-binding mAbs isolated from vaccinated volunteers following breakthrough Omicron-BA.1 infection reveal that they are focused in two main clusters within the RBD, with potent right-shoulder antibodies showing increased prevalence. Selection and somatic maturation have optimized antibody potency in less-mutated epitopes and recovered potency in highly mutated epitopes. All 27 mAbs potently neutralize early pandemic strains, and many show broad reactivity with variants of concern.


Subject(s)
Antibodies, Monoclonal , COVID-19 Vaccines/immunology , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , Angiotensin-Converting Enzyme 2 , Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/genetics , Antibodies, Viral , COVID-19 , COVID-19 Vaccines/administration & dosage , Epitopes , Humans , Neutralization Tests , SARS-CoV-2/classification , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry
8.
Cell ; 185(14): 2422-2433.e13, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35772405

ABSTRACT

The Omicron lineage of SARS-CoV-2, which was first described in November 2021, spread rapidly to become globally dominant and has split into a number of sublineages. BA.1 dominated the initial wave but has been replaced by BA.2 in many countries. Recent sequencing from South Africa's Gauteng region uncovered two new sublineages, BA.4 and BA.5, which are taking over locally, driving a new wave. BA.4 and BA.5 contain identical spike sequences, and although closely related to BA.2, they contain further mutations in the receptor-binding domain of their spikes. Here, we study the neutralization of BA.4/5 using a range of vaccine and naturally immune serum and panels of monoclonal antibodies. BA.4/5 shows reduced neutralization by the serum from individuals vaccinated with triple doses of AstraZeneca or Pfizer vaccine compared with BA.1 and BA.2. Furthermore, using the serum from BA.1 vaccine breakthrough infections, there are, likewise, significant reductions in the neutralization of BA.4/5, raising the possibility of repeat Omicron infections.


Subject(s)
COVID-19 , Viral Vaccines , Antibodies, Neutralizing , Antibodies, Viral , COVID-19/prevention & control , Humans , Neutralization Tests , SARS-CoV-2/genetics , South Africa
9.
Bioinformatics ; 38(12): 3291-3293, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35551365

ABSTRACT

SUMMARY: Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matching the SARS-CoV-2 genome. Peak RAM usage is typically below 10 MB, and runtime less than 1 min. We show that by excluding the polyA tail from the viral reference, ReadItAndKeep prevents bleed-through of human reads, whereas mapping to the human genome lets some reads escape. We believe our test approach (including all possible reads from the human genome, human samples from each of the 26 populations in the 1000 genomes data and a diverse set of SARS-CoV-2 genomes) will also be useful for others. AVAILABILITY AND IMPLEMENTATION: ReadItAndKeep is implemented in C++, released under the MIT license, and available from https://github.com/GenomePathogenAnalysisService/read-it-and-keep. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
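The abstract above notes that excluding the polyA tail from the viral reference prevents human reads from bleeding through. As a rough illustration of that preprocessing idea only, the sketch below trims a trailing run of `A` bases from a reference string. The function name `strip_poly_a_tail` and the `min_run` threshold are hypothetical and are not taken from ReadItAndKeep, whose actual implementation is in C++ and may differ.

```python
def strip_poly_a_tail(reference: str, min_run: int = 10) -> str:
    """Return the reference without its trailing poly-A run.

    The tail is removed only if it is at least `min_run` bases long,
    so a sequence that merely ends in one or two 'A's is left alone.
    The sequence is upper-cased first (an assumption of this sketch).
    """
    seq = reference.upper()
    trimmed = seq.rstrip("A")
    tail_length = len(seq) - len(trimmed)
    return trimmed if tail_length >= min_run else seq


# Toy reference: 8 informative bases followed by a 30-base poly-A tail.
toy_reference = "ACGTTGCT" + "A" * 30
print(strip_poly_a_tail(toy_reference))
```

Reads consisting mostly of `A` bases can occur in human data as well, so a reference that retains the tail gives such reads something to match; trimming it removes that target.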


Subject(s)
COVID-19 , Software , Humans , Sequence Analysis, DNA , SARS-CoV-2/genetics , Decontamination , High-Throughput Nucleotide Sequencing , Genome, Human
10.
Clin Infect Dis ; 74(7): 1208-1219, 2022 04 09.
Article in English | MEDLINE | ID: mdl-34216472

ABSTRACT

BACKGROUND: Natural and vaccine-induced immunity will play a key role in controlling the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic. SARS-CoV-2 variants have the potential to evade natural and vaccine-induced immunity. METHODS: In a longitudinal cohort study of healthcare workers (HCWs) in Oxfordshire, United Kingdom, we investigated the protection from symptomatic and asymptomatic polymerase chain reaction (PCR)-confirmed SARS-CoV-2 infection conferred by vaccination (Pfizer-BioNTech BNT162b2, Oxford-AstraZeneca ChAdOx1 nCOV-19) and prior infection (determined using anti-spike antibody status), using Poisson regression adjusted for age, sex, temporal changes in incidence and role. We estimated protection conferred after 1 versus 2 vaccinations and from infections with the B.1.1.7 variant identified using whole genome sequencing. RESULTS: In total, 13 109 HCWs participated; 8285 received the Pfizer-BioNTech vaccine (1407 two doses), and 2738 the Oxford-AstraZeneca vaccine (49 two doses). Compared to unvaccinated seronegative HCWs, natural immunity and 2 vaccination doses provided similar protection against symptomatic infection: no HCW vaccinated twice had symptomatic infection, and incidence was 98% lower in seropositive HCWs (adjusted incidence rate ratio 0.02 [95% confidence interval {CI} < .01-.18]). Two vaccine doses or seropositivity reduced the incidence of any PCR-positive result with or without symptoms by 90% (0.10 [95% CI .02-.38]) and 85% (0.15 [95% CI .08-.26]), respectively. Single-dose vaccination reduced the incidence of symptomatic infection by 67% (0.33 [95% CI .21-.52]) and any PCR-positive result by 64% (0.36 [95% CI .26-.50]). There was no evidence of differences in immunity induced by natural infection and vaccination for infections with S-gene target failure and B.1.1.7. 
CONCLUSIONS: Natural infection resulting in detectable anti-spike antibodies and 2 vaccine doses both provide robust protection against SARS-CoV-2 infection, including against the B.1.1.7 variant.
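The percentages quoted in the results above follow directly from the adjusted incidence rate ratios (IRRs): the relative reduction in incidence is (1 − IRR) × 100. The short sketch below reproduces that arithmetic for the point estimates in the abstract; the function name `percent_reduction` is illustrative only.

```python
def percent_reduction(irr: float) -> int:
    """Convert an incidence rate ratio into a percentage reduction."""
    return round((1.0 - irr) * 100)


# Point estimates quoted in the abstract (IRR -> reported reduction).
reported = {
    "seropositive, symptomatic infection": 0.02,
    "two doses, any PCR-positive result": 0.10,
    "seropositive, any PCR-positive result": 0.15,
    "single dose, symptomatic infection": 0.33,
    "single dose, any PCR-positive result": 0.36,
}

for label, irr in reported.items():
    print(f"{label}: IRR {irr:.2f} -> {percent_reduction(irr)}% lower incidence")
```

The same conversion applied to the bounds of a confidence interval gives the corresponding interval for the reduction, with the bounds swapped.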


Subject(s)
COVID-19 , SARS-CoV-2 , BNT162 Vaccine , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , ChAdOx1 nCoV-19 , Cohort Studies , Health Personnel , Humans , Immunoglobulins , Incidence , Longitudinal Studies , Vaccination
11.
Microb Genom ; 7(9)2021 09.
Article in English | MEDLINE | ID: mdl-34559044

ABSTRACT

Analysing the flanking sequences surrounding genes of interest is often highly relevant to understanding the role of mobile genetic elements (MGEs) in horizontal gene transfer, particularly for antimicrobial-resistance genes. Here, we present Flanker, a Python package that performs alignment-free clustering of gene flanking sequences in a consistent format, allowing investigation of MGEs without prior knowledge of their structure. These clusters, known as 'flank patterns' (FPs), are based on Mash distances, allowing for easy comparison of similarity across sequences. Additionally, Flanker can be flexibly parameterized to fine-tune outputs by characterizing upstream and downstream regions separately, and investigating variable lengths of flanking sequence. We apply Flanker to two recent datasets describing plasmid-associated carriage of important carbapenemase genes (blaOXA-48 and blaKPC-2/3) and show that it successfully identifies distinct clusters of FPs, including both known and previously uncharacterized structural variants. For example, Flanker identified four Tn4401 profiles that could not be sufficiently characterized using TETyper or MobileElementFinder, demonstrating the utility of Flanker for flanking-gene characterization. Similarly, using a large (n=226) European isolate dataset, we confirm findings from a previous smaller study demonstrating association between Tn1999.2 and blaOXA-48 upregulation and demonstrate 17 FPs (compared to the 5 previously identified). More generally, the demonstration in this study that FPs are associated with geographical regions and antibiotic-susceptibility phenotypes suggests that they may be useful as epidemiological markers. Flanker is freely available under an MIT license at https://github.com/wtmatlock/flanker.
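To make the notion of upstream and downstream flanks of variable length concrete, the sketch below extracts both flanks around a gene from a linear sequence. It is a minimal illustration and not Flanker's own code or API: the function `extract_flanks`, its 0-based half-open coordinates, and the lack of support for circular contigs are all assumptions of this example, and the clustering by Mash distance described above is not shown.

```python
from typing import Tuple


def extract_flanks(sequence: str, gene_start: int, gene_end: int,
                   window: int) -> Tuple[str, str]:
    """Return (upstream, downstream) flanks of up to `window` bases.

    Coordinates are 0-based and half-open: the gene is
    sequence[gene_start:gene_end]. Flanks are truncated at the ends
    of the sequence; circular contigs are not handled.
    """
    if not 0 <= gene_start < gene_end <= len(sequence):
        raise ValueError("gene coordinates fall outside the sequence")
    if window < 0:
        raise ValueError("window must be non-negative")
    upstream = sequence[max(0, gene_start - window):gene_start]
    downstream = sequence[gene_end:gene_end + window]
    return upstream, downstream


# Toy contig with a 4-base "gene" (CCCC) at positions 4..8.
contig = "AAAACCCCGGGGTTTT"
for window in (3, 10):
    print(window, extract_flanks(contig, 4, 8, window))
```

Varying `window` mirrors the abstract's point about investigating variable lengths of flanking sequence, and the two return values can be compared separately.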


Subject(s)
Gene Transfer, Horizontal , Genomics , Klebsiella pneumoniae/genetics , beta-Lactamases/genetics , Bacterial Proteins/genetics , Computational Biology , Interspersed Repetitive Sequences , Klebsiella Infections/microbiology , Plasmids
12.
J Infect ; 83(4): 473-482, 2021 10.
Article in English | MEDLINE | ID: mdl-34332019

ABSTRACT

OBJECTIVES: Despite robust efforts, patients and staff acquire SARS-CoV-2 infection in hospitals. We investigated whether whole-genome sequencing enhanced the epidemiological investigation of healthcare-associated SARS-CoV-2 acquisition. METHODS: From 17-November-2020 to 5-January-2021, 803 inpatients and 329 staff were diagnosed with SARS-CoV-2 infection at four Oxfordshire hospitals. We classified cases using epidemiological definitions, looked for a potential source for each nosocomial infection, and evaluated genomic evidence supporting transmission. RESULTS: Using national epidemiological definitions, 109/803(14%) inpatient infections were classified as definite/probable nosocomial, 615(77%) as community-acquired and 79(10%) as indeterminate. There was strong epidemiological evidence to support definite/probable cases as nosocomial. Many indeterminate cases were likely infected in hospital: 53/79(67%) had a prior-negative PCR and 75(95%) contact with a potential source. 89/615(11% of all 803 patients) with apparent community-onset had a recent hospital exposure. Within 764 samples sequenced 607 genomic clusters were identified (>1 SNP distinct). Only 43/607(7%) clusters contained evidence of onward transmission (subsequent cases within ≤ 1 SNP). 20/21 epidemiologically-identified outbreaks contained multiple genomic introductions. Most (80%) nosocomial acquisition occurred in rapid super-spreading events in settings with a mix of COVID-19 and non-COVID-19 patients. CONCLUSIONS: Current surveillance definitions underestimate nosocomial acquisition. Most nosocomial transmission occurs from a relatively limited number of highly infectious individuals.


Subject(s)
COVID-19 , Cross Infection , Cross Infection/epidemiology , Disease Outbreaks , Hospitals , Humans , SARS-CoV-2
13.
FASEB J ; 34(8): 11015-11029, 2020 08.
Article in English | MEDLINE | ID: mdl-32619075

ABSTRACT

During the preimplantation period of pregnancy in eutherian mammals, transcriptional and proteomic changes in the uterine endometrium are required to facilitate receptivity to an implanting blastocyst. These changes are mediated, in part, by proteins produced by the developing conceptus (inner cell mass and extraembryonic membranes). We hypothesized that this common process in early pregnancy in eutheria may be facilitated by highly conserved conceptus-derived proteins such as macrophage capping protein (CAPG). We propose that CAPG may share functionality in modifying the transcriptome of the endometrial epithelial cells to facilitate receptivity to implantation in species with different implantation strategies. A recombinant bovine form of CAPG (91% sequence identity between bovine and human) was produced and bovine endometrial epithelial (bEECs) and stromal (bESCs) and human endometrial epithelial cells (hEECs) were cultured for 24 hours with and without recombinant bovine CAPG (rbCAPG). RNA sequencing and quantitative real-time PCR analysis were used to assess the transcriptional response to rbCAPG (Control, vehicle, CAPG 10, 100, 1000 ng/mL: n = 3 biological replicates per treatment per species). Treatment of bEECs with CAPG resulted in alterations in the abundance of 1052 transcripts (629 increased and 423 decreased) compared to vehicle controls. Treatment of hEECs with bovine CAPG increased expression of transcripts previously known to interact with CAPG in different systems (CAPZB, CAPZA2, ADD1, and ADK) compared with vehicle controls (P < .05). In conclusion, we have demonstrated that CAPG, a highly conserved protein in eutherian mammals, elicits a transcriptional response in the endometrial epithelium in species with different implantation strategies that may contribute to pregnancy success.


Subject(s)
Cell Communication/physiology , Embryo Implantation/physiology , Embryo, Mammalian/metabolism , Endometrium/metabolism , Microfilament Proteins/metabolism , Nuclear Proteins/metabolism , Uterus/metabolism , Animals , Blastocyst/metabolism , Blastocyst/physiology , Cattle , Cells, Cultured , Embryo, Mammalian/physiology , Endometrium/physiology , Epithelial Cells/metabolism , Epithelial Cells/physiology , Epithelium/metabolism , Epithelium/physiology , Female , Humans , Pregnancy , Proteomics/methods , Transcription, Genetic/physiology , Transcriptome/physiology , Uterus/physiology
14.
Microb Genom ; 6(7)2020 07.
Article in English | MEDLINE | ID: mdl-32553019

ABSTRACT

Escherichia coli and Klebsiella spp. are important human pathogens that cause a wide spectrum of clinical disease. In healthcare settings, sinks and other wastewater sites have been shown to be reservoirs of antimicrobial-resistant E. coli and Klebsiella spp., particularly in the context of outbreaks of resistant strains amongst patients. Without focusing exclusively on resistance markers or a clinical outbreak, we demonstrate that many hospital sink drains are abundantly and persistently colonized with diverse populations of E. coli, Klebsiella pneumoniae and Klebsiella oxytoca, including both antimicrobial-resistant and susceptible strains. Using whole-genome sequencing of 439 isolates, we show that environmental bacterial populations are largely structured by ward and sink, with only a handful of lineages, such as E. coli ST635, being widely distributed, suggesting different prevailing ecologies, which may vary as a result of different inputs and selection pressures. Whole-genome sequencing of 46 contemporaneous patient isolates identified one (2 %; 95 % CI 0.05-11 %) E. coli urine infection-associated isolate with high similarity to a prior sink isolate, suggesting that sinks may contribute to up to 10 % of infections caused by these organisms in patients on the ward over the same timeframe. Using metagenomics from 20 sink-timepoints, we show that sinks also harbour many clinically relevant antimicrobial resistance genes including blaCTX-M, blaSHV and mcr, and may act as niches for the exchange and amplification of these genes. Our study reinforces the potential role of sinks in contributing to Enterobacterales infection and antimicrobial resistance in hospital patients, something that could be amenable to intervention. This article contains data hosted by Microreact.


Subject(s)
Escherichia coli Infections/diagnosis , Escherichia coli/classification , Klebsiella Infections/diagnosis , Klebsiella/classification , Wastewater/microbiology , Whole Genome Sequencing/methods , Drug Resistance, Multiple, Bacterial , Environmental Microbiology , Escherichia coli/genetics , Escherichia coli/isolation & purification , High-Throughput Nucleotide Sequencing , Hospitals , Humans , Klebsiella/genetics , Klebsiella/isolation & purification , Phylogeny , Population Surveillance , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
15.
Viruses ; 11(5)2019 04 26.
Article in English | MEDLINE | ID: mdl-31035503

ABSTRACT

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.


Subject(s)
Computational Biology , DNA Viruses/classification , DNA Viruses/genetics , Genome, Viral , Genomics , Computational Biology/methods , Genomics/methods , Humans
16.
Nat Microbiol ; 3(2): 189-196, 2018 02.
Article in English | MEDLINE | ID: mdl-29158606

ABSTRACT

The emergence of high-throughput DNA sequencing methods provides unprecedented opportunities to further unravel bacterial biodiversity and its worldwide role from human health to ecosystem functioning. However, despite the abundance of sequencing studies, combining data from multiple individual studies to address macroecological questions of bacterial diversity remains methodically challenging and plagued with biases. Here, using a machine-learning approach that accounts for differences among studies and complex interactions among taxa, we merge 30 independent bacterial data sets comprising 1,998 soil samples from 21 countries. Whereas previous meta-analysis efforts have focused on bacterial diversity measures or abundances of major taxa, we show that disparate amplicon sequence data can be combined at the taxonomy-based level to assess bacterial community structure. We find that rarer taxa are more important for structuring soil communities than abundant taxa, and that these rarer taxa are better predictors of community structure than environmental factors, which are often confounded across studies. We conclude that combining data from independent studies can be used to explore bacterial community dynamics, identify potential 'indicator' taxa with an important role in structuring communities, and propose hypotheses on the factors that shape bacterial biogeography that have been overlooked in the past.


Subject(s)
Bacteria/classification , Bacterial Physiological Phenomena , Ecology , Microbiota , Soil Microbiology , Bacteria/genetics , Biodiversity , DNA, Bacterial/genetics , Ecosystem , High-Throughput Nucleotide Sequencing , Machine Learning , Microbial Interactions , Phylogeny , RNA, Ribosomal, 16S/genetics , Soil
17.
Virus Evol ; 2(2): vew022, 2016 Jul.
Article in English | MEDLINE | ID: mdl-29492275

ABSTRACT

Genome sequencing technologies continue to develop with remarkable pace, yet analytical approaches for reconstructing and classifying viral genomes from mixed samples remain limited in their performance and usability. Existing solutions generally target expert users and often have unclear scope, making it challenging to critically evaluate their performance. There is a growing need for intuitive analytical tooling for researchers lacking specialist computing expertise and that is applicable in diverse experimental circumstances. Notable technical challenges have impeded progress; for example, fragments of viral genomes are typically orders of magnitude less abundant than those of host, bacteria, and/or other organisms in clinical and environmental metagenomes; observed viral genomes often deviate considerably from reference genomes demanding use of exhaustive alignment approaches; high intrapopulation viral diversity can lead to ambiguous sequence reconstruction; and finally, the relatively few documented viral reference genomes compared to the estimated number of distinct viral taxa renders classification problematic. Various software tools have been developed to accommodate the unique challenges and use cases associated with characterizing viral sequences; however, the quality of these tools varies, and their use often necessitates computing expertise or access to powerful computers, thus limiting their usefulness to many researchers. In this review, we consider the general and application-specific challenges posed by viral sequencing and analysis, outline the landscape of available tools and methodologies, and propose ways of overcoming the current barriers to effective analysis.

18.
F1000Res ; 4: 900, 2015.
Article in English | MEDLINE | ID: mdl-26535114

ABSTRACT

The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/.
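As an illustration of what a probabilistic k-mer counting data structure can look like, the sketch below implements a small Count-Min Sketch in pure Python. It is not khmer's implementation or API; the class `CountMinSketch`, the helper `kmers`, and the default table sizes are assumptions chosen for this example. The key property is that hash collisions can inflate a count but never reduce it, so reported counts are upper bounds.

```python
import hashlib
from typing import Iterator, Tuple


class CountMinSketch:
    """Fixed-memory approximate counter; counts are never underestimated."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _cells(self, kmer: str) -> Iterator[Tuple[int, int]]:
        # One hash per row, derived by prefixing the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{kmer}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "little") % self.width

    def add(self, kmer: str) -> None:
        for row, col in self._cells(kmer):
            self.tables[row][col] += 1

    def count(self, kmer: str) -> int:
        # The minimum across rows is the least collision-inflated estimate.
        return min(self.tables[row][col] for row, col in self._cells(kmer))


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


sketch = CountMinSketch()
for kmer in kmers("ACGTACGTAC", 4):
    sketch.add(kmer)
print(sketch.count("ACGT"), sketch.count("TACG"))
```

Memory use is fixed by `width` and `depth` regardless of how many distinct k-mers are seen, which is the trade-off that makes approximate counting attractive for large read sets.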
