ABSTRACT
BACKGROUND: Hepatitis C virus (HCV) has a high genetic diversity and is classified into 8 genotypes and over 90 subtypes with some endemic to specific world regions. This could compromise direct-acting antiviral (DAA) efficacy and global HCV elimination. METHODS: We characterised HCV subtypes 'rare' to the UK (non-1a/1b/2b/3a/4d) by whole genome sequencing via a national surveillance programme. Genetic analyses to determine the genotype of samples with unresolved genotypes were undertaken by comparison with ICTV HCV reference sequences. RESULTS: Two HCV variants were characterised as being closely related to the recently identified genotype 8 (GT8), with >85% pairwise genetic distance similarity to GT8 sequences and within the typical inter-subtype genetic distance range. The individuals infected by the variants were UK residents originally from Pakistan and India. In contrast, a third variant was only confidently identified to be more similar to GT6 compared to other genotypes across 6% of the genome and was isolated from a UK resident originally from Guyana. All three were cured with pangenotypic DAAs (Sofosbuvir + Velpatasvir or Glecaprevir + Pibrentasvir) despite the presence of resistance polymorphisms in NS3 (80K/168E), NS5A (28V/30S/62L/92S/93S) and NS5B (159F). CONCLUSIONS: This study expands our knowledge of HCV diversity by identifying two new GT8 subtypes and potentially a new genotype.
ABSTRACT
Carbapenem-resistant Enterobacterales (CRE) are among the most concerning antibiotic resistance threats due to high rates of multidrug resistance, transmissibility in health care settings, and high mortality rates. We evaluated the potential for regional genomic surveillance to track the spread of blaKPC-carrying CRE (KPC-CRE) by using isolate collections from health care facilities in three U.S. states. Clinical isolates were collected from Connecticut (2017 to 2018), Minnesota (2012 to 2018), and Tennessee (2016 to 2017) through the U.S. Centers for Disease Control and Prevention's Multi-site Gram-negative Surveillance Initiative (MuGSI) and additional surveillance. KPC-CRE isolates were whole-genome sequenced, yielding 255 isolates from 214 patients across 96 facilities. Case report data on patient comorbidities, facility exposures, and interfacility patient transfer were extracted. We observed that in Connecticut, most KPC-CRE isolates showed evidence of importation from outside the state, with limited local transmission. In Minnesota, cases were mainly from sporadic importation and transmission of blaKPC-carrying Klebsiella pneumoniae ST258, and clonal expansion of blaKPC-carrying Enterobacter hormaechei ST171, primarily at a single focal facility and its satellite facilities. In Tennessee, we observed transmission of diverse strains of blaKPC-carrying Enterobacter and Klebsiella, with evidence that most derived from the local acquisition of blaKPC plasmids circulating in an interconnected regional health care network. Thus, the underlying processes driving KPC-CRE burden can differ substantially across regions and can be discerned through regional genomic surveillance. This study provides proof of concept that integrating genomic data with information on interfacility patient transfers can provide insights into locations and drivers of regional KPC-CRE burden that can enable targeted interventions.
Subjects
Klebsiella Infections, beta-Lactamases, Humans, beta-Lactamases/genetics, Bacterial Proteins/genetics, Plasmids, Klebsiella pneumoniae/genetics, Carbapenems, Anti-Bacterial Agents/pharmacology, Microbial Sensitivity Tests, Klebsiella Infections/epidemiology
ABSTRACT
BACKGROUND: Carbapenem-resistant Enterobacterales (CRE) harboring blaKPC have been endemic in Chicago-area healthcare networks for more than a decade. During 2016-2019, a series of regional point-prevalence surveys identified increasing prevalence of blaNDM-containing CRE in multiple long-term acute care hospitals (LTACHs) and ventilator-capable skilled nursing facilities (vSNFs). We performed a genomic epidemiology investigation of blaNDM-producing CRE to understand their regional emergence and spread. METHODS: We performed whole-genome sequencing on New Delhi metallo-beta-lactamase (NDM)+ CRE isolates from 4 point-prevalence surveys across 35 facilities (LTACHs, vSNFs, and acute care hospital medical intensive care units) in the Chicago area and investigated the genomic relatedness and transmission dynamics of these isolates over time. RESULTS: Genomic analyses revealed that the rise of NDM+ CRE was due to the clonal dissemination of a sequence type (ST) 147 Klebsiella pneumoniae strain harboring blaNDM-1 on an IncF plasmid. Dated phylogenetic reconstructions indicated that ST147 was introduced into the region around 2013 and likely acquired NDM around 2015. Analyzing the relatedness of strains within and between facilities supported initial increases in prevalence due to intrafacility transmission in certain vSNFs, with evidence of subsequent interfacility spread among LTACHs and vSNFs connected by patient transfer. CONCLUSIONS: We identified a regional outbreak of blaNDM-1 ST147 that began in and disseminated across Chicago-area post-acute care facilities. Our findings highlight the importance of performing genomic surveillance at post-acute care facilities to identify emerging threats.
Subjects
Klebsiella pneumoniae, Subacute Care, Humans, Klebsiella pneumoniae/genetics, Microbial Sensitivity Tests, Multilocus Sequence Typing, Phylogeny
ABSTRACT
BACKGROUND: Patients entering nursing facilities (NFs) are frequently colonized with antibiotic-resistant organisms (AROs). To understand the determinants of ARO colonization on NF admission, we applied whole-genome sequencing to track the spread of 4 ARO species across regional NFs and evaluated patient-level characteristics and transfer acute care hospitals (ACHs) as risk factors for colonization. METHODS: Patients from 6 NFs (n = 584) were surveyed for methicillin-resistant Staphylococcus aureus (MRSA), vancomycin-resistant Enterococcus faecalis/faecium (VREfc/VREfm), and ciprofloxacin-resistant Escherichia coli (CipREc) colonization. Genomic analysis was performed to quantify ARO spread between NFs and compared to patient-transfer networks. The association between admission colonization and patient-level variables and recent ACH exposures was examined. RESULTS: The majority of ARO isolates belonged to major healthcare-associated lineages: MRSA (sequence type [ST] 5); VREfc (ST6); CipREc (ST131), and VREfm (clade A). While the genomic similarity of strains between NF pairs was positively associated with overlap in their feeder ACHs (P < .05 for MRSA, VREfc, and CipREc), limited phylogenetic clustering by either ACH or NF supported regional endemicity. Significant predictors for ARO colonization on NF admission included lower functional status and recent exposure to glycopeptides (adjusted odds ratio [aOR], > 2 for MRSA and VREfc/VREfm) or third-/fourth-generation cephalosporins (aOR, > 2 for MRSA and VREfm). Transfer from specific ACHs was an independent risk factor for only 1 ARO/ACH pair (VREfm/ACH19: aOR, 2.48). CONCLUSIONS: In this region, healthcare-associated ARO lineages are endemic among connected NFs and ACHs, making patient characteristics more informative of NF admission colonization risk than exposure to specific ACHs.
Subjects
Gram-Positive Bacterial Infections, Methicillin-Resistant Staphylococcus aureus, Staphylococcal Infections, Anti-Bacterial Agents/pharmacology, Genomics, Humans, Methicillin-Resistant Staphylococcus aureus/genetics, Phylogeny, Staphylococcal Infections/epidemiology
ABSTRACT
Carbapenem-resistant Klebsiella pneumoniae (CRKP) is an antibiotic resistance threat of the highest priority. Given the limited treatment options for this multidrug-resistant organism (MDRO), there is an urgent need for targeted strategies to prevent transmission. Here, we applied whole-genome sequencing to a comprehensive collection of clinical isolates to reconstruct regional transmission pathways and analyzed this transmission network in the context of statewide patient transfer data and patient-level clinical data to identify drivers of regional transmission. We found that high regional CRKP burdens were due to a small number of regional introductions, with subsequent regional proliferation occurring via patient transfers among health care facilities. While CRKP was predicted to have been imported into each facility multiple times, there was substantial variation in the ratio of intrafacility transmission events per importation, indicating that amplification occurs unevenly across regional facilities. While myriad factors likely influence intrafacility transmission rates, an understudied one is the potential for clinical characteristics of colonized and infected patients to influence their propensity for transmission. Supporting the contribution of high-risk patients to elevated transmission rates, we observed that patients colonized and infected with CRKP in high-transmission facilities had higher rates of carbapenem use, malnutrition, and dialysis and were older. This report highlights the potential for regional infection prevention efforts that are grounded in genomic epidemiology to identify the patients and facilities that make the greatest contribution to regional MDRO prevalence, thereby facilitating the design of precision interventions of maximal impact.
Subjects
Carbapenem-Resistant Enterobacteriaceae/genetics, Klebsiella Infections/microbiology, Klebsiella pneumoniae/genetics, Carbapenem-Resistant Enterobacteriaceae/drug effects, Carbapenems/pharmacology, Cross Infection/microbiology, Multiple Bacterial Drug Resistance/drug effects, Multiple Bacterial Drug Resistance/genetics, Humans, Klebsiella Infections/drug therapy, Klebsiella pneumoniae/drug effects, Microbial Sensitivity Tests, Prospective Studies, Whole Genome Sequencing/methods
ABSTRACT
Measuring vector-human contact in a natural setting can inform precise targeting of interventions to interrupt transmission of vector-borne diseases. One approach is to directly match human DNA in vector bloodmeals to the individuals who were bitten using genotype panels of discriminative short tandem repeats (STRs). Existing methods for matching STR profiles in bloodmeals to the people bitten preclude the ability to match most incomplete profiles and multi-source bloodmeals to bitten individuals. We developed bistro, an R package that implements 3 preexisting STR matching methods as well as the package's namesake, bistro, a new algorithm described here. bistro employs forensic analysis methods to calculate likelihood ratios and match human STR profiles in bloodmeals to people using a dynamic threshold. We evaluated the algorithm's accuracy and compared it to existing matching approaches using a publicly-available panel of 188 single-source and 100 multi-source samples containing DNA from 50 known human sources. Then we applied it to match 777 newly field-collected mosquito bloodmeals to a database of 645 people. The R package implements four STR matching algorithms in user-friendly functions with clear documentation. bistro correctly matched 99% (187/188) of profiles in single-source samples, and 62% (224/359) of profiles from multi-source samples, resulting in a sensitivity of 0.75 (vs < 0.51 for other algorithms). The specificity of bistro was 0.9998 (vs. 1 for other algorithms). Furthermore, bistro identified 79% (720/906) of all possible matches for field-derived mosquitoes, yielding 1.4x more matches than existing algorithms. bistro identifies more correct bloodmeal-human matches than existing approaches, enabling more accurate and robust analyses of vector-human contact in natural settings. The bistro R package and corresponding documentation allow for straightforward uptake of this algorithm by others.
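The reported sensitivity of 0.75 appears consistent with pooling matched profiles across the single-source and multi-source samples. The abstract does not state the formula, so the pooling below is an assumption, and `pooled_sensitivity` is a hypothetical helper rather than a function from the bistro package. A minimal sketch:

```python
def pooled_sensitivity(matched_counts, total_counts):
    """Fraction of true profiles that were matched, pooled over sample types."""
    return sum(matched_counts) / sum(total_counts)

# Counts quoted in the abstract: single-source 187/188, multi-source 224/359.
sensitivity = pooled_sensitivity([187, 224], [188, 359])
print(round(sensitivity, 2))  # 0.75
```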
ABSTRACT
Molecular epidemiologic studies of malaria parasites and other pathogens commonly employ amplicon deep sequencing (AmpSeq) of marker genes derived from dried blood spots (DBS) to answer public health questions related to topics such as transmission and drug resistance. As these methods are increasingly employed to inform direct public health action, it is important to rigorously evaluate the risk of false positive and false negative haplotypes derived from clinically-relevant sample types. We performed a control experiment evaluating haplotype recovery from AmpSeq of 5 marker genes (ama1, csp, msp7, sera2, and trap) from DBS containing mixtures of DNA from 1 to 10 known P. falciparum reference strains across 3 parasite densities in triplicate (n = 270 samples). While false positive haplotypes were present across all parasite densities and mixtures, we optimized censoring criteria to remove 83% (148/179) of false positives while removing only 8% (67/859) of true positives. Post-censoring, the median pairwise Jaccard distance between replicates was 0.83. We failed to recover 35% (477/1365) of haplotypes expected to be present in the sample. Haplotypes were more likely to be missed in low-density samples with <1.5 genomes/µL (OR: 3.88, CI: 1.82-8.27, vs. high-density samples with ≥75 genomes/µL) and in samples with lower read depth (OR per 10,000 reads: 0.61, CI: 0.54-0.69). Furthermore, minority haplotypes within a sample were more likely to be missed than dominant haplotypes (OR per 0.01 increase in proportion: 0.96, CI: 0.96-0.97). Finally, in clinical samples the percent concordance across markers for multiplicity of infection ranged from 40%-80%. Taken together, our observations indicate that, with sufficient read depth, the majority of haplotypes can be successfully recovered from DBS while limiting the false positive rate.
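For readers unfamiliar with the replicate comparison, the sketch below computes a Jaccard distance between the haplotype sets called in two replicates, using the standard definition in which 0 means identical sets and 1 means no shared haplotypes. The haplotype labels are hypothetical, and the abstract does not specify how empty replicates were handled, so that case is an assumption here.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two collections of haplotype labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty replicates count as identical
    return 1.0 - len(a & b) / len(union)

# Hypothetical haplotype calls from two replicates of one sample.
rep1 = ["H1", "H2", "H3"]
rep2 = ["H2", "H3", "H4", "H5"]
print(jaccard_distance(rep1, rep2))  # 2 shared of 5 total
```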
ABSTRACT
Background: Much effort and resources have been invested to control malaria transmission in Sub-Saharan Africa, but it remains a major public health problem. For the disease to be transmitted from one person to another, the female Anopheles vector must survive 10-14 days following an infective bite for the Plasmodium gametocytes to develop into infectious sporozoites which can be transmitted to the next person during a bloodmeal. The goal of this investigation was to assess factors associated with wild-caught Anopheles survival and infection following host-seeking and indoor resting. Methods: The study was conducted in a longitudinal cohort of 75 households in 5 villages including a total of 755 household members in Bungoma County, Kenya. Monthly adult mosquito collection was conducted by attenuated aspiration in all the enrolled households, and the mosquitoes were reared in the insectary for 7 days. The daily mortality rate was determined through day 7, and all the mosquitoes were morphologically identified. Female anopheline mosquitoes were dissected, and species-level members of the Anopheles gambiae complex were resolved by molecular methods. The abdomens of all samples were processed for P. falciparum detection by PCR. Results: Within a period of 25 months, the total numbers of Culex and Anopheles mosquitoes collected indoors were 12,843 and 712, respectively. Anopheles gambiae and Anopheles funestus were the major vectors, though their populations varied between villages. 61.2% (n=436/712) of the Anopheles mosquitoes survived up to day 7, with the lowest mortality rate recorded on day 5 of captivity. The survival rate also varied between the different Anopheles species. 683 of 712 mosquito abdomens were tested for P. falciparum and 7.8% (53/683) tested positive, with An. funestus having a higher (10%) prevalence than An. gambiae s.s. (6.0%, p=0.095, Pearson chi-square test).
The proportion of household members sleeping under a bednet the night before mosquito collection varied across time and village. An. funestus survival times were refractory to household ITN coverage and An. gambiae s.s. survival was reduced only under very high (>95%) ITN coverage. Conclusion: Despite ITN coverage, mosquitoes still acquired bloodmeals and P. falciparum infections. Survival differed across species and was inversely correlated with high ITN exposure in the household, but not oocyst development.
ABSTRACT
The human infectious reservoir of Plasmodium falciparum is governed by transmission efficiency during vector-human contact and mosquito biting preferences. Understanding biting bias in a natural setting can help target interventions to interrupt transmission. In a 15-month cohort in western Kenya, we detected P. falciparum in indoor-resting Anopheles and human blood samples by qPCR and matched mosquito bloodmeals to cohort participants using short-tandem repeat genotyping. Using risk factor analyses and discrete choice models, we assessed mosquito biting behavior with respect to parasite transmission. Biting was highly unequal; 20% of people received 86% of bites. Biting rates were higher on males (biting rate ratio (BRR): 1.68; CI: 1.28-2.19), children 5-15 years (BRR: 1.49; CI: 1.13-1.98), and P. falciparum-infected individuals (BRR: 1.25; CI: 1.01-1.55). In aggregate, P. falciparum-infected school-age (5-15 years) boys accounted for 50% of bites potentially leading to onward transmission and had an entomological inoculation rate 6.4x higher than any other group. Additionally, infectious mosquitoes were nearly 3x more likely than non-infectious mosquitoes to bite P. falciparum-infected individuals (relative risk ratio 2.76, 95% CI 1.65-4.61). Thus, persistent P. falciparum transmission was characterized by disproportionate onward transmission from school-age boys and by the preference of infected mosquitoes to feed upon infected people.
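The inequality statistic ("20% of people received 86% of bites") can be illustrated with a short calculation. This is a sketch on made-up bite counts, not the study data, and `top_share` is a hypothetical helper; the study's own analysis relied on risk factor and discrete choice models that are not reproduced here.

```python
import math

def top_share(bites_per_person, top_fraction=0.2):
    """Share of all bites received by the most-bitten `top_fraction` of people."""
    counts = sorted(bites_per_person, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    n_top = math.ceil(top_fraction * len(counts))
    return sum(counts[:n_top]) / total

# Hypothetical bite counts for ten people (not study data).
bites = [30, 13, 2, 2, 1, 1, 1, 0, 0, 0]
print(top_share(bites))  # 0.86: the top 2 of 10 people received 43 of 50 bites
```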
Subjects
Anopheles, Insect Bites and Stings, Falciparum Malaria, Mosquito Vectors, Plasmodium falciparum, Humans, Anopheles/parasitology, Anopheles/physiology, Animals, Plasmodium falciparum/physiology, Plasmodium falciparum/isolation & purification, Plasmodium falciparum/genetics, Falciparum Malaria/transmission, Falciparum Malaria/parasitology, Male, Adolescent, Child, Preschool Child, Female, Kenya/epidemiology, Mosquito Vectors/parasitology, Mosquito Vectors/physiology, Adult, Feeding Behavior, Young Adult, Infant
ABSTRACT
1. Measuring vector-human contact in a natural setting can inform precise targeting of interventions to interrupt transmission of vector-borne diseases. One approach is to directly match human DNA in vector bloodmeals to the individuals who were bitten using genotype panels of discriminative short tandem repeats (STRs). Existing methods for matching STR profiles in bloodmeals to the people bitten preclude the ability to match most incomplete profiles and multi-source bloodmeals to bitten individuals. 2. We developed bistro, an R package that implements 3 preexisting STR matching methods as well as the package's namesake, bistro, a new algorithm described here. bistro employs forensic analysis methods to calculate likelihood ratios and match human STR profiles in bloodmeals to people using a dynamic threshold. We evaluated the algorithm's accuracy and compared it to existing matching approaches using a publicly-available panel of 188 single-source and 100 multi-source samples containing DNA from 50 known human sources. Then we applied it to match 777 newly field-collected mosquito bloodmeals to a database of 645 people. 3. The R package implements four STR matching algorithms in user-friendly functions with clear documentation. bistro correctly matched 99% (184/185) of profiles in single-source samples, and 63% (225/359) of profiles from multi-source samples, resulting in a sensitivity of 0.75 (vs < 0.51 for other algorithms). The specificity of bistro was 0.9998 (vs. 1 for other algorithms). Furthermore, bistro identified 80% (729/909) of all possible matches for field-derived mosquitoes, yielding 1.4x more matches than existing algorithms. 4. bistro identifies more correct bloodmeal-human matches than existing approaches, enabling more accurate and robust analyses of vector-human contact in natural settings. The bistro R package and corresponding documentation allow for straightforward uptake of this algorithm by others.
ABSTRACT
Molecular epidemiologic studies of malaria parasites commonly employ amplicon deep sequencing (AmpSeq) of marker genes derived from dried blood spots (DBS) to answer public health questions related to topics such as transmission and drug resistance. As these methods are increasingly employed to inform direct public health action, it is important to rigorously evaluate the risk of false positive and false negative haplotypes derived from clinically-relevant sample types. We performed a control experiment evaluating haplotype recovery from AmpSeq of 5 marker genes (ama1, csp, msp7, sera2, and trap) from DBS containing mixtures of DNA from 1 to 10 known P. falciparum reference strains across 3 parasite densities in triplicate (n=270 samples). While false positive haplotypes were present across all parasite densities and mixtures, we optimized censoring criteria to remove 83% (148/179) of false positives while removing only 8% (67/859) of true positives. Post-censoring, the median pairwise Jaccard distance between replicates was 0.83. We failed to recover 35% (477/1365) of haplotypes expected to be present in the sample. Haplotypes were more likely to be missed in low-density samples with <1.5 genomes/µL (OR: 3.88, CI: 1.82-8.27, vs. high-density samples with ≥75 genomes/µL) and in samples with lower read depth (OR per 10,000 reads: 0.61, CI: 0.54-0.69). Furthermore, minority haplotypes within a sample were more likely to be missed than dominant haplotypes (OR per 0.01 increase in proportion: 0.96, CI: 0.96-0.97). Finally, in clinical samples the percent concordance across markers for multiplicity of infection ranged from 40%-80%. Taken together, our observations indicate that, with sufficient read depth, haplotypes can be successfully recovered from DBS while limiting the false positive rate.
ABSTRACT
We assessed susceptibility patterns to newer antimicrobial agents among clinical carbapenem-resistant Klebsiella pneumoniae (CRKP) isolates from patients in long-term acute-care hospitals (LTACHs) from 2014 to 2015. Nonsusceptibility to meropenem-vaborbactam and imipenem-relebactam was observed in 9.9% and 9.1% of isolates, respectively. Nonsusceptibility to ceftazidime-avibactam (1.1%) and plazomicin (0.8%) was uncommon.
Subjects
Ceftazidime, Klebsiella pneumoniae, Humans, Microbial Sensitivity Tests, Anti-Bacterial Agents/pharmacology, Anti-Bacterial Agents/therapeutic use, Drug Combinations, beta-Lactamases
ABSTRACT
Increasing evidence of regional pathogen transmission networks highlights the importance of investigating the dissemination of multidrug-resistant organisms (MDROs) across a region to identify where transmission is occurring and how pathogens move across regions. We developed a framework for investigating MDRO regional transmission dynamics using whole-genome sequencing data and created regentrans, an easy-to-use, open source R package that implements these methods (https://github.com/Snitkin-Lab-Umich/regentrans). Using a dataset of over 400 carbapenem-resistant isolates of Klebsiella pneumoniae collected from patients in 21 long-term acute care hospitals over a one-year period, we demonstrate how to use our framework to gain insights into differences in inter- and intra-facility transmission across different facilities and over time. This framework and corresponding R package will allow investigators to better understand the origins and transmission patterns of MDROs, which is the first step in understanding how to stop transmission at the regional level.
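One way to picture the intra- versus inter-facility comparison is to count closely related isolate pairs by whether both isolates came from the same facility. The sketch below is illustrative only: it is written in Python, does not use or mirror the regentrans API, and the isolate names, distances, and the 10-SNV threshold are all assumptions chosen for the example.

```python
from itertools import combinations

def close_pair_counts(facility_of, snv_dist, threshold=10):
    """Count isolate pairs within `threshold` SNVs, split by facility relationship."""
    counts = {"intra": 0, "inter": 0}
    for a, b in combinations(sorted(facility_of), 2):
        d = snv_dist[frozenset((a, b))]
        if d <= threshold:
            kind = "intra" if facility_of[a] == facility_of[b] else "inter"
            counts[kind] += 1
    return counts

# Hypothetical isolates, facilities, and pairwise SNV distances.
facility_of = {"iso1": "A", "iso2": "A", "iso3": "B"}
snv_dist = {
    frozenset(("iso1", "iso2")): 3,
    frozenset(("iso1", "iso3")): 8,
    frozenset(("iso2", "iso3")): 40,
}
print(close_pair_counts(facility_of, snv_dist))  # {'intra': 1, 'inter': 1}
```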
Subjects
Multiple Bacterial Drug Resistance, Genomics/methods, Klebsiella Infections/transmission, Klebsiella pneumoniae/classification, Carbapenems/pharmacology, Cross Infection/microbiology, Cross Infection/transmission, Genetic Databases, Humans, Klebsiella pneumoniae/drug effects, Klebsiella pneumoniae/genetics, Klebsiella pneumoniae/isolation & purification, Phylogeny, Software, Whole Genome Sequencing
ABSTRACT
Population genetic diversity of Plasmodium falciparum antigenic loci is high despite large bottlenecks in population size during the parasite life cycle. The prevalence of genetically distinct haplotypes at these loci, while well characterized in humans, has not been thoroughly compared between human and mosquito hosts. We assessed parasite haplotype prevalence, diversity, and evenness using human and mosquito P. falciparum infections collected from the same households during a 14-month longitudinal cohort study using amplicon deep sequencing of two antigenic gene fragments (ama1 and csp). To a prior set of infected humans (n = 1,175/2,813; 86.2% sequencing success) and mosquito abdomens (n = 199/1,448; 95.5% sequencing success), we added sequences from infected mosquito heads (n = 134/1,448; 98.5% sequencing success). The overall and sample-level parasite populations were more diverse in mosquitoes than in humans. Additionally, haplotype prevalences were more even in the P. falciparum human population than in the mosquito population, consistent with balancing selection occurring at these loci in humans. In contrast, we observed that infections in humans were more likely to harbor a dominant haplotype than infections in mosquitoes, potentially due to removal of unfit strains by the human immune system. Finally, within a given mosquito, there was little overlap in genetic composition of abdomen and head infections, suggesting that infections may be cleared from the abdomen during a mosquito's lifespan. Taken together, our observations provide evidence for the mosquito vector acting as a reservoir of sequence diversity in malaria parasite populations. IMPORTANCE Plasmodium falciparum is the deadliest human malaria parasite, and infections consisting of concurrent, multiple strains are common in regions of high endemicity. 
During transitions within and between the parasite's mosquito and human hosts, these strains are subject to population bottlenecks, and distinct parasite strains may have differential fitness in the various environments encountered. These bottlenecks and fitness differences may lead to differences in strain prevalence and diversity between hosts. We investigated differences in genetic diversity and evenness between P. falciparum parasites in human and mosquito hosts collected from the same households during a 14-month longitudinal study in Kenya. Compared to human parasite populations and infections, P. falciparum parasites observed in mosquito populations and infections were more diverse by multiple population genetic metrics. This suggests that the mosquito vector acts as a reservoir of sequence diversity in malaria parasite populations.
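The abstract refers to diversity and evenness without naming the metrics used. As an illustration only, the sketch below computes two common choices, the Shannon index and Pielou's evenness, from a list of haplotype observations. The choice of metrics and the example data are assumptions, not the authors' method.

```python
import math
from collections import Counter

def shannon_and_evenness(haplotypes):
    """Return (Shannon index H, Pielou evenness H / ln S) for haplotype observations."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    s = len(counts)
    # With a single haplotype, evenness is undefined; report 1.0 by convention here.
    evenness = h / math.log(s) if s > 1 else 1.0
    return h, evenness

# Hypothetical haplotype observations in two populations.
even_pop = ["h1", "h1", "h2", "h2"]
skewed_pop = ["h1", "h1", "h1", "h2"]
print(shannon_and_evenness(even_pop))
print(shannon_and_evenness(skewed_pop))
```

In this toy example the evenly split population has evenness 1, while the skewed population scores lower because one haplotype dominates.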
Subjects
Culicidae, Genetic Variation, Falciparum Malaria, Plasmodium falciparum, Animals, Humans, Culicidae/parasitology, Longitudinal Studies, Falciparum Malaria/parasitology, Plasmodium falciparum/genetics
ABSTRACT
We assessed risk factors for colistin resistance among carbapenem-resistant Klebsiella pneumoniae (CRKP) from 375 patients in long-term acute care hospitals. Recent colistin or polymyxin B exposure was associated with increased odds of colistin resistance (adjusted odds ratio = 1.11 per day of exposure, 95% confidence interval = 1.03-1.19, P = .007).
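To give a sense of scale, a per-day odds ratio compounds multiplicatively if the exposure enters a logistic model linearly on the log-odds scale. The abstract does not describe the model, so that is an assumption, and the 7-day horizon is chosen only for illustration:

```latex
\mathrm{OR}(d) = \mathrm{OR}_{\text{per day}}^{\,d},
\qquad
\mathrm{OR}(7) = 1.11^{7} \approx 2.08
```

Under that assumption, a week of exposure would correspond to roughly a doubling of the odds of colistin resistance.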
ABSTRACT
Inspired by well-established material and pedagogy provided by The Carpentries (Wilson, 2016), we developed a two-day workshop curriculum that teaches introductory R programming for managing, analyzing, plotting and reporting data using packages from the tidyverse (Wickham et al., 2019), the Unix shell, version control with git, and GitHub. While the official Software Carpentry curriculum is comprehensive, we found that it contains too much content for a two-day workshop. We also felt that the independent nature of the lessons left learners confused about how to integrate the newly acquired programming skills in their own work. Thus, we developed a new curriculum that aims to teach novices how to implement reproducible research principles in their own data analysis. The curriculum integrates live coding lessons with individual-level and group-based practice exercises, and also serves as a succinct resource that learners can reference both during and after the workshop. Moreover, it lowers the entry barrier for new instructors as they do not have to develop their own teaching materials or sift through extensive content. We developed this curriculum during a two-day sprint, successfully used it to host a two-day virtual workshop with almost 40 participants, and updated the material based on instructor and learner feedback. We hope that our new curriculum will prove useful to future instructors interested in teaching workshops with similar learning objectives.
ABSTRACT
Machine learning (ML) for classification and prediction based on a set of features is used to make decisions in healthcare, economics, criminal justice and more. However, implementing an ML pipeline including preprocessing, model selection, and evaluation can be time-consuming, confusing, and difficult. Here, we present mikropml (pronounced "meek-ROPE em el"), an easy-to-use R package that implements ML pipelines using regression, support vector machines, decision trees, random forest, or gradient-boosted trees. The package is available on GitHub, CRAN, and conda.
ABSTRACT
Carbapenem-resistant Klebsiella pneumoniae (CRKP) is a critical-priority antibiotic resistance threat that has emerged over the past several decades, spread across the globe, and accumulated resistance to last-line antibiotic agents. While CRKP infections are associated with high mortality, only a subset of patients acquiring CRKP extraintestinal colonization will develop clinical infection. Here, we sought to ascertain the relative importance of patient characteristics and CRKP genetic background in determining patient risk of infection. Machine learning models classifying colonization versus infection were built using whole-genome sequences and clinical metadata from a comprehensive set of 331 CRKP extraintestinal isolates collected across 21 long-term acute-care hospitals over the course of a year. Model performance was evaluated based on area under the receiver operating characteristic curve (AUROC) on held-out test data. We found that patient and genomic features were predictive of clinical CRKP infection to similar extents (AUROC interquartile ranges [IQRs]: patient = 0.59 to 0.68, genomic = 0.55 to 0.61, combined = 0.62 to 0.68). Patient predictors of infection included the presence of indwelling devices, kidney disease, and length of stay. Genomic predictors of infection included presence of the ICEKp10 mobile genetic element carrying the yersiniabactin iron acquisition system and disruption of an O-antigen biosynthetic gene in a sublineage of the epidemic ST258 clone. Altered O-antigen biosynthesis increased association with the respiratory tract, and subsequent ICEKp10 acquisition was associated with increased virulence. 
These results highlight the potential of integrated models including both patient and microbial features to provide a more holistic understanding of patient clinical trajectories and ongoing within-lineage pathogen adaptation. IMPORTANCE Multidrug-resistant organisms, such as carbapenem-resistant Klebsiella pneumoniae (CRKP), colonize alarmingly large fractions of patients in regions of endemicity, but only a subset of patients develop life-threatening infections. While patient characteristics influence risk for infection, the relative contribution of microbial genetic background to patient risk remains unclear. We used machine learning to determine whether patient and/or microbial characteristics can discriminate between CRKP extraintestinal colonization and infection across multiple health care facilities and found that both patient and microbial factors were predictive. Examination of informative microbial genetic features revealed variation within the ST258 epidemic lineage that was associated with respiratory tract colonization and increased rates of infection. These findings indicate that circulating genetic variation within a highly prevalent epidemic lineage of CRKP influences patient clinical trajectories. In addition, this work supports the need for future studies examining the microbial genetic determinants of clinical outcomes in human populations, as well as epidemiologic and experimental follow-ups of identified features to discern generalizability and biological mechanisms.
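For context on the evaluation metric, AUROC can be read as the probability that a randomly chosen infection case receives a higher model score than a randomly chosen colonization case, with ties counted as half. The sketch below computes it directly from that definition on hypothetical held-out labels and scores. It is not the authors' pipeline, and the pairwise loop is only practical for small test sets.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out labels (1 = infection, 0 = colonization) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.4]
print(auroc(labels, scores))
```

A value of 0.5 corresponds to chance-level discrimination, which helps in reading the reported interquartile ranges.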
ABSTRACT
We are bioinformatics trainees at the University of Michigan who started a local chapter of Girls Who Code to provide a fun and supportive environment for high school women to learn the power of coding. Our goal was to cover basic coding topics and data science concepts through live coding and hands-on practice. However, we could not find a resource that exactly met our needs. Therefore, over the past three years, we have developed a curriculum and instructional format using Jupyter notebooks to effectively teach introductory Python for data science. This method, inspired by The Carpentries organization, uses bite-sized lessons followed by independent practice time to reinforce coding concepts, and culminates in a data science capstone project using real-world data. We believe our open curriculum is a valuable resource to the wider education community and hope that educators will use and improve our lessons, practice problems, and teaching best practices. Anyone can contribute to our Open Educational Resources on GitHub.
ABSTRACT
While variant identification pipelines are becoming increasingly standardized, less attention has been paid to the pre-processing of variants prior to their use in bacterial genome-wide association studies (bGWAS). Three nuances of variant pre-processing that impact downstream identification of genetic associations include the separation of variants at multiallelic sites, separation of variants in overlapping genes, and referencing of variants relative to ancestral alleles. Here we demonstrate the importance of these variant pre-processing steps on diverse bacterial genomic datasets and present prewas, an R package, that standardizes the pre-processing of multiallelic sites, overlapping genes, and reference alleles before bGWAS. This package facilitates improved reproducibility and interpretability of bGWAS results. prewas enables users to extract maximal information from bGWAS by implementing multi-line representation for multiallelic sites and variants in overlapping genes. prewas outputs a binary SNP matrix that can be used for SNP-based bGWAS and will prevent the masking of minor alleles during bGWAS analysis. The optional binary gene matrix output can be used for gene-based bGWAS, which will enable users to maximize the power and evolutionary interpretability of their bGWAS studies. prewas is available for download from GitHub.
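The multi-line representation of a multiallelic site can be illustrated as follows: each alternate allele gets its own binary presence/absence row, so a minor allele is not masked by a more common one. This is a Python sketch of the idea only, not prewas code. The function name, row-naming scheme, and example genotypes are hypothetical, and the reference allele is simply passed in rather than inferred from ancestral reconstruction.

```python
def split_multiallelic(site_id, ref, genotypes):
    """Return one binary presence/absence row per alternate allele at a site."""
    alt_alleles = sorted({g for g in genotypes.values() if g != ref})
    return {
        f"{site_id}_{ref}>{alt}": {s: int(g == alt) for s, g in genotypes.items()}
        for alt in alt_alleles
    }

# Hypothetical site with reference allele A and two alternate alleles.
genotypes = {"s1": "A", "s2": "G", "s3": "T", "s4": "G"}
rows = split_multiallelic("pos100", "A", genotypes)
for name, row in rows.items():
    print(name, row)
# pos100_A>G {'s1': 0, 's2': 1, 's3': 0, 's4': 1}
# pos100_A>T {'s1': 0, 's2': 0, 's3': 1, 's4': 0}
```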