Search | BVS Regional Portal

1.

Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows.

Connor, Ryan; Shakya, Migun; Yarmosh, David A; Maier, Wolfgang; Martin, Ross; Bradford, Rebecca; Brister, J Rodney; Chain, Patrick S G; Copeland, Courtney A; di Iulio, Julia; Hu, Bin; Ebert, Philip; Gunti, Jonathan; Jin, Yumi; Katz, Kenneth S; Kochergin, Andrey; LaRosa, Tré; Li, Jiani; Li, Po-E; Lo, Chien-Chi; Rashid, Sujatha; Maiorova, Evguenia S; Xiao, Chunlin; Zalunin, Vadim; Purcell, Lisa; Pruitt, Kim D.

Viruses ; 16(3)2024 03 11.

Article in English | MEDLINE | ID: mdl-38543795

ABSTRACT

Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes that differed in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence datasets as input. We compared and characterized such discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Pandemics , Workflow , Computational Biology
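As a hedged illustration of the kind of discrepancy this abstract describes, the sketch below compares variant calls from two workflows run on the same sample. It assumes, hypothetically, that each workflow's calls have already been normalized to `(position, ref, alt)` tuples on the same reference coordinates; the function name `compare_calls` and the toy call sets are illustrative only and are not part of the authors' recommended protocol.

```python
from typing import Dict, Set, Tuple

# A variant call as (1-based reference position, reference allele, alternate allele).
Call = Tuple[int, str, str]


def compare_calls(calls_a: Set[Call], calls_b: Set[Call]) -> Dict[str, object]:
    """Summarize agreement between two workflows' variant calls for one sample."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": shared,
        "only_a": calls_a - calls_b,  # called by workflow A only
        "only_b": calls_b - calls_a,  # called by workflow B only
        # Jaccard index: 1.0 means identical call sets; empty vs empty counts as 1.0
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy call sets (hypothetical, not data from the study)
workflow_a = {(100, "C", "T"), (200, "A", "G"), (300, "ATG", "A")}
workflow_b = {(100, "C", "T"), (200, "A", "G"), (400, "G", "A")}

summary = compare_calls(workflow_a, workflow_b)
print(summary["only_a"], summary["only_b"], summary["jaccard"])
```

Exact tuple matching only works if both workflows represent indels the same way; in practice, call sets are usually normalized (for example, left-aligned) before a comparison like this.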

2.

Towards increased accuracy and reproducibility in SARS-CoV-2 next generation sequence analysis for public health surveillance.

Connor, Ryan; Yarmosh, David A; Maier, Wolfgang; Shakya, Migun; Martin, Ross; Bradford, Rebecca; Brister, J Rodney; Chain, Patrick S G; Copeland, Courtney A; di Iulio, Julia; Hu, Bin; Ebert, Philip; Gunti, Jonathan; Jin, Yumi; Katz, Kenneth S; Kochergin, Andrey; LaRosa, Tré; Li, Jiani; Li, Po-E; Lo, Chien-Chi; Rashid, Sujatha; Maiorova, Evguenia S; Xiao, Chunlin; Zalunin, Vadim; Pruitt, Kim D.

bioRxiv ; 2022 Nov 03.

Article in English | MEDLINE | ID: mdl-36380755

ABSTRACT

During the COVID-19 pandemic, SARS-CoV-2 surveillance efforts integrated genome sequencing of clinical samples to identify emergent viral variants and to support rapid experimental examination of genome-informed vaccine and therapeutic designs. Given the broad range of methods applied to generate new viral genomes, it is critical that consensus and variant calling tools yield consistent results across disparate pipelines. Here we examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative, a public-private partnership established to address the COVID-19 outbreak. Our results indicate that bioinformatic workflows can yield consensus genomes with different single nucleotide polymorphisms, insertions, and/or deletions even when using the same raw sequence input datasets. We introduce the use of a specific suite of parameters and protocols that greatly improves the agreement among pipelines developed by diverse organizations. Such consistency among bioinformatic pipelines is fundamental to SARS-CoV-2 and future pathogen surveillance efforts. The application of analysis standards is necessary to more accurately document phylogenomic trends and support data-driven public health responses.

3.

Intrahost SARS-CoV-2 k-mer Identification Method (iSKIM) for Rapid Detection of Mutations of Concern Reveals Emergence of Global Mutation Patterns.

Thommana, Ashley; Shakya, Migun; Gandhi, Jaykumar; Fung, Christian K; Chain, Patrick S G; Maljkovic Berry, Irina; Conte, Matthew A.

Viruses ; 14(10)2022 09 27.

Article in English | MEDLINE | ID: mdl-36298683

ABSTRACT

Despite unprecedented global sequencing and surveillance of SARS-CoV-2, timely identification of the emergence and spread of novel variants of concern (VoCs) remains a challenge. Several million raw genome sequencing runs are now publicly available. We sought to survey these datasets for intrahost variation to study emerging mutations of concern. We developed iSKIM ("intrahost SARS-CoV-2 k-mer identification method") to relatively quickly and efficiently screen the many SARS-CoV-2 datasets to identify intrahost mutations belonging to lineages of concern. Certain mutations surged in frequency as intrahost minor variants just prior to, or while, lineages of concern arose. The Spike N501Y change common to several VoCs was found as a minor variant in 834 samples as early as October 2020. This coincides with the timing of the first detected samples with this mutation in the Alpha/B.1.1.7 and Beta/B.1.351 lineages. Using iSKIM, we also found that Spike L452R was detected as an intrahost minor variant as early as September 2020, prior to the observed rise of the Epsilon/B.1.429/B.1.427 lineages in late 2020. iSKIM rapidly screens for mutations of interest in raw data, prior to genome assembly, and can be used to detect increases in intrahost variants, potentially providing an early indication of novel variant spread.

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/diagnosis , COVID-19/epidemiology , Mutation , Spike Glycoprotein, Coronavirus/genetics
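The abstract describes screening raw reads for k-mers that carry a mutation of interest, before any genome assembly. The sketch below shows the general idea under simplifying assumptions: a single-base substitution at a known 0-based position on a toy reference, with a read counted as supporting an allele if it contains any k-mer that spans that position. The function names, the choice of k, and the counting rule are hypothetical and are not taken from the iSKIM implementation.

```python
from typing import Iterable, Set


def mutation_kmers(ref: str, pos: int, base: str, k: int) -> Set[str]:
    """All k-mers of `ref`, with `base` substituted at 0-based `pos`, that overlap `pos`."""
    seq = ref[:pos] + base + ref[pos + 1:]
    first = max(0, pos - k + 1)    # leftmost start that still covers pos
    last = min(pos, len(seq) - k)  # rightmost start that fits in the sequence
    return {seq[s:s + k] for s in range(first, last + 1)}


def count_reads_with(reads: Iterable[str], kmers: Set[str], k: int) -> int:
    """Number of reads containing at least one of the given k-mers."""
    return sum(
        any(read[i:i + k] in kmers for i in range(len(read) - k + 1))
        for read in reads
    )


def minor_variant_fraction(reads: Iterable[str], ref: str, pos: int, alt: str, k: int) -> float:
    """Fraction of informative reads that support `alt` rather than the reference base."""
    reads = list(reads)  # allow a one-shot iterator to be scanned twice
    alt_n = count_reads_with(reads, mutation_kmers(ref, pos, alt, k), k)
    ref_n = count_reads_with(reads, mutation_kmers(ref, pos, ref[pos], k), k)
    total = alt_n + ref_n
    return alt_n / total if total else 0.0


# Toy example (hypothetical data): reference base A at position 4, mutation A>G, k = 4
reference = "GATTACAGGC"
reads = ["GATTGCAG", "TTGCAGGC", "GATTACAG", "CAGGC"]
print(minor_variant_fraction(reads, reference, pos=4, alt="G", k=4))
```

A real screen would also need k-mers that occur only once in the reference genome, and would have to account for sequencing errors and reverse-complement reads, all of which this sketch ignores.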

4.

Benchmark datasets for SARS-CoV-2 surveillance bioinformatics.

Xiaoli, Lingzi; Hagey, Jill V; Park, Daniel J; Gulvik, Christopher A; Young, Erin L; Alikhan, Nabil-Fareed; Lawsin, Adrian; Hassell, Norman; Knipe, Kristen; Oakeson, Kelly F; Retchless, Adam C; Shakya, Migun; Lo, Chien-Chi; Chain, Patrick; Page, Andrew J; Metcalf, Benjamin J; Su, Michelle; Rowell, Jessica; Vidyaprakash, Eshaw; Paden, Clinton R; Huang, Andrew D; Roellig, Dawn; Patel, Ketan; Winglee, Kathryn; Weigand, Michael R; Katz, Lee S.

PeerJ ; 10: e13821, 2022.

Article in English | MEDLINE | ID: mdl-36093336

ABSTRACT

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled through an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This effort has necessitated a classification scheme for Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools support major actionable results: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to rapidly reach high-throughput proficiency in sequencing library preparation and downstream data analysis. However, both processes can be limited by the lack of a standardized sequence dataset. Methods: We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we used a previously published dataset format, which describes accession information and whole-dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study. Results: The benchmark datasets focus on the two most widely used sequencing platforms: long-read sequencing data from the Oxford Nanopore Technologies platform and short-read sequencing data from the Illumina platform. There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; and one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. Discussion: The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines.

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Benchmarking , Computational Biology , Sequence Analysis

5.

Intrahost SARS-CoV-2 k-mer identification method (iSKIM) for rapid detection of mutations of concern reveals emergence of global mutation patterns.

Thommana, Ashley; Shakya, Migun; Gandhi, Jaykumar; Fung, Christian K; Chain, Patrick S G; Berry, Irina Maljkovic; Conte, Matthew A.

bioRxiv ; 2022 Aug 16.

Article in English | MEDLINE | ID: mdl-36032969

ABSTRACT

Despite unprecedented global sequencing and surveillance of SARS-CoV-2, timely identification of the emergence and spread of novel variants of concern (VoCs) remains a challenge. Several million raw genome sequencing runs are now publicly available. We sought to survey these datasets for intrahost variation to study emerging mutations of concern. We developed iSKIM ("intrahost SARS-CoV-2 k-mer identification method") to relatively quickly and efficiently screen the many SARS-CoV-2 datasets to identify intrahost mutations belonging to lineages of concern. Certain mutations surged in frequency as intrahost minor variants just prior to, or while, lineages of concern arose. The Spike N501Y change common to several VoCs was found as a minor variant in 834 samples as early as October 2020. This coincides with the timing of the first detected samples with this mutation in the Alpha/B.1.1.7 and Beta/B.1.351 lineages. Using iSKIM, we also found that Spike L452R was detected as an intrahost minor variant as early as September 2020, prior to the observed rise of the Epsilon/B.1.429/B.1.427 lineages in late 2020. iSKIM rapidly screens for mutations of interest in raw data, prior to genome assembly, and can be used to detect increases in intrahost variants, potentially providing an early indication of novel variant spread.

6.

EDGE COVID-19: a web platform to generate submission-ready genomes from SARS-CoV-2 sequencing efforts.

Lo, Chien-Chi; Shakya, Migun; Connor, Ryan; Davenport, Karen; Flynn, Mark; Gutiérrez, Adán Myers Y; Hu, Bin; Li, Po-E; Jackson, Elais Player; Xu, Yan; Chain, Patrick S G.

Bioinformatics ; 38(10): 2700-2704, 2022 05 13.

Article in English | MEDLINE | ID: mdl-35561186

ABSTRACT

SUMMARY: Genomics has become an essential technology for surveilling emerging infectious disease outbreaks. A range of technologies and strategies for pathogen genome enrichment and sequencing are being used by laboratories worldwide, together with different, and sometimes ad hoc, analytical procedures for generating genome sequences. A fully integrated analytical process from raw sequence to consensus genome determination, suited to outbreaks such as the ongoing COVID-19 pandemic, is critical to provide a solid genomic basis for epidemiological analyses and well-informed decision making. We have developed a web-based platform and integrated bioinformatic workflows that help to provide consistent high-quality analysis of SARS-CoV-2 sequencing data generated with either the Illumina or the Oxford Nanopore Technologies (ONT) platform. Using an intuitive web-based interface, this workflow automates data quality control, SARS-CoV-2 reference-based genome variant and consensus calling, and lineage determination, and provides the ability to submit the consensus sequence and necessary metadata to GenBank, GISAID and INSDC raw data repositories. We tested workflow usability using real-world data, validated the accuracy of variant and lineage analysis using several test datasets, and performed detailed comparisons with results from the COVID-19 Galaxy Project workflow. Our analyses indicate that the EDGE COVID-19 (EC-19) workflows generate high-quality SARS-CoV-2 genomes. Finally, we share a perspective on the patterns observed with Illumina versus ONT technologies and their impact on workflow congruence and differences. AVAILABILITY AND IMPLEMENTATION: https://edge-covid19.edgebioinformatics.org, and https://github.com/LANL-Bioinformatics/EDGE/tree/SARS-CoV2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

COVID-19 , SARS-CoV-2 , Genome, Viral , Genomics , Humans , Pandemics , SARS-CoV-2/genetics

7.

Experimental evidence for the impact of soil viruses on carbon cycling during surface plant litter decomposition.

Albright, Michaeline B N; Gallegos-Graves, La Verne; Feeser, Kelli L; Montoya, Kyana; Emerson, Joanne B; Shakya, Migun; Dunbar, John.

ISME Commun ; 2(1): 24, 2022 Mar 16.

Article in English | MEDLINE | ID: mdl-37938672

ABSTRACT

To date, the potential impact of viral communities on biogeochemical cycles in soil has largely been inferred from correlational evidence, such as virus-driven changes in microbial abundances, viral auxiliary metabolic genes, and links with soil physicochemical properties. To more directly test the impact of soil viruses on carbon cycling during plant litter decomposition, we added concentrated viral community suspensions to complex litter decomposer communities in 40-day microcosm experiments. Microbial communities from two New Mexico alpine soils, Pajarito (PJ) and Santa Fe (SF), were inoculated onto grass litter on sand, and three treatments were applied in triplicate to each set of microcosms: addition of buffer (no added virus), live virus (+virus), or killed-virus (+killed-virus) fractions extracted from the same soil. Significant differences in respiration were observed between the +virus and +killed-virus treatments in the PJ microcosms, but not in the SF microcosms. Bacterial and fungal community composition differed significantly by treatment in both PJ and SF microcosms. Combining data across both soils, viral addition altered links between bacterial and fungal diversity, dissolved organic carbon and total nitrogen. Overall, we demonstrate that increasing viral pressure in complex microbial communities can impact terrestrial biogeochemical cycling, but that this impact is context-dependent.

8.

Comparative genomic and phenotypic characterization of invasive non-typhoidal Salmonella isolates from Siaya, Kenya.

Kubicek-Sutherland, Jessica Z; Xie, Gary; Shakya, Migun; Dighe, Priya K; Jacobs, Lindsey L; Daligault, Hajnalka; Davenport, Karen; Stromberg, Loreen R; Stromberg, Zachary R; Cheng, Qiuying; Kempaiah, Prakasha; Ong'echa, John Michael; Otieno, Vincent; Raballah, Evans; Anyona, Samuel; Ouma, Collins; Chain, Patrick S G; Perkins, Douglas J; Mukundan, Harshini; McMahon, Benjamin H; Doggett, Norman A.

PLoS Negl Trop Dis ; 15(2): e0008991, 2021 02.

Article in English | MEDLINE | ID: mdl-33524010

ABSTRACT

Non-typhoidal Salmonella (NTS) is a major global health concern that often causes bloodstream infections in areas of the world affected by malnutrition and comorbidities such as HIV and malaria. Developing a strategy to control the emergence and spread of highly invasive and antimicrobial resistant NTS isolates requires a comprehensive analysis of epidemiological factors and molecular pathogenesis. Here, we characterize 11 NTS isolates that caused bloodstream infections in pediatric patients in Siaya, Kenya, from 2003 to 2010. Nine isolates were identified as S. Typhimurium sequence type 313, while the other two were S. Enteritidis. Comprehensive genotypic and phenotypic analyses were performed to compare these isolates to those previously identified in sub-Saharan Africa. We identified an S. Typhimurium isolate, referred to as UGA14, that displayed novel plasmid, pseudogene and resistance features compared to other isolates reported from Africa. Notably, UGA14 is able to ferment both lactose and sucrose due to the acquisition of insertion elements on the pKST313 plasmid. These findings show for the first time the co-evolution of plasmid-mediated lactose and sucrose metabolism along with cephalosporin resistance in NTS, further elucidating the evolutionary mechanisms of invasive NTS phenotypes. These results further support the use of combined genomic and phenotypic approaches to detect and characterize atypical NTS isolates in order to advance biosurveillance efforts that inform countermeasures aimed at controlling invasive and antimicrobial resistant NTS.

Subjects

Genomics , Phenotype , Salmonella Infections/epidemiology , Salmonella enteritidis/genetics , Salmonella typhimurium/genetics , Anti-Bacterial Agents/therapeutic use , Child, Preschool , Drug Resistance, Multiple, Bacterial/drug effects , Female , Humans , Infant , Infant, Newborn , Kenya/epidemiology , Male , Salmonella Infections/drug therapy , Salmonella Infections/microbiology , Salmonella enteritidis/isolation & purification , Salmonella enteritidis/physiology , Salmonella typhimurium/isolation & purification , Salmonella typhimurium/physiology

9.

A public website for the automated assessment and validation of SARS-CoV-2 diagnostic PCR assays.

Li, Po-E; Myers Y Gutiérrez, Adán; Davenport, Karen; Flynn, Mark; Hu, Bin; Lo, Chien-Chi; Player Jackson, Elais; Shakya, Migun; Xu, Yan; Gans, Jason D; Chain, Patrick S G.

Bioinformatics ; 37(7): 1024-1025, 2021 05 17.

Article in English | MEDLINE | ID: mdl-32777813

ABSTRACT

SUMMARY: Polymerase chain reaction-based assays are the current gold standard for detecting and diagnosing SARS-CoV-2. However, as SARS-CoV-2 mutates, we need to constantly assess whether existing PCR-based assays will continue to detect all known viral strains. To enable the continuous monitoring of SARS-CoV-2 assays, we have developed a web-based assay validation algorithm that checks existing PCR-based assays against the ever-expanding genome databases for SARS-CoV-2 using both thermodynamic and edit-distance metrics. The assay-screening results are displayed as a heatmap showing the number of mismatches between each detection assay and each SARS-CoV-2 genome sequence. Using a mismatch threshold to define detection failure, assay performance is summarized with the true-positive rate (recall) to simplify assay comparisons. AVAILABILITY AND IMPLEMENTATION: The assay evaluation website and supporting software are Open Source and freely available at https://covid19.edgebioinformatics.org/#/assayValidation, https://github.com/jgans/thermonucleotideBLAST and https://github.com/LANL-Bioinformatics/assay_validation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

COVID-19 , SARS-CoV-2 , COVID-19 Testing , Humans , Polymerase Chain Reaction , Sensitivity and Specificity
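The recall summary described in this abstract can be sketched as follows. The sketch assumes, hypothetically, that the mismatch count for every assay and genome pair (one heatmap row per assay) has already been computed, and that a genome counts as a detection failure when its mismatch count exceeds the threshold; the threshold value, the assay names, and the data layout are illustrative, not the website's actual parameters.

```python
from typing import Dict, List


def assay_recall(mismatches: List[int], threshold: int) -> float:
    """Fraction of genomes an assay is predicted to detect.

    Every screened genome is a SARS-CoV-2 sequence, so each one is a true target:
    detected genomes are true positives, failures are false negatives, and
    TP / (TP + FN) is the recall.
    """
    if not mismatches:
        raise ValueError("need at least one genome")
    # Assumed failure rule: more than `threshold` mismatches means no detection.
    detected = sum(1 for m in mismatches if m <= threshold)
    return detected / len(mismatches)


# Hypothetical heatmap rows: assay name -> mismatch count against each genome
heatmap: Dict[str, List[int]] = {
    "assay_1": [0, 0, 1, 0, 0],
    "assay_2": [0, 1, 3, 0, 2],
}
for assay, row in heatmap.items():
    print(assay, assay_recall(row, threshold=2))
```

Summarizing each assay with a single recall value is what makes comparisons across many assays simple; the trade-off is that it hides which genomes, and which mismatch positions, caused the failures.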

10.

NCBI's Virus Discovery Codeathon: Building "FIVE" -The Federated Index of Viral Experiments API Index.

Martí-Carreras, Joan; Gener, Alejandro Rafael; Miller, Sierra D; Brito, Anderson F; Camacho, Christiam E; Connor, Ryan; Deboutte, Ward; Glickman, Cody; Kristensen, David M; Meyer, Wynn K; Modha, Sejal; Norris, Alexis L; Saha, Surya; Belford, Anna K; Biederstedt, Evan; Brister, James Rodney; Buchmann, Jan P; Cooley, Nicholas P; Edwards, Robert A; Javkar, Kiran; Muchow, Michael; Muralidharan, Harihara Subrahmaniam; Pepe-Ranney, Charles; Shah, Nidhi; Shakya, Migun; Tisza, Michael J; Tully, Benjamin J; Vanmechelen, Bert; Virta, Valerie C; Weissman, J L; Zalunin, Vadim; Efremov, Alexandre; Busby, Ben.

Viruses ; 12(12)2020 12 10.

Article in English | MEDLINE | ID: mdl-33322070

ABSTRACT

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are a lack of consensus on, and of comparisons between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easy ways to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.

Subjects

Computational Biology , Databases, Genetic , Metagenomics/methods , Viruses/genetics , Computational Biology/methods , Genetic Variation , Genome, Viral , Host-Pathogen Interactions , Humans , User-Computer Interface , Viral Proteins/genetics , Viral Proteins/metabolism , Viruses/metabolism , Web Browser

11.

The National Microbiome Data Collaborative: enabling microbiome science.

Wood-Charlson, Elisha M; Auberry, Deanna; Blanco, Hannah; Borkum, Mark I; Corilo, Yuri E; Davenport, Karen W; Deshpande, Shweta; Devarakonda, Ranjeet; Drake, Meghan; Duncan, William D; Flynn, Mark C; Hays, David; Hu, Bin; Huntemann, Marcel; Li, Po-E; Lipton, Mary; Lo, Chien-Chi; Millard, David; Miller, Kayd; Piehowski, Paul D; Purvine, Samuel; Reddy, T B K; Shakya, Migun; Sundaramurthi, Jagadish Chandrabose; Vangay, Pajau; Wei, Yaxing; Wilson, Bruce E; Canon, Shane; Chain, Patrick S G; Fagnan, Kjiersten; Martin, Stanton; McCue, Lee Ann; Mungall, Christopher J; Mouncey, Nigel J; Maxon, Mary E; Eloe-Fadrosh, Emiley A.

Nat Rev Microbiol ; 18(6): 313-314, 2020 06.

Article in English | MEDLINE | ID: mdl-32350400

Subjects

Microbiota , Data Science , Humans , Intersectoral Collaboration

12.

A Gene Cluster That Encodes Histone Deacetylase Inhibitors Contributes to Bacterial Persistence and Antibiotic Tolerance in Burkholderia thailandensis.

Micheva-Viteva, Sofiya N; Shakya, Migun; Adikari, Samantha H; Gleasner, Cheryl D; Velappan, Nileena; Mourant, Judith R; Chain, Patrick S G; Hong-Geller, Elizabeth.

mSystems ; 5(1)2020 Feb 11.

Article in English | MEDLINE | ID: mdl-32047060

ABSTRACT

Persister cells are genetically identical variants in a bacterial population that have phenotypically modified their physiology to survive environmental stress. In bacterial pathogens, persisters are able to survive antibiotic treatment and reinfect patients in a frustrating cycle of chronic infection. To better define core persistence mechanisms for therapeutics development, we performed transcriptomics analyses of Burkholderia thailandensis populations enriched for persisters via three methods: flow sorting for low proton motive force, meropenem treatment, and culture aging. Although the three persister-enriched populations generally displayed divergent gene expression profiles that reflect the multimechanistic nature of stress adaptations, there were several common gene pathways activated in two or all three populations. These include polyketide and nonribosomal peptide synthesis, Clp proteases, mobile elements, enzymes involved in lipid metabolism, and ATP-binding cassette (ABC) transporter systems. In particular, identification of genes that encode polyketide synthases (PKSs) and fatty acid catabolism factors indicates that generation of secondary metabolites, natural products, and complex lipids could be part of the metabolic program that governs the persistence state. We also found that loss-of-function mutations in the PKS-encoding gene locus BTH_I2366, which plays a role in biosynthesis of histone deacetylase (HDAC) inhibitors, resulted in increased sensitivity to antibiotics targeting DNA replication. Furthermore, treatment of multiple bacterial pathogens with a fatty acid synthesis inhibitor, CP-640186, potentiated the efficacy of meropenem against the persister populations. Altogether, our results suggest that bacterial persisters may exhibit an outwardly dormant physiology but maintain active metabolic processes that are required to maintain persistence. IMPORTANCE: The discovery of antibiotics such as penicillin and streptomycin marked a historic milestone in the 1940s and heralded a new era of antimicrobial therapy as the modern standard for medical treatment. Yet, even in those early days of discovery, it was noted that a small subset of cells (~1 in 10⁵) survived antibiotic treatment and continued to persist, leading to recurrence of chronic infection. These persisters are phenotypic variants that have modified their physiology to survive environmental stress. In this study, we have performed three transcriptomic screens to identify persistence genes that are common between three different stressor conditions. In particular, we identified genes that function in the synthesis of secondary metabolites, small molecules, and complex lipids, which are likely required to maintain the persistence state. Targeting universal persistence genes can lead to the development of clinically relevant antipersistence therapeutics for infectious disease management.

13.

Standardized phylogenetic and molecular evolutionary analysis applied to species across the microbial tree of life.

Shakya, Migun; Ahmed, Sanaa A; Davenport, Karen W; Flynn, Mark C; Lo, Chien-Chi; Chain, Patrick S G.

Sci Rep ; 10(1): 1723, 2020 02 03.

Article in English | MEDLINE | ID: mdl-32015354

ABSTRACT

There is growing interest in reconstructing phylogenies from the copious data generated by genome sequencing projects that target related viral, bacterial or eukaryotic organisms. To facilitate the construction of standardized and robust phylogenies for disparate types of projects, we have developed a complete bioinformatic workflow, with a web-based component, to perform phylogenetic and molecular evolutionary (PhaME) analysis from sequencing reads, draft assemblies or completed genomes of closely related organisms. Furthermore, the ability to incorporate raw data, including some metagenomic samples containing a target organism (e.g. from clinical samples with suspected infectious agents), shows promise for the rapid phylogenetic characterization of organisms within complex samples without the need for prior assembly.

Subjects

Burkholderia/genetics , Ebolavirus/genetics , Escherichia/genetics , High-Throughput Nucleotide Sequencing/methods , Saccharomyces/genetics , Software , Algorithms , Biological Evolution , Chromosome Mapping , Computational Biology , Datasets as Topic , Evolution, Molecular , Metagenome , Phylogeny , Software Validation

14.

Novel Insights Into the Spread of Enteric Pathogens Using Genomics.

Domman, Daryl; Ruis, Christopher; Dorman, Matthew J; Shakya, Migun; Chain, Patrick S G.

J Infect Dis ; 221(Suppl 3): S319-S330, 2020 03 28.

Article in English | MEDLINE | ID: mdl-31538189

Subjects

Diarrhea/epidemiology , Gastrointestinal Diseases/epidemiology , Genomics , High-Throughput Nucleotide Sequencing , Diarrhea/microbiology , Diarrhea/virology , Gastrointestinal Diseases/microbiology , Gastrointestinal Diseases/virology , Humans

15.

Advances and Challenges in Metatranscriptomic Analysis.

Shakya, Migun; Lo, Chien-Chi; Chain, Patrick S G.

Front Genet ; 10: 904, 2019.

Article in English | MEDLINE | ID: mdl-31608125

ABSTRACT

Sequencing-based analyses of microbiomes have traditionally focused on addressing the question of community membership and profiling taxonomic abundance through amplicon sequencing of 16S rRNA genes. More recently, shotgun metagenomics, which involves the random sequencing of all genomic content of a microbiome, has dominated this arena due to advances in sequencing throughput and its capability to profile genes as well as microbiome membership. While these methods have revealed many insights into a wide variety of microbiomes, both approaches only describe the presence of organisms or genes, and not whether they are active members of the microbiome. To obtain deeper insights into how a microbial community responds over time to its changing environmental conditions, microbiome scientists are beginning to employ large-scale metatranscriptomics approaches. Here, we present a comprehensive review of computational metatranscriptomics approaches to study microbial community transcriptomes. We review the major advancements in this burgeoning field, compare its strengths and weaknesses to other microbiome analysis methods, list available tools and workflows, and describe use cases and limitations of this method. We envision that this field will continue to grow exponentially, as will the scope of projects (e.g. longitudinal studies of community transcriptional responses to perturbations over time) and the resulting data. This review will provide a list of options for computational analysis of these data and will highlight areas in need of development.

16.

NCBI's Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements.

Connor, Ryan; Brister, Rodney; Buchmann, Jan P; Deboutte, Ward; Edwards, Rob; Martí-Carreras, Joan; Tisza, Mike; Zalunin, Vadim; Andrade-Martínez, Juan; Cantu, Adrian; D'Amour, Michael; Efremov, Alexandre; Fleischmann, Lydia; Forero-Junco, Laura; Garmaeva, Sanzhima; Giluso, Melissa; Glickman, Cody; Henderson, Margaret; Kellman, Benjamin; Kristensen, David; Leubsdorf, Carl; Levi, Kyle; Levi, Shane; Pakala, Suman; Peddu, Vikas; Ponsero, Alise; Ribeiro, Eldred; Roy, Farrah; Rutter, Lindsay; Saha, Surya; Shakya, Migun; Shean, Ryan; Miller, Matthew; Tully, Benjamin; Turkington, Christopher; Youens-Clark, Ken; Vanmechelen, Bert; Busby, Ben.

Genes (Basel) ; 10(9)2019 09 16.

Article in English | MEDLINE | ID: mdl-31527408

ABSTRACT

A wealth of viral data sits untapped in publicly available metagenomic data sets when it could be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected and further filtered for a minimum length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains, and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. Mainly: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) existing bioinformatic algorithms can be redesigned for a cloud infrastructure to facilitate their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.

Subjects

Cloud Computing/standards; Genome, Viral; Metagenome; Metagenomics/methods; Big Data; Genome, Human; Humans; Metagenomics/standards; Software

17.

Machine-Learning Classification Suggests That Many Alphaproteobacterial Prophages May Instead Be Gene Transfer Agents.

Kogay, Roman; Neely, Taylor B; Birnbaum, Daniel P; Hankel, Camille R; Shakya, Migun; Zhaxybayeva, Olga.

Genome Biol Evol ; 11(10): 2941-2953, 2019 10 01.

Article in English | MEDLINE | ID: mdl-31560374

ABSTRACT

Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in the genomes of some bacteria and archaea and are hypothesized to enable the exchange of DNA within bacterial populations. In Alphaproteobacteria, genes homologous to the "head-tail" gene cluster that encodes structural components of the Rhodobacter capsulatus GTA (RcGTA) are found in many taxa, even those only distantly related to Rhodobacter capsulatus. Yet, in most genomes available in GenBank, RcGTA-like genes are annotated as typical viral proteins and therefore are not easily distinguished from their viral homologs without additional analyses. Here, we report a "support vector machine" classifier that quickly and accurately distinguishes RcGTA-like genes from their viral homologs by capturing differences in the amino acid composition of the encoded proteins. Our open-source classifier is implemented in Python and can be used to scan homologs of the RcGTA genes in newly sequenced genomes. The classifier can also be trained to identify other types of GTAs, or even to detect other elements of viral ancestry. Using the classifier trained on a manually curated set of homologous viruses and GTAs, we detected RcGTA-like "head-tail" gene clusters in 57.5% of the 1,423 examined alphaproteobacterial genomes. We also demonstrated that more than half of the in silico prophage predictions are instead likely to be GTAs, suggesting that in many alphaproteobacterial genomes the RcGTA-like elements remain unrecognized.

Subjects

Alphaproteobacteria/genetics; Prophages/genetics; Support Vector Machine; Alphaproteobacteria/classification; Genes, Bacterial; Genes, Viral; Genome, Bacterial

18.

Remedial Treatment of Corroded Iron Objects by Environmental Aeromonas Isolates.

Kooli, Wafa M; Junier, Thomas; Shakya, Migun; Monachon, Mathilde; Davenport, Karen W; Vaideeswaran, Kaushik; Vernudachi, Alexandre; Marozau, Ivan; Monrouzeau, Teddy; Gleasner, Cheryl D; McMurry, Kim; Lienhard, Reto; Rufener, Lucien; Perret, Jean-Luc; Sereda, Olha; Chain, Patrick S; Joseph, Edith; Junier, Pilar.

Appl Environ Microbiol ; 85(3)2019 02 01.

Article in English | MEDLINE | ID: mdl-30478230

ABSTRACT

Using bacteria to transform reactive corrosion products into stable compounds represents an alternative to traditional methods employed in iron conservation. Two environmental Aeromonas strains (CA23 and CU5) were used to transform ferric iron corrosion products (goethite and lepidocrocite) into stable ferrous iron-bearing minerals (vivianite and siderite). A genomic and transcriptomic approach was used to analyze the metabolic traits of these strains and to evaluate their pathogenic potential. Although genes involved in solid-phase iron reduction were identified, key genes present in other environmental iron-reducing species are missing from the genome of CU5. Several pathogenicity factors were identified in the genomes of both strains, but none of these was expressed under iron-reducing conditions. Additional in vivo tests showed hemolytic and cytotoxic activities for strain CA23 but not for strain CU5. Both strains were easily inactivated using ethanol and heat. Nonetheless, given its lower potential for a pathogenic lifestyle, CU5 is the more promising candidate for developing a bio-based iron conservation method for stabilizing iron corrosion. Based on all the results, a prototype treatment was established and applied to archaeological items, on which the conversion of reactive corrosion products and the formation of a homogeneous layer of biogenic iron minerals were achieved. This study shows how naturally occurring microorganisms and their metabolic capabilities can be used to develop bio-inspired solutions to the problem of metal corrosion.

IMPORTANCE Microbiology can greatly help in the quest for a sustainable solution to the problem of iron corrosion, which causes substantial economic losses in a wide range of fields, including the protection of cultural heritage and building materials. Using bacteria to transform reactive and unstable corrosion products into more-stable compounds represents a promising approach. The overall aim of this study was to develop a method for the conservation and restoration of corroded iron items, starting from the isolation of iron-reducing bacteria from natural environments. This resulted in the identification of a suitable candidate (Aeromonas sp. strain CU5) that mediates the formation of desirable minerals at the surfaces of the objects, and led to a proof of concept of the application method on real objects.

Subjects

Aeromonas/metabolism; Ferric Compounds/metabolism; Iron Compounds/metabolism; Iron/metabolism; Minerals/metabolism; Aeromonas/genetics; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Biodegradation, Environmental; Corrosion; Genome, Bacterial; Iron/chemistry; Oxidation-Reduction

19.

Insights into origin and evolution of α-proteobacterial gene transfer agents.

Shakya, Migun; Soucy, Shannon M; Zhaxybayeva, Olga.

Virus Evol ; 3(2): vex036, 2017 Jul.

Article in English | MEDLINE | ID: mdl-29250433

ABSTRACT

Several bacterial and archaeal lineages produce nanostructures that morphologically resemble small tailed viruses but, unlike most viruses, contain apparently random pieces of the host genome. Since these elements can deliver the packaged DNA to other cells, they were dubbed gene transfer agents (GTAs). Because many genes involved in GTA production have viral homologs, it has been hypothesized that the GTA ancestor was a virus. Whether GTAs represent an atypical virus, a defective virus, or a virus co-opted by prokaryotes for some function remains to be elucidated. To evaluate these possibilities, we examined the distribution and evolutionary histories of genes that encode a GTA in the α-proteobacterium Rhodobacter capsulatus (RcGTA). We report that although homologs of many individual RcGTA genes are abundant across bacteria and their viruses, RcGTA-like genomes are found mainly in one subclade of α-proteobacteria. Compared with their viral homologs, genes of the RcGTA-like genomes evolve significantly more slowly and do not have a higher A+T content than their host chromosomes. Moreover, they appear to reside in stable regions of the bacterial chromosomes that are generally conserved across taxonomic orders. These findings argue against RcGTA being an atypical or a defective virus. Our phylogenetic analyses suggest that the RcGTA ancestor likely originated in the lineage that gave rise to the contemporary α-proteobacterial orders Rhizobiales, Rhodobacterales, Caulobacterales, Parvularculales, and Sphingomonadales, and that the RcGTA-like element has since co-evolved with its host chromosomes. Such an evolutionary history is compatible with the maintenance of these elements by bacteria due to some selective advantage. As for many other prokaryotic traits, horizontal gene transfer played a substantial role in the evolution of RcGTA-like elements, not only in shaping their genome components within the orders, but also in the occasional dissemination of RcGTA-like regions across orders and even to different bacterial phyla.

20.

Brief Protocol for EDGE Bioinformatics: Analyzing Microbial and Metagenomic NGS Data.

Philipson, Casandra; Davenport, Karen; Voegtly, Logan; Lo, Chien-Chi; Li, Po-E; Xu, Yan; Shakya, Migun; Cer, Regina Z; Bishop-Lilly, Kimberly A; Hamilton, Theron; Chain, Patrick S G.

Bio Protoc ; 7(23): e2622, 2017 Dec 05.

Article in English | MEDLINE | ID: mdl-34595290

ABSTRACT

Next-generation sequencing (NGS) offers unparalleled resolution for untargeted organism detection and characterization. However, most NGS analysis programs require users to be proficient in programming and command-line interfaces. EDGE bioinformatics was developed to offer scientists with little to no bioinformatics expertise a point-and-click platform for analyzing sequencing data in a rapid and reproducible manner. EDGE (Empowering the Development of Genomics Expertise) v1.0, released in January 2017, is an intuitive web-based bioinformatics platform engineered for the analysis of microbial and metagenomic NGS-based data (Li et al., 2017). The EDGE bioinformatics suite combines vetted, publicly available tools and tracks settings to ensure reliable and reproducible analysis workflows. To execute the EDGE workflow, only raw sequencing reads and a project ID are necessary. Users can analyze in-house data or run analyses on samples deposited in the Sequence Read Archive. Default settings offer a robust first look and are often sufficient for novice users. All analyses are modular: users can easily turn workflows on or off and modify parameters to suit project needs. Results are compiled and available for download in a PDF-formatted report containing publication-quality figures. We caution that interpreting results still requires in-depth scientific understanding; however, the report visuals are often informative, even to novice users.
