Search | Portal Regional da BVS

Functional Genomics Platform, A Cloud-Based Platform for Studying Microbial Life at Scale.

Seabolt, Edward E; Nayar, Gowri; Krishnareddy, Harsha; Agarwal, Akshay; Beck, Kristen L; Terrizzano, Ignacio; Kandogan, Eser; Kunitomi, Mark; Roth, Mary; Mukherjee, Vandana; Kaufman, James H.

IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 940-952, 2022.

Article in English | MEDLINE | ID: mdl-32877338

ABSTRACT

The rapid growth in biological sequence data is revolutionizing our understanding of genotypic diversity and challenging conventional approaches to informatics. With the increasing availability of genomic data, traditional bioinformatic tools require substantial computational time and the creation of ever-larger indices each time a researcher seeks to gain insight from the data. To address these challenges, we pre-computed important relationships between biological entities spanning the Central Dogma of Molecular Biology and captured this information in a relational database. The database can be queried across hundreds of millions of entities and returns results in a fraction of the time required by traditional methods. In this paper, we describe the Functional Genomics Platform (formerly known as OMXWare), a comprehensive database relating genotype to phenotype for bacterial life. Continually updated, the Functional Genomics Platform today contains data derived from 200,000 curated, self-consistently assembled genomes. The database stores functional data for over 68 million genes, 52 million proteins, and 239 million domains with associated biological activity annotations from Gene Ontology, KEGG, MetaCyc, and Reactome. The Functional Genomics Platform maps all of the many-to-many connections between each biological entity, including the originating genome, gene, protein, and protein domain. Various microbial studies, from infectious disease to environmental health, can benefit from the rich data and connections. We describe the data selection, the pipeline to create and update the Functional Genomics Platform, and the developer tools (Python SDK and REST APIs) which allow researchers to efficiently study microbial life at scale.
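The many-to-many mapping between genomes, genes, proteins, and domains described in this abstract can be illustrated with a small relational sketch. The table names, column names, and toy rows below are hypothetical and chosen only for illustration; they are not the platform's actual schema, SDK, or REST API. The sketch uses Python's built-in `sqlite3` module with junction tables, one common way to store such links.

```python
import sqlite3

# Hypothetical schema for illustration only; not the platform's actual tables.
SCHEMA = """
CREATE TABLE genome  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE domain  (id INTEGER PRIMARY KEY, name TEXT);
-- Junction tables hold the many-to-many links between entity types.
CREATE TABLE genome_gene    (genome_id INTEGER, gene_id INTEGER);
CREATE TABLE gene_protein   (gene_id INTEGER, protein_id INTEGER);
CREATE TABLE protein_domain (protein_id INTEGER, domain_id INTEGER);
"""


def build_toy_db():
    """Create an in-memory database populated with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO genome VALUES (?, ?)",
                     [(1, "genome_A"), (2, "genome_B")])
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [(1, "gene_x"), (2, "gene_y")])
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [(1, "protein_x"), (2, "protein_y")])
    conn.executemany("INSERT INTO domain VALUES (?, ?)",
                     [(1, "domain_1"), (2, "domain_2")])
    # gene_x occurs in both genomes; gene_y occurs only in genome_B.
    conn.executemany("INSERT INTO genome_gene VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2)])
    conn.executemany("INSERT INTO gene_protein VALUES (?, ?)",
                     [(1, 1), (2, 2)])
    # domain_1 occurs in both proteins; domain_2 occurs only in protein_y.
    conn.executemany("INSERT INTO protein_domain VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2)])
    return conn


def genomes_with_domain(conn, domain_name):
    """Walk domain -> protein -> gene -> genome through the junction tables."""
    rows = conn.execute(
        """
        SELECT DISTINCT g.name
        FROM domain d
        JOIN protein_domain pd ON pd.domain_id = d.id
        JOIN gene_protein gp   ON gp.protein_id = pd.protein_id
        JOIN genome_gene gg    ON gg.gene_id = gp.gene_id
        JOIN genome g          ON g.id = gg.genome_id
        WHERE d.name = ?
        ORDER BY g.name
        """,
        (domain_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `genomes_with_domain(build_toy_db(), "domain_2")` traverses the three junction tables in a single query, which is the kind of pre-computed relationship lookup the abstract contrasts with rebuilding indices for each analysis.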

Subjects

Genetic Databases , Software , Cloud Computing , Genome , Genomics/methods

Semi-Supervised Pipeline for Autonomous Annotation of SARS-CoV-2 Genomes.

Beck, Kristen L; Seabolt, Edward; Agarwal, Akshay; Nayar, Gowri; Bianco, Simone; Krishnareddy, Harsha; Ngo, Timothy A; Kunitomi, Mark; Mukherjee, Vandana; Kaufman, James H.

Viruses ; 13(12), 2021 Dec 03.

Article in English | MEDLINE | ID: mdl-34960694

ABSTRACT

SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous genome annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods and results in missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on the use of a single reference genome and by overcoming atypical genomic traits that challenge traditional bioinformatic methods. We analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world using our method and identified the comprehensive set of known proteins with 98.5% set membership accuracy and 99.1% accuracy in length prediction, compared to proteome references, including Replicase polyprotein 1ab (with its transcriptional slippage site). Compared to other published tools, such as Prokka (base) and VAPiD, we obtained 6.4- and 1.8-fold increases in protein annotations, respectively. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3362 non-redundant sequences per protein on average within this corpus and described key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4000 variant-diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we successfully identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy and demonstrated high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.
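Variant names such as D614G and N501Y follow the convention of reference residue, 1-based position, then observed residue. The sketch below shows how such substitutions could be listed from a predicted protein sequence once it is aligned to a reference. It is a minimal illustration, not the authors' pipeline: the function name `call_substitutions` is hypothetical, the input sequences are assumed to be pre-aligned and of equal length, and insertions or deletions are not handled.

```python
def call_substitutions(reference, query):
    """List amino-acid substitutions between two pre-aligned, equal-length
    protein sequences, named as <reference residue><1-based position><query residue>.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        f"{ref_aa}{position}{query_aa}"
        for position, (ref_aa, query_aa) in enumerate(zip(reference, query), start=1)
        if ref_aa != query_aa
    ]


def has_substitution(reference, query, name):
    """Check whether a named substitution (for example 'D614G') is present."""
    return name in call_substitutions(reference, query)
```

With the made-up six-residue sequences `"MFVDLN"` (reference) and `"MFVGLY"` (query), `call_substitutions` reports a D-to-G change at position 4 and an N-to-Y change at position 6. Applied to full-length spike sequences, the same comparison would report position 614 or 501 where those residues differ.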

Subjects

COVID-19/virology , Viral Genome , Molecular Sequence Annotation , SARS-CoV-2/genetics , Amino Acid Sequence , Base Sequence , Computational Biology , Humans , Mutation , Protein Binding , Protein Domains , Coronavirus Spike Glycoprotein/genetics

Monitoring the microbiome for food safety and quality using deep shotgun sequencing.

Beck, Kristen L; Haiminen, Niina; Chambliss, David; Edlund, Stefan; Kunitomi, Mark; Huang, B Carol; Kong, Nguyet; Ganesan, Balasubramanian; Baker, Robert; Markwell, Peter; Kawas, Ban; Davis, Matthew; Prill, Robert J; Krishnareddy, Harsha; Seabolt, Ed; Marlowe, Carl H; Pierre, Sophie; Quintanar, André; Parida, Laxmi; Dubois, Geraud; Kaufman, James; Weimer, Bart C.

NPJ Sci Food ; 5(1): 3, 2021 Feb 08.

Article in English | MEDLINE | ID: mdl-33558514

ABSTRACT

In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average with 65 genera present in all samples. The most abundant of these were Bacteroides, Clostridium, Lactococcus, Aeromonas, and Citrobacter. We also observed shifts in the microbial community corresponding to ingredient composition differences. When comparing culture-based results for Salmonella with total RNA sequencing, we found that Salmonella growth did not correlate with multiple sequence analyses. We conclude that microbiome sequencing is useful to characterize complex food microbial communities, while additional work is required for predicting specific species' viability from total RNA sequencing.
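The summary statistics in this abstract (genera per sample on average, genera present in all samples, and detection specificity) can be sketched in a few lines. The function names and toy inputs below are hypothetical, and specificity is computed with the standard definition TN / (TN + FP); the abstract does not state how the authors computed their figure, so this is an assumption.

```python
def core_genera(genera_by_sample):
    """Genera detected in every sample (set intersection across samples)."""
    if not genera_by_sample:
        return set()
    return set.intersection(*(set(g) for g in genera_by_sample.values()))


def mean_genera_per_sample(genera_by_sample):
    """Average number of distinct genera detected per sample."""
    counts = [len(set(g)) for g in genera_by_sample.values()]
    return sum(counts) / len(counts)


def specificity(true_negatives, false_positives):
    """Standard specificity: TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


# Made-up detections for three samples, reusing genus names from the abstract.
toy_samples = {
    "sample_1": {"Bacteroides", "Clostridium", "Lactococcus"},
    "sample_2": {"Bacteroides", "Clostridium", "Aeromonas"},
    "sample_3": {"Bacteroides", "Clostridium", "Citrobacter", "Lactococcus"},
}
```

In the toy data, only Bacteroides and Clostridium appear in all three samples, so they form the shared set; a shift in that set between batches is the kind of signal the abstract proposes as an indicator of contamination or environmental change.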
