RESUMO
The Gemina system (http://gemina.igs.umaryland.edu) identifies, standardizes and integrates the outbreak metadata for the breadth of NIAID category A-C viral and bacterial pathogens, thereby providing an investigative and surveillance tool describing the Who [Host], What [Disease, Symptom], When [Date], Where [Location] and How [Pathogen, Environmental Source, Reservoir, Transmission Method] for each pathogen. The Gemina database will provide a greater understanding of the interactions of viral and bacterial pathogens with their hosts and infectious diseases through in-depth literature text-mining, integrated outbreak metadata, outbreak surveillance tools, extensive ontology development, metadata curation and representative genomic sequence identification and standards development. The Gemina web interface provides metadata selection and retrieval of a pathogen's; Infection Systems (Pathogen, Host, Disease, Transmission Method and Anatomy) and Incidents (Location and Date) along with a hosts Age and Gender. The Gemina system provides an integrated investigative and geospatial surveillance system connecting pathogens, pathogen products and disease anchored on the taxonomic ID of the pathogen and host to identify the breadth of hosts and diseases known for these pathogens, to identify the extent of outbreak locations, and to identify unique genomic regions with the DNA Signature Insignia Detection Tool.
Assuntos
Doenças Transmissíveis/microbiologia , Biologia Computacional/métodos , Bases de Dados Genéticas , Bases de Dados de Ácidos Nucleicos , Genes Bacterianos , Genes Virais , Animais , Infecções Bacterianas/microbiologia , Doenças Transmissíveis/virologia , Biologia Computacional/tendências , Bases de Dados Factuais , Humanos , Armazenamento e Recuperação da Informação/métodos , Internet , Software , Interface Usuário-Computador , Viroses/virologiaRESUMO
BACKGROUND: Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. RESULTS: We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. CONCLUSION: The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
Assuntos
Computadores , Internet , Análise de Sequência de DNA , Software , Biologia Computacional , Genômica , Sequenciamento de Nucleotídeos em Larga EscalaRESUMO
MOTIVATION: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. RESULTS: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. AVAILABILITY: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net.
Assuntos
Biologia Computacional/métodos , Internet , Software , Bases de Dados Genéticas , Bases de Dados de Proteínas , Fluxo de TrabalhoRESUMO
The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.