ABSTRACT
The Ontology for Biomedical Investigations (OBI) underwent a focused review of assay term annotations, logic and hierarchy, with the goal of improving and standardizing these terms. As a result, inconsistencies in W3C Web Ontology Language (OWL) expressions were identified and corrected. In addition, standardized design patterns and a formalized template for maintaining them were developed. We describe this informative and productive process here, along with the specific benefits and obstacles it presented for OBI and the general lessons it offers for similar projects.
Subjects
Biological Ontologies, Language, Reference Standards
ABSTRACT
BACKGROUND: Microbial genetics has formed a foundation for understanding many aspects of biology. Systematic annotation that supports computational data mining should reveal further insights for microbes, microbiomes, and conserved functions beyond microbes. The Ontology of Microbial Phenotypes (OMP) was created to support such annotation. RESULTS: We define standards for an OMP-based annotation framework that supports the capture of a variety of phenotypes and provides flexibility for different levels of detail based on a combination of pre- and post-composition using OMP and other Open Biomedical Ontology (OBO) projects. A system for entering and viewing OMP annotations has been added to our online, public, web-based data portal. CONCLUSIONS: The annotation framework described here is ready to support projects to capture phenotypes from the experimental literature for a variety of microbes. Defining the OMP annotation standard should support the development of new software tools for data mining and analysis in comparative phenomics.
Subjects
Biological Ontologies, Data Curation/methods, Microbiology, Phenotype, Metadata
ABSTRACT
High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, given that the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.
Subjects
Genetic Databases, Gene Ontology, Genomics/methods, Molecular Sequence Annotation/methods, Animals, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis
ABSTRACT
The Evidence and Conclusion Ontology (ECO) contains terms (classes) that describe types of evidence and assertion methods. ECO terms are used in the process of biocuration to capture the evidence that supports biological assertions (e.g. gene product X has function Y as supported by evidence Z). Capture of this information allows tracking of annotation provenance, establishment of quality control measures and query of evidence. ECO contains over 1500 terms and is in use by many leading biological resources including the Gene Ontology, UniProt and several model organism databases. ECO is continually being expanded and revised based on the needs of the biocuration community. The ontology is freely available for download from GitHub (https://github.com/evidenceontology/) or the project's website (http://evidenceontology.org/). Users can request new terms or changes to existing terms through the project's GitHub site. ECO is released into the public domain under CC0 1.0 Universal.
Subjects
Computational Biology/methods, Genetic Databases, Gene Ontology, Proteins/genetics, Animals, Humans, Information Storage and Retrieval/methods, Internet, Proteins/metabolism, Protein Sequence Analysis, User-Computer Interface
ABSTRACT
Candida albicans is the predominant cause of vulvovaginal candidiasis (VVC). Little is known regarding the genetic diversity of Candida spp. in the vagina or the microvariations in strains over time that may contribute to the development of VVC. This study reports the draft genome sequences of four C. albicans strains and one C. glabrata strain isolated from women with VVC. An SNP-based whole-genome phylogeny indicates that these isolates are closely related; however, phylogenetic distances between them suggest that there may be genetic adaptations driven by unique host environments. These sequences will facilitate further comparative analyses and ultimately improve our understanding of genetic variation in isolates of Candida spp. that are associated with VVC.
Subjects
Candida albicans/genetics, Candida glabrata/genetics, Fungal Genome, Phylogeny, Adult, Antifungal Agents/therapeutic use, Candida albicans/classification, Candida albicans/drug effects, Candida albicans/isolation & purification, Candida glabrata/classification, Candida glabrata/drug effects, Candida glabrata/isolation & purification, Vulvovaginal Candidiasis/diagnosis, Vulvovaginal Candidiasis/drug therapy, Vulvovaginal Candidiasis/microbiology, Female, Genetic Variation, Humans, Longitudinal Studies, Vagina/microbiology, Whole Genome Sequencing
ABSTRACT
The Evidence and Conclusion Ontology (ECO) is a community resource for describing the various types of evidence that are generated during the course of a scientific study and which are typically used to support assertions made by researchers. ECO describes multiple evidence types, including evidence resulting from experimental (i.e., wet lab) techniques, evidence arising from computational methods, statements made by authors (whether or not supported by evidence), and inferences drawn by researchers curating the literature. In addition to summarizing the evidence that supports a particular assertion, ECO also offers a means to document whether a computer or a human performed the process of making the annotation. Incorporating ECO into an annotation system makes it possible to leverage the structure of the ontology such that associated data can be grouped hierarchically, users can select data associated with particular evidence types, and quality control pipelines can be optimized. Today, over 30 resources, including the Gene Ontology, use the Evidence and Conclusion Ontology to represent both evidence and how annotations are made.
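The hierarchical grouping and evidence-type selection mentioned above can be illustrated with a minimal Python sketch. It assumes a toy is_a hierarchy whose term labels are hypothetical placeholders, not real ECO identifiers, and it models each annotation as a simple (gene product, function, evidence type) triple. A real system would load the ontology itself (for example, from its OWL or OBO release) rather than hard-code terms.

```python
from collections import defaultdict

# Hypothetical miniature evidence hierarchy (child -> list of is_a parents).
# These labels are illustrative placeholders, NOT real ECO identifiers.
IS_A = {
    "experimental evidence": ["evidence"],
    "computational evidence": ["evidence"],
    "author statement": ["evidence"],
    "direct assay evidence": ["experimental evidence"],
    "mutant phenotype evidence": ["experimental evidence"],
    "sequence similarity evidence": ["computational evidence"],
}


def ancestors(term):
    """Return the term itself plus all of its transitive is_a ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def select(annotations, evidence_class):
    """Keep annotations whose evidence is evidence_class or any descendant of it."""
    return [a for a in annotations if evidence_class in ancestors(a[2])]


def group_by(annotations, categories):
    """Bucket each annotation under every listed category its evidence falls within."""
    groups = defaultdict(list)
    for a in annotations:
        lineage = ancestors(a[2])
        for category in categories:
            if category in lineage:
                groups[category].append(a)
    return dict(groups)


# Hypothetical annotations: (gene product, asserted function, evidence type).
ANNOTATIONS = [
    ("geneA", "kinase activity", "direct assay evidence"),
    ("geneB", "DNA binding", "sequence similarity evidence"),
    ("geneC", "cell motility", "mutant phenotype evidence"),
]

if __name__ == "__main__":
    experimental = select(ANNOTATIONS, "experimental evidence")
    print([a[0] for a in experimental])  # ['geneA', 'geneC']
    groups = group_by(ANNOTATIONS, ["experimental evidence", "computational evidence"])
    print({k: len(v) for k, v in groups.items()})
    # {'experimental evidence': 2, 'computational evidence': 1}
```

Selecting by a parent class returns every annotation whose evidence term descends from it, so a quality-control or query step can target a broad evidence category without enumerating each leaf term.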
Subjects
Gene Ontology, Molecular Sequence Annotation/methods, Animals, Computational Biology/methods, Data Curation/methods, Genetic Databases, Humans, Internet, Software
ABSTRACT
Human cryptosporidiosis is caused primarily by Cryptosporidium hominis, C. parvum and C. meleagridis. To accelerate research on parasites in the genus Cryptosporidium, we generated annotated, draft genome sequences of human C. hominis isolates TU502_2012 and UKH1, C. meleagridis UKMEL1, also isolated from a human patient, and the avian parasite C. baileyi TAMU-09Q1. The annotation of the genome sequences relied in part on RNAseq data generated from the oocyst stage of both C. hominis and C. baileyi. The genome assembly of C. hominis is significantly more complete and less fragmented than that available previously, which enabled the generation of a much-improved gene set for this species, with an increase in average gene length of 500 bp relative to the protein-encoding genes in the 2004 C. hominis annotation. Our results reveal that the genomes of C. hominis and C. parvum are very similar in both gene density and average gene length. These data should prove a valuable resource for the Cryptosporidium research community.
Subjects
Computational Biology/methods, Cryptosporidium/genetics, Protozoan Genome, Genomics, Molecular Sequence Annotation, Cryptosporidium/classification, Gene Expression Profiling, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans, Transcriptome
ABSTRACT
Mucormycosis is a life-threatening infection caused by Mucorales fungi. Here we sequence 30 fungal genomes, and perform transcriptomics with three representative Rhizopus and Mucor strains and with human airway epithelial cells during fungal invasion, to reveal key host and fungal determinants contributing to pathogenesis. Analysis of the host transcriptional response to Mucorales reveals platelet-derived growth factor receptor B (PDGFRB) signaling as part of a core response to divergent pathogenic fungi; inhibition of PDGFRB reduces Mucorales-induced damage to host cells. The unique presence of CotH invasins in all invasive Mucorales, and the correlation between CotH gene copy number and clinical prevalence, are consistent with an important role for these proteins in mucormycosis pathogenesis. Our work provides insight into the evolution of this medically and economically important group of fungi, and identifies several molecular pathways that might be exploited as potential therapeutic targets.
Subjects
Fungal Genome, Mucorales/genetics, Mucormycosis/microbiology, Transcriptome/genetics, A549 Cells, Amidohydrolases/metabolism, Amino Acid Sequence, Animals, Base Sequence, Fungal Proteins/chemistry, Fungal Genes, Humans, Male, Inbred ICR Mice, Molecular Sequence Annotation, Mucorales/enzymology, Mucorales/isolation & purification, Phylogeny, Single Nucleotide Polymorphism/genetics, Rhizopus/genetics, RNA Sequence Analysis, Species Specificity
ABSTRACT
Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges for their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a community-oriented database documenting experimentally validated transcription factor (TF)-binding sites in the Bacteria domain. In its quest to become a community resource for the annotation of transcriptional regulatory elements in bacterial genomes, CollecTF aims to move away from the conventional data-repository paradigm of domain-specific databases. Through the adoption of well-established ontologies, identifiers and collaborations, CollecTF has progressively also become a portal for the annotation and submission of information on transcriptional regulatory elements to major biological sequence resources (RefSeq, UniProtKB and the Gene Ontology Consortium). This fundamental change in database conception capitalizes on the domain-specific knowledge of contributing communities to provide high-quality annotations, while leveraging the availability of stable information hubs to promote long-term access and give the data high visibility. As a submission portal, CollecTF generates TF-binding site information through direct annotation of RefSeq genome records, definition of TF-based regulatory networks in UniProtKB entries and submission of functional annotations to the Gene Ontology. As a database, CollecTF provides enhanced search and browsing, targeted data exports, binding motif analysis tools and integration with motif discovery and search platforms. This innovative approach will allow CollecTF to focus its limited resources on the generation of high-quality information and the provision of specialized access to the data. Database URL: http://www.collectf.org/.
Subjects
Database Management Systems, Genetic Databases, Datasets as Topic, User-Computer Interface
ABSTRACT
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and the Phenotype And Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (such as the Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions.
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.
Subjects
Biological Ontologies, Animals, Biological Ontologies/organization & administration, Biological Ontologies/statistics & numerical data, Biological Ontologies/trends, Computational Biology, Factual Databases, Humans, Internet, Metadata, Semantics, Software
ABSTRACT
Disease can be conceptualized as the result of interactions between an infecting microbe and the holobiont, the combination of a host and its microbial communities. It is likely that genomic variation in the host, infecting microbe, and commensal microbiota is a key determinant of infectious disease clinical outcomes. However, until recently, simultaneous, multiomic investigation of the infecting microbe and holobiont components has rarely been explored. Herein, we characterized the infecting microbe, host, micro- and mycobiomes leading up to infection onset in a leukemia patient who developed invasive mucormycosis. We discovered that the patient was infected with a strain of the recently described species Mucor velutinosus, which we determined was hypervirulent in a Drosophila challenge model and has a predisposition for skin dissemination. After completing the genome of the infecting M. velutinosus strain and genomes from four other Mucor species, we performed comparative pathogenomics, which helped identify 66 M. velutinosus-specific putatively secreted proteins, including multiple novel secreted aspartyl proteinases that may contribute to the unique clinical presentation of skin dissemination. Whole exome sequencing of the patient revealed multiple non-synonymous polymorphisms in genes critical to control of fungal proliferation, such as TLR6 and PTX3. Moreover, the patient had a non-synonymous polymorphism in the NOD2 gene and a missense mutation in FUT2, which have been linked to microbial dysbiosis and microbiome diversity maintenance during physiologic stress, respectively. In concert with host genetic polymorphism data, the micro- and mycobiome analyses revealed that the infection developed amid a dysbiotic microbiome with low α-diversity, dominated by staphylococci. Additionally, longitudinal mycobiome data showed that M. velutinosus DNA was detectable in oral samples preceding disease onset.
Our genome-level study of the host-infecting microbe-commensal triad extends the concept of personalized genomic medicine to the holobiont-infecting microbe interface, thereby offering novel opportunities for using synergistic genetic methods to increase understanding of infectious disease pathogenesis and clinical outcomes.
Subjects
Gastrointestinal Microbiome/genetics, Fungal Genome, Acute Myeloid Leukemia/complications, Mucor/genetics, Mucormycosis/microbiology, Opportunistic Infections/microbiology, Antifungal Agents/therapeutic use, Antineoplastic Combined Chemotherapy Protocols/adverse effects, Chemotherapy-Induced Febrile Neutropenia, Fungal Proteins/genetics, Fungemia/microbiology, Host-Pathogen Interactions, Humans, Male, Middle Aged, Mucor/isolation & purification, Mucormycosis/drug therapy, Neoplasm Proteins/genetics, Onychomycosis/complications, Opportunistic Infections/drug therapy
ABSTRACT
Biocuration has become a cornerstone for analyses in biology, and to meet this demand, the number of annotations has grown considerably in recent years. However, the reliability of these annotations varies, so it has become necessary to be able to assess the confidence in annotations. Although several resources already provide confidence information about the annotations they produce, a standard way of providing such information has yet to be defined. This lack of standardization undermines the propagation of knowledge across resources, as well as the credibility of results from high-throughput analyses. A working group, seeded at a workshop during the Biocuration 2012 conference, has been created to address this problem. We present here the elements that were identified as essential for assessing confidence in annotations, as well as a draft ontology, the Confidence Information Ontology, to illustrate how the problems identified could be addressed. We hope that this effort will provide a home for discussing this major issue within the biocuration community. Tracker URL: https://github.com/BgeeDB/confidence-information-ontology Ontology URL: https://raw.githubusercontent.com/BgeeDB/confidence-information-ontology/master/src/ontology/cio-simple.obo
Subjects
Biological Ontologies, Data Curation/standards, Congresses as Topic
ABSTRACT
This study reports the release of draft genome sequences of two isolates of Lichtheimia corymbifera and two isolates of L. ramosa. Phylogenetic analyses indicate that the two L. corymbifera strains (CDC-B2541 and 008-049) are closely related to the previously sequenced L. corymbifera isolate (FSU 9682) while our two L. ramosa strains CDC-B5399 and CDC-B5792 cluster apart from them. These genome sequences will further the understanding of intraspecies and interspecies genetic variation within the Mucoraceae family of pathogenic fungi.
Subjects
Fungal Genome, Mucorales/genetics, DNA Sequence Analysis, Cluster Analysis, Environmental Microbiology, Genetic Variation, Humans, Molecular Sequence Data, Mucorales/classification, Mucorales/isolation & purification, Mucormycosis/microbiology, Phylogeny, Sequence Homology
ABSTRACT
BACKGROUND: Phenotypic data are routinely used to elucidate gene function in organisms amenable to genetic manipulation. However, prior to this work, there was no generalizable system in place for the structured storage and retrieval of phenotypic information for bacteria. RESULTS: The Ontology of Microbial Phenotypes (OMP) has been created to standardize the capture of such phenotypic information from microbes. OMP has been built on the foundations of the Basic Formal Ontology and the Phenotype and Trait Ontology. Terms have logical definitions that can facilitate computational searching of phenotypes and their associated genes. OMP can be accessed via a wiki page as well as downloaded from SourceForge. Initial annotations with OMP are being made for Escherichia coli using a wiki-based annotation capture system. New OMP terms are being concurrently developed as annotation proceeds. CONCLUSIONS: We anticipate that diverse groups studying microbial genetics and associated phenotypes will employ OMP for standardizing microbial phenotype annotation, much as the Gene Ontology has standardized gene product annotation. The resulting OMP resource and associated annotations will facilitate prediction of phenotypes for unknown genes and result in new experimental characterization of phenotypes and functions.
Subjects
Bacterial Physiological Phenomena, Computational Biology/methods, Software, Phenotype
ABSTRACT
BACKGROUND: More than 20% of the world's population is at risk for infection by filarial nematodes, and >180 million people worldwide are already infected. Along with infection comes significant morbidity that has a socioeconomic impact. The eight filarial nematodes that infect humans are Wuchereria bancrofti, Brugia malayi, Brugia timori, Onchocerca volvulus, Loa loa, Mansonella perstans, Mansonella streptocerca, and Mansonella ozzardi, of which three have published draft genome sequences. Since all have humans as the definitive host, standard avenues of research that rely on culturing and genetics have often not been possible. Therefore, genome sequencing provides an important window into understanding the biology of these parasites. The need for large amounts of high-quality genomic DNA from homozygous, inbred lines; the availability of only short sequence reads from next-generation sequencing platforms at a reasonable expense; and the lack of random large-insert libraries have limited our ability to generate high-quality genome sequences for these parasites. However, the Pacific Biosciences single molecule, real-time sequencing platform holds great promise in reducing input amounts and generating sufficiently long sequences that bypass the need for large-insert paired libraries. RESULTS: Here, we report on efforts to generate a more complete genome assembly for L. loa using genetically heterogeneous DNA isolated from a single clinical sample and sequenced on the Pacific Biosciences platform. To obtain the best assembly, numerous assemblers and sequencing datasets were analyzed, combined, and compared. The final assembly (96.4 Mbp in 2,250 contigs) was an HGAP2 assembly of Pacific Biosciences reads alone, with Quiver-informed trimming. Compared with previous Roche 454-based sequencing efforts, this captures ~9% more of the genome in ~85% fewer contigs, from ~80% less starting material and at a fraction of the cost.
CONCLUSIONS: The result is the most complete filarial nematode assembly produced thus far and demonstrates the utility of single molecule sequencing on the Pacific Biosciences platform for genetically heterogeneous metazoan genomes.
Subjects
Helminth Genome, Loa/isolation & purification, Loiasis/parasitology, DNA Sequence Analysis/methods, Animals, Humans, Loa/genetics, Molecular Sequence Data, DNA Sequence Analysis/economics, DNA Sequence Analysis/instrumentation
ABSTRACT
BACKGROUND: Halyomorpha halys (Stål) (Insecta: Hemiptera: Pentatomidae), commonly known as the Brown Marmorated Stink Bug (BMSB), is an invasive pest in the mid-Atlantic region of the United States that causes economically important damage to a wide range of crops. Native to Asia, BMSB was first observed in Allentown, PA, USA, in 1996, and this pest is now well-established throughout the US mid-Atlantic region and beyond. In addition to the serious threat BMSB poses to agriculture, BMSB has become a nuisance to homeowners, invading home gardens and congregating in large numbers in human-made structures, including homes, to overwinter. Despite its significance as an agricultural pest with limited control options, only 100 bp of BMSB sequence data were available in public databases when this project began. RESULTS: Transcriptome sequencing was undertaken to provide a molecular resource to the research community to inform the development of pest control strategies and to provide molecular data for population genetics studies of BMSB. Using normalized, strand-specific libraries, we sequenced pools of all BMSB life stages on the Illumina HiSeq. Trinity was used to assemble 200,000 putative transcripts in >100,000 components. A novel bioinformatic method that analyzed the strand-specificity of the data reduced this to 53,071 putative transcripts from 18,573 components. By integrating multiple other data types, we narrowed this further to 13,211 representative transcripts. CONCLUSIONS: Bacterial endosymbiont genes were identified in this dataset, some of which have a copy number consistent with being lateral gene transfers between endosymbiont genomes and Hemiptera, including ankyrin-repeat related proteins, lysozyme, and mannanase. Such genes and endosymbionts may provide novel targets for BMSB-specific biocontrol.
This study demonstrates the utility of strand-specific sequencing in generating shotgun transcriptomes and shows that rapidly sequencing shotgun transcriptomes is possible without the need for extensive inbreeding to generate homozygous lines. Such sequencing can provide a rapid response to pest invasions similar to that already described for disease epidemiology.
Subjects
Gene Expression Profiling/methods, Heteroptera/genetics, Insect Proteins/genetics, RNA Sequence Analysis/methods, Animals, Bacteria/genetics, Bacterial Proteins/genetics, Computational Biology/methods, Female, Horizontal Gene Transfer, Heteroptera/microbiology, Introduced Species, Male, Molecular Sequence Data, Phylogeny, Symbiosis
ABSTRACT
The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions, including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on, such as those made by curators versus those automatically computed, or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under a Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources. Database URL: Evidence Ontology Web site: http://evidenceontology.org.
Subjects
Genetic Databases, Genomics/methods, Internet, Molecular Sequence Annotation/methods, Controlled Vocabulary, Animals, Mice, Saccharomyces
ABSTRACT
We report the draft genome sequence of Mortierella alpina isolate CDC-B6842. M. alpina is a nonpathogenic member of the Mucoromycotina subphylum of fungi that is an important model for understanding the molecular mechanisms of lipid production and metabolism.
ABSTRACT
We report the draft genome sequences of Geomyces pannorum sensu lato and Geomyces (Pseudogymnoascus) destructans. G. pannorum has a larger proteome than G. destructans, containing more proteins with ascribed enzymatic functions. This dichotomy in the genomes of related psychrophilic fungi is a valuable target for defining their distinct saprobic and pathogenic attributes.
ABSTRACT
We developed an RNA-Seq-based method to simultaneously capture prokaryotic and eukaryotic expression profiles of cells infected with intracellular bacteria. As proof of principle, this method was applied to Chlamydia trachomatis-infected epithelial cell monolayers in vitro, successfully obtaining transcriptomes of both C. trachomatis and the host cells at 1 and 24 hours post-infection. Chlamydiae are obligate intracellular bacterial pathogens that cause a range of mammalian diseases. In humans, chlamydiae are responsible for the most common sexually transmitted bacterial infections and for trachoma (infectious blindness). Disease arises from adverse host inflammatory reactions that induce tissue damage and scarring. However, little is known about the mechanisms underlying these outcomes. Chlamydia are genetically intractable, as replication outside of the host cell is not yet possible and there are no practical tools for routine genetic manipulation, making genome-scale approaches critical. The early timeframe of infection is poorly understood, and the host transcriptional response to chlamydial infection is not well defined. Our simultaneous RNA-Seq method was applied to a simplified in vitro model of chlamydial infection. We discovered a possible chlamydial strategy for early iron acquisition and putative immune-dampening effects of chlamydial infection on the host cell, and we present a hypothesis for Chlamydia-induced fibrotic scarring through runaway positive feedback loops. In general, simultaneous RNA-Seq helps to reveal the complex interplay between invading bacterial pathogens and their host mammalian cells and is immediately applicable to any bacterium-host cell interaction.