ABSTRACT
MicroRNA regulation of key biological and developmental pathways is a rapidly expanding area of research, accompanied by vast amounts of experimental data. This data, however, is not widely available in bioinformatic resources, making it difficult for researchers to find and analyze microRNA-related experimental data and define further research projects. We are addressing this problem by providing two new bioinformatics data sets that contain experimentally verified functional information for mammalian microRNAs involved in cardiovascular-relevant, and other, processes. To date, our resource provides over 4400 Gene Ontology annotations associated with over 500 microRNAs from human, mouse, and rat and over 2400 experimentally validated microRNA:target interactions. We illustrate how this resource can be used to create microRNA-focused interaction networks with a biological context using the known biological role of microRNAs and the mRNAs they regulate, enabling discovery of associations between gene products, biological pathways and, ultimately, diseases. This data will be crucial in advancing the field of microRNA bioinformatics and will establish consistent data sets for reproducible functional analysis of microRNAs across all biological research areas.
Subject(s)
Computational Biology/methods; Databases, Genetic; Gene Ontology; Gene Regulatory Networks/genetics; MicroRNAs/genetics; Molecular Sequence Annotation/methods; Animals; Humans; Mice; Rats
ABSTRACT
Over the last few years, several groups have evaluated the potential of microRNAs (miRNAs) as biomarkers for cardiometabolic disease. In this review, we discuss the emerging literature on the role of miRNAs and other small noncoding RNAs in platelets and in the circulation, and the potential use of miRNAs as biomarkers for platelet activation. Platelets are a major source of miRNAs, YRNAs, and circular RNAs. By harnessing multiomics approaches, we may gain valuable insights into their potential function. Because not all miRNAs are detectable in the circulation, we also created a gene ontology annotation for circulating miRNAs using the gene ontology term extracellular space as part of blood plasma. Finally, we share key insights for measuring circulating miRNAs. We propose ways to standardize miRNA measurements, in particular by using platelet-poor plasma to avoid confounding caused by residual platelets in plasma or by adding RNase inhibitors to serum to reduce degradation. This should enhance comparability of miRNA measurements across different cohorts. We provide recommendations for future miRNA biomarker studies, emphasizing the need for accurate interpretation within a biological and methodological context.
Subject(s)
Blood Platelets/metabolism; MicroRNAs/blood; Platelet Activation/physiology; Thrombosis/blood; Animals; Blood Coagulation/physiology; Humans; MicroRNAs/genetics; RNA, Untranslated/blood; RNA, Untranslated/genetics; Thrombosis/diagnosis; Thrombosis/genetics
ABSTRACT
MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual).
Subject(s)
Guidelines as Topic; MicroRNAs/genetics; Animals; Gene Silencing; Humans; Mice
ABSTRACT
The Gene Ontology Annotation (GOA) resource (http://www.ebi.ac.uk/GOA) provides evidence-based Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). Manual annotations provided by UniProt curators are supplemented by manual and automatic annotations from model organism databases and specialist annotation groups. GOA currently supplies 368 million GO annotations to almost 54 million proteins in more than 480,000 taxonomic groups. The resource now provides annotations to five times the number of proteins it did 4 years ago. As a member of the GO Consortium, we adhere to the most up-to-date Consortium-agreed annotation guidelines via the use of quality control checks that ensure that the GOA resource supplies high-quality functional information to proteins from a wide range of species. Annotations from GOA are freely available and are accessible through a powerful web browser as well as a variety of annotation file formats.
Subject(s)
Databases, Protein; Gene Ontology; Molecular Sequence Annotation; Proteins/genetics; Humans; Internet; Software
ABSTRACT
BACKGROUND: The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations. RESULTS: The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector-target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions. CONCLUSIONS: The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism's gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.
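The structure of an annotation extension described in the abstract above can be illustrated with a short parser. This is a minimal sketch, assuming the textual convention used in GO annotation files, where each unit is written as relation(identifier), commas join units that apply together, and pipes separate independent alternatives. The function name and the example identifiers are illustrative assumptions and are not taken from the abstract.

```python
import re

# One extension unit has the shape relation(ID), e.g. part_of(GO:0006954).
_UNIT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\(([^()]+)\)\s*$")


def parse_annotation_extension(field):
    """Parse an annotation-extension string (hypothetical helper).

    Pipes separate alternatives; commas join units that hold together.
    Returns a list of alternatives, each a list of (relation, target_id).
    """
    alternatives = []
    for group in field.split("|"):
        if not group.strip():
            continue  # tolerate an empty field or a stray pipe
        units = []
        for unit in group.split(","):
            match = _UNIT.match(unit)
            if match is None:
                raise ValueError(f"malformed extension unit: {unit!r}")
            units.append((match.group(1), match.group(2).strip()))
        alternatives.append(units)
    return alternatives


# Illustrative input only; the identifiers are placeholders.
example = "occurs_in(CL:0000576),has_input(UniProtKB:P12345)|part_of(GO:0006954)"
for alternative in parse_annotation_extension(example):
    print(alternative)
```

Keeping each alternative as its own list preserves the distinction between details that jointly qualify one annotation and details that belong to separate statements, which is what makes the extensions usable as directional links in network reconstruction.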
Subject(s)
Gene Ontology; Molecular Sequence Annotation; Computational Biology/methods; Humans; Proteins/genetics
ABSTRACT
The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years and has benefited from a wealth of checks to improve annotation correctness and consistency as well as now supplying a greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.
Subject(s)
Databases, Protein; Molecular Sequence Annotation; Vocabulary, Controlled; Molecular Sequence Annotation/standards
ABSTRACT
Introduction: The normal development of all heart valves requires highly coordinated signaling pathways and downstream mediators. While genomic variants can be responsible for congenital valve disease, environmental factors can also play a role. Later in life valve calcification is a leading cause of aortic valve stenosis, a progressive disease that may lead to heart failure. Current research into the causes of both congenital valve diseases and valve calcification is using a variety of high-throughput methodologies, including transcriptomics, proteomics and genomics. High quality genetic data from biological knowledge bases are essential to facilitate analyses and interpretation of these high-throughput datasets. The Gene Ontology (GO, http://geneontology.org/) is a major bioinformatics resource used to interpret these datasets, as it provides structured, computable knowledge describing the role of gene products across all organisms. The UCL Functional Gene Annotation team focuses on GO annotation of human gene products. Having identified that the GO annotations included in transcriptomic, proteomic and genomic data did not provide sufficient descriptive information about heart valve development, we initiated a focused project to address this issue. Methods: This project prioritized 138 proteins for GO annotation, which led to the curation of 100 peer-reviewed articles and the creation of 400 heart valve development-relevant GO annotations. Results: While the focus of this project was heart valve development, around 600 of the 1000 annotations created described the broader cellular role of these proteins, including those describing aortic valve morphogenesis, BMP signaling and endocardial cushion development. Our functional enrichment analysis of the 28 proteins known to have a role in bicuspid aortic valve disease confirmed that this annotation project has led to an improved interpretation of a heart valve genetic dataset. 
Discussion: To address the needs of the heart valve research community this project has provided GO annotations to describe the specific roles of key proteins involved in heart valve development. The breadth of GO annotations created by this project will benefit many of those seeking to interpret a wide range of cardiovascular genomic, transcriptomic, proteomic and metabolomic datasets.
ABSTRACT
The Gene Ontology Annotation (GOA) project at the EBI (http://www.ebi.ac.uk/goa) provides high-quality electronic and manual associations (annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries. Annotations created by the project are collated with annotations from external databases to provide an extensive, publicly available GO annotation resource. Currently covering over 160 000 taxa, with greater than 32 million annotations, GOA remains the largest and most comprehensive open-source contributor to the GO Consortium (GOC) project. Over the last five years, the group has augmented the number and coverage of their electronic pipelines and a number of new manual annotation projects and collaborations now further enhance this resource. A range of files facilitate the download of annotations for particular species, and GO term information and associated annotations can also be viewed and downloaded from the newly developed GOA QuickGO tool (http://www.ebi.ac.uk/QuickGO), which allows users to precisely tailor their annotation set.
Subject(s)
Databases, Protein; Genes; Proteins/genetics; Vocabulary, Controlled; Animals; Humans; Proteome/genetics
ABSTRACT
BACKGROUND: Gene Ontology (GO) is a major bioinformatic resource used for analysis of large biomedical datasets, for example from genome-wide association studies, applied universally across biological fields, including Alzheimer's disease (AD) research. OBJECTIVE: We aim to demonstrate the applicability of GO for interpretation of AD datasets to improve the understanding of the underlying molecular disease mechanisms, including the involvement of inflammatory pathways and dysregulated microRNAs (miRs). METHODS: We have undertaken a systematic full article GO annotation approach focused on microglial proteins implicated in AD and the miRs regulating their expression. PANTHER was used for enrichment analysis of previously published AD data. Cytoscape was used for visualizing and analyzing miR-target interactions captured from published experimental evidence. RESULTS: We contributed 3,084 new annotations for 494 entities, i.e., on average six new annotations per entity. This included a total of 1,352 annotations for 40 prioritized microglial proteins implicated in AD and 66 miRs regulating their expression, yielding an average of twelve annotations per prioritized entity. The updated GO resource was then used to re-analyze previously published data. The re-analysis showed novel processes associated with AD-related genes, not identified in the original study, such as 'gliogenesis', 'regulation of neuron projection development', or 'response to cytokine', demonstrating enhanced applicability of GO for neuroscience research. CONCLUSIONS: This study highlights ongoing development of the neurobiological aspects of GO and demonstrates the value of biocuration activities in the area, thus helping to delineate the molecular bases of AD to aid the development of diagnostic tools and treatments.
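The enrichment analysis mentioned in the abstract above rests on an over-representation test. The following is a minimal sketch of one common form, a one-sided hypergeometric test for a single GO term. It is not a description of PANTHER's implementation, whose statistical options may differ; the function name and parameter names are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     number of genes in the study list
    hits:       study-list genes annotated to the GO term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    first = max(hits, 0)
    last = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(first, last + 1)
    )
    return tail / total


# Toy numbers chosen only to exercise the function, not real data.
print(hypergeom_enrichment_p(population=10, annotated=5, sample=5, hits=5))
```

Because the p-value depends on how many background genes carry the term, adding annotations to a resource changes the result of the same test on the same study list, which is why re-analysis with an updated GO can surface processes that an earlier analysis missed. In practice the test is repeated for many terms and the p-values are corrected for multiple testing.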
Subject(s)
Alzheimer Disease/genetics; Encephalitis/genetics; Gene Expression; Gene Ontology; Computational Biology/methods; Humans; Microglia/metabolism; Molecular Sequence Annotation/methods
ABSTRACT
Activation of E2F transcription factors at the G1-to-S phase boundary, with the resultant expression of genes needed for DNA synthesis and S-phase, is due to phosphorylation of the retinoblastoma-related (RBR) protein by cyclin D-dependent kinase (CYCD-CDK), particularly CYCD3-CDKA. Arabidopsis has three canonical E2F genes, of which E2Fa and E2Fb are proposed to encode transcriptional activators and E2Fc a repressor. Previous studies have identified genes regulated in response to high-level constitutive expression of E2Fa and of CYCD3;1, but such plants display significant phenotypic abnormalities. We have sought to identify targets that show responses to lower level induced changes in abundance of these cell cycle regulators. Expression of E2Fa, E2Fc or CYCD3;1 was induced using dexamethasone and the effects analysed using microarrays in a time course allowing short and longer term effects to be observed. Overlap between CYCD3;1 and E2Fa modulated genes substantiates their action in a common pathway with a key role in controlling the G1/S transition, with additional targets for CYCD3;1 in chromatin modification and for E2Fa in cell wall biogenesis and development. E2Fc induction led primarily to gene downregulation but did not antagonise E2Fa action. Hence E2Fc appears to function outside the CYCD3-RBR pathway and does not have a direct effect on cell cycle genes, and promoter analysis suggests a distinct binding site preference.
Subject(s)
Arabidopsis Proteins/metabolism; Arabidopsis/metabolism; Cyclins/metabolism; E2F Transcription Factors/metabolism; G1 Phase/physiology; S Phase/physiology; Arabidopsis/cytology; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Cyclins/genetics; E2F Transcription Factors/genetics; Flow Cytometry; G1 Phase/genetics; Gene Expression Regulation, Plant/genetics; Gene Expression Regulation, Plant/physiology; Oligonucleotide Array Sequence Analysis; Reverse Transcriptase Polymerase Chain Reaction; S Phase/genetics; Signal Transduction/genetics; Signal Transduction/physiology
ABSTRACT
High-throughput studies constitute an essential and valued source of information for researchers. However, high-throughput experimental workflows are often complex, with multiple data sets that may contain large numbers of false positives. The representation of high-throughput data in the Gene Ontology (GO) therefore presents a challenging annotation problem, when the overarching goal of GO curation is to provide the most precise view of a gene's role in biology. To address this, representatives from annotation teams within the GO Consortium reviewed high-throughput data annotation practices. We present an annotation framework for high-throughput studies that will facilitate good standards in GO curation and, through the use of new high-throughput evidence codes, increase the visibility of these annotations to the research community.
Subject(s)
Databases, Genetic; Gene Ontology; Genomics/methods; Molecular Sequence Annotation/methods; Animals; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
ABSTRACT
The analysis and interpretation of high-throughput datasets relies on access to high-quality bioinformatics resources, as well as processing pipelines and analysis tools. Gene Ontology (GO, geneontology.org) is a major resource for gene enrichment analysis. The aim of this project, funded by the Alzheimer's Research United Kingdom (ARUK) foundation and led by the University College London (UCL) biocuration team, was to enhance the GO resource by developing new neurological GO terms, and use GO terms to annotate gene products associated with dementia. Specifically, proteins and protein complexes relevant to processes involving amyloid-beta and tau have been annotated and the resulting annotations are denoted in GO databases as 'ARUK-UCL'. Biological knowledge presented in the scientific literature was captured through the association of GO terms with dementia-relevant protein records; GO itself was revised, and new GO terms were added. This literature biocuration increased the number of Alzheimer's-relevant gene products that were being associated with neurological GO terms, such as 'amyloid-beta clearance' or 'learning or memory', as well as neuronal structures and their compartments. Of the total 2055 annotations that we contributed for the prioritised gene products, 526 have associated proteins and complexes with neurological GO terms. To ensure that these descriptive annotations could be provided for Alzheimer's-relevant gene products, over 70 new GO terms were created. Here, we describe how the improvements in ontology development and biocuration resulting from this initiative can benefit the scientific community and enhance the interpretation of dementia data.
ABSTRACT
BACKGROUND: A systems biology approach to cardiac physiology requires a comprehensive representation of how coordinated processes operate in the heart, as well as the ability to interpret relevant transcriptomic and proteomic experiments. The Gene Ontology (GO) Consortium provides structured, controlled vocabularies of biological terms that can be used to summarize and analyze functional knowledge for gene products. METHODS AND RESULTS: In this study, we created a computational resource to facilitate genetic studies of cardiac physiology by integrating literature curation with attention to an improved and expanded ontological representation of heart processes in the Gene Ontology. As a result, the Gene Ontology now contains terms that comprehensively describe the roles of proteins in cardiac muscle cell action potential, electrical coupling, and the transmission of the electrical impulse from the sinoatrial node to the ventricles. Evaluating the effectiveness of this approach to inform data analysis demonstrated that Gene Ontology annotations, analyzed within an expanded ontological context of heart processes, can help to identify candidate genes associated with arrhythmic disease risk loci. CONCLUSIONS: We determined that a combination of curation and ontology development for heart-specific genes and processes supports the identification and downstream analysis of genes responsible for the spread of the cardiac action potential through the heart. Annotating these genes and processes in a structured format facilitates data analysis and supports effective retrieval of gene-centric information about cardiac defects.
Subject(s)
Gene Ontology; Heart Diseases; Proteomics; Computational Biology; Databases, Genetic; Heart; Heart Diseases/genetics; Humans; Molecular Sequence Annotation; Phenotype
ABSTRACT
Carotid artery intima media thickness (cIMT) and carotid plaque are measures of subclinical atherosclerosis associated with ischemic stroke and coronary heart disease (CHD). Here, we undertake meta-analyses of genome-wide association studies (GWAS) in 71,128 individuals for cIMT, and 48,434 individuals for carotid plaque traits. We identify eight novel susceptibility loci for cIMT, one independent association at the previously identified PINX1 locus, and one novel locus for carotid plaque. Colocalization analysis with nearby vascular expression quantitative trait loci (cis-eQTLs) derived from arterial wall and metabolic tissues obtained from patients with CHD identifies candidate genes at two potentially additional loci, ADAMTS9 and LOXL4. LD score regression reveals significant genetic correlations between cIMT and plaque traits, and both cIMT and plaque with CHD, any stroke subtype and ischemic stroke. Our study provides insights into genes and tissue-specific regulatory mechanisms linking atherosclerosis both to its functional genomic origins and its clinical consequences in humans.
Subject(s)
Carotid Intima-Media Thickness; Coronary Disease/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Plaque, Atherosclerotic/genetics; ADAMTS9 Protein/genetics; Amino Acid Oxidoreductases/genetics; Coronary Disease/pathology; Humans; Lod Score; Plaque, Atherosclerotic/pathology; Polymorphism, Single Nucleotide; Protein-Lysine 6-Oxidase; Quantitative Trait Loci/genetics; Risk Factors
ABSTRACT
The specificity of knowledge that Gene Ontology (GO) annotations can currently represent is still restricted by the legacy format of the GO annotation file, a format intentionally designed for simplicity to keep the barriers to entry low and thus encourage initial adoption. Historically, the information that could be captured in a GO annotation was simply the role or location of a gene product, although genetically interacting or binding partners could be specified. While there was no mechanism within the original GO annotation format for capturing additional information about the context of a GO term, such as the target gene of an activity or the location of a molecular function, the long-term vision for the GO Consortium was to provide greater expressivity in its annotations to capture physiologically relevant information. Thus, as a step forward, the GO Consortium has introduced a new field into the annotation format, annotation extensions, which can be used to capture valuable contextual detail. This provides experimentally verified links between gene products and other physiological information that is crucial for accurate analysis of pathway and network data. This chapter will provide a simple overview of annotation extensions, illustrated with examples of their usage, and explain why they are useful for scientists and bioinformaticians alike.
Subject(s)
Gene Ontology; Molecular Sequence Annotation/methods; Animals; Computational Biology/methods; Databases, Genetic; Humans; Proteins/analysis; Proteins/genetics; Proteins/metabolism
ABSTRACT
BACKGROUND: Recent research into ciliary structure and function provides important insights into inherited diseases termed ciliopathies and other cilia-related disorders. This wealth of knowledge needs to be translated into a computational representation to be fully exploitable by the research community. To this end, members of the Gene Ontology (GO) and SYSCILIA Consortia have worked together to improve representation of ciliary substructures and processes in GO. METHODS: Members of the SYSCILIA and Gene Ontology Consortia suggested additions and changes to GO, to reflect new knowledge in the field. The project initially aimed to improve coverage of ciliary parts, and was then broadened to cilia-related biological processes. Discussions were documented in a public tracker. We engaged the broader cilia community via direct consultation and by referring to the literature. Ontology updates were implemented via ontology editing tools. RESULTS: So far, we have created or modified 127 GO terms representing parts and processes related to eukaryotic cilia/flagella or prokaryotic flagella. A growing number of biological pathways are known to involve cilia, and we continue to incorporate this knowledge in GO. The resulting expansion in GO allows more precise representation of experimentally derived knowledge, and SYSCILIA and GO biocurators have created 199 annotations to 50 human ciliary proteins. The revised ontology was also used to curate mouse proteins in a collaborative project. The revised GO and annotations, used in comparative 'before and after' analyses of representative ciliary datasets, improve enrichment results significantly. CONCLUSIONS: Our work has resulted in a broader and deeper coverage of ciliary composition and function. 
These improvements in ontology and protein annotation will benefit all users of GO enrichment analysis tools, as well as the ciliary research community, in areas ranging from microscopy image annotation to interpretation of high-throughput studies. We welcome feedback to further enhance the representation of cilia biology in GO.
ABSTRACT
A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue that cooperation between the scientific community and professional curators can increase the capacity to capture precise knowledge from the literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large numbers of DNA binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such community curation can enhance the scientific epistemic process. Database URL: http://www.tfcheckpoint.org.
Subject(s)
Computational Biology/methods; DNA-Binding Proteins/genetics; Data Curation/methods; Databases, Genetic; Gene Expression Regulation/genetics; Transcription Factors/genetics; Animals; Data Mining; Humans; Mammals; Mice; Rats
ABSTRACT
Exudates were collected from stumps of pre-anthesis inflorescences of oil palm and analysed for cytokinin and gibberellin content using combined HPLC-ELISA techniques. Three antisera, for zeatin-type, dihydrozeatin-type and isopentenyladenine-type cytokinins, were used in ELISAs to identify members of these three groups of cytokinins. Ribotides, 9-glucosides, free bases and ribosides were detected for each of the groups with zeatin riboside the most abundant cytokinin identified in the exudate. Isopentenyladenine-type and dihydrozeatin-type cytokinins were also identified but at lower levels. In addition, two monoclonal antibodies were used in the development of novel ELISAs for members of the 13-hydroxylated and non-13-hydroxylated families of gibberellins. The new ELISAs allow the determination of gibberellins in smaller amounts of tissue than are required for GC-MS. The most abundant gibberellins identified in exudates were GA19 and GA44, as well as other members of the early 13-hydroxylation pathway. Gibberellins were confirmed by GC-MS. The presence of these types of growth regulators in exudate supplying immature inflorescences suggests they have a role in growth and development of these structures.
Subject(s)
Arecaceae/chemistry; Cytokinins/isolation & purification; Gibberellins/isolation & purification; Plant Structures/chemistry; Chromatography, High Pressure Liquid; Cytokinins/chemistry; Enzyme-Linked Immunosorbent Assay; Gas Chromatography-Mass Spectrometry; Gibberellins/chemistry; Molecular Structure; Plant Growth Regulators/chemistry; Plant Growth Regulators/isolation & purification
ABSTRACT
The Gene Ontology Consortium (GOC) is a major bioinformatics project that provides structured controlled vocabularies to classify gene product function and location. GOC members create annotations to gene products using the Gene Ontology (GO) vocabularies, thus providing an extensive, publicly available resource. The GO and its annotations to gene products are now an integral part of functional analysis, and statistical tests using GO data are becoming routine for researchers to include when publishing functional information. While many helpful articles about the GOC are available, there are certain updates to the ontology and annotation sets that sometimes go unobserved. Here we describe some of the ways in which GO can change that should be carefully considered by all users of GO as they may have a significant impact on the resulting gene product annotations, and therefore the functional description of the gene product, or the interpretation of analyses performed on GO datasets. GO annotations for gene products change for many reasons, and while these changes generally improve the accuracy of the representation of the underlying biology, they do not necessarily imply that previous annotations were incorrect. We additionally describe the quality assurance mechanisms we employ to improve the accuracy of annotations, which necessarily changes the composition of the annotation sets we provide. We use the Universal Protein Resource (UniProt) for illustrative purposes of how the GO Consortium, as a whole, manages these changes.
ABSTRACT
The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on such as those made by curators versus those automatically computed or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources. Database URL: Evidence Ontology Web site: http://evidenceontology.org.
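The distinctions drawn in the abstract above (curated versus automatically computed, high-throughput versus single-experiment evidence) can be sketched as a simple grouping of annotations by evidence type. This sketch uses the three-letter GO evidence codes rather than ECO identifiers. The group labels, the function names, and the tuple layout of an annotation are assumptions made for illustration; the membership of each group follows the GO evidence code categories as generally documented and should be checked against the current GO and ECO releases before use.

```python
from collections import defaultdict

# Group labels are hypothetical; the codes are standard GO evidence codes.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "high_throughput": {"HTP", "HDA", "HMP", "HGI", "HEP"},
    "phylogenetic": {"IBA", "IBD", "IKR", "IRD"},
    "computational_analysis": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author_statement": {"TAS", "NAS"},
    "curator_inference": {"IC"},
    "electronic": {"IEA"},
}


def classify_evidence(code):
    """Return the group label for an evidence code, or 'other' if unknown."""
    for label, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return label
    return "other"


def split_by_evidence(annotations):
    """Bucket (gene_product, go_id, evidence_code) tuples by evidence group."""
    buckets = defaultdict(list)
    for gene_product, go_id, code in annotations:
        buckets[classify_evidence(code)].append((gene_product, go_id))
    return dict(buckets)


# Placeholder records used only to exercise the functions.
records = [
    ("geneA", "GO:0000001", "IDA"),
    ("geneB", "GO:0000002", "IEA"),
    ("geneC", "GO:0000003", "HDA"),
]
print(split_by_evidence(records))
```

Separating annotations in this way lets an analysis be run on, for example, single-experiment evidence alone and then repeated with high-throughput or electronic annotations included, so that the contribution of each evidence type to a result is visible.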