1.
Bioinformatics ; 35(24): 5264-5270, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31228194

ABSTRACT

SUMMARY: The human karyotype has been used as a mechanism for describing and detecting gross abnormalities in the genome for many decades. It is used both for routine diagnostic purposes and for research to further our understanding of the causes of disease. Despite these important applications there has been no rigorous computational representation of the karyotype; rather an informal, string-based representation is used, making it hard to check, organize and search data of this form. In this article, we describe our use of OWL, the Web Ontology Language, to generate a fully computational representation of the karyotype; the development of this ontology represents a significant advance from the traditional bioinformatics use for tagging and navigation and has necessitated the development of a new ontology development environment called Tawny-OWL. AVAILABILITY AND IMPLEMENTATION: The Karyotype Ontology and associated Tawny-OWL source code are available on GitHub at https://github.com/jaydchan/tawny-karyotype, under an LGPL License, Version 3.0.


Subjects
Karyotype , Software , Computational Biology , Genome , Humans , Language
2.
J Mol Evol ; 86(6): 395-403, 2018 07.
Article in English | MEDLINE | ID: mdl-29987491

ABSTRACT

Mitochondrial diseases are a highly complex, heterogeneous group of disorders. Mitochondrial DNA variants that are linked to disease can exhibit variable expression and penetrance. This has an implication for mitochondrial diagnostics, as variants that cause disease in one individual may not do so in another. It has been suggested that the sequence context in which a variant arises could influence the genotype-phenotype relationship. However, the consequence of sequence variation between different haplogroups on the expression of disease is not well understood. European haplogroups are the most widely studied. To ensure accurate diagnostics for patients globally, we first need to understand how, if at all, the sequence context in which a variant arises contributes to the manifestation of disease. To help us understand this, we used 2752 sequences from 33 non-human species that do not have disease. We searched for variants in the seven complex I genes that are associated with disease in humans. Our findings indicate that only three reported pathogenic complex I variants have arisen in these species. More importantly, only one of these, m.3308T>C, has arisen with its associated amino acid change in the studied non-human species, and the status of m.3308T>C as a disease-causing variant is itself a matter of debate. This is a stark contrast to previous findings in the mitochondrial tRNA genes and suggests that sequence context may be less important in the complex I genes. This information will help us improve the identification and diagnosis of mitochondrial DNA variants in non-European populations.


Subjects
DNA, Mitochondrial/genetics , Haplotypes/genetics , Mutation/genetics , Penetrance , RNA, Transfer/genetics , Base Sequence , Consensus Sequence/genetics , Electron Transport Complex I/genetics , Genetic Variation , Humans , Species Specificity
3.
Bioinformatics ; 33(17): 2731-2736, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28525546

ABSTRACT

MOTIVATION: As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. RESULTS: We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors. AVAILABILITY AND IMPLEMENTATION: Analytical software is available on request. CONTACT: phillip.lord@newcastle.ac.uk.


Subjects
Computational Biology/methods , Databases, Factual , Linguistics , Molecular Sequence Annotation , Software
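
The kind of sentence-reuse check described in entry 3 can be illustrated with a minimal sketch. This is not the authors' software (which the abstract says is available on request); the database names, entry identifiers, the naive sentence splitter and the function names below are all hypothetical, chosen only to show the idea of separating reuse within one database from reuse between databases.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip().lower() for p in parts if p.strip()]


def sentence_index(databases):
    """Map each normalised sentence to the set of (database, entry) pairs that use it."""
    index = defaultdict(set)
    for db_name, entries in databases.items():
        for entry_id, annotation in entries.items():
            for sentence in split_sentences(annotation):
                index[sentence].add((db_name, entry_id))
    return index


def reuse_report(databases):
    """Split reused sentences into within-database and between-database reuse.

    A sentence seen in more than one database is counted as between-database
    reuse, even if it is also repeated inside one of those databases.
    """
    within, between = {}, {}
    for sentence, uses in sentence_index(databases).items():
        if len(uses) < 2:
            continue  # used once only, so no reuse
        if len({db for db, _ in uses}) > 1:
            between[sentence] = uses
        else:
            within[sentence] = uses
    return within, between


# Hypothetical toy data: two databases with free-text annotations.
toy = {
    "db_a": {
        "A1": "Binds ATP. Involved in DNA repair.",
        "A2": "Binds ATP. Localises to the nucleus.",
    },
    "db_b": {
        "B1": "Involved in DNA repair. Highly expressed in liver.",
    },
}
within, between = reuse_report(toy)
print(sorted(within))
print(sorted(between))
```

Exact string matching after lower-casing is the simplest possible notion of "the same sentence"; a real analysis would likely need a more careful tokeniser and some tolerance for punctuation differences.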
4.
Mediators Inflamm ; 2015: 471719, 2015.
Article in English | MEDLINE | ID: mdl-26819498

ABSTRACT

The number of patients with autoimmune diseases and severe allergies and recipients of transplants is increasing worldwide. Currently, these patients require lifelong administration of immunomodulatory drugs. Often, these drugs are expensive and show immediate or late-occurring severe side effects. Treatment would be greatly improved by targeting the cause of autoimmunity, that is, loss of tolerance to self-antigens. Accumulating knowledge on immune mechanisms has led to the development of tolerogenic dendritic cells (tolDC), with the specific objective to restrain unwanted immune reactions in the long term. The first clinical trials with tolDC have recently been conducted and more tolDC trials are underway. Although the safety trials have been encouraging, many questions relating to tolDC, for example, cell-manufacturing protocols, administration route, amount and frequency, or mechanism of action, remain to be answered. Aiming to join efforts in translating tolDC and other tolerogenic cellular products (e.g., Tregs and macrophages) to the clinic, a European COST (European Cooperation in Science and Technology) network has been initiated: A FACTT (action to focus and accelerate cell-based tolerance-inducing therapies). A FACTT aims to minimize overlap and maximize comparison of tolDC approaches through establishment of minimum information models and consensus monitoring parameters, ensuring that progress will be made in an efficient, safe, and cost-effective way.


Subjects
Adoptive Transfer , Dendritic Cells/immunology , Immune Tolerance , Autoimmunity , Clinical Trials as Topic , Cooperative Behavior , Europe , Humans
5.
Bioinform Adv ; 4(1): vbae057, 2024.
Article in English | MEDLINE | ID: mdl-38721398

ABSTRACT

Motivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. Results: The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources. Availability and implementation: Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).

6.
Bioinformatics ; 28(18): i562-i568, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962482

ABSTRACT

MOTIVATION: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we look at addressing within this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach since it allows us to compare annotation change, both over time and between automated and manually curated annotations. RESULTS: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. AVAILABILITY: Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation. CONTACT: phillip.lord@newcastle.ac.uk.


Subjects
Databases, Protein , Knowledge Bases , Molecular Sequence Annotation/standards , Proteins/chemistry , Proteins/physiology
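
To make the power-law idea in entry 6 concrete, here is a small sketch that estimates a Zipf-style exponent from word frequencies. It is an illustration only and may well differ from the authors' fitting procedure: it uses an ordinary least-squares line on the log-log rank-frequency plot, a simple technique that is known to give biased estimates on real text. The function names and the synthetic corpus are assumptions.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return [(rank, count), ...] with rank 1 for the most frequent word."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))


def zipf_exponent(rank_freq):
    """Negated least-squares slope of log(count) against log(rank)."""
    if len(rank_freq) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(count) for _, count in rank_freq]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic corpus (hypothetical): word i occurs 120 // i times, i.e. count = 120 / rank,
# which is an exact power law with exponent 1.
toy_text = " ".join(f"w{i} " * (120 // i) for i in range(1, 7))
alpha = zipf_exponent(rank_frequency(toy_text))
print(round(alpha, 6))
```

Comparing such an exponent across database releases, or between manually and automatically curated subsets, is the sort of trend the abstract refers to; any interpretation in terms of quality goes beyond what this sketch shows.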
7.
Bioinformatics ; 28(11): 1495-500, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22492647

ABSTRACT

MOTIVATION: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality 'gold standard' reference networks, but such reference networks are not always available. RESULTS: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein-protein interaction experiments. AVAILABILITY: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/


Subjects
Algorithms , Bayes Theorem , Epistasis, Genetic , Protein Interaction Mapping/standards , Likelihood Functions , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
8.
PeerJ ; 11: e15352, 2023.
Article in English | MEDLINE | ID: mdl-37273539

ABSTRACT

Minimum information models are reporting frameworks that describe the essential information that needs to be provided in a publication, so that the work can be repeated or compared to other work. In 2016, Minimum Information about Tolerogenic Antigen-Presenting cells (MITAP) was created to standardize the reporting on tolerogenic antigen-presenting cells, including tolerogenic dendritic cells (tolDCs). tolDCs is a generic term for dendritic cells that have the ability to (re-)establish immune tolerance; they have been developed as a cell therapy for autoimmune diseases or for the prevention of transplant rejection. Because protocols to generate these therapeutic cells vary widely, MITAP was deemed to be a pivotal reporting tool by and for the tolDC community. In this paper, we explored the impact that MITAP has had on the tolDC field. We did this by examining a subset of the available literature on tolDCs. Our analysis shows that MITAP is used in only a minority of relevant papers (14%), but where it is used the amount of metadata available is slightly greater than where it is not. From this, we conclude that MITAP has been a partial success, but that much more needs to be done if standardized reporting is to become common within the discipline.


Subjects
Autoimmune Diseases , Dendritic Cells , Humans , Immune Tolerance
9.
Bioinformatics ; 27(9): 1299-306, 2011 May 01.
Article in English | MEDLINE | ID: mdl-21414991

ABSTRACT

MOTIVATION: The rise of high-throughput technologies in the post-genomic era has led to the production of large amounts of biological data. Many of these datasets are freely available on the Internet. Making optimal use of these data is a significant challenge for bioinformaticians. Various strategies for integrating data have been proposed to address this challenge. One of the most promising approaches is the development of semantically rich integrated datasets. Although well suited to computational manipulation, such integrated datasets are typically too large and complex for easy visualization and interactive exploration. RESULTS: We have created an integrated dataset for Saccharomyces cerevisiae using the semantic data integration tool Ondex, and have developed a view-based visualization technique that allows for concise graphical representations of the integrated data. The technique was implemented in a plug-in for Cytoscape, called OndexView. We used OndexView to investigate telomere maintenance in S. cerevisiae. AVAILABILITY: The Ondex yeast dataset and the OndexView plug-in for Cytoscape are accessible at http://bsu.ncl.ac.uk/ondexview.


Subjects
Computational Biology/methods , Databases, Genetic , Information Storage and Retrieval/methods , Systems Biology/methods , Internet , Saccharomyces cerevisiae/genetics , Telomere/genetics
10.
PLoS Comput Biol ; 5(7): e1000443, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19649320

ABSTRACT

In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.


Subjects
Computational Biology/methods , Semantics , Terminology as Topic , Algorithms , Biomedical Research/methods , Classification/methods , Natural Language Processing , Software
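
The node-based versus edge-based distinction drawn in entry 10 can be sketched on a toy ontology. The example below contrasts a Resnik-style node-based score (information content of the most informative common ancestor) with a simple edge-based path length through a common ancestor. The ontology, the annotation counts and all function names are hypothetical; this is a sketch of the two families of measure, not a reproduction of any implementation surveyed in the review.

```python
import math

# Hypothetical toy ontology: child term -> list of parent terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "atp_binding": ["binding"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis"],
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_ANNOTATIONS = {"atp_binding": 4, "dna_binding": 4, "kinase": 8}


def ancestors_with_distance(term):
    """Return {ancestor: fewest edges from term}, with term itself at distance 0."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in PARENTS[node]:
                if parent not in dist:
                    dist[parent] = dist[node] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist


def information_content():
    """IC(t) = log2(total / annotations to t or any of its descendants)."""
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_ANNOTATIONS.items():
        for ancestor in ancestors_with_distance(term):
            cumulative[ancestor] += count
    total = sum(DIRECT_ANNOTATIONS.values())
    # Terms with no annotations at or below them have undefined IC and are omitted.
    return {t: math.log2(total / c) for t, c in cumulative.items() if c > 0}


def resnik(a, b, ic):
    """Node-based: IC of the most informative common ancestor of a and b."""
    common = ancestors_with_distance(a).keys() & ancestors_with_distance(b).keys()
    return max(ic[t] for t in common)


def path_length(a, b):
    """Edge-based: fewest edges from a to b passing through a common ancestor."""
    da, db = ancestors_with_distance(a), ancestors_with_distance(b)
    return min(da[t] + db[t] for t in da.keys() & db.keys())


ic = information_content()
print(resnik("atp_binding", "dna_binding", ic), path_length("atp_binding", "dna_binding"))
print(resnik("atp_binding", "kinase", ic), path_length("atp_binding", "kinase"))
```

Both functions here are pairwise term comparisons; a groupwise measure would instead compare the whole sets of terms annotating two entities in one step.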
11.
Evol Appl ; 12(10): 1912-1930, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31700535

ABSTRACT

Mitochondrial disorders are heterogeneous, showing variable presentation and penetrance. Over the last three decades, our ability to recognize mitochondrial patients and diagnose these mutations, linking genotype to phenotype, has greatly improved. However, it has become increasingly clear that these strides in diagnostics have not benefited all population groups. Recent studies have demonstrated that patients from genetically understudied populations, in particular those of black African heritage, are less likely to receive a diagnosis of mtDNA disease. It has been suggested that haplogroup context might influence the presentation and penetrance of mtDNA disease and thus the spectrum of mutations that are associated with disease in different populations. However, to date there is only one well-established example of such an effect: the increased penetrance of two Leber's hereditary optic neuropathy mutations on a haplogroup J background. This paper reports the most extensive investigation to date into the importance of haplogroup context on the pathogenicity of mtDNA mutations. We searched for proven human point mutations across 726 multiple sequence alignments derived from 33 non-human species free of disease. A total of 58 pathogenic point mutations arise in the sequences of these species. We assessed the sequence context and found evidence of population variants that could modulate the phenotypic expression of these point mutations, masking the pathogenic effects seen in humans. This supports the theory that sequence context is influential in the presentation of mtDNA disease and has implications for diagnostic practices. We have shown that our current understanding of the pathogenicity of mtDNA point mutations, primarily built on studies of individuals with haplogroups HVUKTJ, will not present a complete picture. This will have the effect of creating a diagnostic inequality, whereby individuals who do not belong to these lineages are less likely to receive a genetic diagnosis.

12.
ACS Synth Biol ; 8(7): 1498-1514, 2019 07 19.
Article in English | MEDLINE | ID: mdl-31059645

ABSTRACT

Standard representation of data is key for the reproducibility of designs in synthetic biology. The Synthetic Biology Open Language (SBOL) has already emerged as a data standard to represent information about genetic circuits, and it is based on capturing data using graphs. The language provides the syntax using a free text document that is accessible to humans only. This paper describes SBOL-OWL, an ontology for a machine understandable definition of SBOL. This ontology acts as a semantic layer for genetic circuit designs. As a result, computational tools can understand the meaning of design entities in addition to parsing structured SBOL data. SBOL-OWL not only describes how genetic circuits can be constructed computationally, it also facilitates the use of several existing Semantic Web tools for synthetic biology. This paper demonstrates some of these features, for example, to validate designs and check for inconsistencies. Through the use of SBOL-OWL, queries can be simplified and become more intuitive. Moreover, existing reasoners can be used to infer information about genetic circuit designs that cannot be directly retrieved using existing querying mechanisms. This ontological representation of the SBOL standard provides a new perspective to the verification, representation, and querying of information about genetic circuits and is important to incorporate complex design information via the integration of biological ontologies.


Subjects
Gene Regulatory Networks/genetics , Synthetic Biology/methods , Humans , Models, Biological , Programming Languages , Reproducibility of Results , Semantics , Software
13.
BMC Bioinformatics ; 8: 57, 2007 Feb 20.
Article in English | MEDLINE | ID: mdl-17311682

ABSTRACT

The bio-ontology community falls into two camps: first we have biology domain experts, who actually hold the knowledge we wish to capture in ontologies; second, we have ontology specialists, who hold knowledge about techniques and best practice on ontology development. In the bio-ontology domain, these two camps have often come into conflict, especially where pragmatism comes into conflict with perceived best practice. One of these areas is the insistence of computer scientists on a well-defined semantic basis for the Knowledge Representation language being used. In this article, we will first describe why this community is so insistent. Second, we will illustrate this by examining the semantics of the Web Ontology Language and the semantics placed on the Directed Acyclic Graph as used by the Gene Ontology. Finally we will reconcile the two representations, including the broader Open Biomedical Ontologies format. The ability to exchange between the two representations means that we can capitalise on the features of both languages. Such utility can only arise by the understanding of the semantics of the languages being used. By this illustration of the usefulness of a clear, well-defined language semantics, we wish to promote a wider understanding of the computer science perspective amongst potential users within the biological community.


Subjects
Artificial Intelligence , Databases, Genetic , Genes , Information Storage and Retrieval/methods , Natural Language Processing , Programming Languages , Proteins/classification , Computational Biology/methods , Database Management Systems , Phylogeny , User-Computer Interface
14.
PLoS One ; 12(11): e0187862, 2017.
Article in English | MEDLINE | ID: mdl-29161289

ABSTRACT

Mitochondrial DNA (mtDNA) mutations are well recognized as an important cause of inherited disease. Diseases caused by mtDNA mutations exhibit a high degree of clinical heterogeneity with a complex genotype-phenotype relationship, with many such mutations exhibiting incomplete penetrance. There is evidence that the spectrum of mutations causing mitochondrial disease might differ between different mitochondrial lineages (haplogroups) seen in different global populations. This would point to the importance of sequence context in the expression of mutations. To explore this possibility, we looked for mutations which are known to cause disease in humans, in animals of other species unaffected by mtDNA disease. The mt-tRNA genes are the location of many pathogenic mutations, with the m.3243A>G mutation on the mt-tRNA-Leu(UUR) being the most frequently seen mutation in humans. This study looked for the presence of m.3243A>G in 2784 sequences from 33 species, as well as any of the other mutations reported in association with disease located on mt-tRNA-Leu(UUR). We report a number of disease associated variations found on mt-tRNA-Leu(UUR) in other chordates, as the major population variant, with m.3243A>G being seen in 6 species. In these, we also found a number of mutations which appear compensatory and which could prevent the pathogenicity associated with this change in humans. This work has important implications for the discovery and diagnosis of mtDNA mutations in non-European populations. In addition, it might provide a partial explanation for the conflicting results in the literature that examines the role of mtDNA variants in complex traits.


Subjects
DNA, Mitochondrial/genetics , Mitochondria/genetics , Mitochondrial Diseases/genetics , RNA, Transfer/genetics , Animals , Humans , Mitochondrial Diseases/pathology , Mutation/genetics , Species Specificity
15.
J Biomed Semantics ; 8(1): 54, 2017 Nov 28.
Article in English | MEDLINE | ID: mdl-29179777

ABSTRACT

BACKGROUND: There are many challenges associated with ontology building, as the process often touches on many different subject areas; it needs knowledge of the problem domain, an understanding of the ontology formalism, software in use and, sometimes, an understanding of the philosophical background. In practice, it is very rare that an ontology can be completed by a single person, as they are unlikely to combine all of these skills. So people with these skills must collaborate. One solution to this is to use face-to-face meetings, but these can be expensive and time-consuming for teams that are not co-located. Remote collaboration is possible, of course, but one difficulty here is that domain specialists use a wide variety of different "formalisms" to represent and share their data - by far the most common, however, is the "office file" either in the form of a word-processor document or a spreadsheet. Here we describe the development of an ontology of immunological cell types; this was initially developed by domain specialists using an Excel spreadsheet for collaboration. We have transformed this spreadsheet into an ontology using highly-programmatic and pattern-driven ontology development. Critically, the spreadsheet remains part of the source for the ontology; the domain specialists are free to update it, and changes will percolate to the end ontology. RESULTS: We have developed a new ontology describing immunological cell lines built by instantiating ontology design patterns written programmatically, using values from a spreadsheet catalogue. CONCLUSIONS: This method employs a spreadsheet that was developed by domain experts. The spreadsheet is unconstrained in its usage and can be freely updated, resulting in a new ontology. This provides a general methodology for ontology development using data generated by domain specialists.


Subjects
Biological Ontologies , Semantic Web , Semantics , Software , Animals , Humans , Internet , Search Engine/methods , Terminology as Topic
16.
ACS Synth Biol ; 5(10): 1086-1097, 2016 10 21.
Article in English | MEDLINE | ID: mdl-27110921

ABSTRACT

One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.


Subjects
Data Mining , Databases, Genetic , Synthetic Biology , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Computational Biology , DNA, Bacterial/genetics , Knowledge Bases , Promoter Regions, Genetic , Sequence Analysis, DNA , Software
17.
PeerJ ; 4: e2300, 2016.
Article in English | MEDLINE | ID: mdl-27635311

ABSTRACT

Cellular therapies with tolerogenic antigen-presenting cells (tolAPC) show great promise for the treatment of autoimmune diseases and for the prevention of destructive immune responses after transplantation. The methodologies for generating tolAPC vary greatly between different laboratories, making it difficult to compare data from different studies; thus constituting a major hurdle for the development of standardised tolAPC therapeutic products. Here we describe an initiative by members of the tolAPC field to generate a minimum information model for tolAPC (MITAP), providing a reporting framework that will make differences and similarities between tolAPC products transparent. In this way, MITAP constitutes a first but important step towards the production of standardised and reproducible tolAPC for clinical application.

18.
PLoS One ; 11(4): e0154556, 2016.
Article in English | MEDLINE | ID: mdl-27128319

ABSTRACT

The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.


Subjects
Biological Ontologies , Animals , Biological Ontologies/organization & administration , Biological Ontologies/statistics & numerical data , Biological Ontologies/trends , Computational Biology , Databases, Factual , Humans , Internet , Metadata , Semantics , Software
19.
PLoS One ; 8(10): e75541, 2013.
Article in English | MEDLINE | ID: mdl-24143170

ABSTRACT

A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which is often the richest source of knowledge. Many databases reuse existing knowledge; during the curation process annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, potentially impossible, for a reader to identify where an annotation originated. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB), at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially, levels of sentence reuse within UniProtKB were analysed, showing that reuse is heavily prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over [Formula: see text] sentences. Over [Formula: see text] sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggests that approximately [Formula: see text] are erroneous, whilst [Formula: see text] appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in the determination of the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors' website at http://homepages.cs.ncl.ac.uk/m.j.bell1/sentence_analysis/.


Subjects
Computer Graphics , Databases, Protein , Molecular Sequence Annotation/methods , Research Design , Confidence Intervals , Information Storage and Retrieval
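
One pattern mentioned in entry 19, sentences that survive in the database after being removed from the entry where they first appeared, can be sketched as follows. This is a hedged illustration rather than the authors' code (which the abstract says is on their website): the release names, entry identifiers and function names are hypothetical, and "origin" is simplified to the first release and entry in which a sentence is seen, with ties broken by entry identifier.

```python
def first_occurrences(releases):
    """Map each sentence to the (release, entry) where it is first seen.

    `releases` is a chronologically ordered list of
    (release_name, {entry_id: set_of_sentences}) pairs.
    """
    origin = {}
    for release_name, entries in releases:
        for entry_id in sorted(entries):  # deterministic tie-break within a release
            for sentence in entries[entry_id]:
                origin.setdefault(sentence, (release_name, entry_id))
    return origin


def orphaned_sentences(releases):
    """Sentences present in the latest release but gone from their origin entry.

    Returns {sentence: set of entries that still carry it}.
    """
    origin = first_occurrences(releases)
    _, latest = releases[-1]
    orphans = {}
    for entry_id, sentences in latest.items():
        for sentence in sentences:
            _, origin_entry = origin[sentence]
            if sentence not in latest.get(origin_entry, set()):
                orphans.setdefault(sentence, set()).add(entry_id)
    return orphans


# Hypothetical toy history: a sentence starts in P1, spreads, then is dropped from P1.
releases = [
    ("r1", {"P1": {"binds zinc."}, "P2": set()}),
    ("r2", {"P1": {"binds zinc."}, "P2": {"binds zinc."}}),
    ("r3", {"P1": set(), "P2": {"binds zinc."}, "P3": {"binds zinc.", "membrane protein."}}),
]
origin = first_occurrences(releases)
orphans = orphaned_sentences(releases)
print(origin["binds zinc."])
print(orphans)
```

A sentence flagged this way is only a candidate for review: removal from the origin entry may reflect a correction that failed to propagate, but it may equally be a rewording or an entry merge.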