1.
Nucleic Acids Res ; 50(D1): D587-D595, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850110

ABSTRACT

Molecular interactions are key drivers of biological function. Providing interaction resources to the research community is important since they allow functional interpretation and network-based analysis of molecular data. ConsensusPathDB (http://consensuspathdb.org) is a meta-database combining interactions of diverse types from 31 public resources for humans, 16 for mice and 14 for yeasts. Using ConsensusPathDB, researchers commonly evaluate lists of genes, proteins and metabolites against sets of molecular interactions defined by pathways, Gene Ontology and network neighborhoods and retrieve complex molecular neighborhoods formed by heterogeneous interaction types. Furthermore, the integrated protein-protein interaction network is used as a basis for propagation methods. Here, we present the 2022 update of ConsensusPathDB, highlighting content growth, additional functionality and improved database stability. For example, the number of human molecular interactions increased to 859 848 connecting 200 499 unique physical entities such as genes/proteins, metabolites and drugs. Furthermore, we integrated regulatory datasets in the form of transcription factor-, microRNA- and enhancer-gene target interactions, thus providing novel functionality in the context of overrepresentation and enrichment analyses. We specifically emphasize the use of the integrated protein-protein interaction network as a scaffold for network inferences, present topological characteristics of the network and discuss strengths and shortcomings of such approaches.


Subjects
Databases, Genetic , Protein Interaction Maps/genetics , Proteins/genetics , Software , Animals , Computational Biology/trends , Gene Ontology/trends , Gene Regulatory Networks/genetics , Humans , Mice , MicroRNAs/classification , MicroRNAs/genetics , Proteins/classification , User-Computer Interface
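The over-representation analysis mentioned in the abstract above (evaluating a gene list against predefined sets such as pathways or Gene Ontology categories) is commonly done with a one-sided hypergeometric test. The following is a minimal sketch under that assumption; it is not ConsensusPathDB's own code, and the function names `hypergeom_tail` and `overrepresentation` are hypothetical. Multiple-testing correction is left out for brevity.

```python
from math import comb


def hypergeom_tail(overlap, population, set_size, draws):
    """P(X >= overlap) for a hypergeometric draw without replacement."""
    total = comb(population, draws)
    upper = min(set_size, draws)
    favourable = sum(
        comb(set_size, i) * comb(population - set_size, draws - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


def overrepresentation(gene_list, gene_sets, background):
    """Return {set_name: p_value} for each predefined gene set."""
    background = set(background)
    hits = set(gene_list) & background  # ignore genes outside the background
    p_values = {}
    for name, members in gene_sets.items():
        members = set(members) & background
        overlap = len(hits & members)
        p_values[name] = hypergeom_tail(
            overlap, len(background), len(members), len(hits)
        )
    return p_values
```

A small p-value indicates that the gene list shares more members with a set than would be expected by chance given the chosen background; the choice of background strongly affects the result.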
2.
Nucleic Acids Res ; 47(D1): D330-D338, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30395331

ABSTRACT

The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the 'GO ribbon' widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page.


Subjects
Gene Ontology/history , Animals , Bacteria/genetics , Eukaryota/genetics , Gene Ontology/organization & administration , Gene Ontology/trends , High-Throughput Screening Assays , History, 20th Century , History, 21st Century , Humans , Mitogen-Activated Protein Kinases/genetics , Molecular Sequence Annotation , Quality Control
3.
Nucleic Acids Res ; 47(W1): W402-W407, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31251384

ABSTRACT

For 20 years, the PSIPRED Workbench has been a web server offering a range of predictive methods to the bioscience community. Here, we present the work we have completed to update the PSIPRED Protein Analysis Workbench and make it ready for the next 20 years. The main focus of our recent website upgrade work has been the acceleration of analyses in the face of increasing protein sequence database size. We additionally discuss new software, the new hardware infrastructure, our web services and website. Lastly, we survey updates to some of the key predictive algorithms available through our website.


Subjects
Gene Ontology/trends , Molecular Sequence Annotation/methods , Proteins/chemistry , Software/history , Amino Acid Sequence , Binding Sites , Gene Ontology/history , History, 21st Century , Internet , Models, Molecular , Molecular Sequence Annotation/history , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Proteins/history , Sequence Alignment , Sequence Homology, Amino Acid
4.
PLoS Comput Biol ; 15(11): e1007419, 2019 11.
Article in English | MEDLINE | ID: mdl-31682632

ABSTRACT

Automated protein annotation using the Gene Ontology (GO) plays an important role in the biosciences. Evaluation has always been considered central to developing novel annotation methods, but little attention has been paid to the evaluation metrics themselves. Evaluation metrics define how well an annotation method performs and allow methods to be ranked against one another. Unfortunately, most of these metrics were adopted from the machine learning literature without establishing whether they were appropriate for GO annotations. We propose a novel approach for comparing GO evaluation metrics called Artificial Dilution Series (ADS). Our approach uses existing annotation data to generate a series of annotation sets with different levels of correctness (referred to as their signal level). We calculate the evaluation metric being tested for each annotation set in the series, allowing us to identify whether it can separate different signal levels. Finally, we contrast these results with several false positive annotation sets, which are designed to expose systematic weaknesses in GO assessment. We compared 37 evaluation metrics for GO annotation using ADS and identified drastic differences between metrics. We show that some metrics struggle to differentiate between different signal levels, while others give erroneously high scores to the false positive data sets. Based on our findings, we provide guidelines on which evaluation metrics perform well with the Gene Ontology and propose improvements to several well-known evaluation metrics. In general, we argue that evaluation metrics should be tested for their performance and we provide software for this purpose (https://bitbucket.org/plyusnin/ads/). ADS is applicable to other areas of science where the evaluation of prediction results is non-trivial.


Subjects
Computational Biology/methods , Molecular Sequence Annotation/classification , Molecular Sequence Annotation/methods , Algorithms , Benchmarking/methods , Databases, Genetic , Databases, Protein , Gene Ontology/trends , Reproducibility of Results , Software
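The dilution idea described in the abstract above can be illustrated with a heavily simplified sketch. It assumes that an annotation set is "diluted" by keeping each correct annotation with probability equal to the signal level and otherwise swapping it for a random term; the published ADS software may differ substantially (for example, by taking the GO graph structure into account). The names `dilute`, `mean_jaccard` and `dilution_series` are hypothetical, and mean Jaccard overlap merely stands in for whatever metric is under test.

```python
import random


def dilute(truth, vocabulary, signal, rng):
    """Keep each true annotation with probability `signal`;
    otherwise replace it with a random term from the vocabulary."""
    vocab = sorted(vocabulary)
    noisy = {}
    for protein, terms in truth.items():
        noisy[protein] = {
            term if rng.random() < signal else rng.choice(vocab)
            for term in sorted(terms)
        }
    return noisy


def mean_jaccard(truth, predicted):
    """Example metric under test: mean per-protein Jaccard overlap."""
    scores = []
    for protein, terms in truth.items():
        pred = predicted.get(protein, set())
        union = terms | pred
        scores.append(len(terms & pred) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def dilution_series(truth, vocabulary, metric, levels, seed=0):
    """Score one diluted annotation set per signal level."""
    rng = random.Random(seed)
    return [
        (level, metric(truth, dilute(truth, vocabulary, level, rng)))
        for level in levels
    ]
```

A metric that behaves well should produce scores that rise with the signal level; a flat or erratic series suggests the metric cannot separate good annotation sets from poor ones.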
5.
Nucleic Acids Res ; 46(W1): W84-W88, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29741643

ABSTRACT

The unprecedented growth of high-throughput sequencing has led to an ever-widening annotation gap in protein databases. While computational prediction methods are available to make up the shortfall, a majority of public web servers are hindered by practical limitations and poor performance. Here, we introduce PANNZER2 (Protein ANNotation with Z-scoRE), a fast functional annotation web server that provides both Gene Ontology (GO) annotations and free text description predictions. PANNZER2 uses SANSparallel to perform high-performance homology searches, making bulk annotation based on sequence similarity practical. PANNZER2 can output GO annotations from multiple scoring functions, enabling users to see which predictions are robust across predictors. Finally, PANNZER2 predictions scored within the top 10 methods for molecular function and biological process in the CAFA2 NK-full benchmark. The PANNZER2 web server is updated on a monthly schedule and is accessible at http://ekhidna2.biocenter.helsinki.fi/sanspanz/. The source code is available under the GNU Public Licence v3.


Subjects
Computational Biology/trends , Gene Ontology/trends , Internet , Software , Algorithms , Databases, Protein/trends , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation
6.
Am J Hum Genet ; 97(1): 111-24, 2015 Jul 02.
Article in English | MEDLINE | ID: mdl-26119816

ABSTRACT

The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.


Subjects
Gene Ontology/trends , Genetic Diseases, Inborn/classification , Genetic Diseases, Inborn/genetics , Phenotype , Terminology as Topic , Genetic Diseases, Inborn/pathology , Humans , MEDLINE , Models, Biological
7.
J Endocrinol Invest ; 41(10): 1237-1245, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29520684

ABSTRACT

OBJECTIVE: To identify novel clinically relevant genes in papillary thyroid carcinoma from public databases. METHODS: Four original microarray datasets, GSE3678, GSE3467, GSE33630 and GSE58545, were downloaded. Differentially expressed genes (DEGs) were filtered from integrated data. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed, followed by protein-protein interaction (PPI) network construction. The CentiScape plug-in was used to compute node degree. The genes at the top of the degree distribution (≥ 95th percentile) in the significantly perturbed networks were defined as central genes. UALCAN and The Cancer Genome Atlas Clinical Explorer were used to verify clinically relevant genes and perform survival analysis. RESULTS: 225 commonly changed DEGs (111 up-regulated and 114 down-regulated) were identified. The DEGs were classified into three groups by GO terms. KEGG pathway enrichment analysis showed that the DEGs were mainly enriched in the PI3K-Akt signaling pathway, pathways in cancer, focal adhesion and proteoglycans in cancer. A PPI network of the DEGs was constructed; six central genes (BCL2, CCND1, FN1, IRS1, COL1A1, CXCL12) were identified. Among them, BCL2, CCND1 and COL1A1 were identified as clinically relevant genes. CONCLUSION: BCL2, CCND1 and COL1A1 may be key genes for papillary thyroid carcinoma. Further molecular biological experiments are required to confirm the function of the identified genes.


Subjects
Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Thyroid Cancer, Papillary/genetics , Thyroid Neoplasms/genetics , Computational Biology/trends , Databases, Genetic/trends , Gene Expression Profiling/trends , Gene Ontology/trends , Humans , Thyroid Cancer, Papillary/diagnosis , Thyroid Neoplasms/diagnosis
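The selection of central genes described in the abstract above (nodes at or above the 95th percentile of the degree distribution) can be sketched as follows. This is an illustrative stand-in written with the standard library, not the CentiScape plug-in itself; the function name `degree_hubs` is hypothetical, and the nearest-rank percentile definition is an assumption, since the abstract does not say how the percentile was computed.

```python
import math
from collections import defaultdict


def degree_hubs(edges, percentile=95.0):
    """Return {node: degree} for nodes whose degree is at or above
    the given percentile (nearest-rank definition) of all degrees."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops; sets also drop duplicate edges
            neighbours[a].add(b)
            neighbours[b].add(a)
    if not neighbours:
        return {}
    degree = {node: len(adj) for node, adj in neighbours.items()}
    ranked = sorted(degree.values())
    rank = max(1, math.ceil(percentile / 100 * len(ranked)))
    cutoff = ranked[rank - 1]
    return {node: d for node, d in degree.items() if d >= cutoff}
```

On small networks the nearest-rank cut-off can coincide with the most common degree, in which case many nodes pass the threshold; an interpolated percentile would behave differently.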
8.
Arch Gynecol Obstet ; 297(1): 161-183, 2018 01.
Article in English | MEDLINE | ID: mdl-29063236

ABSTRACT

OBJECTIVE: Breast cancer is a severe risk to public health and has a complex pathogenesis. Therefore, the description of key molecular markers and pathways is of great importance for clarifying the molecular mechanism of breast cancer-associated fibroblasts initiation and progression. A breast cancer-associated fibroblast gene expression dataset was downloaded from the Gene Expression Omnibus database. METHODS: A total of nine samples, including three normal fibroblasts, three granulin-stimulated fibroblasts and three cancer-associated fibroblasts samples, were used to identify differentially expressed genes (DEGs) between normal fibroblasts, granulin-stimulated fibroblasts and cancer-associated fibroblasts samples. Gene ontology (GO) and pathway enrichment analyses were performed, and a protein-protein interaction (PPI) network of the DEGs was constructed with NetworkAnalyst software. RESULTS: In total, 190 DEGs were identified, including 66 up-regulated and 124 down-regulated genes. GO analysis results showed that up-regulated DEGs were significantly enriched in biological processes (BP), including cell-cell signalling and negative regulation of cell proliferation; molecular function (MF), including insulin-like growth factor II binding and insulin-like growth factor I binding; cellular component (CC), including insulin-like growth factor binding protein complex and integral component of plasma membrane; the down-regulated DEGs were significantly enriched in BP, including cell adhesion and extracellular matrix organization; MF, including N-acetylgalactosamine 4-sulfate 6-O-sulfotransferase activity and calcium ion binding; CC, including extracellular space and extracellular matrix. WIKIPATHWAYS analysis showed the up-regulated DEGs were enriched in myometrial relaxation and contraction pathways. WIKIPATHWAYS, REACTOME, PID_NCI and KEGG pathway analysis showed the down-regulated DEGs were enriched in endochondral ossification, TGF beta signalling pathway, integrin cell surface interactions, beta1 integrin cell surface interactions, malaria and glycosaminoglycan biosynthesis-chondroitin sulfate/dermatan sulphate. The top 5 up-regulated hub genes, CDKN2A, MME, PBX1, IGFBP3, and TFAP2C, and the top 5 down-regulated hub genes, VCAM1, KRT18, TGM2, ACTA2, and STAMBP, were identified from the PPI network, and subnetworks revealed these genes were involved in significant pathways, including myometrial relaxation and contraction pathways, integrin cell surface interactions, beta1 integrin cell surface interaction. Besides, the target hsa-mirs for DEGs were identified. hsa-mir-759, hsa-mir-4446-5p, hsa-mir-219a-1-3p and hsa-mir-26a-5p were important miRNAs in this study. CONCLUSIONS: We pinpoint important key genes and pathways closely related to breast cancer-associated fibroblasts initiation and progression through a series of bioinformatics analyses on DEGs. These screened genes and pathways provide a more detailed view of the molecular mechanism underlying breast cancer-associated fibroblasts occurrence and progression, holding promise for acting as molecular markers and probable therapeutic targets.


Subjects
Breast Neoplasms/genetics , Cancer-Associated Fibroblasts/metabolism , Computational Biology/methods , Gene Expression Profiling/methods , Gene Ontology/trends , Breast Neoplasms/pathology , Female , Humans
9.
BMC Bioinformatics ; 18(1): 177, 2017 Mar 20.
Article in English | MEDLINE | ID: mdl-28320317

ABSTRACT

BACKGROUND: The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand the relationships and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. RESULTS: NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous work. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. CONCLUSIONS: We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo.


Subjects
Gene Ontology/trends , Genomics/methods
10.
Methods ; 74: 3-15, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25088781

ABSTRACT

As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge and many tools exist that try to characterize the function of gene-lists. Such systems typically rely on enrichment analysis and aim to give a quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene-lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene-lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) provide better follow-up experimental testing and treatment options, and (3) better manage gene lists derived from organisms that are not well-studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.


Subjects
Data Mining/methods , Databases, Genetic , Gene Ontology , Animals , Data Mining/trends , Databases, Genetic/trends , Gene Ontology/trends , Humans
11.
Psychiatry Res ; 293: 113387, 2020 11.
Article in English | MEDLINE | ID: mdl-32823199

ABSTRACT

Because the pathogenesis of major depressive disorder (MDD) is still unclear and the accurate diagnosis remains unavailable, we aimed to analyze its molecular mechanisms and develop a gene classifier to improve diagnostic accuracy. We extracted differentially expressed genes from two datasets, GSE45642 (from brain tissue samples) and GSE98793 (from blood samples), and found three key modules to have a significant correlation with MDD traits by weighted gene coexpression network analysis. Hub genes were identified from the key modules according to the connectivity degree in the network and subjected to least absolute shrinkage and selection operator regression analysis. A total of eighty-five hub genes were selected to construct the gene classifier, which had considerable ability to recognize MDD patients in the training set and test set. In addition, the relationship between the key MDD modules and brain tissues indicated that the anterior cingulate should be a notable region in the study of MDD pathogenesis. The results of Gene Ontology (GO) and pathway enrichment analyses reiterate the relationship between depression and immunity. Therefore we identified MDD hub genes in the InnateDB database, and found 14 genes involved in both MDD and the inflammatory response.


Subjects
Depressive Disorder, Major/diagnosis , Depressive Disorder, Major/genetics , Gene Expression Profiling/methods , Gene Ontology , Gene Regulatory Networks/genetics , Genetic Association Studies/methods , Female , Gene Ontology/trends , Humans , Male
12.
Eur Rev Med Pharmacol Sci ; 24(3): 1134-1141, 2020 02.
Article in English | MEDLINE | ID: mdl-32096169

ABSTRACT

OBJECTIVE: The morbidity and mortality of patients with colorectal cancer, one of the most common malignant tumors worldwide, is steadily increasing. The aim of this study was to investigate the association between prognostic immune-related gene profile and the outcome of colorectal cancer in patients by analyzing datasets from The Cancer Genome Atlas (TCGA). MATERIALS AND METHODS: Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) further demonstrated that these genes were enriched in many immune-related biological processes. Univariate Cox regression analysis was applied to examine the association of immune-related genes with the prognosis in patients with colorectal cancer. The least absolute shrinkage and selection operator (LASSO) Cox regression model was then used to establish the immune-related signature for the prognostic evaluation of colorectal cancer in patients. Survival differences were assessed by the Kaplan-Meier method along with the log-rank test. RESULTS: A total of 133 prognostic immune-related signatures were identified by using the univariate Cox proportional hazards regression analysis. A 14-gene signature-based risk score was constructed using the LASSO Cox regression. According to the cut-off of the risk-score, patients were assigned to the low-risk and high-risk groups. The log-rank test suggested that the survival time of the low-risk group was significantly longer than that of the high-risk group. In the time-dependent ROC curve analysis, the AUCs for 1-year, 3-year, and 5-year overall survival (OS) were 0.781, 0.742, and 0.791, respectively. GO and KEGG analysis further revealed that the gene sets were actively involved in immune and inflammatory response, as well as the cytokine-cytokine receptor interaction pathway. CONCLUSIONS: To summarize, we identified a novel 14-gene immune-related signature that may potentially serve as a prognostic predictor for colorectal cancer, thereby contributing to personalized treatment decisions for patients. Further research needs to be conducted to validate the prognostic value of the selected genes.


Subjects
Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Databases, Genetic/trends , Gene Ontology/trends , Immunity, Cellular/immunology , Aged , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/pathology , Disease-Free Survival , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Prognosis
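A signature-based risk score of the kind described in the abstract above is usually a weighted sum of gene expression values, with the weights taken from the fitted LASSO Cox model, followed by a split into low-risk and high-risk groups at a cut-off. The sketch below assumes that form and a median cut-off; the abstract specifies neither the coefficients nor how the cut-off was chosen, so the gene names, coefficients and function names here are purely hypothetical.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Weighted sum of expression values per patient.

    expression: {patient_id: {gene: expression_value}}
    coefficients: {gene: model_coefficient}
    """
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }


def stratify(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk.

    The median is used when no cut-off is supplied (an assumption)."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {
        patient: "high" if score > cutoff else "low"
        for patient, score in scores.items()
    }
```

The resulting groups would then be compared with Kaplan-Meier curves and a log-rank test, which require a survival analysis library and are not shown here.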
13.
Medicine (Baltimore) ; 99(34): e21863, 2020 Aug 21.
Article in English | MEDLINE | ID: mdl-32846838

ABSTRACT

Dermatomyositis is a common connective tissue disease. The occurrence and development of dermatomyositis is a result of multiple factors, but its exact pathogenesis has not been fully elucidated. Here, we used bioinformatics methods to explore and predict the major disease-related genes of dermatomyositis and to find the underlying pathogenic molecular mechanism. The gene expression data of GDS1956, GDS2153, GDS2855, and GDS3417, including 94 specimens (66 dermatomyositis specimens and 28 normal specimens), were obtained from the Gene Expression Omnibus database. The 4 microarray gene data groups were combined to obtain differentially expressed genes (DEGs). The gene ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of DEGs were performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) and the KEGG Orthology Based Annotation System (KOBAS) databases, respectively. The protein-protein interaction networks of the DEGs were built from the STRING website. A total of 4097 DEGs were extracted from the 4 Gene Expression Omnibus datasets, of which 2213 genes were upregulated, and 1884 genes were downregulated. Gene ontology analysis indicated that the biological functions of DEGs focused primarily on response to virus, type I interferon signaling pathway and negative regulation of viral genome replication. The main cellular components include extracellular space, cytoplasm, and blood microparticle. The molecular functions include protein binding, double-stranded RNA binding and MHC class I protein binding. KEGG pathway analysis showed that these DEGs were mainly involved in the toll-like receptor signaling pathway, cytosolic DNA-sensing pathway, RIG-I-like receptor signaling pathway, complement and coagulation cascades, arginine and proline metabolism, phagosome signaling pathway. The following 13 closely related genes, XAF1, NT5E, UGCG, GBP2, TLR3, DDX58, STAT1, GBP1, PLSCR1, OAS3, SP100, IGK, and RSAD2, were key nodes from the protein-protein interaction network. This research suggests that exploring DEGs and pathways in dermatomyositis using integrated bioinformatics methods could help us understand the molecular mechanism underlying the development of dermatomyositis, have practical implications for the early detection and prophylaxis of dermatomyositis and provide reliable targets for the treatment of dermatomyositis.


Subjects
Computational Biology/instrumentation , Dermatomyositis/genetics , Gene Ontology/trends , Interferon Type I/genetics , Protein Interaction Maps/genetics , Dermatomyositis/epidemiology , Double-Stranded RNA Binding Motif/genetics , Down-Regulation , Humans , Incidence , Microarray Analysis/methods , Molecular Sequence Annotation/methods , Protein Binding , Signal Transduction , Up-Regulation
14.
Gigascience ; 7(8)2018 08 01.
Article in English | MEDLINE | ID: mdl-30107399

ABSTRACT

Background: The Gene Ontology (GO) is one of the most widely used resources in molecular and cellular biology, largely through the use of "enrichment analysis." To facilitate informed use of GO, we present GOtrack (https://gotrack.msl.ubc.ca), which provides access to historical records and trends in the GO and GO annotations. Findings: GOtrack gives users access to gene- and term-level information on annotations for nine model organisms as well as an interactive tool that measures the stability of enrichment results over time for user-provided "hit lists" of genes. To document the effects of GO evolution on enrichment, we analyzed more than 2,500 published hit lists of human genes (most older than 9 years); 53% of hit lists were considered to yield significantly stable enrichment results. Conclusions: Because stability is far from assured for any individual hit list, GOtrack can lead to more informed and cautious application of GO to genomics research.


Subjects
Gene Ontology/trends , Genomics/methods , Molecular Sequence Annotation/trends , Animals , Eukaryota/genetics , Humans
15.
PLoS One ; 8(10): e75993, 2013.
Article in English | MEDLINE | ID: mdl-24146805

ABSTRACT

Ontologies support automatic sharing, combination and analysis of life sciences data. They undergo regular curation and enrichment. We studied the impact of an ontology's evolution on its structural complexity. As a case study we used the sixty monthly releases between January 2008 and December 2012 of the Gene Ontology and its three independent branches, i.e. biological processes (BP), cellular components (CC) and molecular functions (MF). For each case, we measured complexity by computing metrics related to the size, the node connectivity and the hierarchical structure. The number of classes and relations increased monotonically for each branch, with different growth rates. BP and CC had similar connectivity, superior to that of MF. Connectivity increased monotonically for BP, decreased for CC and remained stable for MF, with a marked increase for the three branches in November and December 2012. Hierarchy-related measures showed that CC and MF had similar proportions of leaves, average depths and average heights. BP had a lower proportion of leaves, and a higher average depth and average height. For BP and MF, the late 2012 increase of connectivity resulted in an increase of the average depth and average height and a decrease of the proportion of leaves, indicating that a major enrichment effort of the intermediate-level hierarchy occurred. The variation of the number of classes and relations in an ontology does not provide enough information about the evolution of its complexity. However, connectivity and hierarchy-related metrics revealed different patterns of values as well as of evolution for the three branches of the Gene Ontology. CC was similar to BP in terms of connectivity, and similar to MF in terms of hierarchy. Overall, BP complexity increased, CC was refined with the addition of leaves providing a finer level of annotations but decreasing slightly its complexity, and MF complexity remained stable.


Subjects
Computational Biology/history , Gene Ontology/trends , Vocabulary, Controlled/history , Gene Ontology/statistics & numerical data , History, 21st Century , Humans , Time Factors
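The size, connectivity and hierarchy measures named in the abstract above can be illustrated on a toy ontology stored as a mapping from each class to its set of direct parents. This is a sketch under stated assumptions, not the study's own code: connectivity is taken as relations per class, depth as the length of the longest path up to a root, and the function name `ontology_metrics` is hypothetical. The input is assumed to be a non-empty directed acyclic graph.

```python
from functools import lru_cache


def ontology_metrics(parents):
    """Compute simple complexity metrics for a non-empty DAG.

    parents: {class_name: set of direct parent class names}
    """
    all_parents = {p for ps in parents.values() for p in ps}
    classes = set(parents) | all_parents
    relations = sum(len(ps) for ps in parents.values())
    leaves = classes - all_parents  # classes that are nobody's parent

    @lru_cache(maxsize=None)
    def depth(term):
        # Roots have depth 0; otherwise one more than the deepest parent.
        direct = parents.get(term, ())
        return 0 if not direct else 1 + max(depth(p) for p in direct)

    return {
        "classes": len(classes),
        "relations": relations,
        "relations_per_class": relations / len(classes),
        "leaf_proportion": len(leaves) / len(classes),
        "average_depth": sum(depth(c) for c in classes) / len(classes),
    }
```

Computing these values for each monthly release and comparing them over time would reproduce the general shape of the analysis, although the exact metric definitions used in the study may differ.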