1.
Sensors (Basel) ; 24(4), 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38400265

ABSTRACT

Activities of daily living (ADLs) are fundamental routine tasks that most physically and mentally healthy people can execute independently. In this paper, we present a semantic framework for detecting problems in ADL execution, monitored through smart home sensors. In the context of this work, we conducted a pilot study, gathering raw data from various sensors and devices installed in a smart home environment. The proposed framework combines multiple Semantic Web technologies (i.e., ontology, RDF, triplestore) to handle these raw data and transform them into meaningful representations, forming a knowledge graph. Subsequently, SPARQL queries are used to define and construct explicit rules that detect problematic behaviors in ADL execution, a procedure that generates new implicit knowledge. Finally, all available results are visualized in a clinician dashboard. The proposed framework can monitor the deterioration of ADL performance for people across the dementia spectrum by offering clinicians a comprehensive way to describe problematic behaviors in an individual's everyday life.


Subjects
Activities of Daily Living , Semantics , Humans , Pilot Projects , Software
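Entry 1 describes SPARQL rules that run over the knowledge graph and derive new implicit facts about problematic ADL execution. The plain-Python sketch below imitates one such rule over an in-memory set of triples. The `ex:` vocabulary, the `ex:durationMinutes` property and the 60-minute threshold are hypothetical illustrations, not taken from the paper; a roughly equivalent SPARQL `CONSTRUCT` is shown in the comments.

```python
# Minimal sketch: one "problem detection" rule over a tiny triple set.
# All IRIs (ex:...) and the threshold are hypothetical illustrations.
#
# Roughly equivalent SPARQL (hypothetical vocabulary):
#   CONSTRUCT { ?a ex:hasProblem ex:ProlongedExecution }
#   WHERE { ?a a ex:MealPreparation ; ex:durationMinutes ?d .
#           FILTER(?d > 60) }

Triple = tuple[str, str, str]

# Sensor observations already lifted into triples.
GRAPH: set[Triple] = {
    ("ex:act1", "rdf:type", "ex:MealPreparation"),
    ("ex:act1", "ex:durationMinutes", "25"),
    ("ex:act2", "rdf:type", "ex:MealPreparation"),
    ("ex:act2", "ex:durationMinutes", "95"),
    ("ex:act3", "rdf:type", "ex:Sleeping"),
    ("ex:act3", "ex:durationMinutes", "480"),
}


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return every o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def detect_prolonged(graph: set[Triple], activity_type: str,
                     max_minutes: int) -> set[Triple]:
    """Flag activities of `activity_type` lasting longer than `max_minutes`
    and return the derived triples without modifying the graph."""
    derived: set[Triple] = set()
    for s, p, o in graph:
        if p == "rdf:type" and o == activity_type:
            for duration in objects(graph, s, "ex:durationMinutes"):
                if int(duration) > max_minutes:
                    derived.add((s, "ex:hasProblem", "ex:ProlongedExecution"))
    return derived


new_facts = detect_prolonged(GRAPH, "ex:MealPreparation", max_minutes=60)
GRAPH |= new_facts  # materialize the inferred knowledge in the graph
print(sorted(new_facts))
```

Keeping detection separate from materialization mirrors running a `CONSTRUCT` query first and inserting its results into the triplestore afterwards.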
2.
BMC Med Inform Decis Mak ; 22(1): 16, 2022 Jan 19.
Article in English | MEDLINE | ID: mdl-35042480

ABSTRACT

BACKGROUND: To standardize the terms used in reports of medical device adverse events, 89 Japanese medical device adverse event terminologies were published in March 2015. The 89 terminologies were developed independently by 13 industry associations, suggesting that there may be inconsistencies among the terms proposed. The purpose of this study was to integrate the 89 sets of terminologies and evaluate inconsistencies among them using SPARQL. METHODS: To evaluate inconsistencies in the integrated terminology, the following six items were evaluated: (1) whether the two-layer structure between category term and preferred term is consistent; (2) whether synonyms of a preferred term are involved (matching was also performed with the layer-category order reversed); (3) whether each preferred term is subordinate to only one category term; (4) whether the definitions of terms are uniquely determined; (5) whether the CDRH-NCIt terms corresponding to preferred terms are uniquely determined; and (6) whether a term for a medical device problem is also used for patient problems. RESULTS: Duplicated terms accounted for about 60% of the total number of terms. This is because industry associations that created multiple terminologies adopted the same terms in terminologies for similar medical device groups. Where all terms with the same spelling denote the same concept, integration can be achieved efficiently and automatically using RDF. Of the six inconsistency items evaluated in this study, terms that need to be reviewed accounted for about 10% or less in each item. CONCLUSIONS: RDF and SPARQL were useful tools for exploring inconsistencies in hierarchies, definition statements, and synonyms when integrating terminologies by term notation, and they had the advantage of reducing the physical and time burden.


Subjects
Language , Humans , Japan
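Check (3) in entry 2 asks whether each preferred term is subordinate to only one category term. In SPARQL this is typically a `GROUP BY ?preferred HAVING (COUNT(DISTINCT ?category) > 1)` query; the sketch below performs the same aggregation in plain Python. The association names and terms are invented placeholders, and keying on spelling assumes, as the abstract does, that identically spelled terms denote the same concept.

```python
from collections import defaultdict

# Hypothetical (terminology source, category term, preferred term) records.
RECORDS = [
    ("assocA", "Mechanical problem", "Breakage"),
    ("assocB", "Mechanical problem", "Breakage"),
    ("assocB", "Material problem", "Breakage"),
    ("assocC", "Electrical problem", "Short circuit"),
]


def multi_parent_terms(records):
    """Return preferred terms placed under more than one category term,
    mapped to the sorted list of those categories.

    Terms are merged by spelling, so the same term coming from several
    terminologies is counted once per distinct category."""
    parents = defaultdict(set)
    for _source, category, preferred in records:
        parents[preferred].add(category)
    return {term: sorted(cats) for term, cats in parents.items() if len(cats) > 1}


print(multi_parent_terms(RECORDS))
```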
3.
BMC Microbiol ; 21(1): 325, 2021 Nov 22.
Article in English | MEDLINE | ID: mdl-34809564

ABSTRACT

BACKGROUND: The abundance of accumulated glycomics data has led to the development of many useful databases that aid in understanding the function of glycans and their impact on cellular activity. At the same time, efforts to share data between glycomics databases and other biological databases have contributed to the creation of new knowledgebases. However, differing data types in data descriptions have impeded data sharing for knowledge integration. To solve this problem, Semantic Web techniques, including the Resource Description Framework (RDF) and ontology development, have been adopted by various groups to standardize the format for data exchange. These semantic data have contributed to the expansion of knowledgebases and hold the promise of providing data that can be intelligently processed. On the other hand, bench biologists, who are experts in experimental findings, are both end users and data producers. It is therefore indispensable to reduce the technical barrier that bench biologists face in making their experimental data compatible with standard formats for data sharing. RESULTS: There are many essential concepts and practical techniques for data integration, but no method enables researchers to easily apply Semantic Web techniques to their experimental data. We implemented our procedure on unformatted information about E. coli O-antigen structures collected from the web and show how this information can be expressed as formatted data conforming to Semantic Web standards. In particular, we described the E. coli O-antigen biosynthesis pathway using the BioPAX ontology, which was developed to support data exchange between pathway databases. CONCLUSIONS: The method we implemented to semantically describe O-antigen biosynthesis should help biologists understand how glycan information, including relevant pathway reaction data, can be easily shared.
We hope this method can help lower the technical barrier encountered when experimental findings are formulated into formal representations, and can lead bench scientists to readily participate in the construction of new knowledgebases that are integrated with existing ones. Such integration over the Semantic Web will allow future work in artificial intelligence and machine learning to enable computers to infer new relationships and hypotheses in the life sciences.


Subjects
Escherichia coli/metabolism , Information Dissemination , O Antigens/biosynthesis , Biosynthetic Pathways , Escherichia coli/chemistry , Escherichia coli/genetics , O Antigens/chemistry , Semantics
4.
J Biomed Inform ; 108: 103504, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32673790

ABSTRACT

This study developed a medicine query system based on the Semantic Web and open data, specifically for self-medication users searching for over-the-counter (OTC) medicines. Most existing medicine query systems are based on keyword searches; if users are uncertain about the exact search words, these systems offer little effective help. Furthermore, most systems provide inadequate explanations of symptoms and ailments for users to rely on with confidence. To remedy these issues, this study builds a knowledge base that enables inference-based searches and a data mashup integrating information from across the Web. Three components were identified: (1) building an ontology model to describe the relationships between ailments and symptoms; (2) upgrading medicinal product datasets to link them with the ontology model on a semantic level; and (3) developing a data mashup that integrates web resources to help users find references. A further aim was to develop a web-based application that uses inference mechanisms to provide users with tools for interactive manipulation. A pilot experiment on skin ailments was carried out to assess the system's problem-solving ability. Finally, two experts used a content validity index to rate a four-dimension, 15-item scale. The evaluation results show that the experts rated the proposed system as excellent in content validity.


Subjects
Knowledge Bases , Semantic Web , Internet , Nonprescription Drugs , Semantics
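Entry 4 replaces keyword matching with inference over an ailment-symptom ontology linked to product data. A minimal sketch of that idea, assuming a hypothetical matching rule (an ailment is inferred when its known symptoms include every symptom the user reports); the ailment, symptom and product identifiers are invented placeholders, not medical data and not the paper's ontology.

```python
# Placeholder ontology fragments (invented; not medical guidance).
AILMENT_SYMPTOMS = {
    "ex:AilmentA": {"ex:Itching", "ex:Redness"},
    "ex:AilmentB": {"ex:Redness", "ex:Pain"},
    "ex:AilmentC": {"ex:Itching", "ex:Scaling"},
}
PRODUCT_INDICATIONS = {
    "ex:ProductX": {"ex:AilmentA"},
    "ex:ProductY": {"ex:AilmentB"},
    "ex:ProductZ": {"ex:AilmentC"},
}


def infer_ailments(reported):
    """Ailments whose known symptoms include every reported symptom
    (hypothetical rule chosen for illustration)."""
    reported = set(reported)
    return {a for a, symptoms in AILMENT_SYMPTOMS.items() if reported <= symptoms}


def suggest_products(reported):
    """Products indicated for at least one inferred ailment, sorted by IRI."""
    ailments = infer_ailments(reported)
    return sorted(p for p, indicated in PRODUCT_INDICATIONS.items()
                  if indicated & ailments)


print(suggest_products(["ex:Itching"]))
print(suggest_products(["ex:Itching", "ex:Redness"]))
```

Reporting more symptoms narrows the inferred ailments, which is the behavior a keyword search cannot offer when the user does not know the ailment's name.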
5.
Int J Mol Sci ; 21(5), 2020 Mar 04.
Article in English | MEDLINE | ID: mdl-32143440

ABSTRACT

The adhesion behavior of human tissue cells in vitro changes when the gravitational forces affecting these cells are modified. To understand the mechanisms underlying these changes, proteins involved in cell-cell or cell-extracellular matrix adhesion were investigated with regard to changes in their expression, accumulation, localization, and posttranslational modification (PTM) during exposure to microgravity. As the sialylation of adhesion proteins influences cell adhesion on Earth in vitro and in vivo, we analyzed the sialylation of cell adhesion molecules detected by omics studies on cells that change their adhesion behavior when exposed to microgravity. Using a knowledge graph created from experimental omics data and semantic searches across several reference databases, we studied the sialylation of adhesion proteins glycosylated at their extracellular domains with regard to its sensitivity to microgravity. In this way, experimental omics data networked with current knowledge about the binding of sialic acids to cell adhesion proteins, its regulation, and the interactions between those proteins provided insights into the mechanisms behind our experimental findings, suggesting that the balance between sialylation and de-sialylation of the terminal ends of the adhesion proteins' glycans influences their binding activity. This sheds light on the transition from two- to three-dimensional growth observed in microgravity, which mirrors cell migration and cancer metastasis in vivo.


Subjects
Cell Adhesion , Protein Processing, Post-Translational , Weightlessness , Animals , Cadherins/metabolism , Cell Line, Tumor , Cell Movement , Databases, Factual , Extracellular Matrix/metabolism , Humans , Hyaluronan Receptors/metabolism , Integrins/metabolism , MCF-7 Cells , Mice , Neoplasm Metastasis , Protein Domains , Proteome , Sialic Acids/chemistry
6.
J Biomed Inform ; 100: 103320, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31669288

ABSTRACT

If monolayers of cancer cells are exposed to microgravity, some of the cells cease adhering to the bottom of the culture flask and join three-dimensional aggregates floating in the culture medium. Searching for reasons for this change in phenotype, we performed proteome analyses and learned that the accumulation and posttranslational modification of proteins involved in cell-matrix and cell-cell adhesion are affected. To investigate these proteins further, we developed a methodology to find histological images of focal adhesion complex (FA) proteins. Selecting proteins expressed by human FTC-133 and MCF-7 cancer cells and known to be incorporated in FA, we transformed the experimental data to RDF to establish a core semantic knowledgebase. Applying iterative SPARQL queries to Linked Open Databases, we augmented these data with additional functional, transformation-related, and aggregation-related relationships. Using reasoning, we retrieved publications with images showing the spatial arrangement of proteins incorporated in FA. Contextualizing those images enabled us to gain insights into the FA of cells changing their site of growth, and to validate our experimental results independently. This new way of linking experimental proteome data to biomedical knowledge from various sources by searching images may be applied generally in science wherever images are a tool of knowledge dissemination.


Subjects
Focal Adhesions , Neoplasm Proteins/metabolism , Neoplasms/pathology , Proteomics , Semantics , Humans , Knowledge Bases , MCF-7 Cells
7.
J Proteome Res ; 17(12): 4211-4226, 2018 Dec 07.
Article in English | MEDLINE | ID: mdl-30191714

ABSTRACT

20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17), and about 10% of them still lack functional annotation, either predicted by bioinformatics tools or captured from experimental reports. A systematic exploration of the available literature on uncharacterized human genes/proteins led to the proposal of functional annotations for 113 proteins and to the consolidation of a list of 1,862 uncharacterized human proteins. The advanced search functionality of neXtProt was used extensively to examine the landscape of the uncharacterized human proteome in terms of subcellular location, protein-protein interactions, tissue expression, association with diseases, and 3D structure. Finally, deep data mining in various publicly available resources allowed us to build functional hypotheses for 26 uncharacterized human proteins validated at the protein level (uPE1). These hypotheses cover the fields of cilia biology, male reproduction, metabolism, the nervous system, immunity, inflammation, RNA metabolism, and chromatin biology. They will require experimental validation before they can be considered for annotation. Despite technological progress, the pace of human protein characterization studies is still slow. It could be accelerated by better integration of existing knowledge resources and by initiating large collaborative projects involving specialists from different fields of biology. We hope that our analysis will help lay the groundwork for such collaborative approaches and will be exploited by the HUPO Human Proteome Project teams committed to characterizing uPE1 proteins.


Subjects
Molecular Sequence Annotation , Proteome/genetics , Computational Biology , Data Mining , Genome, Human/genetics , Humans , Methods , Proteome/analysis
8.
Int J Mol Sci ; 19(8), 2018 Aug 01.
Article in English | MEDLINE | ID: mdl-30071661

ABSTRACT

When monolayers of tissue cancer cells of various origins are exposed to real or simulated microgravity, many cells leave the monolayer and assemble into three-dimensional (3D) aggregates (spheroids). To define the cellular machinery leading to this change in the growth behavior of FTC-133 human thyroid cancer cells and MCF-7 breast cancer cells, we recently performed proteome analyses on these cell lines and determined the proteins' accumulation in monolayer cells grown under 1g conditions as well as in the cells of spheroids assembled under simulated microgravity for three and 14 days, respectively. At that time, we investigated the influence of the increase or decrease of some of the more than 5000 proteins detected in each cell line. In this study, we focused on posttranslational modifications (PTMs) of proteins. For this purpose, we selected candidates from the list of proteins detected in the two preceding proteome analyses that showed significant accumulation in spheroid cells compared to 1g monolayer cells. We then searched for those PTMs of the selected proteins that, according to the literature, have already been determined experimentally. Using the SPARQL Protocol and RDF Query Language (SPARQL), various databases were examined; the most efficient search was in the latest version of the dbPTM database. In total, we found 72 different classes of PTMs, comprising mainly phosphorylation, glycosylation, ubiquitination and acetylation. Most interestingly, in 35 of the 69 proteins, N6 residues of lysine are modifiable.


Subjects
Data Mining , Databases, Genetic , Neoplasm Proteins , Protein Processing, Post-Translational , Thyroid Neoplasms , Weightlessness , Humans , MCF-7 Cells , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Thyroid Neoplasms/genetics , Thyroid Neoplasms/metabolism , Thyroid Neoplasms/pathology
9.
BMC Bioinformatics ; 18(1): 93, 2017 Feb 08.
Article in English | MEDLINE | ID: mdl-28178937

ABSTRACT

BACKGROUND: Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. RESULTS: We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. CONCLUSIONS: SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .


Subjects
Computer Communication Networks , Databases, Factual , Internet
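Entry 9 says SPANG generates typical SPARQL queries from specified arguments. The sketch below shows the general idea for a single triple pattern; the function name and its parameters are hypothetical and are not SPANG's actual command-line interface. Unspecified positions become projected variables, and a fully specified pattern yields an `ASK` query.

```python
def build_query(subject=None, predicate=None, obj=None, prefixes=None, limit=10):
    """Build a one-triple-pattern SPARQL query from arguments (hypothetical API).

    Positions left as None become variables (?s, ?p, ?o) and are projected;
    if every position is fixed, an ASK query is produced instead.
    """
    prefixes = prefixes or {}
    header = "".join(f"PREFIX {name}: <{iri}>\n" for name, iri in prefixes.items())
    terms, projected = [], []
    for value, var in ((subject, "?s"), (predicate, "?p"), (obj, "?o")):
        if value is None:
            terms.append(var)
            projected.append(var)
        else:
            terms.append(value)
    pattern = "{ " + " ".join(terms) + " . }"
    if not projected:
        return f"{header}ASK {pattern}"
    return f"{header}SELECT {' '.join(projected)}\nWHERE {pattern}\nLIMIT {limit}"


query = build_query(predicate="rdfs:label",
                    prefixes={"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
                    limit=5)
print(query)
```

The generated string can then be sent to any SPARQL endpoint; sending it is left out to keep the sketch offline.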
10.
BMC Bioinformatics ; 18(1): 435, 2017 Oct 02.
Article in English | MEDLINE | ID: mdl-28969593

ABSTRACT

BACKGROUND: A large number of biological databases are publicly available to scientists on the web, and many private databases are also generated in the course of research projects. These databases come in a wide variety of formats. Web standards have evolved in recent times, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into the Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time-consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. RESULTS: We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed and has additional features such as ranking facet values based on several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. Advanced users can run SPARQL queries directly on the databases; using this feature, they can incorporate federated searches of SPARQL endpoints.
We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . CONCLUSIONS: We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.


Subjects
Databases, Factual , Search Engine , Internet , Software
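Entry 10 mentions ranking facet values "based on several criteria". A minimal sketch of one plausible criterion, frequency among the records that match the facets already selected; the records and field names are hypothetical, and BioCarian's actual ranking criteria are not reproduced here.

```python
from collections import Counter

# Hypothetical records from a tabular dataset after conversion to RDF.
ROWS = [
    {"virus": "VirusA", "chromosome": "chr5"},
    {"virus": "VirusA", "chromosome": "chr1"},
    {"virus": "VirusA", "chromosome": "chr5"},
    {"virus": "VirusB", "chromosome": "chr1"},
]


def facet_values(rows, facet, selected):
    """Rank the values of `facet` among rows matching every selected
    (facet, value) pair, by frequency, breaking ties alphabetically."""
    matching = [r for r in rows
                if all(r.get(k) == v for k, v in selected.items())]
    counts = Counter(r[facet] for r in matching if facet in r)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(facet_values(ROWS, "virus", {}))
print(facet_values(ROWS, "chromosome", {"virus": "VirusA"}))
```

Each new selection narrows the matching rows, so the counts shown for the remaining facets always reflect the current exploratory query.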
12.
Biodivers Data J ; 12: e120304, 2024.
Article in English | MEDLINE | ID: mdl-38912110

ABSTRACT

Background: Numerous taxonomic studies have focused on the dung beetle genus Helictopleurus d'Orbigny, 1915, endemic to Madagascar. However, this genus still needs a thorough revision. Semantic technologies, such as nanopublications, hold the potential to enhance taxonomy by transforming how data are published and analysed. This paper evaluates the effectiveness of nanopublications in establishing synonyms within the genus Helictopleurus. New information: In this study, we identify four new synonyms within Helictopleurus: H. rudicollis (Fairmaire, 1898) = H. hypocrita Balthasar, 1941 syn. nov.; H. vadoni Lebis, 1960 = H. perpunctatus Balthasar, 1963 syn. nov.; H. halffteri Balthasar, 1964 = H. dorbignyi Montreuil, 2005 syn. nov.; H. clouei (Harold, 1869) = H. gibbicollis (Fairmaire, 1895) syn. nov. Helictopleurus may have a significantly larger number of synonyms than currently known, indicating potentially inaccurate estimates of its recent extinction. We also publish the newly established synonyms as nanopublications, which are machine-readable data snippets accessible online. Additionally, we explore the utility of nanopublications in taxonomy and demonstrate their practical use with an example query for data extraction.

13.
Heliyon ; 10(7): e29046, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38623249

ABSTRACT

This article is dedicated to the development of a model for competencies within an educational program and its implementation through the use of semantic technologies. The model proposed by the authors is distinctive in that competencies are organized into a hierarchical data structure with arbitrary levels of nesting. Furthermore, the article presents an original solution for modelling the input requirements for studying a course, which is defined in the form of dependencies between the competencies generated by the course and the competencies of other courses. The outcome of this work is an ontological model of a competency-based curriculum, for which the authors have developed and implemented algorithms for data addition and retrieval, as well as for analyzing the consistency of the curriculum in terms of the input requirements for studying a discipline and the learning outcomes from previous periods. The findings presented in the article will prove to be valuable in the development of educational process management information systems and educational program constructors. They will also be instrumental in aligning diverse educational programs within the context of academic mobility.
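Entry 13 describes competencies nested to arbitrary depth and a consistency check of each course's input requirements against outcomes of earlier periods. The sketch below is one hypothetical reading of that check, not the authors' algorithm: a required competency counts as available if an earlier-period course produces it directly, or if it is a composite whose child competencies are all (recursively) available. Course and competency names are invented.

```python
# Hypothetical competency tree: parent -> children (arbitrary nesting).
CHILDREN = {
    "C:Programming": ["C:Syntax", "C:Algorithms"],
    "C:Algorithms": ["C:Sorting", "C:Graphs"],
}
# Hypothetical courses: name -> (period, produced competencies, required competencies).
COURSES = {
    "Intro": (1, {"C:Syntax", "C:Sorting"}, set()),
    "DataStructures": (2, {"C:Graphs"}, {"C:Syntax"}),
    "Networks": (2, set(), {"C:Graphs"}),
    "Compilers": (3, set(), {"C:Programming"}),
}


def produced_before(period):
    """Union of competencies produced by courses in strictly earlier periods."""
    done = set()
    for p, produced, _required in COURSES.values():
        if p < period:
            done |= produced
    return done


def satisfied(comp, available):
    """Available directly, or a composite whose children are all available."""
    if comp in available:
        return True
    kids = CHILDREN.get(comp, [])
    return bool(kids) and all(satisfied(k, available) for k in kids)


def unmet_requirements():
    """Map each inconsistent course to its sorted list of unmet requirements."""
    problems = {}
    for name, (period, _produced, required) in COURSES.items():
        available = produced_before(period)
        missing = sorted(c for c in required if not satisfied(c, available))
        if missing:
            problems[name] = missing
    return problems


print(unmet_requirements())
```

Here "Networks" is flagged because `C:Graphs` is only produced in the same period, while "Compilers" passes because every leaf of `C:Programming` is produced by period 2.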

14.
PeerJ Comput Sci ; 10: e2133, 2024.
Article in English | MEDLINE | ID: mdl-39145249

ABSTRACT

Monitoring data sources for possible changes is an important consumption requirement for applications that interact with the Web of Data. In this article, we propose MonARCh, an architecture for monitoring changes in the results of registered SPARQL queries in the Linked Data environment. MonARCh can be understood as a publish/subscribe system in the general sense; however, it differs in how communication with the data sources is realized. Data sources in the Linked Data environment do not publish changes in their data, so MonARCh provides the necessary communication infrastructure between the data sources and the consumers for the notification of changes. Users register SPARQL queries with the system, which converts them into federated queries. MonARCh periodically checks for updates by re-executing SERVICE clauses and notifies users of any change in the results. In addition, to provide scalability, MonARCh takes advantage of the concurrent computation offered by the actor model. The parallel join algorithm it uses speeds up query execution and result generation. The design science methodology was used during the design, implementation and evaluation of the architecture. Compared with the literature, MonARCh meets all the requirements identified from the linked data monitoring and state-of-the-art perspectives, while offering many outstanding features from both points of view. The evaluation results show that, even on a limited two-node cluster, MonARCh could monitor between 300 and 25,000 queries, depending on the selectivity of the queries executed in our test bench.
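The core of the change detection in entry 14, re-executing a registered query and notifying subscribers when its result set differs from the previous run, can be sketched as below. This is a single-threaded simplification under stated assumptions: `run_query` is a hypothetical stand-in for executing SERVICE clauses against remote endpoints, and MonARCh's actor-based concurrency and parallel joins are not modeled.

```python
from typing import Callable, Iterable

Row = tuple  # one result row: a tuple of bound values


class QueryMonitor:
    """Re-evaluate one registered query and notify subscribers on change.

    `run_query` is a hypothetical stand-in for executing the query's
    SERVICE clauses against remote endpoints; nothing here touches a network.
    """

    def __init__(self, run_query: Callable[[], Iterable[Row]]) -> None:
        self.run_query = run_query
        self.last = frozenset(run_query())  # baseline result set
        self.subscribers: list[Callable[[set, set], None]] = []

    def subscribe(self, callback: Callable[[set, set], None]) -> None:
        self.subscribers.append(callback)

    def poll(self) -> bool:
        """Re-run the query; if results differ, notify with (added, removed)."""
        current = frozenset(self.run_query())
        added = set(current - self.last)
        removed = set(self.last - current)
        self.last = current
        if not (added or removed):
            return False
        for callback in self.subscribers:
            callback(added, removed)
        return True


# Simulated endpoint whose data changes between polls.
endpoint_rows = [("ex:s1", "10")]
monitor = QueryMonitor(lambda: list(endpoint_rows))
events = []
monitor.subscribe(lambda added, removed: events.append((added, removed)))

first = monitor.poll()                  # nothing changed yet
endpoint_rows.append(("ex:s2", "7"))
second = monitor.poll()                 # one new row appears
```

In a deployment, `poll()` would be driven by a timer per registered query; comparing result sets as sets ignores row order, which SPARQL does not guarantee without `ORDER BY`.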

15.
J Biomed Semantics ; 14(1): 7, 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37393296

ABSTRACT

The current rise of Open Science and Reproducibility in the Life Sciences requires the creation of rich, machine-actionable metadata in order to better share and reuse biological digital resources such as datasets, bioinformatics tools, training materials, etc. For this purpose, FAIR principles have been defined for both data and metadata and adopted by large communities, leading to the definition of specific metrics. However, automatic FAIRness assessment is still difficult because computational evaluations frequently require technical expertise and can be time-consuming. As a first step to address these issues, we propose FAIR-Checker, a web-based tool to assess the FAIRness of metadata presented by digital resources. FAIR-Checker offers two main facets: a "Check" module providing a thorough metadata evaluation and recommendations, and an "Inspect" module which assists users in improving metadata quality and therefore the FAIRness of their resource. FAIR-Checker leverages Semantic Web standards and technologies such as SPARQL queries and SHACL constraints to automatically assess FAIR metrics. Users are notified of missing, necessary, or recommended metadata for various resource categories. We evaluate FAIR-Checker in the context of improving the FAIRification of individual resources, through better metadata, as well as analyzing the FAIRness of more than 25 thousand bioinformatics software descriptions.


Subjects
Biological Science Disciplines , Pattern Recognition, Automated , Reproducibility of Results , Semantic Web , Computational Biology
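Entry 15 says FAIR-Checker uses SHACL constraints to tell users which metadata are missing, necessary, or recommended. The sketch below is a plain-Python analogue of such a shape, roughly like reporting `sh:Violation` for required properties and `sh:Warning` for recommended ones. The profile and property names are hypothetical and are not FAIR-Checker's actual rule set.

```python
# Hypothetical profile for a software resource; property names are
# illustrative and not FAIR-Checker's actual rule set.
PROFILE = {
    "required": ["name", "description", "license"],
    "recommended": ["url", "version", "author"],
}


def is_present(metadata, key):
    """Treat absent keys and empty values as missing."""
    return metadata.get(key) not in (None, "", [], {})


def check_metadata(metadata, profile=PROFILE):
    """Report missing required and recommended properties, in profile order."""
    return {
        "missing_required": [k for k in profile["required"]
                             if not is_present(metadata, k)],
        "missing_recommended": [k for k in profile["recommended"]
                                if not is_present(metadata, k)],
    }


report = check_metadata({"name": "mytool", "description": "",
                         "url": "https://example.org"})
print(report)
```

Separating the two lists lets a tool block on missing required metadata while only advising on recommended fields.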
16.
J Cheminform ; 15(1): 61, 2023 Jun 20.
Article in English | MEDLINE | ID: mdl-37340506

ABSTRACT

Current biological and chemical research is increasingly dependent on the reusability of previously acquired data, which typically come from various sources. Consequently, there is a growing need for database systems and databases stored in them to be interoperable with each other. One of the possible solutions to address this issue is to use systems based on Semantic Web technologies, namely on the Resource Description Framework (RDF) to express data and on the SPARQL query language to retrieve the data. Many existing biological and chemical databases are stored in the form of a relational database (RDB). Converting a relational database into the RDF form and storing it in a native RDF database system may not be desirable in many cases. It may be necessary to preserve the original database form, and having two versions of the same data may not be convenient. A solution may be to use a system mapping the relational database to the RDF form. Such a system keeps data in their original relational form and translates incoming SPARQL queries to equivalent SQL queries, which are evaluated by a relational-database system. This review compares different RDB-to-RDF mapping systems with a primary focus on those that can be used free of charge. In addition, it compares different approaches to expressing RDB-to-RDF mappings. The review shows that these systems represent a viable method providing sufficient performance. Their real-life performance is demonstrated on data and queries coming from the neXtProt project.
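Entry 16 reviews systems that keep data in relational form and translate incoming SPARQL into SQL. A deliberately tiny sketch of that translation for a single triple pattern, using Python's built-in `sqlite3`; the mapping dictionary is a hypothetical simplification loosely in the spirit of R2RML, and real mapping systems handle joins, datatypes and full SPARQL algebra that are omitted here.

```python
import sqlite3

# Hypothetical mapping: predicate IRI -> (table, key column used to mint
# subject IRIs, value column).
MAPPING = {
    "ex:name": ("protein", "id", "name"),
    "ex:length": ("protein", "id", "length"),
}
SUBJECT_TEMPLATE = "http://example.org/protein/{}"


def translate(predicate, obj=None):
    """Translate the single pattern `?s <predicate> ?o` (or a constant object)
    into SQL plus bound parameters. Table/column names come only from
    MAPPING, never from user input; the object value is parameterized."""
    table, key, column = MAPPING[predicate]
    sql = f"SELECT {key}, {column} FROM {table}"
    params = ()
    if obj is not None:
        sql += f" WHERE {column} = ?"
        params = (obj,)
    return sql + f" ORDER BY {key}", params


def evaluate(conn, predicate, obj=None):
    """Run the translated SQL and turn each row back into (subject IRI, value)."""
    sql, params = translate(predicate, obj)
    return [(SUBJECT_TEMPLATE.format(k), v) for k, v in conn.execute(sql, params)]


# The data stays in its original relational form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (id TEXT PRIMARY KEY, name TEXT, length INTEGER)")
conn.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                 [("P1", "alpha", 120), ("P2", "beta", 340)])

print(translate("ex:length", 340))
print(evaluate(conn, "ex:length", 340))
```

No RDF copy of the table is ever materialized; subject IRIs are minted on the fly from the key column, which is what avoids keeping two versions of the same data.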

17.
Procedia Comput Sci ; 197: 362-369, 2022.
Article in English | MEDLINE | ID: mdl-35043070

ABSTRACT

Coronavirus disease (COVID-19) is a worldwide pandemic, and accurate data and information have become essential in this situation. In Indonesia, the government provides an official website displaying COVID-19 spread statistics. However, the data provided do not follow the 5-star open data scheme. As a result, the data cannot easily be reused or integrated into other datasets and applications. In this paper, we propose an RDF vocabulary for presenting COVID-19 data in Indonesia. In addition, two example queries are presented to show how our vocabulary and dataset can be used as part of the Linked Open Data movement.

18.
Stud Health Technol Inform ; 290: 76-80, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672974

ABSTRACT

The heterogeneity of electronic health record models is a major problem: data must be gathered from various models for clinical research, but also for clinical decision support. The Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) has emerged as a standard model for structuring health records populated from various other sources. This model is proposed as a relational database schema. However, in the field of decision support, formal ontologies are commonly used. In this paper, we propose a translation of OMOP-CDM into an ontology, and we explore the utility of the semantic web for structuring EHRs from a clinical decision support perspective, as well as the use of the SPARQL language for querying health records. The resulting ontology is available online.


Subjects
Electronic Health Records , Databases, Factual
19.
Appl In Vitro Toxicol ; 8(1): 2-13, 2022 Mar 01.
Article in English | MEDLINE | ID: mdl-35388368

ABSTRACT

Introduction: The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. Materials and Methods: We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property-object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. Results: The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. Discussion: SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. Conclusion: Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
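Entry 19 gives the example of listing measurement methods for all Key Events leading to an Adverse Outcome of interest. In SPARQL this is usually a property-path query over key event relationships; the sketch below performs the equivalent backward breadth-first traversal in plain Python. The event identifiers, relationships and method names are hypothetical, not AOP-Wiki content.

```python
from collections import deque

# Hypothetical AOP fragment: upstream event -> downstream events
# (key event relationships), plus measurement methods per key event.
DOWNSTREAM = {
    "KE:MIE1": ["KE:2"],
    "KE:2": ["KE:3", "KE:4"],
    "KE:3": ["AO:1"],
    "KE:4": [],
}
METHODS = {
    "KE:MIE1": ["binding assay"],
    "KE:2": ["qPCR"],
    "KE:3": ["histology"],
    "KE:4": ["ELISA"],
}


def methods_for_outcome(outcome):
    """Walk key event relationships backwards from an adverse outcome and
    collect the measurement methods of every key event that leads to it."""
    upstream = {}
    for src, dsts in DOWNSTREAM.items():
        for dst in dsts:
            upstream.setdefault(dst, []).append(src)
    seen, queue, result = set(), deque([outcome]), {}
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in seen:
                seen.add(parent)
                result[parent] = METHODS.get(parent, [])
                queue.append(parent)
    return result


print(methods_for_outcome("AO:1"))
```

`KE:4` is excluded because it does not lie on any path to `AO:1`; the `seen` set keeps the walk finite even if the relationship graph contains cycles.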

20.
J Biomed Semantics ; 13(1): 11, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35346379

ABSTRACT

BACKGROUND: In the life sciences, there has been a long-standing effort to standardize and integrate reference datasets and databases. Despite these efforts, the data of many studies are provided in specific, non-standard formats. This hampers the ability to reuse the studies' data in other pipelines, to reuse the pipelines' results in other studies, and to enrich the data with additional information. The Regulatory Circuits project is one of the largest efforts to integrate human cell genomics data to predict tissue-specific transcription factor-gene interaction networks. Despite its success, it exhibits the usual shortcomings that limit its updating, its reuse (as a whole or in part), and its extension with new data samples. To address these limitations, the resource was previously integrated into an RDF triplestore so that TF-gene interaction networks could be generated with two SPARQL queries. However, this triplestore did not store the computed networks and did not integrate metadata about tissues and samples, thereby limiting the reuse of this dataset. In particular, it does not allow reusing only a portion of Regulatory Circuits when a study focuses on a subset of the tissues, nor combining the samples described in the datasets with samples from other studies. Overall, these limitations call for the design of a complete, flexible and reusable representation of the Regulatory Circuits dataset based on Semantic Web technologies. RESULTS: We provide a modular RDF representation of the Regulatory Circuits, called Linked Extended Regulatory Circuits (LERC). It consists of (i) descriptions of the biological and experimental context mapped to the reference databases, (ii) annotations about TF-gene interactions at the sample level for 808 samples, (iii) annotations about TF-gene interactions at the tissue level for 394 tissues, and (iv) metadata connecting the knowledge graphs cited above.
LERC is based on a modular organisation into 1,205 RDF named graphs representing the biological data, the sample-specific and tissue-specific networks, and the corresponding metadata. In total, it contains 3,910,794,050 triples and is available as a SPARQL endpoint. CONCLUSION: The flexible and modular architecture of LERC supports biologically relevant SPARQL queries. It allows easy and fast querying of the resources related to the initial Regulatory Circuits datasets and facilitates their reuse in other studies. ASSOCIATED WEBSITE: https://regulatorycircuits-lod.genouest.org.


Subjects
Biological Science Disciplines , Animals , Databases, Factual , Humans , Life Cycle Stages , Metadata
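Entry 20 attributes LERC's reusability to splitting the data into named graphs, so a study can query only the tissues it needs and look up their metadata separately. A minimal sketch of that pattern over an in-memory list of quads; the graph names, transcription factors, genes and sample identifiers are hypothetical, not LERC's actual IRIs. Restricting `graphs` plays the role of a SPARQL `GRAPH` clause limited to chosen graph IRIs.

```python
# Hypothetical quads: (subject, predicate, object, named graph).
QUADS = [
    ("ex:TF1", "ex:regulates", "ex:GeneA", "graph:tissue/liver"),
    ("ex:TF1", "ex:regulates", "ex:GeneB", "graph:tissue/lung"),
    ("ex:TF2", "ex:regulates", "ex:GeneA", "graph:tissue/liver"),
    ("graph:tissue/liver", "ex:derivedFrom", "ex:sample12", "graph:metadata"),
]


def match(s=None, p=None, o=None, graphs=None):
    """Return quads matching the (s, p, o) pattern, where None is a wildcard,
    optionally restricted to a set of named graphs."""
    return [q for q in QUADS
            if (s is None or q[0] == s)
            and (p is None or q[1] == p)
            and (o is None or q[2] == o)
            and (graphs is None or q[3] in graphs)]


# Reuse only one tissue-specific network...
liver = match(p="ex:regulates", graphs={"graph:tissue/liver"})
# ...and look up its provenance in the metadata graph.
provenance = match(s="graph:tissue/liver", graphs={"graph:metadata"})
print(liver)
print(provenance)
```

Because graph names are themselves IRIs, the metadata graph can describe the tissue graphs directly, which is what lets samples be combined with those of other studies.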