ABSTRACT
BACKGROUND: With an overarching goal of increasing diversity and inclusion in the biomedical sciences, the National Research Mentoring Network (NRMN) developed a web-based national mentoring platform (MyNRMN) that seeks to connect mentors and mentees to support the persistence of underrepresented minorities in the biomedical sciences. As of May 15, 2024, the MyNRMN platform, which provides mentoring, networking, and professional development tools, has facilitated more than 12,100 unique mentoring connections between faculty, students, and researchers in the biomedical domain. OBJECTIVE: This study aimed to examine the large-scale mentoring connections facilitated by our web-based platform between students (mentees) and faculty (mentors) across institutional and geographic boundaries. Using an innovative graph database, we analyzed diverse mentoring connections between mentors and mentees across demographic characteristics in the biomedical sciences. METHODS: Through the MyNRMN platform, we examined profile data and analyzed mentoring connections made between students and faculty across institutional boundaries by race, ethnicity, gender, institution type, and educational attainment between July 1, 2016, and May 31, 2021. RESULTS: In total, there were 15,024 connections, with 2222 mentees and 1652 mentors across 1625 institutions contributing data. Female mentees participated in the highest number of connections (3996/6108, 65%), whereas female mentors participated in 58% (5206/8916) of the connections. Black mentees made up 38% (2297/6108) of the connections, whereas White mentors participated in 56% (5036/8916) of the connections. Mentees were predominantly from institutions classified as Research 1 (R1; doctoral universities with very high research activity) and historically Black colleges and universities (556/2222, 25% and 307/2222, 14%, respectively), whereas 31% (504/1652) of mentors were from R1 institutions. CONCLUSIONS: To date, little has been known about the utility of mentoring connections across institutions throughout the United States or about how mentors and mentees are connected. This study examined these connections and their diversity using an extensive web-based mentoring network.
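The abstract reports only aggregate results; as a minimal sketch of how such connection counts can be broken down by demographic attributes in a graph database, the query below assumes a hypothetical schema with (:Mentee) and (:Mentor) nodes joined by CONNECTED_TO relationships and carrying the profile properties described above. It is not the MyNRMN implementation.

```python
from neo4j import GraphDatabase

# Hypothetical schema: (:Mentee {gender, race})-[:CONNECTED_TO]->(:Mentor {gender, race, institution_type})
QUERY = """
MATCH (me:Mentee)-[:CONNECTED_TO]->(mr:Mentor)
RETURN me.gender AS mentee_gender, mr.gender AS mentor_gender, count(*) AS connections
ORDER BY connections DESC
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(QUERY):
        print(record["mentee_gender"], record["mentor_gender"], record["connections"])
driver.close()
```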
Subjects
Mentoring, Mentors, Humans, Mentoring/methods, Mentors/statistics & numerical data, Female, Male, Biomedical Research/statistics & numerical data, United States, Minority Groups/statistics & numerical data, Databases, Factual, Faculty/statistics & numerical data
ABSTRACT
BACKGROUND: Graph databases enable efficient storage of heterogeneous, highly interlinked data, such as clinical data. Subsequently, researchers can extract relevant features from these datasets and apply machine learning for diagnosis, biomarker discovery, or understanding pathogenesis. METHODS: To facilitate machine learning and save time when extracting data from the graph database, we developed and optimized the Decision Tree Plug-in (DTP), containing 24 procedures to generate and evaluate decision trees directly in the graph database Neo4j on homogeneous and unconnected nodes. RESULTS: Creating the decision tree for three clinical datasets directly in the graph database from the nodes required between 0.059 and 0.099 s, while calculating the decision tree with the same algorithm in Java from CSV files took 0.085-0.112 s. Furthermore, for small datasets, our approach was faster than the standard decision tree implementations in R (0.62 s) and on par with Python (0.08 s), both also using CSV files as input. In addition, we explored the strengths of DTP by evaluating a large dataset (approximately 250,000 instances) to predict patients with diabetes and compared the performance against algorithms generated by state-of-the-art packages in R and Python. In doing so, we were able to show competitive results for Neo4j in terms of both prediction quality and time efficiency. Furthermore, we showed that high body mass index and high blood pressure are the main risk factors for diabetes. CONCLUSION: Overall, our work shows that integrating machine learning into graph databases saves time for additional processes as well as external memory, and could be applied to a variety of use cases, including clinical applications. This provides users with the advantages of high scalability, visualization, and complex querying.
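The DTP procedures run inside Neo4j and their signatures are not given here, so the sketch below shows only the conventional external workflow the authors compare against: exporting homogeneous node properties with a Cypher query and fitting a decision tree in scikit-learn. The node label, property names, and credentials are assumptions.

```python
from neo4j import GraphDatabase
from sklearn.tree import DecisionTreeClassifier

# Assumed schema: homogeneous, unconnected (:Record) nodes with numeric features and a class label.
QUERY = """
MATCH (r:Record)
RETURN r.bmi AS bmi, r.blood_pressure AS bp, r.diabetes AS label
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    rows = [(rec["bmi"], rec["bp"], rec["label"]) for rec in session.run(QUERY)]
driver.close()

X = [row[:2] for row in rows]   # feature vectors: BMI, blood pressure
y = [row[2] for row in rows]    # class labels
tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
print(tree.feature_importances_)  # relative weight of BMI vs. blood pressure
```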
Subjects
Algorithms, Biomedical Research, Humans, Body Mass Index, Databases, Factual, Decision Trees
ABSTRACT
BACKGROUND: Medical databases normally contain large amounts of data in a variety of forms. Although they grant significant insights into diagnosis and treatment, implementing data exploration on current medical databases is challenging because these are often based on a relational schema and cannot easily be used to extract information for cohort analysis and visualization. As a consequence, valuable information regarding cohort distribution or patient similarity may be missed. With the rapid advancement of biomedical technologies, new forms of data from methods such as Next Generation Sequencing (NGS) or chromosome microarray (array CGH) are constantly being generated; hence, the amount and complexity of medical data can be expected to rise and push relational database systems to their limits. DESCRIPTION: We present Graph4Med, a web application that relies on a graph database obtained by transforming a relational database. Graph4Med provides straightforward visualization and analysis of a selected patient cohort. Our use case is a database of pediatric Acute Lymphoblastic Leukemia (ALL). Alongside routine patient health records, it also contains results from the latest technologies, such as NGS data. We developed a suitable graph data schema to convert the relational data into a graph data structure and store it in Neo4j. We used NeoDash to build a dashboard for querying and displaying patient cohort analyses. In this way, our tool (1) quickly displays an overview of cohort information such as the distributions of gender, age, mutations (fusions), and diagnoses; (2) provides mutation (fusion) based similarity search and displays the results in a maneuverable graph; and (3) generates an interactive graph of any selected patient and facilitates the identification of interesting patterns among patients. CONCLUSION: We demonstrate the feasibility and advantages of a graph database for storing and querying medical databases. Our dashboard allows fast and interactive analysis and visualization of complex medical data. It is especially useful for patient similarity search based on mutations (fusions), for which vast amounts of data have been generated by NGS in recent years. It can uncover relationships and patterns in patient cohorts that are normally hard to grasp. Expanding Graph4Med to more medical databases will bring novel insights into diagnostics and research.
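As a rough illustration of the mutation (fusion) based similarity search described above, and not the actual Graph4Med code, the query below assumes a simplified (:Patient)-[:HAS_FUSION]->(:Fusion) schema and ranks other patients by the number of fusions they share with a given patient.

```python
from neo4j import GraphDatabase

SIMILARITY_QUERY = """
MATCH (p:Patient {id: $patient_id})-[:HAS_FUSION]->(f:Fusion)<-[:HAS_FUSION]-(other:Patient)
WHERE other <> p
RETURN other.id AS patient, collect(f.name) AS shared_fusions, count(f) AS shared
ORDER BY shared DESC
LIMIT 10
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for rec in session.run(SIMILARITY_QUERY, patient_id="ALL-0042"):
        print(rec["patient"], rec["shared"], rec["shared_fusions"])
driver.close()
```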
Subjects
Software, Child, Humans, Databases, Factual
ABSTRACT
BACKGROUND: Many systems biology studies leverage the integration of multiple data types (across different data sources) to offer a more comprehensive view of the biological system being studied. While SQL (Structured Query Language) databases are popular in the biomedical domain, NoSQL database technologies have been used as a more relationship-based, flexible and scalable method of data integration. RESULTS: We have created a graph database integrating data from multiple sources. In addition to using a graph-based query language (Cypher) for data retrieval, we have developed a web-based dashboard that allows users to easily browse and plot data without the need to learn Cypher. We have also implemented a visual graph query interface for users to browse graph data. Finally, we have built a prototype to allow the user to query the graph database in natural language. CONCLUSION: We have demonstrated the feasibility and flexibility of using a graph database for storing and querying immunological data with complex biological relationships. Querying a graph database through such relationships has the potential to discover novel relationships among heterogeneous biological data and metadata.
Subjects
Information Storage and Retrieval, Semantic Web, Databases, Factual, Language, Systems Biology
ABSTRACT
BACKGROUND: The SNOMED CT Expression Constraint Language (ECL) is a declarative language developed by SNOMED International for the definition of SNOMED CT Expression Constraints (ECs). ECs are executable expressions that define intensional subsets of clinical meanings by stating constraints over the logic definitions of concepts. The execution of an EC on some SNOMED CT substrate yields the intended subset, and it requires an execution engine able to receive an EC as input, execute it, and return the matching concepts. An important issue regarding subsets of clinical concepts is their use in terminology binding between clinical information models and terminologies to define the set of valid values of codified data. OBJECTIVE: To define and implement methods for the simplification, semantic validation, and execution of ECs over a graph-oriented SNOMED CT database; to provide a method for the visual representation of subsets in order to explore, understand, and validate their content; and to develop an EC execution platform, called SNQuery, that makes use of these methods. METHODS: Since SNOMED CT is a directed acyclic graph, we used a graph-oriented database to represent the content of SNOMED CT, where the schema and instances are represented as graphs and data manipulation is expressed by graph-oriented operations. ECs are executed over the graph database through a translation process in which they are converted into a set of Cypher Query Language queries. We defined EC simplification methods that leverage the logical structure underlying SNOMED CT. The purpose of these methods is to reduce the complexity of ECs and, in turn, their execution time, as well as to validate them from the points of view of the SNOMED CT Concept Model and the logical definitions. We also developed a graphic representation based on the circle-packing geometrical concept, which allows subsets, as well as predefined refsets and the terminology itself, to be validated. RESULTS: We developed SNQuery, a platform for the definition of intensional subsets of SNOMED CT concepts by means of the execution of ECs over a graph-oriented SNOMED CT database. Additionally, we incorporated methods for the simplification and semantic validation of ECs, as well as for the visualization of subsets as a mechanism to understand and validate them. SNQuery was evaluated in terms of EC execution times. CONCLUSION: In this paper, we provide methods to simplify, semantically validate, and execute ECs over a graph-oriented database. We also offer a method to visualize the intensional subsets obtained by executing ECs in order to explore, understand, and validate them, as well as refsets and the terminology itself. The definition of intensional subsets is useful for binding content between clinical information models and clinical terminologies, which is a necessary step to achieve semantic interoperability between EHR systems.
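The paper's translation rules are not reproduced in the abstract; the sketch below only illustrates the general idea, hand-translating the simple descendant-or-self EC << 73211009 |Diabetes mellitus| into Cypher over an assumed schema in which (:Concept {sctid}) nodes are linked by [:ISA] relationships.

```python
from neo4j import GraphDatabase

# Rough hand-translation of the EC "<< 73211009 |Diabetes mellitus|"
# into Cypher over an assumed (:Concept)-[:ISA]->(:Concept) schema.
DESCENDANT_OR_SELF = """
MATCH (c:Concept)-[:ISA*0..]->(:Concept {sctid: $sctid})
RETURN DISTINCT c.sctid AS sctid, c.fsn AS fsn
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    subset = session.run(DESCENDANT_OR_SELF, sctid="73211009").data()
print(len(subset), "concepts in the intensional subset")
driver.close()
```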
Subjects
Semantics, Systematized Nomenclature of Medicine, Databases, Factual, Translating
ABSTRACT
We propose a Disease-Symptom graph database for our mobile-assisted e-healthcare application. A large Disease-Symptom graph is stored in the cloud and accessed using mobile devices over the Internet. Query and search are the fundamental operations of graph databases. However, when the Disease-Symptom graph is searched to make a preliminary diagnosis of diseases, queries become complex due to the complex structure of the data and are hard to write and interpret. Moreover, it is not possible to access the graph frequently due to limited network bandwidth, transmission delay, and higher cost. A subgraph generation, or pruning, algorithm for the appropriate inputs is one solution to this problem. In this paper, we propose an efficient pruning algorithm by introducing a new approach that decomposes the Disease-Symptom graph into a series of symptom trees (STs). All the symptom trees are merged to build the pruned subgraph we require. We demonstrate the efficiency and effectiveness of our pruning algorithm both analytically and empirically and validate it on the Disease-Symptom graph database as well as on other real graph databases. A comparison is also made with an efficient existing reachability-based chain cover algorithm, modified into a pruning algorithm called ChainCoverPrune. The two algorithms are tested on storage and access measures for querying synthetic and real directed databases to show the efficiency of the proposed algorithm.
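The pruning algorithm is only outlined above; as a loose illustration of the idea (one shallow tree per reported symptom, merged into a pruned subgraph), the NetworkX sketch below runs a depth-limited BFS from each input symptom on a toy Disease-Symptom graph and composes the results. Node names and the depth limit are illustrative only.

```python
import networkx as nx

# Toy Disease-Symptom graph: symptoms point to candidate diseases.
G = nx.DiGraph()
G.add_edges_from([
    ("fever", "influenza"), ("fever", "malaria"),
    ("cough", "influenza"), ("cough", "bronchitis"),
    ("influenza", "pneumonia"),            # disease-disease progression edge
])

def prune(graph, symptoms, depth=2):
    """Merge one shallow symptom tree per input symptom into a pruned subgraph."""
    pruned = nx.DiGraph()
    for s in symptoms:
        pruned = nx.compose(pruned, nx.bfs_tree(graph, s, depth_limit=depth))
    return pruned

subgraph = prune(G, ["fever", "cough"])
print(sorted(subgraph.nodes()))  # only nodes reachable from the reported symptoms
```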
Subjects
Algorithms, Telemedicine, Datasets as Topic
ABSTRACT
BACKGROUND: While doctors need to analyze large amounts of electronic medical record (EMR) data to conduct clinical research, the analysis process requires information technology (IT) skills, which is difficult for most doctors in China. METHODS: In this paper, we build a novel tool, QAnalysis, in which doctors enter their analytic requirements in natural language and the tool returns charts and tables. For a given question from a user, we first segment the sentence and then use a grammar parser to analyze its structure. After linking the segments to concepts and predicates in knowledge graphs, we convert the question into a set of triples connected by different kinds of operators. These triples are converted to queries in Cypher, the query language for Neo4j. Finally, the query is executed on Neo4j, and the results are returned to the user as tables and charts. RESULTS: The tool supports the top 50 questions we gathered from two hospital departments with the Delphi method. We also gathered 161 questions from clinical research papers with statistical requirements on EMR data. Experimental results show that our tool directly covers 78.20% of these statistical questions, with precision as high as 96.36%. Such extension is easy to achieve with the help of the knowledge graph technology we have adopted. A recorded demo can be accessed at https://github.com/NLP-BigDataLab/QAnalysis-project . CONCLUSION: Our tool shows great flexibility in processing different kinds of statistical questions, providing a convenient way for doctors to obtain statistical results directly in natural language.
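QAnalysis itself is only available through the linked demo; the sketch below merely illustrates the final step described above, composing a Cypher query from concept-predicate triples, using a hypothetical triple format and schema.

```python
# Hypothetical triples produced by the NLP front end:
# (subject label, predicate/relationship type, (object label, property, value))
triples = [
    ("Patient", "DIAGNOSED_WITH", ("Disease", "name", "hypertension")),
    ("Patient", "PRESCRIBED", ("Drug", "name", "amlodipine")),
]

def triples_to_cypher(triples):
    """Compose one aggregate Cypher query from a list of triples (illustrative only)."""
    matches, params = [], {}
    for i, (subj, rel, (obj, prop, value)) in enumerate(triples):
        params[f"v{i}"] = value
        matches.append(f"MATCH (p:{subj})-[:{rel}]->(:{obj} {{{prop}: $v{i}}})")
    return "\n".join(matches) + "\nRETURN count(DISTINCT p) AS patients", params

query, params = triples_to_cypher(triples)
print(query)   # ready to pass to a Neo4j session, e.g. session.run(query, **params)
```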
Subjects
Biomedical Research, Electronic Health Records, Natural Language Processing, China, Humans, Pattern Recognition, Automated, Software
ABSTRACT
IT landscape models represent the real-world IT infrastructure of a company. They include hardware assets such as physical servers and storage media, as well as virtual components such as clusters, virtual machines, and applications. These models are a critical source of information in numerous tasks, including planning, error detection, and impact analysis. The responsible stakeholders often struggle to keep such a large and densely connected model up to date due to its inherent size and complexity, as well as the lack of proper tool support. Even though modeling techniques are well suited to this domain, existing tools do not offer the required features, scalability, or flexibility. To solve these challenges and meet the requirements that arise from this application domain, we combine domain-driven modeling concepts with scalable graph-based repository technology and a custom language for model-level queries. We analyze in detail how we synthesized these requirements from the application domain and how they relate to the features of our repository. We discuss the architecture of our solution, which comprises the entire data management stack, including transactions, queries, versioned persistence, and metamodel evolution. Finally, we evaluate our approach in a case study in which our open-source repository implementation is employed in a production environment in an industrial context, as well as in a comparative benchmark with an existing state-of-the-art solution.
ABSTRACT
BACKGROUND: The rapid generation of omics data in recent years has resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate these data to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of them. RESULTS: The Omics Database Generator (ODG) allows users to create customized databases that integrate published genomics data with experimental data and that can be queried using a flexible graph database. When provided with omics and experimental data, ODG creates a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and it rapidly adds additional layers of annotation to predicted genes. In better-studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user interface for configuring the data import and for querying the database. Queries can also be run from the command line, and the database can be queried directly through programming-language hooks available for most languages. ODG supports most common genomic formats as well as a generic, easy-to-use tab-separated value format for user-provided annotations. CONCLUSIONS: ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or understudied species. For species for which more data are available, ODG can be used to conduct complex multi-omics, pattern-matching queries.
Subjects
Databases, Nucleic Acid, Genomics, Software, Molecular Sequence Annotation
ABSTRACT
BACKGROUND: When modeling in Systems Biology and Systems Medicine, the data are often extensive, complex, and heterogeneous. Graphs are a natural way of representing biological networks. Graph databases enable efficient storage and processing of the encoded biological relationships. They furthermore support queries on the structure of biological networks. RESULTS: We present the Java-based framework STON (SBGN TO Neo4j). STON imports and translates metabolic, signalling, and gene regulatory pathways represented in the Systems Biology Graphical Notation into a graph-oriented format compatible with the Neo4j graph database. CONCLUSION: STON exploits the power of graph databases to store and query complex biological pathways. This advances the possibility of: i) identifying subnetworks in a given pathway; ii) linking networks across different levels of granularity to address difficulties related to incomplete knowledge representation at a single level; and iii) identifying common patterns between pathways in the database.
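STON's actual node and relationship types follow the SBGN glyph and arc classes and are not listed here; the query below is only a sketch of point i), extracting the subnetwork of a single pathway under an assumed, simplified schema.

```python
from neo4j import GraphDatabase

# Assumed, simplified schema: (:Glyph {pathway, label}) nodes linked by [:ARC {arc_class}] relationships.
SUBNETWORK = """
MATCH (a:Glyph {pathway: $pathway})-[r:ARC]->(b:Glyph {pathway: $pathway})
RETURN a.label AS source, r.arc_class AS arc, b.label AS target
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for rec in session.run(SUBNETWORK, pathway="glycolysis"):
        print(rec["source"], "-[", rec["arc"], "]->", rec["target"])
driver.close()
```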
Subjects
Gene Regulatory Networks, Metabolic Networks and Pathways, Signal Transduction, Software, Systems Biology/methods, Databases, Factual, Humans
ABSTRACT
The biological networks controlling plant signal transduction, metabolism, and gene regulation are composed not only of tens of thousands of genes, compounds, proteins, and RNAs but also of the complicated interactions and coordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development, and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets, and third-party databases. Many 'unknown' yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism, and gene regulatory networks. HRGRN utilizes Neo4j, a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds, and small RNAs that were either validated experimentally or predicted computationally. Interactions that are involved in a biological pathway are specially marked with the associated pathway information to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs, and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures, and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/.
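HRGRN's concrete node and relationship types are not spelled out in the abstract; the following sketch merely illustrates the kind of graph path search it describes, looking for a short directed path from a gene to a compound under an assumed (:Gene)/(:Compound) schema with illustrative example values.

```python
from neo4j import GraphDatabase

# Assumed schema: (:Gene {symbol}) and (:Compound {name}) nodes connected by typed interaction edges.
PATH_QUERY = """
MATCH p = shortestPath((g:Gene {symbol: $gene})-[*..6]->(c:Compound {name: $compound}))
RETURN [n IN nodes(p) | coalesce(n.symbol, n.name)] AS path
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    rec = session.run(PATH_QUERY, gene="FLC", compound="salicylic acid").single()
    print(rec["path"] if rec else "no path within 6 hops")
driver.close()
```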
Subjects
Arabidopsis/genetics, Databases, Genetic, Gene Regulatory Networks, Signal Transduction, Algorithms, Arabidopsis/physiology, Arabidopsis Proteins/genetics, Arabidopsis Proteins/metabolism, Internet, Software
ABSTRACT
Inconsistent disease coding standards in medicine create hurdles in data exchange and analysis. This paper proposes a machine learning system to address this challenge. The system automatically matches unstructured medical text (doctor notes, complaints) to ICD-10 codes. It leverages a unique architecture featuring a training layer for model development and a knowledge base that captures relationships between symptoms and diseases. Experiments using data from a large medical research center demonstrated the system's effectiveness in disease classification prediction. Logistic regression emerged as the optimal model due to its superior processing speed, achieving an accuracy of 81.07% with acceptable error rates during high-load testing. This approach offers a promising solution to improve healthcare informatics by overcoming coding standard incompatibility and automating code prediction from unstructured medical text.
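The training data and feature pipeline are not public here; the sketch below shows a generic TF-IDF plus logistic regression text classifier of the kind the abstract describes, with toy notes and ICD-10 codes standing in for real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for unstructured complaints and their ICD-10 codes.
notes = [
    "persistent cough and fever for three days",
    "elevated blood sugar, excessive thirst",
    "chest pain radiating to the left arm",
    "high fasting glucose and frequent urination",
]
codes = ["J06.9", "E11.9", "I20.9", "E11.9"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(notes, codes)
print(model.predict(["thirst and raised glucose levels"]))  # expected to lean towards E11.9
```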
Subjects
Electronic Health Records, International Classification of Diseases, Machine Learning, Natural Language Processing, Humans, Clinical Coding
ABSTRACT
In the Islamic domain, Hadiths hold significant importance, standing as crucial texts after the Holy Quran. Each Hadith contains three main parts: the ISNAD (chain of narrators), the TARAF (starting part, often from Prophet Muhammad), and the MATN (Hadith content). The ISNAD is the chain of narrators involved in transmitting that particular MATN, and Hadith scholars determine the trustworthiness of the transmitted MATN by the quality of the ISNAD. The ISNAD data are available in the original Arabic, with narrator names transliterated into English. This paper presents the Multi-IsnadSet (MIS), which has great potential to be employed by social scientists and theologians. A multi-directed graph structure is used to represent the complex interactions among the narrators of Hadith. The MIS dataset is a directed graph consisting of 2092 nodes, representing individual narrators, and 77,797 edges, representing the Sanad-Hadith connections. The dataset captures multiple ISNADs per Hadith based on the Sahih Muslim Hadith book. It was carefully extracted from multiple online Hadith sources using data scraping and web crawling tools, providing extensive Hadith details. Each dataset entry provides a complete view of a specific Hadith, including the original book, Hadith number, textual content (MATN), list of narrators, narrator count, sequence of narrators, and ISNAD count. Four different tools were used for modeling and analyzing the narrative network: the Python library NetworkX, the powerful graph database Neo4j, and two network analysis tools, Gephi and Cytoscape. The Neo4j graph database is used to represent the multi-dimensional graph data for ease of extraction and for establishing new relationships among nodes. Researchers can use MIS to explore Hadith credibility, including the classification of Hadiths (Sahih = perfection in the Sanad / Dhaif = imperfection in the Sanad) and of narrators (trustworthy or not). Traditionally, scholars have focused on identifying the longest and shortest Sanad between two narrators, but in MIS the emphasis shifts to determining the optimum, authentic Sanad, considering narrator qualities. The graph representation of this authentic and manually curated dataset opens the way for computational models that could identify the significance of a chain and a narrator. The dataset provides Hadith narrators and Hadith ISNADs that can be used in a wide variety of future research studies related to Hadith authentication and rule extraction. Moreover, the dataset encourages cross-disciplinary research, bridging the gap between Islamic studies, artificial intelligence (AI), social network analysis (SNA), and graph neural networks (GNNs).
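As a brief illustration of the kind of analysis the MIS graph enables, rather than code shipped with the dataset, the NetworkX sketch below builds a small multi-directed narrator graph and lists the shortest and all simple Sanad chains between two placeholder narrators.

```python
import networkx as nx

# Toy multi-directed ISNAD graph: an edge A -> B means narrator A transmitted to narrator B.
mis = nx.MultiDiGraph()
mis.add_edges_from([
    ("Narrator_A", "Narrator_B", {"hadith": 101}),
    ("Narrator_B", "Narrator_D", {"hadith": 101}),
    ("Narrator_A", "Narrator_C", {"hadith": 205}),
    ("Narrator_C", "Narrator_B", {"hadith": 205}),
    ("Narrator_B", "Narrator_D", {"hadith": 205}),
])

# Shortest chain of transmission between two narrators.
print(nx.shortest_path(mis, "Narrator_A", "Narrator_D"))

# All simple chains, e.g. as candidates when weighing narrator quality along each Sanad.
for chain in nx.all_simple_paths(mis, "Narrator_A", "Narrator_D"):
    print(chain)
```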
ABSTRACT
PURPOSE: In this work, we present a subsystem of a robotic circulating nurse that produces recommendations for the next supplied sterile item based on incomplete requests from the sterile OR staff, the current situation, predefined knowledge, and experience from previous surgeries. We describe a structure to store and query the underlying information in terms of entities and their relationships of varying strength. METHODS: For the implementation, the graph database Neo4j is used as a core component, together with its query language Cypher. We outline a specific structure of nodes and relationships, i.e., a graph. Primarily, it allows entities such as surgeons, surgery types, and items, as well as their complex interconnectivity, to be represented. In addition, it enables given situations and partial requests in the OR to be matched with corresponding subgraphs. The subgraphs provide suitable sterile items and allow them to be prioritized according to their utilization frequency. RESULTS: The graph database was populated with existing data from 854 surgeries describing the intraoperative use of sterile items. A test scenario was evaluated in which a request for "Prolene" is made during a cholecystectomy. The software identifies a specific "Prolene" suture material as the most probably requested sterile item because of its utilization frequency of over 95%. Other "Prolene" suture materials were used in less than 15% of the cholecystectomies. CONCLUSION: We have proposed a graph database for the selection of sterile items in the operating room. The example shows how partial information from different sources can be easily integrated into a query, leading to a unique result. Finally, we propose possible enhancements to further improve the quality of the recommendations. In the next step, the recommendations of the software will be evaluated in real time during surgeries.
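The exact node and relationship structure is described in the paper rather than the abstract; as a hedged sketch of the recommendation query, the Cypher below assumes (:SurgeryType)-[:USED {frequency}]->(:Item) relationships and ranks items whose names match a partial request.

```python
from neo4j import GraphDatabase

# Assumed schema: (:SurgeryType {name})-[:USED {frequency}]->(:Item {name}).
RECOMMEND = """
MATCH (:SurgeryType {name: $surgery})-[u:USED]->(i:Item)
WHERE toLower(i.name) CONTAINS toLower($request)
RETURN i.name AS item, u.frequency AS frequency
ORDER BY u.frequency DESC
LIMIT 3
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for rec in session.run(RECOMMEND, surgery="cholecystectomy", request="Prolene"):
        print(rec["item"], rec["frequency"])
driver.close()
```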
Subjects
Software, Humans, Databases, Factual
ABSTRACT
The German Medical Informatics Initiative (MII) aims to increase the interoperability and reuse of clinical routine data for research purposes. One important result of the MII work is a Germany-wide common core data set (CDS), which is to be provided by more than 31 data integration centers (DIZ) following a strict specification. One standard format for data sharing is HL7/FHIR. Locally, classical data warehouses are often in use for data storage and retrieval. We are interested in investigating the advantages of a graph database in this setting. After transferring the MII CDS into a graph, storing it in a graph database, and subsequently enriching it with accompanying meta-information, we see great potential for more sophisticated data exploration and analysis. Here we describe the extract-transform-load (ETL) process that we set up as a proof of concept to achieve the transformation and to make the common core data set accessible as a graph.
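The concrete ETL implementation is not included in the abstract; the snippet below is a minimal sketch of one step of such a process, loading a single FHIR Patient resource (JSON) into Neo4j with MERGE so repeated runs remain idempotent. The chosen fields, labels, and credentials are assumptions.

```python
import json
from neo4j import GraphDatabase

# Minimal FHIR Patient resource as it might arrive from a DIZ endpoint.
patient = json.loads("""
{"resourceType": "Patient", "id": "example-1", "gender": "female", "birthDate": "1970-01-01"}
""")

LOAD_PATIENT = """
MERGE (p:Patient {fhir_id: $id})
SET p.gender = $gender, p.birth_date = $birthDate
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run(LOAD_PATIENT, id=patient["id"],
                gender=patient.get("gender"), birthDate=patient.get("birthDate"))
driver.close()
```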
Subjects
Electronic Health Records, Information Storage and Retrieval, Information Dissemination, Data Warehousing, Databases, Factual, Health Level Seven
ABSTRACT
Navigating a real-world map can be represented as a bi-directed graph, with a group of nodes representing the intersections and edges representing the roads between them. In cycling, we can plan training as a group of nodes and edges the athlete must cover. Optimizing routes using artificial intelligence is a well-studied problem, and much work has been done on finding the quickest and shortest paths between two points. In cycling, however, the solution is not necessarily the shortest or quickest path: the optimum path is the one on which a cyclist covers a suitable distance, ascent, and descent based on his or her training parameters. This paper presents a Neo4j graph-based dataset of cycling routes in Slovenia. It consists of 152,659 nodes representing individual road intersections and 410,922 edges representing the roads between them. The dataset allows researchers to develop and optimize cycling training generation algorithms that take distance, ascent, descent, and road type into account.
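The dataset itself lives in Neo4j, but the route-selection idea can be illustrated offline: the NetworkX sketch below finds the distance-shortest path on a toy intersection graph and reports its total ascent, the kind of trade-off a training-generation algorithm would weigh. All numbers are invented.

```python
import networkx as nx

# Toy road graph: nodes are intersections, edge attributes mimic the dataset's properties.
roads = nx.Graph()
roads.add_edge("A", "B", distance=4.2, ascent=35)
roads.add_edge("B", "C", distance=3.1, ascent=120)
roads.add_edge("A", "D", distance=6.0, ascent=10)
roads.add_edge("D", "C", distance=5.5, ascent=15)

route = nx.dijkstra_path(roads, "A", "C", weight="distance")
ascent = sum(roads[u][v]["ascent"] for u, v in zip(route, route[1:]))
print(route, "total ascent:", ascent, "m")
```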
ABSTRACT
The knowledge graph is one of the essential infrastructures of artificial intelligence, and constructing a high-quality domain knowledge graph from multi-source heterogeneous data is a challenge for knowledge engineering. We propose a complete process framework for constructing a knowledge graph that combines structured and unstructured data, which includes data processing, information extraction, knowledge fusion, data storage, and update strategies, aiming to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction process of an enterprise knowledge graph as an example and integrate enterprise register information, litigation-related information, and enterprise announcement information to enrich the enterprise knowledge graph. For unstructured text, we improve an existing model to extract triples, and the F1-score of our model reaches 72.77%. The numbers of nodes and edges in our constructed enterprise knowledge graph reach 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage, and we carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely updates can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed at HuaRong RongTong (Beijing) Technology Co., Ltd. and is used by the staff as a powerful tool for corporate due diligence. The key features are reported and analyzed in the case study. Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction, as well as demonstrating its application in corporate due diligence.
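The extraction model is the paper's own; the sketch below shows only the storage step in a generic way, writing extracted (head, relation, tail) triples into Neo4j with MERGE so that entities mentioned in several sources are fused into single nodes. Labels and example triples are illustrative.

```python
from neo4j import GraphDatabase

# Triples as they might come out of the information-extraction stage.
triples = [
    ("Acme Ltd", "REGISTERED_IN", "Beijing"),
    ("Acme Ltd", "LITIGATION_WITH", "Beta Corp"),
]

# Relationship types cannot be parameterized in plain Cypher, so the relation
# is stored as a property on a generic :REL edge in this sketch.
WRITE_TRIPLE = """
MERGE (h:Entity {name: $head})
MERGE (t:Entity {name: $tail})
MERGE (h)-[r:REL {type: $rel}]->(t)
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for head, rel, tail in triples:
        session.run(WRITE_TRIPLE, head=head, rel=rel, tail=tail)
driver.close()
```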
ABSTRACT
Semantic interoperability establishes intercommunication and enables data sharing across disparate systems. In this study, we propose an ostensive information architecture for healthcare information systems to decrease the ambiguity caused by using signs in different contexts for different purposes. The ostensive information architecture adopts a consensus-based approach initiated from the perspective of information system re-design and can be applied to other domains where information exchange is required between heterogeneous systems. Driven by the issues in FHIR (Fast Healthcare Interoperability Resources) implementation, an ostensive approach that supplements the current lexical approach to semantic exchange is proposed. A Semantic Engine with an FHIR knowledge graph as its core is constructed using Neo4j to provide semantic interpretation and examples. The MIMIC III (Medical Information Mart for Intensive Care) datasets and diabetes datasets have been employed to demonstrate the effectiveness of the proposed information architecture. We further discuss the benefits of separating semantic interpretation from data storage from the perspective of information system design, and the semantic reasoning toward patient-centric care underpinned by the Semantic Engine.
ABSTRACT
While the continuing decline in genotyping and sequencing costs has largely benefited plant research, some key species for meeting the challenges of agriculture remain mostly understudied. As a result, heterogeneous datasets for different traits are available for a significant number of these species. Because gene structures and functions are to some extent conserved through evolution, comparative genomics can be used to transfer available knowledge from one species to another. However, such a translational research approach is complex due to the multiplicity of data sources and the non-harmonized description of the data. Here, we provide two pipelines, referred to as the structural and functional pipelines, to create a framework for a NoSQL graph database (Neo4j) to integrate and query heterogeneous data from multiple species. We call this framework the Orthology-driven knowledge base framework for translational research (Ortho_KB). The structural pipeline builds bridges across species based on orthology. The functional pipeline integrates biological information, including QTL and RNA-sequencing datasets, and uses the backbone from the structural pipeline to connect orthologs in the database. Queries can be written using the Neo4j Cypher language and can, for instance, identify genes controlling a common trait across species. To explore the possibilities offered by such a framework, we populated Ortho_KB to obtain OrthoLegKB, an instance dedicated to legumes. The proposed model was evaluated by studying the conservation of a flowering-promoting gene. Through a series of queries, we have demonstrated that our graph knowledge base provides an intuitive and powerful platform to support research and development programmes.
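OrthoLegKB's exact schema is described in the paper rather than here; the query below is only a sketch of the kind of cross-species question mentioned above, following orthology edges from a well-studied gene to candidate genes located in QTL for the same trait in another species, under an assumed schema and with illustrative example values.

```python
from neo4j import GraphDatabase

# Assumed schema: (:Gene {name, species})-[:ORTHOLOG_OF]-(:Gene)-[:LOCATED_IN]->(:QTL {trait}).
CROSS_SPECIES = """
MATCH (g:Gene {name: $gene})-[:ORTHOLOG_OF]-(o:Gene)-[:LOCATED_IN]->(q:QTL)
WHERE q.trait = $trait
RETURN o.species AS species, o.name AS candidate_gene, q.trait AS trait
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for rec in session.run(CROSS_SPECIES, gene="FT", trait="flowering time"):
        print(rec["species"], rec["candidate_gene"], rec["trait"])
driver.close()
```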
ABSTRACT
The current management of patients with multimorbidity is suboptimal, with either a single-disease approach to care or treatment guideline adaptations that result in poor adherence due to their complexity. Although this has resulted in calls for more holistic and personalized approaches to prescribing, progress toward these goals has remained slow. With the rapid advancement of machine learning (ML) methods, promising approaches now also exist to accelerate precision medicine in multimorbidity. These include analyzing disease comorbidity networks, using knowledge graphs that integrate knowledge from different medical domains, and applying network analysis and graph ML. Multimorbidity disease networks have been used to improve disease diagnosis, treatment recommendations, and patient prognosis. Knowledge graphs that combine different medical entities connected by multiple relationship types integrate data from different sources, allowing for complex interactions and creating a continuous flow of information. Network analysis and graph ML can then extract the topology and structure of networks and reveal hidden properties, including disease phenotypes, network hubs, and pathways; predict drugs for repurposing; and determine safe and more holistic treatments. In this article, we describe the basic concepts of creating bipartite and unipartite disease and patient networks and review the use of knowledge graphs, graph algorithms, graph embedding methods, and graph ML within the context of multimorbidity. Specifically, we provide an overview of the application of graph theory for studying multimorbidity, the methods employed to extract knowledge from graphs, and examples of the application of disease networks for determining the structure and pathways of multimorbidity, identifying disease phenotypes, predicting health outcomes, and selecting safe and effective treatments. In today's data-hungry, ML-focused world, such network-based techniques are likely to be at the forefront of developing robust clinical decision support tools for safer and more holistic approaches to treating older patients with multimorbidity.
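To make the bipartite and unipartite constructions concrete, the sketch below builds a toy patient-disease bipartite network in NetworkX and projects it onto diseases so that edge weights count the patients who share each disease pair; it is a minimal illustration of the review's first step, not any specific study's method.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite patient-disease network.
B = nx.Graph()
B.add_nodes_from(["p1", "p2", "p3"], bipartite="patient")
B.add_nodes_from(["diabetes", "hypertension", "CKD"], bipartite="disease")
B.add_edges_from([
    ("p1", "diabetes"), ("p1", "hypertension"),
    ("p2", "diabetes"), ("p2", "hypertension"), ("p2", "CKD"),
    ("p3", "hypertension"), ("p3", "CKD"),
])

# Unipartite disease comorbidity network: edge weight = number of shared patients.
comorbidity = bipartite.weighted_projected_graph(B, ["diabetes", "hypertension", "CKD"])
for u, v, data in comorbidity.edges(data=True):
    print(u, "-", v, "shared patients:", data["weight"])
```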