Results 1 - 20 of 91
1.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35419584

ABSTRACT

Gene Ontology (GO) is widely used in the biological domain. It is the most comprehensive ontology providing formal representation of gene functions (GO concepts) and relations between them. However, unintentional quality defects (e.g. missing or erroneous relations) in GO may exist due to the large number of GO concepts and the complexity of GO structures. Such quality defects would impact the results of GO-based analyses and applications. In this work, we introduce a novel evidence-based lexical pattern approach for quality assurance of GO relations. We leverage two layers of evidence to suggest potentially missing relations in GO as follows. We first utilize related concept pairs (i.e. existing relations) in GO to extract relationship-specific lexical patterns, which serve as the first layer of evidence to automatically suggest potentially missing relations between unrelated concept pairs. For each suggested missing relation, we further identify two other existing relations as the second layer of evidence that resemble the difference between the missing relation and the existing relation based on which the missing relation is suggested. Applied to the 15 December 2021 release of GO, this approach suggested a total of 866 potentially missing relations. Local domain experts evaluated the entire set of potentially missing relations, and identified 821 as missing relations and 45 as indicating erroneous existing relations. We submitted these findings to the GO consortium for further validation and received encouraging feedback. These results indicate that our evidence-based approach can be utilized to uncover missing relations and erroneous existing relations in GO.


Subject(s)
Gene Ontology
2.
BMC Med Inform Decis Mak ; 24(Suppl 3): 103, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641585

ABSTRACT

BACKGROUND: Alzheimer's Disease (AD) is a devastating disease that destroys memory and other cognitive functions. There has been an increasing research effort to prevent and treat AD. In the US, two major data sharing resources for AD research are the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). Additionally, the National Institutes of Health (NIH) Common Data Elements (CDE) Repository has been developed to facilitate data sharing and improve the interoperability among data sets in various disease research areas. METHOD: To better understand how AD-related data elements in these resources are interoperable with each other, we leverage different representation models to map data elements from different resources: NACC to ADNI, NACC to NIH CDE, and ADNI to NIH CDE. We explore bag-of-words based and word embeddings based models (Word2Vec and BioWordVec) to perform the data element mappings in these resources. RESULTS: The data dictionaries downloaded on November 23, 2021 contain 1,195 data elements in NACC, 13,918 in ADNI, and 27,213 in NIH CDE Repository. Data element preprocessing reduced the numbers of NACC and ADNI data elements for mapping to 1,099 and 7,584 respectively. Manual evaluation of the mapping results showed that the bag-of-words based approach achieved the best precision, while the BioWordVec based approach attained the best recall. In total, the three approaches mapped 175 out of 1,099 (15.92%) NACC data elements to ADNI; 107 out of 1,099 (9.74%) NACC data elements to NIH CDE; and 171 out of 7,584 (2.25%) ADNI data elements to NIH CDE. CONCLUSIONS: The bag-of-words based and word embeddings based approaches showed promise in mapping AD-related data elements between different resources. Although the mapping approaches need further improvement, our results indicate that there is a critical need to standardize CDEs across these valuable AD research resources in order to maximize the discoveries regarding AD pathophysiology, diagnosis, and treatment that can be gleaned from them.
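
As a rough illustration of the bag-of-words mapping idea in this abstract, the sketch below scores pairs of data element descriptions by the Jaccard overlap of their word sets and keeps the best-scoring target above a cutoff. The abstract does not state the similarity measure, preprocessing, or threshold actually used, so the function names, the 0.5 threshold, and the sample descriptions are all assumptions made for illustration.

```python
import re


def bag_of_words(text):
    """Lowercase a description and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_elements(source, target, threshold=0.5):
    """For each source element, keep its best-scoring target if it passes the threshold.

    `threshold` is a hypothetical cutoff, not a value taken from the study.
    """
    mappings = []
    for s_id, s_text in source.items():
        s_bag = bag_of_words(s_text)
        scored = [(t_id, jaccard(s_bag, bag_of_words(t_text)))
                  for t_id, t_text in target.items()]
        if not scored:
            continue
        best_id, best_score = max(scored, key=lambda pair: pair[1])
        if best_score >= threshold:
            mappings.append((s_id, best_id, best_score))
    return mappings


# Hypothetical data element descriptions, not taken from NACC or ADNI.
nacc = {"N1": "Subject age at visit", "N2": "Years of education"}
adni = {"A1": "Age at visit", "A2": "Participant education in years", "A3": "Body weight"}
mappings = map_elements(nacc, adni)
```

A word-set measure like this favors precision because it only matches on shared surface tokens; an embedding-based score could be swapped into `jaccard` to trade precision for recall.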


Subject(s)
Alzheimer Disease , United States/epidemiology , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/epidemiology , Common Data Elements , Neuroimaging , National Institutes of Health (U.S.)
3.
BMC Med Inform Decis Mak ; 23(Suppl 1): 87, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37161566

ABSTRACT

BACKGROUND: Biomedical ontologies are representations of biomedical knowledge that provide terms with precisely defined meanings. They play a vital role in facilitating biomedical research in a cross-disciplinary manner. Quality issues of biomedical ontologies will hinder their effective usage. One such quality issue is missing concepts. In this study, we introduce a logical definition-based approach to identify potential missing concepts in SNOMED CT. A unique contribution of our approach is that it is capable of obtaining both logical definitions and fully specified names for potential missing concepts. METHOD: The logical definitions of unrelated pairs of fully defined concepts in non-lattice subgraphs that indicate quality issues are intersected to generate the logical definitions of potential missing concepts. A text summarization model (called PEGASUS) is fine-tuned to predict the fully specified names of the potential missing concepts from their generated logical definitions. Furthermore, the identified potential missing concepts are validated using external resources including the Unified Medical Language System (UMLS), biomedical literature in PubMed, and a newer version of SNOMED CT. RESULTS: From the March 2021 US Edition of SNOMED CT, we obtained a total of 30,313 unique logical definitions for potential missing concepts through the intersecting process. We fine-tuned a PEGASUS summarization model with 289,169 training instances and tested it on 36,146 instances. The model achieved ROUGE-1, ROUGE-2, and ROUGE-L scores of 72.83, 51.06, and 71.76, respectively, on the test dataset. The model correctly predicted 11,549 out of 36,146 fully specified names in the test dataset. Applying the fine-tuned model to the 30,313 unique logical definitions, 23,031 total potential missing concepts were identified. Out of these, a total of 2,312 (10.04%) were automatically validated by at least one of the three resources. CONCLUSIONS: The results showed that our logical definition-based approach for identification of potential missing concepts in SNOMED CT is encouraging. Nevertheless, there is still room for improving the performance of naming concepts based on logical definitions.
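
The intersecting step described in this abstract can be pictured with a small sketch. Here each logical definition is modeled, as a simplifying assumption, as a set of (attribute, value) pairs, and two definitions are intersected with plain set intersection; the actual method works on SNOMED CT definitions within non-lattice subgraphs and may treat attribute values hierarchically. The function name, attribute names, and concepts are hypothetical.

```python
def candidate_definitions(definitions, pairs):
    """Intersect the definitions of each given concept pair.

    definitions: concept name -> frozenset of (attribute, value) pairs.
    pairs: concept pairs to intersect (assumed to be unrelated pairs chosen upstream).
    Returns: intersection -> list of pairs producing it, skipping empty
    intersections and those that already define an existing concept.
    """
    existing = set(definitions.values())
    candidates = {}
    for a, b in pairs:
        common = definitions[a] & definitions[b]
        if common and common not in existing:
            candidates.setdefault(common, []).append((a, b))
    return candidates


# Hypothetical toy definitions, not actual SNOMED CT content.
definitions = {
    "Open fracture of femur": frozenset({("morphology", "open fracture"), ("site", "femur")}),
    "Closed fracture of femur": frozenset({("morphology", "closed fracture"), ("site", "femur")}),
}
candidates = candidate_definitions(
    definitions, [("Open fracture of femur", "Closed fracture of femur")]
)
```

In this toy case the shared part, a finding located at the femur, becomes the definition of a candidate concept that would still need a name, which is the role the fine-tuned summarization model plays in the study.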


Subject(s)
Biological Ontologies , Biomedical Research , Humans , Systematized Nomenclature of Medicine , Knowledge , Language
4.
BMC Med Inform Decis Mak ; 23(Suppl 1): 151, 2023 08 04.
Article in English | MEDLINE | ID: mdl-37542312

ABSTRACT

BACKGROUND: In the United States, the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI) are two major data sharing resources for Alzheimer's Disease (AD) research. NACC and ADNI strive to make their data more FAIR (findable, interoperable, accessible and reusable) for the broader research community. However, there is limited work harmonizing and supporting cross-cohort interoperability of the two resources. METHOD: In this paper, we leverage an ontology-based approach to harmonize data elements in the two resources and develop a web-based query system to search patient cohorts across the two resources. We first mapped data elements across NACC and ADNI, and performed value harmonization for the mapped data elements with inconsistent permissible values. Then we built an Alzheimer's Disease Data Element Ontology (ADEO) to model the mapped data elements in NACC and ADNI. We further developed a prototype cross-cohort query system to search patient cohorts across NACC and ADNI. RESULTS: After manual review, we found 172 mappings between NACC and ADNI. These 172 mappings were further used to construct common concepts in ADEO. Our data element mapping and harmonization resulted in five files storing common concepts, variables in NACC and ADNI, mappings between variables and common concepts, permissible values of categorical type data elements, and coding inconsistency harmonization, respectively. Our cross-cohort query system consists of three core architectural elements: a web-based interface, an advanced query engine, and a backend MongoDB database. CONCLUSIONS: In this work, ADEO has been specifically designed to facilitate data harmonization and cross-cohort query of NACC and ADNI data resources. 
Although our prototype cross-cohort query system was developed for exploring NACC and ADNI, its backend and frontend framework has been designed and implemented to be generally applicable to other domains for querying patient cohorts from multiple heterogeneous data sources.


Subject(s)
Alzheimer Disease , Humans , United States , Alzheimer Disease/diagnostic imaging , Neuroimaging
5.
BMC Bioinformatics ; 23(Suppl 6): 281, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35836130

ABSTRACT

BACKGROUND: Model card reports aim to provide informative and transparent descriptions of machine learning models to stakeholders. This report document is of interest to the National Institutes of Health's Bridge2AI initiative to address the FAIR challenges with artificial intelligence-based machine learning models for biomedical research. We present our early undertaking in developing an ontology for capturing the conceptual-level information embedded in model card reports. RESULTS: Sourcing from existing ontologies and developing the core framework, we generated the Model Card Report Ontology. Our development efforts yielded an OWL2-based artifact that represents and formalizes model card report information. The current release of this ontology utilizes standard concepts and properties from OBO Foundry ontologies. Also, the software reasoner indicated no logical inconsistencies with the ontology. With sample model cards of machine learning models for bioinformatics research (HIV social networks and adverse outcome prediction for stent implantation), we showed the coverage and usefulness of our model in transforming static model card reports to a computable format for machine-based processing. CONCLUSIONS: The benefit of our work is that it utilizes the expansive and standard terminologies and scientific rigor promoted by biomedical ontologists, and that it provides an avenue to make model cards machine-readable using semantic web technology. Our future goal is to assess the veracity of our model and later expand the model to include additional concepts to address terminological gaps. We discuss tools and software that will utilize our ontology for potential application services.


Subject(s)
Biological Ontologies , Semantics , Artificial Intelligence , Computational Biology , Machine Learning , Software
6.
J Biomed Inform ; 134: 104162, 2022 10.
Article in English | MEDLINE | ID: mdl-36029954

ABSTRACT

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) provides a unified model to integrate disparate real-world data (RWD) sources. An integral part of the OMOP CDM is the Standardized Vocabularies (henceforth referred to as the OMOP vocabulary), which enables organization and standardization of medical concepts across various clinical domains of the OMOP CDM. For concepts with the same meaning from different source vocabularies, one is designated as the standard concept, while the others are specified as non-standard or source concepts and mapped to the standard one. However, due to the heterogeneity of source vocabularies, there may exist mapping issues such as erroneous mappings and missing mappings in the OMOP vocabulary, which could affect the results of downstream analyses with RWD. In this paper, we focus on quality assurance of vaccine concept mappings in the OMOP vocabulary, which is necessary to accurately harness the power of RWD on vaccines. We introduce a semi-automated lexical approach to audit vaccine mappings in the OMOP vocabulary. We generated two types of vaccine-pairs: mapped and unmapped, where mapped vaccine-pairs are pairs of vaccine concepts with a "Maps to" relationship, while unmapped vaccine-pairs are those without a "Maps to" relationship. We represented each vaccine concept name as a set of words, and derived term-difference pairs (i.e., name differences) for mapped and unmapped vaccine-pairs. If the same term-difference pair can be obtained from both mapped and unmapped vaccine-pairs, then this is considered a potential mapping inconsistency. Applying this approach to the vaccine mappings in OMOP, a total of 2087 potential mapping inconsistencies were obtained. A random sample of 200 was evaluated by domain experts to identify, validate, and categorize the inconsistencies. Experts identified 95 cases revealing valid mapping issues. The remaining 105 cases were found to be invalid due to the external and/or contextual information used in the mappings that was not reflected in the concept names of vaccines. This indicates that our semi-automated approach shows promise in identifying mapping inconsistencies among vaccine concepts in the OMOP vocabulary.
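
The term-difference idea in this abstract lends itself to a short sketch. Each concept name is treated as a set of words, the difference pair for two names is the pair of word sets unique to each side, and an unmapped pair is flagged when its difference pair also occurs among mapped pairs. Whitespace tokenization, the function names, and the vaccine names are assumptions for illustration; the abstract does not describe the preprocessing actually used.

```python
def term_difference(name_a, name_b):
    """Return (words only in name_a, words only in name_b) as frozensets."""
    a = set(name_a.lower().split())
    b = set(name_b.lower().split())
    return (frozenset(a - b), frozenset(b - a))


def find_inconsistencies(mapped_pairs, unmapped_pairs):
    """Flag unmapped pairs whose term-difference pair also occurs in a mapped pair."""
    mapped_diffs = {}
    for pair in mapped_pairs:
        mapped_diffs.setdefault(term_difference(*pair), []).append(pair)
    flagged = []
    for pair in unmapped_pairs:
        diff = term_difference(*pair)
        if diff in mapped_diffs:
            # Keep the supporting mapped pairs as evidence for reviewers.
            flagged.append((diff, mapped_diffs[diff], pair))
    return flagged


# Hypothetical concept names, not taken from the OMOP vocabulary.
mapped = [("influenza vaccine injection", "influenza vaccine")]
unmapped = [
    ("rabies vaccine injection", "rabies vaccine"),
    ("rabies vaccine", "polio vaccine"),
]
flagged = find_inconsistencies(mapped, unmapped)
```

Only the first unmapped pair is flagged here, because it differs by the same word ("injection") as a pair that is mapped. As the abstract notes, such flags are candidates for expert review and do not by themselves establish an error.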


Subject(s)
Vaccines , Vocabulary , Quality Improvement , Controlled Vocabulary
7.
Bioinformatics ; 36(10): 3207-3214, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32065617

ABSTRACT

MOTIVATION: The Gene Ontology (GO) is the unifying biological vocabulary for codifying, managing and sharing biological knowledge. Quality issues in GO, if not addressed, can cause misleading results or missed biological discoveries. Manual identification of potential quality issues in GO is a challenging and arduous task, given its growing size. We introduce an automated auditing approach for suggesting potentially missing is-a relations, which may further reveal erroneous is-a relations. RESULTS: We developed a Subsumption-based Sub-term Inference Framework (SSIF) by leveraging a novel term-algebra on top of a sequence-based representation of GO concepts along with three conditional rules (monotonicity, intersection and sub-concept rules). Applying SSIF to the October 3, 2018 release of GO suggested 1938 unique potentially missing is-a relations. Domain experts evaluated a random sample of 210 potentially missing is-a relations. The results showed SSIF achieved a precision of 60.61, 60.49 and 46.03% for the monotonicity, intersection and sub-concept rules, respectively. AVAILABILITY AND IMPLEMENTATION: SSIF is implemented in Java. The source code is available at https://github.com/rashmie/SSIF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Gene Ontology
8.
Epilepsia ; 62 Suppl 2: S106-S115, 2021 03.
Article in English | MEDLINE | ID: mdl-33529363

ABSTRACT

Big Data is no longer a novel concept in health care. Its promise of positive impact is not only undiminished, but daily enhanced by seemingly endless possibilities. Epilepsy is a disorder with wide heterogeneity in both clinical and research domains, and thus lends itself to Big Data concepts and techniques. It is therefore inevitable that Big Data will enable multimodal research, integrating various aspects of "-omics" domains, such as phenome, genome, microbiome, metabolome, and proteome. This scope and granularity have the potential to change our understanding of prognosis and mortality in epilepsy. The scale of new discovery is unprecedented due to the possibilities promised by advances in machine learning, in particular deep learning. The subsequent possibilities of personalized patient care through clinical decision support systems that are evidence-based, adaptive, and iterative seem to be within reach. A major objective is not only to inform decision-making, but also to reduce uncertainty in outcomes. Although the adoption of electronic health record (EHR) systems is near universal in the United States, for example, advanced clinical decision support in or ancillary to EHRs remains sporadic. In this review, we discuss the role of Big Data in the development of clinical decision support systems for epilepsy care, prognostication, and discovery.


Subject(s)
Big Data , Clinical Decision Support Systems/trends , Epilepsy/diagnosis , Epilepsy/therapy , Electronic Health Records/trends , Humans , Prognosis
9.
J Med Internet Res ; 23(2): e22939, 2021 02 12.
Article in English | MEDLINE | ID: mdl-33576745

ABSTRACT

BACKGROUND: While electronic health records (EHR) bring various benefits to health care, EHR systems are often criticized as cumbersome to use, failing to fulfill the promise of improved health care delivery with little more than a means of meeting regulatory and billing requirements. EHR has also been recognized as one of the contributing factors for physician burnout. OBJECTIVE: Specialty-specific EHR systems have been suggested as an alternative approach that can potentially address challenges associated with general-purpose EHRs. We introduce the Epilepsy Tracking and optimized Management engine (EpiToMe), an exemplar bespoke EHR system for epilepsy care. EpiToMe uses an agile, physician-centered development strategy to optimize clinical workflow and patient care documentation. We present the design and implementation of EpiToMe and report the initial feedback on its utility for physician burnout. METHODS: Using collaborative, asynchronous data capturing interfaces anchored to a domain ontology, EpiToMe distributes reporting and documentation workload among technicians, clinical fellows, and attending physicians. Results of documentation are transmitted to the parent EHR to meet patient care requirements with a push of a button. An HL7 (version 2.3) messaging engine exchanges information between EpiToMe and the parent EHR to optimize clinical workflow tasks without redundant data entry. EpiToMe also provides live, interactive patient tracking interfaces to ease the burden of care management. RESULTS: Since February 2019, 15,417 electroencephalogram reports, 2635 Epilepsy Monitoring Unit daily reports, and 1369 Epilepsy Monitoring Unit phase reports have been completed in EpiToMe for 6593 unique patients. A 10-question survey was completed by 11 (among 16 invited) senior clinical attending physicians. Consensus was found that EpiToMe eased the burden of care documentation for patient management, a contributing factor to physician burnout. 
CONCLUSIONS: EpiToMe offers an exemplar bespoke EHR system developed using a physician-centered design and latest advancements in information technology. The bespoke approach has the potential to ease the burden of care management in epilepsy. This approach is applicable to other clinical specialties.


Subject(s)
Electronic Health Records/standards , Epilepsy/therapy , Humans , Qualitative Research , Surveys and Questionnaires
10.
BMC Med Inform Decis Mak ; 21(Suppl 7): 234, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34753458

ABSTRACT

BACKGROUND: As biomedical knowledge is rapidly evolving, concept enrichment of biomedical terminologies is an active research area involving automatic identification of missing or new concepts. Previously, we prototyped a lexical-based formal concept analysis (FCA) approach in which concepts were derived by intersecting bags of words, to identify potentially missing concepts in the National Cancer Institute (NCI) Thesaurus. However, this prototype did not handle concept naming and positioning. In this paper, we introduce a sequence-based FCA approach to identify potentially missing concepts, supporting concept naming and positioning. METHODS: We consider the concept name sequences as FCA attributes to construct the formal context. The concept-forming process is performed by computing the longest common substrings of concept name sequences. After new concepts are formalized, we further predict their potential positions in the original hierarchy by identifying their supertypes and subtypes from original concepts. Automated validation via external terminologies in the Unified Medical Language System (UMLS) and biomedical literature in PubMed is performed to evaluate the effectiveness of our approach. RESULTS: We applied our sequence-based FCA approach to all the sub-hierarchies under Disease or Disorder in the NCI Thesaurus (19.08d version) and five sub-hierarchies under Clinical Finding and Procedure in the SNOMED CT (US Edition, March 2020 release). In total, 1397 potentially missing concepts were identified in the NCI Thesaurus and 7223 in the SNOMED CT. For NCI Thesaurus, 85 potentially missing concepts were found in external terminologies and 315 of the remaining 1312 appeared in biomedical literature. For SNOMED CT, 576 were found in external terminologies and 1159 out of the remaining 6647 were found in biomedical literature. 
CONCLUSION: Our sequence-based FCA approach has shown promise for identifying potentially missing concepts in biomedical terminologies.
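
The concept-forming step in this abstract, computing longest common substrings of concept name sequences, can be illustrated with a small sketch that treats each name as a sequence of words and finds the longest contiguous run shared by two names. This is a pairwise simplification: the full FCA procedure described above builds a formal context over all concepts, which is not reproduced here. Function names and concept names are hypothetical.

```python
from itertools import combinations


def longest_common_word_run(seq_a, seq_b):
    """Longest contiguous run of words present in both sequences (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(seq_b) + 1)
    for i in range(1, len(seq_a) + 1):
        curr = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            if seq_a[i - 1] == seq_b[j - 1]:
                # Extend the run ending at the previous word of both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return seq_a[best_end - best_len:best_end]


def candidate_concepts(names):
    """Names formed by shared word runs that are not already existing concept names."""
    existing = {tuple(name.lower().split()) for name in names}
    found = set()
    for a, b in combinations(sorted(existing), 2):
        run = tuple(longest_common_word_run(a, b))
        if run and run not in existing:
            found.add(" ".join(run))
    return found


# Hypothetical concept names, not taken from the NCI Thesaurus or SNOMED CT.
names = ["Stage I breast cancer", "Stage II breast cancer", "Lung cancer"]
candidates = candidate_concepts(names)
```

Working on word sequences instead of bags of words keeps word order, so each candidate comes with a readable name, which is the naming benefit the abstract attributes to the sequence-based variant.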


Subject(s)
Systematized Nomenclature of Medicine , Unified Medical Language System , Humans , PubMed , Controlled Vocabulary
11.
BMC Med Inform Decis Mak ; 20(Suppl 10): 301, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319696

ABSTRACT

Biological and biomedical ontologies and terminologies are used to organize and store various domain-specific knowledge to provide standardization of terminology usage and to improve interoperability. The growing number of such ontologies and terminologies and their increasing adoption in clinical, research and healthcare settings call for effective and efficient quality assurance and semantic enrichment techniques of these ontologies and terminologies. In this editorial, we provide an introductory summary of nine articles included in this supplement issue for quality assurance and enrichment of biological and biomedical ontologies and terminologies. The articles cover a range of standards including SNOMED CT, National Cancer Institute Thesaurus, Unified Medical Language System, North American Association of Central Cancer Registries and OBO Foundry Ontologies.


Subject(s)
Biological Ontologies , Humans , Semantics , Systematized Nomenclature of Medicine , Unified Medical Language System , Controlled Vocabulary
12.
BMC Med Inform Decis Mak ; 20(Suppl 10): 273, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319703

ABSTRACT

BACKGROUND: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype utilizing lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus. However, no domain expert evaluation was provided in our previous work. In this paper, we further enhance the hybrid approach by leveraging a novel lexical feature: roots of noun chunks within concept names. Formal evaluation of our enhanced approach is also performed. METHOD: We first compute all the non-lattice subgraphs in the NCI Thesaurus. We model each concept using its role definitions, and the words and roots of noun chunks within its concept name and its ancestors' names. Then we perform subsumption testing for candidate concept pairs in the non-lattice subgraphs to automatically detect potentially missing IS-A relations. Domain experts evaluated the validity of these relations. RESULTS: We applied our approach to the 19.08d version of the NCI Thesaurus. A total of 55 potentially missing IS-A relations were identified by our approach and reviewed by domain experts. 29 out of 55 were confirmed as valid by domain experts and have been incorporated in the newer versions of the NCI Thesaurus. 7 out of 55 further revealed incorrect existing IS-A relations in the NCI Thesaurus. CONCLUSIONS: The results showed that our hybrid approach leveraging lexical features and role definitions is effective in identifying potentially missing IS-A relations in the NCI Thesaurus.


Subject(s)
Controlled Vocabulary , Humans , National Cancer Institute (U.S.) , United States
13.
BMC Med Inform Decis Mak ; 20(Suppl 12): 330, 2020 12 24.
Article in English | MEDLINE | ID: mdl-33357225

ABSTRACT

BACKGROUND: Sudden death in epilepsy (SUDEP) is rare in the US; however, it accounts for 8-17% of deaths in people with epilepsy. It involves complicated physiological patterns, and it is still not clear which physiological or biological markers can be used as indicators to predict SUDEP so that care providers can intervene and treat patients in a timely manner. For this reason, UTHealth School of Biomedical Informatics (SBMI) organized a machine learning Hackathon to call for advanced solutions (https://sbmi.uth.edu/hackathon/archive/sept19.htm). METHODS: In recent years, deep learning has become the state of the art for many domains with large amounts of data. Although healthcare has accumulated a lot of data, they are often not abundant enough for subpopulation studies where deep learning could be beneficial. Taking these limitations into account, we present a framework to apply deep learning to the detection of the onset of slow activity after a generalized tonic-clonic seizure, as well as other EEG signal detection problems exhibiting data paucity. RESULTS: We conducted ten training runs for our full method and seven model variants, statistically demonstrating the impact of each technique used in our framework with a high degree of confidence. CONCLUSIONS: Our findings point toward deep learning being a viable method for detection of the onset of slow activity, provided appropriate regularization is performed.


Subject(s)
Epilepsy , Seizures , Sudden Death , Electroencephalography , Humans , Seizures/diagnosis , Seizures/therapy
14.
BMC Med Inform Decis Mak ; 20(Suppl 4): 259, 2020 12 14.
Article in English | MEDLINE | ID: mdl-33317519

ABSTRACT

BACKGROUND: Previously, we introduced our Patient Health Information Dialogue Ontology (PHIDO) that manages the dialogue and contextual information of the session between an agent and a health consumer. In this study, we take the next step and introduce the Conversational Ontology Operator (COO), the software engine harnessing PHIDO. We also developed a question-answering subsystem called Frankenstein Ontology Question-Answering for User-centric Systems (FOQUS) to support the dialogue interaction. METHODS: We tested both the dialogue engine and the question-answering system using application-based competency questions and questions furnished from our previous Wizard of OZ simulation trials. RESULTS: Our results revealed that the dialogue engine is able to perform the core tasks of communicating health information and conversational flow. Inter-rater agreement and accuracy scores among four reviewers indicated perceived, acceptable responses to the questions asked by participants from the simulation studies, yet the composition of the responses was deemed mediocre by our evaluators. CONCLUSIONS: Overall, we present some preliminary evidence of a functioning ontology-based system to manage dialogue and consumer questions. Future plans for this work will involve deploying this system in a speech-enabled agent to assess its usage with potential health consumer users.


Subject(s)
Communication , Vaccines , Humans , Patient-Centered Care , Software , Vaccination
15.
BMC Med Inform Decis Mak ; 20(Suppl 10): 271, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319710

ABSTRACT

BACKGROUND: The Kentucky Cancer Registry (KCR) is a central cancer registry for the state of Kentucky that receives data about incident cancer cases from all healthcare facilities in the state within 6 months of diagnosis. Similar to all other U.S. and Canadian cancer registries, KCR uses a data dictionary provided by the North American Association of Central Cancer Registries (NAACCR) for standardized data entry. The NAACCR data dictionary is not an ontological system. Mapping between the NAACCR data dictionary and the National Cancer Institute (NCI) Thesaurus (NCIt) will facilitate the enrichment, dissemination and utilization of cancer registry data. We introduce a web-based system, called Interactive Mapping Interface (IMI), for creating mappings from data dictionaries to ontologies, in particular from NAACCR to NCIt. METHOD: IMI has been designed as a general approach with three components: (1) ontology library; (2) mapping interface; and (3) recommendation engine. The ontology library provides a list of ontologies as targets for building mappings. The mapping interface consists of six modules: project management, mapping dashboard, access control, logs and comments, hierarchical visualization, and result review and export. The built-in recommendation engine automatically identifies a list of candidate concepts to facilitate the mapping process. RESULTS: We report the architecture design and interface features of IMI. To validate our approach, we implemented an IMI prototype and pilot-tested features using the IMI interface to map a sample set of NAACCR data elements to NCIt concepts. 47 out of 301 NAACCR data elements have been mapped to NCIt concepts. Five branches of hierarchical tree have been identified from these mapped concepts for visual inspection. CONCLUSIONS: IMI provides an interactive, web-based interface for building mappings from data dictionaries to ontologies. 
Although our pilot-testing scope is limited, our results demonstrate feasibility using IMI for semantic enrichment of cancer registry data by mapping NAACCR data elements to NCIt concepts.


Subject(s)
Biological Ontologies , Neoplasms , Canada/epidemiology , Humans , Internet , Neoplasms/diagnosis , Neoplasms/epidemiology , Registries , Controlled Vocabulary
16.
BMC Med Inform Decis Mak ; 20(Suppl 12): 328, 2020 12 24.
Article in English | MEDLINE | ID: mdl-33357232

ABSTRACT

Applying machine learning to healthcare sheds light on evidence-based decision making and has shown promise in improving healthcare by combining clinical knowledge and biomedical data. However, medicine and data science are not synchronized. Oftentimes, researchers with a strong data science background do not understand the clinical challenges, while on the other hand, physicians do not know the capacity and limitations of state-of-the-art machine learning methods. The difficulty boils down to the lack of a common interface between two highly intelligent communities due to privacy concerns and the disciplinary gap. The School of Biomedical Informatics (SBMI) at UTHealth is a pilot in connecting both worlds to promote interdisciplinary research. Recently, the Center for Secure Artificial Intelligence For hEalthcare (SAFE) at SBMI has been organizing a series of machine learning healthcare hackathons for real-world clinical challenges. We hosted our first Hackathon, centered on Sudden Unexpected Death in Epilepsy and finding ways to recognize the warning signs. This community effort demonstrated that interdisciplinary discussion and productive competition significantly increased the accuracy of warning sign detection compared to previous work, ultimately showing the potential of this hackathon as a platform to connect the two communities of data science and medicine.


Subject(s)
Artificial Intelligence , Epilepsy , Sudden Death , Electroencephalography , Epilepsy/diagnosis , Humans , Machine Learning
17.
J Biomed Inform ; 80: 106-119, 2018 04.
Article in English | MEDLINE | ID: mdl-29548711

ABSTRACT

One of the basic challenges in developing structural methods for systematic auditing of the quality of biomedical ontologies is the computational cost usually involved in exhaustive sub-graph analysis. We introduce ANT-LCA, a new algorithm for computing all non-trivial lowest common ancestors (LCA) of each pair of concepts in the hierarchical order induced by an ontology. The computation of LCA is a fundamental step in the non-lattice approach to ontology quality assurance. Distinct from existing approaches, ANT-LCA only computes LCAs for non-trivial pairs, those having at least one common ancestor. To skip all trivial pairs that may be of no practical interest, ANT-LCA employs a simple but innovative algorithmic strategy combining topological order and dynamic programming to keep track of non-trivial pairs. We provide correctness proofs and demonstrate a substantial reduction in computational time for two large biomedical ontologies: SNOMED CT and Gene Ontology (GO). ANT-LCA achieved an average computation time of 30 and 3 sec per version for SNOMED CT and GO, respectively, about 2 orders of magnitude faster than the best known approaches. Our algorithm overcomes a fundamental computational barrier in sub-graph based structural analysis of large ontological systems. It enables the implementation of a new breed of structural auditing methods that not only identify potential problematic areas, but also automatically suggest changes to fix the issues. Such structural auditing methods can lead to more effective tools supporting ontology quality assurance work.
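
To make the notion of lowest common ancestors in a hierarchy concrete, the sketch below computes them for a single pair of concepts in a small directed acyclic graph. This is a naive baseline written for illustration only; it is not ANT-LCA, and it does not use the topological-order and dynamic-programming strategy the abstract describes. The choice to use strict ancestors (a concept is not its own ancestor), the function names, and the toy hierarchy are assumptions.

```python
def ancestors(node, parents, cache):
    """All strict ancestors of `node`; `parents` maps a child to its direct parents."""
    if node not in cache:
        result = set()
        for parent in parents.get(node, ()):
            result.add(parent)
            result |= ancestors(parent, parents, cache)
        cache[node] = result
    return cache[node]


def lowest_common_ancestors(a, b, parents):
    """Common ancestors of a and b that have no other common ancestor below them."""
    cache = {}
    common = ancestors(a, parents, cache) & ancestors(b, parents, cache)
    return {
        c for c in common
        if not any(c in ancestors(d, parents, cache) for d in common if d != c)
    }


# Hypothetical hierarchy: D and E each have parents B and C, which share parent A.
parents = {"D": ["B", "C"], "E": ["B", "C"], "B": ["A"], "C": ["A"]}
lcas = lowest_common_ancestors("D", "E", parents)
```

The pair (D, E) has two lowest common ancestors, B and C, which is the kind of non-lattice pair that structural auditing methods look for. A pair with no common ancestor at all would be a trivial pair in the abstract's terminology. Repeating this pairwise computation over every pair is what becomes prohibitively expensive on ontologies the size of SNOMED CT.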


Subject(s)
Algorithms , Biological Ontologies , Data Mining/methods , Medical Informatics/methods , Systematized Nomenclature of Medicine
18.
J Biomed Inform ; 78: 177-184, 2018 02.
Article in English | MEDLINE | ID: mdl-29274386

ABSTRACT

OBJECTIVE: We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations. METHODS: Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT's IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor concepts within the non-lattice subgraph. In stage 3, subset inclusion relations between the lexical attribute sets of each pair of concepts in each non-lattice subgraph are compared to existing IS-A relations in SNOMED CT. For concept pairs within each non-lattice subgraph, if a subset relation is identified but an IS-A relation is not present in SNOMED CT IS-A transitive closure, then a missing IS-A relation is reported. The September 2017 release of SNOMED CT (US edition) was used in this investigation. RESULTS: A total of 14,380 non-lattice subgraphs were extracted, from which we suggested a total of 41,357 missing IS-A relations. For evaluation purposes, 200 non-lattice subgraphs were randomly selected from 996 smaller subgraphs (of size 4, 5, or 6) within the "Clinical Finding" and "Procedure" sub-hierarchies. Two domain experts confirmed 185 (among 223) suggested missing IS-A relations, a precision of 82.96%. CONCLUSIONS: Our results demonstrate that analyzing the lexical features of concepts in non-lattice subgraphs is an effective approach for auditing SNOMED CT.
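
Stages 2 and 3 can be illustrated with a small Python sketch. It is a simplified reading of the abstract, not the authors' implementation: lexical attributes are taken to be the lowercase words of a concept name, a concept whose enriched word set strictly contains another's is assumed to be the more specific one, and the concept identifiers and names are invented for the example.

```python
def lexical_attributes(name):
    """Assumed attribute model: the set of lowercase words in a concept name."""
    return frozenset(name.lower().split())


def suggest_missing_is_a(names, is_a_closure):
    """Suggest (child, parent) pairs supported by word-set inclusion.

    names:        concept id -> concept name, for one subgraph
    is_a_closure: set of (descendant, ancestor) pairs, transitively closed
    """
    base = {cid: lexical_attributes(name) for cid, name in names.items()}

    # Stage 2: enrich each concept with the attributes of its ancestors
    # that belong to the same subgraph.
    enriched = {cid: set(words) for cid, words in base.items()}
    for descendant, ancestor in is_a_closure:
        if descendant in enriched and ancestor in base:
            enriched[descendant] |= base[ancestor]

    # Stage 3: a strict subset relation without a matching IS-A relation
    # in the closure is reported as a potentially missing relation.
    suggestions = []
    for general in names:
        for specific in names:
            if general == specific:
                continue
            if (enriched[general] < enriched[specific]
                    and (specific, general) not in is_a_closure):
                suggestions.append((specific, general))
    return sorted(suggestions)


# Invented concepts: c3 should plausibly be a child of c2, but only
# its relation to c1 is recorded.
names = {
    "c1": "fracture",
    "c2": "fracture of femur",
    "c3": "open fracture of femur",
}
closure = {("c2", "c1"), ("c3", "c1")}
print(suggest_missing_is_a(names, closure))
```

A suggestion produced this way is only a candidate; in the study, domain experts reviewed a sample of the suggestions to estimate precision.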


Subject(s)
Biological Ontologies , Data Mining/methods , Quality Assurance, Health Care/standards , Systematized Nomenclature of Medicine , Algorithms , Electronic Health Records , Humans , Medical Audit , Semantics
19.
BMC Med Inform Decis Mak ; 18(Suppl 2): 58, 2018 07 23.
Article in English | MEDLINE | ID: mdl-30066656

ABSTRACT

BACKGROUND: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on them yields a large number of rules, many of which may be uninteresting. For imbalanced datasets in particular, performing ARM directly results in uninteresting rules dominated by variables that capture general characteristics. METHODS: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on a subset of data items satisfying a query constraint. We first perform a series of data-preprocessing steps including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. We then remove general and subsumed rules so that only unique, non-redundant rules remain for a particular query constraint. RESULTS: Applying QARM to five datasets from NSRR yielded a total of 2517 association rules with a minimum confidence of 60% (using the top 100 rules for each query constraint). The results show that merging similar variables could avoid uninteresting rules. Also, removing general and subsumed rules resulted in a more concise and interesting set of rules. CONCLUSIONS: QARM shows the potential to support exploratory analysis of large biomedical datasets. It is also shown to be a useful method for reducing the number of uninteresting association rules generated from imbalanced datasets. A preliminary literature-based analysis showed that some association rules have supporting evidence from biomedical literature, while others without literature-based evidence may serve as candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined over the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.
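
The idea of mining rules only over records that satisfy a query constraint can be sketched in a few lines of Python. This is not the TNR algorithm and not the authors' pipeline: it only enumerates single-item rules by brute force, and the function names, item labels, and toy records are invented for illustration. The 60% threshold mirrors the minimum confidence mentioned in the abstract.

```python
from itertools import permutations


def select(records, constraint):
    """Keep only the records that contain every item of the query constraint."""
    return [r for r in records if constraint <= r]


def confidence(records, antecedent, consequent):
    """Fraction of records containing the antecedent that also contain the consequent."""
    covered = [r for r in records if antecedent <= r]
    if not covered:
        return 0.0
    return sum(consequent <= r for r in covered) / len(covered)


def pairwise_rules(records, min_conf=0.6, exclude=frozenset()):
    """Brute-force single-item rules x -> y meeting the confidence threshold."""
    items = sorted(set().union(*records) - exclude)
    rules = []
    for x, y in permutations(items, 2):
        conf = confidence(records, {x}, {y})
        if conf >= min_conf:
            rules.append((x, y, conf))
    return rules


# Toy records (made up), each a set of "variable=value" items.
records = [
    {"sex=F", "snoring=yes", "osa=yes"},
    {"sex=F", "snoring=yes", "osa=no"},
    {"sex=F", "snoring=no", "osa=no"},
    {"sex=M", "snoring=yes", "osa=yes"},
    {"sex=M", "snoring=yes", "osa=yes"},
]

subset = select(records, {"sex=M"})  # query constraint
print(confidence(records, {"snoring=yes"}, {"osa=yes"}))  # all records
print(confidence(subset, {"snoring=yes"}, {"osa=yes"}))   # constrained subset
print(pairwise_rules(subset, exclude={"sex=M"}))
```

Excluding the constraint items from the candidate rules is one simple way to avoid rules that merely restate the query; how QARM itself handles this is not described in the abstract.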


Subject(s)
Data Mining/methods , Research , Sleep , Algorithms , Datasets as Topic , Humans , Semantics
20.
BMC Med Inform Decis Mak ; 18(1): 99, 2018 11 13.
Article in English | MEDLINE | ID: mdl-30424756

ABSTRACT

BACKGROUND: The National Sleep Research Resource (NSRR) is a large-scale, openly shared data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at the study-subject level. We introduce a cross-cohort query and exploration system, called X-search, to enable researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and explore the feasibility or likelihood of reusing the data for research studies. METHODS: X-search has been designed as a general framework with two loosely coupled components: a semantically annotated data repository and a cross-cohort exploration engine. The semantically annotated data repository comprises a canonical data dictionary, data sources each with their own data dictionary, and mappings between each individual data dictionary and the canonical data dictionary. The cross-cohort exploration engine consists of five modules: query builder, graphical exploration, case-control exploration, query translation, and query execution. The canonical data dictionary serves as the unified metadata to drive the visual exploration interfaces and facilitate query translation through the mappings. RESULTS: X-search is publicly available at https://www.x-search.net/ with nine NSRR datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries by users from 16 countries. CONCLUSIONS: X-search provides a powerful cross-cohort exploration interface for querying and exploring heterogeneous datasets in the NSRR data repository, enabling researchers to evaluate the feasibility of potential research studies and generate potential hypotheses using the NSRR data.
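
The role of the mappings in query translation can be illustrated with a short Python sketch. It is a hypothetical miniature, not X-search's code: the dataset names, variable names, query format (inclusive numeric ranges on canonical elements), and toy rows are all invented for the example.

```python
def translate(query, mapping):
    """Rewrite a canonical query using one dataset's local variable names.

    query:   canonical element -> (low, high) inclusive range
    mapping: canonical element -> local variable name in that dataset
    Returns None if the dataset has no mapping for a queried element.
    """
    local_query = {}
    for element, bounds in query.items():
        if element not in mapping:
            return None
        local_query[mapping[element]] = bounds
    return local_query


def count_subjects(rows, local_query):
    """Count rows whose values fall inside every requested range."""
    return sum(
        all(var in row and low <= row[var] <= high
            for var, (low, high) in local_query.items())
        for row in rows
    )


def cross_cohort_counts(query, mappings, datasets):
    """Run one canonical query against every dataset that can answer it."""
    counts = {}
    for name, rows in datasets.items():
        local_query = translate(query, mappings.get(name, {}))
        if local_query is not None:
            counts[name] = count_subjects(rows, local_query)
    return counts


# Invented mappings from canonical elements to local variable names.
mappings = {
    "study_a": {"age": "age_years", "bmi": "bmi_s1"},
    "study_b": {"age": "AGE", "bmi": "BMI"},
    "study_c": {"age": "age_at_visit"},  # no BMI variable mapped
}
datasets = {
    "study_a": [{"age_years": 45, "bmi_s1": 32}, {"age_years": 70, "bmi_s1": 35},
                {"age_years": 50, "bmi_s1": 24}],
    "study_b": [{"AGE": 41, "BMI": 31}, {"AGE": 59, "BMI": 40}],
    "study_c": [{"age_at_visit": 55}],
}

query = {"age": (40, 60), "bmi": (30, 100)}
print(cross_cohort_counts(query, mappings, datasets))
```

Because the query is written once against canonical elements, adding another dataset in this sketch only requires adding its mapping, which is the design benefit the abstract attributes to the canonical data dictionary.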


Subject(s)
Big Data , Data Mining , Databases, Factual , Datasets as Topic , Sleep Wake Disorders , Sleep , Cohort Studies , Humans