ABSTRACT
Disease ontologies facilitate the semantic organization and representation of domain-specific knowledge. In the case of prostate cancer (PCa), large volumes of research results and clinical data have accumulated and need to be standardized for sharing and translational research. A formal representation of PCa-associated knowledge is essential for standardizing diverse data, sharing data, and supporting future knowledge graph extraction, deep phenotyping and the development of explainable artificial intelligence. In this study, we constructed an updated PCa ontology (PCAO2) based on the ontology development life cycle. An online information retrieval system was designed to ensure the usability of the ontology. PCAO2, with a subclass-based taxonomic hierarchy, covers the major biomedical concepts for PCa-associated genotypic, phenotypic and lifestyle data. The current version of PCAO2 contains 633 concepts organized under three biomedical viewpoints: epidemiology, diagnosis and treatment. These concepts are enriched with definitions, synonyms, relationships and references. To support precision diagnosis and treatment, PCa-associated genes and lifestyle factors are integrated under the epidemiology viewpoint. PCAO2 provides a standardized and systematized semantic framework for studying large amounts of heterogeneous PCa data and knowledge, and it can be further edited and enriched by the scientific community. PCAO2 is freely available at https://bioportal.bioontology.org/ontologies/PCAO, http://pcaontology.net/ and http://pcaontology.net/mobile/.
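To make the subclass-based organization concrete, here is a minimal sketch of how such a hierarchy and its annotations could be expressed programmatically; it assumes the owlready2 library, and the IRI and class names are illustrative placeholders rather than actual PCAO2 terms.

```python
# Minimal sketch of a subclass-based taxonomy with annotations, assuming owlready2.
# The IRI and class names are illustrative placeholders, not the real PCAO2 terms.
from owlready2 import get_ontology, Thing

onto = get_ontology("http://example.org/pcao2-sketch.owl")

with onto:
    class Epidemiology(Thing): pass        # top-level viewpoint
    class Diagnosis(Thing): pass
    class Treatment(Thing): pass

    class LifestyleFactor(Epidemiology): pass   # concepts placed under a viewpoint
    class AssociatedGene(Epidemiology): pass

# Enrich a concept with label- and definition-style annotations.
AssociatedGene.label = ["PCa-associated gene"]
AssociatedGene.comment = ["A gene reported to be associated with prostate cancer risk or progression."]

onto.save(file="pcao2_sketch.owl", format="rdfxml")
```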
Subjects
Biological Ontologies, Prostatic Neoplasms, Humans, Male, Artificial Intelligence, Semantics, Prostatic Neoplasms/genetics
ABSTRACT
The left and right anterior temporal lobes (ATLs) encode semantic representations. They show graded hemispheric specialization in function, with the left ATL contributing preferentially to verbal semantic processing. We investigated the cognitive correlates of this organization, using resting-state functional connectivity as a measure of functional segregation between the ATLs. We analyzed two independent resting-state fMRI datasets (n = 86 and n = 642) in which participants' verbal semantic expertise was measured using vocabulary tests. In both datasets, people with more advanced verbal semantic knowledge showed weaker functional connectivity between the left and right ventral ATLs. This effect was highly specific. It was not observed for within-hemisphere connections between semantic regions (ventral ATL and inferior frontal gyrus, IFG), though it was found for left-right IFG connectivity in one dataset. Effects were not found for tasks probing semantic control, nonsemantic cognition, or face recognition. Our results suggest that hemispheric specialization in the ATLs is not an innate property but rather emerges as people develop highly detailed verbal semantic representations. We speculate that this effect is a consequence of the left ATL's greater connectivity with left-lateralized written word recognition regions, which causes it to preferentially represent meaning for advanced vocabulary acquired primarily through reading.
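The core analysis can be illustrated with a simplified sketch on synthetic data: compute each participant's left-right ventral ATL connectivity as a Pearson correlation between ROI time series, then correlate those values with vocabulary scores across participants. This is only an outline of the general approach, not the study's full preprocessing or statistics.

```python
# Illustrative sketch of the connectivity-vocabulary analysis on synthetic data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subjects, n_timepoints = 86, 200

vocab_scores = rng.normal(50, 10, n_subjects)           # vocabulary test scores
connectivity = np.empty(n_subjects)

for s in range(n_subjects):
    left_atl = rng.normal(size=n_timepoints)             # ROI-averaged BOLD time series
    right_atl = 0.5 * left_atl + rng.normal(size=n_timepoints)
    connectivity[s], _ = pearsonr(left_atl, right_atl)   # left-right functional connectivity

# Across participants: does more advanced vocabulary predict weaker inter-ATL connectivity?
r, p = pearsonr(vocab_scores, connectivity)
print(f"r = {r:.3f}, p = {p:.3f}")
```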
Subjects
Brain Mapping, Functional Laterality, Magnetic Resonance Imaging, Semantics, Temporal Lobe, Humans, Temporal Lobe/physiology, Temporal Lobe/diagnostic imaging, Male, Female, Adult, Functional Laterality/physiology, Young Adult, Brain Mapping/methods, Neural Pathways/physiology, Neural Pathways/diagnostic imaging
ABSTRACT
Transformer-based large language models (LLMs) are well suited to biological sequence data because biological sequences are analogous to natural language. Complex relationships can be learned because a concept of "words" can be created through tokenization. When trained with masked token prediction, the models learn both token sequence identity and larger sequence context. We developed methodology to interrogate model learning, which is relevant both for the interpretability of the model and for evaluating its potential for specific tasks. We used DNABERT, a DNA language model trained on the human genome with overlapping k-mers as tokens. To gain insight into the model's learning, we interrogated how the model performs predictions, extracted token embeddings, and defined a fine-tuning benchmarking task to predict the next tokens of different sizes without overlaps. This task evaluates foundation models without interrogating specific genome biology; it does not depend on tokenization strategies, vocabulary size, the dictionary, or the number of training parameters. Lastly, there is no leakage of information from token identity into the prediction task, which makes it particularly useful for evaluating the learning of sequence context. We discovered that the model with overlapping k-mers struggles to learn larger sequence context; instead, the learned embeddings largely represent token sequence. Still, good performance is achieved for genome-biology-inspired fine-tuning tasks. Models with overlapping tokens may be used for tasks where a larger sequence context is of less relevance and the token sequence directly represents the desired learning features. This emphasizes the need to interrogate knowledge representation in biological LLMs.
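The difference between overlapping tokenization and the non-overlapping next-token benchmark can be sketched in a few lines; the functions below are illustrative and are not DNABERT's actual tokenizer.

```python
# Overlapping k-mer tokenization (DNABERT-style) versus non-overlapping chunks.
def overlapping_kmers(seq: str, k: int = 6) -> list[str]:
    """Stride-1 k-mers: adjacent tokens share k-1 bases, so token identity
    leaks most of the local sequence into neighbouring tokens."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def non_overlapping_kmers(seq: str, k: int = 6) -> list[str]:
    """Disjoint k-mers: predicting the next token requires genuine sequence
    context rather than reconstruction from overlaps."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

seq = "ACGTACGTACGTACGT"
print(overlapping_kmers(seq)[:3])   # ['ACGTAC', 'CGTACG', 'GTACGT']
print(non_overlapping_kmers(seq))   # ['ACGTAC', 'GTACGT']
```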
Subjects
DNA, Humans, DNA/chemistry, Human Genome, DNA Sequence Analysis/methods, Natural Language Processing, Computational Biology/methods
ABSTRACT
BACKGROUND: The increasing aging population presents a significant challenge, accompanied by a shortage of professional caregivers and a growing therapeutic burden. Clinical decision support systems, utilizing computerized clinical guidelines, can improve healthcare quality, reduce expenses, save time, and boost caregiver efficiency. OBJECTIVES: 1) Develop and evaluate an automated quality assessment (QA) system for retrospective longitudinal care quality analysis, focusing on clinical staff adherence to evidence-based guidelines (GLs). 2) Assess the system's technical feasibility and functional capability for use by senior nurses in geriatric pressure-ulcer management. METHODS: A computational QA system using our Quality Assessment Temporal Patterns (QATP) methodology was designed and implemented. Our methodology transforms the GL's procedural knowledge into declarative-knowledge temporal-abstraction patterns representing the expected execution trace in the patient's data when the therapy is applied correctly. Fuzzy temporal logic allows for partial compliance, scoring the performance of individual and grouped actions according to their values and temporal aspects. The system was tested using a pressure ulcer treatment GL and data from 100 geriatric patients' Electronic Medical Records (EMRs). After a technical evaluation of accuracy and feasibility, an extensive functional evaluation was conducted by an experienced nurse, comparing QA scores with and without system support, and against the automated system's scores. Time efficiency was also measured. RESULTS: QA scores from the geriatric nurse, with and without the system's support, did not differ significantly from those provided by the automated system (p < 0.05), demonstrating the effectiveness and reliability of both manual and automated methods. The system-supported manual QA process reduced scoring time by approximately two-thirds, from an average of 17.3 min per patient manually to about 5.9 min with the system's assistance, highlighting the system's potential efficiency in clinical practice. CONCLUSION: The QA system based on QATP produces scores consistent with an experienced nurse's assessment for complex care over extended periods. It enables quick and accurate evaluation of care quality for multiple patients after brief training. Such automated QA systems may empower nursing staff, enabling them to manage more patients accurately and consistently, while reducing costs through saved time and effort and enhanced compliance with evidence-based guidelines.
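The notion of fuzzy partial compliance can be illustrated with a toy membership function for a single guideline action; the action, thresholds, and scores below are invented examples, not the study's actual QATP patterns.

```python
# Sketch of fuzzy partial compliance for a guideline action, e.g. repositioning
# a patient at least every 2 hours. Values are illustrative, not the study's QATP rules.
def compliance(interval_hours: float, target: float = 2.0, tolerance: float = 1.0) -> float:
    """1.0 if performed within the target interval, decreasing linearly to 0.0
    once the delay exceeds target + tolerance (fuzzy, not all-or-nothing)."""
    if interval_hours <= target:
        return 1.0
    if interval_hours >= target + tolerance:
        return 0.0
    return 1.0 - (interval_hours - target) / tolerance

observed_intervals = [1.5, 2.5, 4.0, 2.2]           # hours between documented repositionings
scores = [compliance(h) for h in observed_intervals]
overall = sum(scores) / len(scores)                  # aggregate quality score for the episode
print([round(s, 2) for s in scores], round(overall, 2))   # [1.0, 0.5, 0.0, 0.8] 0.57
```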
Subjects
Clinical Decision Support Systems, Pressure Ulcer, Humans, Aged, Pressure Ulcer/therapy, Electronic Health Records, Health Care Quality Assurance/methods, Aged 80 and over, Retrospective Studies, Female, Male, Geriatrics
ABSTRACT
OBJECTIVE: Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) to supplant traditional systems raises questions about the quality and extent of the medical knowledge in the models' internal knowledge representations and about the need for external knowledge sources. The objective of this study is three-fold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and UMLS-based metrics for LLM generations. METHODS: We evaluated diagnoses generated by LLMs from consumer health questions and daily care notes in electronic health records using the ConsumerQA and Problem Summarization datasets. Probing LLMs for UMLS knowledge was performed by prompting the LLM to complete diagnosis-related UMLS knowledge paths. Grounding the predictions was examined in an approach that integrated UMLS graph paths and clinical notes in prompting the LLMs; the results were compared to prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation. RESULTS: In probing the UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline, yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments. CONCLUSION: We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with UMLS knowledge provides performance gains for diagnosis generation. The UMLS needs to be tailored to the task to improve the LLMs' predictions. Finding evaluation metrics that align with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.
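What "grounding" a prediction with UMLS paths might look like in practice can be sketched as simple prompt construction; the triples and wording below are invented for illustration and are not the prompts used in the study.

```python
# Sketch: build a diagnosis prompt augmented with UMLS-style one-hop knowledge paths.
# The triples and wording are illustrative, not the actual prompts used in the study.
umls_paths = [
    ("shortness of breath", "may_be_symptom_of", "heart failure"),
    ("shortness of breath", "may_be_symptom_of", "asthma"),
]

clinical_note = "62-year-old with progressive shortness of breath and leg swelling."

knowledge_block = "\n".join(f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in umls_paths)
prompt = (
    "Relevant knowledge paths:\n"
    f"{knowledge_block}\n\n"
    f"Clinical note: {clinical_note}\n"
    "List the most likely diagnoses."
)
print(prompt)
# The prompt string would then be sent to an LLM via whichever API client is in use.
```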
Subjects
Electronic Health Records, Unified Medical Language System, Humans, Machine Learning, Natural Language Processing, Clinical Decision Support Systems, Computer-Assisted Diagnosis/methods
ABSTRACT
Electronic health records (EHRs) store an extensive array of patient information, encompassing medical histories, diagnoses, treatments, and test outcomes. These records are crucial for enabling healthcare providers to make well-informed decisions regarding patient care. Summarizing clinical notes further assists healthcare professionals in pinpointing potential health risks and making better-informed decisions. This process contributes to reducing errors and enhancing patient outcomes by ensuring providers have access to the most pertinent and current patient data. Recent research has shown that incorporating instruction prompts with large language models (LLMs) substantially boosts the efficacy of summarization tasks. However, we show that this approach also leads to increased performance variance, resulting in markedly different summaries even when instruction prompts share similar meanings. To tackle this challenge, we introduce a model-agnostic Soft Prompt-Based Calibration (SPeC) pipeline that employs soft prompts to lower variance while preserving the advantages of prompt-based summarization. Experimental findings on multiple clinical note tasks and LLMs indicate that our method not only bolsters performance but also effectively regulates variance across different LLMs, providing a more consistent and reliable approach to summarizing critical medical information.
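Conceptually, a soft prompt is a small set of trainable embedding vectors prepended to the token embeddings. The toy module below sketches that mechanism only; it is not the SPeC calibration pipeline itself.

```python
# Toy sketch of soft prompting: trainable virtual-token embeddings are prepended
# to the input token embeddings. Mechanism illustration only, not the SPeC pipeline.
import torch
import torch.nn as nn

class SoftPromptedEncoder(nn.Module):
    def __init__(self, vocab_size=1000, hidden=64, n_virtual_tokens=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # The soft prompt: learned during fine-tuning while the base model can stay frozen.
        self.soft_prompt = nn.Parameter(torch.randn(n_virtual_tokens, hidden) * 0.02)

    def forward(self, input_ids):                        # input_ids: (batch, seq_len)
        tok = self.embed(input_ids)                       # (batch, seq_len, hidden)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return torch.cat([prompt, tok], dim=1)            # (batch, n_virtual + seq_len, hidden)

model = SoftPromptedEncoder()
out = model(torch.randint(0, 1000, (2, 16)))
print(out.shape)                                          # torch.Size([2, 24, 64])
```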
Subjects
Electronic Health Records, Natural Language Processing, Humans, Calibration, Language, Health Personnel
ABSTRACT
BACKGROUND: Familial cardiopathies are genetic disorders that affect the heart. Cardiologists face a significant problem when treating patients suffering from these disorders: most DNA variations are novel (i.e., they have not been classified before). To facilitate the analysis of novel variations, we present CardioGraph, a platform specially designed to support the analysis of novel variations and help determine whether they are relevant for diagnosis. To do this, CardioGraph identifies and annotates the consequence of variations and provides contextual information regarding which heart structures, pathways, and biological processes are potentially affected by those variations. METHODS: We conducted our work in three steps. First, we defined a data model to support the representation of the heterogeneous information. Second, we instantiated this data model to integrate and represent all the genomic knowledge available for familial cardiopathies, considering both genomic data sources and the scientific literature. Third, we designed and implemented the CardioGraph platform using a three-tier structure: the database, the backend, and the frontend. RESULTS: Three main results were obtained: the data model, the knowledge base generated by instantiating the data model, and the platform itself. The platform code has been included as supplemental material in this manuscript. In addition, an instance is publicly available at the following link: https://genomics-hub.pros.dsic.upv.es:3090 . CONCLUSION: CardioGraph is a platform that supports the analysis of novel variations. Future work will expand the body of knowledge about familial cardiopathies and include new information about hotspots, functional studies, and previously reported variations.
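One way to picture such a data model (variants annotated with consequences and linked to genes, pathways, and heart structures) is the hypothetical sketch below; the field names and values are illustrative and do not reproduce CardioGraph's actual schema.

```python
# Hypothetical sketch of a variant-annotation data model; field names and values
# are illustrative and do not reproduce CardioGraph's actual schema.
from dataclasses import dataclass, field

@dataclass
class Gene:
    symbol: str
    pathways: list[str] = field(default_factory=list)
    heart_structures: list[str] = field(default_factory=list)

@dataclass
class Variant:
    hgvs: str                       # e.g. a placeholder like "NM_EXAMPLE:c.100A>G"
    gene: Gene
    consequence: str                # e.g. "missense_variant"
    literature: list[str] = field(default_factory=list)   # supporting references

gene = Gene("MYH7", pathways=["cardiac muscle contraction"],
            heart_structures=["left ventricle"])
variant = Variant("NM_EXAMPLE:c.100A>G", gene, "missense_variant",
                  literature=["PMID:00000000"])
print(variant.gene.symbol, variant.consequence, variant.gene.heart_structures)
```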
Subjects
Heart Diseases, Humans, Genomics
ABSTRACT
With the advent of robotics and artificial intelligence, the potential for automating tasks within human-centric environments has increased significantly. This is particularly relevant in the retail sector, where the demand for efficient operations and the shortage of labor drive the need for rapid advances in robot-based technologies. Densely packed retail shelves pose unique challenges for robotic manipulation and detection due to limited space and diverse object shapes. Vacuum-based grasping technologies offer a promising solution but face challenges in adapting to varied object shapes. The study proposes a framework for robotic grasping in retail environments, an adaptive vacuum-based grasping solution, and a new evaluation metric, termed grasp shear force resilience, for measuring the effectiveness and stability of the grasp during manipulation. The metric provides insights into how retail objects behave under different manipulation scenarios, allowing for better assessment and optimization of robotic grasping performance. The study's findings demonstrate the adaptive suction cups' ability to successfully handle a wide range of object shapes and sizes and, in some cases, to outperform commercially available solutions, particularly in adaptability. Additionally, the grasp shear force resilience metric highlights the effects of the manipulation process, such as shear force and shaking, on the manipulated object. This offers insights into its interaction with different vacuum cup grasping solutions in retail picking and restocking scenarios.
ABSTRACT
Ontologies serve as comprehensive frameworks for organizing domain-specific knowledge, offering significant benefits for managing clinical data. This study presents the development of the Fall Risk Management Ontology (FRMO), designed to enhance clinical text mining, facilitate integration and interoperability between disparate data sources, and streamline clinical data analysis. By representing the major entities within the fall risk management domain, the FRMO supports the unification of clinical language and decision-making processes, ultimately contributing to the prevention of falls among older adults. We used the Web Ontology Language (OWL) to build the FRMO in Protégé. Of the seven steps of the Stanford approach, six were used in the development of the FRMO: (1) defining the domain and scope of the ontology, (2) reusing existing ontologies when possible, (3) enumerating ontology terms, (4) specifying the classes and their hierarchy, (5) defining the properties of the classes, and (6) defining the facets of the properties. We evaluated the FRMO using four main criteria: consistency, completeness, accuracy, and clarity. The developed ontology comprises 890 classes arranged in a hierarchical structure, including six top-level classes, with a total of 43 object properties and 28 data properties. The FRMO is the first comprehensively described semantic ontology for fall risk management. Healthcare providers can use the ontology as the basis of clinical decision technology for managing falls among older adults.
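As a sketch of how classes and object properties of this kind can be declared programmatically, the snippet below uses rdflib; the namespace and term names are invented stand-ins rather than FRMO's actual identifiers.

```python
# Sketch of declaring OWL classes and an object property with rdflib.
# The namespace and term names are invented stand-ins, not FRMO's actual IRIs.
from rdflib import Graph, Namespace, RDF, RDFS, OWL, Literal

FRM = Namespace("http://example.org/frmo-sketch#")
g = Graph()
g.bind("frm", FRM)

for cls in ("RiskFactor", "Intervention", "OlderAdult"):
    g.add((FRM[cls], RDF.type, OWL.Class))

g.add((FRM.hasRiskFactor, RDF.type, OWL.ObjectProperty))
g.add((FRM.hasRiskFactor, RDFS.domain, FRM.OlderAdult))
g.add((FRM.hasRiskFactor, RDFS.range, FRM.RiskFactor))
g.add((FRM.RiskFactor, RDFS.label, Literal("Fall risk factor")))

print(g.serialize(format="turtle"))
```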
Subjects
Accidental Falls, Data Mining, Risk Management, Accidental Falls/prevention & control, Humans, Data Mining/methods, Biological Ontologies, Electronic Health Records/organization & administration, Semantics
ABSTRACT
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are employed in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
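A minimal example of the first idea, ontology-based semantic similarity, is to represent each class by the set of its ancestors in the subclass hierarchy and compare those sets, here with a Jaccard index over a toy hierarchy.

```python
# Toy ancestor-based semantic similarity over a small subclass hierarchy.
import networkx as nx

# Edges point from a class to its superclass (child -> parent).
onto = nx.DiGraph([
    ("cardiomyocyte", "muscle cell"), ("muscle cell", "cell"),
    ("neuron", "cell"), ("cell", "anatomical entity"),
])

def ancestors(term):
    # Nodes reachable by following child->parent edges, i.e. all superclasses.
    return nx.descendants(onto, term) | {term}

def jaccard_similarity(a, b):
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)

print(jaccard_similarity("cardiomyocyte", "neuron"))   # 0.4: shares "cell", "anatomical entity"
```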
Subjects
Biological Ontologies, Machine Learning, Biological Models, Semantics
ABSTRACT
INTRODUCTION: The use and interoperability of clinical knowledge start with the quality of the formalism used to express medical expertise. A crucial challenge, however, is that existing formalisms are often suboptimal, lacking the fidelity to represent complex knowledge thoroughly and concisely. This often leads to difficulties when seeking to unambiguously capture, share, and implement the knowledge for care improvement in clinical information systems used by providers and patients. OBJECTIVES: To provide a systematic method for addressing some of the complexities of knowledge composition and interoperability related to standards-based representational formalisms of medical knowledge. METHODS: Several cross-industry (healthcare, linguistics, systems engineering, standards development, and knowledge engineering) frameworks were synthesized into a proposed reference knowledge framework. The framework utilizes IEEE 42010, the Meta Object Facility, the Semantic Triangle, an Ontology Framework, and the Domain and Comprehensibility Appropriateness criteria. The steps taken were: 1) identify foundational cross-industry frameworks, 2) select an architecture description method, 3) define life cycle viewpoints, 4) define representation and knowledge viewpoints, 5) define relationships between neighboring viewpoints, and 6) establish characteristic definitions of the relationships between components. Systems engineering principles applied included separation of concerns, cohesion, and loose coupling. RESULTS: A "Multilayer Metamodel for Representation and Knowledge" (M*R/K) reference framework was defined. It provides a standard vocabulary for organizing and articulating medical knowledge curation perspectives, concepts, and relationships across the artifacts created during the life cycle of language creation, authoring of medical knowledge, and knowledge implementation in clinical information systems such as electronic health records (EHRs). CONCLUSION: M*R/K provides a systematic means to address some of the complexities of knowledge composition and interoperability related to medical knowledge representations used in diverse standards. The framework may be used to guide the development, assessment, and coordinated use of knowledge representation formalisms. M*R/K could promote the alignment and aggregated use of distinct domain-specific languages in composite knowledge artifacts such as clinical practice guidelines (CPGs).
Subjects
Delivery of Health Care, Electronic Health Records, Humans, Semantics
ABSTRACT
BACKGROUND: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk of potential NPDIs and consequent adverse events has increased. Understanding the mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS: We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems SemRep and the Integrated Network and Dynamical Reasoning Assembler were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS: The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions, were congruent with the published literature. CONCLUSION: NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug-metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.
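The path-search step can be sketched as enumerating simple paths from a natural-product constituent to a drug and keeping those that pass through a metabolizing enzyme; the nodes and edge labels below are illustrative, not NP-KG content.

```python
# Sketch of a KG path search for mechanistic natural product-drug links.
# Nodes and edge labels are illustrative, not actual NP-KG content.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("green tea catechin", "CYP3A4", relation="inhibits")
kg.add_edge("CYP3A4", "drug X", relation="metabolizes")
kg.add_edge("green tea catechin", "antioxidant activity", relation="has_activity")

for path in nx.all_simple_paths(kg, "green tea catechin", "drug X", cutoff=3):
    if "CYP3A4" in path:                     # keep paths through a metabolizing enzyme
        pieces = [path[0]]
        for u, v in zip(path, path[1:]):
            pieces += [f"-[{kg[u][v]['relation']}]->", v]
        print(" ".join(pieces))
# green tea catechin -[inhibits]-> CYP3A4 -[metabolizes]-> drug X
```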
Subjects
Biological Ontologies, Biological Products, Automated Pattern Recognition, Drug Interactions, Semantics, Pharmaceutical Preparations
ABSTRACT
BACKGROUND: Alongside their nutritional properties and broad availability, cereal crops have been associated with various alimentary disorders and symptoms, with the majority of the responsibility being attributed to gluten. Gluten-related literature therefore continues to be produced at ever-growing rates, driven in part by recent exploratory studies that link gluten to non-traditional diseases and by the popularity of gluten-free diets, making it increasingly difficult to access and analyse practical, structured information. In this context, the accelerated discovery of novel advances in diagnosis and treatment, as well as exploratory studies, produces a favourable scenario for disinformation and misinformation. OBJECTIVES: Aligned with the European Union strategy "Delivering on EU Food Safety and Nutrition in 2050", which emphasizes the inextricable links between imbalanced diets, increased exposure to unreliable and misleading information, and the increased dependency on reliable sources of information, this paper presents GlutKNOIS, a public and interactive literature-based database that reconstructs and represents the experimental biomedical knowledge extracted from the gluten-related literature. The developed platform includes external database knowledge, bibliometric statistics and social media discussion to propose a novel and enhanced way to search, visualise and analyse potential biomedical and health-related interactions in the gluten domain. METHODS: For this purpose, the study applies a semi-supervised curation workflow that combines natural language processing techniques, machine learning algorithms, ontology-based normalization and integration approaches, named entity recognition methods, and graph knowledge reconstruction methodologies to process, classify, represent and analyse the experimental findings contained in the literature, complemented by data from the social discussion. RESULTS AND CONCLUSIONS: In total, 5814 documents were manually annotated and 7424 were fully automatically processed to reconstruct the first online gluten-related knowledge database of evidenced health-related interactions that produce health or metabolic changes, based on the literature. In addition, the automatic processing of the literature combined with the proposed knowledge representation methodologies has the potential to assist in the revision and analysis of years of gluten research. The reconstructed knowledge base is public and accessible at https://sing-group.org/glutknois/.
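The ontology-based normalization step can be illustrated with a toy dictionary lookup that maps surface forms found in text to ontology identifiers before candidate triples are built; the mappings and identifiers shown are placeholders, not GlutKNOIS resources.

```python
# Toy sketch of ontology-based normalization of extracted entity mentions.
# Surface forms and identifiers are placeholders, not GlutKNOIS mappings.
lexicon = {
    "gluten": "EX:SUBSTANCE_0001",
    "coeliac disease": "EX:DISEASE_0002",
    "celiac disease": "EX:DISEASE_0002",     # synonym normalized to the same identifier
}

def normalize(mention: str):
    return lexicon.get(mention.strip().lower())

extracted_entities = ["Gluten", "celiac disease"]    # e.g. output of an NER step
subj, obj = (normalize(m) for m in extracted_entities)

triples = []
if subj and obj:
    triples.append((subj, "associated_with", obj))   # candidate interaction for curation
print(triples)
```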
Subjects
Glutens, Knowledge Bases, Humans, Machine Learning, Algorithms, Natural Language Processing
ABSTRACT
BACKGROUND: Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition. RESULTS: We present the first ignorance-base, a knowledge base created by combining classifiers that recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). The researchers could look to the field of neuroscience for potential answers to the ignorance statements. CONCLUSION: Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of, and focus on, the known unknowns and their respective goals for scientific knowledge.
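Concept enrichment in ignorance statements can be sketched as a standard 2x2 enrichment test comparing how often a concept appears inside ignorance statements versus in the rest of the corpus; the counts below are invented for illustration and do not describe the study's actual statistics.

```python
# Sketch of testing whether a concept is enriched in ignorance statements,
# using a 2x2 Fisher's exact test. Counts are invented for illustration.
from scipy.stats import fisher_exact

concept_in_ignorance = 40     # sentences mentioning the concept inside ignorance statements
concept_elsewhere = 60        # sentences mentioning the concept outside them
other_in_ignorance = 400      # ignorance-statement sentences without the concept
other_elsewhere = 9500        # remaining sentences in the corpus

table = [[concept_in_ignorance, concept_elsewhere],
         [other_in_ignorance, other_elsewhere]]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.2e}")
```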
Subjects
Knowledge Bases, Knowledge, Natural Language Processing, Female, Humans, Newborn Infant, Premature Birth, Publications, Vitamin D
ABSTRACT
BACKGROUND: Causal feature selection is essential for estimating effects from observational data. Identifying confounders is a crucial step in this process. Traditionally, researchers employ subject-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneously conditioning on variables that combine these roles introduces bias. However, the vast literature is growing exponentially, making it infeasible to assimilate this knowledge. To address these challenges, we introduce a novel knowledge graph (KG) application enabling causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data. METHODS: We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and with published observational studies, and examined a selection of confounding and combined-role variables in depth. RESULTS: Our search identified 128 confounders (including 58 phenotypes, 47 drugs, and 35 genes), as well as 23 collider and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles. CONCLUSION: Our findings suggest that combining machine reading and KGs could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression and AD highlights the need for standardized, field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.
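The epidemiological role definitions translate naturally into graph queries. The toy causal graph below illustrates the pattern; the variables and edges are invented and do not reproduce the study's extracted knowledge.

```python
# Toy causal graph illustrating graph-based role queries for causal feature selection.
# Edges mean "causes"; variables are illustrative, not the study's extracted knowledge.
import networkx as nx

g = nx.DiGraph([
    ("age", "depression"), ("age", "alzheimers"),                         # common cause
    ("depression", "inflammation"), ("inflammation", "alzheimers"),       # mediator chain
    ("depression", "hospitalization"), ("alzheimers", "hospitalization"), # common effect
    ("depression", "alzheimers"),
])
exposure, outcome = "depression", "alzheimers"

confounders = {v for v in g if v not in (exposure, outcome)
               and nx.has_path(g, v, exposure) and nx.has_path(g, v, outcome)
               and not nx.has_path(g, exposure, v)}
colliders = {v for v in g if v not in (exposure, outcome)
             and g.has_edge(exposure, v) and g.has_edge(outcome, v)}
mediators = {v for v in g if v not in (exposure, outcome)
             and nx.has_path(g, exposure, v) and nx.has_path(g, v, outcome)}

print("confounders:", confounders)   # {'age'}
print("colliders:", colliders)       # {'hospitalization'}
print("mediators:", mediators)       # {'inflammation'}
```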
Subjects
Alzheimer Disease, Humans, Depression, Automated Pattern Recognition, Causality, Risk Factors
ABSTRACT
OBJECTIVE: Computing phenotypes that provide high-fidelity, time-dependent characterizations and yield personalized interpretations is challenging, especially given the complexity of physiological and healthcare systems and clinical data quality. This paper develops a methodological pipeline to estimate unmeasured physiological parameters and produce high-fidelity, personalized phenotypes anchored to physiological mechanics from electronic health record (EHR) data. METHODS: A methodological phenotyping pipeline is developed that computes new phenotypes defined by unmeasurable computational biomarkers quantifying specific physiological properties in real time. Working within the inverse problem framework, this pipeline is applied to the glucose-insulin system for ICU patients, using data assimilation to estimate an established mathematical physiological model with stochastic optimization. This produces physiological model parameter vectors of clinically unmeasured endocrine properties, here insulin secretion, clearance, and resistance, estimated for each individual patient. These physiological parameter vectors are used as inputs to unsupervised machine learning methods to produce phenotypic labels and discrete physiological phenotypes. These phenotypes are inherently interpretable because they are based on parametric physiological descriptors. To establish potential clinical utility, the computed phenotypes were evaluated with external EHR data for consistency and reliability and with clinician face validation. RESULTS: The phenotype computation was performed on a cohort of 109 ICU patients who received no or short-acting insulin therapy, rendering continuous and discrete physiological phenotypes as specific computational biomarkers of unmeasured insulin secretion, clearance, and resistance over time windows of three days. Six, six, and five discrete phenotypes were found in the first, middle, and last three-day periods of ICU stays, respectively. Computed phenotypic labels were predictive, with an average accuracy of 89%. External validation of the discrete phenotypes showed coherence and consistency in clinically observable differences based on laboratory measurements and ICD-9/10 codes, and clinical concordance from face validity. A particularly clinically impactful parameter, insulin secretion, had a concordance accuracy of 83% ± 27%. CONCLUSION: The new physiological phenotypes computed with individual patient ICU data and defined by estimates of mechanistic model parameters have high physiological fidelity and are continuous, time-specific, personalized, interpretable, and predictive. This methodology is generalizable to other clinical and physiological settings and opens the door to discovering deeper physiological information to personalize medical care.
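The two-stage structure of the pipeline, estimating unmeasured parameters per patient as an inverse problem and then clustering the parameter vectors into discrete phenotypes, can be sketched with a deliberately simplified toy model; the single-exponential glucose curve and synthetic data below are stand-ins for the mechanistic glucose-insulin model used in the study.

```python
# Toy two-stage sketch: (1) estimate patient-specific parameters of a simplified
# glucose model from observed values, (2) cluster the parameter vectors into
# discrete phenotypes. The model and data are illustrative, not the study's.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
t = np.linspace(0, 12, 25)                               # hours

def glucose_model(t, basal, clearance, g0=250.0):
    """Glucose relaxing from g0 toward a basal level at a patient-specific rate."""
    return basal + (g0 - basal) * np.exp(-clearance * t)

def fit_patient(observed):
    loss = lambda p: np.mean((glucose_model(t, *p) - observed) ** 2)
    res = differential_evolution(loss, bounds=[(70, 180), (0.05, 2.0)], seed=0)
    return res.x                                          # [basal, clearance]

# Synthetic "patients" with different underlying parameters plus measurement noise.
true_params = [(90, 0.8), (150, 0.2), (100, 0.6)] * 10
params = np.array([
    fit_patient(glucose_model(t, b, c) + rng.normal(0, 5, t.size))
    for b, c in true_params
])

phenotypes = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(params)
print(params[:3].round(2), phenotypes[:6])
```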
Subjects
Algorithms, Electronic Health Records, Humans, Reproducibility of Results, Phenotype, Biomarkers, Intensive Care Units
ABSTRACT
Cognitive scientists have a long-standing interest in quantifying the structure of semantic memory. Here, we investigate whether a commonly used paradigm to study the structure of semantic memory, the semantic fluency task, as well as computational methods from network science could be leveraged to explore the underlying knowledge structures of academic disciplines such as psychology or biology. To compare the knowledge representations of individuals with relatively different levels of expertise in academic subjects, undergraduate students (i.e., experts) and preuniversity high school students (i.e., novices) completed a semantic fluency task with cue words corresponding to general semantic categories (i.e., animals, fruits) and specific academic domains (e.g., psychology, biology). Network analyses of their fluency networks found that both domain-general and domain-specific semantic networks of undergraduates were more efficiently connected and less modular than the semantic networks of high school students. Our results provide an initial proof-of-concept that the semantic fluency task could be used by educators and cognitive scientists to study the representation of more specific domains of knowledge, potentially providing new ways of quantifying the nature of expert cognitive representations.
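The network construction from fluency lists can be sketched by linking items named adjacently in participants' lists and then comparing metrics such as modularity and global efficiency; the lists below are invented.

```python
# Sketch: build a semantic fluency network by linking items named adjacently
# in participants' lists, then compute modularity and global efficiency.
# The fluency lists are invented for illustration.
import networkx as nx
from networkx.algorithms import community

fluency_lists = [
    ["dog", "cat", "lion", "tiger", "shark", "whale"],
    ["cat", "dog", "wolf", "lion", "whale", "dolphin"],
    ["shark", "whale", "dolphin", "dog", "wolf"],
]

g = nx.Graph()
for responses in fluency_lists:
    g.add_edges_from(zip(responses, responses[1:]))   # adjacent responses co-occur

communities = community.greedy_modularity_communities(g)
print("modularity:", round(community.modularity(g, communities), 3))
print("global efficiency:", round(nx.global_efficiency(g), 3))
```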
Subjects
Memory, Semantics, Humans, Neuropsychological Tests
ABSTRACT
OBJECTIVE: This study used the looking-at-nothing phenomenon to explore situation awareness (SA) and the effects of working memory (WM) load in driving situations. BACKGROUND: While driving, people develop a mental representation of the environment. Since errors in retrieving information from this representation can have fatal consequences, it is essential for road safety to investigate this process. During retrieval, people tend to fixate spatial positions of visually encoded information, even if it is no longer available at that location. Previous research has shown that this "looking-at-nothing" behavior can be used to trace retrieval processes. METHOD: In a video-based laboratory experiment with a 2 (WM load) × 3 (SA level) within-subjects design, participants (N = 33) viewed a reduced screen and evaluated auditory statements relating to different SA levels about previously seen dynamic traffic scenarios while their eye movements were recorded. RESULTS: When retrieving information, participants more frequently fixated emptied spatial locations associated with the information relevant to the probed SA level. Retrieval of anticipations (SA level 3), in contrast to the other SA levels, resulted in more frequent gaze transitions that corresponded to the spatial dynamics of future driving behavior. CONCLUSION: The results support the idea that people build a visual-spatial mental image of a driving situation. Different gaze patterns when retrieving level-specific information indicate divergent retrieval processes. APPLICATION: Potential applications include developing new methodologies to objectively assess drivers' mental representations and SA.
Subjects
Comprehension, Eye Movements, Humans, Awareness, Short-Term Memory
ABSTRACT
Eye movements have been examined as an index of attention and comprehension during reading for over 30 years. Although eye-movement measures are acknowledged as reliable indicators of readers' comprehension skill, few studies have analyzed eye-movement patterns using network science. In this study, we offer a new approach to analyzing eye-movement data. Specifically, we recorded visual scanpaths while participants read an expository science text and used them to construct scanpath networks that reflect readers' processing of the text. Results showed that low-ability and high-ability readers' scanpath networks exhibited distinctive properties, reflected in different network metrics including density, centrality, small-worldness, transitivity, and global efficiency. Such patterns provide a new way to show how skilled readers, compared with less skilled readers, process information more efficiently. Implications of our analyses are discussed in light of current theories of reading comprehension.
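A scanpath network of this kind can be sketched by treating fixated words or areas of interest as nodes and fixation-to-fixation transitions as weighted edges, then computing the metrics named above; the fixation sequence below is invented.

```python
# Sketch: turn a fixation sequence over text regions into a scanpath network
# and compute some of the metrics mentioned above. The sequence is invented.
import networkx as nx

fixation_sequence = ["w1", "w2", "w3", "w2", "w4", "w5", "w3", "w6"]  # fixated words/AOIs

g = nx.DiGraph()
for src, dst in zip(fixation_sequence, fixation_sequence[1:]):
    if g.has_edge(src, dst):
        g[src][dst]["weight"] += 1        # repeated transitions increase edge weight
    else:
        g.add_edge(src, dst, weight=1)

ug = g.to_undirected()
print("density:", round(nx.density(g), 3))
print("transitivity:", round(nx.transitivity(ug), 3))
print("global efficiency:", round(nx.global_efficiency(ug), 3))
```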
Subjects
Eye Movements, Reading, Humans, Individuality, Comprehension, Attention
ABSTRACT
BACKGROUND: Model card reports aim to provide informative and transparent descriptions of machine learning models to stakeholders. This type of report is of interest to the National Institutes of Health's Bridge2AI initiative, which aims to address the FAIR challenges of artificial intelligence-based machine learning models for biomedical research. We present our early work developing an ontology for capturing the conceptual-level information embedded in model card reports. RESULTS: Sourcing from existing ontologies and developing a core framework, we generated the Model Card Report Ontology. Our development efforts yielded an OWL2-based artifact that represents and formalizes model card report information. The current release of this ontology utilizes standard concepts and properties from OBO Foundry ontologies, and a software reasoner indicated no logical inconsistencies in the ontology. With sample model cards of machine learning models for bioinformatics research (HIV social networks and adverse outcome prediction for stent implantation), we showed the coverage and usefulness of our model in transforming static model card reports into a computable format for machine-based processing. CONCLUSIONS: The benefit of our work is that it utilizes the expansive, standard terminologies and scientific rigor promoted by biomedical ontologists, and it provides an avenue for making model cards machine-readable using semantic web technology. Our future goal is to assess the veracity of our model and later expand it to include additional concepts to address terminological gaps. We discuss tools and software that will utilize our ontology for potential application services.
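The idea of turning a static model card into a computable artifact can be sketched by mapping a card's fields onto RDF triples; the namespace, property names, and card values below are invented placeholders rather than terms from the ontology itself.

```python
# Sketch: convert a static model card (here a plain dict) into RDF triples.
# The namespace, property names, and card values are invented placeholders,
# not the actual Model Card Report Ontology terms.
from rdflib import Graph, Namespace, RDF, Literal

MCRO = Namespace("http://example.org/mcro-sketch#")
card = {
    "name": "example-outcome-predictor",
    "intended_use": "research only",
    "evaluation_metric": "placeholder metric value",
}

g = Graph()
report = MCRO["report/example-outcome-predictor"]
g.add((report, RDF.type, MCRO.ModelCardReport))
for key, value in card.items():
    g.add((report, MCRO[key], Literal(value)))

print(g.serialize(format="turtle"))
```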