Results 1 - 20 of 44
1.
Sensors (Basel) ; 23(11)2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37300022

ABSTRACT

Fault diagnosis is crucial for repairing aircraft and ensuring their proper functioning. However, as aircraft grow more complex, traditional diagnosis methods that rely on experience are becoming less effective. This paper therefore explores the construction and application of an aircraft fault knowledge graph to improve the efficiency of fault diagnosis for maintenance engineers. First, it analyzes the knowledge elements required for aircraft fault diagnosis and defines the schema layer of a fault knowledge graph. Second, with deep learning as the main method and heuristic rules as an auxiliary, fault knowledge is extracted from structured and unstructured fault data, and a fault knowledge graph for a specific aircraft type is constructed. Finally, a question-answering system based on the fault knowledge graph was developed that can accurately answer questions from maintenance engineers. The practical implementation of the proposed methodology highlights how knowledge graphs provide an effective means of managing aircraft fault knowledge, ultimately helping engineers identify fault roots accurately and quickly.
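As an editorial illustration of the graph-backed question answering this abstract describes, a fault knowledge graph can be reduced to a toy triple store; all entity and relation names below are invented, not taken from the paper:

```python
# Toy fault knowledge graph: (head, relation, tail) triples.
# Names are illustrative; a real system would populate this via
# deep-learning extraction from maintenance records.
TRIPLES = [
    ("hydraulic pump", "has_fault", "pressure loss"),
    ("pressure loss", "caused_by", "seal wear"),
    ("pressure loss", "caused_by", "fluid leak"),
    ("seal wear", "fixed_by", "replace seal"),
]

def query(head, relation):
    """Return all tails linked to `head` by `relation`."""
    return [t for h, r, t in TRIPLES if h == head and r == relation]

def answer_root_cause(symptom):
    """Follow caused_by edges one hop to suggest candidate fault roots."""
    return query(symptom, "caused_by")
```

A question like "what causes pressure loss?" then reduces to `answer_root_cause("pressure loss")`, returning the candidate roots in the graph.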


Subjects
Aircraft , Pattern Recognition, Automated , Engineering , Heuristics , Knowledge
2.
Sensors (Basel) ; 22(6)2022 Mar 17.
Article in English | MEDLINE | ID: mdl-35336486

ABSTRACT

National infrastructure comprises the engineered facilities that provide public services for social production and residents' daily lives: large-scale, complex systems that underpin normal social and economic activity. Owing to difficult data collection, long project periods, complex and poorly secured data, and weak traceability and interoperability, the records management of most national infrastructure still predates the information era. To address these problems, this paper proposes a trusted data storage architecture for national infrastructure based on blockchain. Construction data are collected in real time through sensors and other Internet of Things devices, heterogeneous source data are converted into a unified format according to specific business flows, and the data are promptly stored on the blockchain to ensure security and persistence. Knowledge is then extracted from the on-chain data, and the data of multiple regions or fields are jointly modeled through federated learning; the parameters and results are stored on the chain, and the information of each node is shared to solve the data-interoperability problem.
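The tamper-evident, chained storage step described above can be sketched with a minimal hash-linked log (Python hashlib only; the actual architecture adds IoT ingestion, consensus, and federated learning):

```python
import hashlib
import json

def _hash(block):
    """Deterministic SHA-256 digest of a block's canonical JSON form."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, data):
    """Append a record whose `prev` field commits to the previous block."""
    prev = _hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "data": data})
    return chain

def verify(chain):
    """Recompute every link; any tampered earlier block breaks the chain."""
    return all(chain[i]["prev"] == _hash(chain[i - 1]) for i in range(1, len(chain)))

chain = []
append_block(chain, {"sensor": "strain-01", "value": 0.42})
append_block(chain, {"sensor": "strain-01", "value": 0.43})

valid = verify(chain)
chain[0]["data"]["value"] = 9.9   # tamper with an already-committed record
broken = verify(chain)
```

Tampering with any block that a later block commits to invalidates the chain, which is the persistence and traceability property the architecture relies on.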


Subjects
Blockchain , Information Storage and Retrieval , Computer Security
3.
J Environ Manage ; 324: 116413, 2022 Dec 15.
Article in English | MEDLINE | ID: mdl-36352717

ABSTRACT

Deriving knowledge and learning from past experiences is essential for the successful adoption of Nature-Based Solutions (NBS) as novel integrative solutions that involve many uncertainties. Past experiences in implementing NBS have been collected in a number of repositories; however, it is a major challenge to derive knowledge from the huge amount of information provided by these repositories. This calls for information systems that can facilitate the knowledge extraction process. This paper introduces the NBS Case-Based System (NBS-CBS), an expert system that uses a hybrid architecture to derive information and recommendations from an NBS experience repository. The NBS-CBS combines a 'black-box' artificial neural networks model with a 'white-box' case-based reasoning model to deliver an intelligent, adaptive, and explainable system. Experts have tested this system to assess its functionality and accuracy. Accordingly, the NBS-CBS appears to provide inspirational recommendations and information for the NBS planning and design process.
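The case-based reasoning half of such a hybrid can be illustrated as nearest-case retrieval over past NBS experiences; the features and cases below are invented for illustration:

```python
import math

# Each past case: a feature vector describing the site, plus the NBS applied.
# Features and solutions are illustrative, not from the NBS repository.
CASES = [
    ({"area_ha": 2.0, "rainfall_mm": 800}, "green roof network"),
    ({"area_ha": 50.0, "rainfall_mm": 1500}, "constructed wetland"),
    ({"area_ha": 12.0, "rainfall_mm": 1100}, "urban tree corridor"),
]

def distance(a, b):
    """Euclidean distance over shared numeric features."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def recommend(query):
    """Retrieve the solution of the most similar past case (the 'white-box'
    step: the retrieved case itself explains the recommendation)."""
    best = min(CASES, key=lambda case: distance(query, case[0]))
    return best[1]
```

In the NBS-CBS the retrieval is paired with a neural model; here the nearest case alone stands in for that hybrid.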


Subjects
Conservation of Natural Resources , Expert Systems , Neural Networks, Computer
4.
BMC Med Inform Decis Mak ; 19(Suppl 2): 49, 2019 04 09.
Article in English | MEDLINE | ID: mdl-30961582

ABSTRACT

BACKGROUND: Diabetes has become one of the hot topics in life science research. To support analytical procedures, researchers and analysts expend considerable labor collecting experimental data, a process that is also error-prone. To reduce this cost and ensure data quality, there is a growing trend of extracting clinical events, in the form of knowledge, from electronic medical records (EMRs). To do so, we first need a high-coverage knowledge base (KB) of a specific disease to support these extraction tasks, called KB-based Extraction. METHODS: We propose an approach to build a diabetes-centric knowledge base (DKB) by mining the Web. In particular, we first extract knowledge from the semi-structured contents of vertical portals, fuse individual knowledge from each site, and map it to a unified KB. The target DKB is then extracted from the overall KB using a distance-based Expectation-Maximization (EM) algorithm. RESULTS: In our experiments, we selected eight popular vertical portals in China as data sources to construct the DKB. The final diabetes KB contains 7703 instances and 96,041 edges, covering diseases, symptoms, Western medicines, traditional Chinese medicines, examinations, departments, and body structures. The accuracy of the DKB is 95.91%. Besides assessing the quality of the knowledge extracted from vertical portals, we also carried out detailed experiments evaluating the knowledge fusion performance and the convergence of the distance-based EM algorithm, with positive results. CONCLUSIONS: We introduced an approach to constructing a DKB. A knowledge extraction and fusion pipeline first extracts semi-structured data from vertical portals and fuses the individual KBs into a unified knowledge base; a distance-based Expectation-Maximization algorithm then extracts the subset forming the target DKB. Experiments showed that the data in the DKB are rich and of high quality.
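The subset-extraction step can be sketched as a hard-assignment EM (2-means style) over instance distances to seed diabetes concepts; this is a simplification of the paper's distance-based EM, with toy values:

```python
def em_select(distances, iters=20):
    """Hard-EM sketch: split candidate instances into a 'near' and a 'far'
    cluster by their distance to seed diabetes concepts, keep the near one.
    """
    near, far = min(distances), max(distances)   # initial cluster centres
    for _ in range(iters):
        # E-step: assign each instance to the closer cluster centre.
        assign = [abs(d - near) <= abs(d - far) for d in distances]
        # M-step: re-estimate each centre as the mean of its members.
        near_pts = [d for d, a in zip(distances, assign) if a]
        far_pts = [d for d, a in zip(distances, assign) if not a]
        if near_pts:
            near = sum(near_pts) / len(near_pts)
        if far_pts:
            far = sum(far_pts) / len(far_pts)
    return [i for i, a in enumerate(assign) if a]

# Toy distances of 6 candidate KB instances to the diabetes seed set:
selected = em_select([0.1, 0.2, 0.15, 0.9, 0.85, 0.95])
```

Instances 0-2 fall into the near cluster and would enter the target DKB; the real algorithm works over richer distance measures between KB instances.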


Subjects
Algorithms , Data Mining , Diabetes Mellitus , Internet , Knowledge Bases , China , Electronic Health Records , Humans , Information Storage and Retrieval
5.
BMC Bioinformatics ; 19(Suppl 10): 354, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-30367574

ABSTRACT

BACKGROUND: The rapid growth of Next Generation Sequencing data demands new knowledge extraction methods. In particular, the RNA sequencing gene expression technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning techniques able to extract human-interpretable models composed of genes and their relation to the investigated disease. State-of-the-art rule-based classifiers are designed to extract a single classification model, possibly composed of a few relevant genes. Conversely, we aim to create a large knowledge base composed of many rule-based models, and thus determine which genes could potentially be involved in the analyzed tumor. Such a comprehensive, open-access knowledge base is needed to disseminate novel insights about cancer. RESULTS: We propose CamurWeb, a new method and web-based software able to extract multiple, equivalent classification models in the form of logic formulas ("if-then" rules) and to create a knowledge base of these rules that can be queried and analyzed. The method is based on an iterative classification procedure and an adaptive feature elimination technique that enables the computation of many rule-based models related to the cancer under study. Additionally, CamurWeb includes a user-friendly interface for running the software, querying the results, and managing the performed experiments. Users can create a profile, upload gene expression data, run classification analyses, and interpret the results with predefined queries. To validate the software we applied it to all publicly available RNA sequencing datasets from The Cancer Genome Atlas, obtaining a large open-access knowledge base about cancer. CamurWeb is available at http://bioinformatics.iasi.cnr.it/camurweb .
CONCLUSIONS: The experiments prove the validity of CamurWeb, yielding many classification models and thus several genes associated with 21 different cancer types. The comprehensive knowledge base and the software tool are released online; interested researchers have free access to them for further studies and for designing biological experiments in cancer research.
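CamurWeb's iterative procedure, extracting a rule and then eliminating the genes it uses so that further, equivalent models emerge, can be sketched on toy expression data (gene names, thresholds, and the single-gene rule form are invented simplifications):

```python
def best_rule(samples, labels, excluded):
    """Pick the single gene whose expression threshold best separates labels."""
    genes = [g for g in samples[0] if g not in excluded]
    def accuracy(g):
        preds = [s[g] > 0.5 for s in samples]
        return sum(p == l for p, l in zip(preds, labels)) / len(labels)
    top = max(genes, key=accuracy)
    return top, accuracy(top)

def iterative_models(samples, labels, min_acc=0.75):
    """CamurWeb-style loop (sketch): keep extracting rules, excluding each
    used gene, until accuracy drops below a threshold."""
    rules, excluded = [], set()
    while len(excluded) < len(samples[0]):
        gene, acc = best_rule(samples, labels, excluded)
        if acc < min_acc:
            break
        rules.append(f"IF {gene} > 0.5 THEN tumor")
        excluded.add(gene)
    return rules

# Toy expression profiles (genes g1..g3) for 2 tumor and 2 control samples.
SAMPLES = [{"g1": 0.9, "g2": 0.8, "g3": 0.1}, {"g1": 0.8, "g2": 0.9, "g3": 0.2},
           {"g1": 0.1, "g2": 0.2, "g3": 0.3}, {"g1": 0.2, "g2": 0.1, "g3": 0.9}]
rules = iterative_models(SAMPLES, [True, True, False, False])
```

Here both g1 and g2 yield equivalent perfect rules, so the knowledge base records two candidate tumor-associated genes rather than one.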


Subjects
Gene Expression Regulation, Neoplastic , Knowledge Bases , Neoplasms/genetics , Software , Base Sequence , Genes, Neoplasm , Genome, Human , Humans , Sequence Analysis, RNA
6.
Sci Technol Adv Mater ; 19(1): 649-659, 2018.
Article in English | MEDLINE | ID: mdl-30245757

ABSTRACT

In this study, we develop a computer-aided material design system that represents and extracts knowledge related to material design from natural language texts. A machine learning model is trained on a text corpus weakly labeled by minimal annotated relationship data (~100 labeled relationships) to extract knowledge from scientific articles. The knowledge is represented by relationships between scientific concepts, such as {annealing, grain size, strength}. The extracted relationships are assembled into a knowledge graph formatted as design charts, inspired by process-structure-property-performance (PSPP) reciprocity. A design chart gives an intuitive view of the effect of processes on properties and suggests prospective processes for achieving desired properties. Our system semantically searches the scientific literature and provides knowledge in the form of a design chart, and we hope it contributes to more efficient development of new materials.
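The design-chart idea can be sketched as a small directed graph from processes to structures to properties; the edges below are illustrative, not from the paper's corpus:

```python
# Extracted {process, structure, property} relationships as a directed graph,
# in the spirit of process-structure-property-performance (PSPP) charts.
EDGES = {
    "annealing": ["grain size"],
    "quenching": ["martensite fraction"],
    "grain size": ["strength"],
    "martensite fraction": ["hardness", "strength"],
}

def processes_for(prop):
    """Walk the chart backwards one level: which processes can influence
    `prop` through some intermediate structure?"""
    hits = set()
    for process, structures in EDGES.items():
        for s in structures:
            if prop in EDGES.get(s, []):
                hits.add(process)
    return sorted(hits)
```

Querying `processes_for("strength")` mirrors the "which processes achieve this desired property" use of a design chart.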

7.
Am J Med Genet B Neuropsychiatr Genet ; 177(7): 613-624, 2018 10.
Article in English | MEDLINE | ID: mdl-28862395

ABSTRACT

The heterogeneity of patient phenotype data is an impediment to research into the origins and progression of neuropsychiatric disorders. This difficulty is compounded in the case of rare disorders such as Phelan-McDermid Syndrome (PMS) by the paucity of patient clinical data. PMS is a rare syndromic genetic cause of autism and intellectual disability. In this paper, we describe the Phelan-McDermid Syndrome Data Network (PMS_DN), a platform that facilitates research into phenotype-genotype correlation and the progression of PMS by: a) integrating knowledge of patient phenotypes extracted from Patient Reported Outcomes (PRO) data and clinical notes, two heterogeneous, underutilized sources of knowledge about patient phenotypes, with curated genetic information from the same patient cohort, and b) making this integrated knowledge, along with a suite of statistical tools, available free of charge to authorized investigators on a Web portal https://pmsdn.hms.harvard.edu. PMS_DN is a Patient-Centric Outcomes Research Initiative (PCORI) in which patients and their families are involved in all aspects of the management of patient data driving research into PMS. To foster collaborative research, PMS_DN also makes patient aggregates from this knowledge available to authorized investigators through distributed research networks such as the PCORnet PopMedNet. PMS_DN is hosted on a scalable cloud-based environment and complies with all patient data privacy regulations. As of October 31, 2016, PMS_DN integrates high-quality knowledge extracted from the clinical notes of 112 patients and curated genetic reports of 176 patients with preprocessed PRO data from 415 patients.


Subjects
Data Mining/methods , Genetic Association Studies/methods , Information Storage and Retrieval/methods , Autism Spectrum Disorder/genetics , Chromosome Deletion , Chromosome Disorders/genetics , Chromosome Disorders/physiopathology , Chromosomes, Human, Pair 22/genetics , Cohort Studies , Databases, Genetic , Female , Humans , Intellectual Disability/genetics , Male , Medical Records , Nerve Tissue Proteins/genetics , Patient Reported Outcome Measures , Phenotype
8.
BMC Bioinformatics ; 18(1): 322, 2017 Jun 30.
Article in English | MEDLINE | ID: mdl-28666416

ABSTRACT

BACKGROUND: Current -omics technologies are able to sense the state of a biological sample in a very wide variety of ways. Given the high dimensionality that typically characterises these data, relevant knowledge is often hidden and hard to identify. Machine learning methods, and particularly feature selection algorithms, have proven very effective over the years at identifying small but relevant subsets of variables from a variety of application domains, including -omics data. Many methods exist, with varying trade-offs between the size of the identified variable subsets and the predictive power of such subsets. In this paper we focus on a heuristic for the identification of biomarkers called RGIFE: Rank Guided Iterative Feature Elimination. RGIFE is guided in its biomarker identification process by the information extracted from machine learning models and incorporates several mechanisms to ensure that it creates minimal and highly predictive feature sets. RESULTS: We compare RGIFE against five well-known feature selection algorithms using both synthetic and real (cancer-related transcriptomics) datasets. First, we assess the ability of the methods to identify relevant and highly predictive features. Then, using a prostate cancer dataset as a case study, we examine the biological relevance of the identified biomarkers. CONCLUSIONS: We propose RGIFE, a heuristic for the inference of reduced panels of biomarkers that obtains predictive performance similar to widely adopted feature selection methods while selecting significantly fewer features. Furthermore, in the case study, we show the higher biological relevance of the biomarkers selected by our approach. The RGIFE source code is available at: http://ico2s.org/software/rgife.html .
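A rank-guided elimination loop in the spirit of RGIFE can be sketched as follows; the ranking score and the stand-in classifier are deliberately trivial and not the paper's actual mechanisms:

```python
def score(values, labels):
    """Crude relevance ranking: absolute mean difference between classes."""
    pos = [v for v, l in zip(values, labels) if l]
    neg = [v for v, l in zip(values, labels) if not l]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def rgife_like(data, labels):
    """Rank-guided iterative elimination (sketch): repeatedly drop the
    lowest-ranked feature while a toy classifier keeps its accuracy."""
    def acc(feats):
        # Toy model: threshold each sample's feature sum at the mean sum.
        sums = [sum(data[f][i] for f in feats) for i in range(len(labels))]
        mean = sum(sums) / len(sums)
        return sum((s > mean) == l for s, l in zip(sums, labels)) / len(labels)
    features = list(data)
    while len(features) > 1:
        worst = min(features, key=lambda f: score(data[f], labels))
        trial = [f for f in features if f != worst]
        if acc(trial) < acc(features):
            break                 # removal hurt: keep the current panel
        features = trial
    return features

# Toy dataset: g1 is informative, g2 and g3 are noise.
DATA = {"g1": [1.0, 0.9, 0.1, 0.0], "g2": [0.5, 0.5, 0.5, 0.5],
        "g3": [0.4, 0.6, 0.5, 0.5]}
panel = rgife_like(DATA, [True, True, False, False])
```

The loop shrinks the panel to the single informative feature, illustrating the "minimal and highly predictive" goal.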


Subjects
Algorithms , Biomarkers/analysis , User-Computer Interface , Biomarkers/metabolism , Databases, Factual , Humans , Internet , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/metabolism
9.
BMC Bioinformatics ; 18(1): 6, 2017 Jan 03.
Article in English | MEDLINE | ID: mdl-28049410

ABSTRACT

BACKGROUND: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas (TCGA), a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly Next Generation Sequencing, for more than 30 cancer types. RESULTS: We propose TCGA2BED, a software tool to search and retrieve TCGA data and convert them into the structured BED format for seamless use and integration. It also supports conversion into the CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1, V2) experimental data of TCGA converted into the BED format, with the associated clinical and biospecimen metadata in attribute-value text format. CONCLUSIONS: The availability of the valuable TCGA data in BED format reduces the time needed to take advantage of them: huge amounts of cancer genomic data can be handled integratively, efficiently, and effectively, and searched, retrieved, and extended with additional information. The BED format thus supports investigators in performing knowledge discovery analyses on all tumor types in TCGA, with the final aim of understanding pathological mechanisms and aiding cancer treatments.
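The core conversion can be illustrated by serialising one genomic region into a 6-column BED line (BED is 0-based with half-open intervals); the record fields below only mirror common TCGA-style columns and are not the tool's actual schema:

```python
def to_bed(record):
    """Serialise one genomic region into a tab-separated 6-column BED line.
    BED coordinates are 0-based, half-open: [chromStart, chromEnd).
    """
    return "\t".join([
        record["chrom"],
        str(record["start"]),        # 0-based start
        str(record["end"]),          # exclusive end
        record.get("name", "."),     # feature name, "." if absent
        str(record.get("score", 0)),
        record.get("strand", "."),
    ])

# Illustrative region (a 2-bp interval on chr7):
line = to_bed({"chrom": "chr7", "start": 140453135, "end": 140453137,
               "name": "BRAF_V600E", "score": 0, "strand": "+"})
```

Once every experiment type emits lines in this shape, downstream tools can join heterogeneous TCGA tracks by genomic coordinates alone.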


Subjects
Neoplasms/genetics , User-Computer Interface , DNA Copy Number Variations , DNA Methylation , Databases, Genetic , High-Throughput Nucleotide Sequencing , Humans , Internet , MicroRNAs/chemistry , MicroRNAs/metabolism , Neoplasms/pathology , Sequence Analysis, DNA
10.
Sensors (Basel) ; 17(5)2017 Apr 30.
Article in English | MEDLINE | ID: mdl-28468291

ABSTRACT

The development of the Internet of Things (IoT) has accelerated research in indoor navigation systems, the majority of which rely on adequate wireless signals and sources. Nonetheless, deploying such a system requires periodic site surveys, which are time-consuming and labor-intensive. To address this issue, in this paper we present Canoe, an indoor navigation system designed for shopping mall scenarios. Our system assumes no prior knowledge, such as the floor plan or shop locations, access point placement or power settings, or historical RSS measurements or fingerprints. Instead, Canoe requires only that shop owners collect and publish RSS values at the entrances of their shops; it can then direct a consumer to any of these shops by comparing the observed RSS values. The locations of the consumers and the shops are estimated using maximum likelihood estimation. From these estimates, the direction of the target shop relative to the consumer's current orientation can be computed, determining the direction in which the consumer should move. We have conducted extensive simulations using a real-world dataset. Our experiments in a real shopping mall demonstrate that if 50% of the shops publish their RSS values, Canoe can precisely navigate a consumer within 30 s, with an error rate below 9%.
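The RSS-comparison and direction steps can be sketched as follows; the published RSS vectors, shop coordinates, and the nearest-vector shortcut standing in for maximum likelihood estimation are all illustrative:

```python
import math

# RSS vectors published at shop entrances (dBm per access point) plus shop
# coordinates in metres. All values are invented for illustration.
SHOPS = {
    "coffee": {"rss": [-40, -70, -60], "pos": (0.0, 10.0)},
    "books":  {"rss": [-65, -45, -55], "pos": (12.0, 3.0)},
}

def nearest_shop(observed):
    """Likelihood stand-in: the shop with smallest squared RSS distance."""
    def dist(shop):
        return sum((a - b) ** 2 for a, b in zip(observed, SHOPS[shop]["rss"]))
    return min(SHOPS, key=dist)

def bearing(consumer_pos, shop):
    """Direction the consumer should head, in degrees counterclockwise
    from east."""
    sx, sy = SHOPS[shop]["pos"]
    return math.degrees(math.atan2(sy - consumer_pos[1], sx - consumer_pos[0]))
```

A consumer observing RSS close to the coffee shop's published vector is matched to it, and the bearing from the estimated position gives the direction to walk.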

11.
Inf Process Manag ; 52(1): 129-138, 2016 Jan.
Article in English | MEDLINE | ID: mdl-27065510

ABSTRACT

This paper presents a Web intelligence portal that captures and aggregates news and social media coverage about "Game of Thrones", an American drama television series created for the HBO television network based on George R.R. Martin's series of fantasy novels. The system collects content from the Web sites of Anglo-American news media as well as from four social media platforms: Twitter, Facebook, Google+ and YouTube. An interactive dashboard with trend charts and synchronized visual analytics components not only shows how often Game of Thrones events and characters are being mentioned by journalists and viewers, but also provides a real-time account of concepts that are being associated with the unfolding storyline and each new episode. Positive or negative sentiment is computed automatically, which sheds light on the perception of actors and new plot elements.

12.
J Biomed Inform ; 49: 84-100, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24632080

ABSTRACT

Real-time Obstructive Sleep Apnea (OSA) episode detection and monitoring are important for society in terms of improving the health of the general population and reducing mortality and healthcare costs. Currently, to diagnose OSA, patients undergo PolySomnoGraphy (PSG), a complicated and invasive test involving many sensors and wires that must be performed in a specialized center. Each patient is required to stay in the same position throughout one night, restricting their movements. This paper proposes an easy, cheap, and portable approach for monitoring patients with OSA that collects single-channel ElectroCardioGram (ECG) data only. It is easy from the patient's point of view because only one wearable sensor is required, so the patient is not restricted to keeping the same position all night long, and the detection and monitoring can be carried out anywhere through a mobile device. Our approach is based on the automatic extraction, from a database containing information about the monitored patient, of explicit knowledge in the form of a set of IF…THEN rules containing typical parameters derived from Heart Rate Variability (HRV) analysis. The extraction is carried out off-line by means of a Differential Evolution algorithm. This set of rules can then be exploited in the real-time mobile monitoring system developed at our laboratory: the ECG data is gathered by a wearable sensor and sent to a mobile device, where it is processed in real time. HRV-related parameters are then computed from this data and, if their values activate any of the rules describing the occurrence of OSA, an alarm is automatically produced. This approach has been tested on a well-known literature database of OSA patients. The numerical results show its effectiveness in terms of accuracy, sensitivity, and specificity, and the achieved sets of rules evidence the user-friendliness of the approach. Furthermore, the method is compared against other well-known classifiers, and its discrimination ability is shown to be higher.
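The rule-checking step of such a monitor can be sketched as follows; the HRV parameters and thresholds are invented, not those evolved by the paper's Differential Evolution algorithm:

```python
# IF...THEN rules over HRV-derived parameters, in the spirit of those the
# paper extracts off-line. Parameter names and thresholds are illustrative.
RULES = [
    ("low HF power with depressed SDNN",
     lambda p: p["hf"] < 0.15 and p["sdnn"] < 50),
    ("elevated LF/HF ratio",
     lambda p: p["lf_hf"] > 2.5),
]

def check_epoch(params):
    """Return the names of all fired rules; any hit raises an OSA alarm."""
    return [name for name, cond in RULES if cond(params)]

# One epoch's HRV parameters, as the mobile device would compute them:
alarm = bool(check_epoch({"hf": 0.10, "sdnn": 42, "lf_hf": 1.8}))
```

On the device, each epoch's parameters are streamed through `check_epoch` and an alarm is produced as soon as the list is non-empty.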


Subjects
Automation , Sleep Apnea, Obstructive/physiopathology , Humans , Polysomnography/methods
13.
Knowl Based Syst ; 69: 78-85, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25431524

ABSTRACT

This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following steps: (i) identify ambiguous sentiment terms, (ii) provide context information extracted from a domain-specific training corpus, and (iii) ground this contextual information to structured background knowledge sources such as ConceptNet and WordNet. A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification. Crowdsourced gold standard data in conjunction with a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process.
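The context-grounding idea, an ambiguous term taking its polarity from the domain rather than from the base lexicon alone, can be sketched with a toy lookup (terms and scores are illustrative, not SenticNet values):

```python
# Base polarity lexicon plus domain-context overrides for ambiguous terms,
# in the spirit of the enrichment step. All scores are invented.
BASE = {"great": 1.0, "terrible": -1.0, "cold": -0.2}
CONTEXT = {
    ("cold", "beverages"): 0.6,   # "a cold beer" reads positively
    ("cold", "weather"): -0.6,    # "a cold, wet day" reads negatively
}

def polarity(term, domain=None):
    """Prefer the domain-grounded score; fall back to the base lexicon."""
    return CONTEXT.get((term, domain), BASE.get(term, 0.0))
```

A high-throughput pipeline would apply this lookup per document once the document's domain has been identified, which is where the corpus-driven enrichment pays off.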

14.
J Healthc Inform Res ; 8(1): 158-179, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38273979

ABSTRACT

Recent advancements in natural language processing (NLP), particularly contextual word embedding models, have improved knowledge extraction from biomedical and healthcare texts. However, comprehensive comparisons of these models remain limited. This study conducts a scoping review and compares the performance of the major contextual word embedding models for biomedical knowledge extraction. From 26 articles identified in Scopus, PubMed, PubMed Central, and Google Scholar between 2017 and 2021, 18 notable contextual word embedding models were identified. These include ELMo, BERT, BioBERT, BlueBERT, CancerBERT, DDS-BERT, RuBERT, LABSE, EhrBERT, MedBERT, Clinical BERT, Clinical BioBERT, Discharge Summary BERT, Discharge Summary BioBERT, GPT, GPT-2, GPT-3, and GPT2-Bio-Pt. A case study compared the performance of six representative models (ELMo, BERT, BioBERT, BlueBERT, Clinical BioBERT, and GPT-3) across text classification, named entity recognition, and question answering. The evaluation utilized datasets comprising biomedical text from tweets, NCBI, PubMed, and clinical notes sourced from two electronic health record datasets. Performance metrics, including accuracy and F1 score, were used. The results of this case study reveal that BioBERT performs best in analyzing biomedical text, while Clinical BioBERT excels in analyzing clinical notes. These findings offer crucial insights into word embedding models for researchers, practitioners, and stakeholders utilizing NLP in biomedical and clinical document analysis. Supplementary Information: The online version contains supplementary material available at 10.1007/s41666-023-00157-y.

15.
J Diabetes Sci Technol ; : 19322968241253568, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38767382

ABSTRACT

BACKGROUND: Large language models (LLMs) offer significant potential in medical information extraction but carry risks of generating incorrect information. This study aims to develop and validate a retriever-augmented generation (RAG) model that provides accurate medical knowledge about diabetes and diabetic foot care to laypersons with an eighth-grade literacy level. Improving health literacy through patient education is paramount to addressing the problem of limb loss in the diabetic population. In addition to affecting patient well-being through improved outcomes, improved physician well-being is an important outcome of a self-management model for patient health education. METHODS: We used an RAG architecture and built a question-and-answer artificial intelligence (AI) model to extract knowledge in response to questions pertaining to diabetes and diabetic foot care. We utilized GPT-4 by OpenAI, with Pinecone as a vector database. The NIH National Standards for Diabetes Self-Management Education served as the basis for our knowledge base. The model's outputs were validated through expert review against established guidelines and literature. Fifty-eight keywords were used to select 295 articles and the model was tested against 175 questions across topics. RESULTS: The study demonstrated that with appropriate content volume and few-shot learning prompts, the RAG model achieved 98% accuracy, confirming its capability to offer user-friendly and comprehensible medical information. CONCLUSION: The RAG model represents a promising tool for delivering reliable medical knowledge to the public which can be used for self-education and self-management for diabetes, highlighting the importance of content validation and innovative prompt engineering in AI applications.
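The retrieval step of a RAG pipeline can be sketched with a toy bag-of-words retriever; the real system uses Pinecone embeddings and GPT-4, and the passages below are invented stand-ins for the vetted education corpus:

```python
import math
import re
from collections import Counter

# Tiny knowledge base standing in for the validated diabetes-education corpus.
DOCS = [
    "Inspect your feet daily for cuts blisters and redness",
    "Check blood glucose before and after exercise",
]

def vec(text):
    """Bag-of-words term counts (a stand-in for dense embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question):
    """Retrieval step of RAG: return the most similar passage, which would
    then be placed in the LLM prompt to ground the generated answer."""
    return max(DOCS, key=lambda d: cosine(vec(question), vec(d)))
```

Grounding the prompt in the retrieved passage is what constrains the model to validated content, which is the accuracy lever the study measures.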

16.
J Biomed Semantics ; 14(1): 12, 2023 08 31.
Article in English | MEDLINE | ID: mdl-37653549

ABSTRACT

BACKGROUND: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities. 
RESULTS: In Cyrus, data transparency includes ten dimensions, grouped into two categories. In this paper, six of these dimensions, i.e., provenance, interpretability, understandability, licensing, availability, and interlinking, have been evaluated automatically for three state-of-the-art OKE systems, using state-of-the-art metrics and tools. Covid-on-the-Web is identified as having the highest mean transparency. CONCLUSIONS: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates, for the first time, how to perform automated transparency evaluation by combining existing FAIRness and linked data quality assessment tools. We show that state-of-the-art OKE systems vary in the transparency of the linked data they generate and that these differences can be automatically quantified, leading to potential applications in trustworthy AI, compliance, data protection, data governance, and future OKE system design and testing.


Subjects
Artificial Intelligence , COVID-19 , Humans , Semantic Web
17.
Comput Methods Programs Biomed ; 235: 107536, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37060685

ABSTRACT

BACKGROUND AND OBJECTIVE: This paper focuses on nutritional recommendation systems (RS), i.e., AI-powered automatic systems providing users with suggestions about what to eat to pursue their weight and body-shape goals. A trade-off among (potentially) conflicting requirements must be taken into account when designing these kinds of systems, including: (i) adherence to experts' prescriptions, (ii) adherence to users' tastes and preferences, and (iii) explainability of the whole recommendation process. Accordingly, in this paper we propose a novel approach to the engineering of nutritional RS, combining machine learning and symbolic knowledge extraction to profile users, hence harmonising the aforementioned requirements. METHODS: Our contribution focuses on the data processing workflow. Starting from neural networks (NN) trained to predict user preferences, we use CART (Breiman et al., 1984) to extract symbolic rules in Prolog (Körner et al., 2022) form, and we combine them with expert prescriptions brought into similar form. We can then query the resulting symbolic knowledge base via logic solvers to draw explainable recommendations. RESULTS: Experiments were performed involving a publicly available dataset of 45,723 recipes, plus 12 synthetic datasets about as many imaginary users, and 6 experts' prescriptions. Fully connected 4-layered NN are trained on those datasets, reaching ∼86% test-set accuracy on average. Extracted rules, in turn, have ∼80% fidelity w.r.t. those NN. The resulting recommendation system has a test-set precision of ∼74%. The symbolic approach makes it possible to trace how the system draws recommendations. CONCLUSIONS: Thanks to our approach, intelligent agents may learn users' preferences from data, convert them into symbolic form, and extend them with experts' goal-directed prescriptions. The resulting recommendations are simultaneously acceptable to the end user and adequate from a nutritional perspective, while the whole recommendation-generation process is made explainable.
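The rule-translation step, one decision-tree path becoming a Prolog clause, can be sketched as string assembly; the predicate names and thresholds are invented, not those of the paper's extracted rules:

```python
def path_to_prolog(head, path):
    """Translate one CART-style decision path into a Prolog clause string.

    `path` is a list of (feature, operator, threshold) tests along a branch;
    each feature becomes a predicate binding a fresh variable for the user U.
    """
    body = ", ".join(f"{feat}(U, V{i}), V{i} {op} {thr}"
                     for i, (feat, op, thr) in enumerate(path))
    return f"{head}(U) :- {body}."

# Illustrative branch: low-calorie preference and a taste for greens.
clause = path_to_prolog("recommend_salad",
                        [("calorie_pref", "<", 600), ("likes_greens", ">=", 1)])
```

Clauses assembled this way can be loaded into a Prolog knowledge base alongside expert prescriptions and queried by a logic solver, which is what makes each recommendation traceable.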


Subjects
Algorithms , Artificial Intelligence , Neural Networks, Computer , Machine Learning , Knowledge Bases
18.
JMIR AI ; 2: e44835, 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-38875570

ABSTRACT

BACKGROUND: With the growing volume and complexity of laboratory repositories, it has become tedious to parse unstructured data into structured and tabulated formats for secondary uses such as decision support, quality assurance, and outcome analysis. However, advances in natural language processing (NLP) approaches have enabled efficient and automated extraction of clinically meaningful medical concepts from unstructured reports.

OBJECTIVE: In this study, we aimed to determine the feasibility of using an NLP model for information extraction as an alternative to a time-consuming and operationally resource-intensive handcrafted rule-based tool. We therefore sought to develop and evaluate a deep learning-based NLP model to derive knowledge and extract information from text-based laboratory reports sourced from a provincial laboratory repository system.

METHODS: The NLP model, a hierarchical multilabel classifier, was trained on a corpus of laboratory reports covering testing for 14 different respiratory viruses and viral subtypes. The corpus includes 87,500 unique laboratory reports annotated by 8 subject matter experts (SMEs). The classification task involved assigning the laboratory reports to labels at 2 levels: 24 fine-grained labels in level 1 and 6 coarse-grained labels in level 2. A "label" refers to the status of a specific virus or strain being tested or detected (eg, influenza A is detected). The model's performance stability and variation were analyzed across all labels, and its generalizability was evaluated internally and externally on various test sets.

RESULTS: Overall, the NLP model performed well on internal, out-of-time (pre-COVID-19), and external (different laboratories) test sets, with microaveraged F1-scores >94% across all classes. Higher precision and recall scores with less variability were observed for the internal and pre-COVID-19 test sets. As expected, the model's performance varied across categories and virus types due to the imbalanced nature of the corpus and the sample sizes per class. There were intrinsically fewer reports of viruses being detected than being tested; consequently, the model's performance was noticeably lower for the detected cases (lowest F1-score of 57%).

CONCLUSIONS: We demonstrated that deep learning-based NLP models are promising solutions for information extraction from text-based laboratory reports. These approaches enable scalable, timely, and practical access to high-quality, encoded laboratory data if integrated into laboratory information system repositories.
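The headline metric in this abstract is the microaveraged F1-score. As a reading aid, here is a minimal sketch of how that metric pools errors across a multilabel task; the virus/status labels below are hypothetical stand-ins for the study's 24 fine-grained labels, not its actual label set:

```python
# Micro-averaged F1 for a multilabel task: pool true positives, false
# positives, and false negatives across ALL labels, then compute one F1.
def micro_f1(true_sets, pred_sets):
    tp = fp = fn = 0
    for t, p in zip(true_sets, pred_sets):
        tp += len(t & p)   # labels correctly assigned
        fp += len(p - t)   # labels assigned but not annotated
        fn += len(t - p)   # annotated labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-report label sets (virus + detection status).
true_labels = [{"fluA:detected"}, {"fluA:not_detected", "rsv:detected"}]
pred_labels = [{"fluA:detected"}, {"fluA:not_detected"}]
print(micro_f1(true_labels, pred_labels))
```

Micro-averaging weights every label decision equally, which is consistent with the abstract's observation: a pooled score can stay above 94% even while rare classes (the less frequent "detected" outcomes) score as low as 57% individually.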

19.
Stud Health Technol Inform ; 305: 97-101, 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37386967

ABSTRACT

Currently, there is very little research aimed at developing medical knowledge extraction tools for major West Slavic languages (Czech, Polish, and Slovak). This project lays the groundwork for a general medical knowledge extraction pipeline, introducing the resource vocabularies available for the respective languages (UMLS resources, ICD-10 translations and national drug databases). It demonstrates the utility of this approach on a case study using a large proprietary corpus of Czech oncology records consisting of more than 40 million words written about more than 4,000 patients. After correlating MedDRA terms found in patients' records with drugs prescribed to them, significant non-obvious associations were found between selected medical conditions being mentioned and the probability of certain drugs being prescribed over the course of the patient's treatment, in some cases increasing the probability of prescriptions by over 250%. This direction of research, producing large amounts of annotated data, is a prerequisite for training deep learning models and predictive systems.
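The association the authors describe, a condition mention raising the probability of a later prescription, amounts to a lift statistic. A minimal sketch over hypothetical patient records (the actual study correlated MedDRA terms with national drug-database entries across 4,000+ patients); an increase of 250% corresponds to a lift of 3.5:

```python
# Lift of a drug prescription given a condition mention:
#   lift = P(drug prescribed | condition mentioned) / P(drug prescribed)
def prescription_lift(patients, condition, drug):
    with_cond = [p for p in patients if condition in p["mentions"]]
    p_drug = sum(drug in p["drugs"] for p in patients) / len(patients)
    p_drug_given_cond = sum(drug in p["drugs"] for p in with_cond) / len(with_cond)
    return p_drug_given_cond / p_drug

# Hypothetical records: MedDRA-style condition mentions and prescribed drugs.
patients = [
    {"mentions": {"nausea"}, "drugs": {"ondansetron"}},
    {"mentions": {"nausea"}, "drugs": {"ondansetron"}},
    {"mentions": set(),      "drugs": set()},
    {"mentions": set(),      "drugs": {"ondansetron"}},
]
print(prescription_lift(patients, "nausea", "ondansetron"))
```

In practice one would also test such associations for statistical significance and adjust for confounders; this sketch only shows the quantity being measured.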


Subjects
Pharmaceutical Databases, Language, Humans, International Classification of Diseases, Knowledge, Medical Oncology
20.
Chemosphere ; 308(Pt 1): 136248, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36057344

ABSTRACT

In this study, gradient boosted regression trees (GBRT) are applied, for the first time, to identify the governing factors for methylene blue (MB) adsorption on a variety of adsorbents: clay minerals such as kaolinite and sepiolite, the industrial wastes red mud and fly ash, and alkali-activated materials synthesized from these raw materials. The dataset was constructed from electronic databases such as ScienceDirect, Scopus, Elsevier, and Google, covering experimental studies published between 2005 and 2022. The final dataset captured experimental conditions such as adsorbent type, adsorbent properties (surface characteristics, density, and chemical modifications), pH of the medium, adsorbent dosage, and temperature; it comprised 914 datapoints extracted from 75 papers (out of ∼1360 initially screened). Among the parameters considered, initial adsorbate concentration was found to be the most dominant factor affecting MB uptake. The pH of the solution medium, the choice of raw material, and the modification type were also significant for MB adsorption. In terms of raw material and modification type, sepiolite together with chemical (acid and/or alkaline) and thermal treatments emerged as the strongest candidates for enhanced MB adsorption performance. Modifications applied to adsorbents should be evaluated case by case, as no general rule holds across all experimental conditions, and the strength of a modification's contribution also depends on the initial adsorbate concentration. Applying various imputation methods highlighted the importance of reporting experimental factors, such as surface area, in the literature. Finally, the range of applicability of the suggested modeling procedure was assessed to help experimenters test MB uptake under novel experimental conditions.
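As a sketch of the modeling approach only (not the study's implementation), the following is a self-contained gradient-boosted regression on depth-1 trees, with split counts as a crude proxy for feature importance. The data are hypothetical: feature 0 plays the role of the dominant initial-concentration variable, feature 1 a weaker condition such as temperature:

```python
# Toy gradient boosting for regression: squared loss, regression stumps.
def fit_stump(X, residuals):
    best = None  # (error, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    _, j, thr, lm, rm = best
    return (lambda row: lm if row[j] <= thr else rm), j

def gbrt_fit(X, y, n_rounds=30, lr=0.3):
    base = sum(y) / len(y)
    stumps, split_counts = [], [0] * len(X[0])
    preds = [base] * len(y)
    for _ in range(n_rounds):
        # Each round fits a stump to the current residuals (the gradient
        # of squared loss) and adds it with shrinkage lr.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump, j = fit_stump(X, residuals)
        stumps.append(stump)
        split_counts[j] += 1
        preds = [p + lr * stump(row) for p, row in zip(preds, X)]
    return (lambda row: base + lr * sum(s(row) for s in stumps)), split_counts

# Hypothetical data: uptake y grows with feature 0 (initial concentration).
X = [[10, 5], [20, 5], [30, 7], [40, 7], [50, 9], [60, 9]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
model, split_counts = gbrt_fit(X, y)
print(split_counts)  # feature 0 should dominate the splits
```

Production GBRT libraries add subsampling, regularization, and deeper trees, and report importance via impurity reduction rather than raw split counts, but the boosting loop above is the core idea.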


Subjects
Methylene Blue, Chemical Water Pollutants, Adsorption, Alkalies, Clay, Coal Ash, Hydrogen-Ion Concentration, Industrial Waste, Kaolin, Kinetics, Magnesium Silicates, Methylene Blue/chemistry