Results 1 - 20 of 53
1.
Nature ; 587(7833): 240-245, 2020 11.
Article in English | MEDLINE | ID: mdl-33177664

ABSTRACT

The Zoonomia Project is investigating the genomics of shared and specialized traits in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome alignment of 240 species of considerable phylogenetic diversity, comprising representatives from more than 80% of mammalian families. We find that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity.


Subject(s)
Conservation of Natural Resources , Eutheria/classification , Eutheria/genetics , Genetic Variation , Genomics/methods , Knowledge Discovery , Animals , Biodiversity , Biomedical Research , Conservation of Natural Resources/methods , Evolution, Molecular , Extinction, Biological , Genetic Speciation , Humans , Infections , Knowledge Discovery/methods , Loss of Heterozygosity , Neoplasms , Phylogeny , Risk Assessment , Selection, Genetic , Sequence Alignment , Species Specificity , Venoms
2.
BMC Bioinformatics ; 25(1): 273, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169321

ABSTRACT

BACKGROUND: There has been considerable advancement in AI technologies such as large language models (LLMs) and machine learning to support biomedical knowledge discovery. MAIN BODY: We propose a novel biomedical neural search service called 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search on unstructured text such as PubMed. It mainly handles information related to chemical compounds/drugs, genes/proteins, diseases, and their interactions (chemical compound/drug-protein/gene interactions, including drug-target, drug-drug, and drug-disease). To provide comprehensive knowledge, the system offers four search options: basic search, entity search, interaction search, and natural language search. We employ T5slim_dec, which adapts the autoregressive generation task of the T5 (text-to-text transfer transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. It also assists in interpreting research findings by summarizing the retrieved search results for a given natural language query with Retrieval Augmented Generation (RAG). The search engine is built with a hybrid method that combines neural search with the probabilistic search method BM25. CONCLUSION: As a result, our system can better understand the context, semantics and relationships between terms within the document, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service to access and discover relevant knowledge.
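
The abstract does not say how the neural and BM25 scores are combined. The sketch below shows one common option, assumed here purely for illustration: score documents with BM25, min-max normalise both score lists, and take a weighted sum. The `dense` values are hypothetical stand-ins for similarities from a neural encoder, and the names `bm25_scores`, `min_max`, `hybrid_scores` and the weight `alpha` are not taken from the paper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def hybrid_scores(sparse, dense, alpha=0.5):
    """Weighted sum of normalised BM25 and dense scores (assumed fusion rule)."""
    return [alpha * s + (1 - alpha) * d
            for s, d in zip(min_max(sparse), min_max(dense))]

docs = [
    "aspirin inhibits cox enzymes",
    "metformin lowers blood glucose",
    "aspirin reduces fever",
]
bm25 = bm25_scores("aspirin cox", docs)
dense = [0.7, 0.9, 0.1]  # hypothetical neural similarities, not real model output
hybrid = hybrid_scores(bm25, dense)
best = max(range(len(docs)), key=hybrid.__getitem__)
```

Setting `alpha` closer to 1 favours exact term matches; closer to 0 favours the dense signal.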


Subject(s)
Natural Language Processing , Data Mining/methods , Knowledge Discovery/methods , PubMed , Search Engine , Machine Learning , Information Storage and Retrieval/methods , Neural Networks, Computer
3.
J Biomed Inform ; 143: 104362, 2023 07.
Article in English | MEDLINE | ID: mdl-37146741

ABSTRACT

Scientific literature presents a wealth of information yet to be explored. As the number of researchers increases with each passing year and more publications are released, specialized fields of research are becoming more prevalent. As this trend continues, it further separates interdisciplinary publications and makes keeping up to date with the literature a laborious task. Literature-based discovery (LBD) aims to mitigate these concerns by promoting information sharing among non-interacting literature while extracting potentially meaningful information. Furthermore, recent advances in neural network architectures and data representation techniques have fueled their respective research communities in achieving state-of-the-art performance in many downstream tasks. However, neural network-based methods for LBD remain largely unexplored. We introduce and explore a deep learning neural network-based approach for LBD. Additionally, we investigate various approaches to represent terms as concepts and analyze the effect of feature scaling on the representations fed into our model. We compare the evaluation performance of our method on five hallmarks of cancer datasets utilized for closed discovery. Our results show that the representation chosen as input to our model affects evaluation performance. We found that feature scaling our input representations increases evaluation performance and decreases the number of epochs needed to achieve model generalization. We also explore two approaches to represent model output. We found that reducing the model's output to capture a subset of concepts improved evaluation performance at the cost of model generalizability. We also compare the efficacy of our method on the five hallmarks of cancer datasets to a set of randomly chosen relations between concepts. These experiments confirm our method's suitability for LBD.


Subject(s)
Deep Learning , Neoplasms , Humans , Neural Networks, Computer , Knowledge Discovery/methods , Publications
4.
J Biomed Inform ; 142: 104383, 2023 06.
Article in English | MEDLINE | ID: mdl-37196989

ABSTRACT

OBJECTIVE: To demonstrate and develop an approach enabling individual researchers or small teams to create their own ad-hoc, lightweight knowledge bases tailored for specialized scientific interests, using text-mining over scientific literature, and demonstrate the effectiveness of these knowledge bases in hypothesis generation and literature-based discovery (LBD). METHODS: We propose a lightweight process using an extractive search framework to create ad-hoc knowledge bases, which require minimal training and no background in bio-curation or computer science. These knowledge bases are particularly effective for LBD and hypothesis generation using Swanson's ABC method. The personalized nature of the knowledge bases allows for a somewhat higher level of noise than "public facing" ones, as researchers are expected to have prior domain experience to separate signal from noise. Fact verification is shifted from exhaustive verification of the knowledge base to post-hoc verification of specific entries of interest, allowing researchers to assess the correctness of relevant knowledge base entries by considering the paragraphs in which the facts were introduced. RESULTS: We demonstrate the methodology by constructing several knowledge bases of different kinds: three knowledge bases that support lab-internal hypothesis generation: Drug Delivery to Ovarian Tumors (DDOT); Tissue Engineering and Regeneration; Challenges in Cancer Research; and an additional comprehensive, accurate knowledge base designated as a public resource for the wider community on the topic of Cell Specific Drug Delivery (CSDD). In each case, we show the design and construction process, along with relevant visualizations for data exploration, and hypothesis generation. For CSDD and DDOT we also show meta-analysis, human evaluation, and in vitro experimental evaluation. 
CONCLUSION: Our approach enables researchers to create personalized, lightweight knowledge bases for specialized scientific interests, effectively facilitating hypothesis generation and literature-based discovery (LBD). By shifting fact verification efforts to post-hoc verification of specific entries, researchers can focus on exploring and generating hypotheses based on their expertise. The constructed knowledge bases demonstrate the versatility and adaptability of our approach to versatile research interests. The web-based platform, available at https://spike-kbc.apps.allenai.org, provides researchers with a valuable tool for rapid construction of knowledge bases tailored to their needs.


Subject(s)
Data Mining , Knowledge Discovery , Humans , Data Mining/methods , Knowledge Discovery/methods , Publications
5.
J Biomed Inform ; 145: 104464, 2023 09.
Article in English | MEDLINE | ID: mdl-37541406

ABSTRACT

OBJECTIVE: We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. METHODS: We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. RESULTS: We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. 
A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. CONCLUSION: Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. AVAILABILITY: Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.
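
Two points in this abstract can be made concrete with a small sketch. First, a time-sliced split with a prediction window: edges first seen up to a cutoff year form the training graph, and edges first seen within the window after the cutoff form the test set. Second, the reported graph density. Both functions are illustrative assumptions rather than the authors' code; the edge tuples and node names are hypothetical, and the density calculation assumes a simple undirected graph with the rounded counts from the abstract.

```python
def time_slice(edges, cutoff_year, window):
    """Split (u, v, first_year) edges into training pairs and future test pairs."""
    train = {(u, v) for u, v, year in edges if year <= cutoff_year}
    test = {
        (u, v)
        for u, v, year in edges
        if cutoff_year < year <= cutoff_year + window and (u, v) not in train
    }
    return train, test

def graph_density(num_nodes, num_edges):
    """Density of a simple undirected graph: edges present / edges possible."""
    possible = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible

# Hypothetical toy edges: (concept, concept, year the relation first appears).
edges = [
    ("gene1", "phenotype1", 1995),
    ("gene1", "gene2", 2001),
    ("gene2", "phenotype1", 2003),
    ("gene3", "phenotype2", 2012),
]
train, short_test = time_slice(edges, cutoff_year=2000, window=5)
_, long_test = time_slice(edges, cutoff_year=2000, window=15)

# Rounded figures from the abstract: about 11 k nodes and 394 k edges.
density = graph_density(11_000, 394_000)
```

With the longer window the test set grows from two to three pairs, which shows how the choice of window changes what counts as a correct prediction. Under the stated assumptions the density comes out at roughly 0.65%, consistent with the "less than 1%" quoted above.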


Subject(s)
Alzheimer Disease , Knowledge Discovery , Humans , Knowledge Discovery/methods , Alzheimer Disease/diagnosis , Neural Networks, Computer , Learning , Phenotype
8.
J Community Psychol ; 49(6): 1718-1731, 2021 08.
Article in English | MEDLINE | ID: mdl-34004017

ABSTRACT

Large amounts of text-based data, like study abstracts, often go unanalyzed because the task is laborious. Natural language processing (NLP) uses computer-based algorithms not traditionally implemented in community psychology to effectively and efficiently process text. These methods include examining the frequency of words and phrases, the clustering of topics, and the interrelationships of words. This article applied NLP to explore the concept of equity in community psychology. The COVID-19 crisis has made pre-existing health equity gaps even more salient. Community psychology has a specific interest in working with organizations, systems, and communities to address social determinants that perpetuate inequities by refocusing interventions around achieving health and wellness for all. This article examines how community psychology has discussed equity thus far to identify strengths and gaps for future research and practice. The results showed the prominence of community-based participatory research and the diversity of settings researchers work in. However, the total number of abstracts with equity concepts was lower than expected, which suggests there is a need for a continued focus on equity.


Subject(s)
Community Psychiatry/methods , Community-Based Participatory Research/methods , Health Equity/statistics & numerical data , Knowledge Discovery/methods , Natural Language Processing , Social Determinants of Health/statistics & numerical data , Humans , Periodicals as Topic
9.
Adv Exp Med Biol ; 1194: 181-191, 2020.
Article in English | MEDLINE | ID: mdl-32468534

ABSTRACT

The exponential growth of the number and variety of IoT devices and applications for personal use, as well as the improvement of their quality and performance, facilitates the realization of intelligent eHealth concepts. Nowadays, it is easier than ever for individuals to monitor themselves, quantify, and log their everyday activities in order to gain insights about their body's performance and receive recommendations and incentives to improve it. Of course, in order for such systems to live up to the promise, given the treasure trove of data that is collected, machine learning techniques need to be integrated in the processing and analysis of the data. This systematic and automated quantification, logging, and analysis of personal data, using IoT and AI technologies, have given birth to the phenomenon of Quantified-Self. This work proposes a prototype decentralized Quantified-Self application, built on top of a dedicated IoT gateway that aggregates and analyzes data from multiple sources, such as biosignal sensors and wearables, and performs analytics on it.


Subject(s)
Knowledge Discovery , Monitoring, Physiologic , Fitness Trackers/standards , Fitness Trackers/trends , Humans , Knowledge Discovery/methods , Machine Learning , Monitoring, Physiologic/instrumentation , Monitoring, Physiologic/methods , Telemedicine
10.
Int J Health Care Qual Assur ; 33(2): 221-234, 2020 Feb 12.
Article in English | MEDLINE | ID: mdl-32233355

ABSTRACT

PURPOSE: Incident reporting systems are commonly deployed in healthcare but resulting datasets are largely warehoused. This study explores if intelligence from such datasets could be used to improve quality, efficiency, and safety. DESIGN/METHODOLOGY/APPROACH: Incident reporting data recorded in one NHS acute Trust was mined for insight (n = 133,893 April 2005-July 2016 across 201 fields, 26,912,493 items). An a priori dataset was overlaid consisting of staffing, vital signs, and national safety indicators such as falls. Analysis was primarily nonlinear statistical approaches using Mathematica V11. FINDINGS: The organization developed a deeper understanding of the use of incident reporting systems both in terms of usability and possible reflection of culture. Signals emerged which focused areas of improvement or risk. An example of this is a deeper understanding of the timing and staffing levels associated with falls. Insight into the nature and grading of reporting was also gained. PRACTICAL IMPLICATIONS: Healthcare incident reporting data is underused and with a small amount of analysis can provide real insight and application to patient safety. ORIGINALITY/VALUE: This study shows that insight can be gained by mining incident reporting datasets, particularly when integrated with other routinely collected data.


Subject(s)
Data Mining/methods , Knowledge Discovery/methods , Risk Management/methods , Humans , Patient Safety , Quality of Health Care/organization & administration , Safety Management , State Medicine , United Kingdom
11.
Med Humanit ; 46(3): 204-213, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31611283

ABSTRACT

This paper reviews the literature on health and female homosexuality in Brazil and, along the way, outlines an alternative approach to reviewing academic literature. Rather than summarising the contents of previously published papers, we relate to these publications primarily as partakers in the creation of knowledge. Inspired by Actor-Network Theory (ANT), we apply ethnographic methods to understand the papers as study participants endowed with action. We also draw on the notions of inscription and intertextuality to trace the complex relationship between the findings in the articles and the realities outside of them. We claim that 'evidence' is the product of translational processes in which original events, such as experiments, blood tests and interviews, are changed into textual entities. In addition, text production is seen as an absorption of everything else surrounding its creation. When events are turned into articles, the text incorporates the political environment to which original events once belonged. We thus observe a political text inscribed into the written evidence of sexually transmitted infections, and the practice of publishing about scientific vulnerabilities emerges as political action. In contrast with traditional ways of reviewing literature in medical scholarship, this article offers a reminder that although there is a connection between textual evidence and the reality outside publications, these dimensions are not neutrally interchangeable.


Subject(s)
Anthropology, Cultural/methods , Homosexuality, Female , Knowledge Discovery/methods , Brazil , Female , Humans
12.
Nurs Philos ; 21(3): e12309, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32537914

ABSTRACT

To revitalize nursing science, there is a need for a new approach to guide nurse scientists in addressing complex problems in health care. By applying theoretical concepts from a revolutionary philosopher of science, Paul K. Feyerabend, new nursing knowledge can be produced using creativity and pluralistic approaches. Feyerabend proposed that methods within and outside of science can produce knowledge. Despite the recognition of Feyerabendian philosophy within science, there is currently a lack of literature regarding the relevance of Feyerabendian philosophy for nursing science. We aim to (a) describe and critique Feyerabendian concepts, (b) discuss the potential application of Feyerabendian philosophy for knowledge production within gerontological nursing and (c) describe theoretical possibilities for nurse scientists in using Feyerabendian philosophy to guide nursing knowledge development. We begin by introducing Feyerabend's life and his inspirations for his theoretical concepts, epistemological anarchism, theoretical pluralism and humanitarianism, and conclude by offering suggestions of how to apply Feyerabendian philosophy in nursing research.


Subject(s)
Knowledge Discovery/methods , Nursing/methods , Philosophy , Humans , Nursing/trends
13.
J Biomed Inform ; 94: 103172, 2019 06.
Article in English | MEDLINE | ID: mdl-30965136

ABSTRACT

This paper presents and describes the eHealth-KD corpus. The corpus is a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content, without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the process of annotation and provides key metrics of the corpus. Finally, three baseline implementations, which are supported by machine learning models, were designed to consider the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018) and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.


Subject(s)
Knowledge Discovery/methods , Telemedicine , Data Mining , Semantics , Spain
14.
J Biomed Inform ; 74: 20-32, 2017 10.
Article in English | MEDLINE | ID: mdl-28838802

ABSTRACT

OBJECTIVES: This paper provides an introduction and overview of literature based discovery (LBD) in the biomedical domain. It introduces the reader to modern and historical LBD models, key system components, evaluation methodologies, and current trends. After completion, the reader will be familiar with the challenges and methodologies of LBD. The reader will be capable of distinguishing between recent LBD systems and publications, and be capable of designing an LBD system for a specific application. TARGET AUDIENCE: From biomedical researchers curious about LBD, to someone looking to design an LBD system, to an LBD expert trying to catch up on trends in the field. The reader need not be familiar with LBD, but knowledge of biomedical text processing tools is helpful. SCOPE: This paper describes a unifying framework for LBD systems. Within this framework, different models and methods are presented to both distinguish and show overlap between systems. Topics include term and document representation, system components, and an overview of models including co-occurrence models, semantic models, and distributional models. Other topics include uninformative term filtering, term ranking, results display, system evaluation, an overview of the application areas of drug development, drug repurposing, and adverse drug event prediction, and challenges and future directions. A timeline showing contributions to LBD, and a table summarizing the works of several authors is provided. Topics are presented from a high level perspective. References are given if more detailed analysis is required.


Subject(s)
Knowledge Discovery/methods , Models, Theoretical , Algorithms , Data Mining
15.
BMC Med Inform Decis Mak ; 17(Suppl 1): 55, 2017 May 18.
Article in English | MEDLINE | ID: mdl-28539121

ABSTRACT

BACKGROUND: Although drug discovery can provide meaningful insights and significant advances in the pharmaceutical field, it is lengthy and costly, and its success rate is low. To circumvent this problem, there has been increased interest in 'drug repositioning', which searches for already approved drugs that have high potential efficacy when applied to other diseases. To increase the success rate of drug repositioning, one considers stepwise screening and experiments based on biological reactions. Given the number of drugs and diseases, however, this one-by-one procedure may be time-consuming and expensive. METHODS: In this study, we propose a machine learning based approach for efficiently selecting candidate diseases and drugs. We assume that if two diseases are similar, then a drug for one disease can be effective against the other disease too. We first construct two disease networks: one from disease-protein associations and the other from disease-drug information. If the two networks are dissimilar, in the sense that the edge distributions of a disease node differ, this indicates high potential for repositioning new candidate drugs for that disease. The Kullback-Leibler divergence is employed to measure the difference between connections in the two constructed disease networks. Lastly, we reposition drugs for the top 20% of ranked diseases. RESULTS: The F-measure of the proposed method was 0.75, outperforming the 0.5 obtained by greedy search over all diseases. To demonstrate its utility, the method was applied to dementia and achieved 75% accuracy for repositioned drugs, assuming that no drugs were known for dementia. CONCLUSION: The novelty of this research is that it discovers drugs with high repositioning potential from disease networks using a quantitative measure. The study is expected to yield insights into as-yet-undiscovered drug repositioning opportunities.
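
The ranking step in the METHODS can be illustrated with a toy calculation. The sketch below is an assumption-laden reading of the abstract, not the authors' implementation: each disease's edge weights in the two networks are normalised into probability distributions over the other diseases, and diseases are ranked by the Kullback-Leibler divergence between the two. The networks, weights, smoothing constant `eps` and function names are all hypothetical.

```python
import math

def edge_distribution(weights, nodes, eps=1e-9):
    """Normalise one disease's edge weights over a shared node list.

    eps is a small smoothing constant (an assumption) so that a missing
    edge does not lead to log(0) or division by zero.
    """
    smoothed = [weights.get(n, 0.0) + eps for n in nodes]
    total = sum(smoothed)
    return [w / total for w in smoothed]

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical toy networks: disease -> {neighbouring disease: edge weight}.
protein_net = {
    "D1": {"D2": 3.0, "D3": 1.0},
    "D2": {"D1": 3.0, "D3": 2.0},
    "D3": {"D1": 1.0, "D2": 2.0},
}
drug_net = {
    "D1": {"D2": 1.0, "D3": 3.0},
    "D2": {"D1": 1.0, "D3": 2.0},
    "D3": {"D1": 3.0, "D2": 2.0},
}

diseases = sorted(protein_net)
divergence = {}
for d in diseases:
    others = [n for n in diseases if n != d]
    p = edge_distribution(protein_net[d], others)
    q = edge_distribution(drug_net[d], others)
    divergence[d] = kl_divergence(p, q)

# Diseases whose two edge distributions differ most come first.
ranked = sorted(divergence, key=divergence.get, reverse=True)
```

Note that KL divergence is asymmetric, so the direction chosen here (protein network as P, drug network as Q) is itself an assumption.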


Subject(s)
Drug Repositioning/methods , Knowledge Discovery/methods , Machine Learning , Computational Biology , Data Mining , Disease , Drug Therapy , Humans , Proteins/metabolism
16.
BMC Med Inform Decis Mak ; 16 Suppl 1: 57, 2016 07 18.
Article in English | MEDLINE | ID: mdl-27455071

ABSTRACT

BACKGROUND: The volume of research published in the biomedical domain has increasingly led to researchers focussing on specific areas of interest and to connections between findings being missed. Literature based discovery (LBD) attempts to address this problem by searching for previously unnoticed connections between published information (also known as "hidden knowledge"). A common approach is to identify hidden knowledge via shared linking terms. However, biomedical documents are highly ambiguous, which can lead LBD systems to over-generate hidden knowledge by hypothesising connections through different meanings of linking terms. Word Sense Disambiguation (WSD) aims to resolve ambiguities in text by identifying the meaning of ambiguous terms. This study explores the effect of WSD accuracy on LBD performance. METHODS: An existing LBD system is employed and four approaches to WSD of biomedical documents are integrated with it. The accuracy of each WSD approach is determined by comparing its output against a standard benchmark. Evaluation of the LBD output is carried out using a timeslicing approach, where hidden knowledge is generated from articles published prior to a certain cutoff date and a gold standard is extracted from publications after the cutoff date. RESULTS: WSD accuracy varies depending on the approach used. The connection between the performance of the LBD and WSD systems is analysed to reveal a correlation between WSD accuracy and LBD performance. CONCLUSION: This study reveals that LBD performance is sensitive to WSD accuracy. It is therefore concluded that WSD has the potential to improve the output of LBD systems by reducing the amount of spurious hidden knowledge that is generated. It is also suggested that further improvements in WSD accuracy have the potential to improve LBD accuracy.
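
The over-generation problem described above can be shown with a toy example. This is only a sketch of the idea, not the LBD or WSD systems evaluated in the paper: concepts are linked whenever they share a linking key, and the key is either the surface term or a (term, sense) pair. The concepts, sense labels and function name `hidden_links` are hypothetical.

```python
def hidden_links(a_mentions, c_mentions):
    """Return (A, C) concept pairs whose mentions share a linking key."""
    return {
        (a, c)
        for a, key_a in a_mentions
        for c, key_c in c_mentions
        if key_a == key_c
    }

# Surface forms only: the ambiguous term "cold" links two unrelated concepts.
a_surface = [("rhinovirus", "cold")]
c_surface = [("hypothermia", "cold")]
spurious = hidden_links(a_surface, c_surface)

# After disambiguation the key includes the sense, so the link disappears.
a_sense = [("rhinovirus", ("cold", "common cold"))]
c_sense = [("hypothermia", ("cold", "low temperature"))]
filtered = hidden_links(a_sense, c_sense)
```

A WSD error would put the wrong sense label in the key, either restoring a spurious link or hiding a genuine one, which is one way to read the reported sensitivity of LBD performance to WSD accuracy.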


Subject(s)
Biomedical Research , Data Mining/methods , Knowledge Discovery/methods , MEDLINE , Humans
17.
J Biomed Inform ; 54: 141-57, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25661592

ABSTRACT

BACKGROUND: Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting scientific literature. Prior approaches to LBD include use of: (1) domain expertise and structured background knowledge to manually filter and explore the literature, (2) distributional statistics and graph-theoretic measures to rank interesting connections, and (3) heuristics to help eliminate spurious connections. However, manual approaches to LBD are not scalable and purely distributional approaches may not be sufficient to obtain insights into the meaning of poorly understood associations. While several graph-based approaches have the potential to elucidate associations, their effectiveness has not been fully demonstrated. A considerable degree of a priori knowledge, heuristics, and manual filtering is still required. OBJECTIVES: In this paper we implement and evaluate a context-driven, automatic subgraph creation method that captures multifaceted complex associations between biomedical concepts to facilitate LBD. Given a pair of concepts, our method automatically generates a ranked list of subgraphs, which provide informative and potentially unknown associations between such concepts. METHODS: To generate subgraphs, the set of all MEDLINE articles that contain either of the two specified concepts (A, C) are first collected. Then binary relationships or assertions, which are automatically extracted from the MEDLINE articles, called semantic predications, are used to create a labeled directed predications graph. In this predications graph, a path is represented as a sequence of semantic predications. The hierarchical agglomerative clustering (HAC) algorithm is then applied to cluster paths that are bounded by the two concepts (A, C). HAC relies on implicit semantics captured through Medical Subject Heading (MeSH) descriptors, and explicit semantics from the MeSH hierarchy, for clustering. 
Paths that exceed a threshold of semantic relatedness are clustered into subgraphs based on their shared context. Finally, the automatically generated clusters are provided as a ranked list of subgraphs. RESULTS: The subgraphs generated using this approach facilitated the rediscovery of 8 out of 9 existing scientific discoveries. In particular, they directly (or indirectly) led to the recovery of several intermediates (or B-concepts) between A- and C-terms, while also providing insights into the meaning of the associations. Such meaning is derived from predicates between the concepts, as well as the provenance of the semantic predications in MEDLINE. Additionally, by generating subgraphs on different thematic dimensions (such as Cellular Activity, Pharmaceutical Treatment and Tissue Function), the approach may enable a broader understanding of the nature of complex associations between concepts. Finally, in a statistical evaluation to determine the interestingness of the subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE on average. CONCLUSION: These results suggest that leveraging the implicit and explicit semantics provided by manually assigned MeSH descriptors is an effective representation for capturing the underlying context of complex associations, along multiple thematic dimensions in LBD situations.


Subject(s)
Cluster Analysis , Data Mining/methods , Knowledge Discovery/methods , Algorithms , Databases, Factual , Humans , Medical Subject Headings , Models, Theoretical , Semantics
19.
Comput Biol Med ; 179: 108920, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39047506

ABSTRACT

This study introduces RheumaLinguisticpack (RheumaLpack), the first specialised linguistic web corpus designed for the field of musculoskeletal disorders. By combining web mining (i.e., web scraping) and natural language processing (NLP) techniques, as well as clinical expertise, RheumaLpack systematically captures and curates structured and unstructured data across a spectrum of web sources including clinical trials registers (i.e., ClinicalTrials.gov), bibliographic databases (i.e., PubMed), medical agencies (i.e. European Medicines Agency), social media (i.e., Reddit), and accredited health websites (i.e., MedlinePlus, Harvard Health Publishing, and Cleveland Clinic). Given the complexity of rheumatic and musculoskeletal diseases (RMDs) and their significant impact on quality of life, this resource can be proposed as a useful tool to train algorithms that could mitigate the diseases' effects. Therefore, the corpus aims to improve the training of artificial intelligence (AI) algorithms and facilitate knowledge discovery in RMDs. The development of RheumaLpack involved a systematic six-step methodology covering data identification, characterisation, selection, collection, processing, and corpus description. The result is a non-annotated, monolingual, and dynamic corpus, featuring almost 3 million records spanning from 2000 to 2023. RheumaLpack represents a pioneering contribution to rheumatology research, providing a useful resource for the development of advanced AI and NLP applications. This corpus highlights the value of web data to address the challenges posed by musculoskeletal diseases, illustrating the corpus's potential to improve research and treatment paradigms in rheumatology. Finally, the methodology shown can be replicated to obtain data from other medical specialities. The code and details on how to build RheumaLpack are also provided to facilitate the dissemination of such resource.


Subject(s)
Natural Language Processing , Rheumatology , Humans , Internet , Data Mining/methods , Knowledge Discovery/methods , Musculoskeletal Diseases
20.
Comput Biol Med ; 176: 108525, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38749322

ABSTRACT

Deep neural networks have become increasingly popular for analyzing ECG data because of their ability to accurately identify cardiac conditions and hidden clinical factors. However, the lack of transparency due to the black box nature of these models is a common concern. To address this issue, explainable AI (XAI) methods can be employed. In this study, we present a comprehensive analysis of post-hoc XAI methods, investigating the glocal (aggregated local attributions over multiple samples) and global (concept based XAI) perspectives. We have established a set of sanity checks to identify saliency as the most sensible attribution method. We provide a dataset-wide analysis across entire patient subgroups, which goes beyond anecdotal evidence, to establish the first quantitative evidence for the alignment of model behavior with cardiologists' decision rules. Furthermore, we demonstrate how these XAI techniques can be utilized for knowledge discovery, such as identifying subtypes of myocardial infarction. We believe that these proposed methods can serve as building blocks for a complementary assessment of the internal validity during a certification process, as well as for knowledge discovery in the field of ECG analysis.


Subject(s)
Deep Learning , Electrocardiography , Electrocardiography/methods , Humans , Knowledge Discovery/methods , Neural Networks, Computer , Signal Processing, Computer-Assisted