Results 1 - 20 of 238
1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36631408

ABSTRACT

The gut microbial communities are highly plastic throughout life, and the human gut microbial communities show spatial-temporal dynamic patterns at different life stages. However, the underlying association between gut microbial communities and time-related factors remains unclear. The lack of context-awareness, insufficient data, and the existence of batch effects are the three major issues that make tracing the life trajectory of the host based on gut microbial communities problematic. Here, we used a novel computational approach (microDELTA, microbial-based deep life trajectory) to track longitudinal alterations in human gut microbial communities, which employs transfer learning for context-aware mining of gut microbial community dynamics at different life stages. Using an infant cohort, we demonstrated that microDELTA outperformed Neural Network in accurately predicting the age of infants with different delivery modes, especially for vaginally delivered newborn infants, with the area under the receiver operating characteristic curve of microDELTA and Neural Network at 0.811 and 0.436, respectively. In this context, we have discovered the influence of delivery mode on infant gut microbial communities. Along the human lifespan, we also applied microDELTA to a Chinese traveler cohort, a Hadza hunter-gatherer cohort and an elderly cohort. Results revealed the association between long-term dietary shifts during travel and adult gut microbial communities, the seasonal cycling of gut microbial communities for the Hadza hunter-gatherers, and the distinctive microbial pattern of elderly gut microbial communities. In summary, microDELTA can largely solve the issues in tracing the life trajectory of the human microbial communities and generate accurate and flexible models for a broad spectrum of microbial-based longitudinal research.


Subject(s)
Deep Learning , Gastrointestinal Microbiome , Microbiota , Infant, Newborn , Infant , Female , Humans , Aged , Diet
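
Note on the metric in record 1: the abstract compares models by the area under the receiver operating characteristic curve (0.811 versus 0.436). The following is a minimal sketch of how that quantity can be computed from labels and scores, using the pairwise (Mann-Whitney) formulation. The function name `auroc` and the toy inputs are illustrative assumptions, not taken from the microDELTA study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; a tie counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the study): three of the four
# positive/negative pairs are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

Under this definition a value of 0.5 corresponds to random ranking, so a score below 0.5, such as the 0.436 reported for the baseline, means positives were ranked below negatives more often than not.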
2.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35091743

ABSTRACT

With the rapid accumulation of microbiome data around the world, numerous computational bioinformatics methods have been developed for pattern mining from such paramount microbiome data. Current microbiome data mining methods, such as gene and species mining, rely heavily on sequence comparison. Most of these methods, however, have a clear trade-off, particularly, when it comes to big-data analytical efficiency and accuracy. Microbiome entities are usually organized in ontology structures, and pattern mining methods that have considered ontology structures could offer advantages in mining efficiency and accuracy. Here, we have summarized the ontology-aware neural network (ONN) as a novel framework for microbiome data mining. We have discussed the applications of ONN in multiple contexts, including gene mining, species mining and microbial community dynamic pattern mining. We have then highlighted one of the most important characteristics of ONN, namely, novel knowledge discovery, which makes ONN a standout among all microbiome data mining methods. Finally, we have provided several applications to showcase the advantage of ONN over other methods in microbiome data mining. In summary, ONN represents a paradigm shift for pattern mining from microbiome data: from traditional machine learning approach to ontology-aware and model-based approach, which has found its broad application scenarios in microbiome data mining.


Subject(s)
Data Mining , Microbiota , Computational Biology , Data Mining/methods , Machine Learning , Neural Networks, Computer
3.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36124759

ABSTRACT

Microbial community classification enables identification of the putative type and source of a microbial community, thus facilitating a better understanding of how its taxonomic and functional structures were developed and maintained. However, previous classification models required a trade-off between speed and accuracy, and were difficult to customize for a variety of contexts, especially less studied ones. Here, we introduced EXPERT, a transfer-learning-based method that enabled the classification model to be adaptable to multiple contexts with both high efficiency and accuracy. More importantly, we demonstrated that transfer learning can facilitate microbial community classification in diverse contexts, such as classification of microbial communities for multiple diseases with a limited number of samples, as well as prediction of the changes in the gut microbiome across successive stages of colorectal cancer. Broadly, EXPERT enables accurate and context-aware customized microbial community classification, and potentiates novel microbial knowledge discovery.


Subject(s)
Gastrointestinal Microbiome , Microbiota , Learning , Machine Learning
4.
BMC Med Inform Decis Mak ; 24(1): 165, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38872146

ABSTRACT

BACKGROUND: Pattern mining techniques are helpful tools when extracting new knowledge in real practice, but the overwhelming number of patterns is still a limiting factor in the health-care domain. Current efforts concerning the definition of measures of interest for patterns are focused on reducing the number of patterns and quantifying their relevance (utility/usefulness). However, although the temporal dimension plays a key role in medical records, few efforts have been made to extract temporal knowledge about the patient's evolution from multivariate sequential patterns. METHODS: In this paper, we propose a method to extract a new type of patterns in the clinical domain called Jumping Diagnostic Odds Ratio Sequential Patterns (JDORSP). The aim of this method is to employ the odds ratio to identify a concise set of sequential patterns that represent a patient's state with a statistically significant protection factor (i.e., a pattern associated with patients that survive) and those extensions whose evolution suddenly changes the patient's clinical state, thus making the sequential patterns a statistically significant risk factor (i.e., a pattern associated with patients that do not survive), or vice versa. RESULTS: The results of our experiments highlight that our method reduces the number of sequential patterns obtained with state-of-the-art pattern reduction methods by over 95%. Only by achieving this drastic reduction can medical experts carry out a comprehensive clinical evaluation of the patterns that might be considered medical knowledge regarding the temporal evolution of the patients. We have evaluated the surprisingness and relevance of the sequential patterns with clinicians, and the most interesting fact is the high surprisingness of the extensions of the patterns that become a protection factor, that is, the patients that recover after several days of being at high risk of dying. 
CONCLUSIONS: Our proposed method with which to extract JDORSP generates a set of interpretable multivariate sequential patterns with new knowledge regarding the temporal evolution of the patients. The number of patterns is greatly reduced when compared to those generated by other methods and measures of interest. An additional advantage of this method is that it does not require any parameters or thresholds, and that the reduced number of patterns allows a manual evaluation.


Subject(s)
Data Mining , Humans , Odds Ratio , Data Mining/methods , Time Factors , Pattern Recognition, Automated , Delivery of Health Care , Electronic Health Records
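
Note on the measure in record 4: the abstract uses the odds ratio to label a sequential pattern as a statistically significant protection factor or risk factor. The abstract does not give the exact computation, so the following is only a sketch of the standard odds ratio from a 2x2 table with a Wald confidence interval. The function names, the 1.96 critical value, and the counts are assumptions for illustration.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a 2x2 table.

    a: pattern present, patient died      b: pattern present, patient survived
    c: pattern absent,  patient died      d: pattern absent,  patient survived
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    return (a * d) / (b * c)


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Wald confidence interval on the log odds ratio (z=1.96 gives ~95%)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def classify(a: int, b: int, c: int, d: int) -> str:
    """Label a pattern by whether its interval lies entirely above or below 1."""
    low, high = odds_ratio_ci(a, b, c, d)
    if low > 1.0:
        return "risk"
    if high < 1.0:
        return "protective"
    return "not significant"


# Hypothetical counts, not from the paper.
example_or = odds_ratio(20, 10, 5, 40)
example_label = classify(20, 10, 5, 40)
```

In this reading, a "jumping" pattern would be one whose label is `protective` while an extension of it is labeled `risk`, or the reverse; that interpretation is inferred from the abstract's wording rather than from the method's published definition.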
5.
Telemed J E Health ; 30(5): 1378-1393, 2024 May.
Article in English | MEDLINE | ID: mdl-38153985

ABSTRACT

Introduction: Telemedicine, which is the provision of remote clinical services via telecommunication technology, has undergone an upsurge since the COVID-19 pandemic. To capture this paradigm, this study surveyed telemedicine literature, including postpandemic publications, to identify dominant research themes and temporal trends and suggest directions for future research. Methods: A corpus of 56,445 telemedicine studies was sourced from PubMed. Latent Dirichlet allocation (LDA) topic modeling was performed using the Konstanz Information Miner platform. The textual data for topic modeling were processed by following standard procedures for natural language processing. Moreover, the term frequency-inverse document frequency approach was used to capture the importance of words within the corpus. We assessed perplexity, coherence, and the elbow method to determine the optimal number of topics for modeling. Results: The findings confirm the surge in telemedicine research after 2020, signifying its prominence. LDA topic modeling reveals seven distinct research themes, with the most prominent topic being "patient satisfaction" (21.38%) followed by "perspectives and challenges" (17.95%), and "smartphone apps" (14.32%). Furthermore, the results demonstrate a noticeable shift in topics from screening to therapeutic applications of telemedicine. Conclusions: This study serves as a guide for a broad range of telemedicine research topics. This synthesis of themes reflects the commitment of scholars to address the changing dynamics and health care needs, such as the COVID-19 pandemic, aging in place, smartphone usage, and technological advancement. The analysis also reveals flexible research responses to policy and contextual shifts, highlighting the collective drive to broaden the application of telemedicine in community health care.


Subject(s)
COVID-19 , Telemedicine , Telemedicine/organization & administration , Humans , COVID-19/epidemiology , SARS-CoV-2 , Pandemics , Patient Satisfaction
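
Note on the weighting in record 5: the abstract states that term frequency-inverse document frequency was used to capture word importance before topic modeling. The study ran this inside the Konstanz Information Miner platform, and its exact settings are not given, so the following is a plain-Python sketch of one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency). The function name and the two toy documents are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document.

    tf  = count of the term in the document / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus, not from the study.
toy_docs = [
    ["telemedicine", "patient", "satisfaction"],
    ["telemedicine", "smartphone", "apps"],
]
toy_weights = tf_idf(toy_docs)
```

A term that occurs in every document, like "telemedicine" here, gets weight zero under this variant, which is why corpus-wide words drop out before the topic model sees them.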
6.
Angew Chem Int Ed Engl ; 63(14): e202317978, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38357744

ABSTRACT

Nanoparticle (NP) characterization is essential because diverse shapes, sizes, and morphologies inevitably occur in as-synthesized NP mixtures, profoundly impacting their properties and applications. Currently, the only technique to concurrently determine these structural parameters is electron microscopy, but it is time-intensive and tedious. Here, we create a three-dimensional (3D) NP structural space to concurrently determine the purity, size, and shape of 1000 sets of as-synthesized Ag nanocubes mixtures containing interfering nanospheres and nanowires from their extinction spectra, attaining low predictive errors at 2.7-7.9 %. We first use plasmonically-driven feature enrichment to extract localized surface plasmon resonance attributes from spectra and establish a lasso regressor (LR) model to predict purity, size, and shape. Leveraging the learned LR, we artificially generate 425,592 augmented extinction spectra to overcome data scarcity and create a comprehensive NP structural space to bidirectionally predict extinction spectra from structural parameters with <4 % error. Our interpretable NP structural space further elucidates the two higher-order combined electric dipole, quadrupole, and magnetic dipole as the critical structural parameter predictors. By incorporating other NP shapes and mixtures' extinction spectra, we anticipate our approach, especially the data augmentation, can create a fully generalizable NP structural space to drive on-demand, autonomous synthesis-characterization platforms.

7.
Exp Mol Pathol ; 132-133: 104867, 2023 08.
Article in English | MEDLINE | ID: mdl-37634863

ABSTRACT

Mast cells (MCs) are tissue-resident innate immune cells that express the high-affinity receptor for immunoglobulin E and are responsible for host defense and an array of diseases related to the immune system. We aimed in this study to characterize the pathways and gene signatures of human cord blood-derived MCs (hCBMCs) in comparison to cells originating from CD34- progenitors using next-generation knowledge discovery methods. CD34+ cells were isolated from human umbilical cord blood using magnetic activated cell sorting and differentiated into MCs with rhIL-6 and rhSCF supplementation for 6-8 weeks. The purity of hCBMCs was analyzed by flow cytometry exhibiting the surface markers CD117+CD34-CD45-CD23-FcεR1αdim. Total RNA from hCBMCs and CD34- cells was isolated and hybridized using microarray. Differentially expressed genes were analyzed using iPathway Guide and Pre-Ranked Gene Set Enrichment Analysis. Next-generation knowledge discovery platforms revealed MC-specific gene signatures and molecular pathways that are enriched in hCBMCs and pertain to the immunological response repertoire.


Subject(s)
Fetal Blood , Mast Cells , Humans , Knowledge Discovery , Antigens, CD34/genetics , Cell Differentiation/genetics
8.
J Biomed Inform ; 140: 104344, 2023 04.
Article in English | MEDLINE | ID: mdl-36940896

ABSTRACT

Understanding the actual work (i.e., "work-as-done") rather than theorized work (i.e., "work-as-imagined") during complex medical processes is critical for developing approaches that improve patient outcomes. Although process mining has been used to discover process models from medical activity logs, it often omits critical steps or produces cluttered and unreadable models. In this paper, we introduce a TraceAlignment-based ProcessDiscovery method called TAD Miner to build interpretable process models for complex medical processes. TAD Miner creates simple linear process models using a threshold metric that optimizes the consensus sequence to represent the backbone process, and then identifies both concurrent activities and uncommon-but-critical activities to represent the side branches. TAD Miner also identifies the locations of repeated activities, an essential feature for representing medical treatment steps. We conducted a study using activity logs of 308 pediatric trauma resuscitations to develop and evaluate TAD Miner. TAD Miner was used to discover process models for five resuscitation goals, including establishing intravenous (IV) access, administering non-invasive oxygenation, performing back assessment, administering blood transfusion, and performing intubation. We quantitively evaluated the process models with several complexity and accuracy metrics, and performed qualitative evaluation with four medical experts to assess the accuracy and interpretability of the discovered models. Through these evaluations, we compared the performance of our method to that of two state-of-the-art process discovery algorithms: Inductive Miner and Split Miner. The process models discovered by TAD Miner had lower complexity and better interpretability than the state-of-the-art methods, and the fitness and precision of the models were comparable. 
We used the TAD process models to identify (1) the errors and (2) the best locations for the tentative steps in knowledge-driven expert models. The knowledge-driven models were revised based on the modifications suggested by the discovered models. The improved modeling using TAD Miner may enhance understanding of complex medical processes.


Subject(s)
Algorithms , Resuscitation , Humans , Child , Resuscitation/methods , Records
9.
J Biomed Inform ; 143: 104362, 2023 07.
Article in English | MEDLINE | ID: mdl-37146741

ABSTRACT

Scientific literature presents a wealth of information yet to be explored. As the number of researchers increases with each passing year and more publications are released, specialized fields of research are becoming more prevalent. As this trend continues, it further separates interdisciplinary publications and makes keeping up to date with the literature a laborious task. Literature-based discovery (LBD) aims to mitigate these concerns by promoting information sharing among non-interacting literatures while extracting potentially meaningful information. Furthermore, recent advances in neural network architectures and data representation techniques have fueled their respective research communities in achieving state-of-the-art performance in many downstream tasks. However, neural network-based methods for LBD remain largely unexplored. We introduce and explore a deep learning neural network-based approach for LBD. Additionally, we investigate various approaches to represent terms as concepts and analyze the effect of feature scaling on the representations fed into our model. We compare the evaluation performance of our method on five hallmarks of cancer datasets utilized for closed discovery. Our results show that the representation chosen as input to our model affects evaluation performance. We found that feature scaling our input representations increases evaluation performance and decreases the number of epochs needed to achieve model generalization. We also explore two approaches to represent model output. We found that reducing the model's output to capture a subset of concepts improved evaluation performance at the cost of model generalizability. We also compare the efficacy of our method on the five hallmarks of cancer datasets to a set of randomly chosen relations between concepts. We found these experiments confirm our method's suitability for LBD.


Subject(s)
Deep Learning , Neoplasms , Humans , Neural Networks, Computer , Knowledge Discovery/methods , Publications
10.
J Med Internet Res ; 25: e48115, 2023 09 20.
Article in English | MEDLINE | ID: mdl-37632414

ABSTRACT

BACKGROUND: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. OBJECTIVE: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. METHODS: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. RESULTS: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. 
CONCLUSIONS: This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research.


Subject(s)
COVID-19 , Humans , Data Mining , Databases, Factual , Knowledge , Precision Medicine
11.
IEEE Trans Knowl Data Eng ; 35(2): 1402-1420, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36798878

ABSTRACT

Shortening the time to knowledge discovery and adapting prior domain knowledge is a challenge for computational and data-intensive communities such as bioinformatics and neuroscience. The challenge for a domain scientist lies in obtaining guidance by querying massive information from a diverse text corpus comprising a wide-ranging set of topics when investigating new methods, developing new tools, or integrating datasets. In this paper, we propose a novel "domain-specific topic model" (DSTM) to discover latent knowledge patterns about relationships among research topics, tools and datasets from exemplary scientific domains. Our DSTM is a generative model that extends the Latent Dirichlet Allocation (LDA) model and uses the Markov chain Monte Carlo (MCMC) algorithm to infer latent patterns within a specific domain in an unsupervised manner. We apply our DSTM to large collections of data from bioinformatics and neuroscience domains that include more than 25,000 papers over the last ten years, featuring hundreds of tools and datasets that are commonly used in relevant studies. Evaluation experiments based on generalization and information retrieval metrics show that our model has better performance than the state-of-the-art baseline models for discovering highly-specific latent topics within a domain. Lastly, we demonstrate applications that benefit from our DSTM to discover intra-domain, cross-domain and trend knowledge patterns.

12.
Soc Sci Res ; 110: 102817, 2023 02.
Article in English | MEDLINE | ID: mdl-36796993

ABSTRACT

The interdisciplinary field of knowledge discovery and data mining emerged from a necessity of big data requiring new analytical methods beyond the traditional statistical approaches to discover new knowledge from the data mine. This emergent approach is a dialectic research process that is both deductive and inductive. The data mining approach automatically or semi-automatically considers a larger number of joint, interactive, and independent predictors to address causal heterogeneity and improve prediction. Instead of challenging the conventional model-building approach, it plays an important complementary role in improving model goodness of fit, revealing valid and significant hidden patterns in data, identifying nonlinear and non-additive effects, providing insights into data developments, methods, and theory, and enriching scientific discovery. Machine learning builds models and algorithms by learning and improving from data when the explicit model structure is unclear and algorithms with good performance are difficult to attain. The most recent development is to incorporate this new paradigm of predictive modeling with the classical approach of parameter estimation regressions to produce improved models that combine explanation and prediction.


Subject(s)
Data Mining , Knowledge Discovery , Humans , Data Mining/methods , Machine Learning
13.
Pak J Med Sci ; 39(2): 423-429, 2023.
Article in English | MEDLINE | ID: mdl-36950431

ABSTRACT

Objectives: Accurately identifying the cellular, biomolecular, and toxicological functions of anticancer drugs helps to decipher the potential risk of genotoxicity and other side effects. Here, we examined bleomycin for cellular, molecular and toxicological mechanisms using next-generation knowledge discovery (NGKD) tools. Methods: This study was conducted at the Faculty of Applied Medical Sciences, King Abdulaziz University (KAU), Jeddah, Saudi Arabia in October 2022. We first analyzed the raw Toxicogenomic and DNA damage-inducing (TGx-DDI) gene expression data from Gene Expression Omnibus (GEO) (GSE196373) of TK6 cells treated with 10 µM bleomycin and TK6 cells treated with DMSO for four hours using the GEO2R tool based on the Linear Models for Microarray Analysis (limma) R packages to derive the differentially expressed genes (DEGs). Then, iPathwayGuide was used to determine differentially regulated signaling pathways, biological processes, cellular and molecular functions, and upstream regulators (genes and miRNAs). Results: Bleomycin differentially regulates the p53 pathway, transcriptional dysregulation in cancer, FOXO pathway, viral carcinogenesis, and cancer pathways. Biological processes such as p53 class mediator signaling, intrinsic apoptotic signaling, DNA damage response, and DNA damage-induced intrinsic apoptotic signaling, and molecular functions like ubiquitin protein transferase and p53 binding, were differentially regulated by bleomycin. iPathwayGuide analysis showed that p53 and its regulatory gene and microRNA networks were induced by bleomycin. Conclusion: Analysis of TGx-DDI data of bleomycin using NGKD tools provided information about toxicogenomics and other mechanisms. Integration of all "omics" based approaches is crucial for the development of translatable biomarkers for evaluating anticancer drugs for safety and efficacy.

14.
Zhongguo Zhong Yao Za Zhi ; 48(6): 1682-1690, 2023 Mar.
Article in Chinese | MEDLINE | ID: mdl-37005856

ABSTRACT

This study aimed to explore the underlying framework and data characteristics of Tibetan prescription information. The information on Tibetan medicine prescriptions was collected based on 11 Tibetan medicine classics, such as Four Medical Canons (Si Bu Yi Dian). The optimal classification method was used to summarize the information structure of Tibetan medicine prescriptions and sort out the key problems and solutions in data collection, standardization, translation, and analysis. A total of 11 316 prescriptions were collected, involving 139 011 entries and 63 567 pieces of efficacy information of drugs in prescriptions. The information on Tibetan medicine prescriptions could be summarized into a "seven-in-one" framework of "serial number-source-name-composition-efficacy-appendix-remarks" and 18 expansion layers, which contained all information related to the inheritance, processing, origin, dosage, semantics, etc. of prescriptions. Based on the framework, this study proposed a "historical timeline" method for mining the origin of prescription inheritance, a "one body and five layers" method for formulating prescription drug specifications, a "link-split-link" method for constructing efficacy information, and an advanced algorithm suitable for the research of Tibetan prescription knowledge discovery. Tibetan medicine prescriptions have obvious characteristics and advantages under the guidance of the theories of "three factors", "five sources", and "Ro-nus-zhu-rjes" of Tibetan medicine.
Based on the characteristics of Tibetan medicine prescriptions, this study proposed a multi-level and multi-attribute underlying data architecture, providing new methods and models for the construction of Tibetan medicine prescription information database and knowledge discovery and improving the consistency and interoperability of Tibetan medicine prescription information with standards at all levels, which is expected to realize the "ancient and modern connection-cleaning up the source-data sharing", so as to promote the informatization and modernization research path of Tibetan medicine prescriptions.


Subject(s)
Drugs, Chinese Herbal , Medicine, Tibetan Traditional , Knowledge Discovery , Drug Prescriptions , Databases, Factual , Algorithms , Medicine, Chinese Traditional , Drugs, Chinese Herbal/therapeutic use
15.
BMC Bioinformatics ; 23(1): 351, 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-35996085

ABSTRACT

BACKGROUND: Integration of multi-omics data can provide a more complex view of the biological system consisting of different interconnected molecular components, the crucial aspect for developing novel personalised therapeutic strategies for complex diseases. Various tools have been developed to integrate multi-omics data. However, an efficient multi-omics framework for regulatory network inference at the genome level that incorporates prior knowledge is still to emerge. RESULTS: We present IntOMICS, an efficient integrative framework based on Bayesian networks. IntOMICS systematically analyses gene expression, DNA methylation, copy number variation and biological prior knowledge to infer regulatory networks. IntOMICS complements the missing biological prior knowledge by so-called empirical biological knowledge, estimated from the available experimental data. Regulatory networks derived from IntOMICS provide deeper insights into the complex flow of genetic information on top of the increasing accuracy trend compared to a published algorithm designed exclusively for gene expression data. The ability to capture relevant crosstalks between multi-omics modalities is verified using known associations in microsatellite stable/instable colon cancer samples. Additionally, IntOMICS performance is compared with two algorithms for multi-omics regulatory network inference that can also incorporate prior knowledge in the inference framework. IntOMICS is also applied to detect potential predictive biomarkers in microsatellite stable stage III colon cancer samples. CONCLUSIONS: We provide IntOMICS, a framework for multi-omics data integration using a novel approach to biological knowledge discovery. IntOMICS is a powerful resource for exploratory systems biology and can provide valuable insights into the complex mechanisms of biological processes that have a vital role in personalised medicine.


Subject(s)
Colonic Neoplasms , DNA Copy Number Variations , Algorithms , Bayes Theorem , Gene Regulatory Networks , Humans , Systems Biology/methods
16.
Malar J ; 21(1): 232, 2022 Aug 01.
Article in English | MEDLINE | ID: mdl-35915484

ABSTRACT

BACKGROUND: Data integration and visualisation techniques have been widely used in scientific research to allow the exploitation of large volumes of data and support highly complex or long-lasting research questions. Integration allows data from different sources to be aggregated into a single database comprising variables of interest for different types of studies. Visualisation allows large and complex data sets to be manipulated and interpreted in a more intuitive way. METHODS: Integration and visualisation techniques were applied in a malaria surveillance ecosystem to build an integrated database comprising notifications, deaths, vector control and climate data. This database is accessed through Malaria-VisAnalytics, a visual mining platform for descriptive and predictive analysis supporting decision and policy-making by governmental and health agents. RESULTS: Experimental and validation results have proved that the visual exploration and interaction mechanisms allow effective surveillance for rapid action in suspected outbreaks, as well as support a set of different research questions over integrated malaria electronic health records. CONCLUSION: The integrated database and the visual mining platform (Malaria-VisAnalytics) allow different types of users to explore malaria-related data in a user-friendly interface. Summary data and key insights can be obtained through different techniques and dimensions. The case study on Manaus can serve as a reference for future replication in other municipalities. Finally, both the database and the visual mining platform can be extended with new data sources and functionalities to accommodate more complex scenarios (such as real-time data capture and analysis).


Subject(s)
Ecosystem , Malaria , Brazil/epidemiology , Databases, Factual , Decision Support Techniques , Humans , Malaria/epidemiology
17.
BMC Infect Dis ; 22(1): 274, 2022 Mar 21.
Article in English | MEDLINE | ID: mdl-35313829

ABSTRACT

BACKGROUND: Motivated by the need for precise epidemic control and epidemic-resilient urban design, this study aims to reveal the joint and interactive associations between urban socioeconomic, density, connectivity, and functionality characteristics and the spread of COVID-19 within a high-density city. Many studies have examined the associations between urban characteristics and the spread of COVID-19, but few have done so at the intra-city scale or addressed complex joint and interactive associations using advanced machine learning approaches. METHODS: Differential-evolution-based association rule mining was used to investigate the joint and interactive associations between urban characteristics and the spatiotemporal distribution of confirmed COVID-19 cases at the neighborhood scale in Hong Kong. The associations were compared across the distribution of cases in four waves of COVID-19 transmission: before Jun 2020 (waves 1 and 2), Jul-Oct 2020 (wave 3), and Nov 2020-Feb 2021 (wave 4), and for local and imported confirmed cases. RESULTS: The first two waves of COVID-19 were found to be mainly characterized by higher-socioeconomic-status (SES) imported cases. The third-wave outbreak was concentrated in densely populated and usually lower-SES neighborhoods, showing a high risk of within-neighborhood virus transmission jointly driven by high density and unfavorable SES. Starting with a super-spreading event that substantially involved the high-SES population, the fourth-wave outbreak showed a stronger link to cross-neighborhood transmission driven by urban functionality. The outbreak then diffused to lower-SES neighborhoods and interactively aggravated within-neighborhood transmission. An association was also found between a higher SES and a slightly longer waiting period (i.e., the period from symptom onset to diagnosis of symptomatic cases), which further indicated the potential contribution of the higher-SES population to pandemic transmission. CONCLUSIONS: The results of this study may serve as a reference for developing precise anti-pandemic measures for specific neighborhoods and virus transmission routes. The study also highlights the importance of relieving the co-location of overcrowding and unfavorable SES in developing epidemic-resilient compact cities, and the greater obligation of the higher-SES population to comply with anti-pandemic policies.
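As a rough illustration of how an association rule of the kind described in this abstract can be scored, the sketch below computes support, confidence, and lift for one rule over a handful of neighborhood records. It does not reproduce the study's differential-evolution search, which would only be the component proposing candidate rules; the records, the trait labels, and the function names `support` and `rule_metrics` are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical neighborhood records: each set lists the categorical traits
# observed for one neighborhood. These labels are illustrative, not study data.
neighborhoods: List[FrozenSet[str]] = [
    frozenset({"high_density", "low_ses", "high_cases"}),
    frozenset({"high_density", "low_ses", "high_cases"}),
    frozenset({"high_density", "high_ses"}),
    frozenset({"low_density", "low_ses"}),
    frozenset({"low_density", "high_ses"}),
    frozenset({"high_density", "low_ses", "high_cases"}),
    frozenset({"low_density", "high_ses", "high_cases"}),
    frozenset({"low_density", "low_ses"}),
]


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for record in records if itemset <= record) / len(records)


def rule_metrics(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    records: List[FrozenSet[str]],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_antecedent = support(antecedent, records)
    s_consequent = support(consequent, records)
    s_joint = support(antecedent | consequent, records)
    # Confidence: how often the consequent holds when the antecedent holds.
    confidence = s_joint / s_antecedent if s_antecedent else 0.0
    # Lift: confidence relative to the baseline rate of the consequent.
    lift = confidence / s_consequent if s_consequent else 0.0
    return {"support": s_joint, "confidence": confidence, "lift": lift}


# Joint rule: high density AND low SES -> high case count.
metrics = rule_metrics(
    frozenset({"high_density", "low_ses"}),
    frozenset({"high_cases"}),
    neighborhoods,
)
print(metrics)
```

A lift above 1 for a two-trait antecedent, compared with the lift of each trait alone, is one simple way to read a "joint" association in data like this.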


Subject(s)
COVID-19 , COVID-19/epidemiology , Cities/epidemiology , Cross-Sectional Studies , Humans , Residence Characteristics , Social Class
18.
J Biomed Inform ; 134: 104169, 2022 10.
Article in English | MEDLINE | ID: mdl-36038065

ABSTRACT

Temporal knowledge discovery is crucial for investigating clinical problems in the data science era. Meaningful computational progress has been made in the discovery of frequent temporal patterns, which may hold potentially meaningful knowledge. However, effective visualization is essential for temporal knowledge discovery and acquisition, and it still leaves much room for contributions. Although the visualization of frequent temporal patterns has been relatively under-researched, it offers meaningful opportunities to provide usable ways of assisting domain experts or researchers in exploring and acquiring temporal knowledge. In this paper, a novel approach for the visualization of an enumeration tree of frequent temporal patterns is introduced, whether the patterns were mined from a single population or are being compared across two separate populations. While this approach is relevant to any sequence-based patterns, we demonstrate its use on the most complex scenario, time-intervals-related patterns (TIRPs). The interface enables users to browse an enumeration tree of frequent patterns, search for specific patterns, and discover the most discriminating TIRPs between two populations. To that end, a novel visualization of the temporal patterns is introduced using a bubble chart, in which each bubble represents a temporal pattern and the chart axes represent various metrics of the patterns, such as their frequency and reoccurrence; this provides a fast overview of the patterns as a whole as well as access to specific ones. We present a comprehensive and rigorous user study on two real-life datasets, demonstrating the usability advantages of the novel approaches.


Subject(s)
Data Visualization , Pattern Recognition, Automated , Time
19.
J Biomed Inform ; 135: 104212, 2022 11.
Article in English | MEDLINE | ID: mdl-36182054

ABSTRACT

Machine learning is now an essential part of any biomedical study, but its integration into truly effective Learning Health Systems, including the whole process of Knowledge Discovery from Data (KDD), is not yet realised. We propose an original extension of the KDD process model that involves an inductive database. We designed, for the first time, a generic model of an Inductive Clinical DataBase (ICDB) aimed at hosting both patient data and learned models. We report experiments conducted on patient data as part of a project dedicated to fighting heart failure. The results show how the ICDB approach makes it possible to identify biomarker combinations that are specific to and predictive of the heart fibrosis phenotype and that suggest hypotheses about the underlying mechanisms. Two main scenarios were considered: a local-to-global KDD scenario and a trans-cohort alignment scenario. This promising proof of concept enables us to draw the contours of a next-generation Knowledge Discovery Environment (KDE).


Subject(s)
Data Mining , Knowledge Discovery , Databases, Factual
20.
J Biomed Inform ; 131: 104120, 2022 07.
Article in English | MEDLINE | ID: mdl-35709900

ABSTRACT

OBJECTIVE: Develop a novel methodology to create a comprehensive knowledge graph (SuppKG) to represent a domain with limited coverage in the Unified Medical Language System (UMLS), specifically dietary supplement (DS) information for discovering drug-supplement interactions (DSI), by leveraging biomedical natural language processing (NLP) technologies and a DS domain terminology. MATERIALS AND METHODS: We created SemRepDS (an extension of an NLP tool, SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT model to remove incorrect relations before generating SuppKG. Two discovery pathways were applied to SuppKG to identify potential DSIs, which were then compared with an existing DSI database and also evaluated by medical professionals for mechanistic plausibility. RESULTS: SemRepDS returned 158.5% more DS entities and 206.9% more DS relations than SemRep. The fine-tuned PubMedBERT model, which significantly outperformed other machine learning and BERT models, obtained an F1 score of 0.8605 and removed 43.86% of semantic relations, improving the precision of the relations by 26.4% over pre-filtering. SuppKG consists of 56,635 nodes and 595,222 directed edges with 2,928 DS-specific nodes and 164,738 edges. Manual review of findings identified 182 of 250 (72.8%) proposed DS-Gene-Drug and 77 of 100 (77%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible. DISCUSSION: With DS terminology added to the UMLS, SemRepDS is able to find more DS-specific semantic relationships from PubMed than SemRep. The utility of the resulting SuppKG was demonstrated using discovery patterns to find novel DSIs. CONCLUSION: For a domain with limited coverage in the traditional terminology (e.g., UMLS), we demonstrated an approach that leverages domain terminology and improves existing NLP tools to generate a more comprehensive knowledge graph for the downstream task. Although this study focuses on DSI, the method may be adapted to other domains.
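The DS-Gene-Drug discovery pathway mentioned in this abstract can be pictured as a typed two-hop path search over directed triples. The sketch below is a minimal illustration under that assumption and is not the SuppKG implementation: the node names, node types, predicate labels, edge directions, and the function `find_ds_gene_drug_paths` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]            # (subject, predicate, object)
Path = Tuple[str, str, str, str, str]    # (DS, predicate, Gene, predicate, Drug)

# Hypothetical toy graph; none of these nodes or edges come from SuppKG.
node_type: Dict[str, str] = {
    "supplement_A": "DS",
    "supplement_B": "DS",
    "gene_X": "Gene",
    "gene_Y": "Gene",
    "drug_1": "Drug",
    "drug_2": "Drug",
}
edges: List[Triple] = [
    ("supplement_A", "INHIBITS", "gene_X"),
    ("gene_X", "INTERACTS_WITH", "drug_1"),
    ("supplement_B", "STIMULATES", "gene_Y"),
    ("gene_Y", "INTERACTS_WITH", "drug_2"),
    ("supplement_A", "AFFECTS", "drug_2"),  # direct edge, not a DS-Gene-Drug path
]


def find_ds_gene_drug_paths(
    triples: List[Triple], types: Dict[str, str]
) -> List[Path]:
    """Return every directed two-hop path whose nodes are typed DS, Gene, Drug."""
    # Index outgoing edges by subject for constant-time neighbor lookup.
    outgoing = defaultdict(list)
    for subject, predicate, obj in triples:
        outgoing[subject].append((predicate, obj))

    paths: List[Path] = []
    for start in [name for name, kind in types.items() if kind == "DS"]:
        for first_predicate, middle in outgoing[start]:
            if types.get(middle) != "Gene":
                continue  # the first hop must land on a gene
            for second_predicate, end in outgoing[middle]:
                if types.get(end) == "Drug":
                    paths.append(
                        (start, first_predicate, middle, second_predicate, end)
                    )
    return paths


candidate_paths = find_ds_gene_drug_paths(edges, node_type)
for path in candidate_paths:
    print(" -> ".join(path))
```

Each returned path is only a candidate interaction. In the workflow the abstract describes, such candidates were compared with an existing DSI database and reviewed by medical professionals for mechanistic plausibility.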


Subject(s)
Natural Language Processing , Unified Medical Language System , Dietary Supplements , PubMed , Semantics