Results 1 - 20 of 1,728
1.
Am J Hum Genet ; 110(10): 1661-1672, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37741276

ABSTRACT

In the effort to treat Mendelian disorders, correcting the underlying molecular imbalance may be more effective than symptomatic treatment. Identifying treatments that might accomplish this goal requires extensive and up-to-date knowledge of molecular pathways, including drug-gene and gene-gene relationships. To address this challenge, we present "parsing modifiers via article annotations" (PARMESAN), a computational tool that searches PubMed and PubMed Central for information to assemble these relationships into a central knowledge base. PARMESAN then predicts putatively novel drug-gene relationships, assigning an evidence-based score to each prediction. We compare PARMESAN's drug-gene predictions to all of the drug-gene relationships displayed by the Drug-Gene Interaction Database (DGIdb) and show that higher-scoring relationship predictions are more likely to match the directionality (up- versus down-regulation) indicated by this database. PARMESAN had more than 200,000 drug predictions scoring above 8 (as one example cutoff), for more than 3,700 genes. Among these predicted relationships, 210 were registered in DGIdb and 201 (96%) had matching directionality. This publicly available tool provides an automated way to prioritize drug screens to target the most-promising drugs to test, thereby saving time and resources in the development of therapeutics for genetic disorders.
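
The directionality comparison described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the `Prediction` record, the `directionality_agreement` helper, and the toy drug and gene names are hypothetical and are not PARMESAN's actual data model or code. The sketch keeps predictions scoring above a cutoff, looks each one up in a reference table, and counts how many share the reference direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one predicted drug-gene relationship."""
    drug: str
    gene: str
    direction: str  # "up" or "down"
    score: float


def directionality_agreement(predictions, reference, cutoff):
    """Count predictions above `cutoff` that appear in `reference`,
    and how many of those share the reference direction."""
    compared = matching = 0
    for p in predictions:
        if p.score <= cutoff:
            continue  # below the example cutoff
        known = reference.get((p.drug, p.gene))
        if known is None:
            continue  # relationship not registered in the reference
        compared += 1
        if known == p.direction:
            matching += 1
    return compared, matching


# Toy data, invented for illustration only.
predictions = [
    Prediction("drugA", "GENE1", "up", 9.5),
    Prediction("drugB", "GENE2", "down", 8.2),
    Prediction("drugC", "GENE3", "up", 3.0),     # below cutoff
    Prediction("drugD", "GENE4", "down", 12.0),  # absent from reference
]
reference = {
    ("drugA", "GENE1"): "up",
    ("drugB", "GENE2"): "up",
    ("drugC", "GENE3"): "up",
}

compared, matching = directionality_agreement(predictions, reference, cutoff=8)

# The figures reported in the abstract: 201 of 210 registered relationships.
reported_rate = 201 / 210
```

With the counts reported in the abstract, 201 / 210 is about 0.957, which rounds to the stated 96%.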


Subjects
PubMed; Humans; Databases, Factual
2.
Mol Cell Proteomics ; 23(1): 100682, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37993103

ABSTRACT

Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge on functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning-based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts and 5918 mouse sites in 14,610 abstracts. This included evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus. Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites. Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.
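
Precision and recall over extracted (substrate, site) pairs, as reported in this abstract, can be computed as in the minimal sketch below. The `precision_recall` helper and the toy pairs are assumptions made for illustration; they are not taken from the IDPpub code or repository.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted (substrate, site) pairs
    against a curated gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy pairs chosen for illustration, not drawn from the IDPpub repository.
gold = {("AKT1", "S473"), ("AKT1", "T308"), ("TP53", "S15"), ("MAPK1", "Y187")}
predicted = {("AKT1", "S473"), ("TP53", "S15"), ("MAPK1", "Y187"), ("TP53", "S20")}

precision, recall = precision_recall(predicted, gold)
```

Treating each pair as a single unit means that a correct protein with a wrong site position counts as both a false positive and a false negative, which matches the "pairs of correct substrates and phosphosite positions" wording above.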


Subjects
Data Mining; Natural Language Processing; Humans; Data Mining/methods; Databases, Factual; PubMed
3.
J Pathol ; 262(3): 310-319, 2024 03.
Article in English | MEDLINE | ID: mdl-38098169

ABSTRACT

Deep learning applied to whole-slide histopathology images (WSIs) has the potential to enhance precision oncology and alleviate the workload of experts. However, developing these models necessitates large amounts of data with ground truth labels, which can be both time-consuming and expensive to obtain. Pathology reports are typically unstructured or poorly structured texts, and efforts to implement structured reporting templates have been unsuccessful, as these efforts lead to perceived extra workload. In this study, we hypothesised that large language models (LLMs), such as the generative pre-trained transformer 4 (GPT-4), can extract structured data from unstructured plain language reports using a zero-shot approach without requiring any re-training. We tested this hypothesis by utilising GPT-4 to extract information from histopathological reports, focusing on two extensive sets of pathology reports for colorectal cancer and glioblastoma. We found a high concordance between LLM-generated structured data and human-generated structured data. Consequently, LLMs could potentially be employed routinely to extract ground truth data for machine learning from unstructured pathology reports in the future. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Subjects
Glioblastoma; Precision Medicine; Humans; Machine Learning; United Kingdom
4.
Methods ; 228: 48-54, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38789016

ABSTRACT

With the rapid advancements in molecular biology and genomics, a multitude of connections between RNA and diseases has been unveiled, making the efficient and accurate extraction of RNA-disease (RD) relationships from extensive biomedical literature crucial for advancing research in this field. This study introduces RDscan, a novel text mining method developed based on the pre-training and fine-tuning strategy, aimed at automatically extracting RD-related information from a vast corpus of literature using pre-trained biomedical large language models (LLMs). Initially, we constructed a dedicated RD corpus by manually curating from literature, comprising 2,082 positive and 2,000 negative sentences, alongside an independent test dataset (comprising 500 positive and 500 negative sentences) for training and evaluating RDscan. Subsequently, by fine-tuning the Bioformer and BioBERT pre-trained models, RDscan demonstrated exceptional performance in text classification and named entity recognition (NER) tasks. In 5-fold cross-validation, RDscan significantly outperformed traditional machine learning methods (Support Vector Machine, Logistic Regression and Random Forest). In addition, we have developed an accessible webserver that assists users in extracting RD relationships from text. In summary, RDscan represents the first text mining tool specifically designed for RD relationship extraction, and is poised to emerge as an invaluable tool for researchers dedicated to exploring the intricate interactions between RNA and diseases. The RDscan webserver is freely available at https://cellknowledge.com.cn/RDscan/.


Subjects
Data Mining; RNA; Data Mining/methods; RNA/genetics; Humans; Machine Learning; Disease/genetics; Support Vector Machine; Software
5.
BMC Bioinformatics ; 25(1): 273, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169321

ABSTRACT

BACKGROUND: There has been a considerable advancement in AI technologies such as LLMs and machine learning to support biomedical knowledge discovery. MAIN BODY: We propose a novel biomedical neural search service called 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search on unstructured text such as PubMed. It mainly handles information related to chemical compounds/drugs, genes/proteins, diseases, and their interactions (chemical compound/drug-protein/gene interactions, including drug-target, drug-drug, and drug-disease). To provide comprehensive knowledge, the system offers four search options: basic search, entity search, interaction search, and natural language search. We employ T5slim_dec, which adapts the autoregressive generation task of the T5 (text-to-text transfer transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. It also assists in interpreting research findings by summarizing the retrieved search results for a given natural language query with Retrieval Augmented Generation (RAG). The search engine is built with a hybrid method that combines neural search with the probabilistic search method BM25. CONCLUSION: As a result, our system can better understand the context, semantics and relationships between terms within the document, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service to access and discover relevant knowledge.
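
The hybrid ranking mentioned in this abstract, combining BM25 with a neural score, can be sketched as below. This is a generic illustration, not the VAIV Bio-Discovery implementation: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the linear blend in `hybrid_scores`, and the placeholder neural scores are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25
    (whitespace tokenization, smoothed non-negative IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def hybrid_scores(lexical, neural, alpha=0.5):
    """Blend max-normalized lexical scores with neural similarity scores.
    The linear weighting is an assumed combination rule."""
    top = max(lexical) or 1.0  # avoid division by zero when nothing matches
    return [alpha * (lex / top) + (1 - alpha) * neu for lex, neu in zip(lexical, neural)]


docs = [
    "aspirin inhibits cox1 and cox2",
    "metformin activates ampk signaling",
    "aspirin reduces fever",
]
lexical = bm25_scores("aspirin cox2", docs)
neural = [0.2, 0.9, 0.4]  # placeholder similarities standing in for a neural encoder
combined = hybrid_scores(lexical, neural)
best = max(range(len(docs)), key=lambda i: combined[i])
```

In this toy run the second document shares no query term, so its BM25 score is zero and only its neural score contributes to the blend.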


Subjects
Natural Language Processing; Data Mining/methods; Knowledge Discovery/methods; PubMed; Search Engine; Machine Learning; Information Storage and Retrieval/methods; Neural Networks, Computer
6.
BMC Bioinformatics ; 25(1): 101, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448845

ABSTRACT

PURPOSE: The expansion of research across various disciplines has led to a substantial increase in published papers and journals, highlighting the necessity for reliable text mining platforms for database construction and knowledge acquisition. This abstract introduces GPDMiner (Gene, Protein, and Disease Miner), a platform designed for the biomedical domain, addressing the challenges posed by the growing volume of academic papers. METHODS: GPDMiner is a text mining platform that utilizes advanced information retrieval techniques. It operates by searching PubMed for specific queries, extracting and analyzing information relevant to the biomedical field. This system is designed to discern and illustrate relationships between biomedical entities obtained from automated information extraction. RESULTS: The implementation of GPDMiner demonstrates its efficacy in navigating the extensive corpus of biomedical literature. It efficiently retrieves, extracts, and analyzes information, highlighting significant connections between genes, proteins, and diseases. The platform also allows users to save their analytical outcomes in various formats, including Excel and images. CONCLUSION: GPDMiner offers a notable additional functionality among the array of text mining tools available for the biomedical field. This tool presents an effective solution for researchers to navigate and extract relevant information from the vast unstructured texts found in biomedical literature, thereby providing distinctive capabilities that set it apart from existing methodologies. Its application is expected to greatly benefit researchers in this domain, enhancing their capacity for knowledge discovery and data management.


Subjects
Data Management; Data Mining; Databases, Factual; Knowledge Discovery; PubMed
7.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35679594

ABSTRACT

Disease pathogenesis is always a major topic in biomedical research. With the exponential growth of biomedical information, drug effect analysis for specific phenotypes has shown great promise in uncovering disease-associated pathways. However, this method has only been applied to a limited number of drugs. Here, we extracted the data of 4634 diseases, 3671 drugs, 112 809 disease-drug associations and 81 527 drug-gene associations by text mining of 29 168 919 publications. On this basis, we proposed a 'Drug Set Enrichment Analysis by Text Mining (DSEATM)' pipeline and applied it to 3250 diseases, which outperformed the state-of-the-art method. Furthermore, disease pathways enriched by DSEATM were similar to those obtained using the TCGA cancer RNA-seq differentially expressed genes. In addition, the drug number, which showed a remarkable positive correlation of 0.73 with the AUC, plays a determining role in the performance of DSEATM. Taken together, DSEATM is an auspicious and accurate disease research tool that offers fresh insights.


Subjects
Biomedical Research; Data Mining; Data Mining/methods; Phenotype
8.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36347537

ABSTRACT

Target discovery and identification processes are driven by the increasing amount of biomedical data. The vast numbers of unstructured texts of biomedical publications provide a rich source of knowledge for drug target discovery research and demand the development of specific algorithms or tools to facilitate finding disease genes and proteins. Text mining is a method that can automatically mine helpful information related to drug target discovery from massive biomedical literature. However, there is a substantial lag between biomedical publications and the subsequent abstraction of information extracted by text mining to databases. The knowledge graph is introduced to integrate heterogeneous biomedical data. Here, we describe e-TSN (Target significance and novelty explorer, http://www.lilab-ecust.cn/etsn/), a knowledge visualization web server integrating the largest database of associations between targets and diseases from the full scientific literature by constructing significance and novelty scoring methods based on bibliometric statistics. The platform aims to visualize target-disease knowledge graphs to assist in prioritizing candidate disease-related proteins. Approved drugs and associated bioactivities for each interested target are also provided to facilitate the visualization of drug-target relationships. In summary, e-TSN is a fast and customizable visualization resource for investigating and analyzing the intricate target-disease networks, which could help researchers understand the mechanisms underlying complex disease phenotypes and improve the drug discovery and development efficiency, especially for the unexpected outbreak of infectious disease pandemics like COVID-19.


Subjects
COVID-19; Humans; Data Mining/methods; Publications; Knowledge; Algorithms; Proteins
9.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36156661

ABSTRACT

Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.


Subjects
Data Mining; Natural Language Processing
10.
Mol Syst Biol ; 19(5): e11325, 2023 05 09.
Article in English | MEDLINE | ID: mdl-36938926

ABSTRACT

The analysis of omic data depends on machine-readable information about protein interactions, modifications, and activities as found in protein interaction networks, databases of post-translational modifications, and curated models of gene and protein function. These resources typically depend heavily on human curation. Natural language processing systems that read the primary literature have the potential to substantially extend knowledge resources while reducing the burden on human curators. However, machine-reading systems are limited by high error rates and commonly generate fragmentary and redundant information. Here, we describe an approach to precisely assemble molecular mechanisms at scale using multiple natural language processing systems and the Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA identifies full and partial overlaps in information extracted from published papers and pathway databases, uses predictive models to improve the reliability of machine reading, and thereby assembles individual pieces of information into non-redundant and broadly usable mechanistic knowledge. Using INDRA to create high-quality corpora of causal knowledge we show it is possible to extend protein-protein interaction databases and explain co-dependencies in the Cancer Dependency Map.


Subjects
Data Mining; Natural Language Processing; Humans; Reproducibility of Results; Databases, Factual
11.
J Sleep Res ; : e14210, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38577714

ABSTRACT

This study evaluates the performance of two major artificial intelligence-based tools (ChatGPT-4 and Google Bard) in debunking sleep-related myths. In more detail, the present research assessed 20 sleep misconceptions using a 5-point Likert scale for falseness and public health significance, comparing responses of artificial intelligence tools with expert opinions. The results indicated that Google Bard correctly identified 19 out of 20 statements as false (95.0% accuracy), not differing from ChatGPT-4 (85.0% accuracy, Fisher's exact test p = 0.615). Google Bard's ratings of the falseness of the sleep misconceptions averaged 4.25 ± 0.70, showing a moderately negative skewness (-0.42) and kurtosis (-0.83), and suggesting a distribution with fewer extreme values compared with ChatGPT-4. In assessing public health significance, Google Bard's mean score was 2.4 ± 0.80, with skewness and kurtosis of 0.36 and -0.07, respectively, indicating a more normal distribution compared with ChatGPT-4. The inter-rater agreement between Google Bard and sleep experts had an intra-class correlation coefficient of 0.58 for falseness and 0.69 for public health significance, showing moderate alignment (p = 0.065 and p = 0.014, respectively). Text-mining analysis revealed Google Bard's focus on practical advice, while ChatGPT-4 concentrated on theoretical aspects of sleep. The readability analysis suggested Google Bard's responses were more accessible, aligning with 8th-grade level material, versus ChatGPT-4's 12th-grade level complexity. The study demonstrates the potential of artificial intelligence in public health education, especially in sleep health, and underscores the importance of accurate, reliable artificial intelligence-generated information, calling for further collaboration between artificial intelligence developers, sleep health professionals and educators to enhance the effectiveness of sleep health promotion.

12.
Ann Fam Med ; 22(2): 113-120, 2024.
Article in English | MEDLINE | ID: mdl-38527823

ABSTRACT

PURPOSE: Worldwide clinical knowledge is expanding rapidly, but physicians have sparse time to review scientific literature. Large language models (eg, Chat Generative Pretrained Transformer [ChatGPT]), might help summarize and prioritize research articles to review. However, large language models sometimes "hallucinate" incorrect information. METHODS: We evaluated ChatGPT's ability to summarize 140 peer-reviewed abstracts from 14 journals. Physicians rated the quality, accuracy, and bias of the ChatGPT summaries. We also compared human ratings of relevance to various areas of medicine to ChatGPT relevance ratings. RESULTS: ChatGPT produced summaries that were 70% shorter (mean abstract length of 2,438 characters decreased to 739 characters). Summaries were nevertheless rated as high quality (median score 90, interquartile range [IQR] 87.0-92.5; scale 0-100), high accuracy (median 92.5, IQR 89.0-95.0), and low bias (median 0, IQR 0-7.5). Serious inaccuracies and hallucinations were uncommon. Classification of the relevance of entire journals to various fields of medicine closely mirrored physician classifications (nonlinear standard error of the regression [SER] 8.6 on a scale of 0-100). However, relevance classification for individual articles was much more modest (SER 22.3). CONCLUSIONS: Summaries generated by ChatGPT were 70% shorter than mean abstract length and were characterized by high quality, high accuracy, and low bias. Conversely, ChatGPT had modest ability to classify the relevance of articles to medical specialties. We suggest that ChatGPT can help family physicians accelerate review of the scientific literature and have developed software (pyJournalWatch) to support this application. Life-critical medical decisions should remain based on full, critical, and thoughtful evaluation of the full text of research articles in context with clinical guidelines.


Subjects
Medicine; Humans; Physicians, Family
13.
BMC Med Res Methodol ; 24(1): 68, 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38494501

ABSTRACT

BACKGROUND: The challenging nature of studies with incarcerated populations and other offender groups can impede the conduct of research, particularly that involving complex study designs such as randomised control trials and clinical interventions. Providing an overview of study designs employed in this area can offer insights into this issue and how research quality may impact on health and justice outcomes. METHODS: We used a rule-based approach to extract study designs from a sample of 34,481 PubMed abstracts related to epidemiological criminology published between 1963 and 2023. The results were compared against an accepted hierarchy of scientific evidence. RESULTS: We evaluated our method in a random sample of 100 PubMed abstracts. An F1-Score of 92.2% was returned. Of 34,481 study abstracts, almost 40.0% (13,671) had an extracted study design. The most common study design was observational (37.3%; 5101) while experimental research in the form of trials (randomised, non-randomised) was present in 16.9% (2319). Mapped against the current hierarchy of scientific evidence, 13.7% (1874) of extracted study designs could not be categorised. Among the remaining studies, most were observational (17.2%; 2343) followed by systematic reviews (10.5%; 1432) with randomised controlled trials accounting for 8.7% (1196) of studies and meta-analysis for 1.4% (190) of studies. CONCLUSIONS: It is possible to extract epidemiological study designs from a large-scale PubMed sample computationally. However, the number of trials, systematic reviews, and meta-analysis is relatively small - just 1 in 5 articles. Despite an increase over time in the total number of articles, study design details in the abstracts were missing. Epidemiological criminology still lacks the experimental evidence needed to address the health needs of the marginalized and isolated population that is prisoners and offenders.


Subjects
Criminals; Prisoners; Humans; Data Mining; Research Design
14.
J Urban Health ; 101(2): 327-343, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38466494

ABSTRACT

Understanding how outdoor environments affect mental health outcomes is vital in today's fast-paced and urbanized society. Recently, advancements in data-gathering technologies and deep learning have facilitated the study of the relationship between the outdoor environment and human perception. In a systematic review, we investigate how deep learning techniques can shed light on a better understanding of the influence of outdoor environments on human perceptions and emotions, with an emphasis on mental health outcomes. We systematically reviewed 40 articles indexed in the SCOPUS and Web of Science databases and published between 2016 and 2023. The study presents and utilizes a novel topic modeling method to identify coherent keywords. By extracting the top words of each research topic, and identifying the current topics, we indicate that current studies are classified into three areas. The first topic was "Urban Perception and Environmental Factors" where the studies aimed to evaluate perceptions and mental health outcomes. Within this topic, the studies were divided based on human emotions, mood, stress, and the impacts of urban features. The second topic was titled "Data Analysis and Urban Imagery in Modeling" which focused on refining deep learning techniques, data collection methods, and participants' variability to understand human perceptions more accurately. The last topic was named "Greenery and visual exposure in urban spaces" which focused on the impact of the amount and the exposure of green features on mental health and perceptions. Upon reviewing the papers, this study provides a guide for subsequent research to enhance the view of using deep learning techniques to understand how urban environments influence mental health. It also provides various suggestions that should be taken into account when planning outdoor spaces.


Subjects
Data Mining; Deep Learning; Mental Health; Humans; Data Mining/methods; Perception; Emotions
15.
J Biomed Inform ; 157: 104716, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39197732

ABSTRACT

OBJECTIVE: This study aims to review the recent advances in community challenges for biomedical text mining in China. METHODS: We collected information on evaluation tasks released in community challenges of biomedical text mining, including task description, dataset description, data source, task type and related links. A systematic summary and comparative analysis were conducted on various biomedical natural language processing tasks, such as named entity recognition, entity normalization, attribute extraction, relation extraction, event extraction, text classification, text similarity, knowledge graph construction, question answering, text generation, and large language model evaluation. RESULTS: We identified 39 evaluation tasks from 6 community challenges that spanned from 2017 to 2023. Our analysis revealed the diverse range of evaluation task types and data sources in biomedical text mining. We explored the potential clinical applications of these community challenge tasks from a translational biomedical informatics perspective. We compared these tasks with their English counterparts, and discussed the contributions, limitations, lessons and guidelines of these community challenges, while highlighting future directions in the era of large language models. CONCLUSION: Community challenge evaluation competitions have played a crucial role in promoting technology innovation and fostering interdisciplinary collaboration in the field of biomedical text mining. These challenges provide valuable platforms for researchers to develop state-of-the-art solutions.


Subjects
Data Mining; Natural Language Processing; China; Data Mining/methods; Medical Informatics/methods
16.
J Biomed Inform ; 159: 104738, 2024 Oct 18.
Article in English | MEDLINE | ID: mdl-39426695

ABSTRACT

Document-level relation triplet extraction is crucial in biomedical text mining, aiding in drug discovery and the construction of biomedical knowledge graphs. Current language models face challenges in generalizing to unseen datasets and relation types in biomedical relation triplet extraction, which limits their effectiveness in these crucial tasks. To address this challenge, our study optimizes models from two critical dimensions: data-task relevance and granularity of relations, aiming to enhance their generalization capabilities significantly. We introduce a novel progressive learning strategy to obtain the PLRTE model. This strategy not only enhances the model's capability to comprehend diverse relation types in the biomedical domain but also implements a structured four-level progressive learning process through semantic relation augmentation, compositional instruction, and dual-axis level learning. Our experiments on the DDI and BC5CDR document-level biomedical relation triplet datasets demonstrate a significant performance improvement of 5% to 20% over the current state-of-the-art baselines. Furthermore, our model exhibits exceptional generalization capabilities on the unseen Chemprot and GDA datasets, further validating the effectiveness of optimizing data-task association and relation granularity for enhancing model generalizability.

17.
BMC Public Health ; 24(1): 39, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38166879

ABSTRACT

BACKGROUND: With the rapid development of China's chemical industry, although researchers have developed many methods in the field of chemical safety, the situation of chemical safety in China is still not optimistic. How to prevent accidents has always been the focus of scholars' attention. METHODS: Based on the characteristics of chemical enterprises and the Heinrich accident triangle, this paper developed the organizational-level accident triangle, which divides accidents into group-level, unit-level, and workshop-level accidents. Based on 484 accident records of a large chemical enterprise in China, the Spearman correlation coefficient was used to analyze the rationality of accident classification and the occurrence rules of accidents at different levels. In addition, this paper used TF-IDF and K-means algorithms to extract keywords and perform text clustering analysis for accidents at different levels based on accident classification. The risk factors of each accident cluster were further analyzed, and improvement measures were proposed for the sample enterprises. RESULTS: The results show that reducing unit-level accidents can prevent group-level accidents. The accidents of the sample enterprises are mainly personal injury accidents, production accidents, environmental pollution accidents, and quality accidents. The leading causes of personal injury accidents are employees' unsafe behaviors, such as poor safety awareness, non-standard operation, illegal operation, untimely communication, etc. The leading causes of production accidents, environmental pollution accidents, and quality accidents include the unsafe state of materials, such as equipment damage, pipeline leakage, short-circuiting, excessive fluctuation of process parameters, etc. CONCLUSION: Compared with the traditional accident classification method, the accident triangle proposed in this paper based on the organizational level dramatically reduces the differences between accidents, helps enterprises quickly identify risk factors, and prevents accidents. This method can effectively prevent accidents and provide helpful guidance for the safety management of chemical enterprises.
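
The TF-IDF and K-means step named in this abstract can be sketched in plain Python as below. It is a toy illustration under stated assumptions: the four accident descriptions are invented, tokenization is by whitespace, and the centroids are seeded with the first `k` vectors for reproducibility. None of this reflects the authors' actual preprocessing or settings.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors over a shared, sorted vocabulary."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vec = [term_freq[t] / len(tokens) * math.log(n_docs / doc_freq[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, iterations=20):
    """Plain K-means with squared Euclidean distance.
    Centroids are seeded with the first k vectors (an assumed, deterministic choice)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented accident descriptions, for illustration only.
records = [
    "worker injured fall unsafe operation",
    "pipeline leakage equipment damage",
    "worker injured unsafe behavior",
    "equipment damage pipeline leakage pressure",
]
vocab, vectors = tfidf_vectors(records)
labels = kmeans(vectors, k=2)
```

On this toy input the personal injury records and the equipment records fall into separate clusters because the two groups share no terms.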


Subjects
Accidents; Chemical Hazard Release; Humans; Environmental Pollution; Risk Factors; Safety Management
18.
BMC Palliat Care ; 23(1): 83, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38556869

ABSTRACT

BACKGROUND: Due to limited numbers of palliative care specialists and/or resources, access to palliative care remains limited in many low and middle-income countries. Data science methods, such as rule-based algorithms and text mining, have potential to improve palliative care by facilitating analysis of electronic healthcare records. This study aimed to develop and evaluate a rule-based algorithm for identifying cancer patients who may benefit from palliative care based on the Thai version of the Supportive and Palliative Care Indicators for a Low-Income Setting (SPICT-LIS) criteria. METHODS: The medical records of 14,363 cancer patients aged 18 years and older, diagnosed between 2016 and 2020 at Songklanagarind Hospital, were analyzed. Two rule-based algorithms, strict and relaxed, were designed to identify key SPICT-LIS indicators in the electronic medical records using tokenization and sentiment analysis. The inter-rater reliability between these two algorithms and palliative care physicians was assessed using percentage agreement and Cohen's kappa coefficient. Additionally, factors associated with patients who might benefit from palliative care were examined. RESULTS: The strict rule-based algorithm demonstrated a high degree of accuracy, with 95% agreement and Cohen's kappa coefficient of 0.83. In contrast, the relaxed rule-based algorithm demonstrated a lower agreement (71% agreement and Cohen's kappa of 0.16). Advanced-stage cancer with symptoms such as pain, dyspnea, edema, delirium, xerostomia, and anorexia were identified as significant predictors of potentially benefiting from palliative care. CONCLUSION: The integration of rule-based algorithms with electronic medical records offers a promising method for enhancing the timely and accurate identification of patients with cancer who might benefit from palliative care.
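
Percentage agreement and Cohen's kappa, the two reliability measures reported in this abstract, can be computed as in the sketch below. The `cohens_kappa` helper and the ten toy ratings are assumptions for illustration and do not reproduce the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy ratings (1 = may benefit from palliative care, 0 = not flagged).
algorithm = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
physician = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

agreement = sum(a == p for a, p in zip(algorithm, physician)) / len(algorithm)
kappa = cohens_kappa(algorithm, physician)
```

Because kappa subtracts the agreement expected by chance, raw agreement can look respectable while kappa stays low, which is the pattern the abstract reports for the relaxed algorithm (71% agreement, kappa 0.16).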


Subjects
Neoplasms , Palliative Care , Humans , Reproducibility of Results , Electronic Health Records , Neoplasms/therapy , Data Mining , Algorithms
19.
J Med Internet Res ; 26: e55937, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39141911

ABSTRACT

BACKGROUND: Nowadays, social media plays a crucial role in disseminating information about cancer prevention and treatment. A growing body of research has focused on assessing access and communication effects of cancer information on social media. However, there remains a limited understanding of the comprehensive presentation of cancer prevention and treatment methods across social media platforms. Furthermore, research comparing the differences between medical social media (MSM) and common social media (CSM) is also lacking. OBJECTIVE: Using big data analytics, this study aims to comprehensively map the characteristics of cancer treatment and prevention information on MSM and CSM. This approach promises to enhance cancer coverage and assist patients in making informed treatment decisions. METHODS: We collected all posts (N=60,843) from 4 medical WeChat official accounts (accounts with professional medical backgrounds, classified as MSM in this paper) and 5 health and lifestyle WeChat official accounts (accounts with nonprofessional medical backgrounds, classified as CSM in this paper). We applied latent Dirichlet allocation topic modeling to extract cancer-related posts (N=8427) and identified 6 cancer themes separately in CSM and MSM. After manually labeling posts according to our codebook, we used a neural-based method for automated labeling. Specifically, we framed our task as a multilabel task and utilized different pretrained models, such as Bidirectional Encoder Representations from Transformers (BERT) and Global Vectors for Word Representation (GloVe), to learn document-level semantic representations for labeling. RESULTS: We analyzed a total of 4479 articles from MSM and 3948 articles from CSM related to cancer. Among these, 35.52% (2993/8427) contained prevention information and 44.43% (3744/8427) contained treatment information. Themes in CSM were predominantly related to lifestyle, whereas MSM focused more on medical aspects. 
The most frequently mentioned prevention measures were early screening and testing, healthy diet, and physical exercise. MSM mentioned vaccinations for cancer prevention more frequently compared with CSM. Both types of media provided limited coverage of radiation prevention (including sun protection) and breastfeeding. The most mentioned treatment measures were surgery, chemotherapy, and radiotherapy. Compared with MSM (1137/8427, 13.49%), CSM (2993/8427, 35.52%) focused more on prevention. CONCLUSIONS: The information about cancer prevention and treatment on social media revealed a lack of balance. The focus was primarily limited to a few aspects, indicating a need for broader coverage of prevention measures and treatments in social media. Additionally, the study's findings underscored the potential of applying machine learning to content analysis as a promising research approach for mapping key dimensions of cancer information on social media. These findings hold methodological and practical significance for future studies and health promotion.
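
The multilabel framing mentioned in the methods can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: the label set, the helper names, and the four toy posts are hypothetical. The point is that each post receives one binary indicator per label, so a single post can count toward both prevention and treatment, and per-label percentages need not sum to 100%.

```python
# Hypothetical label set; the study's codebook is more detailed.
LABELS = ("prevention", "treatment")


def to_indicator(post_labels, labels=LABELS):
    """Encode a post's set of labels as a binary indicator vector."""
    return [1 if label in post_labels else 0 for label in labels]


def label_prevalence(rows):
    """Share of posts carrying each label (computed column by column)."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


# Toy posts: one per label, one with both labels, one with neither.
posts = [{"prevention"}, {"treatment"}, {"prevention", "treatment"}, set()]
rows = [to_indicator(post) for post in posts]
prevalence = label_prevalence(rows)
```

In a neural setup of the kind the abstract describes, vectors like `rows` would typically serve as training targets, with one independent yes/no output per label.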


Subjects
Machine Learning , Neoplasms , Social Media , Social Media/statistics & numerical data , Humans , Neoplasms/prevention & control , Neoplasms/therapy , China
20.
J Med Internet Res ; 26: e58309, 2024 Oct 21.
Article in English | MEDLINE | ID: mdl-39432897

ABSTRACT

BACKGROUND: Allergy disorders caused by biological particles, such as the proteins in some airborne pollen grains, are currently considered one of the most common chronic diseases, and European Academy of Allergy and Clinical Immunology forecasts indicate that within 15 years 50% of Europeans will have some kind of allergy as a consequence of urbanization, industrialization, pollution, and climate change. OBJECTIVE: The aim of this study was to monitor and analyze the dissemination of information about pollen symptoms from December 2006 to January 2022. By conducting a comprehensive evaluation of public comments and trends on Twitter, the research sought to provide valuable insights into the impact of pollen on sensitive individuals, ultimately enhancing our understanding of how pollen-related information spreads and its implications for public health awareness. METHODS: Using a blend of large language models, dimensionality reduction, unsupervised clustering, and term frequency-inverse document frequency, alongside visual representations such as word clouds and semantic interaction graphs, our study analyzed Twitter data to uncover insights on respiratory allergies. This concise methodology enabled the extraction of significant themes and patterns, offering a deep dive into public knowledge and discussions surrounding respiratory allergies on Twitter. RESULTS: The months between March and August had the highest volume of messages. The percentage of patient tweets appeared to increase notably during the later years, and there was also a potential increase in the prevalence of symptoms, mainly in the morning hours, indicating a potential rise in pollen allergies and related discussions on social media. While pollen allergy is a global issue, specific sociocultural, political, and economic contexts mean that patients experience symptomatology at a localized level, needing appropriate localized responses. 
CONCLUSIONS: Interpreting tweet information is a valuable tool for taking preventive measures that mitigate the impact of pollen allergy on sensitive patients, promote equity in living conditions, and enhance access to health information and services.
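
Term frequency-inverse document frequency, one of the techniques listed in the methods, can be sketched in a few lines. The abstract does not say which variant was used, so this example assumes the plain unsmoothed form, tf(t, d) × ln(N / df(t)); the `tf_idf` helper and the three toy messages are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf is the term's share of the document's tokens; idf is ln(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Toy messages, whitespace-tokenized for simplicity.
docs = [
    "pollen allergy sneezing this morning".split(),
    "pollen count is high today".split(),
    "itchy eyes and sneezing from pollen".split(),
]
weights = tf_idf(docs)
```

A term that appears in every message, such as "pollen" here, receives a weight of zero, while rarer terms such as "allergy" are weighted more heavily; this is why the measure is useful for surfacing the distinctive vocabulary of a cluster of messages.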


Subjects
Pollen , Social Media , Social Media/statistics & numerical data , Pollen/adverse effects , Humans , Retrospective Studies , Rhinitis, Allergic, Seasonal/epidemiology , Information Dissemination/methods , Allergens