Results 1 - 20 of 4,227
1.
Bone Joint J ; 102-B(7_Supple_B): 99-104, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32600201

ABSTRACT

AIMS: Natural Language Processing (NLP) offers an automated method to extract data from unstructured free-text fields for arthroplasty registry participation. Our objective was to investigate how accurately NLP can be used to extract structured clinical data from unstructured clinical notes when compared with manual data extraction. METHODS: A group of 1,000 randomly selected clinical and hospital notes from eight different surgeons were collected for patients undergoing primary arthroplasty between 2012 and 2018. In all, 19 preoperative, 17 operative, and two postoperative variables of interest were manually extracted from these notes. An NLP algorithm was created to automatically extract these variables from a training sample of these notes, and the algorithm was tested on a random test sample of notes. Performance of the NLP algorithm was measured in Statistical Analysis System (SAS) by calculating the accuracy of the variables collected, the ability of the algorithm to collect the correct information when it was indeed in the note (sensitivity), and the ability of the algorithm to not collect a certain data element when it was not in the note (specificity). RESULTS: The NLP algorithm performed well at extracting variables from unstructured data in our random test dataset (accuracy = 96.3%, sensitivity = 95.2%, and specificity = 97.4%). It performed better at extracting data that were in a structured, templated format, such as range of movement (ROM) (accuracy = 98%) and implant brand (accuracy = 98%), than data that were entered with variation depending on the author of the note, such as the presence of deep-vein thrombosis (DVT) (accuracy = 90%). CONCLUSION: The NLP algorithm used in this study was able to identify a subset of variables from randomly selected unstructured notes in arthroplasty with an accuracy above 90%. For some variables, such as objective exam data, the accuracy was very high. Our findings suggest that automated algorithms using NLP can help orthopaedic practices retrospectively collect information for registries and quality improvement (QI) efforts. Cite this article: Bone Joint J 2020;102-B(7 Supple B):99-104.
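
A minimal sketch (not the study's SAS code) of how extracted fields can be scored against manual abstraction using the definitions given in the abstract: sensitivity is correct capture when a value is present in the note, specificity is correctly leaving the field empty when it is absent. The variable name and values below are hypothetical.

```python
# Minimal scoring sketch, assuming each variable yields (manual, extracted)
# pairs where None means "not documented" / "not captured by the algorithm".
def score_field(pairs):
    """pairs: list of (manual_value, extracted_value) for one variable."""
    pairs = list(pairs)
    tp = fn = tn = fp = 0
    for manual, extracted in pairs:
        if manual is not None:          # value documented in the note
            if extracted == manual:
                tp += 1                 # captured correctly
            else:
                fn += 1                 # missed or wrong value
        else:                           # value absent from the note
            if extracted is None:
                tn += 1                 # correctly left empty
            else:
                fp += 1                 # spurious capture
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity

# Hypothetical example: range of movement extracted from four notes.
rom_pairs = [("0-120", "0-120"), ("5-110", "5-110"), (None, None), ("0-130", None)]
print(score_field(rom_pairs))  # (0.75, 0.666..., 1.0)
```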


Subjects
Hip Arthroplasty, Knee Arthroplasty, Information Storage and Retrieval/methods, Natural Language Processing, Registries, Algorithms, Data Accuracy, Humans, Quality of Health Care, Retrospective Studies
2.
Stud Health Technol Inform ; 272: 55-58, 2020 Jun 26.
Article in English | MEDLINE | ID: mdl-32604599

ABSTRACT

The automated detection of adverse events in medical records might be a cost-effective solution for patient safety management or pharmacovigilance. Our group proposed an information extraction algorithm (IEA) for detecting adverse events in neurosurgery using documents written in a morphologically rich natural language. In this paper, we optimize the algorithm and evaluate its performance for the detection of any extremity muscle weakness in clinical texts. Our algorithm achieves an accuracy of 0.96 and a ROC AUC of 0.96 and might easily be implemented in other medical domains.


Subjects
Muscle Weakness, Natural Language Processing, Electronic Health Records, Humans, Information Storage and Retrieval, Pharmacovigilance
3.
Stud Health Technol Inform ; 272: 95-98, 2020 Jun 26.
Article in English | MEDLINE | ID: mdl-32604609

ABSTRACT

Having precise information about health IT evaluation studies is important for evidence-based decisions in medical informatics. In a former feasibility study, we used a faceted search based on ontological modeling of key elements of studies to retrieve precisely described health IT evaluation studies. However, manually extracting the key elements for modeling the ontology was time- and resource-intensive. We therefore aimed to apply natural language processing to replace manual data extraction with automatic data extraction. Four methods (Named Entity Recognition, Bag-of-Words, Term Frequency-Inverse Document Frequency, and Latent Dirichlet Allocation topic modeling) were applied to 24 health IT evaluation studies. We evaluated which of these methods was best suited for extracting the key elements of each study, using the results of manual extraction as the gold standard. Named Entity Recognition proved promising but needs to be adapted to the study context at hand. After this adaptation, key elements of studies could be collected in a more feasible, time- and resource-saving way.
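
A minimal sketch of two of the four representations compared here (TF-IDF and LDA topic modeling) using scikit-learn; the NER step, the study corpus, and the gold-standard comparison are not reproduced, and the documents below are placeholders.

```python
# Sketch of TF-IDF features and LDA topics on placeholder study descriptions;
# not the study's pipeline or data.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "Randomized evaluation of a CPOE system in a university hospital.",
    "Before-after study of a nursing documentation system on two wards.",
    "Qualitative interviews on user acceptance of a telemonitoring service.",
]

# Term frequency-inverse document frequency representation
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(docs)

# Bag-of-words counts feeding an LDA topic model
counts = CountVectorizer(stop_words="english")
X_counts = counts.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X_counts)

print(X_tfidf.shape, doc_topics.round(2))
```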


Subjects
Natural Language Processing, Information Storage and Retrieval
4.
Stud Health Technol Inform ; 272: 191-194, 2020 Jun 26.
Article in English | MEDLINE | ID: mdl-32604633

ABSTRACT

The number of scientific publications is growing constantly, making their processing extremely time-consuming. We hypothesized that user-defined literature tracking can be augmented by machine learning on article summaries. In a pilot study, a dataset of 671 article abstracts was obtained and nineteen binary classification options using machine learning (ML) techniques on various text representations were proposed. Three hundred tests with resampling were performed for each classification option. The best classification option demonstrated an AUC of 0.78, proving the concept in general and indicating potential for further improvement.


Subjects
Machine Learning, Humans, Natural Language Processing, Pilot Projects
5.
Stud Health Technol Inform ; 270: 203-207, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570375

ABSTRACT

Radiology reports include various types of clinical information that are used for patient care. Reports are also expected to have secondary uses (e.g., clinical research and the development of decision support systems). For secondary use, it is necessary to extract information from the report and organize it in a structured format. Our goal is to build an application that transforms radiology reports written in free-text form into a structured format. To this end, we propose an end-to-end method that consists of three elements. First, we built a neural network model to extract clinical information from the reports, experimenting on a dataset of chest X-ray reports. Second, we transformed the extracted information into a structured format. Finally, we built a tool that transforms terms in reports into standard forms. Through our end-to-end method, we could obtain a structured radiology dataset that was easy to access for secondary use.


Subjects
Natural Language Processing, Neural Networks (Computer), Radiology Information Systems, Radiology, Humans, Research Report, Software, Writing
6.
Stud Health Technol Inform ; 270: 208-212, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570376

ABSTRACT

This paper presents five document retrieval systems for a small (a few thousand documents), domain-specific corpus (weekly peer-reviewed medical journals published in French), as well as an evaluation methodology to quantify the models' performance. The proposed methodology does not rely on external annotations and can therefore be used as an ad hoc evaluation procedure for most document retrieval tasks. Statistical models and vector space models are empirically compared on a synthetic document retrieval task. For our dataset's size and specificities, the statistical approaches consistently performed better than their vector space counterparts.
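
A minimal sketch of a vector-space retrieval baseline of the kind compared in this paper: TF-IDF document vectors ranked by cosine similarity against a query. The corpus and query are placeholders, not the French journal collection, and the statistical (language-model) side of the comparison is not shown.

```python
# Vector-space retrieval sketch: TF-IDF vectors ranked by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Essai randomise d'un anticoagulant oral direct.",
    "Prise en charge de l'insuffisance cardiaque chronique.",
    "Vaccination antigrippale chez le sujet age.",
]
query = ["anticoagulant oral"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)
query_vector = vectorizer.transform(query)

scores = cosine_similarity(query_vector, doc_vectors).ravel()
ranking = scores.argsort()[::-1]          # best-matching documents first
print([(corpus[i], round(float(scores[i]), 2)) for i in ranking])
```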


Subjects
Information Storage and Retrieval/methods, Language, Medical Subject Headings, Statistical Models, Natural Language Processing, Humans
7.
Stud Health Technol Inform ; 270: 272-276, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570389

ABSTRACT

After kidney transplantation, graft rejection must be prevented, so a multitude of patient parameters are monitored pre- and postoperatively. To support this process, the Screen Reject research project is developing a data warehouse optimized for kidney rejection diagnostics. In the course of this project it was discovered that important information is available only as free text rather than structured data and therefore cannot be processed by standard ETL tools, which is necessary to establish a digital expert system for rejection diagnostics. For this reason, data integration has been improved by combining methods from natural language processing with methods from image processing. Based on state-of-the-art data warehousing technologies (Microsoft SSIS), a generic data integration tool has been developed. The tool was evaluated by extracting Banff classifications from 218 pathology reports and HLA mismatches from about 1,700 PDF files, both written in German.


Subjects
Data Warehousing, Kidney Transplantation, Graft Rejection, Humans, Information Storage and Retrieval, Kidney, Natural Language Processing
8.
Stud Health Technol Inform ; 270: 292-296, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570393

ABSTRACT

Acronyms occur frequently in clinical text, which makes their identification, disambiguation, and resolution an important task in clinical natural language processing. This paper contributes to acronym resolution in Spanish through the creation of a set of sense inventories, organized by clinical specialty, containing acronyms, their expansions, and corpus-driven features. The new acronym resource covers 51 clinical specialties with 3,603 acronyms in total, from which we identified 228 language-independent acronyms and 391 language-dependent expansions. We further analyzed the sense inventory across specialties and present novel insights into acronym usage in biomedical Spanish texts.
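
A minimal sketch of what a specialty-indexed sense inventory could look like as a data structure with a lookup step; the entries below are illustrative examples, not taken from the resource described.

```python
# Toy specialty-indexed acronym sense inventory; entries are hypothetical.
from typing import Dict, List, Tuple

# (specialty, acronym) -> candidate expansions observed in that specialty
SENSE_INVENTORY: Dict[Tuple[str, str], List[str]] = {
    ("cardiologia", "FA"): ["fibrilacion auricular"],
    ("neumologia", "EPOC"): ["enfermedad pulmonar obstructiva cronica"],
    ("cardiologia", "IC"): ["insuficiencia cardiaca", "intervalo de confianza"],
}

def expand(acronym: str, specialty: str) -> List[str]:
    """Return candidate expansions for an acronym within a clinical specialty."""
    return SENSE_INVENTORY.get((specialty, acronym), [])

print(expand("IC", "cardiologia"))
```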


Subjects
Abbreviations as Topic, Natural Language Processing, PubMed, Artificial Intelligence, Humans, Language
9.
IEEE Trans Cybern ; 50(7): 2891-2904, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32396126

ABSTRACT

The coronavirus disease 2019 (COVID-19) outbreak that began in late December 2019 is gradually being brought under control in China, but the disease is still spreading rapidly in many other countries and regions worldwide, making prediction research on its development and spread urgent. In this article, a hybrid artificial intelligence (AI) model is proposed for COVID-19 prediction. First, because traditional epidemic models treat all individuals with coronavirus as having the same infection rate, an improved susceptible-infected (ISI) model is proposed to estimate the variation of infection rates for analyzing the transmission laws and development trend. Second, considering the effects of prevention and control measures and the increase in the public's prevention awareness, a natural language processing (NLP) module and a long short-term memory (LSTM) network are embedded into the ISI model to build the hybrid AI model for COVID-19 prediction. Experimental results on epidemic data from several typical provinces and cities in China show that individuals with coronavirus have a higher infection rate within the third to eighth day after they were infected, which is more in line with the actual transmission laws of the epidemic. Moreover, compared with traditional epidemic models, the proposed hybrid AI model significantly reduces prediction errors, achieving mean absolute percentage errors (MAPEs) of 0.52%, 0.38%, 0.05%, and 0.86% for the next six days in Wuhan, Beijing, Shanghai, and countrywide, respectively.
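
A minimal sketch, not the paper's hybrid model: the MAPE metric it reports, plus a toy susceptible-infected update in which the infection rate depends on days since infection (the idea behind the "improved SI" model). All rates and counts below are illustrative placeholders.

```python
# MAPE metric and a toy day-dependent infection-rate update; placeholder values.
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

print(round(mape([100, 120, 150], [101, 118, 152]), 2))   # 1.33

# Toy day-dependent infection rates: higher on days 3-8 after infection.
beta_by_day = {d: (0.4 if 3 <= d <= 8 else 0.1) for d in range(1, 15)}

def new_infections(susceptible, population, infected_by_age):
    """infected_by_age: {days_since_infection: count}; expected new cases."""
    force = sum(beta_by_day.get(d, 0.0) * n for d, n in infected_by_age.items())
    return susceptible / population * force

print(round(new_infections(9_000, 10_000, {2: 10, 5: 20}), 1))   # 8.1
```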


Subjects
Artificial Intelligence, Betacoronavirus, Coronavirus Infections/epidemiology, Statistical Models, Viral Pneumonia/epidemiology, China/epidemiology, Humans, Natural Language Processing, Pandemics
10.
BMC Bioinformatics ; 21(1): 188, 2020 May 14.
Article in English | MEDLINE | ID: mdl-32410573

ABSTRACT

BACKGROUND: In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships. RESULTS: A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level. CONCLUSIONS: SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
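
A quick check (not SemRep code) that the reported F1 scores follow from the precision/recall pairs via the harmonic mean, F1 = 2PR / (P + R).

```python
# Verify the reported F1 scores from the precision/recall pairs in the abstract.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.55, 0.34), 2))  # strict evaluation  -> 0.42
print(round(f1(0.69, 0.42), 2))  # relaxed evaluation -> 0.52
print(round(f1(0.90, 0.24), 2))  # CDR corpus         -> 0.38
```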


Subjects
Algorithms, Information Storage and Retrieval, Semantics, Humans, Natural Language Processing, PubMed, Unified Medical Language System
11.
PLoS One ; 15(5): e0232840, 2020.
Article in English | MEDLINE | ID: mdl-32396579

ABSTRACT

Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports and show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that, with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.


Subjects
Electronic Health Records/classification, Neoplasms/pathology, Histological Techniques, Humans, Natural Language Processing, SEER Program
12.
PLoS One ; 15(5): e0232525, 2020.
Article in English | MEDLINE | ID: mdl-32357164

ABSTRACT

Text classification (TC) is the task of automatically assigning documents to a fixed number of categories, and it is an important component in many text applications. Many of these applications perform preprocessing. There are different types of text preprocessing, e.g., conversion of uppercase letters into lowercase letters, HTML tag removal, stopword removal, punctuation mark removal, lemmatization, correction of common misspelled words, and reduction of replicated characters. We hypothesized that applying different combinations of preprocessing methods can improve TC results. Therefore, as our main research contribution, we performed an extensive and systematic set of TC experiments to explore the impact of all possible combinations of five or six basic preprocessing methods on four benchmark text corpora (not samples of them), using three ML methods and training and test sets. The general conclusion, at least for the datasets examined, is that it is always advisable to test an extensive and systematic variety of preprocessing method combinations, because doing so contributes to improved TC accuracy. For all the tested datasets, there was always at least one combination of basic preprocessing methods that could be recommended to significantly improve TC using a bag-of-words (BOW) representation. For three datasets, stopword removal was the only single preprocessing method that enabled a significant improvement compared to the baseline result using a bag of 1,000-word unigrams. For some of the datasets, there was minimal improvement when we removed HTML tags, performed spelling correction, removed punctuation marks, or reduced replicated characters. For the fourth dataset, however, stopword removal was not beneficial; instead, conversion of uppercase letters into lowercase letters was the only single preprocessing method that demonstrated a significant improvement compared to the baseline result, and the best result for this dataset was obtained when we performed spelling correction together with conversion into lowercase letters.
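
A minimal sketch of evaluating one preprocessing combination (lowercasing plus stopword removal) with a bag-of-words classifier; the paper's full grid of combinations, corpora, and learners is not reproduced, and the toy training texts and labels below are placeholders.

```python
# One preprocessing combination feeding a bag-of-words classifier (sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["Great phone, loved the battery", "Terrible screen and battery",
               "Loved it overall", "Terrible, would not buy"]
train_labels = [1, 0, 1, 0]

pipeline = make_pipeline(
    CountVectorizer(lowercase=True,            # preprocessing choice 1
                    stop_words="english",      # preprocessing choice 2
                    max_features=1000),        # bag of 1,000-word unigrams
    MultinomialNB(),
)
pipeline.fit(train_texts, train_labels)
print(pipeline.predict(["battery was great"]))
```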


Subjects
Natural Language Processing, Supervised Machine Learning, Text Processing, Algorithms, Data Mining/classification, Factual Databases, Humans, Language, Supervised Machine Learning/classification, Text Messaging/classification, Text Processing/classification
13.
PLoS One ; 15(5): e0232547, 2020.
Article in English | MEDLINE | ID: mdl-32413094

ABSTRACT

Scientific information extraction is a crucial step for understanding scientific publications. In this paper, we focus on scientific keyphrase extraction, which aims to identify keyphrases in scientific articles and classify them into predefined categories. We present a neural network based approach for this task, which employs a bidirectional long short-term memory (LSTM) network to represent the sentences in the article. On top of the bidirectional LSTM layer in our neural model, a conditional random field (CRF) is used to predict the label sequence for the whole sentence. Because annotated data for supervised learning methods are expensive, we introduce a self-training method into our neural model to leverage unlabeled articles. Experimental results on the ScienceIE corpus and the ACL keyphrase corpus show that our neural model achieves promising performance without any hand-designed features or external knowledge resources. Furthermore, it efficiently incorporates the unlabeled data and achieves competitive performance compared with previous state-of-the-art systems.
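
A minimal PyTorch sketch of the bidirectional LSTM token-tagging backbone described here; the CRF output layer and the self-training step are omitted (a per-token argmax would stand in for CRF decoding), and the vocabulary and layer sizes are placeholders.

```python
# BiLSTM token tagger sketch (CRF layer and self-training omitted).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=100, hidden_dim=128, num_tags=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Emission scores per token; a CRF would decode the best tag sequence
        # from these instead of taking the per-token argmax.
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)                # (batch, seq_len, 2 * hidden_dim)
        return self.emissions(h)           # (batch, seq_len, num_tags)

model = BiLSTMTagger()
tokens = torch.randint(0, 5000, (2, 12))   # two sentences of 12 token ids
print(model(tokens).shape)                  # torch.Size([2, 12, 7])
```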


Subjects
Deep Learning, Information Storage and Retrieval/methods, Neural Networks (Computer), Statistical Models, Natural Language Processing, Publications
14.
PLoS One ; 15(4): e0230107, 2020.
Article in English | MEDLINE | ID: mdl-32352986

ABSTRACT

Predicting innovation is a peculiar problem in data science: by definition, an innovation is always a never-seen-before event, leaving no room for traditional supervised learning approaches. Here we propose a strategy to address the problem in the context of innovative patents, by defining innovations as never-seen-before associations of technologies and exploiting self-supervised learning techniques. We think of the technological codes present in patents as a vocabulary, and of the whole technological corpus as written in a specific, evolving language. We leverage this structure with techniques borrowed from natural language processing, embedding technologies in a high-dimensional Euclidean space where relative positions are representative of learned semantics. Proximity in this space is an effective predictor of specific innovation events and outperforms a wide range of standard link-prediction metrics. The success of patented innovations follows a complex dynamics characterized by different patterns, which we analyze in detail with specific examples. The methods proposed in this paper provide a new way of understanding and forecasting innovation, tackling it from a revealing perspective and opening interesting scenarios for a number of applications and further analytic approaches.
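
A minimal sketch of the general idea: treat the technology codes on each patent as a "sentence", learn word2vec-style embeddings, and use cosine proximity between two codes as a score for a never-seen-before pairing. The codes and parameters below are made up, and this is not the paper's pipeline (gensim >= 4 is assumed; older versions name the vector_size parameter "size").

```python
# Word2vec-style embedding of patent technology codes; toy data and settings.
from gensim.models import Word2Vec

patents = [
    ["H04L", "G06F", "G06N"],   # each list: technology codes on one patent
    ["G06F", "G06N", "G16H"],
    ["A61B", "G16H", "G06N"],
    ["H04L", "G06F"],
]

model = Word2Vec(sentences=patents, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=200, seed=0)

# Proximity in the embedding space as a (toy) score for a future pairing.
print(model.wv.similarity("H04L", "G16H"))
```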


Subjects
Forecasting, Language, Natural Language Processing, Humans
15.
J Korean Med Sci ; 35(12): e78, 2020 Mar 30.
Article in English | MEDLINE | ID: mdl-32233158

ABSTRACT

BACKGROUND: Human leukocyte antigen (HLA) typing is important for transplant patients to prevent a severe mismatch reaction, and the results can also support the diagnosis of various diseases or the prediction of drug side effects. However, such secondary applications of HLA typing results are limited because they are typically provided in free-text format or as PDFs in electronic medical records. Here we propose a method to convert HLA genotype information stored in an unstructured format into a reusable structured format by extracting serotype/allele information. METHODS: We queried HLA typing reports from the clinical data warehouse of Seoul National University Hospital (SUPPREME) from 2000 to 2018 as a rule-development data set (64,024 reports) and from the most recent year (6,181 reports) as a test set. We used a rule-based natural language processing approach with Python regular expressions to extract 1) the number of patients in the report, 2) clinical characteristics such as the indication for HLA testing, and 3) the precise HLA genotypes. The performance of the rules and code was evaluated by comparing the results extracted from the test set with a validation set generated by manual curation. RESULTS: Among the 11,287 development-set reports and 1,107 test-set reports describing HLA typing for a single patient, iterative rule generation produced 124 extraction rules and 8 cleaning rules for HLA genotypes. Applying these rules extracted HLA genotypes with 0.892-0.999 precision and 0.795-0.998 recall for the five HLA genes. The precision and recall of the extraction rules were 0.997 and 0.994 for the number of patients in a report, and 0.997 and 0.992 for the clinical variables, respectively. All extracted HLA alleles and serotypes were transformed according to formal HLA nomenclature by the cleaning rules. CONCLUSION: The rule-based HLA genotype extraction method shows reliable accuracy. We believe that a significant number of patients will benefit when this under-used genetic information is returned to them.
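
A minimal illustration of the kind of regex-based extraction the study describes; this single simplified pattern is not one of the 124 rules developed there, and the report text below is a made-up example.

```python
# Simplified regex extraction of HLA allele notation into structured pairs.
import re

ALLELE = re.compile(r"HLA-(?P<gene>[A-Z0-9]+)\*(?P<allele>\d{2,3}(?::\d{2,3})*)")

report = "HLA typing: HLA-A*02:01, HLA-A*24:02, HLA-DRB1*04:05 (single patient)"

for match in ALLELE.finditer(report):
    # Normalize to a structured (gene, allele) pair for downstream reuse.
    print(match.group("gene"), match.group("allele"))
# A 02:01
# A 24:02
# DRB1 04:05
```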


Subjects
HLA Antigens/genetics, Histocompatibility Testing, Information Storage and Retrieval, Natural Language Processing, Algorithms, Data Warehousing, Electronic Health Records, Genotype, Humans, Seoul
16.
PLoS One ; 15(4): e0230876, 2020.
Article in English | MEDLINE | ID: mdl-32240233

ABSTRACT

Emergency department triage is the first point in time when a patient's acuity level is determined. The time to assign a priority at triage is short, and it is vital to stratify patients accurately at this stage, since under-triage can lead to increased morbidity, mortality, and costs. Our aim was to present a model that can assist healthcare professionals in triage decision making, namely in the stratification of patients through the risk prediction of a composite critical outcome: mortality and cardiopulmonary arrest. Our study cohort consisted of 235,826 adult patients triaged at a Portuguese emergency department from 2012 to 2016. Patients were assigned to the emergent, very urgent, or urgent priorities of the Manchester Triage System (MTS). Demographics, clinical variables routinely collected at triage, and the patient's chief complaint were used. Logistic regression, random forests, and extreme gradient boosting models were developed using all available variables. The term frequency-inverse document frequency (TF-IDF) natural language processing weighting factor was applied to vectorize the chief complaint. Stratified random sampling was used to split the data into train (70%) and test (30%) sets. Ten-fold cross-validation was performed on the training set to optimize model hyper-parameters. The performance obtained with the best model was compared against a reference model, a regularized logistic regression trained using only triage priorities. Extreme gradient boosting exhibited good calibration properties and yielded areas under the receiver operating characteristic and precision-recall curves of 0.96 (95% CI 0.95-0.97) and 0.31 (95% CI 0.26-0.36), respectively. The predictors ranked with highest importance by this model were the Glasgow coma score, the patient's age, pulse oximetry, and arrival mode. Compared to the reference, the extreme gradient boosting model using clinical variables and the chief complaint presented higher recall for patients assigned MTS-3 and can identify those who are at risk of the composite outcome.
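
A minimal sketch of combining a TF-IDF-vectorized chief complaint with numeric triage variables in a single model; a logistic regression stands in for the extreme gradient boosting used in the paper, and the column names and rows are placeholders, not the study data.

```python
# TF-IDF on the chief complaint plus numeric triage variables (sketch).
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

triage = pd.DataFrame({
    "chief_complaint": ["chest pain and dyspnea", "ankle sprain",
                        "unresponsive on arrival", "fever and cough"],
    "age": [67, 23, 81, 40],
    "glasgow_coma_score": [15, 15, 6, 15],
    "pulse_oximetry": [91, 99, 78, 96],
})
critical_outcome = [1, 0, 1, 0]   # composite: death or cardiopulmonary arrest

features = ColumnTransformer([
    ("complaint", TfidfVectorizer(), "chief_complaint"),
    ("vitals", "passthrough", ["age", "glasgow_coma_score", "pulse_oximetry"]),
])
model = make_pipeline(features, LogisticRegression(max_iter=1000))
model.fit(triage, critical_outcome)
print(model.predict_proba(triage)[:, 1].round(2))
```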


Subjects
Forecasting/methods, Risk Assessment/methods, Triage/methods, Adult, Cohort Studies, Hospital Emergency Service/trends, Female, Heart Arrest, Hospitalization, Humans, Logistic Models, Machine Learning, Male, Middle Aged, Natural Language Processing, Patient Acuity, Portugal, ROC Curve, Risk Factors
17.
J Med Syst ; 44(5): 96, 2020 Mar 20.
Article in English | MEDLINE | ID: mdl-32193703

ABSTRACT

Optic disc (OD) and optic cup (OC) segmentation are important steps in the automatic screening and diagnosis of optic nerve head abnormalities such as glaucoma. Many recent works formulated OD and OC segmentation as a pixel classification task. However, it is hard for these methods to explicitly model the spatial relations between the labels in the output mask. Furthermore, the proportions of background, OD, and OC are unbalanced, which may result in a biased model and introduce more noise. To address these problems, we developed an approach that follows a coarse-to-fine segmentation process. We start with a U-Net to obtain a rough segmentation boundary and then crop the area around the boundary to form a boundary-contour-centered image. Second, inspired by sequence labeling tasks in natural language processing, we regard OD and OC segmentation as a sequence labeling task and propose a novel fully convolutional network called SU-Net, combining it with the Viterbi algorithm to jointly decode the segmentation boundary. We also introduce a geometric parameter-based data augmentation method to generate more training samples, in order to minimize the differences between training and test sets and reduce overfitting. Experimental results show that our method achieved state-of-the-art results on two datasets for both OD and OC segmentation, and that it outperforms most of the six ophthalmologists in terms of agreement on the MESSIDOR dataset for both OD and OC segmentation. In terms of glaucoma screening, we achieved the best cup-to-disc ratio (CDR) error and area under the ROC curve (AUC) for glaucoma classification on the Drishti-GS dataset.


Subjects
Glaucoma, Computer-Assisted Image Processing, Neural Networks (Computer), Optic Disk/diagnostic imaging, Fundus Oculi, Glaucoma/diagnosis, Humans, Computer-Assisted Image Processing/methods, Natural Language Processing
18.
PLoS One ; 15(3): e0229331, 2020.
Article in English | MEDLINE | ID: mdl-32126097

ABSTRACT

The risk stratification of patients in the emergency department begins at triage. It is vital to stratify patients early based on their severity, since undertriage can lead to increased morbidity, mortality, and costs. Our aim was to present a new approach to assist healthcare professionals at triage in the stratification of patients and in identifying those at higher risk of ICU admission. Adult patients assigned Manchester Triage System (MTS) or Emergency Severity Index (ESI) priorities 1 to 3 at a Portuguese and a United States emergency department were analyzed. Variables routinely collected at triage were used, and natural language processing was applied to the patient's chief complaint. Stratified random sampling was applied to split the data into train (70%) and test (30%) sets, and 10-fold cross-validation was performed for model training. Logistic regression, random forests, and a random undersampling boosting algorithm were used. We compared the performance obtained with the reference model, which uses only triage priorities, with that of the models using additional variables. For both hospitals, a logistic regression model achieved the highest overall performance, yielding areas under the receiver operating characteristic and precision-recall curves of 0.91 (95% CI 0.90-0.92) and 0.30 (95% CI 0.27-0.33) for the United States hospital and of 0.85 (95% CI 0.83-0.86) and 0.06 (95% CI 0.05-0.07) for the Portuguese hospital. Heart rate, pulse oximetry, respiratory rate, and systolic blood pressure were the most important predictors of ICU admission. Compared to the reference models, the models using clinical variables and the chief complaint presented higher recall for patients assigned MTS/ESI 3 and can identify patients assigned MTS/ESI 3 who are at risk of ICU admission.


Subjects
Patient Admission/statistics & numerical data, Triage/methods, Adult, Aged, Aged 80 and over, Hospital Emergency Service, Female, Humans, Intensive Care Units, Logistic Models, Machine Learning, Male, Middle Aged, Natural Language Processing, Portugal/epidemiology, Risk Assessment, United States/epidemiology
19.
J Med Internet Res ; 22(1): e16816, 2020 01 23.
Article in English | MEDLINE | ID: mdl-32012074

ABSTRACT

BACKGROUND: Natural language processing (NLP) is an important, long-standing field in computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information globally and the increasing importance of understanding and mining big data in the medical field, NLP is becoming more crucial. OBJECTIVE: The goal of the research was to perform a systematic review of the use of NLP in medical research, with the aim of understanding global progress in NLP research outcomes, content, methods, and the study groups involved. METHODS: A systematic review was conducted using the PubMed database as the search platform. All published studies on the application of NLP in medicine (except biomedicine) during the 20 years between 1999 and 2018 were retrieved. The data obtained from these published studies were cleaned and structured. Excel (Microsoft Corp) and VOSviewer (Nees Jan van Eck and Ludo Waltman) were used to perform bibliometric analysis of publication trends, author orders, countries, institutions, collaboration relationships, research hot spots, diseases studied, and research methods. RESULTS: A total of 3498 articles were obtained during initial screening, and 2336 articles were found to meet the study criteria after manual screening. The number of publications increased every year, with significant growth after 2012 (the number of publications ranged from 148 to a maximum of 302 annually). The United States has occupied the leading position since the inception of the field, contributing 63.01% (1472/2336) of all publications, followed by France (5.44%, 127/2336) and the United Kingdom (3.51%, 82/2336). The author with the largest number of articles published was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest numbers of articles as first and corresponding author, respectively. Among first authors' affiliated institutions, Columbia University published the largest number of articles, accounting for 4.54% (106/2336) of the total. Approximately one-fifth (17.68%, 413/2336) of the articles involved research on specific diseases, and the subject areas primarily focused on mental illness (16.46%, 68/413), breast cancer (5.81%, 24/413), and pneumonia (4.12%, 17/413). CONCLUSIONS: NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most used research materials, but social media such as Twitter have become important research materials since 2015. Cancer (24.94%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancer (23.30%, 24/103) and lung cancer (14.56%, 15/103) accounting for the highest proportions of studies. Columbia University and the researchers trained there were the most active and prolific research forces in NLP in the medical field.


Subjects
Bibliometrics, Natural Language Processing, Precision Medicine/methods, PubMed/standards, Humans, Time Factors
20.
BMC Bioinformatics ; 21(1): 29, 2020 Jan 28.
Article in English | MEDLINE | ID: mdl-31992184

ABSTRACT

BACKGROUND: Event extraction from the biomedical literature is one of the most actively researched areas in biomedical text mining and natural language processing. However, most approaches have focused on events within single sentence boundaries and have thus paid much less attention to events spanning multiple sentences. The Bacteria-Biotope event (BB-event) subtask presented in BioNLP Shared Task 2016 is one such example; a significant number of relations between bacteria and biotopes span more than one sentence, but existing systems have treated them as false negatives because the labeled data are not large enough to model such a complex reasoning process using supervised learning frameworks. RESULTS: We present an unsupervised method for inferring cross-sentence events by propagating intra-sentence information to adjacent sentences using context trigger expressions that strongly signal the implicit presence of entities of interest. Such expressions can be collected from a large amount of unlabeled plain text based on simple syntactic constraints, helping to overcome the limitation of relying only on the small number of available training examples. The experimental results demonstrate that our unsupervised system extracts cross-sentence events quite well and outperforms all the state-of-the-art supervised systems when combined with existing methods for intra-sentence event extraction. Moreover, our system is also effective at detecting long-distance intra-sentence events, comparing favorably with existing high-dimensional models such as deep neural networks, without any supervised learning techniques. CONCLUSIONS: Our linguistically motivated inference model is shown to be effective at detecting implicit events that have not been covered by previous work, without relying on training data or curated knowledge bases. Moreover, it also helps boost the performance of existing systems by allowing them to detect additional cross-sentence events. We believe that the proposed model offers an effective way to infer implicit information beyond sentence boundaries, especially when human-annotated data are not sufficient to train a robust supervised system.


Subjects
Data Mining/methods, Bacteria, Natural Language Processing, Publications