Results 1 - 15 of 15
1.
J Biomed Inform ; 149: 104560, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38070816

ABSTRACT

Clinical term embeddings are traditionally obtained using corpus-based methods; however, these methods cannot incorporate knowledge about clinical terms that is already present in medical ontologies. On the other hand, graph-based methods can obtain embeddings of clinical concepts from ontologies, but they cannot obtain embeddings for clinical terms and words. In this paper, a novel method is presented to obtain embeddings for clinical terms and words from the SNOMED CT ontology. The method first obtains embeddings of clinical concepts from SNOMED CT using a graph-based method. Next, these concept embeddings are used as targets to train a deep learning model to map clinical terms to concept embeddings. The learned model then provides embeddings for clinical terms and words and also maps novel clinical terms to their embeddings. The embeddings obtained using the method outperformed corpus-based embeddings on the task of predicting clinical term similarity on five benchmark datasets. On the clinical term normalization task, using these embeddings simply as a means of computing similarity between clinical terms yielded accuracy that was competitive with methods trained specifically for this task. Both corpus-based and ontology-based embeddings share the limitation that they tend to learn similar embeddings for opposite or analogous terms. To counter this, we also introduce a method to automatically learn patterns that indicate when two clinical terms represent the same concept and when they represent different concepts. Supplementing the normalization process with these patterns improved its accuracy. Although clinical term embeddings obtained from SNOMED CT incorporate ontological knowledge that is missed by corpus-based embeddings, they do not incorporate the linguistic knowledge needed for sentence-based tasks. Hence, combining ontology-based embeddings with corpus-based embeddings is an avenue for future work.


Subjects
Linguistics, Systematized Nomenclature of Medicine
2.
Accid Anal Prev ; 159: 106211, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34126276

ABSTRACT

Work zone safety management and research rely heavily on the quality of work zone crash data. However, it is possible that a police officer may misclassify a crash in structured data due to: restrictive options in the crash report; a lack of understanding about their importance; lack of time due to police officers' workload; and ignorance of work zone as one of the crash contributing factors. Consequently, work zone crashes are underrepresented in crash statistics. Crash narratives contain valuable information that is not included in the structured data. The objective of this study is to develop a classifier that applies text mining techniques to quickly find missed work zone (WZ) crashes through the unstructured text saved in the crash narratives. The study used three years of crash data, from 2017 to 2019. The data from 2017 to 2018 were used as training data, and the 2019 data were used as testing data. A unigram + bigram noisy-OR classifier was developed and shown to be an efficient and effective means of classifying work zone crashes based on key information in the crash narrative. The ad-hoc analysis of misclassified work zone crashes sheds light on when and where work zone crashes are more likely to be missed, and on the plausible reasons why.


Subjects
Traffic Accidents, Police, Data Mining, Humans, Narration, Safety Management
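
The unigram + bigram noisy-OR classifier named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the details, so this assumes that each n-gram's weight is the fraction of training narratives containing it that were work zone crashes, and that a narrative's score is 1 minus the product of (1 - weight) over its n-grams. The function names, the `min_count` parameter, and the toy narratives are hypothetical.

```python
from collections import Counter


def ngrams(text):
    """Return the set of unigrams and bigrams in a narrative."""
    tokens = text.lower().split()
    bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    return set(tokens) | bigrams


def train(narratives, labels, min_count=1):
    """Estimate P(work zone | n-gram present) from labeled narratives.

    Assumption: a plain relative frequency; a real system would likely
    smooth these estimates and drop uninformative n-grams.
    """
    positive, total = Counter(), Counter()
    for text, label in zip(narratives, labels):
        for gram in ngrams(text):
            total[gram] += 1
            if label:
                positive[gram] += 1
    return {g: positive[g] / total[g] for g in total if total[g] >= min_count}


def noisy_or(text, weights):
    """Noisy-OR score: 1 - product over n-grams of (1 - weight)."""
    p_no_evidence = 1.0
    for gram in ngrams(text):
        p_no_evidence *= 1.0 - weights.get(gram, 0.0)
    return 1.0 - p_no_evidence


# Toy, invented narratives (1 = work zone crash, 0 = other crash).
narratives = [
    "vehicle struck cone in work zone",
    "driver ran red light",
    "crash near construction crew in work zone",
    "driver rear ended vehicle at light",
]
labels = [1, 0, 1, 0]
weights = train(narratives, labels)
```

With these toy weights, a narrative mentioning "work zone" scores higher than one that shares words only with the non-work-zone examples.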
3.
JMIR Med Inform ; 9(1): e23104, 2021 Jan 14.
Article in English | MEDLINE | ID: mdl-33443483

ABSTRACT

BACKGROUND: Clinical terms mentioned in clinical text are often not in their standardized forms as listed in clinical terminologies because of linguistic and stylistic variations. However, many automated downstream applications require clinical terms mapped to their corresponding concepts in clinical terminologies, thus necessitating the task of clinical term normalization. OBJECTIVE: In this paper, a system for clinical term normalization is presented that utilizes edit patterns to convert clinical terms into their normalized forms. METHODS: The edit patterns are automatically learned from the Unified Medical Language System (UMLS) Metathesaurus as well as from the given training data. The edit patterns are generalized sequences of edits that are derived from edit distance computations. The edit patterns are both character based as well as word based and are learned separately for different semantic types. In addition to these edit patterns, the system also normalizes clinical terms through the subconcepts mentioned within them. RESULTS: The system was evaluated as part of the 2019 n2c2 Track 3 shared task of clinical term normalization. It obtained 80.79% accuracy on the standard test data. This paper includes ablation studies to evaluate the contributions of different components of the system. A challenging part of the task was disambiguation when a clinical term could be normalized to multiple concepts. CONCLUSIONS: The learned edit patterns led the system to perform well on the normalization task. Given that the system is based on patterns, it is human interpretable and is also capable of giving insights about common variations of clinical terms mentioned in clinical text that are different from their standardized forms.
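
The idea of deriving edit patterns from edit distance computations, described in the abstract above, can be sketched at word level as follows. This is only an illustration under stated assumptions: it uses a standard Levenshtein table with a backtrace, keeps only single-word replacements as "patterns", and applies them to unseen terms. The function names and the example term pairs are hypothetical; the published system's character-based patterns, semantic types, and generalization steps are not reproduced here.

```python
def edit_script(src, tgt):
    """Return a minimal list of word-level edits turning src into tgt."""
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete src word
                dist[i][j - 1] + 1,         # insert tgt word
                dist[i - 1][j - 1] + cost,  # keep or replace
            )
    # Walk back from the bottom-right cell to recover the edits.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("replace", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", src[i - 1], None))
            i -= 1
        else:
            edits.append(("insert", None, tgt[j - 1]))
            j -= 1
    return edits[::-1]


def learn_replacements(pairs):
    """Collect word replacements seen in (variant, standard form) pairs."""
    rules = {}
    for variant, standard in pairs:
        for op, old, new in edit_script(variant.split(), standard.split()):
            if op == "replace":
                rules[old] = new
    return rules


def normalize(term, rules):
    """Apply learned replacements to a previously unseen term."""
    return " ".join(rules.get(word, word) for word in term.split())


rules = learn_replacements([("ca of breast", "cancer of breast")])
```

Because the learned rule is tied to the edit rather than to the whole term, it carries over to terms that were not in the training pairs, for example `normalize("ca of lung", rules)`.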

4.
J Biomed Inform ; 111: 103585, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33011295

ABSTRACT

SNOMED CT is the most comprehensive clinical ontology and is also amenable to automated reasoning. However, in order to unleash its full potential for automated reasoning over clinical text, a mechanism to convert clinical terms into SNOMED CT concepts is necessary. In this paper we present, to the best of our knowledge, the first such complete conversion method that is also capable of converting clinical terms into post-coordinated concepts which are not already listed in SNOMED CT. The method does not require any additional manual annotations and learns only from existing SNOMED CT terms paired with their concepts. The method is based on identifying the defining relations of the clinical concept expressed by a clinical term. We evaluate our method on a large scale using existing data from SNOMED CT as well as on a small scale using a manually annotated dataset of clinical terms found in clinical text.


Subjects
Systematized Nomenclature of Medicine, Automation
5.
Comput Biol Med ; 116: 103580, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32001013

ABSTRACT

Acute kidney injury (AKI) commonly occurs in hospitalized patients and can lead to serious medical complications. But it is preventable and potentially reversible with early diagnosis and management. Therefore, several machine learning based predictive models have been built to predict AKI in advance from electronic health records (EHR) data. These models to predict inpatient AKI were always built to make predictions at a particular time, for example, 24 or 48 h from admission. However, hospital stays can be several days long and AKI can develop any time within a few hours. To optimally predict AKI before it develops at any time during a hospital stay, we present a novel framework in which AKI is continually predicted automatically from EHR data over the entire hospital stay. The continual model predicts AKI every time a patient's AKI-relevant variable changes in the EHR. Thus, the model not only is independent of a particular time for making predictions, it can also leverage the latest values of all the AKI-relevant patient variables for making predictions. A method to comprehensively evaluate the overall performance of a continual prediction model is also introduced, and we experimentally show using a large dataset of hospital stays that the continual prediction model out-performs all one-time prediction models in predicting AKI.


Subjects
Acute Kidney Injury, Inpatients, Acute Kidney Injury/diagnosis, Electronic Health Records, Hospitalization, Humans, Machine Learning
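
The continual prediction scheme in the abstract above, where a new prediction is made every time an AKI-relevant variable changes, might look like the following event-driven loop. This is a sketch under assumptions: the event format `(time, variable, value)`, the function names, and the `toy_risk` rule are all hypothetical stand-ins, and the threshold in `toy_risk` is not a clinical criterion.

```python
def continual_predictions(events, model):
    """Re-run the model whenever a tracked variable takes a new value.

    events: iterable of (time, variable_name, value) tuples.
    model:  callable mapping the latest variable values to a risk score.
    """
    state = {}
    predictions = []
    for time, name, value in sorted(events, key=lambda event: event[0]):
        if state.get(name) == value:
            continue  # value unchanged, so no new prediction is triggered
        state[name] = value
        predictions.append((time, model(state)))
    return predictions


def toy_risk(state):
    # Hypothetical stand-in for a trained model; not a clinical rule.
    return 0.9 if state.get("creatinine", 0.0) >= 1.5 else 0.1


# Invented EHR events: hours since admission, variable, value.
events = [
    (2, "creatinine", 1.0),
    (10, "creatinine", 1.0),
    (30, "creatinine", 1.8),
    (31, "potassium", 4.9),
]
preds = continual_predictions(events, toy_risk)
```

Unlike a model that predicts once at 24 or 48 h, this loop produces a prediction at hours 2, 30 and 31 here, each using the latest values of all variables seen so far.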
6.
Photomed Laser Surg ; 36(7): 354-362, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29583080

ABSTRACT

BACKGROUND AND OBJECTIVE: Studies on laser phototherapy for pain relief have used parameters that vary widely and have reported varying outcomes. The purpose of this study was to determine the optimal parameter ranges of laser phototherapy for pain relief by analyzing data aggregated from existing primary literature. MATERIALS AND METHODS: Original studies were gathered from available sources and were screened to meet the pre-established inclusion criteria. The included articles were then subjected to meta-analysis using Cohen's d statistic for determining treatment effect size. From these studies, ranges of the reported parameters that always resulted in large effect sizes were determined. These optimal ranges were evaluated for their accuracy using a leave-one-article-out cross-validation procedure. RESULTS: A total of 96 articles met the inclusion criteria for meta-analysis and yielded 232 effect sizes. The average effect size was highly significant: d = +1.36 [confidence interval (95% CI) = 1.04-1.68]. Among all the parameters, total energy was found to have the greatest effect on pain relief and had the most prominent optimal ranges of 120-162 and 15.36-20.16 J, which always resulted in large effect sizes. The cross-validation accuracy of the optimal ranges for total energy was 68.57% (95% CI = 53.19-83.97). Fewer and less-prominent optimal ranges were obtained for the energy density and duration parameters. None of the remaining parameters was found to be independently related to pain relief outcomes. CONCLUSIONS: The findings of the meta-analysis indicate that laser phototherapy is highly effective for pain relief. Based on the analysis of parameters, total energy can be optimized to yield the largest effect on pain relief.


Subjects
Low-Level Light Therapy, Pain/radiotherapy, Humans
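
For readers unfamiliar with the Cohen's d statistic used in the abstract above, here is a small sketch of its usual pooled-standard-deviation form, d = (mean1 - mean2) / s_pooled. The function name and the sample values are invented for illustration, and the abstract does not say which variant of d the authors computed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d with a pooled standard deviation (sample variances)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


# Invented pain-relief scores (higher = more relief).
treated = [2, 4, 6]
control = [1, 3, 5]
d = cohens_d(treated, control)
```

By Cohen's common rule of thumb, values around 0.2, 0.5 and 0.8 are read as small, medium and large effects; the exact cutoff the study used for "large" is not stated in the abstract.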
7.
Int J Med Inform ; 97: 304-311, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27919388

ABSTRACT

BACKGROUND: Survivability rates vary widely among various stages of breast cancer. Although machine learning models built in the past to predict breast cancer survivability were given stage as one of the features, they were not trained or evaluated separately for each stage. OBJECTIVE: To investigate whether there are differences in performance of machine learning models trained and evaluated across different stages for predicting breast cancer survivability. METHODS: Using three different machine learning methods we built models to predict breast cancer survivability separately for each stage and compared them with the traditional joint models built for all the stages. We also evaluated the models separately for each stage and together for all the stages. RESULTS AND CONCLUSIONS: Our results show that the most suitable model to predict survivability for a specific stage is the model trained for that particular stage. In our experiments, using additional examples of other stages during training did not help; in fact, it made performance worse in some cases. The most important features for predicting survivability were also found to be different for different stages. By evaluating the models separately on different stages we found that performance varied widely across them. We also demonstrate that evaluating predictive models for survivability on all the stages together, as was done in the past, is misleading because it overestimates performance.


Subjects
Breast Neoplasms/mortality, Breast Neoplasms/pathology, Decision Trees, Theoretical Models, Neoplasm Staging/standards, Breast Neoplasms/classification, Cancer Survivors, Female, Humans, Computer Neural Networks, Prognosis, Survival Rate
8.
AMIA Annu Symp Proc ; 2017: 1421-1429, 2017.
Article in English | MEDLINE | ID: mdl-29854211

ABSTRACT

For all cancer types, survivability rates vary widely across different stages of cancer. However, survivability prediction models built in the past were trained using examples of all stages together and were also evaluated on all stages together. In this work, for ten cancer types and using three machine learning methods, we built survivability prediction models trained on each stage separately and compared their performance with the traditional models trained on all stages together. For both kinds of models, the evaluation was done on each stage separately as well as on all stages together. Our results show that for most cancer types the stages are sufficiently different from each other that it is best to build survivability prediction models separately for each stage. We also found that evaluating survivability prediction models on all stages together, as was done previously, overestimates performance for all the stages on all cancer types.


Subjects
Machine Learning, Neoplasm Staging, Neoplasms/pathology, Datasets as Topic, Humans, Theoretical Models, Neoplasms/mortality, Prognosis, ROC Curve, Survival Analysis
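
The claim in the abstract above, that evaluating on all stages together overestimates performance, can be illustrated with a toy example. This sketch assumes AUC is the metric (the subject headings list ROC Curve, but the abstract does not name the metric) and uses invented labels and scores: within each stage the scores are uninformative, yet pooling the stages rewards the model merely for telling stages apart.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented data: label 1 = survived; scores are predicted survival probabilities.
stages = {
    "I": ([1, 1, 0], [0.90, 0.80, 0.85]),   # early stage: all scores high
    "IV": ([1, 0, 0], [0.15, 0.20, 0.10]),  # late stage: all scores low
}

per_stage = {name: auc(y, s) for name, (y, s) in stages.items()}

pooled_labels = [y for labels, _ in stages.values() for y in labels]
pooled_scores = [s for _, scores in stages.values() for s in scores]
pooled = auc(pooled_labels, pooled_scores)
```

Here each per-stage AUC is at chance level, while the pooled AUC is noticeably higher, which is the kind of gap a stage-by-stage evaluation would expose.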
9.
BMC Med Inform Decis Mak ; 16: 39, 2016 Mar 29.
Article in English | MEDLINE | ID: mdl-27025458

ABSTRACT

BACKGROUND: Acute Kidney Injury (AKI) occurs in at least 5% of hospitalized patients and can result in 40-70% morbidity and mortality. Even following recovery, many subjects may experience progressive deterioration of renal function. The heterogeneous etiology and pathophysiology of AKI complicate its diagnosis and medical management and can add to poor patient outcomes and incur substantial hospital costs. AKI is predictable and may be avoidable if early risk factors are identified and utilized in the clinical setting. Timely detection of undiagnosed AKI in hospitalized patients can also lead to better disease management. METHODS: Data from 25,521 hospital stays in one calendar year of patients 60 years and older was collected from a large health care system. Four machine learning models (logistic regression, support vector machines, decision trees and naïve Bayes) along with their ensemble were tested for AKI prediction and detection tasks. Patient demographics, laboratory tests, medications and comorbid conditions were used as the predictor variables. The models were compared using the area under the ROC curve (AUC) evaluation metric. RESULTS: Logistic regression performed the best for AKI detection (AUC 0.743) and was a close second to the ensemble for AKI prediction (AUC ensemble: 0.664, AUC logistic regression: 0.660). History of prior AKI, use of combination drugs such as ACE inhibitors, NSAIDs and diuretics, and presence of comorbid conditions such as respiratory failure were found significant for both AKI detection and risk prediction. CONCLUSIONS: The machine learning models performed fairly well on both predicting AKI and detecting undiagnosed AKI. To the best of our knowledge, this is the first study examining the difference between prediction and detection of AKI. The distinction has clinical relevance, and can help providers either identify at-risk subjects and implement preventative strategies or manage their treatment, depending on whether AKI is predicted or detected.


Subjects
Acute Kidney Injury/diagnosis, Hospitalization/statistics & numerical data, Machine Learning, Theoretical Models, Aged, Aged 80 and over, Female, Humans, Male, Middle Aged, Statistical Models, Prognosis
10.
Physiol Meas ; 37(3): 360-79, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26862679

ABSTRACT

Wearable accelerometers can be used to objectively assess physical activity. However, the accuracy of this assessment depends on the underlying method used to process the time series data obtained from accelerometers. Several methods have been proposed that use this data to identify the type of physical activity and estimate its energy cost. Most of the newer methods employ some machine learning technique along with suitable features to represent the time series data. This paper experimentally compares several of these techniques and features on a large dataset of 146 subjects doing eight different physical activities wearing an accelerometer on the hip. Besides features based on statistics, distance-based features and simple discrete features straight from the time series were also evaluated. On the physical activity type identification task, the results show that using more features significantly improves results. Choice of machine learning technique was also found to be important. However, on the energy cost estimation task, choice of features and machine learning technique were found to be less influential. On that task, separate energy cost estimation models trained specifically for each type of physical activity were found to be more accurate than a single model trained for all types of physical activities.


Subjects
Accelerometry/instrumentation, Algorithms, Energy Metabolism, Exercise/physiology, Decision Trees, Female, Humans, Machine Learning, Male, Middle Aged, Biological Models, Support Vector Machine, Time Factors
11.
J Am Med Inform Assoc ; 23(2): 380-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26232443

ABSTRACT

BACKGROUND: Variations of clinical terms are very commonly encountered in clinical texts. Normalization methods that use similarity measures or hand-coded approximation rules for matching clinical terms to standard terminologies have limited accuracy and coverage. MATERIALS AND METHODS: In this paper, a novel method is presented that automatically learns patterns of variations of clinical terms from known variations from a resource such as the Unified Medical Language System (UMLS). The patterns are first learned by computing edit distances between the known variations, which are then appropriately generalized for normalizing previously unseen terms. The method was applied and evaluated on the disease and disorder mention normalization task using the dataset of SemEval 2014 and compared with the normalization ability of the MetaMap system and a method based on cosine similarity. RESULTS: Excluding the mentions that already exactly match in UMLS and the training dataset, the proposed method obtained 64.7% accuracy on the rest of the test dataset. The accuracy was calculated as the number of mentions that correctly matched the gold-standard concept unique identifiers (CUIs) or correctly matched to be without a CUI. In comparison, MetaMap's accuracy was 41.9% and cosine similarity's accuracy was 44.6%. When only the output CUIs were evaluated, the proposed method obtained 54.4% best F-measure (at 92.1% precision and 38.6% recall) while MetaMap obtained 19.4% best F-measure (at 38.0% precision and 13.0% recall) and cosine similarity obtained 38.1% best F-measure (at 70.3% precision and 26.1% recall). CONCLUSIONS: The novel method was found to perform much better than the MetaMap system and the cosine similarity based method in normalizing disease mentions in clinical text that did not exactly match in UMLS. The method is also general and can be used for normalizing clinical terms of other semantic types as well.


Subjects
Algorithms, Natural Language Processing, Terminology as Topic, Unified Medical Language System, Machine Learning, Controlled Vocabulary
12.
Physiol Meas ; 36(11): 2335-51, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26449155

ABSTRACT

To develop and test time series single site and multi-site placement models, we used wrist, hip and ankle processed accelerometer data to estimate energy cost and type of physical activity in adults. Ninety-nine subjects in three age groups (18-39, 40-64, 65+ years) performed 11 activities while wearing three triaxial accelerometers: one each on the non-dominant wrist, hip, and ankle. During each activity net oxygen cost (METs) was assessed. The time series of accelerometer signals were represented in terms of uniformly discretized values called bins. Support Vector Machine was used for activity classification with bins and every pair of bins used as features. Bagged decision tree regression was used for net metabolic cost prediction. To evaluate model performance we employed the jackknife leave-one-out cross validation method. Single accelerometer and multi-accelerometer site model estimates across and within age group revealed similar accuracy, with a bias range of -0.03 to 0.01 METs, bias percent of -0.8 to 0.3%, and a rMSE range of 0.81-1.04 METs. Multi-site accelerometer location models improved activity type classification over single site location models from a low of 69.3% to a maximum of 92.8% accuracy. For each accelerometer site location model, or combined site location model, percent accuracy classification decreased as a function of age group, or when young age group models were generalized to older age groups. Specific age group models on average performed better than when all age groups were combined. A time series computation shows promising results for predicting energy cost and activity type. Differences in prediction across age groups, a lack of generalizability across age groups, and the finding that age-group-specific models perform better than models combining all ages need to be considered as analytic calibration procedures to detect energy cost and type are further developed.


Subjects
Accelerometry/instrumentation, Ankle, Energy Metabolism, Exercise, Hip, Statistical Models, Wrist, Adult, Age Factors, Female, Humans, Machine Learning, Male, Middle Aged, Young Adult
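
The bin representation in the abstract above, where accelerometer signals are uniformly discretized and bins plus pairs of bins serve as features, could be sketched as below. The abstract does not define how a pair of bins becomes a feature, so this sketch assumes each single-bin feature is the fraction of samples falling in that bin and each pair feature is the product of the two fractions. The function name, the value range, and the sample signal are hypothetical.

```python
from itertools import combinations


def bin_features(signal, lo, hi, n_bins):
    """Discretize a signal into equal-width bins and build bin and bin-pair features."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in signal:
        # Clamp so values outside [lo, hi) fall into the first or last bin.
        index = max(0, min(int((x - lo) / width), n_bins - 1))
        counts[index] += 1
    fractions = [count / len(signal) for count in counts]
    # Assumed pair feature: product of the two bins' occupancy fractions.
    pair_features = [
        fractions[i] * fractions[j] for i, j in combinations(range(n_bins), 2)
    ]
    return fractions + pair_features


# Invented one-axis signal, already scaled to the range [0, 1).
features = bin_features([0.1, 0.2, 0.6, 0.9], lo=0.0, hi=1.0, n_bins=2)
```

A feature vector like this has a fixed length regardless of how long the recording is, which is what lets a classifier such as a support vector machine consume windows of raw signal.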
13.
Biomed Inform Insights ; 6(Suppl 1): 29-37, 2013.
Article in English | MEDLINE | ID: mdl-23847425

ABSTRACT

Converting information contained in natural language clinical text into computer-amenable structured representations can automate many clinical applications. As a step towards that goal, we present a method which could help in converting novel clinical phrases into new expressions in SNOMED CT, a standard clinical terminology. Since expressions in SNOMED CT are written in terms of their relations with other SNOMED CT concepts, we formulate the important task of identifying relations between clinical phrases and SNOMED CT concepts. We present a machine learning approach for this task and using the dataset of existing SNOMED CT relations we show that it performs well.

14.
J Biomed Semantics ; 3 Suppl 3: S4, 2012 Oct 05.
Artigo em Inglês | MEDLINE | ID: mdl-23046834

RESUMO

BACKGROUND: Clinical reports are written using a subset of natural language while employing many domain-specific terms; such a language is also known as a sublanguage for a scientific or a technical domain. Different genres of clinical reports use different sublanguages, and in addition, different medical facilities use different medical language conventions. This makes supervised training of a parser for clinical sentences very difficult as it would require expensive annotation effort to adapt to every type of clinical text. METHODS: In this paper, we present an unsupervised method which automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. In order to capture sentence structures specific to clinical domains, the grammar is induced in terms of semantic classes of clinical terms in addition to part-of-speech tags. Our method induces grammar by minimizing the combined encoding cost of the grammar and the corresponding sentence derivations. The probabilities for the productions of the induced grammar are then learned from the unannotated corpus using an instance of the expectation-maximization algorithm. RESULTS: Our experiments show that the induced grammar is able to parse novel sentences. Using a dataset of discharge summary sentences with no annotations, our method obtains 60.5% F-measure for parse-bracketing on sentences of maximum length 10. By varying a parameter, the method can induce a range of grammars, from very specific to very general, and obtains the best performance in between the two extremes.

15.
Artif Intell Med ; 33(2): 139-55, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15811782

ABSTRACT

OBJECTIVE: Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction efforts have been frustrated by the lack of conventions for describing human genes and proteins. We have developed and evaluated a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting information on interactions between the proteins. METHODS AND MATERIAL: We used a variety of machine learning methods to automatically develop information extraction systems for extracting information on gene/protein name, function and interactions from Medline abstracts. We present cross-validated results on identifying human proteins and their interactions by training and testing on a set of approximately 1000 manually-annotated Medline abstracts that discuss human genes/proteins. RESULTS: We demonstrate that machine learning approaches using support vector machines and maximum entropy are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions with higher precision than manually-developed rules. CONCLUSION: Our results show that it is promising to use machine learning to automatically build systems for extracting information from biomedical text. The results also give a broad picture of the relative strengths of a wide variety of methods when tested on a reasonably large human-annotated corpus.


Subjects
Artificial Intelligence, Genes, Information Storage and Retrieval/methods, Proteins, Algorithms, Expert Systems, Genes/genetics, Genes/physiology, Humans, MEDLINE, Proteins/classification, Proteins/physiology, Terminology as Topic