Results 1 - 11 of 11
1.
J Med Syst ; 40(8): 191, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27402260

ABSTRACT

Obesity is a chronic disease with an increasing impact on the world's population. In this work, we present a method of identifying obesity automatically using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records that contain labels for two classification problems. The first classification problem distinguishes between obesity, overweight, normal weight, and underweight. The second classification problem differentiates between obesity types: super obesity, morbid obesity, severe obesity, and moderate obesity. We used a bag-of-words approach to represent the records, together with unigram and bigram representations of the features. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine and Naïve Bayes together with ten-fold cross-validation to evaluate and compare performance. Our results indicate that the hierarchical approach does not work as well as the nonhierarchical one. In general, Support Vector Machine achieves better performance than Naïve Bayes for both classification problems. We also observed that the bigram representation improves performance compared with the unigram representation.
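
A minimal sketch of the pipeline this abstract describes (bag-of-words features, unigram or bigram ranges, SVM and Naïve Bayes compared under ten-fold cross-validation) is shown below using scikit-learn; the records and labels are invented placeholders, not the 3015-record dataset, and this is an illustration rather than the authors' implementation.

```python
# Minimal sketch of a bag-of-words + SVM / Naive Bayes pipeline with
# ten-fold cross-validation. The records and labels below are placeholders;
# the original study used 3015 de-identified medical records.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

records = [
    "patient with BMI 42 and type 2 diabetes",
    "BMI 27, mild hypertension, otherwise healthy",
    "BMI 22, no relevant comorbidities",
    "BMI 17, referred for nutritional support",
] * 10  # repeated only so that 10-fold CV has enough samples per class
labels = ["obesity", "overweight", "normal", "underweight"] * 10

for ngram_range in [(1, 1), (1, 2)]:          # unigrams vs. unigrams+bigrams
    for name, clf in [("SVM", LinearSVC()), ("NB", MultinomialNB())]:
        pipe = make_pipeline(CountVectorizer(ngram_range=ngram_range), clf)
        scores = cross_val_score(pipe, records, labels, cv=10)
        print(f"{name} {ngram_range}: mean accuracy = {scores.mean():.3f}")
```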


Subjects
Artificial Intelligence, Data Mining/methods, Electronic Health Records/organization & administration, Obesity/diagnosis, Bayes Theorem, Comorbidity, Humans, Natural Language Processing, Overweight/diagnosis, Support Vector Machine
2.
PLoS One ; 19(4): e0301523, 2024.
Article in English | MEDLINE | ID: mdl-38662739

ABSTRACT

INTRODUCTION: The rise of new technologies in the field of health is yielding promising results. In certain chronic conditions such as type 2 diabetes mellitus, which ranks among the top five causes of global mortality, these technologies could be useful in supporting patient management. MATERIALS AND METHODS: A systematic review will be conducted on scientific publications from the last 5 years (January 2019 to October 2023) to describe the effect of mobile app usage on glycated hemoglobin in the management of adult patients with type 2 diabetes mellitus who participated in randomized controlled clinical trials. The search will be carried out in the MEDLINE (Ovid), Embase (Ovid), CINAHL (EBSCOhost), CENTRAL, WoS, Scopus, Epistemonikos, and LILACS databases. The search strategy will be constructed using both controlled and natural language. Additionally, the Cochrane filter will be applied to identify randomized controlled trials. The review will include scientific articles reporting results from randomized controlled trials, with texts in Spanish, English, or French, utilizing mobile applications for the management of adult individuals (over 18 years) with type 2 diabetes mellitus, and whose outcomes report the effects on glycated hemoglobin. The Cochrane Risk of Bias Tool will be used to assess the quality of the studies, and the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) methodology will be implemented to evaluate the certainty of the evidence. RESULTS: The analysis will be conducted by observing the participants' glycated hemoglobin levels. Because these data are quantitative and continuous, they facilitate the identification of the effects of the mobile applications used for the management of type 2 diabetes mellitus (T2DM) in adults. Furthermore, if sufficient data are available, a meta-analysis will be conducted using IBM-SPSS. The effect of the intervention will be estimated by the mean difference. All point estimates will be accompanied by 95% confidence intervals. A random effects model will be used. The heterogeneity of the results will be assessed using Cochran's Q and I² statistics. DISCUSSION: Considering that the quality of content and functionality of certain applications in the healthcare field is highly variable, it is necessary to evaluate the scientific evidence reported on the effect of the use of this type of technology in people with T2DM.
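
The planned pooling (mean differences with 95% confidence intervals under a random effects model, with Cochran's Q and I² for heterogeneity) can be illustrated with the short sketch below; it uses the DerSimonian-Laird estimator as one common random-effects approach and invented study-level values, since the protocol specifies IBM-SPSS rather than any code.

```python
# Hedged sketch of a DerSimonian-Laird random-effects meta-analysis of mean
# differences in HbA1c (%). The study-level estimates below are invented
# placeholders, not data from the review.
import numpy as np

md = np.array([-0.5, -0.3, -0.8, -0.2])      # mean difference per trial
se = np.array([0.20, 0.15, 0.30, 0.25])      # standard error per trial

w_fixed = 1.0 / se**2                         # inverse-variance weights
q = np.sum(w_fixed * (md - np.sum(w_fixed * md) / np.sum(w_fixed))**2)
df = len(md) - 1
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (q - df) / c)                 # between-study variance
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

w_rand = 1.0 / (se**2 + tau2)                 # random-effects weights
pooled = np.sum(w_rand * md) / np.sum(w_rand)
pooled_se = np.sqrt(1.0 / np.sum(w_rand))
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

print(f"Pooled MD = {pooled:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
print(f"Cochran's Q = {q:.2f}, I^2 = {i2:.1f}%")
```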


Subjects
Type 2 Diabetes Mellitus, Mobile Applications, Systematic Reviews as Topic, Type 2 Diabetes Mellitus/therapy, Humans, Glycated Hemoglobin/analysis, Glycated Hemoglobin/metabolism, Randomized Controlled Trials as Topic
3.
BMC Med Inform Decis Mak ; 12: 8, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22336388

ABSTRACT

BACKGROUND: Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. METHODS: We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness-of-fit measures. As a control, we used an unweighted fitting method. RESULTS: A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the dataset and sampling method, it took between 80 and 560 annotated samples to achieve mean absolute error and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline unweighted method (p < 0.05). CONCLUSIONS: This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an unweighted algorithm described in previous literature. It can help researchers determine the annotation sample size for supervised machine learning.
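
A hedged sketch of the core step, fitting an inverse power law to the first points of a learning curve with weighted nonlinear least squares and extrapolating to larger sample sizes, is shown below; the functional form, the weighting scheme, and the accuracy values are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: fit an inverse power law to the first points of a learning curve with
# weighted nonlinear least squares, then extrapolate to larger sample sizes.
# The curve shape and weights are assumptions; the observed accuracies are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def inverse_power_law(x, a, b, c):
    """Classifier performance as a function of training-set size x."""
    return a - b * np.power(x, -c)

sizes = np.array([50, 100, 150, 200, 250, 300], dtype=float)
accuracy = np.array([0.71, 0.78, 0.81, 0.83, 0.845, 0.85])

# Give later (larger-sample, less noisy) points more weight: sigma ~ 1/size.
sigma = 1.0 / sizes
params, cov = curve_fit(inverse_power_law, sizes, accuracy,
                        p0=(0.9, 1.0, 0.5), sigma=sigma, absolute_sigma=False)

for n in (500, 1000, 2000):
    print(f"predicted accuracy at n={n}: {inverse_power_law(n, *params):.3f}")
```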


Subjects
Algorithms, Learning Curve, Problem-Based Learning/methods, Sample Size, Statistical Data Interpretation, Computer-Assisted Diagnosis, Humans, Statistical Models, Nonlinear Dynamics, Automated Pattern Recognition, Predictive Value of Tests, Probability Learning, Reproducibility of Results, Stochastic Processes
4.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 6085-6088, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31947233

ABSTRACT

In this work, we present FREGEX, a method for automatically extracting features from biomedical texts based on regular expressions. Using the Smith-Waterman and Needleman-Wunsch sequence alignment algorithms, tokens were extracted from biomedical texts and represented by common patterns. Three manually annotated datasets with information on obesity, obesity types, and smoking habits were used to evaluate the effectiveness of the proposed method. Features extracted using consecutive sequences of tokens (n-grams) were used for comparison, and both types of features were represented mathematically using the TF-IDF vector model. Support Vector Machine and Naïve Bayes classifiers were trained, and their performances were ultimately used to assess the feature extraction methods. Results indicate that features based on regular expressions not only improved the performance of both classifiers on all datasets but also required fewer features than n-grams, especially in those datasets containing information related to anthropometric measures (obesity and obesity types).
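
To illustrate the general idea of regex-based features represented with TF-IDF and fed to a classifier, the sketch below uses two hand-written patterns for BMI and weight mentions; the patterns and texts are invented, and the alignment-based pattern induction that defines FREGEX is not reproduced here.

```python
# Illustrative sketch only: map regex matches to symbolic features, weight them
# with TF-IDF, and train an SVM. FREGEX learns its patterns from sequence
# alignments; the two patterns below are hand-written stand-ins.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

PATTERNS = {
    "BMI_VALUE":    re.compile(r"\bbmi\s*[:=]?\s*\d+(\.\d+)?", re.I),
    "WEIGHT_VALUE": re.compile(r"\b\d+(\.\d+)?\s*(kg|kilograms)\b", re.I),
}

def regex_features(text):
    """Replace every pattern match with the name of the matching feature."""
    tokens = []
    for name, pattern in PATTERNS.items():
        tokens += [name] * len(pattern.findall(text))
    return " ".join(tokens) if tokens else "NO_MATCH"

texts = ["Patient BMI: 41.2, weight 120 kg", "BMI 23, weight 70 kg",
         "No anthropometric data recorded", "bmi = 35.0"]
labels = ["obese", "normal", "unknown", "obese"]

vectorizer = TfidfVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(regex_features(t) for t in texts)
clf = LinearSVC().fit(X, labels)
print(clf.predict(vectorizer.transform([regex_features("weight 95 kg, BMI 33")])))
```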


Subjects
Support Vector Machine, Algorithms, Bayes Theorem
5.
Article in English | MEDLINE | ID: mdl-25570550

ABSTRACT

In this work we present a system to identify and extract patients' smoking status from clinical narrative text in Spanish. The clinical narrative text was processed using natural language processing techniques and annotated by four people with a biomedical background. The dataset used for classification had 2,465 documents, each annotated with one of four smoking status categories. We used two feature representations: single-word tokens (unigrams) and bigrams. The classification problem was divided into two levels: first, distinguishing smokers (S) from non-smokers (NS); second, distinguishing current smokers (CS) from past smokers (PS). For each feature representation and classification level, we used two classifiers: Support Vector Machines (SVM) and Bayesian Networks (BN). We split our dataset as follows: a training set containing 66% of the available documents, used to build the classifiers, and a test set containing the remaining 34%, used to test and evaluate the models. Our results show that SVM together with the bigram representation performed better at both classification levels. For the S vs. NS level, performance measures were ACC=85%, Precision=85%, and Recall=90%. For the CS vs. PS level, performance measures were ACC=87%, Precision=91%, and Recall=94%.
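
The two-level scheme could be wired up roughly as in the sketch below, with a first classifier separating smokers from non-smokers and a second, trained only on the smoker subset, separating current from past smokers; the Spanish snippets and labels are invented examples, and LinearSVC stands in for the SVM and Bayesian network classifiers used in the study.

```python
# Hedged sketch of the two-level classification: level 1 separates smokers (S)
# from non-smokers (NS); level 2, trained only on the smoker subset, separates
# current (CS) from past (PS) smokers. Texts and labels are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["paciente fuma 10 cigarrillos al dia", "niega tabaquismo",
         "ex fumador, dejo de fumar hace 5 anos", "no fuma ni bebe",
         "fumadora activa desde hace 20 anos", "dejo el cigarrillo en 2010"]
level1 = ["S", "NS", "S", "NS", "S", "S"]        # smoker vs. non-smoker
level2 = ["CS", None, "PS", None, "CS", "PS"]    # defined only for smokers

def bigram_svm():
    return make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())

clf_level1 = bigram_svm().fit(texts, level1)
smoker_texts = [t for t, y in zip(texts, level2) if y is not None]
smoker_labels = [y for y in level2 if y is not None]
clf_level2 = bigram_svm().fit(smoker_texts, smoker_labels)

def predict_smoking_status(text):
    if clf_level1.predict([text])[0] == "NS":
        return "NS"
    return clf_level2.predict([text])[0]         # CS or PS

print(predict_smoking_status("fumador activo, una cajetilla al dia"))
```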


Subjects
Factual Databases, Electronic Health Records/classification, Natural Language Processing, Smoking, Bayes Theorem, Chile, Humans, Narration, Support Vector Machine
6.
Article in English | MEDLINE | ID: mdl-24109833

ABSTRACT

This work presents a study of medical equipment availability in the short and long term. The work is divided into two parts. The first part is an analysis of the medical equipment inventory of the institution under study. We considered the replacement, maintenance, and reinforcement of the available medical equipment based on local guidelines and a survey of clinical personnel. The resulting recommendation is to upgrade the current equipment inventory where necessary. The second part is a demand analysis in the short and medium term, in which we predicted future demand over a 5-year horizon using Holt-Winters models. The inventory analysis showed that 27% of the medical equipment in stock was not functional. Given this result, we suggested that the hospital gradually address the situation by replacing 29 non-functional equipment items, reinforcing the stock with 40 new items, and adding 11 items not available in the inventory but suggested by the national guidelines. The demand results suggest that general medicine inpatient demand tends to increase over time; for the general medicine inpatient service, the largest increases correspond to respiratory (12%, RMSE=8%) and genitourinary (20%, RMSE=9%) diseases. These increases did not require any further upgrading of the proposed inventory.
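
Forecasting inpatient demand over a 5-year horizon with Holt-Winters models can be sketched with statsmodels as below; the monthly series is synthetic and the additive trend and seasonality settings are assumptions, not the configuration reported in the study.

```python
# Sketch of Holt-Winters (triple exponential smoothing) demand forecasting.
# The monthly discharge series is synthetic; trend/seasonal settings are assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
months = pd.date_range("2008-01-01", periods=72, freq="MS")
seasonal = 10 * np.sin(2 * np.pi * np.arange(72) / 12)
demand = pd.Series(100 + 0.5 * np.arange(72) + seasonal + rng.normal(0, 3, 72),
                   index=months)

model = ExponentialSmoothing(demand, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
forecast = model.forecast(60)                  # 5-year (60-month) horizon
print(forecast.tail(12).round(1))              # forecast for the final year
```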


Subjects
Equipment and Supplies/supply & distribution, Equipment and Supplies/statistics & numerical data, Public Hospitals/statistics & numerical data, Inpatients/statistics & numerical data, Data Collection/statistics & numerical data, Disease, Humans, Patient Discharge/statistics & numerical data
7.
Stud Health Technol Inform ; 192: 1193, 2013.
Article in English | MEDLINE | ID: mdl-23920967

ABSTRACT

The use of text mining and supervised machine learning algorithms on biomedical databases has become increasingly common. However, a question remains: how much data must be annotated to create a suitable training set for a machine learning classifier? In prior research on active learning for medical text classification, we found evidence that, in addition to sample size, intrinsic characteristics of the texts being analyzed, such as the size of the vocabulary and the length of the documents, may influence the resulting classifier's performance. This study is an attempt to create a regression model that predicts performance based on sample size and other text features. While the model needs to be trained on existing datasets, we believe it is feasible, once the model is built, to predict performance on new datasets without obtaining annotations for them.
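
One way to picture such a model is an ordinary regression in which each row pairs a sample size with simple text statistics; the sketch below fits this kind of model on invented rows purely to illustrate the setup, not the study's actual features or coefficients.

```python
# Illustrative sketch: predict classifier accuracy from sample size plus simple
# text statistics (vocabulary size, mean document length). All rows are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: [training samples, vocabulary size, mean tokens per document]
X = np.array([
    [100, 2000,  80], [200, 2000,  80], [400, 2000,  80],
    [100, 9000, 250], [200, 9000, 250], [400, 9000, 250],
], dtype=float)
y = np.array([0.72, 0.78, 0.82, 0.65, 0.70, 0.76])   # observed accuracy

model = LinearRegression().fit(np.log(X), y)          # log features: diminishing returns
new_dataset = np.log([[300, 5000, 150]])              # an unseen (dataset, size) pair
print(f"predicted accuracy: {model.predict(new_dataset)[0]:.3f}")
```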


Subjects
Artificial Intelligence, Documentation/classification, Documentation/statistics & numerical data, Meaningful Use/statistics & numerical data, Natural Language Processing, Terminology as Topic, Controlled Vocabulary, Data Curation/methods, Data Mining/statistics & numerical data, Automated Pattern Recognition/methods, Automated Pattern Recognition/statistics & numerical data, Sample Size
8.
Article in English | MEDLINE | ID: mdl-24110659

ABSTRACT

Atrial fibrillation (AF) is the most common arrhythmia encountered in clinical practice. In particular, the study of AF types or sub-classes is a very interesting research topic. In this paper we present a preliminary study to find sub-classes of AF from real 12-lead ECG recordings using k-means and hierarchical clustering algorithms. We applied blind source separation to an initial set of 218 recordings, from which we extracted a subset of 136 atrial activity signals displaying known properties of AF. As features for clustering we proposed the peak frequency mean value (PFM), the peak frequency standard deviation (PFSD), and the spectral concentration (SC). We computed the silhouette coefficient to obtain an optimal number of clusters of k=5, and conducted a preliminary feature selection to evaluate clustering quality. We observed that separability increases if SC is discarded as a feature. The proposed method is the first stage of a future AF classification method which, combined with specialist input, should be useful in the clinical field.
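
Choosing the number of clusters with k-means and the silhouette coefficient, as described here, follows a standard recipe; the sketch below runs the sweep on random placeholder features standing in for PFM, PFSD, and SC rather than on the 136 atrial activity signals.

```python
# Sketch: k-means over (PFM, PFSD, SC)-like features, choosing k by the
# silhouette coefficient. The feature matrix here is random placeholder data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
features = rng.normal(size=(136, 3))          # stand-in for [PFM, PFSD, SC]
X = StandardScaler().fit_transform(features)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k} (score {scores[best_k]:.3f})")

# The paper reports better separability when SC is dropped; that amounts to
# repeating the same sweep on X[:, :2].
```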


Subjects
Atrial Fibrillation/physiopathology, Electrocardiography/methods, Heart Atria/physiopathology, Computer-Assisted Signal Processing, Algorithms, Cluster Analysis, Humans, Normal Distribution, Automated Pattern Recognition, Reproducibility of Results
9.
Comput Biol Med ; 43(10): 1628-36, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24034755

ABSTRACT

In this paper we apply independent component analysis (ICA) followed by second-order blind identification (SOBI) to atrial fibrillation (AF) 12-lead electrocardiogram (ECG) recordings in order to extract the source that represents atrial activity (AA) (the ICA-SOBI method). Still, there is no assurance that only one source obtained from this method will contain AA, and thus we aim to select the most representative source of AA. The novelty of this paper is the proposal of three parameters to select the most representative source of AA: the correlation coefficient with lead V1 (CV1), the peak factor (PF), and the spectral concentration (SC). The first two parameters are introduced as new indicators, addressing features of AA during AF that SC overlooks. For synthesized data, at least two of the three parameters select the same representation of AA in 93.3% of the cases. For real data (218 ECG recordings), we observe that PF presents, in 89.5% of the cases, values between 2 and 4.5 for the selected sources, ensuring a well-defined range of values for AA. The values of CV1 and SC were scattered throughout their possible ranges (0-1 for CV1 and 0.08-0.7 for SC), and the correlation coefficient between these variables was ρ=0.58. We compared our results with three known algorithms: QRST cancellation, principal component analysis (PCA), and ICA-SOBI. The results of this comparison show that our proposed method for selecting the best representation of AA generally outperforms the three above-mentioned algorithms.
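
The three selection parameters can be computed per candidate source roughly as below; the exact definitions of PF (a crest-factor-like peak-to-RMS ratio) and SC (the fraction of spectral power near the dominant 3-12 Hz peak) are assumptions made for illustration, and the signals are random placeholders.

```python
# Hedged sketch of the three source-selection parameters. The definitions of
# PF and SC used here are assumptions for illustration; signals are random.
import numpy as np

fs = 1000.0                                     # sampling rate (Hz), assumed
rng = np.random.default_rng(2)
sources = rng.normal(size=(5, 10_000))          # candidate sources from ICA-SOBI
lead_v1 = rng.normal(size=10_000)               # lead V1 of the same recording

def cv1(source, v1):
    """Absolute correlation coefficient with lead V1."""
    return abs(np.corrcoef(source, v1)[0, 1])

def peak_factor(source):
    """Crest-factor-like ratio of peak amplitude to RMS (assumed definition)."""
    return np.max(np.abs(source)) / np.sqrt(np.mean(source**2))

def spectral_concentration(source, band=1.0):
    """Fraction of spectral power within +/- band Hz of the dominant 3-12 Hz peak."""
    freqs = np.fft.rfftfreq(source.size, d=1 / fs)
    power = np.abs(np.fft.rfft(source))**2
    af_band = (freqs >= 3) & (freqs <= 12)
    f_peak = freqs[af_band][np.argmax(power[af_band])]
    near_peak = (freqs >= f_peak - band) & (freqs <= f_peak + band)
    return power[near_peak].sum() / power.sum()

for i, s in enumerate(sources):
    print(i, round(cv1(s, lead_v1), 3), round(peak_factor(s), 2),
          round(spectral_concentration(s), 3))
```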


Subjects
Atrial Fibrillation/physiopathology, Electrocardiography/methods, Heart Atria/physiopathology, Computer-Assisted Signal Processing, Humans, Principal Component Analysis
10.
J Am Med Inform Assoc ; 19(5): 809-16, 2012.
Article in English | MEDLINE | ID: mdl-22707743

ABSTRACT

OBJECTIVE: This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. DESIGN: Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. MEASUREMENTS: Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. RESULTS: The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. CONCLUSION: For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
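
A generic distance-based (DIST-style) loop, querying the unlabeled examples closest to the SVM decision boundary, might look like the sketch below; the data is synthetic and this is a textbook uncertainty-sampling illustration rather than the exact algorithms evaluated in the paper.

```python
# Generic sketch of distance-based active learning: query the unlabeled
# examples closest to the SVM decision boundary. Synthetic data; this is a
# standard uncertainty-sampling loop, not the paper's exact DIST/DIV/CMB code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])  # seed set
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(10):                          # query 10 examples per round
    clf = LinearSVC().fit(X[labeled], y[labeled])
    distances = np.abs(clf.decision_function(X[pool]))
    closest = np.argsort(distances)[:10]          # least confident examples
    for idx in sorted(closest, reverse=True):     # pop from the back first
        labeled.append(pool.pop(idx))
    print(f"round {round_}: train size {len(labeled)}, "
          f"accuracy on remaining pool {clf.score(X[pool], y[pool]):.3f}")
```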


Subjects
Data Mining/methods, Natural Language Processing, Algorithms, Artificial Intelligence, Humans, ROC Curve
11.
AMIA Annu Symp Proc ; 2009: 188-92, 2009 Nov 14.
Article in English | MEDLINE | ID: mdl-20351847

ABSTRACT

We developed a method to help tailor a comprehensive vocabulary system (e.g., the UMLS) to a sub-domain (e.g., clinical reports) in support of natural language processing (NLP). The method detects unused senses in a sub-domain by comparing the relational neighborhood of a word/term in the vocabulary with the semantic neighborhood of the word/term in the sub-domain. The semantic neighborhood of the word/term in the sub-domain is determined using latent semantic analysis (LSA). We trained and tested the unused sense detection on two clinical text corpora: one containing discharge summaries and the other outpatient visit notes. We were able to detect unused senses with precision from 79% to 87%, recall from 48% to 74%, and an area under the receiver operating characteristic curve (AUC) of 72% to 87%.
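
The core operation, building a word's semantic neighborhood in the sub-domain with LSA and comparing it with the neighbors a vocabulary lists for one of its senses, can be illustrated roughly as follows; the corpus, the listed sense neighbors, and any overlap criterion are invented, and this is not the authors' UMLS-based pipeline.

```python
# Rough illustration: LSA (truncated SVD over TF-IDF) gives each word a vector;
# a word's semantic neighborhood in the sub-domain is its nearest neighbors by
# cosine similarity, which can then be compared with the neighbors a vocabulary
# lists for a given sense. The corpus and sense neighbors below are invented.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["patient discharged home in stable condition",
          "discharge summary dictated by attending physician",
          "wound discharge noted at follow up visit",
          "purulent discharge from surgical site"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)
svd = TruncatedSVD(n_components=2, random_state=0)
word_vectors = svd.fit_transform(X.T)           # rows = terms in LSA space

terms = tfidf.get_feature_names_out()
idx = {t: i for i, t in enumerate(terms)}

def semantic_neighbors(word, k=5):
    """k nearest terms to `word` in the LSA space, by cosine similarity."""
    sims = cosine_similarity(word_vectors[idx[word]].reshape(1, -1), word_vectors)[0]
    order = np.argsort(sims)[::-1]
    return [terms[i] for i in order if terms[i] != word][:k]

# Compare with the relational neighbors a vocabulary lists for one sense of the
# word; little overlap would suggest that sense is unused in this sub-domain.
vocabulary_sense_neighbors = {"secretion", "exudate", "fluid"}   # invented
corpus_neighbors = set(semantic_neighbors("discharge"))
print(corpus_neighbors, len(corpus_neighbors & vocabulary_sense_neighbors))
```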


Subjects
Natural Language Processing, Controlled Vocabulary, Area Under Curve, Artificial Intelligence, ROC Curve, Semantics, Unified Medical Language System