Results 1 - 12 of 12
1.
Artif Intell Med ; 65(2): 89-96, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26363683

ABSTRACT

OBJECTIVE: The ability to predict patient readmission risk is extremely valuable for hospitals, especially under the Hospital Readmission Reduction Program of the Centers for Medicare and Medicaid Services, which went into effect on October 1, 2012. There is a plethora of work in the literature on developing readmission risk prediction models, but most of these models do not have sufficient prediction accuracy to be deployed in a clinical setting, partly because different hospitals may have different characteristics in their patient populations. METHODS AND MATERIALS: We propose a generic framework for institution-specific readmission risk prediction, which takes patient data from a single institution and produces a statistical risk prediction model optimized for that particular institution and, optionally, for a specific condition. This provides great flexibility in model building, and also yields institution-specific insights into the readmitted patient population. We experimented with classification methods such as support vector machines, and prognosis methods such as Cox regression. We compared our methods with industry-standard methods such as the LACE model, and showed that the proposed framework is not only more flexible but also more effective. RESULTS: We applied our framework to patient data from three hospitals, and obtained initial results for heart failure (HF), acute myocardial infarction (AMI), and pneumonia (PN) patients as well as patients with all conditions. On Hospital 2, the LACE model yielded AUCs of 0.57, 0.56, 0.53, and 0.55 for AMI, HF, PN, and All Cause readmission prediction, respectively, while the proposed model yielded 0.66, 0.65, 0.63, and 0.74 for the corresponding conditions, all significantly better than the LACE counterparts. The proposed models that leverage all features at discharge time are more accurate than the models that only leverage features at admission time (0.66 vs. 0.61 for AMI, 0.65 vs. 0.61 for HF, 0.63 vs. 0.56 for PN, 0.74 vs. 0.60 for All Cause). Furthermore, the proposed admission-time models already outperform LACE, which is a discharge-time model (0.61 vs. 0.57 for AMI, 0.61 vs. 0.56 for HF, 0.56 vs. 0.53 for PN, 0.60 vs. 0.55 for All Cause). Similar conclusions can be drawn from the other hospitals as well. The same performance comparison also holds for precision and recall at top-decile predictions. Most of the performance improvements are statistically significant. CONCLUSIONS: The institution-specific readmission risk prediction framework is more flexible and more effective than one-size-fits-all models such as LACE, sometimes two to three times more effective. The admission-time models can give early warning signs relative to the discharge-time models, and may help hospital staff intervene early while the patient is still in the hospital.
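The AUC and top-decile precision/recall metrics used above are straightforward to compute directly. A minimal numpy sketch, with invented risk scores and outcomes rather than the paper's data:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Wilcoxon-Mann-Whitney) form."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def top_decile_precision_recall(scores, labels):
    """Precision/recall when flagging the top 10% highest-risk patients."""
    labels = np.asarray(labels)
    order = np.argsort(scores)[::-1]
    k = max(1, len(labels) // 10)
    flagged = labels[order[:k]]
    return flagged.sum() / k, flagged.sum() / labels.sum()

# Toy example: 10 patients, risk scores loosely correlated with readmission.
rng = np.random.default_rng(0)
y = np.array([1, 0, 0, 1, 0, 0, 0, 0, 0, 1])
s = y * 0.5 + rng.normal(0, 0.2, size=10)
print(auc(s, y), top_decile_precision_recall(s, y))
```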


Subjects
Theoretical Models , Patient Readmission , Humans , Proportional Hazards Models , Risk Assessment , Support Vector Machine
2.
Article in English | MEDLINE | ID: mdl-24303296

ABSTRACT

One of the important pieces of information in a patient's clinical record is information about their medications. Besides administration details, it also includes the category of each medication, i.e., whether the patient was taking the medication at home, or whether it was administered in the Emergency Department, during the hospital stay, or on discharge. Unfortunately, much of this information is presently embedded in unstructured clinical notes, e.g., ER records and History & Physical documents. This information is required for adherence to quality and regulatory guidelines and for retrospective analysis, e.g., CMS reporting, and extracting it manually is labor-intensive. This paper describes in detail a statistical NLP system developed to extract such information. We trained a Maximum Entropy Markov Model to categorize instances of medication names into previously defined categories. The system was tested on a variety of clinical notes from different institutions, and we achieved an average accuracy of 91.3%.
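As a much-simplified stand-in for the paper's MEMM, the sketch below trains a maximum-entropy (softmax) classifier over bag-of-context-words features to assign a medication mention to a category. The categories, vocabulary, and training snippets are invented for illustration; the real system uses richer features and sequential (Markov) structure.

```python
import numpy as np

# Hypothetical categories and context snippets around a medication mention.
CATEGORIES = ["home", "emergency", "discharge"]
TRAIN = [
    ("patient takes lisinopril at home daily", "home"),
    ("home medications include metformin", "home"),
    ("given morphine in the emergency department", "emergency"),
    ("aspirin administered in the ED", "emergency"),
    ("discharged on warfarin", "discharge"),
    ("prescribed atorvastatin at discharge", "discharge"),
]

vocab = sorted({w for text, _ in TRAIN for w in text.split()})

def featurize(text):
    """Bag-of-words vector over the context window, plus a bias term."""
    x = np.zeros(len(vocab) + 1)
    for w in text.split():
        if w in vocab:
            x[vocab.index(w)] = 1.0
    x[-1] = 1.0
    return x

X = np.stack([featurize(t) for t, _ in TRAIN])
y = np.array([CATEGORIES.index(c) for _, c in TRAIN])

# Train the maximum-entropy classifier by gradient descent on the NLL.
W = np.zeros((len(CATEGORIES), X.shape[1]))
for _ in range(500):
    logits = X @ W.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(len(CATEGORIES))[y]
    W -= 0.5 * (p - onehot).T @ X / len(X)   # negative log-likelihood gradient

def categorize(text):
    return CATEGORIES[int(np.argmax(featurize(text) @ W.T))]

print(categorize("continue lisinopril at home"))
```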

3.
AMIA Annu Symp Proc ; 2011: 1603-11, 2011.
Article in English | MEDLINE | ID: mdl-22195226

ABSTRACT

Information extraction from clinical free text is one of the key elements in medical informatics research. In this paper we propose a general framework to improve learning-based information extraction systems with the help of rich annotations (i.e., annotators provide the medical assertion as well as the evidence that supports it). A special graphical interface was developed to facilitate the annotation process, and we show how to implement this framework with a state-of-the-art context-based question answering system. Empirical studies demonstrate that with about 10% longer annotation time, we can significantly improve the accuracy of the system. An approach to provide supporting evidence for test documents is also briefly discussed, with promising preliminary results.


Subjects
Algorithms , Artificial Intelligence , Electronic Health Records , Information Storage and Retrieval/methods , Humans , Natural Language Processing
4.
AMIA Annu Symp Proc ; 2010: 682-6, 2010 Nov 13.
Article in English | MEDLINE | ID: mdl-21347065

ABSTRACT

This paper describes a machine learning, text processing approach that allows the extraction of key medical information from unstructured text in Electronic Medical Records. The approach utilizes a novel text representation that shares the simplicity of the widely used bag-of-words representation, but can also capture some of the semantic information in the text. The large dimensionality of this type of learning model is controlled by ℓ1 regularization, which favors parsimonious models. Experimental results demonstrate the accuracy of the approach in extracting medical assertions that can be associated with polarity and relevance detection.
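The sparsity-inducing effect of ℓ1 regularization can be seen in a small sketch. This is not the paper's model: it is a generic ℓ1-penalized logistic regression fit by proximal gradient descent (ISTA) on synthetic data where only two of twenty features carry signal.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the l1 norm: shrinks small weights to exact zeros."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_logistic(X, y, lam=0.05, lr=0.1, iters=2000):
    """Logistic regression with an l1 penalty, fit by proximal gradient (ISTA)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))           # predicted probabilities
        grad = X.T @ (p - y) / len(y)                # gradient of the mean NLL
        w = soft_threshold(w - lr * grad, lr * lam)  # gradient step, then shrink
    return w

# Toy feature matrix: only the first 2 of 20 features carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
w = l1_logistic(X, y)
print("nonzero weights:", np.flatnonzero(w))
```

Most of the 18 noise features end up exactly zero, which is the parsimony the abstract refers to.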


Subjects
Electronic Health Records , Natural Language Processing , Humans , Semantics
5.
IEEE Trans Pattern Anal Mach Intell ; 30(7): 1158-70, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18550900

ABSTRACT

We consider the problem of learning the ranking function that maximizes a generalization of the Wilcoxon-Mann-Whitney statistic on the training data. Relying on an ε-accurate approximation for the error function, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from O(m²) to O(m), where m is the number of training samples. Experiments on public benchmarks for ordinal regression and collaborative filtering indicate that the proposed algorithm is as accurate as the best available methods in terms of ranking accuracy, when the algorithms are trained on the same data. However, since it is several orders of magnitude faster than the current state-of-the-art approaches, it is able to leverage much larger training datasets.
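For intuition on why the quadratic pair count is the bottleneck, the sketch below computes the Wilcoxon-Mann-Whitney statistic two ways: the naive all-pairs form, which is O(m²), and an equivalent rank-based form, which is O(m log m). (The paper's speedup applies to the gradient computation rather than the statistic itself; this just illustrates the pairwise structure.)

```python
import numpy as np

def wmw_naive(pos, neg):
    """Wilcoxon-Mann-Whitney statistic by checking all pairs: O(m^2)."""
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

def wmw_fast(pos, neg):
    """Same statistic via midranks of the pooled sample: O(m log m)."""
    pooled = np.concatenate([pos, neg])
    order = np.argsort(pooled, kind="stable")
    sorted_vals = pooled[order]
    ranks = np.empty(len(pooled))
    i = 0
    while i < len(pooled):          # assign tied values their average rank
        j = i
        while j + 1 < len(pooled) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # average of ranks i+1..j+1
        i = j + 1
    m = len(pos)
    u = ranks[:m].sum() - m * (m + 1) / 2         # Mann-Whitney U from rank sum
    return u / (m * len(neg))

pos = np.array([3.1, 2.4, 2.4, 5.0])
neg = np.array([1.0, 2.4, 0.5])
print(wmw_naive(pos, neg), wmw_fast(pos, neg))   # both 11/12
```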


Subjects
Algorithms , Artificial Intelligence , Factual Databases , Information Storage and Retrieval/methods , Statistical Models , Automated Pattern Recognition/methods , Computer Simulation , Likelihood Functions
6.
IEEE Trans Biomed Eng ; 55(3): 1015-21, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18334393

ABSTRACT

Many computer-aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data, i.e., the training data typically consists of a few positive bags, and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms, is guaranteed to find the global optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
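The core data reduction can be sketched as follows. The paper learns the convex-combination weights for each bag jointly with the classifier; this simplified sketch fixes them to uniform weights (so each bag's hull representative is its mean) and plugs the representatives into an ordinary hyperplane learner, here a least-squares fit. All data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
# A few positive bags (small sets of instances) and many lone negative instances.
pos_bags = [rng.normal(loc=[3, 3], scale=0.5, size=(5, 2)) for _ in range(4)]
neg = rng.normal(loc=[0, 0], scale=0.5, size=(40, 2))

# Replace each positive bag by one point in its convex hull. Uniform weights
# give the bag mean; the paper learns these weights jointly with the classifier.
pos_repr = np.stack([bag.mean(axis=0) for bag in pos_bags])

X = np.vstack([pos_repr, neg])
y = np.concatenate([np.ones(len(pos_repr)), -np.ones(len(neg))])

# Any hyperplane-based learner can be plugged in; here, a least-squares fit.
Xb = np.hstack([X, np.ones((len(X), 1))])
w = np.linalg.lstsq(Xb, y, rcond=None)[0]

def predict_bag(bag):
    """Score a bag via its hull representative (the mean, under uniform weights)."""
    rep = np.append(np.asarray(bag).mean(axis=0), 1.0)
    return float(rep @ w) > 0

print(predict_bag(pos_bags[0]), predict_bag(neg[:5]))
```

Note the training set shrank from 4 bags x 5 instances to 4 representative points, which is the source of the speedup the abstract describes.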


Subjects
Algorithms , Artificial Intelligence , Colonic Neoplasms/diagnostic imaging , Image Enhancement/methods , Computer-Assisted Image Interpretation/methods , Automated Pattern Recognition/methods , Pulmonary Embolism/diagnostic imaging , Humans , Radiography , Reproducibility of Results , Sensitivity and Specificity
7.
Radiother Oncol ; 83(3): 374-82, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17532074

ABSTRACT

BACKGROUND AND PURPOSE: Hypoxia is a common feature of solid tumors associated with therapy resistance, increased malignancy and poor prognosis. Several approaches have been developed with the hope of identifying patients harboring hypoxic tumors, including the use of microarray-based gene signatures. However, studies to date have largely ignored the strong time dependency of hypoxia-regulated gene expression. We hypothesized that using time-dependent patterns of gene expression during hypoxia would enable development of superior prognostic expression signatures. MATERIALS AND METHODS: Using published data from the microarray study of Chi et al., we extracted gene signatures correlating with induction during either early or late hypoxic exposure. Gene signatures were derived from a human mammary epithelial cell line (HMEC) exposed in vitro to 0% or 2% oxygen. Gene signatures correlating with early and late up-regulation were tested by means of Kaplan-Meier survival, univariate, and multivariate analysis on a data set of patients with primary breast cancer treated conventionally (surgery plus, when indicated, radiotherapy and systemic therapy). RESULTS: We found that the two early hypoxia gene signatures extracted from 0% and 2% hypoxia showed significant prognostic power (log-rank test: p=0.004 at 0%, p=0.034 at 2%), in contrast to the late hypoxia signatures. Both early gene signatures were linked to the insulin pathway. From the multivariate Cox regression analysis, the early hypoxia signature (p=0.254) was found to be the 4th best prognostic factor after lymph node status (p=0.002), tumor size (p=0.016) and Elston grade (p=0.111). On this data set it indeed provided more information than ER status or p53 status. CONCLUSIONS: The hypoxic stress elicits a wide panel of temporal responses corresponding to different biological pathways. Early hypoxia signatures were shown to have significant prognostic power.
These data suggest that gene signatures identified from in vitro experiments could contribute to individualized medicine.
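The Kaplan-Meier analysis used above can be sketched in a few lines; the follow-up times below are invented, not from the study.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate. events[i]=1 for death, 0 for censoring."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    surv, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = (times >= t).sum()                   # still under observation
        deaths = ((times == t) & (events == 1)).sum()  # events at this time
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Invented follow-up data (months) for one signature-defined patient group.
high_risk = kaplan_meier([2, 4, 4, 7, 9, 12], [1, 1, 0, 1, 1, 0])
print(high_risk)
```

Comparing two such curves (e.g., early-hypoxia-signature high vs. low) with a log-rank test is what produces the p-values quoted in the abstract.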


Subjects
Cell Hypoxia/genetics , Gene Expression Profiling , Hypoxia-Inducible Factor 1/genetics , Hypoxia-Inducible Factor 1/metabolism , Neoplasms/genetics , Oxygen/metabolism , Genetic Databases , Epithelial Cells/metabolism , Female , Humans , Middle Aged , Neoplasms/diagnosis , Neoplasms/physiopathology , Oligonucleotide Array Sequence Analysis , Predictive Value of Tests , Prognosis , Survival Analysis , Time Factors
8.
IEEE Trans Pattern Anal Mach Intell ; 29(3): 427-36, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17224613

ABSTRACT

We address the incomplete-data problem, in which feature vectors to be classified have missing data (features). A (supervised) logistic regression algorithm for the classification of incomplete data is developed. Single or multiple imputation for the missing data is avoided by performing analytic integration with an estimated conditional density function (conditioned on the observed data). Conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both Expectation-Maximization (EM) and Variational Bayesian EM (VB-EM). The proposed supervised algorithm is then extended to the semisupervised case by incorporating graph-based regularization. The semisupervised algorithm utilizes all available data: both incomplete and complete, and both labeled and unlabeled. Experimental results of the proposed classification algorithms are shown.
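The conditional density that the classifier integrates against has a closed form per mixture component. The sketch below shows the algebra for a single Gaussian component (the paper uses a mixture, and then integrates the logistic likelihood against this conditional density rather than imputing a single value).

```python
import numpy as np

def conditional_gaussian(mu, sigma, x_obs, obs_idx, mis_idx):
    """Mean and covariance of the missing features given the observed ones,
    under a single joint Gaussian N(mu, sigma)."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    s_oo = sigma[np.ix_(obs_idx, obs_idx)]
    s_mo = sigma[np.ix_(mis_idx, obs_idx)]
    gain = s_mo @ np.linalg.inv(s_oo)
    cond_mean = mu[mis_idx] + gain @ (np.asarray(x_obs, float) - mu[obs_idx])
    cond_cov = sigma[np.ix_(mis_idx, mis_idx)] - gain @ s_mo.T
    return cond_mean, cond_cov

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.8], [0.8, 1.0]])   # strongly correlated features
mean, cov = conditional_gaussian(mu, sigma, x_obs=[1.0], obs_idx=[0], mis_idx=[1])
print(mean, cov)   # mean [0.8], covariance [[0.36]]
```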


Subjects
Algorithms , Artificial Intelligence , Image Enhancement/methods , Computer-Assisted Image Interpretation/methods , Information Storage and Retrieval/methods , Automated Pattern Recognition/methods , Computer Simulation , Logistic Models , Reproducibility of Results , Sample Size , Sensitivity and Specificity
9.
IEEE Trans Pattern Anal Mach Intell ; 28(4): 522-32, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16566502

ABSTRACT

In this paper, we present a variational Bayes (VB) framework for learning continuous hidden Markov models (CHMMs), and we examine the VB framework within active learning. Unlike a maximum likelihood or maximum a posteriori training procedure, which yields a point estimate of the CHMM parameters, VB-based training yields an estimate of the full posterior of the model parameters. This is particularly important for small training sets since it gives a measure of confidence in the accuracy of the learned model. This is utilized within the context of active learning, for which we acquire labels for those feature vectors for which knowledge of the associated label would be most informative for reducing model-parameter uncertainty. Three active learning algorithms are considered in this paper: 1) query by committee (QBC), with the goal of selecting data for labeling that minimize the classification variance, 2) a maximum expected information gain method that seeks to label data with the goal of reducing the entropy of the model parameters, and 3) an error-reduction-based procedure that attempts to minimize classification error over the test data. The experimental results are presented for synthetic and measured data. We demonstrate that all of these active learning methods can significantly reduce the amount of required labeling, compared to random selection of samples for labeling.
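The selection rule behind option 1), query by committee, can be sketched generically: score each unlabeled sample by how much the committee members disagree, and query the most contested one. This sketch uses vote entropy over plain class labels; in the paper the committee members are CHMMs drawn from the VB posterior.

```python
import numpy as np

def qbc_vote_entropy(committee_predictions):
    """Disagreement (vote entropy) per unlabeled sample across a committee.
    committee_predictions: (n_members, n_samples) array of predicted labels."""
    preds = np.asarray(committee_predictions)
    n_members, n_samples = preds.shape
    entropies = np.zeros(n_samples)
    for j in range(n_samples):
        _, counts = np.unique(preds[:, j], return_counts=True)
        p = counts / n_members                 # empirical vote distribution
        entropies[j] = -(p * np.log(p)).sum()  # 0 when the committee agrees
    return entropies

# Hypothetical committee of 4 models voting on 3 unlabeled samples.
votes = np.array([[0, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1],
                  [0, 1, 0]])
ent = qbc_vote_entropy(votes)
print("query sample:", int(np.argmax(ent)))   # the most contested sample
```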


Subjects
Algorithms , Artificial Intelligence , Information Storage and Retrieval/methods , Statistical Models , Automated Pattern Recognition/methods , Bayes Theorem , Cluster Analysis , Computer Simulation , Markov Chains
10.
IEEE Trans Pattern Anal Mach Intell ; 27(6): 957-68, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15943426

ABSTRACT

Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
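The objective the bound-optimization algorithm minimizes can be written down directly: the multinomial logistic negative log-likelihood plus the ℓ1 penalty induced by a Laplacian (sparsity-promoting) prior. The sketch below just evaluates this MAP objective on illustrative data (the shapes and λ are invented; the paper's contribution is the fast exact minimization, not shown here).

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def map_objective(W, X, y, lam):
    """Negative log-posterior: multinomial logistic negative log-likelihood
    plus the l1 penalty induced by a Laplacian prior on the weights."""
    P = softmax(X @ W.T)
    nll = -np.log(P[np.arange(len(y)), y]).sum()
    return nll + lam * np.abs(W).sum()

# Illustrative data: 5 samples, 2 features, 3 classes.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 0.3]])
y = np.array([0, 1, 2, 0, 1])
W0 = np.zeros((3, 2))
print(map_objective(W0, X, y, lam=0.1))   # all-zero weights give 5 * log(3)
```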


Subjects
Algorithms , Artificial Intelligence , Information Storage and Retrieval/methods , Statistical Models , Automated Pattern Recognition/methods , Cluster Analysis , Computer Simulation , Biological Models , Regression Analysis
11.
J Comput Biol ; 11(2-3): 227-42, 2004.
Article in English | MEDLINE | ID: mdl-15285890

ABSTRACT

Recent research has demonstrated quite convincingly that accurate cancer diagnosis can be achieved by constructing classifiers that are designed to compare the gene expression profile of a tissue of unknown cancer status to a database of stored expression profiles from tissues of known cancer status. This paper introduces the JCFO, a novel algorithm that uses a sparse Bayesian approach to jointly identify both the optimal nonlinear classifier for diagnosis and the optimal set of genes on which to base that diagnosis. We show that the diagnostic classification accuracy of the proposed algorithm is superior to a number of current state-of-the-art methods in a full leave-one-out cross-validation study of five widely used benchmark datasets. In addition to its superior classification accuracy, the algorithm is designed to automatically identify a small subset of genes (typically around twenty in our experiments) that are capable of providing complete discriminatory information for diagnosis. Focusing attention on a small subset of genes is useful not only because it produces a classifier with good generalization capacity, but also because this set of genes may provide insights into the mechanisms responsible for the disease itself. A number of the genes identified by the JCFO in our experiments are already in use as clinical markers for cancer diagnosis; some of the remaining genes may be excellent candidates for further clinical investigation. If it is possible to identify a small set of genes that is indeed capable of providing complete discrimination, inexpensive diagnostic assays might be widely deployable in clinical settings.
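The evaluation protocol above, leave-one-out cross-validation with a small selected gene subset, can be sketched generically. This is not the JCFO: as a stand-in for its sparse Bayesian selection, genes are ranked by the absolute difference of class means, and a nearest-centroid classifier is used. The key methodological point preserved here is that selection is redone inside every fold, so the accuracy estimate is not optimistically biased. All data are synthetic.

```python
import numpy as np

def select_genes(X, y, k):
    """Rank genes by absolute difference of class means (a simple stand-in
    for JCFO's sparse Bayesian selection) and keep the top k."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[::-1][:k]

def loo_accuracy(X, y, k):
    """Leave-one-out CV, with gene selection redone inside every fold."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        genes = select_genes(X[mask], y[mask], k)
        c0 = X[mask][y[mask] == 0][:, genes].mean(axis=0)
        c1 = X[mask][y[mask] == 1][:, genes].mean(axis=0)
        x = X[i, genes]
        pred = int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))
        correct += int(pred == y[i])
    return correct / len(y)

# Synthetic expression matrix: 30 samples x 100 genes, 3 informative genes.
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 100))
X[y == 1, :3] += 3.0                      # "tumors" overexpress genes 0-2
print(loo_accuracy(X, y, k=5))
```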


Subjects
Algorithms , Computational Biology , Gene Expression Profiling , Neoplasms/diagnosis , Bayes Theorem , Statistical Data Interpretation , Neoplasms/genetics , Regression Analysis
12.
IEEE Trans Pattern Anal Mach Intell ; 26(9): 1105-11, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15742887

ABSTRACT

This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an expectation-maximization (EM) algorithm to efficiently compute a maximum a posteriori (MAP) point estimate of the various parameters. The algorithm is an extension of recent state-of-the-art sparse Bayesian classifiers, which in turn can be seen as Bayesian counterparts of support vector machines. Experimental comparisons using kernel classifiers demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.


Subjects
Algorithms , Artificial Intelligence , Bayes Theorem , Computer-Assisted Diagnosis/methods , Gene Expression Profiling/methods , Biological Models , Automated Pattern Recognition/methods , Tumor Biomarkers/genetics , Cluster Analysis , Colonic Neoplasms/diagnosis , Colonic Neoplasms/genetics , Computer Simulation , Humans , Information Storage and Retrieval/methods , Leukemia/diagnosis , Leukemia/genetics , Statistical Models , Reproducibility of Results , Sensitivity and Specificity