Results 1 - 7 of 7
1.
J Biomed Inform ; 61: 267-75, 2016 06.
Article in English | MEDLINE | ID: mdl-27064059

ABSTRACT

OBJECTIVE: A significant challenge in treating rare forms of cancer such as glioblastoma (GBM) is to find optimal personalized treatment plans for patients. The goals of our study are to predict which patients survive longer than the median survival time for GBM based on clinical and genomic factors, and to assess the predictive power of treatment patterns. METHOD: We developed a predictive model based on the clinical and genomic data from approximately 300 newly diagnosed GBM patients over a period of 2 years. We proposed sequential mining algorithms with novel clinical constraints, namely 'exact-order' and 'temporal overlap' constraints, to extract treatment patterns as features used in predictive modeling. With diverse features from clinical information, genomic information, and treatment patterns, we applied both a logistic regression model and Cox regression to model patient survival outcome. RESULTS: The most predictive features influencing the survival period of GBM patients included mRNA expression levels of certain genes, some clinical characteristics such as age and Karnofsky performance score, and therapeutic agents prescribed in treatment patterns. Our models achieved a c-statistic of 0.85 for logistic regression and 0.84 for Cox regression. CONCLUSIONS: We demonstrated the importance of diverse sources of features in predicting GBM patient survival outcome. The predictive model presented in this study is a preliminary step in a long-term plan of developing personalized treatment plans for GBM patients that can later be extended to other types of cancer.


Subjects
Brain Neoplasms; Data Mining; Genetic Markers; Glioblastoma; Algorithms; Humans; Models, Theoretical; Prognosis; RNA, Messenger/metabolism; Survival Rate
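The c-statistic reported in the abstract above measures how well a model ranks cases. For a binary outcome it is the fraction of (positive, negative) pairs in which the positive case receives the higher score. The following is a minimal sketch of that binary definition only, not the authors' code: the function name `c_statistic` and the sample scores are hypothetical, and the Cox-regression variant, which has to handle censored survival times, is not covered.

```python
from itertools import product


def c_statistic(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted probabilities of surviving longer than the median
# (label 1 = survived longer, 0 = did not). Illustrative values only.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(c_statistic(scores, labels))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.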
2.
Health Serv Res ; 53(2): 1110-1136, 2018 04.
Article in English | MEDLINE | ID: mdl-28295260

ABSTRACT

OBJECTIVE: To evaluate the prevalence of seven social factors using physician notes as compared to claims and structured electronic health records (EHRs) data and the resulting association with 30-day readmissions. STUDY SETTING: A multihospital academic health system in southeastern Massachusetts. STUDY DESIGN: An observational study of 49,319 patients with cardiovascular disease admitted from January 1, 2011, to December 31, 2013, using multivariable logistic regression to adjust for patient characteristics. DATA COLLECTION/EXTRACTION METHODS: All-payer claims, EHR data, and physician notes extracted from a centralized clinical registry. PRINCIPAL FINDINGS: All seven social characteristics were identified at the highest rates in physician notes. For example, we identified 14,872 patient admissions with poor social support in physician notes, increasing the prevalence from 0.4 percent using ICD-9 codes and structured EHR data to 16.0 percent. Compared to an 18.6 percent baseline readmission rate, risk-adjusted analysis showed higher readmission risk for patients with housing instability (readmission rate 24.5 percent; p < .001), depression (20.6 percent; p < .001), drug abuse (20.2 percent; p = .01), and poor social support (20.0 percent; p = .01). CONCLUSIONS: The seven social risk factors studied are substantially more prevalent than represented in administrative data. Automated methods for analyzing physician notes may enable better identification of patients with social needs.


Subjects
Documentation/statistics & numerical data; Electronic Health Records/statistics & numerical data; Patient Readmission/statistics & numerical data; Physicians; Accidental Falls/statistics & numerical data; Adolescent; Adult; Age Factors; Aged; Aged, 80 and over; Depression/epidemiology; Female; Ill-Housed Persons/statistics & numerical data; Humans; Insurance Claim Review/statistics & numerical data; Logistic Models; Male; Massachusetts; Middle Aged; Natural Language Processing; Risk Factors; Sex Factors; Social Support; Socioeconomic Factors; Substance-Related Disorders/epidemiology; Time Factors; Young Adult
3.
Nucleic Acids Res ; 33(Database issue): D611-3, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608272

ABSTRACT

MITOMAP (http://www.MITOMAP.org), a database for the human mitochondrial genome, has grown rapidly in data content over the past several years as interest in the role of mitochondrial DNA (mtDNA) variation in human origins, forensics, degenerative diseases, cancer and aging has increased dramatically. To accommodate this information explosion, MITOMAP has implemented a new relational database and an improved search engine, and all programs have been rewritten. System administrative changes have been made to improve security and efficiency, and to make MITOMAP compatible with a new automatic mtDNA sequence analyzer known as Mitomaster.


Subjects
DNA, Mitochondrial/chemistry; Databases, Nucleic Acid; Genome, Human; Mitochondria/genetics; Database Management Systems; Genetic Predisposition to Disease; Genetic Variation; Genomics; Humans; Mutation; Systems Integration; User-Computer Interface
4.
Article in English | MEDLINE | ID: mdl-17044165

ABSTRACT

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as the basis for clustering in a new approach called BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and the self-organizing map. Whereas BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clearer cluster boundaries than hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.


Subjects
Algorithms; MEDLINE; Multigene Family/physiology; Natural Language Processing; Periodicals as Topic; Protein Interaction Mapping/methods; Proteins/metabolism; Abstracting and Indexing/methods; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Proteins/classification; Vocabulary, Controlled
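The abstract above compares clusterings by purity and entropy without defining them. The sketch below uses one common textbook definition of each: purity as the share of items that belong to the majority class of their cluster, and entropy as the size-weighted average class entropy within clusters. The paper's exact formulas may differ, and the gene names and class labels here are hypothetical.

```python
import math
from collections import Counter


def purity(clusters, truth):
    """clusters: list of lists of items; truth: dict mapping item -> true class."""
    total = sum(len(c) for c in clusters)
    majority = sum(
        max(Counter(truth[g] for g in c).values()) for c in clusters if c
    )
    return majority / total


def entropy(clusters, truth):
    """Size-weighted average of the class entropy (in bits) inside each cluster."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        if not c:
            continue  # an empty cluster contributes nothing
        counts = Counter(truth[g] for g in c)
        h = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        result += (len(c) / total) * h
    return result


# Hypothetical genes with known functional groups "A" and "B".
truth = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
good = [["g1", "g2"], ["g3", "g4"]]   # each cluster holds one group
mixed = [["g1", "g3"], ["g2", "g4"]]  # each cluster mixes both groups
print(purity(good, truth), entropy(good, truth))
print(purity(mixed, truth), entropy(mixed, truth))
```

Higher purity and lower entropy both indicate clusters that agree more closely with the known groups, which is the direction of improvement the abstract reports.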
5.
Int J Data Min Bioinform ; 1(1): 88-110, 2006.
Article in English | MEDLINE | ID: mdl-18402044

ABSTRACT

One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.


Subjects
Electronic Data Processing; Gene Expression Regulation, Fungal/physiology; Genes, Fungal/physiology; MEDLINE; Saccharomyces cerevisiae/physiology; Vocabulary, Controlled
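To make the TFIDF weighting named in the abstract above concrete, the sketch below scores each keyword by its relative frequency for one gene multiplied by the log of the inverse fraction of genes that mention it. This is the standard textbook form, offered as an assumption; the abstract does not state the exact variant, background set, or stemming the authors used. The gene names and keyword lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """docs: dict mapping gene -> list of keyword tokens.

    Returns a dict mapping gene -> {keyword: TF-IDF weight}.
    """
    n_docs = len(docs)
    # Document frequency: in how many genes' keyword lists each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights = {}
    for gene, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        weights[gene] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights


# Hypothetical keyword lists, as if extracted from abstracts for each gene.
docs = {
    "GENE_A": ["kinase", "cell", "cycle", "kinase"],
    "GENE_B": ["cell", "cycle", "cyclin"],
    "GENE_C": ["transport", "cell", "membrane"],
}
w = tfidf_weights(docs)
print(w["GENE_A"])
```

In this toy input, "cell" appears for every gene, so its weight is zero, while "kinase", which is specific to GENE_A, receives the largest weight. The resulting per-gene weight dictionaries can serve as feature vectors for clustering.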
6.
Article in English | MEDLINE | ID: mdl-16447994

ABSTRACT

Specific topic search in the PubMed database, one of the most important information resources for the scientific community, presents a big challenge to users. The researcher typically formulates Boolean queries and then scans the retrieved records for relevance, which is very time-consuming and error-prone. We applied Support Vector Machines (SVM) for automatic retrieval of PubMed articles related to human genome epidemiological research at the CDC (Centers for Disease Control and Prevention). In this paper, we discuss various investigations into biomedical literature classification and analyze the effect of various issues related to the choice of keywords, training sets, kernel functions and parameters for the SVM technique. We report on the various factors above to show that SVM is a viable technique for automatic classification of biomedical literature into topics of interest such as epidemiology, cancer, birth defects, etc. In all our experiments, we achieved high values of PPV, sensitivity and specificity.


Subjects
Abstracting and Indexing/methods; Database Management Systems; Information Storage and Retrieval/methods; Natural Language Processing; Pattern Recognition, Automated/methods; Periodicals as Topic; PubMed; Algorithms; Artificial Intelligence; Vocabulary, Controlled
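The three evaluation measures named at the end of the abstract above (PPV, sensitivity, specificity) all derive from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the sample predictions are hypothetical and do not reproduce the study's data.

```python
def classification_metrics(predicted, actual):
    """Compute PPV, sensitivity and specificity from paired 0/1 labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # relevant, retrieved
    fp = sum(1 for p, a in pairs if p and not a)      # irrelevant, retrieved
    fn = sum(1 for p, a in pairs if not p and a)      # relevant, missed
    tn = sum(1 for p, a in pairs if not p and not a)  # irrelevant, rejected

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical classifier output for eight articles (1 = on topic).
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
actual = [1, 1, 0, 0, 0, 0, 1, 1]
print(classification_metrics(predicted, actual))
```

For literature retrieval, PPV corresponds to precision (how many retrieved articles are relevant) and sensitivity to recall (how many relevant articles are retrieved).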
7.
Article in English | MEDLINE | ID: mdl-16448032

ABSTRACT

One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.


Subjects
Artificial Intelligence; Cluster Analysis; MEDLINE; Multigene Family/genetics; Natural Language Processing; Oligonucleotide Array Sequence Analysis/methods; Vocabulary, Controlled; Information Storage and Retrieval/methods; Structure-Activity Relationship