Results 1 - 20 of 39
1.
Am J Speech Lang Pathol ; 33(3): 1174-1192, 2024 May.
Article in English | MEDLINE | ID: mdl-38290536

ABSTRACT

PURPOSE: Augmentative and alternative communication (AAC) technology innovation is urgently needed to improve outcomes for children on the autism spectrum who are minimally verbal. One potential technology innovation is applying artificial intelligence (AI) to automate strategies such as augmented input to increase language learning opportunities while mitigating communication partner time and learning barriers. Innovation in AAC research and design methodology is also needed to empirically explore this and other applications of AI to AAC. The purpose of this report was to describe (a) the development of an AAC prototype using a design methodology new to AAC research and (b) a preliminary investigation of the efficacy of this potential new AAC capability. METHOD: The prototype was developed using a Wizard-of-Oz prototyping approach that allows for initial exploration of a new technology capability without the time and effort required for full-scale development. The preliminary investigation with three children on the autism spectrum who were minimally verbal used an adapted alternating treatment design to compare the effects of a Wizard-of-Oz prototype that provided automated augmented input (i.e., pairing color photos with speech) to a standard topic display (i.e., a grid display with line drawings) on visual attention, linguistic participation, and (for one participant) word learning during a circle activity. RESULTS: Preliminary investigation results were variable, but overall participants increased visual attention and linguistic participation when using the prototype. CONCLUSIONS: Wizard-of-Oz prototyping could be a valuable approach to spur much needed innovation in AAC. Further research into efficacy, reliability, validity, and attitudes is required to more comprehensively evaluate the use of AI to automate augmented input in AAC.


Subjects
Autism Spectrum Disorder , Communication Aids for Disabled , Humans , Autism Spectrum Disorder/therapy , Male , Child , Female , Artificial Intelligence , Child, Preschool , Child Language , Preliminary Data
2.
Methodology (Gott) ; 19(1): 43-59, 2023.
Article in English | MEDLINE | ID: mdl-37090814

ABSTRACT

Identification of procedures using International Classification of Diseases or Healthcare Common Procedure Coding System codes is challenging when conducting medical claims research. We demonstrate how Pointwise Mutual Information can be used to find associated codes. We apply the method to an investigation of racial differences in breast cancer outcomes. We used Surveillance Epidemiology and End Results (SEER) data linked to Medicare claims. We identified treatment using two methods. First, we used previously published definitions. Second, we augmented definitions using codes empirically identified by the Pointwise Mutual Information statistic. Similar to previous findings, we found that presentation differences between Black and White women closed much of the estimated survival curve gap. However, we found that survival disparities were completely eliminated with the augmented treatment definitions. We were able to control for a wider range of treatment patterns that might affect survival differences between Black and White women with breast cancer.

3.
Cancer Rep (Hoboken) ; 6(5): e1805, 2023 05.
Article in English | MEDLINE | ID: mdl-36943210

ABSTRACT

BACKGROUND: Additional evaluations, including second opinions, before breast cancer surgery may improve care, but may cause detrimental treatment delays that could allow disease progression. AIMS: We investigate the timing of surgical delays that is associated with survival benefits conferred by preoperative encounters versus the timing that is associated with potential harm. METHODS AND RESULTS: We investigated survival outcomes of SEER Medicare patients with stage 1-3 breast cancer using propensity score-based weighting. We examined interactions between the number of preoperative evaluation components and time from biopsy to definitive surgery. Components include new patient visits, unique surgeons, medical oncologists, or radiation oncologists consulted, established patient encounters, biopsies, and imaging studies. We identified 116 050 cases, of whom 99% were female, with an average age of 75.0 (SD = 6.2). We found that new patient visits have a protective association with respect to breast cancer mortality if they occur quickly after diagnosis, with breast cancer mortality subdistribution Hazard Ratios [sHRs] = 0.87 (95% Confidence Interval [CI] 0.76-1.00) for 2, 0.71 (CI 0.55-0.92) for 3, and 0.63 (CI 0.37-1.07) for 4+ visits at minimal delay. New patient visits predict worsened mortality compared with no visits if the surgical delay is greater than 33 days (CI 14-53) for 2, 33 days (CI 17-49) for 3, and 44 days (CI 12-75) for 4+. Medical oncologist visits predict worse outcomes if the surgical delay is greater than 29 days (CI 20-39) for 1 and 38 days (CI 12-65) for 2+ visits. Similarly, surgeon encounters switch from a positive to a negative association if the surgical delay exceeds 29 days (CI 17-41) for 1 visit, but the positive estimate persists over time for 3+ surgeon visits. CONCLUSION: Preoperative visits that cause substantial delays may be associated with increased mortality in older patients with breast cancer.


Subjects
Breast Neoplasms , Humans , Female , Aged , United States , Male , Breast Neoplasms/diagnosis , Breast Neoplasms/surgery , Breast Neoplasms/pathology , Medicare , Referral and Consultation , Mastectomy/adverse effects , Proportional Hazards Models
4.
BMC Med Inform Decis Mak ; 22(1): 114, 2022 04 29.
Article in English | MEDLINE | ID: mdl-35488252

ABSTRACT

BACKGROUND: Health providers create Electronic Health Records (EHRs) to describe the conditions and procedures used to treat their patients. Medical notes entered by medical staff in the form of free text are a particularly insightful component of EHRs. There is a great interest in applying machine learning tools on medical notes in numerous medical informatics applications. Learning vector representations, or embeddings, of terms in the notes, is an important pre-processing step in such applications. However, learning good embeddings is challenging because medical notes are rich in specialized terminology, and the number of available EHRs in practical applications is often very small. METHODS: In this paper, we propose a novel algorithm to learn embeddings of medical terms from a limited set of medical notes. The algorithm, called definition2vec, exploits external information in the form of medical term definitions. It is an extension of a skip-gram algorithm that incorporates textual definitions of medical terms provided by the Unified Medical Language System (UMLS) Metathesaurus. RESULTS: To evaluate the proposed approach, we used a publicly available Medical Information Mart for Intensive Care (MIMIC-III) EHR data set. We performed quantitative and qualitative experiments to measure the usefulness of the learned embeddings. The experimental results show that definition2vec keeps the semantically similar medical terms together in the embedding vector space even when they are rare or unobserved in the corpus. We also demonstrate that learned vector embeddings are helpful in downstream medical informatics applications. CONCLUSION: This paper shows that medical term definitions can be helpful when learning embeddings of rare or previously unseen medical terms from a small corpus of specialized documents such as medical notes.


Subjects
Electronic Health Records , Unified Medical Language System , Algorithms , Humans , Machine Learning
5.
SSM Popul Health ; 17: 101023, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35097183

ABSTRACT

Given the growing number of cancer survivors, it is important to better understand socio-spatial mobility patterns of cancer patients after diagnosis that could have public health implications regarding post-diagnostic access to care for treatment and follow-up surveillance. In this exploratory study, residential histories from LexisNexis were linked to New Jersey colon cancer cases diagnosed from 2006 to 2011 to examine differences in socio-spatial mobility patterns after diagnosis by stage at cancer diagnosis, sex, and race/ethnicity. For the colon cancer cases, we summarized and compared the number of residences and changes in the residential census tract and neighborhood poverty after the diagnosis. We found only minor changes in neighborhood poverty among the cases during the follow-up period after diagnosis. During the follow-up period of up to 10 years after diagnosis, 67% of the patients did not move to a different residential census tract, and 10.8% moved from New Jersey to another state. Cases that moved to a different census tract after diagnosis were generally less wealthy than non-movers, but the destination of relocation varied by race/ethnicity and socioeconomic status. We also found a significant association between residential mobility and stage at diagnosis, whereby patients diagnosed with colon cancer at an early stage were more likely to be movers. This study contributes to the understanding of socio-spatial mobility patterns in colon cancer patients and may help to inform cancer research by summarizing the extent to which colon cancer patients move after diagnosis.

6.
AMIA Annu Symp Proc ; 2022: 425-431, 2022.
Article in English | MEDLINE | ID: mdl-37128402

ABSTRACT

Relation Extraction (RE) is an important task in extracting structured data from free biomedical text. Obtaining labeled data needed to train RE models in specialized domains such as biomedicine can be very expensive because it requires expert knowledge. Thus, it is often the case that RE models need to be trained from relatively small labeled data sets. Despite the recent advances in Natural Language Processing (NLP) approaches for RE, training accurate RE models from small labeled data is still an open challenge. In this paper, we propose MERIT, a simple and effective approach for label augmentation that automatically increases the size of labeled data while introducing moderate labeling noise. We performed extensive experiments on three benchmark biomedical RE data sets. The results demonstrate the effectiveness of MERIT compared to the baseline.


Subjects
Natural Language Processing , Humans
7.
Proc ACM Int Conf Inf Knowl Manag ; 2022: 4828-4832, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36636516

ABSTRACT

Healthcare providers generate a medical claim after every patient visit. A medical claim consists of a list of medical codes describing the diagnosis and any treatment provided during the visit. Medical claims have been popular in medical research as a data source for retrospective cohort studies. This paper introduces a medical claim visualization system (MedCV) that supports cohort selection from medical claim data. MedCV was developed as part of a design study in collaboration with clinical researchers and statisticians. It helps a researcher to define inclusion rules for cohort selection by revealing relationships between medical codes and visualizing medical claims and patient timelines. Evaluation of our system through a user study indicates that MedCV enables domain experts to define high-quality inclusion rules in a time-efficient manner.

8.
Nat Commun ; 12(1): 6302, 2021 11 02.
Article in English | MEDLINE | ID: mdl-34728624

ABSTRACT

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model's generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE's lies between the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general.


Subjects
Mutation , Proteins/chemistry , Sequence Alignment/methods , Algorithms , Amino Acid Sequence , Computer Simulation , Databases, Protein , Humans , Models, Statistical , Protein Structural Elements , Proteins/genetics , Structure-Activity Relationship
9.
Cancer Causes Control ; 32(9): 989-999, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34117957

ABSTRACT

PURPOSE: Cutaneous T-cell lymphoma (CTCL) is a rare type of non-Hodgkin lymphoma. Previous studies have reported geographic clustering of CTCL based on the residence at the time of diagnosis. We explore geographic clustering of CTCL using both the residence at the time of diagnosis and past residences using data from the New Jersey State Cancer Registry. METHODS: CTCL cases (n = 1,163) diagnosed between 2006 and 2014 were matched to colon cancer controls (n = 17,049) on sex, age, race/ethnicity, and birth year. Jacquez's Q-Statistic was used to identify temporal clustering of cases compared to controls. Geographic clustering was assessed using the Bernoulli-based scan-statistic to compare cases to controls, and the Poisson-based scan-statistic to compare the observed number of cases to the number expected based on the general population. Significant clusters (p < 0.05) were mapped, and standard incidence ratios (SIR) reported. We adjusted for diagnosis year, sex, and age. RESULTS: The Q-statistic identified significant temporal clustering of cases based on past residences in the study area from 1992 to 2002. A cluster was detected in 1992 in Bergen County in northern New Jersey based on the Bernoulli (1992 SIR 1.84) and Poisson (1992 SIR 1.86) scan-statistics. Using the Poisson scan-statistic with the diagnosis location, we found evidence of an elevated risk in this same area, but the results were not statistically significant. CONCLUSION: There is evidence of geographic clustering of CTCL cases in New Jersey based on past residences. Additional studies are necessary to understand the possible reasons for the excess of CTCL cases living in this specific area some 8-14 years prior to diagnosis.


Subjects
Lymphoma, T-Cell, Cutaneous , Skin Neoplasms , Cluster Analysis , Humans , Incidence , Lymphoma, T-Cell, Cutaneous/diagnosis , Lymphoma, T-Cell, Cutaneous/epidemiology , New Jersey/epidemiology , Skin Neoplasms/epidemiology
10.
Article in English | MEDLINE | ID: mdl-33946680

ABSTRACT

Landscape characteristics have been shown to influence health outcomes, but few studies have examined their relationship with cancer survival. We used data from the National Land Cover Database to examine associations between regional-stage colon cancer survival and 27 different landscape metrics. The study population included all adult New Jersey residents diagnosed between 2006 and 2011. Cases were followed until 31 December 2016 (N = 3949). Patient data were derived from the New Jersey State Cancer Registry and were linked to LexisNexis to obtain residential histories. Cox proportional hazard regression was used to estimate hazard ratios (HR) and 95% confidence intervals (CI95) for the different landscape metrics. An increasing proportion of high-intensity developed lands with 80-100% impervious surfaces per cell/pixel was significantly associated with the risk of colon cancer death (HR = 1.006; CI95 = 1.002-1.01) after controlling for neighborhood poverty and other individual-level factors. In contrast, an increase in the aggregation and connectivity of vegetation-dominated low-intensity developed lands with 20-<40% impervious surfaces per cell/pixel was significantly associated with the decrease in risk of death from colon cancer (HR = 0.996; CI95 = 0.992-0.999). Reducing impervious surfaces in residential areas may increase the aesthetic value and provide conditions more advantageous to a healthy lifestyle, such as walking. Further research is needed to understand how these landscape characteristics impact survival.


Subjects
Colonic Neoplasms , Residence Characteristics , Adult , Colonic Neoplasms/epidemiology , Humans , New Jersey/epidemiology , Poverty , Proportional Hazards Models
11.
Sci Adv ; 7(17)2021 Apr.
Article in English | MEDLINE | ID: mdl-33883136

ABSTRACT

Incorporation of physical principles in a machine learning (ML) architecture is a fundamental step toward the continued development of artificial intelligence for inorganic materials. Inspired by Pauling's rule, we propose that structure motifs in inorganic crystals can serve as a central input to a machine learning framework. We demonstrated that the presence of structure motifs and their connections in a large set of crystalline compounds can be converted into unique vector representations using an unsupervised learning algorithm. To demonstrate the use of structure motif information, a motif-centric learning framework is created by combining motif information with atom-based graph neural networks to form an atom-motif dual graph network (AMDNet), which is more accurate in predicting the electronic structures of metal oxides such as bandgaps. The work illustrates the route toward fundamental design of graph neural network learning architectures for complex materials by incorporating beyond-atom physical principles.

12.
Biometrics ; 77(3): 1089-1100, 2021 09.
Article in English | MEDLINE | ID: mdl-32700317

ABSTRACT

The pointwise mutual information statistic (PMI), which measures how often two words occur together in a document corpus, is a cornerstone of recently proposed popular natural language processing algorithms such as word2vec. PMI and word2vec reveal semantic relationships between words and can be helpful in a range of applications such as document indexing, topic analysis, or document categorization. We use probability theory to demonstrate the relationship between PMI and word2vec. We use the theoretical results to demonstrate how the PMI can be modeled and estimated in a simple and straightforward manner. We further describe how one can obtain standard error estimates that account for within-patient clustering that arises from patterns of repeated words within a patient's health record due to a unique health history. We then demonstrate the usefulness of PMI on the problem of predictive identification of disease from free text notes of electronic health records. Specifically, we use our methods to distinguish those with and without type 2 diabetes mellitus in electronic health record free text data using over 400 000 clinical notes from an academic medical center.


Subjects
Diabetes Mellitus, Type 2 , Natural Language Processing , Algorithms , Electronic Health Records , Humans
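
To make the PMI statistic named in the abstract above concrete, here is a minimal Python sketch. It assumes the common document-level definition PMI(a, b) = log[ p(a, b) / (p(a) p(b)) ], with probabilities estimated as the fraction of documents containing a term or a pair of terms. The function name `pmi_table` and the toy notes are hypothetical and are not taken from the paper, which also covers modeling and clustered standard errors that this sketch does not attempt.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return {(a, b): PMI} for every term pair that co-occurs in a document.

    Assumption: probabilities are document frequencies, and each term is
    counted at most once per document.
    """
    n_docs = len(documents)
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        terms = sorted(set(doc))  # de-duplicate and fix the pair order
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))

    table = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = term_counts[a] / n_docs
        p_b = term_counts[b] / n_docs
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table


# Toy "clinical notes", invented for illustration only.
notes = [
    ["metformin", "glucose", "diabetes"],
    ["metformin", "diabetes"],
    ["fracture", "cast"],
    ["glucose", "fracture"],
]
scores = pmi_table(notes)
```

In this toy corpus, "diabetes" and "metformin" always appear together and get a positive PMI, while "fracture" and "glucose" co-occur exactly as often as independence would predict and get a PMI of zero. Pairs that never co-occur are simply absent from the table, since their PMI would be negative infinity.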
13.
Cancer Epidemiol Biomarkers Prev ; 29(11): 2119-2125, 2020 11.
Article in English | MEDLINE | ID: mdl-32759382

ABSTRACT

BACKGROUND: Identifying geospatial cancer survival disparities is critical to focus interventions and prioritize efforts with limited resources. Incorporating residential mobility into spatial models may result in different geographic patterns of survival compared with the standard approach using a single location based on the patient's residence at the time of diagnosis. METHODS: Data on 3,949 regional-stage colon cancer cases diagnosed from 2006 to 2011 and followed until December 31, 2016, were obtained from the New Jersey State Cancer Registry. Geographic disparity based on the spatial variance and effect sizes from a Bayesian spatial model using residence at diagnosis was compared with a time-varying spatial model using residential histories [adjusted for sex, gender, substage, race/ethnicity, and census tract (CT) poverty]. Geographic estimates of risk of colon cancer death were mapped. RESULTS: Most patients (65%) remained at the same residence, 22% changed CT, and 12% moved out of state. The time-varying model produced a wider range of adjusted risk of colon cancer death (0.85-1.20 vs. 0.94-1.11) and resulted in greater geographic disparity statewide after adjustment (25.5% vs. 14.2%) compared with the model with only the residence at diagnosis. CONCLUSIONS: Including residential mobility may allow for more precise estimates of spatial risk of death. Results based on the traditional approach using only residence at diagnosis were not substantially different for regional stage colon cancer in New Jersey. IMPACT: Including residential histories opens up new avenues of inquiry to better understand the complex relationships between people and places, and the effect of residential mobility on cancer outcomes. See related commentary by Williams, p. 2107.


Subjects
Colonic Neoplasms , Residence Characteristics , Bayes Theorem , Colonic Neoplasms/epidemiology , Humans , New Jersey/epidemiology , Population Dynamics
14.
PLoS One ; 15(5): e0232528, 2020.
Article in English | MEDLINE | ID: mdl-32374785

ABSTRACT

Protein secondary structure prediction remains a vital topic with broad applications. Due to lack of a widely accepted standard in secondary structure predictor evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) new test sets, Test2018, Test2019, and Test2018-2019, consisting of proteins from structures released in 2018 and 2019 with less than 25% identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins ≤25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) an additional test set that shares no homologous domains with the training set proteins, according to the Evolutionary Classification of Proteins (ECOD) database; (4) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction accuracy; (5) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy (helix, sheet, coil) of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet's accuracy is 84% for both sets. Accuracy on the non-homologous ECOD set is only 0.6 points (83.9%) lower than the results on the Test2018-2019 set (84.5%). The ablation study of features, neural network architecture, and training hyper-parameters suggests the best accuracy results are achieved with good choices for each of them while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and SecNet software including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.


Subjects
Neural Networks, Computer , Protein Structure, Secondary , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Databases, Protein/statistics & numerical data , Deep Learning , Proteins/chemistry , Software
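
The "input window of ±14 amino acids" mentioned in the abstract above means each residue is predicted from itself plus 14 neighbors on either side, 29 positions in total. The Python sketch below shows one plausible way to cut such windows from a sequence. The padding character `-` for positions beyond the chain ends, and the function name `residue_windows`, are assumptions made for illustration; the abstract does not say how SecNet handles chain termini or encodes its features.

```python
def residue_windows(sequence, half_width=14, pad="-"):
    """Return one window of length 2 * half_width + 1 per residue.

    Window i is centered on sequence[i]; positions that fall outside
    the sequence are filled with the pad character (an assumption).
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# Small demonstration with a hypothetical 6-residue peptide.
peptide = "ACDEFG"
small = residue_windows(peptide, half_width=2)
full = residue_windows(peptide)  # default window of +/-14 residues
```

With `half_width=2`, the first window is `--ACD` and the last is `EFG--`, and in every window the central character is the residue being predicted.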
15.
Epidemiology ; 31(5): 728-735, 2020 09.
Article in English | MEDLINE | ID: mdl-32459665

ABSTRACT

BACKGROUND: Residential histories linked to cancer registry data provide new opportunities to examine cancer outcomes by neighborhood socioeconomic status (SES). We examined differences in regional stage colon cancer survival estimates comparing models using a single neighborhood SES at diagnosis to models using neighborhood SES from residential histories. METHODS: We linked regional stage colon cancers from the New Jersey State Cancer Registry diagnosed from 2006 to 2011 to LexisNexis administrative data to obtain residential histories. We defined neighborhood SES as census tract poverty based on location at diagnosis and across the follow-up period through 31 December 2016 based on residential histories (average, time-weighted average, time-varying). Using Cox proportional hazards regression, we estimated associations between colon cancer and census tract poverty measurements (continuous and categorical), adjusted for age, sex, race/ethnicity, regional substage, and mover status. RESULTS: Sixty-five percent of the sample was nonmovers (one census tract); 35% (movers) changed tract at least once. Cases from tracts with >20% poverty changed residential tracts more often (42%) than cases from tracts with <5% poverty (32%). Hazard ratios (HRs) were generally similar in strength and direction across census tract poverty measurements. In time-varying models, cases in the highest poverty category (>20%) had a 30% higher risk of regional stage colon cancer death than cases in the lowest category (<5%) (95% confidence interval [CI] = 1.04, 1.63). CONCLUSION: Residential changes after regional stage colon cancer diagnosis may be associated with a higher risk of colon cancer death among cases in high-poverty areas. This has important implications for postdiagnostic access to care for treatment and follow-up surveillance. See video abstract: http://links.lww.com/EDE/B705.


Subjects
Colonic Neoplasms , Health Status Disparities , Poverty Areas , Residence Characteristics , Colonic Neoplasms/epidemiology , Humans , New Jersey/epidemiology , Residence Characteristics/statistics & numerical data , Socioeconomic Factors , Survival Analysis
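
The abstract above distinguishes an average from a time-weighted average of census tract poverty across a residential history. The short Python sketch below illustrates the difference for one hypothetical patient. The residence durations and poverty percentages are invented, and the function name `time_weighted_average` is an assumption; the study's exact computation is not given in the abstract.

```python
def time_weighted_average(episodes):
    """Average tract poverty weighted by time spent at each residence.

    episodes: list of (days_at_residence, tract_poverty_percent) tuples.
    """
    total_days = sum(days for days, _ in episodes)
    if total_days <= 0:
        raise ValueError("total follow-up time must be positive")
    return sum(days * poverty for days, poverty in episodes) / total_days


# Hypothetical history: 2 years in a 22%-poverty tract, then 3 years
# in an 8%-poverty tract.
history = [(730, 22.0), (1095, 8.0)]
simple_avg = sum(poverty for _, poverty in history) / len(history)
weighted_avg = time_weighted_average(history)
```

Here the simple average is 15.0%, while the time-weighted average is 13.6% because the patient spent more follow-up time in the lower-poverty tract. A time-varying measure would instead keep both values and switch between them at the move date.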
16.
PLoS Comput Biol ; 15(3): e1006844, 2019 03.
Article in English | MEDLINE | ID: mdl-30845191

ABSTRACT

Protein loops connect regular secondary structures and contain 4-residue beta turns which represent 63% of the residues in loops. The commonly used classification of beta turns (Type I, I', II, II', VIa1, VIa2, VIb, and VIII) was developed in the 1970s and 1980s from analysis of a small number of proteins of average resolution, and represents only two thirds of beta turns observed in proteins (with a generic class Type IV representing the rest). We present a new clustering of beta-turn conformations from a set of 13,030 turns from 1074 ultra-high resolution protein structures (≤1.2 Å). Our clustering is derived from applying the DBSCAN and k-medoids algorithms to this data set with a metric commonly used in directional statistics applied to the set of dihedral angles from the second and third residues of each turn. We define 18 turn types compared to the 8 classical turn types in common use. We propose a new 2-letter nomenclature for all 18 beta-turn types using Ramachandran region names for the two central residues (e.g., 'A' and 'D' for alpha regions on the left side of the Ramachandran map and 'a' and 'd' for equivalent regions on the right-hand side; classical Type I turns are 'AD' turns and Type I' turns are 'ad'). We identify 11 new types of beta turn, 5 of which are sub-types of classical beta-turn types. Up-to-date statistics, probability densities of conformations, and sequence profiles of beta turns in loops were collected and analyzed. A library of turn types, BetaTurnLib18, and cross-platform software, BetaTurnTool18, which identifies turns in an input protein structure, are freely available and redistributable from dunbrack.fccc.edu/betaturn and github.com/sh-maxim/BetaTurn18. Given the ubiquitous nature of beta turns, this comprehensive study updates understanding of beta turns and should also provide useful tools for protein structure determination, refinement, and prediction programs.


Subjects
Proteins/chemistry , Terminology as Topic , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Cluster Analysis , Protein Conformation , Reproducibility of Results
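
Clustering dihedral angles, as the abstract above describes, needs a distance that respects the circular nature of angles, so that 179° and -179° count as close. The Python sketch below uses the chord-based form 2(1 - cos Δ), averaged over the angles of a turn. This specific formula is an assumption: the abstract says only that the metric is "commonly used in directional statistics". The function name `angular_distance` and the sample angles are hypothetical.

```python
import math


def angular_distance(angles_a, angles_b):
    """Mean of 2 * (1 - cos(delta)) over paired dihedral angles in degrees.

    The value is 0 for identical conformations and 4 when every angle
    differs by 180 degrees. The formula is an illustrative assumption.
    """
    if len(angles_a) != len(angles_b) or not angles_a:
        raise ValueError("need two non-empty angle lists of equal length")
    total = 0.0
    for a, b in zip(angles_a, angles_b):
        total += 2.0 * (1.0 - math.cos(math.radians(a - b)))
    return total / len(angles_a)


# Two hypothetical conformations that differ only across the +/-180 seam.
near = angular_distance([179.0, -60.0], [-179.0, -60.0])
# Two conformations with a single, opposite angle.
far = angular_distance([0.0], [180.0])
```

A pairwise matrix of such distances could then be handed to a clustering routine that accepts precomputed distances, which is one way algorithms like DBSCAN or k-medoids are applied to angular data.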
17.
IJCAI (U S) ; 2019: 4897-4903, 2019 Jul.
Article in English | MEDLINE | ID: mdl-32116463

ABSTRACT

Representing words as low-dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To be able to properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. We first modify the Pointwise Mutual Information (PMI) measure of similarity between the codes. We then develop a new negative sampling method for the word2vec model that implicitly factorizes the modified PMI matrix. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we evaluated cross-referencing between ICD-9 and CPT medical code ontologies. Our results indicate that vector representations of codes learned by the proposed approach provide superior cross-referencing when compared to several existing approaches.

18.
BMC Med Inform Decis Mak ; 18(Suppl 4): 123, 2018 12 12.
Article in English | MEDLINE | ID: mdl-30537974

ABSTRACT

BACKGROUND: There has been an increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Vector representations of medical concepts facilitate exploratory analysis and predictive modeling of EHR data to gain insights about the patterns of care and health outcomes. EHRs contain structured data such as diagnostic codes and laboratory tests, as well as unstructured free text data in the form of clinical notes, which provide more detail about the condition and treatment of patients. METHODS: In this work, we propose a method that jointly learns vector representations of medical concepts and words. This is achieved by a novel learning scheme based on the word2vec model. Our model learns those relationships by integrating clinical notes and sets of accompanying medical codes and by defining joint contexts for each observed word and medical code. RESULTS: In our experiments, we learned joint representations using MIMIC-III data. Using the learned representations of words and medical codes, we evaluated phenotypes for 6 diseases discovered by our method and a baseline method. The experimental results show that for each of the 6 diseases our method finds highly relevant words. We also show that our representations can be very useful when predicting the reason for the next visit. CONCLUSIONS: The jointly learned representations of medical concepts and words capture not only similarity between codes or words themselves, but also similarity between codes and words. They can be used to extract phenotypes of different diseases. The representations learned by the joint model are also useful for construction of patient features.


Subjects
Electronic Health Records , Machine Learning , Natural Language Processing , Clinical Coding , Humans , Phenotype , Terminology as Topic , Vocabulary
19.
KDD ; 2018: 43-51, 2018 Aug.
Article in English | MEDLINE | ID: mdl-31037221

ABSTRACT

Various deep learning models have recently been applied to predictive modeling of Electronic Health Records (EHR). In medical claims data, which is a particular type of EHR data, each patient is represented as a sequence of temporally ordered, irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the patient's diagnosis and the treatment provided during the visit. Based on the observation that different patient conditions have different temporal progression patterns, in this paper we propose a novel interpretable deep learning model, called Timeline. The main novelty of Timeline is that it has a mechanism that learns time decay factors for every medical code. This allows Timeline to learn that chronic conditions have a longer lasting impact on future visits than acute conditions. Timeline also has an attention mechanism that improves vector embeddings of visits. By analyzing the attention weights and disease progression functions of Timeline, it is possible to interpret the predictions and understand how risks of future visits change over time. We evaluated Timeline on two large-scale real-world data sets. The specific task was to predict the primary diagnosis category for the next hospital visit given previous visits. Our results show that Timeline has higher accuracy than the state-of-the-art deep learning models based on RNN. In addition, we demonstrate that time decay factors and attentions learned by Timeline are in accord with medical knowledge and that Timeline can provide useful insight into its predictions.

20.
Article in English | MEDLINE | ID: mdl-29375929

ABSTRACT

There has been an increasing interest in learning low-dimensional vector representations of medical concepts from electronic health records (EHRs). While EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes, which provide more nuanced details on a patient's health status. In this work, we propose a method that jointly learns medical concept and word representations. In particular, we focus on capturing the relationship between medical codes and words by using a novel learning scheme for the word2vec model. Our method exploits relationships between different parts of EHRs in the same visit and embeds both codes and words in the same continuous vector space. In the end, we are able to derive clusters which reflect distinct disease and treatment patterns. In our experiments, we qualitatively show how our method of grouping words for given diagnostic codes compares with a topic modeling approach. We also test how well our representations can be used to predict disease patterns of the next visit. The results show that our approach outperforms several common methods.
