Distributed learning from multiple EHR databases: Contextual embedding models for medical events.
Li, Ziyi; Roberts, Kirk; Jiang, Xiaoqian; Long, Qi.
Affiliation
  • Li Z; Emory University, Department of Biostatistics and Bioinformatics, Atlanta, GA 30332, USA.
  • Roberts K; University of Texas, Health Science Center at Houston, School of Biomedical Informatics, Houston, TX 77030, USA.
  • Jiang X; University of Texas, Health Science Center at Houston, School of Biomedical Informatics, Houston, TX 77030, USA. Electronic address: Xiaoqian.Jiang@uth.tmc.edu.
  • Long Q; University of Pennsylvania, Perelman School of Medicine, Department of Biostatistics, Epidemiology and Informatics, Philadelphia, PA 19104, USA. Electronic address: qlong@pennmedicine.upenn.edu.
J Biomed Inform; 92: 103138, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30825539
ABSTRACT
Electronic health record (EHR) data provide promising opportunities to explore personalized treatment regimes and to make clinical predictions. Compared with regular clinical data, EHR data are known for their irregularity and complexity. In addition, analyzing EHR data raises privacy issues, and sharing such data among multiple research sites is often infeasible due to regulatory and other hurdles. A recently published work uses contextual embedding models and successfully builds one predictive model for more than seventy common diagnoses. Despite its high predictive power, the model cannot be generalized to other institutions without sharing data. In this work, a novel method based on Distributed Noise Contrastive Estimation (Distributed NCE) is proposed to learn from multiple databases and build predictive models. We use differential privacy to safeguard the intermediary information sharing. The numerical study with a real dataset demonstrates that the proposed method can not only build predictive models in a distributed manner with privacy protection, but also preserve model structure well and achieve comparable prediction accuracy. The proposed methods have been implemented as a stand-alone Python library, and the implementation is available on GitHub (https://github.com/ziyili20/DistributedLearningPredictor) with installation instructions and use cases.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Computer Communication Networks / Electronic Health Records / Machine Learning Type of study: Diagnostic_studies / Prognostic_studies Limits: Humans Language: En Year of publication: 2019 Document type: Article
