Variable selection for latent class analysis in the presence of missing data with application to record linkage.
Xu, Huiping; Li, Xiaochun; Zhang, Zuoyi; Grannis, Shaun.
Affiliations
  • Xu H; Department of Biostatistics and Health Data Science, Indiana University, Indianapolis, IN, USA.
  • Li X; Department of Biostatistics and Health Data Science, Indiana University, Indianapolis, IN, USA.
  • Zhang Z; AbbVie Inc., North Chicago, IL, USA.
  • Grannis S; Regenstrief Institute Inc., Indianapolis, IN, USA.
Stat Methods Med Res ; 33(6): 966-980, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38592341
ABSTRACT
The Fellegi-Sunter model is a latent class model widely used in probabilistic record linkage to identify records that belong to the same entity. Record linkage practitioners typically include all available matching fields in the model, on the premise that more fields convey more information about the true match status and therefore improve match performance. In model-based clustering, it is well known that this premise is incorrect and that including noisy variables can compromise the clustering. Variable selection procedures have therefore been developed to remove noisy variables. Although these procedures have the potential to improve record matching, they cannot be applied directly because missing data are ubiquitous in record linkage applications. In this paper, we modify the stepwise variable selection procedure proposed by Fop, Smart, and Murphy and extend it to handle the missing data common in record linkage. In simulation studies, the proposed method selects the correct set of matching fields across various settings, leading to better-performing algorithms. The improved match performance is also seen in a real-world application. We therefore recommend our proposed selection procedure for identifying informative matching fields for probabilistic record linkage algorithms.
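The abstract does not give the model equations, so the following is only a minimal Python sketch under stated assumptions, not the authors' procedure. It fits a two-class Fellegi-Sunter model by EM to binary agree/disagree comparison vectors, where a missing comparison (`None`) is simply left out of the likelihood; this assumes the comparisons are missing at random. Field selection uses a simplified version of the BIC comparison behind Fop, Smart, and Murphy's approach: a candidate field is added only if modeling it as dependent on match status raises the BIC relative to treating it as independent of match status. The published procedure is richer (for example, it can relate a non-selected field to the selected ones through a regression, and it is stepwise rather than purely forward); here independence and a greedy forward search are assumed instead. All function names and the simulated parameter values are hypothetical.

```python
import math
import random


def _clip(x, eps=1e-6):
    """Keep probabilities away from 0 and 1 so the logs stay finite."""
    return min(max(x, eps), 1.0 - eps)


def _e_step(gammas, fields, p, m, u):
    """Posterior match probabilities and the observed-data log-likelihood."""
    post, ll = [], 0.0
    for g in gammas:
        lm, lu = math.log(p), math.log(1.0 - p)
        for k in fields:
            if g[k] is None:
                continue  # missing comparison: marginalized out (factor of 1)
            lm += math.log(m[k] if g[k] else 1.0 - m[k])
            lu += math.log(u[k] if g[k] else 1.0 - u[k])
        top = max(lm, lu)  # log-sum-exp over the two latent classes
        lse = top + math.log(math.exp(lm - top) + math.exp(lu - top))
        post.append(math.exp(lm - lse))
        ll += lse
    return post, ll


def fit_fs(gammas, fields, n_iter=200, tol=1e-6):
    """Two-class Fellegi-Sunter model fit by EM; returns (p, m, u, loglik)."""
    p, m, u = 0.1, {k: 0.9 for k in fields}, {k: 0.2 for k in fields}
    post, ll = _e_step(gammas, fields, p, m, u)
    for _ in range(n_iter):
        # M-step: each field's m/u uses only the pairs where it is observed.
        p = _clip(sum(post) / len(post))
        for k in fields:
            obs = [(w, g[k]) for w, g in zip(post, gammas) if g[k] is not None]
            sw = sum(w for w, _ in obs)
            m[k] = _clip(sum(w * a for w, a in obs) / max(sw, 1e-12))
            u[k] = _clip(sum((1 - w) * a for w, a in obs) / max(len(obs) - sw, 1e-12))
        post, new_ll = _e_step(gammas, fields, p, m, u)
        done = new_ll - ll < tol
        ll = new_ll  # log-likelihood at the current (returned) parameters
        if done:
            break
    return p, m, u, ll


def bic_gain(gammas, selected, j, ll_s=None):
    """BIC(field j depends on match status) - BIC(field j independent of it).

    Larger BIC is better here (BIC = 2*loglik - n_params*log(n)), so a
    positive gain favours adding field j to the selected set.
    """
    n, k = len(gammas), len(selected)
    if ll_s is None:
        ll_s = fit_fs(gammas, selected)[3]
    ll_rel = fit_fs(gammas, selected + [j])[3]
    obs = [g[j] for g in gammas if g[j] is not None]
    q = _clip(sum(obs) / len(obs))  # agreement rate ignoring match status
    ll_j = sum(math.log(q if a else 1.0 - q) for a in obs)
    bic_rel = 2 * ll_rel - (1 + 2 * (k + 1)) * math.log(n)
    bic_irr = 2 * (ll_s + ll_j) - (1 + 2 * k + 1) * math.log(n)
    return bic_rel - bic_irr


def forward_select(gammas, start, candidates):
    """Greedy forward search: add the best candidate while its gain is > 0."""
    selected, remaining = list(start), list(candidates)
    while remaining:
        ll_s = fit_fs(gammas, selected)[3]
        gains = {j: bic_gain(gammas, selected, j, ll_s) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def simulate_pairs(n=1000, prev=0.2, miss=0.2, seed=1):
    """Hypothetical data: fields 0-4 informative, fields 5-6 pure noise."""
    rng = random.Random(seed)
    m_true = [0.95, 0.9, 0.9, 0.85, 0.9, 0.5, 0.5]
    u_true = [0.05, 0.1, 0.2, 0.1, 0.15, 0.5, 0.5]
    data = []
    for _ in range(n):
        probs = m_true if rng.random() < prev else u_true
        data.append([None if rng.random() < miss else int(rng.random() < pr)
                     for pr in probs])
    return data
```

For example, `forward_select(simulate_pairs(), [0, 1, 2], [3, 4, 5, 6])` should tend to add the informative fields 3 and 4 and leave the noise fields 5 and 6 out, although under BIC a pure-noise field can occasionally slip in by chance. Comparing the two BIC values on the same set of fields, rather than comparing fits on different field sets, keeps the models nested and the criterion comparable.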

Full text: 1 Collections: 01-internacional Database: MEDLINE Main subject: Algorithms / Medical Record Linkage / Latent Class Analysis Limits: Humans Language: English Journal: Stat Methods Med Res Publication year: 2024 Document type: Article Country of affiliation: United States