Bridging the gap in author names: building an enhanced author name dataset for biomedical literature system.
Zhang, Li; Song, Ningyuan; Gui, Sisi; Wu, Keye; Lu, Wei.
Affiliation
  • Zhang L; Laboratory of Data Intelligence and Interdisciplinary Innovation of Nanjing University, Nanjing, Jiangsu, 210023, China.
  • Song N; School of Information Management, Nanjing University, Nanjing, Jiangsu, 210023, China.
  • Gui S; Laboratory of Data Intelligence and Interdisciplinary Innovation of Nanjing University, Nanjing, Jiangsu, 210023, China.
  • Wu K; School of Information Management, Nanjing University, Nanjing, Jiangsu, 210023, China.
  • Lu W; School of Information Management, Nanjing Agricultural University, Nanjing, Jiangsu, 210023, China.
J Am Med Inform Assoc; 31(8): 1648-1656, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38916911
ABSTRACT

OBJECTIVE:

Author name incompleteness, in which only the first initial is available instead of the full first name, is a long-standing problem in MEDLINE and has a negative impact on biomedical literature systems. The purpose of this study is to create an Enhanced Author Names (EAN) dataset for MEDLINE that maximizes the number of complete author names.

MATERIALS AND METHODS:

The EAN dataset is built through large-scale name comparison and restoration, using author names collected from multiple literature databases such as MEDLINE, Microsoft Academic Graph, and Semantic Scholar. We assess the impact of EAN on biomedical literature systems through comparative and statistical analyses of EAN and MEDLINE's author names dataset (MAN) on 2 important tasks: author name search and author name disambiguation.
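
As a rough illustration of what name comparison and restoration could look like, the minimal sketch below fills in an abbreviated first name when a record of the same article in another database carries a compatible full name. The matching rule, the function names `is_compatible` and `restore_name`, and the sample records are all assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and keep only letters so 'J.' and 'j' compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def is_compatible(abbrev_first: str, full_first: str) -> bool:
    """Hypothetical rule: the candidate first name must be longer than the
    abbreviated form and start with it (e.g. 'J' vs 'John').
    Middle initials are not handled in this sketch."""
    a, f = normalize(abbrev_first), normalize(full_first)
    return bool(a) and len(f) > len(a) and f.startswith(a)


def restore_name(
    last: str, first: str, candidates: list[tuple[str, str]]
) -> Optional[str]:
    """Return a restored full first name, or None if there is no unique match.

    `candidates` holds (last, first) pairs for the same article taken from
    another database (assumed to be linked already, e.g. by a shared ID).
    """
    matches = {
        cand_first
        for cand_last, cand_first in candidates
        if normalize(cand_last) == normalize(last)
        and is_compatible(first, cand_first)
    }
    # Restore only when exactly one distinct full name is compatible.
    return matches.pop() if len(matches) == 1 else None


# Usage with made-up linked records for a single article.
other_db = [("Zhang", "Li"), ("Song", "Ningyuan")]
print(restore_name("Song", "N", other_db))  # prints Ningyuan
print(restore_name("Wu", "K", other_db))    # prints None
```

Returning `None` when several distinct full names fit is a deliberately conservative choice in this sketch: leaving a name abbreviated is safer than restoring it incorrectly.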

RESULTS:

Evaluation results show that EAN increases the number of full author names in MEDLINE from 69.73 million to 110.9 million. EAN not only restores a substantial number of abbreviated names from before 2002, when the NLM changed its author name indexing policy, but also improves the availability of full author names in articles published afterward. The evaluation of the author name search and author name disambiguation tasks reveals that EAN significantly enhances both tasks compared to MAN.

CONCLUSION:

The extensive coverage of full names in EAN suggests that the name incompleteness issue can be largely mitigated. This has significant implications for the development of an improved biomedical literature system. EAN is available at https://zenodo.org/record/10251358, and an updated version is available at https://zenodo.org/records/10663234.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Authorship / MEDLINE | Language: En | Journal: J Am Med Inform Assoc | Journal subject: Medical Informatics | Year: 2024 | Document type: Article | Affiliation country: China
...