The Classification of Scientific Abstracts Using Text Statistical Features.

Ishankulov, Timur; Danilov, Gleb; Kotik, Konstantin; Orlov, Yuriy; Shifrin, Mikhail; Potapov, Alexander

Affiliation

Ishankulov T; Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
Danilov G; Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
Kotik K; Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
Orlov Y; Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, Russian Federation.
Shifrin M; Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.
Potapov A; Laboratory of Biomedical Informatics and Artificial Intelligence, National Medical Research Center for Neurosurgery named after N.N. Burdenko, Moscow, Russian Federation.

Stud Health Technol Inform ; 290: 263-267, 2022 Jun 06.

Article in En | MEDLINE | ID: mdl-35673014

ABSTRACT

Automated classification of abstracts could significantly facilitate scientific literature screening. The classification of short texts could be based on their statistical properties. This research aimed to evaluate the quality of classifying short medical abstracts primarily on the basis of text statistical features. Twelve experiments with machine learning models over sets of text features were performed on a dataset of 671 article abstracts. Each experiment was repeated 300 times to estimate the classification quality, for a total of 3600 tests. We achieved the best result (F1 = 0.775) using a random forest machine learning model with keywords and three-dimensional Word2Vec embeddings. The classification of scientific abstracts might be implemented using the straightforward and computationally inexpensive methods presented in this paper. The approach we described is expected to facilitate literature selection by researchers.
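
As an illustration of what "text statistical features" and the F1 metric can look like in practice, here is a minimal Python sketch using only the standard library. The abstract does not list the features the authors used, so the feature set in `text_stats` (token count, sentence count, mean word length, type-token ratio) is an assumption chosen for illustration, and the function names are hypothetical. The `f1_score` helper assumes a binary labelling with 1 as the positive class, which the abstract does not state. Only the experiment arithmetic (12 experiments, 300 repetitions each) comes from the abstract.

```python
import re
from statistics import mean


def text_stats(text: str) -> dict:
    """Return a few illustrative statistical features of a short text.

    The feature choice is an assumption; the abstract does not enumerate
    the features used in the study.
    """
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "n_tokens": len(tokens),
        "n_sentences": len(sentences),
        "mean_word_len": mean(len(t) for t in tokens) if tokens else 0.0,
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


def f1_score(y_true: list, y_pred: list) -> float:
    """Binary F1: harmonic mean of precision and recall for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Experiment count stated in the abstract: 12 experiments x 300 repetitions.
N_EXPERIMENTS, N_REPEATS = 12, 300
TOTAL_TESTS = N_EXPERIMENTS * N_REPEATS
```

Feature dictionaries of this kind could be turned into numeric vectors and passed to any classifier, such as the random forest named in the abstract; that step is omitted here because it would require a third-party library.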

Subject(s)

Machine Learning; Natural Language Processing

Key words

Machine Learning; Natural Language Processing; Neurosurgery

Full text: 1; Collection: 01-internacional; Database: MEDLINE; Main subject: Natural Language Processing / Machine Learning; Language: En; Journal: Stud Health Technol Inform; Journal subject: Medical Informatics / Health Services Research; Year: 2022; Document type: Article
