Bag-of-Words Technique in Natural Language Processing: A Primer for Radiologists.
Juluru, Krishna; Shih, Hao-Hsin; Keshava Murthy, Krishna Nand; Elnajjar, Pierre.
Affiliation
  • Juluru K; From the Department of Radiology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, Box 29, New York, NY 10065.
  • Shih HH; From the Department of Radiology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, Box 29, New York, NY 10065.
  • Keshava Murthy KN; From the Department of Radiology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, Box 29, New York, NY 10065.
  • Elnajjar P; From the Department of Radiology, Memorial Sloan Kettering Cancer Center, 1275 York Ave, Box 29, New York, NY 10065.
Radiographics ; 41(5): 1420-1426, 2021.
Article in En | MEDLINE | ID: mdl-34388050
ABSTRACT
Natural language processing (NLP) is a methodology designed to extract concepts and meaning from human-generated unstructured (free-form) text. It is intended to be implemented by using computer algorithms so that it can be run on a corpus of documents quickly and reliably. To enable machine learning (ML) techniques in NLP, free-form text must be converted to a numerical representation. After several stages of preprocessing including tokenization, removal of stop words, token normalization, and creation of a master dictionary, the bag-of-words (BOW) technique can be used to represent each remaining word as a feature of the document. The preprocessing steps simplify the documents but also potentially degrade meaning. The values of the features in BOW can be modified by using techniques such as term count, term frequency, and term frequency-inverse document frequency. Experience and experimentation will guide decisions on which specific techniques will optimize ML performance. These and other NLP techniques are being applied in radiology. Radiologists' understanding of the strengths and limitations of these techniques will help in communication with data scientists and in implementation for specific tasks. Online supplemental material is available for this article. ©RSNA, 2021.
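The pipeline described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the three sample reports, the tiny stop-word list, and all function names are hypothetical. Lowercasing stands in for token normalization, and the inverse document frequency uses the unsmoothed form log(N / df), which is only one of several common variants.

```python
import math
import re
from collections import Counter

# Hypothetical sample reports and a deliberately tiny stop-word list (assumptions).
reports = [
    "No pneumothorax. The lungs are clear.",
    "Small right pneumothorax. Lungs otherwise clear.",
    "Right pleural effusion.",
]
STOP_WORDS = {"the", "are", "is", "no", "a", "of", "and"}


def preprocess(text):
    """Tokenize on letters, lowercase (simple normalization), drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_dictionary(token_lists):
    """Master dictionary: each unique remaining token mapped to a feature index."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {term: i for i, term in enumerate(vocab)}


def term_counts(tokens, dictionary):
    """BOW vector whose values are raw counts of each dictionary term."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in dictionary]


def term_frequencies(tokens, dictionary):
    """BOW vector whose values are counts divided by the document's token total."""
    total = len(tokens) or 1  # guard against empty documents
    return [c / total for c in term_counts(tokens, dictionary)]


def tf_idf(token_lists, dictionary):
    """Term frequency weighted by log(N / document frequency) for each term."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    idf = [math.log(n_docs / doc_freq[term]) for term in dictionary]
    return [
        [tf * w for tf, w in zip(term_frequencies(tokens, dictionary), idf)]
        for tokens in token_lists
    ]


token_lists = [preprocess(r) for r in reports]
dictionary = build_dictionary(token_lists)
count_vectors = [term_counts(t, dictionary) for t in token_lists]
tf_vectors = [term_frequencies(t, dictionary) for t in token_lists]
tfidf_vectors = tf_idf(token_lists, dictionary)
```

With "no" in the stop-word list, the first report's vector records "pneumothorax" but not its negation, a small example of how preprocessing can degrade meaning. In the TF-IDF vectors, terms that appear in only one report (such as "effusion") receive more weight than terms shared across reports (such as "clear").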

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Radiology / Natural Language Processing Limits: Humans Language: En Journal: Radiographics Year: 2021 Document type: Article
