Results 1 - 7 of 7
1.
Database (Oxford) ; 2023 03 07.
Article in English | MEDLINE | ID: mdl-36882099

ABSTRACT

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We therefore organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both the span [i.e. named entity recognition (NER)] and the normalization (i.e. entity linking) using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to further improve automated prediction accuracy and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.


Subject(s)
COVID-19 , United States , Humans , National Library of Medicine (U.S.) , Data Mining , Databases, Factual , MEDLINE
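The strict F-scores reported in this abstract are simply the harmonic mean of the stated precision and recall. A quick sketch (pure Python, names illustrative) verifying the reported figures:

```python
def f_score(precision: float, recall: float) -> float:
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported strict NER result for the chemical identification task:
# precision 0.8759, recall 0.8587.
print(round(f_score(0.8759, 0.8587), 4))  # 0.8672, as reported
```

The same check reproduces the strict normalization score (0.8621, 0.7702 → 0.8136) and the indexing score (0.7417, 0.5141 → 0.6073).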
2.
NPJ Digit Med ; 5(1): 194, 2022 Dec 26.
Article in English | MEDLINE | ID: mdl-36572766

ABSTRACT

There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, few clinical language models exist, and the largest trained on clinical text is comparatively small at 110 million parameters (compared with billions of parameters in general-domain models). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model, GatorTron, using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og .

4.
Article in English | MEDLINE | ID: mdl-29623248

ABSTRACT

Interstitial lung diseases (ILD) involve several abnormal imaging patterns observed in computed tomography (CT) images. Accurate classification of these patterns plays a significant role in precise clinical decision making about the extent and nature of the disease, and is therefore important for developing automated pulmonary computer-aided detection systems. Conventionally, this task relies on experts' manual identification of regions of interest (ROIs) as a prerequisite to diagnosing potential diseases. This protocol is time consuming and inhibits fully automatic assessment. In this paper, we present a new method to classify ILD imaging patterns on CT images. The main difference is that the proposed algorithm uses the entire image as a holistic input. By circumventing the prerequisite of manually provided ROIs, our problem set-up is significantly more difficult than in previous work but better matches the clinical workflow. Qualitative and quantitative results using a publicly available ILD database demonstrate state-of-the-art accuracy under the patch-based classification protocol and show the potential of predicting the ILD type from holistic images.

5.
IEEE Trans Med Imaging ; 35(5): 1285-98, 2016 05.
Article in English | MEDLINE | ID: mdl-26886976

ABSTRACT

Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and deep convolutional neural networks (CNNs). CNNs enable learning data-driven, highly representative, hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully apply CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained on natural image datasets for medical image tasks. In this paper, we exploit three important, but previously understudied, factors of employing deep convolutional neural networks in computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters and vary in their numbers of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet models (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve state-of-the-art performance on mediastinal LN detection and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation and CNN model analysis yield insights that can be extended to the design of high-performance CAD systems for other medical imaging tasks.


Subject(s)
Diagnosis, Computer-Assisted/methods , Neural Networks, Computer , Databases, Factual , Humans , Image Interpretation, Computer-Assisted , Lung Diseases, Interstitial/diagnostic imaging , Lymph Nodes/diagnostic imaging , Reproducibility of Results
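Of the strategies this abstract lists, the "off-the-shelf pre-trained CNN features" route reduces to training a small classifier head on fixed feature vectors. A minimal sketch of that idea, assuming synthetic 2-D stand-ins for pooled CNN features (all names and data here are illustrative, not from the paper):

```python
import math
import random

def train_logistic(features, labels, lr=0.5, epochs=200):
    """Train a logistic-regression head on fixed feature vectors,
    as one would on top of frozen, off-the-shelf CNN features."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(min(z, 35.0), -35.0)  # clamp to avoid exp overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

random.seed(0)
# Synthetic stand-ins for pooled CNN features of two classes
# (e.g. "lymph node" vs "background" candidates).
pos = [(random.gauss(2, 0.5), random.gauss(2, 0.5)) for _ in range(50)]
neg = [(random.gauss(-2, 0.5), random.gauss(-2, 0.5)) for _ in range(50)]
X = pos + neg
y = [1] * 50 + [0] * 50

w, b = train_logistic(X, y)
acc = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

The "fine-tuning" strategy differs only in that the gradient also flows back into the (here frozen) feature extractor.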
6.
IEEE Trans Med Imaging ; 34(10): 1993-2024, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25494501

ABSTRACT

In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients (manually annotated by up to four raters) and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%-85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.


Subject(s)
Magnetic Resonance Imaging , Neuroimaging , Algorithms , Benchmarking , Glioma/pathology , Humans , Magnetic Resonance Imaging/methods , Magnetic Resonance Imaging/standards , Neuroimaging/methods , Neuroimaging/standards
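The majority-vote fusion described above can be pictured per voxel: each algorithm's label map casts one vote, and the most common label wins. A minimal, flat (non-hierarchical) sketch, with made-up label maps standing in for the benchmark's segmentations:

```python
from collections import Counter

def majority_vote(label_maps):
    """Fuse several segmentations by per-voxel majority vote.
    label_maps: list of equal-length label sequences (flattened volumes)."""
    fused = []
    for voxel_labels in zip(*label_maps):
        fused.append(Counter(voxel_labels).most_common(1)[0][0])
    return fused

# Three hypothetical algorithms' labels for six voxels
# (0 = background, 1 = edema, 2 = tumor core).
seg_a = [0, 1, 1, 2, 2, 0]
seg_b = [0, 1, 2, 2, 2, 0]
seg_c = [0, 0, 1, 2, 0, 0]
print(majority_vote([seg_a, seg_b, seg_c]))  # [0, 1, 1, 2, 2, 0]
```

The paper's hierarchical variant additionally orders the tumor sub-regions so that votes on nested regions stay mutually consistent; this sketch shows only the basic voting step.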
7.
IEEE Trans Pattern Anal Mach Intell ; 35(8): 1930-43, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23787345

ABSTRACT

Medical image analysis remains a challenging application area for artificial intelligence. When applying machine learning, obtaining ground-truth labels for supervised learning is more difficult than in many more common applications of machine learning. This is especially so for datasets with abnormalities, as tissue types and the shapes of the organs in these datasets differ widely. However, organ detection in such an abnormal dataset may have many promising real-world applications, such as automatic diagnosis, automated radiotherapy planning, and medical image retrieval, where new multimodal medical images provide more information about the imaged tissues for diagnosis. Here, we test the application of deep learning methods to organ identification in magnetic resonance medical images, with visual and temporal hierarchical features learned to categorize object classes from an unlabeled multimodal DCE-MRI dataset, so that only weakly supervised training is required for a classifier. A probabilistic patch-based method was employed for multiple organ detection, with the features learned from the deep learning model. This shows the potential of the deep learning model for application to medical images, despite the difficulty of obtaining libraries of correctly labeled training datasets and despite the intrinsic abnormalities present in patient datasets.


Subject(s)
Artificial Intelligence , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Databases, Factual , Humans , Image Enhancement/methods , Pattern Recognition, Automated , Pilot Projects
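The probabilistic patch-based detection step can be pictured as averaging per-patch class probabilities over the patches drawn from a candidate region, then reporting the organ whose mean probability is highest and above a threshold. A sketch under those assumptions (the patch probabilities below are made up; in the paper they would come from the learned deep features):

```python
def detect_organ(patch_probs, threshold=0.5):
    """patch_probs: list of {organ: probability} dicts, one per patch
    sampled from a candidate region. Average the probabilities and
    return the best organ if it clears the threshold, else None."""
    organs = patch_probs[0].keys()
    n = len(patch_probs)
    mean = {o: sum(p[o] for p in patch_probs) / n for o in organs}
    best = max(mean, key=mean.get)
    return best if mean[best] >= threshold else None

# Hypothetical per-patch probabilities from a learned model.
patches = [
    {"liver": 0.9, "kidney": 0.1},
    {"liver": 0.7, "kidney": 0.3},
    {"liver": 0.8, "kidney": 0.2},
]
print(detect_organ(patches))  # liver
```

Averaging over many patches is what makes the decision robust to individual noisy patches, which is the point of a patch-based rather than whole-image classifier in this setting.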