Results 1 - 20 of 103
1.
IEEE Trans Cybern ; PP, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38728131

ABSTRACT

Radiation therapy treatment planning requires balancing the delivery of the target dose while sparing normal tissues, making it a complex process. To streamline the planning process and enhance its quality, there is a growing demand for knowledge-based planning (KBP). Ensemble learning has shown impressive power in various deep learning tasks, and it has great potential to improve the performance of KBP. However, the effectiveness of ensemble learning heavily depends on the diversity and individual accuracy of the base learners. Moreover, the complexity of model ensembles is a major concern, as it requires maintaining multiple models during inference, leading to increased computational cost and storage overhead. In this study, we propose a novel learning-based ensemble approach named LENAS, which integrates neural architecture search with knowledge distillation for 3-D radiotherapy dose prediction. Our approach starts by exhaustively searching each block from an enormous architecture space to identify multiple architectures that exhibit promising performance and significant diversity. To mitigate the complexity introduced by the model ensemble, we adopt the teacher-student paradigm, leveraging the diverse outputs from multiple learned networks as supervisory signals to guide the training of the student network. Furthermore, to preserve high-level semantic information, we design a hybrid loss to optimize the student network, enabling it to recover the knowledge embedded within the teacher networks. The proposed method has been evaluated on two public datasets: 1) OpenKBP and 2) AIMIS. Extensive experimental results demonstrate the effectiveness of our method and its superior performance to the state-of-the-art methods. Code: github.com/hust-linyi/LENAS.
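The abstract does not give the exact form of the hybrid loss, so the following is a minimal PyTorch sketch of the teacher-student idea for voxel-wise dose prediction: the student is trained against both the ground-truth dose map and the mean of several frozen teacher predictions. The weighting `alpha`, the L1/MSE choice, and the toy tensor shapes are illustrative assumptions, not the LENAS specification.

```python
import torch
import torch.nn.functional as F

def hybrid_distillation_loss(student_pred, teacher_preds, gt_dose, alpha=0.5):
    """Voxel-wise distillation loss for 3-D dose prediction (illustrative sketch).

    student_pred : (B, 1, D, H, W) student dose prediction
    teacher_preds: list of (B, 1, D, H, W) predictions from frozen teacher networks
    gt_dose      : (B, 1, D, H, W) ground-truth dose distribution
    alpha        : assumed trade-off between supervision and distillation
    """
    # Supervised term: match the ground-truth dose.
    sup_loss = F.l1_loss(student_pred, gt_dose)
    # Distillation term: match the ensemble (mean) of the diverse teacher outputs.
    ensemble = torch.stack(teacher_preds, dim=0).mean(dim=0)
    distill_loss = F.mse_loss(student_pred, ensemble)
    return alpha * sup_loss + (1.0 - alpha) * distill_loss

# Toy usage with random tensors standing in for real dose maps.
student = torch.randn(2, 1, 8, 32, 32, requires_grad=True)
teachers = [torch.randn(2, 1, 8, 32, 32) for _ in range(3)]
gt = torch.rand(2, 1, 8, 32, 32)
hybrid_distillation_loss(student, teachers, gt).backward()
```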

2.
Article in English | MEDLINE | ID: mdl-38684792

ABSTRACT

OBJECTIVES: Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models encounter challenges in non-English clinical settings, primarily due to limited clinical knowledge in the respective languages, a consequence of imbalanced training corpora. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance. MATERIALS AND METHODS: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381,149 medical questions to construct the medical knowledge base and question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge sources. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2-7B, Baichuan2-13B, and QWEN-72B on CNMLE-2022 and further investigated the effectiveness of different pathways for incorporating medical knowledge into LLMs from 7 distinct perspectives. RESULTS: Directly applying ChatGPT failed to qualify for the CNMLE-2022, with a score of 51. When combined with the KFE framework, LLMs of varying sizes yielded consistent and significant improvements. ChatGPT's performance surged to 70.04, and GPT-4 achieved the highest score of 82.59. This surpasses the qualification threshold (60) and exceeds the average human score of 68.70, affirming the effectiveness and robustness of the framework. It also enabled a smaller Baichuan2-13B to pass the examination, showcasing great potential in low-resource settings. DISCUSSION AND CONCLUSION: This study sheds light on optimal practices for enhancing the capabilities of LLMs in non-English medical scenarios. By synergizing medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers in healthcare, significantly reducing language-related disparities in LLM applications and ensuring global benefit in this field.
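As a rough illustration of the knowledge-and-few-shot idea, the sketch below assembles an in-context prompt from retrieved knowledge snippets and solved exemplars before the question is sent to an LLM. The prompt wording, the function name `build_kfe_prompt`, and the retrieval step (assumed to have already happened) are hypothetical; the abstract does not specify the actual KFE prompt format.

```python
def build_kfe_prompt(question, options, knowledge_snippets, few_shot_examples):
    """Combine retrieved medical knowledge and few-shot exemplars into one
    in-context prompt (illustrative sketch, not the published KFE template)."""
    parts = ["You are answering a medical licensing examination question."]
    if knowledge_snippets:
        parts.append("Reference knowledge:")
        parts.extend(f"- {s}" for s in knowledge_snippets)
    if few_shot_examples:
        parts.append("Solved examples:")
        for ex_q, ex_a in few_shot_examples:
            parts.append(f"Q: {ex_q}\nA: {ex_a}")
    parts.append(f"Q: {question}")
    parts.append("Options: " + "; ".join(options))
    parts.append("Answer with the single best option letter.")
    return "\n".join(parts)

prompt = build_kfe_prompt(
    question="Which vitamin deficiency causes scurvy?",
    options=["A. Vitamin A", "B. Vitamin B1", "C. Vitamin C", "D. Vitamin D"],
    knowledge_snippets=["Scurvy results from prolonged vitamin C deficiency."],
    few_shot_examples=[("Which organ produces insulin?", "The pancreas.")],
)
print(prompt)
```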

3.
Med Image Anal ; 94: 103151, 2024 May.
Article in English | MEDLINE | ID: mdl-38527405

ABSTRACT

Self-supervised learning has emerged as a powerful tool for pretraining deep networks on unlabeled data, prior to transfer learning of target tasks with limited annotation. The relevance between the pretraining pretext and target tasks is crucial to the success of transfer learning. Various pretext tasks have been proposed to utilize properties of medical image data (e.g., three dimensionality), which are more relevant to medical image analysis than generic ones for natural images. However, previous work rarely paid attention to data with anatomy-oriented imaging planes, e.g., standard cardiac magnetic resonance imaging views. As these imaging planes are defined according to the anatomy of the imaged organ, pretext tasks effectively exploiting this information can pretrain the networks to gain knowledge of the organ of interest. In this work, we propose two complementary pretext tasks for this group of medical image data based on the spatial relationship of the imaging planes. The first learns the relative orientation between the imaging planes and is implemented as regressing their intersecting lines. The second exploits parallel imaging planes to regress their relative slice locations within a stack. Both pretext tasks are conceptually straightforward and easy to implement, and can be combined in multitask learning for better representation learning. Thorough experiments on two anatomical structures (heart and knee) and representative target tasks (semantic segmentation and classification) demonstrate that the proposed pretext tasks are effective in pretraining deep networks, remarkably boosting performance on the target tasks and outperforming other recent approaches.
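A minimal sketch of how the two pretext tasks could share one encoder in multitask learning is given below, assuming the intersecting line is encoded as a per-pixel regression target and the slice location as a normalised scalar; the tiny encoder, head designs, and equal loss weights are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PretextNet(nn.Module):
    """Shared 2D encoder with two pretext heads (illustrative sketch):
    (1) regress a map encoding the intersecting line with another view,
    (2) regress the relative slice position within a parallel stack."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.line_head = nn.Conv2d(32, 1, 1)                      # per-pixel line target
        self.slice_head = nn.Sequential(                          # scalar relative location
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))

    def forward(self, x):
        f = self.encoder(x)
        return self.line_head(f), self.slice_head(f)

net = PretextNet()
imgs = torch.randn(4, 1, 64, 64)
line_gt = torch.rand(4, 1, 64, 64)     # target derived from the known intersecting line
pos_gt = torch.rand(4, 1)              # normalised slice location within the stack
line_pred, pos_pred = net(imgs)
loss = F.mse_loss(line_pred, line_gt) + F.mse_loss(pos_pred, pos_gt)
loss.backward()
```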


Subject(s)
Heart , Knee Joint , Humans , Heart/diagnostic imaging , Semantics , Supervised Machine Learning , Image Processing, Computer-Assisted
4.
Artif Intell Med ; 149: 102801, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38462290

ABSTRACT

Since different disease grades require different treatments from physicians, e.g., low-grade patients may recover with follow-up observation whereas high-grade patients may need immediate surgery, the accuracy of disease grading is pivotal in clinical practice. In this paper, we propose a Triplet-Branch Network with ContRastive priOr-knoWledge embeddiNg (TBN-CROWN) for accurate disease grading, which enables physicians to choose appropriate treatments accordingly. Specifically, our TBN-CROWN has three branches, which are implemented for representation learning, classifier learning and grade-related prior-knowledge learning, respectively. The former two branches deal with the issue of class-imbalanced training samples, while the latter embeds the grade-related prior knowledge via a novel auxiliary module, termed the contrastive embedding module. The proposed auxiliary module takes the features embedded by the different branches as input, and accordingly constructs positive and negative embeddings for the model to deploy grade-related prior knowledge via contrastive learning. Extensive experiments on our private dataset and two publicly available disease grading datasets show that our TBN-CROWN can effectively tackle the class-imbalance problem and yield satisfactory grading accuracy for various diseases, such as fatigue fracture, ulcerative colitis, and diabetic retinopathy.


Subject(s)
Diabetic Retinopathy , Physicians , Humans , Learning
5.
Med Image Anal ; 93: 103102, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38367598

ABSTRACT

Rare diseases are characterized by low prevalence and are often chronically debilitating or life-threatening. Imaging phenotype classification of rare diseases is challenging due to the severe shortage of training examples. Few-shot learning (FSL) methods tackle this challenge by extracting generalizable prior knowledge from a large base dataset of common diseases and normal controls and transferring the knowledge to rare diseases. Yet, most existing methods require the base dataset to be labeled and do not make full use of the precious examples of rare diseases. In addition, the extremely small size of the training samples may result in inter-class performance imbalance due to insufficient sampling of the true distributions. To this end, we propose in this work a novel hybrid approach to rare disease imaging phenotype classification, featuring three key novelties targeted at the above drawbacks. First, we adopt unsupervised representation learning (URL) based on a self-supervised contrastive loss, thereby eliminating the overhead of labeling the base dataset. Second, we integrate the URL with pseudo-label supervised classification for effective self-distillation of the knowledge about the rare diseases, composing a hybrid approach that takes advantage of both unsupervised and (pseudo-) supervised learning on the base dataset. Third, we use feature dispersion to assess the intra-class diversity of training samples, to alleviate the inter-class performance imbalance via dispersion-aware correction. Experimental results of imaging phenotype classification of both simulated (skin lesions and cervical smears) and real clinical rare diseases (retinal diseases) show that our hybrid approach substantially outperforms existing FSL methods (including those using a fully supervised base dataset) via effective integration of the URL, pseudo-label driven self-distillation, and dispersion-aware imbalance correction, thus establishing a new state of the art.
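One plausible reading of the feature dispersion measure is the mean distance of a class's features to its centroid; the sketch below computes such a per-class dispersion with NumPy. How the dispersion is turned into a correction factor is not detailed in the abstract and is left out here.

```python
import numpy as np

def class_dispersion(features, labels):
    """Per-class feature dispersion (illustrative sketch): mean Euclidean
    distance of each class's feature vectors to the class centroid."""
    dispersion = {}
    for c in np.unique(labels):
        feats = features[labels == c]
        centroid = feats.mean(axis=0, keepdims=True)
        dispersion[int(c)] = float(np.linalg.norm(feats - centroid, axis=1).mean())
    return dispersion

rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 16))          # toy embeddings of 30 training images
labels = rng.integers(0, 3, size=30)       # three rare-disease classes
print(class_dispersion(feats, labels))
```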


Subject(s)
Rare Diseases , Retinal Diseases , Humans , Phenotype , Diagnostic Imaging
6.
Med Image Anal ; 93: 103095, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38310678

ABSTRACT

Segmenting prostate from magnetic resonance imaging (MRI) is a critical procedure in prostate cancer staging and treatment planning. Considering the nature of labeled data scarcity for medical images, semi-supervised learning (SSL) becomes an appealing solution since it can simultaneously exploit limited labeled data and a large amount of unlabeled data. However, SSL relies on the assumption that the unlabeled images are abundant, which may not be satisfied when the local institute has limited image collection capabilities. An intuitive solution is to seek support from other centers to enrich the unlabeled image pool. However, this further introduces data heterogeneity, which can impede SSL that works under identical data distribution with certain model assumptions. Aiming at this under-explored yet valuable scenario, in this work, we propose a separated collaborative learning (SCL) framework for semi-supervised prostate segmentation with multi-site unlabeled MRI data. Specifically, on top of the teacher-student framework, SCL exploits multi-site unlabeled data by: (i) Local learning, which advocates local distribution fitting, including the pseudo label learning that reinforces confirmation of low-entropy easy regions and the cyclic propagated real label learning that leverages class prototypes to regularize the distribution of intra-class features; (ii) External multi-site learning, which aims to robustly mine informative clues from external data, mainly including the local-support category mutual dependence learning, which takes the spirit that mutual information can effectively measure the amount of information shared by two variables even from different domains, and the stability learning under strong adversarial perturbations to enhance robustness to heterogeneity. Extensive experiments on prostate MRI data from six different clinical centers show that our method can effectively generalize SSL on multi-site unlabeled data and significantly outperform other semi-supervised segmentation methods. Besides, we validate the extensibility of our method on the multi-class cardiac MRI segmentation task with data from four different clinical centers.


Subject(s)
Interdisciplinary Placement , Prostatic Neoplasms , Male , Humans , Prostate/diagnostic imaging , Prostatic Neoplasms/diagnostic imaging , Entropy , Magnetic Resonance Imaging
7.
Article in English | MEDLINE | ID: mdl-38294925

ABSTRACT

Federated learning enables multiple hospitals to cooperatively learn a shared model without privacy disclosure. Existing methods commonly assume that the data from different hospitals have the same modalities. However, such a setting is difficult to fully satisfy in practical applications, since the imaging guidelines may differ between hospitals, which limits the number of individuals with the same set of modalities. To this end, we formulate this practical-yet-challenging cross-modal vertical federated learning task, in which data from multiple hospitals have different modalities, with a small amount of multi-modality data collected from the same individuals. To tackle such a situation, we develop a novel framework, namely Federated Consistent Regularization constrained Feature Disentanglement (Fed-CRFD), for boosting MRI reconstruction by effectively exploring the overlapping samples (i.e., the same patients with different modalities at different hospitals) and solving the domain shift problem caused by different modalities. Particularly, our Fed-CRFD involves an intra-client feature disentanglement scheme to decouple data into modality-invariant and modality-specific features, where the modality-invariant features are leveraged to mitigate the domain shift problem. In addition, a cross-client latent representation consistency constraint is proposed specifically for the overlapping samples to further align the modality-invariant features extracted from different modalities. Hence, our method can fully exploit the multi-source data from hospitals while alleviating the domain shift problem. Extensive experiments on two typical MRI datasets demonstrate that our network clearly outperforms state-of-the-art MRI reconstruction methods. The source code is available at https://github.com/IAMJackYan/FedCRFD.
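The following PyTorch fragment sketches the two core ideas in isolation: an encoder that splits its output into modality-invariant and modality-specific parts, and a consistency term that aligns the invariant features of overlapping patients seen at two clients. The tiny backbone, feature sizes, and plain MSE alignment are assumptions; the real Fed-CRFD architecture and federated training loop are not specified in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangleEncoder(nn.Module):
    """Split an image encoding into modality-invariant and modality-specific
    parts (illustrative sketch)."""
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.invariant = nn.Linear(dim, 32)
        self.specific = nn.Linear(dim, 32)

    def forward(self, x):
        h = self.backbone(x)
        return self.invariant(h), self.specific(h)

def overlap_consistency(inv_a, inv_b):
    # For the same patients imaged with different modalities at two hospitals,
    # pull the modality-invariant features together.
    return F.mse_loss(inv_a, inv_b)

enc_a, enc_b = DisentangleEncoder(), DisentangleEncoder()
xa = torch.randn(4, 1, 32, 32)   # e.g. T1-weighted slices of shared patients at hospital A
xb = torch.randn(4, 1, 32, 32)   # e.g. T2-weighted slices of the same patients at hospital B
inv_a, _ = enc_a(xa)
inv_b, _ = enc_b(xb)
overlap_consistency(inv_a, inv_b).backward()
```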

8.
Med Phys ; 51(3): 1832-1846, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37672318

ABSTRACT

BACKGROUND: View planning for the acquisition of cardiac magnetic resonance (CMR) imaging remains a demanding task in clinical practice. PURPOSE: Existing approaches to its automation relied either on an additional volumetric image not typically acquired in clinical routine, or on laborious manual annotations of cardiac structural landmarks. This work presents a clinic-compatible, annotation-free system for automatic CMR view planning. METHODS: The system mines the spatial relationship between the target planes and source views (more specifically, it locates their intersecting lines), and trains U-Net-based deep networks to regress heatmaps defined by distances from the intersecting lines. On the one hand, the intersecting lines are the prescription lines prescribed by the technologists at the time of image acquisition using cardiac landmarks, and are retrospectively identified from the spatial relationship. On the other hand, as the spatial relationship is self-contained in properly stored data, for example, in the DICOM format, the need for additional manual annotation is eliminated. In addition, the interplay of the multiple target planes predicted in a source view is utilized in a stacked hourglass architecture consisting of repeated U-Net-style building blocks to gradually improve the regression. Then, a multiview planning strategy is proposed to aggregate information from the predicted heatmaps for all the source views of a target plane, for a globally optimal prescription, mimicking the similar strategy practiced by skilled human prescribers. For performance evaluation, the retrospectively identified planes prescribed by the technologists are used as the ground truth, and the plane angle differences and localization distances between the planes prescribed by our system and the ground truth are compared. RESULTS: The retrospective experiments include 181 clinical CMR exams, which are randomly split into training, validation, and test sets in the ratio of 64:16:20. Our system yields a mean angular difference and point-to-plane distance of 5.68° and 3.12 mm, respectively, on the held-out test set. It not only achieves superior accuracy to existing approaches, including conventional atlas-based and newer deep-learning-based methods, in prescribing the four standard CMR planes, but also demonstrates prescription of the first cardiac-anatomy-oriented plane(s) from the body-oriented scout. CONCLUSIONS: The proposed system demonstrates accurate automatic CMR view plane prescription based on deep learning on properly archived data, without the need for further manual annotation. This work opens a new direction for automatic view planning of anatomy-oriented medical imaging beyond CMR.
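To make the regression target concrete, the sketch below builds a heatmap whose value decays with the perpendicular distance from each pixel to a given intersecting line, assuming a Gaussian decay of width `sigma`; the exact distance-to-heatmap mapping used by the system is not stated in the abstract.

```python
import numpy as np

def line_distance_heatmap(shape, point, direction, sigma=5.0):
    """Heatmap regression target (illustrative sketch): pixel value decays with
    the perpendicular distance to the intersecting line.

    shape     : (H, W) of the source-view image
    point     : (x, y) of any point on the line, in pixels
    direction : (dx, dy) direction vector of the line
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = point
    dx, dy = direction / np.linalg.norm(direction)
    # Perpendicular distance from every pixel to the line through `point`.
    dist = np.abs((xs - px) * dy - (ys - py) * dx)
    return np.exp(-0.5 * (dist / sigma) ** 2)

hm = line_distance_heatmap((128, 128), point=(64, 64), direction=np.array([1.0, 0.5]))
print(hm.shape, float(hm.max()))
```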


Subject(s)
Heart , Magnetic Resonance Imaging, Cine , Humans , Retrospective Studies , Magnetic Resonance Imaging, Cine/methods , Heart/diagnostic imaging , Magnetic Resonance Imaging , Automation
9.
IEEE Trans Med Imaging ; 43(1): 489-502, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37656650

ABSTRACT

X-ray computed tomography (CT) has been broadly adopted in clinical applications for disease diagnosis and image-guided interventions. However, metals within patients always cause unfavorable artifacts in the recovered CT images. Albeit attaining promising reconstruction results for this metal artifact reduction (MAR) task, most of the existing deep-learning-based approaches have some limitations. The critical issue is that most of these methods have not fully exploited the important prior knowledge underlying this specific MAR task. Therefore, in this paper, we carefully investigate the inherent characteristics of metal artifacts which present rotationally symmetrical streaking patterns. Then we specifically propose an orientation-shared convolution representation mechanism to adapt such physical prior structures and utilize Fourier-series-expansion-based filter parametrization for modelling artifacts, which can finely separate metal artifacts from body tissues. By adopting the classical proximal gradient algorithm to solve the model and then utilizing the deep unfolding technique, we easily build the corresponding orientation-shared convolutional network, termed as OSCNet. Furthermore, considering that different sizes and types of metals would lead to different artifact patterns (e.g., intensity of the artifacts), to better improve the flexibility of artifact learning and fully exploit the reconstructed results at iterative stages for information propagation, we design a simple-yet-effective sub-network for the dynamic convolution representation of artifacts. By easily integrating the sub-network into the proposed OSCNet framework, we further construct a more flexible network structure, called OSCNet+, which improves the generalization performance. Through extensive experiments conducted on synthetic and clinical datasets, we comprehensively substantiate the effectiveness of our proposed methods. Code will be released at https://github.com/hongwang01/OSCNet.


Subject(s)
Artifacts , Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Algorithms , Metals , Phantoms, Imaging
10.
IEEE J Biomed Health Inform ; 28(2): 858-869, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38032774

ABSTRACT

Medical image segmentation is a critical task for clinical diagnosis and research. However, dealing with highly imbalanced data remains a significant challenge in this domain, where the region of interest (ROI) may exhibit substantial variations across different slices. This presents a significant hurdle to medical image segmentation, as conventional segmentation methods may either overlook the minority class or overly emphasize the majority class, ultimately leading to a decrease in the overall generalization ability of the segmentation results. To overcome this, we propose a novel approach based on multi-step reinforcement learning, which integrates prior knowledge of medical images and pixel-wise segmentation difficulty into the reward function. Our method treats each pixel as an individual agent, utilizing diverse actions to evaluate its relevance for segmentation. To validate the effectiveness of our approach, we conduct experiments on four imbalanced medical datasets, and the results show that our approach surpasses other state-of-the-art methods in highly imbalanced scenarios. These findings hold substantial implications for clinical diagnosis and research.
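The abstract only states that class priors and pixel-wise difficulty enter the reward; the fragment below is one simple way such a reward could be shaped, where correct decisions on rare classes and hard pixels earn more and mistakes on them cost more. The specific weighting is an assumption, not the published reward function.

```python
import numpy as np

def pixel_reward(pred_label, true_label, class_freq, difficulty):
    """Reward for a single pixel agent (illustrative sketch).

    class_freq : fraction of training pixels belonging to `true_label`
    difficulty : value in [0, 1], e.g. 1 - confidence of a baseline segmenter
    """
    rarity_bonus = 1.0 / max(class_freq, 1e-6)    # prior knowledge of class imbalance
    weight = rarity_bonus * (1.0 + difficulty)    # emphasise hard, minority-class pixels
    return weight if pred_label == true_label else -weight

print(pixel_reward(1, 1, class_freq=0.02, difficulty=0.7))   # correct on a rare, hard pixel
print(pixel_reward(0, 1, class_freq=0.02, difficulty=0.7))   # wrong on the same pixel
```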


Subject(s)
Algorithms , Imaging, Three-Dimensional , Humans , Imaging, Three-Dimensional/methods , Image Interpretation, Computer-Assisted/methods , Image Processing, Computer-Assisted/methods
11.
Med Image Anal ; 91: 103019, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37944431

ABSTRACT

Layer segmentation is important to quantitative analysis of retinal optical coherence tomography (OCT). Recently, deep learning based methods have been developed to automate this task and yield remarkable performance. However, due to the large spatial gap and potential mismatch between the B-scans of an OCT volume, all of them were based on 2D segmentation of individual B-scans, which may lose the continuity and diagnostic information of the retinal layers in 3D space. Besides, most of these methods required dense annotation of the OCT volumes, which is labor-intensive and expertise-demanding. This work presents a novel framework based on hybrid 2D-3D convolutional neural networks (CNNs) to obtain continuous 3D retinal layer surfaces from OCT volumes, which works well with both full and sparse annotations. The 2D features of individual B-scans are extracted by an encoder consisting of 2D convolutions. These 2D features are then used to produce the alignment displacement vectors and layer segmentation by two 3D decoders coupled via a spatial transformer module. Two losses are proposed to utilize the retinal layers' natural property of being smooth for B-scan alignment and layer segmentation, respectively, and are the key to the semi-supervised learning with sparse annotation. The entire framework is trained end-to-end. To the best of our knowledge, this is the first work that attempts 3D retinal layer segmentation in volumetric OCT images based on CNNs. Experiments on a synthetic dataset and three public clinical datasets show that our framework can effectively align the B-scans for potential motion correction, and achieves superior performance to state-of-the-art 2D deep learning methods in terms of both layer segmentation accuracy and cross-B-scan 3D continuity in both fully and semi-supervised settings, thus offering more clinical values than previous works.
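One way to encode the layers' natural smoothness, as the abstract describes, is a second-difference penalty on the predicted surface heights both along A-scans and across B-scans; the sketch below assumes the surfaces are represented as a (batch, layers, B-scans, A-scans) tensor of row indices, which is an illustrative simplification of the paper's losses.

```python
import torch

def surface_smoothness_loss(surfaces):
    """Smoothness prior on retinal layer surfaces (illustrative sketch).

    surfaces : (batch, n_layers, n_bscans, n_ascans) predicted surface heights
    """
    # Second differences along A-scans (within a B-scan) and across B-scans.
    d_ascan = surfaces[..., :-2] - 2 * surfaces[..., 1:-1] + surfaces[..., 2:]
    d_bscan = surfaces[..., :-2, :] - 2 * surfaces[..., 1:-1, :] + surfaces[..., 2:, :]
    return d_ascan.pow(2).mean() + d_bscan.pow(2).mean()

surf = (torch.rand(2, 9, 16, 64) * 100).requires_grad_()   # toy surfaces in pixel units
surface_smoothness_loss(surf).backward()
```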


Subject(s)
Retina , Tomography, Optical Coherence , Humans , Retina/diagnostic imaging , Neural Networks, Computer , Supervised Machine Learning
12.
NPJ Digit Med ; 6(1): 226, 2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38042919

ABSTRACT

Deep neural networks have been integrated into the whole clinical decision procedure, improving the efficiency of diagnosis and alleviating the heavy workload of physicians. Since most neural networks are supervised, their performance heavily depends on the volume and quality of available labels. However, few such labels exist for rare diseases (e.g., new pandemics). Here we report a medical multimodal large language model (Med-MLLM) for radiograph representation learning, which can learn broad medical knowledge (e.g., image understanding, text semantics, and clinical phenotypes) from unlabelled data. As a result, when encountering a rare disease, our Med-MLLM can be rapidly deployed and easily adapted to it with limited labels. Furthermore, our model supports medical data across the visual modality (e.g., chest X-ray and CT) and the textual modality (e.g., medical reports and free-text clinical notes); therefore, it can be used for clinical tasks that involve both visual and textual data. We demonstrate the effectiveness of our Med-MLLM by showing how it would perform using the COVID-19 pandemic "in replay". In the retrospective setting, we test the model on the early COVID-19 datasets; and in the prospective setting, we test the model on the new variant COVID-19-Omicron. The experiments are conducted on 1) three kinds of input data; 2) three kinds of downstream tasks, including disease reporting, diagnosis, and prognosis; 3) five COVID-19 datasets; and 4) three different languages, including English, Chinese, and Spanish. All experiments show that our model can provide accurate and robust COVID-19 decision support with little labelled data.

13.
IEEE Trans Med Imaging ; PP, 2023 Nov 23.
Article in English | MEDLINE | ID: mdl-37995172

ABSTRACT

Deep learning based methods for medical images can be easily compromised by adversarial examples (AEs), posing a serious security risk in clinical decision-making. It has been discovered that conventional adversarial attacks like PGD, which optimize the classification logits, are easy to distinguish in the feature space, resulting in accurate reactive defenses. To better understand this phenomenon and reassess the reliability of reactive defenses against medical AEs, we thoroughly investigate the characteristics of conventional medical AEs. Specifically, we first theoretically prove that conventional adversarial attacks change the outputs by continuously optimizing vulnerable features in a fixed direction, thereby leading to outlier representations in the feature space. Then, a stress test is conducted to reveal the vulnerability of medical images by comparing them with natural images. Interestingly, this vulnerability is a double-edged sword, which can be exploited to hide AEs. We then propose a simple-yet-effective hierarchical feature constraint (HFC), a novel add-on to conventional white-box attacks, which helps hide the adversarial features within the target feature distribution. The proposed method is evaluated on three medical datasets, both 2D and 3D, with different modalities. The experimental results demonstrate the superiority of HFC, i.e., it bypasses an array of state-of-the-art medical AE detectors more efficiently than competing adaptive attacks, which reveals the deficiencies of reactive defenses for medical images and allows more robust defenses to be developed in the future.
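In spirit, the attack adds a feature-space term to a standard iterative attack so that the perturbed example no longer looks like an outlier to a feature-based detector. The sketch below grafts such a term onto plain PGD with a toy classifier; the constraint form, the single feature layer, and all hyperparameters are assumptions and do not reproduce the published HFC algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.classifier = nn.Linear(8 * 16, 2)

    def forward(self, x):
        f = self.features(x)
        return self.classifier(f), f

def feature_constrained_pgd(model, x, y, feat_target, eps=0.03, alpha=0.01, steps=10, lam=1.0):
    """PGD with an added feature-space term (illustrative sketch): besides
    flipping the label, the example is pulled toward a 'normal-looking'
    feature vector, making it harder to flag as a feature-space outlier."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits, feats = model(x_adv)
        loss = F.cross_entropy(logits, y) - lam * F.mse_loss(feats, feat_target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # stay inside the L-inf ball
    return x_adv

model = TinyNet()
x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 2, (4,))
with torch.no_grad():
    _, clean_feats = model(x)
adv = feature_constrained_pgd(model, x, y,
                              feat_target=clean_feats.mean(0, keepdim=True).expand_as(clean_feats))
```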

14.
Pattern Recognit ; 138, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37781685

ABSTRACT

Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and the inter-observer variability are high. Different human experts contribute estimates of the "actual" segmentation labels in a typical label acquisition process, influenced by their personal biases and competency levels. The performance of automatic segmentation algorithms is limited when these noisy labels are used as the expert consensus label. In this work, we use two coupled CNNs to jointly learn, from purely noisy observations alone, the reliability of individual annotators and the expert consensus label distributions. The separation of the two is achieved by maximally describing the annotators' "unreliable behavior" (we call it "maximally unreliable") while achieving high fidelity with the noisy training data. We first create a toy segmentation dataset using MNIST and investigate the properties of the proposed algorithm. We then use three public medical imaging segmentation datasets to demonstrate our method's efficacy, including both simulated (where necessary) and real-world annotations: 1) ISBI2015 (multiple-sclerosis lesions); 2) BraTS (brain tumors); 3) LIDC-IDRI (lung abnormalities). Finally, we create a real-world multiple sclerosis lesion dataset (QSMSC at UCL: Queen Square Multiple Sclerosis Center at UCL, UK) with manual segmentations from 4 different annotators (3 radiologists with different skill levels and 1 expert who generates the expert consensus label). In all datasets, our method consistently outperforms competing methods and relevant baselines, especially when the number of annotations is small and the amount of disagreement is large. The studies also reveal that the system is capable of capturing the complicated spatial characteristics of annotators' mistakes.
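The coupling of consensus estimation and annotator reliability can be sketched with per-annotator confusion matrices: each annotator's label distribution is modelled as the consensus distribution passed through that annotator's confusion matrix, with a trace term encouraging the matrices to absorb as much of the noise as possible. The global (non-spatial) matrices, the tiny network, and the regulariser weight below are simplifying assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes, n_annotators = 3, 4

seg_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, n_classes, 1))          # estimates the consensus label
# One learnable confusion matrix per annotator (global, not spatial, for brevity).
cm_logits = nn.Parameter(torch.stack([torch.eye(n_classes) * 3.0 for _ in range(n_annotators)]))

def noisy_label_loss(images, noisy_labels, lam=0.01):
    """Jointly fit the consensus segmentation and annotator reliability
    (illustrative sketch).

    images       : (B, 1, H, W)
    noisy_labels : (B, n_annotators, H, W) integer masks from each annotator
    """
    consensus = F.softmax(seg_net(images), dim=1)              # (B, C, H, W)
    cms = F.softmax(cm_logits, dim=-1)                         # rows sum to 1
    loss = 0.0
    for a in range(n_annotators):
        # p(annotator says j) = sum_i p(true = i) * cm[i, j]
        noisy_prob = torch.einsum("bchw,cd->bdhw", consensus, cms[a])
        loss = loss + F.nll_loss(torch.log(noisy_prob + 1e-8), noisy_labels[:, a])
    return loss / n_annotators + lam * cms.diagonal(dim1=-2, dim2=-1).sum()

imgs = torch.rand(2, 1, 32, 32)
masks = torch.randint(0, n_classes, (2, n_annotators, 32, 32))
noisy_label_loss(imgs, masks).backward()
```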

15.
Med Image Anal ; 90: 102973, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37757643

ABSTRACT

In the field of medical image analysis, accurate lesion segmentation is beneficial for subsequent clinical diagnosis and treatment planning. Currently, various deep learning-based methods have been proposed to deal with the segmentation task. Albeit achieving promising performance, fully-supervised learning approaches require pixel-level annotations for model training, which are tedious and time-consuming for experienced radiologists to collect. In this paper, we propose a weakly semi-supervised segmentation framework, called Point Segmentation Transformer (Point SEGTR). Particularly, the framework utilizes a small amount of fully-supervised data with pixel-level segmentation masks and a large amount of weakly-supervised data with point-level annotations (i.e., a single annotated point inside each object) for network training, which significantly reduces the demand for pixel-level annotations. To fully exploit the pixel-level and point-level annotations, we propose two regularization terms, i.e., multi-point consistency and symmetric consistency, to boost the quality of pseudo labels, which are then adopted to train a student model for inference. Extensive experiments are conducted on three endoscopy datasets with different lesion structures and several body sites (e.g., colorectal and nasopharynx). Comprehensive experimental results substantiate the effectiveness and generality of our proposed method, as well as its potential to loosen the requirement for pixel-level annotations, which is valuable for clinical applications.

16.
Med Image Anal ; 89: 102933, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37611532

ABSTRACT

Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as point annotations. In this paper, we propose a weakly-supervised learning method for nuclei segmentation that only requires point annotations for training. First, coarse pixel-level labels are derived from the point annotations based on the Voronoi diagram and the k-means clustering method to avoid overfitting. Second, a co-training strategy with an exponential moving average method is designed to refine the incomplete supervision of the coarse labels. Third, a self-supervised visual representation learning method is tailored for nuclei segmentation of pathology images that transforms the hematoxylin component images into the H&E stained images to gain better understanding of the relationship between the nuclei and cytoplasm. We comprehensively evaluate the proposed method using two public datasets. Both visual and quantitative results demonstrate the superiority of our method to the state-of-the-art methods, and its competitive performance compared to the fully-supervised methods. Codes are available at https://github.com/hust-linyi/SC-Net.
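The Voronoi step can be pictured as partitioning the image by nearest annotated point; the sketch below derives a coarse label map that way with SciPy, keeping only pixels within an assumed radius of their nearest nucleus centre as foreground. The radius and the omission of the k-means refinement are simplifications of the pipeline described in the abstract.

```python
import numpy as np
from scipy.spatial import cKDTree

def coarse_labels_from_points(shape, points, radius=15):
    """Coarse pixel labels from point annotations (illustrative sketch of the
    Voronoi idea): each pixel is assigned to its nearest annotated nucleus
    centre, and only pixels within `radius` of that centre are foreground."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.stack([ys.ravel(), xs.ravel()], axis=1)
    dist, idx = cKDTree(points).query(pixels)        # nearest annotated point per pixel
    labels = np.where(dist <= radius, idx + 1, 0)    # 0 = background / unlabelled
    return labels.reshape(h, w)

pts = np.array([[20, 20], [60, 80], [90, 30]])        # toy nucleus centres (row, col)
coarse = coarse_labels_from_points((128, 128), pts)
print(np.unique(coarse))
```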


Subject(s)
Cell Nucleus , Image Processing, Computer-Assisted , Humans , Hematoxylin , Supervised Machine Learning
17.
Med Image Anal ; 88: 102880, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37413792

ABSTRACT

Semi-supervised learning has greatly advanced medical image segmentation since it effectively alleviates the need of acquiring abundant annotations from experts, wherein the mean-teacher model, known as a milestone of perturbed consistency learning, commonly serves as a standard and simple baseline. Inherently, learning from consistency can be regarded as learning from stability under perturbations. Recent improvement leans toward more complex consistency learning frameworks, yet, little attention is paid to the consistency target selection. Considering that the ambiguous regions from unlabeled data contain more informative complementary clues, in this paper, we improve the mean-teacher model to a novel ambiguity-consensus mean-teacher (AC-MT) model. Particularly, we comprehensively introduce and benchmark a family of plug-and-play strategies for ambiguous target selection from the perspectives of entropy, model uncertainty and label noise self-identification, respectively. Then, the estimated ambiguity map is incorporated into the consistency loss to encourage consensus between the two models' predictions in these informative regions. In essence, our AC-MT aims to find out the most worthwhile voxel-wise targets from the unlabeled data, and the model especially learns from the perturbed stability of these informative regions. The proposed methods are extensively evaluated on left atrium segmentation and brain tumor segmentation. Encouragingly, our strategies bring substantial improvement over recent state-of-the-art methods. The ablation study further demonstrates our hypothesis and shows impressive results under various extreme annotation conditions.
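The entropy-based variant of ambiguous target selection can be sketched as weighting the mean-teacher consistency term by an ambiguity mask derived from the teacher's prediction entropy; the top-fraction thresholding below is an assumed selection rule, and the paper's other selection strategies (model uncertainty, label-noise self-identification) are omitted.

```python
import torch
import torch.nn.functional as F

def ambiguity_weighted_consistency(student_logits, teacher_logits, top_fraction=0.2):
    """Consistency loss restricted to the most ambiguous voxels (illustrative sketch)."""
    t_prob = F.softmax(teacher_logits, dim=1)
    entropy = -(t_prob * torch.log(t_prob + 1e-8)).sum(dim=1)     # per-voxel ambiguity
    k = max(1, int(top_fraction * entropy.numel()))
    threshold = entropy.flatten().topk(k).values.min()
    mask = (entropy >= threshold).float()                          # ambiguous regions only
    diff = (F.softmax(student_logits, dim=1) - t_prob).pow(2).sum(dim=1)
    return (diff * mask).sum() / (mask.sum() + 1e-8)

student = torch.randn(2, 4, 8, 16, 16, requires_grad=True)
teacher = torch.randn(2, 4, 8, 16, 16)
ambiguity_weighted_consistency(student, teacher).backward()
```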


Subject(s)
Benchmarking , Brain Neoplasms , Humans , Brain Neoplasms/diagnostic imaging , Consensus , Entropy , Heart Atria , Supervised Machine Learning , Image Processing, Computer-Assisted
18.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13553-13566, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37432804

ABSTRACT

Unsupervised domain adaptation has been widely adopted in tasks with scarce annotated data. Unfortunately, mapping the target-domain distribution to the source domain unconditionally may distort the essential structural information of the target-domain data, leading to inferior performance. To address this issue, we first propose to introduce active sample selection to assist domain adaptation for the semantic segmentation task. By innovatively adopting multiple anchors instead of a single centroid, both source and target domains can be better characterized as multimodal distributions, so that more complementary and informative samples can be selected from the target domain. With only a small workload for manually annotating these active samples, the distortion of the target-domain distribution can be effectively alleviated, achieving a large performance gain. In addition, a powerful semi-supervised domain adaptation strategy is proposed to alleviate the long-tail distribution problem and further improve the segmentation performance. Extensive experiments are conducted on public datasets, and the results demonstrate that the proposed approach outperforms state-of-the-art methods by large margins and achieves performance similar to the fully-supervised upper bound, i.e., 71.4% mIoU on GTA5 and 71.8% mIoU on SYNTHIA. The effectiveness of each component is also verified by thorough ablation studies.
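A compact way to picture the multi-anchor selection is: summarise the source features with several k-means centroids, then send for annotation the target samples that are far from every anchor. The sketch below does exactly that; the number of anchors, the distance measure, and the farthest-first budget rule are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_active_samples(source_feats, target_feats, n_anchors=5, budget=10):
    """Anchor-based active sample selection (illustrative sketch)."""
    anchors = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(source_feats).cluster_centers_
    # Distance of each target feature to its closest source anchor.
    dists = np.linalg.norm(target_feats[:, None, :] - anchors[None, :, :], axis=2).min(axis=1)
    return np.argsort(dists)[::-1][:budget]           # most "un-source-like" targets first

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 32))                       # toy source-domain features
tgt = rng.normal(loc=1.0, size=(100, 32))              # toy shifted target-domain features
print(select_active_samples(src, tgt))
```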

19.
IEEE Trans Med Imaging ; 42(12): 3579-3589, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37440389

ABSTRACT

Medical contrastive vision-language pretraining has shown great promise in many downstream tasks, such as data-efficient/zero-shot recognition. Current studies pretrain the network with a contrastive loss by treating the paired image-reports as positive samples and the unpaired ones as negative samples. However, unlike natural datasets, many medical images or reports from different cases can be highly similar, especially for normal cases, and treating all the unpaired ones as negative samples could undermine the learned semantic structure and impose an adverse effect on the representations. Therefore, we design a simple yet effective approach for better contrastive learning in the medical vision-language field. Specifically, by simplifying the computation of similarity between medical image-report pairs into the calculation of inter-report similarity, the image-report tuples are divided into positive, negative, and additional neutral groups. With this better categorization of samples, a more suitable contrastive loss is constructed. For evaluation, we perform extensive experiments by applying the proposed model-agnostic strategy to two state-of-the-art pretraining frameworks. The consistent improvements on four common downstream tasks, including cross-modal retrieval, zero-shot/data-efficient image classification, and image segmentation, demonstrate the effectiveness of the proposed strategy in the medical field.
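The grouping step can be sketched by thresholding pairwise report similarity: near-duplicate reports make their image-report pairs positives, clearly different reports make negatives, and the in-between neutral pairs are simply excluded from the contrastive loss instead of being pushed apart. The cosine measure and thresholds below are assumptions; the abstract does not give the actual values.

```python
import torch
import torch.nn.functional as F

def group_pairs_by_report_similarity(report_emb, pos_thr=0.9, neg_thr=0.5):
    """Split a batch of image-report pairs into positive / neutral / negative
    groups from inter-report similarity alone (illustrative sketch)."""
    sim = F.cosine_similarity(report_emb[:, None, :], report_emb[None, :, :], dim=-1)
    positive = sim >= pos_thr
    negative = sim <= neg_thr
    neutral = ~(positive | negative)      # excluded from the contrastive loss
    return positive, neutral, negative

reports = F.normalize(torch.randn(6, 128), dim=-1)      # toy report embeddings
pos, neu, neg = group_pairs_by_report_similarity(reports)
print(pos.sum().item(), neu.sum().item(), neg.sum().item())
```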


Subject(s)
Semantics , Triage , Language
20.
IEEE Trans Med Imaging ; 42(10): 3000-3011, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37145949

ABSTRACT

Pathological primary tumor (pT) stage focuses on the infiltration degree of the primary tumor into surrounding tissues, which relates to the prognosis and treatment choices. pT staging relies on fields of view at multiple magnifications in gigapixel images, which makes pixel-level annotation difficult. Therefore, this task is usually formulated as a weakly supervised whole slide image (WSI) classification task with slide-level labels. Existing weakly-supervised classification methods mainly follow the multiple instance learning paradigm, taking patches from a single magnification as the instances and extracting their morphological features independently. However, they cannot progressively represent the contextual information from multiple magnifications, which is critical for pT staging. Therefore, we propose a structure-aware hierarchical graph-based multi-instance learning framework (SGMF) inspired by the diagnostic process of pathologists. Specifically, a novel graph-based instance organization method is proposed, namely the structure-aware hierarchical graph (SAHG), to represent the WSI. Based on that, we design a novel hierarchical attention-based graph representation (HAGR) network to capture the critical patterns for pT staging by learning cross-scale spatial features. Finally, the top nodes of the SAHG are aggregated by a global attention layer for the bag-level representation. Extensive studies on three large-scale multi-center pT staging datasets with two different cancer types demonstrate the effectiveness of SGMF, which outperforms the state of the art by up to 5.6% in F1 score.
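The final aggregation step, a global attention layer over the top graph nodes, is structurally similar to gated-attention pooling in multiple instance learning; the sketch below shows that pooling pattern on toy node embeddings, with the dimensions and two-class head being assumptions and the graph construction itself omitted.

```python
import torch
import torch.nn as nn

class GlobalAttentionPooling(nn.Module):
    """Attention-based aggregation of node embeddings into one bag-level
    (slide-level) representation (illustrative sketch)."""
    def __init__(self, dim=256, hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, node_feats):                               # (n_nodes, dim)
        weights = torch.softmax(self.attn(node_feats), dim=0)    # importance of each node
        bag = (weights * node_feats).sum(dim=0)                  # weighted bag embedding
        return self.classifier(bag), weights

pool = GlobalAttentionPooling()
nodes = torch.randn(50, 256)        # top-node embeddings from the hierarchical graph
logits, attn = pool(nodes)
print(logits.shape, attn.shape)
```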


Subject(s)
Deep Learning , Image Processing, Computer-Assisted