ABSTRACT
Medical image segmentation plays a critical role in accurate diagnosis and treatment planning, enabling precise analysis across a wide range of clinical tasks. This review begins by offering a comprehensive overview of traditional segmentation techniques, including thresholding, edge-based methods, region-based approaches, clustering, and graph-based segmentation. While these methods are computationally efficient and interpretable, they often face significant challenges when applied to complex, noisy, or variable medical images. The central focus of this review is the transformative impact of deep learning on medical image segmentation. We delve into prominent deep learning architectures such as Convolutional Neural Networks (CNNs), Fully Convolutional Networks (FCNs), U-Net, Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and Autoencoders (AEs). Each architecture is analyzed in terms of its structural foundation and specific application to medical image segmentation, illustrating how these models have enhanced segmentation accuracy across various clinical contexts. Finally, the review examines the integration of deep learning with traditional segmentation methods, addressing the limitations of both approaches. These hybrid strategies offer improved segmentation performance, particularly in challenging scenarios involving weak edges, noise, or inconsistent intensities. By synthesizing recent advancements, this review provides a detailed resource for researchers and practitioners, offering valuable insights into the current landscape and future directions of medical image segmentation.
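As a concrete illustration of the classical baseline surveyed above, the following is a minimal sketch of threshold-based segmentation using Otsu's method in scikit-image; the input filename is a hypothetical placeholder.

```python
from skimage import filters, io

# Hypothetical grayscale medical image slice.
image = io.imread("slice.png", as_gray=True)

# Otsu's method picks the intensity threshold that maximizes the
# inter-class variance between foreground and background pixels.
threshold = filters.threshold_otsu(image)
mask = image > threshold  # binary segmentation mask

print(f"Otsu threshold: {threshold:.3f}, foreground fraction: {mask.mean():.2%}")
```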
ABSTRACT
Diffusion magnetic resonance imaging (dMRI) currently stands as the foremost noninvasive method for quantifying brain tissue microstructure and reconstructing white matter fiber pathways. However, the inherent free diffusion motion of water molecules in dMRI results in signal decay, diminishing the signal-to-noise ratio (SNR) and adversely affecting the accuracy and precision of microstructural data. In response to this challenge, we propose the Multiscale Fast Attention-Multibranch Irregular Convolutional Neural Network for dMRI image denoising. First, we introduce Multiscale Fast Channel Attention, a novel approach for efficient multiscale feature extraction with attention weight computation across feature channels, which enhances the model's capability to capture complex features and improves overall performance. Furthermore, we propose a multi-branch irregular convolutional architecture that effectively disrupts spatial noise correlation and captures noise features, further enhancing the denoising performance of the model. Lastly, we design a novel loss function that ensures excellent performance in both edge and flat regions. Experimental results demonstrate that the proposed method outperforms other state-of-the-art deep learning denoising methods in both quantitative and qualitative aspects of dMRI image denoising, with fewer parameters and faster operational speed.
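The paper's Multiscale Fast Channel Attention module is not specified in detail in this abstract; the sketch below shows only the generic squeeze-and-excitation-style channel attention pattern such modules build on, in PyTorch, with all dimensions chosen for illustration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: global average
    pooling followed by a small bottleneck MLP that produces one
    attention weight per feature channel."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # (B, C) per-channel weights
        return x * w.view(b, c, 1, 1)     # reweight the feature channels

feats = torch.randn(2, 32, 64, 64)        # illustrative feature map
print(ChannelAttention(32)(feats).shape)  # torch.Size([2, 32, 64, 64])
```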
Subject(s)
Diffusion Magnetic Resonance Imaging; Image Processing, Computer-Assisted; Neural Networks, Computer; Signal-to-Noise Ratio; Image Processing, Computer-Assisted/methods; Humans
ABSTRACT
Artificial intelligence (AI) and machine learning (ML) aim to mimic human intelligence and enhance decision-making processes across various fields. A key performance determinant of an ML model is the ratio between the training and testing datasets. This research investigates the impact of varying train-test split ratios on machine learning model performance and generalization capabilities using the BraTS 2013 dataset. Logistic regression, random forest, k-nearest neighbors, and support vector machine models were trained with split ratios ranging from 60:40 to 95:05. Findings reveal significant variations in accuracy across these ratios, emphasizing the critical need to strike a balance to avoid overfitting or underfitting. The study underscores the importance of selecting an optimal train-test split ratio that considers tradeoffs such as model performance metrics, statistical measures, and resource constraints. Ultimately, these insights contribute to a deeper understanding of how ratio selection impacts the effectiveness and reliability of machine learning applications across diverse fields.
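A minimal sketch of the experimental protocol described above, sweeping the train-test ratio from 60:40 to 95:05 with scikit-learn; a bundled stand-in dataset is used here, since the BraTS 2013 features themselves are not reproduced in this abstract.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in tabular dataset; the study itself used BraTS 2013 data.
X, y = load_breast_cancer(return_X_y=True)

for test_size in (0.40, 0.30, 0.20, 0.10, 0.05):  # 60:40 up to 95:05
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0, stratify=y)
    model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"train:test = {100 * (1 - test_size):.0f}:{100 * test_size:.0f} "
          f"-> accuracy = {acc:.3f}")
```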
ABSTRACT
Polyp segmentation remains challenging for two reasons: (a) the size and shape of colon polyps are variable and diverse; (b) the distinction between polyps and mucosa is not obvious. To solve these two challenging problems and enhance the generalization ability of the segmentation method, we propose the Localized Transformer Fusion with Balanced Constraint (BCL-Former) for polyp segmentation. In BCL-Former, the Strip Local Enhancement module (SLE module) is proposed to capture enhanced local features. The Progressive Feature Fusion module (PFF module) is presented to make feature aggregation smoother and eliminate the difference between high-level and low-level features. Moreover, the Tversky-based Appropriate Constrained Loss (TacLoss) is proposed to achieve balance and constraint between true positives and false negatives, improving the ability to generalize across datasets. Extensive experiments are conducted on four benchmark datasets. Results show that our proposed method achieves state-of-the-art performance in both segmentation precision and generalization ability. The proposed method is also 5%-8% faster than the benchmark method in training and inference. The code is available at: https://github.com/sjc-lbj/BCL-Former.
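The exact TacLoss formulation is not given in this abstract; the sketch below implements the standard Tversky loss it extends, in which setting beta larger than alpha penalizes false negatives (missed polyp pixels) more heavily. Tensors and hyperparameter values are illustrative.

```python
import torch

def tversky_loss(pred: torch.Tensor, target: torch.Tensor,
                 alpha: float = 0.3, beta: float = 0.7,
                 eps: float = 1e-6) -> torch.Tensor:
    """Tversky loss: alpha weights false positives, beta weights false
    negatives; beta > alpha biases the model against missing lesions."""
    pred, target = pred.flatten(), target.flatten()
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

probs = torch.rand(1, 1, 64, 64)                     # hypothetical sigmoid outputs
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()      # hypothetical ground truth
print(tversky_loss(probs, mask).item())
```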
ABSTRACT
Background: Dental caries is one of the most prevalent conditions in dentistry worldwide. Early identification and classification of dental caries are essential for effective prevention and treatment. Panoramic dental radiographs are commonly used to screen for overall oral health, including dental caries and tooth anomalies. However, manual interpretation of these radiographs can be time-consuming and prone to human error. Therefore, an automated classification system could help streamline diagnostic workflows and provide timely insights for clinicians. Methods: This article presents a deep learning-based, custom-built model for the binary classification of panoramic dental radiographs. The use of histogram equalization and filtering methods as preprocessing techniques effectively addresses issues related to irregular illumination and contrast in dental radiographs, enhancing overall image quality. By incorporating three separate panoramic dental radiograph datasets, the model benefits from a diverse dataset that improves its training and evaluation across a wide range of caries and abnormalities. Results: The dental radiograph analysis model is designed for binary classification to detect the presence of dental caries, restorations, and periapical region abnormalities, achieving accuracies of 97.01%, 81.63%, and 77.53%, respectively. Conclusions: The proposed algorithm extracts discriminative features from dental radiographs, detecting subtle patterns indicative of tooth caries, restorations, and region-based abnormalities. Automating this classification could assist dentists in the early detection of caries and anomalies, aid in treatment planning, and enhance the monitoring of dental diseases, ultimately improving patients' oral healthcare.
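A minimal sketch of the kind of preprocessing described above (histogram equalization plus filtering) using OpenCV. The paper's exact filter choices are not stated, so contrast-limited adaptive histogram equalization (CLAHE) and a median filter are used as common stand-ins; the filename is hypothetical.

```python
import cv2

# Hypothetical panoramic radiograph, read as 8-bit grayscale.
img = cv2.imread("panoramic.png", cv2.IMREAD_GRAYSCALE)

# CLAHE normalizes irregular illumination by equalizing local tiles
# with a clip limit that prevents noise over-amplification.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
equalized = clahe.apply(img)

# A small median filter suppresses residual salt-and-pepper noise.
denoised = cv2.medianBlur(equalized, 3)
cv2.imwrite("preprocessed.png", denoised)
```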
ABSTRACT
BACKGROUND: Chest X-ray image classification for multiple diseases is an important research direction in the field of computer vision and medical image processing. It aims to utilize advanced image processing techniques and deep learning algorithms to automatically analyze and identify X-ray images, determining whether specific pathologies or structural abnormalities exist in the images. OBJECTIVE: We present the MMPDenseNet network, designed specifically for chest multi-label disease classification. METHODS: Initially, the network employs the adaptive activation function Meta-ACON to enhance feature representation. Subsequently, the network incorporates a multi-head self-attention mechanism, merging the conventional convolutional neural network with the Transformer, thereby bolstering the ability to extract both local and global features. Ultimately, the network integrates a pyramid squeeze attention module to capture spatial information and enrich the feature space. RESULTS: The concluding experiment yielded an average AUC of 0.898, marking an average accuracy improvement of 0.6% over the baseline model. Compared with the original network, the experimental results highlight that MMPDenseNet considerably elevates the classification accuracy of various chest diseases. CONCLUSION: The network thus holds substantial value for clinical applications.
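MMPDenseNet's internals are not reproduced in this abstract; the sketch below illustrates only the general CNN-plus-Transformer pattern it describes: flattening a convolutional feature map into a token sequence and applying multi-head self-attention for global interactions (PyTorch, illustrative dimensions).

```python
import torch
import torch.nn as nn

class ConvSelfAttention(nn.Module):
    """Flatten a CNN feature map into tokens and apply multi-head
    self-attention, then fold back to the spatial layout with a
    residual connection."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)   # global interactions
        return out.transpose(1, 2).view(b, c, h, w) + x

feats = torch.randn(2, 64, 16, 16)        # illustrative CNN features
print(ConvSelfAttention(64)(feats).shape) # torch.Size([2, 64, 16, 16])
```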
Subject(s)
Neural Networks, Computer; Radiography, Thoracic; Humans; Radiography, Thoracic/methods; Deep Learning; Algorithms; Radiographic Image Interpretation, Computer-Assisted/methods; Image Processing, Computer-Assisted/methods
ABSTRACT
Hyperspectral imaging (HSI) has demonstrated its potential to provide correlated spatial and spectral information of a sample through a non-contact and non-invasive technology. In the medical field, especially in histopathology, HSI has been applied to the classification and identification of diseased tissue and to the characterization of its morphological properties. In this work, we propose a hybrid scheme to classify non-tumor and tumor histological brain samples by hyperspectral imaging. The proposed approach is based on the identification of characteristic components in a hyperspectral image by linear unmixing, as a feature-engineering step, and subsequent classification by a deep learning approach. For this last step, an ensemble of deep neural networks is evaluated through a cross-validation scheme on an augmented dataset and a transfer learning scheme. The proposed method can classify histological brain samples with an average accuracy of 88%, with reduced variability, computational cost, and inference times, which presents an advantage over state-of-the-art methods. Hence, this work demonstrates the potential of hybrid classification methodologies to achieve robust and reliable results by combining linear unmixing for feature extraction and deep learning for classification.
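A minimal sketch of linear unmixing as a feature-engineering step: each pixel spectrum is decomposed into non-negative abundances over endmember spectra, and the abundance vector becomes the classifier input. The endmembers below are random stand-ins solved with non-negative least squares; the paper identifies its characteristic components from the data itself.

```python
import numpy as np
from scipy.optimize import nnls

bands, n_endmembers = 128, 4
rng = np.random.default_rng(0)

# Stand-in endmember matrix (one column per characteristic spectrum).
E = np.abs(rng.standard_normal((bands, n_endmembers)))

# Synthetic pixel: a noisy mixture with known abundances.
pixel = E @ np.array([0.6, 0.3, 0.1, 0.0]) + 0.01 * rng.standard_normal(bands)

# Solve min ||E a - pixel|| subject to a >= 0.
abundances, residual = nnls(E, pixel)
print("estimated abundances:", np.round(abundances, 3))
```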
ABSTRACT
The application of surgical robotics in the field of minimally invasive surgery has developed rapidly and has attracted increasing research attention in recent years. A common consensus has been reached that surgical procedures should become less traumatic while incorporating more intelligence and higher autonomy, which poses a serious challenge to the environmental sensing capabilities of robotic systems. One of the main sources of environmental information for robots is images, which are the basis of robot vision. In this review article, we divide clinical images into direct and indirect categories based on the object of information acquisition, and into continuous, intermittently continuous, and discontinuous categories according to the target-tracking frequency. The characteristics and applications of existing surgical robots in each category are introduced along these two dimensions. Our purpose in conducting this review was to analyze, summarize, and discuss the current evidence on the general rules governing the application of image technologies for medical purposes. Our analysis provides insight and guidance conducive to the development of more advanced surgical robotic systems in the future.
ABSTRACT
Brain tumors occur due to the expansion of abnormal cell tissues and can be malignant (cancerous) or benign (not cancerous). Numerous factors, such as position, size, and progression rate, are considered while detecting and diagnosing brain tumors. Detecting brain tumors in their initial phases is vital for diagnosis, where MRI (magnetic resonance imaging) scans play an important role. Over the years, deep learning models have been extensively used for medical image processing. The current study primarily investigates novel Fine-Tuned Vision Transformer models (FTVTs), namely FTVT-b16, FTVT-b32, FTVT-l16, and FTVT-l32, for brain tumor classification, while also comparing them with other established deep learning models such as ResNet-50, MobileNet-V2, and EfficientNet-B0. A dataset with 7,023 images (MRI scans) categorized into four different classes, namely glioma, meningioma, pituitary, and no tumor, is used for classification. Further, the study presents a comparative analysis of these models, including their accuracies and other evaluation metrics such as recall, precision, and F1-score across each class. The deep learning models ResNet-50, EfficientNet-B0, and MobileNet-V2 obtained accuracies of 96.5%, 95.1%, and 94.9%, respectively. Among all the FTVT models, FTVT-l16 achieved a remarkable accuracy of 98.70%, whereas the other FTVT models, FTVT-b16, FTVT-b32, and FTVT-l32, achieved accuracies of 98.09%, 96.87%, and 98.62%, respectively, proving the efficacy and robustness of FTVTs in medical image processing.
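The FTVT models' exact fine-tuning recipe is not given in this abstract; the sketch below shows the common pattern such work builds on: loading a pretrained ViT-B/16 from torchvision and replacing its classification head for the four tumor classes.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load ImageNet-pretrained ViT-B/16 and swap in a 4-way head
# (glioma, meningioma, pituitary, no tumor).
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, 4)

x = torch.randn(1, 3, 224, 224)  # stand-in MRI slice resized to ViT input
print(model(x).shape)            # torch.Size([1, 4])
```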
ABSTRACT
Background and Objective: Coronary artery disease remains a leading cause of mortality among individuals with cardiovascular conditions. The therapeutic use of bioresorbable vascular scaffolds (BVSs) through stent implantation is common, yet the effectiveness of current BVS segmentation techniques for Intravascular Optical Coherence Tomography (IVOCT) images is inadequate. Methods: To address these challenges, we developed a novel Wavelet-based U-shape network that incorporates an Attention Gate (AG) and an Atrous Multi-scale Field Module (AMFM), designed to enhance segmentation accuracy by improving the differentiation between stent struts and the surrounding tissue. A unique wavelet fusion module mitigates the semantic gaps between different feature map branches, facilitating more effective feature integration. Results: Extensive experiments demonstrate that our model surpasses existing techniques in key metrics such as Dice coefficient, accuracy, sensitivity, and Intersection over Union (IoU), achieving scores of 85.10%, 99.77%, 86.93%, and 73.81%, respectively. The integration of the AG, AMFM, and fusion module played a crucial role in achieving these outcomes, indicating a significant enhancement in capturing detailed contextual information. Conclusion: The Wavelet-based U-shape network marks a substantial improvement in the segmentation of BVSs in IVOCT images, suggesting potential benefits for clinical practice in coronary artery disease treatment. The approach may also be applicable to other intricate medical imaging segmentation tasks, indicating broad scope for future research.
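The paper's AG module follows the additive attention-gate idea; below is a sketch of the standard formulation (Oktay et al.) in PyTorch, with illustrative shapes and the gating signal assumed to be already resized to match the skip features.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate: a coarse gating signal suppresses
    irrelevant skip-connection features before decoding."""
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1),
                                 nn.Sigmoid())

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        a = self.psi(torch.relu(self.w_x(x) + self.w_g(g)))  # (B,1,H,W) map
        return x * a                                          # gated skip features

skip = torch.randn(2, 64, 32, 32)   # encoder skip features
gate = torch.randn(2, 64, 32, 32)   # decoder gating signal (same size here)
print(AttentionGate(64, 64, 32)(skip, gate).shape)
```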
ABSTRACT
The performance of existing lesion semantic segmentation models has shown steady improvement with the introduction of mechanisms like attention, skip connections, and deep supervision. However, these advancements often come at the expense of computational requirements, necessitating powerful graphics processing units with substantial video memory. Consequently, certain models may exhibit poor or non-existent performance on more affordable edge devices, such as smartphones and other point-of-care devices. To tackle this challenge, our paper introduces a lesion segmentation model with a low parameter count and minimal operations. This model incorporates polar transformations to simplify images, facilitating faster training and improved performance. We leverage the characteristics of polar images by directing the model's focus to the areas most likely to contain segmentation information, achieved through the introduction of a learning-efficient polar-based contrast attention (PCA). This design utilizes Hadamard products to implement a lightweight attention mechanism without significantly increasing model parameters and complexity. Furthermore, we present a novel skip cross-channel aggregation (SC2A) approach for sharing cross-channel corrections, introducing Gaussian depthwise convolution to enhance nonlinearity. Extensive experiments on the ISIC 2018 and Kvasir datasets demonstrate that our model surpasses state-of-the-art models while maintaining only about 25K parameters. Additionally, our proposed model exhibits strong generalization to cross-domain data, as confirmed through experiments on the PH2 and CVC-Polyp datasets. We also evaluate the model's performance in a mobile setting against other lightweight models. Notably, our proposed model outperforms other advanced models in terms of IoU, Dice score, and running time.
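A minimal sketch of the polar transformation step using OpenCV's warpPolar, which remaps pixels so that radial structure around the lesion center becomes horizontal, concentrating boundary information. The lesion is assumed to be roughly centered, and the filename is hypothetical.

```python
import cv2

# Hypothetical dermoscopy image.
img = cv2.imread("lesion.png")
h, w = img.shape[:2]

# Assume the lesion is roughly centered; remap an inscribed disk
# so that radius maps to one axis and angle to the other.
center = (w / 2, h / 2)
radius = min(h, w) / 2
polar = cv2.warpPolar(img, (w, h), center, radius, cv2.WARP_POLAR_LINEAR)
cv2.imwrite("lesion_polar.png", polar)
```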
Subject(s)
Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Deep Learning; Algorithms
ABSTRACT
The incorporation of automatic segmentation methodologies into dental X-ray image analysis has refined the paradigms of clinical diagnostics and therapeutic planning by facilitating meticulous, pixel-level articulation of both dental structures and proximate tissues. This underpins the pillars of early pathological detection and meticulous disease progression monitoring. Nonetheless, conventional segmentation frameworks often encounter significant setbacks attributable to the intrinsic limitations of X-ray imaging, including compromised image fidelity, obscured delineation of structural boundaries, and the intricate anatomical structures of dental constituents such as pulp, enamel, and dentin. To surmount these impediments, we propose the Deformable Convolution and Mamba Integration Network, an innovative 2D dental X-ray image segmentation architecture, which amalgamates a Coalescent Structural Deformable Encoder, a Cognitively-Optimized Semantic Enhance Module, and a Hierarchical Convergence Decoder. Collectively, these components bolster the management of multi-scale global features, fortify the stability of feature representation, and refine the amalgamation of feature vectors. A comparative assessment against 14 baselines underscores its efficacy, registering a 0.95% enhancement in the Dice Coefficient and a reduction of the 95th percentile Hausdorff Distance to 7.494.
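Deformable convolution, the building block named above, samples the input at learned offsets rather than on a fixed grid, which suits irregular anatomical boundaries. A minimal sketch using torchvision's DeformConv2d follows, with illustrative channel counts; the paper's own encoder design is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# A small conv predicts 2 offsets (x, y) per position for each of the
# 3x3 kernel taps; the deformable conv then samples at those locations.
offset_pred = nn.Conv2d(16, 2 * 3 * 3, kernel_size=3, padding=1)
deform = DeformConv2d(16, 32, kernel_size=3, padding=1)

x = torch.randn(1, 16, 64, 64)       # illustrative feature map
out = deform(x, offset_pred(x))
print(out.shape)                     # torch.Size([1, 32, 64, 64])
```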
Subject(s)
Image Processing, Computer-Assisted; Humans; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Algorithms; Tooth/diagnostic imaging
ABSTRACT
In clinical settings limited by equipment, attaining lightweight skin lesion segmentation is pivotal, as it facilitates the integration of the model into diverse medical devices, thereby enhancing operational efficiency. However, the lightweight design of a model may suffer accuracy degradation, especially when dealing with complex images such as skin lesion images with irregular regions, blurred boundaries, and oversized boundaries. To address these challenges, we propose an efficient lightweight attention network (ELANet) for the skin lesion segmentation task. In ELANet, two different attention mechanisms in the bilateral residual module (BRM) provide complementary information, enhancing sensitivity to features in the spatial and channel dimensions, respectively; multiple BRMs are then stacked for efficient feature extraction from the input. In addition, the network acquires global information and improves segmentation accuracy by passing feature maps of different scales through multi-scale attention fusion (MAF) operations. Finally, we evaluate the performance of ELANet on three publicly available datasets, ISIC2016, ISIC2017, and ISIC2018; the experimental results show that our algorithm achieves mIoU of 89.87%, 81.85%, and 82.87% on the three datasets with only 0.459 M parameters, striking an excellent balance between accuracy and lightness and outperforming many existing segmentation methods.
Subject(s)
Algorithms; Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Skin/diagnostic imaging; Skin/pathology
ABSTRACT
BACKGROUND: The incidence of kidney tumors is progressively increasing each year. The precision of segmentation for kidney tumors is crucial for diagnosis and treatment. OBJECTIVE: To enhance accuracy and reduce manual involvement, we propose a deep learning-based method for the automatic segmentation of kidneys and kidney tumors in CT images. METHODS: The proposed method comprises two parts: object detection and segmentation. We first use a model to detect the position of the kidney, then narrow the segmentation range, and finally use an attentional recurrent residual convolutional network for segmentation. RESULTS: Our model achieved a kidney Dice score of 0.951 and a tumor Dice score of 0.895 on the KiTS19 dataset. Experimental results show that our model significantly improves the accuracy of kidney and kidney tumor segmentation and outperforms other advanced methods. CONCLUSION: The proposed method provides an efficient and automatic solution for accurately segmenting kidneys and renal tumors on CT images. Additionally, this study can assist radiologists in assessing patients' conditions and making informed treatment decisions.
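The reported Dice scores follow the standard overlap definition, 2|A∩B| / (|A| + |B|); a minimal NumPy sketch on synthetic masks:

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Synthetic overlapping squares standing in for predicted and true masks.
pred = np.zeros((128, 128), dtype=np.uint8); pred[30:90, 30:90] = 1
truth = np.zeros((128, 128), dtype=np.uint8); truth[40:100, 40:100] = 1
print(f"Dice: {dice_score(pred, truth):.3f}")
```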
Subject(s)
Deep Learning; Kidney Neoplasms; Tomography, X-Ray Computed; Humans; Kidney Neoplasms/diagnostic imaging; Kidney Neoplasms/pathology; Tomography, X-Ray Computed/methods; Kidney/diagnostic imaging; Algorithms; Neural Networks, Computer; Image Processing, Computer-Assisted/methods
ABSTRACT
Medical image segmentation commonly involves diverse tissue types and structures, including tasks such as blood vessel segmentation and nerve fiber bundle segmentation. Enhancing the continuity of segmentation outcomes represents a pivotal challenge in medical image segmentation, driven by the demands of clinical applications such as disease localization and quantification. In this study, a novel segmentation model is specifically designed for retinal vessel segmentation, leveraging vessel orientation information, boundary constraints, and continuity constraints to improve segmentation accuracy. To achieve this, we cascade U-Net with a long short-term memory (LSTM) network. U-Net is characterized by a small number of parameters and high segmentation efficiency, while LSTM offers a parameter-sharing capability. Additionally, we introduce an orientation information enhancement module, inserted into the model's bottom layer, to obtain feature maps containing orientation information through an orientation convolution operator. Furthermore, we design a new hybrid loss function that consists of connectivity loss, boundary loss, and cross-entropy loss. Experimental results demonstrate that the model achieves excellent segmentation outcomes across three widely recognized retinal vessel segmentation datasets: CHASE_DB1, DRIVE, and ARIA.
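The paper's connectivity and boundary loss definitions are not given in this abstract; the sketch below shows only the general pattern of a weighted hybrid loss, combining cross-entropy with a morphological-gradient boundary term as an illustrative stand-in (PyTorch), and omitting the connectivity term.

```python
import torch
import torch.nn.functional as F

def soft_edges(mask: torch.Tensor) -> torch.Tensor:
    """Approximate boundary map as a morphological gradient
    (dilation minus erosion), both via max pooling on a soft mask."""
    dilated = F.max_pool2d(mask, 3, stride=1, padding=1)
    eroded = -F.max_pool2d(-mask, 3, stride=1, padding=1)
    return dilated - eroded

def hybrid_loss(prob: torch.Tensor, target: torch.Tensor,
                w_ce: float = 1.0, w_bd: float = 0.5) -> torch.Tensor:
    ce = F.binary_cross_entropy(prob, target)
    bd = F.l1_loss(soft_edges(prob), soft_edges(target))  # boundary agreement
    return w_ce * ce + w_bd * bd

prob = torch.rand(1, 1, 64, 64)                  # hypothetical predictions
target = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(hybrid_loss(prob, target).item())
```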
Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer; Retinal Vessels; Humans; Retinal Vessels/diagnostic imaging; Image Processing, Computer-Assisted/methods; Algorithms; Databases, Factual
ABSTRACT
The evaluation of mammographic breast density, a critical indicator of breast cancer risk, is traditionally performed by radiologists via visual inspection of mammography images, utilizing the Breast Imaging-Reporting and Data System (BI-RADS) breast density categories. However, this method is subject to substantial interobserver variability, leading to inconsistencies and potential inaccuracies in density assessment and subsequent risk estimations. To address this, we present a deep learning-based automatic detection algorithm (DLAD) designed for the automated evaluation of breast density. Our multicentric, multi-reader study leverages a diverse dataset of 122 full-field digital mammography studies (488 images in CC and MLO projections) sourced from three institutions. We invited two experienced radiologists to conduct a retrospective analysis, establishing a ground truth for 72 mammography studies (BI-RADS class A: 18, BI-RADS class B: 43, BI-RADS class C: 7, BI-RADS class D: 4). The efficacy of the DLAD was then compared to the performance of five independent radiologists with varying levels of experience. The DLAD showed robust performance, achieving an accuracy of 0.819 (95% CI: 0.736-0.903), along with an F1 score of 0.798 (0.594-0.905), precision of 0.806 (0.596-0.896), recall of 0.830 (0.650-0.946), and a Cohen's kappa (κ) of 0.708 (0.562-0.841), matching and in four cases exceeding the performance of the individual radiologists. The statistical analysis did not reveal a significant difference in accuracy between the DLAD and the radiologists, underscoring the model's competitive diagnostic alignment with professional radiologist assessments. These results demonstrate that the deep learning-based automatic detection algorithm can enhance the accuracy and consistency of breast density assessments, offering a reliable tool for improving breast cancer screening outcomes.
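A minimal sketch of the reported agreement metrics computed with scikit-learn, on stand-in BI-RADS density labels (classes A-D encoded as 0-3); the label values below are fabricated for illustration only.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

ground_truth = [0, 1, 1, 2, 3, 1, 0, 1, 2, 1]  # stand-in radiologist consensus
predictions  = [0, 1, 1, 2, 3, 1, 0, 0, 2, 1]  # stand-in DLAD outputs

print("accuracy:", accuracy_score(ground_truth, predictions))
print("macro F1:", round(f1_score(ground_truth, predictions, average="macro"), 3))
print("Cohen's kappa:", round(cohen_kappa_score(ground_truth, predictions), 3))
```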
ABSTRACT
BACKGROUND: The rapid development of deep learning techniques has greatly improved the performance of medical image segmentation, and medical image segmentation networks based on convolutional neural networks (CNNs) and Transformers have been widely used in this field. However, due to the restricted receptive field of the convolution operation and the limited local fine-detail extraction ability of the self-attention mechanism in Transformers, current neural networks with a purely convolutional or Transformer backbone still perform poorly in medical image segmentation. METHODS: In this paper, we propose FDB-Net (Fusion Double Branch Network), a double-branch medical image segmentation network combining a CNN and a Transformer. By using a CNN containing gnConv blocks and a Transformer containing Varied-Size Window Attention (VWA) blocks as the feature extraction backbone, the dual-path encoder ensures that the network has a global receptive field as well as access to local detail features of the target. We also propose a new feature fusion module (Deep Feature Fusion, DFF), which helps the network fuse features from the two structurally different encoders during the encoding process, ensuring the effective fusion of global and local image information. CONCLUSION: Our model achieves advanced results on all three typical medical image segmentation tasks, which fully validates the effectiveness of FDB-Net.
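The DFF module's internals are not given in this abstract; the sketch below shows a common concatenate-and-mix fusion of CNN and Transformer branch features in PyTorch, with illustrative channel counts, as one plausible instance of dual-branch fusion.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Fuse feature maps from a CNN branch and a Transformer branch by
    concatenating along channels and mixing with a 1x1 convolution."""
    def __init__(self, cnn_ch: int, trans_ch: int, out_ch: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(cnn_ch + trans_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_cnn: torch.Tensor, f_trans: torch.Tensor) -> torch.Tensor:
        return self.mix(torch.cat([f_cnn, f_trans], dim=1))

f_local = torch.randn(2, 64, 32, 32)    # CNN branch (local detail)
f_global = torch.randn(2, 96, 32, 32)   # Transformer branch (global context)
print(DualBranchFusion(64, 96, 128)(f_local, f_global).shape)
```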
Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Deep Learning; Algorithms; Tomography, X-Ray Computed/methods
ABSTRACT
Trabecular bone analysis plays a crucial role in understanding bone health and disease, with applications such as osteoporosis diagnosis. This paper presents a comprehensive study on 3D trabecular computed tomography (CT) image restoration, addressing significant challenges in this domain. The research introduces a backbone model, Cascade-SwinUNETR, for single-view 3D CT image restoration, which leverages deep layer aggregation with supervision and the capabilities of the Swin Transformer to excel in feature extraction. This study also introduces DVSR3D, a dual-view restoration model that achieves good performance through deep feature fusion with attention mechanisms and autoencoders. Furthermore, an unsupervised domain adaptation (UDA) method is introduced, allowing models to adapt to input data distributions without additional labels, which holds significant potential for real-world medical applications and eliminates the need for invasive data collection procedures. The study also includes the curation of a new dual-view dataset for CT image restoration, addressing the scarcity of real human bone data in micro-CT. Finally, the dual-view approach is validated through downstream medical bone microstructure measurements. Our contributions open several paths for trabecular bone analysis, promising improved clinical outcomes in bone health assessment and diagnosis.
Subject(s)
Cancellous Bone; Deep Learning; Imaging, Three-Dimensional; Tomography, X-Ray Computed; Humans; Tomography, X-Ray Computed/methods; Imaging, Three-Dimensional/methods; Cancellous Bone/diagnostic imaging
ABSTRACT
Three-dimensional vessel model reconstruction from patient-specific magnetic resonance angiography (MRA) images often requires manual maneuvers. This study aimed to establish a deep learning (DL)-based method for vessel model reconstruction. Time-of-flight MRA of 40 patients with internal carotid artery aneurysms was prepared, and three-dimensional vessel models were constructed using the threshold and region-growing method. Using those datasets, supervised deep learning with a 2D U-net was performed to reconstruct 3D vessel models. The accuracy of the DL-based vessel segmentations was assessed using 20 MRA images outside the training dataset. The Dice coefficient was used as the indicator of model accuracy, and blood flow simulation was performed using the DL-based vessel model. The created DL model could successfully reconstruct a three-dimensional model in all 60 cases. The Dice coefficient on the test dataset was 0.859. Of note, the DL-generated model proved its efficacy even for large aneurysms (> 10 mm in diameter). The reconstructed model was suitable for blood flow simulation to assist clinical decision-making. Our DL-based method could successfully reconstruct three-dimensional vessel models with moderate accuracy. Future studies are warranted to demonstrate that DL-based technology can advance medical image processing.
Subject(s)
Deep Learning; Imaging, Three-Dimensional; Magnetic Resonance Angiography; Humans; Magnetic Resonance Angiography/methods; Imaging, Three-Dimensional/methods; Male; Female; Middle Aged; Aged; Adult; Carotid Artery, Internal/diagnostic imaging; Image Processing, Computer-Assisted/methods; Intracranial Aneurysm/diagnostic imaging; Intracranial Aneurysm/physiopathology
ABSTRACT
Congenital heart defects (CHD) are one of the serious problems that arise during pregnancy. Early CHD detection reduces death rates and morbidity but is hampered by the relatively low detection rates (i.e., 60%) of current screening technology. The detection rate could be increased by supplementing ultrasound imaging with fetal ultrasound image evaluation (FUSI) using deep learning techniques. As a result, the non-invasive fetal ultrasound image has clear potential in the diagnosis of CHD and should be considered in addition to fetal echocardiography. This review paper highlights cutting-edge technologies for detecting CHD using ultrasound images, involving pre-processing, localization, segmentation, and classification. Existing preprocessing techniques include spatial domain filters, non-linear mean filters, transform domain filters, and denoising methods based on Convolutional Neural Networks (CNNs); segmentation techniques include thresholding-based techniques, region-growing-based techniques, edge detection techniques, Artificial Neural Network (ANN)-based segmentation methods, non-deep-learning approaches, and deep learning approaches. The paper also suggests future research directions for improving current methodologies.