ABSTRACT
PURPOSE: Computer-aided diagnosis (CAD) systems for breast ultrasound (BUS) aim to increase the efficiency and effectiveness of breast screening, helping specialists detect and classify breast lesions. CAD system development requires a set of annotated images, including lesion segmentations, biopsy results to specify benign and malignant cases, and BI-RADS categories to indicate the likelihood of malignancy. In addition, standardized partitions of training, validation, and test sets promote reproducibility and fair comparisons between different approaches. Thus, we present a publicly available BUS dataset whose novelty is the substantial increment of cases with the above-mentioned annotations and the inclusion of standardized partitions to objectively assess and compare CAD systems. ACQUISITION AND VALIDATION METHODS: The BUS dataset comprises 1875 anonymized images from 1064 female patients acquired with four ultrasound scanners during systematic studies at the National Institute of Cancer (Rio de Janeiro, Brazil). The dataset includes biopsy-proven tumors divided into 722 benign and 342 malignant cases. In addition, a senior ultrasonographer assessed every case with a BI-RADS category from 2 to 5. The ultrasonographer also manually outlined the breast lesions to obtain ground-truth segmentations. Furthermore, 5- and 10-fold cross-validation partitions are provided to standardize the training and test sets used to evaluate and reproduce CAD systems. Finally, to validate the utility of the BUS dataset, an evaluation framework is implemented to assess the performance of deep neural networks for segmenting and classifying breast lesions. DATA FORMAT AND USAGE NOTES: The BUS dataset is publicly available for academic and research purposes through an open-access repository under the name BUS-BRA: A Breast Ultrasound Dataset for Assessing CAD Systems. BUS images and reference segmentations are saved as Portable Network Graphics (PNG) files, and the dataset information is stored in separate Comma-Separated Values (CSV) files. POTENTIAL APPLICATIONS: The BUS-BRA dataset can be used to develop and assess artificial intelligence-based lesion detection and segmentation methods, as well as the classification of BUS images into pathological classes and BI-RADS categories. Other potential applications include developing image processing methods, such as despeckle filtering and contrast enhancement, to improve image quality, and feature engineering for image description.
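As a rough orientation for new users, the snippet below sketches how a PNG-plus-CSV dataset of this kind could be loaded in Python; the folder layout and the column names (ID, Pathology, BIRADS) are illustrative assumptions, not the documented BUS-BRA schema.

```python
# Minimal loading sketch under assumed file and column names.
from pathlib import Path

import pandas as pd
from PIL import Image

root = Path("BUS-BRA")                       # local copy of the dataset
meta = pd.read_csv(root / "bus_data.csv")    # per-image annotations (assumed file name)

for _, row in meta.iterrows():
    img = Image.open(root / "Images" / f"{row['ID']}.png")   # B-mode BUS image
    mask = Image.open(root / "Masks" / f"{row['ID']}.png")   # reference segmentation
    label = row["Pathology"]                                 # benign / malignant (assumed column)
    birads = row["BIRADS"]                                   # BI-RADS 2-5 (assumed column)
    # ...feed (img, mask, label, birads) into a training or evaluation pipeline
```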
Subjects
Artificial Intelligence, Breast Neoplasms, Female, Humans, Reproducibility of Results, Brazil, Breast Ultrasonography/methods, Computers, Breast Neoplasms/diagnostic imaging
ABSTRACT
Around the world, citrus production and quality are threatened by diseases caused by fungi, bacteria, and viruses. Citrus growers are currently demanding technological solutions to reduce the economic losses caused by citrus diseases. In this context, image analysis techniques have been widely used to detect citrus diseases, extracting discriminant features from an input image to distinguish between healthy and abnormal cases. The dataset presented in this article is helpful for training, validating, and comparing citrus abnormality detection algorithms. The data collection comprises 953 color images of orange leaves from the Citrus sinensis (L.) Osbeck species, covering 12 nutritional deficiencies and diseases, and it supports the development of automatic detection methods that can reduce economic losses in citrus production.
ABSTRACT
This article presents a learning algorithm for dendrite morphological neurons (DMN) based on stochastic gradient descent (SGD). In particular, we focus on a DMN topology that comprises spherical dendrites, smooth maximum activation function nodes, and a softmax output layer, whose original learning algorithm is performed in two independent stages: (1) the dendrites' centroids are learned by k-means, and (2) the softmax layer weights are adjusted by gradient descent. A drawback of this learning method is that both stages are decoupled; once the dendrites' centroids are defined, they remain static during weight learning, so no feedback is provided to correct the dendrites' positions and improve classification performance. To overcome this issue, we derive the delta rules for adjusting the dendrites' centroids and the output layer weights by minimizing the cross-entropy loss function under an SGD scheme. This gradient descent-based learning is feasible because the smooth maximum activation function that interfaces the dendrite units with the output layer is differentiable. The proposed DMN is compared against eight morphological neuron models with distinct topologies and learning methods and four well-established classifiers: support vector machine (SVM), multilayer perceptron (MLP), random forest (RF), and k-nearest neighbors (k-NN). In addition, the classification performance is evaluated on 81 datasets. The experimental results show that the proposed method tends to outperform the DMN methods and is competitive with or even better than SVM, MLP, RF, and k-NN. Thus, it is an alternative approach that can effectively be used for pattern classification. Moreover, SGD-based learning brings DMN training in line with the standard practice used for current artificial neural networks.
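To make the joint optimization concrete, here is a minimal sketch of spherical dendrites, a log-sum-exp smooth maximum, and a softmax layer trained end-to-end with SGD in PyTorch. The response and smooth-maximum forms, the grouping of dendrites per class, and all hyperparameters are common-choice assumptions, not the paper's exact formulation.

```python
import torch

torch.manual_seed(0)
n_feat, n_classes, dend_per_class, alpha = 2, 3, 4, 5.0

# Learnable dendrite centroids (one group per class) and softmax-layer weights.
centroids = torch.nn.Parameter(torch.randn(n_classes, dend_per_class, n_feat))
W = torch.nn.Parameter(torch.eye(n_classes))
b = torch.nn.Parameter(torch.zeros(n_classes))
opt = torch.optim.SGD([centroids, W, b], lr=0.1)

def forward(x):                                                   # x: (batch, n_feat)
    # Spherical dendrite response: negative squared distance to each centroid.
    d2 = ((x[:, None, None, :] - centroids[None]) ** 2).sum(-1)   # (batch, class, dendrite)
    # Smooth maximum over each class's dendrites (log-sum-exp form, differentiable).
    smooth_max = torch.logsumexp(alpha * -d2, dim=2) / alpha      # (batch, class)
    return smooth_max @ W.T + b                                   # logits for the softmax layer

x = torch.randn(32, n_feat)
y = torch.randint(0, n_classes, (32,))
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(forward(x), y)  # softmax + cross-entropy
    loss.backward()          # gradients reach both the centroids and the output weights
    opt.step()
```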
Subjects
Algorithms, Neural Networks (Computer), Neurons, Random Forest Algorithm, Dendrites
ABSTRACT
Breast ultrasound (BUS) image classification into benign and malignant classes is often based on pre-trained convolutional neural networks (CNNs) to cope with small-sized training data. Nevertheless, BUS images are single-channel gray-level images, whereas pre-trained CNNs learned from color images with red, green, and blue (RGB) components. Thus, a gray-to-color conversion method is applied so that the BUS image fits the CNN's input layer. This paper evaluates 13 gray-to-color conversion methods proposed in the literature that follow three strategies: replicating the gray-level image to all RGB channels, decomposing the image to enhance inherent information such as the lesion's texture and morphology, and learning a matching layer. In addition, we introduce an image decomposition method based on the lesion's structural information to describe its inner and outer complexity. These gray-to-color conversion methods are evaluated under the same experimental framework using a pre-trained CNN architecture named ResNet-18 and a BUS dataset with more than 3000 images. The Matthews correlation coefficient (MCC), sensitivity (SEN), and specificity (SPE) measure the classification performance. The experimental results show that decomposition methods outperform replication and learning-based methods when using information from the lesion's binary mask (obtained from a segmentation method), reaching an MCC value greater than 0.70 and a specificity up to 0.92, although the sensitivity is about 0.80. In contrast, the proposed method achieves a better balance between sensitivity and specificity, obtaining about 0.88 for both indices and an MCC of 0.73. This study contributes to the objective assessment of different gray-to-color conversion approaches for classifying breast lesions, revealing that mask-based decomposition methods improve classification performance. Moreover, the proposed method based on structural information improves sensitivity, yielding more reliable classification of malignant cases and potentially benefiting clinical practice.
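For readers unfamiliar with the strategies being compared, the sketch below illustrates the simplest replication strategy and one hypothetical mask-based decomposition that separates the content inside and outside the lesion; the latter is an assumption for illustration, not the paper's proposed decomposition.

```python
import numpy as np

def gray_to_rgb_replicate(gray: np.ndarray) -> np.ndarray:
    """Replication strategy: copy the gray channel into R, G, and B."""
    return np.repeat(gray[..., None], 3, axis=-1)

def gray_to_rgb_mask_decomposition(gray: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Illustrative mask-based decomposition: full image, lesion content, background content."""
    g = gray.astype(np.float32)
    m = mask.astype(np.float32)
    return np.stack([g, g * m, g * (1.0 - m)], axis=-1)
```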
Subjects
Breast, Neural Networks (Computer), Female, Humans, Breast/diagnostic imaging, Ultrasonography, Breast Ultrasonography, Sensitivity and Specificity
ABSTRACT
Dendrite morphological neurons (DMNs) are neural models for pattern classification, where dendrites are represented by a geometric shape enclosing patterns of the same class. This study evaluates the impact of three dendrite geometries, namely box, ellipse, and sphere, on pattern classification. In addition, we propose using smooth maximum and minimum functions to reduce the coarseness of the decision boundaries generated by typical DMNs, and a softmax layer is attached at the DMN output to provide posterior probabilities from weighted dendrite responses. To adjust the number of dendrites per class automatically, a tuning algorithm based on an incremental-decremental procedure is introduced. The classification performance assessment is conducted on nine synthetic and 49 real-world datasets, and 12 DMN variants are evaluated in terms of accuracy and model complexity. The DMN reaches its highest potential by combining spherical dendrites with smooth activation functions and a learnable softmax layer. This configuration attained the highest accuracy, uses the simplest geometric shape, is insensitive to variables with zero variance, and its structural complexity is reduced by the smooth maximum function. Furthermore, this DMN configuration performed competitively with or even better than other well-established classifiers in terms of accuracy, such as the support vector machine, multilayer perceptron, radial basis function network, k-nearest neighbors, and random forest. Thus, the proposed DMN is an attractive alternative for pattern classification in real-world problems.
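As a point of reference, the sketch below writes down one common response function for each of the three geometries (positive inside the enclosing shape, negative outside); these forms are assumptions drawn from typical DMN formulations and may differ from the paper's definitions.

```python
import numpy as np

def box_response(x, w_min, w_max):
    # Hyperbox dendrite: per-dimension min/max bounds.
    return np.minimum(x - w_min, w_max - x).min(axis=-1)

def ellipse_response(x, centroid, radii):
    # Axis-aligned ellipsoidal dendrite: per-dimension radii around a centroid.
    return 1.0 - (((x - centroid) / radii) ** 2).sum(axis=-1)

def sphere_response(x, centroid, radius):
    # Spherical dendrite: a single radius around a centroid.
    return radius ** 2 - ((x - centroid) ** 2).sum(axis=-1)
```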
Subjects
Algorithms, Neural Networks (Computer), Neurons, Cluster Analysis, Support Vector Machine, Dendrites
ABSTRACT
In the radiomics workflow, machine learning builds classification models from a set of input features. However, some features can be irrelevant or redundant, reducing the classification performance. This paper proposes using the Genetic Programming (GP) algorithm to automatically construct a reduced number of independent and relevant radiomic features. The proposed method is applied to pre-operative computed tomography (CT) images of patients affected by Non-Small Cell Lung Cancer (NSCLC) to predict two-year survival using linear classifiers. The model built using GP features is compared with benchmark models built using traditional features. The use of the GP algorithm increased classification performance: [Formula: see text] for the proposed model vs. [Formula: see text] and 0.64 for the benchmark models. Hence, the proposed approach better stratifies patients at high and low risk according to their overall postoperative survival time.
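The abstract does not name an implementation, so the sketch below uses the third-party gplearn library as one possible way to evolve a small set of composite features and feed them to a linear classifier; the data, parameters, and pipeline are illustrative only.

```python
import numpy as np
from gplearn.genetic import SymbolicTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((100, 30))             # placeholder radiomic feature matrix
y = rng.integers(0, 2, 100)           # placeholder two-year survival labels

gp = SymbolicTransformer(generations=10, population_size=500,
                         n_components=5, random_state=0)
model = make_pipeline(gp, LogisticRegression(max_iter=1000))
model.fit(X, y)                       # GP-constructed features feed the linear classifier
```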
Subjects
Non-Small Cell Lung Carcinoma, Lung Neoplasms, Benchmarking, Non-Small Cell Lung Carcinoma/diagnostic imaging, Non-Small Cell Lung Carcinoma/genetics, Humans, Lung Neoplasms/diagnostic imaging, Lung Neoplasms/genetics, Machine Learning, X-Ray Computed Tomography
ABSTRACT
A typical feature of hyperbox-based dendrite morphological neurons (DMN) is the generation of sharp and rough decision boundaries that inaccurately track the distribution shape of the pattern classes. This behavior arises because the minimum and maximum activation functions force the decision boundaries to match the faces of the hyperboxes. To improve the DMN response, we introduce a dendritic model that uses smooth maximum and minimum functions to soften the decision boundaries. The classification performance assessment is conducted on nine synthetic and 28 real-world datasets. Based on the experimental results, we demonstrate that the smooth activation functions improve the generalization capacity of DMN. The proposed approach is competitive with four machine learning techniques, namely, the Multilayer Perceptron, Radial Basis Function Network, Support Vector Machine, and Nearest Neighbor algorithm. In addition, the computational complexity of DMN training is lower than that of the MLP and SVM classifiers.
Subjects
Dendrites, Machine Learning, Neural Networks (Computer), Neurons, Support Vector Machine, Algorithms, Dendrites/physiology, Humans, Neurons/physiology
ABSTRACT
The automatic segmentation of breast tumors in ultrasound (BUS) has recently been addressed using convolutional neural networks (CNN). These CNN-based approaches generally modify a previously proposed CNN architecture or design a new architecture using CNN ensembles. Although these methods have reported satisfactory results, the trained CNN architectures are often unavailable for reproducibility purposes. Moreover, these methods commonly learn from small BUS datasets with particular properties, which limits generalization to new cases. This paper evaluates four public CNN-based semantic segmentation models developed by the computer vision community: (1) Fully Convolutional Network (FCN) with the AlexNet network, (2) the U-Net network, (3) SegNet using the VGG16 and VGG19 networks, and (4) DeepLabV3+ using the ResNet18, ResNet50, MobileNet-V2, and Xception networks. By transfer learning, these CNNs are fine-tuned to segment BUS images into normal and tumoral pixels. The goal is to select a potential CNN-based segmentation model to be further used in computer-aided diagnosis (CAD) systems. The main significance of this study is the comparison of eight well-established CNN architectures using a more extensive BUS dataset than those used by approaches currently found in the literature. More than 3000 BUS images acquired from seven US machine models are used for training and validation. The F1-score (F1s) and the Intersection over Union (IoU) quantify the segmentation performance. The segmentation models based on SegNet and DeepLabV3+ obtain the best results, with F1s > 0.90 and IoU > 0.81. In the case of U-Net, the segmentation performance is F1s = 0.89 and IoU = 0.80, whereas FCN-AlexNet attains the lowest results, with F1s = 0.84 and IoU = 0.73. In particular, ResNet18 obtains F1s = 0.905 and IoU = 0.827 and requires the least training time among the SegNet and DeepLabV3+ networks. Hence, ResNet18 is a potential candidate for implementing fully automated end-to-end CAD systems. The CNN models generated in this study are available to researchers at https://github.com/wgomezf/CNN-BUS-segment, which aims to enable fair comparison with other CNN-based segmentation approaches for BUS images.
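For clarity on the reported numbers, the following sketch computes the two metrics from a predicted and a reference binary mask; these are the standard definitions, not code from the linked repository.

```python
import numpy as np

def f1_and_iou(pred: np.ndarray, ref: np.ndarray):
    """F1-score (Dice) and Intersection over Union between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()
    fp = np.logical_and(pred, ~ref).sum()
    fn = np.logical_and(~pred, ref).sum()
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou
```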
Subjects
Breast Neoplasms, Semantics, Breast Neoplasms/diagnostic imaging, Female, Humans, Computer-Assisted Image Processing, Neural Networks (Computer), Reproducibility of Results
ABSTRACT
BACKGROUND AND OBJECTIVES: Computer-aided diagnosis (CAD) systems are intended to assist specialists in the interpretation of images, aiming to support clinical conduct. In breast tumor classification, CAD systems involve a feature extraction stage in which morphological features are used to describe the tumor shape. Such features are expected to satisfy at least two conditions: (1) discriminant, to distinguish between benign and malignant tumors, and (2) invariant to geometric transformations. Herein, 39 morphological features were evaluated in terms of invariance and discriminant power for breast tumor classification. METHODS: Morphological features were divided into region-based features, which describe the irregularity of the tumor shape, and boundary-based features, which measure the anfractuosity of the tumor margin. Two datasets were considered in the experiments: 2054 breast ultrasound images and 892 mammograms. From both datasets, synthetic data augmentation was performed to obtain distinct combinations of rotation and scaling of breast tumors, from which morphological features were calculated. Linear discriminant analysis was used to classify breast tumors into benign and malignant classes. The area under the ROC curve (AUC) quantified the discriminant power of every morphological feature, whereas the relative difference (RD) between AUC values measured the invariance to geometric transformations. To indicate adequate performance, AUC and RD should tend toward unity and zero, respectively. RESULTS: For both datasets, convexity was the most discriminant feature, reaching AUC > 0.81 with RD < 1×10⁻², while the most invariant feature was roundness, which attained RD < 1×10⁻³ with AUC < 0.72. Additionally, for each dataset, the most discriminant and invariant features were combined to perform tumor classification. For mammography, an accuracy (ACC) of 0.76, a sensitivity (SEN) of 0.76, and a specificity (SPE) of 0.84 were achieved, whereas for breast ultrasound the results were ACC = 0.88, SEN = 0.81, and SPE = 0.91. CONCLUSIONS: In general, region-based features are more discriminant and invariant than boundary-based features. Moreover, it was observed that an invariant feature is not necessarily a discriminant feature; hence, a balance between invariance and discriminant power should be attained for breast tumor classification.
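As an illustration of the two features highlighted in the results, the sketch below computes them from a binary lesion mask with OpenCV, using common conventions (convexity as the convex-hull-to-contour perimeter ratio and roundness as 4πA/P²); the paper's exact definitions may differ.

```python
import cv2
import numpy as np

def convexity_and_roundness(mask: np.ndarray):
    """mask: binary (H, W) image with the lesion as foreground."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea)           # largest connected region
    area = cv2.contourArea(cnt)
    perimeter = cv2.arcLength(cnt, True)
    hull_perimeter = cv2.arcLength(cv2.convexHull(cnt), True)
    convexity = hull_perimeter / perimeter             # approaches 1 for convex shapes
    roundness = 4.0 * np.pi * area / perimeter ** 2    # equals 1 for a perfect circle
    return convexity, roundness
```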
Subjects
Breast Neoplasms/pathology, Breast Neoplasms/classification, Breast Neoplasms/diagnostic imaging, Female, Humans, Mammography, Breast Ultrasonography
ABSTRACT
Described here is a novel texture extraction method based on auto-mutual information (AMI) for classifying breast lesions. The objective is to extract discriminating information found in the non-linear relationship of textures in breast ultrasound (BUS) images. The AMI method performs three basic tasks: (i) it transforms the input image using the ranklet transform to handle intensity variations of BUS images acquired with distinct ultrasound scanners; (ii) it extracts the AMI-based texture features in the horizontal and vertical directions from each ranklet image; and (iii) it classifies the breast lesions into benign and malignant classes, in which a support-vector machine is used as the underlying classifier. The image data set is composed of 2050 BUS images consisting of 1347 benign and 703 malignant tumors. Additionally, nine commonly used texture extraction methods proposed in the literature for BUS analysis are compared with the AMI method. The bootstrap method, which considers 1000 bootstrap samples, is used to evaluate classification performance. The experimental results indicate that the proposed approach outperforms its counterparts in terms of area under the receiver operating characteristic curve, sensitivity, specificity and Matthews correlation coefficient, with values of 0.82, 0.80, 0.85 and 0.63, respectively. These results suggest that the AMI method is suitable for breast lesion classification systems.
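To give a feel for the descriptor, the sketch below estimates mutual information between an image and a copy of itself shifted by a small lag, horizontally and vertically, on quantized gray levels; the ranklet transform and the paper's exact estimator are omitted, so this is only an assumed simplification.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def auto_mutual_information(img: np.ndarray, lag: int = 1, bins: int = 32):
    """Horizontal and vertical auto-mutual information at a given pixel lag."""
    q = np.digitize(img, np.linspace(img.min(), img.max(), bins))   # quantize gray levels
    ami_h = mutual_info_score(q[:, :-lag].ravel(), q[:, lag:].ravel())
    ami_v = mutual_info_score(q[:-lag, :].ravel(), q[lag:, :].ravel())
    return ami_h, ami_v
```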
Subjects
Breast Neoplasms/diagnostic imaging, Computer-Assisted Image Interpretation/methods, Breast Ultrasonography/methods, Breast, Female, Humans, Sensitivity and Specificity
ABSTRACT
BACKGROUND AND OBJECTIVE: Conventional computer-aided diagnosis (CAD) systems for breast ultrasound (BUS) are trained to classify pathological classes, that is, benign and malignant. However, from a clinical perspective, this kind of classification does not fully agree with radiologists' diagnoses. Usually, tumors are assessed using a BI-RADS (Breast Imaging-Reporting and Data System) category and, accordingly, a recommendation is issued: annual study for category 2 (benign), six-month follow-up study for category 3 (probably benign), and biopsy for categories 4 and 5 (suspicious of malignancy). Hence, in this paper, a CAD system based on BI-RADS categories weighted by pathological information is presented. The goal is to increase the classification performance by reducing the class imbalance commonly found in pathological classes, as well as to provide outcomes closely aligned with radiologists' recommendations. METHODS: The BUS dataset comprises 781 benign lesions and 347 malignant tumors proven by biopsy. Moreover, every lesion is associated with one BI-RADS category in the set {2, 3, 4, 5}. Thus, the dataset is split into three weighted classes: benign, BI-RADS 2 in benign lesions; probably benign, BI-RADS 3 and 4 in benign lesions; and malignant, BI-RADS 4 and 5 in malignant lesions. Thereafter, a random forest (RF) classifier, denoted by RFw, is trained to predict the weighted BI-RADS classes. In addition, for comparison purposes, an RF classifier is trained to predict the pathological classes, denoted as RFp. RESULTS: The ability of the classifiers to predict the pathological classes is measured by the area under the ROC curve (AUC), sensitivity (SEN), and specificity (SPE). The RFw classifier obtained AUC = 0.872, SEN = 0.826, and SPE = 0.919, whereas the RFp classifier reached AUC = 0.868, SEN = 0.808, and SPE = 0.929. According to a one-way analysis of variance test, the RFw classifier statistically outperforms (p < 0.001) the RFp classifier in terms of AUC and SEN. Moreover, the performance of RFw in predicting the weighted BI-RADS classes, measured by the Matthews correlation coefficient, was 0.614. CONCLUSIONS: Dividing the classification problem into three classes reduces the imbalance between benign and malignant classes; thus, the sensitivity is increased without degrading the specificity. Therefore, the CAD system based on weighted BI-RADS classes improves the classification performance of conventional CAD systems. Additionally, the proposed approach has the advantage of providing a multiclass outcome related to radiologists' recommendations.
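The three-class relabeling is simple enough to state in code; the mapping below follows the rule given in the METHODS section, while the data frame, column names, and features are placeholders.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def weighted_birads_class(birads: int, pathology: str) -> str:
    if pathology == "benign" and birads == 2:
        return "benign"
    if pathology == "benign" and birads in (3, 4):
        return "probably benign"
    if pathology == "malignant" and birads in (4, 5):
        return "malignant"
    raise ValueError("combination outside the scheme described in the abstract")

# Placeholder table: one row per lesion with its features, BI-RADS category, and biopsy result.
df = pd.DataFrame({"feat1": [0.1, 0.7, 0.4], "feat2": [1.2, 0.3, 0.8],
                   "birads": [2, 4, 5], "pathology": ["benign", "benign", "malignant"]})
y = [weighted_birads_class(b, p) for b, p in zip(df["birads"], df["pathology"])]
rf_w = RandomForestClassifier(n_estimators=200, random_state=0).fit(df[["feat1", "feat2"]], y)
```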
Subjects
Breast Diseases/diagnostic imaging, Breast/diagnostic imaging, Computer-Assisted Diagnosis, Adolescent, Adult, Aged, Aged 80 and over, Breast/pathology, Breast Diseases/pathology, Female, Humans, Middle Aged, Young Adult
ABSTRACT
The study described here explored a fully automatic segmentation approach based on texture analysis for breast lesions in ultrasound images. The proposed method involves two main stages: (i) In lesion region detection, the original gray-scale image is transformed into a texture domain based on log-Gabor filters. Local texture patterns are then extracted from overlapping lattices and classified by a linear discriminant analysis classifier to distinguish between the "normal tissue" and "breast lesion" classes. Next, an incremental method based on the average radial derivative function reveals the region with the highest probability of being a lesion. (ii) In lesion delineation, using the detected region and the pre-processed ultrasound image, an iterative thresholding procedure based on the average radial derivative function determines the final lesion contour. The experiments are carried out on a data set of 544 breast ultrasound images (including cysts, benign solid masses, and malignant lesions) acquired with three distinct ultrasound machines. In terms of the area under the receiver operating characteristic curve, the one-way analysis of variance test (α=0.05) indicates that the proposed approach significantly outperforms two published fully automatic methods (p<0.001), with areas under the curve of 0.91, 0.82, and 0.63 for the proposed approach and the two reference methods, respectively. Hence, these results suggest that the log-Gabor domain improves the discrimination power of texture features to accurately segment breast lesions. In addition, the proposed approach can potentially be used for automated computer diagnosis purposes to assist physicians in the detection and classification of breast masses.
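Because the texture domain is central to the first stage, here is a minimal radial (orientation-free) log-Gabor filter applied in the Fourier domain; the center frequency and bandwidth are arbitrary illustrative values, and the full filter bank and lattice classification are omitted.

```python
import numpy as np

def log_gabor_filter(img: np.ndarray, f0: float = 0.1, sigma_ratio: float = 0.55):
    """Filter an image with a radial log-Gabor transfer function via the FFT."""
    rows, cols = img.shape
    u = np.fft.fftfreq(cols)
    v = np.fft.fftfreq(rows)
    radius = np.sqrt(u[None, :] ** 2 + v[:, None] ** 2)
    radius[0, 0] = 1.0                                    # avoid log(0) at the DC term
    lg = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_ratio) ** 2))
    lg[0, 0] = 0.0                                        # zero DC response
    return np.real(np.fft.ifft2(np.fft.fft2(img) * lg))
```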
Subjects
Breast Neoplasms/diagnostic imaging, Computer-Assisted Image Interpretation/methods, Automated Pattern Recognition/methods, Breast Ultrasonography/methods, Breast/diagnostic imaging, Female, Humans, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
Medical images (MI) are relevant sources of information for detecting and diagnosing a large number of illnesses and abnormalities. Given their importance, this study focuses on breast ultrasound (BUS), the main adjunct to mammography for detecting common breast lesions among women worldwide. In addition, aiming to enhance data security, image fidelity, authenticity, and content verification in e-health environments, MI watermarking has been widely used; its main goal is to embed patient metadata into the MI so that the resulting image keeps its original quality. In this context, this paper compares two watermarking approaches, namely spread spectrum based on the discrete cosine transform (SS-DCT) and the high-capacity data-hiding (HCDH) algorithm, so that the watermarked BUS images remain adequate for a computer-aided diagnosis (CADx) system, whose two principal outcomes are lesion segmentation and classification. Experimental results show that the HCDH algorithm is highly recommended for watermarking medical images, as it maintains image quality without introducing distortion into the CADx outputs.
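For orientation on the SS-DCT family, the sketch below embeds a pseudo-random watermark into the largest-magnitude AC coefficients of the global DCT; the coefficient selection, multiplicative rule, and embedding strength are textbook choices assumed here, not the settings evaluated in the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

def ss_dct_embed(img: np.ndarray, n_coeffs: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> np.ndarray:
    """Spread-spectrum watermark embedding in the DCT domain (illustrative)."""
    coeffs = dctn(img.astype(float), norm="ortho")
    flat = coeffs.ravel().copy()
    idx = np.argsort(np.abs(flat))[::-1]          # coefficients by decreasing magnitude
    idx = idx[idx != 0][:n_coeffs]                # skip the DC term
    watermark = np.random.default_rng(seed).standard_normal(idx.size)
    flat[idx] *= 1.0 + alpha * watermark          # multiplicative spread-spectrum rule
    return idctn(flat.reshape(coeffs.shape), norm="ortho")
```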
Subjects
Algorithms, Breast/pathology, Computer-Assisted Image Interpretation/methods, Computer-Assisted Image Processing/methods, Breast Ultrasonography/methods, Female, Humans
ABSTRACT
Breast ultrasound (BUS) is considered the most important adjunct method to mammography for diagnosing cancer. However, this image modality suffers from an intrinsic artifact called speckle noise, which degrades spatial and contrast resolution and obscures the screened anatomy. Hence, it is necessary to reduce speckle artifacts before performing image analysis, for example, with computer-aided diagnosis systems. In addition, speckle reduction schemes should address the trade-off between the smoothing level and the preservation of lesion contour details. In this scenario, we propose a BUS despeckling method based on anisotropic diffusion guided by Log-Gabor filters (ADLG). Because we assume that different breast tissues have distinct textures, our approach performs a multichannel decomposition of the BUS image using Log-Gabor filters. Next, the conduction coefficient of anisotropic diffusion filtering is computed using texture responses instead of intensity values, as in the original formulation. The proposed algorithm is validated using both synthetic and real breast datasets, with 900 and 50 images, respectively. The performance measures are compared with those of four existing speckle reduction schemes based on anisotropic diffusion: conventional anisotropic diffusion filtering (CADF), speckle-reducing anisotropic diffusion (SRAD), texture-oriented anisotropic diffusion (TOAD), and interference-based speckle filtering followed by anisotropic diffusion (ISFAD). The validity metrics are Pratt's figure of merit for synthetic images and the mean radial distance (in pixels) for real sonograms. The figure of merit and mean radial distance should tend toward 1 and 0, respectively, to indicate adequate edge preservation. The results suggest that ADLG outperforms the four compared speckle removal filters on both simulated and real BUS images. For the ADLG, CADF, SRAD, TOAD, and ISFAD methods, the median figure-of-merit values are 0.83, 0.40, 0.39, 0.51, and 0.59, and the median mean radial distances are 4.19, 6.29, 6.39, 6.43, and 5.88 pixels, respectively.
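For contrast with the texture-guided ADLG idea, here is one Perona-Malik-style diffusion step of the kind used by the CADF baseline, where the conduction coefficient is driven by intensity differences; ADLG as described would instead derive that coefficient from Log-Gabor texture responses. Boundary handling and parameters are simplified assumptions.

```python
import numpy as np

def anisotropic_diffusion_step(img: np.ndarray, kappa: float = 30.0, lam: float = 0.15):
    """One diffusion iteration with an exponential conduction coefficient."""
    # Differences toward the four neighbors (np.roll gives periodic borders, a simplification).
    dn = np.roll(img, 1, axis=0) - img
    ds = np.roll(img, -1, axis=0) - img
    de = np.roll(img, -1, axis=1) - img
    dw = np.roll(img, 1, axis=1) - img
    c = lambda d: np.exp(-(d / kappa) ** 2)   # small conduction where gradients (edges) are large
    return img + lam * (c(dn) * dn + c(ds) * ds + c(de) * de + c(dw) * dw)
```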