ABSTRACT
MicroRNAs (miRNAs) synergize with various biomolecules in human cells, performing diverse functions in the regulation of a wide range of biological processes. Predicting potential disease-associated miRNAs as valuable biomarkers contributes to the treatment of human diseases. However, few previous methods take a holistic perspective: most concentrate on isolated miRNA and disease objects, ignoring the multiple relationships that coexist in human cells. In this work, we first constructed a multi-view graph based on the relationships between miRNAs and various biomolecules, and then used a graph attention network to learn the graph-topology features of miRNAs and diseases for each view. Next, we applied a second attention mechanism and developed a multi-scale feature fusion module to determine the optimal fusion of the multi-view topology features of miRNAs and diseases. In addition, prior attribute knowledge of miRNAs and diseases was incorporated to achieve better prediction results and alleviate the cold-start problem. Finally, the learned miRNA and disease representations were concatenated and fed into a multi-layer perceptron for end-to-end training and prediction of potential miRNA-disease associations. To assess the efficacy of our model (called MUSCLE), we performed 5- and 10-fold cross-validation (CV), which achieved average areas under the ROC curve of 0.966±0.0102 and 0.973±0.0135, respectively, outperforming most current state-of-the-art models. We then examined the impact of crucial parameters on prediction performance and performed ablation experiments on the feature combination and model architecture. Furthermore, case studies on colon cancer, lung cancer, and breast cancer fully demonstrate the good inductive capability of MUSCLE. Our data and code are freely available at a public GitHub repository: https://github.com/zht-code/MUSCLE.git.
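The abstract describes the architecture only at a high level; the sketch below is a minimal PyTorch reading of the pipeline it outlines: per-view graph attention encoders, attention-weighted fusion of the view-specific embeddings, and an MLP over concatenated miRNA-disease pairs. All class names, dimensions, and the dense single-head GAT formulation are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGATLayer(nn.Module):
    """Single-head graph attention layer over a dense adjacency matrix
    (Velickovic et al., 2018). Assumes adj contains self-loops."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):                      # x: (N, in_dim), adj: (N, N)
        h = self.W(x)                               # (N, out_dim)
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pair).squeeze(-1), 0.2)
        e = e.masked_fill(adj == 0, float('-inf'))  # attend only along edges
        return torch.relu(torch.softmax(e, dim=-1) @ h)

class AttentionFusion(nn.Module):
    """Attention-weighted fusion of per-view embeddings: (V, N, d) -> (N, d)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, views):
        w = torch.softmax(self.score(views), dim=0)  # weight each view per node
        return (w * views).sum(dim=0)

class MiRNADiseaseScorer(nn.Module):
    """Multi-view GAT encoders -> attention fusion -> MLP on pair embeddings."""
    def __init__(self, in_dim, hid=64, num_views=3):
        super().__init__()
        self.encoders = nn.ModuleList([SimpleGATLayer(in_dim, hid) for _ in range(num_views)])
        self.fusion = AttentionFusion(hid)
        self.mlp = nn.Sequential(nn.Linear(2 * hid, hid), nn.ReLU(), nn.Linear(hid, 1))

    def forward(self, feats, adjs, mirna_idx, disease_idx):
        views = torch.stack([enc(feats, a) for enc, a in zip(self.encoders, adjs)])
        z = self.fusion(views)                       # fused node embeddings
        pair = torch.cat([z[mirna_idx], z[disease_idx]], dim=-1)
        return torch.sigmoid(self.mlp(pair)).squeeze(-1)

model = MiRNADiseaseScorer(in_dim=32)
feats = torch.randn(10, 32)                          # toy node attributes
adjs = [torch.eye(10) for _ in range(3)]             # toy graphs with self-loops
print(model(feats, adjs, torch.tensor([0, 1]), torch.tensor([5, 6])))
```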
Subjects
Colonic Neoplasms, Lung Neoplasms, MicroRNAs, Humans, Muscles, Learning, MicroRNAs/genetics, Algorithms, Computational Biology
ABSTRACT
Shrimp fry counting is an important task for biomass estimation in aquaculture. Accurately counting the shrimp fry in a tank not only supports estimating the production of mature shrimp but also gives the stocking density, which is valuable for subsequent growth monitoring, transport management, and yield assessment. However, traditional manual counting is inefficient and error-prone, so a more efficient and accurate counting method is urgently needed. In this paper, we first collected and labeled images of shrimp fry in breeding tanks under a constructed experimental environment and generated corresponding density maps using a Gaussian kernel function. We then proposed a multi-scale attention fusion-based shrimp fry counting network, SFCNet. Experiments showed that SFCNet achieved the best counting performance among CNN-based baseline counting models, with an MAE of 3.96 and an RMSE of 4.682, providing an effective solution for accurately counting shrimp fry.
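Density-map supervision with a Gaussian kernel, as described above, is a standard construction in counting networks; a minimal sketch follows. The bandwidth `sigma=4.0` and the function name are assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(points, height, width, sigma=4.0):
    """Turn point annotations (one (row, col) per shrimp fry) into a density map
    whose integral equals the count, via a fixed-bandwidth Gaussian kernel."""
    density = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        r, c = int(round(r)), int(round(c))
        if 0 <= r < height and 0 <= c < width:
            density[r, c] += 1.0                      # one unit of mass per animal
    return gaussian_filter(density, sigma=sigma, mode='constant')

# A counting network regresses this map; at inference the predicted count is
# simply the sum over the predicted map.
points = [(10.2, 33.7), (48.0, 51.5), (49.3, 52.8)]
dm = make_density_map(points, 96, 96)
print(round(float(dm.sum()), 2))  # ~3.0 (mass preserved up to border clipping)
```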
Subjects
Aquaculture, Penaeidae, Animals, Penaeidae/physiology, Aquaculture/methods, Algorithms, Biomass, Neural Networks (Computer)
ABSTRACT
Universal lesion detection (ULD) in computed tomography (CT) images is an important and challenging prerequisite for computer-aided diagnosis (CAD) of abnormal tissue, such as lymph node tumors, liver tumors, and lymphadenopathy. The key challenge is that lesions are tiny and highly similar to non-lesions, which easily leads to high false positives. Specifically, non-lesions are nearby normal anatomy, including the bowel, vasculature, and mesentery, which decrease the conspicuity of small lesions because the two are often hard to differentiate. In this study, we present a novel scale-attention module that enhances feature discrimination between lesion and non-lesion regions by exploiting radiologists' domain knowledge to reduce false positives effectively. Inspired by the observation that radiologists tend to divide each CT image into multiple areas and then detect lesions in these smaller areas separately, a local axial scale-attention (LASA) module is proposed to re-weight each pixel in a feature map by adaptively aggregating local features from multiple scales. In addition, to keep the weights consistent, the module combines axial pixels along the height and width axes and attaches position embeddings. The module can be inserted into CNNs easily and flexibly. We test our method on the DeepLesion dataset, evaluating accuracy by the sensitivities at 0.5, 1, 2, 4, 8, and 16 false positives (FPs) per image and the average sensitivity at [0.5, 1, 2, 4] FPs. The sensitivities are 78.30%, 84.96%, 89.86%, 93.14%, 95.36%, and 95.54% at 0.5, 1, 2, 4, 8, and 16 FPs per image, and the average sensitivity is 86.56%, outperforming previous methods. The proposed method enhances feature discrimination between lesion and non-lesion regions by adding LASA modules. These encouraging results illustrate the potential advantage of exploiting domain knowledge for lesion detection.
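As a rough illustration of the axial re-weighting idea, the sketch below attends along the height axis and then the width axis with learned position embeddings and a residual connection. The paper's multi-scale aggregation and exact weighting scheme are omitted, and all layer shapes are assumptions.

```python
import torch
import torch.nn as nn

class AxialAttention1D(nn.Module):
    """Self-attention along one spatial axis with a learned position embedding."""
    def __init__(self, channels, axis_len):
        super().__init__()
        self.qkv = nn.Conv1d(channels, 3 * channels, 1)
        self.pos = nn.Parameter(torch.zeros(1, channels, axis_len))

    def forward(self, x):                              # x: (B', C, L)
        q, k, v = self.qkv(x + self.pos).chunk(3, dim=1)
        attn = torch.softmax(q.transpose(1, 2) @ k / k.size(1) ** 0.5, dim=-1)
        return v @ attn.transpose(1, 2)                # (B', C, L)

class LocalAxialScaleAttention(nn.Module):
    """Sketch: attend along height, then width, then add back as a residual."""
    def __init__(self, channels, height, width):
        super().__init__()
        self.h_attn = AxialAttention1D(channels, height)
        self.w_attn = AxialAttention1D(channels, width)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        xh = self.h_attn(x.permute(0, 3, 1, 2).reshape(b * w, c, h))
        xh = xh.reshape(b, w, c, h).permute(0, 2, 3, 1)    # back to (B, C, H, W)
        xw = self.w_attn(xh.permute(0, 2, 1, 3).reshape(b * h, c, w))
        xw = xw.reshape(b, h, c, w).permute(0, 2, 1, 3)    # back to (B, C, H, W)
        return x + xw                                  # residual re-weighting

lasa = LocalAxialScaleAttention(32, 16, 24)
print(lasa(torch.randn(2, 32, 16, 24)).shape)          # torch.Size([2, 32, 16, 24])
```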
Subjects
Computer-Assisted Diagnosis, X-Ray Computed Tomography, Humans, X-Ray Computed Tomography/methods, Computer-Assisted Diagnosis/methods, Computer-Assisted Radiographic Image Interpretation/methods
ABSTRACT
Medical studies have found that tumor mutation burden (TMB) is positively correlated with the efficacy of immunotherapy for non-small cell lung cancer (NSCLC), and the TMB value can be used to predict the efficacy of targeted therapy and chemotherapy. However, calculating the TMB value mainly depends on whole exome sequencing (WES), which is usually time-consuming and expensive. To address this problem, this paper studies the correlation between TMB and slide images by taking advantage of the digital pathological slides commonly used in the clinic, and then predicts the patient's TMB level accordingly. This paper proposes a deep learning model (RCA-MSAG) based on a residual coordinate attention (RCA) structure combined with a multi-scale attention guidance (MSAG) module. The model takes ResNet-50 as its backbone and integrates coordinate attention (CA) into the bottleneck module to capture direction-aware and position-sensitive information, enabling the model to locate and identify the positions of interest more accurately. The MSAG module is then embedded into the network, enabling the model to extract deep features of lung cancer pathological sections and the interactive information between channels. The Cancer Genome Atlas (TCGA) open dataset is adopted in the experiments, consisting of 200 pathological sections of lung adenocarcinoma: 80 samples with high TMB values, 77 with medium TMB values, and 43 with low TMB values. Experimental results demonstrate that the accuracy, precision, recall, and F1 score of the proposed model are 96.2%, 96.4%, 96.2%, and 96.3%, respectively, superior to existing mainstream deep learning models. The proposed model can promote clinical auxiliary diagnosis and has certain theoretical guiding significance for TMB prediction.
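Coordinate attention itself is a published module (Hou et al., CVPR 2021); a compact sketch, as one would insert into a ResNet-50 bottleneck, is shown below. The reduction ratio and layer layout are conventional defaults rather than details taken from this paper.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: pool along H and W separately so attention weights
    keep direction-aware, position-sensitive information."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.to_h = nn.Conv2d(mid, channels, 1)
        self.to_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)                # (B, C, H, 1): pooled over width
        pw = x.mean(dim=2, keepdim=True)                # (B, C, 1, W): pooled over height
        y = self.shared(torch.cat([ph, pw.permute(0, 1, 3, 2)], dim=2))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.to_h(yh))                         # (B, C, H, 1)
        aw = torch.sigmoid(self.to_w(yw.permute(0, 1, 3, 2)))     # (B, C, 1, W)
        return x * ah * aw

ca = CoordinateAttention(64)
print(ca(torch.randn(1, 64, 32, 32)).shape)             # torch.Size([1, 64, 32, 32])
```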
Subjects
Lung Adenocarcinoma, Non-Small Cell Lung Carcinoma, Lung Neoplasms, Humans, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Non-Small Cell Lung Carcinoma/genetics, Mutation, Lung Adenocarcinoma/genetics, Tumor Biomarkers/genetics
ABSTRACT
Gesture recognition is an important direction in computer vision research, and information from the hands is crucial to this task. However, current methods typically attend to hand regions based on estimated keypoints, which significantly increases both time and complexity and may lose hand position information when keypoints are estimated incorrectly. Moreover, for dynamic gesture recognition, attention in the spatial dimension alone is not enough. This paper proposes a multi-scale attention 3D convolutional network for gesture recognition with a fusion of multimodal data. The proposed network applies attention mechanisms both locally and globally. The local attention leverages hand information extracted by a hand detector to focus on the hand region, reducing the interference of gesture-irrelevant factors. Global attention is achieved in both the human-posture context and the channel context through a dual spatiotemporal attention module. Furthermore, to make full use of the differences between data modalities, we designed a multimodal fusion scheme to fuse the features of RGB and depth data. The proposed method is evaluated on the ChaLearn LAP Isolated Gesture Dataset and the Briareo Dataset; experiments on these two datasets demonstrate the effectiveness of our network, which outperforms many state-of-the-art methods.
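A minimal sketch of detector-driven local attention follows: features inside detected hand boxes are emphasized with a soft mask, avoiding keypoint estimation altogether. The stride, emphasis factor, and function signature are illustrative assumptions, not the paper's implementation.

```python
import torch

def hand_region_attention(features, boxes, stride=8, emphasis=1.0):
    """Boost feature responses inside detected hand boxes.
    features: (B, C, H, W) backbone features; boxes: one (x1, y1, x2, y2) box per
    sample in input-image pixels; stride: input-to-feature downscale factor."""
    b, c, h, w = features.shape
    mask = torch.zeros(b, 1, h, w, device=features.device)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        xs, ys = int(x1 // stride), int(y1 // stride)
        xe, ye = max(xs + 1, int(x2 // stride)), max(ys + 1, int(y2 // stride))
        mask[i, :, ys:ye, xs:xe] = 1.0
    return features * (1.0 + emphasis * mask)   # background kept, hands emphasized

feats = torch.randn(1, 64, 28, 28)
out = hand_region_attention(feats, [(40.0, 56.0, 120.0, 160.0)])
print(out.shape)                                # torch.Size([1, 64, 28, 28])
```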
Subjects
Algorithms, Gestures, Hands, Humans, Posture, Recognition (Psychology)
ABSTRACT
BACKGROUND: Chest X-rays are the most commonly available and affordable radiological examination for screening thoracic diseases. According to the domain knowledge of chest X-ray screening, the pathological information usually lies in the lung and heart regions. However, region-level annotation is costly to acquire in practice, and model training mainly relies on image-level class labels in a weakly supervised manner, which is highly challenging for computer-aided chest X-ray screening. To address this issue, some methods have recently been proposed to identify local regions containing pathological information, which is vital for thoracic disease classification. Inspired by this, we propose a novel deep learning framework to explore discriminative information from the lung and heart regions. RESULTS: We design a feature extractor equipped with a multi-scale attention module to learn global attention maps from global images. To exploit disease-specific cues effectively, we locate the lung and heart regions containing pathological information with a well-trained pixel-wise segmentation model that generates binarization masks. By applying an element-wise logical AND operator to the learned global attention maps and the binarization masks, we obtain local attention maps in which pixels are 1 in the lung and heart regions and 0 elsewhere. By zeroing the features of regions outside the lungs and heart in the attention maps, we can effectively exploit the disease-specific cues within the lung and heart regions. Compared with existing methods that fuse global and local features, we adopt feature weighting to avoid weakening the visual cues unique to the lung and heart regions, and our pixel-wise segmentation helps overcome deviations in locating local regions. Evaluated on the benchmark split of the publicly available ChestX-ray14 dataset, comprehensive experiments show that our method achieves superior performance compared with state-of-the-art methods. CONCLUSION: We propose a novel deep framework for the multi-label classification of thoracic diseases in chest X-ray images. The proposed network effectively exploits the pathological regions containing the main cues for chest X-ray screening and has been used in clinical screening to assist radiologists. Chest X-rays account for a significant proportion of radiological examinations, and it is valuable to explore further methods for improving performance.
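The masking step described above reduces to element-wise multiplication, which realizes the logical AND on binary masks; a minimal sketch under that reading is shown below, with all tensor shapes assumed.

```python
import torch

def local_attention_maps(global_attn, organ_mask):
    """global_attn: (B, 1, H, W) learned attention in [0, 1]; organ_mask:
    (B, 1, H, W) binary (1 = lung/heart, 0 = elsewhere). The product zeroes
    attention outside the organ regions, i.e. the logical AND on binary masks."""
    return global_attn * organ_mask

def weight_features(features, local_attn):
    """Feature weighting: suppress cues outside the lungs and heart instead of
    fusing separate global and local branches."""
    return features * local_attn                 # broadcasts over channels

attn = torch.rand(2, 1, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.5).float()
print(weight_features(torch.randn(2, 512, 32, 32),
                      local_attention_maps(attn, mask)).shape)
```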
Subjects
Deep Learning, Heart Diseases/diagnostic imaging, Lung Diseases/diagnostic imaging, Thoracic Radiography, Thoracic Diseases/diagnostic imaging, Heart/diagnostic imaging, Humans, Lung/diagnostic imaging, ROC Curve
ABSTRACT
CT images are an important reference for clinical diagnosis. However, owing to external influences and equipment limitations during imaging, CT images often suffer from blurring, a lack of detail, and unclear edges, which affect subsequent diagnosis. To obtain high-quality medical CT images, we propose an information distillation and multi-scale attention network (IDMAN) for medical CT image super-resolution reconstruction. In a deep residual network, instead of simply stacking convolution layers, we introduce information distillation to make full use of the feature information. In addition, to better capture information and focus on the more important features, we use a multi-scale attention block with multiple branches, which automatically generates weights to adjust the network. Through these improvements, our model effectively solves the problems of insufficient feature utilization and a single attention source, improves learning and representational ability, and can thus reconstruct higher-quality medical CT images. We conduct a series of experiments whose results show that our method outperforms previous algorithms and performs better on medical CT image reconstruction in both objective evaluation and visual quality.
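A minimal sketch of the information-distillation idea, in the style of IMDN-type super-resolution blocks, follows: at each step part of the channels is retained ("distilled") and the remainder is processed further, then all retained slices are fused. The split ratio, step count, and fusion layer are assumptions, not IDMAN's exact design.

```python
import torch
import torch.nn as nn

class InformationDistillationBlock(nn.Module):
    """At each step, keep a slice of channels as-is and refine the rest;
    fuse all kept slices plus the final remainder with a 1x1 convolution."""
    def __init__(self, channels, ratio=0.25, steps=3):
        super().__init__()
        self.keep = int(channels * ratio)
        rest = channels - self.keep
        self.convs = nn.ModuleList()
        ch = channels
        for _ in range(steps):
            self.convs.append(nn.Conv2d(ch, channels, 3, padding=1))
            ch = rest
        self.fuse = nn.Conv2d(self.keep * steps + rest, channels, 1)

    def forward(self, x):
        kept = []
        for conv in self.convs:
            y = torch.relu(conv(x))
            d, x = torch.split(y, [self.keep, y.size(1) - self.keep], dim=1)
            kept.append(d)                       # distilled features, kept as-is
        return self.fuse(torch.cat(kept + [x], dim=1))

blk = InformationDistillationBlock(64)
print(blk(torch.randn(1, 64, 24, 24)).shape)     # torch.Size([1, 64, 24, 24])
```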
Subjects
Computer-Assisted Image Processing, Algorithms, Disease Progression, Humans, X-Ray Computed Tomography
ABSTRACT
Accurate and robust detection of road damage is essential for public transportation safety. Deep convolutional neural network (CNN)-based road damage detection algorithms that localize and classify damage with bounding boxes have achieved remarkable progress. However, research in this field fails to account for two key characteristics of road damage, weak semantic information and abnormal geometric properties, resulting in inappropriate feature representation and suboptimal detection results. To boost performance, we propose a CNN-based cascaded damage detection network called CrdNet. The proposed model has three parts: (1) we introduce a novel backbone network, named LrNet, that reuses low-level features and mixes suitable range-dependency features to learn high-to-low-level feature fusions representing the weak semantic information of road damage; (2) we apply a multi-scale, multi-aspect-ratio anchor mechanism to generate high-quality positive samples for damage with abnormal geometric properties during network training; (3) we design an adaptive proposal assignment strategy and perform cascade predictions on corresponding branches that establish different range dependencies. The experiments show that the proposed method achieves a mean average precision (mAP) of 90.92% on a collected road damage dataset, demonstrating the good performance and robustness of the model.
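The multi-scale, multi-aspect-ratio anchor mechanism can be illustrated with standard anchor generation; in the sketch below an elongated ratio is included with crack-like damage in mind. The specific scales and ratios are illustrative, not the paper's values.

```python
import numpy as np

def generate_anchors(base_size=16, scales=(4, 8, 16), ratios=(0.5, 1.0, 2.0, 4.0)):
    """Centered anchor boxes (x1, y1, x2, y2) for one feature-map cell. The
    extra elongated ratio targets damage whose geometry deviates from typical
    object proportions, e.g. long thin cracks."""
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in ratios:                   # ratio = height / width
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors, dtype=np.float32)

print(generate_anchors().shape)                # (12, 4): 3 scales x 4 ratios
```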
ABSTRACT
BACKGROUND: High-resolution 2D whole slide imaging provides rich information about tissue structure. This information can be far richer if the 2D images are stacked into a 3D tissue volume. A 3D analysis, however, requires accurate reconstruction of the tissue volume from the 2D image stack. This task is not trivial due to distortions such as tissue tearing, folding, and missing tissue in each slide. Registering whole tissue slices may be adversely affected by distorted tissue regions; consequently, regional registration is found to be more effective. In this paper, we propose a new approach to accurate and robust registration of regions of interest in whole slide images, introducing the idea of multi-scale attention for registration. RESULTS: Using the mean similarity index as the metric, the proposed algorithm (mean ± SD [Formula: see text]) followed by a fine registration algorithm ([Formula: see text]) outperformed the state-of-the-art linear whole-tissue registration algorithm ([Formula: see text]) and the regional version of this algorithm ([Formula: see text]). The proposed algorithm also outperforms the state-of-the-art nonlinear registration algorithm (original: [Formula: see text], regional: [Formula: see text]) for whole slide images and a recently proposed patch-based registration algorithm (patch size 256: [Formula: see text], patch size 512: [Formula: see text]) for medical images. CONCLUSION: Using a multi-scale attention mechanism leads to a more robust and accurate solution to the problem of regional registration of whole slide images corrupted in some parts by major histological artifacts in the imaged tissue.
Subjects
Algorithms, Artifacts, Blood Vessels/pathology, Computer-Assisted Image Processing/methods, Three-Dimensional Imaging/methods, Blood Vessels/diagnostic imaging, Renal Cell Carcinoma/blood supply, Humans, Immunohistochemistry/methods, Microscopy
ABSTRACT
Transformer-based methods effectively capture global dependencies in images, demonstrating outstanding performance in multiple visual tasks. However, existing Transformers cannot effectively denoise large noisy images captured under low-light conditions because (1) the global self-attention mechanism incurs high computational complexity in the spatial dimension, as computation grows quadratically with the number of tokens; and (2) channel-wise self-attention cannot optimise the spatial correlations in images. We propose a local-global interaction Transformer (LGIT) that employs an adaptive strategy to select relevant patches for global interaction, achieving low computational complexity in global self-attention computation. A top-N patch cross-attention module (TPCA) is designed based on superpixel segmentation guidance. TPCA selects the top-N patches most similar to the target image patch and applies cross-attention to aggregate information from them into the target patch, effectively enhancing the utilisation of the image's nonlocal self-similarity. A mixed-scale dual-gated feedforward network (MDGFF) is introduced for the effective extraction of multi-scale local correlations. TPCA and MDGFF are combined to construct a hierarchical encoder-decoder network, LGIT, that computes self-attention within and across patches at different scales. Extensive experiments on real-world image-denoising datasets demonstrate that LGIT outperforms state-of-the-art (SOTA) convolutional neural network (CNN) and Transformer-based methods in both qualitative and quantitative results.
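A minimal sketch of the TPCA idea follows: score candidate patch embeddings against the target, keep the top-N, and aggregate them into the target via cross-attention. The superpixel guidance and multi-head details are omitted; the single-head layout and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TopNPatchCrossAttention(nn.Module):
    """Select the top-N patches most similar to the target patch and aggregate
    them into it with cross-attention (nonlocal self-similarity)."""
    def __init__(self, dim, top_n=8):
        super().__init__()
        self.top_n = top_n
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)

    def forward(self, target, candidates):      # target: (B, d); candidates: (B, M, d)
        sim = torch.einsum('bd,bmd->bm', target, candidates)
        idx = sim.topk(self.top_n, dim=1).indices
        picked = torch.gather(candidates, 1,
                              idx.unsqueeze(-1).expand(-1, -1, candidates.size(-1)))
        q = self.q(target).unsqueeze(1)                         # (B, 1, d)
        k, v = self.kv(picked).chunk(2, dim=-1)                 # (B, N, d) each
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        return target + (attn @ v).squeeze(1)                   # residual aggregation

tpca = TopNPatchCrossAttention(dim=64, top_n=4)
print(tpca(torch.randn(2, 64), torch.randn(2, 100, 64)).shape)  # torch.Size([2, 64])
```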
ABSTRACT
Automatic segmentation of breast terminal duct lobular units (TDLUs) on histopathological whole-slide images (WSIs) is crucial for the quantitative evaluation of TDLUs in the diagnostic and prognostic analysis of breast cancer. However, TDLU segmentation remains a great challenge due to the highly heterogeneous sizes, structures, and morphologies of TDLUs as well as the small areas they occupy on WSIs. In this study, we propose BreasTDLUSeg, an efficient coarse-to-fine two-stage framework based on multi-scale attention, to achieve localization and precise segmentation of TDLUs on hematoxylin and eosin (H&E)-stained WSIs. BreasTDLUSeg consists of two networks: a superpatch-based patch-level classification network (SPPC-Net) and a patch-based pixel-level segmentation network (PPS-Net). SPPC-Net takes a superpatch as input and adopts a sub-region classification head to classify each patch within the superpatch as TDLU-positive or -negative. PPS-Net takes the TDLU-positive patches derived from SPPC-Net as input and deploys a multi-scale CNN-Transformer encoder to learn enhanced multi-scale morphological representations, together with an upsampler to generate pixel-wise segmentation masks for the TDLU-positive patches. We also constructed two breast cancer TDLU datasets containing a total of 530 superpatch images with patch-level annotations and 2322 patch images with pixel-level annotations to enable the development of TDLU segmentation methods. Experiments on the two datasets demonstrate that BreasTDLUSeg outperforms other state-of-the-art methods, with the highest Dice similarity coefficients of 79.97% and 92.93%, respectively. The proposed method shows great potential to assist pathologists in the pathological analysis of breast cancer. An open-source implementation of our approach is available at https://github.com/Dian-kai/BreasTDLUSeg.
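The coarse-to-fine control flow reduces to a simple pipeline: a patch-level classifier screens for TDLU-positive patches, and only those reach the pixel-level segmenter. The sketch below uses placeholder networks and a 0.5 threshold as assumptions for the SPPC-Net and PPS-Net roles.

```python
import torch

def coarse_to_fine_segment(patches, classifier, segmenter, thr=0.5):
    """patches: (P, 3, H, W) patches of one superpatch;
    classifier: (P, 3, H, W) -> (P,) TDLU probabilities (SPPC-Net role);
    segmenter: (K, 3, H, W) -> (K, 1, H, W) masks (PPS-Net role)."""
    with torch.no_grad():
        probs = classifier(patches)
        positive = probs > thr
        masks = torch.zeros(patches.size(0), 1, patches.size(2), patches.size(3))
        if positive.any():                       # segment only TDLU-positive patches
            masks[positive] = segmenter(patches[positive])
    return masks

patches = torch.randn(16, 3, 128, 128)
cls = lambda x: torch.rand(x.size(0))            # placeholder classifier
seg = lambda x: torch.sigmoid(torch.randn(x.size(0), 1, x.size(2), x.size(3)))
print(coarse_to_fine_segment(patches, cls, seg).shape)  # torch.Size([16, 1, 128, 128])
```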
ABSTRACT
Breast cancer is a leading cause of morbidity and death among women worldwide. Compared with other cancers, early detection of breast cancer is more helpful for improving patient prognosis, and achieving early diagnosis and treatment in the clinic requires rapid and accurate diagnosis. The development of an automatic breast cancer detection system suitable for patient imaging is therefore of great significance for assisting clinical treatment, and accurate classification of pathological images plays a key role in computer-aided medical diagnosis and prognosis. However, in automatic recognition and classification methods for breast cancer pathological images, scale variation, the loss of image information caused by insufficient feature fusion, and oversized model structures may lead to inaccurate or inefficient classification. To minimize these effects, we proposed a lightweight PCSAM-ResCBAM model based on a two-stage convolutional neural network. The model comprises a Parallel Convolution Scale Attention Module network (PCSAM-Net) and a Residual Convolutional Block Attention Module network (ResCBAM-Net). The first-stage convolutional network is built from a four-layer PCSAM module to predict and classify patches extracted from images. To optimize the network's ability to represent global image features, we proposed a tiled feature fusion method to fuse patch features from the same image and proposed a residual convolutional attention module; on this basis, the second-stage convolutional network is constructed to achieve predictive classification of whole images. We evaluated the performance of our proposed model on the ICIAR2018 and BreakHis datasets. Furthermore, through model ablation studies, we found that scale attention and dilated convolution play important roles in improving model performance. Our proposed model outperforms existing state-of-the-art models on the 200× and 400× magnification datasets with a maximum accuracy of 98.74%.
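Tiled feature fusion, as described, can be read as laying the patch-level feature vectors of one image back out on the patch grid so that the second-stage network sees a coarse "feature image"; a minimal sketch under that reading follows, with the grid layout and row-major ordering assumed.

```python
import torch

def tile_patch_features(patch_feats, grid_h, grid_w):
    """patch_feats: (grid_h * grid_w, C) patch features of one image, in
    row-major patch order -> (1, C, grid_h, grid_w) for the second stage."""
    c = patch_feats.size(1)
    return patch_feats.reshape(grid_h, grid_w, c).permute(2, 0, 1).unsqueeze(0)

feats = torch.randn(12 * 9, 256)                 # 108 patch features from one image
print(tile_patch_features(feats, 12, 9).shape)   # torch.Size([1, 256, 12, 9])
```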
ABSTRACT
BACKGROUND AND OBJECTIVE: Alzheimer's disease (AD) is a dreaded degenerative disease that results in a profound decline in human cognition and memory. Owing to its intricate pathogenesis and the lack of effective therapeutic interventions, early diagnosis plays a paramount role in AD. Recent neuroimaging research has shown that deep learning methods applied to multimodal neuroimages can effectively detect AD. However, these methods only concatenate and fuse the high-level features extracted from different modalities, ignoring the fusion and interaction of low-level features across modalities, which leads to unsatisfactory classification performance. METHOD: In this paper, we propose a novel multi-scale attention and cross-enhanced fusion network, MACFNet, which enables the interaction of multi-stage low-level features between inputs to learn shared feature representations. We first construct a novel Cross-Enhanced Fusion Module (CEFM), which fuses low-level features from different modalities through a multi-stage cross structure. In addition, an Efficient Spatial Channel Attention (ECSA) module is proposed, which focuses more efficiently on important AD-related features in images and achieves feature enhancement across modalities through two-stage residual concatenation. Finally, we also propose a multi-scale attention guiding block (MSAG) based on dilated convolution, which obtains rich receptive fields without increasing model parameters or computation, effectively improving the efficiency of multi-scale feature extraction. RESULTS: Experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that MACFNet achieves better classification performance than existing multimodal methods, with classification accuracies of 99.59%, 98.85%, 99.61%, and 98.23% for AD vs. CN, AD vs. MCI, CN vs. MCI, and AD vs. CN vs. MCI, respectively; specificities of 98.92%, 97.07%, 99.58%, and 99.04%; and sensitivities of 99.91%, 99.89%, 99.63%, and 97.75%. CONCLUSIONS: The proposed MACFNet is a high-accuracy multimodal AD diagnostic framework. Through its cross mechanism and efficient attention, MACFNet makes full use of the low-level features of different modal medical images and effectively attends to the local and global information of the images. This work provides a valuable reference for multimodal AD diagnosis.
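A minimal sketch of one CEFM-style stage under a plausible reading follows: each modality's low-level stream is enhanced by a convolution of the other stream through a residual cross structure. The wiring, layer sizes, and activation are assumptions; the abstract does not specify the multi-stage details.

```python
import torch
import torch.nn as nn

class CrossEnhancedFusion(nn.Module):
    """Exchange low-level features between two modality streams (e.g. MRI and
    PET) so each stream is enhanced by the other before the next stage."""
    def __init__(self, channels):
        super().__init__()
        self.a_from_b = nn.Conv2d(channels, channels, 3, padding=1)
        self.b_from_a = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat_a, feat_b):                    # both: (B, C, H, W)
        a = feat_a + torch.relu(self.a_from_b(feat_b))    # stream A enhanced by B
        b = feat_b + torch.relu(self.b_from_a(feat_a))    # stream B enhanced by A
        return a, b

stage = CrossEnhancedFusion(32)
a, b = stage(torch.randn(1, 32, 48, 48), torch.randn(1, 32, 48, 48))
print(a.shape, b.shape)
```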
Subjects
Alzheimer Disease, Alzheimer Disease/diagnostic imaging, Humans, Neural Networks (Computer), Algorithms, Deep Learning, Neuroimaging/methods, Computer-Assisted Image Interpretation/methods, Computer-Assisted Image Processing/methods
ABSTRACT
BACKGROUND: Crop pests seriously affect the yield and quality of crops, and accurately and rapidly detecting and segmenting insect pests on crop leaves is a premise for effectively controlling them. METHODS: Aiming at the problem of detecting irregular, multi-scale insect pests in the field, a dilated multi-scale attention U-Net (DMSAU-Net) model is constructed for crop insect pest detection. In its encoder, dilated Inception blocks are designed to replace the convolution layers of U-Net to extract multi-scale features of insect pest images; an attention module is added to its decoder to focus on the edges of insect pests. RESULTS: Experiments on the IP102 crop insect pest image dataset achieved a detection accuracy of 92.16% and an IoU of 91.2%, which are 3.3% and 1.5% higher than those of MSR-RCNN, respectively. CONCLUSION: The results indicate that the proposed method is effective as a new insect pest detection method. The dilated Inception design improves model accuracy, and the attention module reduces the noise generated by upsampling and accelerates model convergence. The proposed method can be applied to practical crop insect pest monitoring systems.
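A dilated Inception block of the kind described, parallel 3x3 branches with different dilation rates whose outputs are concatenated and projected, can be sketched as follows; the branch widths and dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedInception(nn.Module):
    """Parallel 3x3 branches with different dilation rates capture multi-scale
    pest features at the same spatial resolution (U-Net encoder replacement)."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d) for d in dilations])
        self.project = nn.Conv2d(branch_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        y = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.project(y)

block = DilatedInception(64, 128)
print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 128, 56, 56])
```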
ABSTRACT
In this study, a multi-scale attention transformer (MSAT) was coupled with hyperspectral imaging to classify peanut kernels contaminated with diverse Aspergillus flavus fungi. The results underscored that the MSAT significantly outperformed classic deep learning models thanks to its multi-scale attention mechanism, which employs several multi-head attention layers to focus on both fine-scale and broad-scale features, integrates a series of scale-processing layers to capture features at different resolutions, and incorporates a self-attention mechanism to integrate information across levels. The MSAT achieved outstanding performance across classification tasks, particularly in distinguishing healthy peanut kernels from those contaminated with aflatoxigenic fungi, with a test accuracy of 98.42±0.22%; however, it faced challenges in differentiating kernels contaminated with aflatoxigenic fungi from those with non-aflatoxigenic contamination. Visualization of the attention weights revealed that the model's multi-scale attention mechanism progressively refines its focus from broad spatial-spectral features to more specialized signatures. Overall, the MSAT's advanced processing capabilities mark a notable advancement in food quality and safety, offering a robust and reliable tool for the rapid and accurate detection of Aspergillus flavus contamination in food.
Subjects
Arachis, Aspergillus flavus, Food Contamination, Food Microbiology, Aspergillus flavus/isolation & purification, Arachis/microbiology, Food Contamination/analysis, Food Safety, Aflatoxins/analysis, Hyperspectral Imaging/methods
ABSTRACT
In response to the high breakage rate of pigeon eggs and the significant labor costs in egg-producing pigeon farming, this study proposes an improved YOLOv8-PG (real versus fake pigeon egg detection) model based on YOLOv8n. Specifically, the Bottleneck blocks in the C2f modules of the YOLOv8n backbone and neck networks are replaced with the Fasternet-EMA Block and the Fasternet Block, respectively. The Fasternet Block is designed around PConv (partial convolution) to reduce the model's parameter count and computational load efficiently, while the incorporated EMA (Efficient Multi-scale Attention) mechanism helps mitigate interference from complex environments with pigeon-egg feature extraction. Additionally, Dysample, an ultra-lightweight and effective upsampler, is introduced into the neck network to further enhance performance with low computational overhead. Finally, the EXPMA (exponential moving average) concept is employed to optimize SlideLoss and propose the EMASlideLoss classification loss function, addressing imbalanced data samples and enhancing the model's robustness. Experimental results show that the F1-score, mAP50-95, and mAP75 of YOLOv8-PG increased by 0.76%, 1.56%, and 4.45%, respectively, compared with the baseline YOLOv8n model, while the model's parameter count and computational load were reduced by 24.69% and 22.89%. Compared with detection models such as Faster R-CNN, YOLOv5s, YOLOv7, and YOLOv8s, YOLOv8-PG exhibits superior performance, and the reduction in parameters and computation lowers deployment costs and facilitates implementation on mobile robotic platforms.
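PConv is the published FasterNet building block: convolve only a fraction of the channels and pass the rest through untouched, cutting parameters and FLOPs. A minimal sketch follows; the 1/4 split ratio follows common FasterNet practice and is an assumption here.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: a 3x3 convolution on the first channels, identity on
    the remainder."""
    def __init__(self, channels, split_ratio=0.25):
        super().__init__()
        self.conv_ch = int(channels * split_ratio)
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, padding=1)

    def forward(self, x):                        # x: (B, C, H, W)
        xc, xid = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat([self.conv(xc), xid], dim=1)

pc = PConv(64)
print(pc(torch.randn(1, 64, 40, 40)).shape)      # torch.Size([1, 64, 40, 40])
```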
ABSTRACT
We propose SpemNet, a cotton pest and disease recognition method based on efficient multi-scale attention and stacking patch embedding. By introducing the SPE (stacking patch embedding) module and the EMA (efficient multi-scale attention) module, we address the difficulty of learning local features and the insufficient integration of multi-scale features in the traditional Vision Transformer, significantly improving the performance and efficiency of the model. We comprehensively validate SpemNet on the CottonInsect dataset, where it performs well on the cotton pest recognition task and excels in key metrics such as precision and F1 score. This study provides an efficient and reliable solution for cotton pest and disease identification, with both theoretical and applied significance.
ABSTRACT
Background: Whole slide image (WSI) analysis, driven by deep learning algorithms, has the potential to revolutionize tumor detection, classification, and treatment response prediction. However, challenges persist, such as limited model generalizability across cancer types, the labor-intensive nature of patch-level annotation, and the necessity of integrating multi-magnification information to attain a comprehensive understanding of pathological patterns. Methods: In response to these challenges, we introduce MAMILNet, a multi-scale attentional multi-instance learning framework for WSI analysis. The incorporation of attention mechanisms into MAMILNet contributes to its generalizability across diverse cancer types and prediction tasks. The model treats whole slides as "bags" and individual patches as "instances," thereby eliminating the requirement for intricate patch-level labeling and significantly reducing pathologists' manual workload. To enhance prediction accuracy, the model employs a multi-scale "consultation" strategy that aggregates test outcomes from various magnifications. Results: Our assessment of MAMILNet encompasses 1171 cases spanning a wide range of cancer types, showcasing its effectiveness on complex prediction tasks. For breast cancer tumor detection, the area under the curve (AUC) was 0.8872, with an accuracy of 0.8760; for lung cancer typing diagnosis, it achieved an AUC of 0.9551 and an accuracy of 0.9095; and for predicting drug therapy responses in ovarian cancer, it achieved an AUC of 0.7358 and an accuracy of 0.7341. Conclusion: These outcomes underscore the potential of MAMILNet to advance precision medicine and individualized treatment planning in oncology. By addressing challenges related to model generalization, annotation workload, and multi-magnification integration, MAMILNet shows promise in improving healthcare outcomes for cancer patients.
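Treating slides as bags and patches as instances is the classic attention-MIL setup; a compact sketch of attention pooling in the style of Ilse et al. (2018) follows. The hidden width and the absence of gating are assumptions, not MAMILNet's exact head.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Each patch ('instance') embedding receives a learned weight; the slide
    ('bag') representation is their weighted sum, so no patch labels are needed."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))

    def forward(self, instances):                # instances: (num_patches, dim)
        w = torch.softmax(self.attn(instances), dim=0)
        return (w * instances).sum(dim=0), w.squeeze(-1)

pool = AttentionMILPooling(dim=512)
bag, weights = pool(torch.randn(300, 512))       # 300 patches from one slide
print(bag.shape, weights.shape)                  # torch.Size([512]) torch.Size([300])
```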
ABSTRACT
The number of wheat spikes strongly influences wheat yield, and rapid, accurate detection of wheat spike numbers is of great significance for yield estimation and food security. Computer vision and machine learning have been widely studied as potential alternatives to human counting. However, models with high accuracy tend to be computationally intensive and time-consuming, while lightweight models tend to have lower precision. To address these concerns, YOLO-FastestV2 was selected as the base model for a comprehensive study and analysis of wheat spike detection. We constructed a wheat target detection dataset comprising 11,451 images and 496,974 bounding boxes, built from the Global Wheat Detection Dataset and the Wheat Sheaf Detection Dataset published by PP Flying Paddle. We selected three attention mechanisms, Large Separable Kernel Attention (LSKA), Efficient Channel Attention (ECA), and Efficient Multi-Scale Attention (EMA), to enhance the feature extraction capability of the backbone network and improve the accuracy of the base model. First, each attention mechanism was added after the base and output stages of the backbone network; then, the mechanisms that further improved model accuracy at each stage were selected to construct a model with attention added at both stages. In addition, we constructed SimLightFPN to improve model accuracy by introducing SimConv into the LightFPN module. The results showed that the YOLO-FastestV2-SimLightFPN-ECA-EMA hybrid model, which incorporates ECA attention in the base stage and EMA attention together with the SimLightFPN module in the output stage, had the best overall performance, with P = 83.91%, R = 78.35%, AP = 81.52%, and F1 = 81.03%, ranking first in the GPI (0.84) in the overall evaluation. This research examines the deployment of wheat spike detection and counting models on resource-constrained devices, delivering novel solutions for the evolution of agricultural automation and precision agriculture.
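ECA, one of the three attention mechanisms compared above, is a published module (Wang et al., CVPR 2020); a minimal sketch is shown below. The kernel size of 3 is a common default rather than a value taken from this study.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a 1D
    convolution across the channel descriptor, avoiding the dimensionality
    reduction used in SE blocks."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                            # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                       # (B, C) channel descriptor
        y = self.conv(y.unsqueeze(1)).squeeze(1)     # local cross-channel interaction
        return x * torch.sigmoid(y)[:, :, None, None]

eca = ECA()
print(eca(torch.randn(2, 64, 20, 20)).shape)         # torch.Size([2, 64, 20, 20])
```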
ABSTRACT
Medical image segmentation has important application value in modern medicine: it helps doctors accurately locate and analyze tissue structures, lesion areas, and organ boundaries in images, providing key information for clinical diagnosis and treatment. However, segmentation accuracy still leaves much room for improvement, so in this paper we propose DAU-Net, a medical image segmentation network combining the Hadamard product and a dual-scale attention gate. First, the Hadamard product is introduced into the fifth layer of the encoder-decoder structure for element-by-element multiplication, which generates feature representations with more representational capability. Second, in the skip-connection module, we propose a dual-scale attention gate (DSAG) that highlights the more valuable features and achieves more efficient skip connections. Finally, in the decoder, the final segmentation result is obtained by aggregating the feature information provided by each part, with decoding achieved by up-sampling operations. Through experiments on two public datasets, LUNA and ISIC2017, DAU-Net extracts feature information more efficiently with its different modules and achieves better segmentation results than classical segmentation models such as U-Net and U-Net++, verifying the effectiveness of the model.
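The DSAG wiring is not fully specified in the abstract; the sketch below extends the standard Attention U-Net gate to two gating scales as one plausible reading, with all channel sizes assumed.

```python
import torch
import torch.nn as nn

class DualScaleAttentionGate(nn.Module):
    """Gating signals from two decoder depths are combined with the encoder
    feature to produce a spatial mask highlighting valuable skip features."""
    def __init__(self, enc_ch, gate_ch, mid_ch=32):
        super().__init__()
        self.theta = nn.Conv2d(enc_ch, mid_ch, 1)
        self.phi1 = nn.Conv2d(gate_ch, mid_ch, 1)
        self.phi2 = nn.Conv2d(gate_ch, mid_ch, 1)
        self.psi = nn.Conv2d(mid_ch, 1, 1)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, enc, gate_same, gate_coarse):
        # enc: (B, E, H, W); gate_same: (B, G, H, W); gate_coarse: (B, G, H/2, W/2)
        g = self.phi1(gate_same) + self.up(self.phi2(gate_coarse))
        mask = torch.sigmoid(self.psi(torch.relu(self.theta(enc) + g)))
        return enc * mask                        # gated skip-connection feature

gate = DualScaleAttentionGate(enc_ch=64, gate_ch=128)
out = gate(torch.randn(1, 64, 32, 32), torch.randn(1, 128, 32, 32),
           torch.randn(1, 128, 16, 16))
print(out.shape)                                 # torch.Size([1, 64, 32, 32])
```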