ABSTRACT
Multi-modal eye disease screening improves diagnostic accuracy by providing lesion information from different sources. However, existing multi-modal automatic diagnosis methods tend to focus on the specificity of each modality and ignore the spatial correlation between images. This paper proposes a novel cross-modal retinal disease diagnosis network (CRD-Net) that mines correlated lesion features across modal images to aid the diagnosis of multiple retinal diseases. Specifically, our model introduces a cross-modal attention (CMA) module to query and adaptively attend to lesion-relevant features across the different modal images. In addition, we propose multiple loss functions to fuse modality-correlated features and train a multi-modal retinal image classification network for more accurate diagnosis. Experimental evaluation on three publicly available datasets shows that our CRD-Net outperforms existing single-modal and multi-modal methods, demonstrating its superior performance.
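The abstract does not describe the CMA module's internals; as an illustrative sketch only, not the authors' implementation, cross-modal attention can be realized by letting tokens from one modality query features from the other. The sketch below assumes PyTorch; the class name CrossModalAttention and the fundus/OCT roles are hypothetical.

```python
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-modal attention: one modality queries the other."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feat, context_feat):
        # query_feat:   (B, N_q, dim) tokens from one modality (e.g., fundus)
        # context_feat: (B, N_c, dim) tokens from the other modality (e.g., OCT)
        attended, _ = self.attn(query_feat, context_feat, context_feat)
        return self.norm(query_feat + attended)  # residual fusion of modalities
```

Applied in both directions, such a block would let each modality adaptively attend to lesion evidence in the other, which matches the querying behavior the abstract describes.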
ABSTRACT
Previous datasets have limited ability to generalize evapotranspiration (ET) across various land cover types, owing to the scarcity and spatial heterogeneity of observations and, as a deeper contributing factor, an incomplete understanding of the underlying physical mechanisms. To fill these gaps, we developed a global Highly Generalized Land (HG-Land) ET dataset at 0.5° spatial resolution with monthly values covering the satellite era (1982-2018). Our approach leverages the Deep Forest machine-learning algorithm, which generalizes well and mitigates overfitting because it requires little hyper-parameter tuning. Model explanations are further provided to enhance transparency and to gain new insights into the ET process. Validation at both the site and basin scales attests to the dataset's satisfactory accuracy, particularly in the Northern Hemisphere. Furthermore, we find that the primary driver of ET predictions varies across climatic regions. Overall, the HG-Land ET dataset, underpinned by the interpretability of the machine-learning model, emerges as a validated and generalized resource for scientific research and various applications.
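Deep Forest stacks tree ensembles in a cascade, feeding each layer's predictions back as extra input features, with few hyper-parameters to tune. The following is a minimal, hedged sketch of that cascade idea using scikit-learn regressors as stand-ins; it is not the authors' pipeline, and MiniCascadeForest and all settings are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.model_selection import cross_val_predict

class MiniCascadeForest:
    """Toy cascade-forest regressor illustrating the Deep Forest idea."""
    def __init__(self, n_layers=3):
        self.layers = [
            [RandomForestRegressor(n_estimators=100, random_state=i),
             ExtraTreesRegressor(n_estimators=100, random_state=i)]
            for i in range(n_layers)
        ]

    def fit(self, X, y):
        aug = X
        for layer in self.layers:
            # Out-of-fold predictions avoid leaking labels into the next layer.
            preds = [cross_val_predict(est, aug, y, cv=3) for est in layer]
            for est in layer:
                est.fit(aug, y)
            aug = np.column_stack([X] + preds)  # original features + layer votes
        return self

    def predict(self, X):
        aug = X
        for layer in self.layers:
            preds = [est.predict(aug) for est in layer]
            aug = np.column_stack([X] + preds)
        return np.mean(preds, axis=0)  # average the final layer's estimators
```

With gridded predictors flattened to samples, such a model could map meteorological and land-cover features to monthly ET, though the actual HG-Land configuration is not specified in the abstract.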
ABSTRACT
PURPOSE: Automatic surgical instrument segmentation is a crucial step for robot-assisted surgery. Encoder-decoder-based methods often fuse high-level and low-level features directly through skip connections to supplement detailed information. However, fusing irrelevant information also increases misclassification and mis-segmentation, especially in complex surgical scenes. Uneven illumination often makes instruments resemble background tissue, which greatly increases the difficulty of automatic surgical instrument segmentation. This paper proposes a novel network to address these problems. METHODS: We propose a context-guided bidirectional attention network (CGBA-Net) that guides feature selection for instrument segmentation. A guidance connection attention (GCA) module is inserted into the network to adaptively filter out irrelevant low-level features. Moreover, we propose a bidirectional attention (BA) module for the GCA module that captures both local information and local-global dependencies in surgical scenes, providing accurate instrument features. RESULTS: The superiority of our CGBA-Net is verified by multi-instrument segmentation on two publicly available datasets from different surgical scenarios: an endoscopic vision dataset (EndoVis 2018) and a cataract surgery dataset. Extensive experimental results demonstrate that our CGBA-Net outperforms state-of-the-art methods on both datasets. Ablation studies on these datasets verify the effectiveness of our modules. CONCLUSION: The proposed CGBA-Net improves the accuracy of multi-instrument segmentation, accurately classifying and segmenting instruments. The proposed modules effectively provide instrument-related features to the network.
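The GCA module's exact design is not given in the abstract; one common way to adaptively filter out irrelevant low-level features is to gate the skip connection with high-level context. The PyTorch sketch below is a hypothetical illustration of that gating idea, not the published module; GatedSkip and its channel arguments are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class GatedSkip(nn.Module):
    """Hypothetical gated skip: high-level context filters low-level features."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(high_ch, low_ch, kernel_size=1),
            nn.Sigmoid(),  # per-pixel, per-channel relevance in [0, 1]
        )

    def forward(self, low_feat, high_feat):
        # Resize high-level context to the low-level spatial resolution.
        ctx = F.interpolate(high_feat, size=low_feat.shape[2:],
                            mode="bilinear", align_corners=False)
        return low_feat * self.gate(ctx)  # suppress irrelevant low-level responses
```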
Subjects
Cataract Extraction, Ophthalmology, Robotic Surgical Procedures, Humans, Lighting, Surgical Instruments, Image Processing, Computer-Assisted
ABSTRACT
Retinal diseases are leading causes of temporary or permanent vision loss. Precise retinal disease grading is a prerequisite for early intervention and specific therapeutic schedules. Existing works based on Convolutional Neural Networks (CNNs) focus on typical locality structures and cannot capture long-range dependencies, yet retinal disease grading relies more on the relationship between local lesions and the whole retina, which is consistent with the self-attention mechanism. Therefore, this paper proposes a novel Structure-Oriented Transformer (SoT) framework to construct the relationship between lesions and the retina on clinical datasets. To reduce dependence on large amounts of data, we design structure guidance as a model-oriented filter that emphasizes the whole retinal structure and guides relation construction. We then adopt a pre-trained vision transformer that efficiently models the relationships among all feature patches via transfer learning. In addition, to make full use of all output tokens, a token-vote classifier is proposed to obtain the final grading result. We conduct extensive experiments on a clinical neovascular Age-related Macular Degeneration (nAMD) dataset. The experiments demonstrate the effectiveness of the SoT components and their improved relation construction between lesion and retina; the framework outperforms state-of-the-art methods for nAMD grading. Furthermore, we evaluate SoT on a publicly available retinal disease dataset, confirming its superior classification performance and good generality.
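The token-vote classifier is not specified beyond its name; a plausible reading is that every transformer output token, not only the [CLS] token, casts a class vote. A minimal sketch under that assumption follows (PyTorch; TokenVoteClassifier is an illustrative name, and mean-pooled logits are only one of several possible voting rules).

```python
import torch.nn as nn

class TokenVoteClassifier(nn.Module):
    """Hypothetical voting head: every output token contributes a vote."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        # tokens: (B, N, dim) all transformer output tokens, incl. [CLS]
        votes = self.head(tokens)  # (B, N, num_classes) per-token logits
        return votes.mean(dim=1)   # aggregate votes into final grading logits
```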