ABSTRACT
With the explosive growth of 3D data, the urgency of utilizing zero-shot learning to facilitate data labeling becomes evident. Recently, methods transferring language or language-image pre-training models such as Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress on the 3D zero-shot classification task. These methods primarily focus on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, as it overlooks the recognition of 3D objects with open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, which focuses on recognizing 3D objects regardless of their orientation. First, we revisit current research on 3D zero-shot classification and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods on the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models suffer from poor performance, indicating substantial room for exploration in this new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes an ideal viewing angle for classifying these open-pose 3D objects. In particular, to make the validation more compelling and not limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain for further research. The code is available publicly at https://github.com/weiguangzhao/Diff-OP3D.
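The iterative angle refinement mechanism is only described at a high level above. As a purely illustrative sketch (not the authors' implementation), a coarse-to-fine search over a single rotation angle could look like the following, where `score_fn` is an assumed placeholder for a zero-shot classification confidence evaluated at a given viewing angle:

```python
def refine_angle(score_fn, lo=0.0, hi=360.0, iters=20):
    """Coarse-to-fine refinement of one rotation angle (degrees).

    score_fn : assumed placeholder returning a classification confidence
               for the object rendered at a given angle.
    Repeatedly evaluates a small grid of candidate angles, then shrinks
    the search interval around the best-scoring candidate.
    """
    best = lo
    for _ in range(iters):
        step = (hi - lo) / 8.0
        candidates = [lo + k * step for k in range(9)]
        best = max(candidates, key=score_fn)
        lo, hi = best - step, best + step  # zoom in around the best angle
    return best % 360.0
```

In practice the score would come from rendering the point cloud at the candidate angle and querying a CLIP- or diffusion-based classifier; the grid size and iteration count here are arbitrary choices.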
ABSTRACT
While Graph Neural Networks (GNNs) have achieved enormous success in multiple graph analytical tasks, modern variants mostly rely on the strong inductive bias of homophily. However, real-world networks typically exhibit both homophilic and heterophilic linking patterns, wherein adjacent nodes may share dissimilar attributes and distinct labels. Therefore, GNNs that smooth node proximity holistically may aggregate both task-relevant and irrelevant (even harmful) information, limiting their ability to generalize to heterophilic graphs and potentially causing non-robustness. In this work, we propose a novel Edge Splitting GNN (ES-GNN) framework to adaptively distinguish between graph edges either relevant or irrelevant to the learning task. This essentially transforms the original graph into two subgraphs with the same node set but complementary edge sets, dynamically. Given that, information propagation on these subgraphs and edge splitting are conducted alternately, thus disentangling the task-relevant and irrelevant features. Theoretically, we show that ES-GNN can be regarded as a solution to a disentangled graph denoising problem, which further illustrates our motivation and interprets the improved generalization beyond homophily. Extensive experiments over 11 benchmark datasets and 1 synthetic dataset not only demonstrate the effective performance of ES-GNN but also highlight its robustness to adversarial graphs and mitigation of the over-smoothing problem.
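The edge splitting itself is a learned component of ES-GNN. As a hedged stand-in, routing each edge into one of two complementary subgraphs over the same node set can be sketched with a placeholder relevance score; here cosine similarity of node features merely substitutes for the learned scorer:

```python
def split_edges(edges, features, threshold=0.5):
    """Route each edge into a 'relevant' or 'irrelevant' subgraph.

    edges     : list of (u, v) pairs
    features  : dict mapping node id -> list of floats
    threshold : assumed cutoff on a placeholder relevance score
    The two returned edge sets are complementary and share the node set.
    """
    def similarity(u, v):
        fu, fv = features[u], features[v]
        dot = sum(a * b for a, b in zip(fu, fv))
        nu = sum(a * a for a in fu) ** 0.5
        nv = sum(b * b for b in fv) ** 0.5
        return dot / (nu * nv) if nu and nv else 0.0

    relevant, irrelevant = [], []
    for u, v in edges:
        (relevant if similarity(u, v) >= threshold else irrelevant).append((u, v))
    return relevant, irrelevant
```

Message passing would then run separately on each edge set, and in the actual framework the scorer is updated alternately with propagation.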
ABSTRACT
BACKGROUND: Alzheimer's disease (AD) is a prevalent neurodegenerative disease with no effective treatment. Efficient and rapid detection plays a crucial role in mitigating and managing AD progression. Deep learning-assisted smartphone-based microfluidic paper analysis devices (µPADs) offer the advantages of low cost, good sensitivity, and rapid detection, providing a strategic pathway to large-scale disease screening in resource-limited areas. However, existing smartphone-based detection platforms usually rely on large devices or cloud servers for data transfer and processing. Additionally, implementing automated colorimetric enzyme-linked immunosorbent assay (c-ELISA) on µPADs can further facilitate the realization of smartphone µPAD platforms for efficient disease detection. RESULTS: This paper introduces a new deep learning-assisted offline smartphone platform for early AD screening, offering rapid disease detection in low-resource areas. The proposed platform features a simple mechanical rotating structure controlled by a smartphone, enabling fully automated c-ELISA on µPADs. Our platform successfully applied sandwich c-ELISA for detecting the β-amyloid peptide 1-42 (Aβ 1-42, a crucial AD biomarker) and demonstrated its efficacy on 38 artificial plasma samples (healthy: 19, unhealthy: 19, N = 6). Moreover, we employed the YOLOv5 deep learning model and achieved an impressive 97% accuracy on a dataset of 1824 images, which is 10.16% higher than the traditional curve-fitting analysis method. The trained YOLOv5 model was seamlessly integrated into the smartphone using NCNN (Tencent's Neural Network Inference Framework), enabling deep learning-assisted offline detection. A user-friendly smartphone application was developed to control the entire process, realizing a streamlined "samples in, answers out" approach.
SIGNIFICANCE: This deep learning-assisted, low-cost, user-friendly, highly stable, and rapid-response automated offline smartphone-based detection platform represents a notable advance in point-of-care testing (POCT). Moreover, our platform provides a feasible approach for efficient AD detection by examining the level of Aβ 1-42, particularly in areas with low resources and limited communication infrastructure.
Subjects
Alzheimer Disease; Amyloid beta-Peptides; Biomarkers; Enzyme-Linked Immunosorbent Assay; Paper; Smartphone; Alzheimer Disease/diagnosis; Alzheimer Disease/blood; Humans; Biomarkers/blood; Biomarkers/analysis; Amyloid beta-Peptides/analysis; Amyloid beta-Peptides/blood; Peptide Fragments/blood; Peptide Fragments/analysis; Lab-On-A-Chip Devices; Deep Learning; Automation; Microfluidic Analytical Techniques/instrumentation
ABSTRACT
Zero-shot learning (ZSL) refers to the design of predictive functions on new classes (unseen classes) of data that have never been seen during training. In a more practical scenario, generalized zero-shot learning (GZSL) requires predicting both seen and unseen classes accurately. In the absence of target samples, many GZSL models may overfit the training data and are inclined to predict individuals as categories seen in training. To alleviate this problem, we develop a parameter-wise adversarial training process that promotes robust recognition of seen classes, while designing a novel model perturbation mechanism at test time to ensure sufficient sensitivity to unseen classes. Concretely, adversarial perturbation is conducted on the model to obtain instance-specific parameters so that predictions can be biased towards unseen classes at test time. Meanwhile, the robust training encourages model robustness, leaving predictions for seen classes nearly unaffected. Moreover, perturbations in the parameter space, computed from multiple individuals simultaneously, can be used to avoid perturbations that are too extreme and would ruin the predictions. Comparison results on four benchmark ZSL data sets show the effective improvement that the proposed framework brings to zero-shot methods with learned metrics.
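The precise perturbation scheme is defined in the paper; the sketch below only illustrates the stated idea of computing one parameter-space perturbation from multiple individuals simultaneously, so that no single extreme direction dominates. The function name and the L2 budget `epsilon` are assumptions:

```python
def batch_parameter_perturbation(per_instance_grads, epsilon=0.1):
    """Average per-instance perturbation directions in parameter space and
    rescale the mean direction to an L2 budget of epsilon.

    per_instance_grads : list of gradient-like vectors (one per individual);
    averaging them damps directions that are extreme for a single instance.
    """
    n = len(per_instance_grads)
    dim = len(per_instance_grads[0])
    avg = [sum(g[i] for g in per_instance_grads) / n for i in range(dim)]
    norm = sum(a * a for a in avg) ** 0.5
    if norm == 0:
        return [0.0] * dim
    return [epsilon * a / norm for a in avg]
```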
ABSTRACT
While exogenous variables have a major impact on performance improvement in time series analysis, inter-series correlation and time dependence among them are rarely considered in present continuous-time methods. The dynamical systems underlying multivariate time series can be modeled with complex unknown partial differential equations (PDEs), which play a prominent role in many disciplines of science and engineering. In this article, we propose a continuous-time model for arbitrary-step prediction that learns an unknown PDE system from multivariate time series, whose governing equations are parameterized by self-attention and gated recurrent neural networks. The proposed model, the exogenous-guided PDE network (EgPDE-Net), takes account of the relationships among the exogenous variables and their effects on the target series. Importantly, the model can be reduced to a regularized ordinary differential equation (ODE) problem with specially designed regularization guidance, which makes the PDE problem tractable for numerical solution and feasible for predicting multiple future values of the target series at arbitrary time points. Extensive experiments demonstrate that our proposed model achieves competitive accuracy over strong baselines: on average, it outperforms the best baseline by reducing RMSE by 9.85% and MAE by 13.98% for arbitrary-step prediction.
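Once the PDE system is reduced to a regularized ODE, predicting the target series at arbitrary time points amounts to numerically integrating the learned dynamics up to the query time. A minimal fixed-step Euler integrator (a generic stand-in, not the solver used by EgPDE-Net) illustrates this:

```python
def euler_predict(f, y0, t0, t1, n_steps=1000):
    """Integrate dy/dt = f(t, y) from t0 to an arbitrary query time t1
    with fixed-step forward Euler, returning the predicted state at t1."""
    y, t = y0, t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        y = y + h * f(t, y)  # one explicit Euler step
        t += h
    return y
```

Because the integrator accepts any `t1`, prediction is not tied to a fixed sampling grid, which is the practical benefit of the continuous-time formulation.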
ABSTRACT
While adversarial training has been proven to be one of the most effective defense methods against adversarial attacks on deep neural networks, it suffers from over-fitting on training adversarial data and thus may not guarantee robust generalization. This may result from the fact that conventional adversarial training methods usually generate adversarial perturbations in a supervised way, so that the resulting adversarial examples are highly biased towards the decision boundary, leading to an inhomogeneous data distribution. To mitigate this limitation, we propose to generate adversarial examples from a perturbation-diversity perspective. Specifically, the generated perturbed samples are not only adversarial but also diverse, so as to certify robust generalization and significant robustness improvement through a homogeneous data distribution. We provide theoretical and empirical analysis, establishing a foundation to support the proposed method. As a major contribution, we prove that promoting perturbation diversity can lead to a better robust generalization bound. To verify our method's effectiveness, we conduct extensive experiments over different datasets (e.g., CIFAR-10, CIFAR-100, SVHN) with different adversarial attacks (e.g., PGD, CW). Experimental results show that our method outperforms other state-of-the-art methods (e.g., PGD and Feature Scattering) in robust generalization performance.
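The paper's diversity objective is not reproduced here; as an assumed proxy, the diversity of a set of perturbation directions can be quantified by their mean pairwise cosine distance, which is zero for identical directions and grows as directions spread out:

```python
def perturbation_diversity(perturbations):
    """Mean pairwise cosine distance (1 - cosine similarity) among
    perturbation vectors; higher values indicate more diverse directions."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = sum(a * a for a in u) ** 0.5
        nv = sum(b * b for b in v) ** 0.5
        return dot / (nu * nv) if nu and nv else 0.0

    n = len(perturbations)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(1.0 - cos(perturbations[i], perturbations[j])
               for i, j in pairs) / len(pairs)
```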
Subjects
Generalization, Psychological; Neural Networks, Computer
ABSTRACT
Zero-shot learning (ZSL) aims to identify unseen classes with zero samples during training. Broadly speaking, present ZSL methods usually adopt class-level semantic labels and compare them with instance-level semantic predictions to infer unseen classes. However, we find that such existing models mostly produce imbalanced semantic predictions, i.e., these models could perform precisely for some semantics but not for others. To address this drawback, we aim to introduce an imbalanced learning framework into ZSL. However, we find that imbalanced ZSL has two unique challenges: (1) its imbalanced predictions are highly correlated with the value of the semantic labels rather than the number of samples, as typically considered in traditional imbalanced learning; (2) different semantics follow quite different error distributions between classes. To mitigate these issues, we first formalize ZSL as an imbalanced regression problem, which offers empirical evidence to interpret how semantic labels lead to imbalanced semantic predictions. We then propose a re-weighted loss termed Re-balanced Mean-Squared Error (ReMSE), which tracks the mean and variance of error distributions, thus ensuring rebalanced learning across classes. As a major contribution, we conduct a series of analyses showing that ReMSE is theoretically well established. Extensive experiments demonstrate that the proposed method effectively alleviates the imbalance in semantic prediction and outperforms many state-of-the-art ZSL methods.
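The exact ReMSE formulation is given in the paper; this hedged sketch only captures the core idea of re-weighting per-semantic squared errors by their batch error statistics, so that poorly predicted semantics receive proportionally larger weight (here weights are normalized to mean 1, which is an assumption):

```python
def remse(preds, targets, eps=1e-8):
    """Re-balanced MSE sketch: each attribute's squared error is weighted
    by its share of the total batch error, up-weighting semantics that the
    model currently predicts poorly."""
    n_attr = len(preds[0])
    # per-attribute mean squared error over the batch
    mse = [sum((p[j] - t[j]) ** 2 for p, t in zip(preds, targets)) / len(preds)
           for j in range(n_attr)]
    total = sum(mse)
    weights = [n_attr * m / (total + eps) for m in mse]  # mean weight ~ 1
    return sum(w * m for w, m in zip(weights, mse)) / n_attr
```

When errors are uniform across attributes this reduces to plain MSE; when they are imbalanced, the loss exceeds plain MSE, penalizing the imbalance.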
ABSTRACT
With the growing number of biomaterials and printing technologies, bioprinting has brought about tremendous potential to fabricate biomimetic architectures or living tissue constructs. To make bioprinting and bioprinted constructs more powerful, machine learning (ML) is introduced to optimize the relevant processes, applied materials, and mechanical/biological performances. The objectives of this work were to collate, analyze, categorize, and summarize published articles and papers pertaining to ML applications in bioprinting and their impact on bioprinted constructs, as well as the directions of potential development. From the available references, both traditional ML and deep learning (DL) have been applied to optimize the printing process, structural parameters, material properties, and biological/mechanical performance of bioprinted constructs. The former uses features extracted from image or numerical data as inputs in prediction model building, and the latter uses the image directly for segmentation or classification model building. All of these studies present advanced bioprinting with a stable and reliable printing process, desirable fiber/droplet diameter, and precise layer stacking, and also enhance the bioprinted constructs with better design and cell performance. The current challenges and outlooks in developing process-material-performance models are highlighted, which may pave the way for revolutionizing bioprinting technologies and bioprinted construct design.
ABSTRACT
Unsupervised cross-modality medical image adaptation aims to alleviate the severe domain gap between different imaging modalities without using target domain labels. A key step in this endeavor is aligning the distributions of the source and target domains. One common attempt is to enforce global alignment between the two domains, which, however, ignores the fatal local-imbalance domain gap problem, i.e., some local features with a larger domain gap are harder to transfer. Recently, some methods conduct alignment focusing on local regions to improve the efficiency of model learning. However, this operation may sacrifice critical contextual information. To tackle this limitation, we propose a novel strategy to alleviate the domain gap imbalance considering the characteristics of medical images, namely Global-Local Union Alignment. Specifically, a feature-disentanglement style-transfer module first synthesizes target-like source images to reduce the global domain gap. Then, a local feature mask is integrated to reduce the 'inter-gap' for local features by prioritizing those discriminative features with a larger domain gap. This combination of global and local alignment can precisely localize crucial regions in the segmentation target while preserving overall semantic consistency. We conduct a series of experiments on two cross-modality adaptation tasks, i.e., cardiac substructure and abdominal multi-organ segmentation. Experimental results indicate that our method achieves state-of-the-art performance in both tasks.
Subjects
Heart; Semantics; Humans; Image Processing, Computer-Assisted
ABSTRACT
In this paper, we develop a novel transformer-based generative adversarial neural network called U-Transformer for generalized image outpainting problems. Different from most present image outpainting methods, which conduct horizontal extrapolation, our generalized image outpainting can extrapolate visual context on all sides of a given image with plausible structure and details, even for complicated scenery, building, and art images. Specifically, we design the generator as an encoder-to-decoder structure embedded with the popular Swin Transformer blocks. As such, our novel neural network can better cope with image long-range dependencies, which are crucially important for generalized image outpainting. We additionally propose a U-shaped structure and a multi-view Temporal Spatial Predictor (TSP) module to reinforce image self-reconstruction as well as unknown-part prediction smoothly and realistically. By adjusting the prediction step of the TSP module at the testing stage, we can generate outpainting of arbitrary size given the input sub-image. We experimentally demonstrate that our proposed method produces visually appealing results for generalized image outpainting compared with state-of-the-art image outpainting approaches.
Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer
ABSTRACT
The smartphone has long been considered an excellent platform for disease screening and diagnosis, especially when combined with microfluidic paper-based analytical devices (µPADs) that feature low cost, ease of use, and pump-free operation. In this paper, we report a deep learning-assisted smartphone platform for ultra-accurate testing of paper-based microfluidic colorimetric enzyme-linked immunosorbent assay (c-ELISA). Different from existing smartphone-based µPAD platforms, whose sensing reliability suffers from uncontrolled ambient lighting conditions, our platform is able to eliminate such random lighting influences for enhanced sensing accuracy. We first constructed a dataset containing c-ELISA results (n = 2048) of rabbit IgG as the model target on µPADs under eight controlled lighting conditions. These images were then used to train four different mainstream deep learning algorithms. By training on these images, the deep learning algorithms can effectively eliminate the influence of lighting conditions. Among them, the GoogLeNet algorithm gives the highest accuracy (>97%) in quantitative rabbit IgG concentration classification/prediction, and also provides a 4% higher area under the curve (AUC) than the traditional curve-fitting analysis method. In addition, we fully automate the whole sensing process and achieve "image in, answer out" operation to maximize the convenience of the smartphone. A simple and user-friendly smartphone application has been developed to control the whole process. This newly developed platform further enhances the sensing performance of µPADs for use by laypersons in low-resource areas and can be readily adapted to the detection of real disease protein biomarkers by c-ELISA on µPADs.
Subjects
Deep Learning; Microfluidic Analytical Techniques; Smartphone; Colorimetry; Reproducibility of Results; Enzyme-Linked Immunosorbent Assay; Immunoglobulin G; Paper
ABSTRACT
AdaBelief, one of the current best optimizers, demonstrates superior generalization ability over the popular Adam algorithm by viewing the exponential moving average of observed gradients as a prediction of the next gradient and adapting step sizes accordingly. AdaBelief is theoretically appealing in that it has a data-dependent O(√T) regret bound when objective functions are convex, where T is the time horizon. It remains, however, an open problem whether the convergence rate can be further improved without sacrificing generalization ability. To this end, we make the first attempt in this work and design a novel optimization algorithm called FastAdaBelief that aims to exploit strong convexity in order to achieve an even faster convergence rate. In particular, by adjusting the step size to better account for strong convexity and prevent fluctuation, our proposed FastAdaBelief demonstrates excellent generalization ability and superior convergence. As an important theoretical contribution, we prove that FastAdaBelief attains a data-dependent O(log T) regret bound, which is substantially lower than that of AdaBelief in strongly convex cases. On the empirical side, we validate our theoretical analysis with extensive experiments in both strongly convex and nonconvex scenarios using three popular baseline models. The experimental results are very encouraging: FastAdaBelief converges the quickest in comparison with all mainstream algorithms while maintaining excellent generalization ability, in cases of both strong convexity and nonconvexity. FastAdaBelief is, thus, posited as a new benchmark model for the research community.
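For orientation, a generic AdaBelief-style scalar update (the base algorithm's update rule, not the FastAdaBelief step-size schedule) can be written in a few lines: the optimizer keeps an EMA of gradients as a prediction, and an EMA of the squared deviation from that prediction as its "belief"; a small deviation yields a larger effective step.

```python
import math

def adabelief_step(param, grad, m, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One AdaBelief-style update for a scalar parameter.

    m : EMA of gradients (the prediction of the next gradient)
    s : EMA of (grad - m)**2, the squared deviation from the prediction
    t : 1-based step index, used for bias correction
    """
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * (grad - m) ** 2 + eps
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    s_hat = s / (1 - b2 ** t)            # bias-corrected belief term
    param -= lr * m_hat / (math.sqrt(s_hat) + eps)
    return param, m, s
```

The default hyperparameters here mirror common Adam-family settings and are illustrative only.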
ABSTRACT
3D point clouds are gradually becoming more widely used in the medical field; however, they are rarely used for 3D representation of intracranial vessels and aneurysms due to time-consuming data reconstruction. In this paper, we simulate incomplete intracranial vessels (including aneurysms) as captured from different angles during actual collection, then propose the Multi-Scope Feature Extraction Network (MSENet) for intracranial aneurysm 3D point cloud completion. MSENet adopts a multi-scope feature extraction encoder to extract global features from the incomplete point cloud. This encoder utilizes different scopes to fully fuse the neighborhood information for each point. Then a folding-based decoder is applied to obtain the complete 3D shape. To enable the decoder to intuitively match the original geometric structure, we feed the original point coordinates through a residual link. Finally, we merge and sample the complete but coarse point cloud from the decoder to obtain the final refined complete 3D point cloud shape. We conduct extensive experiments on both a 3D intracranial aneurysm dataset and the general 3D vision PCN dataset. The results demonstrate the effectiveness of the proposed method on three evaluation metrics compared to the baseline: our model increases the F-score to 0.379 (+21.1%)/0.320 (+7.7%), reduces the Chamfer Distance to 0.998 (-33.8%)/0.974 (-6.4%), and reduces the Earth Mover's Distance to 2.750 (-17.8%)/2.858 (-0.8%).
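The Chamfer Distance reported above measures the average nearest-neighbor discrepancy between two point sets. A brute-force reference implementation of the common squared-distance variant follows (conventions and normalizations differ between papers, so treat this exact form as an assumption):

```python
def chamfer_distance(p1, p2):
    """Symmetric Chamfer Distance between two point sets (squared-distance
    variant): the average nearest-neighbor squared distance in each
    direction, summed. Brute force O(len(p1) * len(p2)); points are
    tuples of coordinates of equal dimension."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    d12 = sum(min(sq_dist(a, b) for b in p2) for a in p1) / len(p1)
    d21 = sum(min(sq_dist(a, b) for a in p1) for b in p2) / len(p2)
    return d12 + d21
```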
Subjects
Intracranial Aneurysm; Humans
ABSTRACT
Recent advances in both lightweight deep learning algorithms and edge computing increasingly enable multiple model inference tasks to be conducted concurrently on resource-constrained edge devices, allowing us to achieve one goal collaboratively rather than pursuing high quality in each standalone task. However, the high overall running latency of multi-model inference always negatively affects real-time applications. To combat latency, the algorithms should be optimized to minimize the latency of multi-model deployment without compromising safety-critical requirements. This work focuses on real-time task scheduling strategies for multi-model deployment and investigates model inference using the Open Neural Network Exchange (ONNX) runtime engine. An application deployment strategy is then proposed based on container technology, and inference tasks are scheduled to different containers according to the scheduling strategies. Experimental results show that the proposed solution is able to significantly reduce the overall running latency in real-time applications.
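The abstract does not detail the scheduling strategy itself. As a hypothetical baseline, a longest-processing-time (LPT) greedy assignment of inference tasks to containers keeps the overall makespan (the latency of the slowest container) low; the task names and latency estimates below are illustrative:

```python
def schedule_tasks(task_latencies, n_containers):
    """LPT greedy scheduling: sort inference tasks by estimated latency
    (descending) and assign each to the currently least-loaded container.
    Returns the per-container assignment and the resulting makespan."""
    loads = [0.0] * n_containers
    assignment = [[] for _ in range(n_containers)]
    for name, cost in sorted(task_latencies.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))  # least-loaded container
        loads[i] += cost
        assignment[i].append(name)
    return assignment, max(loads)
```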
Subjects
Neural Networks, Computer; Running; Algorithms
ABSTRACT
Graph convolutional networks (GCNs) have emerged as the most successful learning models for graph-structured data. Despite their success, existing GCNs usually ignore the entangled latent factors typically arising in real-world graphs, which results in nonexplainable node representations. Even worse, while the emphasis has been placed on local graph information, the global knowledge of the entire graph is lost to a certain extent. In this work, to address these issues, we propose a novel framework for GCNs, termed LGD-GCN, that takes advantage of both local and global information for disentangling node representations in the latent space. Specifically, we propose to represent a disentangled latent continuous space with a statistical mixture model by leveraging a neighborhood routing mechanism locally. From the latent space, various new graphs can then be disentangled and learned, to overall reflect the hidden structures with respect to different factors. On the one hand, a novel regularizer is designed to encourage inter-factor diversity for model expressivity in the latent space. On the other hand, the factor-specific information is encoded globally via message passing along these new graphs, in order to strengthen intra-factor consistency. Extensive evaluations on both synthetic and five benchmark datasets show that LGD-GCN brings significant performance gains over recent competitive models in both disentangling and node classification. In particular, LGD-GCN outperforms the disentangled state-of-the-art models by an average of 7.4% on social network datasets.
ABSTRACT
We consider the problem of volumetric (3D) unsupervised domain adaptation (UDA) in cross-modality medical image segmentation, aiming to perform segmentation on the unannotated target domain (e.g., MRI) with the help of the labeled source domain (e.g., CT). Previous UDA methods in medical image analysis usually suffer from two challenges: 1) they focus on processing and analyzing data at the 2D level only, thus missing semantic information at the depth level; 2) one-to-one mapping is adopted during the style-transfer process, leading to insufficient alignment in the target domain. Different from existing methods, in our work we conduct a first-of-its-kind investigation of multi-style image translation for complete image alignment to alleviate the domain shift problem, and also introduce 3D segmentation in domain adaptation tasks to maintain semantic consistency at the depth level. In particular, we develop an unsupervised domain adaptation framework incorporating a novel quartet self-attention module to efficiently enhance relationships between widely separated features in spatial regions of a higher dimension, leading to a substantial improvement in segmentation accuracy in the unlabeled target domain. On two challenging cross-modality tasks, specifically brain structure and multi-organ abdominal segmentation, our model is shown to outperform current state-of-the-art methods by a significant margin, demonstrating its potential as a benchmark resource for the biomedical and health informatics research community.
Subjects
Abdomen; Magnetic Resonance Imaging; Brain/diagnostic imaging; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging/methods
ABSTRACT
Fibrous scaffolds have been extensively used in three-dimensional (3D) cell culture systems to establish in vitro models in cell biology, tissue engineering, and drug screening. It is common practice to characterize cell behaviors on such scaffolds using confocal laser scanning microscopy (CLSM). As a noninvasive technology, CLSM images can be utilized to describe cell-scaffold interaction under varied morphological features, biomaterial composition, and internal structure. Unfortunately, such information has not been fully translated and delivered to researchers due to the lack of effective cell segmentation methods. We herein developed an end-to-end model called Aligned Disentangled Generative Adversarial Network (AD-GAN) for 3D unsupervised nuclei segmentation of CLSM images. AD-GAN utilizes representation disentanglement to separate content representation (the underlying nuclei spatial structure) from style representation (the rendering of the structure) and aligns the disentangled content in the latent space. CLSM images collected from A549, 3T3, and HeLa cells cultured on fibrous scaffolds were utilized for the nuclei segmentation study. Compared with existing methods such as Squassh and CellProfiler, our AD-GAN can effectively and efficiently distinguish nuclei with preserved shape and location information. Building on such information, we can rapidly screen cell-scaffold interaction in terms of adhesion, migration, and proliferation, so as to improve scaffold design.
ABSTRACT
We propose a new regularization method for deep learning based on manifold adversarial training (MAT). Unlike previous regularization and adversarial training methods, MAT further considers the local manifold of latent representations. Specifically, MAT builds an adversarial framework based on how the worst perturbation affects the statistical manifold in the latent space rather than the output space. In particular, a latent feature space with a Gaussian Mixture Model (GMM) is first derived in a deep neural network. We then define smoothness as the largest variation of the Gaussian mixtures when a local perturbation is applied around an input data point. On the one hand, perturbations are added in the way that roughens the statistical manifold of the latent space the most. On the other hand, the model is trained to promote manifold smoothness the most in the latent space. Importantly, since the latent space is more informative than the output space, the proposed MAT can learn a more robust and compact data representation, leading to further performance improvement. The proposed MAT is also significant in that it can be considered a superset of one recently proposed discriminative feature learning approach called center loss. We conduct a series of experiments in both supervised and semi-supervised learning on four benchmark data sets, showing that the proposed MAT achieves remarkable performance, much better than the state-of-the-art approaches. In addition, we present a series of visualizations that offer further understanding of, and explanations for, adversarial examples.
Subjects
Supervised Machine Learning/standards; Benchmarking
ABSTRACT
Object detection has wide applications in intelligent systems and sensor applications. Compared with two-stage detectors, recent one-stage counterparts are capable of running more efficiently with comparable accuracy, satisfying the requirements of real-time processing. To further improve the accuracy of the one-stage Single Shot Detector (SSD), we propose a novel Multi-Path fusion Single Shot Detector (MPSSD). Different from other feature fusion methods, we exploit the connections among different-scale representations in a pyramid manner. We propose a feature fusion module to generate new feature pyramids based on the multiscale features in SSD, and these pyramids are sent to our pyramid aggregation module to generate the final features. The enhanced features carry both localization and semantic information, thus improving detection performance at little computation cost. A series of experiments on three benchmark datasets, PASCAL VOC2007, VOC2012, and MS COCO, demonstrates that our approach outperforms many state-of-the-art detectors both qualitatively and quantitatively. In particular, for input images of size 512 × 512, our method attains a mean Average Precision (mAP) of 81.8% on the VOC2007 test set, 80.3% on the VOC2012 test set, and 33.1% on COCO test-dev 2015.
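Detection benchmarks such as these score predictions against ground-truth boxes by intersection over union (IoU) when computing mAP; a standard reference implementation for axis-aligned boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold (0.5 for PASCAL VOC; COCO averages over thresholds from 0.5 to 0.95).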