Results 1 - 20 of 47
1.
IEEE Trans Med Imaging ; PP, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38607704

ABSTRACT

Nuclei classification provides valuable information for histopathology image analysis. However, the large variations in the appearance of different nuclei types cause difficulties in identifying nuclei. Most neural-network-based methods are affected by the local receptive field of convolutions, and pay less attention to the spatial distribution of nuclei or the irregular contour shape of a nucleus. In this paper, we first propose a novel polygon-structure feature learning mechanism that transforms a nucleus contour into a sequence of points sampled in order, and employ a recurrent neural network that aggregates the sequential change in distance between key points to obtain learnable shape features. Next, we convert a histopathology image into a graph structure with nuclei as nodes, and build a graph neural network to embed the spatial distribution of nuclei into their representations. To capture the correlations between the categories of nuclei and their surrounding tissue patterns, we further introduce edge features that are defined as the background textures between adjacent nuclei. Lastly, we integrate both polygon and graph structure learning mechanisms into a whole framework that can extract intra- and inter-nucleus structural characteristics for nuclei classification. Experimental results show that the proposed framework achieves significant improvements compared to the previous methods. Code and data are made available via https://github.com/lhaof/SENC.
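The contour-to-sequence step described above can be sketched as follows. The resampling routine and the use of plain consecutive-point distances are illustrative assumptions, not the paper's exact implementation (which feeds such sequences to a recurrent network):

```python
import numpy as np

def sample_contour_points(contour, n_points=16):
    """Resample a closed contour (K x 2 array of vertices) to n_points ordered points."""
    closed = np.vstack([contour, contour[:1]])          # close the polygon
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative arc length
    targets = np.linspace(0.0, cum[-1], n_points, endpoint=False)
    xs = np.interp(targets, cum, closed[:, 0])          # interpolate over arc length
    ys = np.interp(targets, cum, closed[:, 1])
    return np.stack([xs, ys], axis=1)

def sequential_distance_features(points):
    """Distances between consecutive sampled points (wrapping around the contour)."""
    nxt = np.roll(points, -1, axis=0)
    return np.linalg.norm(nxt - points, axis=1)

# toy example: a unit-square "nucleus" contour
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
pts = sample_contour_points(square, n_points=8)
feats = sequential_distance_features(pts)   # one shape descriptor per key point
```

A recurrent model would then consume `feats` (or the raw point sequence) step by step to produce a learnable shape embedding.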

2.
Article in English | MEDLINE | ID: mdl-38277248

ABSTRACT

Federated learning (FL) makes it possible for multiple clients to collaboratively train a machine-learning model by communicating models instead of data, reducing privacy risk. Thus, FL is well suited to processing sensitive data for intelligent systems and applications. Unfortunately, there are several challenges in FL, such as low training accuracy on nonindependent and identically distributed (non-IID) data and the high cost of computation and communication. To address these challenges, we propose a novel FL framework named dynamic sparse federated contrastive learning (DSFedCon). DSFedCon combines FL with dynamic sparse (DSR) training of network pruning and contrastive learning to improve model performance and reduce computation and communication costs. We analyze DSFedCon from the perspectives of accuracy, communication, and security, demonstrating that it is communication-efficient and safe. To give a practical evaluation of non-IID data training, we perform experiments and comparisons on the MNIST, CIFAR-10, and CIFAR-100 datasets with different parameters of the Dirichlet distribution. Results indicate that DSFedCon achieves higher accuracy and lower communication cost than other state-of-the-art methods on these datasets. More precisely, we show that DSFedCon attains a 4.67-fold speedup of communication rounds on MNIST, a 7.5-fold speedup on CIFAR-10, and an 18.33-fold speedup on CIFAR-100 while achieving the same training accuracy.
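The dynamic-sparse ingredient can be illustrated with magnitude pruning, which is what lets each client keep (and communicate) far fewer nonzero weights. The 90% sparsity level and the function below are assumptions for illustration, not DSFedCon's actual pruning schedule:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights, keeping the rest."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))              # a dense layer's weights
pruned = magnitude_prune(w, sparsity=0.9)  # only ~10% of entries survive
```

In an FL round, only the surviving entries (plus their indices) would need to be sent to the server, which is the source of the communication savings.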

3.
Chin Med J (Engl) ; 137(6): 694-703, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-37640743

ABSTRACT

BACKGROUND: The goal of assisted reproductive treatment is to transfer one euploid blastocyst and help infertile women give birth to a healthy neonate. Several algorithms have been used to assess the ploidy status of embryos derived from couples with normal chromosomes who underwent preimplantation genetic testing for aneuploidy (PGT-A). However, it is currently unknown whether an artificial intelligence model can be used to assess the euploidy status of blastocysts derived from populations with chromosomal rearrangements. METHODS: From February 2020 to May 2021, we collected whole raw time-lapse videos at multiple focal planes from in vitro cultured embryos, the clinical information of the couples, and the comprehensive chromosome screening results of blastocysts that had received PGT treatment. Initially, we developed a novel deep learning model called the Attentive Multi-Focus Selection Network (AMSNet) to analyze time-lapse videos in real time and predict blastocyst formation. Building upon AMSNet, we integrated additional clinically predictive variables and created a second deep learning model, the Attentive Multi-Focus Video and Clinical Information Fusion Network (AMCFNet), to assess the euploidy status of embryos. The efficacy of AMCFNet was further tested in embryos with parental chromosomal rearrangements. The receiver operating characteristic (ROC) curve was used to evaluate model performance. RESULTS: A total of 4112 embryos with complete time-lapse videos were enrolled for the blastocyst formation prediction task, and 1422 qualified blastocysts that received PGT-A (n = 589) or PGT for chromosomal structural rearrangement (PGT-SR, n = 833) were enrolled for the euploidy assessment task. The AMSNet model using seven focal raw time-lapse videos had the best real-time accuracy. The real-time accuracy of AMSNet for predicting blastocyst formation reached above 70% on day 2 of embryo culture and increased to 80% by day 4. Combined with four clinical features of the couples, the AUC of AMCFNet with seven focal points increased to 0.729 in blastocysts derived from couples with chromosomal rearrangements. CONCLUSION: By integrating seven focal raw time-lapse videos of embryos and parental clinical information, the AMCFNet model is capable of assessing the euploidy status of blastocysts derived from couples with chromosomal rearrangements.


Subjects
Female Infertility, Preimplantation Diagnosis, Female, Newborn, Pregnancy, Humans, Preimplantation Diagnosis/methods, Artificial Intelligence, Chromosome Aberrations, Genetic Testing/methods, Aneuploidy, Retrospective Studies
4.
IEEE Trans Image Process ; 33: 439-450, 2024.
Article in English | MEDLINE | ID: mdl-38145544

ABSTRACT

Self-supervised depth estimation methods can achieve competitive performance using only unlabeled monocular videos, but they suffer from the uncertainty of jointly learning depth and pose without ground truth for either task. Supervised frameworks provide robust and superior performance but are limited by the scope of the labeled data. In this paper, we introduce SENSE, a novel learning paradigm for self-supervised monocular depth estimation that progressively evolves the prediction result using supervised learning, but without requiring labeled data. The key contribution of our approach stems from the novel use of pseudo labels - the noisy depth estimates from the self-supervised methods. We surprisingly find that a fully supervised depth estimation network trained using the pseudo labels can produce even better results than its "ground truth". To push the envelope further, we then evolve the self-supervised backbone by replacing its depth estimation branch with that fully supervised network. Based on this idea, we devise a comprehensive training pipeline that alternately enhances the two key branches (depth and pose estimation) of the self-supervised backbone network. Our proposed approach can effectively ease the difficulty of multi-task training in self-supervised depth estimation. Experimental results show that our proposed approach achieves state-of-the-art results on the KITTI dataset.
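The pseudo-label observation can be demonstrated on a toy problem: a supervised learner fit to noisy pseudo labels can end up closer to the truth than the labels themselves, because fitting averages out unstructured noise. The linear "network" and noise level below are illustrative stand-ins for the paper's depth networks:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
truth = 2.0 * x + 0.5                          # unknown ground-truth "depth"
pseudo = truth + rng.normal(0, 0.2, x.size)    # noisy self-supervised estimate

# supervised "network": here just a least-squares line fit to the pseudo labels
A = np.stack([x, np.ones_like(x)], axis=1)
coef, *_ = np.linalg.lstsq(A, pseudo, rcond=None)
fitted = A @ coef

err_pseudo = np.mean((pseudo - truth) ** 2)    # error of the pseudo labels
err_fitted = np.mean((fitted - truth) ** 2)    # error of the model trained on them
```

Here `err_fitted` comes out well below `err_pseudo`: the fitted predictor beats its own training signal, which is the effect SENSE exploits when it swaps the supervised depth branch back into the self-supervised backbone.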

5.
Med Image Anal ; 91: 103018, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37976867

ABSTRACT

Recently, masked autoencoders have demonstrated their feasibility in extracting effective image and text features (e.g., BERT for natural language processing (NLP) and MAE in computer vision (CV)). This study investigates the potential of applying these techniques to vision-and-language representation learning in the medical domain. To this end, we introduce a self-supervised learning paradigm, multi-modal masked autoencoders (M3AE). It learns to map medical images and texts to a joint space by reconstructing pixels and tokens from randomly masked images and texts. Specifically, we design this approach from three aspects: First, taking into account the varying information densities of vision and language, we employ distinct masking ratios for input images and text, with a notably higher masking ratio for images; Second, we utilize visual and textual features from different layers for reconstruction to address varying levels of abstraction in vision and language; Third, we develop different designs for vision and language decoders. We establish a medical vision-and-language benchmark to conduct an extensive evaluation. Our experimental results demonstrate the effectiveness of the proposed method, achieving state-of-the-art results on all downstream tasks. Further analyses validate the effectiveness of the various components and discuss the limitations of the proposed approach. The source code is available at https://github.com/zhjohnchan/M3AE.
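The first design point, modality-specific masking ratios, can be sketched as below. The abstract only says the image ratio is notably higher than the text ratio; the 75% / 15% values (echoing MAE and BERT conventions) are illustrative assumptions:

```python
import numpy as np

def random_mask(n_tokens, ratio, rng):
    """Return a boolean mask with ~ratio of the positions masked out."""
    n_masked = int(round(ratio * n_tokens))
    mask = np.zeros(n_tokens, dtype=bool)
    idx = rng.choice(n_tokens, size=n_masked, replace=False)
    mask[idx] = True
    return mask

rng = np.random.default_rng(0)
image_mask = random_mask(196, 0.75, rng)  # 14x14 patch grid, high ratio for images
text_mask = random_mask(40, 0.15, rng)    # token sequence, low ratio for text
```

The model would then reconstruct only the masked positions of each modality, with the visible positions of both modalities providing the joint context.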


Subjects
Benchmarking, Language, Humans, Software
6.
Article in English | MEDLINE | ID: mdl-37922186

ABSTRACT

Accurate inference of fine-grained traffic flow from coarse-grained observations is an emerging yet crucial problem, which can help greatly reduce the number of traffic monitoring sensors required for cost savings. In this work, we note that traffic flow has a high correlation with the road network, which was either completely ignored or simply treated as an external factor in previous works. To address this problem, we propose a novel road-aware traffic flow magnifier (RATFM) that explicitly exploits the prior knowledge of road networks to fully learn the road-aware spatial distribution of fine-grained traffic flow. Specifically, a multidirectional 1-D convolutional layer is first introduced to extract the semantic feature of the road network. Subsequently, we incorporate the road network feature and coarse-grained flow feature to regularize the short-range spatial distribution modeling of road-relative traffic flow. Furthermore, we take the road network feature as a query to capture the long-range spatial distribution of traffic flow with a transformer architecture. Benefiting from the road-aware inference mechanism, our method can generate high-quality fine-grained traffic flow maps. Extensive experiments on three real-world datasets show that the proposed RATFM outperforms state-of-the-art models under various scenarios. Our code and datasets are released at https://github.com/luimoli/RATFM.

7.
Article in English | MEDLINE | ID: mdl-37983159

ABSTRACT

Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.
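The idea behind IoU-guided re-weighting can be sketched as follows: samples whose predicted boxes overlap the ground truth poorly (hard samples) receive larger loss weights. The exact weighting function and the exponent are illustrative assumptions, not ECC-PolypDet's published formula:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def sample_weight(pred, gt, gamma=2.0):
    """Up-weight hard samples: the weight grows as the IoU falls."""
    return (1.0 - iou(pred, gt)) ** gamma + 1.0

easy = sample_weight((0, 0, 10, 10), (0, 0, 10, 10))  # perfect overlap -> weight 1
hard = sample_weight((0, 0, 2, 2), (8, 8, 10, 10))    # no overlap -> weight 2
```

During fine-tuning, each sample's detection loss would be multiplied by such a weight, focusing gradient updates on the poorly localized cases.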

8.
Int J Syst Evol Microbiol ; 73(10)2023 Oct.
Article in English | MEDLINE | ID: mdl-37888976

ABSTRACT

A Gram-stain-negative, facultative anaerobic, non-flagellated and oval-shaped (0.77-0.98 µm wide and 0.74-1.21 µm long) bacterial strain, designated XY-301T, was isolated from a marine invertebrate collected from the South China Sea. Strain XY-301T grew at 15-37 °C (optimum, 30-35 °C) and at pH 7.0-8.5 (optimum, pH 8.0). The strain was slightly halophilic and it only grew in the presence of 0.5-6.5 % (w/v) NaCl (optimum, 2.5-3.5 %). Its predominant fatty acid (>10 %) was C18:1 ω7c. The predominant polar lipids of XY-301T were diphosphatidylglycerol, phosphatidylethanolamine, phosphatidylglycerol, six unidentified aminolipids, three unidentified phospholipids and two unknown polar lipids. The respiratory quinone was Q-10. The genome of XY-301T was 4 979 779 bp in size, with a DNA G+C content of 61.3 mol%. The average nucleotide identity, digital DNA-DNA hybridization and average amino acid identity values between XY-301T and Pseudoprimorskyibacter insulae SSK3-2T were 73.3, 14.5 and 53.5 %, respectively. Based on the results of phylogenetic, phenotypic, chemotaxonomic and genomic analyses, strain XY-301T is considered to represent a novel species and a new genus of the family Roseobacteraceae, for which the name Pacificoceanicola onchidii gen. nov., sp. nov. is proposed. The type strain is XY-301T (=KCTC 72212T=MCCC 1K03614T).


Subjects
Fatty Acids, Ubiquinone, Animals, Fatty Acids/chemistry, Phylogeny, Ubiquinone/chemistry, DNA Sequence Analysis, Base Composition, Bacterial Typing Techniques, Bacterial DNA/genetics, 16S Ribosomal RNA/genetics, Phospholipids/chemistry, China, Invertebrates
9.
IEEE Trans Image Process ; 32: 5580-5594, 2023.
Article in English | MEDLINE | ID: mdl-37782617

ABSTRACT

Compared to unsupervised domain adaptation, semi-supervised domain adaptation (SSDA) aims to significantly improve the classification performance and generalization capability of the model by leveraging the presence of a small amount of labeled data from the target domain. Several SSDA approaches have been developed to enable semantic-aligned feature confusion between labeled (or pseudo labeled) samples across domains; nevertheless, owing to the scarcity of semantic label information in the target domain, they struggle to fully realize their potential. In this study, we propose a novel SSDA approach named Graph-based Adaptive Betweenness Clustering (G-ABC) for achieving categorical domain alignment, which enables cross-domain semantic alignment by mandating semantic transfer from labeled data of both the source and target domains to unlabeled target samples. In particular, a heterogeneous graph is initially constructed to reflect the pairwise relationships between labeled samples from both domains and unlabeled ones of the target domain. Then, to degrade the noisy connectivity in the graph, connectivity refinement is conducted by introducing two strategies, namely Confidence Uncertainty based Node Removal and Prediction Dissimilarity based Edge Pruning. Once the graph has been refined, Adaptive Betweenness Clustering is introduced to facilitate semantic transfer by using across-domain betweenness clustering and within-domain betweenness clustering, thereby propagating semantic label information from labeled samples across domains to unlabeled target data. Extensive experiments on three standard benchmark datasets, namely DomainNet, Office-Home, and Office-31, indicate that our method outperforms previous state-of-the-art SSDA approaches, demonstrating the superiority of the proposed G-ABC algorithm.
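The two connectivity-refinement strategies can be sketched loosely: drop nodes whose predictions are too uncertain (measured here by entropy), and prune edges whose endpoint predictions disagree. The thresholds and the specific uncertainty/dissimilarity measures are illustrative assumptions:

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def refine_graph(probs, edges, max_entropy=0.5, max_dissim=0.5):
    """probs: (N, C) softmax outputs per node; edges: list of (i, j) pairs."""
    keep_node = entropy(probs) <= max_entropy       # node-level filter
    refined = []
    for i, j in edges:
        if not (keep_node[i] and keep_node[j]):
            continue                                 # uncertainty-based node removal
        if np.abs(probs[i] - probs[j]).sum() / 2 > max_dissim:
            continue                                 # dissimilarity-based edge pruning
        refined.append((i, j))
    return refined

probs = np.array([[0.9, 0.1], [0.85, 0.15], [0.5, 0.5], [0.1, 0.9]])
edges = [(0, 1), (0, 2), (0, 3)]
kept = refine_graph(probs, edges)   # only the confident, agreeing pair survives
```

Semantic labels would then be propagated only along the surviving edges, which is what keeps noisy connectivity from corrupting the betweenness clustering step.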

10.
Bioinformatics ; 39(10)2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37740312

ABSTRACT

MOTIVATION: Proteins play crucial roles in biological processes, with their functions being closely tied to thermodynamic stability. However, measuring stability changes upon point mutations of amino acid residues using physical methods can be time-consuming. In recent years, several computational methods for protein thermodynamic stability prediction (PTSP) based on deep learning have emerged. Nevertheless, these approaches either overlook the natural topology of protein structures or neglect the inherently noisy samples resulting from theoretical calculation or experimental errors. RESULTS: We propose a novel Global-Local Graph Neural Network powered by Unbiased Curriculum Learning for the PTSP task. Our method first builds a Siamese graph neural network to extract protein features before and after mutation. Since the graph's topological changes stem from local node mutations, we design a local feature transformation module to make the model focus on the mutated site. To address model bias caused by noisy samples, which represent unavoidable errors from physical experiments, we introduce an unbiased curriculum learning method. This approach effectively identifies and re-weights noisy samples during the training process. Extensive experiments demonstrate that our proposed method outperforms advanced protein stability prediction methods, and surpasses state-of-the-art learning methods for regression prediction tasks. AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/haifangong/UCL-GLGNN.
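One hedged way to picture curriculum-style re-weighting of noisy samples: treat samples whose loss far exceeds the batch median as likely noisy and down-weight them. The exponential form and the temperature are illustrative assumptions, not the paper's exact unbiased-curriculum rule:

```python
import numpy as np

def curriculum_weights(losses, temperature=1.0):
    """Down-weight samples whose loss far exceeds the batch median."""
    gap = np.maximum(losses - np.median(losses), 0.0)  # excess over typical loss
    return np.exp(-gap / temperature)                  # weight decays with the gap

losses = np.array([0.1, 0.2, 0.15, 3.0])  # the last sample looks noisy
w = curriculum_weights(losses)            # near 1 for typical samples, small for outliers
```

The per-sample training loss would then be multiplied by `w`, so suspected-noisy measurements contribute little gradient while ordinary samples are left essentially untouched.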


Subjects
Amino Acids, Curriculum, Protein Stability, Neural Networks (Computer), Thermodynamics
11.
Article in English | MEDLINE | ID: mdl-37436859

ABSTRACT

Most existing methods that cope with noisy labels usually assume that the classwise data distributions are well balanced. They struggle with practical scenarios in which training samples have imbalanced distributions, since they are unable to differentiate noisy samples from the clean samples of tail classes. This article makes an early effort to tackle the image classification task in which the provided labels are noisy and have a long-tailed distribution. To deal with this problem, we propose a new learning paradigm which can screen out noisy samples by matching inferences on weak and strong data augmentations. A leave-noise-out regularization (LNOR) is further introduced to eliminate the effect of the recognized noisy samples. Besides, we propose a prediction penalty based on the online classwise confidence levels to avoid the bias toward easy classes which are dominated by head classes. Extensive experiments on five datasets including CIFAR-10, CIFAR-100, MNIST, FashionMNIST, and Clothing1M demonstrate that the proposed method outperforms the existing algorithms for learning with long-tailed distribution and label noise.
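The screening rule can be sketched minimally: a sample is treated as clean when the model's predictions under weak and strong augmentation agree. Using argmax agreement as the matching criterion is an illustrative choice; the paper's actual matching may be more elaborate:

```python
import numpy as np

def screen_noisy(weak_probs, strong_probs):
    """Return a boolean mask: True where the two augmented views agree."""
    return weak_probs.argmax(axis=1) == strong_probs.argmax(axis=1)

# per-sample class probabilities under weak vs. strong augmentation
weak = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
strong = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])
clean_mask = screen_noisy(weak, strong)   # disagreeing samples flagged as noisy
```

Samples flagged as noisy would then be excluded from the supervised loss (or handled by the leave-noise-out regularization) rather than trusted as labeled data.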

12.
IEEE J Biomed Health Inform ; 27(9): 4478-4488, 2023 09.
Article in English | MEDLINE | ID: mdl-37459259

ABSTRACT

Locating and stratifying the submucosal tumor of the digestive tract from endoscopic ultrasound (EUS) images are of vital significance to the preliminary diagnosis of tumors. However, the above problems are challenging, due to the poor appearance contrast between different layers of the digestive tract wall (DTW) and the narrowness of each layer. Few existing deep-learning-based diagnosis algorithms are devised to tackle this issue. In this article, we build a multi-task framework for simultaneously locating and stratifying the submucosal tumor. And considering that awareness of the DTW is critical to the localization and stratification of the tumor, we integrate the DTW segmentation task into the proposed multi-task framework. Besides sharing a common backbone model, the three tasks are explicitly directed with a hierarchical guidance module, in which the probability map of the DTW itself is used to locally enhance the feature representation for tumor localization, and the probability maps of the DTW and tumor are jointly employed to locally enhance the feature representation for tumor stratification. Moreover, by means of the dynamic class activation map, probability maps of the DTW and tumor are reused to enforce the stratification inference process to pay more attention to DTW and tumor regions, contributing to a reliable and interpretable submucosal tumor stratification model. Additionally, considering that the relation with respect to other structures is beneficial for stratifying tumors, we devise a graph reasoning module to replenish non-local relation knowledge for the stratification branch. Experiments on a Stomach-Esophagus and an Intestinal EUS dataset show that our method achieves very appealing performance on both tumor localization and stratification, significantly outperforming state-of-the-art object detection approaches.


Subjects
Stomach Neoplasms, Humans, Algorithms
13.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 11624-11641, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37289602

ABSTRACT

Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture event temporality, causality, and dynamics spanning the video. In this work, to address the task of event-level visual question answering, we propose a framework for cross-modal causal relational reasoning. In particular, a set of causal intervention operations is introduced to discover the underlying causal structures across visual and linguistic modalities. Our framework, named Cross-Modal Causal RelatIonal Reasoning (CMCIR), involves three modules: i) Causality-aware Visual-Linguistic Reasoning (CVLR) module for collaboratively disentangling the visual and linguistic spurious correlations via front-door and back-door causal interventions; ii) Spatial-Temporal Transformer (STT) module for capturing the fine-grained interactions between visual and linguistic semantics; iii) Visual-Linguistic Feature Fusion (VLFF) module for learning the global semantic-aware visual-linguistic representations adaptively. Extensive experiments on four event-level datasets demonstrate the superiority of our CMCIR in discovering visual-linguistic causal structures and achieving robust event-level visual question answering.

14.
Article in English | MEDLINE | ID: mdl-37028347

ABSTRACT

Due to the difficulty of collecting paired Low-Resolution (LR) and High-Resolution (HR) images, recent research on single image Super-Resolution (SR) has often been criticized for the data bottleneck of the synthetic image degradation between LRs and HRs. Recently, the emergence of real-world SR datasets, e.g., RealSR and DRealSR, promotes the exploration of Real-World image Super-Resolution (RWSR). RWSR exposes a more practical image degradation, which greatly challenges the learning capacity of deep neural networks to reconstruct high-quality images from low-quality images collected in realistic scenarios. In this paper, we explore Taylor series approximation in prevalent deep neural networks for image reconstruction, and propose a very general Taylor architecture to derive Taylor Neural Networks (TNNs) in a principled manner. Our TNN builds Taylor Modules with Taylor Skip Connections (TSCs) to approximate the feature projection functions, following the spirit of Taylor series. TSCs connect the input directly to each layer, sequentially producing high-order Taylor maps that attend to more image details, and then aggregate the high-order information from the different layers. Via simple skip connections alone, TNN is compatible with various existing neural networks to effectively learn high-order components of the input image with little increase in parameters. Furthermore, we have conducted extensive experiments to evaluate our TNNs in different backbones on two RWSR benchmarks, where they achieve superior performance compared with existing baseline methods.
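A very rough reading of a Taylor skip connection: each stage forms a higher-order interaction with the raw input before adding a residual, so stage k carries roughly k-th-order terms of the input. All details here (the multiplicative form, the scalar weight) are assumptions for illustration, not the published TSC design:

```python
import numpy as np

def taylor_stage(feature, raw_input, weight):
    """One stage: residual plus a term modulated by the directly-connected input."""
    return feature + weight * (feature * raw_input)

x = np.array([0.1, 0.2])     # raw input, skip-connected to every stage
f = x.copy()
orders = [f.copy()]
for k in range(3):
    f = taylor_stage(f, x, weight=1.0)   # each pass raises the polynomial order
    orders.append(f.copy())
# after 3 stages f equals x * (1 + x)**3: a cubic polynomial in the input
```

The point of the sketch is only that repeated input-modulated stages build polynomial (Taylor-like) terms of the input, which a plain additive skip connection cannot do.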

15.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8646-8659, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37018636

ABSTRACT

Given a natural language referring expression, the goal of the referring video segmentation task is to predict the segmentation mask of the referred object in the video. Previous methods only adopt 3D CNNs upon the video clip as a single encoder to extract a mixed spatio-temporal feature for the target frame. Though 3D convolutions are able to recognize which object is performing the described actions, they still introduce misaligned spatial information from adjacent frames, which inevitably confuses features of the target frame and leads to inaccurate segmentation. To tackle this issue, we propose a language-aware spatial-temporal collaboration framework that contains a 3D temporal encoder upon the video clip to recognize the described actions, and a 2D spatial encoder upon the target frame to provide undisturbed spatial features of the referred object. For multimodal feature extraction, we propose a Cross-Modal Adaptive Modulation (CMAM) module and its improved version CMAM+ to conduct adaptive cross-modal interaction in the encoders with spatial- or temporal-relevant language features, which are also updated progressively to enrich the linguistic global context. In addition, we propose a Language-Aware Semantic Propagation (LASP) module in the decoder to propagate semantic information from deep stages to the shallow stages with language-aware sampling and assignment, which is able to highlight language-compatible foreground visual features and suppress language-incompatible background visual features, better facilitating the spatial-temporal collaboration. Extensive experiments on four popular referring video segmentation benchmarks demonstrate the superiority of our method over the previous state-of-the-art methods.

16.
Comput Biol Med ; 155: 106389, 2023 03.
Article in English | MEDLINE | ID: mdl-36812810

ABSTRACT

Ultrasound segmentation of thyroid nodules is a challenging task, which plays a vital role in the diagnosis of thyroid cancer. However, the following two factors limit the development of automatic thyroid nodule segmentation algorithms: (1) existing automatic nodule segmentation algorithms that directly apply semantic segmentation techniques can easily mistake non-thyroid areas for nodules, because of the lack of thyroid gland region perception, the large number of similar areas in the ultrasonic images, and the inherently low contrast of the images; (2) the currently available dataset (i.e., DDTI) is small and collected from a single center, which is at odds with the fact that thyroid ultrasound images are acquired from various devices in real-world situations. To overcome the lack of thyroid gland region prior knowledge, we design a thyroid region prior guided feature enhancement network (TRFE+) for accurate thyroid nodule segmentation. Specifically, (1) a novel multi-task learning framework that simultaneously learns the nodule size, gland position, and the nodule position is designed; (2) an adaptive gland region feature enhancement module is proposed to make full use of the thyroid gland prior knowledge; (3) a normalization approach with respect to the channel dimension is applied to alleviate the domain gap during the training process. To facilitate the development of thyroid nodule segmentation, we have contributed TN3K: an open-access dataset containing 3493 thyroid nodule images with high-quality nodule masks labeled from various devices and views. We perform a thorough evaluation based on the TN3K test set and DDTI to demonstrate the effectiveness of the proposed method. Code and data are available at https://github.com/haifangong/TRFE-Net-for-thyroid-nodule-segmentation.


Subjects
Thyroid Nodule, Humans, Ultrasonography/methods, Algorithms
17.
IEEE Trans Neural Netw Learn Syst ; 34(7): 3308-3322, 2023 Jul.
Article in English | MEDLINE | ID: mdl-35089863

ABSTRACT

Land remote-sensing analysis is a crucial research area in earth science. In this work, we focus on a challenging task of land analysis, i.e., automatic extraction of traffic roads from remote-sensing data, which has widespread applications in urban development and expansion estimation. Nevertheless, conventional methods either utilized only the limited information of aerial images, or simply fused multimodal information (e.g., vehicle trajectories), and thus cannot well recognize unconstrained roads. To address this problem, we introduce a novel neural network framework termed cross-modal message propagation network (CMMPNet), which fully exploits the complementary multimodal data (i.e., aerial images and crowdsourced trajectories). Specifically, CMMPNet is composed of two deep autoencoders for modality-specific representation learning and a tailor-designed dual enhancement module for cross-modal representation refinement. In particular, the complementary information of each modality is comprehensively extracted and dynamically propagated to enhance the representation of the other modality. Extensive experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction benefiting from blending different modal data, either using image and trajectory data or image and light detection and ranging (LiDAR) data. From the experimental results, we observe that the proposed approach outperforms current state-of-the-art methods by large margins. Our source code is released on the project page http://lingboliu.com/multimodal_road_extraction.html.


Subjects
Crowdsourcing, Neural Networks (Computer), Benchmarking, Gene Regulatory Networks, Learning
18.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3574-3589, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35639679

ABSTRACT

Metro origin-destination prediction is a crucial yet challenging time-series analysis task in intelligent transportation systems, which aims to accurately forecast two specific types of cross-station ridership, i.e., Origin-Destination (OD) and Destination-Origin (DO) ridership. However, complete OD matrices of previous time intervals cannot be obtained immediately in online metro systems, and conventional methods used only limited information to forecast the future OD and DO ridership separately. In this work, we propose a novel neural network module termed Heterogeneous Information Aggregation Machine (HIAM), which fully exploits heterogeneous information of historical data (e.g., incomplete OD matrices, unfinished order vectors, and DO matrices) to jointly learn the evolutionary patterns of OD and DO ridership. Specifically, an OD modeling branch estimates the potential destinations of unfinished orders explicitly to complement the information of incomplete OD matrices, while a DO modeling branch takes DO matrices as input to capture the spatial-temporal distribution of DO ridership. Moreover, a Dual Information Transformer is introduced to propagate the mutual information among OD features and DO features for modeling the OD-DO causality and correlation. Based on the proposed HIAM, we develop a unified Seq2Seq network to forecast the future OD and DO ridership simultaneously. Extensive experiments conducted on two large-scale benchmarks demonstrate the effectiveness of our method for online metro origin-destination prediction. Our code is released at https://github.com/HCPLab-SYSU/HIAM.

19.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7019-7034, 2023 Jun.
Article in English | MEDLINE | ID: mdl-32946383

ABSTRACT

Matching clothing images from customers and online shopping stores has rich applications in e-commerce. Existing algorithms mostly encode an image as a global feature vector and perform retrieval via global representation matching. However, distinctive local information on clothing is immersed in this global representation, resulting in sub-optimal performance. To address this issue, we propose a novel graph reasoning network (GRNet) on a similarity pyramid, which learns similarities between a query and a gallery clothing item by using both initial pairwise multi-scale feature representations and matching propagation for unaligned representations. The query local representations at each scale are aligned with those of the gallery via an adaptive window pooling module. The similarity pyramid is represented by a similarity graph, where nodes represent similarities between clothing components at different scales, and the final matching score is obtained by message propagation along edges. In GRNet, graph reasoning is solved by training a graph convolutional network, enabling the alignment of salient clothing components to improve clothing retrieval. To facilitate future research, we introduce a new benchmark, i.e. FindFashion, containing rich annotations of bounding boxes, views, occlusions, and cropping. Extensive experiments show that GRNet obtains new state-of-the-art results on three challenging benchmarks, e.g. pushing the accuracy of top-1, top-20, and top-50 on DeepFashion to 27, 66, and 75 percent (i.e. 6, 12, and 10 percent absolute improvements), outperforming competitors by large margins. On FindFashion, GRNet achieves considerable improvements on all empirical settings.

20.
IEEE Trans Med Imaging ; 42(4): 947-958, 2023 04.
Article in English | MEDLINE | ID: mdl-36355729

ABSTRACT

Recently, deep neural networks, which require a large amount of annotated samples, have been widely applied in nuclei instance segmentation of H&E stained pathology images. However, it is inefficient and unnecessary to label all pixels for a dataset of nuclei images, which usually contain similar and redundant patterns. Although unsupervised and semi-supervised learning methods have been studied for nuclei segmentation, very few works have delved into the selective labeling of samples to reduce the workload of annotation. Thus, in this paper, we propose a novel full nuclei segmentation framework that chooses only a few image patches to be annotated, augments the training set from the selected samples, and achieves nuclei segmentation in a semi-supervised manner. In the proposed framework, we first develop a novel consistency-based patch selection method to determine which image patches are the most beneficial to the training. Then we introduce a conditional single-image GAN with a component-wise discriminator to synthesize more training samples. Lastly, our proposed framework trains an existing segmentation model with the above augmented samples. The experimental results show that our proposed method can obtain the same-level performance as a fully supervised baseline by annotating less than 5% of pixels on some benchmarks.
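One plausible reading of consistency-based patch selection, sketched speculatively: rank unlabeled patches by how inconsistent the model's predictions are across two views of the same patch, and spend the annotation budget on the least consistent (most informative) ones. The disagreement measure and budget are assumptions:

```python
import numpy as np

def select_patches(view_a, view_b, budget=2):
    """view_a/view_b: (N, C) predictions for the same N patches under two views.
    Pick the `budget` patches where the two views disagree the most."""
    disagreement = np.abs(view_a - view_b).sum(axis=1)
    return np.argsort(-disagreement)[:budget]   # most inconsistent first

a = np.array([[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]])
b = np.array([[0.88, 0.12], [0.2, 0.8], [0.45, 0.55]])
chosen = select_patches(a, b, budget=2)   # indices of patches to annotate
```

Only the chosen patches would go to the annotators; the GAN-based augmentation and semi-supervised training then stretch those few labels across the rest of the dataset.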


Subjects
Cell Nucleus, Neural Networks (Computer), Supervised Machine Learning