Results 1 - 20 of 403
1.
Phys Rev Lett ; 131(14): 140601, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37862647

ABSTRACT

Quantum neural networks (QNNs) have become an important tool for understanding the physical world, but their advantages and limitations are not fully understood. Some QNNs with specific encoding methods can be efficiently simulated by classical surrogates, while others with quantum memory may perform better than classical classifiers. Here we systematically investigate the problem-dependent power of quantum neural classifiers (QCs) on multiclass classification tasks. Through the analysis of expected risk, a measure that jointly weighs the training loss and the generalization error of a classifier, we identify two key findings: first, the training loss rather than the generalization ability dominates the power; second, QCs undergo a U-shaped risk curve, in contrast to the double-descent risk curve of deep neural classifiers. We also reveal the intrinsic connection between optimal QCs, the Helstrom bound, and the equiangular tight frame. Using these findings, we propose a method that exploits the loss dynamics of QCs to estimate the optimal hyperparameter settings yielding the minimal risk. Numerical results demonstrate the effectiveness of our approach in explaining the superiority of QCs over multilayer perceptrons on parity datasets and their limitations relative to convolutional neural networks on image datasets. Our work sheds light on the problem-dependent power of QNNs and offers a practical tool for evaluating their potential merit.

2.
Int J Comput Vis ; : 1-26, 2023 May 20.
Article in English | MEDLINE | ID: mdl-37363293

ABSTRACT

Recently, there has been increasing concern about the privacy issues raised by identifiable information in machine learning. However, previous portrait matting methods were all based on identifiable images. To fill the gap, we present P3M-10k, the first large-scale anonymized benchmark for Privacy-Preserving Portrait Matting (P3M). P3M-10k consists of 10,421 high-resolution face-blurred portrait images along with high-quality alpha mattes, which enables us to systematically evaluate both trimap-free and trimap-based matting methods and obtain useful findings about model generalization ability under the privacy-preserving training (PPT) setting. We also present a unified matting model dubbed P3M-Net that is compatible with both CNN and transformer backbones. To further mitigate the cross-domain performance gap under the PPT setting, we devise a simple yet effective Copy and Paste strategy (P3M-CP), which borrows facial information from public celebrity images and directs the network to reacquire the face context at both the data and feature levels. Extensive experiments on P3M-10k and public benchmarks demonstrate the superiority of P3M-Net over state-of-the-art methods and the effectiveness of P3M-CP in improving cross-domain generalization, implying the significance of P3M for future research and real-world applications. The dataset, code and models are available at https://github.com/ViTAE-Transformer/P3M-Net.

3.
Bioinformatics ; 37(6): 785-792, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33070196

ABSTRACT

MOTIVATION: There is growing interest in the biomedical research community in incorporating retrospective data, available in healthcare systems, to shed light on associations between different biomarkers. Understanding the association between various types of biomedical data, such as genetic data, blood biomarkers and imaging, can provide a holistic understanding of human diseases. To formally test a hypothesized association between two types of data in Electronic Health Records (EHRs), one requires a substantial sample size with both data modalities to achieve reasonable power. Current association test methods only allow using data from individuals who have both data modalities. Hence, researchers cannot take advantage of much larger EHR samples that include individuals with at least one of the data types, which limits the power of the association test. RESULTS: We present a new method called the Semi-paired Association Test (SAT) that makes use of both paired and unpaired data. In contrast to classical approaches, incorporating unpaired data allows SAT to better control false discovery and to improve the power of the association test. We study the properties of the new test theoretically and empirically, through a series of simulations and by applying our method to real studies in the context of Chronic Obstructive Pulmonary Disease. We are able to identify an association between the high-dimensional characterization of Computed Tomography chest images and several blood biomarkers, as well as the expression of dozens of genes involved in the immune system. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/batmanlab/Semi-paired-Association-Test. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Electronic Health Records; Research Design; Humans; Retrospective Studies; Sample Size
4.
Phys Rev Lett ; 128(8): 080506, 2022 Feb 25.
Article in English | MEDLINE | ID: mdl-35275658

ABSTRACT

The superiority of variational quantum algorithms (VQAs), such as quantum neural networks (QNNs) and variational quantum eigensolvers (VQEs), heavily depends on the expressivity of the employed Ansätze: a too-simple Ansatz cannot capture the optimal solution, while an overly intricate Ansatz hampers trainability. Despite its fundamental importance, an effective strategy for measuring the expressivity of VQAs remains largely unknown. Here, we exploit an advanced tool in statistical learning theory, the covering number, to study the expressivity of VQAs. In particular, we first show how the expressivity of VQAs with arbitrary Ansätze is upper bounded by the number of quantum gates and the measurement observable. We next explore the expressivity of VQAs on near-term quantum chips, where system noise is considered, and observe an exponential decay of the expressivity with increasing circuit depth. We also utilize the achieved expressivity to analyze the generalization of QNNs and the accuracy of VQEs. We numerically verify our theory employing VQAs with different levels of expressivity. Our Letter opens the avenue for a quantitative understanding of the expressivity of VQAs.


Subject(s)
Algorithms; Neural Networks, Computer
5.
Phys Rev Lett ; 128(11): 110501, 2022 Mar 18.
Article in English | MEDLINE | ID: mdl-35363009

ABSTRACT

The recognition of entangled states is a notoriously difficult problem when no prior information is available. Here, we propose an efficient quantum adversarial bipartite entanglement detection scheme to address this issue. Our proposal reformulates bipartite entanglement detection as a two-player zero-sum game played by parameterized quantum circuits, where a two-outcome measurement can be used to query a classical binary result about whether the input state is bipartite entangled. In principle, for an N-qubit quantum state, the runtime complexity of our proposal is O(poly(N)T), with T being the number of iterations. We experimentally implement our protocol on a linear optical network and demonstrate its effectiveness in bipartite entanglement detection for 5-qubit pure states and 2-qubit mixed states. Our work paves the way for using near-term quantum machines to tackle entanglement detection in multipartite entangled quantum systems.

6.
Neuroimage ; 244: 118586, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34563678

ABSTRACT

Mild cognitive impairment (MCI) conversion prediction, i.e., identifying MCI patients at high risk of converting to Alzheimer's disease (AD), is essential for preventing or slowing the progression of AD. Although previous studies have shown that fusing multi-modal data can effectively improve prediction accuracy, their applications are largely restricted by the limited availability or high cost of multi-modal data. Building an effective prediction model using only magnetic resonance imaging (MRI) remains a challenging research topic. In this work, we propose a multi-modal multi-instance distillation scheme, which aims to distill the knowledge learned from multi-modal data into an MRI-based network for MCI conversion prediction. In contrast to existing distillation algorithms, the proposed multi-instance probabilities demonstrate a superior capability of representing the complicated atrophy distributions and can guide the MRI-based network to better explore the input MRI. To the best of our knowledge, this is the first study that attempts to improve an MRI-based prediction model by leveraging extra supervision distilled from multi-modal information. Experiments demonstrate the advantage of our framework, suggesting its potential in data-limited clinical settings.
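
The distillation step described above can be sketched in a few lines. This is a minimal, generic sketch (Hinton-style soft targets with a temperature), not the paper's multi-instance formulation; all names and values here are illustrative:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T yields softer probabilities.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the softened teacher and student
    # distributions: the usual soft-target objective, used here as a
    # stand-in for the paper's multi-instance probability matching.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A student whose logits match the teacher incurs the minimal loss
# (the entropy of the teacher distribution).
teacher = [2.0, 0.5, -1.0]
aligned = distillation_loss(teacher, teacher)
misaligned = distillation_loss([-1.0, 0.5, 2.0], teacher)
```

By Gibbs' inequality the aligned student always achieves a strictly smaller loss than any mismatched one, which is what pushes the MRI-only network toward the multi-modal teacher's behaviour.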


Subject(s)
Alzheimer Disease/diagnostic imaging; Magnetic Resonance Imaging/methods; Aged; Aged, 80 and over; Algorithms; Atrophy; Brain/pathology; Cognitive Dysfunction/diagnostic imaging; Female; Humans; Knowledge; Learning; Male; Middle Aged; Probability
7.
Neural Comput ; 33(8): 2163-2192, 2021 07 26.
Article in English | MEDLINE | ID: mdl-34310675

ABSTRACT

Deep learning is often criticized for two serious issues that rarely exist in natural nervous systems: overfitting and catastrophic forgetting. A deep network can even memorize randomly labeled data, in which there is little knowledge behind the instance-label pairs. Moreover, when a deep network continually learns over time by accommodating new tasks, it usually quickly overwrites the knowledge learned from previous tasks. It is well known in neuroscience that human brain reactions exhibit substantial variability even in response to the same stimulus, a phenomenon referred to as neural variability; this mechanism balances accuracy and plasticity/flexibility in the motor learning of natural nervous systems. This motivates us to design a similar mechanism, named artificial neural variability (ANV), that helps artificial neural networks inherit some advantages of "natural" neural networks. We rigorously prove that ANV acts as an implicit regularizer of the mutual information between the training data and the learned model. This result theoretically guarantees ANV strictly improved generalizability, robustness to label noise, and robustness to catastrophic forgetting. We then devise a neural variable risk minimization (NVRM) framework and neural variable optimizers to achieve ANV for conventional network architectures in practice. The empirical studies demonstrate that NVRM can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
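
One minimal way to realize weight-level variability is to add zero-mean Gaussian noise to each parameter at every update. The sketch below is a generic stand-in, not the actual NVRM optimizers, and the function name `nvrm_update` and all hyperparameters are hypothetical:

```python
import random

def nvrm_update(weights, grads, lr=0.1, noise_std=0.01, rng=None):
    # One "neural variable" step: a plain SGD update plus a zero-mean
    # Gaussian perturbation of each weight (the injected variability).
    rng = rng or random.Random(0)
    return [w - lr * g + rng.gauss(0.0, noise_std)
            for w, g in zip(weights, grads)]

rng = random.Random(42)
w0 = [1.0, -2.0, 0.5]
w1 = nvrm_update(w0, [0.2, -0.4, 0.0], rng=rng)
```

The noise scale `noise_std` plays the role of a knob between stability (small noise) and the regularizing variability the abstract describes (larger noise).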


Subject(s)
Deep Learning; Brain; Humans; Neural Networks, Computer
8.
Pharmacol Res ; 156: 104797, 2020 06.
Article in English | MEDLINE | ID: mdl-32278044

ABSTRACT

Chronic pain is highly prevalent and poorly controlled, and its underlying mechanisms need to be further elucidated. Herbal drugs have been widely used for controlling various pain disorders. The systematic integration of pain-related herbal data resources is a promising way to investigate the molecular mechanisms of pain phenotypes. Here, we integrated large-scale bibliographic literature and well-established data sources to obtain high-quality pain-relevant herbal data (426 pain-related herbs with their targets). We used machine learning to identify three distinct herb categories with their specific indications of symptoms, targets and enriched pathways, characterized by efficacy against chronic-cough-related neuropathic pain, reproduction- and autoimmune-related pain, and cancer pain, respectively. We further investigated the pathophysiological mechanisms of these pain subtypes with a network medicine approach, evaluating the interactions between herb targets and the pain disease modules. This work deepens the understanding of the molecular mechanisms of the pain subtypes in which herbal drugs participate, with the ultimate aim of developing novel personalized drugs for pain disorders.


Subject(s)
Analgesics/therapeutic use; Chronic Pain/drug therapy; Machine Learning; Pain Threshold/drug effects; Plant Preparations/therapeutic use; Systems Biology; Systems Integration; Analgesics/chemistry; Analgesics/classification; Animals; Chronic Pain/metabolism; Chronic Pain/physiopathology; Databases, Factual; Humans; Molecular Structure; Molecular Targeted Therapy; Pharmacopoeias as Topic; Plant Preparations/chemistry; Plant Preparations/classification; Protein Interaction Maps; Signal Transduction; Structure-Activity Relationship
9.
BMC Bioinformatics ; 20(Suppl 19): 660, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31870278

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e., the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. RESULTS: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. CONCLUSIONS: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS.
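
The ensemble idea can be sketched in pure Python. This is a deliberately simplified sketch: random feature subsets stand in for the random subspace projections, the autoencoder compression step is omitted entirely, and the consensus is a co-association matrix over repeated k-means runs; all names and data are illustrative:

```python
import random

def kmeans(points, k=2, iters=20, rng=None):
    # Minimal Lloyd's algorithm; returns one cluster label per point.
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((p[d] - centers[c][d]) ** 2
                                        for d in range(len(p))))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(members[0])))
    return labels

def consensus_matrix(points, n_runs=10, k=2):
    # Ensemble step: co-association matrix counting how often two
    # "cells" land in the same cluster across runs, each run clustering
    # a random subset of the features (a crude random subspace).
    n = len(points)
    co = [[0.0] * n for _ in range(n)]
    for run in range(n_runs):
        rng = random.Random(run)
        dims = rng.sample(range(len(points[0])), 2)
        proj = [tuple(p[d] for d in dims) for p in points]
        labels = kmeans(proj, k=k, rng=rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / n_runs
    return co

# Two well-separated groups of "cells" in a 4-dimensional feature space.
cells = [(0, 0, 0, 0), (0.1, 0, 0.1, 0), (0.2, 0.1, 0, 0.1),
         (5, 5, 5, 5), (5.1, 5, 5.2, 5), (5, 5.1, 5, 5.1)]
co = consensus_matrix(cells)
```

The final clusters would be obtained by clustering the co-association matrix itself, which is how ensemble clustering typically turns many noisy partitions into one stable partition.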


Subject(s)
Sequence Analysis, RNA; Algorithms; Cluster Analysis; Data Analysis; Humans; Neural Networks, Computer; RNA-Seq; Single-Cell Analysis; Transcriptome
10.
Bioinformatics ; 34(18): 3069-3077, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29672669

ABSTRACT

Motivation: The CRISPR/Cas9 system is a widely used genome editing tool. A prediction problem of great interest for this system is how to select optimal single-guide RNAs (sgRNAs) such that cleavage efficiency is high while the off-target effect is low. Results: This work proposes a two-step averaging method (TSAM) for the regression of the cleavage efficiencies of a set of sgRNAs, averaging the efficiency scores predicted by a boosting algorithm and those predicted by a support vector machine (SVM). We also propose profiled Markov properties as novel features to capture the global characteristics of sgRNAs. These new features are combined with the top features ranked by the boosting algorithm for the training of the SVM regressor. TSAM improved the mean Spearman correlation coefficients compared with the state-of-the-art performance on benchmark datasets containing thousands of human, mouse and zebrafish sgRNAs. Our method can also be converted to make binary distinctions between efficient and inefficient sgRNAs, with performance superior to existing methods. The analysis reveals that highly efficient sgRNAs have a lower melting temperature at the middle of the spacer, cut at parts of the genome closer to the 5'-end, and contain more 'A' but less 'G' compared with inefficient ones. Comprehensive further analysis also demonstrates that our tool predicts an sgRNA's cutting efficiency with consistently good performance whether the sgRNA is expressed from a U6 promoter in cells or from a T7 promoter in vitro. Availability and implementation: The online tool is available at http://www.aai-bioinfo.com/CRISPR/. Python and Matlab source codes are freely available at https://github.com/penn-hui/TSAM. Supplementary information: Supplementary data are available at Bioinformatics online.
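
The two-step averaging itself is simple to write down. In the sketch below, `boost_scores` and `svm_scores` are hypothetical stand-ins for the outputs of the trained boosting model and SVM regressor, paired with a small pure-Python Spearman correlation for evaluation:

```python
def rank(values):
    # Average ranks (1-based), with ties sharing the mean rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks.
    ra, rb = rank(a), rank(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Hypothetical predicted efficiency scores for five sgRNAs.
boost_scores = [0.9, 0.4, 0.7, 0.1, 0.6]
svm_scores = [0.8, 0.5, 0.6, 0.2, 0.7]
tsam_scores = [(x + y) / 2 for x, y in zip(boost_scores, svm_scores)]
true_eff = [0.95, 0.45, 0.65, 0.15, 0.60]
```

Averaging the two regressors' predictions is the entire second step; the heavy lifting lies in the feature engineering and model training described in the abstract.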


Subject(s)
Algorithms; Animals; CRISPR-Cas Systems; Humans; Mice; RNA, Guide, Kinetoplastida/genetics; Regression Analysis; Software; Support Vector Machine; Zebrafish/genetics
11.
Neuroimage ; 166: 1-9, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29080713

ABSTRACT

Sulcal morphology has been reported to change with age-related neurological diseases, but the trajectories of sulcal change in normal ageing in the elderly are still unclear. We conducted a study of sulcal morphological changes over seven years in 132 normal elderly participants aged 70-90 years at baseline who remained cognitively normal for the next seven years. We examined the fold opening and sulcal depth of sixteen prominent sulci (eight on each hemisphere) based on T1-weighted MRI using automated methods with visual quality control. The trajectory of each individual sulcus with respect to age was examined separately by linear mixed models. Fold opening was best modelled by cubic fits in five sulci, by quadratic fits in six sulci and by linear fits in five sulci, indicating an accelerated widening of a number of sulci in older age. Sulcal depth showed a significant linear decline in three sulci and a quadratic trend in one sulcus. Turning points of the non-linear trajectories towards accelerated widening of the fold were found around the ages of 75 to 80, indicating accelerated atrophy of the brain cortex starting in the late 70s. Our findings of cortical sulcal changes in normal ageing could provide a reference for studies of neurocognitive disorders, including neurodegenerative diseases, in the elderly.
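
The turning-point idea for a quadratic trajectory can be illustrated by exact interpolation through three synthetic (age, width) points. The study fits linear mixed models over repeated measures, so this is only a sketch of how the vertex of a fitted parabola locates the acceleration age; the data below are made up:

```python
def fit_quadratic(p1, p2, p3):
    # Solve y = a*t^2 + b*t + c exactly through three (age, width)
    # points (a stand-in for the paper's mixed-model fits, which also
    # pool repeated measures across participants).
    (t1, y1), (t2, y2), (t3, y3) = p1, p2, p3
    denom = (t1 - t2) * (t1 - t3) * (t2 - t3)
    a = (t3 * (y2 - y1) + t2 * (y1 - y3) + t1 * (y3 - y2)) / denom
    b = (t3**2 * (y1 - y2) + t2**2 * (y3 - y1) + t1**2 * (y2 - y3)) / denom
    c = (t2 * t3 * (t2 - t3) * y1 + t3 * t1 * (t3 - t1) * y2
         + t1 * t2 * (t1 - t2) * y3) / denom
    return a, b, c

def turning_point(a, b):
    # Vertex of the fitted parabola: where the trajectory turns.
    return -b / (2 * a)

# Synthetic fold-opening widths generated from a parabola whose
# turning point is at age 77.5.
pts = [(70, 3.0 + 0.01 * (70 - 77.5) ** 2),
       (80, 3.0 + 0.01 * (80 - 77.5) ** 2),
       (90, 3.0 + 0.01 * (90 - 77.5) ** 2)]
a, b, c = fit_quadratic(*pts)
age_star = turning_point(a, b)
```

Recovering the planted turning point at 77.5 mirrors how the study reads acceleration ages off the fitted quadratic and cubic trajectories.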


Subject(s)
Aging/pathology; Cerebral Cortex/pathology; Age Factors; Aged; Aged, 80 and over; Atrophy/pathology; Cerebral Cortex/diagnostic imaging; Female; Humans; Longitudinal Studies; Magnetic Resonance Imaging; Male
12.
BMC Bioinformatics ; 18(1): 193, 2017 Mar 24.
Article in English | MEDLINE | ID: mdl-28340554

ABSTRACT

BACKGROUND: MicroRNAs always function cooperatively in their regulation of gene expression. Dysfunction of these co-functional microRNAs can play significant roles in disease development. We are interested in multi-disease-associated co-functional microRNAs that regulate their common dysfunctional target genes cooperatively in the development of multiple diseases. The research is potentially useful for human disease studies at the transcriptional level and for the study of multi-purpose microRNA therapeutics. METHODS AND RESULTS: We designed a computational method to detect multi-disease-associated co-functional microRNA pairs and conducted a cross-disease analysis on a reconstructed disease-gene-microRNA (DGR) tripartite network. The DGR tripartite network is constructed by integrating newly predicted disease-microRNA associations with the relationships among diseases, microRNAs and genes maintained by existing databases. The prediction method uses a set of reliable negative samples of disease-microRNA associations and a pre-computed kernel matrix instead of kernel functions. From this reconstructed DGR tripartite network, multi-disease-associated co-functional microRNA pairs are detected together with their common dysfunctional target genes and ranked by a novel scoring method. We also conducted proof-of-concept case studies on cancer-related co-functional microRNA pairs as well as on microRNA pairs related to non-cancer diseases. CONCLUSIONS: With the prioritization of the co-functional microRNAs related to a series of diseases, we found that the co-function phenomenon is not unusual. We also confirmed that the regulation by microRNAs in the development of cancers is more complex and has more unique properties than in non-cancer diseases.


Subject(s)
Computational Biology/methods; MicroRNAs/genetics; Humans
13.
Neural Comput ; 29(1): 247-262, 2017 01.
Article in English | MEDLINE | ID: mdl-27764596

ABSTRACT

The techniques of random matrices have played an important role in many machine learning models. In this letter, we present a new method for studying tail inequalities for sums of random matrices. Unlike other work (Ahlswede & Winter, 2002; Tropp, 2012; Hsu, Kakade, & Zhang, 2012), our tail results are based on the largest singular value (LSV) and are independent of the matrix dimension. Since the LSV operation and the expectation are noncommutative, we introduce a diagonalization method to convert the LSV operation into the trace operation of an infinite-dimensional diagonal matrix. In this way, we obtain another version of Laplace-transform bounds and then achieve LSV-based tail inequalities for sums of random matrices.

14.
Neural Comput ; 28(10): 2213-49, 2016 10.
Article in English | MEDLINE | ID: mdl-27391679

ABSTRACT

The k-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative k-dimensional vectors; they include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality-independent. A major advantage of these bounds is that they can be used to analyze the generalization error when data are mapped into an infinite- or high-dimensional feature space. However, many applications use finite-dimensional data features. Can we obtain dimensionality-dependent generalization bounds for k-dimensional coding schemes that are tighter than dimensionality-independent bounds when data lie in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order [Formula: see text], where m is the dimension of features, k is the number of columns in the linear implementation of coding schemes, and n is the sample size; [Formula: see text] when n is finite and [Formula: see text] when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. The proposed generalization bound is also applied to some specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to the dimensionality-independent generalization bounds.

15.
Neural Comput ; 28(12): 2757-2789, 2016 12.
Article in English | MEDLINE | ID: mdl-27626968

ABSTRACT

Linear submodular bandits have been proven effective in solving the diversification and feature-based exploration problem in information retrieval systems. Considering that there is inevitably a budget constraint in many web-based applications, such as news article recommendation and online advertising, we study the problem of diversification under a budget constraint in a bandit setting. We first introduce a budget constraint into each exploration step of linear submodular bandits as a new problem, which we call per-round knapsack-constrained linear submodular bandits. We then define an [Formula: see text]-approximation unit-cost regret, considering that submodular function maximization is NP-hard. To solve this new problem, we propose two greedy algorithms based on a modified UCB rule and prove regret bounds and computational complexities for both. Inspired by the lazy evaluation process in submodular function maximization, we also prove that a modified lazy evaluation process can be used to accelerate our algorithms without losing their theoretical guarantees. We conduct a number of experiments, and the experimental results confirm our theoretical analyses.
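
One plausible shape for the per-round greedy rule is sketched below: at each step it adds the item with the highest UCB-weighted marginal topic coverage per unit cost, as long as the remaining budget allows. The coverage function, costs, and UCB form are all illustrative assumptions, not the paper's exact algorithm:

```python
import math

def greedy_ucb_round(topics_by_item, costs, means, counts, t, budget,
                     alpha=1.0):
    # One exploration round: greedily add the item with the best
    # UCB-scored marginal topic coverage per unit cost, subject to a
    # per-round knapsack (budget) constraint.
    chosen, covered, spent = [], set(), 0.0
    while True:
        best, best_score = None, 0.0
        for item, topics in topics_by_item.items():
            if item in chosen or spent + costs[item] > budget:
                continue
            gain = len(topics - covered)  # submodular marginal gain
            bonus = alpha * math.sqrt(math.log(max(t, 2))
                                      / max(counts[item], 1))
            score = (means[item] + bonus) * gain / costs[item]
            if score > best_score:
                best, best_score = item, score
        if best is None:
            break
        chosen.append(best)
        covered |= topics_by_item[best]
        spent += costs[best]
    return chosen, spent

# Hypothetical news-recommendation round: each article covers topics.
topics = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3, 4}}
costs = {"a": 1.0, "b": 1.0, "c": 0.5, "d": 3.0}
means = {k: 0.5 for k in topics}
counts = {k: 1 for k in topics}
picked, spent = greedy_ucb_round(topics, costs, means, counts,
                                 t=10, budget=2.0)
```

Note how the expensive all-covering item "d" is skipped: under the knapsack constraint, cost-normalized marginal gain, not raw coverage, drives the greedy choice.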

16.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1212-1230, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37922160

ABSTRACT

In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model dubbed ViTPose. ViTPose employs a plain, non-hierarchical vision transformer as an encoder to encode features and a lightweight decoder to decode body keypoints in either a top-down or a bottom-up manner. It can be scaled up to 1B parameters by taking advantage of the scalable model capacity and high parallelism, setting a new Pareto front for throughput and performance. Moreover, ViTPose is very flexible regarding the attention type, input resolution, and pre-training and fine-tuning strategy. Based on this flexibility, a novel ViTPose++ model is proposed to deal with heterogeneous body keypoint categories via knowledge factorization, i.e., adopting task-agnostic and task-specific feed-forward networks in the transformer. We also demonstrate that the knowledge of large ViTPose models can be easily transferred to small ones via a simple knowledge token. Our largest single model, ViTPose-G, sets a new record on the MS COCO test set without model ensembling. Furthermore, our ViTPose++ model achieves state-of-the-art performance simultaneously on a series of body pose estimation tasks, including MS COCO, AI Challenger, OCHuman and MPII for human keypoint detection, COCO-Wholebody for whole-body keypoint detection, and AP-10K and APT-36K for animal keypoint detection, without sacrificing inference speed.

17.
IEEE Trans Cybern ; 54(7): 4138-4149, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38150342

ABSTRACT

Commonsense reasoning based on knowledge graphs (KGs) is a challenging task that requires answering complex questions over the described textual contexts and relevant knowledge about the world. However, current methods typically assume clean training scenarios with accurately labeled samples, which are often unrealistic. The training set can include mislabeled samples, and robustness to label noise is essential for commonsense reasoning methods to be practical, but this problem remains largely unexplored. This work focuses on commonsense reasoning with mislabeled training samples and makes several technical contributions: 1) we first construct diverse augmentations from knowledge and models, and offer a simple yet effective multiple-choice alignment method to divide the training samples into clean, semi-clean, and unclean parts; 2) we design adaptive label correction methods for the semi-clean and unclean samples to exploit the supervised potential of noisy information; and 3) finally, we extensively test these methods on noisy versions of commonsense reasoning benchmarks (CommonsenseQA and OpenbookQA). Experimental results show that the proposed method can significantly enhance robustness and improve overall performance. Furthermore, the proposed method is generally applicable to multiple existing commonsense reasoning frameworks to boost their robustness. The code is available at https://github.com/xdxuyang/CR_Noisy_Labels.
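
The partition step in contribution 1) can be sketched by counting how many augmented views' predictions agree with the (possibly noisy) label. This is a simplified stand-in for the paper's multiple-choice alignment, with hypothetical names and toy data:

```python
def partition_by_alignment(predictions, labels):
    # Split training samples by agreement between the augmented views'
    # predicted answer choices and the given label:
    #   all views agree  -> clean
    #   some views agree -> semi-clean
    #   no view agrees   -> unclean
    clean, semi, unclean = [], [], []
    for idx, (preds, label) in enumerate(zip(predictions, labels)):
        agree = sum(1 for p in preds if p == label)
        if agree == len(preds):
            clean.append(idx)
        elif agree > 0:
            semi.append(idx)
        else:
            unclean.append(idx)
    return clean, semi, unclean

# Three augmented predictions per sample over answer choices "A".."D".
preds = [["A", "A", "A"], ["B", "C", "B"], ["D", "A", "C"]]
labels = ["A", "B", "B"]
clean, semi, unclean = partition_by_alignment(preds, labels)
```

The semi-clean and unclean buckets would then be routed to the adaptive label-correction step rather than discarded, which is the point of contribution 2).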

18.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 3608-3624, 2024 May.
Article in English | MEDLINE | ID: mdl-38190690

ABSTRACT

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint. However, the design of hand-crafted windows, which is data-agnostic, constrains the flexibility of transformers to adapt to objects of varying sizes, shapes, and orientations. To address this issue, we propose a novel quadrangle attention (QA) method that extends window-based attention to a general quadrangle formulation. Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles for token sampling and attention calculation, enabling the network to model various targets with different shapes and orientations and to capture rich context information. We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and negligible extra computational cost. Extensive experiments on public benchmarks demonstrate that QFormer outperforms existing representative vision transformers on various vision tasks, including classification, object detection, semantic segmentation, and pose estimation. The code will be made publicly available at QFormer.
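
The window-to-quadrangle step can be illustrated by applying a 2x3 affine matrix to a default window's corners; the paper's regression module predicts a more general learned transformation, and the matrix values below are made up:

```python
def window_corners(x0, y0, size):
    # Corners of a default square window, clockwise from top-left.
    return [(x0, y0), (x0 + size, y0),
            (x0 + size, y0 + size), (x0, y0 + size)]

def transform_window(corners, matrix):
    # Apply a predicted 2x3 affine matrix [[a, b, tx], [c, d, ty]]
    # to each corner, turning the default window into the target
    # quadrangle from which tokens would then be sampled.
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in corners]

win = window_corners(0.0, 0.0, 8.0)
identity = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
sheared = [(1.0, 0.25, 2.0), (0.0, 1.0, 0.0)]  # shear + horizontal shift
quad = transform_window(win, sheared)
```

With the identity matrix the quadrangle degenerates back to the hand-crafted window, which is why the formulation strictly generalizes plain window attention.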

19.
Article in English | MEDLINE | ID: mdl-38393837

ABSTRACT

With recent success of deep learning in 2-D visual recognition, deep-learning-based 3-D point cloud analysis has received increasing attention from the community, especially due to the rapid development of autonomous driving technologies. However, most existing methods directly learn point features in the spatial domain, leaving the local structures in the spectral domain poorly investigated. In this article, we introduce a new method, PointWavelet, to explore local graphs in the spectral domain via a learnable graph wavelet transform. Specifically, we first introduce the graph wavelet transform to form multiscale spectral graph convolution to learn effective local structural representations. To avoid the time-consuming spectral decomposition, we then devise a learnable graph wavelet transform, which significantly accelerates the overall training process. Extensive experiments on four popular point cloud datasets, ModelNet40, ScanObjectNN, ShapeNet-Part, and S3DIS, demonstrate the effectiveness of the proposed method on point cloud classification and segmentation.
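
The decomposition-free filtering trick can be illustrated with a polynomial spectral filter: a filter g(L) = sum_k theta_k L^k acts on a graph signal using only matrix-vector products, so no eigendecomposition is needed. The 3-node path graph and coefficients below are illustrative, and the paper's learnable wavelet transform is more elaborate:

```python
def matvec(M, x):
    # Dense matrix-vector product over nested lists.
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def graph_filter(L, signal, thetas):
    # Spectral filter as a polynomial of the graph Laplacian:
    # sum_k thetas[k] * L^k x, computed with repeated mat-vec products,
    # avoiding the costly eigendecomposition of L.
    out = [thetas[0] * v for v in signal]
    power = list(signal)
    for theta in thetas[1:]:
        power = matvec(L, power)
        out = [o + theta * p for o, p in zip(out, power)]
    return out

# Unnormalized Laplacian of a 3-node path graph 0-1-2.
L = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
x = [1.0, 0.0, -1.0]  # an eigenvector of L with eigenvalue 1
y = graph_filter(L, x, thetas=[0.5, 0.25])
```

On each eigencomponent this filter acts as g(lambda) = 0.5 + 0.25*lambda, so the eigenvector `x` (eigenvalue 1) is simply scaled by 0.75; a learnable transform would train the `thetas` end to end.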

20.
Article in English | MEDLINE | ID: mdl-38324433

ABSTRACT

This article studies the generalization of neural networks (NNs) by examining how a network changes when trained on a training sample with or without out-of-distribution (OoD) examples. If the network's predictions are less influenced by fitting OoD examples, then the network learns attentively from the clean training set. A new notion, dataset-distraction stability, is proposed to measure this influence. Extensive CIFAR-10/100 experiments on different VGG, ResNet, WideResNet, and ViT architectures and optimizers show a negative correlation between dataset-distraction stability and generalizability. With the distraction stability, we decompose the learning process on the training set S into multiple learning processes on subsets of S drawn from simpler distributions, i.e., distributions of smaller intrinsic dimension (ID), and derive a tighter generalization bound. Through attentive learning, the miraculous generalization in deep learning can be explained and novel algorithms can be designed.
