ABSTRACT
As the brain ages, it almost invariably accumulates vascular pathology, which differentially affects the cerebral white matter. A rich body of research has investigated the link between vascular risk factors and the brain. A less studied question is which of the various modifiable vascular risk factors is most debilitating for white matter health. A white matter-specific brain age was developed to evaluate overall white matter health from diffusion-weighted imaging, using a three-dimensional convolutional neural network deep learning model in both cross-sectional UK Biobank participants (n = 37,327) and a longitudinal subset (n = 1409). The white matter brain age gap (WMBAG) was the difference between the white matter age and the chronological age. Participants with one, two, and three or more vascular risk factors, compared to those without any, showed an elevated WMBAG of 0.54, 1.23, and 1.94 years, respectively. Diabetes was most strongly associated with an increased WMBAG (1.39 years, p < 0.001) among all risk factors, followed by hypertension (0.87 years, p < 0.001) and smoking (0.69 years, p < 0.001). Baseline WMBAG was significantly associated with processing speed, executive function, and global cognition. Significant associations of diabetes and hypertension with poor processing speed and executive function were found to be mediated through the WMBAG. White matter-specific brain age can thus be targeted to examine the most relevant risk factors and cognition, and to track an individual's cerebrovascular ageing process. It also provides a clinical basis for the better management of specific risk factors.
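A minimal sketch of the brain age gap idea described above, using simulated data: the gap is the predicted brain age minus the chronological age, and its association with vascular risk factor burden can then be estimated with a simple regression. Variable names and the linear model are illustrative assumptions; the study itself predicts white matter age with a 3D CNN on diffusion imaging.

```python
# Illustrative sketch: white matter brain age gap (WMBAG) and its association with
# vascular risk factor burden. Data here are simulated stand-ins.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
chronological_age = rng.uniform(45, 80, n)      # years
n_risk_factors = rng.integers(0, 4, n)          # 0, 1, 2, or 3+ vascular risk factors
# Stand-in for the CNN-predicted white matter age (simulated here).
predicted_wm_age = chronological_age + 0.6 * n_risk_factors + rng.normal(0, 3, n)

# Brain age gap = predicted age minus chronological age.
wmbag = predicted_wm_age - chronological_age

# Regress WMBAG on risk factor burden, with chronological age as a covariate.
X = np.column_stack([n_risk_factors, chronological_age])
model = LinearRegression().fit(X, wmbag)
print("extra WMBAG years per additional risk factor:", round(model.coef_[0], 2))
```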
ABSTRACT
Driver mental fatigue leads to thousands of traffic accidents. The increasing quality and availability of low-cost electroencephalogram (EEG) systems offer possibilities for practical fatigue monitoring. However, non-data-driven methods designed for practical, complex situations usually rely on handcrafted statistics of EEG signals. To reduce human involvement, we introduce a data-driven methodology for online mental fatigue detection: self-weight ordinal regression (SWORE). Reaction time (RT), the length of time people take to react to an emergency, is widely considered an objective behavioral measure of the mental fatigue state. Since regression methods are sensitive to extreme RTs, we propose an indirect RT estimation based on preferences to explore the relationship between EEG and RT, which generalizes to any scenario where an objective fatigue indicator is available. In particular, SWORE evaluates the noisy EEG signals from multiple channels in terms of two states: the shaking state and the steady state. Modeling the shaking state discriminates the reliable channels from the uninformative ones, while modeling the steady state suppresses the task-nonrelevant fluctuation within each channel. In addition, an online generalized Bayesian moment matching (online GBMM) algorithm is proposed to efficiently calibrate SWORE online for each participant. Experimental results with 40 participants show that SWORE achieves predictions consistent with RT, demonstrating the feasibility and adaptability of the proposed framework for practical mental fatigue estimation.
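The preference-based idea (learning the ordering of RTs rather than their exact values) can be illustrated with a toy pairwise logistic loss, as in the sketch below. The linear scoring function, features, and training loop are assumptions for illustration, not the SWORE model itself.

```python
# Toy sketch of preference-based (pairwise) learning from RT labels instead of
# regressing the exact, noisy RT value.
import numpy as np

rng = np.random.default_rng(1)
n_epochs, n_features = 200, 30
X = rng.normal(size=(n_epochs, n_features))                          # EEG features per epoch
rt = X @ rng.normal(size=n_features) + rng.normal(0, 5, n_epochs)    # noisy reaction times

w = np.zeros(n_features)
lr = 0.01
for _ in range(500):
    i, j = rng.integers(0, n_epochs, 2)
    if rt[i] == rt[j]:
        continue
    # Preference label: +1 if epoch i has the longer RT (more fatigued) than epoch j.
    y = 1.0 if rt[i] > rt[j] else -1.0
    margin = y * (X[i] - X[j]) @ w
    grad = -y * (X[i] - X[j]) / (1.0 + np.exp(margin))   # gradient of the logistic pairwise loss
    w -= lr * grad

# The learned scores aim to preserve the ordering of RTs, not their exact values.
scores = X @ w
print("rank correlation proxy:",
      np.corrcoef(np.argsort(np.argsort(scores)), np.argsort(np.argsort(rt)))[0, 1].round(2))
```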
Subjects
Electroencephalography, Mental Fatigue, Bayes Theorem, Brain, Humans, Reaction Time
ABSTRACT
Zero-shot learning (ZSL) aims to recognize unseen objects (test classes) given some other seen objects (training classes) by sharing attribute information between different objects. Attributes are manually annotated for objects and treated equally in recent ZSL tasks. However, some inferior attributes with poor predictability or poor discriminability may have negative impacts on ZSL system performance. This letter first derives a generalization error bound for ZSL tasks. Our theoretical analysis verifies that selecting a subset of key attributes can improve the generalization performance of the original ZSL model, which uses all the attributes. Unfortunately, previous attribute selection methods have been conducted on the seen data, and their selected attributes generalize poorly to the unseen data, which is unavailable at the training stage of ZSL tasks. Inspired by learning from pseudo-relevance feedback, this letter introduces out-of-the-box data, that is, pseudo-data generated by an attribute-guided generative model, to mimic the unseen data. We then present an iterative attribute selection (IAS) strategy that iteratively selects key attributes based on the out-of-the-box data. Since the distribution of the generated out-of-the-box data is similar to that of the test data, the key attributes selected by IAS generalize effectively to the test data. Extensive experiments demonstrate that IAS can significantly improve existing attribute-based ZSL methods and achieve state-of-the-art performance.
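A toy sketch of the iterative selection loop described above: attributes are greedily kept when they improve classification of generated "out-of-the-box" pseudo-data by nearest class-attribute matching. The pseudo-data generator and the scoring rule are placeholder assumptions, not the paper's generative model or IAS objective.

```python
# Toy sketch of iterative attribute selection on generated pseudo-data.
import numpy as np

rng = np.random.default_rng(2)
n_classes, n_attr = 10, 20
class_attrs = rng.integers(0, 2, size=(n_classes, n_attr)).astype(float)

# Pseudo-data: noisy attribute predictions for samples of (mimicked) unseen classes.
labels = rng.integers(0, n_classes, 500)
pseudo = class_attrs[labels] + rng.normal(0, 0.6, size=(500, n_attr))

def accuracy(attr_idx):
    """Nearest class-attribute classification accuracy using only the given attributes."""
    if not attr_idx:
        return 0.0
    d = ((pseudo[:, None, attr_idx] - class_attrs[None, :, attr_idx]) ** 2).sum(-1)
    return float((d.argmin(1) == labels).mean())

selected, remaining = [], list(range(n_attr))
for _ in range(8):                                   # select up to 8 key attributes
    best = max(remaining, key=lambda a: accuracy(selected + [a]))
    if accuracy(selected + [best]) <= accuracy(selected):
        break                                        # stop when no attribute improves the score
    selected.append(best)
    remaining.remove(best)
print("selected attributes:", selected, "accuracy:", round(accuracy(selected), 2))
```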
Subjects
Algorithms, Data Analysis, Machine Learning, Automated Pattern Recognition, Computer User Training, Humans, Automated Pattern Recognition/methods
ABSTRACT
Multiview alignment, achieving one-to-one correspondence of multiview inputs, is critical in many real-world multiview applications, especially for cross-view data analysis problems. An increasing amount of work has studied this alignment problem with canonical correlation analysis (CCA). However, existing CCA models are prone to misalign the multiple views due to either the neglect of uncertainty or the inconsistent encoding of the multiple views. To tackle these two issues, this letter studies multiview alignment from a Bayesian perspective. Delving into the impairments of inconsistent encodings, we propose to recover correspondence of the multiview inputs by matching the marginalization of the joint distribution of multiview random variables under different forms of factorization. To realize our design, we present adversarial CCA (ACCA), which achieves consistent latent encodings by matching the marginalized latent encodings through the adversarial training paradigm. Our analysis, based on conditional mutual information, reveals that ACCA is flexible for handling implicit distributions. Extensive experiments on correlation analysis and cross-view generation under noisy input settings demonstrate the superiority of our model.
ABSTRACT
A driver's cognitive state of mental fatigue significantly affects his or her driving performance and, more importantly, public safety. Previous studies have leveraged reaction time (RT) as the metric for mental fatigue and aimed at estimating the exact value of RT from electroencephalogram (EEG) signals within a regression model. However, because RTs are easily corrupted and nonsmooth during data collection, methods that focus on predicting the exact value of such a noisy measurement generally suffer from poor generalization performance. Considering that human RT reflects a brain dynamics preference (BDP) rather than a single regression output of EEG signals, we propose a novel channel-reliability-aware ranking (CArank) model for the multichannel ranking problem. CArank learns robustly from BDPs using EEG data and aims at preserving the ordering corresponding to RTs. In particular, we introduce a transition matrix to characterize the reliability of each channel used in the EEG data, which helps in learning BDPs only from informative EEG channels. To handle large-scale EEG signals, we propose a stochastic generalized expectation-maximization (SGEM) algorithm to update CArank in an online fashion. Comprehensive empirical analysis on EEG signals from 40 participants shows that CArank achieves substantial improvements in reliability while simultaneously detecting noisy or less informative EEG channels.
Subjects
Algorithms, Brain/physiopathology, Electroencephalography/methods, Mental Fatigue/physiopathology, Computer-Assisted Signal Processing, Adult, Female, Humans, Male, Mental Fatigue/diagnosis, Reaction Time/physiology
ABSTRACT
Recent graph-based models for multi-intent SLU have obtained promising results by modeling the guidance from the prediction of intents to the decoding of slot filling. However, existing methods (1) only model the unidirectional guidance from intent to slot, while there are bidirectional inter-correlations between intent and slot; and (2) adopt homogeneous graphs to model the interactions between the slot semantics nodes and intent label nodes, which limits performance. In this paper, we propose a novel model termed Co-guiding Net, which implements a two-stage framework achieving mutual guidance between the two tasks. In the first stage, initial estimated labels of both tasks are produced; they are then leveraged in the second stage to model the mutual guidance. Specifically, we propose two heterogeneous graph attention networks working on the proposed two heterogeneous semantics-label graphs, which effectively represent the relations among the semantics nodes and label nodes. Besides, we further propose Co-guiding-SCL Net, which exploits the single-task and dual-task semantics contrastive relations. For the first stage, we propose single-task supervised contrastive learning, and for the second stage, we propose co-guiding supervised contrastive learning, which considers the two tasks' mutual guidance in the contrastive learning procedure. Experimental results on multi-intent SLU show that our model outperforms existing models by a large margin, obtaining a relative improvement of 21.3% over the previous best model on the MixATIS dataset in overall accuracy. We also evaluate our model in the zero-shot cross-lingual scenario, and the results show that our model can relatively improve the state-of-the-art model by 33.5% on average in terms of overall accuracy across the 9 languages.
ABSTRACT
The state-of-the-art model for zero-shot cross-lingual spoken language understanding performs cross-lingual unsupervised contrastive learning to achieve label-agnostic semantic alignment between each utterance and its code-switched data. However, it ignores the precious intent/slot labels, whose label information is promising for capturing the label-aware semantic structure and thus enabling supervised contrastive learning to improve both the source and target languages' semantics. In this paper, we propose Hybrid and Cooperative Contrastive Learning to address this problem. Apart from cross-lingual unsupervised contrastive learning, we design a holistic approach that exploits source-language supervised contrastive learning, cross-lingual supervised contrastive learning, and multilingual supervised contrastive learning to perform label-aware semantic alignment in a comprehensive manner. Each kind of supervised contrastive learning mechanism includes both single-task and joint-task scenarios. In our model, each contrastive learning mechanism's input is enhanced by the others, so the four contrastive learning mechanisms cooperate to learn more consistent and discriminative representations in a virtuous cycle during training. Experiments show that our model obtains consistent improvements over 9 languages, achieving new state-of-the-art performance.
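For readers unfamiliar with the building block, the sketch below shows a generic supervised contrastive loss of the kind combined in such frameworks: embeddings sharing a label are pulled together relative to all others. The temperature, shapes, and toy inputs are assumptions; the paper's mechanisms compose several such losses across languages and tasks.

```python
# Generic supervised contrastive loss (illustrative, not the paper's exact objective).
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(emb, labels, temperature=0.1):
    """emb: (n, d) embeddings; labels: (n,) task labels (e.g., intent ids)."""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / temperature
    sim.fill_diagonal_(-1e9)                          # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Positives: other samples carrying the same label as the anchor.
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~torch.eye(len(labels), dtype=torch.bool)
    per_anchor = (log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()

emb = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supervised_contrastive_loss(emb, labels).item())
```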
ABSTRACT
Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner. Existing methods construct the contrastive samples by adding perturbations to the graph structure or node attributes. Although impressive results are achieved, such methods are rather blind to the wealth of prior information that can be assumed: as the degree of perturbation applied to the original graph increases, 1) the similarity between the original graph and the generated augmented graph gradually decreases, and 2) the discrimination between all nodes within each augmented view gradually increases. In this article, we argue that both kinds of prior information can be incorporated (differently) into the CL paradigm following our general ranking framework. In particular, we first interpret CL as a special case of learning to rank (L2R), which inspires us to leverage the ranking order among positive augmented views. Meanwhile, we introduce a self-ranking paradigm to ensure that the discriminative information among different nodes is maintained and is less altered by perturbations of different degrees. Experimental results on various benchmark datasets verify the effectiveness of our algorithm compared with supervised and unsupervised models.
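The L2R interpretation can be made concrete with a small ranking loss over augmented views: views produced with increasing perturbation should be increasingly far from the anchor. The pairwise softplus loss below is one illustrative way to encode that ordering, under assumed tensor shapes; it is not the paper's exact objective.

```python
# Illustrative ranking loss over augmented views ordered by perturbation degree.
import torch
import torch.nn.functional as F

def ranked_views_loss(anchor, views):
    """anchor: (d,) embedding; views: (k, d) embeddings ordered mild -> strong perturbation."""
    sims = F.cosine_similarity(anchor.unsqueeze(0), views)       # (k,)
    loss = 0.0
    for i in range(len(sims) - 1):
        # Encourage sim(view_i) > sim(view_{i+1}), since view_i is less perturbed.
        loss = loss + F.softplus(sims[i + 1] - sims[i])
    return loss / (len(sims) - 1)

anchor = torch.randn(64, requires_grad=True)
views = torch.randn(4, 64)                                        # 4 augmented views
print(ranked_views_loss(anchor, views).item())
```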
ABSTRACT
BACKGROUND: Gray matter (GM) and white matter (WM) impairments are both associated with raised blood pressure (BP), although whether elevated BP is differentially associated with the GM and WM aging process remains inadequately examined. METHODS: We included 37,327 participants with diffusion-weighted imaging (DWI) and 39,630 participants with T1-weighted scans from UK Biobank. BP was classified into 4 categories: normal BP, high-normal BP, grade 1, and grade 2 hypertension. Brain age gaps (BAGs) for GM (BAGGM) and WM (BAGWM) were derived separately from T1-weighted and diffusion-weighted scans using 3-dimensional convolutional neural network deep learning techniques. RESULTS: There was an increase in both BAGGM and BAGWM with raised BP (P<0.05). BAGWM was significantly larger than BAGGM at high-normal BP (0.195 years older; P=0.006), grade 1 hypertension (0.174 years older; P=0.004), and grade 2 hypertension (0.510 years older; P<0.001), but not at normal BP. Mediation analysis revealed that the association between hypertension and cognitive decline was primarily mediated by WM impairment. Mendelian randomization analysis suggested a causal relationship between hypertension and WM aging acceleration (unstandardized B, 1.780; P=0.016) but not GM aging (P>0.05). Sliding-window analysis indicated that the association between hypertension and brain aging acceleration was moderated by chronological age, showing stronger correlations in midlife but weaker associations in older age. CONCLUSIONS: Compared with GM, WM was more vulnerable to raised BP. Our study provides compelling evidence that concerted efforts should be directed towards WM damage in individuals with hypertension in clinical practice.
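The kind of within-group comparison reported above (is the WM gap larger than the GM gap in the same participants?) can be sketched as a paired test; the data below are simulated stand-ins for the CNN-derived brain age gaps.

```python
# Paired comparison of white matter vs gray matter brain age gaps (simulated data).
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(3)
n = 5000
bag_gm = rng.normal(0.0, 3.0, n)              # gray matter brain age gap (years)
bag_wm = bag_gm + rng.normal(0.5, 2.0, n)     # white matter gap, simulated slightly larger

t, p = ttest_rel(bag_wm, bag_gm)
print(f"mean WM-GM gap difference: {np.mean(bag_wm - bag_gm):.3f} years, p = {p:.3g}")
```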
Subjects
Hypertension, White Matter, Humans, Aged, White Matter/diagnostic imaging, Cohort Studies, Blood Pressure, UK Biobank, Biological Specimen Banks, Magnetic Resonance Imaging/methods, Brain/diagnostic imaging, Aging, Hypertension/epidemiology
ABSTRACT
Machine teaching is an inverse problem of machine learning that aims at steering the student toward its target hypothesis, where the teacher already knows the student's learning parameters. Previous studies on machine teaching focused on balancing the teaching risk and cost to find the best teaching examples derived from the student model. This optimization solver is in general ineffective when the student does not disclose any cue about its learning parameters. To handle such a teaching scenario, this article presents a distribution-matching-based machine teaching strategy that iteratively shrinks the teaching cost in a smooth surrogate, eliminating boundary perturbations from the version space. Technically, our strategy can be redefined as a cost-controlled optimization process that finds the optimal teaching examples without further exploring the parameter distribution of the student. Then, given any limited teaching cost, the training examples have a closed-form expression. Theoretical analysis and experimental results demonstrate the effectiveness of this strategy.
ABSTRACT
Dual-task dialog language understanding aims to tackle two correlative dialog language understanding tasks simultaneously by leveraging their inherent correlations. In this paper, we put forward a new framework whose core is relational temporal graph reasoning. We propose a speaker-aware temporal graph (SATG) and a dual-task relational temporal graph (DRTG) to facilitate relational temporal modeling in dialog understanding and dual-task reasoning. Besides, different from previous works that only achieve implicit semantics-level interactions, we propose to model explicit dependencies by integrating prediction-level interactions. To implement our framework, we first propose a novel model, Dual-tAsk temporal Relational rEcurrent Reasoning network (DARER), which first generates the context-, speaker-, and temporal-sensitive utterance representations through relational temporal modeling of the SATG, and then conducts recurrent dual-task relational temporal graph reasoning on the DRTG, in which the estimated label distributions act as key clues in prediction-level interactions. The relational temporal modeling in DARER is achieved by relational graph convolutional networks (RGCNs). We then further propose the Relational Temporal Transformer (ReTeFormer), which achieves fine-grained relational temporal modeling via Relation- and Structure-aware Disentangled Multi-head Attention. Accordingly, we propose DARER with ReTeFormer (DARER 2), which adopts two variants of ReTeFormer to achieve the relational temporal modeling of the SATG and DRTG, respectively. Extensive experiments on different scenarios verify that our models outperform state-of-the-art models by a large margin. Remarkably, on the dialog sentiment classification task in the Mastodon dataset, DARER and DARER 2 gain relative improvements of about 28% and 34% over the previous best model in terms of F1.
ABSTRACT
Minimizing prediction uncertainty on unlabeled data is a key factor in achieving good performance in semi-supervised learning (SSL). The prediction uncertainty is typically expressed as the entropy computed from the transformed probabilities in the output space. Most existing works distill low-entropy predictions by either accepting the determining class (with the largest probability) as the true label or suppressing subtle predictions (with the smaller probabilities). Such distillation strategies, however, are usually heuristic and less informative for model training. From this discernment, this article proposes a dual mechanism, named adaptive sharpening (ADS), which first applies a soft threshold to adaptively mask out the determinate and negligible predictions, and then seamlessly sharpens the informed predictions, distilling certain predictions from the informed ones only. More importantly, we theoretically analyze the traits of ADS by comparing it with various distillation strategies. Numerous experiments verify that ADS significantly improves state-of-the-art SSL methods when used as a plug-in. Our proposed ADS forges a cornerstone for future distillation-based SSL research.
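A toy sketch of the mask-then-sharpen idea: near-certain and negligible class probabilities are masked out with a soft threshold, and only the remaining "informed" probabilities are sharpened before renormalization. The thresholds and the temperature-exponent sharpening rule are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative mask-then-sharpen operation on a predicted class distribution.
import numpy as np

def adaptive_sharpen(probs, low=0.05, high=0.95, temperature=0.5):
    """Sharpen only the 'informed' probabilities in (low, high); drop negligible ones."""
    probs = np.asarray(probs, dtype=float)
    informed = (probs > low) & (probs < high)         # soft-threshold mask
    out = probs.copy()
    out[informed] = probs[informed] ** (1.0 / temperature)   # sharpen informed entries
    out[probs <= low] = 0.0                                   # suppress negligible entries
    if out.sum() == 0:
        return probs                                          # degenerate case: leave unchanged
    return out / out.sum()

print(adaptive_sharpen([0.02, 0.48, 0.30, 0.20]))   # ambiguous prediction gets sharpened
print(adaptive_sharpen([0.97, 0.01, 0.01, 0.01]))   # determinate prediction is kept
```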
ABSTRACT
Deep learning on large-scale data is dominant nowadays. The unprecedented scale of data has arguably been one of the most important driving forces behind its success. However, there still exist scenarios where collecting data or labels can be extremely expensive, e.g., medical imaging and robotics. To fill this gap, this paper considers the problem of data-efficient learning from scratch using a small amount of representative data. First, we characterize this problem by active learning on homeomorphic tubes of spherical manifolds, which naturally generates a feasible hypothesis class. With homologous topological properties, we identify an important connection: finding tube manifolds is equivalent to minimizing hyperspherical energy (MHE) in physical geometry. Inspired by this connection, we propose an MHE-based active learning (MHEAL) algorithm and provide comprehensive theoretical guarantees for MHEAL, covering convergence and generalization analysis. Finally, we demonstrate the empirical performance of MHEAL in a wide range of applications for data-efficient learning, including deep clustering, distribution matching, version space sampling, and deep active learning.
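The hyperspherical energy intuition can be illustrated with a toy greedy selector: points are projected onto the unit sphere and chosen so that the Riesz (inverse-distance) energy of the selected set stays low, i.e., the selected examples are spread apart. The greedy rule and feature setup are assumptions for illustration, not the MHEAL algorithm itself.

```python
# Toy greedy selection guided by hyperspherical (inverse-distance) energy.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # project features onto the unit sphere

def energy(points):
    """Riesz s=1 energy: sum of inverse pairwise distances (lower = better spread)."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    iu = np.triu_indices(len(points), k=1)
    return float((1.0 / np.maximum(d[iu], 1e-8)).sum())

selected = [0]
for _ in range(9):                                 # choose 10 representative points
    candidates = [i for i in range(len(X)) if i not in selected]
    best = min(candidates, key=lambda i: energy(X[selected + [i]]))
    selected.append(best)
print("selected indices:", selected)
```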
ABSTRACT
Deep models have achieved state-of-the-art performance on a broad range of visual recognition tasks. Nevertheless, the generalization ability of deep models is seriously affected by noisy labels. Although deep learning packages offer many different loss functions, it is not transparent to users which of them remain consistent in the presence of label noise. This paper addresses the problem of how to use the abundant loss functions designed for the traditional classification problem when labels are noisy. We present a dynamic label learning (DLL) algorithm for noisy label learning and prove that, with our algorithm, any surrogate loss function can be used for classification with noisy labels, with a consistency guarantee that the label noise does not ultimately hinder the search for the optimal classifier on noise-free data. In addition, we provide an in-depth theoretical analysis of our algorithm to verify its correctness and explain its strong robustness. Finally, experimental results on synthetic and real datasets confirm the efficiency of our algorithm and the correctness of our analysis, and show that our proposed algorithm significantly outperforms or is comparable to current state-of-the-art counterparts.
ABSTRACT
The goal of domain adaptation (DA) is to train a good model for a target domain, with a large amount of labeled data in a source domain but only limited labeled data in the target domain. Conventional closed-set domain adaptation (CSDA) assumes that the source and target label spaces are the same. However, this is not quite practical in real-world applications. In this work, we study the problem of open-set domain adaptation (OSDA), which only requires the target label space to partially overlap with the source label space. Consequently, solving OSDA requires detecting and separating unknown classes, which is normally achieved by introducing a threshold for the prediction of target unknown classes; however, the performance can be quite sensitive to that threshold. In this article, we tackle the above issues by proposing a novel OSDA method that performs soft rejection of unknown target classes and simultaneously matches the source and target domains. Extensive experiments on three standard datasets validate the effectiveness of the proposed method over the state-of-the-art competitors.
ABSTRACT
Learning with noisy labels has become imperative in the Big Data era, as it saves the expensive human labor required for accurate annotation. Previous noise-transition-based methods have achieved theoretically grounded performance under the Class-Conditional Noise model (CCN). However, these approaches build upon an ideal but impractical anchor set that is assumed available to pre-estimate the noise transition. Even though subsequent works adapt the estimation as a neural layer, the ill-posed stochastic learning of its parameters in back-propagation easily falls into undesired local minima. We solve this problem by introducing a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework. By projecting the noise transition into the Dirichlet space, the learning is constrained on a simplex characterized by the complete dataset, instead of some ad hoc parametric space wrapped by the neural layer. We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels to train the classifier and to model the noise. Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples in previous approaches. We further generalize LCCN to variants compatible with open-set noisy labels, semi-supervised learning, and cross-model training. A range of experiments demonstrates the advantages of LCCN and its variants over current state-of-the-art methods. The code is publicly available.
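A heavily simplified sketch of the Gibbs-style alternation in a Dirichlet-parameterized noise-transition model: sample a latent "true" label given the classifier's prediction and the current transition matrix, accumulate transition counts, then resample the transition on the simplex. The shapes, priors, single pass, and stand-in classifier outputs are all assumptions, not the LCCN implementation.

```python
# Simplified Gibbs-style step for a latent class-conditional noise model.
import numpy as np

rng = np.random.default_rng(5)
C = 3
alpha = np.ones((C, C)) + 5 * np.eye(C)              # Dirichlet prior favoring clean labels
T = alpha / alpha.sum(axis=1, keepdims=True)         # transition P(noisy | true)

noisy_labels = rng.integers(0, C, 100)
clf_probs = rng.dirichlet(np.ones(C), size=100)      # stand-in classifier outputs P(true | x)

counts = np.copy(alpha)
for x_prob, y_noisy in zip(clf_probs, noisy_labels):
    # Posterior over the latent true label z: P(z | x, noisy y) ~ P(z | x) * P(noisy y | z).
    post = x_prob * T[:, y_noisy]
    post /= post.sum()
    z = rng.choice(C, p=post)                        # Gibbs-style sample of the true label
    counts[z, y_noisy] += 1                          # accumulate transition statistics

T = np.array([rng.dirichlet(row) for row in counts]) # resample the transition on the simplex
print(np.round(T, 2))
```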
Subjects
Algorithms, Big Data, Humans, Bayes Theorem, Supervised Machine Learning
ABSTRACT
Time series analysis is essential to many far-reaching applications of data science and statistics, including economic and financial forecasting, surveillance, and automated business processing. Despite the great success of the Transformer in computer vision and natural language processing, its potential as a general backbone for analyzing ubiquitous time series data has not been fully realized. Prior Transformer variants for time series rely heavily on task-dependent designs and pre-assumed "pattern biases", revealing their insufficiency in representing the nuanced seasonal, cyclic, and outlier patterns that are highly prevalent in time series. As a consequence, they cannot generalize well to different time series analysis tasks. To tackle these challenges, we propose DifFormer, an effective and efficient Transformer architecture that can serve as a workhorse for a variety of time series analysis tasks. DifFormer incorporates a novel multi-resolutional differencing mechanism, which is able to progressively and adaptively make nuanced yet meaningful changes prominent, while periodic or cyclic patterns can be dynamically captured with flexible lagging and dynamic ranging operations. Extensive experiments demonstrate that DifFormer significantly outperforms state-of-the-art models on three essential time series analysis tasks: classification, regression, and forecasting. In addition to its superior performance, DifFormer also excels in efficiency, with linear time/memory complexity and empirically lower time consumption.
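The differencing signal transformation at the heart of such a mechanism can be sketched with fixed lags: short lags surface abrupt changes, longer lags surface periodic or cyclic structure. DifFormer learns the lagging and ranging adaptively inside the Transformer; the fixed-lag function below only illustrates the transformation, with illustrative names and lag choices.

```python
# Fixed-lag sketch of multi-resolutional differencing of a time series.
import numpy as np

def multi_resolution_diff(x, lags=(1, 4, 12)):
    """x: (T,) series. Returns a (T, len(lags)) matrix of lagged differences (zero-padded)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros((len(x), len(lags)))
    for j, lag in enumerate(lags):
        out[lag:, j] = x[lag:] - x[:-lag]
    return out

t = np.arange(200)
series = np.sin(2 * np.pi * t / 12) + 0.01 * t     # seasonal pattern plus a slow trend
feats = multi_resolution_diff(series)
print(feats.shape, feats[:3].round(3))
```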
ABSTRACT
Many real-world problems deal with collections of data with missing values, e.g., RNA sequence analytics, image completion, and video processing. Such missing data is usually a serious impediment to good learning performance. Existing methods tend to use a universal model for all incomplete data, resulting in a suboptimal model for each missingness pattern. In this paper, we present a general model for learning with incomplete data. The proposed model can be appropriately adjusted to different missingness patterns, alleviating competition between data. Our model is based on observable features only, so it does not incur errors from data imputation. We further introduce a low-rank constraint to promote the generalization ability of our model. Analysis of the generalization error justifies our idea theoretically. In addition, a subgradient method is proposed to optimize our model with a proven convergence rate. Experiments on different types of data show that our method compares favorably with typical imputation strategies and other state-of-the-art models for incomplete data. More importantly, our method can be seamlessly incorporated into neural networks, with the best results achieved. The source code is released at https://github.com/YS-GONG/missingness-patterns.
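The observed-features-only idea can be sketched by grouping samples by their missingness pattern and fitting a predictor per pattern on the observed columns, with no imputation. Fitting independent ridge models per pattern is a simplification for illustration; the paper instead couples the patterns through a shared low-rank constraint.

```python
# Toy sketch: pattern-specific models trained on observed features only (no imputation).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
n, d = 400, 6
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(0, 0.1, n)
mask = rng.random((n, d)) > 0.3                    # True where a feature is observed

models = {}
for pattern in {tuple(row) for row in mask}:       # one group per missingness pattern
    rows = np.all(mask == np.array(pattern), axis=1)
    cols = np.array(pattern)
    if rows.sum() < 5 or cols.sum() == 0:
        continue                                   # too few samples or nothing observed
    models[pattern] = Ridge(alpha=1.0).fit(X[rows][:, cols], y[rows])
print(f"fitted {len(models)} pattern-specific models")
```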
ABSTRACT
Many machine learning applications encounter situations where model providers are required to further refine a previously trained model to satisfy the specific needs of local users. This problem reduces to the standard model tuning paradigm if the target data can be fed to the model. However, in a wide range of practical cases the target data is not shared with the model provider, while some evaluations of the model are commonly accessible. In this paper, we formally set up a challenge named Earning eXtra PerformancE from restriCTive feEDbacks (EXPECTED) to describe this form of model tuning problem. Concretely, EXPECTED allows a model provider to access the operational performance of the candidate model multiple times via feedback from a local user (or a group of users). The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing the feedbacks. Unlike existing model tuning methods where the target data is always ready for calculating model gradients, the model provider in EXPECTED only sees some feedbacks, which could be as simple as scalars such as inference accuracy or usage rate. To enable tuning in this restrictive circumstance, we propose to characterize the geometry of the model performance with regard to the model parameters by exploring the parameters' distribution. In particular, for deep models whose parameters are distributed across multiple layers, a more query-efficient algorithm is tailor-designed to conduct layerwise tuning with more attention to the layers that pay off better. Our theoretical analyses justify the proposed algorithms in terms of both efficacy and efficiency. Extensive experiments on different applications demonstrate that our work forges a sound solution to the EXPECTED problem, establishing a foundation for future studies in this direction.
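Tuning from scalar feedback alone can be sketched with an evolution-strategies-style loop: perturb the parameters, query the black-box performance score, and ascend the estimated feedback gradient. This illustrates the general spirit of query-based, gradient-free tuning; the feedback function, step sizes, and single-block update are toy assumptions, not the paper's layerwise algorithm.

```python
# Toy evolution-strategies-style tuning from scalar feedback only.
import numpy as np

rng = np.random.default_rng(7)
target = rng.normal(size=20)

def user_feedback(params):
    """Scalar score returned by the local user, e.g., accuracy on private data (toy stand-in)."""
    return -np.sum((params - target) ** 2)          # higher is better

params = np.zeros(20)
sigma, lr, n_queries = 0.1, 0.02, 20
for step in range(200):
    noise = rng.normal(size=(n_queries, 20))
    scores = np.array([user_feedback(params + sigma * eps) for eps in noise])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad_est = (scores[:, None] * noise).mean(axis=0) / sigma
    params += lr * grad_est                         # ascend the estimated feedback gradient
print("final feedback score:", round(user_feedback(params), 4))
```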
ABSTRACT
Existing deep learning-based shadow removal methods still produce images with shadow remnants. These shadow remnants typically exist in homogeneous regions with low-intensity values, making them untraceable in the existing image-to-image mapping paradigm. We observe that shadows mainly degrade images at the image-structure level (at which humans perceive object shapes and continuous colors). Hence, in this paper, we propose to remove shadows at the image-structure level. Based on this idea, we propose a novel structure-informed shadow removal network (StructNet) that leverages image-structure information to address the shadow remnant problem. Specifically, StructNet first reconstructs the structure information of the input image without shadows and then uses the restored shadow-free structure prior to guide image-level shadow removal. StructNet contains two main novel modules: 1) a mask-guided shadow-free extraction (MSFE) module to extract image structural features in a non-shadow-to-shadow directional manner; and 2) a multi-scale feature and residual aggregation (MFRA) module to leverage the shadow-free structure information to regularize feature consistency. In addition, we propose to extend StructNet to exploit multi-level structure information (MStructNet), further boosting shadow removal performance with minimal computational overhead. Extensive experiments on three shadow removal benchmarks demonstrate that our method outperforms existing shadow removal methods, and that StructNet can be integrated with existing methods to improve them further.