ABSTRACT
Correlation is not causation: this simple and uncontroversial statement has far-reaching implications. Defining and applying causality in biomedical research has posed significant challenges to the scientific community. In this perspective, we attempt to connect the partly disparate fields of systems biology, causal reasoning, and machine learning to inform future approaches in the field of systems biology and molecular medicine.
Subjects
Causality, Machine Learning, Systems Biology, Humans, Biomedical Research, Biological Models
ABSTRACT
Modern practice for training classification deepnets involves a terminal phase of training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero, while training loss is pushed toward zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call neural collapse (NC), involving four deeply interconnected phenomena. (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class means. (NC2) The class means collapse to the vertices of a simplex equiangular tight frame (ETF). (NC3) Up to rescaling, the last-layer classifiers collapse to the class means, or in other words to the simplex ETF (i.e., to a self-dual configuration). (NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest training class mean (i.e., the nearest class center [NCC] decision rule). The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
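As a concrete illustration, the (NC4) nearest-class-center rule can be sketched in a few lines. The toy activations and class means below are made up for illustration; they do not come from a trained deepnet.

```python
import numpy as np

def class_means(activations, labels, num_classes):
    """Mean last-layer activation for each class."""
    return np.stack([activations[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def ncc_predict(activation, means):
    """Nearest-class-center rule: pick the class with the closest mean."""
    return int(np.argmin(np.linalg.norm(means - activation, axis=1)))

# Toy activations for two well-separated classes in a 2-D feature space.
acts = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
labels = np.array([0, 0, 1, 1])
means = class_means(acts, labels, num_classes=2)
pred = ncc_predict(np.array([0.05, 0.95]), means)
```

Under neural collapse this simple rule agrees with the trained network's decisions, since by (NC3) the last-layer classifiers themselves collapse to the class means.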
ABSTRACT
Understanding the inductive biases that allow humans to learn in complex environments has been an important goal of cognitive science. Yet, while we have discovered much about human biases in specific learning domains, this research has largely focused on simple tasks that lack the complexity of the real world. In contrast, video games involving agents and objects embedded in richly structured systems provide an experimentally tractable proxy for real-world complexity. Recent work has suggested that key aspects of human learning in domains like video games can be captured by model-based reinforcement learning (RL) with object-oriented relational models, which we term theory-based RL. Restricting the model class in this way provides an inductive bias that dramatically increases learning efficiency, but in this paper we show that humans employ a stronger set of biases in addition to syntactic constraints on the structure of theories. In particular, we catalog a set of semantic biases that constrain the content of theories. Building these semantic biases into a theory-based RL system produces more human-like learning in video game environments.
Subjects
Reinforcement (Psychology), Video Games, Bias, Humans, Learning, Semantics
ABSTRACT
Induction benefits from useful priors. Penalized regression approaches, like ridge regression, shrink weights toward zero, but zero association is usually not a sensible prior. Inspired by the simple and robust decision heuristics humans use, we constructed non-zero priors for penalized regression models that provide robust and interpretable solutions across several tasks. Our approach enables estimates from a constrained model to serve as a prior for a more general model, yielding a principled way to interpolate between models of differing complexity. We successfully applied this approach to a number of decision and classification problems, as well as to the analysis of simulated brain imaging data. Models with robust priors had excellent worst-case performance. Solutions followed from the form of the heuristic that was used to derive the prior. These new algorithms can serve applications in data analysis and machine learning, as well as help in understanding how people transition from novice to expert performance.
Subjects
Algorithms, Brain, Heuristics, Humans
ABSTRACT
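A minimal sketch of the penalized-regression idea in the abstract above: ridge regression that shrinks toward a non-zero prior w0 instead of toward zero. The toy data and the particular prior are illustrative assumptions; the paper's heuristic-derived priors are task-specific.

```python
import numpy as np

def ridge_with_prior(X, y, lam, w0):
    """Penalized least squares shrinking weights toward a non-zero prior w0:
    minimize ||y - X w||^2 + lam * ||w - w0||^2, with closed-form solution
    w = (X^T X + lam I)^{-1} (X^T y + lam w0)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w0)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_w = np.array([2.0, -1.0])
y = X @ true_w
w0 = np.array([1.0, 1.0])  # e.g., a prior taken from a simpler, constrained model

w_weak = ridge_with_prior(X, y, lam=1e-9, w0=w0)   # ~ ordinary least squares
w_strong = ridge_with_prior(X, y, lam=1e9, w0=w0)  # ~ the prior itself
```

Varying lam interpolates between the constrained model (the prior) and the unconstrained least-squares fit, which is the principled interpolation the abstract describes.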
Is technological advancement constrained by biases in human cognition? People in all societies build on discoveries inherited from previous generations, leading to cumulative innovation. However, biases in human learning and memory may influence the process of knowledge transmission, potentially limiting this process. Here, we show that cumulative innovation in a continuous optimization problem is systematically constrained by human biases. In a large (n = 1250) behavioural study using a transmission chain design, participants searched for virtual technologies in one of four environments after inheriting a solution from previous generations. Participants converged on worse solutions in environments misaligned with their biases. These results substantiate a mathematical model of cumulative innovation in Bayesian agents, highlighting formal relationships between cultural evolution and distributed stochastic optimization. Our findings provide experimental evidence that human biases can limit the advancement of knowledge in a controlled laboratory setting, reinforcing concerns about bias in creative, scientific and educational contexts.
Subjects
Cultural Evolution, Bayes Theorem, Bias, Creativity, Humans, Learning
ABSTRACT
In this article, we study activity recognition in the context of sensor-rich environments. In these environments, many different constraints arise at various levels during the data generation process, such as the intrinsic characteristics of the sensing devices, their energy and computational constraints, and their collective (collaborative) dimension. These constraints have a fundamental impact on the final activity recognition models, as the quality of the data, its availability, and its reliability, among other things, are not ensured during model deployment in real-world configurations. Current approaches for activity recognition rely on the activity recognition chain, which defines the several steps that the sensed data undergo: this is an inductive process that involves exploring a hypothesis space to find a theory able to explain the observations. For activity recognition to be effective and robust, this inductive process must consider the constraints at all levels and model them explicitly. Whether a bias relates to sensor measurement, transmission protocol, sensor deployment topology, heterogeneity, dynamicity, or stochastic effects, it is essential to understand its substantial impact on the quality of the data and, ultimately, on activity recognition models. This study highlights the need to make explicit the different types of biases arising in real situations so that machine learning models can, for example, adapt to the dynamicity of these environments, resist sensor failures, and follow the evolution of the sensors' topology. We propose a metamodeling approach in which these biases are specified as hyperparameters that can control the structure of the activity recognition models. Via these hyperparameters, it becomes easier to optimize the inductive processes, reason about them, and incorporate additional knowledge. It also provides a principled strategy to adapt the models to the evolution of the environment.
We illustrate our approach on the SHL dataset, which features motion sensor data for a set of human activities collected in real conditions. The obtained results make a case for the proposed metamodeling approach, notably the robustness gains achieved when the deployed models are confronted with changes to the initial sensing configurations. We discuss the trade-offs exhibited and the broader implications of the proposed approach, along with alternative techniques for encoding and incorporating knowledge into activity recognition models.
Subjects
Human Activities, Machine Learning, Bias, Humans, Motion (Physics), Reproducibility of Results
ABSTRACT
Residual connections have been proposed as an architecture-based inductive bias that mitigates the problem of exploding and vanishing gradients and increases task performance in both feed-forward and recurrent networks (RNNs) trained with the backpropagation algorithm. Yet, little is known about how residual connections in RNNs influence their dynamics and fading memory properties. Here, we introduce weakly coupled residual recurrent networks (WCRNNs) in which residual connections result in well-defined Lyapunov exponents and allow for studying properties of fading memory. We investigate how the residual connections of WCRNNs influence their performance, network dynamics, and memory properties on a set of benchmark tasks. We show that several distinct forms of residual connections yield effective inductive biases that result in increased network expressivity. In particular, these are residual connections that (i) result in network dynamics in the proximity of the edge of chaos, (ii) allow networks to capitalize on characteristic spectral properties of the data, and (iii) result in heterogeneous memory properties. In addition, we demonstrate how our results can be extended to non-linear residuals and introduce a weakly coupled residual initialization scheme that can be used for Elman RNNs.
Subjects
Algorithms, Neural Networks (Computer), Language
ABSTRACT
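A minimal sketch of a recurrent update with a residual (skip) connection, in the spirit of the WCRNNs described above. The particular update rule, h_t = h_{t-1} + eps * tanh(W h_{t-1} + U x_t), and the coupling strength eps are illustrative assumptions rather than the paper's precise definition; a small eps keeps the state only weakly coupled to the recurrent nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3                                    # hidden units, input dimension
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))  # recurrent weights
U = rng.normal(scale=1.0 / np.sqrt(d), size=(n, d))  # input weights

def residual_step(h, x, eps=0.1):
    """One recurrent update with a residual connection: the previous state
    is carried forward and only weakly perturbed by the nonlinearity."""
    return h + eps * np.tanh(W @ h + U @ x)

# Run the network on a short random input sequence.
h = np.zeros(n)
for _ in range(20):
    h = residual_step(h, rng.normal(size=d))
```

At eps = 0 the update is the identity map (perfect memory); increasing eps moves the dynamics away from that limit, which is the knob such residual formulations use to shape fading memory.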
Many phenomena in nature inherently encode both short- and long-term temporal dependencies, which result in particular from the direction of the flow of time. In this respect, we found experimental evidence suggesting that the interrelations between events are stronger for closer time stamps. However, for attention-based models to learn these regularities in short-term dependencies, large amounts of data are required, which is often infeasible. This is because, while they are good at learning piece-wise temporal dependencies, attention-based models lack structures that encode biases for time series. As a resolution, we propose a simple and efficient method that enables attention layers to better encode the short-term temporal bias of these data sets by applying learnable, adaptive kernels directly to the attention matrices. For our experiments, we chose various prediction tasks on Electronic Health Records (EHR) data sets, since these are prime examples of data with underlying long- and short-term temporal dependencies. Our experiments show classification results superior to those of the best-performing models on most tasks and data sets.
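The core mechanism, applying a learnable kernel directly to the attention matrix, can be sketched as follows. The exponential distance kernel and the single scale parameter tau are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kernel_attention(Q, K, V, tau=2.0):
    """Self-attention whose attention matrix is multiplied elementwise by a
    distance kernel exp(-|i - j| / tau), biasing it toward nearby time stamps;
    tau would be learned jointly with the other parameters."""
    T, d = Q.shape
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    dist = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :])
    attn = attn * np.exp(-dist / tau)          # down-weight distant steps
    attn = attn / attn.sum(axis=-1, keepdims=True)  # renormalize rows
    return attn @ V, attn

rng = np.random.default_rng(1)
T, d = 6, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
out, attn = kernel_attention(Q, K, V)
```

Because the kernel is a function of |i - j| only, it adds a short-term temporal inductive bias without changing the attention layer's interface or output shape.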
ABSTRACT
In the domain of medical image segmentation, traditional diffusion probabilistic models are hindered by the local inductive biases stemming from convolutional operations, which constrain their ability to model long-term dependencies and lead to inaccurate mask generation. Conversely, the Transformer offers a remedy by obviating the local inductive biases inherent in convolutional operations, thereby enhancing segmentation precision. Currently, the integration of Transformer and convolution operations mainly occurs in two forms: nesting and stacking. However, both methods address bias elimination at a relatively large granularity and fail to fully leverage the advantages of both approaches. To address this, we propose a conditional diffusion segmentation model named TransDiffSeg, which combines the Transformer with the convolution operations of traditional diffusion models in a parallel manner. This approach eliminates the accumulated local inductive bias of convolution operations at a finer granularity, within each layer. Additionally, an adaptive feature fusion block is employed to merge conditional semantic features and noise features, enhancing global semantic information and reducing the Transformer's sensitivity to noise features. To validate the impact of the granularity of bias elimination on performance, and the impact of the Transformer in alleviating the accumulated local inductive biases of convolutional operations in diffusion probabilistic models, experiments are conducted on the AMOS22 and BTCV datasets. Experimental results demonstrate that eliminating local inductive bias at a finer granularity significantly improves the segmentation performance of diffusion probabilistic models, and they confirm that the finer the granularity of bias elimination, the better the segmentation performance.
ABSTRACT
How are new Bayesian hypotheses generated within the framework of predictive processing? This explanatory framework purports to provide a unified, systematic explanation of cognition by appealing to Bayes' rule and hierarchical Bayesian machinery alone. Given that the generation of new hypotheses is fundamental to Bayesian inference, the predictive processing framework faces an important challenge in this regard. By examining several cognitive-level and neurobiological architecture-inspired models of hypothesis generation, we argue that there is an essential difference between the two types of models. Cognitive-level models do not specify how they can be implemented in brains and include structures and assumptions that are external to the predictive processing framework. By contrast, neurobiological architecture-inspired models, which aim to better resemble brain processes, fail to explain important capacities of cognition, such as categorization and few-shot learning. The "scaling-up" challenge for proponents of predictive processing is to explain the relationship between these two types of models using only the theoretical and conceptual machinery of Bayesian inference.
Subjects
Brain, Cognition, Humans, Bayes Theorem, Learning
ABSTRACT
The starting-small effect is a cognitive advantage in language acquisition whereby learners begin by generalizing over regularities from structurally simple and shorter tokens in a skewed input distribution. Our study explored this effect as a potential explanation for the biased learning of opaque versus transparent vowel harmony. In opaque vowel harmony, feature agreement occurs strictly between adjacent vowels, and an intervening "neutral vowel" blocks long-distance vowel harmony. Thus, opaque vowel harmony can be acquired even if learners start with structurally simpler and more frequent disyllabic tokens. In contrast, transparent vowel harmony can only be observed in longer tokens, which demonstrate long-distance agreement by skipping a neutral vowel. Opaque vowel harmony is predicted to be learned more efficiently due to its compatibility with the local dependencies acquired via starting-small learning. In two artificial grammar learning experiments, learners were exposed to both vowel harmony patterns embedded in either an equal number of disyllabic and trisyllabic tokens or a skewed distribution with twice as many disyllabic tokens. In Experiment I, learners' test performance suggests consistently biased learning of local, opaque vowel harmony under starting-small learning. Furthermore, in Experiment II, the acquired vowel harmony patterns varied significantly with working memory capacity under a balanced but not a skewed input distribution, presumably because starting-small learning eases the cognitive demand.
ABSTRACT
Anchoring goals to spatial representations enables flexible navigation but is challenging in novel environments when both representations must be acquired simultaneously. We propose a framework for how Drosophila uses internal representations of head direction (HD) to build goal representations upon selective thermal reinforcement. We show that flies use stochastically generated fixations and directed saccades to express heading preferences in an operant visual learning paradigm and that HD neurons are required to modify these preferences based on reinforcement. We used a symmetric visual setting to expose how flies' HD and goal representations co-evolve and how the reliability of these interacting representations impacts behavior. Finally, we describe how rapid learning of new goal headings may rest on a behavioral policy whose parameters are flexible but whose form is genetically encoded in circuit architecture. Such evolutionarily structured architectures, which enable rapidly adaptive behavior driven by internal representations, may be relevant across species.
Subjects
Goals, Spatial Navigation, Animals, Spatial Navigation/physiology, Saccades/physiology, Learning/physiology, Neurons/physiology, Drosophila/physiology, Reinforcement (Psychology), Drosophila melanogaster/physiology, Operant Conditioning/physiology, Nerve Net/physiology
ABSTRACT
Emotion has been a subject of intensive research in psychology and cognitive neuroscience over several decades. Recently, more and more studies of emotion have adopted automatic rather than manual methods of facial emotion recognition to analyze images or videos of human faces. Compared to manual methods, these computer-vision-based, automatic methods can help objectively and rapidly analyze large amounts of data. These automatic methods have also been validated and are believed to be accurate in their judgments. However, they often rely on statistical learning models (e.g., deep neural networks), which are intrinsically inductive and thus suffer from the problems of induction. Specifically, models trained primarily on Western faces may not generalize well enough to accurately judge Eastern faces, which can jeopardize the measurement invariance of emotions in cross-cultural studies. To demonstrate this possibility, the present study carries out a cross-racial validation of two popular facial emotion recognition systems, FaceReader and DeepFace, using two Western and two Eastern face datasets. Although both systems achieved overall high accuracy in judging emotion categories on the Western datasets, they performed relatively poorly on the Eastern datasets, especially in the recognition of negative emotions. While these results caution against the use of these automatic methods of emotion recognition on non-Western faces, they also suggest that the happiness measurements output by these automatic methods are accurate and invariant across races and hence can still be utilized for cross-cultural studies of positive psychology.
ABSTRACT
Deep learning methods provide state-of-the-art performance for supervised-learning-based medical image analysis. However, it is essential that trained models extract clinically relevant features for downstream tasks, as otherwise shortcut learning and generalization issues can occur. Furthermore, in the medical field, the trustworthiness and transparency of current deep learning systems are much-desired properties. In this paper, we propose an interpretability-guided inductive bias approach that enforces that learned features yield more distinctive and spatially consistent saliency maps for the different class labels of trained models, leading to improved model performance. We achieve our objectives by incorporating a class-distinctiveness loss and a spatial-consistency regularization loss term. Experimental results for medical image classification and segmentation tasks show that our proposed approach outperforms conventional methods while yielding saliency maps in higher agreement with clinical experts. Additionally, we show how information from unlabeled images can be used to further boost performance. In summary, the proposed approach is modular, applicable to existing network architectures used for medical imaging applications, and yields improved learning rates, model robustness, and model interpretability.
Subjects
Deep Learning, Diagnostic Imaging, Humans, Computer-Assisted Image Processing/methods
ABSTRACT
Learning from a limited number of experiences requires suitable inductive biases. To identify how inductive biases are implemented in and shaped by neural codes, we analyze sample-efficient learning of arbitrary stimulus-response maps from arbitrary neural codes with biologically plausible readouts. We develop an analytical theory that predicts the generalization error of the readout as a function of the number of observed examples. Our theory illustrates in a mathematically precise way how the structure of population codes shapes inductive bias, and how a match between the code and the task is crucial for sample-efficient learning. It elucidates a bias to explain observed data with simple stimulus-response maps. Using recordings from the mouse primary visual cortex, we demonstrate the existence of an efficiency bias towards low-frequency orientation discrimination tasks for grating stimuli and low spatial frequency reconstruction tasks for natural images. We reproduce the discrimination bias in a simple model of primary visual cortex, and further show how invariances in the code to certain stimulus variations alter learning performance. We extend our methods to time-dependent neural codes and predict the sample efficiency of readouts from recurrent networks. We observe that many different codes can support the same inductive bias. By analyzing recordings from the mouse primary visual cortex, we demonstrate that biological codes have lower total activity than other codes with identical bias. Finally, we discuss implications of our theory in the context of recent developments in neuroscience and artificial intelligence. Overall, our study provides a concrete method for elucidating inductive biases of the brain and promotes sample-efficient learning as a general normative coding principle.
Subjects
Artificial Intelligence, Brain, Animals, Mice, Bias
ABSTRACT
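The code-task alignment idea above can be illustrated with a toy linear readout: generalization from few examples succeeds when the target stimulus-response map lies in the span of the population code and fails when it does not. The sinusoidal tuning curves and target functions below are illustrative assumptions, not the paper's recorded codes or its analytical theory.

```python
import numpy as np

rng = np.random.default_rng(2)

def code(theta):
    """Toy population code: low-frequency sinusoidal tuning curves over a
    circular stimulus variable theta."""
    return np.stack([np.cos(theta), np.sin(theta),
                     np.cos(2 * theta), np.sin(2 * theta)], axis=-1)

def readout_error(target_fn, n_train=10):
    """Fit a least-squares readout on n_train random stimuli and report the
    generalization (test) MSE on a dense grid."""
    theta_tr = rng.uniform(0, 2 * np.pi, n_train)
    w, *_ = np.linalg.lstsq(code(theta_tr), target_fn(theta_tr), rcond=None)
    theta_te = np.linspace(0, 2 * np.pi, 500)
    return np.mean((code(theta_te) @ w - target_fn(theta_te)) ** 2)

err_aligned = readout_error(lambda t: np.cos(t))        # task in the code's span
err_misaligned = readout_error(lambda t: np.cos(5 * t)) # task outside the span
```

With the same code and the same number of examples, the aligned task is learned almost perfectly while the misaligned one is not, mirroring the code-task match the theory formalizes.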
How do people decide how general a causal relationship is, in terms of the entities or situations it applies to? What features do people use to decide whether a new situation is governed by a new causal law or an old one? How can people make these difficult judgments in a fast, efficient way? We address these questions in two experiments that ask participants to generalize from one (Experiment 1) or several (Experiment 2) causal interactions between pairs of objects. In each case, participants see an agent object act on a recipient object, causing some changes to the recipient. In line with the human capacity for few-shot concept learning, we find systematic patterns of causal generalization favoring simpler causal laws that extend over categories of similar objects. In Experiment 1, we find that participants' inferences are shaped by the order of the generalization questions they are asked. In both experiments, we find an asymmetry in the formation of causal categories: participants preferentially identify causal laws with features of the agent objects rather than the recipients. To explain this, we develop a computational model that combines program induction (about the hidden causal laws) with non-parametric category inference (about their domains of influence). We demonstrate that our modeling approach can explain both the order effect in Experiment 1 and the causal asymmetry, and that it outperforms a naïve Bayesian account while providing a computationally plausible mechanism for real-world causal generalization.
ABSTRACT
Language models have recently emerged as a powerful machine-learning approach for distilling information from massive protein sequence databases. From readily available sequence data alone, these models discover evolutionary, structural, and functional organization across protein space. Using language models, we can encode amino-acid sequences into distributed vector representations that capture their structural and functional properties, as well as evaluate the evolutionary fitness of sequence variants. We discuss recent advances in protein language modeling and their applications to downstream protein property prediction problems. We then consider how these models can be enriched with prior biological knowledge and introduce an approach for encoding protein structural knowledge into the learned representations. The knowledge distilled by these models allows us to improve downstream function prediction through transfer learning. Deep protein language models are revolutionizing protein biology. They suggest new ways to approach protein and therapeutic design. However, further developments are needed to encode strong biological priors into protein language models and to increase their accessibility to the broader community.
Subjects
Language, Proteins, Amino Acid Sequence, Protein Databases, Machine Learning, Proteins/chemistry
ABSTRACT
Recent progress in artificial intelligence provides the opportunity to ask the question of what is unique about human intelligence, but with a new comparison class. I argue that we can understand human intelligence, and the ways in which it may differ from artificial intelligence, by considering the characteristics of the kind of computational problems that human minds have to solve. I claim that these problems acquire their structure from three fundamental limitations that apply to human beings: limited time, limited computation, and limited communication. From these limitations we can derive many of the properties we associate with human intelligence, such as rapid learning, the ability to break down problems into parts, and the capacity for cumulative cultural evolution.
Subjects
Artificial Intelligence, Intelligence, Humans, Learning
ABSTRACT
Neuromorphic systems are designed with careful consideration of the physical properties of the computational substrate they use. Neuromorphic engineers often exploit physical phenomena to directly implement a desired functionality, enabled by "the isomorphism between physical processes in different media" (Douglas et al., 1995). This bottom-up design methodology could be described as matching computational primitives to physical phenomena. In this paper, we propose a top-down counterpart to the bottom-up approach to neuromorphic design. Our top-down approach, termed "bias matching," is to match the inductive biases required in a learning system to the hardware constraints of its implementation; a well-known example is enforcing translation equivariance in a neural network by tying weights (replacing vector-matrix multiplications with convolutions), which reduces memory requirements. We give numerous examples from the literature and explain how they can be understood from this perspective. Furthermore, we propose novel network designs based on this approach in the context of collaborative filtering. Our simulation results underline our central conclusions: additional hardware constraints can improve the predictions of a Machine Learning system, and understanding the inductive biases that underlie these performance gains can be useful in finding applications for a given constraint.
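The weight-tying example cited above can be made concrete: a circular convolution is exactly multiplication by a weight-tied (circulant) matrix, so enforcing translation equivariance this way shrinks the stored parameters from n² matrix entries to the kernel's few weights. The 1-D circular case below is a simplification of the 2-D convolutions used in practice.

```python
import numpy as np

def circulant(kernel, n):
    """Dense n x n matrix whose entries are shifted copies of one kernel:
    the weight-tied matrix that circular convolution implicitly applies."""
    padded = np.zeros(n)
    padded[:len(kernel)] = kernel
    return np.array([[padded[(i - j) % n] for j in range(n)] for i in range(n)])

def circular_conv(x, kernel):
    """Circular convolution, storing only len(kernel) weights."""
    n = len(x)
    return np.array([sum(kernel[k] * x[(i - k) % n] for k in range(len(kernel)))
                     for i in range(n)])

kernel = np.array([1.0, -2.0, 0.5])
x = np.arange(6, dtype=float)

dense = circulant(kernel, len(x)) @ x  # 36 stored weights
tied = circular_conv(x, kernel)        # 3 stored weights, same output
```

The tied form is also translation-equivariant by construction: shifting the input circularly shifts the output by the same amount, which is the inductive bias being matched to the reduced-memory hardware constraint.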
ABSTRACT
In memory tests, recalled information can be distorted by errors in memory, and these distortions can be more memorable to a later learner than the original stimuli. This is typically observed over several generations of learners, but there has been less exploration of the initial distortions produced by the first generation of learners. In this article, participants studied visual matrix patterns that were either erroneous recall attempts from previous participants or random patterns. Experiment 1 showed some evidence that material based on previous participants' recall data was more memorable than random material, but this did not replicate in Experiment 2. Of greater interest in the current data was the homogeneity of the memory errors made by participants, which demonstrated systematic recall biases within a single generation of learners. Unlike in studies utilising multiple generations of learners, the distortions observed here cannot be attributed to survival-of-the-fittest mechanisms in which biases are driven by encoding effects.