ABSTRACT
We introduce Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. Inspired by brains, BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection. This is inspired by the idea of minimum connection cost in evolutionary biology, but we are the first to combine this idea with training neural networks by gradient descent for interpretability. We demonstrate that BIMT discovers useful modular neural networks for many simple tasks, revealing compositional structures in symbolic formulas, interpretable decision boundaries and features for classification, and mathematical structure in algorithmic datasets. Qualitatively, BIMT-trained networks have modules readily identifiable by the naked eye, whereas regularly trained networks appear far more entangled. Quantitatively, we use Newman's method to compute the modularity of network graphs; BIMT achieves the highest modularity for all our test problems. A promising and ambitious future direction is to apply the proposed method to understand large models for vision, language, and science.
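The connection-cost idea can be sketched in a few lines. The layer positions, weight shapes, and penalty strength below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Hypothetical sketch of a BIMT-style connection cost: neurons embedded in
# 2D, each weight penalized in proportion to the length of its connection.
rng = np.random.default_rng(0)

pos_in = np.stack([np.linspace(0, 1, 4), np.zeros(4)], axis=1)   # input layer at y=0
pos_out = np.stack([np.linspace(0, 1, 3), np.ones(3)], axis=1)   # output layer at y=1
W = rng.normal(size=(3, 4))                                      # weights, out x in

# Pairwise Euclidean lengths of each connection
dist = np.linalg.norm(pos_out[:, None, :] - pos_in[None, :, :], axis=-1)

# Connection-cost term added to the task loss: sum_ij |W_ij| * length_ij
connection_cost = np.sum(np.abs(W) * dist)

# Training would minimize task_loss + lam * connection_cost, driving
# long-range weights toward zero and encouraging local modules.
lam = 1e-2
penalized = lambda task_loss: task_loss + lam * connection_cost
```

Minimizing such a term favors short, local wiring, which is what makes modules visible to the naked eye.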
ABSTRACT
We explore unique considerations involved in fitting machine learning (ML) models to data with very high precision, as is often required for science applications. We empirically compare various function approximation methods and study how they scale with increasing parameters and data. We find that neural networks (NNs) can often outperform classical approximation methods on high-dimensional examples, by (we hypothesize) auto-discovering and exploiting modular structures therein. However, neural networks trained with common optimizers are less powerful for low-dimensional cases, which motivates us to study the unique properties of neural network loss landscapes and the corresponding optimization challenges that arise in the high precision regime. To address the optimization issue in low dimensions, we develop training tricks which enable us to train neural networks to extremely low loss, close to the limits allowed by numerical precision.
ABSTRACT
BACKGROUND: Determining cell identity in volumetric images of tagged neuronal nuclei is an ongoing challenge in contemporary neuroscience. Frequently, cell identity is determined by aligning and matching tags to an "atlas" of labeled neuronal positions and other identifying characteristics. Previous analyses of such C. elegans datasets have been hampered by the limited accuracy of such atlases, especially for neurons present in the ventral nerve cord, and also by time-consuming manual elements of the alignment process. RESULTS: We present a novel automated alignment method for sparse and incomplete point clouds of the sort resulting from typical C. elegans fluorescence microscopy datasets. This method involves a tunable learning parameter and a kernel that enforces biologically realistic deformation. We also present a pipeline for creating alignment atlases from datasets of the recently developed NeuroPAL transgene. In combination, these advances allow us to label neurons in volumetric images with confidence much higher than previous methods. CONCLUSIONS: We release, to the best of our knowledge, the most complete full-body C. elegans 3D positional neuron atlas, incorporating positional variability derived from at least 7 animals per neuron, for the purposes of cell-type identity prediction for myriad applications (e.g., imaging neuronal activity, gene expression, and cell-fate).
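As a point of reference for the alignment step, here is a minimal rigid point-cloud alignment via the Kabsch/Procrustes construction. The paper's method additionally handles sparse, incomplete clouds and biologically realistic nonrigid deformation, none of which is modeled in this sketch:

```python
import numpy as np

# Minimal rigid point-cloud alignment (Kabsch / orthogonal Procrustes),
# assuming complete, matched point sets with known correspondences.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                     # "atlas" positions

theta = 0.4                                      # ground-truth rotation about z
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([0.5, -0.2, 0.1])
Y = X @ R_true.T + t_true                        # observed, transformed cloud

# Center both clouds, then recover the rotation from the SVD of the covariance
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
U, _, Vt = np.linalg.svd(Xc.T @ Yc)
d = np.sign(np.linalg.det(U @ Vt))               # guard against reflections
R = (U @ np.diag([1, 1, d]) @ Vt).T              # rotation mapping X onto Y
t = Y.mean(0) - X.mean(0) @ R.T

residual = np.linalg.norm(X @ R.T + t - Y)       # near zero for noiseless data
```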
Subject(s)
Caenorhabditis elegans, Neurons, Animals, Microscopy, Fluorescence
ABSTRACT
We present an automated method for finding hidden symmetries, defined as symmetries that become manifest only in a new coordinate system that must be discovered. Its core idea is to quantify asymmetry as violation of certain partial differential equations, and to numerically minimize such violation over the space of all invertible transformations, parametrized as invertible neural networks. For example, our method rediscovers the famous Gullstrand-Painlevé metric that manifests hidden translational symmetry in the Schwarzschild metric of nonrotating black holes, as well as Hamiltonicity, modularity, and other simplifying traits not traditionally viewed as symmetries.
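The core idea of minimizing an asymmetry score over candidate coordinate transforms can be illustrated with a one-dimensional toy; a scan over translations stands in for the paper's invertible neural networks:

```python
import numpy as np

# Toy version of "minimize asymmetry over coordinate transforms":
# g(x) = (x - 3)^2 has a hidden reflection symmetry that becomes manifest
# only after translating coordinates by a = 3. We quantify asymmetry about
# a candidate center a and minimize it numerically.
g = lambda x: (x - 3.0) ** 2
u = np.linspace(0.0, 1.0, 200)

def asymmetry(a):
    """Mean squared violation of reflection symmetry about a."""
    return np.mean((g(a + u) - g(a - u)) ** 2)

candidates = np.linspace(-5, 5, 1001)
scores = np.array([asymmetry(a) for a in candidates])
a_star = candidates[np.argmin(scores)]           # discovered symmetry center
```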
Subject(s)
Machine Learning, Neural Networks, Computer
ABSTRACT
At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset. This reveals interesting features of the frontier and demonstrates how its structure can be used for model selection, with a focus on points previously hidden by the cloak of the convex hull.
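For intuition, the primal trade-off can be brute-forced on a tiny alphabet: enumerate every hard clustering, score it by (H(Z), I(Z; Y)), and keep the non-dominated points. The joint distribution below is an illustrative toy, not from the paper:

```python
import itertools
import numpy as np

# Brute-force sketch of a primal-DIB-style Pareto frontier over hard
# clusterings Z = f(X) of a 4-symbol alphabet with a binary label Y.
p_xy = np.array([[0.30, 0.05],      # toy joint distribution p(x, y)
                 [0.25, 0.05],
                 [0.05, 0.15],
                 [0.05, 0.10]])

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_info(pzy):
    return entropy(pzy.sum(1)) + entropy(pzy.sum(0)) - entropy(pzy.ravel())

points = []
for labels in itertools.product(range(4), repeat=4):   # all hard clusterings
    pzy = np.zeros((4, 2))
    for x, z in enumerate(labels):
        pzy[z] += p_xy[x]
    points.append((entropy(pzy.sum(1)), mutual_info(pzy)))

# Pareto frontier: no other clustering has both lower H(Z) and higher I(Z;Y)
frontier = [(h, i) for (h, i) in points
            if not any(h2 <= h and i2 >= i and (h2, i2) != (h, i)
                       for (h2, i2) in points)]
```

The frontier always contains the trivial single-cluster point (H = 0, I = 0) and the identity clustering (maximal H and I); the interesting structure lies between them.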
ABSTRACT
We present AI Poincaré, a machine learning algorithm for autodiscovering conserved quantities using trajectory data from unknown dynamical systems. We test it on five Hamiltonian systems, including the gravitational three-body problem, and find that it discovers not only all exactly conserved quantities, but also periodic orbits, phase transitions, and breakdown timescales for approximate conservation laws.
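The task AI Poincaré automates can be illustrated in miniature: given only trajectory samples, score a candidate quantity by its spread along the trajectory. The system, integrator, and scoring function below are illustrative stand-ins, not the paper's algorithm:

```python
import numpy as np

# Simulate a harmonic oscillator with a leapfrog integrator, then compare
# two candidate quantities: the energy (conserved) and the momentum (not).
dt, n = 0.01, 5000
x, v = 1.0, 0.0
xs, vs = [], []
for _ in range(n):
    v += 0.5 * dt * (-x)        # half kick, F = -x
    x += dt * v                  # drift
    v += 0.5 * dt * (-x)        # half kick
    xs.append(x); vs.append(v)
xs, vs = np.array(xs), np.array(vs)

def conservation_score(q):
    """Relative spread of q along the trajectory; near 0 means conserved."""
    return np.std(q) / (np.abs(np.mean(q)) + np.std(q))

energy = 0.5 * xs**2 + 0.5 * vs**2
score_energy = conservation_score(energy)     # near 0: conserved
score_momentum = conservation_score(vs)       # near 1: not conserved
```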
ABSTRACT
We present a novel recurrent neural network (RNN)-based model that combines the remembering ability of unitary evolution RNNs with the ability of gated RNNs to effectively forget redundant or irrelevant information in its memory. We achieve this by extending restricted orthogonal evolution RNNs with a gating mechanism similar to gated recurrent unit RNNs, with a reset gate and an update gate. Our model is able to outperform long short-term memory, gated recurrent units, and vanilla unitary or orthogonal RNNs on several long-term-dependency benchmark tasks. We empirically show that both orthogonal and unitary RNNs lack the ability to forget, which plays an important role in RNNs. We provide competitive results along with an analysis of our model on many natural sequential tasks, including question answering, speech spectrum prediction, character-level language modeling, and synthetic tasks that involve long-term dependencies such as algorithmic, denoising, and copying tasks.
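One recurrence step of such a gated orthogonal unit can be sketched as follows; the gate wiring and parameter scales are illustrative, not the paper's exact equations:

```python
import numpy as np

# An orthogonal recurrent matrix (here obtained from a QR decomposition)
# preserves the norm of the memory, while GRU-style reset (r) and update
# (z) gates let the unit forget.
rng = np.random.default_rng(6)
d = 8
sigmoid = lambda a: 1 / (1 + np.exp(-a))

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))     # orthogonal recurrent matrix
Wx, Wr, Wz = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

def step(h, x):
    r = sigmoid(Wr @ x + h)                      # reset gate
    z = sigmoid(Wz @ x + h)                      # update gate
    h_tilde = np.tanh(Q @ (r * h) + Wx @ x)      # candidate state
    return z * h + (1 - z) * h_tilde             # gated interpolation

h = step(np.zeros(d), rng.normal(size=d))        # one step from a zero state
```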
Subject(s)
Neural Networks, Computer, Computer Simulation, Humans, Language, Learning, Logic, Memory
ABSTRACT
The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X → Z ≡ f(X) that maximizes the mutual information I(Z, Y) while the entropy H(Z) is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1, …, n} can be distilled into a vector W = f(X) ∈ ℝ^(n-1) losslessly, so that I(W, Y) = I(X, Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, …, m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing I(Z, Y) for a fixed number of bins m = 2, 3, …, which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST, and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters.
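The binary-classification pipeline can be sketched on a toy problem: treat W = P(Y = 1 | X) as the distilled scalar, compress it by binning, and measure retained class information as the bins are refined. The toy distribution below is an illustrative assumption:

```python
import numpy as np

# 8 equiprobable items X, each with a class-membership probability W.
p_x = np.full(8, 1 / 8)
w = np.array([0.05, 0.15, 0.3, 0.45, 0.55, 0.7, 0.85, 0.95])  # W = P(Y=1|X)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def info_after_binning(m):
    """I(Z; Y) after compressing W into m equal-width bins."""
    edges = np.linspace(0, 1, m + 1)
    z = np.clip(np.digitize(w, edges) - 1, 0, m - 1)
    pzy = np.zeros((m, 2))
    for i in range(8):
        pzy[z[i], 1] += p_x[i] * w[i]
        pzy[z[i], 0] += p_x[i] * (1 - w[i])
    return entropy(pzy.sum(1)) + entropy(pzy.sum(0)) - entropy(pzy.ravel())

# Nested refinements sweep along the frontier: I(Z;Y) can only grow
curve = [info_after_binning(m) for m in (1, 2, 4, 8)]
```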
ABSTRACT
Much innovation is currently aimed at improving the number, density, and geometry of electrodes on extracellular multielectrode arrays for in vivo recording of neural activity in the mammalian brain. To choose a multielectrode array configuration for a given neuroscience purpose, or to reveal design principles of future multielectrode arrays, it would be useful to have a systematic way of evaluating the spike recording capability of such arrays. We describe an automated system that performs robotic patch-clamp recording of a neuron being simultaneously recorded via an extracellular multielectrode array. By recording a patch-clamp data set from a neuron while acquiring extracellular recordings from the same neuron, we can evaluate how well the extracellular multielectrode array captures the spiking information from that neuron. To demonstrate the utility of our system, we show that it can provide data from the mammalian cortex to evaluate how the spike sorting performance of a close-packed extracellular multielectrode array is affected by bursting, which alters the shape and amplitude of spikes in a train. We also introduce an algorithmic framework to help evaluate how the number of electrodes in a multielectrode array affects spike sorting, examining how adding more electrodes yields data that can be spike sorted more easily. Our automated methodology may thus help with the evaluation of new electrode designs and configurations, providing empirical guidance on the kinds of electrodes that will be optimal for different brain regions, cell types, and species, for improving the accuracy of spike sorting. NEW & NOTEWORTHY We present an automated strategy for evaluating the spike recording performance of an extracellular multielectrode array, by enabling simultaneous recording of a neuron with both such an array and with patch clamp. We use our robot and accompanying algorithms to evaluate the performance of multielectrode arrays on supporting spike sorting.
Subject(s)
Action Potentials, Automation/methods, Patch-Clamp Techniques/methods, Visual Cortex/physiology, Animals, Automation/instrumentation, Cortical Excitability, Electrodes/standards, Electroencephalography/instrumentation, Electroencephalography/methods, Extracellular Space/physiology, Male, Mice, Mice, Inbred C57BL, Neurons/physiology, Patch-Clamp Techniques/instrumentation, Visual Cortex/cytology
ABSTRACT
Although there is growing interest in measuring integrated information in computational and cognitive systems, current methods for doing so in practice are computationally infeasible. We investigate existing and novel integration measures and classify them by various desirable properties. A simple taxonomy of Φ-measures is presented, characterizing each by its choice of factorization method (5 options), probability distributions to compare (3 × 4 options), and measure for comparing probability distributions (7 options). When the Φ-measures are required to satisfy a minimum of attractive properties, these hundreds of options reduce to a mere handful, some of which turn out to be identical. Useful exact and approximate formulas are derived that can be applied to real-world data from laboratory experiments without posing unreasonable computational demands.
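One of the simplest members of such a taxonomy, sketched below for a toy two-part system, pairs a bipartition (factorization method) with the KL divergence (comparison measure), which reduces to the mutual information between the two halves:

```python
import numpy as np

# Compare the full joint distribution of a two-part system to the product
# of the marginals of its bipartition; the KL divergence between them is a
# minimal "integration" measure (the mutual information between A and B).
p_ab = np.array([[0.4, 0.1],     # toy joint distribution over parts A, B
                 [0.1, 0.4]])

p_a = p_ab.sum(axis=1)
p_b = p_ab.sum(axis=0)
factorized = np.outer(p_a, p_b)

# KL(p_joint || p_factorized) in bits; zero iff the parts are independent
phi = np.sum(p_ab * np.log2(p_ab / factorized))
```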
Subject(s)
Brain Mapping/methods, Brain/physiology, Cognition/physiology, Consciousness/physiology, Information Storage and Retrieval/methods, Models, Neurological, Animals, Computer Simulation, Humans, Machine Learning, Systems Integration
ABSTRACT
Discovering conservation laws for a given dynamical system is important but challenging. In a theorist's setup (where differential equations and basis functions are both known), we propose the sparse invariant detector (SID), an algorithm that autodiscovers conservation laws from differential equations. Its algorithmic simplicity allows robustness and interpretability of the discovered conserved quantities. We show that SID is able to rediscover known, and even discover new, conservation laws in a variety of systems. For two examples in fluid mechanics and atmospheric chemistry, SID discovers 14 and 3 conserved quantities, respectively, where only 12 and 2 were previously known to domain experts.
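The underlying linear-algebra idea can be sketched for the harmonic oscillator; the basis set and sampling scheme below are illustrative assumptions:

```python
import numpy as np

# For dynamics xdot = v, vdot = -x and basis {x^2, v^2, x*v, x, v}, a
# conserved quantity H = sum_j c_j b_j satisfies
#   dH/dt = sum_j c_j (grad b_j . f) = 0,
# so conserved coefficient vectors c span the null space of the matrix of
# basis-function time derivatives sampled at many states.
rng = np.random.default_rng(3)
pts = rng.normal(size=(50, 2))                 # sample states (x, v)
x, v = pts[:, 0], pts[:, 1]

M = np.column_stack([2 * x * v,                # d(x^2)/dt
                     -2 * x * v,               # d(v^2)/dt
                     v**2 - x**2,              # d(x*v)/dt
                     v,                        # d(x)/dt
                     -x])                      # d(v)/dt

# Near-zero singular values flag conserved combinations
_, s, Vt = np.linalg.svd(M)
c = Vt[-1]     # smallest-singular-value direction, ~ (1, 1, 0, 0, 0)/sqrt(2)
```

Here the recovered direction corresponds to the energy x² + v², the oscillator's only conserved quantity in this basis.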
ABSTRACT
We present an automated method for measuring media bias. Inferring which newspaper published a given article, based only on the frequencies with which it uses different phrases, leads to a conditional probability distribution whose analysis lets us automatically map newspapers and phrases into a bias space. By analyzing roughly a million articles from roughly a hundred newspapers for bias in dozens of news topics, our method maps newspapers into a two-dimensional bias landscape that agrees well with previous bias classifications based on human judgement. One dimension can be interpreted as traditional left-right bias, the other as establishment bias. This means that although news bias is inherently political, its measurement need not be.
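A simplified stand-in for the mapping step is sketched below: form log conditional phrase frequencies per newspaper, center them, and embed newspapers with a rank-2 SVD. The count matrix is a toy assumption, and the paper derives its projection from the inference problem of guessing the newspaper from phrases rather than from a plain SVD:

```python
import numpy as np

# Toy phrase counts: rows = newspapers, columns = phrases. Newspapers 0-1
# and 2-3 are constructed to use similar phrase mixes.
counts = np.array([[40,  5, 30, 10],
                   [38,  6, 28, 12],
                   [ 5, 42,  8, 33],
                   [ 6, 40, 10, 31]])

freq = counts / counts.sum(axis=1, keepdims=True)   # p(phrase | newspaper)
logf = np.log(freq)
centered = logf - logf.mean(axis=0)                 # remove phrase popularity

# Rank-2 embedding: each newspaper becomes a point in a 2D "bias space"
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
embedding = U[:, :2] * s[:2]
```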
Subject(s)
Machine Learning, Mass Media, Humans
ABSTRACT
We present a machine learning algorithm that discovers conservation laws from differential equations, both numerically (parametrized as neural networks) and symbolically, ensuring their functional independence (a nonlinear generalization of linear independence). Our independence module can be viewed as a nonlinear generalization of singular value decomposition. Our method can readily handle inductive biases for conservation laws. We validate it with examples including the three-body problem, the KdV equation, and nonlinear Schrödinger equation.
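The linear-algebra core of such an independence test can be sketched by checking the rank of the stacked Jacobian of candidate quantities; the quantities below are illustrative, with H3 deliberately a function of H1:

```python
import numpy as np

# H1 = x^2 + v^2, H2 = x*v, H3 = (x^2 + v^2)^2 = H1^2. A dependent pair
# has parallel gradients everywhere, so the Jacobian loses rank.
H_grads = {
    "H1": lambda x, v: np.array([2 * x, 2 * v]),
    "H2": lambda x, v: np.array([v, x]),
    "H3": lambda x, v: 2 * (x**2 + v**2) * np.array([2 * x, 2 * v]),
}

def num_independent(names, n_samples=30):
    """Max Jacobian rank over random sample points."""
    rng = np.random.default_rng(4)
    ranks = []
    for _ in range(n_samples):
        x, v = rng.normal(size=2)
        J = np.stack([H_grads[n](x, v) for n in names])   # one row per quantity
        ranks.append(np.linalg.matrix_rank(J, tol=1e-8))
    return max(ranks)

dep = num_independent(["H1", "H3"])      # dependent pair: rank 1
indep = num_independent(["H1", "H2"])    # independent pair: rank 2
```

The singular values of the Jacobian play the role the paper generalizes nonlinearly: near-zero values signal functional dependence among the discovered quantities.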
ABSTRACT
The risk of a doomsday scenario in which high-energy physics experiments trigger the destruction of the Earth has been estimated to be minuscule. But this may give a false sense of security: the fact that the Earth has survived for so long does not necessarily mean that such disasters are unlikely, because observers are, by definition, in places that have avoided destruction. Here we derive a new upper bound of one per billion years (99.9% confidence level) for the exogenous terminal-catastrophe rate that is free of such selection bias, using calculations based on the relatively late formation time of Earth.
Subject(s)
Disasters/statistics & numerical data, Earth (Planet), Life, Radioactive Hazard Release, Cosmic Radiation, Genetic Engineering, Humans, Light, Physical Phenomena, Physics, Probability, Survival, Time Factors
ABSTRACT
We present a method for unsupervised learning of equations of motion for objects in raw and optionally distorted unlabeled synthetic video (or, more generally, for discovering and modeling predictable features in time-series data). We first train an autoencoder that maps each video frame into a low-dimensional latent space where the laws of motion are as simple as possible, by minimizing a combination of nonlinearity, acceleration, and prediction error. Differential equations describing the motion are then discovered using Pareto-optimal symbolic regression. We find that our pre-regression ("pregression") step is able to rediscover Cartesian coordinates of unlabeled moving objects even when the video is distorted by a generalized lens. Using intuition from multidimensional knot theory, we find that the pregression step is facilitated by first adding extra latent space dimensions to avoid topological problems during training and then removing these extra dimensions via principal component analysis. An inertial frame is autodiscovered by minimizing the combined equation complexity for multiple experiments.
ABSTRACT
Energy conservation is a basic physics principle, the breakdown of which often implies new physics. This paper presents a method for data-driven "new physics" discovery. Specifically, given a trajectory governed by unknown forces, our neural new-physics detector (NNPhD) aims to detect new physics by decomposing the force field into conservative and nonconservative components, which are represented by a Lagrangian neural network (LNN) and an unconstrained neural network, respectively, trained to minimize the force recovery error plus a constant λ times the magnitude of the predicted nonconservative force. We show that a phase transition occurs at λ=1, universally for arbitrary forces. We demonstrate that NNPhD successfully discovers new physics in toy numerical experiments, rediscovering friction (1493) from a damped double pendulum, Neptune from Uranus' orbit (1846), and gravitational waves (2017) from an inspiraling orbit. We also show how NNPhD coupled with an integrator outperforms both an LNN and an unconstrained neural network for predicting the future of a damped double pendulum.
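The λ = 1 phase transition can be seen in a one-dimensional caricature: once the conservative part absorbs what it can, the remaining scalar decision is to minimize |r - n| + λ|n| over the predicted nonconservative force n, where r stands for the true nonconservative residual. This toy is an illustration of the transition, not the paper's training procedure:

```python
import numpy as np

# The minimizer of |r - n| + lam * |n| jumps from n = r to n = 0 as lam
# crosses 1, for any r: below the transition the nonconservative force is
# fully recovered, above it the force collapses to zero.
def best_n(r, lam, grid=np.linspace(-2, 2, 40001)):
    loss = np.abs(r - grid) + lam * np.abs(grid)
    return grid[np.argmin(loss)]

r = 1.3
below = best_n(r, 0.5)    # lam < 1: nonconservative force recovered (~1.3)
above = best_n(r, 1.5)    # lam > 1: nonconservative force suppressed (~0)
```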
ABSTRACT
Reciprocal copy number variations (CNVs) of 16p11.2 are associated with a wide spectrum of neuropsychiatric and neurodevelopmental disorders. Here, we use human induced pluripotent stem cell (iPSC)-derived dopaminergic (DA) neurons carrying 16p11.2 duplication (16pdup) and 16p11.2 deletion (16pdel) CNVs, engineered using CRISPR-Cas9. We show that 16pdel iPSC-derived DA neurons have increased soma size and synaptic marker expression compared to isogenic control lines, while 16pdup iPSC-derived DA neurons show deficits in neuronal differentiation and reduced synaptic marker expression. The 16pdel iPSC-derived DA neurons have impaired neurophysiological properties, and their networks are hyperactive in culture, with increased bursting compared to controls. We also show that RHOA expression is increased in 16pdel iPSC-derived DA neurons, and that treatment with Rhosin, a specific RHOA inhibitor, rescues their network activity. Our data suggest that 16p11.2 deletion-associated iPSC-derived DA neuron hyperactivation can be rescued by RHOA inhibition.
Subject(s)
Chromosome Deletion, Chromosomes, Human, Pair 16/genetics, Dopaminergic Neurons/metabolism, Induced Pluripotent Stem Cells/metabolism, Nerve Net/metabolism, Synaptic Transmission/genetics, rhoA GTP-Binding Protein/genetics, Cell Differentiation/drug effects, Cell Differentiation/genetics, Cells, Cultured, DNA Copy Number Variations, Dopaminergic Neurons/cytology, Dopaminergic Neurons/physiology, Gene Expression/drug effects, Humans, Induced Pluripotent Stem Cells/cytology, Nerve Net/drug effects, Organic Chemicals/pharmacology, Reverse Transcriptase Polymerase Chain Reaction, Synaptic Transmission/drug effects, rhoA GTP-Binding Protein/antagonists & inhibitors, rhoA GTP-Binding Protein/metabolism
ABSTRACT
A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality, and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult physics-based test set, we improve the state-of-the-art success rate from 15% to 90%.
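One physics-inspired test in this spirit is additive separability, which can be checked from function evaluations alone; the identity-based tester below is a minimal sketch, with tolerances and sampling ranges chosen for illustration:

```python
import numpy as np

# Check f(x, y) = g(x) + h(y) via the identity
#   f(x, y) + f(x', y') = f(x, y') + f(x', y),
# which holds for all points iff f is additively separable. A symbolic
# regression engine can use such a test to split one 2D problem into two
# 1D problems.
def is_additively_separable(f, n_trials=100, tol=1e-8, seed=5):
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x, xp, y, yp = rng.uniform(-2, 2, size=4)
        if abs(f(x, y) + f(xp, yp) - f(x, yp) - f(xp, y)) > tol:
            return False
    return True

sep = is_additively_separable(lambda x, y: np.sin(x) + y**2)     # separable
nonsep = is_additively_separable(lambda x, y: np.sin(x * y))     # not separable
```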
ABSTRACT
The emergence of artificial intelligence (AI) and its progressively wider impact on many sectors requires an assessment of its effect on the achievement of the Sustainable Development Goals. Using a consensus-based expert elicitation process, we find that AI can enable the accomplishment of 134 targets across all the goals, but it may also inhibit 59 targets. However, current research foci overlook important aspects. The fast development of AI needs to be supported by the necessary regulatory insight and oversight for AI-based technologies to enable sustainable development. Failure to do so could result in gaps in transparency, safety, and ethical standards.
ABSTRACT
We investigate opportunities and challenges for improving unsupervised machine learning using four common strategies with a long history in physics: divide and conquer, Occam's razor, unification, and lifelong learning. Instead of using one model to learn everything, we propose a paradigm centered around the learning and manipulation of theories, which parsimoniously predict both aspects of the future (from past observations) and the domain in which these predictions are accurate. Specifically, we propose a generalized mean loss to encourage each theory to specialize in its comparatively advantageous domain, and a differentiable description length objective to downweight bad data and "snap" learned theories into simple symbolic formulas. Theories are stored in a "theory hub," which continuously unifies learned theories and can propose theories when encountering new environments. We test our implementation, the toy "artificial intelligence physicist" learning agent, on a suite of increasingly complex physics environments. From unsupervised observation of trajectories through worlds involving random combinations of gravity, electromagnetism, harmonic motion, and elastic bounces, our agent typically learns faster and produces mean-squared prediction errors about a billion times smaller than a standard feedforward neural net of comparable complexity, typically recovering integer and rational theory parameters exactly. Our agent successfully identifies domains with different laws of motion also for a nonlinear chaotic double pendulum in a piecewise constant force field.
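The specialization effect of a generalized mean loss can be sketched numerically; the exponent, loss values, and shapes below are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

# With exponent gamma < 0, the generalized mean over theories is dominated
# by whichever theory fits each example best, so each theory is rewarded
# for specializing in its comparatively advantageous domain rather than
# averaging over everything.
def generalized_mean_loss(per_theory_losses, gamma=-1.0, eps=1e-12):
    """per_theory_losses: array of shape (n_theories, n_examples)."""
    L = np.asarray(per_theory_losses) + eps
    return np.mean(np.mean(L**gamma, axis=0) ** (1.0 / gamma))

# Two theories, four examples: each theory is good on two examples
losses = np.array([[0.01, 0.01, 4.0, 4.0],
                   [4.0, 4.0, 0.01, 0.01]])

specialized = generalized_mean_loss(losses, gamma=-1.0)   # near best-theory loss
averaged = generalized_mean_loss(losses, gamma=1.0)       # ordinary mean
```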