|

1.

A domain-agnostic approach for characterization of lifelong learning systems.

Baker, Megan M; New, Alexander; Aguilar-Simon, Mario; Al-Halah, Ziad; Arnold, Sébastien M R; Ben-Iwhiwhu, Ese; Brna, Andrew P; Brooks, Ethan; Brown, Ryan C; Daniels, Zachary; Daram, Anurag; Delattre, Fabien; Dellana, Ryan; Eaton, Eric; Fu, Haotian; Grauman, Kristen; Hostetler, Jesse; Iqbal, Shariq; Kent, Cassandra; Ketz, Nicholas; Kolouri, Soheil; Konidaris, George; Kudithipudi, Dhireesha; Learned-Miller, Erik; Lee, Seungwon; Littman, Michael L; Madireddy, Sandeep; Mendez, Jorge A; Nguyen, Eric Q; Piatko, Christine; Pilly, Praveen K; Raghavan, Aswin; Rahman, Abrar; Ramakrishnan, Santhosh Kumar; Ratzlaff, Neale; Soltoggio, Andrea; Stone, Peter; Sur, Indranil; Tang, Zhipeng; Tiwari, Saket; Vedder, Kyle; Wang, Felix; Xu, Zifan; Yanguas-Gil, Angel; Yedidsion, Harel; Yu, Shangqun; Vallabha, Gautam K.

Neural Netw ; 160: 274-296, 2023 Mar.

Article En | MEDLINE | ID: mdl-36709531

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of (1) Continuous Learning, (2) Transfer and Adaptation, and (3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.

Education, Continuing , Machine Learning

2.

Context meta-reinforcement learning via neuromodulation.

Ben-Iwhiwhu, Eseoghene; Dick, Jeffery; Ketz, Nicholas A; Pilly, Praveen K; Soltoggio, Andrea.

Neural Netw ; 152: 70-79, 2022 Aug.

Article En | MEDLINE | ID: mdl-35512540

Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in an agent's policy network (obtained via reasoning about task context, model parameter updates, or both). However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging due to the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component to augment a standard policy network that regulates neuronal activities in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To prove the generality and benefits of the extension in meta-RL, the neuromodulated network was applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The result demonstrates that meta-RL augmented with neuromodulation produces significantly better result and richer dynamic representations in comparison to the baselines.

Algorithms , Reinforcement, Psychology , Animals , Guinea Pigs , Learning/physiology

3.

Fast and automated biomarker detection in breath samples with machine learning.

Skarysz, Angelika; Salman, Dahlia; Eddleston, Michael; Sykora, Martin; Hunsicker, Eugénie; Nailon, William H; Darnley, Kareen; McLaren, Duncan B; Thomas, C L Paul; Soltoggio, Andrea.

PLoS One ; 17(4): e0265399, 2022.

Article En | MEDLINE | ID: mdl-35413057

Volatile organic compounds (VOCs) in human breath can reveal a large spectrum of health conditions and can be used for fast, accurate and non-invasive diagnostics. Gas chromatography-mass spectrometry (GC-MS) is used to measure VOCs, but its application is limited by expert-driven data analysis that is time-consuming, subjective and may introduce errors. We propose a machine learning-based system to perform GC-MS data analysis that exploits deep learning pattern recognition ability to learn and automatically detect VOCs directly from raw data, thus bypassing expert-led processing. We evaluate this new approach on clinical samples and with four types of convolutional neural networks (CNNs): VGG16, VGG-like, densely connected and residual CNNs. The proposed machine learning methods showed to outperform the expert-led analysis by detecting a significantly higher number of VOCs in just a fraction of time while maintaining high specificity. These results suggest that the proposed novel approach can help the large-scale deployment of breath-based diagnosis by reducing time and cost, and increasing accuracy and consistency.

Breath Tests , Volatile Organic Compounds , Biomarkers/analysis , Breath Tests/methods , Gas Chromatography-Mass Spectrometry/methods , Humans , Machine Learning , Volatile Organic Compounds/analysis

4.

Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture.

Ladosz, Pawel; Ben-Iwhiwhu, Eseoghene; Dick, Jeffery; Ketz, Nicholas; Kolouri, Soheil; Krichmar, Jeffrey L; Pilly, Praveen K; Soltoggio, Andrea.

IEEE Trans Neural Netw Learn Syst ; 33(5): 2045-2056, 2022 05.

Article En | MEDLINE | ID: mdl-34559664

In this article, we consider a subclass of partially observable Markov decision process (POMDP) problems which we termed confounding POMDPs. In these types of POMDPs, temporal difference (TD)-based reinforcement learning (RL) algorithms struggle, as TD error cannot be easily derived from observations. We solve these types of problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with deep Q-network (DQN), which we call modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian network with rarely correlated bio-inspired neural traces to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low-level features and control, while the MOHN contributes to high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the Malmo environment show that the proposed algorithm improved DQN's results and even outperformed control tests with advantage-actor critic (A2C), quantile regression DQN with long short-term memory (QRDQN + LSTM), Monte Carlo policy gradient (REINFORCE), and aggregated memory for reinforcement learning (AMRL) algorithms on most difficult POMDPs with confounding stimuli and sparse rewards.

Neural Networks, Computer , Reinforcement, Psychology , Algorithms , Markov Chains , Reward

5.

Detecting Changes and Avoiding Catastrophic Forgetting in Dynamic Partially Observable Environments.

Dick, Jeffery; Ladosz, Pawel; Ben-Iwhiwhu, Eseoghene; Shimadzu, Hideyasu; Kinnell, Peter; Pilly, Praveen K; Kolouri, Soheil; Soltoggio, Andrea.

Front Neurorobot ; 14: 578675, 2020.

Article En | MEDLINE | ID: mdl-33424575

The ability of an agent to detect changes in an environment is key to successful adaptation. This ability involves at least two phases: learning a model of an environment, and detecting that a change is likely to have occurred when this model is no longer accurate. This task is particularly challenging in partially observable environments, such as those modeled with partially observable Markov decision processes (POMDPs). Some predictive learners are able to infer the state from observations and thus perform better with partial observability. Predictive state representations (PSRs) and neural networks are two such tools that can be trained to predict the probabilities of future observations. However, most such existing methods focus primarily on static problems in which only one environment is learned. In this paper, we propose an algorithm that uses statistical tests to estimate the probability of different predictive models to fit the current environment. We exploit the underlying probability distributions of predictive models to provide a fast and explainable method to assess and justify the model's beliefs about the current environment. Crucially, by doing so, the method can label incoming data as fitting different models, and thus can continuously train separate models in different environments. This new method is shown to prevent catastrophic forgetting when new environments, or tasks, are encountered. The method can also be of use when AI-informed decisions require justifications because its beliefs are based on statistical evidence from observations. We empirically demonstrate the benefit of the novel method with simulations in a set of POMDP environments.

6.

VOCCluster: Untargeted Metabolomics Feature Clustering Approach for Clinical Breath Gas Chromatography/Mass Spectrometry Data.

Alkhalifah, Yaser; Phillips, Iain; Soltoggio, Andrea; Darnley, Kareen; Nailon, William H; McLaren, Duncan; Eddleston, Michael; Thomas, C L Paul; Salman, Dahlia.

Anal Chem ; 92(4): 2937-2945, 2020 02 18.

Article En | MEDLINE | ID: mdl-31791122

Metabolic profiling of breath analysis involves processing, alignment, scaling, and clustering of thousands of features extracted from gas chromatography/mass spectrometry (GC/MS) data from hundreds of participants. The multistep data processing is complicated, operator error-prone, and time-consuming. Automated algorithmic clustering methods that are able to cluster features in a fast and reliable way are necessary. These accelerate metabolic profiling and discovery platforms for next-generation medical diagnostic tools. Our unsupervised clustering technique, VOCCluster, prototyped in Python, handles features of deconvolved GC/MS breath data. VOCCluster was created from a heuristic ontology based on the observation of experts undertaking data processing with a suite of software packages. VOCCluster identifies and clusters groups of volatile organic compounds (VOCs) from deconvolved GC/MS breath with similar mass spectra and retention index profiles. VOCCluster was used to cluster more than 15 000 features extracted from 74 GC/MS clinical breath samples obtained from participants with cancer before and after a radiation therapy. Results were evaluated against a panel of ground truth compounds and compared to other clustering methods (DBSCAN and OPTICS) that were used in previous metabolomics studies. VOCCluster was able to cluster those features into 1081 groups (including endogenous and exogenous compounds and instrumental artifacts) with an accuracy rate of 96% (±0.04 at 95% confidence interval).

Metabolomics , Software , Volatile Organic Compounds/metabolism , Algorithms , Breath Tests , Cluster Analysis , Gas Chromatography-Mass Spectrometry , Humans , Volatile Organic Compounds/analysis

7.

A fully convolutional two-stream fusion network for interactive image segmentation.

Hu, Yang; Soltoggio, Andrea; Lock, Russell; Carter, Steve.

Neural Netw ; 109: 31-42, 2019 Jan.

Article En | MEDLINE | ID: mdl-30390521

In this paper, we propose a novel fully convolutional two-stream fusion network (FCTSFN) for interactiveimage segmentation. The proposed network includes two sub-networks: a two-stream late fusion network (TSLFN) that predicts the foreground at a reduced resolution, and a multi-scale refining network (MSRN) that refines the foreground at full resolution. The TSLFN includes two distinct deep streams followed by a fusion network. The intuition is that, since user interactions are more direct information on foreground/background than the image itself, the two-stream structure of the TSLFN reduces the number of layers between the pure user interaction features and the network output, allowing the user interactions to have a more direct impact on the segmentation result. The MSRN fuses the features from different layers of TSLFN with different scales, in order to seek the local to global information on the foreground to refine the segmentation result at full resolution. We conduct comprehensive experiments on four benchmark datasets. The results show that the proposed network achieves competitive performance compared to current state-of-the-art interactive image segmentation methods. 1.

Neural Networks, Computer , Pattern Recognition, Automated/methods , Databases, Factual , Pattern Recognition, Visual

8.

Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks.

Soltoggio, Andrea; Stanley, Kenneth O; Risi, Sebastian.

Neural Netw ; 108: 48-67, 2018 Dec.

Article En | MEDLINE | ID: mdl-30142505

Biological neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifelong learning. The interplay of these elements leads to the emergence of biological intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) employ simulated evolution in-silico to breed plastic neural networks with the aim to autonomously design and create learning systems. EPANN experiments evolve networks that include both innate properties and the ability to change and learn in response to experiences in different environments and problem domains. EPANNs' aims include autonomously creating learning systems, bootstrapping learning from scratch, recovering performance in unseen conditions, testing the computational advantages of particular neural components, and deriving hypotheses on the emergence of biological learning. Thus, EPANNs may include a large variety of different neuron types and dynamics, network architectures, plasticity rules, and other factors. While EPANNs have seen considerable progress over the last two decades, current scientific and technological advances in artificial neural networks are setting the conditions for radically new approaches and results. Exploiting the increased availability of computational resources and of simulation environments, the often challenging task of hand-designing learning neural networks could be replaced by more autonomous and creative processes. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main methods and results are reviewed. Finally, new opportunities and possible developments are presented.

Machine Learning/trends , Neural Networks, Computer , Neuronal Plasticity , Computer Simulation , Forecasting , Humans , Models, Neurological , Nerve Net/physiology , Neurons

9.

Distributed Task Rescheduling With Time Constraints for the Optimization of Total Task Allocations in a Multirobot System.

Turner, Joanna; Meng, Qinggang; Schaefer, Gerald; Whitbrook, Amanda; Soltoggio, Andrea.

IEEE Trans Cybern ; 48(9): 2583-2597, 2018 Sep.

Article En | MEDLINE | ID: mdl-28976326

This paper considers the problem of maximizing the number of task allocations in a distributed multirobot system under strict time constraints, where other optimization objectives need also be considered. It builds upon existing distributed task allocation algorithms, extending them with a novel method for maximizing the number of task assignments. The fundamental idea is that a task assignment to a robot has a high cost if its reassignment to another robot creates a feasible time slot for unallocated tasks. Multiple reassignments among networked robots may be required to create a feasible time slot and an upper limit to this number of reassignments can be adjusted according to performance requirements. A simulated rescue scenario with task deadlines and fuel limits is used to demonstrate the performance of the proposed method compared with existing methods, the consensus-based bundle algorithm and the performance impact (PI) algorithm. Starting from existing (PI-generated) solutions, results show up to a 20% increase in task allocations using the proposed method.

10.

Editorial: Neural plasticity for rich and uncertain robotic information streams.

Soltoggio, Andrea; van der Velde, Frank.

Front Neurorobot ; 9: 12, 2015.

Article En | MEDLINE | ID: mdl-26578947

11.

Short-term plasticity as cause-effect hypothesis testing in distal reward learning.

Soltoggio, Andrea.

Biol Cybern ; 109(1): 75-94, 2015 Feb.

Article En | MEDLINE | ID: mdl-25189158

Asynchrony, overlaps, and delays in sensory-motor signals introduce ambiguity as to which stimuli, actions, and rewards are causally related. Only the repetition of reward episodes helps distinguish true cause-effect relationships from coincidental occurrences. In the model proposed here, a novel plasticity rule employs short- and long-term changes to evaluate hypotheses on cause-effect relationships. Transient weights represent hypotheses that are consolidated in long-term memory only when they consistently predict or cause future rewards. The main objective of the model is to preserve existing network topologies when learning with ambiguous information flows. Learning is also improved by biasing the exploration of the stimulus-response space toward actions that in the past occurred before rewards. The model indicates under which conditions beliefs can be consolidated in long-term memory, it suggests a solution to the plasticity-stability dilemma, and proposes an interpretation of the role of short-term plasticity.

Learning/physiology , Models, Neurological , Neuronal Plasticity/physiology , Reward , Synapses/physiology , Animals , Humans , Memory , Time Factors

12.

Rare neural correlations implement robotic conditioning with delayed rewards and disturbances.

Soltoggio, Andrea; Lemme, Andre; Reinhart, Felix; Steil, Jochen J.

Front Neurorobot ; 7: 6, 2013.

Article En | MEDLINE | ID: mdl-23565092

Neural conditioning associates cues and actions with following rewards. The environments in which robots operate, however, are pervaded by a variety of disturbing stimuli and uncertain timing. In particular, variable reward delays make it difficult to reconstruct which previous actions are responsible for following rewards. Such an uncertainty is handled by biological neural networks, but represents a challenge for computational models, suggesting the lack of a satisfactory theory for robotic neural conditioning. The present study demonstrates the use of rare neural correlations in making correct associations between rewards and previous cues or actions. Rare correlations are functional in selecting sparse synapses to be eligible for later weight updates if a reward occurs. The repetition of this process singles out the associating and reward-triggering pathways, and thereby copes with distal rewards. The neural network displays macro-level classical and operant conditioning, which is demonstrated in an interactive real-life human-robot interaction. The proposed mechanism models realistic conditioning in humans and animals and implements similar behaviors in neuro-robotic platforms.

13.

Solving the distal reward problem with rare correlations.

Soltoggio, Andrea; Steil, Jochen J.

Neural Comput ; 25(4): 940-78, 2013 Apr.

Article En | MEDLINE | ID: mdl-23339615

In the course of trial-and-error learning, the results of actions, manifested as rewards or punishments, occur often seconds after the actions that caused them. How can a reward be associated with an earlier action when the neural activity that caused that action is no longer present in the network? This problem is referred to as the distal reward problem. A recent computational study proposes a solution using modulated plasticity with spiking neurons and argues that precise firing patterns in the millisecond range are essential for such a solution. In contrast, the study reported in this letter shows that it is the rarity of correlating neural activity, and not the spike timing, that allows the network to solve the distal reward problem. In this study, rare correlations are detected in a standard rate-based computational model by means of a threshold-augmented Hebbian rule. The novel modulated plasticity rule allows a randomly connected network to learn in classical and instrumental conditioning scenarios with delayed rewards. The rarity of correlations is shown to be a pivotal factor in the learning and in handling various delays of the reward. This study additionally suggests the hypothesis that short-term synaptic plasticity may implement eligibility traces and thereby serve as a selection mechanism in promoting candidate synapses for long-term storage.

Learning/physiology , Models, Neurological , Neuronal Plasticity/physiology , Neurons/physiology , Reward , Action Potentials/physiology , Computer Simulation , Memory/physiology , Reinforcement, Psychology , Synapses/physiology

14.

From modulated Hebbian plasticity to simple behavior learning through noise and weight saturation.

Soltoggio, Andrea; Stanley, Kenneth O.

Neural Netw ; 34: 28-41, 2012 Oct.

Article En | MEDLINE | ID: mdl-22796669

Synaptic plasticity is a major mechanism for adaptation, learning, and memory. Yet current models struggle to link local synaptic changes to the acquisition of behaviors. The aim of this paper is to demonstrate a computational relationship between local Hebbian plasticity and behavior learning by exploiting two traditionally unwanted features: neural noise and synaptic weight saturation. A modulation signal is employed to arbitrate the sign of plasticity: when the modulation is positive, the synaptic weights saturate to express exploitative behavior; when it is negative, the weights converge to average values, and neural noise reconfigures the network's functionality. This process is demonstrated through simulating neural dynamics in the autonomous emergence of fearful and aggressive navigating behaviors and in the solution to reward-based problems. The neural model learns, memorizes, and modifies different behaviors that lead to positive modulation in a variety of settings. The algorithm establishes a simple relationship between local plasticity and behavior learning by demonstrating the utility of noise and weight saturation. Moreover, it provides a new tool to simulate adaptive behavior, and contributes to bridging the gap between synaptic changes and behavior in neural computation.

Behavior , Learning , Models, Neurological , Neuronal Plasticity , Electricity