ABSTRACT
BACKGROUND: In recent years, the production of inclusion bodies that retain substantial catalytic activity was demonstrated. These catalytically active inclusion bodies (CatIBs) are formed by genetic fusion of an aggregation-inducing tag to a gene of interest via short linker polypeptides. The resulting CatIBs are known for their easy and cost-efficient production, recyclability, and improved stability. Recent studies have outlined the cooperative effects of linker and aggregation-inducing tag on CatIB activity; however, the best combination of the two cannot yet be predicted a priori. Consequently, extensive screening is required to find the best-performing CatIB variant. RESULTS: In this work, a semi-automated cloning workflow was implemented and used to rapidly generate 63 CatIB variants of glucose dehydrogenase from Bacillus subtilis (BsGDH). Furthermore, the variant BsGDH-PT-CBDCell was used to develop, optimize and validate an automated CatIB screening workflow, enabling many CatIB candidates to be analyzed in parallel. Compared to previous CatIB studies, important optimization steps include the exclusion of plate-position effects in the BioLector by changing the cultivation temperature. For the overall workflow, including strain construction, the manual workload was reduced from 59 to 7 h for 48 variants (88%). After high reproducibility was demonstrated, with a relative standard deviation of 1.9% across 42 biological replicates, the workflow was performed in combination with a Bayesian process model and Thompson sampling. While the process model is crucial for deriving key performance indicators of CatIBs, Thompson sampling serves as a strategy to balance exploitation and exploration in screening procedures. Our methodology allowed 63 BsGDH-CatIB variants to be analyzed within only three batch experiments. Because of the high likelihood of TDoT-PT-BsGDH being the best CatIB performer, it was selected in 50 biological replicates during the three screening rounds, far more often than the lower-performing variants. CONCLUSIONS: At the current state of knowledge, every new enzyme requires screening of different linker/aggregation-inducing tag combinations. For this purpose, the presented CatIB toolbox enables fast and simplified construction and screening procedures. The methodology thus assists in finding the best CatIB producer from large libraries in a short time, making automated Design-Build-Test-Learn cycles possible to generate structure/function learnings.
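For readers unfamiliar with the screening strategy, the sketch below illustrates how Thompson sampling can decide which variants fill the wells of each screening batch: one posterior draw per well, so variants likely to be best are replicated often while uncertain ones still get tested. It is a minimal stand-in that assumes a toy Gaussian posterior per variant rather than the authors' Bayesian process model; all priors, the 48-well batch size, and the activity model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_variants, n_rounds, batch_size = 63, 3, 48     # 63 variants, 3 batches (abstract);
true_activity = rng.gamma(2.0, 1.0, n_variants)  # a 48-well batch is assumed

obs_sd, prior_var = 0.2, 4.0   # assumed measurement noise and prior width
n = np.zeros(n_variants)       # replicates measured per variant
s = np.zeros(n_variants)       # sum of activity measurements per variant

for _ in range(n_rounds):
    post_var = 1.0 / (1.0 / prior_var + n / obs_sd**2)   # conjugate Gaussian
    post_mean = post_var * (s / obs_sd**2)
    # One Thompson draw per well: probable winners occupy many wells,
    # while uncertain variants still get sampled (exploration).
    draws = rng.normal(post_mean, np.sqrt(post_var),
                       size=(batch_size, n_variants))
    for v in draws.argmax(axis=1):
        y = true_activity[v] + rng.normal(0.0, obs_sd)   # "measure" the well
        n[v] += 1
        s[v] += y

print("most replicated variant:", int(n.argmax()))
```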
Subjects
Laboratory Automation, High-Throughput Screening Assays, Reproducibility of Results, Bayes Theorem, Inclusion Bodies, Automation
ABSTRACT
In an integrated space-air-ground emergency communication network, users face the challenge of rapidly identifying the optimal network node amid uncertain and stochastically fluctuating network states. This study introduces a Multi-Armed Bandit (MAB) model and proposes an optimization algorithm based on dynamic variance sampling (DVS). The algorithm assumes that each node's network state follows a normal prior distribution and, by constructing the distribution's expected value and variance, makes maximal use of the sample data, thereby balancing exploitation of the data against exploration of the unknown. We prove that the algorithm's Bayesian regret grows sublinearly. Simulations confirm that the algorithm outperforms the traditional ε-greedy, Upper Confidence Bound (UCB), and Thompson sampling algorithms, achieving higher cumulative reward, lower total regret, faster convergence, and greater system throughput.
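The following sketch captures the spirit of sampling each node's value from a normal distribution whose mean and variance are estimated from past observations, so that rarely observed nodes keep wide distributions; the exact DVS update in the paper may differ, and the node reward model here is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
K, T = 6, 5000                          # candidate nodes, time steps
mu = rng.uniform(0.2, 0.8, K)           # hypothetical mean link quality
sd = rng.uniform(0.05, 0.3, K)          # hypothetical fluctuation level

n = np.ones(K)                          # observation counts (pseudo-count 1)
mean = np.zeros(K)                      # running sample means
m2 = np.ones(K)                         # running sums of squared deviations

for t in range(T):
    # Draw each node's value from N(mean, sample_var / n): rarely observed
    # nodes keep wide distributions, which preserves exploration.
    var = m2 / n
    draw = rng.normal(mean, np.sqrt(var / n))
    a = int(draw.argmax())
    r = rng.normal(mu[a], sd[a])        # stochastic network-state feedback
    # Welford-style update of the chosen node's mean and variance.
    n[a] += 1
    d = r - mean[a]
    mean[a] += d / n[a]
    m2[a] += d * (r - mean[a])
```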
ABSTRACT
We study stochastic linear contextual bandits (CB) where the agent observes a noisy version of the true context through a noise channel with unknown channel parameters. Our objective is to design an action policy that can "approximate" that of a Bayesian oracle that has access to the reward model and the noise channel parameter. We introduce a modified Thompson sampling algorithm and analyze its Bayesian cumulative regret with respect to the oracle action policy via information-theoretic tools. For Gaussian bandits with Gaussian context noise, our information-theoretic analysis shows that under certain conditions on the prior variance, the Bayesian cumulative regret scales as Õ(m√T), where m is the dimension of the feature vector and T is the time horizon. We also consider the problem setting where the agent observes the true context with some delay after receiving the reward, and show that delayed true contexts lead to lower regret. Finally, we empirically demonstrate the performance of the proposed algorithms against baselines.
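As a rough illustration of Thompson sampling with noisy contexts, the sketch below denoises the observed context with its Gaussian posterior mean before the usual linear-Gaussian Thompson step. The feature map, dimensions, and noise covariances are assumptions for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
m, K, T = 5, 10, 2000                 # feature dimension, actions, horizon
theta = rng.normal(0, 1, m)           # unknown reward parameter
Sigma_c = np.eye(m)                   # prior covariance of the true context
Sigma_n = 0.5 * np.eye(m)             # noise-channel covariance (known here)
W = Sigma_c @ np.linalg.inv(Sigma_c + Sigma_n)   # posterior-mean denoiser

def phi(c, a):                        # hypothetical per-action feature map
    return c * (0.5 + a / K)

A = np.eye(m)                         # Gaussian posterior over theta
b = np.zeros(m)                       # (unit-noise model assumed)
for t in range(T):
    c = rng.multivariate_normal(np.zeros(m), Sigma_c)          # hidden context
    c_hat = c + rng.multivariate_normal(np.zeros(m), Sigma_n)  # noisy observation
    c_bar = W @ c_hat                 # E[c | c_hat], the denoised context
    cov = np.linalg.inv(A)
    theta_s = rng.multivariate_normal(cov @ b, cov)            # Thompson draw
    act = max(range(K), key=lambda a: phi(c_bar, a) @ theta_s)
    r = phi(c, act) @ theta + rng.normal(0, 0.1)  # reward uses the TRUE context
    x = phi(c_bar, act)               # update with what the agent can observe
    A += np.outer(x, x)
    b += r * x
```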
ABSTRACT
Many popular survival models rely on restrictive parametric or semiparametric assumptions that can produce erroneous predictions when the effects of covariates are complex. Modern advances in computational hardware have led to increasing interest in flexible Bayesian nonparametric methods for time-to-event data, such as Bayesian additive regression trees (BART). We propose a novel approach that we call nonparametric failure time (NFT) BART to increase flexibility beyond accelerated failure time (AFT) and proportional hazards models. NFT BART has three key features: (1) a BART prior for the mean function of the log event time; (2) a heteroskedastic BART prior for a covariate-dependent variance function; and (3) a flexible nonparametric error distribution using Dirichlet process mixtures (DPM). Our proposed approach widens the scope of hazard shapes, including nonproportional hazards, can be scaled up to large sample sizes, naturally provides estimates of uncertainty via the posterior, and can be seamlessly employed for variable selection. We provide convenient, user-friendly computer software that is freely available as a reference implementation. Simulations demonstrate that NFT BART maintains excellent performance for survival prediction, especially when AFT assumptions are violated by heteroskedasticity. We illustrate the proposed approach on a study examining predictors of mortality risk in patients undergoing hematopoietic stem cell transplant (HSCT) for blood-borne cancer, where heteroskedasticity and nonproportional hazards are likely present.
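To make the three model components concrete, this snippet simulates data of the stated form, log T = f(x) + s(x)·ε: a nonlinear mean, a covariate-dependent scale, and a two-component mixture error standing in for the DPM. All functional forms are invented for illustration; the actual fitting is done with the authors' software and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(-1, 1, n)                   # a single covariate

f = lambda v: 1.0 + np.sin(2 * v)           # (1) nonlinear mean of log T
s = lambda v: 0.3 + 0.5 * (v > 0)           # (2) covariate-dependent scale

# (3) DPM-like error: a two-component Gaussian mixture, not a single normal.
comp = rng.random(n) < 0.3
eps = np.where(comp, rng.normal(-1.0, 0.3, n), rng.normal(0.4, 0.3, n))

log_t = f(x) + s(x) * eps                   # NFT form: log T = f(x) + s(x)*eps
t_event = np.exp(log_t)                     # event times, heteroskedastic in x
```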
Subjects
Machine Learning, Software, Humans, Bayes Theorem, Proportional Hazards Models, Uncertainty, Statistical Models, Computer Simulation
ABSTRACT
We consider outcome-adaptive phase II or phase II/III trials to identify the best treatment for further development. Unlike many other multi-arm multi-stage designs, we borrow best-arm identification approaches from the multi-armed bandit (MAB) literature in machine learning and adapt them for clinical trial purposes. Best-arm identification in MAB focuses on the error rate of identification at the end of the trial, but we are also interested in the cumulative benefit to trial patients, for example, the frequency with which patients are treated with the best treatment. In particular, we consider Top-Two Thompson Sampling (TTTS) and propose an acceleration approach for better performance in drug development scenarios, in which the sample size is much smaller than that considered in machine learning applications. We also propose a variant of TTTS (TTTS2) that is simpler, easier to implement, and has comparable performance in small-sample settings. An extensive simulation study was conducted to evaluate the performance of the proposed approach in multiple typical drug development scenarios.
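A minimal sketch of standard Top-Two Thompson Sampling with Beta-Bernoulli arms is shown below (the resample-until-different formulation); the acceleration and the TTTS2 variant proposed in the paper are not reproduced here, and the response rates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
p = np.array([0.35, 0.45, 0.50, 0.52])      # hypothetical response rates
K = len(p)
alpha = np.ones(K); beta = np.ones(K)       # Beta(1, 1) priors
beta_frac = 0.5                             # probability of playing the leader

for t in range(500):
    leader = int(rng.beta(alpha, beta).argmax())
    arm = leader
    if rng.random() > beta_frac:
        # Resample until a different arm tops the draw: the "challenger".
        for _ in range(100):
            challenger = int(rng.beta(alpha, beta).argmax())
            if challenger != leader:
                arm = challenger
                break
    r = int(rng.random() < p[arm])
    alpha[arm] += r; beta[arm] += 1 - r

print("identified best arm:", int((alpha / (alpha + beta)).argmax()))
```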
Subjects
Research Design, Humans, Sample Size, Computer Simulation
ABSTRACT
Contextual bandits can solve a huge range of real-world problems. However, popular current algorithms either rely on linear models or on unreliable uncertainty estimation in non-linear models, both of which are needed to manage the exploration-exploitation trade-off. Inspired by theories of human cognition, we introduce novel techniques that use maximum entropy exploration, relying on neural networks to find optimal policies in settings with both continuous and discrete action spaces. We present two classes of models: one with neural networks as reward estimators, and the other with energy-based models, which model the probability of obtaining an optimal reward given an action. We evaluate the performance of these models in static and dynamic contextual bandit simulation environments. We show that both techniques outperform standard baseline algorithms, such as NN HMC, NN Discrete, Upper Confidence Bound, and Thompson Sampling, with energy-based models having the best overall performance. This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
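The snippet below shows the maximum-entropy (Boltzmann/softmax) exploration principle on a contextual bandit, using a simple linear reward estimator in place of the paper's neural networks and energy-based models; the temperature, dimensions, and reward model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, K, T, tau = 4, 5, 3000, 0.1         # context dim, actions, horizon, temperature
W_true = rng.normal(0, 1, (K, d))      # hypothetical reward weights

A = [np.eye(d) for _ in range(K)]      # per-action ridge statistics
b = [np.zeros(d) for _ in range(K)]

for t in range(T):
    x = rng.normal(0, 1, d)            # observed context
    q = np.array([np.linalg.solve(A[a], b[a]) @ x for a in range(K)])
    # Maximum-entropy-style exploration: act via a Boltzmann (softmax) policy
    # over estimated rewards instead of greedily taking the argmax.
    logits = q / tau
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = int(rng.choice(K, p=probs))
    r = W_true[a] @ x + rng.normal(0, 0.1)
    A[a] += np.outer(x, x); b[a] += r * x
```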
ABSTRACT
A reconfigurable intelligent surface (RIS) is a promising technology that can extend the coverage of short-range millimeter wave (mmWave) communications. However, the phase shifts (PSs) of both the mmWave transmitter (TX) and the RIS antenna elements need to be optimally adjusted to effectively cover a mmWave user. This paper proposes codebook-based phase shifters for the mmWave TX and RIS to overcome the difficulty of estimating their mmWave channel state information (CSI). Moreover, to adjust the PSs of both, an online learning approach is suggested that casts the problem as a multiarmed bandit (MAB) game, for which a nested two-stage stochastic MAB strategy is proposed. In the proposed strategy, the PS vector of the mmWave TX is adjusted in the first MAB stage. Based on it, the PS vector of the RIS is calibrated in the second stage, and vice versa, over the time horizon. To this end, we leverage and implement two standard MAB algorithms, namely Thompson sampling (TS) and the upper confidence bound (UCB). Simulation results confirm the superior performance of the proposed nested two-stage MAB strategy; in particular, the nested two-stage TS nearly matches the optimal performance.
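A compact sketch of the nested two-stage idea with Beta-Bernoulli Thompson sampling: stage one picks the TX codeword, stage two picks the RIS codeword conditioned on it, and both posteriors are updated from the same binary feedback (for example, whether an SNR target was met). The codebook sizes and the success model are assumed.

```python
import numpy as np

rng = np.random.default_rng(6)
n_tx, n_ris = 8, 16                        # codebook sizes (assumed)
# Hypothetical probability of meeting an SNR target per (TX, RIS) codeword pair.
p = rng.beta(2, 5, (n_tx, n_ris))
p[3, 7] = 0.9                              # plant one clearly best pair

a_tx = np.ones(n_tx); b_tx = np.ones(n_tx)                    # stage 1 posteriors
a_rs = np.ones((n_tx, n_ris)); b_rs = np.ones((n_tx, n_ris))  # stage 2, per TX

for t in range(5000):
    tx = int(rng.beta(a_tx, b_tx).argmax())              # stage 1: TX PS vector
    ris = int(rng.beta(a_rs[tx], b_rs[tx]).argmax())     # stage 2: RIS PS vector
    r = int(rng.random() < p[tx, ris])                   # binary SNR feedback
    a_rs[tx, ris] += r; b_rs[tx, ris] += 1 - r           # update both stages
    a_tx[tx] += r;      b_tx[tx]      += 1 - r
```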
ABSTRACT
N-of-1 trials, which are randomized, double-blinded, controlled, multiperiod, crossover trials on a single subject, have been applied to determine the heterogeneity of individual treatment effects in precision medicine settings. An aggregated N-of-1 design, which can estimate the population effect from these individual trials, is a pragmatic alternative when a randomized controlled trial (RCT) is infeasible. We propose a Bayesian adaptive design for both individual and aggregated N-of-1 trials using a multiarmed bandit framework that is estimated via efficient Markov chain Monte Carlo. A Bayesian hierarchical structure is used to jointly model the individual and population treatment effects. Our proposed adaptive trial design is based on Thompson sampling, which randomly allocates individuals to treatments based on the Bayesian posterior probability of each treatment being optimal. While we use subject-specific treatment effect and Bayesian posterior probability estimates to determine an individual's treatment allocation, our hierarchical model allows these individual estimates to borrow strength from the population estimates via shrinkage toward the population mean. We present the design's operating characteristics and performance via a simulation study motivated by a recently completed N-of-1 clinical trial. We demonstrate that, from a patient-centered perspective, subjects are likely to benefit from our adaptive design, in particular those individuals whose responses deviate from the overall population effect.
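The sketch below mimics the allocation logic with a conjugate-Gaussian approximation in place of MCMC: each subject's posterior effect is shrunk toward a population estimate, and a Thompson draw decides that period's treatment. It assumes the response under the control treatment is normalized to zero; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n_subj, n_periods = 20, 8
mu_pop, tau, sigma = 1.0, 0.5, 1.0        # population effect, heterogeneity, noise
delta = rng.normal(mu_pop, tau, n_subj)   # subject-level effect of B versus A

n = np.zeros(n_subj)                      # periods on treatment B per subject
s = np.zeros(n_subj)                      # summed responses on B per subject

for period in range(n_periods):
    pop = s.sum() / max(n.sum(), 1.0)     # empirical-Bayes shrinkage target
    for i in range(n_subj):
        # Conjugate posterior for subject i's effect, shrunk toward `pop`.
        prec = 1 / tau**2 + n[i] / sigma**2
        mean = (pop / tau**2 + s[i] / sigma**2) / prec
        draw = rng.normal(mean, np.sqrt(1 / prec))   # Thompson draw
        give_b = draw > 0                 # allocate B iff drawn effect positive
        y = (delta[i] if give_b else 0.0) + rng.normal(0, sigma)
        if give_b:
            n[i] += 1; s[i] += y
```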
Subjects
Research Design, Bayes Theorem, Cross-Over Studies, Humans, Markov Chains, Monte Carlo Method
ABSTRACT
In a clinical trial, it is sometimes desirable to allocate as many patients as possible to the best treatment, in particular when a trial for a rare disease may contain a considerable portion of the whole target population. The Gittins index rule is a powerful tool for sequentially allocating patients to the best treatment based on the responses of patients already treated. However, its application in clinical trials is limited by technical complexity and lack of randomness. Thompson sampling is an appealing alternative, since it strikes a compromise between optimal treatment allocation and randomization, and has desirable optimality properties in the machine learning context. However, in clinical trial settings, multiple simulation studies have shown disappointing results with Thompson samplers. We consider how to improve the short-run performance of Thompson sampling and propose a novel acceleration approach. This approach can also be applied when patients can only be allocated in batches, and it is very easy to implement without complex algorithms. A simulation study showed that this approach can improve the performance of Thompson sampling in terms of the average total response rate. An application to the redesign of a preference trial to maximize patient satisfaction is also presented.
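One simple way to run Thompson sampling in batches is sketched below: allocation weights come from many posterior draws, and a sharpening exponent is applied as an illustrative acceleration device (not necessarily the authors' construction); all rates and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
p = np.array([0.3, 0.5])                       # hypothetical response rates
K, batch, n_batches, c = len(p), 10, 20, 2.0   # c: sharpening exponent

alpha = np.ones(K); beta = np.ones(K)

for _ in range(n_batches):
    # Allocation weights = probability each arm wins a Thompson draw,
    # estimated from many draws, then sharpened to speed up convergence.
    draws = rng.beta(alpha, beta, size=(1000, K))
    w = np.bincount(draws.argmax(axis=1), minlength=K) / 1000
    w = w**c / (w**c).sum()
    counts = rng.multinomial(batch, w)         # allocate the whole batch at once
    for a in range(K):
        r = rng.binomial(counts[a], p[a])
        alpha[a] += r; beta[a] += counts[a] - r
```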
Subjects
Clinical Trials as Topic, Research Design, Algorithms, Computer Simulation, Humans
ABSTRACT
The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to perform at time step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff of each action remains stationary over time. Nevertheless, in many real-world applications this assumption does not hold and the agent faces a non-stationary environment, that is, one with a changing reward distribution. We therefore present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), and an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate f-dsw TS against both stationary and non-stationary state-of-the-art TS baselines. We use synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, namely sudden/abrupt, incremental, gradual, and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best-performing MAB algorithm: at least one version of f-dsw TS beats the baselines in the synthetic environments, demonstrating its robustness under different concept-drift types, and the pessimistic version (f=min) proves the most effective in all real-world tasks.
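Following the description above, a sketch of f-dsw TS: each arm keeps discounted success/failure counts and a sliding window of recent rewards, a Beta sample is drawn from each view, and the two samples are combined with f. The priors, the drift pattern, and all constants are assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(9)
K, T, gamma, w = 4, 3000, 0.98, 50     # arms, horizon, discount, window size
f = min                                # aggregation: min (pessimistic); use max
                                       # or lambda u, v: (u + v) / 2 instead

Sd = np.zeros(K); Fd = np.zeros(K)     # discounted success/failure counts
win = [deque(maxlen=w) for _ in range(K)]   # per-arm sliding window

def p_arms(t):                         # abrupt drift halfway through the run
    base = np.array([0.7, 0.5, 0.4, 0.3])
    return base if t < T // 2 else base[::-1]

for t in range(T):
    prio = np.empty(K)
    for arm in range(K):
        th_d = rng.beta(1 + Sd[arm], 1 + Fd[arm])        # discounted estimate
        sw = sum(win[arm])
        th_w = rng.beta(1 + sw, 1 + len(win[arm]) - sw)  # windowed estimate
        prio[arm] = f(th_d, th_w)
    a = int(prio.argmax())
    r = int(rng.random() < p_arms(t)[a])
    Sd *= gamma; Fd *= gamma           # discount the whole reward history
    Sd[a] += r; Fd[a] += 1 - r
    win[a].append(r)
```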
ABSTRACT
Mate choice requires navigating an exploration-exploitation trade-off. Successful mate choice requires choosing partners who have preferred qualities; but time spent determining one partner's qualities could have been spent exploring for potentially superior alternatives. Here I argue that this dilemma can be modeled in a reinforcement learning framework as a multi-armed bandit problem. Moreover, using agent-based models and a sample of k = 522 real-world romantic dyads, I show that a reciprocity-weighted Thompson sampling algorithm performs well both in guiding mate search in noisy search environments and in reproducing the mate choices of real-world participants. These results provide a formal model of the understudied psychology of human mate search. They additionally offer implications for our understanding of person perception and mate choice.
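One plausible reading of a reciprocity-weighted Thompson sampler is sketched below, purely as an illustration: partner-quality draws are weighted by the estimated chance of reciprocation, so search concentrates on partners who seem both desirable and attainable. This is a loose interpretation, not the paper's implementation, and every quantity is invented.

```python
import numpy as np

rng = np.random.default_rng(10)
n_partners = 30
quality = rng.beta(2, 2, n_partners)       # hypothetical partner quality
reciprocity = rng.beta(2, 2, n_partners)   # chance interest is returned

aq = np.ones(n_partners); bq = np.ones(n_partners)   # quality posterior
ar = np.ones(n_partners); br = np.ones(n_partners)   # reciprocity posterior

for t in range(300):
    # Thompson draw for quality, weighted by expected reciprocity.
    score = rng.beta(aq, bq) * (ar / (ar + br))
    j = int(score.argmax())
    liked = int(rng.random() < quality[j])         # noisy quality signal
    returned = int(rng.random() < reciprocity[j])  # noisy reciprocity signal
    aq[j] += liked; bq[j] += 1 - liked
    ar[j] += returned; br[j] += 1 - returned
```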
ABSTRACT
In many real-world problems involving real-time monitoring of high-dimensional streaming data, one wants to detect an undesired event or change quickly once it occurs, but under a sampling-control constraint: in resource-constrained environments, one may only be able to observe or use data from selected components for decision-making at each time step. In this paper, we propose to incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm based on the limiting Bayesian approach, which incorporates prior knowledge of potential changes. Our proposed algorithm, termed Thompson-Sampling-Shiryaev-Roberts-Pollak (TSSRP), consists of two policies per time step: the adaptive sampling policy applies Thompson sampling to balance exploration, for acquiring long-term knowledge, against exploitation, for immediate reward gain; the statistical decision policy fuses the local Shiryaev-Roberts-Pollak statistics via sum-shrinkage techniques to determine whether to raise a global alarm. Extensive numerical simulations and case studies demonstrate the statistical and computational efficiency of the proposed TSSRP algorithm.
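A loose skeleton of the two policies is sketched below: Thompson draws select which streams to observe, observed streams update a Shiryaev-Roberts recursion, and the largest local statistics are summed against a threshold. The evidence update and all constants are crude stand-ins, not the actual TSSRP construction.

```python
import numpy as np

rng = np.random.default_rng(11)
K, q, mu1 = 20, 5, 1.0            # streams, streams observed per step, shift size
change_t, changed = 300, {3, 7}   # hypothetical change time and affected streams
R = np.zeros(K)                   # local Shiryaev-Roberts statistics
a = np.ones(K); b = np.ones(K)    # Beta posteriors: "stream looks changed"
thresh, r_keep = 50.0, 5          # global alarm threshold; top-r sum shrinkage

for t in range(1000):
    obs = np.argsort(-rng.beta(a, b))[:q]    # Thompson draws pick what to observe
    for k in obs:
        x = rng.normal(mu1 if (t >= change_t and k in changed) else 0.0, 1.0)
        lr = np.exp(mu1 * x - mu1**2 / 2)    # LR of N(mu1,1) versus N(0,1)
        R[k] = (1.0 + R[k]) * lr             # Shiryaev-Roberts recursion
        a[k] += lr > 1; b[k] += lr <= 1      # coarse "evidence of change" update
    if np.sort(R)[-r_keep:].sum() > thresh:  # fuse local stats by top-r sum
        print("alarm raised at t =", t)
        break
```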
ABSTRACT
Antibodies are one of the predominant treatment modalities for various diseases. To improve the characteristics of a lead antibody, such as antigen-binding affinity and stability, we conducted comprehensive substitutions and exhaustively explored the sequence space. However, it is practically infeasible to evaluate all possible combinations of mutations owing to combinatorial explosion when multiple amino acid residues are incorporated. It was recently reported that machine-learning-guided protein engineering approaches such as Thompson sampling (TS) can efficiently explore sequence space within the framework of Bayesian optimization. For TS, over-exploration occurs when the initial data are distributed with a bias toward the vicinity of the lead antibody. We handle a large-scale virtual library that includes numerous mutations; when the number of experiments is limited, this over-exploration becomes a serious issue. We therefore conducted Monte Carlo Thompson sampling (MTS), which balances the exploration-exploitation trade-off by defining the posterior distribution via the Monte Carlo method, and compared its performance with TS in antibody engineering. Our results demonstrate that MTS largely outperforms TS in discovering desirable candidates at earlier rounds when TS over-explores. Thus, MTS is a powerful technique for efficiently discovering antibodies with desired characteristics when the number of rounds is limited.
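The sketch below illustrates a Monte-Carlo-defined posterior using a bootstrap-ensemble ridge regression: each round draws one resampled model (the Thompson step) and synthesizes the candidate it scores highest. The featurization, library, and noise are invented, and the paper's MTS construction may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(12)
d, n_lib, n0 = 20, 5000, 12                # feature dim, library size, seed data
w_true = rng.normal(0, 1, d)               # hidden sequence-to-property map
library = rng.normal(0, 1, (n_lib, d))     # featurized virtual mutant library

X = library[:n0].copy()                    # initial (possibly biased) data
y = X @ w_true + rng.normal(0, 0.3, n0)
tested = set(range(n0))

def fit_ridge(X, y, lam=1.0):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for rnd in range(5):
    # Monte Carlo posterior: a bootstrap resample defines one plausible model;
    # drawing a model and maximizing under it is the Thompson step.
    idx = rng.integers(0, len(y), len(y))
    w = fit_ridge(X[idx], y[idx])
    scores = library @ w
    scores[list(tested)] = -np.inf         # do not re-test candidates
    pick = int(scores.argmax())
    tested.add(pick)
    y_new = library[pick] @ w_true + rng.normal(0, 0.3)
    X = np.vstack([X, library[pick]]); y = np.append(y, y_new)
```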
Subjects
Antibodies, Protein Engineering, Bayes Theorem, Monte Carlo Method, Antibodies/chemistry, Protein Engineering/methods
ABSTRACT
Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal-normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naive human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers, but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to many classical SDT problems when decision-makers start with incomplete information.
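A sketch of the Softmax choice rule in this setting: running payoff estimates per cue bin, with acceptance probability given by a logistic function of the estimated payoff difference. The cue-quality link, payoff values, and temperature are assumptions.

```python
import numpy as np

rng = np.random.default_rng(13)
T, tau, n_bins = 2000, 0.2, 10
# Assumed link between cue magnitude and the chance the signaller is desirable.
p_desirable = lambda c: 1 / (1 + np.exp(-8 * (c - 0.5)))

q = np.zeros((n_bins, 2))      # estimated payoff: column 0 reject, 1 accept
n = np.zeros((n_bins, 2))

for t in range(T):
    c = rng.random()                       # continuous cue magnitude
    k = min(int(c * n_bins), n_bins - 1)
    # Softmax rule: accept with probability given by a logistic function
    # of the estimated payoff difference.
    p_acc = 1 / (1 + np.exp(-(q[k, 1] - q[k, 0]) / tau))
    act = int(rng.random() < p_acc)
    good = rng.random() < p_desirable(c)
    r = (1.0 if good else -1.0) if act else 0.0   # assumed payoff structure
    n[k, act] += 1
    q[k, act] += (r - q[k, act]) / n[k, act]
```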
ABSTRACT
A Brain-Computer Interface (BCI) is a device that interprets brain activity to help people with disabilities communicate. The P300 ERP-based BCI speller displays a series of events on the screen and searches the elicited electroencephalogram (EEG) data for target P300 event-related potential (ERP) responses among a series of non-target events. The Checkerboard (CB) paradigm is a common stimulus presentation paradigm. Although a few studies have proposed data-driven methods for stimulus selection, they suffer from intractable decision rules, high computational complexity, or error propagation for participants who perform poorly under the static paradigm. In addition, none of these methods have been applied directly to the CB paradigm. In this work, we propose a sequence-based adaptive stimulus selection method using Thompson sampling in the multi-bandit problem with multiple actions. During each sequence, the algorithm selects a random subset of stimuli of fixed size, aiming to identify all target stimuli and to improve spelling speed by reducing the number of unnecessary non-target stimuli. We compute "clean" stimulus-specific rewards from raw classifier scores via Bayes' rule. We perform extensive simulation studies comparing our algorithm to the static CB paradigm and show the robustness of our algorithm under the constraints of practical use. For scenarios where the simulated data most closely resemble the real data, the spelling efficiency of our algorithm increases by more than 70% compared to the static CB paradigm.
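The selection step described above reduces to drawing one Beta sample per stimulus and flashing the top-m, as sketched below; the "clean" Bayes-rule rewards are replaced by a hypothetical hit probability, and the grid size and targets are invented.

```python
import numpy as np

rng = np.random.default_rng(14)
n_stim, m = 36, 8                  # stimuli on the grid, flashes per sequence
targets = {5, 17}                  # hypothetical target stimuli
p_hit = lambda s: 0.8 if s in targets else 0.1   # stand-in "clean" reward rate

alpha = np.ones(n_stim); beta = np.ones(n_stim)

for seq in range(60):
    # One Thompson draw per stimulus; flash the top-m (multiple actions per
    # sequence), so likely targets keep being probed and non-targets are pruned.
    draws = rng.beta(alpha, beta)
    for s in np.argsort(-draws)[:m]:
        r = int(rng.random() < p_hit(int(s)))
        alpha[s] += r; beta[s] += 1 - r

print("estimated targets:", np.argsort(-(alpha / (alpha + beta)))[:2])
```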
ABSTRACT
A key feature of sequential decision making under uncertainty is the need to balance exploiting (choosing the best action according to current knowledge) and exploring (obtaining information about the values of other actions). The multi-armed bandit problem, a classical task that captures this trade-off, served as a vehicle in machine learning for developing bandit algorithms that have proved useful in numerous industrial applications. The active inference framework, an approach to sequential decision making recently developed in neuroscience for understanding human and animal behaviour, is distinguished by its sophisticated strategy for resolving the exploration-exploitation trade-off. This makes active inference an exciting alternative to already established bandit algorithms. Here we derive an efficient and scalable approximate active inference algorithm and compare it to two state-of-the-art bandit algorithms: Bayesian upper confidence bound and optimistic Thompson sampling. The comparison is done on two types of bandit problems: a stationary bandit and a dynamic switching bandit. Our empirical evaluation shows that the active inference algorithm does not produce efficient long-term behaviour in stationary bandits. However, in the more challenging switching bandit problem, active inference performs substantially better than the two state-of-the-art bandit algorithms. These results open exciting avenues for further research in theoretical and applied machine learning, and lend additional credibility to active inference as a general framework for studying human and animal behaviour.
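For reference, the optimistic Thompson sampling baseline on a switching Bernoulli bandit can be sketched as below: posterior draws are clipped at the posterior mean so sampling is never pessimistic. The switching schedule is assumed, and the active inference algorithm itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(15)
K, T, switch_every = 4, 4000, 500
alpha = np.ones(K); beta = np.ones(K)
p = rng.random(K)

for t in range(T):
    if t % switch_every == 0:
        p = rng.random(K)               # switching bandit: rates re-drawn
    mean = alpha / (alpha + beta)
    # Optimistic Thompson sampling: never sample below the posterior mean.
    draw = np.maximum(rng.beta(alpha, beta), mean)
    a = int(draw.argmax())
    r = int(rng.random() < p[a])
    alpha[a] += r; beta[a] += 1 - r
```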
Subjects
Algorithms, Decision Making, Animals, Bayes Theorem, Humans, Machine Learning, Uncertainty
ABSTRACT
A key component in controlling the spread of an epidemic is deciding where, when and to whom to apply an intervention. We develop a framework for using data to inform these decisions in real time. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations where treatment should be allocated. An optimal allocation strategy optimizes some cumulative outcome, e.g. the number of uninfected locations, the geographic footprint of the disease or the cost of the epidemic. Estimating an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference between locations, the number of possible allocations is exponential in the number of locations, and disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian online estimator of the optimal allocation strategy that combines simulation-optimization with Thompson sampling. The proposed estimator performs favourably in simulation experiments. This work is motivated by and illustrated using data on the spread of white nose syndrome, a highly fatal infectious disease devastating bat populations in North America.
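A toy rendering of the simulation-optimization plus Thompson sampling idea: draw disease dynamics from the posterior, greedily choose treatment locations by simulated rollouts under that draw, then update the posterior from the observed spread. The network model and the conjugate-style update are crude illustrations only, not the authors' estimator.

```python
import numpy as np

rng = np.random.default_rng(16)
n_loc, periods, per_period = 30, 10, 2     # locations, decision points, budget
adj = rng.random((n_loc, n_loc)) < 0.1
adj = adj | adj.T; np.fill_diagonal(adj, False)
adj_i = adj.astype(int)                    # neighbour-count form
beta_true = 0.25                           # unknown transmission probability
a_p, b_p = 1.0, 1.0                        # Beta posterior over beta_true

def spread(inf, treated, beta_v):
    expo = adj_i @ inf                     # infected neighbours per location
    p_inf = 1 - (1 - beta_v) ** expo
    return inf | ((rng.random(n_loc) < p_inf) & ~treated)

def rollout(inf, treated, beta_v, steps=4):
    cur = inf.copy()
    for _ in range(steps):
        cur = spread(cur, treated, beta_v)
    return cur.sum()

inf = np.zeros(n_loc, bool); inf[0] = True
treated = np.zeros(n_loc, bool)
for t in range(periods):
    beta_s = rng.beta(a_p, b_p)            # Thompson draw of the dynamics
    for _ in range(per_period):            # greedy simulation-optimization
        cand = np.flatnonzero(~treated & ~inf)
        if cand.size == 0:
            break
        best = min(cand, key=lambda j: rollout(
            inf, treated | (np.arange(n_loc) == j), beta_s))
        treated[best] = True
    expo = adj_i @ inf                     # one real transmission step
    at_risk = (expo > 0) & ~inf & ~treated
    newly = at_risk & (rng.random(n_loc) < 1 - (1 - beta_true) ** expo)
    a_p += newly.sum()                     # crude conjugate-style update,
    b_p += (at_risk & ~newly).sum()        # purely illustrative
    inf |= newly
```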