We present a principled approach for designing stochastic Newton methods for solving finite-sum optimization problems. Our approach has two steps. First, we rewrite the stationarity conditions as a system of nonlinear equations that associates each data point with a new row. Second, we apply a subsampled Newton-Raphson method to solve this system of nonlinear equations. By design, methods developed using our approach are incremental, in that they require only a single data point per iteration. Using our approach, we develop a new Stochastic Average Newton (SAN) method, which is incremental and cheap to implement when solving regularized generalized linear models. We show through extensive numerical experiments that SAN requires no prior knowledge of the problem and no parameter tuning, while remaining competitive with classical variance-reduced gradient methods such as SAG and SVRG.
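To make the recipe concrete, the sketch below applies one Newton-Raphson step per sampled data point to a regularized logistic regression, a typical regularized GLM. This is a hypothetical illustration of the subsampled, incremental idea, not the exact SAN update from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def incremental_newton_logreg(X, y, lam=0.1, n_iters=1000, seed=0):
    """One Newton-Raphson step per sampled data point on the regularized
    logistic loss; an illustrative stand-in for SAN-style incremental updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)                       # a single data point per iteration
        x = X[i]
        p = sigmoid(x @ w)
        grad = (p - y[i]) * x + lam * w           # gradient of f_i at w
        s = p * (1 - p)
        H = s * np.outer(x, x) + lam * np.eye(d)  # Hessian of f_i (PD thanks to lam)
        w -= np.linalg.solve(H, grad)             # Newton-Raphson step
    return w
```

The L2 regularizer is what keeps each single-point Hessian invertible, which is why the per-point Newton step is well defined here.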
Probabilistic model-checking is a field that seeks to automate the formal analysis of probabilistic models such as Markov chains. In this thesis, we study and develop the stochastic Markov reward model (sMRM), which extends the Markov chain with rewards as random variables. Because the model was only recently introduced, few techniques and algorithms exist for its analysis. The purpose of this study is to derive such algorithms that are both scalable and accurate. Additionally, we derive the theory necessary for probabilistic model-checking of sMRMs against existing temporal logics such as PRCTL. We present the equations for computing \textit{first-passage reward densities}, \textit{expected value problems}, and other \textit{reachability problems}. Our focus, however, is on finding strictly numerical solutions for \textit{first-passage reward densities}. We solve for these by first adapting known direct linear algebra algorithms such as Gaussian elimination, and iterative methods such as the power method, Jacobi, and Gauss-Seidel. We provide solutions both for discrete-reward sMRMs, where all rewards are discrete (lattice) random variables, and for continuous-reward sMRMs, where all rewards are strictly continuous random variables, though not necessarily with continuous probability density functions (pdfs). Our solutions use the fast Fourier transform (FFT) for faster computation, and we adapt existing quadrature rules for convolution, such as the trapezoid rule, Simpson's rule, and Romberg's method, to obtain more accurate solutions.
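As an illustration of the FFT-based computation, the sketch below convolves two lattice reward pmfs on a common grid, the core operation when combining rewards accumulated along paths of a discrete-reward sMRM. The example distributions are hypothetical.

```python
import numpy as np

def fft_convolve_pmf(p, q):
    """Convolve two lattice pmfs (reward distributions on a common grid)
    via the FFT; costs O(n log n) instead of O(n^2) for direct convolution."""
    n = len(p) + len(q) - 1
    size = 1 << (n - 1).bit_length()  # next power of two for the FFT
    r = np.fft.irfft(np.fft.rfft(p, size) * np.fft.rfft(q, size), size)[:n]
    return np.clip(r, 0.0, None)      # clean tiny negative FFT round-off

# e.g., sum of a fair-coin reward on {0, 1} and a die-like reward on {0, ..., 5}
p = np.array([0.5, 0.5])
q = np.full(6, 1 / 6)
print(fft_convolve_pmf(p, q))
```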
The energy consumption of data centers accounts for a significant fraction of the world's overall energy consumption. Most data centers are statically provisioned, leading to very low average server utilization. In this work, we survey one-dimensional and high-dimensional approaches for dynamically powering servers up and down to reduce the energy footprint of data centers while ensuring that incoming jobs are processed in time. We implement algorithms for smoothed online convex optimization and variations thereof where, in each round, the agent receives a convex cost function. The agent seeks to balance minimizing this cost against a movement cost associated with changing decisions between rounds. We implement the algorithms in their most general form, inviting future research on their performance in other application areas. We evaluate the algorithms for the application of right-sizing data centers using traces from Facebook, Microsoft, Alibaba, and Los Alamos National Lab. Our experiments show that the online algorithms perform close to the dynamic offline optimum in practice and promise a significant cost reduction compared to a static provisioning of servers. We discuss how features of the data center model and trace impact performance. Finally, we investigate the practical use of predictions to achieve further cost reductions.
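The sketch below shows the round structure in its simplest form: a greedy one-dimensional baseline that, in each round, trades the revealed hitting cost off against the movement cost. It is illustrative only; the surveyed algorithms refine this idea with better competitive guarantees, and the toy trace is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def greedy_soco(cost_fns, beta, x0=0.0, bounds=(0.0, 100.0)):
    """Greedy baseline for smoothed online convex optimization: each round,
    minimize the revealed hitting cost plus the switching cost beta*|x - x_prev|."""
    x_prev, xs = x0, []
    for f in cost_fns:
        res = minimize_scalar(lambda x: f(x) + beta * abs(x - x_prev),
                              bounds=bounds, method='bounded')
        x_prev = res.x
        xs.append(x_prev)
    return xs

# toy data-center trace: quadratic hitting cost around a time-varying load
loads = [10, 40, 35, 5]
cost_fns = [lambda x, l=l: (x - l) ** 2 for l in loads]
print(greedy_soco(cost_fns, beta=5.0))  # number of active servers per round
```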
One practical requirement in solving dynamic games is to ensure that the players play well from any decision point onward. To satisfy this requirement, existing efforts focus on equilibrium refinement, but the scalability and applicability of existing techniques are limited. In this paper, we propose Temporal-Induced Self-Play (TISP), a novel reinforcement-learning-based framework for finding strategies with decent performance from any decision point onward. TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation. Building upon TISP, we design a policy-gradient-based algorithm, TISP-PG. We prove that TISP-based algorithms can find an approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with finite horizon. We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical-programming-based methods and significantly outperforms other learning-based methods.
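As a minimal illustration of the backward-induction backbone that TISP builds on, the sketch below solves a finite-horizon tabular decision problem from the last stage backward; TISP replaces the exact tabular solve with policy learning over a belief-space representation. All tables here are hypothetical.

```python
import numpy as np

def backward_induction(P, R, H):
    """Tabular backward induction: solve the last stage first, then earlier
    stages against later-stage values. P[a] is an (S, S) transition matrix,
    R an (S, A) reward table, H the horizon."""
    S, A = R.shape
    V = np.zeros(S)                     # value after the final stage
    policies = []
    for t in range(H - 1, -1, -1):      # from the last decision point backward
        Q = R + np.stack([P[a] @ V for a in range(A)], axis=1)
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policies.reverse()
    return policies, V

# toy 2-state, 2-action instance
P = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.eye(2)]
R = np.array([[0.0, 1.0], [2.0, 0.0]])
print(backward_induction(P, R, H=3))
```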
In a previous paper, we presented a CEGAR approach for the verification of parameterized systems with an arbitrary number of processes organized in an array or a ring. The technique is based on the iterative computation of parameterized invariants, i.e., infinite families of invariants for the infinitely many instances of the system. Safety properties are proved by checking that every global configuration of the system satisfying all parameterized invariants also satisfies the property; we have shown that this check can be reduced to the satisfiability problem for Monadic Second-Order logic on words, which is decidable. A strong limitation of the approach is that processes can only have a fixed number of variables with a fixed finite range. In particular, they cannot use variables with range [0,N-1], where N is the number of processes, which appear in many standard distributed algorithms. In this paper, we extend our technique to this case. While checking whether a safety property is inductive relative to a computed set of invariants becomes undecidable in this setting, we show how to reduce this check to the satisfiability of a first-order formula. We report on experiments showing that automatic first-order theorem provers can still perform this check for a collection of non-trivial examples. Additionally, we can give small sets of readable invariants for these checks.
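The shape of the resulting first-order check can be sketched with an off-the-shelf solver such as Z3: an invariant is inductive exactly when the conjunction of the invariant, the transition relation, and the negated primed invariant is unsatisfiable. The toy transition below is hypothetical and far simpler than the parameterized formulas the paper generates.

```python
from z3 import Ints, Solver, Not, sat

# Toy instance of the first-order check: is Inv(x) := x >= 0 inductive
# under the transition x' = x + 1?  The paper's actual formulas quantify
# over process indices with range [0, N-1]; this shows only the shape.
x, xp = Ints('x xp')
inv = x >= 0
trans = xp == x + 1
inv_post = xp >= 0

s = Solver()
s.add(inv, trans, Not(inv_post))   # any model here is a counterexample
print("inductive" if s.check() != sat else "not inductive")
```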
Parameter calibration is a significant challenge in agent-based modelling and simulation (ABMS). An agent-based model's (ABM) complexity grows as the number of parameters to be calibrated increases. This parameter expansion leads to the ABMS equivalent of the \say{curse of dimensionality}: in particular, infeasible computational requirements for searching an infinite parameter space. We propose a more comprehensive and adaptive ABMS framework that can effectively swap out parameterisation strategies and surrogate models to parameterise an infectious disease ABM. This framework allows us to evaluate the accuracy and efficiency (speedup) of different strategy-surrogate combinations. We show that we achieve better than parity in accuracy across the surrogate-assisted sampling strategies and the baselines. We also find that the Metric Stochastic Response Surface (MSRS) strategy combined with the Support Vector Machine (SVM) surrogate is the best overall at getting closest to the true synthetic parameters. Furthermore, we show that DYnamic COOrdinate Search Using Response Surface Models (DYCORS) with XGBoost as a surrogate attains the highest probability of approximating a cumulative synthetic daily-infection data distribution and achieves the most significant speedup in our analysis. Lastly, we show in a real-world setting that DYCORS XGBoost and MSRS SVM can approximate the real-world cumulative daily infection distribution with $97.12$\% and $96.75$\% similarity, respectively.
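The sketch below shows the generic surrogate-assisted loop such a framework instantiates: fit a cheap surrogate (here sklearn's SVR) on evaluated parameter points, use it to rank a candidate pool, and spend expensive simulator runs only on the most promising candidate. The simulator, bounds, and target parameters are hypothetical stand-ins.

```python
import numpy as np
from sklearn.svm import SVR

def expensive_abm(theta):
    """Stand-in for an ABM run: returns the discrepancy to observed data.
    (Hypothetical; the real simulator is the infectious-disease ABM.)"""
    return np.sum((theta - np.array([0.3, 0.7])) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 2))          # initial design points
y = np.array([expensive_abm(t) for t in X])

for _ in range(10):                          # surrogate-assisted loop
    surrogate = SVR().fit(X, y)
    pool = rng.uniform(0, 1, size=(500, 2))  # cheap candidate pool
    best = pool[surrogate.predict(pool).argmin()]
    X = np.vstack([X, best])                 # evaluate only the best candidate
    y = np.append(y, expensive_abm(best))

print("estimated parameters:", X[y.argmin()])
```

Swapping the `SVR()` line for another regressor (e.g., XGBoost) and the candidate-pool step for another sampling strategy is exactly the kind of interchange the framework is designed to support.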
Based on the framework of the quantum-inspired evolutionary algorithm, a cuckoo quantum evolutionary algorithm (CQEA) is proposed for solving the graph coloring problem (GCP). To reduce the number of iterations needed to find the chromatic number, the initial quantum population is generated by random initialization assisted by inheritance. Moreover, global exploration is improved by incorporating the cuckoo search strategy, and a local search operation and a perturbation strategy are developed to enhance performance on GCPs. Numerical results demonstrate that CQEA operates with strong exploration and exploitation abilities, and is competitive with state-of-the-art heuristic algorithms.
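As a sketch of the quantum-inspired machinery, the snippet below shows the core operations such algorithms rely on: measuring qubit angles into candidate bitstrings, rotating the angles toward the best solution found, and a conflict-count fitness for the GCP. The encoding (two bits per vertex) and parameter values are hypothetical; CQEA adds cuckoo search, local search, and perturbation on top.

```python
import numpy as np

def measure(theta, rng):
    """Collapse qubit angles to a binary string: P(bit = 1) = sin(theta)^2."""
    return (rng.random(theta.shape) < np.sin(theta) ** 2).astype(int)

def rotate(theta, bits, best_bits, delta=0.05 * np.pi):
    """Quantum rotation gate: nudge each qubit toward the best solution's bit."""
    return theta + delta * np.where(best_bits > bits, 1,
                                    np.where(best_bits < bits, -1, 0))

def decode(bits, bits_per_vertex=2):
    """Group bits into per-vertex color indices (2 bits -> up to 4 colors)."""
    return bits.reshape(-1, bits_per_vertex) @ (1 << np.arange(bits_per_vertex))

def conflicts(coloring, edges):
    """GCP fitness: number of edges whose endpoints share a color."""
    return sum(coloring[u] == coloring[v] for u, v in edges)

rng = np.random.default_rng(0)
theta = np.full(6, np.pi / 4)   # 3 vertices x 2 bits, unbiased start
bits = measure(theta, rng)
print(conflicts(decode(bits), edges=[(0, 1), (1, 2), (0, 2)]))
```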
This study combines simulated annealing with delta evaluation to solve the joint stratification and sample allocation problem. In this problem, atomic strata are partitioned into mutually exclusive and collectively exhaustive strata. Each stratification is a solution, the quality of which is measured by its cost. The Bell number of possible solutions is enormous for even a moderate number of atomic strata, and an additional layer of complexity is added by the evaluation time of each solution. Many larger-scale combinatorial optimisation problems cannot be solved to optimality because the search for an optimum solution requires a prohibitive amount of computation time; a number of local search heuristic algorithms have been designed for this problem, but these can become trapped in local minima, preventing any further improvement. We add to the existing suite of local search algorithms a simulated annealing algorithm (SAA) that allows for an escape from local minima and uses delta evaluation to exploit the similarity between consecutive solutions and thereby reduce the evaluation time. We compare the SAA with two recent algorithms. In both cases the SAA attains a solution of comparable quality in considerably less computation time.
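The sketch below illustrates the delta-evaluation idea: moving one atomic stratum between strata changes only two strata, so only those two costs are recomputed before the usual Metropolis accept/reject step. The cost function is a placeholder for the allocation-based objective, and the cooling schedule is hypothetical.

```python
import numpy as np

def anneal(assign, stratum_cost, n_strata, T0=1.0, cooling=0.999,
           steps=20000, seed=0):
    """Simulated annealing with delta evaluation. `assign[i]` is the stratum
    of atomic stratum i; `stratum_cost(members)` is a placeholder cost,
    e.g. lambda members: len(members) ** 0.5 if len(members) else 1e9."""
    rng = np.random.default_rng(seed)
    costs = [stratum_cost(np.where(assign == s)[0]) for s in range(n_strata)]
    T = T0
    for _ in range(steps):
        i = rng.integers(len(assign))
        old, new = assign[i], rng.integers(n_strata)
        if new == old:
            continue
        assign[i] = new
        c_old = stratum_cost(np.where(assign == old)[0])  # delta evaluation:
        c_new = stratum_cost(np.where(assign == new)[0])  # only two strata change
        delta = (c_old + c_new) - (costs[old] + costs[new])
        if delta < 0 or rng.random() < np.exp(-delta / T):
            costs[old], costs[new] = c_old, c_new         # accept the move
        else:
            assign[i] = old                               # reject, undo the move
        T *= cooling
    return assign, sum(costs)
```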
In this paper, a new numerical method based on adaptive gradient descent optimizers is provided for computing the implied volatility from the Black-Scholes (B-S) option pricing model. It is shown that the new method is more accurate than the closed-form approximation. Compared with the Newton-Raphson method, the new method achieves a reliable rate of convergence and tends to be less sensitive to the starting point.
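A minimal sketch of the idea, assuming Adam as the adaptive optimizer (the particular optimizer, hyperparameters, and market data here are illustrative): minimize the squared pricing error in the volatility, with the gradient supplied analytically through the option's vega.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def vega(S, K, T, r, sigma):
    """Sensitivity of the call price to sigma."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return S * norm.pdf(d1) * np.sqrt(T)

def implied_vol_adam(market, S, K, T, r, sigma=0.5, lr=0.05, steps=500):
    """Adam on the squared pricing error in sigma; chain rule gives
    the gradient as 2 * (price - market) * vega."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2 * (bs_call(S, K, T, r, sigma) - market) * vega(S, K, T, r, sigma)
        m = 0.9 * m + 0.1 * g
        v = 0.999 * v + 0.001 * g * g
        sigma -= lr * (m / (1 - 0.9**t)) / (np.sqrt(v / (1 - 0.999**t)) + 1e-8)
    return sigma

print(implied_vol_adam(market=10.45, S=100, K=100, T=1.0, r=0.05))  # ~0.2
```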
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop, using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner-level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
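The heart of the approach is the implicit meta-gradient identity: at the inner solution, the meta-gradient equals $(I + \frac{1}{\lambda} H)^{-1}$ applied to the test-loss gradient, where $H$ is the inner-loss Hessian and $\lambda$ the inner regularization strength. The toy sketch below forms this solve directly; dimensions are tiny and hypothetical, whereas at scale one would use conjugate gradient so that $H$ is only accessed via Hessian-vector products.

```python
import numpy as np

def imaml_meta_gradient(hess_train, grad_test, lam):
    """Implicit meta-gradient: instead of backpropagating through the
    inner-loop optimization path, solve (I + (1/lam) * H) g = grad_test
    at the inner solution."""
    d = grad_test.shape[0]
    return np.linalg.solve(np.eye(d) + hess_train / lam, grad_test)

# toy quadratic inner loss with a fixed Hessian over 3 parameters
H = np.diag([1.0, 2.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
print(imaml_meta_gradient(H, g, lam=1.0))
```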
Machine learning algorithms have found several applications in the field of robotics and control systems. The control systems community has started to show interest in several machine learning algorithms from sub-domains such as supervised learning, imitation learning, and reinforcement learning to achieve autonomous control and intelligent decision making. Among many complex control problems, stable bipedal walking has been the most challenging. In this paper, we present an architecture to design and simulate a planar bipedal walking robot (BWR) using a realistic robotics simulator, Gazebo. The robot demonstrates successful walking behaviour by learning through trial and error, without any prior knowledge of itself or of the world dynamics. The autonomous walking of the BWR is achieved using a reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG), an algorithm for learning control policies in continuous action spaces. After training the model in simulation, we observed that, with a properly shaped reward function, the robot achieved faster walking and even rendered a running gait with an average speed of 0.83 m/s. The gait pattern of the bipedal walker was compared with an actual human walking pattern. The results show that the bipedal walking pattern had characteristics similar to those of a human walking pattern.
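For reference, a minimal DDPG update step is sketched below in PyTorch: a critic regressed onto a bootstrapped target, an actor that ascends the critic, and soft target-network updates. Dimensions and hyperparameters are hypothetical, and this is the generic algorithm rather than the paper's Gazebo walker setup.

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 8, 2, 0.99, 0.005
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
a_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One DDPG update on a batch of transitions (s, a, r, s2, done)."""
    with torch.no_grad():                   # bootstrapped target value
        q2 = critic_t(torch.cat([s2, actor_t(s2)], dim=1))
        y = r + gamma * (1 - done) * q2
    c_loss = ((critic(torch.cat([s, a], dim=1)) - y) ** 2).mean()
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()
    a_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    a_opt.zero_grad(); a_loss.backward(); a_opt.step()
    for tgt, src in ((actor_t, actor), (critic_t, critic)):  # soft updates
        for pt, p in zip(tgt.parameters(), src.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)

# smoke test on a random batch
b = 32
ddpg_update(torch.randn(b, obs_dim), torch.rand(b, act_dim) * 2 - 1,
            torch.randn(b, 1), torch.randn(b, obs_dim), torch.zeros(b, 1))
```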