Evolutionary algorithms have been applied to a wide range of stochastic problems. Motivated by real-world problems where constraint violations have disruptive effects, this paper considers the chance-constrained knapsack problem (CCKP), a variant of the binary knapsack problem. The problem aims to maximize the profit of the selected items under the constraint that the knapsack capacity bound is violated only with a small probability. To tackle the chance constraint, we show how to construct surrogate functions by applying well-known deviation inequalities such as Chebyshev's inequality and Chernoff bounds. Furthermore, we investigate the performance of several deterministic approaches and introduce a single-objective and a multi-objective evolutionary algorithm to solve the CCKP. In the experimental section, we evaluate and compare the deterministic approaches and the evolutionary algorithms on a wide range of instances. Our experimental results show that the multi-objective evolutionary algorithm outperforms its single-objective counterpart on all instances and performs better than the deterministic approaches in terms of computation time. Furthermore, our investigation points out in which circumstances to favour Chebyshev's inequality or the Chernoff bound when dealing with the CCKP.
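To make the surrogate idea concrete, here is a minimal sketch of how such deviation-inequality surrogates can serve as deterministic feasibility checks, assuming independent item weights with known means and variances (Cantelli's one-sided Chebyshev inequality) or known bounded supports (a Hoeffding-type Chernoff bound); the exact surrogates used in the paper may differ in their constants and assumptions.

```python
import numpy as np

def cantelli_surrogate(x, mu, var, C, alpha):
    """Feasibility check via the one-sided Chebyshev (Cantelli) inequality.

    P(W > C) <= alpha is guaranteed whenever
        E[W] + sqrt((1 - alpha) / alpha) * std(W) <= C,
    assuming independent item weights with means mu and variances var.
    x is the binary selection vector of the knapsack solution.
    """
    m = np.dot(x, mu)
    s = np.sqrt(np.dot(x, var))
    return m + np.sqrt((1 - alpha) / alpha) * s <= C

def chernoff_surrogate(x, mu, half_width, C, alpha):
    """Feasibility check via a Chernoff/Hoeffding-type tail bound,
    assuming each weight lies in [mu_i - half_width_i, mu_i + half_width_i]."""
    m = np.dot(x, mu)
    spread = np.dot(x, (2 * half_width) ** 2)  # sum of (b_i - a_i)^2
    return m + np.sqrt(np.log(1 / alpha) * spread / 2) <= C
```

Either check can then act as the constraint handler inside the fitness evaluation of the evolutionary algorithms.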
In this paper, we consider a well-known sparse optimization problem that aims to find a sparse solution of a possibly noisy underdetermined system of linear equations. Mathematically, it can be modeled in a unified manner by minimizing $\|\mathbf{x}\|_p^p$ subject to $\|A\mathbf{x}-\mathbf{b}\|_q\leq\sigma$ for given $A \in \mathbb{R}^{m \times n}$, $\mathbf{b}\in\mathbb{R}^m$, $\sigma \geq 0$, $0\leq p\leq 1$ and $q \geq 1$. We then study various properties of the optimal solutions of this problem. Specifically, without any condition on the matrix $A$, we provide upper bounds on the cardinality and infinity norm of the optimal solutions, and show that all optimal solutions must lie on the boundary of the feasible set when $0<p<1$. Moreover, for $q \in \{1,\infty\}$, we show that the problem with $0<p<1$ has a finite number of optimal solutions, and prove that there exists $0<p^*<1$ such that the solution set of the problem with any $0<p<p^*$ is contained in the solution set of the problem with $p=0$, and that there further exists $0<\bar{p}<p^*$ such that the solution set of the problem with any $0<p\leq\bar{p}$ remains unchanged. An estimate of such a $p^*$ is also provided. In addition, to solve the constrained nonconvex non-Lipschitz $L_p$-$L_1$ problem ($0<p<1$ and $q=1$), we propose a smoothing penalty method and show that, under some mild conditions, any cluster point of the generated sequence is a KKT point of our problem. Some numerical examples are given to illustrate the theoretical results and to show the efficiency of the proposed algorithm for the constrained $L_p$-$L_1$ problem under different types of noise.
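As a rough illustration of how a smoothing penalty scheme for the constrained $L_p$-$L_1$ problem might be organized, the following sketch smooths the non-Lipschitz term $|x_i|^p$ as $(x_i^2+\varepsilon^2)^{p/2}$ and penalizes constraint violation, driving $\varepsilon$ to zero while growing the penalty weight. The update rule, step sizes, and parameter schedules here are placeholders, not the authors' method.

```python
import numpy as np

def smoothing_penalty_lp_l1(A, b, sigma, p=0.5, mu=1.0, eps=1.0,
                            outer=20, inner=500, lr=1e-3):
    """Illustrative smoothing penalty scheme for
        min ||x||_p^p  s.t.  ||A x - b||_1 <= sigma,   0 < p < 1.

    The non-Lipschitz |x_i|^p is smoothed as (x_i^2 + eps^2)^{p/2} and the
    constraint enters through a penalty term mu * max(||Ax - b||_1 - sigma, 0);
    eps shrinks and mu grows across the outer iterations. Generic sketch only.
    """
    x = np.zeros(A.shape[1])
    for _ in range(outer):
        for _ in range(inner):
            r = A @ x - b
            # subgradient of the penalty, active only when infeasible
            g_pen = (mu * (A.T @ np.sign(r))
                     if np.abs(r).sum() > sigma else np.zeros_like(x))
            # gradient of the smoothed objective sum_i (x_i^2 + eps^2)^{p/2}
            g_obj = p * x * (x ** 2 + eps ** 2) ** (p / 2 - 1)
            x -= lr * (g_obj + g_pen)
        eps *= 0.5   # tighten the smoothing
        mu *= 2.0    # strengthen the penalty
    return x
```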
Science and engineering problems subject to uncertainty are frequently both computationally expensive and feature nonsmooth parameter dependence, making standard Monte Carlo too slow and excluding the efficient use of accelerated uncertainty quantification methods that rely on strict smoothness assumptions. To remedy these challenges, we propose an adaptive stratification method that is suitable for nonsmooth problems and achieves significantly reduced variance compared to Monte Carlo sampling. The stratification is iteratively refined and samples are added sequentially to satisfy an allocation criterion combining the benefits of proportional and optimal sampling. Theoretical estimates are provided for the expected performance and for the probability of failing to correctly estimate essential statistics. We devise a practical adaptive stratification method whose strata share the same kind of geometrical shape and whose cost-effective refinement satisfies a greedy variance reduction criterion. Numerical experiments corroborate the theoretical findings and exhibit speedups of up to three orders of magnitude compared to standard Monte Carlo sampling.
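The allocation criterion can be pictured with a small sketch: for a fixed stratification of $[0,1]$, samples are allocated by blending proportional allocation with Neyman (optimal) allocation estimated from pilot samples. The interpolation parameter theta, the pilot size, and the one-dimensional setting are illustrative assumptions; the paper's method additionally refines the strata adaptively.

```python
import numpy as np

def stratified_estimate(f, edges, n_total, theta=0.5, rng=None):
    """Stratified Monte Carlo estimate of E[f(U)], U ~ Uniform[0, 1].

    Allocation blends proportional weights (p_k) with Neyman/optimal
    weights (p_k * s_k), where s_k is a pilot estimate of the per-stratum
    standard deviation; theta in [0, 1] interpolates between the two.
    Rounding means roughly n_total samples are used in total.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    K = len(edges) - 1
    p = np.diff(edges)                                  # stratum probabilities
    s = np.array([np.std(f(rng.uniform(edges[k], edges[k + 1], 30)))
                  for k in range(K)])                   # pilot std estimates
    w = (1 - theta) * p + theta * p * s / max((p * s).sum(), 1e-12)
    n = np.maximum(1, np.round(n_total * w / w.sum())).astype(int)
    means = np.array([np.mean(f(rng.uniform(edges[k], edges[k + 1], n[k])))
                      for k in range(K)])
    return np.dot(p, means)

# e.g. a nonsmooth integrand:
# stratified_estimate(lambda u: np.abs(u - 0.3), np.linspace(0, 1, 9), 1000)
```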
We propose a novel Riemannian method for solving the extreme multi-label classification problem that exploits the geometric structure of sparse low-dimensional local embedding models. A constrained optimization problem is formulated as an optimization problem on a matrix manifold and solved using a Riemannian optimization method. The proposed approach is tested on several real-world large-scale multi-label datasets and its usefulness is demonstrated through numerical experiments. The numerical experiments suggest that the proposed method is the fastest to train and has the smallest model size among the embedding-based methods. An outline of the convergence proof for the proposed Riemannian optimization method is also given.
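For background, a generic Riemannian gradient descent step on a matrix manifold looks as follows, shown here on the Stiefel manifold with a QR-based retraction; the specific manifold, metric, and solver used in the paper may differ.

```python
import numpy as np

def riemannian_gd_stiefel(grad_f, X, steps=100, lr=1e-2):
    """Generic Riemannian gradient descent on the Stiefel manifold
    St(n, k) = {X in R^{n x k} : X^T X = I_k}.

    Each step projects the Euclidean gradient onto the tangent space at X
    and retracts back to the manifold via a QR decomposition. A minimal
    sketch of the manifold-optimization machinery, not the paper's solver.
    """
    for _ in range(steps):
        G = grad_f(X)                                  # Euclidean gradient
        sym = (X.T @ G + G.T @ X) / 2
        rgrad = G - X @ sym                            # tangent projection
        Q, R = np.linalg.qr(X - lr * rgrad)            # QR retraction
        X = Q * np.sign(np.diag(R))                    # fix sign ambiguity
    return X
```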
Transfer learning is a machine learning paradigm where knowledge from one problem is utilized to solve a new but related problem. While it is conceivable that knowledge from one task could be useful for solving a related task, transfer learning algorithms that are not executed properly can impair the learning performance instead of improving it -- a phenomenon commonly known as negative transfer. In this paper, we study transfer learning from a Bayesian perspective, where a parametric statistical model is used. Specifically, we study three variants of the transfer learning problem: instantaneous, online, and time-variant transfer learning. For each problem, we define an appropriate objective function and provide either exact expressions or upper bounds on the learning performance using information-theoretic quantities, which allow simple and explicit characterizations when the sample size becomes large. Furthermore, examples show that the derived bounds are accurate even for small sample sizes. The obtained bounds give valuable insights into the effect of prior knowledge on transfer learning, at least with respect to our Bayesian formulation of the transfer learning problem. In particular, we formally characterize the conditions under which negative transfer occurs. Lastly, we devise two (online) transfer learning algorithms that are amenable to practical implementation, one of which does not require the parametric assumption. We demonstrate the effectiveness of our algorithms with real data sets, focusing primarily on the case where the source and target data have strong similarities.
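A toy conjugate-Gaussian example makes the Bayesian transfer mechanism explicit: the posterior computed from the source data is reused as the prior for the target data. All model choices here (a shared scalar mean, known noise variance) are illustrative assumptions, not the paper's general setting.

```python
import numpy as np

def gaussian_transfer_posterior(src, tgt, prior_mu=0.0, prior_var=10.0,
                                noise_var=1.0):
    """Bayesian transfer for a shared Gaussian mean (toy example).

    Conjugate normal-normal updates: learn a posterior from the source
    data, then use it as the prior when updating on the target data.
    Returns the target posterior mean and variance.
    """
    def update(mu, var, data):
        n = len(data)
        post_var = 1.0 / (1.0 / var + n / noise_var)
        post_mu = post_var * (mu / var + np.sum(data) / noise_var)
        return post_mu, post_var

    mu, var = update(prior_mu, prior_var, src)   # learn from source
    return update(mu, var, tgt)                  # transfer to target
```

If the source and target means actually differ, the transferred prior biases the target estimate, which is exactly the negative-transfer regime the bounds characterize.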
We present the first formal verification of approximation algorithms for NP-complete optimization problems: vertex cover, independent set, set cover, center selection, load balancing, and bin packing. We uncover gaps in existing proofs and improve the approximation ratio in one case. All proofs are uniformly based on invariants.
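For context, one of the algorithms in this list is the classic maximal-matching 2-approximation for vertex cover; the executable sketch below illustrates the algorithm and the invariant behind its ratio, but it is not the formal artifact verified in the paper.

```python
def vertex_cover_2approx(edges):
    """Maximal-matching 2-approximation for vertex cover.

    Both endpoints of each greedily matched edge enter the cover.
    Invariant behind the ratio: the matched edges are pairwise disjoint
    and any cover must contain at least one endpoint of each of them,
    so the returned cover has at most twice the optimum size.
    """
    cover, matched = set(), set()
    for u, v in edges:
        if u not in matched and v not in matched:
            matched |= {u, v}
            cover |= {u, v}
    return cover

# vertex_cover_2approx([(1, 2), (2, 3), (3, 4)]) -> {1, 2, 3, 4}
# (twice the optimum {2, 3}, consistent with the ratio bound)
```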
In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification), it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although `supervised term weighting' approaches that rely on this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure and call some well-established assumptions into question. Following this criticism, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach \emph{Learning to Weight} (LTW). The experiments that we run on several well-known benchmarks, using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
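For contrast with the learned approach, here is a sketch of the kind of hand-crafted supervised term weighting that LTW is designed to replace: the collection-level factor of tf-idf is swapped for a term-class association statistic (chi-square, as one common choice among several).

```python
import numpy as np

def supervised_weights(tf, y):
    """Predefined-formula supervised term weighting (illustrative baseline).

    Each term's collection-level factor is its chi-square association with
    the positive class, and the final weight is tf * chi2 (cf. tf * idf).
    tf: (n_docs, n_terms) count matrix; y: binary labels per document.
    Not the LTW method itself, which learns this function from data.
    """
    present = tf > 0
    N = len(y)
    pos = y == 1
    A = present[pos].sum(0)            # term present, class positive
    B = present[~pos].sum(0)           # term present, class negative
    C = pos.sum() - A                  # term absent, class positive
    D = (~pos).sum() - B               # term absent, class negative
    chi2 = N * (A * D - B * C).astype(float) ** 2 / np.maximum(
        (A + B) * (C + D) * (A + C) * (B + D), 1)
    return tf * chi2                   # supervised analogue of tf-idf
```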
Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample efficient, as related concepts in the graph structure share statistical strength, allowing generalization to new classes in the face of a lack of data. However, multi-layer architectures, which are required to propagate knowledge to distant nodes in the graph, dilute the knowledge by performing extensive Laplacian smoothing at each layer and thereby decrease performance. In order to still enjoy the benefits brought by the graph structure while preventing the dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. DGP allows us to exploit the hierarchical structure of the knowledge graph through additional connections. These connections are added based on a node's relationship to its ancestors and descendants. A weighting scheme is further used to weigh their contribution depending on the distance to the node, in order to improve information propagation in the graph. Combined with finetuning of the representations in a two-stage training approach, our method outperforms state-of-the-art zero-shot learning approaches.
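A simplified propagation step conveys the idea: each node aggregates features directly from all of its ancestors and descendants with per-distance weights, rather than through repeated smoothing layers. The interfaces and normalization below are hypothetical simplifications of the DGP module.

```python
import numpy as np

def dense_propagation(H, ancestors, descendants, w_anc, w_desc):
    """One dense-propagation step in the spirit of DGP.

    H: (n_nodes, n_features) feature matrix.
    ancestors[i] / descendants[i]: lists of (node, distance) pairs giving
    the direct links added for node i. w_anc[d] / w_desc[d]: per-distance
    weights (clipped at the last entry). Hypothetical simplification:
    a normalized weighted sum including a self-connection.
    """
    out = np.zeros_like(H)
    for i in range(len(H)):
        neigh = [(j, w_anc[min(d, len(w_anc) - 1)]) for j, d in ancestors[i]]
        neigh += [(j, w_desc[min(d, len(w_desc) - 1)]) for j, d in descendants[i]]
        neigh.append((i, 1.0))                             # self-connection
        total = sum(w for _, w in neigh)
        out[i] = sum(w * H[j] for j, w in neigh) / total   # normalized sum
    return out
```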
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e., when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic Cox-Ingersoll-Ross (CIR) process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
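The appeal of the CIR process is that its transition density is a scaled noncentral chi-square distribution, so a step of any size can be sampled exactly. The sketch below shows such an exact step; how this step is embedded in the SGMCMC algorithm, and the stochastic-gradient aspects, are beyond this illustration.

```python
import numpy as np

def cir_exact_step(x, dt, a, b, sigma, rng=None):
    """Exact transition sampling for the CIR process
        dX = a (b - X) dt + sigma sqrt(X) dW,
    using its known noncentral chi-square transition law. No Euler-type
    discretisation is involved, hence no discretisation bias: this is
    why a CIR step can be simulated without time-discretisation error.
    """
    if rng is None:
        rng = np.random.default_rng()
    c = sigma ** 2 * (1 - np.exp(-a * dt)) / (4 * a)
    df = 4 * a * b / sigma ** 2              # degrees of freedom
    nc = x * np.exp(-a * dt) / c             # noncentrality parameter
    return c * rng.noncentral_chisquare(df, nc)
```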
We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On the capacitated VRP, our approach outperforms classical heuristics and Google's OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split deliveries and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP, such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
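To illustrate the policy-gradient ingredient on a toy scale, the sketch below performs one REINFORCE update for a simple route-construction policy with a single learned score per node; the paper's model is a far richer attention-based architecture, so everything here (the policy parameterization, baseline, and learning rate) is a placeholder.

```python
import numpy as np

def reinforce_step(coords, theta, rng, lr=0.01, baseline=0.0):
    """One REINFORCE update for a toy route-construction policy.

    At each step the next unvisited node j is sampled with probability
    softmax(theta[j] - distance(current, j)); theta (one learned score
    per node) is updated in place with the policy-gradient rule, using
    the negative tour length as the reward signal.
    """
    n = len(coords)
    grad = np.zeros_like(theta)
    remaining = list(range(1, n))
    tour, cur, length = [0], 0, 0.0
    while remaining:
        dist = np.array([np.linalg.norm(coords[cur] - coords[j])
                         for j in remaining])
        logits = theta[remaining] - dist
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        k = rng.choice(len(remaining), p=probs)
        g = -probs                      # grad of log softmax w.r.t. logits
        g[k] += 1.0
        grad[remaining] += g            # logits are linear in theta
        length += dist[k]
        cur = remaining.pop(k)
        tour.append(cur)
    length += np.linalg.norm(coords[cur] - coords[0])  # close the tour
    theta += lr * (-length - baseline) * grad          # REINFORCE update
    return tour, length
```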
This paper proposes a Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP) such that a linear-time property is satisfied. We convert the property into a Limit-Deterministic Büchi Automaton (LDBA) and then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP according to the accepting condition of the LDBA. With this reward function, our algorithm synthesizes a policy that satisfies the linear-time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.
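A compact sketch of the product construction and acceptance-based reward, with hypothetical interfaces for the MDP and LDBA (both assumed to have total transition functions); the paper's reward assignment may differ in its details.

```python
def product_mdp(mdp_trans, mdp_labels, ldba_delta, ldba_init, accepting,
                r_accept=1.0):
    """Product of an MDP and an LDBA with an acceptance-based reward.

    Hypothetical interfaces (assumed total):
      mdp_trans[s][a]        -> list of (s_next, prob)
      mdp_labels[s]          -> label of MDP state s
      ldba_delta[(q, label)] -> next automaton state
    A product state (s, q) earns r_accept iff q is accepting, 0 otherwise.
    """
    q_states = ({ldba_init} | {q for q, _ in ldba_delta}
                | set(ldba_delta.values()))
    trans, reward = {}, {}
    for s in mdp_trans:
        for q in q_states:
            reward[(s, q)] = r_accept if q in accepting else 0.0
            for a, succ in mdp_trans[s].items():
                # the automaton moves on the label of the successor state
                trans[((s, q), a)] = [((s2, ldba_delta[(q, mdp_labels[s2])]), p)
                                      for s2, p in succ]
    return trans, reward
```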