A hybrid framework combining the branch and bound method with multiobjective evolutionary algorithms is proposed for nonconvex multiobjective optimization. The hybridization exploits the complementary character of the two optimization strategies. A multiobjective evolutionary algorithm is used to induce tight lower and upper bounds during the branch and bound procedure. Tight bounds such as the ones derived in this way can reduce the number of subproblems that have to be solved. The branch and bound method guarantees the global convergence of the framework and improves the search capability of the multiobjective evolutionary algorithm. An implementation of the hybrid framework with NSGA-II and MOEA/D-DE as the multiobjective evolutionary algorithms is presented. Numerical experiments verify that the hybrid algorithms benefit from the synergy of the branch and bound method and multiobjective evolutionary algorithms.
In recent years, many connections have been made between minimal codes, a classical object in coding theory, and other remarkable structures in finite geometry and combinatorics. One of the main problems related to minimal codes is to give lower and upper bounds on the length $m(k,q)$ of the shortest minimal codes of a given dimension $k$ over the finite field $\mathbb{F}_q$. It has been recently proved that $m(k, q) \geq (q+1)(k-1)$. In this note, we prove that $\liminf_{k \rightarrow \infty} \frac{m(k, q)}{k} \geq q + \varepsilon(q)$, where $\varepsilon$ is an increasing function such that $1.52 < \varepsilon(2) \leq \varepsilon(q) \leq \sqrt{2} + \frac{1}{2}$. Hence, the previously known lower bound is not tight for large enough $k$. We then focus on the binary case and prove some structural results on minimal codes of length $3(k-1)$. As a byproduct, we are able to show that, for $k \equiv 5 \pmod 8$ and for other small values of $k$, the bound is not tight.
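The step from the asymptotic estimate to the non-tightness claim can be spelled out as follows. This is a short worked argument using only the quantities stated above; it is a sketch of the reasoning, not the authors' proof. Suppose the bound were attained, i.e. $m(k,q) = (q+1)(k-1)$, for infinitely many $k$. Then

\[
\liminf_{k \rightarrow \infty} \frac{m(k,q)}{k} \;\leq\; \lim_{k \rightarrow \infty} \frac{(q+1)(k-1)}{k} \;=\; q + 1 \;<\; q + 1.52 \;<\; q + \varepsilon(q),
\]

which contradicts $\liminf_{k \rightarrow \infty} \frac{m(k,q)}{k} \geq q + \varepsilon(q)$. Hence equality can hold for at most finitely many $k$.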
Bayesian inference with nested sampling requires a likelihood-restricted prior sampling method, which draws samples from the prior distribution that exceed a likelihood threshold. For high-dimensional problems, Markov Chain Monte Carlo derivatives have been proposed. We numerically study ten algorithms based on slice sampling, hit-and-run and differential evolution algorithms in ellipsoidal, non-ellipsoidal and non-convex problems from 2 to 100 dimensions. Mixing capabilities are evaluated with the nested sampling shrinkage test. This makes our results valid independent of how heavy-tailed the posteriors are. Given the same number of steps, slice sampling is outperformed by hit-and-run and whitened slice sampling, while whitened hit-and-run does not perform as well. Proposing along differential vectors of live point pairs also leads to the highest efficiencies, and appears promising for multi-modal problems. The tested proposals are implemented in the UltraNest nested sampling package, enabling efficient low- and high-dimensional inference for a large class of practical inference problems relevant to astronomy, cosmology and particle physics.
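To make likelihood-restricted prior sampling concrete, a minimal Python sketch of one hit-and-run slice step is given here. It assumes a uniform prior on the unit cube and a toy Gaussian likelihood; the names `loglike`, `hit_and_run_step` and `width` are hypothetical, the stepping-out phase of slice sampling is omitted, and this is not the UltraNest implementation.

```python
import numpy as np

def loglike(x):
    # Hypothetical toy likelihood: isotropic Gaussian centred in the unit cube.
    return -0.5 * np.sum(((x - 0.5) / 0.1) ** 2)

def hit_and_run_step(x, logl_min, rng, width=1.0):
    """One slice-sampling step along a random direction, restricted to the
    unit cube (assumed prior support) and to loglike > logl_min.
    The starting point x must itself satisfy both constraints."""
    # Draw a direction uniformly on the unit sphere.
    d = rng.normal(size=x.size)
    d /= np.linalg.norm(d)
    # Place an interval of length `width` randomly around the current point (t = 0).
    left = -width * rng.uniform()
    right = left + width
    while True:
        t = rng.uniform(left, right)
        y = x + t * d
        inside = np.all((y >= 0.0) & (y <= 1.0))
        if inside and loglike(y) > logl_min:
            return y
        # Rejected: shrink the interval towards the current point.
        if t < 0:
            left = t
        else:
            right = t
```

Repeating this step several times from a live point yields a new point that is approximately independent of the starting point while still exceeding the likelihood threshold; how many steps are needed is exactly the mixing question the study addresses.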
When estimating a Global Average Treatment Effect (GATE) under network interference, units can have widely different relationships to the treatment depending on a combination of the structure of their network neighborhood, the structure of the interference mechanism, and how the treatment was distributed in their neighborhood. In this work, we introduce a sequential procedure to generate and select graph- and treatment-based covariates for GATE estimation under regression adjustment. We show that it is possible to simultaneously achieve low bias and considerably reduce variance with such a procedure. To tackle inferential complications caused by our feature generation and selection process, we introduce a way to construct confidence intervals based on a block bootstrap. We illustrate that our selection procedure and subsequent estimator can achieve good performance in terms of root mean squared error in several semi-synthetic experiments with Bernoulli designs, comparing favorably to an oracle estimator that takes advantage of regression adjustments for the known underlying interference structure. We apply our method to a real-world experimental dataset with strong evidence of interference and demonstrate that it can estimate the GATE reasonably well without knowing the interference process a priori.
In the usual Bayesian setting, a full probabilistic model is required to link the data and parameters, and the form of this model and the inference and prediction mechanisms are specified via de Finetti's representation. In general, such a formulation is not robust to model mis-specification of its component parts. An alternative approach is to draw inference based on loss functions, where the quantity of interest is defined as a minimizer of some expected loss, and to construct posterior distributions based on the loss-based formulation; this strategy underpins the construction of the Gibbs posterior. We develop a Bayesian non-parametric approach; specifically, we generalize the Bayesian bootstrap, and specify a Dirichlet process model for the distribution of the observables. We implement this using direct prior-to-posterior calculations, but also using predictive sampling. We also study the assessment of posterior validity for non-standard Bayesian calculations, and provide an efficient way to calibrate the scaling parameter in the Gibbs posterior so that it can achieve the desired coverage rate. We show that the developed non-standard Bayesian updating procedures yield valid posterior distributions in terms of consistency and asymptotic normality under model mis-specification. Simulation studies show that the proposed methods can recover the true value of the parameter efficiently and achieve frequentist coverage even when the sample size is small. Finally, we apply our methods to evaluate the causal impact of speed cameras on traffic collisions in England.
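The loss-based view can be illustrated with the standard Bayesian bootstrap, which the approach above generalizes. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it uses the absolute-error loss (whose minimizer is the median), flat Dirichlet(1, ..., 1) weights on the observations, and the hypothetical helper names `weighted_median` and `bayesian_bootstrap`.

```python
import numpy as np

def weighted_median(y, w):
    """Minimiser of sum_i w_i * |y_i - theta| over theta (weights must be positive)."""
    order = np.argsort(y)
    cum = np.cumsum(w[order])
    # First sorted observation whose cumulative weight reaches half the total.
    return y[order][np.searchsorted(cum, 0.5 * cum[-1])]

def bayesian_bootstrap(y, n_draws, rng):
    """Posterior draws of the loss minimiser under Dirichlet(1, ..., 1) weights."""
    draws = np.empty(n_draws)
    for b in range(n_draws):
        # Each draw re-weights the observed data instead of resampling it.
        w = rng.dirichlet(np.ones(y.size))
        draws[b] = weighted_median(y, w)
    return draws
```

Each draw solves the weighted loss-minimization problem once, so the spread of `draws` reflects uncertainty about the minimizer without specifying a likelihood for the data.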
Optimization problems involving mixed variables, i.e., variables of numerical and categorical nature, can be challenging to solve, especially in the presence of complex constraints. Moreover, when the objective function is the result of a simulation or experiment, it may be expensive to evaluate. In this paper, we propose a novel surrogate-based global optimization algorithm, called PWAS, based on constructing a piecewise affine surrogate of the objective function over feasible samples. We introduce two types of exploration functions to efficiently search the feasible domain via mixed integer linear programming (MILP) solvers. We also provide a preference-based version of the algorithm, called PWASp, which can be used when only pairwise comparisons between samples can be acquired while the objective function remains unquantified. PWAS and PWASp are tested on mixed-variable benchmark problems with and without constraints. The results show that, within a small number of acquisitions, PWAS and PWASp can often achieve results that are better than or comparable to those of other existing methods.
A methodology is presented for the numerical solution of nonlinear elliptic systems in unbounded domains, consisting of three elements. First, the problem is posed on a finite domain by means of a proper nonlinear change of variables. The compressed domain is then discretised, regardless of its final shape, via the radial basis function partition of unity method. Finally, the system of nonlinear algebraic collocation equations is solved with the trust-region algorithm, taking advantage of analytically derived Jacobians. We validate the methodology on a benchmark of computational fluid mechanics: the steady viscous flow past a circular cylinder. The resulting flow characteristics compare very well with the literature. Then, we stress-test the methodology on less smooth obstacles, namely rounded and sharp square cylinders. As expected, in the latter scenario the solution is polluted by spurious oscillations, owing to the presence of boundary singularities.
We revisit the classic problem of optimal subset selection in the online learning set-up. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t$th round, an adversary chooses a monotone reward function $f_t: 2^{[N]} \to \mathbb{R}_+$ that assigns a non-negative reward to each subset of $[N].$ An online policy selects (perhaps randomly) a subset $S_t \subseteq [N]$ consisting of $k$ elements before the reward function $f_t$ for the $t$th round is revealed to the learner. As a consequence of its choice, the policy receives a reward of $f_t(S_t)$ on the $t$th round. Our goal is to design an online sequential subset selection policy to maximize the expected cumulative reward accumulated over a time horizon. In this connection, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new polyhedral characterization of the reward functions called $\alpha$-Core, a generalization of Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called $\alpha$-augmented regret. In this new metric, the performance of the online policy is compared with an unrestricted offline benchmark that can select all $N$ elements at every round. We show that a large class of reward functions, including submodular functions, can be efficiently optimized with the SCore policy. We also extend the proposed policy to the optimistic learning set-up where the learner has access to additional untrusted hints regarding the reward functions. Finally, we conclude the paper with a list of open problems.
Given a boolean predicate $\Pi$ on labeled networks (e.g., proper coloring, leader election, etc.), a self-stabilizing algorithm for $\Pi$ is a distributed algorithm that can start from any initial configuration of the network (i.e., every node has an arbitrary value assigned to each of its variables), and eventually converge to a configuration satisfying $\Pi$. It is known that leader election does not have a deterministic self-stabilizing algorithm using a constant-size register at each node, i.e., for some networks, some of their nodes must have registers whose sizes grow with the size $n$ of the networks. On the other hand, it is also known that leader election can be solved by a deterministic self-stabilizing algorithm using registers of $O(\log \log n)$ bits per node in any $n$-node bounded-degree network. We show that this latter space complexity is optimal. Specifically, we prove that every deterministic self-stabilizing algorithm solving leader election must use registers of $\Omega(\log \log n)$ bits per node in some $n$-node networks. In addition, we show that our lower bounds go beyond leader election, and apply to all problems that cannot be solved by anonymous algorithms.
We investigate the connections between sparse approximation methods for making kernel methods and Gaussian processes (GPs) scalable to large-scale data, focusing on the Nystr\"om method and the Sparse Variational Gaussian Process (SVGP). While sparse approximation methods for GPs and kernel methods share some algebraic similarities, the literature lacks a deep understanding of how and why they are related. This may pose an obstacle to communication between the GP and kernel communities, making it difficult to transfer results from one side to the other. Our motivation is to remove this obstacle by clarifying the connections between the sparse approximations for GPs and kernel methods. In this work, we study the two popular approaches, the Nystr\"om and SVGP approximations, in the context of a regression problem, and establish various connections and equivalences between them. In particular, we provide an RKHS interpretation of the SVGP approximation, and show that the Evidence Lower Bound of the SVGP contains the objective function of the Nystr\"om approximation, revealing the origin of the algebraic equivalence between the two approaches. We also study recently established convergence results for the SVGP and how they are related to the approximation quality of the Nystr\"om method.
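For readers less familiar with the Nystr\"om method, its basic low-rank construction can be sketched in a few lines of Python. The sketch assumes a squared-exponential kernel and a small diagonal jitter for numerical stability; the names `rbf_kernel`, `nystrom`, `lengthscale` and `jitter` are hypothetical, and it illustrates only the kernel-matrix approximation, not the regression setting or the SVGP objective studied above.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def nystrom(x, z, jitter=1e-8):
    """Low-rank approximation K_xz K_zz^{-1} K_zx of the kernel matrix K_xx,
    built from the landmark (inducing) points z."""
    k_xz = rbf_kernel(x, z)
    k_zz = rbf_kernel(z, z) + jitter * np.eye(z.shape[0])
    return k_xz @ np.linalg.solve(k_zz, k_xz.T)
```

With $m$ landmark points and $n$ data points, the approximation has rank at most $m$ and reproduces the kernel matrix on the landmarks themselves up to the jitter.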
Differentially private optimization for nonconvex smooth objectives is considered. In previous work, the best known utility bound is $\widetilde O(\sqrt{d}/(n\varepsilon_\mathrm{DP}))$ in terms of the squared full gradient norm, which is achieved, for instance, by Differential Private Gradient Descent (DP-GD), where $n$ is the sample size, $d$ is the problem dimensionality and $\varepsilon_\mathrm{DP}$ is the differential privacy parameter. To improve the best known utility bound, we propose a new differentially private optimization framework called \emph{DIFF2 (DIFFerential private optimization via gradient DIFFerences)} that constructs a differentially private global gradient estimator with possibly quite small variance based on communicated \emph{gradient differences} rather than gradients themselves. It is shown that DIFF2 with a gradient descent subroutine achieves the utility of $\widetilde O(d^{2/3}/(n\varepsilon_\mathrm{DP})^{4/3})$, which can be significantly better than the previous one in terms of the dependence on the sample size $n$. To the best of our knowledge, this is the first fundamental result to improve the standard utility $\widetilde O(\sqrt{d}/(n\varepsilon_\mathrm{DP}))$ for nonconvex objectives. Additionally, a more computation- and communication-efficient subroutine is combined with DIFF2, and its theoretical analysis is also given. Numerical experiments are conducted to validate the superiority of the DIFF2 framework.
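The regime in which the new rate improves on the previous one follows from a direct comparison of the two bounds. Ignoring the constants and logarithmic factors hidden by $\widetilde O$ (an assumption of this sketch),

\[
\frac{d^{2/3}}{(n\varepsilon_\mathrm{DP})^{4/3}} \;\leq\; \frac{\sqrt{d}}{n\varepsilon_\mathrm{DP}}
\quad\Longleftrightarrow\quad
d^{1/6} \;\leq\; (n\varepsilon_\mathrm{DP})^{1/3}
\quad\Longleftrightarrow\quad
n\varepsilon_\mathrm{DP} \;\geq\; \sqrt{d},
\]

so the DIFF2 rate is the smaller of the two whenever $n\varepsilon_\mathrm{DP} \geq \sqrt{d}$, and the ratio between the two rates, $\left(\sqrt{d}/(n\varepsilon_\mathrm{DP})\right)^{1/3}$, decreases as $n$ grows.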