Our research deals with the optimization version of the set partition problem, where the objective is to minimize the absolute difference between the sums of the two disjoint partitions. Although this problem is known to be NP-hard and requires exponential time to solve, we propose a less demanding version of this problem where the goal is to find a locally optimal solution. In our approach, we consider the local optimality in respect to any movement of at most two elements. To accomplish this, we developed an algorithm that can generate a locally optimal solution in at most $O(N^2)$ time and $O(N)$ space. Our algorithm can handle arbitrary input precisions and does not require positive or integer inputs. Hence, it can be applied in various problem scenarios with ease.
Bayesian variable selection methods are powerful techniques for fitting and inferring on sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. In this paper, we proposed a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates of hyperparameters. Efficient maximum a posteriori (MAP) estimation is completed through a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm. The PX-ECM results in a robust computationally efficient coordinate-wise optimization, which adjusts for the impact of other predictor variables. The completion of the E-step uses an approach motivated by the popular two-groups approach to multiple testing. The result is a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm applied to sparse high-dimensional linear regression, which can be completed using one-at-a-time or all-at-once type optimization. We compare the empirical properties of PROBE to comparable approaches with numerous simulation studies and an analysis of cancer cell lines drug response study. The proposed approach is implemented in the R package probe.
Given a matroid $M=(E,{\cal I})$, and a total ordering over the elements $E$, a broken circuit is a circuit where the smallest element is removed and an NBC independent set is an independent set in ${\cal I}$ with no broken circuit. The set of NBC independent sets of any matroid $M$ define a simplicial complex called the broken circuit complex which has been the subject of intense study in combinatorics. Recently, Adiprasito, Huh and Katz showed that the face of numbers of any broken circuit complex form a log-concave sequence, proving a long-standing conjecture of Rota. We study counting and optimization problems on NBC bases of a generic matroid. We find several fundamental differences with the independent set complex: for example, we show that it is NP-hard to find the max-weight NBC base of a matroid or that the convex hull of NBC bases of a matroid has edges of arbitrary large length. We also give evidence that the natural down-up walk on the space of NBC bases of a matroid may not mix rapidly by showing that for some family of matroids it is NP-hard to count the number of NBC bases after certain conditionings.
It is not difficult to think of applications that can be modelled as graph problems in which placing some facility or commodity at a vertex has some positive or negative effect on the values of all the vertices out to some distance, and we want to be able to calculate quickly the cumulative effect on any vertex's value at any time or the list of the most beneficial or most detrimential effects on a vertex. In this paper we show how, given an edge-weighted graph with constant-size separators, we can support the following operations on it in time polylogarithmic in the number of vertices and the number of facilities placed on the vertices, where distances between vertices are measured with respect to the edge weights: Add (v, f, w, d) places a facility of weight w and with effect radius d onto vertex v. Remove (v, f) removes a facility f previously placed on v using Add from v. Sum (v) or Sum (v, d) returns the total weight of all facilities affecting v or, with a distance parameter d, the total weight of all facilities whose effect region intersects the ``circle'' with radius d around v. Top (v, k) or Top (v, k, d) returns the k facilities of greatest weight that affect v or, with a distance parameter d, whose effect region intersects the ``circle'' with radius d around v. The weights of the facilities and the operation that Sum uses to ``sum'' them must form a semigroup. For Top queries, the weights must be drawn from a total order.
Typical deep visual recognition models are capable of performing the one task they were trained on. In this paper, we tackle the extremely difficult problem of combining completely distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. Prior work in model merging permutes one model to the space of the other then adds them together. While this works for models trained on the same task, we find that this fails to account for the differences in models trained on disjoint tasks. Thus, we introduce "ZipIt!", a general method for merging two arbitrary models of the same architecture that incorporates two simple strategies. First, in order to account for features that aren't shared between models, we expand the model merging problem to additionally allow for merging features within each model by defining a general "zip" operation. Second, we add support for partially zipping the models up until a specified layer, naturally creating a multi-head model. We find that these two changes combined account for a staggering 20-60% improvement over prior work, making the merging of models trained on disjoint tasks feasible.
In this paper, we focus on the solution of online optimization problems that arise often in signal processing and machine learning, in which we have access to streaming sources of data. We discuss algorithms for online optimization based on the prediction-correction paradigm, both in the primal and dual space. In particular, we leverage the typical regularized least-squares structure appearing in many signal processing problems to propose a novel and tailored prediction strategy, which we call extrapolation-based. By using tools from operator theory, we then analyze the convergence of the proposed methods as applied both to primal and dual problems, deriving an explicit bound for the tracking error, that is, the distance from the time-varying optimal solution. We further discuss the empirical performance of the algorithm when applied to signal processing, machine learning, and robotics problems.
We describe a practical algorithm for computing normal forms for semigroups and monoids with finite presentations satisfying so-called small overlap conditions. Small overlap conditions are natural conditions on the relations in a presentation, which were introduced by J. H. Remmers and subsequently studied extensively by M. Kambites. Presentations satisfying these conditions are ubiquitous; Kambites showed that a randomly chosen finite presentation satisfies the $C(4)$ condition with probability tending to 1 as the sum of the lengths of relation words tends to infinity. Kambites also showed that several key problems for finitely presented semigroups and monoids are tractable in $C(4)$ monoids: the word problem is solvable in $O(\min\{|u|, |v|\})$ time in the size of the input words $u$ and $v$; the uniform word problem for $\langle A|R\rangle$ is solvable in $O(N ^ 2 \min\{|u|, |v|\})$ where $N$ is the sum of the lengths of the words in $R$; and a normal form for any given word $u$ can be found in $O(|u|)$ time. Although Kambites' algorithm for solving the word problem in $C(4)$ monoids is highly practical, it appears that the coefficients in the linear time algorithm for computing normal forms are too large in practice. In this paper, we present an algorithm for computing normal forms in $C(4)$ monoids that has time complexity $O(|u| ^ 2)$ for input word $u$, but where the coefficients are sufficiently small to allow for practical computation. Additionally, we show that the uniform word problem for small overlap monoids can be solved in $O(N \min\{|u|, |v|\})$ time.
Since its inception in Erikki Oja's seminal paper in 1982, Oja's algorithm has become an established method for streaming principle component analysis (PCA). We study the problem of streaming PCA, where the data-points are sampled from an irreducible, aperiodic, and reversible Markov chain. Our goal is to estimate the top eigenvector of the unknown covariance matrix of the stationary distribution. This setting has implications in situations where data can only be sampled from a Markov Chain Monte Carlo (MCMC) type algorithm, and the goal is to do inference for parameters of the stationary distribution of this chain. Most convergence guarantees for Oja's algorithm in the literature assume that the data-points are sampled IID. For data streams with Markovian dependence, one typically downsamples the data to get a "nearly" independent data stream. In this paper, we obtain the first sharp rate for Oja's algorithm on the entire data, where we remove the logarithmic dependence on $n$ resulting from throwing data away in downsampling strategies.
For solving combinatorial optimisation problems with metaheuristics, different search operators are applied for sampling new solutions in the neighbourhood of a given solution. It is important to understand the relationship between operators for various purposes, e.g., adaptively deciding when to use which operator to find optimal solutions efficiently. However, it is difficult to theoretically analyse this relationship, especially in the complex solution space of combinatorial optimisation problems. In this paper, we propose to empirically analyse the relationship between operators in terms of the correlation between their local optima and develop a measure for quantifying their relationship. The comprehensive analyses on a wide range of capacitated vehicle routing problem benchmark instances show that there is a consistent pattern in the correlation between commonly used operators. Based on this newly proposed local optima correlation metric, we propose a novel approach for adaptively selecting among the operators during the search process. The core intention is to improve search efficiency by preventing wasting computational resources on exploring neighbourhoods where the local optima have already been reached. Experiments on randomly generated instances and commonly used benchmark datasets are conducted. Results show that the proposed approach outperforms commonly used adaptive operator selection methods.
Bayesian optimization (BO) is a popular approach for sample-efficient optimization of black-box objective functions. While BO has been successfully applied to a wide range of scientific applications, traditional approaches to single-objective BO only seek to find a single best solution. This can be a significant limitation in situations where solutions may later turn out to be intractable. For example, a designed molecule may turn out to violate constraints that can only be reasonably evaluated after the optimization process has concluded. To address this issue, we propose Rank-Ordered Bayesian Optimization with Trust-regions (ROBOT) which aims to find a portfolio of high-performing solutions that are diverse according to a user-specified diversity metric. We evaluate ROBOT on several real-world applications and show that it can discover large sets of high-performing diverse solutions while requiring few additional function evaluations compared to finding a single best solution.
Given a graph $G$, an edge-coloring is an assignment of colors to edges of $G$ such that any two edges sharing an endpoint receive different colors. By Vizing's celebrated theorem, any graph of maximum degree $\Delta$ needs at least $\Delta$ and at most $(\Delta + 1)$ colors to be properly edge colored. In this paper, we study edge colorings in the streaming setting. The edges arrive one by one in an arbitrary order. The algorithm takes a single pass over the input and must output a solution using a much smaller space than the input size. Since the output of edge coloring is as large as its input, the assigned colors should also be reported in a streaming fashion. The streaming edge coloring problem has been studied in a series of works over the past few years. The main challenge is that the algorithm cannot "remember" all the color assignments that it returns. To ensure the validity of the solution, existing algorithms use many more colors than Vizing's bound. Namely, in $n$-vertex graphs, the state-of-the-art algorithm with $\widetilde{O}(n s)$ space requires $O(\Delta^2/s + \Delta)$ colors. Note, in particular, that for an asymptotically optimal $O(\Delta)$ coloring, this algorithm requires $\Omega(n\Delta)$ space which is as large as the input. Whether such a coloring can be achieved with sublinear space has been left open. In this paper, we answer this question in the affirmative. We present a randomized algorithm that returns an asymptotically optimal $O(\Delta)$ edge coloring using $\widetilde{O}(n \sqrt{\Delta})$ space. More generally, our algorithm returns a proper $O(\Delta^{1.5}/s + \Delta)$ edge coloring with $\widetilde{O}(n s)$ space, improving prior algorithms for the whole range of $s$.