This paper presents a Multiple Kernel Learning (MKL) framework for the Support Vector Machine (SVM) with the $(0, 1)$ loss function. KKT-like first-order optimality conditions are derived and then exploited to develop a fast ADMM algorithm for the resulting nonsmooth, nonconvex optimization problem. Numerical experiments on synthetic and real datasets show that the performance of our MKL-$L_{0/1}$-SVM is comparable with that of SimpleMKL, a leading approach developed by Rakotomamonjy, Bach, Canu, and Grandvalet [Journal of Machine Learning Research, vol.~9, pp.~2491--2521, 2008].
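For concreteness, here is a minimal sketch of the $L_{0/1}$-SVM objective as it is commonly written in this line of work (our notation, not a formula quoted from the paper; the multiple-kernel coupling is only indicated):
\[
\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} \ell_{0/1}\big(1 - y_i\,(w^\top \phi(x_i) + b)\big),
\qquad
\ell_{0/1}(t) = \begin{cases} 1, & t > 0,\\ 0, & t \le 0, \end{cases}
\]
so the loss simply counts margin violations; in the MKL setting the feature map $\phi$ is induced by a learned combination of base kernels, e.g.\ $K = \sum_k d_k K_k$ with $d_k \ge 0$.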
We introduce and analyse a family of hash and predicate functions that are more likely to produce collisions for small reducible configurations of vectors. These may offer practical improvements to lattice sieving for short vectors. In particular, in one asymptotic regime the family exhibits significantly different convergence behaviour from that of existing hash functions and predicates.
Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
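As a concrete sanity check of the easy direction (a standard computation, not taken from the paper): for a Gaussian prior $X \sim \mathcal N(0,\sigma^2)$ and $Z \sim \mathcal N(0,1)$,
\[
P_{X \mid Y=y} = \mathcal N\!\left(\frac{\sigma^2}{\sigma^2+1}\,y,\ \frac{\sigma^2}{\sigma^2+1}\right),
\]
so the conditional median coincides with the conditional mean $\sigma^2 y/(\sigma^2+1)$ and is linear in $y$; the contribution of the work is the converse, that linearity of the conditional median forces the prior to be Gaussian.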
In the Steiner point removal (SPR) problem, we are given a (weighted) graph $G$ and a subset $T$ of its vertices called terminals, and the goal is to compute a (weighted) graph $H$ on $T$ that is a minor of $G$, such that the distance between every pair of terminals is preserved to within a small multiplicative factor, called the stretch of $H$. It has been shown that on general graphs one can achieve stretch $O(\log |T|)$ [Filtser, 2018]. On the other hand, the best-known lower bound on the stretch is $8$ [Chan-Xia-Konjevod-Richa, 2006], which holds even for trees. In this work, we show an improved lower bound of $\tilde\Omega\big(\sqrt{\log |T|}\big)$.
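In symbols, under the standard non-contracting convention (our formalization), a stretch-$\alpha$ solution $H$ satisfies
\[
d_G(t,t') \;\le\; d_H(t,t') \;\le\; \alpha \cdot d_G(t,t') \qquad \text{for all terminals } t,t' \in T.
\]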
Given a large graph $G$ with a subset $T$ of $k$ of its vertices called terminals, a quality-$q$ flow sparsifier is a small graph $G'$ that contains $T$ and preserves, to within factor $q$, all multicommodity flows that can be routed between the terminals in $T$. Constructing flow sparsifiers of small quality and small size has been a central problem in graph compression for decades. A natural approach to constructing $O(1)$-quality flow sparsifiers, adopted in most previous constructions, is contraction. Andoni, Krauthgamer, and Gupta constructed a sketch of size $f(k,\varepsilon)$ that stores all feasible multicommodity flows up to a factor of $(1+\varepsilon)$, raised the question of constructing quality-$(1+\varepsilon)$ flow sparsifiers whose size depends only on $k$ and $\varepsilon$ (but not on the number of vertices of the input graph $G$), and proposed a contraction-based framework towards this goal using their sketch result. In this paper, we settle their question for contraction-based flow sparsifiers, by showing that quality-$(1+\varepsilon)$ contraction-based flow sparsifiers of size $f(\varepsilon)$ exist for all $5$-terminal graphs, but not for all $6$-terminal graphs. Our hardness result on $6$-terminal graphs improves upon a recent hardness result of Krauthgamer and Mosenzon on exact (quality-$1$) flow sparsifiers, for contraction-based constructions. Our construction and proof utilize the notion of tight spans from metric geometry, which we believe is a powerful tool for future work.
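Under one common convention (our paraphrase, not the paper's wording), writing $\lambda_G(D)$ for the maximum concurrent fraction of a demand matrix $D$ on $T$ that is routable in $G$, a quality-$q$ sparsifier $G'$ satisfies
\[
\lambda_G(D) \;\le\; \lambda_{G'}(D) \;\le\; q\,\lambda_G(D) \qquad \text{for every demand } D \text{ on } T.
\]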
We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, i.e., that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than that of linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where, for any policy, all actions have approximately equal values; skipping over these states by following an arbitrary fixed policy in them transforms the problem into a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns which states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first online RL algorithm with polynomial sample complexity for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.
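Written out (standard definitions, with $\varphi$ the given feature map and $h$ the stage index), the two assumptions contrast as follows:
\[
\text{linear } q^\pi\text{-realizability:}\quad Q^\pi_h(s,a) = \langle \varphi(s,a), \theta^\pi_h \rangle \ \text{ for every policy } \pi;
\]
\[
\text{linear MDP:}\quad P_h(\cdot \mid s,a) = \langle \varphi(s,a), \mu_h(\cdot) \rangle,\qquad r_h(s,a) = \langle \varphi(s,a), \theta_h \rangle.
\]
The first constrains only the action-value functions, the second the dynamics themselves, which is why the first class is strictly larger.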
In this paper we give a detailed error analysis of two algorithms (denoted $W_1$ and $W_2$) for computing the symplectic factorization of a symmetric positive definite and symplectic matrix $A \in \mathbb R^{2n \times 2n}$ in the form $A=LL^T$, where $L \in \mathbb R^{2n \times 2n}$ is a symplectic block lower triangular matrix. Algorithm $W_1$ is an implementation of the $HH^T$ factorization from [Dopico et al., 2009]. Algorithm $W_2$, proposed in [Bujok et al., 2023], uses both Cholesky and reverse Cholesky decompositions of the symmetric positive definite blocks that appear during the factorization. We prove that Algorithm $W_2$ is numerically stable for a broader class of symmetric positive definite matrices $A \in \mathbb R^{2n \times 2n}$, producing computed factors $\tilde L$ in floating-point arithmetic with machine precision $u$ such that $\|A-\tilde L {\tilde L}^T\|_2= {\cal O}(u \|A\|_2)$. In contrast, Algorithm $W_1$ is unstable in general for symmetric positive definite and symplectic matrices $A$, as confirmed by the numerical experiments in [Bujok et al., 2023]. We also give corresponding, necessarily weaker, bounds for Algorithm $W_1$: we show that its factorization error depends on the size of the inverse of the principal submatrix $A_{11}$. Tests performed in MATLAB illustrate that our error bounds for the considered algorithms are reasonably sharp.
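For reference (standard definitions; the structural consequence below is an elementary computation, not a claim of the paper): $A$ is symplectic when $A^T J A = J$ with
\[
J = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix},
\]
and a symplectic block lower triangular factor $L = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}$ necessarily has $L_{22} = L_{11}^{-T}$ with $L_{11}^T L_{21}$ symmetric.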
Let $G=(V,E)$ be a simple undirected graph. The open neighbourhood of a vertex $v$ in $G$ is defined as $N_G(v)=\{u\in V~|~ uv\in E\}$, whereas the closed neighbourhood is defined as $N_G[v]= N_G(v)\cup \{v\}$. For an integer $k$, a subset $D\subseteq V$ is called a $k$-vertex-edge dominating set of $G$ if, for every edge $uv\in E$, $|(N_G[u]\cup N_G[v]) \cap D|\geq k$. In the $k$-vertex-edge domination problem, the goal is to find a $k$-vertex-edge dominating set of minimum cardinality in an input graph $G$. In this paper, we first prove that the decision version of the $k$-vertex-edge domination problem is NP-complete for chordal graphs. On the positive side, we design a linear-time algorithm for finding a minimum $k$-vertex-edge dominating set of a tree. We also give an $O(\log \Delta(G))$-approximation algorithm for this problem on general graphs $G$, where $\Delta(G)$ is the maximum degree of $G$. We then show that, for a graph $G$ with $n$ vertices, the problem cannot be approximated within a factor of $(1-\epsilon) \ln n$ for any $\epsilon >0$ unless $NP\subseteq DTIME(n^{O(\log\log n)})$. Finally, we prove that it is APX-complete for graphs with maximum degree $k+3$.
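A short feasibility checker makes the definition concrete (an illustrative sketch, not one of the paper's algorithms; all names are ours):

    from collections import defaultdict

    def is_kve_dominating(edges, D, k):
        """Return True iff D is a k-vertex-edge dominating set:
        every edge uv satisfies |(N[u] ∪ N[v]) ∩ D| >= k."""
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        D = set(D)
        for u, v in edges:
            closed = adj[u] | adj[v] | {u, v}  # N[u] ∪ N[v]
            if len(closed & D) < k:
                return False
        return True

    # Path a-b-c-d: {b, c} is a 2-vertex-edge dominating set.
    print(is_kve_dominating([("a", "b"), ("b", "c"), ("c", "d")], {"b", "c"}, 2))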
The top-$k$-sum operator computes the sum of the $k$ largest components of a given vector. The Euclidean projection onto the top-$k$-sum constraint serves as a crucial subroutine in iterative methods for composite superquantile optimization problems. In this paper, we introduce a solver that implements two finite-termination algorithms to compute this projection. Both algorithms have complexity $O(n)$ when applied to a sorted $n$-dimensional input vector, where the hidden constant is independent of $k$. This stands in contrast to the existing grid-search-inspired method, whose complexity is $O(k(n-k))$. The improvement is significant when $k$ scales linearly with $n$, a situation frequently encountered in practical superquantile optimization applications. When the input vector is unsorted, an additional cost is incurred to (partially) sort it. To reduce this cost, we further derive a rigorous procedure that leverages approximate sorting to compute the projection, which is particularly useful when solving a sequence of similar projection problems. Numerical results show that our methods solve problems of scale $n=10^7$ and $k=10^4$ within $0.05$ seconds, whereas the existing grid-search-based method and the Gurobi QP solver can take minutes to hours.
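To fix notation, here are the operator itself and the feasibility test that the projection must enforce (an illustrative sketch; this is not the paper's $O(n)$ projection algorithm):

    import numpy as np

    def top_k_sum(x, k):
        """Sum of the k largest components of x (assumes 1 <= k <= len(x))."""
        # np.partition moves the k largest entries into the last k slots.
        return np.partition(x, -k)[-k:].sum()

    def feasible(x, k, r):
        """Membership test for the top-k-sum constraint T_k(x) <= r."""
        return top_k_sum(x, k) <= r

    x = np.array([3.0, -1.0, 5.0, 2.0])
    print(top_k_sum(x, 2))       # 8.0  (= 5 + 3)
    print(feasible(x, 2, 10.0))  # True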
We present approximation algorithms for the Fault-tolerant $k$-Supplier with Outliers ($\mathsf{F}k\mathsf{SO}$) problem. This is a common generalization of two known problems -- $k$-Supplier with Outliers and Fault-tolerant $k$-Supplier -- each of which generalizes the well-known $k$-Supplier problem. In the $k$-Supplier problem the goal is to serve $n$ clients $C$ by opening $k$ facilities from a set of possible facilities $F$; the objective is the maximum distance that any client must travel to reach an open facility. In $\mathsf{F}k\mathsf{SO}$, each client $v$ has a fault tolerance $\ell_v$ and now requires $\ell_v$ facilities to serve it, so client $v$'s contribution to the objective is its distance to the $\ell_v^{\text{th}}$ closest open facility. Furthermore, we are allowed to choose $m$ clients to serve, and only those clients contribute to the objective, while the remaining $n-m$ are treated as outliers. Our main result is a $\min\{4t-1,2^t+1\}$-approximation for the $\mathsf{F}k\mathsf{SO}$ problem, where $t$ is the number of distinct values of $\ell_v$ that appear in the instance. For $t=1$, i.e., when all $\ell_v$ equal some common $\ell$, this yields a $3$-approximation, improving upon the $11$-approximation given for the uniform case by Inamdar and Varadarajan [2020], who also introduced the problem. Our result for the uniform case matches the tight $3$-approximations that exist for $k$-Supplier, $k$-Supplier with Outliers, and Fault-tolerant $k$-Supplier. Our key technical contribution is an application of the round-or-cut schema to $\mathsf{F}k\mathsf{SO}$. Guided by an LP relaxation, we reduce to a simpler optimization problem, which we can solve to obtain distance bounds for the "round" step and valid inequalities for the "cut" step.
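The objective is easy to state in code (an illustrative evaluation routine under our own naming, not the paper's algorithm); note that once the open facilities are fixed, the optimal choice of outliers is simply to drop the $n-m$ worst-served clients:

    def fkso_objective(dist, open_facilities, ell, m):
        """dist[v][f]: distance from client v to facility f;
        ell[v]: fault tolerance of client v (assumed <= #open facilities);
        m: number of clients that must be served.
        Each served client contributes its distance to its ell[v]-th
        closest open facility; the m smallest contributions are kept."""
        contributions = []
        for v, lv in ell.items():
            d = sorted(dist[v][f] for f in open_facilities)
            contributions.append(d[lv - 1])  # ell_v-th closest open facility
        contributions.sort()
        return max(contributions[:m])

    # Toy instance: facilities {0, 1} open, serve m = 2 of 3 clients.
    dist = {"a": {0: 1.0, 1: 4.0}, "b": {0: 2.0, 1: 2.0}, "c": {0: 9.0, 1: 9.0}}
    print(fkso_objective(dist, [0, 1], {"a": 2, "b": 1, "c": 1}, 2))  # 4.0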
This paper introduces a novel class of fair and interpolatory curves called $p\kappa$-curves. These curves are composed of smoothly stitched B\'ezier curve segments, where the curvature distribution of each segment is made to closely resemble a parabola, resulting in an aesthetically pleasing shape. Moreover, each segment passes through an interpolated point at a parameter where the parabola attains its extremum, encouraging the alignment of interpolated points with curvature extrema. To achieve these properties, we tailor an energy function that guides the optimization towards the desired curve characteristics. Additionally, we develop an efficient algorithm and an initialization method that enable interactive modeling of $p\kappa$-curves without the need for global optimization. We provide various examples and comparisons with existing state-of-the-art methods to demonstrate the curve modeling capabilities and visually pleasing appearance of $p\kappa$-curves.
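For reference, the per-segment quantity whose distribution is shaped into a parabola is the standard signed curvature of a planar parametric segment $c(t) = (x(t), y(t))$:
\[
\kappa(t) \;=\; \frac{x'(t)\,y''(t) - y'(t)\,x''(t)}{\big(x'(t)^2 + y'(t)^2\big)^{3/2}}.
\]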