A property $\Pi$ on a finite set $U$ is \emph{monotone} if for every $X \subseteq U$ satisfying $\Pi$, every superset $Y \subseteq U$ of $X$ also satisfies $\Pi$. Many combinatorial properties can be seen as monotone properties. The problem of finding a minimum subset of $U$ satisfying $\Pi$ is a central problem in combinatorial optimization. Many approximate and exact algorithms have been developed for this kind of problem on numerous properties. However, a solution obtained by these algorithms is often unsuitable for real-world applications, because accurate mathematical models of real-world problems are difficult to build. A promising approach to overcome this difficulty is to \emph{enumerate} multiple small solutions rather than to \emph{find} a single small solution. To this end, given a weight function $w: U \to \mathbb N$ and an integer $k$, we devise algorithms that \emph{approximately} enumerate all minimal subsets of $U$ with weight at most $k$ satisfying $\Pi$ for various monotone properties $\Pi$. Here "approximate enumeration" means that the algorithms output all minimal subsets satisfying $\Pi$ whose weight is at most $k$, and may also output some minimal subsets satisfying $\Pi$ whose weight exceeds $k$ but is at most $ck$ for some constant $c \ge 1$. These algorithms allow us to efficiently enumerate minimal vertex covers, minimal dominating sets in bounded-degree graphs, minimal feedback vertex sets, minimal hitting sets in bounded-rank hypergraphs, etc., of weight at most $k$ with constant approximation factors.
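To make the definitions concrete, here is a brute-force reference that enumerates, *exactly*, all minimal vertex covers of weight at most $k$ on a tiny graph. It is not the approximate enumeration algorithm described above; it runs in exponential time and is only meant to show what the target output set is. It relies on one consequence of monotonicity. A set $S$ satisfying $\Pi$ is minimal if and only if $S \setminus \{v\}$ violates $\Pi$ for every $v \in S$, because every proper subset of $S$ lies inside some $S \setminus \{v\}$. The helper names are illustrative.

```python
from itertools import combinations


def is_vertex_cover(S, edges):
    """Pi(S): every edge has at least one endpoint in S (a monotone property)."""
    return all(u in S or v in S for u, v in edges)


def is_minimal(S, prop):
    # For a monotone prop, S is minimal iff prop(S) holds and removing any
    # single element breaks it: every proper subset lies inside some S - {v}.
    return prop(S) and all(not prop(S - {v}) for v in S)


def enumerate_minimal(universe, prop, weight, k):
    """Exact brute force: all minimal S with prop(S) and weight(S) <= k.

    Exponential in |U|; a reference for tiny instances only.
    """
    result = []
    for r in range(len(universe) + 1):
        for combo in combinations(sorted(universe), r):
            S = frozenset(combo)
            if sum(weight[v] for v in S) <= k and is_minimal(S, prop):
                result.append(S)
    return result


# Path graph 0-1-2-3 with unit weights.
edges = [(0, 1), (1, 2), (2, 3)]
universe = {0, 1, 2, 3}
weight = {v: 1 for v in universe}
covers = enumerate_minimal(universe, lambda S: is_vertex_cover(S, edges), weight, k=2)
print(sorted(sorted(S) for S in covers))  # [[0, 2], [1, 2], [1, 3]]
```

An approximate enumerator with factor $c$ must output these three sets. It may additionally output minimal covers whose weight lies in $(k, ck]$.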

Related Content

Weight

Optimizer · Manifold · Analysis · Canonical · Canonical Correlation Analysis

November 8, 2024

Optimization without Retraction on the Random Generalized Stiefel Manifold

Simon Vary, Pierre Ablin, Bin Gao, P.-A. Absil

From arXiv. This v3 is a corrected version of the ICML 2024 paper (PMLR 235:49226-49248); see the errata at the end.

Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as the canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every iteration; instead, it produces iterations that converge to critical points on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian optimization counterparts that require the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and the GEVP.
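Below is a minimal sketch of the "matrix multiplications only" idea. Retraction-free methods of this kind typically include a term that pulls the iterates toward the constraint set. The sketch isolates such a term under our own simplifying assumptions: a fixed, fully formed symmetric positive definite $B$, no objective function, and plain gradient descent on the penalty $N(X) = \tfrac14 \|X^\top B X - I_p\|_F^2$, whose gradient is $B X (X^\top B X - I_p)$. It is therefore not the authors' stochastic algorithm. The function names, step size and initialization are illustrative choices.

```python
import numpy as np


def penalty_grad(X, B):
    """Gradient of N(X) = 1/4 * ||X^T B X - I_p||_F^2 for symmetric B."""
    p = X.shape[1]
    return B @ X @ (X.T @ B @ X - np.eye(p))


def drive_to_constraint(B, p, steps=3000, seed=0):
    """Gradient descent on N(X) alone: no retraction, only matrix products."""
    rng = np.random.default_rng(seed)
    d = B.shape[0]
    lam_max = np.linalg.eigvalsh(B)[-1]
    # Scaled orthonormal columns, so that X^T B X <= I at the start.
    Q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    X = Q / np.sqrt(lam_max)
    eta = 0.5 / lam_max  # step size kept below 1 / lambda_max(B)
    for _ in range(steps):
        X = X - eta * penalty_grad(X, B)
    return X


rng = np.random.default_rng(1)
M = rng.standard_normal((10, 50))
B = M @ M.T / 50 + np.eye(10)  # an SPD, covariance-like matrix
X = drive_to_constraint(B, p=3)
print(np.linalg.norm(X.T @ B @ X - np.eye(3)))  # close to 0
```

The method in the abstract instead works with random estimates of $B$ and couples this kind of pull toward the constraint with the objective. The sketch only shows that the constraint can be approached without any retraction or matrix factorization inside the loop.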

Sample · Learning · Models · Robustness · Functional

November 8, 2024

Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models

Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas

A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector. We study the task of learning SIMs in the agnostic (a.k.a. adversarial label noise) model with respect to the $L^2_2$-loss under the Gaussian distribution. Our main result is a sample and computationally efficient agnostic proper learner that attains $L^2_2$-error of $O(\mathrm{OPT})+\epsilon$, where $\mathrm{OPT}$ is the optimal loss. The sample complexity of our algorithm is $\tilde{O}(d^{\lceil k^{\ast}/2\rceil}+d/\epsilon)$, where $k^{\ast}$ is the information-exponent of $\sigma$ corresponding to the degree of its first non-zero Hermite coefficient. This sample bound nearly matches known CSQ lower bounds, even in the realizable setting. Prior algorithmic work in this setting had focused on learning in the realizable case or in the presence of semi-random noise. Prior computationally efficient robust learners required significantly stronger assumptions on the link function.
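The information exponent $k^{\ast}$ can be checked numerically for a given link function. The sketch below computes Hermite coefficients $c_k = \mathbb{E}[\sigma(Z)\,\mathrm{He}_k(Z)]/k!$ for $Z \sim N(0,1)$ with Gauss-Hermite quadrature for the probabilists' Hermite polynomials. It returns the first $k \ge 1$ with $c_k \ne 0$. Ignoring the constant term $k = 0$ is the usual convention and is an assumption here. The function names, the quadrature size and the tolerance are illustrative.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coeffs(sigma, kmax=10, nodes=60):
    """c_k = E[sigma(Z) He_k(Z)] / k! for Z ~ N(0, 1), k = 0..kmax."""
    z, w = hermegauss(nodes)      # quadrature for weight exp(-z^2 / 2)
    w = w / np.sqrt(2 * np.pi)    # normalize to the standard Gaussian density
    fz = sigma(z)
    coeffs = []
    for k in range(kmax + 1):
        he_k = hermeval(z, [0] * k + [1])  # probabilists' Hermite He_k at the nodes
        coeffs.append(np.sum(w * fz * he_k) / math.factorial(k))
    return np.array(coeffs)


def information_exponent(sigma, kmax=10, tol=1e-8):
    """Smallest k >= 1 with a non-zero Hermite coefficient (None if none up to kmax)."""
    c = hermite_coeffs(sigma, kmax)
    for k in range(1, kmax + 1):
        if abs(c[k]) > tol:
            return k
    return None


print(information_exponent(lambda z: z**2 - 1))            # 2
print(information_exponent(lambda z: z**3 - 3 * z))        # 3
print(information_exponent(lambda z: np.maximum(z, 0.0)))  # 1 (ReLU)
```

For polynomial links the quadrature is exact up to rounding. For non-smooth links such as ReLU it is only approximate, which is enough to tell a clearly non-zero coefficient from zero.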

Functional · Kernelization · Maximum Mean Discrepancy · Mean · Discretization

November 8, 2024

Wasserstein Gradient Flows of MMD Functionals with Distance Kernel and Cauchy Problems on Quantile Functions

Richard Duong, Viktor Stein, Robert Beinert, Johannes Hertrich, Gabriele Steidl

From arXiv. We corrected the explicit scheme in our code and updated the plots. 45 pages, 23 figures, comments welcome!

We give a comprehensive description of Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals $\mathcal F_\nu := \text{MMD}_K^2(\cdot, \nu)$ towards given target measures $\nu$ on the real line, where we focus on the negative distance kernel $K(x,y) := -|x-y|$. In one dimension, the Wasserstein-2 space can be isometrically embedded into the cone $\mathcal C(0,1) \subset L_2(0,1)$ of quantile functions, which leads to a characterization of Wasserstein gradient flows via the solution of an associated Cauchy problem on $L_2(0,1)$. Based on the construction of an appropriate counterpart of $\mathcal F_\nu$ on $L_2(0,1)$ and its subdifferential, we provide a solution of the Cauchy problem. For discrete target measures $\nu$, this results in a piecewise linear solution formula. We prove invariance and smoothing properties of the flow on subsets of $\mathcal C(0,1)$. For certain $\mathcal F_\nu$-flows, this implies that initial point measures instantly become absolutely continuous and stay so over time. Finally, we illustrate the behavior of the flow by various numerical examples using an implicit Euler scheme, which is easily computable by a bisection algorithm. For continuous targets $\nu$, the explicit Euler scheme can also be employed, although with limited convergence guarantees.
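Two ingredients of the abstract can be illustrated for empirical measures on $\mathbb R$.

- **Quantile embedding.** $W_2$ equals the $L_2(0,1)$ distance between quantile functions. For two samples of equal size, these quantile functions are step functions given by the sorted samples.
- **MMD with the negative distance kernel.** $\text{MMD}_K^2(\mu,\nu) = \mathbb E K(X,X') + \mathbb E K(Y,Y') - 2\,\mathbb E K(X,Y)$, evaluated here in V-statistic form.

This is a sketch under those assumptions (equal sample sizes for $W_2$), not the authors' code. The function names are illustrative.

```python
import numpy as np


def w2_1d(x, y):
    """W2 between two empirical measures on R with the same number of atoms.

    The L2(0,1) distance between quantile functions reduces to comparing
    the sorted samples position by position.
    """
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "this sketch assumes equal sample sizes"
    return np.sqrt(np.mean((x - y) ** 2))


def mmd2_distance_kernel(x, y):
    """MMD_K^2 with K(a, b) = -|a - b| between two empirical measures (V-statistic)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    kxx = -np.abs(x[:, None] - x[None, :]).mean()
    kyy = -np.abs(y[:, None] - y[None, :]).mean()
    kxy = -np.abs(x[:, None] - y[None, :]).mean()
    return kxx + kyy - 2 * kxy


print(w2_1d([0.0, 1.0], [2.0, 1.0]))      # 1.0
print(mmd2_distance_kernel([0.0], [1.0]))  # 2.0
```

For point masses $\delta_a$ and $\delta_b$, this kernel gives $\text{MMD}_K^2 = 2|a-b|$, which is what the second call prints.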

Cost · Estimation/Estimator · Minimizer · Contrastive · Decomposed

November 8, 2024

Query Complexity of the Metric Steiner Tree Problem

Yu Chen, Sanjeev Khanna, Zihan Tan

We study the query complexity of the metric Steiner Tree problem, where we are given an $n \times n$ metric on a set $V$ of vertices along with a set $T \subseteq V$ of $k$ terminals, and the goal is to find a tree of minimum cost that contains all terminals in $T$. The query complexity for the related minimum spanning tree (MST) problem is well-understood: for any fixed $\varepsilon > 0$, one can estimate the MST cost to within a $(1+\varepsilon)$-factor using only $\tilde{O}(n)$ queries, and this is known to be tight. This implies that a $(2 + \varepsilon)$-approximate estimate of Steiner Tree cost can be obtained with $\tilde{O}(k)$ queries by simply applying the MST cost estimation algorithm on the metric induced by the terminals. Our first result shows that any (randomized) algorithm that estimates the Steiner Tree cost to within a $(5/3 - \varepsilon)$-factor requires $\Omega(n^2)$ queries, even if $k$ is a constant. This lower bound is in sharp contrast to an upper bound of $O(nk)$ queries for computing a $(5/3)$-approximate Steiner Tree, which follows from previous work by Du and Zelikovsky. Our second main result, and the main technical contribution of this work, is a sublinear query algorithm for estimating the Steiner Tree cost to within a strictly better-than-$2$ factor, with query complexity $\tilde{O}(n^{12/7} + n^{6/7}\cdot k)=\tilde{O}(n^{13/7})=o(n^2)$. We complement this result by showing an $\tilde{\Omega}(n + k^{6/5})$ query lower bound for any algorithm that estimates Steiner Tree cost to a strictly better than $2$ factor. Thus $\tilde{\Omega}(n^{6/5})$ queries are needed to just beat $2$-approximation when $k = \Omega(n)$; a sharp contrast to MST cost estimation where a $(1+o(1))$-approximate estimate of cost is achievable with only $\tilde{O}(n)$ queries.
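The $(2+\varepsilon)$ baseline above rests on a classic fact. The minimum spanning tree of the metric induced by the terminals costs at most twice the optimal Steiner tree. The sketch below computes that MST exactly with Prim's algorithm. It reads all $O(k^2)$ terminal distances, unlike the sublinear-query estimators discussed above. The function name and the example metrics are illustrative.

```python
def mst_cost(dist):
    """Prim's algorithm on the complete graph given by a k x k metric; O(k^2) time."""
    k = len(dist)
    in_tree = [False] * k
    best = [float("inf")] * k  # cheapest known edge from each vertex into the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(k):
        # Pick the cheapest vertex not yet in the tree.
        u = min((v for v in range(k) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        for v in range(k):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return total


# Three terminals at pairwise distance 2, all at distance 1 from one Steiner point:
# the optimal Steiner tree costs 3, while the terminal MST costs 4.
star = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
print(mst_cost(star))  # 4.0
```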

Conformer · Ground Truth · Output Space · Knowledge · Extensibility

November 7, 2024

Conformalized Credal Regions for Classification with Ambiguous Ground Truth

Michele Caprio, David Stutz, Shuo Li, Arnaud Doucet

An open question in \emph{Imprecise Probabilistic Machine Learning} is how to empirically derive a credal region (i.e., a closed and convex family of probabilities on the output space) from the available data, without any prior knowledge or assumption. In classification problems, credal regions can provide provable guarantees under realistic assumptions by characterizing the uncertainty about the distribution of the labels. Building on previous work, we show that credal regions can be directly constructed using conformal methods. This allows us to provide a novel extension of classical conformal prediction to problems with ambiguous ground truth, that is, when the labels for given inputs are not exactly known. The resulting construction enjoys desirable practical and theoretical properties: (i) conformal coverage guarantees, (ii) smaller prediction sets (compared to classical conformal prediction regions) and (iii) disentanglement of uncertainty sources (epistemic, aleatoric). We empirically verify our findings on both synthetic and real datasets.
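For reference, this is the classical split conformal procedure that the work above extends. It assumes that each calibration example has a single, exactly known label, which is precisely the assumption that ambiguous ground truth breaks. The nonconformity score $1 - \hat p(y \mid x)$ and the function names are illustrative choices. This is not the credal construction itself.

```python
import math

import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(cal_scores, float))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return np.inf if rank > n else scores[rank - 1]


def prediction_set(probs, qhat):
    """Labels whose score 1 - p(y | x) does not exceed the threshold qhat."""
    return [y for y, p in enumerate(probs) if 1 - p <= qhat]


# Calibration scores 1 - p(true label | x_i) for ten hypothetical examples.
cal_scores = [i / 10 for i in range(1, 11)]
qhat = conformal_threshold(cal_scores, alpha=0.2)
print(qhat)                                    # 0.9
print(prediction_set([0.05, 0.6, 0.35], qhat))  # [1, 2]
```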

Normalized · Pattern Recognition · Approximation · Learning · Statistic

November 7, 2024

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn

From arXiv. 2 figures

The paper concerns the $d$-dimensional stochastic approximation recursion $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}), $$ where $\{ \Phi_n \}$ is a stochastic process on a general state space satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3). (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\textsf{E} [ z_n z_n^T ]$ to the asymptotic covariance in the CLT, where $z_n := (\theta_n-\theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n := \sqrt{n}\, [\theta^{\text{PR}}_n -\theta^*]$ of the averaged parameters $\theta^{\text{PR}}_n := n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given in which $f$ and the mean flow $\bar{f}$ are linear in $\theta$, and $\Phi$ is a geometrically ergodic Markov chain that does not satisfy (DV3). Although the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges. {\bf This arXiv version 3 represents a major extension of the results in prior versions.} The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.
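A toy instance of the recursion with Polyak-Ruppert averaging is sketched below. It makes several simplifying assumptions: $d = 1$, $f(\theta, \Phi) = \Phi - \theta$, and $\Phi$ an AR(1) Markov chain with stationary mean $\mu$, so that the mean flow is $\bar f(\theta) = \mu - \theta$ and $\theta^* = \mu$. The noise does not depend on the parameter, and the step size is $\alpha_n = n^{-\rho}$ with $\rho \in (1/2, 1)$. All names and constants are illustrative.

```python
import numpy as np


def sa_with_averaging(theta0, mu=2.0, a=0.5, n_steps=100_000, rho=0.7, seed=0):
    """theta_{n+1} = theta_n + alpha_{n+1} (Phi_{n+1} - theta_n), with PR averaging.

    Phi is an AR(1) chain Phi_{n+1} = a Phi_n + (1 - a) mu + W_{n+1}, whose
    stationary mean is mu, so theta* = mu. Returns (last iterate, PR average).
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps)
    theta, phi, running_sum = theta0, mu, 0.0
    for n in range(n_steps):
        phi = a * phi + (1 - a) * mu + noise[n]  # Markov noise Phi_{n+1}
        alpha = (n + 1) ** (-rho)                # step size alpha_{n+1}
        theta = theta + alpha * (phi - theta)    # SA update
        running_sum += theta
    return theta, running_sum / n_steps          # average of theta_1..theta_N


last, averaged = sa_with_averaging(theta0=0.0)
print(last, averaged)  # both near mu = 2.0; the average is typically closer
```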

Approximation · Functional · Subspace · Optimizer · Decomposed

November 6, 2024

Nearly Optimal Approximation of Matrix Functions by the Lanczos Method

Noah Amsel, Tyler Chen, Anne Greenbaum, Cameron Musco, Chris Musco

Approximating the action of a matrix function $f(\mathbf{A})$ on a vector $\mathbf{b}$ is an increasingly important primitive in machine learning, data science, and statistics, with applications such as sampling high-dimensional Gaussians, Gaussian process regression and Bayesian inference, principal component analysis, and approximating Hessian spectral densities. Over the past decade, a number of algorithms enjoying strong theoretical guarantees have been proposed for this task. Many of the most successful belong to a family of algorithms called Krylov subspace methods. Remarkably, a classic Krylov subspace method, called the Lanczos method for matrix functions (Lanczos-FA), frequently outperforms newer methods in practice. Our main result is a theoretical justification for this finding: we show that, for a natural class of rational functions, Lanczos-FA matches the error of the best possible Krylov subspace method up to a multiplicative approximation factor. The approximation factor depends on the degree of $f(x)$'s denominator and the condition number of $\mathbf{A}$, but not on the number of iterations $k$. Our result provides a strong justification for the excellent performance of Lanczos-FA, especially on functions that are well approximated by rationals, such as the matrix square root.
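For readers unfamiliar with Lanczos-FA, the sketch below implements the standard approximation $f(\mathbf A)\mathbf b \approx \|\mathbf b\|\, Q_k f(T_k) e_1$. Here $Q_k$ holds the Lanczos vectors and $T_k$ is the tridiagonal matrix of recurrence coefficients. Two choices are ours rather than the paper's. We use full reorthogonalization for numerical safety, which practical codes often skip. We evaluate $f(T_k)$ through the eigendecomposition of $T_k$. The sketch assumes a symmetric $\mathbf A$ and no breakdown ($\beta_j > 0$).

```python
import numpy as np


def lanczos_fa(A, b, k, f):
    """Approximate f(A) b by ||b|| Q_k f(T_k) e_1 after k Lanczos steps (A symmetric)."""
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 0))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        # Full reorthogonalization against all Lanczos vectors so far (applied twice).
        for _ in range(2):
            w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j + 1 < k:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    # f(T) e_1 = V f(Lambda) V^T e_1, and V^T e_1 is the first row of V.
    fT_e1 = evecs @ (f(evals) * evecs[0, :])
    return np.linalg.norm(b) * (Q @ fT_e1)


A = np.diag(np.arange(1.0, 21.0))
b = np.ones(20)
exact = np.sqrt(np.arange(1.0, 21.0)) * b  # sqrt(A) b for this diagonal A
for k in (5, 10, 20):
    print(k, np.linalg.norm(lanczos_fa(A, b, k, np.sqrt) - exact))
```

With $k = n$, the Krylov subspace is the whole space, and the output matches $\sqrt{\mathbf A}\,\mathbf b$ up to rounding.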

Networking · Maximum · Optimizer · Identifiable · Approximation

November 6, 2024

Fundamental Limits of Routing Attack on Network Overload

Xinyu Wu, Eytan Modiano

We quantify the threat that network adversaries pose by inducing \emph{network overload} through \emph{routing attacks}, in which an adversary hijacks a subset of network nodes. We develop routing attacks on the hijacked nodes for two objectives related to overload: \emph{no-loss throughput minimization} and \emph{loss maximization}. The first objective seeks a routing attack that minimizes the network throughput that is guaranteed to survive. We develop a polynomial-time algorithm that outputs the optimal routing attack in multi-hop networks given global information on the network topology, and an algorithm with an approximation ratio of $2$ under partial information. The second objective seeks to maximize the throughput loss. We show that this problem is NP-hard and develop two approximation algorithms, with multiplicative and additive guarantees respectively, for single-hop networks. We further investigate how the adversary should select the nodes to hijack so as to maximize network overload. We propose a heuristic polynomial-time algorithm for this NP-hard problem and prove its optimality in special cases. We validate the near-optimal performance of the proposed algorithms over a wide range of network settings. Our results demonstrate that the proposed algorithms can accurately quantify the risk of overload for an arbitrary set of hijacked nodes and identify the critical nodes that should be protected against routing attacks.

Score · Minimizer · Sample · Big Data · Reducible

November 6, 2024

Efficient Data-Driven Leverage Score Sampling Algorithm for the Minimum Volume Covering Ellipsoid Problem in Big Data

Elizabeth Harris, Ali Eshragh, Bishnu Lamichhane, Jordan Shaw-Carmody, Elizabeth Stojanovski

From arXiv. 20 pages, 3 figures

The Minimum Volume Covering Ellipsoid (MVCE) problem, characterised by $n$ observations in $d$ dimensions where $n \gg d$, can be computationally very expensive in the big data regime. We apply methods from randomised numerical linear algebra to develop a data-driven leverage score sampling algorithm for solving MVCE, and establish theoretical error bounds and a convergence guarantee. Assuming the leverage scores follow a power law decay, we show that the computational complexity of computing the approximation for MVCE is reduced from $\mathcal{O}(nd^2)$ to $\mathcal{O}(nd + \text{poly}(d))$, which is a significant improvement in big data problems. Numerical experiments demonstrate the efficacy of our new algorithm, showing that it substantially reduces computation time and yields near-optimal solutions.
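As background, here is generic leverage score sampling. The leverage score of row $i$ of an $n \times d$ matrix $X$ of full column rank is $h_i = x_i^\top (X^\top X)^{-1} x_i$, which equals the squared norm of row $i$ of $Q$ in a thin QR factorization $X = QR$. The scores sum to $d$. Computing them exactly this way costs $\mathcal{O}(nd^2)$, which is the cost the data-driven algorithm above is designed to avoid. This is therefore only the exact baseline, not the authors' method. Sampling with replacement and the function names are illustrative choices.

```python
import numpy as np


def leverage_scores(X):
    """h_i = ||Q[i, :]||^2 for a thin QR factorization X = QR (X full column rank)."""
    Q, _ = np.linalg.qr(X)  # reduced mode by default: Q is n x d
    return np.sum(Q ** 2, axis=1)


def leverage_sample(X, m, seed=0):
    """Draw m row indices (with replacement) with probabilities h_i / d."""
    rng = np.random.default_rng(seed)
    h = leverage_scores(X)
    return rng.choice(X.shape[0], size=m, replace=True, p=h / h.sum())


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
h = leverage_scores(X)
print(h.sum())                 # 5.0 up to rounding (= d)
print(leverage_sample(X, 10))  # indices of sampled observations
```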

Estimation/Estimator · Class · Tractable · Statistic · Subspace

November 5, 2024

Near-Optimal and Tractable Estimation under Shift-Invariance

Dmitrii M. Ostrovskii

From arXiv. 29 pages

How hard is it to estimate a discrete-time signal $(x_{1}, ..., x_{n}) \in \mathbb{C}^n$ satisfying an unknown linear recurrence relation of order $s$ and observed in i.i.d. complex Gaussian noise? The class of all such signals is parametric but extremely rich: it contains all exponential polynomials over $\mathbb{C}$ with total degree $s$, including harmonic oscillations with $s$ arbitrary frequencies. Geometrically, this class corresponds to the projection onto $\mathbb{C}^{n}$ of the union of all shift-invariant subspaces of $\mathbb{C}^\mathbb{Z}$ of dimension $s$. We show that the statistical complexity of this class, as measured by the squared minimax radius of the $(1-\delta)$-confidence $\ell_2$-ball, is nearly the same as for the class of $s$-sparse signals, namely $O\left(s\log(en) + \log(\delta^{-1})\right) \cdot \log^2(es) \cdot \log(en/s).$ Moreover, the corresponding near-minimax estimator is tractable, and it can be used to build a test statistic with a near-minimax detection threshold in the associated detection problem. These statistical results rest upon an approximation-theoretic one: we show that finite-dimensional shift-invariant subspaces admit compactly supported reproducing kernels whose Fourier spectra have nearly the smallest possible $\ell_p$-norms, for all $p \in [1,+\infty]$ at once.
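A small numerical check helps explain why harmonic oscillations belong to this class. A sum of $s$ complex exponentials $x_t = \sum_k a_k z_k^t$ with $z_k = e^{i\omega_k}$ satisfies the order-$s$ recurrence $\sum_{j=0}^{s} c_j x_{t+s-j} = 0$, where the $c_j$ are the coefficients of $p(z) = \prod_k (z - z_k)$. The reason is that the left-hand side equals $\sum_k a_k z_k^t\, p(z_k) = 0$. The sketch covers only distinct roots, that is, pure harmonics rather than general exponential polynomials. It does not implement the paper's estimator. The names and values are illustrative.

```python
import numpy as np


def harmonic_signal(freqs, amps, n):
    """x_t = sum_k a_k exp(i omega_k t) for t = 0..n-1."""
    t = np.arange(n)
    return sum(a * np.exp(1j * w * t) for a, w in zip(amps, freqs))


def recurrence_residual(x, roots):
    """max_t |sum_j c_j x_{t+s-j}|, where c holds the coefficients of prod_k (z - z_k)."""
    c = np.poly(roots)  # c[0] = 1, length s + 1, highest degree first
    s = len(roots)
    # x[t:t+s+1][::-1] lists x_{t+s}, x_{t+s-1}, ..., x_t, matching c_0, ..., c_s.
    return max(abs(np.dot(c, x[t:t + s + 1][::-1])) for t in range(len(x) - s))


freqs = [0.3, 1.1, 2.5]
x = harmonic_signal(freqs, [1.0, 0.5 - 0.2j, 2.0], n=64)
print(recurrence_residual(x, np.exp(1j * np.array(freqs))))  # ~1e-15
```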