亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Bilevel optimization problems, which are problems where two optimization problems are nested, have more and more applications in machine learning. In many practical cases, the upper and the lower objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ gradient computations to achieve $\varepsilon$-stationarity with $n+m$ the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to get an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, which is therefore optimal in terms of sample complexity.

相關內容

經驗風險(xian)是(shi)對(dui)訓(xun)練集(ji)(ji)中的所有樣本(ben)點損(sun)失函數的平均最小(xiao)化。經驗風險(xian)越(yue)(yue)小(xiao)說(shuo)明(ming)模型f(X)對(dui)訓(xun)練集(ji)(ji)的擬合程(cheng)度越(yue)(yue)好。

We provide an interior point method based on quasi-Newton iterations, which only requires first-order access to a strongly self-concordant barrier function. To achieve this, we extend the techniques of Dunagan-Harvey [STOC '07] to maintain a preconditioner, while using only first-order information. We measure the quality of this preconditioner in terms of its relative excentricity to the unknown Hessian matrix, and we generalize these techniques to convex functions with a slowly-changing Hessian. We combine this with an interior point method to show that, given first-order access to an appropriate barrier function for a convex set $K$, we can solve well-conditioned linear optimization problems over $K$ to $\varepsilon$ precision in time $\widetilde{O}\left(\left(\mathcal{T}+n^{2}\right)\sqrt{n\nu}\log\left(1/\varepsilon\right)\right)$, where $\nu$ is the self-concordance parameter of the barrier function, and $\mathcal{T}$ is the time required to make a gradient query. As a consequence we show that: $\bullet$ Linear optimization over $n$-dimensional convex sets can be solved in time $\widetilde{O}\left(\left(\mathcal{T}n+n^{3}\right)\log\left(1/\varepsilon\right)\right)$. This parallels the running time achieved by state of the art algorithms for cutting plane methods, when replacing separation oracles with first-order oracles for an appropriate barrier function. $\bullet$ We can solve semidefinite programs involving $m\geq n$ matrices in $\mathbb{R}^{n\times n}$ in time $\widetilde{O}\left(mn^{4}+m^{1.25}n^{3.5}\log\left(1/\varepsilon\right)\right)$, improving over the state of the art algorithms, in the case where $m=\Omega\left(n^{\frac{3.5}{\omega-1.25}}\right)$. Along the way we develop a host of tools allowing us to control the evolution of our potential functions, using techniques from matrix analysis and Schur convexity.

The Langevin algorithms are frequently used to sample the posterior distributions in Bayesian inference. In many practical problems, however, the posterior distributions often consist of non-differentiable components, posing challenges for the standard Langevin algorithms, as they require to evaluate the gradient of the energy function in each iteration. To this end, a popular remedy is to utilize the proximity operator, and as a result one needs to solve a proximity subproblem in each iteration. The conventional practice is to solve the subproblems accurately, which can be exceedingly expensive, as the subproblem needs to be solved in each iteration. We propose an approximate primal-dual fixed-point algorithm for solving the subproblem, which only seeks an approximate solution of the subproblem and therefore reduces the computational cost considerably. We provide theoretical analysis of the proposed method and also demonstrate its performance with numerical examples.

In this paper, we study differentially private empirical risk minimization (DP-ERM). It has been shown that the worst-case utility of DP-ERM reduces polynomially as the dimension increases. This is a major obstacle to privately learning large machine learning models. In high dimension, it is common for some model's parameters to carry more information than others. To exploit this, we propose a differentially private greedy coordinate descent (DP-GCD) algorithm. At each iteration, DP-GCD privately performs a coordinate-wise gradient step along the gradients' (approximately) greatest entry. We show theoretically that DP-GCD can achieve a logarithmic dependence on the dimension for a wide range of problems by naturally exploiting their structural properties (such as quasi-sparse solutions). We illustrate this behavior numerically, both on synthetic and real datasets.

Airplane refueling problem is a nonlinear combinatorial optimization problem with $n!$ feasible feasible solutions. Given a fleet of $n$ airplanes with mid-air refueling technique, each airplane has a specific fuel capacity and fuel consumption rate. The fleet starts to fly together to a same target and during the trip each airplane could instantaneously refuel to other airplanes and then be dropped out. The question is how to find the best refueling policy to make the last remaining airplane travels the farthest. To solve the large scale of the airplane refueling problem in polynomial-time, we propose the definition of the sequential feasible solution by employing the data structural properties of the airplane refueling problem. We prove that if an airplane refueling problem has feasible solutions, it must have sequential feasible solutions, and its optimal feasible solution must be the optimal sequential feasible solution. Then we present the sequential search algorithm which has a computational complexity that depends on the number of sequential feasible solutions referred to $Q_n$, which is proved to be upper bounded by $2^{n-2}$ as an exponential bound that lacks of applicability on larger input for worst case. Therefore we investigate the complexity behavior of the sequential search algorithm from dynamic perspective, and find out that $Q_n$ is bounded by $\frac{m^2}{n}C_n^m$ when the input $n$ is greater than $2m$. Here $m$ is a constant and $2m$ is regarded as the "inflection point" of the complexity of the sequential search algorithm from exponential-time to polynomial-time. Moreover, we build an efficient computability scheme according to which we shall predict the specific complexity of the sequential search algorithm to choose a proper algorithm considering the available running time for decision makers or users.

Recent work in the matrix completion literature has shown that prior knowledge of a matrix's row and column spaces can be successfully incorporated into reconstruction programs to substantially benefit matrix recovery. This paper proposes a novel methodology that exploits more general forms of known matrix structure in terms of subspaces. The work derives reconstruction error bounds that are informative in practice, providing insight to previous approaches in the literature while introducing novel programs that severely reduce sampling complexity. The main result shows that a family of weighted nuclear norm minimization programs incorporating a $M_1 r$-dimensional subspace of $n\times n$ matrices (where $M_1\geq 1$ conveys structural properties of the subspace) allow accurate approximation of a rank $r$ matrix aligned with the subspace from a near-optimal number of observed entries (within a logarithmic factor of $M_1 r)$. The result is robust, where the error is proportional to measurement noise, applies to full rank matrices, and reflects degraded output when erroneous prior information is utilized. Numerical experiments are presented that validate the theoretical behavior derived for several example weighted programs.

Submodular maximization is a general optimization problem with a wide range of applications in machine learning (e.g., active learning, clustering, and feature selection). In large-scale optimization, the parallel running time of an algorithm is governed by its adaptivity, which measures the number of sequential rounds needed if the algorithm can execute polynomially-many independent oracle queries in parallel. While low adaptivity is ideal, it is not sufficient for an algorithm to be efficient in practice -- there are many applications of distributed submodular optimization where the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of submodular maximization. In this paper, we give the first constant-factor approximation algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint $k$ that runs in $O(\log(n))$ adaptive rounds and makes $O(n \log(k))$ oracle queries in expectation. In our empirical study, we use three real-world applications to compare our algorithm with several benchmarks for non-monotone submodular maximization. The results demonstrate that our algorithm finds competitive solutions using significantly fewer rounds and queries.

We consider the problem of rank-$1$ low-rank approximation (LRA) in the matrix-vector product model under various Schatten norms: $$ \min_{\|u\|_2=1} \|A (I - u u^\top)\|_{\mathcal{S}_p} , $$ where $\|M\|_{\mathcal{S}_p}$ denotes the $\ell_p$ norm of the singular values of $M$. Given $\varepsilon>0$, our goal is to output a unit vector $v$ such that $$ \|A(I - vv^\top)\|_{\mathcal{S}_p} \leq (1+\varepsilon) \min_{\|u\|_2=1}\|A(I - u u^\top)\|_{\mathcal{S}_p}. $$ Our main result shows that Krylov methods (nearly) achieve the information-theoretically optimal number of matrix-vector products for Spectral ($p=\infty$), Frobenius ($p=2$) and Nuclear ($p=1$) LRA. In particular, for Spectral LRA, we show that any algorithm requires $\Omega\left(\log(n)/\varepsilon^{1/2}\right)$ matrix-vector products, exactly matching the upper bound obtained by Krylov methods [MM15, BCW22]. Our lower bound addresses Open Question 1 in [Woo14], providing evidence for the lack of progress on algorithms for Spectral LRA and resolves Open Question 1.2 in [BCW22]. Next, we show that for any fixed constant $p$, i.e. $1\leq p =O(1)$, there is an upper bound of $O\left(\log(1/\varepsilon)/\varepsilon^{1/3}\right)$ matrix-vector products, implying that the complexity does not grow as a function of input size. This improves the $O\left(\log(n/\varepsilon)/\varepsilon^{1/3}\right)$ bound recently obtained in [BCW22], and matches their $\Omega\left(1/\varepsilon^{1/3}\right)$ lower bound, to a $\log(1/\varepsilon)$ factor.

Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affect the tightness of the bound at the output layer. We introduce a new approach to adversarial training by minimizing an upper bound of the adversarial loss that is based on a holistic expansion of the network instead of separate bounds for each layer. This bound is facilitated by state-of-the-art tools from Robust Optimization; it has closed-form and can be effectively trained using backpropagation. We derive two new methods with the proposed approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from Linear Robust Optimization to obtain an empirical upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB), computes a provable upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations.

Given a set of points $P = (P^+ \sqcup P^-) \subset \mathbb{R}^d$ for some constant $d$ and a supply function $\mu:P\to \mathbb{R}$ such that $\mu(p) > 0~\forall p \in P^+$, $\mu(p) < 0~\forall p \in P^-$, and $\sum_{p\in P}{\mu(p)} = 0$, the geometric transportation problem asks one to find a transportation map $\tau: P^+\times P^-\to \mathbb{R}_{\ge 0}$ such that $\sum_{q\in P^-}{\tau(p, q)} = \mu(p)~\forall p \in P^+$, $\sum_{p\in P^+}{\tau(p, q)} = -\mu(q)~ \forall q \in P^-$, and the weighted sum of Euclidean distances for the pairs $\sum_{(p,q)\in P^+\times P^-}\tau(p, q)\cdot ||q-p||_2$ is minimized. We present the first deterministic algorithm that computes, in near-linear time, a transportation map whose cost is within a $(1 + \varepsilon)$ factor of optimal. More precisely, our algorithm runs in $O(n\varepsilon^{-(d+2)}\log^5{n}\log{\log{n}})$ time for any constant $\varepsilon > 0$. Surprisingly, our result is not only a generalization of a bipartite matching one to arbitrary instances of geometric transportation, but it also reduces the running time for all previously known $(1 + \varepsilon)$-approximation algorithms, randomized or deterministic, even for geometric bipartite matching. In particular, we give the first $(1 + \varepsilon)$-approximate deterministic algorithm for geometric bipartite matching and the first $(1 + \varepsilon)$-approximate deterministic or randomized algorithm for geometric transportation with no dependence on $d$ in the exponent of the running time's polylog. As an additional application of our main ideas, we also give the first randomized near-linear $O(poly(1 / \varepsilon) m \log^{O(1)} n)$ time $(1 + \varepsilon)$-approximation algorithm for the uncapacitated minimum cost flow (transshipment) problem in undirected graphs with arbitrary real edge costs.

This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating \emph{balanced exploration policies} into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.

北京阿比特科技有限公司