
It is well-known that given a smooth, bounded-from-below, and possibly nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points (with gradient norm less than $\epsilon$) in $\mathcal{O}(1/\epsilon^2)$ iterations. However, many important nonconvex optimization problems, such as those associated with training modern neural networks, are inherently not smooth, making these results inapplicable. In this paper, we study nonsmooth nonconvex optimization from an oracle complexity viewpoint, where the algorithm is assumed to be given access only to local information about the function at various points. We provide two main results: First, we consider the problem of getting near $\epsilon$-stationary points. This is perhaps the most natural relaxation of finding $\epsilon$-stationary points, which is impossible in the nonsmooth nonconvex case. We prove that this relaxed goal cannot be achieved efficiently, for any distance and $\epsilon$ smaller than some constants. Our second result deals with the possibility of tackling nonsmooth nonconvex optimization by reduction to smooth optimization: Namely, applying smooth optimization methods on a smooth approximation of the objective function. For this approach, we prove under a mild assumption an inherent trade-off between oracle complexity and smoothness: On the one hand, smoothing a nonsmooth nonconvex function can be done very efficiently (e.g., by randomized smoothing), but with dimension-dependent factors in the smoothness parameter, which can strongly affect iteration complexity when plugging into standard smooth optimization methods. On the other hand, these dimension factors can be eliminated with suitable smoothing methods, but only by making the oracle complexity of the smoothing process exponentially large.
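As a concrete illustration of the randomized smoothing mentioned above (a minimal sketch under standard assumptions, not the paper's construction): smoothing $f$ by $f_\delta(x) = \mathbb{E}_v[f(x + \delta v)]$ with $v$ uniform in the unit ball yields a smooth surrogate whose gradient can be estimated from function values alone, at the cost of dimension-dependent factors in the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonsmooth test function: the Euclidean norm, nondifferentiable at 0.
def f_batch(X):
    return np.linalg.norm(X, axis=1)

def smoothed_grad(x, delta=0.1, n=200000):
    # Monte Carlo gradient of the randomized smoothing
    # f_delta(x) = E_v[f(x + delta v)], v uniform in the unit ball,
    # via the identity grad f_delta(x) = (d/delta) E_u[f(x + delta u) u],
    # where u is uniform on the unit sphere.
    d = x.size
    u = rng.normal(size=(n, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    vals = f_batch(x + delta * u)
    return (d / delta) * (vals[:, None] * u).mean(axis=0)

g = smoothed_grad(np.array([1.0, 0.0]))
# The true gradient of ||x|| at (1, 0) is (1, 0); g approximates it.
```

The $d/\delta$ factor in the estimator is exactly the kind of dimension dependence in the smoothness parameter that the trade-off result above concerns.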

Related content

In this work we advance the understanding of the fundamental limits of computation for Binary Polynomial Optimization (BPO), which is the problem of maximizing a given polynomial function over all binary points. In our main result we provide a novel class of BPO instances that can be solved efficiently from both a theoretical and a computational perspective. In fact, we give a strongly polynomial-time algorithm for instances whose corresponding hypergraph is beta-acyclic. We note that the beta-acyclicity assumption is natural in several applications, including relational database schemes and the lifted multicut problem on trees. Due to the novelty of our proof technique, we obtain an algorithm that is also interesting from a practical viewpoint. This is because our algorithm is very simple to implement and its running time is a polynomial of very low degree in the number of nodes and edges of the hypergraph. Our result completely settles the computational complexity of BPO over acyclic hypergraphs, since the problem is NP-hard on alpha-acyclic instances. Our algorithm can also be applied to any general BPO problem that contains beta-cycles. For these problems, the algorithm returns a smaller instance together with a rule for extending any optimal solution of the smaller instance to an optimal solution of the original instance.
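To make the problem statement concrete (a toy brute-force illustration on a hypothetical instance, deliberately ignoring the acyclicity structure that the paper's algorithm exploits):

```python
from itertools import product

# Binary Polynomial Optimization by brute force on a toy instance:
# maximize p(x) = 3*x1*x2 - 2*x2*x3 + x3 over x in {0,1}^3.
# The monomials correspond to hypergraph edges {1,2}, {2,3}, {3}.
def p(x):
    x1, x2, x3 = x
    return 3 * x1 * x2 - 2 * x2 * x3 + x3

best_x = max(product((0, 1), repeat=3), key=p)
best_val = p(best_x)
# best_x == (1, 1, 0), best_val == 3
```

Enumeration costs $2^n$ evaluations; the point of the result above is that beta-acyclic structure reduces this to strongly polynomial time.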

Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of $n = \tilde{\mathcal{O}}( H^{4 + 4 \frac{d}{\alpha}} \kappa_{\mu}^{1 + \frac{d}{\alpha}} \epsilon^{-2 - 2\frac{d}{\alpha}} )$ for offline RL with deep ReLU networks, where $\kappa_{\mu}$ is a measure of distributional shift, $H = (1-\gamma)^{-1}$ is the effective horizon length, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure. While the Besov dynamic closure subsumes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient in long (effective) horizon problems. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond the linearity regime in traditional Reproducing Kernel Hilbert Spaces and Neural Tangent Kernels.
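The exponents in the bound couple the error tolerance to the ratio $d/\alpha$. A quick numerical check of this scaling, with all constants and logarithmic factors suppressed and purely hypothetical parameter values:

```python
# Scaling of the sample-complexity bound
# n = O(H^{4+4d/a} * kappa^{1+d/a} * eps^{-2-2d/a}),
# constants and log factors suppressed; parameter values are hypothetical.
def n_bound(H, kappa, eps, d, alpha):
    r = d / alpha
    return H ** (4 + 4 * r) * kappa ** (1 + r) * eps ** (-2 - 2 * r)

# With d = 4, alpha = 2 (so d/alpha = 2), halving eps multiplies
# the bound by 2^{2 + 2*d/alpha} = 2^6 = 64.
ratio = n_bound(10, 2, 0.05, 4, 2) / n_bound(10, 2, 0.1, 4, 2)
```

This makes visible how quickly the required data grows when the dimension dominates the smoothness.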

The anisotropic diffusion equation is central to understanding cosmic ray (CR) diffusion across the Galaxy and the heliosphere, and its interplay with the ambient magnetic field. This diffusion term contributes to the highly stiff nature of the CR transport equation. In order to conduct numerical simulations of time-dependent cosmic ray transport, implicit integrators have traditionally been favoured over CFL-bound explicit integrators, as they can take large step sizes. We propose exponential methods that directly compute the exponential of the matrix to solve the linear anisotropic diffusion equation. These methods allow us to take even larger step sizes; in certain cases, we are able to choose a step size as large as the simulation time, i.e., only one time step. This can substantially speed up the simulations whilst generating highly accurate solutions ($\ell_2$ error $\leq 10^{-10}$). Additionally, we test an approach based on extracting a constant diffusion coefficient from the anisotropic diffusion equation, where the constant-coefficient term is solved implicitly or exponentially and the remainder is treated with an explicit method. We find that this approach, for homogeneous linear problems, is unable to improve on the exponential methods that directly evaluate the matrix exponential.
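The single-time-step idea can be sketched on a much simpler stand-in problem: for a linear system $u_t = A u$, one application of the matrix exponential gives the exact discrete solution at any time $T$. The sketch below uses a constant-coefficient 1D periodic diffusion operator, which is an illustrative assumption, not the anisotropic Galactic setup.

```python
import numpy as np

# 1D diffusion u_t = D u_xx on a periodic grid (a constant-coefficient
# stand-in for the anisotropic diffusion operator; illustrative only).
N, D, L = 64, 1.0, 2 * np.pi
dx = L / N
I = np.eye(N)
# Second-order central-difference Laplacian with periodic wraparound.
A = D * (np.roll(I, 1, axis=0) - 2 * I + np.roll(I, -1, axis=0)) / dx**2

x = np.linspace(0, L, N, endpoint=False)
u0 = np.sin(x)
T = 1.0

# One exponential step of size T (the whole simulation time).
# A is symmetric, so exp(T*A) is applied via its eigendecomposition.
w, V = np.linalg.eigh(T * A)
u_exp = V @ (np.exp(w) * (V.T @ u0))

# Exact solution of the continuous problem: exp(-D*T) * sin(x).
u_ref = np.exp(-D * T) * np.sin(x)
rel_err = np.linalg.norm(u_exp - u_ref) / np.linalg.norm(u_ref)
```

No CFL restriction enters anywhere: the step size is limited only by how accurately the matrix exponential is evaluated.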

In recent years, there has been a growing interest in the effects of data poisoning attacks on data-driven control methods. Poisoning attacks are well known in the Machine Learning community; the methods developed there, however, rely on assumptions, such as cross-sample independence, that in general do not hold for linear dynamical systems. Consequently, these systems require different attack and detection methods than those developed for supervised learning problems in the i.i.d.\ setting. Since most data-driven control algorithms make use of the least-squares estimator, we study how poisoning impacts the least-squares estimate through the lens of statistical testing, and investigate in what ways data poisoning attacks can be detected. We establish under which conditions the set of models compatible with the data includes the true model of the system, and we analyze different poisoning strategies for the attacker. On the basis of these arguments, we propose a stealthy data poisoning attack on the least-squares estimator that can escape classical statistical tests, and conclude by demonstrating the effectiveness of the proposed attack.
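A minimal sketch of the setting (a toy scalar system and a naive perturbation chosen for illustration, not the stealthy attack proposed in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar linear system x_{t+1} = a*x_t + w_t driven by Gaussian noise.
a_true, T = 0.8, 500
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + 0.1 * rng.normal()

X, Y = x[:-1], x[1:]
a_hat = (X @ Y) / (X @ X)        # least-squares estimate of a

# Toy poisoning: perturb 10% of the outputs in the direction of the
# regressor, which biases the estimate upward.
Y_pois = Y.copy()
idx = rng.choice(T, size=T // 10, replace=False)
Y_pois[idx] += 0.5 * X[idx]
a_pois = (X @ Y_pois) / (X @ X)
```

Note that here consecutive samples are correlated through the dynamics, which is exactly why i.i.d.-based detection arguments from supervised learning do not transfer directly.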

We propose a deep learning method for solving the American options model with a free boundary feature. To extract the free boundary, known as the early exercise boundary, from our proposed method, we introduce the Landau transformation. For efficient implementation, we further construct a dual solution framework consisting of a novel auxiliary function and free boundary equations. The auxiliary function is formulated to include the feedforward deep neural network (DNN) output and to mimic the far-boundary behaviour, the smooth pasting condition, and the remaining boundary conditions arising from the second-order space derivative and first-order time derivative. Because the early exercise boundary and its derivative are not known a priori, the boundary values mimicked by the auxiliary function are in approximate form. Concurrently, we establish equations that approximate the early exercise boundary and its derivative directly from the DNN output, based on linear relationships at the left boundary. Furthermore, the option Greeks are obtained from the derivatives of this auxiliary function. We test our implementation on several examples and compare the results with existing numerical methods. All indicators show that our proposed deep learning method offers an efficient alternative way of pricing options with early exercise features.

Polynomials are common algebraic structures that are often used to approximate functions, including probability distributions. This paper proposes to directly define polynomial distributions in order to describe the stochastic properties of systems, rather than to use polynomials merely to approximate known or empirically estimated distributions. Polynomial distributions offer great modeling flexibility and, often, mathematical tractability. However, unlike canonical distributions, polynomial functions may take negative values on the interval of support for some parameter values, the number of their parameters is usually much larger than for canonical distributions, and the interval of support must be finite. In particular, polynomial distributions are defined here assuming three forms of polynomial function. The transformation of polynomial distributions and the fitting of a histogram to a polynomial distribution are considered. The key properties of polynomial distributions are derived in closed form. A piecewise polynomial distribution construction is devised to ensure non-negativity over the support interval. Finally, the problems of estimating the parameters of polynomial distributions and generating polynomially distributed samples are also studied.
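The basic recipe can be illustrated directly (a hypothetical example, not one of the paper's three forms): pick a polynomial that is non-negative on a finite support, normalize it into a density, and sample by inverse-transform sampling on a grid.

```python
import numpy as np

rng = np.random.default_rng(2)

# Polynomial density p(x) ∝ 1 + x^2 on [0, 1]; the normalizing constant
# is ∫_0^1 (1 + x^2) dx = 4/3, so p(x) = (3/4)(1 + x^2).
xs = np.linspace(0.0, 1.0, 10001)
pdf = 0.75 * (1.0 + xs**2)

# Numerical CDF on the grid, then inverse-transform sampling.
cdf = np.cumsum(pdf) * (xs[1] - xs[0])
cdf /= cdf[-1]                  # guard against discretization drift
samples = np.interp(rng.uniform(size=100000), cdf, xs)

# Closed-form mean: E[X] = (3/4)(1/2 + 1/4) = 9/16 = 0.5625.
mean_est = samples.mean()
```

The two caveats from the text appear immediately: the support must be finite for the normalizing integral to exist, and non-negativity on the support has to be checked (or enforced piecewise) rather than assumed.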

We study geometric variations of the discriminating code problem. In the \emph{discrete version} of the problem, a finite set of points $P$ and a finite set of objects $S$ are given in $\mathbb{R}^d$. The objective is to choose a subset $S^* \subseteq S$ of minimum cardinality such that for each point $p_i \in P$, the subset $S_i^* \subseteq S^*$ covering $p_i$ satisfies $S_i^*\neq \emptyset$, and for each pair $p_i,p_j \in P$, $i \neq j$, we have $S_i^* \neq S_j^*$. In the \emph{continuous version} of the problem, the solution set $S^*$ can be chosen freely among a (potentially infinite) class of allowed geometric objects. In the 1-dimensional case ($d=1$), the points in $P$ are placed on a horizontal line $L$, and the objects in $S$ are finite-length line segments aligned with $L$ (called intervals). We show that the discrete version of this problem is NP-complete. This is somewhat surprising, as the continuous version is known to be polynomial-time solvable. Still, for the 1-dimensional discrete version, we design a polynomial-time $2$-approximation algorithm. We also design a PTAS for both discrete and continuous versions in one dimension, for the restriction where the intervals are all required to have the same length. We then study the 2-dimensional case ($d=2$) for axis-parallel unit square objects. We show that both continuous and discrete versions are NP-complete, and design polynomial-time approximation algorithms that produce $(16\cdot OPT+1)$-approximate and $(64\cdot OPT+1)$-approximate solutions respectively, using rounding of suitably defined integer linear programming problems. We show that the identifying code problem for axis-parallel unit square intersection graphs (in $d=2$) can be solved in the same manner as for the discrete version of the discriminating code problem for unit square objects.
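The defining condition (every point covered, all covering sets pairwise distinct) can be checked by brute force on a tiny 1D instance; this is an illustration of the problem definition with hypothetical data, not the approximation algorithms above.

```python
from itertools import combinations

# Points on a line and intervals [l, r]; find a minimum subset of the
# intervals giving every point a nonempty, pairwise-distinct covering set.
points = [1.0, 2.0, 3.0]
intervals = [(0.5, 1.5), (0.5, 2.5), (1.5, 3.5), (2.5, 3.5)]

def code(p, chosen):
    # Indices of the chosen intervals covering point p.
    return frozenset(i for i in chosen if intervals[i][0] <= p <= intervals[i][1])

def is_discriminating(chosen):
    codes = [code(p, chosen) for p in points]
    return all(c for c in codes) and len(set(codes)) == len(points)

best = None
for k in range(1, len(intervals) + 1):
    for chosen in combinations(range(len(intervals)), k):
        if is_discriminating(chosen):
            best = chosen
            break
    if best:
        break
# best == (1, 2): the two middle intervals give codes {1}, {1,2}, {2}.
```

Enumeration is exponential in $|S|$, which is why the NP-completeness result and the approximation algorithms matter.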

A hybrid framework combining the branch and bound method with multiobjective evolutionary algorithms is proposed for nonconvex multiobjective optimization. The hybridization exploits the complementary character of the two optimization strategies. A multiobjective evolutionary algorithm is used to induce tight lower and upper bounds during the branch and bound procedure. Tight bounds such as the ones derived in this way can reduce the number of subproblems that have to be solved. The branch and bound method guarantees the global convergence of the framework and improves the search capability of the multiobjective evolutionary algorithm. An implementation of the hybrid framework using NSGA-II and MOEA/D-DE as the multiobjective evolutionary algorithms is presented. Numerical experiments verify that the hybrid algorithms benefit from the synergy of the branch and bound method and the multiobjective evolutionary algorithms.

This article develops a convex description of a classical or quantum learner's or agent's state of knowledge about its environment, presented as a convex subset of a commutative R-algebra. With caveats, this leads to a generalization of certain semidefinite programs in quantum information (such as those describing the universal query algorithm dual to the quantum adversary bound, related to optimal learning or control of the environment) to the classical and faulty-quantum setting, which would not be possible with a naive description via joint probability distributions over environment and internal memory. More philosophically, it also makes an interpretation of the set of reduced density matrices as "states of knowledge" of an observer of its environment, related to these techniques, more explicit. As another example, I describe and solve a formal differential equation of states of knowledge in that algebra, where an agent obtains experimental data in a Poissonian process, and its state of knowledge evolves as an exponential power series. However, this framework currently lacks impressive applications, and I post it in part to solicit feedback and collaboration on those. In particular, it may be possible to develop it into a new framework for the design of experiments, e.g. the problem of finding maximally informative questions to ask human labelers or the environment in machine-learning problems. The parts of the article not related to quantum information don't assume knowledge of it.

Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated consideration of optimization and statistics leads to fruitful research findings.
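A minimal instance of the phenomenon the overview describes (a toy rank-1 factorization, not any specific algorithm from the article): plain gradient descent on the nonconvex objective $f(u) = \tfrac{1}{4}\|uu^\top - M\|_F^2$ recovers a planted low-rank matrix despite the nonconvexity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Ground-truth rank-1 PSD matrix M = z z^T.
d = 10
z = rng.normal(size=d)
M = np.outer(z, z)

# Gradient descent on f(u) = 0.25 * ||u u^T - M||_F^2;
# the gradient is grad f(u) = (u u^T - M) u.
u = rng.normal(size=d) * 0.1        # small random initialization
eta = 0.01 / np.linalg.norm(M, 2)   # step size scaled by the spectral norm
for _ in range(5000):
    u = u - eta * ((np.outer(u, u) - M) @ u)

rel_err = np.linalg.norm(np.outer(u, u) - M) / np.linalg.norm(M)
# rel_err is tiny: u converges to +z or -z (both are global minima).
```

The statistical model (here, the planted rank-1 structure) is what makes the landscape benign; without such structure, no comparable guarantee holds for generic nonconvex objectives.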
