In this work, we analyze an efficient sampling-based algorithm for general-purpose reachability analysis, which remains a notoriously challenging problem with applications ranging from neural network verification to safety analysis of dynamical systems. By sampling inputs, evaluating their images in the true reachable set, and taking their $\epsilon$-padded convex hull as a set estimator, this algorithm applies to general problem settings and is simple to implement. Our main contribution is the derivation of asymptotic and finite-sample accuracy guarantees using random set theory. This analysis informs algorithmic design to obtain an $\epsilon$-close reachable set approximation with high probability, provides insights into which reachability problems are most challenging, and motivates safety-critical applications of the technique. On a neural network verification task, we show that this approach is more accurate and significantly faster than prior work. Informed by our analysis, we also design a robust model predictive controller that we demonstrate in hardware experiments.
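As a rough illustration of the estimator described in the abstract above, the following Python sketch samples inputs, maps them through a function, and tests membership in the $\epsilon$-padded convex hull of the images. It is restricted to 2D for brevity. The toy linear map `f`, the unit-square input set, and all helper names are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import random

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    t = 0.0
    if denom > 0.0:
        t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def in_padded_hull(p, hull, eps):
    """True if p lies within distance eps of the convex polygon `hull`."""
    n = len(hull)
    if n == 0:
        return False
    if n == 1:
        return math.hypot(p[0] - hull[0][0], p[1] - hull[0][1]) <= eps
    edges = [(hull[i], hull[(i + 1) % n]) for i in range(n)]
    inside = n >= 3 and all(cross(a, b, p) >= 0 for a, b in edges)
    return inside or min(dist_to_segment(p, a, b) for a, b in edges) <= eps

def f(x):
    """Toy map (an assumption): a linear shear-and-stretch of the plane."""
    return (2.0 * x[0], x[0] + x[1])

def estimate_reachable_set(num_samples, seed=0):
    """Sample inputs uniformly from the unit square and hull their images."""
    rng = random.Random(seed)
    inputs = [(rng.random(), rng.random()) for _ in range(num_samples)]
    images = [f(x) for x in inputs]
    return images, convex_hull(images)

images, hull = estimate_reachable_set(500)
eps = 0.1
print(in_padded_hull((1.0, 1.0), hull, eps))  # query a point near the centre
print(in_padded_hull((5.0, 5.0), hull, eps))  # query a point far away
```

The padding matters because the hull of finitely many samples always lies inside the convex hull of the true reachable set; enlarging it by $\epsilon$ is what lets the estimator cover points the samples missed.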

Related content

Optimizer · Overfitting · Linear · Biased · Squared loss

February 14, 2022

Stochastic linear optimization never overfits with quadratically-bounded losses on general data

Matus Telgarsky

This work shows that a diverse collection of linear optimization methods, when run on general data, fail to overfit, despite lacking any explicit constraints or regularization: with high probability, their trajectories stay near the curve of optimal constrained solutions over the population distribution. This analysis is powered by an elementary but flexible proof scheme which can handle many settings, summarized as follows. Firstly, the data can be general: unlike other implicit bias works, it need not satisfy large margin or other structural conditions, and moreover can arrive sequentially IID, sequentially following a Markov chain, as a batch, and lastly it can have heavy tails. Secondly, while the main analysis is for mirror descent, rates are also provided for the Temporal-Difference fixed-point method from reinforcement learning; all prior high probability analyses in these settings required bounded iterates, bounded updates, bounded noise, or some equivalent. Thirdly, the losses are general, and for instance the logistic and squared losses can be handled simultaneously, unlike other implicit bias works. In all of these settings, not only is low population error guaranteed with high probability, but moreover low sample complexity is guaranteed so long as there exists any low-complexity near-optimal solution, even if the global problem structure and in particular global optima have high complexity.

Sample · state-of-the-art · Regularization term · Optimizer · Noise

February 13, 2022

Improved analysis for a proximal algorithm for sampling

Yongxin Chen, Sinho Chewi, Adil Salim, Andre Wibisono

from arxiv, 34 pages

We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity. We demonstrate our results by obtaining new state-of-the-art sampling guarantees for several classes of target distributions. We also strengthen the connection between the proximal sampler and the proximal method in optimization by interpreting the proximal sampler as an entropically regularized Wasserstein proximal method, and the proximal point method as the limit of the proximal sampler with vanishing noise.
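The abstract above does not spell out the sampler itself. As commonly described, the proximal sampler alternates a Gaussian step $y \sim N(x, \eta I)$ with a draw from the "restricted Gaussian oracle" $\pi(x \mid y) \propto \exp(-f(x) - \|x - y\|^2 / (2\eta))$. The sketch below is a minimal illustration under the assumption of a 1D Gaussian target $N(0, \sigma^2)$, chosen only because the oracle is then available in closed form; the function name and parameters are hypothetical.

```python
import random

def proximal_sampler_gaussian(sigma, eta, num_steps, seed=0):
    """Proximal sampler for the 1D target N(0, sigma^2), i.e. f(x) = x^2 / (2 sigma^2).

    For this target the conditional pi(x | y) is Gaussian with
    mean y * sigma^2 / (sigma^2 + eta) and variance sigma^2 * eta / (sigma^2 + eta).
    """
    rng = random.Random(seed)
    shrink = sigma ** 2 / (sigma ** 2 + eta)
    post_std = (sigma ** 2 * eta / (sigma ** 2 + eta)) ** 0.5
    x, samples = 0.0, []
    for _ in range(num_steps):
        y = rng.gauss(x, eta ** 0.5)         # forward step: y ~ N(x, eta)
        x = rng.gauss(shrink * y, post_std)  # oracle step: x ~ pi(x | y)
        samples.append(x)
    return samples

samples = proximal_sampler_gaussian(sigma=2.0, eta=1.0, num_steps=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be close to 0 and sigma^2 = 4, up to Monte Carlo error
```

For general targets the oracle step has no closed form and must itself be implemented, for example by rejection sampling; that part is outside the scope of this sketch.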

SimPLe · UniFormer · Sample · Algorithms and data structures · Discrete mathematics

February 11, 2022

Improved bounds for randomly colouring simple hypergraphs

Weiming Feng, Heng Guo, Jiaheng Wang

We study the problem of sampling almost uniform proper $q$-colourings in $k$-uniform simple hypergraphs with maximum degree $\Delta$. For any $\delta > 0$, if $k \geq\frac{20(1+\delta)}{\delta}$ and $q \geq 100\Delta^{\frac{2+\delta}{k-4/\delta-4}}$, the running time of our algorithm is $\tilde{O}(\mathrm{poly}(\Delta k)\cdot n^{1.01})$, where $n$ is the number of vertices. Our result requires fewer colours than previous results for general hypergraphs (Jain, Pham, and Vuong, 2021; He, Sun, and Wu, 2021), and does not require $\Omega(\log n)$ colours unlike the work of Frieze and Anastos (2017).
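To get a feel for the two conditions in the abstract above, the short Python sketch below evaluates them numerically. It only restates the stated inequalities; the function name and the example values are assumptions made for illustration.

```python
import math

def colour_lower_bound(delta, k, max_degree):
    """Return 100 * Delta^((2 + delta) / (k - 4/delta - 4)), the stated bound on q.

    Raises ValueError when delta <= 0 or k < 20 * (1 + delta) / delta,
    since the abstract only states the result under those conditions.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    k_min = 20.0 * (1.0 + delta) / delta
    if k < k_min:
        raise ValueError(f"need k >= {k_min} for delta = {delta}")
    exponent = (2.0 + delta) / (k - 4.0 / delta - 4.0)
    return 100.0 * max_degree ** exponent

# Example: delta = 1 requires k >= 40; with k = 40 the exponent is 3 / 32.
print(colour_lower_bound(1.0, 40, 2 ** 32))
```

With $\delta = 1$ and $k = 40$, the exponent is $3/32$, so for $\Delta = 2^{32}$ the condition reads $q \geq 100 \cdot 2^{3} = 800$.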

Generative model · MoDELS · Maximum likelihood · Likelihood · Learned

February 10, 2022

Analyzing and Improving Adversarial Training for Generative Modeling

Xuwang Yin, Shiying Li, Gustavo K. Rohde

We study a new generative modeling technique based on adversarial training (AT). We show that in a setting where the model is trained to discriminate in-distribution data from adversarial examples perturbed from out-distribution samples, the model learns the support of the in-distribution data. The learning process is also closely related to MCMC-based maximum likelihood learning of energy-based models (EBMs), and can be considered as an approximate maximum likelihood learning method. We show that this AT generative model achieves image generation performance competitive with state-of-the-art EBMs, and at the same time is stable to train and has better sampling efficiency. We demonstrate that the AT generative model is well-suited for the task of image translation and worst-case out-of-distribution detection.

Sparse · Constraint · Graph · Minimum point · Specialization

February 9, 2022

PTAS for Sparse General-Valued CSPs

Balázs F. Mezei, Marcin Wrochna, Stanislav Živný

We study polynomial-time approximation schemes (PTASes) for constraint satisfaction problems (CSPs) such as Maximum Independent Set or Minimum Vertex Cover on sparse graph classes. Baker's approach gives a PTAS on planar graphs, excluded-minor classes, and beyond. For Max-CSPs, and even more generally, maximisation finite-valued CSPs (where constraints are arbitrary non-negative functions), Romero, Wrochna, and \v{Z}ivn\'y [SODA'21] showed that the Sherali-Adams LP relaxation gives a simple PTAS for all fractionally-treewidth-fragile classes, which is the most general "sparsity" condition for which a PTAS is known. We extend these results to general-valued CSPs, which include "crisp" (or "strict") constraints that have to be satisfied by every feasible assignment. The only condition on the crisp constraints is that their domain contains an element which is at least as feasible as all the others (but possibly less valuable). For minimisation general-valued CSPs with crisp constraints, we present a PTAS for all Baker graph classes -- a definition by Dvo\v{r}\'ak [SODA'20] which encompasses all classes where Baker's technique is known to work, except possibly for fractionally-treewidth-fragile classes. While this is standard for problems satisfying a certain monotonicity condition on crisp constraints, we show this can be relaxed to diagonalisability -- a property of relational structures connected to logics, statistical physics, and random CSPs.

Saddle point · SimPLe · Stationary point · Stationary · Power method

November 28, 2021

Escape saddle points by a simple gradient-descent based algorithm

Chenyi Zhang, Tongyang Li

from arxiv, 34 pages, 8 figures, to appear in the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

Escaping saddle points is a central research topic in nonconvex optimization. In this paper, we propose a simple gradient-based algorithm such that for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}(\log n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art algorithms by Jin et al. with $\tilde{O}((\log n)^{4}/\epsilon^{2})$ or $\tilde{O}((\log n)^{6}/\epsilon^{1.75})$ iterations, our algorithm is polynomially better in terms of $\log n$ and matches their complexities in terms of $1/\epsilon$. For the stochastic setting, our algorithm outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}((\log n)^{2}/\epsilon^{4})$ iterations. Technically, our main contribution is an idea of implementing a robust Hessian power method using only gradients, which can find negative curvature near saddle points and achieve the polynomial speedup in $\log n$ compared to the perturbed gradient descent methods. Finally, we also perform numerical experiments that support our results.
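The idea of running a Hessian power method with gradients only can be sketched as follows. A Hessian-vector product is approximated by a finite difference of two gradients, and power iteration on $\ell I - H$ (with $\ell$ an upper bound on the largest Hessian eigenvalue) then amplifies the most negative curvature direction. This is a simplified illustration, not the robust procedure analysed in the paper; the function names, the step size `r`, and the test function $f(x, y) = x^2 - y^2$ are all assumptions.

```python
import random

def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = sum(vi * vi for vi in v) ** 0.5
    return [vi / norm for vi in v]

def hessian_vector_product(grad, x, v, r=1e-4):
    """Approximate H(x) v by (grad(x + r v) - grad(x)) / r, using gradients only."""
    g0 = grad(x)
    g1 = grad([xi + r * vi for xi, vi in zip(x, v)])
    return [(a - b) / r for a, b in zip(g1, g0)]

def negative_curvature_direction(grad, x, smoothness, num_iters=100, seed=0):
    """Power iteration on (smoothness * I - H); returns (direction, curvature)."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in x])
    for _ in range(num_iters):
        hv = hessian_vector_product(grad, x, v)
        v = normalize([smoothness * vi - hvi for vi, hvi in zip(v, hv)])
    hv = hessian_vector_product(grad, x, v)
    curvature = sum(vi * hvi for vi, hvi in zip(v, hv))  # Rayleigh quotient v^T H v
    return v, curvature

def saddle_grad(p):
    """Gradient of the toy function f(x, y) = x^2 - y^2, which has a saddle at 0."""
    return [2.0 * p[0], -2.0 * p[1]]

direction, curvature = negative_curvature_direction(saddle_grad, [0.0, 0.0], smoothness=2.0)
print(direction, curvature)
```

On this toy saddle the Hessian is $\mathrm{diag}(2, -2)$, so the returned direction aligns with the $y$-axis and the estimated curvature is close to $-2$; moving along that direction decreases $f$.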

Approximation · INFORMS · Nash equilibrium · Reinforcement learning · Learned

June 15, 2020

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi

from arxiv, SM and JL contributed equally

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.

Graph convolutional neural network/graph convolutional network · Graph convolutional network · Graph · Graph convolution · state-of-the-art

May 20, 2019

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh

from arxiv, In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19)

Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm can finish in around 36 minutes, while all the existing GCN training algorithms fail to train due to out-of-memory issues. Cluster-GCN also allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16].
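The batching step described in the abstract above (sample a block of nodes from one cluster, then restrict neighbourhoods to that block) can be sketched in a few lines of Python. The partition is assumed to be given, since the abstract does not say which clustering algorithm produces it, and the scalar-feature averaging step stands in for a real GCN layer. All names here are hypothetical.

```python
import random

def induced_subgraph(adj, nodes):
    """Keep only edges whose two endpoints both lie in `nodes`."""
    keep = set(nodes)
    return {u: [v for v in adj[u] if v in keep] for u in nodes}

def cluster_batches(adj, clusters, seed=0):
    """Yield (nodes, subgraph) pairs, visiting the clusters in random order."""
    rng = random.Random(seed)
    order = list(range(len(clusters)))
    rng.shuffle(order)
    for c in order:
        yield clusters[c], induced_subgraph(adj, clusters[c])

def mean_aggregate(sub_adj, features):
    """One neighbourhood-averaging step restricted to the batch (self included)."""
    out = {}
    for u, nbrs in sub_adj.items():
        group = [u] + nbrs
        out[u] = sum(features[v] for v in group) / len(group)
    return out

# Two triangles joined by the single edge 2-3; the partition is assumed given.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
clusters = [[0, 1, 2], [3, 4, 5]]
features = {u: float(u) for u in adj}

for nodes, sub_adj in cluster_batches(adj, clusters):
    print(nodes, mean_aggregate(sub_adj, features))
```

Because each batch touches only one cluster, memory scales with the cluster size rather than with the multi-hop neighbourhood of the batch; the price is that between-cluster edges such as 2-3 are ignored within a step.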

Network embedding · Networking · Reducible · DC · Link prediction

May 7, 2018

Billion-scale Network Embedding with Iterative Random Projection

Ziwei Zhang, Peng Cui, Haoyang Li, Xiao Wang, Wenwu Zhu

from arxiv, 7 pages, 5 figures

Network embedding has attracted considerable research attention recently. However, the existing methods are incapable of handling billion-scale networks, because they are computationally expensive and, at the same time, difficult to accelerate with distributed computing schemes. To address these problems, we propose RandNE, a novel and simple billion-scale network embedding method. Specifically, we propose a Gaussian random projection approach to map the network into a low-dimensional embedding space while preserving the high-order proximities between nodes. To reduce the time complexity, we design an iterative projection procedure to avoid the explicit calculation of the high-order proximities. Theoretical analysis shows that our method is extremely efficient, and friendly to distributed computing schemes without any communication cost in the calculation. We demonstrate the efficacy of RandNE over state-of-the-art methods in network reconstruction and link prediction tasks on multiple datasets with different scales, ranging from thousands to billions of nodes and edges.
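The iterative projection idea can be illustrated with a small Python sketch. Rather than forming the powers $A^i$ of the adjacency matrix, one starts from a Gaussian random matrix $U_0$ and repeatedly applies $U_i = A U_{i-1}$, summing the weighted terms. The specific form $\sum_i \alpha_i A^i U_0$, the weights, and the function names are assumptions made for illustration, and any extra steps the method may use (such as orthogonalising $U_0$) are omitted.

```python
import random

def adjacency_times(adj, U):
    """Return A @ U, with the unweighted adjacency matrix A given as neighbour lists."""
    dim = len(U[0])
    return [[sum(U[v][k] for v in adj[u]) for k in range(dim)]
            for u in range(len(adj))]

def iterative_random_projection(adj, dim, weights, seed=0):
    """Compute sum_i weights[i] * A^i @ R without ever forming A^i.

    Returns (R, embedding), where R is the Gaussian projection matrix.
    """
    rng = random.Random(seed)
    n = len(adj)
    R = [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(n)]
    U = R
    emb = [[weights[0] * x for x in row] for row in R]
    for w in weights[1:]:
        U = adjacency_times(adj, U)  # U_i = A U_{i-1}
        emb = [[e + w * x for e, x in zip(e_row, u_row)]
               for e_row, u_row in zip(emb, U)]
    return R, emb

# Path graph 0 - 1 - 2, embedded in 4 dimensions with orders 0, 1 and 2.
path = [[1], [0, 2], [1]]
R, emb = iterative_random_projection(path, dim=4, weights=[1.0, 0.5, 0.25])
for row in emb:
    print(row)
```

Each iteration costs one sparse matrix-matrix product, so the total work grows with the number of edges times the embedding dimension per order, instead of with the cost of densifying $A^i$.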

Performer · Estimation/estimator · Empirical risk minimization · Empirical risk · Variance

December 14, 2017

Variance-based regularization with convex objectives

John Duchi, Hongseok Namkoong

We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.