
We propose a new algorithm for variance reduction when estimating $f(X_T)$, where $X$ is the solution to some stochastic differential equation and $f$ is a test function. The new estimator is $(f(X^1_T) + f(X^2_T))/2$, where $X^1$ and $X^2$ have the same marginal law as $X$ but are pathwise correlated so as to reduce the variance. The optimal correlation function $\rho$ is approximated by a deep neural network and is calibrated along the trajectories of $(X^1, X^2)$ by policy gradient and reinforcement learning techniques. Finding an optimal coupling given the marginal laws has links with maximum optimal transport.
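As a rough illustration of the coupled estimator (not the paper's learned policy), the sketch below simulates two Euler--Maruyama copies of the same SDE driven by correlated Brownian increments and compares the variance of $(f(X^1_T)+f(X^2_T))/2$ against independent copies. The geometric Brownian dynamics, the call-type payoff $f$, and the constant correlation $\rho=-1$ (antithetic coupling) are illustrative assumptions standing in for the neural-network correlation function calibrated by policy gradient.

```python
# A minimal sketch (assumed GBM dynamics, call payoff, constant rho) of the
# coupled estimator (f(X^1_T) + f(X^2_T))/2.  The paper instead learns a
# state-dependent rho(X^1_t, X^2_t) with a neural network via policy gradient.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, x0, T, K = 0.05, 0.3, 1.0, 1.0, 1.0
n_steps, n_paths = 200, 50_000
dt = T / n_steps

def f(x):                                  # test function f (here a call payoff)
    return np.maximum(x - K, 0.0)

def euler_paths(rho):
    """Euler-Maruyama for two SDE copies driven by correlated Brownian increments."""
    x1 = np.full(n_paths, x0)
    x2 = np.full(n_paths, x0)
    for _ in range(n_steps):
        dw1 = rng.normal(scale=np.sqrt(dt), size=n_paths)
        dz = rng.normal(scale=np.sqrt(dt), size=n_paths)
        dw2 = rho * dw1 + np.sqrt(1.0 - rho**2) * dz   # same marginal law as dw1
        x1 += mu * x1 * dt + sigma * x1 * dw1
        x2 += mu * x2 * dt + sigma * x2 * dw2
    return x1, x2

# Independent copies (rho = 0) versus antithetically coupled copies (rho = -1).
x1, x2 = euler_paths(rho=0.0)
plain = 0.5 * (f(x1) + f(x2))
x1, x2 = euler_paths(rho=-1.0)
coupled = 0.5 * (f(x1) + f(x2))
print("independent:  mean", plain.mean(), "  var", plain.var())
print("coupled:      mean", coupled.mean(), "  var", coupled.var())
```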

Related content

We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as its own data distribution. In these algorithms, workers compute possibly stale and stochastic gradients associated with their local data at some iteration back in history and then return those gradients to the server without synchronizing with other workers. We present a unified convergence theory for non-convex smooth functions in the heterogeneous regime. The proposed analysis provides convergence for pure asynchronous SGD and its various modifications. Moreover, our theory explains what affects the convergence rate and what can be done to improve the performance of asynchronous algorithms. In particular, we introduce a novel asynchronous method based on worker shuffling. As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling and shuffle-once mini-batch SGD. The derived rates match the best-known results for those algorithms, highlighting the tightness of our approach. Finally, our numerical evaluations support the theoretical findings and show the good practical performance of our method.
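For intuition, here is a small simulation sketch of the asynchronous setting: a server applies stochastic gradients that each worker computed at a model it read several iterations back. The least-squares local losses, the fixed per-worker delays, and the uniform choice of which worker returns next are assumptions made for illustration; the worker-shuffling method itself is not implemented here.

```python
# A toy simulation (assumed least-squares local losses and fixed delays) of
# asynchronous SGD: the server applies a stochastic gradient that worker k
# evaluated at the model it read delays[k] iterations ago.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_iters, lr = 10, 4, 2000, 0.02
delays = [0, 1, 3, 7]                        # heterogeneous worker speeds (assumed)

# Heterogeneous data: each worker holds its own least-squares problem.
A = [rng.normal(size=(50, d)) for _ in range(n_workers)]
b = [Ak @ rng.normal(size=d) + 0.1 * rng.normal(size=50) for Ak in A]

def stoch_grad(k, x):
    """Single-sample gradient of worker k's loss 0.5 * ||A_k x - b_k||^2 / m_k."""
    i = rng.integers(len(b[k]))
    return A[k][i] * (A[k][i] @ x - b[k][i])

x = np.zeros(d)
history = [x.copy()]                          # past server models, to model staleness
for _ in range(n_iters):
    k = rng.integers(n_workers)               # which worker returns at this step
    stale_model = history[max(0, len(history) - 1 - delays[k])]
    x = x - lr * stoch_grad(k, stale_model)   # server applies the (stale) gradient
    history.append(x.copy())

full_grad = sum(A[k].T @ (A[k] @ x - b[k]) / len(b[k]) for k in range(n_workers)) / n_workers
print("norm of the full gradient at the final iterate:", np.linalg.norm(full_grad))
```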

A range family $\mathcal{R}$ is a family of subsets of $\mathbb{R}^d$, like all halfplanes, or all unit disks. Given a range family $\mathcal{R}$, we consider the $m$-uniform range capturing hypergraphs $\mathcal{H}(V,\mathcal{R},m)$ whose vertex-sets $V$ are finite sets of points in $\mathbb{R}^d$ with any $m$ vertices forming a hyperedge $e$ whenever $e = V \cap R$ for some $R \in \mathcal{R}$. Given additionally an integer $k \geq 2$, we seek to find the minimum $m = m_{\mathcal{R}}(k)$ such that every $\mathcal{H}(V,\mathcal{R},m)$ admits a polychromatic $k$-coloring of its vertices, that is, where every hyperedge contains at least one point of each color. Clearly, $m_{\mathcal{R}}(k) \geq k$ and the gold standard is an upper bound $m_{\mathcal{R}}(k) = O(k)$ that is linear in $k$. A $t$-shallow hitting set in $\mathcal{H}(V,\mathcal{R},m)$ is a subset $S \subseteq V$ such that $1 \leq |e \cap S| \leq t$ for each hyperedge $e$; i.e., every hyperedge is hit at least once but at most $t$ times by $S$. We show for several range families $\mathcal{R}$ the existence of $t$-shallow hitting sets in every $\mathcal{H}(V,\mathcal{R},m)$ with $t$ being a constant only depending on $\mathcal{R}$. This in particular proves that $m_{\mathcal{R}}(k) \leq tk = O(k)$ in such cases, improving previous polynomial bounds in $k$. In particular, we prove this for the range families of all axis-aligned strips in $\mathbb{R}^d$, all bottomless and topless rectangles in $\mathbb{R}^2$, and for all unit-height axis-aligned rectangles in $\mathbb{R}^2$.
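The definitions can be made concrete with a small sketch. For the family of axis-aligned strips and points in general position (distinct $x$- and $y$-coordinates), the $m$-uniform hyperedges are exactly the sets of $m$ consecutive points in $x$-order or in $y$-order, and a coloring or a candidate hitting set can then be checked directly; the naive coloring below is only illustrative and carries no guarantee.

```python
# A small sketch of the definitions (assuming points in general position): for
# axis-aligned strips in R^2, the m-uniform hyperedges are the sets of m
# consecutive points in x-order or in y-order.
import random

random.seed(0)
V = [(random.random(), random.random()) for _ in range(20)]
m, k = 6, 3

def strip_hyperedges(points, m):
    edges = set()
    for coord in (0, 1):                              # vertical / horizontal strips
        order = sorted(points, key=lambda p: p[coord])
        for i in range(len(order) - m + 1):
            edges.add(frozenset(order[i:i + m]))      # m consecutive points
    return edges

def is_polychromatic(edges, color, k):
    """Every hyperedge contains at least one point of each of the k colors."""
    return all(len({color[v] for v in e}) == k for e in edges)

def is_shallow_hitting_set(edges, S, t):
    """Every hyperedge is hit at least once but at most t times by S."""
    return all(1 <= len(e & S) <= t for e in edges)

edges = strip_hyperedges(V, m)
color = {v: i % k for i, v in enumerate(sorted(V))}   # naive coloring, no guarantee
S = frozenset(v for v in V if color[v] == 0)
print("polychromatic:", is_polychromatic(edges, color, k))
print("2-shallow hitting set:", is_shallow_hitting_set(edges, S, 2))
```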

We consider the classical Shiryaev--Roberts martingale diffusion, $(R_t)_{t\ge0}$, restricted to the interval $[0,A]$, where $A>0$ is a preset absorbing boundary. We take yet another look at the well-known phenomenon of quasi-stationarity (time-invariant probabilistic behavior, conditional on no absorption hitherto) exhibited by the diffusion in the temporal limit, as $t\to+\infty$, for each $A>0$. We obtain new upper and lower bounds for the quasi-stationary distribution's probability density function (pdf), $q_{A}(x)$; the bounds vary in the trade-off between simplicity and tightness. The bounds imply directly the expected result that $q_{A}(x)$ converges to the pdf, $h(x)$, of the diffusion's stationary distribution, as $A\to+\infty$; the convergence is pointwise, for all $x\ge0$. The bounds also yield an explicit upper bound on the gap between $q_{A}(x)$ and $h(x)$ for fixed $x$. By virtue of integration, the bounds for the pdf $q_{A}(x)$ translate into new bounds for the corresponding cumulative distribution function (cdf), $Q_{A}(x)$. All of our results are established explicitly, using certain recently established monotonicity properties of the modified Bessel $K$ function involved in the exact closed-form formula for $q_{A}(x)$ obtained by Polunchenko (2017). We conclude with a discussion of potential applications of our results in quickest change-point detection: our bounds allow for a very accurate performance analysis of the so-called randomized Shiryaev--Roberts--Pollak change-point detection procedure.
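As a numerical illustration, the sketch below simulates the absorbed diffusion and compares the empirical law of the surviving paths at a large time with the stationary cdf. It assumes the standard form $dR_t = dt + \mu R_t\,dB_t$ of the Shiryaev--Roberts diffusion, for which the zero-flux Fokker--Planck equation gives the stationary cdf $H(x) = e^{-2/(\mu^2 x)}$; these modeling choices and all numerical parameters are assumptions made for the sketch.

```python
# A simulation sketch of quasi-stationarity, assuming dR_t = dt + mu*R_t dB_t:
# paths started in [0, A] are killed at A, and the empirical law of the
# survivors at a large time approximates q_A; the (unrestricted) stationary
# cdf H(x) = exp(-2/(mu^2 x)) is used for comparison.
import numpy as np

rng = np.random.default_rng(1)
mu, A, T, dt, n_paths = 1.0, 20.0, 20.0, 5e-3, 50_000

R = np.zeros(n_paths)                         # start all paths at R_0 = 0
alive = np.ones(n_paths, dtype=bool)
for _ in range(int(T / dt)):
    dB = rng.normal(scale=np.sqrt(dt), size=alive.sum())
    R[alive] += dt + mu * R[alive] * dB       # Euler-Maruyama step
    alive &= (R < A)                          # absorb at the boundary A

survivors = R[alive]
print("fraction of paths never absorbed:", alive.mean())

def stationary_cdf(x):                        # H(x) = P(R_infinity <= x)
    return np.exp(-2.0 / (mu**2 * x))

for x in (1.0, 3.0, 10.0):
    print(f"x={x:5.1f}   Q_A(x) ~ {np.mean(survivors <= x):.3f}   H(x) = {stationary_cdf(x):.3f}")
```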

We construct a graph with $n$ vertices where the smoothed runtime of the 3-FLIP algorithm for the 3-Opt Local Max-Cut problem can be as large as $2^{\Omega(\sqrt{n})}$. This provides the first example where a local search algorithm for the Max-Cut problem can fail to be efficient in the framework of smoothed analysis. We also give a new construction of graphs where the runtime of the FLIP algorithm for the Local Max-Cut problem is $2^{\Omega(n)}$ for any pivot rule. This graph is much smaller and has a simpler structure than previous constructions.
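For reference, the FLIP local search itself is simple to state: starting from an arbitrary cut, repeatedly move any single vertex whose move increases the cut weight, until no such vertex exists. The sketch below runs it on a random weighted complete graph with a first-improvement pivot rule; the point of the constructions above is precisely that adversarial (even smoothed) instances can force exponentially many flips, which a random instance like this one will not exhibit.

```python
# A minimal sketch of the FLIP local search for Max-Cut on a random weighted
# complete graph (illustrative; not one of the paper's adversarial instances).
import random

random.seed(0)
n = 30
w = {(i, j): random.random() for i in range(n) for j in range(i + 1, n)}
side = [random.choice((0, 1)) for _ in range(n)]           # initial cut

def gain(v):
    """Change in cut weight if vertex v switches sides."""
    g = 0.0
    for u in range(n):
        if u != v:
            wuv = w[(min(u, v), max(u, v))]
            g += wuv if side[u] == side[v] else -wuv       # edge enters / leaves the cut
    return g

flips = 0
while True:
    improving = [v for v in range(n) if gain(v) > 1e-12]
    if not improving:
        break                                              # local optimum reached
    v = improving[0]                                       # pivot rule: first improving vertex
    side[v] = 1 - side[v]
    flips += 1

cut = sum(w[i, j] for (i, j) in w if side[i] != side[j])
print("local optimum after", flips, "flips; cut weight =", round(cut, 3))
```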

We consider the performance of a least-squares regression model, as judged by out-of-sample $R^2$. Shapley values give a fair attribution of the performance of a model to its input features, taking into account interdependencies between features. Evaluating the Shapley values exactly requires solving a number of regression problems that is exponential in the number of features, so a Monte Carlo-type approximation is typically used. We focus on the special case of least-squares regression models, where several tricks can be used to compute and evaluate regression models efficiently. These tricks give a substantial speed up, allowing many more Monte Carlo samples to be evaluated, achieving better accuracy. We refer to our method as least-squares Shapley performance attribution (LS-SPA), and describe our open-source implementation.
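A plain, unoptimized Monte Carlo version of this attribution is easy to state: sample random feature permutations and credit each feature with its marginal change in out-of-sample $R^2$ when added to the features preceding it. The sketch below does exactly that on synthetic data; the data-generating model and the uncentered definition of $R^2$ are illustrative assumptions, and none of the LS-SPA speed-ups are implemented.

```python
# A plain Monte Carlo sketch of Shapley attribution of out-of-sample R^2 for
# least-squares regression (synthetic data; none of the LS-SPA speed-ups).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 500, 500, 5
beta = np.array([2.0, 1.0, 0.5, 0.0, 0.0])
X = rng.normal(size=(n_train + n_test, p))
y = X @ beta + rng.normal(size=n_train + n_test)
Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

def r2(features):
    """Out-of-sample R^2 of least squares restricted to the given feature subset."""
    cols = list(features)
    if not cols:
        return 0.0
    coef, *_ = np.linalg.lstsq(Xtr[:, cols], ytr, rcond=None)
    resid = yte - Xte[:, cols] @ coef
    return 1.0 - (resid @ resid) / (yte @ yte)

n_samples = 200
phi = np.zeros(p)
for _ in range(n_samples):
    perm = rng.permutation(p)
    prev, prefix = 0.0, []
    for j in perm:
        prefix.append(j)
        cur = r2(prefix)
        phi[j] += cur - prev                  # marginal contribution of feature j
        prev = cur
phi /= n_samples

print("Shapley attributions:", np.round(phi, 3))
print("sum of attributions :", round(phi.sum(), 3), "   full-model R^2:", round(r2(range(p)), 3))
```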

The Weisfeiler-Leman (WL) dimension of a graph parameter $f$ is the minimum $k$ such that, if $G_1$ and $G_2$ are indistinguishable by the $k$-dimensional WL-algorithm, then $f(G_1)=f(G_2)$. The WL-dimension of $f$ is $\infty$ if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers of a fixed conjunctive query in the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$. The works of Dvor\'ak (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (ArXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph of $\varphi$. In this work, we give a characterisation that applies to all conjunctive queries. Given any conjunctive query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathsf{sew}(\varphi)$, a novel width measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller than $\mathsf{sew}(\varphi)$.
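To make the WL side concrete, the sketch below runs the 1-dimensional WL algorithm (colour refinement) on the disjoint union of two graphs to test indistinguishability. The 6-cycle and the disjoint union of two triangles are 1-WL-indistinguishable, so any parameter of WL-dimension 1, such as the number of answers of the full conjunctive query $E(x,y) \wedge E(y,z)$ (whose Gaifman graph is a path of treewidth 1), must agree on them.

```python
# A minimal sketch of the 1-dimensional WL algorithm (colour refinement) run on
# the disjoint union of two graphs to test indistinguishability.  The 6-cycle and
# two disjoint triangles below are 1-WL-indistinguishable, so a parameter of
# WL-dimension 1 -- e.g. the number of answers of the full conjunctive query
# E(x,y) AND E(y,z) -- takes the same value on both.
from itertools import product

def wl1_indistinguishable(n1, edges1, n2, edges2, rounds=None):
    """Colour refinement on the disjoint union; compare the two colour histograms."""
    n = n1 + n2
    adj = {v: set() for v in range(n)}
    for u, v in edges1:
        adj[u].add(v); adj[v].add(u)
    for u, v in edges2:
        adj[u + n1].add(v + n1); adj[v + n1].add(u + n1)
    color = {v: 0 for v in range(n)}
    for _ in range(rounds or n):
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in range(n)}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: relabel[sig[v]] for v in range(n)}
    return sorted(color[v] for v in range(n1)) == sorted(color[v] for v in range(n1, n))

def count_answers_path2(n, edges):
    """Number of answers of E(x,y) AND E(y,z): homomorphisms of a 2-edge path."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    return sum(1 for x, y, z in product(range(n), repeat=3) if y in adj[x] and z in adj[y])

C6 = [(i, (i + 1) % 6) for i in range(6)]                  # 6-cycle
two_C3 = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]  # two disjoint triangles
print("1-WL indistinguishable:", wl1_indistinguishable(6, C6, 6, two_C3))
print("answers to the path query:", count_answers_path2(6, C6), count_answers_path2(6, two_C3))
```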

Existing contrastive learning methods rely on a pairwise sample contrast $z_x^\top z_{x'}$ to learn data representations, but the learned features often lack clear interpretability from a human perspective. Theoretically, such representations lack feature identifiability, and different initializations may lead to totally different features. In this paper, we study a new method named tri-factor contrastive learning (triCL) that involves a 3-factor contrast of the form $z_x^\top S z_{x'}$, where $S=\text{diag}(s_1,\dots,s_k)$ is a learnable diagonal matrix that automatically captures the importance of each feature. We show that by this simple extension, triCL can not only obtain identifiable features that eliminate randomness but also obtain more interpretable features that are ordered according to the importance matrix $S$. We show that features with high importance have good interpretability, capturing common classwise features, and yield superior performance when evaluated for image retrieval using only a few features. The proposed triCL objective is general and can be applied to different contrastive learning methods such as SimCLR and CLIP. We believe it is a better alternative to existing 2-factor contrastive learning, improving identifiability and interpretability with minimal overhead. Code is available at //github.com/PKU-ML/Tri-factor-Contrastive-Learning.
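A minimal sketch of the 3-factor contrast (not the released implementation at the link above) is given below: a learnable diagonal importance matrix $S$ is inserted between the two normalized embeddings inside an InfoNCE-style loss. The class name TriFactorHead, the toy linear encoder, and the random "views" are illustrative assumptions.

```python
# A minimal PyTorch sketch (not the released implementation) of the tri-factor
# contrast z_x^T S z_{x'}: a learnable diagonal importance matrix S replaces
# the plain inner product inside an InfoNCE-style loss.  TriFactorHead, the
# toy linear encoder, and the random "views" are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriFactorHead(nn.Module):
    def __init__(self, dim, temperature=0.5):
        super().__init__()
        self.log_s = nn.Parameter(torch.zeros(dim))    # S = diag(s_1, ..., s_k), s_i > 0
        self.temperature = temperature

    def forward(self, z1, z2):
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        S = torch.diag(self.log_s.exp())
        logits = (z1 @ S @ z2.T) / self.temperature    # 3-factor contrast z^T S z'
        labels = torch.arange(z1.size(0), device=z1.device)
        # Positives are the diagonal pairs (two views of the same sample).
        return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))

# Toy usage: random "two views" of a batch, a linear encoder, one gradient step.
torch.manual_seed(0)
encoder = nn.Linear(32, 16)
head = TriFactorHead(16)
opt = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=0.1)
x1, x2 = torch.randn(64, 32), torch.randn(64, 32)
opt.zero_grad()
loss = head(encoder(x1), encoder(x2))
loss.backward()
opt.step()
print("loss:", loss.item(), "  first importances:", head.log_s.exp().detach()[:4])
```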

Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied to different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and we obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and non-convex losses with additive noise, where we recover similar results to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalized to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds -- which might not be achieved for convex or non-convex losses unless additional noise is injected into the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.
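The stability quantity in question can also be probed numerically: the sketch below runs SGD with shared randomness on two datasets that differ in a single example and tracks the distance between the two iterate sequences. The strongly convex ridge loss and all parameters are illustrative choices; time-uniform stability corresponds to this distance not growing with the iteration count.

```python
# A small numerical sketch of algorithmic stability: run SGD with the same
# randomness on two datasets that differ in a single example and track the
# distance between the two iterate sequences (ridge loss, illustrative data).
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, lr, n_iters = 200, 10, 0.1, 0.05, 5000

X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=d), rng.normal()    # neighbouring dataset: one point replaced

def sgd_iterates(Xd, yd, seed):
    rs = np.random.default_rng(seed)               # shared seed => same sample indices
    w = np.zeros(d)
    out = []
    for _ in range(n_iters):
        i = rs.integers(n)
        grad = Xd[i] * (Xd[i] @ w - yd[i]) + lam * w   # strongly convex ridge loss
        w = w - lr * grad
        out.append(w.copy())
    return out

w1 = sgd_iterates(X, y, seed=42)
w2 = sgd_iterates(X2, y2, seed=42)
dist = [np.linalg.norm(a - b) for a, b in zip(w1, w2)]
print("iterate distance at steps 100 / 1000 / 5000:",
      round(dist[99], 4), round(dist[999], 4), round(dist[4999], 4))
```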

We present a structure-preserving Eulerian algorithm for solving $L^2$-gradient flows and a structure-preserving Lagrangian algorithm for solving generalized diffusions. Both algorithms employ neural networks as tools for spatial discretization. Unlike most existing methods that construct numerical discretizations based on the strong or weak form of the underlying PDE, the proposed schemes are constructed based on the energy-dissipation law directly. This guarantees the monotonic decay of the system's energy, which avoids unphysical states of solutions and is crucial for the long-term stability of numerical computations. To address challenges arising from nonlinear neural-network discretization, we first perform temporal discretization on these variational systems. This approach is computationally memory-efficient when implementing neural network-based algorithms. The proposed neural-network-based schemes are mesh-free, allowing us to solve gradient flows in high dimensions. Various numerical experiments are presented to demonstrate the accuracy and energy stability of the proposed numerical schemes.
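A finite-dimensional caricature of the "discretize the energy-dissipation law in time first" idea is the minimizing-movement step $x_{n+1} = \arg\min_x E(x) + \|x-x_n\|^2/(2\tau)$, which enforces monotone energy decay by construction since $x_n$ itself is admissible. The double-well energy, the finite-dimensional state, and the use of a generic optimizer in the sketch below are illustrative stand-ins for the paper's neural-network spatial discretization.

```python
# A finite-dimensional sketch of a scheme built from the energy-dissipation law:
# each step solves x_{n+1} = argmin_x E(x) + ||x - x_n||^2 / (2*tau), so the
# energy cannot increase (x_n itself is a feasible candidate).  The double-well
# energy is an illustrative stand-in for the paper's neural-network setting.
import numpy as np
from scipy.optimize import minimize

def E(x):                                          # a double-well energy
    return np.sum((x**2 - 1.0)**2) / 4.0

tau, n_steps = 0.1, 30
x = np.linspace(-0.4, 0.4, 8)                      # initial state
energies = [E(x)]
for _ in range(n_steps):
    obj = lambda z, xn=x: E(z) + np.sum((z - xn)**2) / (2.0 * tau)
    x = minimize(obj, x, method="BFGS").x          # implicit (variational) time step
    energies.append(E(x))

print("energy decays monotonically:", all(b <= a + 1e-10 for a, b in zip(energies, energies[1:])))
print("initial / final energy:", round(energies[0], 4), round(energies[-1], 6))
```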

We consider the problem of testing and learning quantum $k$-juntas: $n$-qubit unitary matrices which act non-trivially on just $k$ of the $n$ qubits and as the identity on the rest. As our main algorithmic results, we give (a) a $\widetilde{O}(\sqrt{k})$-query quantum algorithm that can distinguish quantum $k$-juntas from unitary matrices that are "far" from every quantum $k$-junta; and (b) a $O(4^k)$-query algorithm to learn quantum $k$-juntas. We complement our upper bounds for testing quantum $k$-juntas and learning quantum $k$-juntas with near-matching lower bounds of $\Omega(\sqrt{k})$ and $\Omega(\frac{4^k}{k})$, respectively. Our techniques are Fourier-analytic and make use of a notion of influence of qubits on unitaries.
