We consider the problem of estimating the factors of a rank-$1$ matrix with i.i.d. Gaussian, rank-$1$ measurements that are nonlinearly transformed and corrupted by noise. Considering two prototypical choices for the nonlinearity, we study the convergence properties of a natural alternating update rule for this nonconvex optimization problem starting from a random initialization. We show sharp convergence guarantees for a sample-split version of the algorithm by deriving a deterministic recursion that is accurate even in high-dimensional problems. Notably, while the infinite-sample population update is uninformative and suggests exact recovery in a single step, the algorithm -- and our deterministic prediction -- converges geometrically fast from a random initialization. Our sharp, non-asymptotic analysis also exposes several other fine-grained properties of this problem, including how the nonlinearity and noise level affect convergence behavior. On a technical level, our results are enabled by showing that the empirical error recursion can be predicted by our deterministic sequence within fluctuations of the order $n^{-1/2}$ when each iteration is run with $n$ observations. Our technique leverages leave-one-out tools originating in the literature on high-dimensional $M$-estimation and provides an avenue for sharply analyzing higher-order iterative algorithms from a random initialization in other high-dimensional optimization problems with random data.

Related content

Random initialization

Functional · Specialization · Linear · Similarity · Binary

November 8, 2024

Some notes on the pseudorandomness of Legendre symbol and Liouville function

Johannes Grünberger, Arne Winterhof

We improve bounds on the degree and sparsity of Boolean functions representing the Legendre symbol as well as on the $N$th linear complexity of the Legendre sequence. We also prove similar results for both the Liouville function for integers and its analog for polynomials over $\mathbb{F}_2$, or, more generally, for any (binary) arithmetic function which satisfies $f(2n)=-f(n)$ for $n=1,2,\ldots$.
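For orientation, the two arithmetic functions named in this abstract can be computed directly. The sketch below is illustrative only: the function names are hypothetical, the Legendre symbol uses Euler's criterion, and the Liouville function uses plain trial division. It also checks the relation $f(2n)=-f(n)$ for the Liouville function on a small range.

```python
def legendre_symbol(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def liouville(n: int) -> int:
    """Liouville function: (-1)^Omega(n), where Omega(n) counts the
    prime factors of n with multiplicity (trial division, small n only)."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:  # leftover prime factor
        count += 1
    return -1 if count % 2 else 1


# The property f(2n) = -f(n) from the abstract, checked for the Liouville function.
property_holds = all(liouville(2 * n) == -liouville(n) for n in range(1, 201))
```

Doubling $n$ adds exactly one prime factor, which is why the sign flips.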

MoDELS · Learning · Functional · Sphering · Scenario

November 8, 2024

Learnability of high-dimensional targets by two-parameter models and gradient flow

Dmitry Yarotsky

from arxiv, Camera-ready NeurIPS 2024 version; some extra comments and figures

We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$. Our main result shows that if the targets are described by a particular $d$-dimensional probability distribution, then there exist models with as few as two parameters that can learn the targets with arbitrarily high success probability. On the other hand, we show that for $W<d$ there is necessarily a large subset of GF-non-learnable targets. In particular, the set of learnable targets is not dense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the $W$-dimensional sphere contains non-learnable targets. Finally, we observe that the model in our main theorem on almost guaranteed two-parameter learning is constructed using a hierarchical procedure and as a result is not expressible by a single elementary function. We show that this limitation is essential in the sense that most models written in terms of elementary functions cannot achieve the learnability demonstrated in this theorem.

Discretization · Performer · Reducible · Precision/Accuracy · Atom (text editor)

November 7, 2024

A high-order accurate moving mesh finite element method for the radial Kohn--Sham equation

Zheming Luo, Yang Kuang

In this paper, we introduce a highly accurate and efficient numerical solver for the radial Kohn--Sham equation. The equation is discretized using a high-order finite element method, with its performance further improved by incorporating a parameter-free moving mesh technique. This approach greatly reduces the number of elements required to achieve the desired precision. In practice, the mesh redistribution involves no more than three steps, ensuring the algorithm remains computationally efficient. Remarkably, with a maximum of $13$ elements, we successfully reproduce the NIST database results for elements with atomic numbers ranging from $1$ to $92$.

Functional · Representation theorem · Representation · Statistical theory

November 7, 2024

Maxitive functions with respect to general orders

M. Kupper, J. M. Zapata

In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.

Threshold · Approximation · Statistic · Sample · Controller

November 6, 2024

Improving the (approximate) sequential probability ratio test by avoiding overshoot

Lasse Fischer, Aaditya Ramdas

from arxiv, 23 pages, 7 figures

The sequential probability ratio test (SPRT) by Wald (1945) is a cornerstone of sequential analysis. Based on desired type I and type II error levels $\alpha, \beta \in (0,1)$, it stops when the likelihood ratio statistic crosses certain upper and lower thresholds, guaranteeing optimality of the expected sample size. However, these thresholds are not available in closed form and the test is often applied with approximate thresholds $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$ (approximate SPRT). When $\beta > 0$, this guarantees neither type I and type II error control at $\alpha,\beta$ nor optimality. When $\beta=0$ (power-one SPRT), it guarantees type I error control at $\alpha$ that is in general conservative, and thus not optimal. The looseness in both cases is caused by overshoot: the test statistic overshoots the thresholds at the stopping time. One standard way to address this is to calculate the right thresholds numerically, but many papers and software packages do not do this. In this paper, we describe a different way to improve the approximate SPRT: we change the test statistic to avoid overshoot. Our technique uniformly improves power-one SPRTs $(\beta=0)$ for simple nulls and alternatives, or for one-sided nulls and alternatives in exponential families. When $\beta > 0$, our techniques provide valid type I and type II error guarantees, while needing fewer samples than Wald's approximated thresholds in all considered simulations. These improved sequential tests can also be used for deriving tighter parametric confidence sequences, and can be extended to nontrivial settings like sampling without replacement and conformal martingales.
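The approximate SPRT that this abstract takes as its baseline can be sketched in a few lines. This is a minimal illustration, not the improved test from the paper: the Bernoulli model with simple null `p0` and alternative `p1`, the function name, and the return format are all assumptions made here. It uses Wald's approximate thresholds $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$ on the log scale and returns the final log-likelihood ratio so that the overshoot past the threshold is visible.

```python
import math
from typing import Iterable, Tuple


def approximate_sprt(
    xs: Iterable[int], p0: float, p1: float, alpha: float, beta: float
) -> Tuple[str, int, float]:
    """Approximate SPRT for Bernoulli observations (illustrative sketch).

    Returns (decision, number of samples used, final log-likelihood ratio).
    """
    log_upper = math.log((1 - beta) / alpha)
    # With beta = 0 (power-one SPRT) there is no lower stopping boundary.
    log_lower = math.log(beta / (1 - alpha)) if beta > 0 else -math.inf

    log_lr = 0.0
    n = 0
    for x in xs:
        n += 1
        # Log-likelihood ratio increment of one Bernoulli observation.
        log_lr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if log_lr >= log_upper:
            return "reject_null", n, log_lr
        if log_lr <= log_lower:
            return "accept_null", n, log_lr
    return "undecided", n, log_lr
```

Because the statistic moves in discrete jumps, it generally lands strictly beyond the threshold at the stopping time; that gap is the overshoot the abstract refers to.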

Mutually independent · Threshold · Identifiable · UniFormer · Uniform distribution

November 6, 2024

On the satisfiability of random $3$-SAT formulas with $k$-wise independent clauses

Ioannis Caragiannis, Nick Gravin, Zhile Jiang

from arxiv, 26 pages, 1 figure

The problem of identifying the satisfiability threshold of random $3$-SAT formulas has received a lot of attention during the last decades and has inspired the study of other threshold phenomena in random combinatorial structures. The classical assumption in this line of research is that, for a given set of $n$ Boolean variables, each clause is drawn uniformly at random among all sets of three literals from these variables, independently from other clauses. Here, we keep the uniform distribution of each clause, but deviate significantly from the independence assumption and consider richer families of probability distributions. For integer parameters $n$, $m$, and $k$, we denote by $\DistFamily_k(n,m)$ the family of probability distributions that produce formulas with $m$ clauses, each selected uniformly at random from all sets of three literals from the $n$ variables, so that the clauses are $k$-wise independent. Our aim is to make general statements about the satisfiability or unsatisfiability of formulas produced by distributions in $\DistFamily_k(n,m)$ for different values of the parameters $n$, $m$, and $k$.
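As a point of reference, the classical model with fully independent clauses can be sampled and checked by brute force as below. This sketch does not construct the $k$-wise independent families studied in the paper. It also assumes that each clause uses three distinct variables with independent random signs, which is one common reading of "sets of three literals". The function names are hypothetical.

```python
import itertools
import random
from typing import List, Tuple

Clause = Tuple[int, int, int]  # literal v > 0 means x_v, v < 0 means NOT x_|v|


def random_3sat(n: int, m: int, seed: int = 0) -> List[Clause]:
    """Draw m independent clauses over variables 1..n (classical model)."""
    rng = random.Random(seed)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)  # three distinct variables
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula


def is_satisfiable(formula: List[Clause], n: int) -> bool:
    """Brute-force satisfiability check; exponential in n, small n only."""
    for bits in itertools.product([False, True], repeat=n):
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in formula
        ):
            return True
    return False
```

Calling `is_satisfiable(random_3sat(n, m), n)` for growing ratios `m / n` on small `n` gives a rough feel for the threshold behaviour the abstract mentions.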

Stochastic gradient descent · Momentum · Scenario · Hyperparameter · Learning rate

November 6, 2024

Exponential convergence rates for momentum stochastic gradient descent in the overparametrized setting

Benjamin Gess, Sebastian Kassing

We prove explicit bounds on the exponential rate of convergence for the momentum stochastic gradient descent scheme (MSGD) for arbitrary, fixed hyperparameters (learning rate, friction parameter) and its continuous-in-time counterpart in the context of non-convex optimization. In the small step-size regime and in the case of flat minima or large noise intensities, these bounds prove faster convergence of MSGD compared to plain stochastic gradient descent (SGD). The results are shown for objective functions satisfying a local Polyak-Lojasiewicz inequality and under assumptions on the variance of MSGD that are satisfied in overparametrized settings. Moreover, we analyze the optimal choice of the friction parameter and show that the MSGD process almost surely converges to a local minimum.
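To make the setting concrete, here is a minimal heavy-ball sketch on the one-dimensional quadratic $f(x)=x^2/2$. Several things are assumptions for illustration and are not taken from the paper: the update form `v = momentum * v - lr * grad`, the multiplicative gradient noise (chosen so that the noise vanishes at the minimizer, loosely mimicking an overparametrized problem), and the function name. The paper's friction parametrization may differ.

```python
import random


def msgd_quadratic(
    x0: float,
    lr: float,
    momentum: float,
    steps: int,
    noise_scale: float = 0.0,
    seed: int = 0,
) -> float:
    """Heavy-ball momentum SGD on f(x) = x^2 / 2 (illustrative sketch).

    The stochastic gradient is x * (1 + noise_scale * xi) with xi uniform
    on [-1, 1], so the noise is zero exactly at the minimizer x = 0.
    """
    rng = random.Random(seed)
    x, v = x0, 0.0
    for _ in range(steps):
        grad = x * (1.0 + noise_scale * rng.uniform(-1.0, 1.0))
        v = momentum * v - lr * grad  # velocity update
        x = x + v                     # position update
    return x
```

With `noise_scale=0`, `lr=0.1` and `momentum=0.9`, the iterates follow a linear recursion whose roots have modulus $\sqrt{0.9}$, so the error decays exponentially.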

Statistic · MoDELS · Approximation · Class · Nonlinear model

November 6, 2024

An efficient scheme for approximating long-time dynamics of a class of non-linear models

Jack Coleman, Daozhi Han, Xiaoming Wang

We propose a novel, highly efficient, second-order accurate, long-time unconditionally stable numerical scheme for a class of finite-dimensional nonlinear models that are of importance in geophysical fluid dynamics. The scheme is highly efficient in the sense that only a (fixed) symmetric positive definite linear problem (with varying right-hand sides) is involved at each time step. The solutions to the scheme are uniformly bounded for all time. We show that the scheme is able to capture the long-time dynamics of the underlying geophysical model, with the global attractors as well as the invariant measures of the scheme converging to those of the original model as the step size approaches zero. In our numerical experiments, we take an indirect approach, using long-term statistics to approximate the invariant measures. Our results suggest that the convergence rate of the long-term statistics, as a function of terminal time, is approximately first order using the Jensen-Shannon metric and half order using the L1 metric. This implies that very long time simulation is needed in order to capture a few significant digits of long-time statistics (climate) correctly. Nevertheless, the second-order scheme's performance remains superior to that of the first-order one, requiring significantly less time to reach a small neighborhood of statistical equilibrium for a given step size.

Processing (programming language) · Optimizer · Functional · Class · Normalized

November 6, 2024

Asymptotically optimal Wasserstein couplings for the small-time stable domain of attraction

Jorge González Cázares, David Kramer-Bang, Aleksandar Mijatović

from arxiv, 42 pages, 2 figures, for a short YouTube video describing the results, see //youtu.be/76eJD6a8Kko?si=5OkdWw4AiNp0P1po

We develop two novel couplings between general pure-jump L\'evy processes in $\R^d$ and apply them to obtain upper bounds on the rate of convergence in an appropriate Wasserstein distance on the path space for a wide class of L\'evy processes attracted to a multidimensional stable process in the small-time regime. We also establish general lower bounds based on certain universal properties of slowly varying functions and the relationship between the Wasserstein and Toscani--Fourier distances of the marginals. Our upper and lower bounds typically have matching rates. In particular, the rate of convergence is polynomial for the domain of normal attraction and slower than a slowly varying function for the domain of non-normal attraction.

Estimation/Estimator · SimPLe · Approximation · Paper · Approximation error

November 6, 2024

Upper bound of high-order derivatives for Wachspress coordinates on polytopes

Pengjie Tian, Yanqiu Wang

The gradient bounds of generalized barycentric coordinates play an essential role in the $H^1$ norm approximation error estimate of generalized barycentric interpolations. Similarly, the $H^k$ norm, $k>1$, estimate needs upper bounds of high-order derivatives, which are not available in the literature. In this paper, we derive such upper bounds for the Wachspress generalized barycentric coordinates on simple convex $d$-dimensional polytopes, $d\ge 1$. The result can be used to prove optimal convergence for Wachspress-based polytopal finite element approximation of, for example, fourth-order elliptic equations. Another contribution of this paper is to compare various shape-regularity conditions for simple convex polytopes, and to clarify their relations using knowledge from convex geometry.