
In this paper, we provide three applications of $f$-divergences: (i) we introduce a Sanov-type upper bound on the tail probability of the sum of independent random variables based on super-modular $f$-divergences and show that our generalized Sanov bound strictly improves over the ordinary one; (ii) we consider the lossy compression problem, which studies the set of achievable rates for a given distortion and code length; we extend the rate-distortion function using mutual $f$-information and, using super-modular $f$-divergences, provide new and strictly better bounds on achievable rates in the finite blocklength regime; and (iii) we establish a connection between the generalization error of algorithms with bounded input/output mutual $f$-information and a generalized rate-distortion problem. This connection allows us to bound the generalization error of learning algorithms using lower bounds on the $f$-rate-distortion function. Our bound is based on a new lower bound on the rate-distortion function that (for some examples) strictly improves over the previously best-known bounds.
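As a point of reference, the classical Kullback-Leibler form of the Sanov-type tail bound that the abstract generalizes (the super-modular $f$-divergence version is not reproduced here) reads:

\[
  \Pr\!\Big(\tfrac{1}{n}\sum_{i=1}^{n} X_i \ge a\Big)
  \;\le\;
  \exp\!\Big(-n \inf_{Q:\, \mathbb{E}_Q[X] \ge a} D_{\mathrm{KL}}(Q \,\|\, P)\Big),
  \qquad X_i \overset{\text{i.i.d.}}{\sim} P,\quad a > \mathbb{E}_P[X].
\]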

Related Content

The generalization ability of a learning method, measured by its generalization error, is the ability of the model learned by that method to make predictions on unseen data; it is an essentially important property of a learning method. In practice, the most common approach is to evaluate a method's generalization ability by measuring its generalization error on test data. Generalization error bounds characterize the gap between the empirical risk and the expected risk of a learning algorithm, as well as the rate at which this gap converges. The generalization error of a machine learner can be viewed as a function describing how far a student machine remains from a teacher machine after learning from sample data.
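In the notation common in this literature, for an algorithm with output $W$ trained on a sample $S = (Z_1, \ldots, Z_n)$ drawn i.i.d. from $\mu$ with loss $\ell$, the generalization error is the expected gap between the population (expected) risk and the empirical risk:

\[
  \mathrm{gen}(\mu, W) \;:=\; \mathbb{E}_{S, W}\big[ L_\mu(W) - L_S(W) \big],
  \qquad
  L_\mu(w) = \mathbb{E}_{Z\sim\mu}[\ell(w, Z)],
  \qquad
  L_S(w) = \frac{1}{n}\sum_{i=1}^n \ell(w, Z_i).
\]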

The kernel-based method has been successfully applied to linear system identification using stable kernel designs. From a Gaussian process perspective, it automatically provides probabilistic error bounds for the identified models from the posterior covariance, which are useful in robust and stochastic control. However, these error bounds require knowledge of the true hyperparameters in the kernel design, and they are demonstrated to be inaccurate with estimated hyperparameters for lightly damped systems or in the presence of high noise. In this work, we provide reliable quantification of the estimation error when the hyperparameters are unknown. The bounds are obtained by first constructing a high-probability set for the true hyperparameters from the marginal likelihood function and then finding the worst-case posterior covariance within that set. The proposed bound is proven to contain the true model with high probability, and its validity is verified in numerical simulations.
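As background, a minimal sketch of the standard posterior-covariance error bound that this work starts from, under assumed choices (FIR model, TC kernel, known hyperparameters); the paper's robust bound for unknown hyperparameters is not reproduced here.

# Minimal sketch (assumptions: FIR model, known TC-kernel hyperparameters).
import numpy as np

rng = np.random.default_rng(0)
n, N, sigma2 = 20, 100, 0.1           # FIR order, data length, noise variance
g_true = 0.8 ** np.arange(n)          # a stable impulse response
u = rng.standard_normal(N + n)
Phi = np.column_stack([u[n - k : N + n - k] for k in range(1, n + 1)])
y = Phi @ g_true + np.sqrt(sigma2) * rng.standard_normal(N)

# TC (tuned/correlated) kernel prior: K[i, j] = c * lam**max(i, j)
c, lam = 1.0, 0.8
idx = np.arange(1, n + 1)
K = c * lam ** np.maximum.outer(idx, idx)

# GP posterior mean and covariance for g given y
S = Phi @ K @ Phi.T + sigma2 * np.eye(N)
g_hat = K @ Phi.T @ np.linalg.solve(S, y)
P = K - K @ Phi.T @ np.linalg.solve(S, Phi @ K)   # posterior covariance

# Pointwise ~95% error bound from the posterior covariance
bound = 2 * np.sqrt(np.diag(P))
print(np.max(np.abs(g_hat - g_true) / bound))     # typically below 1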

Our work explores the hardness of $3$SUM instances without certain additive structures, and its applications. As our main technical result, we show that solving $3$SUM on a size-$n$ integer set that avoids solutions to $a+b=c+d$ for $\{a, b\} \ne \{c, d\}$ still requires $n^{2-o(1)}$ time, under the $3$SUM hypothesis. Such sets are called Sidon sets and are well studied in the field of additive combinatorics.

- Combined with previous reductions, this implies that the All-Edges Sparse Triangle problem on $n$-vertex graphs with maximum degree $\sqrt{n}$ and at most $n^{k/2}$ $k$-cycles for every $k \ge 3$ requires $n^{2-o(1)}$ time, under the $3$SUM hypothesis. This strengthens the previous conditional lower bounds by Abboud, Bringmann, Khoury, and Zamir [STOC'22] for $4$-Cycle Enumeration, Offline Approximate Distance Oracles, and Approximate Dynamic Shortest Path. In particular, we show that no algorithm for the $4$-Cycle Enumeration problem on $n$-vertex, $m$-edge graphs with $n^{o(1)}$ delay has $O(n^{2-\varepsilon})$ or $O(m^{4/3-\varepsilon})$ pre-processing time for any $\varepsilon > 0$. We also present a matching upper bound via simple modifications of the known algorithms for $4$-Cycle Detection.
- A slight generalization of the main result also extends the result of Dudek, Gawrychowski, and Starikovskaya [STOC'20] on the $3$SUM hardness of nontrivial 3-Variate Linear Degeneracy Testing (3-LDTs): we show $3$SUM hardness for all nontrivial 4-LDTs.

The proof of our main technical result combines a wide range of tools: the Balog-Szemerédi-Gowers theorem, sparse convolution algorithms, and a new almost-linear hash function with an almost $3$-universal guarantee for integers that do not have small-coefficient linear relations.
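As a small illustration (not from the paper), a brute-force check of the Sidon property: a set is Sidon exactly when all sums of unordered pairs (with repetition) are distinct.

from itertools import combinations_with_replacement

def is_sidon(s: list[int]) -> bool:
    # Sidon: no a + b = c + d with {a, b} != {c, d}
    sums = [a + b for a, b in combinations_with_replacement(s, 2)]
    return len(sums) == len(set(sums))

print(is_sidon([1, 2, 5, 11]))   # True: a classical Sidon set
print(is_sidon([1, 2, 3, 4]))    # False: 1 + 4 == 2 + 3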

Given a $k\times n$ integer primitive matrix $\bf{A}$ (i.e., a matrix that can be extended to an $n\times n$ unimodular matrix over the integers) with the maximal absolute value of its entries $\|\bf{A}\|$ bounded from above by an integer $\lambda$, we study the probability that the $m\times n$ matrix obtained by appending to $\bf{A}$ another $m-k$ row vectors of dimension $n$, with entries chosen randomly and independently from the uniform distribution over $\{0, 1,\ldots, \lambda-1\}$, is still primitive. We present a complete and rigorous proof of a lower bound on this probability, which is at least a constant for fixed $m$ in the range $[k+1, n-4]$. As an application, we prove that there exists a fast Las Vegas algorithm that completes a $k\times n$ primitive matrix $\bf{A}$ to an $n\times n$ unimodular matrix within an expected $\tilde{O}(n^{\omega}\log \|\bf{A}\|)$ bit operations, where $\tilde{O}$ is big-$O$ with logarithmic factors suppressed and $\omega$ is the exponent of matrix multiplication.
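A hedged Monte Carlo sketch (not the paper's algorithm or proof) of the quantity being studied: a $k\times n$ integer matrix is primitive exactly when its Smith normal form has all $k$ diagonal invariant factors equal to $1$, so one can estimate the probability of staying primitive empirically.

import random
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

def is_primitive(A: Matrix) -> bool:
    # primitive iff every invariant factor of the SNF equals 1
    d = smith_normal_form(A, domain=ZZ)
    return all(abs(d[i, i]) == 1 for i in range(min(A.shape)))

k, m, n, lam, trials = 1, 3, 8, 10, 100   # m in [k+1, n-4] per the abstract
A = Matrix([[1] + [0] * (n - 1)])          # a trivially primitive 1 x n row
hits = 0
for _ in range(trials):
    rows = [[random.randrange(lam) for _ in range(n)] for _ in range(m - k)]
    if is_primitive(A.col_join(Matrix(rows))):
        hits += 1
print(hits / trials)    # empirical probability of staying primitive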

The matrix sensing problem is an important low-rank optimization problem that has found a wide range of applications, such as matrix completion, phase synchronization/retrieval, robust PCA, and power system state estimation. In this work, we focus on the general matrix sensing problem with linear measurements that are corrupted by random noise. We investigate the scenario where the search rank $r$ is equal to the true rank $r^*$ of the unknown ground truth (the exactly parametrized case), as well as the scenario where $r$ is greater than $r^*$ (the overparametrized case). We quantify the role of the restricted isometry property (RIP) in shaping the landscape of the non-convex factorized formulation and assisting with the success of local search algorithms. First, we develop a global guarantee on the maximum distance between an arbitrary local minimizer of the non-convex problem and the ground truth under the assumption that the RIP constant is smaller than $1/(1+\sqrt{r^*/r})$. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. More importantly, we prove that this noisy, overparametrized problem exhibits the strict saddle property, which leads to the global convergence of the perturbed gradient descent algorithm in polynomial time. The results of this work provide a comprehensive understanding of the geometric landscape of the matrix sensing problem in the noisy and overparametrized regime.
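A minimal sketch of the non-convex factorized formulation under an assumed Gaussian measurement model (not the paper's experiments): minimize $\sum_i (\langle A_i, UU^T\rangle - y_i)^2$ with $y_i = \langle A_i, M^*\rangle + \text{noise}$ and $r > r^*$, using plain gradient descent (the paper analyzes the perturbed variant).

import numpy as np

rng = np.random.default_rng(1)
n, r_star, r, m = 10, 1, 2, 200                 # r > r*: overparametrized
U_star = rng.standard_normal((n, r_star))
M_star = U_star @ U_star.T                      # rank-r* ground truth
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2              # symmetric sensing matrices
y = np.einsum('ijk,jk->i', A, M_star) + 0.01 * rng.standard_normal(m)

U = 0.1 * rng.standard_normal((n, r))           # small random initialization
eta = 0.005 / m
for _ in range(2000):
    resid = np.einsum('ijk,jk->i', A, U @ U.T) - y
    U -= eta * 4 * np.einsum('i,ijk->jk', resid, A) @ U   # gradient step
print(np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star))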

(Stochastic) bilevel optimization is a frequently encountered problem in machine learning, with a wide range of applications such as meta-learning, hyper-parameter optimization, and reinforcement learning. Most existing studies of this problem focus only on analyzing the convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization behavior. In this paper, we conduct a thorough analysis of the generalization of first-order (gradient-based) methods for the bilevel optimization problem. We first establish a fundamental connection between algorithmic stability and generalization error in different forms, and give a high-probability generalization bound that improves the previous best one from $O(\sqrt{n})$ to $O(\log n)$, where $n$ is the sample size. We then provide the first stability bounds for the general case where both inner and outer level parameters are subject to continuous updates, whereas existing work allows only the outer level parameter to be updated. Our analysis can be applied in various standard settings such as strongly-convex-strongly-convex (SC-SC), convex-convex (C-C), and nonconvex-nonconvex (NC-NC). Our analysis for the NC-NC setting can also be extended to a particular nonconvex-strongly-convex (NC-SC) setting that is commonly encountered in practice. Finally, we corroborate our theoretical analysis and demonstrate how iterations affect the generalization error through experiments on meta-learning and hyper-parameter optimization.
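As a toy illustration of the bilevel structure (a hypothetical ridge hyperparameter example, not the paper's setting): the inner problem has a closed-form solution, and the outer (hyper)gradient follows from implicit differentiation of the inner optimality condition.

import numpy as np

rng = np.random.default_rng(2)
d = 5
X_tr, X_val = rng.standard_normal((40, d)), rng.standard_normal((40, d))
w_true = rng.standard_normal(d)
y_tr = X_tr @ w_true + 0.5 * rng.standard_normal(40)
y_val = X_val @ w_true + 0.5 * rng.standard_normal(40)

lam, lr = 1.0, 0.05
for _ in range(100):
    # inner: w*(lam) = argmin_w ||X_tr w - y_tr||^2 + lam ||w||^2
    H = X_tr.T @ X_tr + lam * np.eye(d)
    w = np.linalg.solve(H, X_tr.T @ y_tr)
    # outer: validation loss, differentiated through w*(lam)
    grad_w_outer = 2 * X_val.T @ (X_val @ w - y_val)
    dw_dlam = -np.linalg.solve(H, w)      # implicit differentiation of H w = X'y
    hypergrad = grad_w_outer @ dw_dlam
    lam = max(lam - lr * hypergrad, 1e-6) # projected outer gradient step
print(lam)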

Coverings of convex bodies have emerged as a central component in the design of efficient solutions to approximation problems involving convex bodies. Intuitively, given a convex body $K$ and $\epsilon > 0$, a covering is a collection of convex bodies whose union covers $K$ such that a constant factor expansion of each body lies within an $\epsilon$ expansion of $K$. Coverings have been employed in many applications, such as approximations for diameter, width, and $\epsilon$-kernels of point sets, approximate nearest neighbor searching, polytope approximations, and approximations to the Closest Vector Problem (CVP). It is known how to construct coverings of size $n^{O(n)} / \epsilon^{(n-1)/2}$ for general convex bodies in $\textbf{R}^n$. In special cases, such as when the convex body is the $\ell_p$ unit ball, this bound has been improved to $2^{O(n)} / \epsilon^{(n-1)/2}$. This raises the question of whether such a bound holds in general. In this paper, we answer the question in the affirmative. We demonstrate the power and versatility of our coverings by applying them to the problem of approximating a convex body by a polytope, under the Banach-Mazur metric. Given a well-centered convex body $K$ and an approximation parameter $\epsilon > 0$, we show that there exists a polytope $P$ consisting of $2^{O(n)} / \epsilon^{(n-1)/2}$ vertices (facets) such that $K \subset P \subset K(1+\epsilon)$. This bound is optimal in the worst case up to factors of $2^{O(n)}$. As an additional consequence, we obtain the fastest $(1+\epsilon)$-approximate CVP algorithm that works in any norm, with a running time of $2^{O(n)} / \epsilon ^{(n-1)/2}$ up to polynomial factors in the input size, and we obtain the fastest $(1+\epsilon)$-approximation algorithm for integer programming. We also present a framework for constructing coverings of optimal size for any convex body (up to factors of $2^{O(n)}$).

The Sinc approximation applied to double-exponentially decaying functions is referred to as the DE-Sinc approximation. This approximation has been utilized in many applications because of its high efficiency. The mesh size and truncation numbers of the Sinc approximation should be selected optimally to attain its full performance. However, the usual selection is only ``near-optimal,'' because the optimal relation between the two cannot be expressed in terms of elementary functions. In this study, we propose two improved selection formulas. The first is based on an idea from earlier research that produced an improved selection formula for the double-exponential formula; it performs better than the usual one, but is still not optimal. For the second formula, we introduce a new parameter to obtain a truly optimal relation between the two. We give explicit error bounds for both formulas. Numerical comparisons show that the first formula gives a better error bound than the standard formula, and the second formula gives a far better error bound than both the standard and first formulas.
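A hedged numerical sketch of the DE-Sinc approximation on $(-1, 1)$: transform $x = \tanh((\pi/2)\sinh t)$ and apply the Sinc expansion to $F(t) = f(x(t))$. The mesh size below uses one commonly cited near-optimal choice, $h = \log(\pi N)/N$ (constants depend on the function's decay; the paper's improved selection formulas are not reproduced here).

import numpy as np

f = lambda x: 1.0 / (1.0 + x**2)
N = 30
h = np.log(np.pi * N) / N
k = np.arange(-N, N + 1)
x_k = np.tanh(0.5 * np.pi * np.sinh(k * h))     # DE-transformed sample points

def de_sinc(x):
    t = np.arcsinh(2.0 / np.pi * np.arctanh(x)) # inverse DE transform
    # np.sinc(z) = sin(pi z)/(pi z), exactly the Sinc basis S(k, h)(t)
    return np.sinc(t[:, None] / h - k[None, :]) @ f(x_k)

x = np.linspace(-0.999, 0.999, 1001)
print(np.max(np.abs(de_sinc(x) - f(x))))        # max error on a fine grid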

Model parameter regularization is a widely used technique to improve generalization, but it can also be used to shape the weight distribution for various purposes. In this work, we shed light on how weight regularization can assist model quantization and compression techniques, and then propose range regularization (R^2) to further boost the quality of model optimization by focusing on outlier prevention. By effectively regulating the minimum and maximum weight values of a distribution, we mold the overall distribution into a tight shape so that model compression and quantization techniques can better utilize their limited numeric representation power. We introduce L-inf regularization, its extension margin regularization, and a new soft-min-max regularization, to be used as a regularization loss during full-precision model training. Coupled with state-of-the-art quantization and compression techniques, models trained with R^2 perform better on average, specifically at lower bit weights with a 16x compression ratio. We also demonstrate that R^2 helps parameter-constrained models like MobileNetV1 achieve significant improvements of around 8% for 2-bit quantization and 7% for 1-bit compression.
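Hedged sketches of two of the regularizers named above, assuming plausible forms (the paper's exact definitions may differ): a direct L-inf penalty on the largest-magnitude weight, and a "soft-min-max" variant that replaces the hard max/min of the weight range with logsumexp for smoother gradients.

import torch

def linf_reg(w: torch.Tensor) -> torch.Tensor:
    # penalize the largest-magnitude weight (the outlier) directly
    return w.abs().max()

def soft_min_max_reg(w: torch.Tensor, T: float = 0.01) -> torch.Tensor:
    # smooth surrogate for the weight range max(w) - min(w)
    soft_max = T * torch.logsumexp(w / T, dim=0)
    soft_min = -T * torch.logsumexp(-w / T, dim=0)
    return soft_max - soft_min

w = torch.randn(1000, requires_grad=True)
task_loss = (w ** 2).mean()                     # stand-in for the task loss
loss = task_loss + 1e-2 * soft_min_max_reg(w)   # R^2 added as a reg. loss
loss.backward()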

The Chebyshev or $\ell_{\infty}$ estimator is an unconventional alternative to ordinary least squares for solving linear regressions. It is defined as the minimizer of the $\ell_{\infty}$ objective function \begin{align*} \hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}} \|\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\|_{\infty}. \end{align*} The asymptotic distribution of the Chebyshev estimator under a fixed number of covariates was recently studied (Knight, 2020), yet finite-sample guarantees and generalizations to high-dimensional settings remain open. In this paper, we develop non-asymptotic upper bounds on the estimation error $\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}^*\|_2$ for a Chebyshev estimator $\hat{\boldsymbol{\beta}}$ in a regression setting with uniformly distributed noise $\varepsilon_i\sim U([-a,a])$, where $a$ is either known or unknown. Under relatively mild assumptions on the (random) design matrix $\mathbf{X}$, we can bound the error rate by $\frac{C_p}{n}$ with high probability, for some constant $C_p$ depending on the dimension $p$ and the law of the design. Furthermore, we illustrate that there exist designs for which the Chebyshev estimator is (nearly) minimax optimal. On the other hand, we also argue that there exist designs for which this estimator behaves sub-optimally in terms of the constant $C_p$'s dependence on $p$. In addition, we show that "Chebyshev's LASSO" has advantages over the regular LASSO in high-dimensional situations, provided that the noise is uniform. Specifically, we argue that it achieves a much faster rate of estimation under certain assumptions on the growth rate of the sparsity level and the ambient dimension with respect to the sample size.
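A hedged sketch (standard fact, not the paper's code): the Chebyshev estimator is a linear program, $\min_{\boldsymbol{\beta}, t} t$ subject to $-t \le y_i - \boldsymbol{x}_i^{\top}\boldsymbol{\beta} \le t$, solvable with scipy.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, p, a = 200, 5, 1.0
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p)
y = X @ beta_star + rng.uniform(-a, a, size=n)   # uniform noise U([-a, a])

# variables z = (beta, t); minimize t subject to |y - X beta| <= t
c = np.r_[np.zeros(p), 1.0]
A_ub = np.block([[X, -np.ones((n, 1))], [-X, -np.ones((n, 1))]])
b_ub = np.r_[y, -y]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * p + [(0, None)])
beta_hat = res.x[:p]
print(np.linalg.norm(beta_hat - beta_star))      # error scales like C_p / n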

Diffusion models are a class of deep generative models that have shown impressive results on various tasks, with a solid theoretical foundation. Although diffusion models achieve higher quality and diversity of sample synthesis than other state-of-the-art models, they still suffer from a costly sampling procedure and sub-optimal likelihood estimation. Recent studies have shown great enthusiasm for improving the performance of diffusion models. In this article, we present a first comprehensive review of existing variants of diffusion models. Specifically, we provide a first taxonomy of diffusion models and categorize the variants into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce in detail the other five generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models) and clarify the connections between diffusion models and these models. We then make a thorough investigation of the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of this class of generative models.
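For readers new to the area, the standard forward (noising) and learned reverse processes of a denoising diffusion probabilistic model, in common notation (background reference only, not specific to this survey), are:

\begin{align*}
  q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right), \\
  p_\theta(x_{t-1} \mid x_t) &= \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right).
\end{align*}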
