We study quantile trend filtering, a recently proposed method for nonparametric quantile regression, with the goal of generalizing the risk bounds known for the usual trend filtering estimators, which perform mean regression. We study both the penalized and the constrained versions (of order $r \geq 1$) of univariate quantile trend filtering. Our results show that both versions attain the minimax rate up to log factors when the $(r-1)$th discrete derivative of the true vector of quantiles belongs to the class of bounded variation signals. Moreover, we show that if the true vector of quantiles is a discrete spline with a few polynomial pieces, then both versions attain a near-parametric rate of convergence. Corresponding results for the usual trend filtering estimators are known to hold only when the errors are sub-Gaussian. In contrast, our risk bounds are shown to hold under minimal assumptions on the error variables. In particular, no moment assumptions are needed and our results hold under heavy-tailed errors. Our proof techniques are general and thus can potentially be used to study other nonparametric quantile regression methods. To illustrate this generality, we also employ our proof techniques to obtain new results for multivariate quantile total variation denoising and high-dimensional quantile linear regression.
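As a rough illustration of the penalized version described above, the following Python sketch evaluates one common form of the objective: the sum of pinball (check) losses at quantile level `tau`, plus `lam` times the $\ell_1$ norm of the $r$th-order discrete differences of the fitted vector. The function names, the tuning parameter `lam`, and the unnormalized penalty are assumptions made for illustration. The abstract does not state the exact scaling of the penalty, and this sketch only evaluates the objective rather than minimizing it.

```python
import numpy as np

def pinball_loss(residual, tau):
    """Elementwise quantile (check) loss: u * (tau - 1{u < 0})."""
    return residual * (tau - (residual < 0))

def quantile_trend_filtering_objective(theta, y, tau, r, lam):
    """Pinball-loss fit plus lam * ||D^(r) theta||_1 (assumed, unnormalized form).

    theta : candidate vector of quantiles, same length as y
    tau   : quantile level in (0, 1)
    r     : order of the discrete difference operator (r >= 1)
    lam   : hypothetical tuning parameter weighting the penalty
    """
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = pinball_loss(y - theta, tau).sum()
    # np.diff with n=r applies the first-difference operator r times.
    penalty = np.abs(np.diff(theta, n=r)).sum()
    return fit + lam * penalty
```

A vector whose $r$th differences vanish (for example, a linear sequence when $r = 2$) incurs no penalty, which is why discrete splines with few pieces are the favorable case for this kind of estimator.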

Related content

Estimation/Estimators

SGD · Unimodal · Stochastic gradient descent · Generalization theory · MoDELS

October 13, 2021

On the Double Descent of Random Features Models Trained with SGD

Fanghui Liu, Johan A. K. Suykens, Volkan Cevher

from arxiv, 37 pages, 2 figures

We study generalization properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD). In this regime, we derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings, and observe the double descent phenomenon both theoretically and empirically. Our analysis shows how to cope with multiple randomness sources of initialization, label noise, and data sampling (as well as stochastic gradients) with no closed-form solution, and also goes beyond the commonly used Gaussian/spherical data assumption. Our theoretical results demonstrate that, with SGD training, RF regression still generalizes well for interpolation learning, and is able to characterize the double descent behavior by the unimodality of variance and monotonic decrease of bias. We also prove that the constant step-size SGD setting incurs no loss in convergence rate when compared to the exact minimal-norm interpolator, as a theoretical justification of using SGD in practice.

Generalization theory · PDE · DRM · Optimizer · FAST

October 13, 2021

Machine Learning For Elliptic PDEs: Fast Rate Generalization Bound, Neural Scaling Law and Minimax Optimality

Yiping Lu, Haoxuan Chen, Jianfeng Lu, Lexing Ying, Jose Blanchet

In this paper, we study the statistical limits of deep learning techniques for solving elliptic partial differential equations (PDEs) from random samples using the Deep Ritz Method (DRM) and Physics-Informed Neural Networks (PINNs). To simplify the problem, we focus on a prototype elliptic PDE: the Schrödinger equation on a hypercube with zero Dirichlet boundary condition, which has wide applications in quantum-mechanical systems. We establish upper and lower bounds for both methods, which improve upon concurrently developed upper bounds for this problem via a fast rate generalization bound. We discover that the current Deep Ritz Method is sub-optimal and propose a modified version of it. We also prove that PINN and the modified version of DRM can achieve minimax optimal bounds over Sobolev spaces. Empirically, following recent work which has shown that deep model accuracy improves with growing training sets according to a power law, we supply computational experiments that show a similar dimension-dependent power-law behavior for deep PDE solvers.

Estimation/Estimators · Inference · Range · binary · CASES

October 13, 2021

Estimation and Inference of Extremal Quantile Treatment Effects for Heavy-Tailed Distributions

David Deuber, Jinzhou Li, Sebastian Engelke, Marloes H. Maathuis

Causal inference for extreme events has many potential applications in fields such as medicine, climate science and finance. We study the extremal quantile treatment effect of a binary treatment on a continuous, heavy-tailed outcome. Existing methods are limited to the case where the quantile of interest is within the range of the observations. For applications in risk assessment, however, the most relevant cases relate to extremal quantiles that go beyond the data range. We introduce an estimator of the extremal quantile treatment effect that relies on asymptotic tail approximations and uses a new causal Hill estimator for the extreme value indices of potential outcome distributions. We establish asymptotic normality of the estimators even in the setting of extremal quantiles, and we propose a consistent variance estimator to achieve valid statistical inference. In simulation studies we illustrate the advantages of our methodology over competitors, and we apply it to a real data set.

Factorized · Meta-learning · Rank · Suboptimal · Layer

October 12, 2021

Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty

Jeffrey Ryan Willette, Hae Beom Lee, Juho Lee, Sung Ju Hwang

Numerous recent works utilize bi-Lipschitz regularization of neural network layers to preserve relative distances between data instances in the feature spaces of each layer. This distance sensitivity with respect to the data aids in tasks such as uncertainty calibration and out-of-distribution (OOD) detection. In previous works, features extracted with a distance sensitive model are used to construct feature covariance matrices which are used in deterministic uncertainty estimation or OOD detection. However, in cases where there is a distribution over tasks, these methods result in covariances which are sub-optimal, as they may not leverage all of the meta information which can be shared among tasks. With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices. Additionally, we propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution which can better separate OOD data, and is well calibrated under a distributional dataset shift.

UniFormer · Layer · Lipschitz continuity · Discretization · Continuity

October 12, 2021

As Easy as ABC: Adaptive Binning Coincidence Test for Uniformity Testing

Sudeep Salgia, Qing Zhao, Lang Tong

We consider the problem of uniformity testing of Lipschitz continuous distributions with bounded support. The alternative hypothesis is a composite set of Lipschitz continuous distributions that are at least $\varepsilon$ away in $\ell_1$ distance from the uniform distribution. We propose a sequential test that adapts to the unknown distribution under the alternative hypothesis. Referred to as the Adaptive Binning Coincidence (ABC) test, the proposed strategy adapts in two ways. First, it partitions the set of alternative distributions into layers based on their distances to the uniform distribution. It then sequentially eliminates the alternative distributions layer by layer in decreasing distance to the uniform, and subsequently takes advantage of favorable situations of a distant alternative by exiting early. Second, it adapts, across layers of the alternative distributions, the resolution level of the discretization for computing the coincidence statistic. The farther away the layer is from the uniform, the coarser the discretization needed for eliminating/exiting this layer. It thus exits both early in the detection process and quickly by using a lower resolution to take advantage of favorable alternative distributions. The ABC test builds on a novel sequential coincidence test for discrete distributions, which is of independent interest. We establish the sample complexity of the proposed tests as well as a lower bound.

Generalization theory · Performer · Better · Test data · FAST

October 12, 2021

Domain Generalization via Domain-based Covariance Minimization

Anqi Wu

Researchers face a difficult problem: data generation mechanisms can be influenced by internal or external factors, so that training and test data follow quite different distributions. Consequently, traditional classification or regression fitted on the training set is unable to achieve satisfactory results on test data. In this paper, we address this nontrivial domain generalization problem by finding a central subspace in which domain-based covariance is minimized while the functional relationship is simultaneously maximally preserved. We propose a novel variance measurement for multiple domains so as to minimize the difference between conditional distributions across domains, with theoretical support. Meanwhile, the algorithm preserves the functional relationship by maximizing the variance of conditional expectations given the output. Furthermore, we provide a fast implementation that requires much less computation and memory for large-scale matrix operations, suitable not only for domain generalization but also for other kernel-based eigenvalue decompositions. To show the practicality of the proposed method, we compare it against some well-known dimension reduction and domain generalization techniques on both synthetic data and real-world applications. We show that for small-scale datasets, we are able to achieve better quantitative results, indicating better generalization performance on unseen test datasets. For large-scale problems, the proposed fast implementation maintains the quantitative performance at a substantially lower computational cost.

SGD · Linear regression · Linear · Optimizer · Stochastic gradient descent

October 12, 2021

Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression

Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

from arxiv, 40 pages, 2 figures

Stochastic gradient descent (SGD) has been demonstrated to generalize well in many deep learning applications. In practice, one often runs SGD with a geometrically decaying stepsize, i.e., a constant initial stepsize followed by multiple geometric stepsize decays, and uses the last iterate as the output. This kind of SGD is known to be nearly minimax optimal for classical finite-dimensional linear regression problems (Ge et al., 2019), and provably outperforms SGD with polynomially decaying stepsize in terms of the statistical minimax rates. However, a sharp analysis for the last iterate of SGD with decaying stepsize in the overparameterized setting is still open. In this paper, we provide problem-dependent analysis on the last iterate risk bounds of SGD with decaying stepsize, for (overparameterized) linear regression problems. In particular, for SGD with geometrically decaying stepsize (or tail geometrically decaying stepsize), we prove nearly matching upper and lower bounds on the excess risk. Our results demonstrate the generalization ability of SGD for a wide class of overparameterized problems, and can recover the minimax optimal results up to logarithmic factors in the classical regime. Moreover, we provide an excess risk lower bound for SGD with polynomially decaying stepsize and illustrate the advantage of geometrically decaying stepsize in an instance-wise manner, which complements the minimax rate comparison made in previous work.
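The stepsize scheme described in this abstract can be sketched in Python as follows: the stepsize is held constant within a phase and multiplied by a fixed factor between phases, and the final iterate is returned without averaging. The function names, the equal phase lengths, the halving factor, and the single pass over the data are all assumptions chosen for illustration; the abstract does not specify the paper's exact schedule.

```python
import numpy as np

def geometric_stepsizes(eta0, n_steps, n_phases, factor=0.5):
    """Piecewise-constant stepsizes: eta0 * factor**k during phase k (assumed schedule)."""
    phase_len = max(1, n_steps // n_phases)
    # Any leftover steps are assigned to the last phase.
    phases = np.minimum(np.arange(n_steps) // phase_len, n_phases - 1)
    return eta0 * factor ** phases

def sgd_last_iterate(X, y, eta0=0.05, n_phases=4, factor=0.5, seed=0):
    """One-pass SGD for least squares that returns the last iterate."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    etas = geometric_stepsizes(eta0, n, n_phases, factor)
    for t, i in enumerate(rng.permutation(n)):
        # Stochastic gradient of 0.5 * (x_i @ w - y_i)**2 with respect to w.
        grad = (X[i] @ w - y[i]) * X[i]
        w = w - etas[t] * grad
    return w
```

Returning `w` directly, rather than an average of the iterates, is what makes this a last-iterate scheme.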

Estimation/Estimators · Identifiable · Mean · Full · MoDELS

October 12, 2021

Identification and estimation of nonignorable missing outcome mean without identifying the full data distribution

Wei Li, Wang Miao, Eric Tchetgen Tchetgen

from arxiv, 37 pages, 1 figure and 2 tables

We consider the problem of making inference about the population outcome mean of an outcome variable subject to nonignorable missingness. By leveraging a so-called shadow variable for the outcome, we propose a novel condition that ensures nonparametric identification of the outcome mean, although the full data distribution is not identified. The identifying condition requires the existence of a function as a solution to a representer equation that connects the shadow variable to the outcome mean. Under this condition, we use sieves to nonparametrically solve the representer equation and propose an estimator which avoids modeling the propensity score or the outcome regression. We establish the asymptotic properties of the proposed estimator. We also show that the estimator is locally efficient and attains the semiparametric efficiency bound for the shadow variable model under certain regularity conditions. We illustrate the proposed approach via simulations and a real data application on home pricing.

Subspace · CASE · TCS · STOC · Better

October 12, 2021

List-decodable Codes and Covering Codes

Hao Chen

from arxiv, 41 pages, extended to other metrics, a generalized Singleton upper bound for average-radius list-decodable codes added, McEliece-Rodemich-Rumsey-Welch bound compared

The list-decodable code has been an active topic in theoretical computer science since the seminal papers of M. Sudan and V. Guruswami in 1997-1998. List-decodable codes are also considered in rank-metric, subspace metric, cover-metric, pair metric and insdel metric settings. In this paper we show that rates, list-decodable radius and list sizes are closely related to the classical topic of covering codes. We prove new general, simple but strong upper bounds for list-decodable codes in general finite metric spaces based on various covering codes of finite metric spaces. The general covering code upper bounds also apply when the volumes of the balls depend on the centers, not only on the radius. Then any good upper bound on the covering radius or the size of a covering code implies a good upper bound on the size of list-decodable codes. Our results give exponential improvements on the recent generalized Singleton upper bound in STOC 2020 for Hamming metric list-decodable codes, when the code lengths are large. Even for the list size $L=1$ case our covering code upper bounds give highly non-trivial upper bounds on the sizes of codes with the given minimum distance. The generalized Singleton upper bound for average-radius list-decodable codes is given. The asymptotic forms of covering code bounds can partially recover the Blinovsky bound and the combinatorial bound of Guruswami-Håstad-Sudan-Zuckerman in the Hamming metric setting. We also suggest studying combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes. We apply our general covering code upper bounds to list-decodable rank-metric codes, list-decodable subspace codes, list-decodable insertion codes and list-decodable deletion codes. Some new, improved results about non-list-decodability of rank-metric codes and subspace codes are obtained.

Performer · Estimation/Estimators · Empirical risk minimization · Empirical risk · Variance

December 14, 2017

Variance-based regularization with convex objectives

John Duchi, Hongseok Namkoong

We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.