Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. sparse hard-margin SVM, in high-dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard sparse ground truths. We show that, surprisingly, this adaptivity does not apply to the maximum $\ell_1$-margin classifier in a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match existing rates of order $\frac{\|w^*\|_1^{2/3}}{n^{1/3}}$ for general ground truths. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order $\frac{1}{\sqrt{\log(d/n)}}$. We are therefore the first to show benign overfitting for the maximum $\ell_1$-margin classifier.
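A minimal sketch of the object analysed above: the maximum $\ell_1$-margin classifier can be computed directly as a linear program (maximize the margin subject to $\|w\|_1 \leq 1$), rather than as the limit of boosting or coordinate descent. The synthetic data and the use of scipy.optimize.linprog are illustrative assumptions, not the paper's experiments.

```python
# Minimal sketch: the maximum l1-margin classifier
#   max_w  min_i y_i <w, x_i>   s.t.  ||w||_1 <= 1
# written as a linear program via the split w = w_plus - w_minus.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 40, 200                                   # high-dimensional, separable regime
w_star = np.zeros(d); w_star[:5] = 1.0           # sparse ground truth (illustrative)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

# Variables z = [w_plus (d), w_minus (d), t (1)]; maximize t <=> minimize -t.
c = np.concatenate([np.zeros(2 * d), [-1.0]])
# Margin constraints:  t - y_i x_i^T (w_plus - w_minus) <= 0
A_margin = np.hstack([-y[:, None] * X, y[:, None] * X, np.ones((n, 1))])
# l1 constraint:  sum(w_plus) + sum(w_minus) <= 1
A_l1 = np.concatenate([np.ones(2 * d), [0.0]])[None, :]
A_ub = np.vstack([A_margin, A_l1])
b_ub = np.concatenate([np.zeros(n), [1.0]])
bounds = [(0, None)] * (2 * d) + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w_hat = res.x[:d] - res.x[d:2 * d]
print("l1-margin:", -res.fun, " nonzeros in w_hat:", np.sum(np.abs(w_hat) > 1e-8))
```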
We consider the reinforcement learning (RL) setting, in which the agent has to act in an unknown environment driven by a Markov Decision Process (MDP) with sparse or even reward-free signals. In this situation, exploration becomes the main challenge. In this work, we study two types of maximum entropy exploration problems. The first type is visitation entropy maximization, previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose an algorithm based on a game-theoretic representation that has $\widetilde{\mathcal{O}}(H^3 S^2 A / \varepsilon^2)$ sample complexity, thus improving the $\varepsilon$-dependence of Hazan et al. (2019), where $S$ is the number of states, $A$ is the number of actions, $H$ is the episode length, and $\varepsilon$ is the desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to entropy-regularized MDPs, and we propose a simple modification of the UCBVI algorithm that has a sample complexity of order $\widetilde{\mathcal{O}}(1/\varepsilon)$, ignoring the dependence on $S$, $A$, and $H$. Interestingly, this is the first theoretical result in the RL literature establishing that the exploration problem for regularized MDPs can be statistically strictly easier (in terms of sample complexity) than for ordinary MDPs.
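To make the visitation-entropy objective concrete, here is a small sketch that evaluates the entropy of the average state-visitation distribution of a fixed policy in a tabular, finite-horizon MDP. The particular entropy definition and the random MDP are assumptions for illustration; the paper's game-theoretic maximization algorithm is not reproduced.

```python
# Minimal sketch (assumed setting): entropy of the average state-visitation
# distribution of a fixed policy in a tabular, finite-horizon MDP.
import numpy as np

S, A, H = 5, 3, 10
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(H, S, A))   # P[h, s, a] = next-state distribution
pi = rng.dirichlet(np.ones(A), size=(H, S))     # pi[h, s]   = action distribution

d = np.zeros((H, S))
d[0, 0] = 1.0                                    # deterministic initial state
for h in range(H - 1):
    # Marginalize over actions, then push forward through the dynamics.
    P_pi = np.einsum("sa,san->sn", pi[h], P[h])  # state-to-state kernel under pi
    d[h + 1] = d[h] @ P_pi

d_avg = d.mean(axis=0)
visitation_entropy = -np.sum(d_avg * np.log(d_avg + 1e-12))
print("visitation entropy of the policy:", visitation_entropy)
```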
This paper investigates the spectral norm version of the column subset selection problem. Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a positive integer $k\leq\text{rank}(\mathbf{A})$, the objective is to select exactly $k$ columns of $\mathbf{A}$ that minimize the spectral norm of the residual matrix after projecting $\mathbf{A}$ onto the space spanned by the selected columns. We use the method of interlacing polynomials introduced by Marcus-Spielman-Srivastava to derive an asymptotically sharp upper bound on the minimal approximation error, and propose a deterministic polynomial-time algorithm that achieves this error bound (up to a computational error). Furthermore, we extend our result to a column partition problem in which the columns of $\mathbf{A}$ can be partitioned into $r\geq 2$ subsets such that $\mathbf{A}$ can be well approximated by subsets from various groups. We show that the machinery of interlacing polynomials also works in this context, and establish a connection between the relevant expected characteristic polynomials and the $r$-characteristic polynomials introduced by Ravichandran and Leake. As a consequence, we prove that the columns of a rank-$d$ matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ can be partitioned into $r$ subsets $S_1,\ldots,S_r$, such that the column space of $\mathbf{A}$ can be well approximated by the span of the columns in the complement of $S_i$ for each $1\leq i\leq r$.
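The following sketch only illustrates the objective being minimized: the spectral norm of the residual after projecting $\mathbf{A}$ onto a chosen set of $k$ columns, evaluated by brute force on a tiny matrix. It is not the paper's interlacing-polynomial algorithm.

```python
# Minimal sketch: the spectral-norm column subset selection objective,
#   min_{|S|=k} || A - P_S A ||_2 ,  P_S = projection onto span of columns in S,
# evaluated by brute force on a tiny example.
import itertools
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
k = 2

def residual_spectral_norm(A, cols):
    Q, _ = np.linalg.qr(A[:, cols])          # orthonormal basis for selected columns
    R = A - Q @ (Q.T @ A)                    # residual after projection
    return np.linalg.norm(R, 2)              # spectral norm

best = min(itertools.combinations(range(A.shape[1]), k),
           key=lambda cols: residual_spectral_norm(A, list(cols)))
print("best columns:", best, "error:", residual_spectral_norm(A, list(best)))
```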
Bayes factors for composite hypotheses have difficulty in encoding vague prior knowledge, as improper priors cannot be used and objective priors may be subjectively unreasonable. To address these issues we revisit the posterior Bayes factor, in which the posterior distribution from the data at hand is re-used in the Bayes factor for the same data. We argue that this is biased when calibrated against proper Bayes factors, but propose adjustments to allow interpretation on the same scale. In the important case of a regular normal model, the bias in log scale is half the number of parameters. The resulting empirical Bayes factor is closely related to the widely applicable information criterion. We develop test-based empirical Bayes factors for several standard tests and propose an extension to multiple testing closely related to the optimal discovery procedure. For non-parametric tests the empirical Bayes factor is approximately 10 times the P-value. We propose interpreting the strength of Bayes factors on a logarithmic scale with base 3.73, reflecting the sharpest distinction between weaker and stronger belief. This provides an objective framework for interpreting statistical evidence, realising a Bayesian/frequentist compromise.
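A small sketch tied to two statements in the abstract: for non-parametric tests the empirical Bayes factor is approximately 10 times the P-value, and evidence is interpreted on a logarithmic scale with base 3.73. Reading the "10 times the P-value" factor as favouring the null, and counting evidence simply as base-3.73 steps, are our assumptions for illustration.

```python
# Minimal sketch, assuming the 10 x P-value empirical Bayes factor favours the null
# and that evidence strength is counted in base-3.73 logarithmic steps.
import math

def approx_ebf(p_value):
    # Abstract: empirical Bayes factor ~ 10 * P-value for non-parametric tests.
    return 10.0 * p_value

def steps_base_3_73(bf):
    # Strength of evidence measured in base-3.73 logarithmic steps.
    return math.log(bf, 3.73)

for p in (0.05, 0.01, 0.001):
    bf = approx_ebf(p)
    print(f"p = {p:5.3f}  ->  EBF ~ {bf:6.3f}  ({steps_base_3_73(bf):+.1f} steps of base 3.73)")
```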
The network scale-up method (NSUM) is a cost-effective approach to estimating the size or prevalence of a group of people that is hard to reach through a standard survey. The basic NSUM involves two steps: estimating respondents' degrees by one of various methods (in this paper we focus on the probe group method, which uses the number of people a respondent knows in various groups of known size), and estimating the prevalence of the hard-to-reach population of interest using respondents' estimated degrees and the number of people they report knowing in the hard-to-reach group. Each of these two steps involves taking either an average of ratios or a ratio of averages. Using the ratio of averages for each step has so far been the most common approach. However, we present theoretical arguments that using the average of ratios at the second, prevalence-estimation step often has lower mean squared error when a main model assumption is violated, which happens frequently in practice; this estimator, which uses the ratio of averages for degree estimation and the average of ratios for prevalence estimation, was proposed early in NSUM development but has remained largely unexplored and unused. Simulation results using an example network data set also support these findings. Based on this theoretical and empirical evidence, we suggest that future surveys that use a simple estimator may want to use this mixed estimator, and that estimation methods built on it may produce further improvements.
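A minimal sketch of the two NSUM steps described above on made-up survey reports: degrees estimated with the probe group method (ratio of averages), then prevalence computed both as a ratio of averages and as the mixed estimator's average of ratios. Population size, probe group sizes, and reports are all illustrative.

```python
# Minimal sketch of the basic NSUM pipeline with illustrative numbers.
import numpy as np

N = 1_000_000                                   # total population size (illustrative)
probe_sizes = np.array([5_000, 20_000, 50_000]) # known sizes of the probe groups
# y_probe[i, k] = number of people respondent i reports knowing in probe group k
y_probe = np.array([[1, 4, 9],
                    [0, 2, 6],
                    [2, 5, 12],
                    [1, 3, 7]])
# y_hidden[i] = number respondent i reports knowing in the hard-to-reach group
y_hidden = np.array([2, 1, 3, 1])

# Step 1: degree estimates via ratio of averages (probe group method).
d_hat = N * y_probe.sum(axis=1) / probe_sizes.sum()

# Step 2a: prevalence via ratio of averages (the traditional choice).
prev_roa = y_hidden.sum() / d_hat.sum()
# Step 2b: prevalence via average of ratios (the mixed estimator discussed above).
prev_aor = np.mean(y_hidden / d_hat)

print("prevalence, ratio of averages :", prev_roa)
print("prevalence, average of ratios :", prev_aor)
print("hidden-population size (mixed):", prev_aor * N)
```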
We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results, which focus on final-iterate lower bounds in terms of the number of components $n$ and the number of epochs $K$, we seek bounds for arbitrary weighted average iterates that are tight in all factors, including the condition number $\kappa$. For SGD with Random Reshuffling, we present lower bounds that have tighter $\kappa$ dependencies than existing bounds. Our results are the first to perfectly close the gap between lower and upper bounds for weighted average iterates in both the strongly-convex and convex cases. We also prove weighted average iterate lower bounds for arbitrary permutation-based SGD, which apply to all variants that carefully choose the best permutation. Our bounds improve the existing bounds by factors of $n$ and $\kappa$ and thereby match the upper bounds shown for a recently proposed algorithm called GraB.
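The following sketch shows the algorithmic object the bounds concern: SGD with Random Reshuffling on a least-squares finite sum, returning a weighted average iterate. The least-squares components, step size, and uniform weighting are illustrative assumptions; the paper studies lower bounds, not this particular run.

```python
# Minimal sketch: Random Reshuffling SGD on a finite sum of least-squares
# components, tracking a (here uniformly) weighted average iterate.
import numpy as np

rng = np.random.default_rng(3)
n, d, K, lr = 50, 10, 20, 0.01
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
# f(x) = (1/n) sum_i f_i(x),  f_i(x) = 0.5 * (a_i^T x - b_i)^2

x = np.zeros(d)
avg, weight_sum = np.zeros(d), 0.0
for epoch in range(K):
    perm = rng.permutation(n)                    # reshuffle each epoch
    for i in perm:
        grad_i = (A[i] @ x - b[i]) * A[i]        # gradient of the i-th component
        x -= lr * grad_i
        w = 1.0                                   # uniform weights; any schedule works
        avg = (weight_sum * avg + w * x) / (weight_sum + w)
        weight_sum += w

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("suboptimality of weighted average iterate:",
      0.5 * np.mean((A @ avg - b) ** 2) - 0.5 * np.mean((A @ x_star - b) ** 2))
```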
Sparse structure learning in high-dimensional Gaussian graphical models is an important problem in multivariate statistical signal processing, since the sparsity pattern naturally encodes the conditional independence relationships among variables. However, maximum a posteriori (MAP) estimation is challenging if the prior model admits multiple levels of hierarchy, and traditional numerical optimization routines or expectation--maximization algorithms are difficult to implement. To this end, our contribution is a novel local linear approximation scheme that circumvents this issue using a very simple computational algorithm. Most importantly, the conditions under which our algorithm is guaranteed to converge to the MAP estimate are explicitly derived and are shown to cover a broad class of completely monotone priors, including the graphical horseshoe. Further, the resulting MAP estimate is shown to be sparse and consistent in the $\ell_2$-norm. Numerical results validate the speed, scalability, and statistical performance of the proposed method.
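To convey the generic local linear approximation (LLA) idea, the sketch below linearizes a concave log-type penalty at the current estimate, so that each outer step reduces to a weighted $\ell_1$ problem (solved here by proximal gradient). This is a simple sparse-regression surrogate, not the paper's graphical-model algorithm, and the particular penalty is an assumption.

```python
# Minimal sketch of the generic LLA iteration for a concave penalty, in a
# sparse-regression surrogate problem (NOT the paper's graphical-model setting).
import numpy as np

rng = np.random.default_rng(4)
n, d, lam, gamma = 100, 30, 0.5, 0.1
beta_true = np.zeros(d); beta_true[:3] = 2.0
X = rng.standard_normal((n, d))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def weighted_lasso(X, y, weights, n_iter=500):
    """Proximal gradient for 0.5/n ||y - X b||^2 + sum_j weights_j |b_j|."""
    L = np.linalg.norm(X, 2) ** 2 / len(y)       # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = X.T @ (X @ b - y) / len(y)
        z = b - g / L
        b = np.sign(z) * np.maximum(np.abs(z) - weights / L, 0.0)   # soft-thresholding
    return b

beta = np.zeros(d)
for _ in range(5):                               # outer LLA iterations
    # Linearize the concave penalty p(t) = lam * log(1 + t / gamma) at |beta|.
    weights = lam / (gamma + np.abs(beta))
    beta = weighted_lasso(X, y, weights)

print("recovered support:", np.flatnonzero(np.abs(beta) > 1e-3))
```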
This paper presents a new approach to estimation and inference for the regression parameters in a panel data model with interactive fixed effects. It relies on the assumption that the factor loadings can be expressed as an unknown smooth function of the time average of the covariates plus an idiosyncratic error term. Compared to existing approaches, our estimator has a simple partial least squares form and requires neither iterative procedures nor a preliminary estimation of the factors. We derive its asymptotic properties and find that the limiting distribution has a discontinuity that depends on the explanatory power of our basis functions, as measured by the variance of the error in the factor loadings. As a result, the usual ``plug-in'' methods based on estimates of the asymptotic covariance are only valid pointwise and may produce either over- or under-coverage probabilities. We show that uniformly valid inference can be achieved by using the cross-sectional bootstrap. A Monte Carlo study indicates good performance in terms of mean squared error. We apply our methodology to analyze the determinants of growth rates in OECD countries.
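A small sketch of the cross-sectional bootstrap idea mentioned above: resample individual units with replacement, keeping each unit's entire time series intact, re-estimate the slope, and read off percentile intervals. The pooled OLS slope used here is a placeholder for the paper's partial least squares estimator, and the data-generating process is illustrative.

```python
# Minimal sketch of the cross-sectional bootstrap with a pooled-OLS placeholder.
import numpy as np

rng = np.random.default_rng(5)
N, T = 100, 20
x = rng.standard_normal((N, T))
y = 1.5 * x + rng.standard_normal((N, T))        # true slope 1.5 (illustrative DGP)

def pooled_ols_slope(x, y):
    return np.sum(x * y) / np.sum(x * x)

B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, N, size=N)             # resample cross-sectional units
    boot[b] = pooled_ols_slope(x[idx], y[idx])   # whole time series kept per unit

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"point estimate: {pooled_ols_slope(x, y):.3f},  95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```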
We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass by Gozlan et al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical analysis of this object and discuss its interpretation in light of the convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classical optimal transport does), weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited to the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter and our approaches to computing it are illustrated on synthetic examples, validated on 2D real-world data, and compared to standard Wasserstein barycenters.
Understanding the properties of stochastic phase field models is crucial for modeling processes in several practical applications, such as soft matter and phase separation in random environments. To describe such random evolution, this work proposes and studies two mathematical models, and their numerical approximations, for a parabolic stochastic partial differential equation (SPDE) with a logarithmic Flory--Huggins energy potential. These multiscale models are built on a regularized energy technique and thus avoid possible singularities of the coefficients. Using the large deviation principle, we show that the small-noise limit of the proposed models naturally recovers the classical dynamics of the deterministic case. Moreover, when the driving noise is multiplicative, the Stampacchia maximum principle holds, which indicates the robustness of the proposed model. One of the main advantages of the proposed models is that they admit the energy evolution law and asymptotically preserve the Stampacchia maximum bound of the original problem. To numerically capture these asymptotic behaviors, we investigate semi-implicit discretizations of the regularized logarithmic SPDEs. Several numerical results are presented to verify our theoretical findings.
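A minimal sketch of the semi-implicit idea for a 1D stochastic Allen-Cahn-type equation with a regularized logarithmic (Flory-Huggins-type) nonlinearity: the Laplacian is treated implicitly, the regularized potential explicitly, and a noise increment is added each step. The additive noise, the clipping-based regularization, and all parameters are illustrative assumptions, not the paper's precise models.

```python
# Minimal sketch: semi-implicit time stepping for a 1D stochastic Allen-Cahn-type
# equation with a regularized logarithmic nonlinearity, periodic boundary conditions.
import numpy as np

J, dt, n_steps = 128, 1e-4, 2000
eps2, theta, theta_c, delta = 1e-3, 1.0, 2.0, 1e-2
h = 1.0 / J
sigma = 0.1

# Periodic second-difference matrix (dense, for simplicity).
L = (np.diag(-2.0 * np.ones(J)) + np.diag(np.ones(J - 1), 1) + np.diag(np.ones(J - 1), -1))
L[0, -1] = L[-1, 0] = 1.0
L /= h ** 2
M = np.eye(J) - dt * eps2 * L                    # implicit (linear) part of each step

def f_reg(u):
    # Derivative of a regularized Flory--Huggins potential; clipping to
    # [-1 + delta, 1 - delta] stands in for the paper's regularized energy.
    v = np.clip(u, -1.0 + delta, 1.0 - delta)
    return 0.5 * theta * (np.log1p(v) - np.log1p(-v)) - theta_c * v

rng = np.random.default_rng(6)
u = 0.1 * rng.standard_normal(J)
for _ in range(n_steps):
    noise = sigma * np.sqrt(dt / h) * rng.standard_normal(J)
    # (I - dt * eps2 * Laplacian) u^{n+1} = u^n - dt * f_reg(u^n) + noise
    u = np.linalg.solve(M, u - dt * f_reg(u) + noise)

print("max |u| after time stepping:", np.max(np.abs(u)))
```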
Video anomaly detection under weak labels is formulated as a typical multiple-instance learning problem in previous works. In this paper, we provide a new perspective, i.e., a supervised learning task under noisy labels. From this viewpoint, once the label noise is cleaned away, we can directly apply fully supervised action classifiers to weakly supervised anomaly detection and take maximum advantage of these well-developed classifiers. For this purpose, we devise a graph convolutional network to correct noisy labels. Based on feature similarity and temporal consistency, our network propagates supervisory signals from high-confidence snippets to low-confidence ones. In this manner, the network is capable of providing cleaned supervision for action classifiers. During the test phase, we only need to obtain snippet-wise predictions from the action classifier, without any extra post-processing. Extensive experiments on 3 datasets at different scales with 2 types of action classifiers demonstrate the efficacy of our method. Remarkably, we obtain a frame-level AUC score of 82.12% on UCF-Crime.
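A small sketch of the propagation idea described above: build a snippet graph from feature similarity plus temporal adjacency, normalize it as in standard GCN layers, and smooth the noisy snippet confidences over the graph so that high-confidence snippets inform low-confidence ones. This parameter-free smoothing is a stand-in for the paper's learned graph convolutional network; the features and scores are random placeholders.

```python
# Minimal sketch: cleaning noisy snippet confidences by smoothing over a
# feature-similarity + temporal-consistency graph (stand-in for a learned GCN).
import numpy as np

rng = np.random.default_rng(7)
T, D = 60, 128                                   # snippets per video, feature dim
feats = rng.standard_normal((T, D))
scores = rng.uniform(size=T)                     # initial noisy snippet-level confidences

# Feature-similarity adjacency (cosine similarity, clipped at zero) ...
F = feats / np.linalg.norm(feats, axis=1, keepdims=True)
A_feat = np.maximum(F @ F.T, 0.0)
# ... plus temporal-consistency adjacency (neighbouring snippets).
A_temp = np.eye(T, k=1) + np.eye(T, k=-1)
A = A_feat + A_temp + np.eye(T)

# Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard GCN layers.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

cleaned = scores.copy()
for _ in range(3):                               # a few propagation/smoothing steps
    cleaned = A_norm @ cleaned
print("cleaned snippet confidences (first 5):", np.round(cleaned[:5], 3))
```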