
This paper is concerned with the asymptotic behavior in $\beta$-H\"older spaces and under $L^p$ losses of a Dirichlet kernel density estimator introduced by Aitchison & Lauder (1985) and studied theoretically by Ouimet & Tolosana-Delgado (2021). It is shown that the estimator is minimax when $p \in [1, 3)$ and $\beta \in (0, 2]$, and that it is never minimax when $p \in [4, \infty)$ or $\beta \in (2, \infty)$. These results rectify in a minor way and, more importantly, extend to all dimensions those already reported in the univariate case by Bertin & Klutchnikoff (2011).
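
To make the object of study concrete, here is a minimal sketch of the Dirichlet kernel density estimator in the form $\hat f_{n,b}(s) = n^{-1}\sum_{i=1}^n K_{s/b+1}(X_i)$, where $K_\alpha$ denotes the Dirichlet$(\alpha)$ density; the bandwidth and the toy data below are purely illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.stats import dirichlet

def dirichlet_kde(s, samples, b):
    """Dirichlet kernel density estimate at a point s of the simplex.

    s       : point in the interior of the probability simplex, shape (d+1,)
    samples : observations on the simplex, shape (n, d+1)
    b       : bandwidth parameter (b -> 0 as n grows)
    """
    alpha = s / b + 1.0  # Dirichlet kernel concentrated near s
    return np.mean([dirichlet.pdf(x, alpha) for x in samples])

# toy usage: data from a Dirichlet(2, 3, 4) on the 2-simplex
rng = np.random.default_rng(0)
X = rng.dirichlet([2.0, 3.0, 4.0], size=500)
s = np.array([0.2, 0.3, 0.5])
print(dirichlet_kde(s, X, b=0.05))
```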

Related content

Contextualized representations based on neural language models have furthered the state of the art in various NLP tasks. Despite their great success, the nature of such representations remains a mystery. In this paper, we present an empirical property of these representations -- "average" approximates "first principal component". Specifically, experiments show that the average of these representations shares almost the same direction as the first principal component of the matrix whose columns are these representations. We believe this explains why the average representation is always a simple yet strong baseline. Our further examinations show that this property also holds in more challenging scenarios, for example, when the representations are taken from a model right after its random initialization. We therefore conjecture that this property is intrinsic to the distribution of representations and not necessarily related to the input structure. We observe that these representations empirically follow a normal distribution in each dimension, and by assuming this holds, we demonstrate that the empirical property can in fact be derived mathematically.
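
The property is easy to probe numerically. The sketch below substitutes synthetic Gaussian vectors for actual contextualized representations, consistent with the per-dimension normality assumption above (here with nonzero per-dimension means, an assumption of this toy), and compares the direction of the average with the first principal component of the uncentered matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 256                      # number of "representations", dimension
mu = rng.normal(size=d)               # nonzero per-dimension means (assumption)
X = mu + rng.normal(size=(n, d))      # each row ~ N(mu, I) plays one representation

avg = X.mean(axis=0)

# first principal component of the matrix whose columns are the representations,
# i.e. the top left singular vector of X^T = top right singular vector of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]

cos = abs(avg @ pc1) / (np.linalg.norm(avg) * np.linalg.norm(pc1))
print(f"cosine similarity between average and first PC: {cos:.4f}")  # near 1
```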

We study ROUND-UFP and ROUND-SAP, two generalizations of the classical BIN PACKING problem that correspond to the unsplittable flow problem on a path (UFP) and the storage allocation problem (SAP), respectively. We are given a path with capacities on its edges and a set of tasks where for each task we are given a demand and a subpath. In ROUND-UFP, the goal is to find a packing of all tasks into a minimum number of copies (rounds) of the given path such that for each copy, the total demand of tasks on any edge does not exceed the capacity of the respective edge. In ROUND-SAP, the tasks are considered to be rectangles and the goal is to find a non-overlapping packing of these rectangles into a minimum number of rounds such that all rectangles lie completely below the capacity profile of the edges. We show that, in contrast to BIN PACKING, neither problem admits an asymptotic polynomial-time approximation scheme (APTAS), even when all edge capacities are equal. However, for this setting, we obtain asymptotic $(2+\varepsilon)$-approximations for both problems. For the general case, we obtain an $O(\log\log n)$-approximation algorithm and an $O(\log\log\frac{1}{\delta})$-approximation under $(1+\delta)$-resource augmentation for both problems. For the intermediate setting of the no bottleneck assumption (i.e., the maximum task demand is at most the minimum edge capacity), we obtain absolute $12$- and asymptotic $(16+\varepsilon)$-approximation algorithms for ROUND-UFP and ROUND-SAP, respectively.
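
For intuition about the objective only (this is not one of the paper's approximation algorithms), a naive first-fit baseline for ROUND-UFP looks as follows; it opens a new round whenever a task does not fit under the residual edge capacities of any existing round.

```python
def round_ufp_first_fit(capacity, tasks):
    """First-fit heuristic for ROUND-UFP (illustrative baseline only).

    capacity : list of edge capacities along the path
    tasks    : list of (start_edge, end_edge, demand), subpath inclusive;
               assumes each demand fits an empty round on its subpath
    Returns the number of rounds used and the task -> round assignment.
    """
    rounds = []        # each round tracks residual capacity per edge
    assignment = []
    for s, e, demand in tasks:
        for r, residual in enumerate(rounds):
            if all(residual[i] >= demand for i in range(s, e + 1)):
                for i in range(s, e + 1):
                    residual[i] -= demand
                assignment.append(r)
                break
        else:          # no existing round fits: open a new one
            residual = capacity.copy()
            for i in range(s, e + 1):
                residual[i] -= demand
            rounds.append(residual)
            assignment.append(len(rounds) - 1)
    return len(rounds), assignment

# toy instance: a path with 4 edges, uniform capacity 10
print(round_ufp_first_fit([10, 10, 10, 10],
                          [(0, 2, 6), (1, 3, 6), (0, 0, 5), (2, 3, 4)]))
```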

In this paper, we introduce reduced-bias estimators for the tail index of a Pareto-type distribution. This is achieved through regularised weighted least squares applied to an exponential regression model for the log-spacings of the top order statistics. The proposed estimators are investigated analytically and shown to be asymptotically unbiased, consistent, and normally distributed. Their finite-sample behaviour is studied through a simulation study, in which they are found to yield low bias and MSE. In addition, the proposed estimators are illustrated through the estimation of the tail index of the underlying distribution of claims from the insurance industry.
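
As background, the scaled log-spacings $Z_j = j(\log X_{(n-j+1)} - \log X_{(n-j)})$, $j = 1, \dots, k$, are approximately exponential with mean equal to the tail index under a Pareto-type tail, and their average is exactly the classical Hill estimator; the reduced-bias estimators fit a second-order exponential regression model to these $Z_j$ by regularised weighted least squares. The sketch below computes only the baseline quantities, not the regularised fit itself.

```python
import numpy as np

def log_spacings_and_hill(data, k):
    """Scaled log-spacings of the top k order statistics and the Hill estimator.

    Under a Pareto-type tail, Z_j = j * (log X_(n-j+1) - log X_(n-j)) is
    approximately exponential with mean gamma (the tail index), and the
    average of Z_1, ..., Z_k is exactly the Hill estimator.
    """
    x = np.sort(data)                  # X_(1) <= ... <= X_(n), with X_(i) = x[i-1]
    n = len(x)
    Z = np.array([j * (np.log(x[n - j]) - np.log(x[n - j - 1]))
                  for j in range(1, k + 1)])
    return Z, Z.mean()

rng = np.random.default_rng(1)
x = 1.0 + rng.pareto(3.0, size=5000)   # classical Pareto with tail index 1/3
Z, hill = log_spacings_and_hill(x, k=200)
print(hill)                            # close to 1/3
```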

Distribution estimation under error-prone or non-ideal sampling modelled as a "sticky" channel has been studied recently, motivated by applications such as DNA computing. The missing mass, i.e., the sum of the probabilities of the unseen letters, is an important quantity that plays a crucial role in distribution estimation, particularly in the large-alphabet regime. In this work, we consider the problem of estimating the missing mass, which has been well studied under independent and identically distributed (i.i.d.) sampling, in the case when the sampling is "sticky". Precisely, we consider the scenario where each sample from an unknown distribution gets repeated a geometrically distributed number of times. We characterise the minimax rate of the mean squared error (MSE) of estimating the missing mass from such sticky sampling channels. An upper bound on the minimax rate is obtained by bounding the risk of a modified Good-Turing estimator. We derive a matching lower bound on the minimax rate by extending the Le Cam method.
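
As a baseline, the classical Good-Turing estimator under i.i.d. sampling is the fraction of singletons, $\hat M_0 = N_1/n$. The sketch below simulates the sticky channel with geometric repetitions and applies plain Good-Turing, which is biased in this setting; the modified estimator that corrects for the repetitions is in the paper and not reproduced here.

```python
import numpy as np
from collections import Counter

def good_turing_missing_mass(samples):
    """Classical Good-Turing estimate of missing mass: (# singletons) / n."""
    counts = Counter(samples)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(samples)

# sticky sampling: each i.i.d. draw gets repeated a Geometric(q) number of times
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(1000))           # unknown distribution, large alphabet
draws = rng.choice(1000, size=300, p=p)    # underlying i.i.d. samples
repeats = rng.geometric(0.5, size=300)     # geometric repetition counts (>= 1)
sticky = np.repeat(draws, repeats)         # observed sticky-channel output
print(good_turing_missing_mass(sticky))    # biased under sticky sampling
```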

We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process. Focusing on generalized linear regression, we analyze the effectiveness of our SSL approach in improving prediction performance. The key ideas are to carefully consider the null model as a competitor and to utilize the unlabeled data to determine the signal-noise combinations in which SSL outperforms both supervised learning and the null model. We then use SSL in an adaptive manner, based on estimates of the signal and noise. In the special case of linear regression with Gaussian covariates, we prove that the non-adaptive SSL version is in fact not capable of improving on both the supervised estimator and the null model simultaneously, beyond a negligible O(1/n) term. On the other hand, the adaptive model presented in this work can achieve a substantial improvement over both competitors simultaneously, under a variety of settings. This is shown empirically through extensive simulations, and extended to other scenarios such as non-Gaussian covariates, misspecified linear regression, and generalized linear regression with non-linear link functions.
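
As a toy illustration of treating the null model as a competitor (a schematic stand-in, not the paper's actual estimator), one can shrink OLS toward the null model by an estimated signal-to-noise weight; in the paper, unlabeled data sharpens the quantities entering such a weight, whereas the sketch below simply assumes isotropic Gaussian covariates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 80, 20
beta = rng.normal(size=d) * 0.1            # weak signal: null model is competitive
X = rng.normal(size=(n, d))
y = X @ beta + rng.normal(size=n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# crude signal estimate: ||beta_ols||^2 minus its noise inflation d*sigma^2/n
resid = y - X @ beta_ols
sigma2 = resid @ resid / (n - d)
signal = max(beta_ols @ beta_ols - d * sigma2 / n, 0.0)
w = signal / (signal + d * sigma2 / n)     # shrink toward the null when signal is weak

beta_adaptive = w * beta_ols
print(w, np.linalg.norm(beta_adaptive - beta), np.linalg.norm(beta_ols - beta))
```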

Models defined by moment conditions are at the center of structural econometric estimation, but economic theory is mostly agnostic about moment selection. While a large pool of valid moments can potentially improve estimation efficiency, a few invalid ones may at the same time undermine consistency. This paper investigates the empirical likelihood estimation of these moment-defined models in high-dimensional settings. We propose a penalized empirical likelihood (PEL) estimator and establish its oracle property with consistent detection of invalid moments. The PEL estimator is asymptotically normally distributed, and a projected PEL procedure further eliminates its asymptotic bias and provides a more accurate normal approximation to the finite-sample behavior. Simulation exercises demonstrate the excellent numerical performance of these methods in estimation and inference.
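
Schematically, and in our own notation rather than necessarily the paper's, a PEL criterion with slack parameters $\delta_j$ for possibly invalid moments takes the following form; the penalty drives $\delta_j$ to zero for valid moments, and a nonzero $\hat\delta_j$ flags moment $j$ as invalid.

```latex
\max_{\theta,\ \delta,\ p_1,\dots,p_n}\ \sum_{i=1}^{n}\log p_i
\;-\; n\sum_{j=1}^{r} P_{\lambda}\!\left(|\delta_j|\right)
\quad\text{s.t.}\quad
p_i \ge 0,\quad \sum_{i=1}^{n} p_i = 1,\quad
\sum_{i=1}^{n} p_i\, g_j(Z_i;\theta) = \delta_j,\quad j=1,\dots,r.
```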

Recently, Qiao, Duan, and Cheng~(2019) proposed a distributed nearest-neighbor classification method, in which a massive dataset is split into smaller groups, each processed with a $k$-nearest-neighbor classifier, and the final class label is predicted by a majority vote among these groupwise class labels. This paper shows that the distributed algorithm with $k=1$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions, for both regression and classification problems. Roughly speaking, distributed 1-nearest-neighbor rules with $M$ groups have performance comparable to that of standard $\Theta(M)$-nearest-neighbor rules. In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain exact minimax optimal rates.
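
The distributed rule itself is simple to state in code. The following sketch (illustrative hyperparameters, binary classification on toy data) splits the data into $M$ groups, runs 1-NN within each group, and takes a majority vote over the groupwise labels.

```python
import numpy as np
from scipy.spatial import cKDTree

def distributed_1nn_classify(X, y, query, M, rng):
    """Distributed 1-NN: split data into M groups, 1-NN in each, majority vote."""
    idx = rng.permutation(len(X))
    votes = []
    for group in np.array_split(idx, M):
        tree = cKDTree(X[group])
        _, nn = tree.query(query)          # nearest neighbour within the group
        votes.append(y[group[nn]])
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

# toy usage: two Gaussian classes in the plane
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
y = np.array([0] * 500 + [1] * 500)
print(distributed_1nn_classify(X, y, np.array([2.5, 2.5]), M=25, rng=rng))
```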

A new approach to $L_2$-consistent estimation of a general density functional using $k$-nearest neighbor distances is proposed, where the functional under consideration is in the form of the expectation of some function $f$ of the densities at each point. The estimator is designed to be asymptotically unbiased, using the convergence of the normalized volume of a $k$-nearest neighbor ball to a Gamma distribution in the large-sample limit, and naturally involves the inverse Laplace transform of a scaled version of the function $f.$ Some instantiations of the proposed estimator recover existing $k$-nearest neighbor based estimators of Shannon and R\'enyi entropies and Kullback--Leibler and R\'enyi divergences, and discover new consistent estimators for many other functionals such as logarithmic entropies and divergences. The $L_2$-consistency of the proposed estimator is established for a broad class of densities for general functionals, and the convergence rate in mean squared error is established as a function of the sample size for smooth, bounded densities.
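
One of the existing estimators recovered as an instantiation is the classical Kozachenko-Leonenko $k$-NN estimator of Shannon entropy, which the sketch below makes concrete; the constant involves the volume of the $d$-dimensional unit ball.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(X, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential Shannon entropy (nats)."""
    n, d = X.shape
    tree = cKDTree(X)
    # distance to the k-th neighbour (k+1 because each point matches itself)
    dist, _ = tree.query(X, k=k + 1)
    eps = dist[:, -1]
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log unit-ball volume
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps))

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))      # true entropy = log(2*pi*e) ~ 2.8379 nats
print(kl_entropy(X, k=5))
```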

We obtain new equitightness and $C([0,T];L^p(\mathbb{R}^N))$-convergence results for numerical approximations of generalized porous medium equations of the form $$ \partial_tu-\mathfrak{L}[\varphi(u)]=g\qquad\text{in $\mathbb{R}^N\times(0,T)$}, $$ where $\varphi:\mathbb{R}\to\mathbb{R}$ is continuous and nondecreasing, and $\mathfrak{L}$ is a local or nonlocal diffusion operator. Our results include slow diffusions, strongly degenerate Stefan problems, and fast diffusions above a critical exponent. These results improve the previous $C([0,T];L_{\text{loc}}^p(\mathbb{R}^N))$-convergence obtained in a series of papers on the topic by the authors. To have equitightness and global $L^p$-convergence, some additional restrictions on $\mathfrak{L}$ and $\varphi$ are needed. Most commonly used symmetric operators $\mathfrak{L}$ are still included: the Laplacian, fractional Laplacians, and other generators of symmetric L\'evy processes with some fractional moment. We also discuss extensions to nonlinear possibly strongly degenerate convection-diffusion equations.
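
For orientation only: the results cover general operators $\mathfrak{L}$ and the numerical schemes from the authors' earlier papers, but the simplest member of the family is an explicit finite-difference discretization of $\partial_t u = \Delta\varphi(u)$ in one dimension with $\varphi(u)=u^2$ and $g=0$, sketched below with a crude stability restriction on the time step.

```python
import numpy as np

def explicit_pme_step(u, dx, dt, phi):
    """One explicit finite-difference step for du/dt = (phi(u))_xx (1D, g = 0)."""
    v = phi(u)
    lap = (np.roll(v, -1) - 2 * v + np.roll(v, 1)) / dx**2  # periodic boundary
    return u + dt * lap

# porous medium nonlinearity phi(u) = u^2, box initial data
N, dx = 400, 0.05
u = np.where(np.abs(np.arange(N) * dx - 10.0) < 1.0, 1.0, 0.0)
dt = 0.2 * dx**2 / (2 * max(2 * u.max(), 1e-12))            # crude CFL-type bound
for _ in range(2000):
    u = explicit_pme_step(u, dx, dt, lambda w: w**2)
print(u.sum() * dx)   # mass is conserved by the scheme (periodic b.c., g = 0)
```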

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than existing methods.
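
In the same spirit (a schematic, not the paper's algorithm), adaptive node sampling with gradient information amounts to importance sampling with unbiased reweighting. The toy below samples nodes proportionally to stale per-node gradient-norm estimates and omits the embedding-approximation part of the method.

```python
import numpy as np

def adaptive_node_sample(grad_norms, batch_size, rng):
    """Sample nodes with probability proportional to per-node gradient norms.

    Importance sampling with unbiased reweighting: node i drawn with prob p_i
    contributes its loss/gradient scaled by 1 / (n * p_i).
    """
    p = grad_norms / grad_norms.sum()
    idx = rng.choice(len(p), size=batch_size, replace=True, p=p)
    weights = 1.0 / (len(p) * p[idx])      # keeps the stochastic gradient unbiased
    return idx, weights

rng = np.random.default_rng(0)
grad_norms = rng.exponential(size=1000)    # stale per-node gradient-norm estimates
idx, w = adaptive_node_sample(grad_norms, batch_size=64, rng=rng)
print(idx[:5], w[:5])
```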
