We consider a stationary linear AR($p$) model with unknown mean. The autoregression parameters as well as the distribution function (d.f.) $G$ of the innovations are unknown. The observations contain gross errors (outliers). The distribution of the outliers is unknown and arbitrary, and their intensity is $\gamma n^{-1/2}$ with an unknown $\gamma$, where $n$ is the sample size. The essential problem in this situation is to test the normality of the innovations. Normality, as is known, ensures the optimality properties of widely used least squares procedures. To construct and study a Pearson chi-square type test for normality, we estimate the unknown mean and the autoregression parameters. Then, using the estimates, we compute the residuals of the autoregression. Based on them, we construct a kind of empirical distribution function, the residual e.d.f. (r.e.d.f.), which is a counterpart of the (inaccessible) e.d.f. of the autoregression innovations. Our Pearson statistic is a functional of the r.e.d.f. Its asymptotic distributions under the hypothesis and under the local alternatives are determined by the asymptotic behavior of the r.e.d.f. Therefore, the study of the asymptotic properties of the r.e.d.f. is a natural and meaningful task. In the present work, we find and substantiate in detail the stochastic expansions of the r.e.d.f. in two situations. In the first one, the d.f. $G(x)$ of the innovations does not depend on $n$. We need this result to investigate the test statistic under the hypothesis. In the second situation, $G(x)$ depends on $n$ and has the form of a mixture $G(x) = A_n(x) = (1 - n^{-1/2}) G_0(x) + n^{-1/2} H(x)$. We need this result to study the power of the test under the local alternatives.
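As an illustration of the construction just described (notation ours, schematic rather than the authors' exact definitions): given estimates $\hat\mu$ of the mean and $\hat\beta_1,\dots,\hat\beta_p$ of the autoregression coefficients, the residuals, the residual e.d.f., and a Pearson chi-square statistic over cells $-\infty = x_0 < x_1 < \dots < x_k = +\infty$ take the form
$$
\hat\varepsilon_t = X_t - \hat\mu - \sum_{j=1}^{p} \hat\beta_j\,(X_{t-j} - \hat\mu), \qquad
\hat G_n(x) = \frac{1}{n-p}\sum_{t=p+1}^{n} \mathbf{1}\{\hat\varepsilon_t \le x\},
$$
$$
\chi^2_n = \sum_{i=1}^{k} \frac{\bigl[(n-p)\{\hat G_n(x_i)-\hat G_n(x_{i-1})\} - (n-p)\,p_i\bigr]^2}{(n-p)\,p_i},
$$
with $p_i$ the cell probabilities under the hypothesized normal d.f.; the stochastic expansions of $\hat G_n$ are what determine the limit laws of $\chi^2_n$ under the hypothesis and under the local alternatives.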
A novel discretization is presented for forward-backward stochastic differential equations (FBSDE) with differentiable coefficients, simultaneously solving the BSDE and its Malliavin sensitivity problem. The control process is estimated by the corresponding linear BSDE driving the trajectories of the Malliavin derivatives of the solution pair, which requires accurate $\Gamma$ estimates. The approximation is based on a merged formulation given by the Feynman-Kac formulae and the Malliavin chain rule. The continuous-time dynamics are discretized with a theta-scheme. In order to allow for an efficient numerical solution of the arising semi-discrete conditional expectations in possibly high dimensions, it is fundamental that the chosen approach admits differentiable estimates. Two fully-implementable schemes are considered: the BCOS method as a reference in the one-dimensional framework, and neural network Monte Carlo regressions in the case of high-dimensional problems, similarly to the recently emerging class of Deep BSDE methods [Han et al. (2018), Hur\'e et al. (2020)]. An error analysis is carried out to show $L^2$ convergence of order $1/2$, under standard Lipschitz assumptions and additive noise in the forward diffusion. Numerical experiments are provided for a range of different semi- and quasi-linear equations up to $50$ dimensions, demonstrating that the proposed scheme yields a significant improvement in the control estimations.
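For orientation, the generic theta-scheme for the backward component of a decoupled FBSDE reads (standard textbook form, with $\mathbb{E}_i[\cdot]=\mathbb{E}[\,\cdot\mid\mathcal{F}_{t_i}]$, $\Delta W_i = W_{t_{i+1}}-W_{t_i}$, and the $Z$-component shown in its simplest explicit form; the scheme above instead recovers $Z$, and hence $\Gamma$, from the linear BSDE for the Malliavin derivatives):
$$
Z_{t_i} \approx \frac{1}{\Delta t}\,\mathbb{E}_i\bigl[Y_{t_{i+1}}\,\Delta W_i\bigr], \qquad
Y_{t_i} \approx \mathbb{E}_i\bigl[Y_{t_{i+1}}\bigr] + \Delta t\Bigl(\theta\, f(t_i,X_{t_i},Y_{t_i},Z_{t_i}) + (1-\theta)\,\mathbb{E}_i\bigl[f(t_{i+1},X_{t_{i+1}},Y_{t_{i+1}},Z_{t_{i+1}})\bigr]\Bigr),
$$
solved backwards from the terminal condition, with each conditional expectation approximated by BCOS expansions or neural network regressions as described above.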
Although variational autoencoders (VAE) are successfully used to obtain meaningful low-dimensional representations for high-dimensional data, the characterization of critical points of the loss function for general observation models is not fully understood. We introduce a theoretical framework that is based on a connection between $\beta$-VAE and generalized linear models (GLM). The equality between the activation function of a $\beta$-VAE and the inverse of the link function of a GLM enables us to provide a systematic generalization of the loss analysis for $\beta$-VAE based on the assumption that the observation model distribution belongs to an exponential dispersion family (EDF). As a result, we can initialize $\beta$-VAE nets by maximum likelihood estimates (MLE) that enhance the training performance on both synthetic and real world data sets. As a further consequence, we analytically describe the auto-pruning property inherent in the $\beta$-VAE objective, which provides a reason for posterior collapse.
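To make the stated connection concrete (these are standard EDF/GLM facts rather than details specific to the paper): an exponential dispersion family has density and mean
$$
p(x\mid\theta,\phi) = h(x,\phi)\,\exp\!\Bigl\{\frac{x\theta - b(\theta)}{\phi}\Bigr\}, \qquad \mu = \mathbb{E}[x] = b'(\theta),
$$
and a GLM ties the mean to the linear predictor $\eta$ via a link $g$ through $g(\mu)=\eta$. Identifying the decoder's output activation with $g^{-1}$ is what allows the $\beta$-VAE loss analysis to proceed family by family (Gaussian/identity, Bernoulli/logit, Poisson/log, and so on).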
We advocate for a practical Maximum Likelihood Estimation (MLE) approach towards designing loss functions for regression and forecasting, as an alternative to the typical approach of direct empirical risk minimization on a specific target metric. The MLE approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that can optimize different types of target metrics. We present theoretical results to demonstrate that our approach is competitive with any estimator for the target metric under some general conditions. In two example practical settings, Poisson and Pareto regression, we show that our competitive results can be used to prove that the MLE approach has better excess risk bounds than directly minimizing the target metric. We also demonstrate empirically that our method instantiated with a well-designed general purpose mixture likelihood family can obtain superior performance for a variety of tasks across time-series forecasting and regression datasets with different data distributions.
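A worked illustration of the post-hoc estimation idea (standard decision-theoretic facts, not the paper's specific construction): once a conditional likelihood $\hat p(y\mid x)$ has been fit by MLE, different target metrics are served at inference time by different functionals of the same fitted distribution,
$$
\hat y_{\mathrm{MSE}}(x) = \mathbb{E}_{\hat p}[Y\mid x], \qquad
\hat y_{\mathrm{MAE}}(x) = \operatorname{median}_{\hat p}(Y\mid x), \qquad
\hat y_{\tau}(x) = \hat Q_{Y\mid x}(\tau),
$$
so a single fitted model can be read out against squared-error, absolute-error, or quantile (pinball) losses without retraining.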
Motivated by the case fatality rate (CFR) of COVID-19, in this paper, we develop a fully parametric quantile regression model based on the generalized three-parameter beta (GB3) distribution. Beta regression models are primarily used to model rates and proportions. However, these models are usually specified in terms of a conditional mean. Therefore, they may be inadequate if the observed response variable follows an asymmetrical distribution, such as CFR data. In addition, beta regression models do not consider the effect of the covariates across the spectrum of the dependent variable, which is possible through the conditional quantile approach. In order to introduce the proposed GB3 regression model, we first reparameterize the GB3 distribution by inserting a quantile parameter, and then we develop the proposed quantile model. We also propose a simple interpretation of the predictor-response relationship in terms of percentage increases/decreases of the quantile. A Monte Carlo study is carried out to evaluate the performance of the maximum likelihood estimates and the choice of the link functions. Finally, a real COVID-19 dataset from Chile is analyzed and discussed to illustrate the proposed approach.
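Schematically (generic quantile-regression notation, not necessarily the paper's exact parameterization): if $F(y;\mu,\sigma,\nu)$ denotes the GB3 d.f., one sets the $\tau$-quantile $q=F^{-1}(\tau;\mu,\sigma,\nu)$, solves for one of the original parameters in terms of $q$, and links $q$ to the covariates, e.g. $g(q_i)=\mathbf{x}_i^{\top}\boldsymbol\beta$. With a logarithmic link, a unit increase in $x_{ij}$ multiplies the $\tau$-quantile by $\exp(\beta_j)$, i.e. changes it by $100\,[\exp(\beta_j)-1]\%$, which is the kind of percentage interpretation referred to above; for a response confined to $(0,1)$, a logit link is the natural alternative.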
We introduce a novel rule-based approach for handling regression problems. The new methodology carries elements from two frameworks: (i) it provides information about the uncertainty of the parameters of interest using Bayesian inference, and (ii) it allows the incorporation of expert knowledge through rule-based systems. The blending of these two frameworks can be particularly beneficial for various domains (e.g. engineering), where, even though the significance of uncertainty quantification motivates a Bayesian approach, there is no simple way to incorporate researcher intuition into the model. We validate our models by applying them to synthetic applications: a simple linear regression problem and two more complex structures based on partial differential equations. Finally, we review the advantages of our methodology, which include the simplicity of the implementation, the uncertainty reduction due to the added information and, on some occasions, the derivation of better point predictions, and we address limitations, mainly from the computational complexity perspective, such as the difficulty in choosing an appropriate algorithm and the added computational burden.
Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(\alpha,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(\alpha,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
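A minimal sketch of the setting, assuming a standard Gaussian-mechanism DP step on honest workers and coordinate-wise median as a stand-in robust aggregator (the paper's analysis concerns $(\alpha,f)$-BR aggregation rules in general; the function names and constants below are illustrative):

import numpy as np

def dp_noisy_gradient(grad, clip_norm=1.0, noise_std=0.5, rng=np.random.default_rng()):
    # Clip the gradient to norm clip_norm, then add Gaussian noise (Gaussian mechanism for DP).
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std * clip_norm, size=grad.shape)

def robust_aggregate(worker_grads):
    # Coordinate-wise median: one simple Byzantine-resilient aggregation rule.
    return np.median(np.stack(worker_grads, axis=0), axis=0)

# One simulated server step: honest workers send DP-noised gradients,
# Byzantine workers send arbitrary vectors.
dim, n_honest, n_byz = 10, 8, 2
true_grad = np.ones(dim)
messages = [dp_noisy_gradient(true_grad) for _ in range(n_honest)]
messages += [np.full(dim, 100.0) for _ in range(n_byz)]  # adversarial messages
step = robust_aggregate(messages)

The interplay the abstract points to is visible even in this toy setup: larger DP noise disperses the honest messages, which erodes the margin the robust aggregator needs to suppress the Byzantine ones, hence the importance of re-tuning the learning algorithm.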
We develop a post-selective Bayesian framework to jointly and consistently estimate parameters in group-sparse linear regression models. After selection with the Group LASSO (or generalized variants such as the overlapping, sparse, or standardized Group LASSO), uncertainty estimates for the selected parameters are unreliable in the absence of adjustments for selection bias. Existing post-selective approaches are limited to uncertainty estimation for (i) real-valued projections onto very specific selected subspaces of the group-sparse problem, and (ii) selection events that fall into the broad category of polyhedral events, i.e., events expressible as linear inequalities in the data variables. Our Bayesian methods address these gaps by deriving a likelihood adjustment factor, and an approximation thereof, that eliminates bias from selection. At only a nominal price for this adjustment, experiments on simulated data and on data from the Human Connectome Project demonstrate the efficacy of our methods for jointly estimating group-sparse parameters and their uncertainties post selection.
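The general shape of such an adjustment (the standard conditional-likelihood idea; the paper's contribution lies in deriving and approximating the specific factor for group-sparse selection): conditioning on the selection event $\widehat E$ replaces the likelihood $L(\beta)$ by
$$
L_{\mathrm{sel}}(\beta) = \frac{L(\beta)}{\Pr_{\beta}\bigl(\text{the Group LASSO selects } \widehat E\bigr)}, \qquad
\pi\bigl(\beta \mid \text{data}, \widehat E\bigr) \propto \pi(\beta)\, L_{\mathrm{sel}}(\beta),
$$
so the denominator is the likelihood adjustment factor referred to above, and its tractable approximation is what keeps the price of the correction nominal.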
This paper develops a general methodology to conduct statistical inference for observations indexed by multiple sets of entities. We propose a novel multiway empirical likelihood statistic that converges to a chi-square distribution under the non-degenerate case, where the corresponding Hoeffding-type decomposition is dominated by linear terms. Our methodology is related to the notion of jackknife empirical likelihood, but the leave-out pseudo-values are constructed by leaving out columns or rows. We further develop a modified version of our multiway empirical likelihood statistic, which converges to a chi-square distribution regardless of the degeneracy, and discover its desirable higher-order property compared to the t-ratio based on the conventional Eicker-White type variance estimator. The proposed methodology is illustrated by several important statistical problems, such as bipartite networks, two-stage sampling, generalized estimating equations, and three-way observations.
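For context, the jackknife empirical likelihood construction on which the multiway version builds (generic notation): given pseudo-values $V_1,\dots,V_N$ whose average estimates $\theta$, the empirical likelihood ratio and its Wilks-type limit are
$$
R(\theta) = \max\Bigl\{\prod_{i=1}^{N} N w_i \;:\; w_i \ge 0,\ \sum_{i=1}^{N} w_i = 1,\ \sum_{i=1}^{N} w_i V_i = \theta\Bigr\}, \qquad
-2\log R(\theta_0) \xrightarrow{d} \chi^2_1,
$$
the multiway twist being that the pseudo-values are formed by leaving out entire rows or columns of the multiply indexed data rather than single observations.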
Hierarchical Latent Attribute Models (HLAMs) are a family of discrete latent variable models that are attracting increasing attention in educational, psychological, and behavioral sciences. The key ingredients of an HLAM include a binary structural matrix and a directed acyclic graph specifying hierarchical constraints on the configurations of latent attributes. These components encode practitioners' design information and carry important scientific meanings. Despite the popularity of HLAMs, the fundamental identifiability issue remains unaddressed. The existence of the attribute hierarchy graph leads to a degenerate parameter space, and the potentially unknown structural matrix further complicates the identifiability problem. This paper addresses the issue of identifying the latent structure and model parameters underlying an HLAM. We develop sufficient and necessary identifiability conditions. These results directly and sharply characterize the different impacts that different attribute types in the graph have on identifiability. The proposed conditions not only provide insights into diagnostic test designs under the attribute hierarchy, but also serve as tools to assess the validity of an estimated HLAM.
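A toy illustration of the two ingredients (our example, not one from the paper): with two binary attributes and the prerequisite relation $\alpha_1 \rightarrow \alpha_2$ (mastering $\alpha_2$ requires $\alpha_1$), the hierarchy rules out the profile $(0,1)$, so the latent space shrinks from four to three configurations, which is exactly the degeneracy of the parameter space mentioned above; a structural matrix then records which attributes each item targets, e.g.
$$
\text{admissible profiles: } \{(0,0),\,(1,0),\,(1,1)\}, \qquad
Q = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \text{ for a three-item design.}
$$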
We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions. In the absence of batch normalization, the optimal normalized noise scale is directly proportional to width. Wider networks, with their higher optimal noise scale, also achieve higher test accuracy. These observations hold for MLPs, ConvNets, and ResNets, and for two different parameterization schemes ("Standard" and "NTK"). We observe a similar trend with batch normalization for ResNets. Surprisingly, since the largest stable learning rate is bounded, the largest batch size consistent with the optimal normalized noise scale decreases as the width increases.
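For reference, one widely cited heuristic for the scale of SGD noise (from the broader literature, not necessarily the exact normalization used in this work) is
$$
g \approx \frac{\epsilon N}{B},
$$
where $\epsilon$ is the learning rate, $N$ the training-set size, and $B$ the batch size; the "normalized" noise scale referred to above additionally folds in the initialization conditions so that the quantity is comparable across widths and across the Standard and NTK parameterizations.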