
Approximate Message Passing (AMP) algorithms are a class of iterative procedures for computationally efficient estimation in high-dimensional inference tasks. Due to the presence of an 'Onsager' correction term in the iterates, for $N \times M$ design matrices $\mathbf{A}$ with i.i.d. Gaussian entries, the asymptotic distribution of the estimate at any iteration of the algorithm can be exactly characterized in the large system limit as $M/N \rightarrow \delta \in (0, \infty)$ via a scalar recursion referred to as state evolution. In this paper, we show that appropriate functionals of the iterates in fact concentrate around the limiting values predicted by these asymptotic distributions, at rates exponentially fast in $N$, for a large class of AMP-style algorithms. This class includes algorithms used when a high-dimensional generalized linear regression model is assumed to be the data-generating process, like the generalized AMP algorithm, and algorithms used when the measurement matrix is assumed to be right rotationally invariant instead of i.i.d. Gaussian, like vector AMP and generalized vector AMP. In practice, these more general AMP algorithms have many applications, for example in communications or imaging, and this work provides the first study of the finite sample behavior of such algorithms.
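
As a rough illustration of the Onsager correction mentioned above, the following is a minimal sketch of a textbook AMP iteration for noiseless sparse linear regression with a soft-thresholding denoiser. It is not the algorithm class analyzed in the paper. The function names, the problem sizes `n`, `N`, `k`, the threshold multiplier `alpha`, and the iteration count are all assumptions chosen for the example, and the code's dimension names do not follow the abstract's $N \times M$ notation.

```python
import numpy as np

def soft_threshold(v, theta):
    # Componentwise soft-thresholding denoiser.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def amp(A, y, alpha=1.5, n_iter=30):
    # Illustrative AMP loop; alpha and n_iter are assumed tuning values.
    n, N = A.shape
    x = np.zeros(N)
    z = y.copy()
    for _ in range(n_iter):
        # Threshold proportional to an empirical estimate of the effective noise level.
        theta = alpha * np.linalg.norm(z) / np.sqrt(n)
        x_new = soft_threshold(x + A.T @ z, theta)
        # Onsager correction: previous residual times (number of nonzeros) / n,
        # i.e. (N / n) times the average derivative of the denoiser.
        onsager = z * np.count_nonzero(x_new) / n
        z = y - A @ x_new + onsager
        x = x_new
    return x

# Hypothetical test problem: i.i.d. Gaussian design, k-sparse signal, no noise.
rng = np.random.default_rng(0)
n, N, k = 500, 1000, 50
A = rng.standard_normal((n, N)) / np.sqrt(n)
x0 = np.zeros(N)
x0[rng.choice(N, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = A @ x0

x_hat = amp(A, y)
mse = np.mean((x_hat - x0) ** 2)
```

Dropping the `onsager` term turns the loop into plain iterative soft thresholding, for which the scalar state-evolution description no longer applies.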

Related content

Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently, but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points, the approximation becomes de facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set a new state of the art for tractable models on many standard density estimation benchmarks.
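
The idea of compiling a continuous mixture into a finite one through integration points can be sketched on a toy case. The example below assumes a one-dimensional standard normal latent $z$ and a Gaussian component $p(x \mid z) = \mathcal{N}(x; z, 1)$ as a stand-in for a tractable model, so that the exact marginal $\mathcal{N}(x; 0, 2)$ is available for comparison. The use of Gauss-Hermite quadrature, the function names, and the number of points are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    # Density of a univariate Gaussian.
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def compile_mixture(n_points):
    # Gauss-Hermite nodes and weights target the weight function exp(-t^2).
    # The change of variables z = sqrt(2) * t turns them into integration
    # points and mixture weights for a standard normal latent variable.
    t, w = np.polynomial.hermite.hermgauss(n_points)
    return np.sqrt(2.0) * t, w / np.sqrt(np.pi)

def mixture_density(x, z_points, weights):
    # Finite mixture: sum_i weights[i] * p(x | z_points[i]).
    return float(np.sum(weights * gaussian_pdf(x, z_points, 1.0)))

z_points, weights = compile_mixture(32)
approx = mixture_density(0.5, z_points, weights)
exact = gaussian_pdf(0.5, 0.0, 2.0)  # closed-form marginal of the toy model
```

The resulting object is an ordinary finite mixture with fixed weights, which is why exact inference remains efficient once the integration points are chosen.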

Tyler's and Maronna's M-estimators, as well as their regularized variants, are popular robust methods to estimate the scatter or covariance matrix of a multivariate distribution. In this work, we study the non-asymptotic behavior of these estimators for data sampled from a distribution that satisfies one of the following properties: 1) independent sub-Gaussian entries, up to a linear transformation; 2) log-concavity; 3) a convex concentration property. Our main contribution is the derivation of tight non-asymptotic concentration bounds of these M-estimators around a suitably scaled version of the data sample covariance matrix. Prior to our work, non-asymptotic bounds were derived only for elliptical and Gaussian distributions. Our proof uses a variety of tools from non-asymptotic random matrix theory and high-dimensional geometry. Finally, we illustrate the utility of our results on two examples of practical interest: sparse covariance and sparse precision matrix estimation.

Westling and Carone (2020) proposed a framework for studying the large sample distributional properties of generalized Grenander-type estimators, a versatile class of nonparametric estimators of monotone functions. The limiting distribution of those estimators is representable as the left derivative of the greatest convex minorant of a Gaussian process whose covariance kernel can be complicated and whose monomial mean can be of unknown order (when the degree of flatness of the function of interest is unknown). The standard nonparametric bootstrap is unable to consistently approximate the large sample distribution of the generalized Grenander-type estimators even if the monomial order of the mean is known, making statistical inference a challenging endeavour in applications. To address this inferential problem, we present a bootstrap-assisted inference procedure for generalized Grenander-type estimators. The procedure relies on a carefully crafted, yet automatic, transformation of the estimator. Moreover, our proposed method can be made "flatness robust" in the sense that it can be made adaptive to the (possibly unknown) degree of flatness of the function of interest. The method requires only the consistent estimation of a single scalar quantity, for which we propose an automatic procedure based on numerical derivative estimation and the generalized jackknife. Under random sampling, our inference method can be implemented using a computationally attractive exchangeable bootstrap procedure. We illustrate our methods with examples and also provide a small simulation study. The development of formal results is made possible by some technical results that may be of independent interest.

As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods for giving each feature in an input a score corresponding to its influence on a model's output. A major limitation of this family of explainers in practice is that they can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard term corresponding to accuracy. This additional term measures the difference in feature attribution between a pair of explainers. We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and we see improved consensus between explainers other than those used in the loss term. We examine the trade-off between improved consensus and model performance. Finally, we study the influence our method has on feature attribution explanations.

This paper considers estimating functional-coefficient models in panel quantile regression with individual effects, allowing for cross-sectional and temporal dependence in large panel observations. A latent group structure is imposed on the heterogeneous quantile regression models so that the number of nonparametric functional coefficients to be estimated can be reduced considerably. With the preliminary local linear quantile estimates of the subject-specific functional coefficients, a classic agglomerative clustering algorithm is used to estimate the unknown group structure and an easy-to-implement ratio criterion is proposed to determine the group number. The estimated group number and structure are shown to be consistent. Furthermore, a post-grouping local linear smoothing method is introduced to estimate the group-specific functional coefficients, and the relevant asymptotic normal distribution theory is derived with a normalisation rate comparable to that in the literature. The developed methodologies and theory are verified through a simulation study and showcased with an application to house price data from UK local authority districts, which reveals different homogeneity structures at different quantile levels.

Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be "thinned" into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^K X^{(k)}$. In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
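
The summation-based thinning that this abstract generalizes can be illustrated with the Poisson family. The sketch below assumes the standard fact that if $X \sim \mathrm{Poisson}(\lambda)$ and $X^{(1)} \mid X \sim \mathrm{Binomial}(X, \epsilon)$, then $X^{(1)}$ and $X^{(2)} = X - X^{(1)}$ are independent Poisson variables with means $\epsilon\lambda$ and $(1-\epsilon)\lambda$. The function name `thin_poisson` and the values of `lam`, `eps`, and the sample size are hypothetical choices for illustration, and the code does not implement the generalized procedure of the paper.

```python
import numpy as np

def thin_poisson(x, eps, rng):
    # Split each Poisson count x into two parts that sum back to x.
    # Conditionally on x, the first part is Binomial(x, eps).
    x1 = rng.binomial(x, eps)
    x2 = x - x1
    return x1, x2

rng = np.random.default_rng(0)
lam, eps = 10.0, 0.3  # assumed rate and thinning fraction
x = rng.poisson(lam, size=200_000)
x1, x2 = thin_poisson(x, eps, rng)

# Empirical checks: the means match eps * lam and (1 - eps) * lam,
# and the two parts are approximately uncorrelated.
mean_x1 = x1.mean()
mean_x2 = x2.mean()
corr = np.corrcoef(x1, x2)[0, 1]
```

Here the known function that reconstructs $X$ is the sum, which is the special case the generalization relaxes.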

The classical non-greedy algorithm (NGA) and the recently proposed proximal alternating minimization method with extrapolation (PAMe) for $L_1$-norm PCA are revisited and their finite-step convergence is studied. It is first shown that NGA can be interpreted as a conditional subgradient method or an alternating maximization method. By recognizing it as a conditional subgradient method, we prove that the iterates generated by the algorithm become constant after finitely many steps under a certain full-rank assumption; this assumption can be removed when the projection dimension is one. By treating the algorithm as an alternating maximization method, we then prove that the objective value is fixed after at most $\left\lceil\frac{F^{\max}}{\tau_0} \right\rceil$ steps, where the stopping point satisfies certain optimality conditions. Then, a slight modification of NGA with improved convergence properties is analyzed. It is shown that the iterates generated by the modified algorithm do not change after at most $\left\lceil\frac{2F^{\max}}{\tau} \right\rceil$ steps; furthermore, the stopping point satisfies certain optimality conditions if the proximal parameter $\tau$ is small enough. For PAMe, it is proved that the sign variable remains constant after finitely many steps and that the algorithm can output a point satisfying certain optimality conditions, if the parameters are small enough and a full-rank assumption is satisfied. Moreover, if there is no proximal term in the subproblem for the projection matrix, then the iterates generated by this modified algorithm do not change after at most $\left\lceil \frac{4F^{\max}}{\tau(1-\gamma)} \right\rceil$ steps and the stopping point also satisfies certain optimality conditions, under assumptions similar to those for PAMe. The full-rank assumption can be removed when the projection dimension is one.
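
To make the finite-step behaviour concrete, here is a minimal sketch of the familiar sign-based fixed-point iteration for $L_1$-norm PCA with projection dimension one, which maximizes $\sum_i |w^\top x_i|$ subject to $\|w\|_2 = 1$. It alternates between a sign variable and the projection vector, and is meant only as an illustration of the alternating maximization view. The exact NGA and PAMe updates analyzed in the paper, including their proximal and extrapolation terms, may differ. The function name, data sizes, and stopping rule are assumptions of this sketch.

```python
import numpy as np

def l1_pca_1d(X, w0, max_iter=1000):
    # X has shape (d, n): each column is a sample.
    w = w0 / np.linalg.norm(w0)
    objective = [np.abs(X.T @ w).sum()]
    prev_s = None
    for _ in range(max_iter):
        # Sign step: best sign vector for the current w (ties broken as +1).
        s = np.where(X.T @ w >= 0, 1.0, -1.0)
        if prev_s is not None and np.array_equal(s, prev_s):
            # Same sign pattern as before, so w would not change any more.
            return w, objective, True
        # Projection step: best unit vector for the current signs.
        v = X @ s
        w = v / np.linalg.norm(v)
        objective.append(np.abs(X.T @ w).sum())
        prev_s = s
    return w, objective, False

# Hypothetical data: 200 Gaussian samples in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
w, objective, converged = l1_pca_1d(X, rng.standard_normal(5))
```

Each pass cannot decrease the objective, and there are only finitely many sign patterns, which gives some intuition for why the iterates can stop changing after finitely many steps.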

Statistical inference for high-dimensional regression models has been extensively studied because of its wide range of applications, from genomics and neuroscience to economics. However, in practice, there are often potential unmeasured confounders associated with both the response and covariates, which can invalidate standard debiasing methods. This paper focuses on a generalized linear regression framework with hidden confounding and proposes a debiasing approach to address this high-dimensional problem by adjusting for the effects induced by the unmeasured confounders. We establish consistency and asymptotic normality for the proposed debiased estimator. The finite sample performance of the proposed method is demonstrated through extensive numerical studies and an application to a genetic data set.

When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our proposed methods take advantage of the covariance structure present in the dataset and provide sharper estimates in directions for which more information has accrued. We establish an asymptotic normality property for our proposed online debiasing estimators under mild conditions on the data collection process and provide asymptotically exact confidence intervals. We additionally prove a minimax lower bound for the adaptive linear regression problem, thereby providing a baseline by which to compare estimators. There are various conditions under which our proposed estimators achieve the minimax lower bound. We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts leading to more robust, better performing, more interpretable, more computationally efficient, and more data efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
