
A joint mix is a random vector with a constant component-wise sum. The dependence structure of a joint mix minimizes some common objectives such as the variance of the component-wise sum, and it is regarded as a concept of extremal negative dependence. In this paper, we explore the connection between the joint mix structure and popular notions of negative dependence in statistics, such as negative correlation dependence, negative orthant dependence and negative association. A joint mix is not always negatively dependent in any of the above senses, but some natural classes of joint mixes are. We derive various necessary and sufficient conditions for a joint mix to be negatively dependent, and study the compatibility of these notions. For identical marginal distributions, we show that a negatively dependent joint mix solves a multi-marginal optimal transport problem for quadratic cost under a novel setting of uncertainty. Analysis of this optimal transport problem with heterogeneous marginals reveals a trade-off between negative dependence and the joint mix structure.
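
For intuition, here is a minimal numerical illustration (not taken from the paper; all names are ours): with two standard uniform marginals, the vector $(U, 1-U)$ is a joint mix, since its component-wise sum is the constant $1$, and it is extremally negatively dependent.

    import numpy as np

    # A joint mix in dimension two: X = (U, 1 - U) with U ~ Uniform(0, 1).
    rng = np.random.default_rng(0)
    u = rng.uniform(size=100_000)
    X = np.column_stack([u, 1.0 - u])

    print(np.allclose(X.sum(axis=1), 1.0))      # True: the component-wise sum is constant
    print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])  # -1.0: perfect negative correlation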

Related content

This paper characterizes the proximal operator of the piecewise exponential function $1\!-\!e^{-|x|/\sigma}$ with a given shape parameter $\sigma\!>\!0$, a popular nonconvex surrogate of the $\ell_0$-norm in support vector machines, zero-one programming, and compressed sensing. Although Malek-Mohammadi et al. [IEEE Transactions on Signal Processing, 64(21):5657--5671, 2016] studied this problem, the expressions they derived were inaccurate: one case was missing. Using the Lambert W function and a detailed study of the piecewise exponential function, we rectify the formulation of the proximal operator in light of their work and analyze it thoroughly. Finally, as an application to compressed sensing, an iterative shrinkage and thresholding algorithm (ISTA) for the piecewise exponential regularization problem is developed and fully investigated. A comparison of ISTA with nine popular nonconvex penalties in compressed sensing demonstrates the advantage of the piecewise exponential penalty.
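
The closed form derived in the paper (via the Lambert W function) is not reproduced here; instead, a hedged sketch evaluates the proximal operator numerically, coordinate by coordinate, and plugs it into a generic ISTA loop. All function names are ours, and the numerical prox is only a stand-in for the corrected formula.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def prox_pexp(y, lam, sigma):
        """Numerical prox of f(x) = 1 - exp(-|x|/sigma): solves, per coordinate,
        argmin_x lam * f(x) + 0.5 * (x - y)^2. The paper instead derives a
        closed form via the Lambert W function."""
        def scalar(yi):
            if yi == 0.0:
                return 0.0
            obj = lambda x: lam * (1.0 - np.exp(-abs(x) / sigma)) + 0.5 * (x - yi) ** 2
            lo, hi = (0.0, yi) if yi > 0 else (yi, 0.0)
            res = minimize_scalar(obj, bounds=(lo, hi), method="bounded")
            return res.x if obj(res.x) < obj(0.0) else 0.0  # compare against thresholding to 0
        return np.vectorize(scalar)(y)

    def ista(A, b, lam, sigma, n_iter=200):
        """ISTA for min_x 0.5 * ||Ax - b||^2 + lam * sum_i (1 - exp(-|x_i|/sigma))."""
        L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part's gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            x = prox_pexp(x - A.T @ (A @ x - b) / L, lam / L, sigma)
        return x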

We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be viewed as a noisy version of a Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
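
For concreteness, here is a hedged sketch of the EM iteration for a symmetric two-component mixture of linear regressions, $y_i = z_i \langle x_i, \theta \rangle + \varepsilon_i$ with hidden signs $z_i \in \{-1, +1\}$; the paper's pairwise comparison design is a particular choice of the covariates $x_i$ and is not modeled here.

    import numpy as np

    def em_symmetric_mlr(X, y, theta0, sigma2=1.0, n_iter=50):
        """EM for y_i = z_i * <x_i, theta> + Gaussian noise with z_i in {-1, +1},
        run from a local initialization theta0 (as in the paper's local analysis)."""
        theta = theta0.copy()
        gram_inv = np.linalg.inv(X.T @ X)          # fixed across iterations
        for _ in range(n_iter):
            w = np.tanh(y * (X @ theta) / sigma2)  # E-step: E[z_i | data] = 2 P(z_i = +1) - 1
            theta = gram_inv @ (X.T @ (w * y))     # M-step: weighted least squares
        return theta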

Many multivariate data sets exhibit a form of positive dependence, which can either appear globally between all variables or only locally within particular subgroups. A popular notion of positive dependence that allows for localized positivity is positive association. In this work we introduce the notion of extremal positive association for multivariate extremes from threshold exceedances. Via a sufficient condition for extremal association, we show that extremal association generalizes extremal tree models. For H\"usler--Reiss distributions the sufficient condition permits a parametric description that we call the metric property. As the parameter of a H\"usler--Reiss distribution is a Euclidean distance matrix, the metric property relates to research in electrical network theory and Euclidean geometry. We show that the metric property can be localized with respect to a graph and study surrogate likelihood inference. This gives rise to a two-step estimation procedure for locally metrical H\"usler--Reiss graphical models. The second step allows for a simple dual problem, which is implemented via a gradient descent algorithm. Finally, we demonstrate our results on simulated and real data.
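
Since the H\"usler--Reiss parameter matrix is a Euclidean distance matrix, the classical Schoenberg criterion is the natural starting point; the sketch below checks it numerically (this is a textbook fact about squared-distance matrices, not the paper's metric property, and the names are ours).

    import numpy as np

    def is_euclidean_distance_matrix(D, tol=1e-9):
        """Schoenberg criterion: a symmetric, zero-diagonal matrix D of squared
        distances is an EDM iff -0.5 * J @ D @ J is positive semidefinite,
        where J = I - (1/n) * ones((n, n)) is the centering matrix."""
        n = D.shape[0]
        if not np.allclose(D, D.T, atol=tol) or not np.allclose(np.diag(D), 0.0, atol=tol):
            return False
        J = np.eye(n) - np.ones((n, n)) / n
        return bool(np.linalg.eigvalsh(-0.5 * J @ D @ J).min() >= -tol)

    # Squared pairwise distances of the points 0, 1, 3 on the real line:
    pts = np.array([0.0, 1.0, 3.0])
    D = (pts[:, None] - pts[None, :]) ** 2
    print(is_euclidean_distance_matrix(D))  # True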

In Bayesian inference, a widespread technique for approximately sampling from, and computing statistics of, a high-dimensional posterior is the Laplace approximation, a Gaussian proxy to the posterior. The accuracy of the Laplace approximation improves as the sample size grows, but how fast the dimension $d$ may grow with the sample size $n$ has not been fully resolved. Prior works have shown that $d^3\ll n$ is sufficient for the approximation to be accurate. By deriving the leading-order contribution to the TV error, we show that $d^2\ll n$ suffices; for a logistic regression posterior, we show that this growth condition is also necessary.
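
As a concrete reference point, here is a hedged sketch of the Laplace approximation for the logistic regression posterior discussed in the abstract: Newton's method finds the posterior mode, and the Gaussian proxy uses the inverse Hessian of the negative log posterior as its covariance (a standard construction; the prior and names are our assumptions).

    import numpy as np

    def laplace_logistic(X, y, prior_var=100.0, n_newton=50):
        """Laplace approximation N(mode, H^{-1}) to a Bayesian logistic regression
        posterior with prior beta ~ N(0, prior_var * I) and labels y in {0, 1}."""
        n, d = X.shape
        beta = np.zeros(d)
        for _ in range(n_newton):
            p = 1.0 / (1.0 + np.exp(-X @ beta))
            grad = X.T @ (p - y) + beta / prior_var           # gradient of -log posterior
            H = X.T @ ((p * (1 - p))[:, None] * X) + np.eye(d) / prior_var
            beta -= np.linalg.solve(H, grad)                  # Newton step toward the mode
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ ((p * (1 - p))[:, None] * X) + np.eye(d) / prior_var
        return beta, np.linalg.inv(H)                         # Gaussian mean and covariance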

Reinforcement learning algorithms commonly seek to optimize policies for solving one particular task. How should we explore an unknown dynamical system such that the estimated model allows us to solve multiple downstream tasks in a zero-shot manner? In this paper, we address this challenge by developing an algorithm -- OPAX -- for active exploration. OPAX uses well-calibrated probabilistic models to quantify the epistemic uncertainty about the unknown dynamics. It optimistically maximizes -- with respect to plausible dynamics -- the information gain between the unknown dynamics and state observations. We show how the resulting optimization problem can be reduced to an optimal control problem that can be solved at each episode using standard approaches. We analyze our algorithm for general models and, in the case of Gaussian process dynamics, give a sample complexity bound and show that the epistemic uncertainty converges to zero. In our experiments, we compare OPAX with other heuristic active exploration approaches on several environments. The experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.
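
The information-gain quantity at the heart of such approaches has a standard closed form for Gaussian process models, illustrated below; OPAX optimizes a related objective over trajectories of the dynamical system, which this input-level sketch does not attempt to reproduce.

    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * sq / lengthscale**2)

    def gp_information_gain(K, noise_var):
        """I(f; y) = 0.5 * logdet(I + K / noise_var): mutual information between
        a GP f and noisy observations y at the inputs with kernel matrix K."""
        _, logdet = np.linalg.slogdet(np.eye(K.shape[0]) + K / noise_var)
        return 0.5 * logdet

    # Spread-out inputs are more informative than clustered ones:
    X_spread = np.linspace(0.0, 5.0, 6)[:, None]
    X_clustered = np.full((6, 1), 2.5)
    print(gp_information_gain(rbf_kernel(X_spread, X_spread), 0.1))        # larger
    print(gp_information_gain(rbf_kernel(X_clustered, X_clustered), 0.1))  # smaller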

Stochastic optimization has found wide application in minimizing objective functions in machine learning, motivating many theoretical studies of its practical success. Most existing studies focus on the convergence of optimization errors, while the generalization analysis of stochastic optimization lags far behind. This is especially the case for the nonconvex and nonsmooth problems often encountered in practice. In this paper, we initiate a systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems. We introduce novel algorithmic stability measures and establish their quantitative connection with the gap between population gradients and empirical gradients, which we then extend to the gap between the Moreau envelope of the empirical risk and that of the population risk. To our knowledge, such quantitative connections between stability and generalization, in terms of either gradients or Moreau envelopes, have not been studied in the literature. We introduce a class of sampling-determined algorithms, for which we develop bounds for three stability measures. Finally, we apply these results to derive error bounds for stochastic gradient descent and its adaptive variant, showing how an implicit regularization can be achieved by tuning the step sizes and the number of iterations.
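
For reference, the Moreau envelope central to this analysis has the standard definition below, together with a plausible formalization of the envelope gap studied in the abstract ($F$ denotes the population risk and $F_S$ the empirical risk on sample $S$; this notation is ours).

\[
  f_\lambda(x) \;=\; \min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\,\|x - y\|_2^2 \Big\}
  \quad (\lambda > 0), \qquad
  \mathrm{gap}(x) \;=\; \big| F_\lambda(x) - (F_S)_\lambda(x) \big|.
\]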

The sensitivity of loss reserving techniques to outliers in the data, or to deviations from model assumptions, is a well-known challenge. It has been shown that the popular chain-ladder reserving approach is highly sensitive to aberrant observations: reserve estimates can be shifted significantly by even one outlier. As a consequence, the chain-ladder reserving technique is non-robust. In this paper we investigate the sensitivity of reserves and mean squared errors of prediction under Mack's Model (Mack, 1993). This is done through the derivation of impact functions, calculated by taking the first derivative of the relevant statistic of interest with respect to an observation. We also provide and discuss the impact functions for quantiles when total reserves are assumed to be lognormally distributed. Additionally, we compare the impact functions for individual accident year reserves under Mack's Model and the Bornhuetter-Ferguson methodology. We show that the impact of incremental claims on these statistics of interest varies widely throughout a loss triangle and depends heavily on other cells in the triangle. Results are illustrated using data from a Belgian non-life insurer.
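
The paper derives analytic impact functions; as a hedged numerical stand-in, the sketch below perturbs one incremental cell of a made-up cumulative run-off triangle and measures the finite-difference effect on the total chain-ladder reserve.

    import numpy as np

    def chain_ladder_reserve(C):
        """Total reserve from a cumulative run-off triangle C (NaN below the diagonal)."""
        n = C.shape[0]
        f = np.ones(n - 1)
        for j in range(n - 1):
            rows = ~np.isnan(C[:, j + 1])               # accident years observed at j+1
            f[j] = np.nansum(C[rows, j + 1]) / np.nansum(C[rows, j])
        reserve = 0.0
        for i in range(1, n):
            latest = C[i, n - 1 - i]
            reserve += latest * np.prod(f[n - 1 - i:]) - latest
        return reserve

    def impact(C, i, j, eps=1.0):
        """Finite-difference impact of incremental cell (i, j) on the total reserve;
        a numerical stand-in for the paper's analytic impact functions."""
        Cp = C.copy()
        Cp[i, j:] += eps   # a shock to an incremental claim propagates cumulatively
        return (chain_ladder_reserve(Cp) - chain_ladder_reserve(C)) / eps

    C = np.array([[100., 150., 160.],
                  [110., 168., np.nan],
                  [120., np.nan, np.nan]])
    print(impact(C, 2, 0))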

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $\ell_1/\ell_2$-regularized maximum likelihood estimator (MLE) for learning the $K$ linear structural equation models. We show theoretically that the joint estimator, by leveraging data across related tasks, achieves a better sample complexity for recovering the causal order (or topological order) than separate estimation. Moreover, the joint estimator can recover non-identifiable DAGs by estimating them together with identifiable DAGs. Our analysis also establishes the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer coincides with the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
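
A plausible form of the $\ell_1/\ell_2$ (group-lasso) estimator described above is recorded below, with $B^{(k)}$ the coefficient matrix of task $k$'s linear SEM and $\ell_k$ its Gaussian log-likelihood; the paper's exact estimator may differ in details.

\[
  \big(\widehat{B}^{(1)},\dots,\widehat{B}^{(K)}\big)
  \in \operatorname*{arg\,min}_{B^{(1)},\dots,B^{(K)}}\;
  -\sum_{k=1}^{K} \ell_k\big(B^{(k)}\big)
  \;+\; \lambda \sum_{j \neq j'} \Big( \sum_{k=1}^{K} \big(B^{(k)}_{jj'}\big)^{2} \Big)^{1/2},
\]

so the inner $\ell_2$ norm couples each edge $(j, j')$ across tasks and the outer $\ell_1$ sum encourages a shared sparse union of supports.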

This paper focuses on two fundamental tasks of graph analysis: community detection and node representation learning, which capture the global and local structures of graphs, respectively. In the current literature, these two tasks are usually studied independently, although they are in fact highly correlated. We propose a probabilistic generative model called vGraph that learns community membership and node representations collaboratively. Specifically, we assume that each node can be represented as a mixture of communities, and each community is defined as a multinomial distribution over nodes. Both the mixing coefficients and the community distributions are parameterized by the low-dimensional representations of the nodes and communities. We design an effective variational inference algorithm that regularizes the community membership of neighboring nodes to be similar in the latent space. Experimental results on multiple real-world graphs show that vGraph is highly effective in both community detection and node representation learning, outperforming many competitive baselines on both tasks. We also show that the vGraph framework is quite flexible and can easily be extended to detect hierarchical communities.
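
A hedged sketch of the parameterization just described: the mixing coefficients $p(c \mid w)$ and the community distributions $p(v \mid c)$ both arise from inner products of low-dimensional node and community embeddings (all names and dimensions below are ours).

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    n_nodes, n_comms, dim = 50, 4, 16
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(n_nodes, dim))        # node embeddings
    psi = rng.normal(size=(n_comms, dim))        # community embeddings

    p_c_given_w = softmax(phi @ psi.T, axis=1)   # (n_nodes, n_comms): mixture over communities
    p_v_given_c = softmax(psi @ phi.T, axis=1)   # (n_comms, n_nodes): multinomial over nodes

    # Generative sampling of a neighbor v for node w: c ~ p(c|w), then v ~ p(v|c).
    w = 0
    c = rng.choice(n_comms, p=p_c_given_w[w])
    v = rng.choice(n_nodes, p=p_v_given_c[c])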

When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes from their attributes. To learn the class-conditional distribution of CNN features, such models rely on pairs of image features and class attributes, and hence cannot exploit the abundance of unlabeled data samples. In this paper, we tackle any-shot learning problems, i.e., zero-shot and few-shot learning, in a unified feature-generating framework that operates in both inductive and transductive settings. We develop a conditional generative model that combines the strengths of VAEs and GANs and, in addition, learns the marginal feature distribution of unlabeled images via an unconditional discriminator. We empirically show that our model learns highly discriminative CNN features on several datasets, including CUB, SUN, AWA and ImageNet, and establishes a new state of the art in any-shot learning, i.e., in inductive and transductive (generalized) zero- and few-shot learning settings. We also demonstrate that the learned features are interpretable: we visualize them by inverting them back to pixel space, and we explain them by generating textual arguments for why they are associated with a certain label.
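
A plausible shape of the composite training objective suggested by the abstract is summarized below; the weights $\gamma, \delta$ and the particular adversarial losses are our assumptions rather than the paper's exact choices.

\[
  \mathcal{L} \;=\; \mathcal{L}_{\mathrm{VAE}}(x, a)
  \;+\; \gamma\, \mathcal{L}^{\mathrm{cond}}_{\mathrm{GAN}}(x, a)
  \;+\; \delta\, \mathcal{L}^{\mathrm{uncond}}_{\mathrm{GAN}}(x_u),
\]

where $x$ denotes labeled image features, $a$ class attributes, and $x_u$ unlabeled image features; the conditional VAE and GAN terms use labeled pairs, while the unconditional discriminator term matches the marginal distribution of the unlabeled features.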
