
We study the polynomial approximation of symmetric multivariate functions and of multi-set functions. Specifically, we consider $f(x_1, \dots, x_N)$, where $x_i \in \mathbb{R}^d$, and $f$ is invariant under permutations of its $N$ arguments. We demonstrate how these symmetries can be exploited to improve the cost versus error ratio in a polynomial approximation of the function $f$, and in particular study the dependence of that ratio on $d, N$ and the polynomial degree. These results are then used to construct approximations and prove approximation rates for functions defined on multi-sets where $N$ becomes a parameter of the input.
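
As a rough illustration of the basic mechanism (not the paper's construction), one can describe a point cloud only through permutation-invariant power-sum features and then fit an ordinary polynomial in those features. In the sketch below, the target `f`, the degree cap `K`, and the least-squares fit are all illustrative choices.

```python
import numpy as np
from itertools import combinations_with_replacement

# Illustrative sketch (not the paper's construction): approximate a
# permutation-invariant f(x_1, ..., x_N), x_i in R^d, through power-sum
# features p = sum_i x_{i,j_1} * ... * x_{i,j_deg} of total degree <= K,
# then fit an ordinary least-squares polynomial in those invariant features.

def power_sum_features(X, K):
    """X has shape (N, d); returns all power sums of total degree <= K."""
    N, d = X.shape
    feats = []
    for deg in range(1, K + 1):
        for multi in combinations_with_replacement(range(d), deg):
            monom = np.ones(N)
            for j in multi:
                monom = monom * X[:, j]
            feats.append(monom.sum())        # invariant under row permutations
    return np.array(feats)

rng = np.random.default_rng(0)
N, d, K = 6, 2, 3
f = lambda X: np.sin(X.sum()) + (X ** 2).sum()   # a symmetric toy target

clouds = [rng.uniform(-1, 1, size=(N, d)) for _ in range(400)]
Phi = np.stack([power_sum_features(X, K) for X in clouds])
Phi = np.column_stack([np.ones(len(clouds)), Phi])
y = np.array([f(X) for X in clouds])
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("train RMSE:", np.sqrt(np.mean((Phi @ coef - y) ** 2)))
```

In this sketch the number of invariant features depends on $d$ and $K$ but not on $N$, which is the kind of saving that exploiting permutation symmetry can provide.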

Related Content

Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries. These symmetries can transform a trained model such that it performs similarly on new samples, which allows ensemble building that improves robustness under certain adversarial attacks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that using common initialization methods, gradient flow only explores a small part of the global minimum. By relating conserved quantities to convergence rate and sharpness of the minimum, we provide insights on how initialization impacts convergence and generalizability.
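
For concreteness, here is a textbook special case (not the paper's general framework): a two-layer ReLU network admits the positive rescaling symmetry $(w_j, a_j) \mapsto (c\, w_j, a_j / c)$, and the associated conserved quantity $\|w_j\|^2 - a_j^2$ stays constant along gradient flow. The network sizes, data, and learning rate below are arbitrary illustrative choices.

```python
import numpy as np

# Textbook special case (not the paper's general framework): for
# f(x) = sum_j a_j * relu(w_j . x), the rescaling (w_j, a_j) -> (c*w_j, a_j/c),
# c > 0, leaves f unchanged, and ||w_j||^2 - a_j^2 is conserved under gradient
# flow (here checked approximately with small-step gradient descent).

rng = np.random.default_rng(1)
d, m, n = 5, 8, 64
W = rng.normal(size=(m, d))
a = rng.normal(size=m)

relu = lambda z: np.maximum(z, 0.0)
net = lambda W, a, X: relu(X @ W.T) @ a

# 1) The rescaling is a symmetry of the network function.
X = rng.normal(size=(n, d))
c = 2.5
print(np.allclose(net(W, a, X), net(c * W, a / c, X)))        # True

# 2) The associated conserved quantity barely moves during training.
y = rng.normal(size=n)
q0 = np.sum(W ** 2, axis=1) - a ** 2
lr = 1e-3
for _ in range(500):
    err = net(W, a, X) - y                                    # squared loss residuals
    grad_a = relu(X @ W.T).T @ err / n
    grad_W = ((err[:, None] * (X @ W.T > 0)) * a).T @ X / n
    a, W = a - lr * grad_a, W - lr * grad_W
q1 = np.sum(W ** 2, axis=1) - a ** 2
print(np.max(np.abs(q1 - q0)))                                # close to 0
```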

This paper considers estimating functional-coefficient models in panel quantile regression with individual effects, allowing for cross-sectional and temporal dependence in large panel observations. A latent group structure is imposed on the heterogeneous quantile regression models so that the number of nonparametric functional coefficients to be estimated can be reduced considerably. Using the preliminary local linear quantile estimates of the subject-specific functional coefficients, a classic agglomerative clustering algorithm is used to estimate the unknown group structure, and an easy-to-implement ratio criterion is proposed to determine the number of groups. The estimated group number and structure are shown to be consistent. Furthermore, a post-grouping local linear smoothing method is introduced to estimate the group-specific functional coefficients, and the relevant asymptotic normal distribution theory is derived with a normalisation rate comparable to that in the literature. The developed methodologies and theory are verified through a simulation study and showcased with an application to house price data from UK local authority districts, which reveals different homogeneity structures at different quantile levels.
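
A rough sketch of the grouping step only, under simplifying assumptions: each row of `beta_hat` stands in for a subject's preliminary (vectorized) functional-coefficient estimate, the clustering uses SciPy's agglomerative linkage, and the ratio rule on merge heights is a stand-in for the paper's criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Sketch of the grouping step only (simplified): rows of beta_hat stand in
# for the subjects' preliminary local linear quantile coefficient estimates
# evaluated on a grid; subjects are grouped by agglomerative clustering and
# the group number is chosen by a simple ratio rule on the merge heights
# (a stand-in for the paper's ratio criterion).

rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 50)
group_curves = [np.sin(2 * np.pi * grid), 2 * grid - 1, np.cos(np.pi * grid)]
beta_hat = np.stack([group_curves[i % 3] + 0.1 * rng.normal(size=grid.size)
                     for i in range(30)])                 # 30 subjects, 3 true groups

Z = linkage(beta_hat, method="average")                   # agglomerative clustering
heights = Z[:, 2]                                         # increasing merge distances
ratios = heights[1:] / heights[:-1]
K = len(heights) - int(np.argmax(ratios))                 # stop before the largest jump
labels = fcluster(Z, t=K, criterion="maxclust")
print("estimated number of groups:", K)
print("estimated group labels:", labels)
```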

Models with intractable normalising functions have numerous applications. Because the normalising constants are functions of the parameters of interest, standard Markov chain Monte Carlo cannot be used for Bayesian inference for these models. Many algorithms have been developed for such models: some have the posterior distribution as their asymptotic distribution, while other "asymptotically inexact" algorithms do not possess this property. There is limited guidance for evaluating approximations based on these algorithms. We propose two new diagnostics that address this gap. We provide theoretical justification for our methods and apply them to several algorithms in the context of challenging examples.

In the last few years, many works have tried to explain the predictions of deep learning models, yet few methods have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect of leave-one-out training on the loss function, have been shown to be fragile, and the reasons for this fragility remain unclear. Although previous work suggests the use of regularization to increase robustness, this does not hold in all cases. In this work, we revisit the experiments performed in the prior work in an effort to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where their convexity assumptions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures themselves may cause the observed fragility.
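
For reference, the standard influence-function approximation used in this line of work estimates the effect of removing a training point $z$ on a test loss as roughly $\frac{1}{n}\nabla_\theta \ell(z_{\mathrm{test}},\hat\theta)^\top H_{\hat\theta}^{-1}\nabla_\theta \ell(z,\hat\theta)$. The sketch below evaluates it for L2-regularized logistic regression, where the convexity assumptions hold and the Hessian is explicit; the model, data, and sign convention are illustrative.

```python
import numpy as np

# Standard influence-function approximation in the convex setting: removing
# training point z and retraining changes the test loss by roughly
#   (1/n) * grad l(z_test, theta)^T  H^{-1}  grad l(z, theta),
# with H the Hessian of the regularized empirical risk at the optimum.
# Minimal sketch for L2-regularized logistic regression (explicit Hessian).

rng = np.random.default_rng(3)
n, p, lam = 200, 5, 1e-2
X = rng.normal(size=(n, p))
theta_true = rng.normal(size=p)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
y = (rng.uniform(size=n) < sigmoid(X @ theta_true)).astype(float)

# Fit by Newton's method (convex objective, so convergence is unproblematic).
theta = np.zeros(p)
for _ in range(50):
    mu = sigmoid(X @ theta)
    grad = X.T @ (mu - y) / n + lam * theta
    H = X.T @ (X * (mu * (1 - mu))[:, None]) / n + lam * np.eye(p)
    theta -= np.linalg.solve(H, grad)

def grad_loss(x, label, theta):
    """Gradient of the (unregularized) log loss at a single example."""
    return (sigmoid(x @ theta) - label) * x

x_test, y_test = rng.normal(size=p), 1.0
H_inv_g_test = np.linalg.solve(H, grad_loss(x_test, y_test, theta))

# Influence of removing each training point on the test loss (positive value:
# removing the point is predicted to increase the test loss).
influences = np.array([grad_loss(X[i], y[i], theta) @ H_inv_g_test / n
                       for i in range(n)])
print("most helpful point:", influences.argmax(),
      "most harmful point:", influences.argmin())
```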

A fundamental problem in quantum physics is to encode functions that are completely anti-symmetric under permutations of identical particles. The Barron space consists of high-dimensional functions that can be parameterized by infinite neural networks with one hidden layer. By explicitly encoding the anti-symmetric structure, we prove that the anti-symmetric functions which belong to the Barron space can be efficiently approximated with sums of determinants. This yields a factorial improvement in complexity compared to the standard representation in the Barron space and provides a theoretical explanation for the effectiveness of determinant-based architectures in ab-initio quantum chemistry.
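
A small numerical illustration of the determinant mechanism (a Slater-determinant-style ansatz, not the paper's specific parameterization): with $N$ single-particle feature functions $\varphi_k$, the determinant of the matrix $[\varphi_k(x_i)]$ is automatically anti-symmetric, so swapping two particles flips its sign. The `tanh` features and parameters below are arbitrary.

```python
import numpy as np

# Minimal illustration of a determinant-based anti-symmetric ansatz
# (Slater-determinant style, not the paper's specific construction):
#   Psi(x_1, ..., x_N) = det [ phi_k(x_i) ]_{i,k=1..N}
# is anti-symmetric under exchange of any two particles, because swapping
# two rows of a matrix flips the sign of its determinant.

rng = np.random.default_rng(4)
N, d = 4, 3
W = rng.normal(size=(N, d))      # parameters of N single-particle "orbitals"
b = rng.normal(size=N)

def psi(X):
    """X has shape (N, d); Phi[i, k] = tanh(w_k . x_i + b_k)."""
    Phi = np.tanh(X @ W.T + b)
    return np.linalg.det(Phi)

X = rng.normal(size=(N, d))
X_swapped = X.copy()
X_swapped[[0, 1]] = X_swapped[[1, 0]]       # exchange particles 0 and 1
print(psi(X), psi(X_swapped))               # equal magnitude, opposite sign
print(np.isclose(psi(X), -psi(X_swapped)))  # True
```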

Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes. Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which data $(y_i,{\boldsymbol x}_i)$, $i\le n$, are i.i.d. with ${\boldsymbol x}_i\sim\mathsf{N}({\boldsymbol 0},{\boldsymbol \Sigma})$ a $p$-dimensional Gaussian feature vector, and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle {\boldsymbol \theta}_*,{\boldsymbol x}_i \rangle$. While the Gaussian model might appear extremely simplistic, universality arguments can be used to show that the results derived in this setting also apply to the output of certain nonlinear featurization maps. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to \psi$, and derive exact expressions for the limiting generalization error. We use this theory to derive two results of independent interest: $(i)$ Sufficient conditions on $({\boldsymbol \Sigma},{\boldsymbol \theta}_*)$ for `benign overfitting' that parallel previously derived conditions in the case of linear regression; $(ii)$ An asymptotically exact expression for the generalization error when max-margin classification is used in conjunction with feature vectors produced by random one-layer neural networks.
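
The stylized setting is easy to simulate. The sketch below draws Gaussian features, generates labels from a logistic link in $\langle {\boldsymbol \theta}_*,{\boldsymbol x}\rangle$, fits an approximate max-margin classifier with a large-`C` linear SVM, and estimates the generalization error empirically; the covariance, link, and solver settings are illustrative, and the paper's contribution is the exact asymptotic characterization rather than this simulation.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Simulation sketch of the stylized setting (the paper derives exact
# asymptotics; here the generalization error is only estimated empirically).
# x_i ~ N(0, Sigma); P(y_i = +1 | x_i) = sigmoid(kappa * <theta_*, x_i>);
# max-margin classification approximated by a linear SVM with very large C.

rng = np.random.default_rng(5)
n, p = 400, 800                                    # proportional regime, psi = p / n = 2
Sigma_sqrt = np.diag(np.linspace(0.5, 1.5, p))     # a simple anisotropic covariance
theta_star = rng.normal(size=p)
theta_star /= np.linalg.norm(theta_star)

def sample(m):
    X = rng.normal(size=(m, p)) @ Sigma_sqrt
    prob = 1 / (1 + np.exp(-4.0 * (X @ theta_star)))
    y = np.where(rng.uniform(size=m) < prob, 1, -1)
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(5000)

# With p > n the training data is linearly separable almost surely, so a
# very large C approximates the hard max-margin solution.
clf = LinearSVC(C=1e6, loss="hinge", dual=True, fit_intercept=False, max_iter=100000)
clf.fit(X_train, y_train)
print("train error:", np.mean(clf.predict(X_train) != y_train))    # should be 0
print("test  error:", np.mean(clf.predict(X_test) != y_test))
```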

We study the complexity of high-dimensional approximation in the $L_2$-norm when different classes of information are available; we compare the power of function evaluations with the power of arbitrary continuous linear measurements. Here, we discuss the situation in which the number of linear measurements required to achieve an error $\varepsilon \in (0,1)$ in dimension $d\in\mathbb{N}$ depends only poly-logarithmically on $\varepsilon^{-1}$. This corresponds to an exponential order of convergence of the approximation error, which often happens in applications. However, this does not mean that the high-dimensional approximation problem is easy; the main difficulty usually lies in the dependence on the dimension $d$. We determine to what extent the required amount of information changes if we allow only function evaluations instead of arbitrary linear information. It turns out that in this case we lose very little, and we can even restrict ourselves to linear algorithms. In particular, several notions of tractability hold simultaneously for both types of available information.
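
To make "poly-logarithmic dependence on $\varepsilon^{-1}$" concrete, it rests on the following standard equivalence (the constants are generic placeholders, not the paper's):
\[
  e(n, d) \;\le\; C_d \exp\!\bigl(-c_d\, n^{1/p}\bigr)
  \quad\Longrightarrow\quad
  n(\varepsilon, d) \;\le\; \Bigl\lceil \bigl(c_d^{-1} \ln (C_d/\varepsilon)\bigr)^{p} \Bigr\rceil
  = O\!\bigl((\ln \varepsilon^{-1})^{p}\bigr),
\]
so an exponential order of convergence of the error translates into an information requirement that grows only poly-logarithmically in $\varepsilon^{-1}$, while the constants $C_d$ and $c_d$ carry the dependence on the dimension $d$.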

Leximin is a common approach to multi-objective optimization, frequently employed in fair division applications. In leximin optimization, one first aims to maximize the smallest objective value; subject to this, one maximizes the second-smallest objective; and so on. Often, even the single-objective problem of maximizing the smallest value cannot be solved accurately, e.g. due to computational hardness. What can we hope to accomplish for leximin optimization in this situation? Recently, Henzinger et al. (2022) defined a notion of approximate leximin optimality and showed how it can be computed for the problem of representative cohort selection. However, their definition considers only an additive approximation in the single-objective problem. In this work, we define approximate leximin optimality allowing approximations that have both multiplicative and additive errors. We show how to compute a solution that approximates leximin in polynomial time, using an oracle that finds an approximation to the single-objective problem. The approximation factors of the algorithms are closely related: a factor of $(\alpha,\epsilon)$ for the single-objective problem translates into a factor of $\left(\frac{\alpha^2}{1-\alpha + \alpha^2}, \frac{\alpha(2-\alpha)\epsilon}{1-\alpha +\alpha^2}\right)$ for the multi-objective leximin problem, regardless of the number of objectives. As a usage example, we apply our algorithm to the problem of stochastic allocations of indivisible goods. For this problem, assuming submodular objective functions, the single-objective egalitarian welfare can be approximated, with only a multiplicative error, to an optimal $1-\frac{1}{e}\approx 0.632$ factor w.h.p. We show how to extend the approximation to leximin, over all the objective functions, to a multiplicative factor of $\frac{(e-1)^2}{e^2-e+1}\approx 0.52$ w.h.p. or $\frac{1}{3}$ deterministically.
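
To fix ideas, here is a minimal sketch of the exact leximin iteration for linear objectives over a polytope; the paper's contribution is to run this kind of loop with an $(\alpha,\epsilon)$-approximate single-objective oracle instead of the exact LPs used below. The toy fair-division instance, utilities, and tolerance are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Exact iterative leximin for linear utilities over a polytope (the paper
# replaces the exact single-objective LPs below with an approximate oracle).
# Toy instance: divide 3 divisible goods among 3 agents; x[i, j] = fraction
# of good i given to agent j, with sum_j x[i, j] <= 1 and x >= 0.

V = np.array([[6.0, 1.0, 1.0],        # V[i, j] = agent j's value for good i
              [1.0, 6.0, 1.0],
              [2.0, 2.0, 5.0]])
n_goods, n_agents = V.shape
n_vars = n_goods * n_agents
bounds_x = [(0, None)] * n_vars

A_supply = np.kron(np.eye(n_goods), np.ones((1, n_agents)))   # sum_j x[i, j] <= 1
U = np.zeros((n_agents, n_vars))                              # U[j] . x = agent j's utility
for i in range(n_goods):
    for j in range(n_agents):
        U[j, i * n_agents + j] = V[i, j]

def maximize_min(fixed):
    """max t  s.t.  U_j x >= fixed[j] for saturated j,  U_j x >= t otherwise."""
    c = np.zeros(n_vars + 1); c[-1] = -1.0                    # linprog minimizes
    A_ub = [np.append(row, 0.0) for row in A_supply]
    b_ub = [1.0] * n_goods
    for j in range(n_agents):
        if j in fixed:
            A_ub.append(np.append(-U[j], 0.0)); b_ub.append(-fixed[j])
        else:
            A_ub.append(np.append(-U[j], 1.0)); b_ub.append(0.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds_x + [(None, None)])
    return res.x[-1]

def best_possible(k, fixed, t_star):
    """max U_k x  s.t.  saturated agents keep their values, others get >= t_star."""
    A_ub = list(A_supply); b_ub = [1.0] * n_goods
    for j in range(n_agents):
        A_ub.append(-U[j]); b_ub.append(-fixed.get(j, t_star))
    res = linprog(-U[k], A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds_x)
    return -res.fun

fixed = {}                                                    # agent -> guaranteed value
while len(fixed) < n_agents:
    t_star = maximize_min(fixed)
    for j in [j for j in range(n_agents) if j not in fixed]:
        if best_possible(j, fixed, t_star) <= t_star + 1e-7:
            fixed[j] = t_star                                 # j cannot do better: saturate
print("leximin utility profile:", [round(fixed[j], 3) for j in range(n_agents)])
```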

We study the problem of zeroth-order (black-box) optimization of a Lipschitz function $f$ defined on a compact subset $\mathcal X$ of $\mathbb R^d$, with the additional constraint that algorithms must certify the accuracy of their recommendations. We characterize the optimal number of evaluations of any Lipschitz function $f$ to find and certify an approximate maximizer of $f$ at accuracy $\varepsilon$. Under a weak assumption on $\mathcal X$, this optimal sample complexity is shown to be nearly proportional to the integral $\int_{\mathcal X} \mathrm{d}\boldsymbol x/( \max(f) - f(\boldsymbol x) + \varepsilon )^d$. This result, which was only (and partially) known in dimension $d=1$, solves an open problem dating back to 1991. In terms of techniques, our upper bound relies on a packing bound by Bouttier et al. (2020) for the Piyavskii-Shubert algorithm that we link to the above integral. We also show that a certified version of the computationally tractable DOO algorithm matches these packing and integral bounds. Our instance-dependent lower bound differs from traditional worst-case lower bounds in the Lipschitz setting and relies on a local worst-case analysis that could likely prove useful for other learning tasks.
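
A one-dimensional sketch of the Piyavskii-Shubert idea with a certified stopping rule (the paper treats $d$ dimensions and a certified DOO variant; this keeps only the core mechanism). The test function, Lipschitz constant, accuracy, and grid below are illustrative.

```python
import numpy as np

# One-dimensional sketch of Piyavskii-Shubert with a certificate. With
# Lipschitz constant L, every evaluation gives the bound
#   f(x) <= f(x_i) + L * |x - x_i|,
# so the pointwise minimum of these cones is a certified upper envelope of f.
# Evaluate where the envelope peaks; stop once its maximum exceeds the best
# observed value by at most eps, which certifies an eps-approximate maximizer.

f = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)     # unknown to the algorithm
L = 3.0 + 0.5 * 7.0                                   # a valid Lipschitz constant
eps = 1e-2
grid = np.linspace(0.0, 1.0, 20001)                   # fine grid stands in for exact argmax

queries, values = [0.5], [f(0.5)]
envelope = values[0] + L * np.abs(grid - queries[0])
while True:
    best = max(values)
    if envelope.max() <= best + eps:                  # certificate: max f <= best + eps
        break
    x_next = grid[int(np.argmax(envelope))]           # query the peak of the envelope
    queries.append(x_next)
    values.append(f(x_next))
    envelope = np.minimum(envelope, values[-1] + L * np.abs(grid - x_next))

print(f"certified: max f lies in [{best:.4f}, {best + eps:.4f}] "
      f"after {len(queries)} evaluations")
```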

Generalized linear mixed models are powerful tools for analyzing clustered data, where the unknown parameters are classically (and most commonly) estimated by maximum likelihood and restricted maximum likelihood procedures. However, since likelihood-based procedures are known to be highly sensitive to outliers, M-estimators have become popular as a means to obtain robust estimates under possible data contamination. In this paper, we prove that, for sufficiently smooth general loss functions defining the M-estimators in generalized linear mixed models, the tail probability of the deviation between the estimated and the true regression coefficients has an exponential bound. This implies an exponential rate of consistency of these M-estimators under appropriate assumptions, generalizing the existing exponential consistency results from univariate to multivariate responses. We further illustrate this theoretical result for the special cases of the maximum likelihood estimator and the robust minimum density power divergence estimator, a popular model-based M-estimator, in the settings of linear and logistic mixed models, comparing the theoretical rate with the empirical rate of convergence through simulation studies.
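
For orientation, an exponential tail bound of this kind typically takes the following generic shape (the constants, norm, and conditions here are placeholders; the paper's statement is more specific):
\[
  \mathbb{P}\Bigl( \bigl\lVert \hat{\boldsymbol\beta}_n - \boldsymbol\beta_0 \bigr\rVert > t \Bigr)
  \;\le\; C_1 \exp\bigl( - C_2\, n\, t^2 \bigr), \qquad 0 < t \le t_0,
\]
so that choosing $t = t_n \to 0$ with $n t_n^2 \to \infty$ yields consistency of the M-estimator with a tail probability that decays exponentially in the sample size.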
