亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

It is well known that Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates (Birg\'e and Massart, 1993). The key message of this paper is that, under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance. More precisely, in the bias-variance decomposition of the squared error of the ERM, the variance term necessarily enjoys the minimax rate. In the case of fixed design, we provide an elementary proof of this fact using the probabilistic method. Then, we prove this result for various models in the random design setting. In addition, we provide a simple proof of Chatterjee's admissibility theorem (Chatterjee, 2014, Theorem 1.4), which states that ERM cannot be ruled out as an optimal method, in the fixed design setting, and extend this result to the random design setting. We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes. Finally, we show that for non-Donsker classes, there are functions close to the ERM, yet far from being almost-minimizers of the empirical loss, highlighting the somewhat irregular nature of the loss landscape.

相關內容

經驗風險(xian)最(zui)(zui)(zui)(zui)小(xiao)化(hua)(ERM)是統(tong)計學(xue)習理論中的(de)(de)一(yi)個原則,它定義了一(yi)系列學(xue)習算法,并用于給出(chu)其性能的(de)(de)理論界限。經驗風險(xian)最(zui)(zui)(zui)(zui)小(xiao)化(hua)的(de)(de)策略認(ren)為(wei),經驗風險(xian)最(zui)(zui)(zui)(zui)小(xiao)的(de)(de)模(mo)型是最(zui)(zui)(zui)(zui)優(you)(you)的(de)(de)模(mo)型。根據這一(yi)策略,按照經驗風險(xian)最(zui)(zui)(zui)(zui)小(xiao)化(hua)求最(zui)(zui)(zui)(zui)優(you)(you)模(mo)型就是求解最(zui)(zui)(zui)(zui)優(you)(you)化(hua)問題。

During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Lojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes studied in stochastic optimization.

Stochastic optimization has found wide applications in minimizing objective functions in machine learning, which motivates a lot of theoretical studies to understand its practical success. Most of existing studies focus on the convergence of optimization errors, while the generalization analysis of stochastic optimization is much lagging behind. This is especially the case for nonconvex and nonsmooth problems often encountered in practice. In this paper, we initialize a systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems. We introduce novel algorithmic stability measures and establish their quantitative connection on the gap between population gradients and empirical gradients, which is then further extended to study the gap between the Moreau envelope of the empirical risk and that of the population risk. To our knowledge, these quantitative connection between stability and generalization in terms of either gradients or Moreau envelopes have not been studied in the literature. We introduce a class of sampling-determined algorithms, for which we develop bounds for three stability measures. Finally, we apply these discussions to derive error bounds for stochastic gradient descent and its adaptive variant, where we show how to achieve an implicit regularization by tuning the step sizes and the number of iterations.

A new variant of Newton's method for empirical risk minimization is studied, where at each iteration of the optimization algorithm, the gradient and Hessian of the objective function are replaced by robust estimators taken from existing literature on robust mean estimation for multivariate data. After proving a general theorem about the convergence of successive iterates to a small ball around the population-level minimizer, consequences of the theory in generalized linear models are studied when data are generated from Huber's epsilon-contamination model and/or heavytailed distributions. An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed, which may be more appropriate for high-dimensional settings, and conjectures about the convergence of the resulting algorithm are offered. Compared to robust gradient descent, the proposed algorithm enjoys the faster rates of convergence for successive iterates often achieved by second-order algorithms for convex problems, i.e., quadratic convergence in a neighborhood of the optimum, with a stepsize that may be chosen adaptively via backtracking linesearch.

We consider the problem of sharing a set of indivisible goods among agents in a fair manner, namely such that the allocation is envy-free up to any good (EFX). We focus on the problem of computing an EFX allocation in the two-agent case and characterize the computational complexity of the problem for most well-known valuation classes. We present a simple greedy algorithm that solves the problem when the agent valuations are weakly well-layered, a class which contains gross substitutes and budget-additive valuations. For the next largest valuation class we prove a negative result: the problem is PLS-complete for submodular valuations. All of our results also hold for the setting where there are many agents with identical valuations.

The criticality problem in nuclear engineering asks for the principal eigen-pair of a Boltzmann operator describing neutron transport in a reactor core. Being able to reliably design, and control such reactors requires assessing these quantities within quantifiable accuracy tolerances. In this paper we propose a paradigm that deviates from the common practice of approximately solving the corresponding spectral problem with a fixed, presumably sufficiently fine discretization. Instead, the present approach is based on first contriving iterative schemes, formulated in function space, that are shown to converge at a quantitative rate without assuming any a priori excess regularity properties, and that exploit only properties of the optical parameters in the underlying radiative transfer model. We develop the analytical and numerical tools for approximately realizing each iteration step withing judiciously chosen accuracy tolerances, verified by a posteriori estimates, so as to still warrant quantifiable convergence to the exact eigen-pair. This is carried out in full first for a Newton scheme. Since this is only locally convergent we analyze in addition the convergence of a power iteration in function space to produce sufficiently accurate initial guesses. Here we have to deal with intrinsic difficulties posed by compact but unsymmetric operators preventing standard arguments used in the finite dimensional case. Our main point is that we can avoid any condition on an initial guess to be already in a small neighborhood of the exact solution. We close with a discussion of remaining intrinsic obstructions to a certifiable numerical implementation, mainly related to not knowing the gap between the principal eigenvalue and the next smaller one in modulus.

In recent decades, a number of ways of dealing with causality in practice, such as propensity score matching, the PC algorithm and invariant causal prediction, have been introduced. Besides its interpretational appeal, the causal model provides the best out-of-sample prediction guarantees. In this paper, we study the identification of causal-like models from in-sample data that provide out-of-sample risk guarantees when predicting a target variable from a set of covariates. Whereas ordinary least squares provides the best in-sample risk with limited out-of-sample guarantees, causal models have the best out-of-sample guarantees but achieve an inferior in-sample risk. By defining a trade-off of these properties, we introduce $\textit{causal regularization}$. As the regularization is increased, it provides estimators whose risk is more stable across sub-samples at the cost of increasing their overall in-sample risk. The increased risk stability is shown to lead to out-of-sample risk guarantees. We provide finite sample risk bounds for all models and prove the adequacy of cross-validation for attaining these bounds.

Hypothesis transfer learning (HTL) contrasts domain adaptation by allowing for a previous task leverage, named the source, into a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from such source data, relieving the hurdle of expansive data storage and providing great practical benefits. Hence, HTL is highly beneficial for real-world applications relying on big data. The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for machine learning algorithms analysis. In particular, we are interested in the statistical behaviour of the regularized empirical risk minimizers in the case of binary classification. Our stability analysis provides learning guarantees under mild assumptions. Consequently, we derive several complexity-free generalization bounds for essential statistical quantities like the training error, the excess risk and cross-validation estimates. These refined bounds allow understanding the benefits of transfer learning and comparing the behaviour of standard losses in different scenarios, leading to valuable insights for practitioners.

We consider a persuasion problem between a sender and a receiver whose utility may be nonlinear in her belief; we call such receivers risk-conscious. Such utility models arise when the receiver exhibits systematic biases away from expected-utility-maximization, such as uncertainty aversion (e.g., from sensitivity to the variance of the waiting time for a service). Due to this nonlinearity, the standard approach to finding the optimal persuasion mechanism using revelation principle fails. To overcome this difficulty, we use the underlying geometry of the problem to develop a convex optimization framework to find the optimal persuasion mechanism. We define the notion of full persuasion and use our framework to characterize conditions under which full persuasion can be achieved. We use our approach to study binary persuasion, where the receiver has two actions and the sender strictly prefers one of them at every state. Under a convexity assumption, we show that the binary persuasion problem reduces to a linear program, and establish a canonical set of signals where each signal either reveals the state or induces in the receiver uncertainty between two states. Finally, we discuss the broader applicability of our methods to more general contexts, and illustrate our methodology by studying information sharing of waiting times in service systems.

Mixtures of factor analysers (MFA) models represent a popular tool for finding structure in data, particularly high-dimensional data. While in most applications the number of clusters, and especially the number of latent factors within clusters, is mostly fixed in advance, in the recent literature models with automatic inference on both the number of clusters and latent factors have been introduced. The automatic inference is usually done by assigning a nonparametric prior and allowing the number of clusters and factors to potentially go to infinity. The MCMC estimation is performed via an adaptive algorithm, in which the parameters associated with the redundant factors are discarded as the chain moves. While this approach has clear advantages, it also bears some significant drawbacks. Running a separate factor-analytical model for each cluster involves matrices of changing dimensions, which can make the model and programming somewhat cumbersome. In addition, discarding the parameters associated with the redundant factors could lead to a bias in estimating cluster covariance matrices. At last, identification remains problematic for infinite factor models. The current work contributes to the MFA literature by providing for the automatic inference on the number of clusters and the number of cluster-specific factors while keeping both cluster and factor dimensions finite. This allows us to avoid many of the aforementioned drawbacks of the infinite models. For the automatic inference on the cluster structure, we employ the dynamic mixture of finite mixtures (MFM) model. Automatic inference on cluster-specific factors is performed by assigning an exchangeable shrinkage process (ESP) prior to the columns of the factor loading matrices. The performance of the model is demonstrated on several benchmark data sets as well as real data applications.

This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

北京阿比特科技有限公司