
We study the implicit bias of gradient flow on linear equivariant steerable networks in group-invariant binary classification. Our findings reveal that the parameterized predictor converges in direction to the unique group-invariant classifier with a maximum margin defined by the input group action. Under a unitary assumption on the input representation, we establish the equivalence between steerable networks and data augmentation. Furthermore, we demonstrate the improved margin and generalization bound of steerable networks over their non-invariant counterparts.
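
To make the data-augmentation equivalence concrete, here is a minimal sketch (our own toy, not the paper's construction): for a cyclic shift group acting on inputs by coordinate permutations, which is a unitary representation, gradient descent on the group-augmented data produces a classifier whose weight vector is fixed by the group action. All sizes and step counts below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 40
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Augment each sample with its full orbit under the cyclic shift group C_d.
X_aug = np.concatenate([np.roll(X, k, axis=1) for k in range(d)])
y_aug = np.tile(y, d)

w = np.zeros(d)
for _ in range(5000):
    margins = np.clip(y_aug * (X_aug @ w), -50, 50)
    # gradient of the average logistic loss log(1 + exp(-margin))
    grad = -(y_aug / (1.0 + np.exp(margins))) @ X_aug / len(y_aug)
    w -= 0.5 * grad

# Invariance check: the learned weights are (numerically) constant across
# coordinates, i.e., fixed by every cyclic shift.
print(np.round(w, 4), np.std(w))
```

Because the augmented objective is invariant under the (orthogonal) shift action and the iterates start at an invariant point, every gradient step stays in the invariant subspace, mirroring the claimed equivalence with a steerable architecture.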

Related Content


Stochastic bilevel optimization usually involves minimizing an upper-level (UL) function that depends on the arg-min of a strongly convex lower-level (LL) function. Several algorithms utilize Neumann series to approximate certain matrix inverses involved in estimating the implicit gradient of the UL function (hypergradient). The state-of-the-art StOchastic Bilevel Algorithm (SOBA) [16] instead uses stochastic gradient descent steps to solve the linear system associated with the explicit matrix inversion. This modification enables SOBA to match the lower bound of sample complexity for its single-level counterpart in non-convex settings. Unfortunately, the current analysis of SOBA relies on the assumption of higher-order smoothness for the UL and LL functions to achieve optimality. In this paper, we introduce a novel fully single-loop and Hessian-inversion-free algorithmic framework for stochastic bilevel optimization and present a tighter analysis under standard smoothness assumptions (first-order Lipschitzness of the UL function and second-order Lipschitzness of the LL function). Furthermore, we show that, with a slight modification, our algorithm can handle a more general multi-objective robust bilevel optimization problem. For this case, we obtain state-of-the-art oracle complexity results, demonstrating the generality of both the proposed algorithmic and analytic frameworks. Numerical experiments demonstrate the performance gain of the proposed algorithms over existing ones.
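
As a hedged illustration of the "gradient steps on the linear system instead of a matrix inverse" idea, here is a deterministic toy in our own notation (quadratic UL and LL objectives; not the SOBA implementation from [16]). With f(x,y) = ½‖y − b‖² and g(x,y) = ½‖y − Ax‖², the overall objective is F(x) = ½‖Ax − b‖², and all three variables are updated in a single loop.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)

x = np.zeros(3)
y = np.zeros(5)   # lower-level variable, tracks y*(x) = A x
v = np.zeros(5)   # tracks [grad^2_yy g]^{-1} grad_y f = y*(x) - b
alpha, beta = 0.05, 0.5
for _ in range(2000):
    y = y - beta * (y - A @ x)       # gradient step on the LL problem
    v = v - beta * (v - (y - b))     # gradient step on the linear system (Hessian = I)
    hypergrad = A.T @ v              # -grad^2_xy g^T v, since grad_x f = 0 here
    x = x - alpha * hypergrad

print(np.allclose(A.T @ (A @ x - b), 0, atol=1e-3))  # x approx. minimizes F
```

The point of the sketch is structural: no Hessian is ever inverted, and no inner loop is run to completion; y, v, and x are advanced together, as in the single-loop framework the abstract describes.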

A sequence of random variables is called exchangeable if its joint distribution is invariant under permutations. The original formulation of de Finetti's theorem says that any exchangeable sequence of $\{0,1\}$-valued random variables can be thought of as a mixture of independent and identically distributed sequences in a certain precise mathematical sense. Interpreting this statement from a convex analytic perspective, Hewitt and Savage obtained the same conclusion for more general state spaces under some topological conditions. The main contribution of this paper is in providing a new framework that explains the theorem purely as a consequence of the underlying distribution of the random variables, with no topological conditions (beyond Hausdorffness) on the state space being necessary if the distribution is Radon. We also show that it is consistent with the axioms of ZFC that de Finetti's theorem holds for all sequences of exchangeable random variables taking values in any complete metric space. The framework we use is based on nonstandard analysis. We have provided a self-contained introduction to nonstandard analysis as an appendix, thus rendering measure-theoretic probability and point-set topology the only prerequisites for this paper. Our introduction aims to develop some new ideologies that might be of interest to mathematicians, philosophers, and mathematics educators alike. Our technical tools come from nonstandard topological measure theory, a highlight of which is a new generalization of Prokhorov's theorem. Modulo such technical tools, our proof relies on properties of the empirical measures induced by hyperfinitely many identically distributed random variables -- a feature that allows us to establish de Finetti's theorem in the generality that we seek while still retaining the combinatorial intuition of proofs of simpler versions of de Finetti's theorem.
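
For reference, the classical $\{0,1\}$-valued statement that the paper generalizes can be written as follows (this is the standard formulation, not the paper's nonstandard-analytic version):

```latex
% Classical de Finetti theorem: an exchangeable {0,1}-valued sequence is a
% mixture of i.i.d. Bernoulli sequences over a mixing measure mu on [0,1].
\begin{theorem}[de Finetti]
Let $(X_n)_{n \ge 1}$ be an exchangeable sequence of $\{0,1\}$-valued random
variables. Then there is a unique probability measure $\mu$ on $[0,1]$ such
that for all $n$ and all $e_1, \dots, e_n \in \{0,1\}$,
\[
  \mathbb{P}(X_1 = e_1, \dots, X_n = e_n)
  = \int_0^1 p^{\sum_{i=1}^n e_i} (1-p)^{\,n - \sum_{i=1}^n e_i} \, d\mu(p).
\]
\end{theorem}
```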

We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
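
A minimal simulation sketch of the claimed bias (our own toy setup with balanced labels and overparameterized, hence separable, data; not the paper's proof): training a linear model through batch normalization with gradient descent tends to equalize the sample margins rather than maximize the minimum margin.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 20                                # overparameterized, hence separable
X = rng.normal(size=(n, d))
y = np.array([1.0] * 5 + [-1.0] * 5)         # balanced labels (BN output has zero mean)

w = rng.normal(size=d)
gamma, lr = 1.0, 0.1
for _ in range(50000):
    z = X @ w
    zhat = (z - z.mean()) / z.std()          # batch normalization, no shift term
    s = -y / (1.0 + np.exp(np.clip(y * gamma * zhat, -50, 50)))  # dloss/df, f = gamma*zhat
    # backward pass through the normalization: (s - mean(s) - zhat * mean(s*zhat)) / std
    dz = gamma * (s - s.mean() - zhat * (s * zhat).mean()) / z.std()
    w -= lr * (X.T @ dz) / n
    gamma -= lr * (s @ zhat) / n

z = X @ w
print(np.round(np.sort(y * gamma * (z - z.mean()) / z.std()), 3))  # margins nearly equal
```

The printed sorted margins cluster around a common value; per the abstract's rate, exact equalization is approached only very slowly in $t$.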

We present an implicit-explicit finite volume scheme for two-fluid single-temperature flow in all Mach number regimes, based on a symmetric hyperbolic thermodynamically compatible description of the fluid flow. The scheme is stable for large time steps controlled by the interface transport and is computationally efficient owing to its linearly implicit character. The latter is achieved by linearizing around constant reference states given by the asymptotic analysis of the single-temperature model. The use of a stiffly accurate IMEX Runge-Kutta time integration together with a centered treatment of pressure-based quantities provably guarantees the asymptotic preserving property of the scheme for the weakly compressible Euler equations with variable volume fraction. The properties of the first- and second-order schemes are validated on several numerical test cases.
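
The scheme itself is far more involved, but the basic mechanism, treating the stiff part implicitly so that each step needs only a linear solve while the time step is not restricted by the stiff scale, can be sketched with a first-order IMEX Euler step on a toy stiff relaxation ODE (illustrative values only, not the paper's system):

```python
import numpy as np

lam = 1e4                                    # stiff relaxation rate (toy value)
f_exp = lambda t, y: np.cos(t)               # nonstiff forcing, treated explicitly
# stiff part f_imp(y) = -lam * (y - y_eq), treated implicitly

y, y_eq, t, dt = 2.0, 1.0, 0.0, 0.01         # dt * lam = 100, far beyond explicit stability
for _ in range(200):
    # (y_new - y)/dt = f_exp(t, y) - lam*(y_new - y_eq)  =>  a linear solve in y_new
    y = (y + dt * (f_exp(t, y) + lam * y_eq)) / (1.0 + dt * lam)
    t += dt

print(y)  # relaxes toward the slow manifold near y_eq despite the large time step
```

Because the implicit part is linear (here by construction; in the paper, by linearization around the asymptotic reference states), no nonlinear solver is needed, which is the source of the claimed efficiency.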

When machine learning is outsourced to a rational agent, conflicts of interest might arise and severely impact predictive performance. In this work, we propose a theoretical framework for incentive-aware delegation of machine learning tasks. We model delegation as a principal-agent game, in which accurate learning can be incentivized by the principal using performance-based contracts. Adapting the economic theory of contract design to this setting, we define budget-optimal contracts and prove they take a simple threshold form under reasonable assumptions. In the binary-action case, the optimality of such contracts is shown to be equivalent to the classic Neyman-Pearson lemma, establishing a formal connection between contract design and statistical hypothesis testing. Empirically, we demonstrate that budget-optimal contracts can be constructed using small-scale data, leveraging recent advances in the study of learning curves and scaling laws. Performance and economic outcomes are evaluated using synthetic and real-world classification tasks.
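
A toy rendering of a threshold contract (our own simplified model with hypothetical Gaussian accuracy distributions and effort costs, using SciPy; not the paper's construction): the principal pays a bonus only when measured accuracy clears a threshold, and searches for the cheapest such contract that makes high effort the agent's best response.

```python
import numpy as np
from scipy.stats import norm

# effort level -> (mean accuracy, std of measured accuracy, cost to agent)
efforts = {"low": (0.80, 0.02, 0.0), "high": (0.90, 0.02, 1.0)}

def best_response(threshold, bonus):
    # agent picks the effort maximizing expected payment minus effort cost
    util = {e: bonus * (1 - norm.cdf(threshold, m, s)) - c
            for e, (m, s, c) in efforts.items()}
    return max(util, key=util.get)

# cheapest (bonus, threshold) pair that incentivizes high effort
best = min(((b, t) for b in np.linspace(0.5, 5, 46)
                   for t in np.linspace(0.80, 0.95, 31)
            if best_response(t, b) == "high"),
           key=lambda p: p[0], default=None)
print(best)  # minimal bonus and a threshold making "high" the best response
```

With the threshold placed between the two accuracy distributions, a bonus barely above the effort-cost gap suffices, which is the intuition behind the threshold form of budget-optimal contracts.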

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start RL improves \textit{quickly} in some cases but becomes \textit{stagnant} in others, especially when function approximation is used. The primary objective of this work is therefore to build a fundamental understanding of ``\textit{whether and when online learning can be significantly accelerated by a warm-start policy from offline RL}''. Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation and study the impact of the approximation errors on the finite-time learning performance with inaccurate Actor/Critic updates. Under some general technical conditions, we derive upper bounds that shed light on achieving the desired finite-time learning performance in the Warm-Start A-C algorithm. In particular, our findings reveal that it is essential to reduce the algorithm bias in online learning. We also obtain lower bounds on the sub-optimality gap of the Warm-Start A-C algorithm to quantify the impact of bias and error propagation.
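
A minimal sketch of warm starting in an actor-critic loop (a two-armed bandit toy of our own, not the paper's setting): the actor is initialized from a deliberately biased offline "prior" rather than from scratch, and the online A-C updates must work against that bias, illustrating why the warm-start bias matters for finite-time performance.

```python
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.2, 0.8])            # arm 1 is better

theta = np.array([1.0, -1.0])                # warm start: biased offline prior prefers arm 0
q = np.full(2, 0.5)                          # critic: per-arm value estimates
lr_actor, lr_critic = 0.1, 0.2
for _ in range(5000):
    probs = np.exp(theta - theta.max()); probs /= probs.sum()   # softmax policy
    a = rng.choice(2, p=probs)
    r = true_means[a] + 0.1 * rng.normal()
    q[a] += lr_critic * (r - q[a])                    # critic update (TD(0) on a bandit)
    grad_log = -probs; grad_log[a] += 1.0             # grad of log pi(a)
    theta += lr_actor * grad_log * (q[a] - probs @ q) # advantage-weighted actor step

print(probs)  # the policy shifts toward the better arm despite the biased warm start
```

With a more extreme prior the early iterates barely explore the better arm, a toy analogue of the stagnation the abstract describes.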

In this paper, we present a novel characterization of the smoothness of a model based on basic principles of Large Deviation Theory. In contrast to prior work, where the smoothness of a model is normally characterized by a single real value (e.g., the weights' norm), we show that smoothness can be described by a simple real-valued function. Based on this concept of smoothness, we propose a unifying theoretical explanation of why some interpolators generalize remarkably well and why a wide range of modern learning techniques (i.e., stochastic gradient descent, $\ell_2$-norm regularization, data augmentation, invariant architectures, and overparameterization) are able to find them. The emergent conclusion is that all these methods provide complementary procedures that bias the optimizer toward smoother interpolators, which, according to this theoretical analysis, are the ones with better generalization error.

The elliptical distribution is a basic assumption underlying many multivariate statistical methods. For example, in sufficient dimension reduction and statistical graphical models, this assumption is routinely imposed to simplify the data dependence structure. Before applying such methods, we need to decide whether the data are elliptically distributed. Existing tests either focus exclusively on spherical distributions, rely on the bootstrap to determine the null distribution, or require specific forms of the alternative distribution. In this paper, we introduce a general nonparametric test for elliptical distributions based on kernel embedding of the probability measure, built on the two properties that characterize an elliptical distribution: after centering and rescaling, (1) the direction and length of the random vector are independent, and (2) the directional vector is uniformly distributed on the unit sphere. We derive the null asymptotic distribution of the test statistic via a von Mises expansion, develop a sample-level procedure to determine the rejection region, and establish the consistency and validity of the proposed test. We apply our test to the SENIC dataset, with and without a transformation aimed at achieving ellipticity.
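
A sketch of the two characterizing properties (our own simplified check, not the paper's test statistic): after whitening, the length R = ||Z|| and direction U = Z/||Z|| of an elliptical vector are independent. Here a biased HSIC estimate with Gaussian kernels serves as a crude dependence measure between R and U.

```python
import numpy as np

def hsic(a, b, sigma=1.0):
    # biased HSIC estimator with Gaussian kernels; a, b are (n, d) arrays
    def gram(x):
        sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n ** 2

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0, 0], [[2.0, 0.5], [0.5, 1.0]], size=500)  # elliptical

L = np.linalg.cholesky(np.cov(X.T))
Z = (X - X.mean(0)) @ np.linalg.inv(L.T)      # center and whiten
R = np.linalg.norm(Z, axis=1, keepdims=True)
U = Z / R
print(hsic(R, U))   # near zero: length and direction are nearly independent
```

The paper combines both properties into a single kernel-embedding statistic with a derived null distribution; this sketch only makes the independence half tangible.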

We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for expectations and covariance matrices of the iterates are derived. In contrast with the widely cited connection between dropout and $\ell_2$-regularization in expectation, the results indicate a much more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. We also study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.
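
A simulation sketch of the setup (our own toy, not the paper's analysis): gradient descent for linear regression where coordinates of the iterate are dropped out independently at each step. Averaging over runs and comparing with the minimizer of the marginalized dropout objective, which solves a Tikhonov-type system, shows the two are close but not identical, in line with the subtler relationship the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, p, lr, T = 50, 5, 0.8, 0.01, 3000
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def dropout_gd(seed):
    r = np.random.default_rng(seed)
    w = np.zeros(d)
    for _ in range(T):
        D = r.binomial(1, p, size=d) / p      # fresh inverted-dropout mask per step
        res = X @ (D * w) - y
        w -= lr * D * (X.T @ res) / n         # the mask enters forward and backward passes
    return w

w_avg = np.mean([dropout_gd(s) for s in range(200)], axis=0)

# minimizer of the marginalized objective E_D ||X (D w) - y||^2 / (2n):
G = X.T @ X / n
w_marg = np.linalg.solve(G + (1 - p) / p * np.diag(np.diag(G)), X.T @ y / n)
print(np.round(w_avg, 3), np.round(w_marg, 3))  # similar, but not identical
```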

A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient-based (or optimization-based) meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner-loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner-level optimization and not on the path taken by the inner-loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner-loop optimizer. As a result, our approach is agnostic to the choice of inner-loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that required to compute a single inner-loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
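
A minimal sketch of the implicit meta-gradient (quadratic toy losses in our own notation; a sketch of the idea, not the authors' implementation): with a proximally regularized inner problem, the meta-gradient takes the form (I + H/lam)^{-1} grad L_out(phi*), and conjugate gradient computes it from Hessian-vector products alone, with no unrolling of the inner loop.

```python
import numpy as np

rng = np.random.default_rng(6)
d, lam = 4, 1.0
H = rng.normal(size=(d, d)); H = H @ H.T + np.eye(d)    # inner-loss Hessian (SPD)
g_in = rng.normal(size=d)        # L_in(phi) = 0.5 phi^T H phi + g_in^T phi

theta = rng.normal(size=d)
# inner solution: argmin_phi L_in(phi) + (lam/2) ||phi - theta||^2
phi = np.linalg.solve(H + lam * np.eye(d), lam * theta - g_in)

grad_out = phi - 1.0             # grad of the outer loss L_out(phi) = 0.5 ||phi - 1||^2

def cg(matvec, b, iters=50):
    # conjugate gradient for matvec(x) = b, using only matrix-vector products
    x, r = np.zeros_like(b), b.copy()
    p, rs = r.copy(), r @ r
    for _ in range(iters):
        Ap = matvec(p)
        a = rs / (p @ Ap)
        x, r = x + a * p, r - a * Ap
        rs_new = r @ r
        p, rs = r + (rs_new / rs) * p, rs_new
    return x

meta_grad = cg(lambda v: v + (H @ v) / lam, grad_out)   # (I + H/lam)^{-1} grad_out
print(np.round(meta_grad, 4))
```

Only Hessian-vector products of the inner loss are needed, which is why the memory footprint matches that of a single inner-loop gradient up to constants.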
