
This paper studies the approximation of piecewise constant functions with unknown interfaces on bounded regions of $\mathbb{R}^d$ by ReLU neural networks (NNs). Under the assumption that the discontinuity interface $\Gamma$ can be approximated by a connected series of hyperplanes with a prescribed accuracy $\varepsilon > 0$, we show that a three-layer ReLU NN is sufficient to approximate any piecewise constant function accurately, and we establish its error bound. Moreover, if the discontinuity interface is convex, an analytical formula for the ReLU NN approximation with exact weights and biases is provided.
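As an illustration of the kind of explicit construction involved when $\Gamma$ is convex (a sketch only; the paper's exact weights and biases may differ), consider a convex region $\Omega=\{x\in\mathbb{R}^d:\ a_i^{\top}x\le b_i,\ i=1,\dots,m\}$ and the piecewise constant function $f=c_1\mathbf{1}_{\Omega}+c_2\mathbf{1}_{\Omega^{c}}$. With $\sigma(t)=\max(t,0)$ and a transition width $\varepsilon>0$, a three-layer ReLU NN (two hidden ReLU layers followed by a linear output) can realize
$$
s(x)=\sum_{i=1}^{m}\sigma\!\big(a_i^{\top}x-b_i\big),\qquad
\hat f(x)=c_1+(c_2-c_1)\Big[\sigma\!\big(\tfrac{1}{\varepsilon}\,s(x)\big)-\sigma\!\big(\tfrac{1}{\varepsilon}\,s(x)-1\big)\Big],
$$
so that $\hat f\equiv c_1$ on $\Omega$, $\hat f\equiv c_2$ wherever $s(x)\ge\varepsilon$, and the approximation error is confined to the thin transition band $\{x:\,0<s(x)<\varepsilon\}$ adjacent to $\Gamma$.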

Related Content

Generative diffusion models apply the concept of Langevin dynamics from physics to machine learning, attracting considerable interest from engineering, statistics, and physics, yet a complete picture of their inherent mechanisms is still lacking. In this paper, we provide a transparent physics analysis of diffusion models, formulating the fluctuation theorem, entropy production, equilibrium measure, and Franz-Parisi potential to understand the dynamic process and intrinsic phase transitions. Our analysis is rooted in a path-integral representation of both the forward and backward dynamics, and in treating the reverse diffusion generative process as statistical inference, where the time-dependent state variables serve as quenched disorder akin to that in spin glass theory. Our study thus links stochastic thermodynamics, statistical inference, and geometry-based analysis together to yield a coherent picture of how generative diffusion models work.
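For orientation (standard score-based formulation, not the paper's specific parametrization), the forward and backward dynamics underlying such analyses are the SDE pair
$$
\mathrm{d}x_t=f(x_t,t)\,\mathrm{d}t+g(t)\,\mathrm{d}W_t \quad\text{(forward: data}\to\text{noise)},\qquad
\mathrm{d}x_t=\big[f(x_t,t)-g(t)^{2}\,\nabla_{x}\log p_t(x_t)\big]\,\mathrm{d}t+g(t)\,\mathrm{d}\bar W_t \quad\text{(reverse: noise}\to\text{data)},
$$
where $p_t$ is the marginal law of the forward process; fluctuation-theorem and entropy-production statements are then formulated on the probability weights that the path-integral representation assigns to individual trajectories of these two processes.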

The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization. Linear models with LASSO-type regularization achieve this by setting the weights of irrelevant features to zero, effectively identifying and ignoring them. In artificial neural networks, this selective focus can be achieved by pruning the input layer. Given a cost function augmented with a sparsity-promoting penalty, our proposal selects a regularization strength $\lambda$ (without the use of cross-validation or a validation set) that creates a local minimum in the cost function at the origin, where no features are selected. This local minimum acts as a baseline: if there is no sufficiently strong signal to justify the inclusion of a feature, the local minimum remains at zero with a high prescribed probability. The method is flexible, applying to complex models ranging from shallow to deep artificial neural networks and supporting various cost functions and sparsity-promoting penalties. We empirically show a remarkable phase transition in the probability of retrieving the relevant features, as well as good generalization thanks to the choice of $\lambda$, the non-convex penalty, and the optimization scheme developed. This approach can be seen as a form of compressed sensing for complex models, allowing us to distill high-dimensional data into a compact, interpretable subset of meaningful features.
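A minimal sketch of this kind of penalized objective, assuming a shallow ReLU network in PyTorch with a column-wise (per-feature) penalty on the input layer; the class and function names are illustrative, and the plain group-LASSO surrogate below stands in for the non-convex penalty and the $\lambda$-selection rule described above.

```python
import torch
import torch.nn as nn

# Illustrative sketch: a shallow network whose input-layer weight columns
# are penalized as groups, so that features whose column is driven to
# zero are effectively pruned from the input layer.

class SparseInputNet(nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)   # penalized input layer
        self.fc2 = nn.Linear(d_hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))

def penalized_loss(model: SparseInputNet, x: torch.Tensor,
                   y: torch.Tensor, lam: float) -> torch.Tensor:
    """Cost = data fit + lam * group penalty on the input-layer columns."""
    fit = nn.functional.mse_loss(model(x), y)
    # L2 norm of the weight column attached to each input feature j;
    # the paper uses a non-convex penalty of these norms rather than
    # the plain sum used here.
    group_norms = model.fc1.weight.norm(dim=0)   # shape: (d_in,)
    return fit + lam * group_norms.sum()
```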

This paper proposes a novel canonical correlation analysis for semiparametric inference in $I(1)/I(0)$ systems via functional approximation. The approach can be applied coherently to panels of $p$ variables with a generic number $s$ of stochastic trends, as well as to subsets or aggregations of variables. This study discusses inferential tools on $s$ and on the loading matrix $\psi$ of the stochastic trends (and on their duals $r$ and $\beta$, the cointegration rank and the cointegrating matrix): asymptotically pivotal test sequences and consistent estimators of $s$ and $r$, $T$-consistent, mixed Gaussian and efficient estimators of $\psi$ and $\beta$, Wald tests thereof, and misspecification tests for checking model assumptions. Monte Carlo simulations show that these tools have reliable performance uniformly in $s$ for small, medium and large-dimensional systems, with $p$ ranging from 10 to 300. An empirical analysis of 20 exchange rates illustrates the methods.
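For orientation (the standard $I(1)$ setup, not this paper's specific estimator), a $p$-dimensional $I(1)$ system with $s$ stochastic trends admits a common-trends representation
$$
X_t=\psi\,\tau_t+u_t,\qquad \tau_t=\tau_{t-1}+\eta_t,
$$
where $\tau_t$ collects the $s$ common stochastic trends, $\psi$ is the $p\times s$ loading matrix, and $u_t$ is stationary; the cointegration rank is $r=p-s$, and the $p\times r$ cointegrating matrix $\beta$ satisfies $\beta^{\top}\psi=0$, so that $\beta^{\top}X_t$ is stationary.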

Modern large language model (LLM) alignment techniques rely on human feedback, but it is unclear whether these techniques fundamentally limit the capabilities of aligned LLMs. In particular, it is unknown whether it is possible to align (stronger) LLMs that have superhuman capabilities using (weaker) human feedback without degrading their capabilities. This is an instance of the weak-to-strong generalization problem: using feedback from a weaker (less capable) model to train a stronger (more capable) model. We prove that weak-to-strong generalization is possible by eliciting latent knowledge from pre-trained LLMs. In particular, we cast the weak-to-strong generalization problem as a transfer learning problem in which we wish to transfer a latent concept prior from a weak model to a strong pre-trained model. We prove that a naive fine-tuning approach suffers from fundamental limitations, but an alternative refinement-based approach suggested by the problem structure provably overcomes the limitations of fine-tuning. Finally, we demonstrate the practical applicability of the refinement approach in multiple LLM alignment tasks.

Adaptive cubic regularization methods for solving nonconvex problems need the efficient computation of the trial step, involving the minimization of a cubic model. We propose a new approach in which this model is minimized in a low-dimensional subspace that, in contrast to classic approaches, is reused for a number of iterations. Whenever the trial step produced by the low-dimensional minimization process is unsatisfactory, we employ a regularized Newton step whose regularization parameter is a by-product of the model minimization over the low-dimensional subspace. We show that the worst-case complexity of classic cubic regularized methods is preserved, despite the possible regularized Newton steps. We focus on the large class of problems for which (sparse) direct linear system solvers are available and provide several experimental results showing the very large gains of our new approach when compared to standard implementations of adaptive cubic regularization methods based on direct linear solvers. Our first choice as projection space for the low-dimensional model minimization is the polynomial Krylov subspace; nonetheless, we also explore the use of rational Krylov subspaces in cases where the polynomial ones lead to less competitive numerical results.
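In generic terms (a sketch only; the paper's exact acceptance and regularization rules are not reproduced here), the trial step at iterate $x_k$ minimizes the cubic model over a low-dimensional subspace spanned by an orthonormal basis $V_k\in\mathbb{R}^{n\times q}$ with $q\ll n$:
$$
m_k(s)=f(x_k)+g_k^{\top}s+\tfrac12\,s^{\top}H_k s+\tfrac{\sigma_k}{3}\|s\|^{3},\qquad
s_k=V_k y_k,\quad
y_k\in\arg\min_{y\in\mathbb{R}^{q}}\; (V_k^{\top}g_k)^{\top}y+\tfrac12\,y^{\top}\big(V_k^{\top}H_k V_k\big)y+\tfrac{\sigma_k}{3}\|y\|^{3},
$$
using $\|V_k y\|=\|y\|$ for orthonormal $V_k$; when such a step is rejected, a regularized Newton step of the form $(H_k+\lambda_k I)\,s=-g_k$ is taken, with $\lambda_k$ obtained as a by-product of the subspace minimization.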

We present a novel formulation for the dynamics of geometrically exact Timoshenko beams and beam structures made of viscoelastic material and featuring complex, arbitrarily curved initial geometries. An $\textrm{SO}(3)$-consistent and second-order accurate time integration scheme for accelerations, velocities, and rate-dependent viscoelastic strain measures is adopted. To achieve high efficiency and geometric flexibility, the spatial discretization is carried out with the isogeometric collocation (IGA-C) method, which permits bypassing element integration while keeping all the advantages of isogeometric analysis (IGA) in terms of high-order spatial accuracy and geometry representation. Moreover, a primal formulation guarantees a minimal set of kinematic unknowns. The generalized Maxwell model is applied directly to the one-dimensional beam strain and stress measures. This makes it possible to express the internal variables in terms of the same kinematic unknowns as in the case of linear elastic, rate-independent materials, bypassing the complexities introduced by the viscoelastic material. As a result, existing $\textrm{SO}(3)$-consistent linearizations of the governing equations in strong form (and the associated updating formulas) can be used straightforwardly. Through a series of numerical tests, the attributes and potential of the proposed formulation are demonstrated. In particular, we show the capability to accurately simulate beams and beam systems featuring complex initial geometry and topology, opening interesting perspectives for the inverse design of programmable mechanical metamaterials and objects.
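In its standard one-dimensional form (given here only for orientation; the paper applies it to the beam's generalized strain and stress measures), the generalized Maxwell model splits the stress into an equilibrium branch and $M$ viscous branches with internal variables $q_k$:
$$
\sigma(t)=E_{\infty}\,\varepsilon(t)+\sum_{k=1}^{M}q_k(t),\qquad
\dot q_k(t)+\frac{q_k(t)}{\tau_k}=E_k\,\dot\varepsilon(t),
$$
so each internal variable obeys a linear evolution equation driven by the strain rate, which is what allows the internal variables to be expressed through the same kinematic unknowns as in the rate-independent case.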

The augmented Lagrange method is employed to address the optimal control problem involving pointwise state constraints in parabolic equations. The strong convergence of the primal variables and the weak convergence of the dual variables are rigorously established. The sub-problems arising in the algorithm are solved using the Method of Successive Approximations (MSA), derived from Pontryagin's principle. Numerical experiments are provided to validate the convergence of the proposed algorithm.
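For a pointwise state constraint written as $g(y)\le 0$, a standard augmented Lagrangian iteration of the kind referred to here (a generic sketch, not the paper's precise scheme) augments the objective $J(y,u)$ and updates the multiplier $\mu\ge 0$ as
$$
L_{\rho}(y,u,\mu)=J(y,u)+\frac{1}{2\rho}\Big(\big\|\max\{0,\ \mu+\rho\,g(y)\}\big\|_{L^{2}}^{2}-\|\mu\|_{L^{2}}^{2}\Big),\qquad
\mu^{k+1}=\max\{0,\ \mu^{k}+\rho\,g(y^{k+1})\},
$$
where each sub-problem minimizes $L_{\rho}$ over the control with the multiplier held fixed and is the problem solved by the MSA iteration derived from Pontryagin's principle.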

We present a novel framework for estimation and inference for the broad class of universal approximators. Estimation is based on the decomposition of model predictions into Shapley values. Inference relies on analyzing the bias and variance properties of the individual Shapley components. We show that Shapley value estimation is asymptotically unbiased, and we introduce Shapley regressions as a tool to uncover the true data-generating process from noisy data alone. The well-known case of linear regression is recovered as a special case of our framework when the model is linear in its parameters. We present theoretical, numerical, and empirical results for the estimation of heterogeneous treatment effects as our guiding example.
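Concretely, the decomposition writes each prediction as a sum of Shapley components: for a model $\hat f$ with feature set $N=\{1,\dots,d\}$ and a value function $v_x(S)$ (e.g. the expected prediction when only the features in $S$ are known),
$$
\hat f(x)=\phi_0+\sum_{i=1}^{d}\phi_i(x),\qquad
\phi_i(x)=\sum_{S\subseteq N\setminus\{i\}}\frac{|S|!\,(d-|S|-1)!}{d!}\,\big(v_x(S\cup\{i\})-v_x(S)\big),
$$
with $\phi_0=v_x(\varnothing)$ the baseline prediction; inference is then carried out on the estimated components $\hat\phi_i$.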

We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
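For reference, one classical bound of this type, based on the information in the algorithm output $W$ rather than in its predictions, states that for a training set $S$ of $n$ i.i.d. examples and a $\sigma$-sub-Gaussian loss,
$$
\Big|\mathbb{E}\big[L_{\mathcal{D}}(W)-L_{S}(W)\big]\Big|\;\le\;\sqrt{\frac{2\sigma^{2}}{n}\,I(W;S)},
$$
which can be vacuous for deterministic algorithms with continuous outputs, where $I(W;S)$ may be infinite; prediction-based bounds replace $I(W;S)$ with quantities that remain meaningful in this regime and are easier to estimate.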

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions, including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods, and distributed methods, together with theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
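As a point of reference for the methods surveyed, plain SGD and a generic adaptive-gradient update (as in Adam, omitting bias corrections) take the form
$$
\theta_{t+1}=\theta_{t}-\eta_{t}\,\nabla f(\theta_{t};\xi_{t}),\qquad
\theta_{t+1}=\theta_{t}-\eta_{t}\,\frac{m_{t}}{\sqrt{v_{t}}+\epsilon},
$$
where $\xi_t$ is a random mini-batch and $m_t$, $v_t$ are running averages of the stochastic gradient and of its elementwise square.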
