
In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks, characterized by $L$ hidden layers, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces ${\mathbb R}^Q$ with equal dimension $Q\geq1$. The hidden layers are likewise defined on spaces ${\mathbb R}^{Q}$. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise-to-signal ratio of the training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function.

Related content

In mathematical optimization, statistics, econometrics, decision theory, machine learning, and computational neuroscience, a cost function, also called a loss function, maps an event or the values of one or more variables onto a real number that intuitively represents some cost associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative, in which case it is to be maximized.
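As a minimal illustration of the definition above, the sketch below minimizes a mean squared error loss for a one-parameter linear model by gradient descent. The function names, the toy data, the step size, and the iteration count are all assumptions chosen for this example; nothing here is taken from the papers listed on this page.

```python
def squared_error_loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def minimize_loss(xs, ys, lr=0.05, steps=200):
    """Plain gradient descent on squared_error_loss, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


# Hypothetical toy data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_star = minimize_loss(xs, ys)
final_loss = squared_error_loss(w_star, xs, ys)

# Maximizing the negated loss is the same problem as minimizing the loss.
final_objective = -final_loss
```

Here `w_star` should end up close to 2, the slope that generated the toy data, and `final_objective` shows the "negative of the loss" convention in which the objective is maximized instead.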

Using a fully Bayesian approach, Gaussian Process regression is extended to include marginalisation over the kernel choice and kernel hyperparameters. In addition, Bayesian model comparison via the evidence enables direct kernel comparison. The calculation of the joint posterior was implemented with a transdimensional sampler which simultaneously samples over the discrete kernel choice and the corresponding hyperparameters by embedding these in a higher-dimensional space, from which samples are taken using nested sampling. This method was explored on synthetic data from exoplanet transit light curve simulations. The true kernel was recovered in the low-noise region, while no kernel was preferred for larger noise. Furthermore, inference of the physical exoplanet hyperparameters was conducted. In the high-noise region, either the bias in the posteriors was removed, the posteriors were broadened, or the accuracy of the inference was increased. In addition, the uncertainty in the mean function predictive distribution increased due to the uncertainty in the kernel choice. Subsequently, the method was extended to marginalisation over mean functions and noise models and applied to the inference of the present-day Hubble parameter, $H_0$, from real measurements of the Hubble parameter as a function of redshift, derived from the cosmologically model-independent cosmic chronometer and $\Lambda$CDM-dependent baryon acoustic oscillation observations. The inferred $H_0$ values from the cosmic chronometers, baryon acoustic oscillations and combined datasets are $H_0 = 66 \pm 6$ km/s/Mpc, $H_0 = 67 \pm 10$ km/s/Mpc and $H_0 = 69 \pm 6$ km/s/Mpc, respectively. The kernel posterior of the cosmic chronometers dataset prefers a non-stationary linear kernel. Finally, the datasets are shown to be not in tension, with $\ln(R) = 12.17 \pm 0.02$.

In this paper, we describe, in operator form, classes of PDEs that admit error estimation for PINNs. Also, for $L^p$ spaces, we obtain a Bramble-Hilbert type lemma that serves as a tool for bounding PINN residuals.

In this paper, we develop reliable a posteriori error estimates for numerical approximations of scalar hyperbolic conservation laws in one space dimension. Our methods have no inherent small-data limitations and are a step towards error control of numerical schemes for systems. We are careful not to appeal to the Kruzhkov theory for scalar conservation laws. Instead, we derive novel quantitative stability estimates that extend the theory of shifts, and in particular, the framework for proving stability first developed by the second author and Vasseur. This is the first time this methodology has been used for quantitative estimates. We work entirely within the context of the theory of shifts and $a$-contraction, techniques which adapt well to systems. In fact, the stability framework by the second author and Vasseur has itself recently been pushed to systems [Chen-Krupa-Vasseur. Uniqueness and weak-BV stability for $2\times 2$ conservation laws. Arch. Ration. Mech. Anal., 246(1):299--332, 2022]. Our theoretical findings are complemented by a numerical implementation in MATLAB and numerical experiments.

The paper tackles the problem of clustering multiple networks, directed or not, that do not share the same set of vertices, into groups of networks with similar topology. A statistical model-based approach based on a finite mixture of stochastic block models is proposed. A clustering is obtained by maximizing the integrated classification likelihood criterion. This is done by a hierarchical agglomerative algorithm, that starts from singleton clusters and successively merges clusters of networks. As such, a sequence of nested clusterings is computed that can be represented by a dendrogram providing valuable insights on the collection of networks. Using a Bayesian framework, model selection is performed in an automated way since the algorithm stops when the best number of clusters is attained. The algorithm is computationally efficient, when carefully implemented. The aggregation of clusters requires a means to overcome the label-switching problem of the stochastic block model and to match the block labels of the networks. To address this problem, a new tool is proposed based on a comparison of the graphons of the associated stochastic block models. The clustering approach is assessed on synthetic data. An application to a set of ecological networks illustrates the interpretability of the obtained results.

We develop a flexible online version of the permutation test. This allows us to test exchangeability as the data is arriving, where we can choose to stop or continue without invalidating the size of the test. Our methods generalize beyond exchangeability to other forms of invariance under a compact group. Our approach relies on constructing an $e$-process that is the running product of multiple conditional $e$-values. To construct $e$-values, we first develop an essentially complete class of admissible $e$-values in which one can flexibly `plug in' almost any desired test statistic. To make the $e$-values conditional, we explore the intersection between the concepts of conditional invariance and sequential invariance, and find that the appropriate conditional distribution can be captured by a compact subgroup. To find powerful $e$-values for given alternatives, we develop the theory of likelihood ratios for testing group invariance yielding new optimality results for group invariance tests. These statistics turn out to exist in three different flavors, depending on the space on which we specify our alternative. We apply these statistics to test against a Gaussian location shift, which yields connections to the $t$-test when testing sphericity, connections to the softmax function and its temperature when testing exchangeability, and yields an improved version of a known $e$-value for testing sign-symmetry. Moreover, we introduce an impatience parameter that allows users to obtain more power now in exchange for less power in the long run.

We study integration and $L^2$-approximation of functions of infinitely many variables in the following setting: The underlying function space is the countably infinite tensor product of univariate Hermite spaces and the probability measure is the corresponding product of the standard normal distribution. The maximal domain of the functions from this tensor product space is necessarily a proper subset of the sequence space $\mathbb{R}^\mathbb{N}$. We establish upper and lower bounds for the minimal worst case errors under general assumptions; these bounds do match for tensor products of well-studied Hermite spaces of functions with finite or with infinite smoothness. In the proofs we employ embedding results, and the upper bounds are attained constructively with the help of multivariate decomposition methods.

Interior point methods (IPMs) that handle nonconvex constraints such as IPOPT, KNITRO and LOQO have had enormous practical success. We consider IPMs in the setting where the objective and constraints are thrice differentiable, and have Lipschitz first and second derivatives on the feasible region. We provide an IPM that, starting from a strictly feasible point, finds a $\mu$-approximate Fritz John point by solving $\mathcal{O}( \mu^{-7/4})$ trust-region subproblems. For IPMs that handle nonlinear constraints, this result represents the first iteration bound with a polynomial dependence on $1/\mu$. We also show how to use our method to find scaled-KKT points starting from an infeasible solution and improve on existing complexity bounds.

This paper develops a general asymptotic theory of local polynomial (LP) regression for spatial data observed at irregularly spaced locations in a sampling region $R_n \subset \mathbb{R}^d$. We adopt a stochastic sampling design that can generate irregularly spaced sampling sites in a flexible manner including both pure increasing and mixed increasing domain frameworks. We first introduce a nonparametric regression model for spatial data defined on $\mathbb{R}^d$ and then establish the asymptotic normality of LP estimators with general order $p \geq 1$. We also propose methods for constructing confidence intervals and establishing uniform convergence rates of LP estimators. Our dependence structure conditions on the underlying processes cover a wide class of random fields such as L\'evy-driven continuous autoregressive moving average random fields. As an application of our main results, we discuss a two-sample testing problem for mean functions and their partial derivatives.

In this paper we propose a definition of the distributional Riemann curvature tensor in dimension $N\geq 2$ if the underlying metric tensor $g$ defined on a triangulation $\mathcal{T}$ possesses only single-valued tangential-tangential components on codimension 1 simplices. We analyze the convergence of the curvature approximation in the $H^{-2}$-norm if a sequence of interpolants $g_h$ of polynomial order $k\geq 0$ of a smooth metric $g$ is given. We show that for dimension $N=2$ convergence rates of order $\mathcal{O}(h^{k+1})$ are obtained. For $N\geq 3$ convergence holds only in the case $k\geq 1$. Numerical examples demonstrate that our theoretical results are sharp. By choosing appropriate test functions we show that the distributional Gauss curvature in 2D and the distributional scalar curvature in any dimension are obtained. Further, a first definition of the distributional Ricci curvature tensor in arbitrary dimension is derived, for which our analysis is applicable.

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, Princeton ModelNet40, and NVIDIA Dynamic Hand Gesture.
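The conditional utilization rate described above can be sketched as a simple accuracy gain. This is only an illustration under stated assumptions: the gain is taken here as a plain difference of accuracies (the paper may normalize it differently), and the function name and all accuracy values are hypothetical, not results from the paper.

```python
def conditional_utilization_rate(acc_both, acc_other_only):
    """Accuracy gained by adding one modality on top of the other.

    Assumption: the gain is a plain difference of accuracies.
    """
    return acc_both - acc_other_only


# Hypothetical accuracies of a two-modality model and its single-modality parts.
acc_both = 0.90      # model sees modalities m0 and m1
acc_m0_only = 0.88   # model sees only m0
acc_m1_only = 0.60   # model sees only m1

# Gain from m1 given m0, and gain from m0 given m1.
u_m1_given_m0 = conditional_utilization_rate(acc_both, acc_m0_only)
u_m0_given_m1 = conditional_utilization_rate(acc_both, acc_m1_only)

# A large gap suggests that the model leans mostly on one modality.
imbalance = abs(u_m0_given_m1 - u_m1_given_m0)
```

With these made-up numbers, adding `m1` to `m0` helps far less than adding `m0` to `m1`, which is the kind of imbalance between modalities that the abstract reports.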
