亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Four-dimensional weak-constraint variational data assimilation estimates a state given partial noisy observations and dynamical model by minimizing a cost function that takes into account both discrepancy between the state and observations and model error over time. It can be formulated as a Gauss-Newton iteration of an associated least-squares problem. In this paper, we introduce a parameter in front of the observation mismatch and show analytically that this parameter is crucial either for convergence to the true solution when observations are noise-free or for boundness of the error when observations are noisy with bounded observation noise. We also consider joint state-parameter estimation. We illustrated theoretical results with numerical experiments using the Lorenz 63 and Lorenz 96 models.

相關內容

This paper investigates gradient descent for solving low-rank matrix approximation problems. We begin by establishing the local linear convergence of gradient descent for symmetric matrix approximation. Building on this result, we prove the rapid global convergence of gradient descent, particularly when initialized with small random values. Remarkably, we show that even with moderate random initialization, which includes small random initialization as a special case, gradient descent achieves fast global convergence in scenarios where the top eigenvalues are identical. Furthermore, we extend our analysis to address asymmetric matrix approximation problems and investigate the effectiveness of a retraction-free eigenspace computation method. Numerical experiments strongly support our theory. In particular, the retraction-free algorithm outperforms the corresponding Riemannian gradient descent method, resulting in a significant 29\% reduction in runtime.

Can we recover the hidden parameters of an Artificial Neural Network (ANN) by probing its input-output mapping? We propose a systematic method, called `Expand-and-Cluster' that needs only the number of hidden layers and the activation function of the probed ANN to identify all network parameters. In the expansion phase, we train a series of networks of increasing size using the probed data of the ANN as a teacher. Expansion stops when a minimal loss is consistently reached in networks of a given size. In the clustering phase, weight vectors of the expanded students are clustered, which allows structured pruning of superfluous neurons in a principled way. We find that an overparameterization of a factor four is sufficient to reliably identify the minimal number of neurons and to retrieve the original network parameters in $80\%$ of tasks across a family of 150 toy problems of variable difficulty. Furthermore, shallow and deep teacher networks trained on MNIST data can be identified with less than $5\%$ overhead in the neuron number. Thus, while direct training of a student network with a size identical to that of the teacher is practically impossible because of the highly non-convex loss function, training with mild overparameterization followed by clustering and structured pruning correctly identifies the target network.

We consider the problem of maximum likelihood estimation in linear models represented by factor graphs and solved via the Gaussian belief propagation algorithm. Motivated by massive internet of things (IoT) networks and edge computing, we set the above problem in a clustered scenario, where the factor graph is divided into clusters and assigned for processing in a distributed fashion across a number of edge computing nodes. For these scenarios, we show that an alternating Gaussian belief propagation (AGBP) algorithm that alternates between inter- and intra-cluster iterations, demonstrates superior performance in terms of convergence properties compared to the existing solutions in the literature. We present a comprehensive framework and introduce appropriate metrics to analyse AGBP algorithm across a wide range of linear models characterised by symmetric and non-symmetric, square, and rectangular matrices. We extend the analysis to the case of dynamic linear models by introducing dynamic arrival of new data over time. Using a combination of analytical and extensive numerical results, we show the efficiency and scalability of AGBP algorithm, making it a suitable solution for large-scale inference in massive IoT networks.

The Stratified Foundations are a restriction of naive set theory where the comprehension scheme is restricted to stratifiable propositions. It is known that this theory is consistent and that proofs strongly normalize in this theory. Deduction modulo is a formulation of first-order logic with a general notion of cut. It is known that proofs normalize in a theory modulo if it has some kind of many-valued model called a pre-model. We show in this paper that the Stratified Foundations can be presented in deduction modulo and that the method used in the original normalization proof can be adapted to construct a pre-model for this theory.

With contemporary data sets becoming too large to analyze the data directly, various forms of aggregated data are becoming common. The original individual data are points, but after aggregation, the observations are interval-valued (e.g.). While some researchers simply analyze the set of averages of the observations by aggregated class, it is easily established that approach ignores much of the information in the original data set. The initial theoretical work for interval-valued data was that of Le-Rademacher and Billard (2011), but those results were limited to estimation of the mean and variance of a single variable only. This article seeks to redress the limitation of their work by deriving the maximum likelihood estimator for the all important covariance statistic, a basic requirement for numerous methodologies, such as regression, principal components, and canonical analyses. Asymptotic properties of the proposed estimators are established. The Le-Rademacher and Billard results emerge as special cases of our wider derivations.

Generative Adversarial Networks (GANs) are a popular formulation to train generative models for complex high dimensional data. The standard method for training GANs involves a gradient descent-ascent (GDA) procedure on a minimax optimization problem. This procedure is hard to analyze in general due to the nonlinear nature of the dynamics. We study the local dynamics of GDA for training a GAN with a kernel-based discriminator. This convergence analysis is based on a linearization of a non-linear dynamical system that describes the GDA iterations, under an \textit{isolated points model} assumption from [Becker et al. 2022]. Our analysis brings out the effect of the learning rates, regularization, and the bandwidth of the kernel discriminator, on the local convergence rate of GDA. Importantly, we show phase transitions that indicate when the system converges, oscillates, or diverges. We also provide numerical simulations that verify our claims.

The convergence of GD and SGD when training mildly parameterized neural networks starting from random initialization is studied. For a broad range of models and loss functions, including the most commonly used square loss and cross entropy loss, we prove an ``early stage convergence'' result. We show that the loss is decreased by a significant amount in the early stage of the training, and this decrease is fast. Furthurmore, for exponential type loss functions, and under some assumptions on the training data, we show global convergence of GD. Instead of relying on extreme over-parameterization, our study is based on a microscopic analysis of the activation patterns for the neurons, which helps us derive more powerful lower bounds for the gradient. The results on activation patterns, which we call ``neuron partition'', help build intuitions for understanding the behavior of neural networks' training dynamics, and may be of independent interest.

We study the convergence to local Nash equilibria of gradient methods for two-player zero-sum differentiable games. It is well-known that such dynamics converge locally when $S \succ 0$ and may diverge when $S=0$, where $S\succeq 0$ is the symmetric part of the Jacobian at equilibrium that accounts for the "potential" component of the game. We show that these dynamics also converge as soon as $S$ is nonzero (partial curvature) and the eigenvectors of the antisymmetric part $A$ are in general position with respect to the kernel of $S$. We then study the convergence rates when $S \ll A$ and prove that they typically depend on the average of the eigenvalues of $S$, instead of the minimum as an analogy with minimization problems would suggest. To illustrate our results, we consider the problem of computing mixed Nash equilibria of continuous games. We show that, thanks to partial curvature, conic particle methods -- which optimize over both weights and supports of the mixed strategies -- generically converge faster than fixed-support methods. For min-max games, it is thus beneficial to add degrees of freedom "with curvature": this can be interpreted as yet another benefit of over-parameterization.

In this paper we derive sufficient conditions for the convergence of two popular alternating minimisation algorithms for dictionary learning - the Method of Optimal Directions (MOD) and Online Dictionary Learning (ODL), which can also be thought of as approximative K-SVD. We show that given a well-behaved initialisation that is either within distance at most $1/\log(K)$ to the generating dictionary or has a special structure ensuring that each element of the initialisation only points to one generating element, both algorithms will converge with geometric convergence rate to the generating dictionary. This is done even for data models with non-uniform distributions on the supports of the sparse coefficients. These allow the appearance frequency of the dictionary elements to vary heavily and thus model real data more closely.

This paper introduces a matrix quantile factor model for matrix-valued data with a low-rank structure. We estimate the row and column factor spaces via minimizing the empirical check loss function over all panels. We show the estimates converge at rate $1/\min\{\sqrt{p_1p_2}, \sqrt{p_2T},$ $\sqrt{p_1T}\}$ in average Frobenius norm, where $p_1$, $p_2$ and $T$ are the row dimensionality, column dimensionality and length of the matrix sequence. This rate is faster than that of the quantile estimates via ``flattening" the matrix model into a large vector model. Smoothed estimates are given and their central limit theorems are derived under some mild condition. We provide three consistent criteria to determine the pair of row and column factor numbers. Extensive simulation studies and an empirical study justify our theory.

北京阿比特科技有限公司