亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study a preconditioner for a Hermitian positive definite linear system, which is obtained as the solution of a matrix nearness problem based on the Bregman \emph{log determinant} divergence. The preconditioner is on the form of a Hermitian positive definite matrix plus a low-rank matrix. For this choice of structure, the generalised eigenvalues of the preconditioned system are easily calculated, and we show that the preconditioner is optimal in the sense that it minimises the $\ell_2$ condition number of the preconditioned matrix. We develop practical numerical approximations of the preconditioner based on the randomised singular value decomposition (SVD) and the Nystr\"om approximation and provide corresponding approximation results. Furthermore, we prove that the Nystr\"om approximation is in fact also a matrix approximation in a range-restricted Bregman divergence and establish several connections between this divergence and matrix nearness problems in different measures. Numerical examples are provided to support the theoretical results.

相關內容

Stochastic gradient descent (SGD) has become a cornerstone of neural network optimization, yet the noise introduced by SGD is often assumed to be uncorrelated over time, despite the ubiquity of epoch-based training. In this work, we challenge this assumption and investigate the effects of epoch-based noise correlations on the stationary distribution of discrete-time SGD with momentum, limited to a quadratic loss. Our main contributions are twofold: first, we calculate the exact autocorrelation of the noise for training in epochs under the assumption that the noise is independent of small fluctuations in the weight vector; second, we explore the influence of correlations introduced by the epoch-based learning scheme on SGD dynamics. We find that for directions with a curvature greater than a hyperparameter-dependent crossover value, the results for uncorrelated noise are recovered. However, for relatively flat directions, the weight variance is significantly reduced. We provide an intuitive explanation for these results based on a crossover between correlation times, contributing to a deeper understanding of the dynamics of SGD in the presence of epoch-based noise correlations.

One of the fundamental challenges found throughout the data sciences is to explain why things happen in specific ways, or through which mechanisms a certain variable $X$ exerts influences over another variable $Y$. In statistics and machine learning, significant efforts have been put into developing machinery to estimate correlations across variables efficiently. In causal inference, a large body of literature is concerned with the decomposition of causal effects under the rubric of mediation analysis. However, many variations are spurious in nature, including different phenomena throughout the applied sciences. Despite the statistical power to estimate correlations and the identification power to decompose causal effects, there is still little understanding of the properties of spurious associations and how they can be decomposed in terms of the underlying causal mechanisms. In this manuscript, we develop formal tools for decomposing spurious variations in both Markovian and Semi-Markovian models. We prove the first results that allow a non-parametric decomposition of spurious effects and provide sufficient conditions for the identification of such decompositions. The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine, and we empirically demonstrate its use on a real-world dataset.

We propose a novel $K$-nearest neighbor resampling procedure for estimating the performance of a policy from historical data containing realized episodes of a decision process generated under a different policy. We focus on feedback policies that depend deterministically on the current state in environments with continuous state-action spaces and system-inherent stochasticity effected by chosen actions. Such settings are common in a wide range of high-stake applications and are actively investigated in the context of stochastic control. Our procedure exploits that similar state/action pairs (in a metric sense) are associated with similar rewards and state transitions. This enables our resampling procedure to tackle the counterfactual estimation problem underlying off-policy evaluation (OPE) by simulating trajectories similarly to Monte Carlo methods. Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization and does not explicitly assume a parametric model for the environment's dynamics. These properties make the proposed resampling algorithm particularly useful for stochastic control environments. We prove that our method is statistically consistent in estimating the performance of a policy in the OPE setting under weak assumptions and for data sets containing entire episodes rather than independent transitions. To establish the consistency, we generalize Stone's Theorem, a well-known result in nonparametric statistics on local averaging, to include episodic data and the counterfactual estimation underlying OPE. Numerical experiments demonstrate the effectiveness of the algorithm in a variety of stochastic control settings including a linear quadratic regulator, trade execution in limit order books and online stochastic bin packing.

An important aspect of developing reliable deep learning systems is devising strategies that make these systems robust to adversarial attacks. There is a long line of work that focuses on developing defenses against these attacks, but recently, researchers have began to study ways to reverse engineer the attack process. This allows us to not only defend against several attack models, but also classify the threat model. However, there is still a lack of theoretical guarantees for the reverse engineering process. Current approaches that give any guarantees are based on the assumption that the data lies in a union of linear subspaces, which is not a valid assumption for more complex datasets. In this paper, we build on prior work and propose a novel framework for reverse engineering of deceptions which supposes that the clean data lies in the range of a GAN. To classify the signal and attack, we jointly solve a GAN inversion problem and a block-sparse recovery problem. For the first time in the literature, we provide deterministic linear convergence guarantees for this problem. We also empirically demonstrate the merits of the proposed approach on several nonlinear datasets as compared to state-of-the-art methods.

Many popular models from the networks literature can be viewed through a common lens of contingency tables on network dyads, resulting in \emph{log-linear ERGMs}: exponential family models for random graphs whose sufficient statistics are linear on the dyads. We propose a new model in this family, the \emph{$p_1$-SBM}, which combines node and group effects common in network formation mechanisms. In particular, it is a generalization of several well-known ERGMs including the stochastic blockmodel for undirected graphs, the degree-corrected version of it, and the directed $p_1$ model without group structure. We frame the problem of testing model fit for the log-linear ERGM class through an exact conditional test whose $p$-value can be approximated efficiently in networks of both small and moderately large sizes. The sampling methods we build rely on a dynamic adaptation of Markov bases. We use quick estimation algorithms adapted from the contingency table literature and effective sampling methods rooted in graph theory and algebraic statistics. The performance and scalability of the method is demonstrated on two data sets from biology: the connectome of \emph{C. elegans} and the interactome of \emph{Arabidopsis thaliana}. These two networks -- a neuronal network and a protein-protein interaction network -- have been popular examples in the network science literature. Our work provides a model-based approach to studying them.

Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation: they are often far less expensive to compute and applicable more broadly than infinite width NTKs. For networks with O output units (e.g. an O-class classifier), however, the eNTK on N inputs is of size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$ computation. Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call "sum of logits", converges to the true eNTK at initialization for any network with a wide final "readout" layer. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.

In this paper, we introduce the problem of Matroid-Constrained Vertex Cover: given a graph with weights on the edges and a matroid imposed on the vertices, our problem is to choose a subset of vertices that is independent in the matroid, with the objective of maximizing the total weight of covered edges. This problem is a generalization of the much studied max $k$-vertex cover problem, in which the matroid is the simple uniform matroid, and it is also a special case of the problem of maximizing a monotone submodular function under a matroid constraint. First, we give a Fixed-Parameter Tractable Approximation Scheme (FPT-AS) when the given matroid is a partition matroid, a laminar matroid, or a transversal matroid. Precisely, if $k$ is the rank of the matroid, we obtain $(1 - \varepsilon)$ approximation using $(1/\varepsilon)^{O(k)}n^{O(1)}$ time for partition and laminar matroids and using $(1/\varepsilon+k)^{O(k)}n^{O(1)}$ time for transversal matroids. This extends a result of Manurangsi for uniform matroids [Manurangsi, 2018]. We also show that these ideas can be applied in the context of (single-pass) streaming algorithms. Besides, our FPT-AS introduces a new technique based on matroid union, which may be of independent interest in extremal combinatorics. In the second part, we consider general matroids. We propose a simple local search algorithm that guarantees $2/3 \approx 0.66$ approximation. For the more general problem where two matroids are imposed on the vertices and a feasible solution must be a common independent set, we show that a local search algorithm gives a $2/3 \cdot (1 - 1/(p+1))$ approximation in $n^{O(p)}$ time, for any integer $p$. We also provide some evidence to show that with the constraint of one or two matroids, the approximation ratio of $2/3$ is likely the best possible, using the currently known techniques of local search.

The Fokker-Planck equation describes the evolution of the probability density associated with a stochastic differential equation. As the dimension of the system grows, solving this partial differential equation (PDE) using conventional numerical methods becomes computationally prohibitive. Here, we introduce a fast, scalable, and interpretable method for solving the Fokker-Planck equation which is applicable in higher dimensions. This method approximates the solution as a linear combination of shape-morphing Gaussians with time-dependent means and covariances. These parameters evolve according to the method of reduced-order nonlinear solutions (RONS) which ensures that the approximate solution stays close to the true solution of the PDE for all times. As such, the proposed method approximates the transient dynamics as well as the equilibrium density, when the latter exists. Our approximate solutions can be viewed as an evolution on a finite-dimensional statistical manifold embedded in the space of probability densities. We show that the metric tensor in RONS coincides with the Fisher information matrix on this manifold. We also discuss the interpretation of our method as a shallow neural network with Gaussian activation functions and time-varying parameters. In contrast to existing deep learning methods, our method is interpretable, requires no training, and automatically ensures that the approximate solution satisfies all properties of a probability density.

When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

北京阿比特科技有限公司