
We define a novel notion of a ``non-backtracking'' matrix associated to any symmetric matrix, and we prove an ``Ihara-Bass'' type formula for it. We use this theory to prove new results on polynomial-time strong refutations of random constraint satisfaction problems with $k$ variables per constraint ($k$-CSPs). For a random $k$-CSP instance constructed out of a constraint that is satisfied by a $p$ fraction of assignments, if the instance contains $n$ variables and $n^{k/2} / \epsilon^2$ constraints, we can efficiently compute a certificate that the optimum satisfies at most a $p+O_k(\epsilon)$ fraction of constraints. Previously, this was known for even $k$, but for odd $k$ one needed $n^{k/2} (\log n)^{O(1)} / \epsilon^2$ random constraints to achieve the same conclusion. Although the improvement is only polylogarithmic, it overcomes a significant barrier to these types of results. Strong refutation results based on current approaches construct a certificate that a certain matrix associated to the $k$-CSP instance is quasirandom. Such a certificate can come from a Feige-Ofek type argument, from an application of Grothendieck's inequality, or from a spectral bound obtained with a trace argument. The first two approaches require a union bound that cannot work when the number of constraints is $o(n^{\lceil k/2 \rceil})$, and the third one cannot work when the number of constraints is $o(n^{k/2} \sqrt{\log n})$. We further apply our techniques to obtain a new PTAS for finding assignments for $k$-CSP instances with $n^{k/2} / \epsilon^2$ constraints in the semi-random setting where the constraints are random but the sign patterns are adversarial.
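To make the matrix-certificate paradigm concrete, here is a minimal Python sketch for the simplest case, $k=2$ (random 2-XOR), which is not the hard odd-$k$ case the paper addresses: for any assignment $x \in \{\pm 1\}^n$ the satisfied fraction equals $1/2 + x^\top A x/(4m)$ for a signed constraint matrix $A$, so a single eigenvalue computation certifies an upper bound on the optimum. The instance sizes and the quality of the bound are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 8000                       # n variables, m random 2-XOR constraints
A = np.zeros((n, n))
for _ in range(m):
    i, j = rng.choice(n, size=2, replace=False)
    c = rng.choice([-1.0, 1.0])        # sign encoding the random right-hand side
    A[i, j] += c
    A[j, i] += c

# For x in {-1,+1}^n the satisfied fraction is 1/2 + x^T A x / (4m),
# hence OPT <= 1/2 + n * lambda_max(A) / (4m): a polynomial-time certificate.
lam_max = np.linalg.eigvalsh(A)[-1]
print("certified upper bound on OPT:", 0.5 + n * lam_max / (4 * m))
```

For a truly random instance the printed bound is $1/2 + O(\epsilon)$, certifying that no assignment does much better than a random one.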

Related content

In this work, we consider a rational approximation of the exponential function to design an algorithm for computing the matrix exponential in the Hermitian case. Using a partial fraction decomposition, we obtain a parallelizable method in which the computation reduces to independent resolutions of linear systems. We analyze the effects of rounding errors on the accuracy of our algorithm. We complete this work with numerical tests showing the efficiency of our method and a comparison of its performance with Krylov algorithms.
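As an illustration of the general idea (not the paper's specific rational approximant), the sketch below approximates $\exp(A)v$ for a real symmetric $A$ by applying the trapezoidal rule to the Cauchy integral $\exp(A)v = \frac{1}{2\pi i}\oint e^z (zI-A)^{-1}v\,\mathrm{d}z$ on a circle enclosing the spectrum; the discretization is exactly a partial-fraction form, and each quadrature node contributes one independent linear solve that could run in parallel.

```python
import numpy as np

def expm_action_contour(A, v, n_nodes=64, margin=1.0):
    """Approximate exp(A) @ v for symmetric A by trapezoidal quadrature of the
    Cauchy integral on a circle enclosing the spectrum. Each node is one
    independent linear solve, so the loop is embarrassingly parallel."""
    # Cheap Gershgorin bounds on the (real) spectrum of the symmetric matrix A.
    d = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(d)
    lo, hi = (d - radii).min(), (d + radii).max()
    c, r = 0.5 * (lo + hi), 0.5 * (hi - lo) + margin   # contour center, radius
    theta = 2 * np.pi * np.arange(n_nodes) / n_nodes
    z = c + r * np.exp(1j * theta)                     # quadrature nodes
    acc = np.zeros(len(v), dtype=complex)
    for zk, tk in zip(z, theta):                       # independent solves
        acc += np.exp(zk) * np.exp(1j * tk) * np.linalg.solve(
            zk * np.eye(len(v)) - A, v.astype(complex))
    return (r / n_nodes) * acc.real

rng = np.random.default_rng(0)
B = rng.standard_normal((40, 40))
A = (B + B.T) / (2 * np.sqrt(40))      # symmetric test matrix, modest spectrum
v = rng.standard_normal(40)
from scipy.linalg import expm          # dense reference for the sanity check
print(np.linalg.norm(expm(A) @ v - expm_action_contour(A, v)))
```

Because the integrand is analytic and periodic in $\theta$, the trapezoidal rule converges geometrically in the number of nodes; matrices with a wide spectrum need more nodes or a better-adapted contour.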

It is well known that the Euler method for approximating the solutions of a random ordinary differential equation $\mathrm{d}X_t/\mathrm{d}t = f(t, X_t, Y_t)$ driven by a stochastic process $\{Y_t\}_t$ with $\theta$-H\"older sample paths is estimated to be of strong order $\theta$ with respect to the time step, provided $f=f(t, x, y)$ is sufficiently regular and suitably bounded. Here, it is proved that, in many typical cases, further conditions on the noise can be exploited so that the strong convergence is actually of order 1, regardless of the H\"older regularity of the sample paths. This applies for instance to additive or multiplicative It\^o process noises (such as Wiener, Ornstein-Uhlenbeck, and geometric Brownian motion processes); to point-process noises (such as Poisson point processes and Hawkes self-exciting processes, which even have jump-type discontinuities); and to transport-type processes with sample paths of bounded variation. The result is based on a novel approach, estimating the global error as an iterated integral over both large and small mesh scales, and switching the order of integration to move the critical regularity to the large scale. The work is complemented with numerical simulations illustrating the strong order 1 convergence in those cases, and with an example with fractional Brownian motion noise with Hurst parameter $0 < H < 1/2$ for which the order of convergence is $H + 1/2$, hence lower than the attained order 1 in the examples above, but still higher than the order $H$ of convergence expected from previous works.
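As a hedged illustration of the kind of experiment such papers report, the following sketch estimates the strong order of the Euler scheme for $\mathrm{d}X_t/\mathrm{d}t = -X_t + \sin(W_t)$ driven by a Wiener path $W$ (a hypothetical test equation, not one taken from the paper), comparing coarse-step solutions against a fine-grid reference on the same noise path; the fitted slope should be close to 1 rather than the H\"older exponent $1/2$.

```python
import numpy as np

def euler_rode(x0, W, h, f):
    """Euler scheme X_{n+1} = X_n + h f(t_n, X_n, W_{t_n}) for a RODE."""
    x = x0
    for n in range(len(W) - 1):
        x = x + h * f(n * h, x, W[n])
    return x

f = lambda t, x, y: -x + np.sin(y)     # smooth right-hand side, Wiener noise
T, level, paths = 1.0, 14, 200
Nf = 2 ** level
hf = T / Nf
coarse_levels = range(4, 10)           # coarse steps h = 2^k * hf
errs = {k: 0.0 for k in coarse_levels}
rng = np.random.default_rng(1)
for _ in range(paths):
    dW = rng.standard_normal(Nf) * np.sqrt(hf)
    W = np.concatenate(([0.0], np.cumsum(dW)))        # one Wiener path
    x_ref = euler_rode(1.0, W, hf, f)                 # fine-grid reference
    for k in coarse_levels:
        s = 2 ** k                                    # subsample the same path
        errs[k] += abs(euler_rode(1.0, W[::s], hf * s, f) - x_ref) / paths

hs = np.array([hf * 2 ** k for k in coarse_levels])
es = np.array([errs[k] for k in coarse_levels])
print("observed strong order ~", np.polyfit(np.log(hs), np.log(es), 1)[0])
```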

Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models places excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.
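The following toy sketch (our own illustration, not the paper's algorithm verbatim) runs an IST-style loop on the quadratic $f(x)=\tfrac12 x^\top A x - b^\top x$: each round the coordinates are partitioned across workers, every worker takes a gradient step on its masked submodel only, and the server reassembles the blocks. The residual distance to the optimum reflects the bias such masked updates can incur, which is the kind of effect a precise analysis on a quadratic model quantifies.

```python
import numpy as np

rng = np.random.default_rng(0)
d, workers, rounds, lr = 20, 4, 500, 0.05
Q = rng.standard_normal((d, d))
A = Q.T @ Q / d + np.eye(d)            # SPD matrix: the quadratic model
b = rng.standard_normal(d)
grad = lambda x: A @ x - b             # gradient of f(x) = x'Ax/2 - b'x

x = np.zeros(d)
for _ in range(rounds):
    blocks = np.array_split(rng.permutation(d), workers)  # random partition
    new_x = np.empty(d)
    for blk in blocks:                 # conceptually: in parallel, one per node
        mask = np.zeros(d)
        mask[blk] = 1.0
        sub = mask * x                 # worker sees only its own subnetwork
        step = sub - lr * mask * grad(sub)
        new_x[blk] = step[blk]         # worker returns only its own block
    x = new_x

x_star = np.linalg.solve(A, b)
print("distance to optimum:", np.linalg.norm(x - x_star))
```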

The extremal theory of forbidden 0--1 matrices studies the asymptotic growth of the function $\mathrm{Ex}(P,n)$, which is the maximum weight of a matrix $A\in\{0,1\}^{n\times n}$ whose submatrices avoid a fixed pattern $P\in\{0,1\}^{k\times l}$. This theory has been wildly successful at resolving problems in combinatorics, discrete and computational geometry, structural graph theory, and the analysis of data structures, particularly corollaries of the dynamic optimality conjecture. All these applications use acyclic patterns, meaning that when $P$ is regarded as the adjacency matrix of a bipartite graph, the graph is acyclic. The biggest open problem in this area is to bound $\mathrm{Ex}(P,n)$ for acyclic $P$. Prior results have only ruled out the strict $O(n\log n)$ bound conjectured by F\"uredi and Hajnal. It is consistent with prior results that $\forall P. \mathrm{Ex}(P,n)\leq n\log^{1+o(1)} n$, and also consistent that $\forall \epsilon>0.\exists P. \mathrm{Ex}(P,n) \geq n^{2-\epsilon}$. In this paper we establish a stronger lower bound on the extremal functions of acyclic $P$. Specifically, we give a new construction of relatively dense 0--1 matrices with $\Theta(n(\log n/\log\log n)^t)$ 1s that avoid an acyclic pattern $X_t$. Pach and Tardos have conjectured that this type of result is the best possible, i.e., no acyclic $P$ exists for which $\mathrm{Ex}(P,n)\geq n(\log n)^{\omega(1)}$.
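For readers new to the area, a brute-force checker makes the notion of containment precise: $A$ contains $P$ if some submatrix of $A$ (rows and columns kept in order) has a 1 everywhere $P$ does, and $\mathrm{Ex}(P,n)$ is the maximum number of 1s in an $n\times n$ matrix avoiding $P$. A small hedged sketch:

```python
import numpy as np
from itertools import combinations

def contains(A, P):
    """True iff 0-1 matrix A contains pattern P: there exist rows i_1<...<i_k
    and columns j_1<...<j_l with A[i_a, j_b] = 1 wherever P[a, b] = 1."""
    n, m = A.shape
    k, l = P.shape
    for rows in combinations(range(n), k):
        sub = A[list(rows), :]
        for cols in combinations(range(m), l):
            if np.all(sub[:, list(cols)] >= P):   # A may have extra 1s
                return True
    return False

P = np.array([[1, 0], [0, 1]])               # identity pattern: two diagonal 1s
A = np.triu(np.ones((4, 4), dtype=int), 1)   # strictly upper-triangular 1s
print(contains(A, P))                        # True: e.g. rows (0,1), cols (1,2)
```

The brute force runs in time $\binom{n}{k}\binom{m}{l}$ and is only meant to pin down the definition, not to compute $\mathrm{Ex}(P,n)$ at scale.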

In this paper, we investigate the computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight and energy. Here the energy complexity of a circuit measures the sparsity of its computation, and is defined as the maximum number of gates outputting non-zero values, taken over all the input assignments. As our main result, we prove that any threshold circuit $C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log (rk(M_C)) \le ed (\log s + \log w + \log n)$, where $rk(M_C)$ is the rank of the communication matrix $M_C$ of a $2n$-variable Boolean function that $C$ computes. Thus, such a threshold circuit $C$ is able to compute only a Boolean function whose communication matrix has rank bounded by a product of logarithmic factors of $s,w$ and linear factors of $d,e$. This implies an exponential lower bound on the size of even sublinear-depth threshold circuits if the energy and weight are sufficiently small. For other models of neural networks, such as discretized ReLU circuits and discretized sigmoid circuits, we prove that a similar inequality holds for a discretized circuit $C$: $\log (rk(M_C)) = O(ed(\log s + \log w + \log n)^3)$.
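As a hedged sanity check of the flavor of this bound (without plugging in exact constants), consider a single threshold gate computing the majority of $2n$ bits: it has $s=d=e=1$ and unit weights, and its $2^n\times 2^n$ communication matrix depends only on $(|x|,|y|)$, so its rank is at most $n+1$, far below the generic $2^n$. The snippet below computes this rank directly.

```python
import numpy as np
from itertools import product

n = 4
inputs = list(product([0, 1], repeat=n))          # all x in {0,1}^n
# MAJ(x, y) = 1 iff |x| + |y| >= n: a single threshold gate on 2n inputs.
M = np.array([[1 if sum(x) + sum(y) >= n else 0 for y in inputs]
              for x in inputs])
print(M.shape, "rank =", np.linalg.matrix_rank(M))  # rank <= n + 1 = 5
```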

In this paper, we derive the analytical behavior of the limiting spectral distribution of non-central covariance matrices of the "general information-plus-noise" type, as studied in [14]. Through the equation defining its Stieltjes transform, it is shown that the limiting distribution has a continuous derivative away from zero, the derivative being analytic wherever it is positive, and we establish a criterion for determining its support. We also extend the result in [14] to allow for all possible limiting ratios of the number of rows to the number of columns of the underlying random matrix.
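To visualize the object being studied, here is a hedged simulation of a non-central ("information-plus-noise") sample covariance matrix: a deterministic mean matrix plus Gaussian noise, normalized by the number of columns. The eigenvalue histogram approximates the limiting spectral distribution whose density and support the paper characterizes; the particular mean matrix below is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 1600                       # aspect ratio n/m = 1/4
X = rng.standard_normal((n, m))        # pure noise part
A = np.zeros((n, m))
A[: n // 4] = 1.0                      # a simple deterministic "information" part
S = (X + A) @ (X + A).T / m            # non-central sample covariance matrix
ev = np.linalg.eigvalsh(S)
hist, edges = np.histogram(ev, bins=40, density=True)
for h, e in zip(hist, edges[:-1]):     # crude text histogram of the spectrum
    print(f"{e:6.2f} " + "#" * int(60 * h))
```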

We present theory for general partial derivatives of matrix functions of the form $f(A(x))$, where $A(x)$ is a matrix path of several variables ($x=(x_1,\dots,x_j)$). Building on results by Mathias [SIAM J. Matrix Anal. Appl., 17 (1996), pp. 610-620] for the first order derivative, we develop a block upper triangular form for higher order partial derivatives. This block form is used to derive conditions for existence and a generalized Dalecki\u{i}-Kre\u{i}n formula for higher order derivatives. We show that certain specializations of this formula lead to classical formulas of quantum perturbation theory. We show how our results are related to earlier results for higher order Fr\'echet derivatives. Block forms of complex step approximations are introduced, and we show how these relate to the evaluation of derivatives through the upper triangular form. These relations are illustrated with numerical examples.
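The two central computational devices are easy to demonstrate for $f=\exp$: Mathias' block upper triangular identity $f\left(\begin{smallmatrix}A & E\\ 0 & A\end{smallmatrix}\right) = \left(\begin{smallmatrix}f(A) & L_f(A,E)\\ 0 & f(A)\end{smallmatrix}\right)$ recovers the Fr\'echet derivative $L_f(A,E)$ from the top-right block, and the complex step approximation $L_f(A,E)\approx \mathrm{Im}\, f(A+ihE)/h$ involves no subtractive cancellation, so $h$ can be taken extremely small. A minimal first-order check (the paper's higher order block forms generalize this):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
E = rng.standard_normal((n, n))        # direction of differentiation

# Mathias' block form: the (1,2) block of f([[A, E], [0, A]]) is L_f(A, E).
blk = np.block([[A, E], [np.zeros((n, n)), A]])
L_block = expm(blk)[:n, n:]

# Complex step: L_f(A, E) ~ Im(f(A + i h E)) / h, immune to cancellation.
h = 1e-150
L_cstep = expm(A + 1j * h * E).imag / h

print(np.linalg.norm(L_block - L_cstep) / np.linalg.norm(L_block))
```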

We present a new approach to approximate nearest-neighbor queries in fixed dimension under a variety of non-Euclidean distances. We are given a set $S$ of $n$ points in $\mathbb{R}^d$, an approximation parameter $\varepsilon > 0$, and a distance function that satisfies certain smoothness and growth-rate assumptions. The objective is to preprocess $S$ into a data structure so that for any query point $q$ in $\mathbb{R}^d$, it is possible to efficiently report any point of $S$ whose distance from $q$ is within a factor of $1+\varepsilon$ of the distance to the actual closest point. Prior to this work, the most efficient data structures for approximate nearest-neighbor searching in spaces of constant dimensionality applied only to the Euclidean metric. This paper overcomes this limitation through a method called convexification. For admissible distance functions, the proposed data structures answer queries in logarithmic time using $O(n \log (1 / \varepsilon) / \varepsilon^{d/2})$ space, nearly matching the best known bounds for the Euclidean metric. These results apply to both convex scaling distance functions (including the Mahalanobis distance and weighted Minkowski metrics) and Bregman divergences (including the Kullback-Leibler divergence and the Itakura-Saito distance).
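For concreteness, the exact brute-force baseline that such a data structure accelerates is a linear scan under the chosen divergence; the hedged sketch below does this for the (asymmetric) Kullback-Leibler divergence on probability vectors, one of the Bregman divergences covered by the result.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence, the Bregman divergence of negative entropy
    (asymmetric: kl(p, q) != kl(q, p) in general)."""
    return float(np.sum(p * np.log(p / q)))

def nearest_kl(S, q):
    """Exact linear-scan nearest neighbor of q among rows of S under kl(q, .).
    This O(n) scan is the baseline an approximate structure improves upon."""
    return min(range(len(S)), key=lambda i: kl(q, S[i]))

rng = np.random.default_rng(0)
S = rng.dirichlet(np.ones(8), size=1000)   # 1000 points on the simplex
q = rng.dirichlet(np.ones(8))
i = nearest_kl(S, q)
print("nearest index:", i, " divergence:", kl(q, S[i]))
```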

We consider the problem of low-rank rectangular matrix completion in the regime where the matrix $M$ of size $n\times m$ is ``long'', i.e., the aspect ratio $m/n$ diverges to infinity. Such matrices are of particular interest in the study of tensor completion, where they arise from the unfolding of a low-rank tensor. In the case where the sampling probability is $\frac{d}{\sqrt{mn}}$, we propose a new spectral algorithm for recovering the singular values and left singular vectors of the original matrix $M$ based on a variant of the standard non-backtracking operator of a suitably defined bipartite weighted random graph, which we call a \textit{non-backtracking wedge operator}. When $d$ is above a Kesten-Stigum-type sampling threshold, our algorithm recovers a correlated version of the singular value decomposition of $M$ with quantifiable error bounds. This is the first result in the regime of bounded $d$ for weak recovery and the first for weak consistency when $d\to\infty$ arbitrarily slowly without any polylog factors. As an application, for low-rank orthogonal $k$-tensor completion, we efficiently achieve weak recovery with sample size $O(n^{k/2})$, and weak consistency with sample size $\omega(n^{k/2})$.
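For orientation, the plain spectral baseline in this regime rescales the observed entries by $1/p$ and reads off the top singular pairs; it degrades when $d$ is bounded, which is precisely what motivates the non-backtracking wedge operator. A hedged sketch of that baseline (not the paper's algorithm), with arbitrary illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, d = 200, 2000, 2, 30
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((m, r)))[0]
sig = np.array([5.0, 3.0]) * np.sqrt(n * m)       # planted singular values
M = U @ np.diag(sig) @ V.T                        # rank-2 "long" matrix
p = d / np.sqrt(n * m)                            # sampling probability d/sqrt(nm)
mask = rng.random((n, m)) < p
Mhat = np.where(mask, M, 0.0) / p                 # unbiased rescaled observations
s_est = np.linalg.svd(Mhat, compute_uv=False)[:r]
print("planted:", sig, " estimated:", s_est)
```

At this sparsity the naive estimates are noticeably inflated by sampling noise, which is the failure mode the non-backtracking construction is designed to avoid.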

In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, favoring the (block) LU decomposition: the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we cannot cover all the useful and interesting results concerning matrix decomposition, and given the limited scope of this discussion we omit, e.g., separate treatments of Euclidean spaces, Hermitian spaces, Hilbert spaces, and the complex domain. We refer the reader to the literature on linear algebra for a more detailed introduction to these related fields.
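Since the survey opens with the (block) LU decomposition, a minimal sketch of the unblocked algorithm may help fix ideas: Gaussian elimination stores its multipliers in $L$ and leaves the eliminated matrix in $U$, so that $A = LU$ (no pivoting here, so nonzero leading principal minors are assumed).

```python
import numpy as np

def lu_nopivot(A):
    """Plain LU factorization A = L @ U (no pivoting; assumes all leading
    principal minors of A are nonzero)."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        mults = U[k + 1:, k] / U[k, k]          # elimination multipliers
        L[k + 1:, k] = mults                    # stored in the strict lower part
        U[k + 1:, k:] -= np.outer(mults, U[k, k:])  # eliminate column k
    return L, U

A = np.array([[4., 3., 2.], [8., 8., 5.], [4., 7., 9.]])
L, U = lu_nopivot(A)
print(np.allclose(L @ U, A))                    # True
```

Production codes use partial pivoting ($PA=LU$) for numerical stability; the unpivoted form above is only meant to expose the structure of the factorization.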
