
The pseudoinverse of a matrix, a generalized notion of the inverse, is of fundamental importance in linear algebra. However, there is no closed-form representation of the pseudoinverse that can be computed straightforwardly, so an algorithmic computation is necessary. An algorithmic computation can only be evaluated by also considering the underlying hardware, typically digital hardware, which performs the actual computations step by step. In this paper, we analyze whether, and to what degree, the pseudoinverse can actually be computed on digital hardware platforms modeled as Turing machines. To this end, we employ the notion of an effective algorithm, which describes a provably correct computation: given any error parameter as input, the algorithm provides an approximation of the unknown solution within the given error bound. We prove that an effective algorithm for computing the pseudoinverse of an arbitrary matrix cannot exist on a Turing machine, although provably correct algorithms do exist for specific classes of matrices. Moreover, our results establish a lower bound on the accuracy that can be attained algorithmically when computing the pseudoinverse on Turing machines.
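
As a minimal illustration of both halves of this statement, the following Python sketch (the function name and demo matrix are our own, not the paper's) computes the pseudoinverse via the provably correct formula $(A^T A)^{-1} A^T$, which is valid on the restricted class of full-column-rank matrices, and then exhibits the discontinuity of the pseudoinverse that obstructs any effective algorithm covering all matrices.

```python
import numpy as np

def pseudoinverse_full_rank(A: np.ndarray) -> np.ndarray:
    """A^+ = (A^T A)^{-1} A^T, valid when A has full column rank.

    For this restricted class a provably correct algorithm exists; the paper
    shows that no single effective algorithm can handle *all* matrices on a
    Turing machine."""
    return np.linalg.solve(A.T @ A, A.T)

# The obstruction: A_eps -> A entrywise as eps -> 0, yet pinv(A_eps) does not
# converge to pinv(A), so no finite-precision view of the input determines
# the output to a prescribed accuracy.
for eps in (1e-1, 1e-3, 1e-6):
    A_eps = np.array([[1.0, 0.0], [0.0, eps]])
    print(eps, np.linalg.norm(np.linalg.pinv(A_eps)))  # grows like 1/eps
```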

Related Content

Given two strings of length $n$ over alphabet $\Sigma$, and an upper bound $k$ on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) computes the unweighted string edit distance in $\mathcal{O}(n+k^2)$ time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Time Hypothesis (STOC'15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar $\tilde{\mathcal{O}}(n+$poly$(k))$-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances. While unweighted edit distance is theoretically fundamental, almost all real-world applications require weighted edit distance, where different weights are assigned to different edit operations, and the weights may vary with the characters being edited. Given a weight function $w: (\Sigma \cup \{\varepsilon\}) \times (\Sigma \cup \{\varepsilon\}) \rightarrow \mathbb{R}_{\ge 0}$ (such that $w(a,a)=0$ and $w(a,b)\ge 1$ for all $a,b\in \Sigma \cup \{\varepsilon\}$ with $a\ne b$), the goal is to find an alignment that minimizes the total weight of edits. Except for the vanilla $\mathcal{O}(n^2)$-time dynamic-programming algorithm and its almost trivial $\mathcal{O}(nk)$-time implementation, none of the aforementioned developments on unweighted edit distance apply to the weighted variant. In this paper, we propose the first $\mathcal{O}(n+$poly$(k))$-time algorithm that computes weighted string edit distance exactly, thus bridging a fundamental gap between our understanding of unweighted and weighted edit distance. We then generalize this result to weighted tree and Dyck edit distances, which leads to a deterministic algorithm that improves upon previous work for unweighted tree edit distance.
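
For intuition, the "almost trivial" $\mathcal{O}(nk)$-time implementation mentioned above can be sketched in a few lines of Python (this is the baseline the paper improves on, not the paper's algorithm). The band restriction $|i-j| \le k$ is sound because every edit has weight at least $1$, so any alignment of cost at most $k$ stays within the band; the names below are illustrative.

```python
import math

def banded_weighted_edit_distance(x, y, w, k):
    """O(n*k) DP for weighted edit distance, restricted to the band |i-j| <= k.

    w(a, b) is the substitution weight; w(a, EPS) / w(EPS, b) are deletion /
    insertion weights, with w(a, a) = 0 and w(a, b) >= 1 otherwise, matching
    the abstract. Returns the exact distance if it is at most k, else inf."""
    EPS = ""  # stands for the empty symbol epsilon
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return math.inf
    # Row 0: cost of inserting the first j characters of y, within the band.
    prev = {j: sum(w(EPS, y[t]) for t in range(j)) for j in range(min(m, k) + 1)}
    for i in range(1, n + 1):
        lo, hi = max(0, i - k), min(m, i + k)
        cur = {}
        for j in range(lo, hi + 1):
            best = math.inf
            if (j - 1) in cur:                         # insert y[j-1]
                best = min(best, cur[j - 1] + w(EPS, y[j - 1]))
            if j in prev:                              # delete x[i-1]
                best = min(best, prev[j] + w(x[i - 1], EPS))
            if j > 0 and (j - 1) in prev:              # substitute / match
                best = min(best, prev[j - 1] + w(x[i - 1], y[j - 1]))
            cur[j] = best
        prev = cur
    d = prev.get(m, math.inf)
    return d if d <= k else math.inf

# Unit weights recover the Levenshtein distance:
w = lambda a, b: 0 if a == b else 1
print(banded_weighted_edit_distance("kitten", "sitting", w, 5))  # 3
```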

Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of the significance and reliability of machine learning evaluation, with the aim of drawing inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise, such as meta-parameter variations, into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables analyzing the contribution of noise sources to the overall variance and computing a reliability coefficient as the ratio of substantial to total variance.
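
A minimal sketch of how such an analysis might be set up with statsmodels is given below; the data frame, column names, and toy data are hypothetical stand-ins for replicated evaluation scores, and the GLRT is formed from the log-likelihoods of two nested mixed models.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Hypothetical long-format evaluation data: one row per (system, seed, item),
# with a per-item data property (e.g., input length) as covariate.
df = pd.DataFrame({
    "score":  np.random.rand(200),
    "system": np.repeat(["A", "B"], 100),
    "seed":   np.tile(np.repeat([0, 1, 2, 3, 4], 20), 2),
    "length": np.tile(np.arange(20), 10),
})

# Full model: fixed effect of system and its interaction with the data
# property; a random intercept per seed absorbs training nondeterminism.
full = smf.mixedlm("score ~ system * length", df, groups=df["seed"]).fit(reml=False)
# Null model: drop the system effect entirely.
null = smf.mixedlm("score ~ length", df, groups=df["seed"]).fit(reml=False)

# GLRT: 2 * (llf_full - llf_null) is asymptotically chi-squared distributed.
lr = 2 * (full.llf - null.llf)
dof = 2  # parameters removed: system main effect + system:length interaction
print("GLRT p-value:", stats.chi2.sf(lr, dof))
```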

The graph Laplacian is a fundamental object in the analysis of, and optimization on, graphs. This operator can be extended to a simplicial complex $K$ and therefore offers a way to perform ``signal processing'' on $p$-(co)chains of $K$. Recently, the concept of a persistent Laplacian was proposed and studied for a pair of simplicial complexes $K\hookrightarrow L$ connected by an inclusion relation, further broadening the use of Laplace-based operators. In this paper, we expand the scope of the persistent Laplacian by generalizing it to a pair of simplicial complexes connected by a simplicial map $f: K \to L$. Such a setting arises frequently, e.g., when relating a coarsened simplicial representation with an original representation, or when the two simplicial complexes are spanned by different point sets, i.e., cases in which $K\subset L$ does not hold. However, the simplicial map setting is more challenging than the inclusion setting, since the underlying algebraic structure is more complicated. We present a natural generalization of the persistent Laplacian to the simplicial map setting. To shed light on the structure behind it, as well as to develop an algorithm to compute it, we exploit the relationship between the persistent Laplacian and the Schur complement of a matrix. A critical step is to view the Schur complement as a functorial way of restricting a self-adjoint PSD operator to a given subspace. As a consequence, we prove that the persistent Betti numbers of a simplicial map can be recovered from persistent Laplacians. We then propose an algorithm for finding the matrix representations of persistent Laplacians, which in turn yields a new algorithm for computing the persistent Betti numbers of a simplicial map. Finally, we study the persistent Laplacian on simplicial towers under simplicial maps and establish monotonicity results for their eigenvalues.
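
The Schur-complement viewpoint is easy to make concrete. The numpy sketch below (our own illustration, using a generalized inverse for the dropped block) restricts a PSD operator to a chosen coordinate subspace; applied to the graph Laplacian of a 3-vertex path, it returns the familiar effective Laplacian on the two endpoints.

```python
import numpy as np

def schur_restrict(M: np.ndarray, keep: np.ndarray) -> np.ndarray:
    """Restrict a self-adjoint PSD matrix M to the coordinates in `keep`.

    Writing M in block form [[A, B], [B^T, D]] with A = M[keep, keep], the
    (generalized) Schur complement is A - B D^+ B^T; the abstract views this
    as a functorial restriction of the operator to the chosen subspace."""
    drop = np.setdiff1d(np.arange(M.shape[0]), keep)
    A = M[np.ix_(keep, keep)]
    B = M[np.ix_(keep, drop)]
    D = M[np.ix_(drop, drop)]
    return A - B @ np.linalg.pinv(D) @ B.T

# Graph Laplacian of the path 0 - 1 - 2, restricted to the endpoints {0, 2}.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
print(schur_restrict(L, np.array([0, 2])))
# -> [[0.5, -0.5], [-0.5, 0.5]], the effective Laplacian (resistance 2).
```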

We present an efficient quantum algorithm to simulate nonlinear differential equations with polynomial vector fields of arbitrary degree on quantum platforms. Models of physical systems that are governed by ordinary differential equations (ODEs) or partial differential equations (PDEs) can be challenging to solve on classical computers due to high dimensionality, stiffness, nonlinearities, and sensitive dependence on initial conditions. For sparse $n$-dimensional linear ODEs, quantum algorithms have been developed which can produce a quantum state proportional to the solution in poly(log(nx)) time using the quantum linear systems algorithm (QLSA). Recently, this framework was extended to systems of nonlinear ODEs with quadratic polynomial vector fields by applying Carleman linearization, which enables the embedding of the quadratic system into an approximate linear form. A detailed complexity analysis was conducted, showing a significant computational advantage under certain conditions. We present an extension of this algorithm to systems of nonlinear ODEs with $k$-th degree polynomial vector fields for arbitrary (finite) values of $k$. The steps involve: 1) mapping the $k$-th degree polynomial ODE to a higher-dimensional quadratic polynomial ODE; 2) applying Carleman linearization to transform the quadratic ODE into an infinite-dimensional system of linear ODEs; 3) truncating and discretizing the linear ODE and solving it using the forward Euler method and the QLSA. Alternatively, one could apply Carleman linearization directly to the $k$-th degree polynomial ODE, resulting in an infinite-dimensional system of linear ODEs, and then apply step 3. This solution route can be computationally more efficient. We present a detailed complexity analysis of the proposed algorithms, prove polynomial scaling of the runtime in $k$, and demonstrate the framework on an example.
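
A purely classical toy version of steps 2) and 3) for a scalar quadratic ODE (so step 1) is vacuous) might look as follows; the truncation order and step size are illustrative, and the quantum advantage comes from replacing the explicit Euler iteration with the QLSA, which is not shown here.

```python
import numpy as np

def carleman_matrix(a: float, b: float, N: int) -> np.ndarray:
    """Truncated Carleman linearization of the scalar quadratic ODE
    x' = a*x + b*x^2, using the monomial variables y_j = x^j (j = 1..N):
        y_j' = j*a*y_j + j*b*y_{j+1},
    with the y_{N+1} coupling dropped at truncation order N."""
    A = np.zeros((N, N))
    for j in range(1, N + 1):
        A[j - 1, j - 1] = j * a
        if j < N:
            A[j - 1, j] = j * b
    return A

# Step 3 (classically): forward Euler on the truncated linear system.
a, b, N, x0, h, T = -1.0, 0.1, 6, 0.5, 1e-3, 2.0
A = carleman_matrix(a, b, N)
y = np.array([x0 ** j for j in range(1, N + 1)])
for _ in range(int(T / h)):
    y = y + h * (A @ y)
print("Carleman estimate of x(T):", y[0])
# Reference closed form: x(t) = a*x0 / ((a + b*x0) * exp(-a*t) - b*x0)
```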

Despite the state-of-the-art performance of deep convolutional neural networks, they are susceptible to bias and malfunction in unseen situations. The complex computation behind their reasoning is not sufficiently human-understandable to develop trust. External explainer methods have tried to interpret network decisions in a human-understandable way, but they are criticized for fallacies arising from their assumptions and simplifications. On the other hand, inherently self-interpretable models, while more robust to such fallacies, cannot be applied to already-trained models. In this work, we propose a new attention-based pooling layer, called Local Attention Pooling (LAP), that achieves self-interpretability and enables knowledge injection while improving the model's performance. Moreover, several weakly supervised knowledge injection methodologies are provided to enhance the training process. We verified our claims by evaluating several LAP-extended models on three different datasets, including ImageNet. The proposed framework offers interpretations that are more valid, human-understandable, and faithful to the model than those of commonly used white-box explainer methods.
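
As a sketch of the general mechanism, not the paper's exact LAP layer, an attention-based pooling layer takes only a few lines of PyTorch: a $1\times 1$ convolution scores each spatial location, the softmax-normalized scores define the pooling weights, and the returned attention map doubles as a built-in saliency explanation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool2d(nn.Module):
    """Generic attention-based pooling: a scoring head produces per-location
    logits; pooling is the attention-weighted average over space, and the
    attention map itself serves as a self-interpretable explanation."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel logit

    def forward(self, x: torch.Tensor):
        # x: (B, C, H, W)
        logits = self.score(x)                          # (B, 1, H, W)
        attn = F.softmax(logits.flatten(2), dim=-1)     # (B, 1, H*W)
        pooled = (x.flatten(2) * attn).sum(dim=-1)      # (B, C)
        return pooled, attn.view(x.shape[0], 1, *x.shape[2:])

feats = torch.randn(4, 256, 14, 14)
pooled, attn_map = AttentionPool2d(256)(feats)
print(pooled.shape, attn_map.shape)  # (4, 256) and (4, 1, 14, 14)
```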

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers, as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and $N$ workers, characterized by a general partial straggler model, and focus on solving a general large-scale machine learning problem with $L$ model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with $L$ coding parameters representing $L$ possibly different diversities for the $L$ coordinates, which generates most existing gradient coding schemes. Then, we consider the minimization of the expected overall runtime and the maximization of the completion probability with respect to the $L$ coding parameters for the coordinates, both of which are challenging discrete optimization problems. To reduce computational complexity, we first transform each into an equivalent but much simpler discrete problem with $N \ll L$ variables representing the partition of the $L$ coordinates into $N$ blocks, each with identical redundancy. This yields an equivalent but more easily implemented block coordinate gradient coding scheme with $N$ coding parameters for the blocks. Then, we adopt continuous relaxation to further reduce computational complexity. For the resulting minimization of the expected overall runtime, we develop an iterative algorithm of computational complexity $\mathcal{O}(N^2)$ to obtain an optimal solution and derive two closed-form approximate solutions, both with computational complexity $\mathcal{O}(N)$. For the resulting maximization of the completion probability, we develop an iterative algorithm of...
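
The resulting block layout is easy to illustrate. The Python sketch below shows only how the $L$ coordinates are partitioned into $N$ blocks and replicated across workers given a hypothetical vector of per-block coding parameters; optimizing those parameters, which is the subject of the paper, is not shown.

```python
import numpy as np

def block_assignment(L: int, N: int, redundancy: np.ndarray):
    """Sketch of a block coordinate gradient coding layout: the L coordinates
    are split into N contiguous blocks, and block b (with coding parameter
    redundancy[b] = r_b) is replicated cyclically on r_b of the N workers.

    `redundancy` is a hypothetical stand-in for the N optimized coding
    parameters from the abstract."""
    blocks = np.array_split(np.arange(L), N)
    workers = [[] for _ in range(N)]
    for b, (block, r) in enumerate(zip(blocks, redundancy)):
        for offset in range(int(r)):        # replicate block on r_b workers
            workers[(b + offset) % N].append(block)
    return workers

# 12 coordinates, 4 workers; later blocks get more redundancy, i.e., more
# straggling can be tolerated for those coordinates.
for i, load in enumerate(block_assignment(12, 4, np.array([1, 1, 2, 3]))):
    print(f"worker {i}: blocks {[list(b) for b in load]}")
```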

Interconnected smart devices and industrial Internet of Things devices require low-latency communication to fulfill control objectives despite limited resources. In essence, such devices are not only time-critical but also require highly accurate data input according to its significance. In this paper, we investigate various coordinated and distributed semantic scheduling schemes from a data-significance perspective. In particular, novel algorithms are proposed to analyze the benefit of such schemes for significance in terms of estimation accuracy. We then derive bounds on the achievable estimation accuracy. Our numerical results showcase the superiority of semantic scheduling policies that adopt an integrated control and communication strategy: such policies can reduce the weighted sum of mean squared errors compared to traditional policies.
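
A toy simulation conveys the idea: each source's estimation error grows until it is scheduled, and a significance-aware policy serves the source with the largest weighted error first. The dynamics and weights below are illustrative, not the paper's system model.

```python
import numpy as np

M, T = 4, 1000
q = np.array([0.1, 0.2, 0.5, 1.0])   # per-source error growth per slot
w = np.array([1.0, 1.0, 2.0, 4.0])   # significance weights

def run(policy):
    """Average weighted sum of errors under a one-channel-per-slot schedule."""
    P = np.zeros(M)                   # error covariance per source
    total = 0.0
    for t in range(T):
        P += q                        # error grows for all sources
        P[policy(P, t)] = 0.0         # the scheduled source is refreshed
        total += (w * P).sum()
    return total / T

print("round robin :", run(lambda P, t: t % M))
print("significance:", run(lambda P, t: int(np.argmax(w * P))))
```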

A $0,1$ matrix is said to be regular if all of its rows and columns have the same number of ones. We prove that for infinitely many integers $k$, there exists a square regular $0,1$ matrix with binary rank $k$, such that the Boolean rank of its complement is $k^{\widetilde{\Omega}(\log k)}$. Equivalently, the ones in the matrix can be partitioned into $k$ combinatorial rectangles, whereas the number of rectangles needed for any cover of its zeros is $k^{\widetilde{\Omega}(\log k)}$. This settles, in a strong form, a question of Pullman (Linear Algebra Appl., 1988) and a conjecture of Hefner, Henson, Lundgren, and Maybee (Congr. Numer., 1990). The result can be viewed as a regular analogue of a recent result of Balodis, Ben-David, G\"{o}\"{o}s, Jain, and Kothari (FOCS, 2021), motivated by the clique vs. independent set problem in communication complexity and by the (disproved) Alon-Saks-Seymour conjecture in graph theory. As an application of the produced regular matrices, we obtain regular counterexamples to the Alon-Saks-Seymour conjecture and prove that for infinitely many integers $k$, there exists a regular graph with biclique partition number $k$ and chromatic number $k^{\widetilde{\Omega}(\log k)}$.

Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer either from a high computational cost that grows exponentially with the number of GCN layers, or from a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this dataset, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs. 1961 seconds) while using much less memory (2.2GB vs. 11.2GB). Furthermore, for training a 4-layer GCN on this dataset, our algorithm can finish in around 36 minutes, while all existing GCN training algorithms fail due to out-of-memory issues. Cluster-GCN also allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16]. Our code is publicly available at //github.com/google-research/google-research/tree/master/cluster_gcn.
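
The sampling scheme amounts to the following Python sketch; the clustering itself (METIS in the paper) and the per-batch training step are stand-ins here, since the point is only that each step touches a single cluster-induced subgraph.

```python
import numpy as np

def cluster_gcn_epoch(A, X, y, clusters, train_step):
    """One epoch of Cluster-GCN's sampling scheme (a sketch): each SGD step is
    restricted to the subgraph induced by one cluster, so memory scales with
    cluster size rather than graph size. `clusters` would come from a graph
    clustering algorithm; `train_step` is any single-batch GCN update (both
    are placeholders)."""
    for c in np.random.permutation(len(clusters)):
        idx = clusters[c]
        A_sub = A[np.ix_(idx, idx)]   # neighborhood search stays in-cluster
        train_step(A_sub, X[idx], y[idx])

# Toy usage: 100 nodes, 4 equal clusters, a step that just reports shapes.
n = 100
A = (np.random.rand(n, n) < 0.05).astype(float)
X, y = np.random.randn(n, 16), np.random.randint(0, 2, n)
clusters = np.array_split(np.random.permutation(n), 4)
cluster_gcn_epoch(A, X, y, clusters,
                  lambda A_s, X_s, y_s: print(A_s.shape, X_s.shape))
```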

Graph Convolutional Networks (GCNs) and their variants have received significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches and, as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity by successively removing nonlinearities and collapsing weight matrices between consecutive layers. We theoretically analyze the resulting linear model and show that it corresponds to a fixed low-pass filter followed by a linear classifier. Notably, our experimental evaluation demonstrates that these simplifications do not negatively impact accuracy in many downstream applications. Moreover, the resulting model scales to larger datasets, is naturally interpretable, and yields up to two orders of magnitude speedup over FastGCN.
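
The collapsed model (known as SGC) is simple enough to state in full: with $\tilde{A} = A + I$ and $S = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, the features $S^K X$ are precomputed once and fed to a plain linear classifier. A minimal numpy sketch:

```python
import numpy as np

def sgc_features(A: np.ndarray, X: np.ndarray, K: int) -> np.ndarray:
    """Collapsing K GCN layers without nonlinearities leaves S^K X, where
    S = D^{-1/2} (A + I) D^{-1/2} is the normalized adjacency with
    self-loops; this is the fixed low-pass filter from the abstract."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    S = A_tilde / np.sqrt(np.outer(d, d))   # D^{-1/2} A~ D^{-1/2}
    for _ in range(K):                      # K filtering steps, no backprop
        X = S @ X
    return X                                # precomputed once per dataset

# The classifier is then ordinary logistic regression on sgc_features(A, X, 2),
# e.g. sklearn.linear_model.LogisticRegression().fit(feats[train], y[train]).
```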
