Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a constrained, nonlinear optimization problem. We also present a parallel algorithm to perform this computation that organizes the processors into a logical grid with twice as many modes as the input tensor. We show that with correct choices of grid dimensions, the communication cost of the algorithm attains the lower bounds and is therefore communication optimal. Finally, we show that our algorithm can significantly reduce communication compared to the straightforward approach of expressing the computation as a sequence of tensor-times-matrix operations.
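The last sentence compares against the straightforward approach of expressing Multi-TTM as a sequence of single-mode TTM operations. That baseline can be sketched in NumPy as a single-process illustration under an assumed convention, $\mathcal{Y} = \mathcal{X} \times_1 A_1 \times_2 A_2 \cdots \times_N A_N$, where each $A_k$ has shape $(J_k, I_k)$. The function names are hypothetical. The sketch does not model the processor grid or the communication-optimal algorithm.

```python
import numpy as np

def ttm(X, A, mode):
    """Mode-`mode` product X x_mode A, with A of shape (J, X.shape[mode])."""
    # Contract A's columns with the chosen mode of X; the new axis J comes first.
    Y = np.tensordot(A, X, axes=(1, mode))
    # Move the new axis back into position `mode`.
    return np.moveaxis(Y, 0, mode)

def multi_ttm(X, matrices):
    """Multi-TTM computed as a sequence of TTMs, one per mode."""
    Y = X
    for mode, A in enumerate(matrices):
        Y = ttm(Y, A, mode)
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
mats = [rng.standard_normal((2, 4)), rng.standard_normal((3, 5)), rng.standard_normal((2, 6))]
Y = multi_ttm(X, mats)
print(Y.shape)  # (2, 3, 2)

# The same result written as one contraction over all modes
Y_ref = np.einsum("abc,ia,jb,kc->ijk", X, *mats)
print(np.allclose(Y, Y_ref))  # True
```

Each pass through the loop materializes an intermediate tensor, here of shape (2, 5, 6) and then (2, 3, 6). In a parallel setting, such intermediates would typically have to be redistributed between steps.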
We introduce a new domain decomposition strategy for time-harmonic Maxwell's equations that is valid for automatically generated subdomain partitions, which may contain cross-points. Convergence of the algorithm is guaranteed, and we present a complete analysis of the matrix form of the method. The method relies on transmission matrices that impose the coupling between subdomains. We discuss the choice of these matrices, their construction, and the impact of this choice on the convergence of the domain decomposition algorithm. Numerical results and algorithms are provided.
Bundle Adjustment (BA) refers to the problem of simultaneously determining sensor poses and scene geometry, a fundamental problem in robot vision. This paper presents an efficient and consistent bundle adjustment method for lidar sensors. The method employs edge and plane features to represent the scene geometry, and directly minimizes the natural Euclidean distance from each raw point to the respective geometry feature. A useful property of this formulation is that the geometry features can be solved analytically, which drastically reduces the dimension of the numerical optimization. To represent and solve the resulting optimization problem more efficiently, this paper then proposes a novel concept, {\it point clusters}, which encodes all raw points associated with the same feature by a compact set of parameters, the {\it point cluster coordinates}. We derive the closed-form derivatives, up to second order, of the BA optimization based on the point cluster coordinates and show their theoretical properties, such as their null spaces and sparsity. Based on these theoretical results, this paper develops an efficient second-order BA solver. Besides estimating the lidar poses, the solver also exploits the second-order information to estimate the pose uncertainty caused by measurement noise, leading to consistent estimates of lidar poses. Moreover, thanks to the use of point clusters, the developed solver fundamentally avoids enumerating each raw point (which is very time-consuming given the large number of points) in all steps of the optimization: cost evaluation, derivative evaluation, and uncertainty evaluation. The implementation of our method is open-sourced to benefit the robotics community and beyond.
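The abstract does not define the point cluster coordinates. The sketch below assumes one natural choice to show how a compact summary can replace enumeration of raw points: the 4×4 matrix $C = \sum_i [p_i; 1][p_i; 1]^T$, which stores $\sum_i p_i p_i^T$, $\sum_i p_i$, and the point count $N$. The function names are hypothetical, and this is not the released implementation. Under a rigid transform $T$, the cluster becomes $T C T^T$. The sum of squared distances to the best-fit plane equals $N$ times the smallest eigenvalue of the point covariance. Both quantities are computed here without touching the raw points.

```python
import numpy as np

def point_cluster(points):
    """Hypothetical cluster coordinates: C = sum_i [p_i; 1][p_i; 1]^T (a 4x4 matrix)."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    return homog.T @ homog  # blocks: sum p p^T, sum p, and N in C[3, 3]

def transform_cluster(C, T):
    """Apply a 4x4 rigid transform T to all points of the cluster at once."""
    return T @ C @ T.T

def plane_cost(C):
    """Sum of squared distances from the points to their best-fit plane."""
    N = C[3, 3]
    mean = C[:3, 3] / N
    cov = C[:3, :3] / N - np.outer(mean, mean)
    return N * np.linalg.eigvalsh(cov)[0]  # smallest eigenvalue (ascending order)

rng = np.random.default_rng(1)
# 500 noisy samples of the plane z = 0
pts = np.column_stack([rng.uniform(-1, 1, 500), rng.uniform(-1, 1, 500),
                       0.01 * rng.standard_normal(500)])
theta = 0.3
T = np.eye(4)
T[:3, :3] = [[np.cos(theta), -np.sin(theta), 0],
             [np.sin(theta), np.cos(theta), 0],
             [0, 0, 1]]
T[:3, 3] = [0.5, -0.2, 1.0]

C = point_cluster(pts)
moved = pts @ T[:3, :3].T + T[:3, 3]
print(np.allclose(transform_cluster(C, T), point_cluster(moved)))  # True

# Brute-force check of the plane cost against the raw transformed points
centered = moved - moved.mean(axis=0)
normal = np.linalg.eigh(centered.T @ centered)[1][:, 0]
print(np.isclose(plane_cost(transform_cluster(C, T)), np.sum((centered @ normal) ** 2)))  # True
```

The cluster stays 4×4 no matter how many points it summarizes, so transforming it and evaluating the cost take constant work per feature.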
Sparse code multiple access (SCMA) is the most prominent scheme among non-orthogonal multiple access (NOMA) technologies for the new 5G wireless air interface. Device-to-device (D2D) communication is another efficient 5G technique, aimed at improving spectral efficiency for local communications. We therefore consider an SCMA cellular network coexisting with D2D communications to meet the connection demands of the Internet of Things (IoT), and aim to improve the system sum-rate performance of this hybrid network. We first derive the information-theoretic expression of the capacity for all users and find the capacity bound of cellular users based on the mutual interference between cellular users and D2D users. We then jointly optimize the transmit powers of cellular and D2D users to maximize the system sum rate. To tackle this non-convex optimization problem, we propose a geometric programming (GP) based iterative power allocation algorithm. Simulation results demonstrate that the proposed algorithm converges quickly and substantially improves the sum-rate performance.
Node clustering is a powerful tool in the analysis of networks. We introduce DIGRAC, a graph neural network framework that obtains node embeddings for directed networks in a self-supervised manner using a novel probabilistic imbalance loss, and that can be used for network clustering. We propose directed flow imbalance measures, which are tightly related to directionality, to reveal clusters in the network even when there is no density difference between clusters. In contrast to standard approaches in the literature, directionality is not treated here as a nuisance but rather carries the main signal. DIGRAC optimizes directed flow imbalance for clustering without requiring label supervision, unlike existing graph neural network methods, and can naturally incorporate node features, unlike existing spectral methods. Extensive experiments on synthetic data, in the form of directed stochastic block models, and on real-world data at different scales demonstrate that our method attains state-of-the-art results on directed graph clustering when compared against 10 state-of-the-art methods from the literature. This holds across a wide range of noise and sparsity levels, graph structures, and topologies, and the method even outperforms supervised methods.
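A simple pairwise measure shows how directionality alone can separate clusters. For illustration, the sketch assumes soft assignments $P$ ($n \times K$) and the imbalance $\mathrm{CI}(k, l) = |W_{kl} - W_{lk}| / (W_{kl} + W_{lk})$, where $W_{kl} = P_{:,k}^T A P_{:,l}$ is the expected flow from cluster $k$ to cluster $l$. DIGRAC's actual probabilistic imbalance loss may normalize differently. The GNN that produces $P$ is not shown, and the function name is hypothetical.

```python
import numpy as np

def flow_imbalance(A, P):
    """Pairwise directed flow imbalance between (soft) clusters.

    A: (n, n) adjacency matrix, A[i, j] = weight of edge i -> j.
    P: (n, K) cluster-membership probabilities, rows summing to 1.
    """
    W = P.T @ A @ P                     # W[k, l]: expected flow from cluster k to l
    total = W + W.T
    with np.errstate(divide="ignore", invalid="ignore"):
        ci = np.abs(W - W.T) / total
    return np.nan_to_num(ci)            # cluster pairs with no flow get imbalance 0

# Every pair of nodes is joined by exactly one edge i -> j (i < j), so edge density
# is identical within and between clusters; only the direction carries signal.
n = 6
A = np.triu(np.ones((n, n)), k=1)
true_labels = np.array([0, 0, 0, 1, 1, 1])
wrong_labels = np.array([0, 1, 0, 1, 0, 1])
ci_true = flow_imbalance(A, np.eye(2)[true_labels])
ci_wrong = flow_imbalance(A, np.eye(2)[wrong_labels])
print(ci_true[0, 1], round(ci_wrong[0, 1], 4))  # 1.0 0.3333
```

A density-based criterion cannot distinguish the two partitions here. The imbalance, however, is maximal for the partition aligned with the flow direction.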
We study two problems of private matrix multiplication over a distributed computing system consisting of a master node and multiple servers that collectively store a family of public matrices using Maximum-Distance-Separable (MDS) codes. In the first problem, Private and Secure Matrix Multiplication from Colluding servers (MDS-C-PSMM), the master intends to compute the product of its confidential matrix $\mathbf{A}$ with a target matrix stored on the servers, without revealing any information about $\mathbf{A}$ or the index of the target matrix to colluding servers. In the second problem, Fully Private Matrix Multiplication from Colluding servers (MDS-C-FPMM), the matrix $\mathbf{A}$ is also selected from another family of public matrices stored at the servers in MDS form. In this case, the indices of both target matrices must be kept private from colluding servers. We develop novel strategies for MDS-C-PSMM and MDS-C-FPMM that simultaneously guarantee information-theoretic data/index privacy and computation correctness. The key ingredient is a careful design of secret sharing schemes for the matrix $\mathbf{A}$ and the private indices, tailored to the matrix multiplication task and the MDS storage structure. With this design, the results returned by the servers can be viewed as evaluations of a polynomial at distinct points, from which the intended result can be obtained through polynomial interpolation. We compare the proposed MDS-C-PSMM strategy with a previous MDS-PSMM strategy that offers a weaker privacy guarantee (non-colluding servers), and demonstrate substantial improvements over it in terms of communication and computation performance.
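A generic Shamir-style sketch over a prime field illustrates the polynomial-interpolation principle. This is a deliberately simplified stand-in, not the MDS-C-PSMM construction. It assumes the servers hold an uncoded public matrix $B$, ignores MDS storage and index privacy, and uses a hypothetical field size $P = 8191$. The master masks $A$ with $T$ uniformly random matrices, so any $T$ colluding servers see only uniformly random shares. Each server's answer is then an evaluation of a degree-$T$ matrix polynomial whose constant term is $AB$.

```python
import numpy as np

P = 8191  # hypothetical prime field size (2**13 - 1)

def share(A, T, points, rng):
    """Shares S(x) = A + R_1 x + ... + R_T x^T (mod P) at each evaluation point."""
    masks = [rng.integers(0, P, size=A.shape) for _ in range(T)]  # uniform random masks
    shares = []
    for x in points:
        S = A % P
        for j, R in enumerate(masks, start=1):
            S = (S + R * pow(x, j, P)) % P
        shares.append(S)
    return shares

def interpolate_at_zero(points, values):
    """Lagrange interpolation mod P, evaluated at x = 0 (the constant term)."""
    result = np.zeros_like(values[0])
    for i, (xi, yi) in enumerate(zip(points, values)):
        num, den = 1, 1
        for j, xj in enumerate(points):
            if j != i:
                num = num * xj % P
                den = den * (xj - xi) % P
        coef = num * pow(den, -1, P) % P  # modular inverse (Python 3.8+)
        result = (result + coef * yi) % P
    return result

rng = np.random.default_rng(0)
A = rng.integers(0, P, size=(2, 3))   # master's confidential matrix
B = rng.integers(0, P, size=(3, 2))   # public matrix held by the servers
T = 2                                  # number of colluding servers tolerated
points = [1, 2, 3]                     # T + 1 distinct nonzero evaluation points
shares = share(A, T, points, rng)
answers = [(S @ B) % P for S in shares]   # each server multiplies its share by B
AB = interpolate_at_zero(points, answers)
print(np.array_equal(AB, (A @ B) % P))  # True
```

Because $S(x)B = AB + \sum_j R_j B\, x^j$ has degree $T$, any $T + 1$ answers determine it, and evaluating it at $x = 0$ returns $AB$.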
The CP decomposition of high-dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantees typically assume restrictive incoherence conditions on the basis vectors of the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition with theoretical guarantees under mild incoherence conditions. Composite PCA applies the principal component or singular value decomposition twice: first to a matrix unfolding of the tensor data to obtain singular vectors, and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization scheme for the tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections onto the orthogonal complements of the spaces generated by the other CP components in the other modes. It is designed to improve the alternating least squares estimator and other forms of high-order orthogonal iteration for tensors with low or moderately high CP ranks. It is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Both algorithms are applicable to deterministic tensors, their noisy versions, and the order-$2K$ covariance tensor of order-$K$ tensor data in a factor model with uncorrelated factors. Our experiments on synthetic data demonstrate the significant practical superiority of our approach over existing methods.
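The two-step structure of composite PCA can be sketched for an order-3 tensor in NumPy. To keep the example checkable, it assumes a noiseless tensor with orthonormal CP components and distinct weights. In this easy case, both SVD steps recover the components exactly, up to sign. The sketch does not cover the non-orthogonal, noisy setting the paper targets, nor the concurrent orthogonalization refinement. The choice of which modes to unfold together is also an assumption.

```python
import numpy as np

def composite_pca(T, R):
    """Composite PCA sketch for an order-3 tensor T of shape (d1, d2, d3)."""
    d1, d2, d3 = T.shape
    # Step 1: SVD of the (d1*d2) x d3 unfolding.
    U, _, Vt = np.linalg.svd(T.reshape(d1 * d2, d3), full_matrices=False)
    A_hat = np.zeros((d1, R))
    B_hat = np.zeros((d2, R))
    for r in range(R):
        # Step 2: fold each left singular vector into a d1 x d2 matrix
        # and take its leading singular vectors.
        u, _, vt = np.linalg.svd(U[:, r].reshape(d1, d2))
        A_hat[:, r], B_hat[:, r] = u[:, 0], vt[0]
    C_hat = Vt[:R].T  # mode-3 estimates come from the right singular vectors
    return A_hat, B_hat, C_hat

rng = np.random.default_rng(2)
d, R = 8, 3
A, B, C = (np.linalg.qr(rng.standard_normal((d, R)))[0] for _ in range(3))
weights = np.array([5.0, 3.0, 1.0])  # distinct and decreasing
T = np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)
A_hat, B_hat, C_hat = composite_pca(T, R)
# Each estimate matches the truth up to sign: |<a_r, a_hat_r>| = 1.
print(np.round(np.abs(np.sum(A * A_hat, axis=0)), 6))  # [1. 1. 1.]
```

With non-orthogonal components, the first-step singular vectors mix the $a_r \otimes b_r$ terms. That mixing is why the paper's guarantees rely on incoherence conditions and why the output is positioned as an initialization.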
Classical results in general equilibrium theory assume divisible goods and convex preferences of market participants. In many real-world markets, however, participants have non-convex preferences, and the allocation problem must account for complex constraints. Electricity markets are a prime example. In such markets, Walrasian equilibrium prices may not exist, and heuristic pricing rules based on the dual of the relaxed allocation problem are used in practice. However, these rules have been criticized for high side-payments and inadequate congestion signals. We show that existing pricing heuristics optimize specific design goals that can conflict with one another. The trade-offs can be substantial, and we establish that the design of pricing rules is fundamentally a multi-objective optimization problem addressing different incentives. In addition to traditional multi-objective optimization techniques that weight the individual objectives, we introduce a novel parameter-free pricing rule that minimizes the incentives of market participants to deviate locally. Our findings show how the new pricing rule capitalizes on the strengths of the existing pricing rules under scrutiny today. It yields prices that incur low make-whole payments while providing adequate congestion signals and low lost opportunity costs. The suggested pricing rule does not require weighting of objectives, is computationally scalable, and balances trade-offs in a principled manner, addressing an important policy issue in electricity markets.
This paper considers a natural fault-tolerant shortest paths problem: for some constant integer $f$, given a directed weighted graph with no negative cycles and two fixed vertices $s$ and $t$, compute (either explicitly or implicitly) for every tuple of $f$ edges, the distance from $s$ to $t$ if these edges fail. We call this problem $f$-Fault Replacement Paths ($f$FRP). We first present an $\tilde{O}(n^3)$ time algorithm for $2$FRP in $n$-vertex directed graphs with arbitrary edge weights and no negative cycles. As $2$FRP is a generalization of the well-studied Replacement Paths problem (RP) that asks for the distances between $s$ and $t$ for any single edge failure, $2$FRP is at least as hard as RP. Since RP in graphs with arbitrary weights is equivalent in a fine-grained sense to All-Pairs Shortest Paths (APSP) [Vassilevska Williams and Williams FOCS'10, J.~ACM'18], $2$FRP is at least as hard as APSP, and thus a substantially subcubic time algorithm in the number of vertices for $2$FRP would be a breakthrough. Therefore, our algorithm in $\tilde{O}(n^3)$ time is conditionally nearly optimal. Our algorithm implies an $\tilde{O}(n^{f+1})$ time algorithm for the $f$FRP problem, giving the first improvement over the straightforward $O(n^{f+2})$ time algorithm. Then we focus on the restriction of $2$FRP to graphs with small integer weights bounded by $M$ in absolute values. Using fast rectangular matrix multiplication, we obtain a randomized algorithm that runs in $\tilde{O}(M^{2/3}n^{2.9153})$ time. This implies an improvement over our $\tilde{O}(n^{f+1})$ time arbitrary weight algorithm for all $f>1$. We also present a data structure variant of the algorithm that can trade off pre-processing and query time. In addition to the algebraic algorithms, we also give an $n^{8/3-o(1)}$ conditional lower bound for combinatorial $2$FRP algorithms in directed unweighted graphs.
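The sketch below implements the straightforward recursive baseline for $f = 2$, not the paper's $\tilde{O}(n^3)$ algorithm. It relies on one observation: only edges on the current shortest path can change the answer. So it recomputes a shortest path after failing each edge of the original path, and again after failing each edge of that replacement path, storing the answers implicitly in a query function. Bellman-Ford is used so that negative weights are allowed. The sketch assumes no negative cycles and no parallel edges. Bellman-Ford is chosen for simplicity, not speed, and the function names are hypothetical.

```python
import math
from itertools import combinations

def shortest_path(n, edges, s, t, banned=frozenset()):
    """Bellman-Ford from s, skipping banned edges; returns (dist, path as edge list)."""
    dist, pred = [math.inf] * n, [None] * n
    dist[s] = 0
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if (u, v) not in banned and dist[u] + w < dist[v]:
                dist[v], pred[v], changed = dist[u] + w, u, True
        if not changed:
            break
    if dist[t] == math.inf:
        return math.inf, []
    path, v = [], t
    while v != s:  # predecessor pointers form a tree when no negative cycle exists
        path.append((pred[v], v))
        v = pred[v]
    return dist[t], path[::-1]

def build_2frp(n, edges, s, t):
    """Straightforward 2FRP oracle built from at most O(n^2) shortest-path calls."""
    d0, p0 = shortest_path(n, edges, s, t)
    d1, p1, d2 = {}, {}, {}
    for e1 in p0:                       # first failure: only edges on p0 matter
        d1[e1], p1[e1] = shortest_path(n, edges, s, t, {e1})
        for e2 in p1[e1]:               # second failure: only edges on p1[e1] matter
            d2[e1, e2] = shortest_path(n, edges, s, t, {e1, e2})[0]

    def query(e_a, e_b):
        """Distance from s to t when edges e_a and e_b both fail."""
        on_path = [e for e in (e_a, e_b) if e in p0]
        if not on_path:
            return d0                   # p0 avoids both failures
        e1 = on_path[0]
        e2 = e_b if e1 == e_a else e_a
        return d2[e1, e2] if e2 in p1[e1] else d1[e1]
    return query

# Small DAG (so no cycles at all) with one negative edge
edges = [(0, 1, 2), (0, 2, 4), (1, 2, -1), (1, 3, 5), (2, 3, 1),
         (2, 4, 6), (3, 4, 2), (1, 4, 9), (0, 3, 7)]
query = build_2frp(5, edges, 0, 4)
print(query((1, 2), (3, 4)))  # 10, via the path 0 -> 2 -> 4
```

Nesting the two loops to $f$ levels gives the recursive structure of the straightforward $f$FRP baseline. The brute-force comparison in the check simply recomputes every pair from scratch.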
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored a (block) LU decomposition: the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we cannot cover all the useful and interesting results concerning matrix decomposition. Given the limited scope of this discussion, we omit topics such as separate analyses of Euclidean, Hermitian, and Hilbert spaces and of the complex domain. We refer the reader to the linear algebra literature for a more detailed introduction to these related fields.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve on existing information-theoretic bounds, are applicable to a wider range of algorithms, and address two key challenges: (a) they give meaningful results for deterministic algorithms, and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical deep learning scenarios.