We prove that every element of the special linear group can be represented as the product of at most six block unitriangular matrices, and that there exist matrices for which six factors are necessary, independent of indexing. We present an analogous result for the general linear group. These results serve as general statements regarding the representational power of alternating linear updates. The factorizations and lower bounds of this work immediately imply tight estimates on the expressive power of linear affine coupling blocks in machine learning.
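To convey the flavor of such factorizations at toy scale, the sketch below (our illustration, not the paper's block construction) factors a $2\times 2$ matrix of determinant one into three unitriangular matrices; the paper's bound of six concerns block unitriangular factors of general matrices.

```python
import numpy as np

# Toy illustration: any A in SL(2, R) with A[1, 0] != 0 factors as
# [[1, x], [0, 1]] @ [[1, 0], [y, 1]] @ [[1, z], [0, 1]].
# The paper's theorem is about *block* unitriangular factors of
# arbitrary matrices; this scalar 2x2 case only conveys the flavor.
a, b, c, d = 2.0, 3.0, 1.0, 2.0            # det = ad - bc = 1
A = np.array([[a, b], [c, d]])

x, y, z = (a - 1) / c, c, (d - 1) / c
E1 = np.array([[1, x], [0, 1]])            # upper unitriangular
E2 = np.array([[1, 0], [y, 1]])            # lower unitriangular
E3 = np.array([[1, z], [0, 1]])            # upper unitriangular

assert np.allclose(E1 @ E2 @ E3, A)        # three factors suffice here
```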
Most of the existing research on degrees of freedom (DoF) with imperfect channel state information at the transmitter (CSIT) assumes that the messages are private, which may not reflect reality, as the two receivers can request the same content. To overcome this limitation, we consider hybrid private and common messages. We characterize the optimal DoF region for the two-user multiple-input multiple-output (MIMO) broadcast channel with hybrid messages and imperfect CSIT. For the DoF converse, we establish a three-step procedure to exploit the utmost possible relaxation. For the DoF achievability, since the DoF region has a specific three-dimensional structure with respect to antenna configurations and CSIT qualities, we divide the CSIT qualities into cases, check the existence of corner-point solutions, and then design a hybrid-messages-aware rate-splitting scheme to achieve them. Moreover, we show that to achieve the strictly positive corner points, it is unnecessary to split the private messages into unicast and multicast parts, because the power allocated to the multicast part should be zero. This implies that adding a common message can reduce the rate-splitting complexity of the private messages.
While research on the geometry of planar graphs has been active in the past decades, many properties of planar metrics remain mysterious. This paper studies a fundamental aspect of planar graph geometry: covering planar metrics by a small collection of simpler metrics. Specifically, a \emph{tree cover} of a metric space $(X, \delta)$ is a collection of trees such that every pair of points $u$ and $v$ in $X$ has a low-distortion path in at least one of the trees. The celebrated ``Dumbbell Theorem'' [ADMSS95] states that any low-dimensional Euclidean space admits a tree cover with $O(1)$ trees and distortion $1+\varepsilon$, for any fixed $\varepsilon \in (0,1)$. This result has found numerous algorithmic applications and has been generalized to the wider family of doubling metrics [BFN19]. Does the same result hold for planar metrics? A positive answer would add further evidence to the well-observed connection between Euclidean/doubling metrics and planar metrics. In this work, we answer this fundamental question affirmatively. Specifically, we show that for any fixed $\varepsilon \in (0,1)$, any planar metric can be covered by $O(1)$ trees with distortion $1+\varepsilon$. Our result for planar metrics follows from a rather general framework: first, we reduce the problem to constructing tree covers with \emph{additive distortion}; then, we introduce the notion of a \emph{shortcut partition} and draw a connection between shortcut partitions and additive tree covers; finally, we prove the existence of a shortcut partition for any planar metric, using new insights regarding the grid-like structure of planar graphs. [...]
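As a toy companion to the definition, the sketch below (our own illustration; all names and representations are ours) measures the distortion of a tree cover by brute force and checks that two spanning paths cover the 4-cycle metric with distortion exactly 1.

```python
import itertools

def tree_dist(tree, u, v):
    # Distance between u and v in a weighted tree given as
    # {node: [(neighbor, weight), ...]}; paths in a tree are unique.
    stack, seen = [(u, 0.0)], {u}
    while stack:
        node, d = stack.pop()
        if node == v:
            return d
        for nbr, w in tree[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append((nbr, d + w))
    return float("inf")

def cover_distortion(metric, points, trees):
    # Worst multiplicative stretch of the cover: for each pair take the
    # best tree; a (1+eps)-tree-cover keeps this quantity below 1+eps.
    return max(min(tree_dist(t, u, v) for t in trees) / metric(u, v)
               for u, v in itertools.combinations(points, 2))

# The 4-cycle is covered exactly by its two "antipodal" spanning paths.
d_cycle = lambda u, v: min(abs(u - v), 4 - abs(u - v))
path1 = {0: [(1, 1)], 1: [(0, 1), (2, 1)], 2: [(1, 1), (3, 1)], 3: [(2, 1)]}
path2 = {1: [(0, 1)], 0: [(1, 1), (3, 1)], 3: [(0, 1), (2, 1)], 2: [(3, 1)]}
print(cover_distortion(d_cycle, range(4), [path1, path2]))  # -> 1.0
```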
Suitable representations of dynamical systems can simplify their analysis and control. Along this line of thought, this paper considers the input decoupling problem for input-affine Lagrangian dynamics, namely the problem of finding a transformation of the generalized coordinates that decouples the input channels. We identify a class of systems for which this problem is solvable. Such systems are called collocated because the decoupling variables correspond to the coordinates on which the actuators directly perform work. Under mild conditions on the input matrix, a simple test is presented to verify whether a system is collocated. By exploiting power invariance, it is proven that a change of coordinates decouples the input channels if and only if the dynamics are collocated. We illustrate the theoretical results on several Lagrangian systems, focusing on underactuated mechanical systems, for which novel controllers that exploit input decoupling are designed.
Forward gradients, i.e., the idea of using directional derivatives computed in forward differentiation mode, have recently been shown to be usable for neural network training while avoiding problems generally associated with backpropagation, such as locking and memory requirements. The cost is the need to guess the step direction, which is hard in high dimensions. Whereas current solutions rely on weighted averages over isotropic distributions of guess vectors, we propose to strongly bias our gradient guesses toward directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer-vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as candidate directions drastically improves on random noise in forward-gradient methods.
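To make the estimator concrete, here is a minimal numpy sketch of forward gradients on a quadratic toy objective; the paper's local auxiliary-network feedback is stood in for by a synthetic guess correlated with the true gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=50)   # parameters of the toy objective f(w) = ||w||^2 / 2
grad = w                  # its true gradient, kept only for comparison

def forward_gradient(v):
    # Forward-gradient estimate: (directional derivative along v) * v.
    # For this f the JVP is just <w, v>; in general one forward-mode
    # sweep computes it without storing activations (no backward pass).
    return (w @ v) * v

# Isotropic guessing: unbiased, but noisy in high dimension.
iso = np.mean([forward_gradient(rng.normal(size=50)) for _ in range(100)], axis=0)

# Biased guessing: a direction correlated with the true gradient,
# standing in for feedback from a small local auxiliary network.
guess = grad + 0.3 * rng.normal(size=50)
biased = forward_gradient(guess / np.linalg.norm(guess))

cos = lambda g: g @ grad / (np.linalg.norm(g) * np.linalg.norm(grad))
print(f"cosine to true gradient: isotropic {cos(iso):.2f}, biased {cos(biased):.2f}")
```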
Hybrid dynamical systems, i.e., systems that have both continuous and discrete states, are ubiquitous in engineering but difficult to work with due to their discontinuous transitions. For example, a robot leg can exert very little control effort while it is in the air compared to when it is on the ground; when the leg hits the ground, the penetrating velocity instantaneously collapses to zero. These instantaneous changes in dynamics and discontinuities (or jumps) in state make standard smooth tools for planning, estimation, control, and learning difficult to apply to hybrid systems. One of the key tools for accounting for these jumps is the saltation matrix: the sensitivity update applied when a hybrid jump occurs, which has been used in a variety of fields including robotics, power circuits, and computational neuroscience. This paper presents an intuitive derivation of the saltation matrix and discusses what it captures, where it has been used in the past, how it is used for linear and quadratic forms, how it is computed for rigid-body systems with unilateral constraints, and some of the structural properties of the saltation matrix in these cases.
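As a concrete instance, the sketch below assembles the saltation matrix for the classic bouncing ball, using the standard expression $\Xi = D_xR + \frac{(f^+ - D_xR\,f^-)\,D_xg}{D_xg\cdot f^-}$ for a time-invariant guard $g$ and reset $R$ (so the $D_t$ terms drop out); $f^\pm$ denote the pre- and post-impact vector fields.

```python
import numpy as np

# Bouncing ball: state x = [height z, velocity v], gravity gam,
# guard g(x) = z (impact at z = 0), reset R(x) = [z, -e*v].
gam, e = 9.81, 0.8
v_minus = -3.0                           # downward velocity at impact

f_minus = np.array([v_minus, -gam])      # flow just before impact
f_plus = np.array([-e * v_minus, -gam])  # flow just after the reset
DR = np.array([[1.0, 0.0], [0.0, -e]])   # Jacobian of the reset map
Dg = np.array([1.0, 0.0])                # gradient of the guard

Xi = DR + np.outer(f_plus - DR @ f_minus, Dg) / (Dg @ f_minus)
print(Xi)   # [[-e, 0], [-(1 + e) * gam / v_minus, -e]]
```

Note that $\Xi \ne D_xR$: the off-diagonal entry records how a perturbation in height shifts the impact time and thereby the post-impact velocity, which is exactly the effect a naive linearization of the reset map alone would miss.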
The training of high-dimensional regression models on comparatively sparse data is an important yet complicated topic, especially when there are many more model parameters than observations in the data. From a Bayesian perspective, inference in such cases can be achieved with the help of shrinkage prior distributions, at least for generalized linear models. However, real-world data usually possess multilevel structures, such as repeated measurements or natural groupings of individuals, which existing shrinkage priors are not built to deal with. We generalize and extend one of these priors, the R2D2 prior by Zhang et al. (2020), to linear multilevel models, leading to what we call the R2D2M2 prior. The proposed prior enables both local and global shrinkage of the model parameters. It comes with interpretable hyperparameters, which we show to be intrinsically related to vital properties of the prior, such as rates of concentration around the origin, tail behavior, and the amount of shrinkage the prior exerts. We offer guidelines on how to select the prior's hyperparameters by deriving shrinkage factors and measuring the effective number of non-zero model coefficients. Hence, the user can readily evaluate and interpret the amount of shrinkage implied by a specific choice of hyperparameters. Finally, we perform extensive experiments on simulated and real data, showing that our inference procedure for the prior is well calibrated, has desirable global and local regularization properties, and enables reliable and interpretable estimation of much more complex Bayesian multilevel models than was previously possible.
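For intuition about the hierarchy, here is a schematic prior-predictive sampler for the single-level R2D2 prior as we read Zhang et al. (2020); the exact parameterization should be checked against that paper, and the multilevel R2D2M2 extension is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def r2d2_prior_draw(p, a=1.0, b=2.0, a_pi=0.5, sigma=1.0):
    # The prior is placed on the coefficient of determination R^2 and
    # propagated to the regression coefficients (parameterization is
    # our reading of Zhang et al., 2020; hyperparameter names are ours).
    r2 = rng.beta(a, b)                    # prior mass on model fit R^2
    omega = r2 / (1.0 - r2)                # global scale: total signal variance
    phi = rng.dirichlet(np.full(p, a_pi))  # local scales: split omega across betas
    psi = rng.exponential(2.0, size=p)     # Exp(1/2) mixing (mean 2)
    return rng.normal(0.0, sigma * np.sqrt(psi * phi * omega / 2.0))

draws = np.array([r2d2_prior_draw(p=20) for _ in range(5000)])
# Heavier tails and a sharper spike at zero than a Gaussian of equal scale.
print((np.abs(draws) < 0.01).mean(), np.quantile(np.abs(draws), 0.99))
```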
In this paper, we study the problems of detection and recovery of hidden submatrices with elevated means inside a large Gaussian random matrix. We consider two different structures for the planted submatrices. In the first model, the planted matrices are disjoint, and their row and column indices can be arbitrary. Inspired by scientific applications, the second model restricts the row and column indices to be consecutive. In the detection problem, under the null hypothesis, the observed matrix is a realization of independent and identically distributed standard normal entries. Under the alternative, there exists a set of hidden submatrices with elevated means inside the same standard normal matrix. Recovery refers to the task of locating the hidden submatrices. For both problems, and for both models, we characterize the statistical and computational barriers by deriving information-theoretic lower bounds, designing and analyzing algorithms matching those bounds, and proving computational lower bounds based on the low-degree polynomials conjecture. In particular, we show that the space of the model parameters (i.e., number of planted submatrices, their dimensions, and elevated mean) can be partitioned into three regions: the impossible regime, where all algorithms fail; the hard regime, where, although detection or recovery is statistically possible, we give evidence that polynomial-time algorithms do not exist; and finally the easy regime, where polynomial-time algorithms exist.
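For the consecutive-indices model, a natural polynomial-time statistic is a scan over all contiguous $k \times k$ blocks. The toy sketch below (our illustration, with arbitrary parameters) computes it with a 2-D prefix sum and shows it separating a planted block from pure noise.

```python
import numpy as np

def scan_statistic(Y, k):
    # Largest average over all k-by-k consecutive submatrices, via a
    # 2-D prefix sum; a detection test thresholds this statistic at a
    # level calibrated under the null (pure standard normal noise).
    S = np.zeros((Y.shape[0] + 1, Y.shape[1] + 1))
    S[1:, 1:] = Y.cumsum(axis=0).cumsum(axis=1)
    block_sums = S[k:, k:] - S[:-k, k:] - S[k:, :-k] + S[:-k, :-k]
    return block_sums.max() / k**2

rng = np.random.default_rng(0)
n, k, mu = 200, 10, 0.6
null = rng.normal(size=(n, n))             # H0: i.i.d. standard normal
alt = rng.normal(size=(n, n))
alt[37:37 + k, 120:120 + k] += mu          # H1: one planted consecutive block
print(scan_statistic(null, k), scan_statistic(alt, k))
```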
We approximate the d complex zeros of a univariate polynomial p(x) of degree d, or those zeros that lie in a fixed region of interest on the complex plane, such as a disc or a square. Our divide-and-conquer algorithm of STOC 1995 solves this problem in optimal Boolean time (up to a poly-logarithmic factor), that is, it runs nearly as fast as one can access the coefficients of p with the precision necessary to support the required accuracy of the output. That record complexity has not been matched by any other algorithm yet, but our root-finder of 1995 is quite involved and has never been implemented. We present alternative nearly optimal root-finders based on our novel variants of the classical subdivision iterations. Unlike our predecessor of 1995, we require randomization of Las Vegas type, which allows us to detect any output error at a dominated computational cost, but our new root-finders are much simpler to implement than their predecessor of 1995. According to the results of extensive tests with standard test polynomials, a preliminary version that incorporates only part of our novel techniques already competes with, and for a large class of inputs significantly outperforms, the package of root-finding subroutines MPSolve, which has been the user's package of choice for decades. Unlike our predecessor of 1995 and all known fast algorithms for the cited tasks of polynomial root-finding, our new algorithms can also be applied to a polynomial given by a black-box oracle for its evaluation rather than by its coefficients. This makes our root-finders particularly efficient for polynomials p(x) that can be evaluated fast, such as the Mandelbrot polynomials or those given by the sum of a small number of shifted monomials. Our algorithm can be readily extended to fast approximation of the eigenvalues of a matrix or a matrix polynomial.
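To convey the flavor of subdivision iterations at toy scale (emphatically not the new algorithm, which adds exclusion tests, acceleration, and rigorous error control), here is a naive quadtree subdivision that counts roots in a square via the argument principle.

```python
import numpy as np

def winding_roots(p, center, half_width, m=1024):
    # Number of roots of p (numpy coefficient order) inside the square,
    # from the total change of arg p(z) around its boundary / (2*pi).
    # Naive sampling: reliable only if no root is too close to the boundary.
    corners = center + half_width * np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
    t = np.linspace(0.0, 1.0, m, endpoint=False)
    z = np.concatenate([a + (b - a) * t
                        for a, b in zip(corners, np.roll(corners, -1))])
    vals = np.polyval(p, z)
    return round(np.sum(np.angle(np.roll(vals, -1) / vals)) / (2 * np.pi))

def subdivide(p, center, half_width, tol=1e-2):
    # Quarter every square that still contains roots; stop at size tol.
    if winding_roots(p, center, half_width) == 0:
        return []                              # exclusion: discard the square
    if half_width < tol:
        return [center]                        # small enough: report the center
    h = half_width / 2
    return [r for dc in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j)
            for r in subdivide(p, center + h * dc, h, tol)]

p = np.poly([0.9 + 0.3j, -0.7 + 0.6j, 0.2 - 0.45j])  # build p from known roots
print(subdivide(p, 0j, 2.0))                         # centers near those roots
```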
In a recent article, the authors showed that the radiative transfer equations with multiple frequencies and scattering can be formulated as a nonlinear integral system. In the present article, the formulation is extended to handle reflective boundary conditions. The fixed-point method used to solve the system is shown to be monotone. The discretization is done with a $P^1$ finite element method. The convolution integrals are precomputed at every vertex of the mesh and stored in compressed hierarchical matrices, using partially pivoted adaptive cross approximation. The fixed-point iterations then involve only matrix-vector products. The method is $O(N\sqrt[3]{N}\ln N)$ with respect to the number of vertices when everything is smooth. A numerical implementation is proposed and tested on two examples. As there are some analogies with ray tracing, the programming is complex.
Since the invention of word2vec, the skip-gram model has significantly advanced research on network embedding, as seen in the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into a matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE is, in theory, a special case of DeepWalk when the size of vertices' context is set to one; (3) as an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; and (4) node2vec factorizes a matrix related to the stationary distribution and transition probability tensor of a second-order random walk. We further provide theoretical connections between skip-gram-based network embedding algorithms and the theory of graph Laplacians. Finally, we present the NetMF method, as well as its approximation algorithm, for computing network embeddings. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram-based network embedding methods, leading to a better understanding of latent network representation learning.
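A compact dense-matrix rendition of the NetMF construction might look as follows (a small-graph sketch; the approximation algorithm for large networks and windows is omitted). Here $b$ plays the role of the number of negative samples and $T$ the skip-gram window size.

```python
import numpy as np

def netmf_embedding(A, dim=2, T=10, b=1.0):
    # NetMF: factorize M = max(vol(G)/(b*T) * sum_{r=1..T} (D^-1 A)^r D^-1, 1)
    # elementwise, then embed with a truncated SVD of log M.
    vol = A.sum()                            # vol(G) = sum of degrees
    d_inv = 1.0 / A.sum(axis=1)
    P = d_inv[:, None] * A                   # random-walk matrix D^-1 A
    S, P_r = np.zeros_like(A), np.eye(len(A))
    for _ in range(T):                       # sum the first T walk powers
        P_r = P_r @ P
        S += P_r
    M = np.maximum(vol / (b * T) * S * d_inv[None, :], 1.0)
    U, s, _ = np.linalg.svd(np.log(M))
    return U[:, :dim] * np.sqrt(s[:dim])     # rows = node embeddings

# Two triangles joined by a single edge separate cleanly in the embedding.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(netmf_embedding(A))
```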