One approach to making progress on the symbolic determinant identity testing (SDIT) problem is to study the structure of singular matrix spaces. After the resolution of the non-commutative rank problem (Garg-Gurvits-Oliveira-Wigderson, Found. Comput. Math. 2020; Ivanyos-Qiao-Subrahmanyam, Comput. Complex. 2018), a natural next step is to understand singular matrix spaces whose non-commutative rank is full. At present, examples of such matrix spaces are mostly sporadic, so it is desirable to discover them in a more systematic way. In this paper, we take a step in this direction by studying the family of matrix spaces that are closed under the commutator operation, that is, matrix Lie algebras. On the one hand, we demonstrate that matrix Lie algebras over the complex number field give rise to singular matrix spaces with full non-commutative rank. On the other hand, we show that SDIT for such spaces can be decided in deterministic polynomial time. Moreover, we characterize the matrix Lie algebras that yield matrix spaces possessing the singularity certificates studied by Lov\'asz (B. Braz. Math. Soc., 1989) and Raz and Wigderson (Building Bridges II, 2019).
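For context (notation ours, not from the paper): given $A_1,\dots,A_m\in\mathbb{C}^{n\times n}$, SDIT asks whether
\[
\det\bigl(x_1A_1+\cdots+x_mA_m\bigr)\;\equiv\;0,
\]
which, over the complex numbers, holds if and only if the matrix space $\mathcal{A}=\operatorname{span}\{A_1,\dots,A_m\}$ is singular, i.e., contains no invertible matrix.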
Any reasonable machine learning (ML) model should not only interpolate efficiently between the training samples provided (the in-distribution region), but also approach the extrapolative or out-of-distribution (OOD) region without becoming overconfident. Our experiment on human subjects indicates that human intelligence exhibits these properties as well. Many state-of-the-art algorithms attempt to fix the overconfidence of ML models in the OOD region, but in doing so they often impair the in-distribution performance of the model. Our key insight is that ML models partition the feature space into polytopes and learn constant (random forests) or affine (ReLU networks) functions over those polytopes. This leads to OOD overconfidence for the polytopes that lie on the boundary of the training data and extend to infinity. To resolve this issue, we propose kernel density methods that fit a Gaussian kernel over each polytope learned by the ML model. Specifically, we introduce two variants of kernel density polytopes: Kernel Density Forest (KDF) and Kernel Density Network (KDN), based on random forests and deep networks, respectively. Studies in various simulation settings show that both KDF and KDN achieve uniform confidence over the classes in the OOD region while maintaining good in-distribution accuracy compared to that of their respective parent models.
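A minimal sketch of the KDF side of this idea, assuming a scikit-learn random forest (illustrative only, not the authors' implementation): each leaf of the forest plays the role of a polytope, a Gaussian is fit over the training points routed to that leaf, and class scores are obtained as weighted kernel-density estimates.

```python
# Illustrative KDF-style sketch (not the authors' code): leaves of a random forest
# stand in for the polytopes, and a Gaussian kernel is fit over each leaf's points.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.ensemble import RandomForestClassifier

def fit_kdf(X, y, n_trees=10):
    d = X.shape[1]
    rf = RandomForestClassifier(n_estimators=n_trees, min_samples_leaf=5).fit(X, y)
    leaves = rf.apply(X)                               # (n_samples, n_trees) leaf ids
    kernels = []                                       # per tree: leaf -> (Gaussian, labels)
    for t in range(n_trees):
        per_leaf = {}
        for leaf in np.unique(leaves[:, t]):
            in_leaf = leaves[:, t] == leaf
            pts = X[in_leaf]
            mu = pts.mean(axis=0)
            cov = np.cov(pts, rowvar=False) if len(pts) > 1 else np.zeros((d, d))
            per_leaf[leaf] = (multivariate_normal(mu, cov + 1e-2 * np.eye(d)),
                              y[in_leaf])
        kernels.append(per_leaf)
    return rf, kernels

def predict_proba_kdf(kernels, X, classes):
    # Sum of leaf Gaussians, each weighted by the class fractions inside the leaf;
    # far from all leaves every density is tiny, so the output tends to uniform.
    scores = np.full((len(X), len(classes)), 1e-12)
    for per_leaf in kernels:
        for gauss, y_leaf in per_leaf.values():
            dens = gauss.pdf(np.atleast_2d(X))
            for c_i, c in enumerate(classes):
                scores[:, c_i] += dens * np.mean(y_leaf == c)
    return scores / scores.sum(axis=1, keepdims=True)
```

The uniform-confidence behaviour in the OOD region comes from the fact that every Gaussian decays away from the training data, unlike the unbounded polytopes of the parent model.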
The noise transition matrix plays a central role in learning from noisy labels, not least because a significant number of existing solutions rely on access to it. Estimating the transition matrix without ground truth labels is a critical and challenging task. When the label noise depends on each instance, identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, we lack a unified understanding of when such a problem remains identifiable, and therefore learnable. This paper seeks to answer a sequence of related questions: What are the primary factors that contribute to the identifiability of a noise transition matrix? Can we explain the observed empirical successes? When a problem is not identifiable, what can we do to make it so? We relate our theoretical findings to the literature and hope to provide guidelines for developing effective solutions for battling instance-dependent label noise.
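For concreteness (standard notation, ours): with clean label $Y$, noisy label $\tilde Y$, and instance $X$, the instance-dependent transition matrix and the induced noisy posterior are
\[
T_{ij}(x)\;=\;\mathbb{P}\bigl(\tilde Y=j\mid Y=i,\,X=x\bigr),
\qquad
\mathbb{P}\bigl(\tilde Y=j\mid X=x\bigr)\;=\;\sum_{i}T_{ij}(x)\,\mathbb{P}\bigl(Y=i\mid X=x\bigr),
\]
and identifiability asks when $T(x)$, and hence the clean posterior, can be recovered from the distribution of $(X,\tilde Y)$ alone.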
For $R\triangleq\mathrm{Mat}_{m}(\mathbb{F})$, the ring of all $m\times m$ matrices over the finite field $\mathbb{F}$ with $|\mathbb{F}|=q$, and the left $R$-module $A\triangleq\mathrm{Mat}_{m,k}(\mathbb{F})$ with $m+1\leqslant k$, Dyshko proved in \cite{3,4}, by deriving the minimal length of solutions of the related isometry equation, that the minimal code length $n$ for which $A^{n}$ fails to satisfy the MacWilliams extension property with respect to the Hamming weight is equal to $\prod_{i=1}^{m}(q^{i}+1)$. In this paper, using M\"{o}bius functions, we derive the minimal length of nontrivial solutions of the isometry equation with respect to a finite lattice. For the finite vector space $\mathbf{H}\triangleq\prod_{i\in\Omega}\mathbb{F}^{k_{i}}$, a poset $\mathbf{P}=(\Omega,\preccurlyeq_{\mathbf{P}})$ and a map $\omega:\Omega\longrightarrow\mathbb{R}^{+}$ give rise to the $(\mathbf{P},\omega)$-weight on $\mathbf{H}$, which was proposed by Hyun, Kim and Park in \cite{18}. For such a weight, we study the relations between the MacWilliams extension property and other properties, including admitting a MacWilliams identity, Fourier-reflexivity of the involved partitions, and the Unique Decomposition Property defined for $(\mathbf{P},\omega)$. We give necessary and sufficient conditions for $\mathbf{H}$ to satisfy the MacWilliams extension property under the additional assumption that either $\mathbf{P}$ is hierarchical or $\omega$ is identically $1$, i.e., the $(\mathbf{P},\omega)$-weight coincides with the $\mathbf{P}$-weight; this further allows us to partly answer a conjecture proposed by Machado and Firer in \cite{22}.
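For instance, instantiating Dyshko's formula with $q=2$ and $m=2$ (so $k\geqslant 3$) gives
\[
\prod_{i=1}^{2}\bigl(2^{i}+1\bigr)\;=\;3\cdot 5\;=\;15,
\]
i.e., $n=15$ is the smallest code length for which $A^{n}$ fails the MacWilliams extension property with respect to the Hamming weight in that case.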
Since the two seminal papers by Fisher (1915, 1921) were published, testing a fixed value of the correlation coefficient of a bivariate normal distribution has been an important statistical problem, and in the framework of asymptotic robust statistics it remains a topic of great interest. For this and other tests focused on paired correlated normal random samples, R\'enyi pseudodistance estimators are proposed, their asymptotic distribution is established, and an iterative algorithm is provided for their computation. From these estimators, Wald-type test statistics are constructed for different problems of interest, and their influence functions are studied theoretically. For testing null correlation in different contexts, an extensive simulation study and two examples based on real data support the robustness properties of our proposal.
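In the scalar case considered here, a Wald-type statistic for $H_0:\rho=\rho_0$ built from a R\'enyi pseudodistance estimator $\hat{\rho}_\alpha$ takes the generic form (the specific estimator and variance are those derived in the paper)
\[
W_n\;=\;\frac{n\,(\hat{\rho}_\alpha-\rho_0)^2}{\hat{\sigma}_\alpha^{2}},
\]
which is asymptotically $\chi^2_1$ under the null, with $\hat{\sigma}_\alpha^{2}$ a consistent estimator of the asymptotic variance of $\hat{\rho}_\alpha$.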
Factorization of matrices in which the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning, and sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity, but their ratios remain fixed, we expect to be able to derive closed-form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix of extensive but usually small rank from noisy measurements. In this paper, we approach both problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed to find an asymptotically exact solution. We provide a systematic way to derive corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail on the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals.
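In the standard formulation of this line of work (normalizations ours), the two problems read
\[
\text{factorization: }\;Y_{\mu i}\sim P_{\mathrm{out}}\!\Bigl(\cdot\;\Big|\;\tfrac{1}{\sqrt{N}}\textstyle\sum_{k}F_{\mu k}X_{k i}\Bigr),
\qquad
\text{denoising: }\;Y\;=\;S+\sqrt{\Delta}\,Z,
\]
with $F$ and $X$ drawn component-wise i.i.d. from the known priors, the aspect ratios held fixed as $N\to\infty$, $S$ a matrix of extensive (but relatively small) rank, and $Z$ Gaussian noise.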
We consider problems that can be formulated as the task of finding an optimal triangulation of a graph with respect to some notion of optimality. We present algorithms for such problems parameterized by the size of a minimum edge clique cover ($cc$). This parameterization occurs naturally in many problems in this setting: e.g., in the perfect phylogeny problem $cc$ is at most the number of taxa, in fractional hypertreewidth $cc$ is at most the number of hyperedges, and in treewidth of Bayesian networks $cc$ is at most the number of non-root nodes. We show that the number of minimal separators of a graph is at most $2^{cc}$ and the number of potential maximal cliques is at most $3^{cc}$, and that these objects can be listed in time $O^*(2^{cc})$ and $O^*(3^{cc})$, respectively, even when no edge clique cover is given as input; the $O^*(\cdot)$ notation omits factors polynomial in the input size. These enumeration algorithms imply $O^*(3^{cc})$ time algorithms for problems such as treewidth, weighted minimum fill-in, and feedback vertex set. For generalized and fractional hypertreewidth we give $O^*(4^m)$ time and $O^*(3^m)$ time algorithms, respectively, where $m$ is the number of hyperedges. When an edge clique cover of size $cc'$ is given as part of the input, we give $O^*(2^{cc'})$ time algorithms for treewidth, minimum fill-in, and chordal sandwich. This implies an $O^*(2^n)$ time algorithm for perfect phylogeny, where $n$ is the number of taxa. We also give polynomial-space algorithms with time complexities $O^*(9^{cc'})$ and $O^*(9^{cc + O(\log^2 cc)})$ for problems in this framework.
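As a quick sanity check of these bounds (example ours): a complete graph is covered by a single clique, so $cc=1$; it has no minimal separators and its only potential maximal clique is the whole vertex set, consistent with the bounds $2^{cc}=2$ and $3^{cc}=3$.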
This paper develops the theory of discrete Dirac reduction of discrete Lagrange-Dirac systems with an abelian symmetry group acting on the configuration space. We begin with the linear theory and then extend it to the nonlinear setting using retraction compatible charts. We consider the reduction of both the discrete Dirac structure and the discrete Lagrange-Pontryagin principle, and show that they both lead to the same discrete Lagrange-Poincar\'e-Dirac equations. The coordinatization of the discrete reduced spaces relies on the notion of discrete connections on principal bundles. Finally, we demonstrate the method by applying it to a charged particle in a magnetic field and to the double spherical pendulum.
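For reference (standard continuous-time form, not the discrete model of the paper), the charged-particle example corresponds to the Lagrangian
\[
L(q,\dot q)\;=\;\tfrac{m}{2}\,\|\dot q\|^{2}+e\,A(q)\cdot\dot q,
\]
with $A$ the magnetic vector potential; the paper treats a discrete analogue of this type of system and reduces it by its abelian symmetry.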
Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms existing methods. Experiments using hundreds of matrices from diverse domains show that it often runs $100\times$ faster than exact matrix products and $10\times$ faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds. These results suggest that a mixture of hashing, averaging, and byte shuffling (the core operations of our method) could be a more promising building block for machine learning than the sparsified, factorized, and/or scalar quantized matrix products that have recently been the focus of substantial research and hardware investment.
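A simplified, product-quantization-style sketch of the hash/average/lookup idea (illustrative; a generic stand-in, not the paper's actual algorithm): rows of $A$ are encoded by their nearest prototype in each subspace, dot products of the prototypes with the known matrix $B$ are precomputed into tables, and the approximate product is then assembled from table lookups and additions.

```python
# Generic PQ-style approximate matrix multiply (illustrative stand-in, not the
# paper's method): encode rows of A by prototypes, precompute prototype-B products,
# then answer A @ B with lookups and additions only.
import numpy as np

def train_prototypes(A_train, n_subspaces=4, n_protos=16, iters=10):
    # Assumes A_train is float-valued and has at least n_protos rows.
    splits = np.array_split(np.arange(A_train.shape[1]), n_subspaces)
    protos = []
    for idx in splits:
        sub = A_train[:, idx]
        centers = sub[np.random.choice(len(sub), n_protos, replace=False)].copy()
        for _ in range(iters):                       # a few Lloyd (k-means) iterations
            assign = np.argmin(((sub[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            for k in range(n_protos):
                if np.any(assign == k):
                    centers[k] = sub[assign == k].mean(axis=0)
        protos.append((idx, centers))
    return protos

def encode(A, protos):
    codes = np.empty((len(A), len(protos)), dtype=np.int64)
    for s, (idx, centers) in enumerate(protos):
        sub = A[:, idx]
        codes[:, s] = np.argmin(((sub[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    return codes

def build_tables(B, protos):
    # tables[s][k, j] = inner product of prototype k (subspace s) with column j of B.
    return [centers @ B[idx, :] for idx, centers in protos]

def approx_matmul(codes, tables):
    out = np.zeros((codes.shape[0], tables[0].shape[1]))
    for s, table in enumerate(tables):
        out += table[codes[:, s]]                    # lookup-and-accumulate, no multiplies
    return out
```

Note that this generic sketch still uses multiplications during encoding; the paper's hashing-based encoder is what removes them entirely.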
We show that for the problem of testing whether a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of its entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works over any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound (KDD'14) that holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm that does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle:=\mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
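In the notation above (definitions standard in property testing): $A$ is $\epsilon$-far from rank at most $d$ if every $B$ with $\operatorname{rank}(B)\leq d$ differs from $A$ in at least an $\epsilon$-fraction of entries, and a sensing query with matrix $X_i$ returns
\[
\langle X_i, A\rangle\;=\;\mathrm{tr}\bigl(X_i^{\top}A\bigr)\;=\;\sum_{j,k}(X_i)_{jk}\,A_{jk}.
\]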
Review-based recommender systems have gained noticeable ground in recent years. In addition to rating scores, these systems are enriched with textual evaluations of items by the users. Neural language processing models, on the other hand, have already found application in recommender systems, mainly as a means of encoding user preference data, with the actual textual descriptions of items serving only as side information. In this paper, a novel approach to incorporating such models into the recommendation process is presented. Initially, a neural language processing model, specifically the paragraph vector model, is used to encode textual user reviews of variable length into feature vectors of fixed length. Subsequently, this information is fused with the rating scores in a probabilistic matrix factorization algorithm based on maximum a posteriori estimation. The resulting system, ParVecMF, is compared to a ratings-matrix factorization approach on a reference dataset. The preliminary results obtained on two metrics are encouraging and may stimulate further research in this area.
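A toy sketch of the fusion step (our simplification, not the exact ParVecMF model): user- and item-level review texts are encoded with gensim's paragraph vector implementation (Doc2Vec), and the resulting vectors act as Gaussian prior means on the latent factors of a MAP-estimated matrix factorization of the ratings.

```python
# Toy sketch of paragraph-vector + MAP matrix factorization fusion (our
# simplification, not the ParVecMF model itself).
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def encode_texts(texts, dim=16):
    # texts: one concatenated review string per user (or per item).
    docs = [TaggedDocument(words=t.lower().split(), tags=[i]) for i, t in enumerate(texts)]
    d2v = Doc2Vec(docs, vector_size=dim, min_count=1, epochs=40)
    return np.vstack([d2v.infer_vector(t.lower().split()) for t in texts])

def map_factorize(R, mask, U_prior, V_prior, lam=0.1, lr=0.01, epochs=300):
    # R: (n_users, n_items) ratings; mask: 1 where a rating is observed.
    # U_prior / V_prior: review-derived prior means for user / item factors.
    U, V = U_prior.copy(), V_prior.copy()
    for _ in range(epochs):
        E = mask * (R - U @ V.T)                  # residuals on observed entries only
        U += lr * (E @ V - lam * (U - U_prior))   # gradient step of the MAP objective
        V += lr * (E.T @ U - lam * (V - V_prior))
    return U, V                                   # predicted ratings: U @ V.T
```

Prediction is then $\hat R = UV^\top$, with the review-derived priors pulling the factors of sparsely rated users and items toward the text representation.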