We present a thorough study of the theoretical properties of, and devise efficient algorithms for, the \emph{persistent Laplacian}, an extension of the standard combinatorial Laplacian to the setting of pairs (or, more generally, sequences) of simplicial complexes $K \hookrightarrow L$, which was independently introduced by Lieutier et al. and by Wang et al. In particular, in analogy with the non-persistent case, we first prove that the nullity of the $q$-th persistent Laplacian $\Delta_q^{K,L}$ equals the $q$-th persistent Betti number of the inclusion $K \hookrightarrow L$. We then present an initial algorithm for finding a matrix representation of $\Delta_q^{K,L}$, which itself helps interpret the persistent Laplacian. We exhibit a novel relationship between the persistent Laplacian and the Schur complement of a matrix, which has several important implications. In the graph case, it both uncovers a link with the notion of effective resistance and leads to a persistent version of the Cheeger inequality. This relationship also yields an additional, very simple algorithm for finding (a matrix representation of) the $q$-th persistent Laplacian, which in turn leads to a fundamentally different algorithm for computing the $q$-th persistent Betti number of a pair $(K,L)$ that can be significantly more efficient than standard algorithms. Finally, we study persistent Laplacians for simplicial filtrations and present novel stability results for their eigenvalues. Our work brings methods from spectral graph theory, circuit theory, and persistent homology together with a topological view of the combinatorial Laplacian on simplicial complexes.
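For illustration, here is a minimal NumPy sketch of the Schur-complement route to the persistent Laplacian described above (the function names and the use of a Moore-Penrose pseudoinverse are our own choices, not the paper's implementation):

```python
import numpy as np

def persistent_laplacian(Bq_K, Bq1_L, K_idx):
    """Sketch of the q-th persistent Laplacian of an inclusion K -> L via a Schur complement.

    Bq_K  : boundary matrix of K in dimension q (rows: (q-1)-simplices, cols: q-simplices of K)
    Bq1_L : boundary matrix of L in dimension q+1 (rows: q-simplices, cols: (q+1)-simplices of L)
    K_idx : row indices of Bq1_L corresponding to the q-simplices that already lie in K
    """
    up_L = Bq1_L @ Bq1_L.T                      # up Laplacian of L in dimension q
    n = up_L.shape[0]
    K_set = set(K_idx)
    out_idx = [i for i in range(n) if i not in K_set]

    # Block decomposition with respect to q-simplices inside / outside K.
    A = up_L[np.ix_(K_idx, K_idx)]
    B = up_L[np.ix_(K_idx, out_idx)]
    C = up_L[np.ix_(out_idx, out_idx)]

    # Persistent up Laplacian as a (generalized) Schur complement; pinv handles singular C.
    up_KL = A - B @ np.linalg.pinv(C) @ B.T

    # Add the down Laplacian of K to obtain the full persistent Laplacian.
    return up_KL + Bq_K.T @ Bq_K

def persistent_betti(Delta, tol=1e-8):
    """Persistent Betti number as the (numerical) nullity of the persistent Laplacian."""
    return int(np.sum(np.linalg.eigvalsh(Delta) < tol))
```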
Boolean Matrix Factorization (BMF) aims to approximate a given binary matrix as the Boolean product of two low-rank binary matrices. Binary data is ubiquitous in many fields, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, and computer graphics, among many others. Unfortunately, BMF is computationally hard, and heuristic algorithms are used to compute Boolean factorizations. Very recently, a theoretical breakthrough was obtained independently by two research groups: Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) showed that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite its theoretical importance, the double-exponential dependence of the running times on the rank makes these algorithms unimplementable in practice. The primary research question motivating our work is whether the theoretical advances on BMF can lead to practical algorithms. The main conceptual contribution of our work is the following: while the EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms can serve as the basis for designing better heuristics. We also use this strategy to develop new algorithms for the related $\mathbb{F}_p$-Matrix Factorization problem. Here, given a matrix $A$ over a finite field GF($p$), where $p$ is a prime, and an integer $r$, the objective is to find a matrix $B$ over the same field with GF($p$)-rank at most $r$ minimizing some norm of $A-B$. Our empirical research on synthetic and real-world data demonstrates the advantage of the new algorithms over previous works on BMF and $\mathbb{F}_p$-Matrix Factorization.
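For concreteness, a small sketch of the objects involved (the helper names are ours, not code from either paper): the Boolean product, the BMF approximation error, and the corresponding error over GF($p$).

```python
import numpy as np

def boolean_product(U, V):
    """Boolean product: entry (i, j) is 1 iff U[i, k] = V[k, j] = 1 for some k."""
    return (U.astype(int) @ V.astype(int) > 0).astype(int)

def bmf_error(A, U, V):
    """Number of entries on which the Boolean factorization U o V disagrees with A."""
    return int(np.sum(A != boolean_product(U, V)))

def gfp_error(A, B, p):
    """Number of entries on which A and B differ modulo p (GF(p) setting, where B is
    constrained to have GF(p)-rank at most r)."""
    return int(np.sum((A - B) % p != 0))
```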
We consider a causal inference model in which individuals interact in a social network and may not comply with the assigned treatments. Estimating causal parameters is challenging in the presence of network interference of unknown form, as each individual may be influenced in complex ways by both nearby and distant individuals. Noncompliance with treatment assignment further complicates this problem, and prior methods that handle network spillovers but disregard noncompliance may underestimate the effect of treatment receipt on the outcome. To estimate meaningful causal parameters, we introduce a new concept of exposure mapping, which summarizes potentially complicated spillover effects into a fixed-dimensional statistic of instrumental variables. We investigate identification conditions for the intention-to-treat effect and the average causal effect for compliers, while explicitly allowing for misspecification of the exposure mapping. Based on our identification results, we develop nonparametric estimation procedures via inverse probability weighting. Their asymptotic properties, including consistency and asymptotic normality, are investigated using an approximate neighborhood interference framework, which is convenient for dealing with unknown forms of spillovers between individuals. For an empirical illustration, we apply our method to experimental data from an anti-conflict school intervention program.
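As a purely illustrative sketch (the estimator form, the joint propensities `p1`/`p0`, and all names below are our assumptions, not the paper's estimator), an inverse-probability-weighted intention-to-treat contrast at a fixed value of the exposure mapping might look as follows.

```python
import numpy as np

def ipw_itt(Y, Z, exposure, e_value, p1, p0):
    """Hajek-style IPW contrast of mean outcomes between assigned and unassigned units,
    holding the exposure-mapping statistic fixed at e_value.

    Y        : observed outcomes
    Z        : binary treatment assignments (the instrument)
    exposure : exposure-mapping statistic summarizing neighbours' assignments, one per unit
    p1, p0   : P(Z_i = 1, exposure_i = e_value) and P(Z_i = 0, exposure_i = e_value),
               assumed known from the experimental design or estimated separately
    """
    at_e = (exposure == e_value)
    w1 = (Z * at_e) / p1
    w0 = ((1 - Z) * at_e) / p0
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)
```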
Bi-quadratic programming over unit spheres is a fundamental problem in quantum mechanics, introduced in pioneering work by Einstein, Schr\"odinger, and others. It has been shown to be NP-hard, so it must be solved by efficient heuristic algorithms such as the block improvement method (BIM). This paper focuses on the maximization of bi-quadratic forms, which leads to a rank-one approximation problem that is equivalent to computing the M-spectral radius and its corresponding eigenvectors. Specifically, we provide a tight upper bound on the M-spectral radius of nonnegative fourth-order partially symmetric (PS) tensors, which can be considered as an approximation of the M-spectral radius. Furthermore, we show that the proposed upper bound can be obtained more efficiently if the nonnegative fourth-order PS tensor is a member of certain monoid semigroups. As an extension of the proposed upper bound, we derive the exact values of the M-spectral radius and its corresponding M-eigenvectors for certain classes of fourth-order PS tensors. Lastly, as an application of the proposed bound, we obtain a practically testable sufficient condition for nonsingular elasticity M-tensors to satisfy the strong ellipticity condition. We conduct several numerical experiments to demonstrate the utility of the proposed results. The results show that: (a) the proposed method attains a tight upper bound on the M-spectral radius with little computational burden, and (b) such tight and efficient upper bounds greatly enhance the convergence speed of the BIM algorithm, making it applicable to large-scale problems.
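To make the objects concrete, here is an illustrative sketch (our own; it reproduces neither the paper's bound nor its exact algorithm) of evaluating a bi-quadratic form and of one block-improvement-style update, which reduces each block maximization to a leading-eigenvector computation.

```python
import numpy as np

def biquadratic_form(A, x, y):
    """Evaluate A x y x y = sum_{ijkl} A[i,j,k,l] x_i y_j x_k y_l for a fourth-order
    partially symmetric tensor A of shape (n, m, n, m)."""
    return np.einsum('ijkl,i,j,k,l->', A, x, y, x, y)

def bim_step(A, x, y):
    """One BIM-style alternating update: with y fixed, maximizing over the unit sphere
    amounts to the leading eigenvector of M_y[i,k] = sum_{jl} A[i,j,k,l] y_j y_l,
    and symmetrically for x."""
    M_y = np.einsum('ijkl,j,l->ik', A, y, y)
    x = np.linalg.eigh((M_y + M_y.T) / 2)[1][:, -1]
    M_x = np.einsum('ijkl,i,k->jl', A, x, x)
    y = np.linalg.eigh((M_x + M_x.T) / 2)[1][:, -1]
    return x, y
```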
Are intelligent machines really intelligent? Is the underlying philosophical concept of intelligence satisfactory for describing how the present systems work? Is understanding a necessary and sufficient condition for intelligence? If a machine could understand, should we attribute subjectivity to it? This paper addresses the problem of deciding whether the so-called "intelligent machines" are capable of understanding, instead of merely processing signs. It deals with the relationship between syntax and semantics. The main thesis concerns the inevitability of semantics for any discussion about the possibility of building conscious machines, condensed into the following two tenets: "If a machine is capable of understanding (in the strong sense), then it must be capable of combining rules and intuitions"; "If semantics cannot be reduced to syntax, then a machine cannot understand." Our conclusion states that it is not necessary to attribute understanding to a machine in order to explain its exhibited "intelligent" behavior; a merely syntactic and mechanistic approach to intelligence as a task-solving tool suffices to justify the range of operations that it can display in the current state of technological development.
In the pursuit of explaining implicit regularization in deep learning, prominent focus has been given to matrix and tensor factorizations, which correspond to simplified neural networks. It has been shown that these models exhibit an implicit tendency towards low matrix and tensor rank, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization that discourages locality and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of the conventional wisdom that architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
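As a toy illustration of the phenomenon in its simplest (matrix) form, not of the paper's hierarchical tensor model, the following sketch runs gradient descent on a depth-3 matrix factorization fitted to a subset of entries of a rank-1 target; with small initialization and step size, the learned product tends to have low effective rank.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
n, depth, lr, steps = 20, 3, 0.003, 50000
target = np.outer(rng.standard_normal(n), rng.standard_normal(n))  # rank-1 ground truth
mask = rng.random((n, n)) < 0.3                                    # observe ~30% of entries

chain = lambda mats: reduce(np.matmul, mats, np.eye(n))            # product of a list of matrices
Ws = [0.1 * rng.standard_normal((n, n)) for _ in range(depth)]     # small (near-zero) initialization

for _ in range(steps):
    G = mask * (chain(Ws[::-1]) - target)        # masked residual (gradient of the half squared loss)
    Ws = [W - lr * chain(Ws[i + 1:][::-1]).T @ G @ chain(Ws[:i][::-1]).T
          for i, W in enumerate(Ws)]             # simultaneous update of all factors

print(np.linalg.svd(chain(Ws[::-1]), compute_uv=False)[:5])        # spectrum of the learned product
```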
We introduce a neural implicit framework that exploits the differentiable properties of neural networks and the discrete geometry of point-sampled surfaces to approximate such surfaces as the level sets of neural implicit functions. To train a neural implicit function, we propose a loss functional that approximates a signed distance function and allows for terms with high-order derivatives, such as the alignment between the principal directions of curvature, in order to learn more geometric detail. During training, we use a non-uniform sampling strategy based on the curvatures of the point-sampled surface to prioritize points with more geometric detail. This sampling yields faster learning while preserving geometric accuracy compared with previous approaches. We also present analytical differential geometry formulas for neural surfaces, such as normal vectors and curvatures.
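A minimal PyTorch-style sketch of a signed-distance training loss of the general kind described (our own simplification: it contains only first-order terms; the curvature-alignment terms mentioned above would add second-order derivatives on top of this):

```python
import torch

def sdf_loss(model, surf_pts, surf_normals, space_pts, w_normal=1.0, w_eik=0.1):
    """Neural implicit f = model: vanish on surface samples, align grad f with the
    given normals, and enforce the Eikonal property |grad f| = 1 off the surface."""
    surf_pts = surf_pts.requires_grad_(True)
    f_surf = model(surf_pts)
    grad_surf = torch.autograd.grad(f_surf.sum(), surf_pts, create_graph=True)[0]

    space_pts = space_pts.requires_grad_(True)
    f_space = model(space_pts)
    grad_space = torch.autograd.grad(f_space.sum(), space_pts, create_graph=True)[0]

    loss_level = (f_surf ** 2).mean()                                  # f = 0 on the surface
    loss_normal = (1 - torch.nn.functional.cosine_similarity(
        grad_surf, surf_normals, dim=-1)).mean()                       # normal alignment
    loss_eik = ((grad_space.norm(dim=-1) - 1) ** 2).mean()             # Eikonal term
    return loss_level + w_normal * loss_normal + w_eik * loss_eik
```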
In this work we are interested in general linear inverse problems in which the corresponding forward problem is solved iteratively using fixed-point methods. In this setting, one-shot methods, which iterate simultaneously on the forward-problem solution and on the inverse-problem unknown, can be applied. We analyze two variants of the so-called multi-step one-shot methods and establish sufficient conditions on the descent step for their convergence by studying the eigenvalues of the block matrix of the coupled iterations. Several numerical experiments are provided to illustrate the convergence of these methods in comparison with the classical usual and shifted gradient descent. In particular, we observe that very few inner iterations on the forward problem are enough to guarantee good convergence of the inversion algorithm.
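Schematically (the function names and the precise update below are our assumptions, not the paper's formulation), a multi-step one-shot iteration interleaves a few fixed-point iterations on the forward problem with each descent step on the unknown:

```python
import numpy as np

def multi_step_one_shot(p0, u0, forward_step, misfit_grad, tau, n_inner, n_outer):
    """forward_step(u, p): one fixed-point iteration of the forward problem.
    misfit_grad(u, p)   : (approximate) gradient of the data misfit w.r.t. the unknown p,
                          evaluated at the current, not fully converged, forward state u.
    tau                 : descent step size."""
    u, p = np.array(u0, dtype=float), np.array(p0, dtype=float)
    for _ in range(n_outer):
        for _ in range(n_inner):            # only a few inner iterations on the forward problem
            u = forward_step(u, p)
        p = p - tau * misfit_grad(u, p)     # descent step on the inverse-problem unknown
    return p, u
```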
Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. As a result, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. Because our new method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood and a Bayesian approach based on Markov chain Monte Carlo. Through three case studies, we provide a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of model parameters, based on independent observed data. Our approach has a computational cost comparable to analysis of models that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how analysis of random parameter models can aid better understanding of the sources of heterogeneity in biological data.
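The idea can be sketched as follows (an illustrative approximation of our own, assuming a single normally distributed model parameter with hyperparameters $(\mu, \sigma)$ and a user-supplied solver `solve_ode`): simulate the model over draws of the random parameter, match the first two moments of the outputs, and score the data against the resulting Gaussian approximation.

```python
import numpy as np
from scipy.stats import norm

def approx_loglik(data, hyperparams, solve_ode, n_samples=200, rng=None):
    """Moment-matching approximate log-likelihood for a random-parameter ODE model."""
    rng = np.random.default_rng() if rng is None else rng
    mu, sigma = hyperparams
    params = rng.normal(mu, sigma, size=n_samples)                 # draws of the random parameter
    outputs = np.array([solve_ode(theta) for theta in params])     # shape (n_samples, n_timepoints)
    m = outputs.mean(axis=0)                                       # matched first moment
    v = outputs.var(axis=0) + 1e-10                                # matched second moment
    return float(np.sum(norm.logpdf(data, loc=m, scale=np.sqrt(v))))
```

Profile likelihood or MCMC can then be run directly on this approximate log-likelihood as a function of the hyperparameters.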
Developing new ways to estimate probabilities can be valuable for science, statistics, and engineering. By considering the information content of different output patterns, recent work invoking algorithmic information theory has shown that a priori probability predictions based on pattern complexities can be made for a broad class of input-output maps. These algorithmic probability predictions do not depend on detailed knowledge of how output patterns were produced, or on historical statistical data. Although quantitatively fairly accurate, a main weakness of these predictions is that they are given as an upper bound on the probability of a pattern; many low-complexity, low-probability patterns occur for which the upper bound has little predictive value. Here we study this low-complexity, low-probability phenomenon through example maps, namely a finite state transducer, natural time series data, RNA molecule structures, and polynomial curves. We identify some mechanisms causing low-complexity, low-probability behaviour and argue that this behaviour should be assumed as a default in real-world algorithmic probability studies. Additionally, we examine some applications of algorithmic probability and discuss implications of low-complexity, low-probability patterns for several research areas, including simplicity in physics and biology, a priori probability predictions, Solomonoff induction and Occam's razor, machine learning, and password guessing.
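A crude way to probe this phenomenon on a given map (illustrative only; the compression-based complexity proxy and the names below are ours) is to estimate output probabilities by sampling random inputs and compare them against a complexity estimate, looking for outputs whose probability falls far below the complexity-based upper bound.

```python
import random
import zlib
from collections import Counter

def complexity(pattern: str) -> float:
    """Crude proxy for descriptional complexity: bit length of the zlib-compressed pattern."""
    return 8 * len(zlib.compress(pattern.encode()))

def output_probabilities(f, input_len, n_samples=100_000, seed=0):
    """Estimate P(x) for outputs x of the map f by sampling binary inputs uniformly."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        bits = ''.join(rng.choice('01') for _ in range(input_len))
        counts[f(bits)] += 1
    return {x: c / n_samples for x, c in counts.items()}
```

Plotting log P(x) against complexity(x) then separates patterns that track the upper bound from the low-complexity, low-probability outliers discussed above.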
Since deep neural networks were developed, they have made huge contributions to everyday life. Machine learning offers advice more rational than humans can provide in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks remain challenging and unpredictable procedures. To lower the technical threshold for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academia and industry. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure and discusses their importance and methods for defining their value ranges. The paper then focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. It next reviews major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, compatibility with major deep learning frameworks, and extensibility for user-designed modules. The paper concludes with open problems in applying HPO to deep learning, a comparison of optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
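As a minimal, illustrative baseline for what an HPO loop looks like (our own sketch, not tied to any toolkit discussed in the review), random search over a user-defined search space:

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """space: dict mapping hyper-parameter names to either a (low, high) range or a list of
    discrete choices.  objective(config) -> validation score to maximize."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float('-inf')
    for _ in range(n_trials):
        cfg = {name: (rng.choice(spec) if isinstance(spec, list) else rng.uniform(*spec))
               for name, spec in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```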