Algebraic varieties are the geometric shapes defined by systems of polynomial equations; they are ubiquitous across mathematics and science. Amongst these algebraic varieties are Q-Fano varieties: positively curved shapes which have Q-factorial terminal singularities. Q-Fano varieties are of fundamental importance in geometry as they are "atomic pieces" of more complex shapes - the process of breaking a shape into simpler pieces in this sense is called the Minimal Model Programme. Despite their importance, the classification of Q-Fano varieties remains unknown. In this paper we demonstrate that machine learning can be used to understand this classification. We focus on 8-dimensional positively curved algebraic varieties that have toric symmetry and Picard rank 2, and develop a neural network classifier that predicts with 95% accuracy whether or not such an algebraic variety is Q-Fano. We use this to give a first sketch of the landscape of Q-Fano varieties in dimension 8. How the neural network is able to detect Q-Fano varieties with such accuracy remains mysterious, and hints at some deep mathematical theory waiting to be uncovered. Furthermore, when visualised using the quantum period, an invariant that has played an important role in recent theoretical developments, we observe that the Q-Fano varieties identified by the classifier appear to fall within a bounded region and to be stratified by the Fano index. This suggests that it may be possible to state, and eventually prove, completeness conjectures for this classification. Inspired by the ML analysis, we formulate and prove a new global combinatorial criterion for a positively curved toric variety of Picard rank 2 to have terminal singularities. Together with the first sketch of the landscape of Q-Fano varieties in higher dimensions, this gives new evidence that machine learning can be an essential tool in developing mathematical conjectures and accelerating theoretical discovery.
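As an illustration of the kind of classifier described above, the sketch below trains a small feed-forward network on flattened 2 x 10 integer weight matrices, one natural way to encode a Picard-rank-2 toric variety of dimension 8 as a feature vector. This is only a hedged sketch: the architecture, the encoding, and the random placeholder data and labels are assumptions, not the authors' pipeline.

```python
# A hedged sketch, not the authors' model: a small feed-forward classifier that
# takes the flattened 2 x 10 integer weight matrix of a Picard-rank-2 toric
# variety in dimension 8 as its feature vector and predicts a Q-Fano label.
# The weight matrices and labels below are random placeholders; real training
# data would come from a labelled set of such varieties.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(-5, 6, size=(5000, 2 * 10)).astype(float)  # placeholder weight matrices
y = rng.integers(0, 2, size=5000)                           # placeholder Q-Fano / not-Q-Fano labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=100, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))   # ~0.5 on random placeholder labels
```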
We propose, analyze, and test new iterative solvers for large-scale systems of linear algebraic equations arising from the finite element discretization of reduced optimality systems defining the finite element approximations to the solution of elliptic tracking-type distributed optimal control problems with both the standard $L_2$ regularization and the more general energy regularization. If we aim at an approximation of the given desired state $y_d$ by the computed finite element state $y_h$ that asymptotically differs from $y_d$ in the order of the best $L_2$ approximation, at acceptable cost for the control, then the optimal choice of the regularization parameter $\varrho$ is linked to the mesh-size $h$ by the relations $\varrho=h^4$ and $\varrho=h^2$ for the $L_2$ and the energy regularization, respectively. For this setting, we can construct efficient parallel iterative solvers for the reduced finite element optimality systems. These results can be generalized to variable regularization parameters adapted to the local behavior of the mesh-size, which can vary strongly under adaptive mesh refinement. Similar results can be obtained for the space-time finite element discretization of the corresponding parabolic and hyperbolic optimal control problems.
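To make the coupling between $\varrho$ and $h$ concrete, here is a small, hedged sketch for an assumed 1D model problem (distributed control of the Poisson equation with homogeneous Dirichlet data and $L_2$ regularization): after eliminating the control, the reduced optimality system is a symmetric indefinite 2x2 block system in the state and adjoint variables, assembled with P1 finite elements and $\varrho = h^4$. The parallel preconditioned solvers developed in the paper are not reproduced; a sparse direct solve stands in for them.

```python
# Hedged sketch for an assumed 1D model problem:
# minimize 1/2 ||y - y_d||^2 + rho/2 ||u||^2 subject to -y'' = u on (0,1), y(0)=y(1)=0.
# Eliminating the control u = p / rho from the discrete KKT conditions gives the
# symmetric indefinite block system
#   [ M          K  ] [y]   [ M y_d ]
#   [ K  -(1/rho) M ] [p] = [   0   ],
# assembled below with P1 finite elements and the coupling rho = h^4.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n = 64                                    # number of elements
h = 1.0 / n
x = np.linspace(h, 1.0 - h, n - 1)        # interior nodes
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n - 1, n - 1)) / h      # stiffness matrix
M = sp.diags([1.0, 4.0, 1.0], [-1, 0, 1], shape=(n - 1, n - 1)) * h / 6.0  # mass matrix
rho = h ** 4                              # L2-regularization parameter tied to the mesh-size
yd = np.sin(np.pi * x)                    # assumed desired state

A = sp.bmat([[M, K], [K, -(1.0 / rho) * M]], format="csc")
b = np.concatenate([M @ yd, np.zeros(n - 1)])
sol = spsolve(A, b)                       # direct solve in place of the paper's iterative solvers
y, p = sol[: n - 1], sol[n - 1:]
err = y - yd
print("||y_h - y_d||_{L2} ~", np.sqrt(err @ (M @ err)))
```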
We propose a novel stochastic algorithm that randomly samples entire rows and columns of the matrix as a way to approximate an arbitrary matrix function using the power series expansion. This contrasts with existing Monte Carlo methods, which work with only one entry at a time; sampling whole rows and columns yields a significantly better convergence rate than the original approach. To assess the applicability of our method, we compute the subgraph centrality and total communicability of several large networks. In all benchmarks analyzed, our method significantly outperformed the competing approaches and scaled efficiently to 64 CPU cores.
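For orientation, the sketch below shows the classical one-entry-at-a-time Monte Carlo estimator that the abstract contrasts with: weighted random walks give unbiased samples of the terms $(A^k)_{ii}/k!$ in the truncated power series for the subgraph centrality $(e^A)_{ii}$. The authors' row-and-column sampling scheme refines this idea; the graph, truncation order, and walk count below are assumptions chosen for illustration only.

```python
# Classical power-series Monte Carlo (the one-entry-at-a-time baseline, not the
# authors' row-and-column scheme): estimate the subgraph centrality (e^A)_{ii}
# by weighted random walks that unbiasedly sample the terms (A^k)_{ii} / k!.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 50
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T              # adjacency matrix of a small random graph

def subgraph_centrality_mc(A, i, n_walks=20000, K=15):
    n = A.shape[0]
    row_sums = A.sum(axis=1)
    est = 0.0
    for _ in range(n_walks):
        node, weight, kfact = i, 1.0, 1.0
        total = 1.0                          # k = 0 term: (A^0)_{ii} = 1
        for k in range(1, K + 1):
            if row_sums[node] == 0:          # dead end: all remaining terms vanish
                break
            probs = A[node] / row_sums[node]
            nxt = rng.choice(n, p=probs)
            weight *= row_sums[node]         # importance weight A[node, nxt] / probs[nxt]
            kfact *= k
            node = nxt
            if node == i:
                total += weight / kfact      # unbiased sample of (A^k)_{ii} / k!
        est += total
    return est / n_walks

i = 0
print("Monte Carlo estimate:", subgraph_centrality_mc(A, i))
print("Exact (e^A)_{ii}    :", expm(A)[i, i])
```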
This work puts forth low-complexity Riemannian subspace descent algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold. Unlike existing Riemannian gradient descent variants, the proposed approach utilizes carefully chosen subspaces that allow each update to be written as a product of the Cholesky factor of the iterate and a sparse matrix. The resulting updates avoid costly matrix operations such as matrix exponentiation and dense matrix multiplication, which are required by almost all other Riemannian optimization algorithms on the SPD manifold. We further identify a broad class of functions, arising in diverse applications such as kernel matrix learning, covariance estimation of Gaussian distributions, maximum likelihood parameter estimation of elliptically contoured distributions, and parameter estimation in Gaussian mixture models, for which the Riemannian gradients can be calculated efficiently. The proposed uni-directional and multi-directional Riemannian subspace descent variants incur per-iteration complexities of $O(n)$ and $O(n^2)$, respectively, compared to the $O(n^3)$ or higher complexity incurred by all existing Riemannian gradient descent variants. The superior runtime and low per-iteration complexity of the proposed algorithms are also demonstrated via numerical tests on large-scale covariance estimation and matrix square root problems. A MATLAB implementation is publicly available on GitHub: https://github.com/yogeshd-iitk/subspace_descent_over_SPD_manifold
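For contrast with the proposed method, here is a hedged sketch of the $O(n^3)$ baseline the abstract refers to: standard Riemannian gradient descent under the affine-invariant metric on the SPD manifold, applied to the Gaussian maximum-likelihood covariance objective $f(S)=\log\det S + \operatorname{tr}(S^{-1}C)$, whose minimizer is the sample covariance $C$. The objective, step size, and problem sizes are assumptions; the paper's subspace descent replaces the dense exponential-map update below with sparse updates of the Cholesky factor.

```python
# O(n^3) baseline (not the proposed subspace descent): affine-invariant Riemannian
# gradient descent on the SPD manifold for Gaussian ML covariance estimation,
# f(S) = logdet(S) + tr(S^{-1} C), whose minimizer is the sample covariance C.
import numpy as np
from scipy.linalg import expm, sqrtm

rng = np.random.default_rng(2)
n, m = 30, 500
X = rng.standard_normal((m, n))
C = X.T @ X / m                          # sample covariance (the minimizer)

S = np.eye(n)                            # initial iterate on the SPD manifold
eta = 0.5
for _ in range(60):
    # Riemannian gradient under the affine-invariant metric:
    # S * (S^{-1} - S^{-1} C S^{-1}) * S = S - C
    G = S - C
    S_half = np.real(sqrtm(S))
    S_half_inv = np.linalg.inv(S_half)
    # exponential-map (geodesic) update: dense sqrtm/expm cost O(n^3) per iteration
    S = S_half @ expm(-eta * S_half_inv @ G @ S_half_inv) @ S_half

print("||S - C||_F after Riemannian GD:", np.linalg.norm(S - C))
```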
Deep learning methods have gained considerable interest in the numerical solution of various partial differential equations (PDEs). One particular focus is physics-informed neural networks (PINNs), which integrate physical principles into neural networks and thereby turn the solution of a PDE into an optimization problem. To address a collection of advection-diffusion equations (ADEs) in a range of difficult settings, this paper proposes a novel network structure that combines the multi-scale deep neural network (MscaleDNN) solver used in the PINN method with a hard-constraint technique, yielding HCPINN. This method introduces a revised formulation of the desired solution for the ADE, using a loss function that incorporates the residuals of the governing equation and penalizes deviations from the specified boundary and initial constraints. By satisfying the boundary constraints automatically, this method improves the accuracy and efficiency of the PINN technique. To address the ``spectral bias'' phenomenon in neural networks, a subnetwork structure of MscaleDNN and a Fourier-induced activation function are incorporated into HCPINN, resulting in a hybrid approach called SFHCPINN. The effectiveness of SFHCPINN is demonstrated through various numerical experiments involving ADEs in different dimensions. The numerical results indicate that SFHCPINN outperforms both standard PINN and its subnetwork version with Fourier feature embedding, achieving high accuracy and efficiency while effectively handling complex boundary conditions and high-frequency scenarios.
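The following is a minimal PyTorch sketch of the two ingredients named above, a hard-constraint ansatz and a Fourier-type first layer, for an assumed 1D advection-diffusion model problem $u_t + a u_x = \kappa u_{xx}$ on $(0,1)\times(0,1]$ with $u(x,0)=\sin(\pi x)$ and zero Dirichlet boundary values. It is not the authors' SFHCPINN architecture: the network, ansatz, and coefficients are assumptions, and only the PDE residual is penalized because the initial and boundary data are built into the ansatz.

```python
# Hedged PINN sketch with a hard-constraint ansatz and a sine (Fourier-type) first layer,
# for an assumed 1D advection-diffusion problem; not the authors' SFHCPINN.
import torch
import torch.nn as nn

torch.manual_seed(0)
a, kappa = 1.0, 0.1                          # assumed advection speed and diffusivity

class FourierMLP(nn.Module):
    """Small MLP whose first layer uses a sine activation as a Fourier-type feature map."""
    def __init__(self, width=64):
        super().__init__()
        self.first = nn.Linear(2, width)
        self.body = nn.Sequential(nn.Linear(width, width), nn.Tanh(),
                                  nn.Linear(width, width), nn.Tanh(),
                                  nn.Linear(width, 1))

    def forward(self, xt):
        return self.body(torch.sin(self.first(xt)))

net = FourierMLP()

def u_hat(x, t):
    """Hard-constraint ansatz: exact at t = 0 and on the spatial boundary x = 0, 1."""
    return torch.sin(torch.pi * x) + t * x * (1.0 - x) * net(torch.cat([x, t], dim=1))

def pde_residual(x, t):
    u = u_hat(x, t)
    u_x, u_t = torch.autograd.grad(u, (x, t), torch.ones_like(u), create_graph=True)
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t + a * u_x - kappa * u_xx

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(1024, 1, requires_grad=True)   # interior collocation points
    t = torch.rand(1024, 1, requires_grad=True)
    loss = pde_residual(x, t).pow(2).mean()       # only the PDE residual is penalized
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(step, loss.item())
```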
We study hypothesis testing under communication constraints, where each sample is quantized before being revealed to a statistician. Without communication constraints, it is well known that the sample complexity of simple binary hypothesis testing is characterized by the Hellinger distance between the distributions. We show that the sample complexity of simple binary hypothesis testing under communication constraints is at most a logarithmic factor larger than in the unconstrained setting and this bound is tight. We develop a polynomial-time algorithm that achieves the aforementioned sample complexity. Our framework extends to robust hypothesis testing, where the distributions are corrupted in the total variation distance. Our proofs rely on a new reverse data processing inequality and a reverse Markov inequality, which may be of independent interest. For simple $M$-ary hypothesis testing, the sample complexity in the absence of communication constraints has a logarithmic dependence on $M$. We show that communication constraints can cause an exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive algorithms.
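As a small numeric illustration (not taken from the paper), the snippet below computes the squared Hellinger distance between two assumed discrete distributions, which determines the unconstrained sample complexity up to constants, and then simulates a toy 1-bit quantizer followed by a likelihood-ratio test as an example of a communication-constrained scheme. The distributions, quantizer, and sample sizes are assumptions.

```python
# Toy illustration: sample complexity ~ 1 / H^2(p, q), plus a 1-bit quantized test.
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.3, 0.3])

H2 = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)   # squared Hellinger distance
print("H^2(p, q) =", H2, "  ~ samples needed:", int(1 / H2))

# Toy communication constraint: each sample is reduced to the single bit 1{X = 2}
# before testing; the statistician then runs a likelihood-ratio test on the bits.
n = int(5 / H2)

def error_rate(dist_true, trials=2000):
    errs = 0
    for _ in range(trials):
        bits = rng.choice(3, size=n, p=dist_true) == 2
        llr = np.sum(np.where(bits, np.log(p[2] / q[2]), np.log((1 - p[2]) / (1 - q[2]))))
        errs += (llr > 0) != (dist_true is p)        # decide "p" when llr > 0
    return errs / trials

print("error when data ~ p:", error_rate(p), "  error when data ~ q:", error_rate(q))
```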
A major technique in learning-augmented online algorithms is combining multiple algorithms or predictors. Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination that follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. Against the best (in hindsight) unconstrained combination of $\ell$ predictors, we obtain a competitive ratio of $O(\ell^2)$, and show that this is best possible. However, for a benchmark with a slightly constrained number of switches between different predictors, we can obtain a $(1+\epsilon)$-competitive algorithm. Moreover, our algorithms can be adapted to access predictors in a bandit-like fashion, querying only one predictor at a time. An unexpected implication of one of our lower bounds is a new structural insight about covering formulations for the $k$-server problem.
Spatial data can come in a variety of different forms, but two of the most common generating models for such observations are random fields and point processes. Whilst it is known that spectral analysis can unify these two data forms, specific methodology for the related estimation has yet to be developed. In this paper, we solve this problem by extending multitaper estimation to estimate the spectral density matrix function for multivariate spatial data, where the processes can be any combination of point processes and random fields. We discuss finite-sample and asymptotic theory for the proposed estimators, as well as specific details of the implementation, including how to perform estimation on non-rectangular domains and how to implement multitapering correctly for processes sampled in different ways, e.g. continuously vs. on a regular grid.
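As a point of reference, the following is a hedged sketch of standard multitaper spectral estimation for a single real-valued random field observed on a regular rectangular grid, using separable 2D tapers built from 1D discrete prolate spheroidal (Slepian) sequences. The grid size, bandwidth, number of tapers, and white-noise test field are assumptions; the paper's multivariate extension to point processes, mixed sampling schemes, and non-rectangular domains is not reproduced here.

```python
# Standard multitaper spectral estimate for a gridded random field (assumed setting).
import numpy as np
from scipy.signal.windows import dpss

rng = np.random.default_rng(4)
n1, n2 = 128, 128
field = rng.standard_normal((n1, n2))        # placeholder white-noise field

NW, K = 3.0, 5                               # time-bandwidth product, number of tapers per axis
tapers1 = dpss(n1, NW, K)                    # (K, n1) Slepian sequences
tapers2 = dpss(n2, NW, K)

spec = np.zeros((n1, n2))
for k1 in range(K):
    for k2 in range(K):
        taper2d = np.outer(tapers1[k1], tapers2[k2])     # separable 2-D taper
        taper2d /= np.sqrt(np.sum(taper2d ** 2))         # unit-energy normalization
        J = np.fft.fft2(field * taper2d)                 # tapered Fourier transform
        spec += np.abs(J) ** 2
spec /= K * K                                # multitaper estimate: average of eigenspectra

print("mean estimated spectrum (white noise should be flat, ~ 1):", spec.mean())
```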
We investigate the performance of two approximation algorithms for the Hafnian of a nonnegative square matrix, namely the Barvinok and Godsil-Gutman estimators. We observe that, while there are examples of matrices for which these algorithms fail to provide a good approximation, the algorithms perform surprisingly well for adjacency matrices of random graphs. In most cases, the Godsil-Gutman estimator provides far superior accuracy. For dense graphs, however, both estimators demonstrate a slow growth of the variance. For complete graphs, we show analytically that the relative variance $\sigma / \mu$ grows as the square root of the size of the graph. Finally, we simulate a Gaussian Boson Sampling experiment using the Godsil-Gutman estimator and show that the technique can successfully reproduce low-order correlation functions.
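For concreteness, here is a hedged sketch of both estimators in the simple unbiased form they are usually presented in: weight the upper triangle of $A$ entrywise by independent random signs (Godsil-Gutman) or standard Gaussians (Barvinok), antisymmetrize, and take the determinant, which equals the squared Pfaffian and has expectation equal to the hafnian of $A$. The graph size and sample counts are assumptions chosen so the estimate can be checked against a brute-force hafnian.

```python
# Hafnian estimators: draw an antisymmetric W with W_ij = g_ij * sqrt(A_ij) for i < j
# (g_ij = +/-1 for Godsil-Gutman, standard Gaussian for Barvinok); then
# det(W) = Pf(W)^2 is an unbiased estimator of haf(A).
import numpy as np

rng = np.random.default_rng(5)

def hafnian_exact(A):
    """Brute-force hafnian: sum over perfect matchings (small matrices only)."""
    n = A.shape[0]
    if n == 0:
        return 1.0
    total = 0.0
    for j in range(1, n):
        rest = [k for k in range(1, n) if k != j]
        total += A[0, j] * hafnian_exact(A[np.ix_(rest, rest)])
    return total

def hafnian_estimate(A, samples=50000, gaussian=False):
    n = A.shape[0]
    upper = np.triu(np.sqrt(A), 1)
    ests = np.empty(samples)
    for s in range(samples):
        G = rng.standard_normal((n, n)) if gaussian else rng.choice([-1.0, 1.0], size=(n, n))
        W = np.triu(G, 1) * upper
        W = W - W.T                           # antisymmetric random weighting of A
        ests[s] = np.linalg.det(W)            # det(W) = Pf(W)^2, unbiased for haf(A)
    return ests.mean(), ests.std() / np.sqrt(samples)

A = (rng.random((10, 10)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                # adjacency matrix of a small random graph

print("exact hafnian          :", hafnian_exact(A))
print("Godsil-Gutman estimate : %.2f +/- %.2f" % hafnian_estimate(A, gaussian=False))
print("Barvinok estimate      : %.2f +/- %.2f" % hafnian_estimate(A, gaussian=True))
```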
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
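To make the implicit-regularization principle concrete, here is a tiny sketch (our illustration, not an excerpt from the survey): on an overparametrized linear least-squares problem, gradient descent started from zero interpolates the data and converges to the minimum-norm solution given by the pseudoinverse. The dimensions, step size, and iteration count are assumptions.

```python
# Implicit regularization in the simplest setting: gradient descent from zero on an
# overparametrized least-squares problem converges to the minimum-norm interpolator.
import numpy as np

rng = np.random.default_rng(6)
n, d = 30, 200                           # more parameters than data points
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)
lr = 1e-3
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)          # gradient step on L(w) = ||Xw - y||^2 / 2

w_min_norm = np.linalg.pinv(X) @ y       # minimum-norm interpolating solution
print("training residual            :", np.linalg.norm(X @ w - y))
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
```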
Deep learning is usually described as an experiment-driven field under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns about ethics and security and their relationships with generalizability.