For every constant $d$, we design a subexponential time deterministic algorithm that takes as input a multivariate polynomial $f$ given as a constant depth algebraic circuit over the field of rational numbers, and outputs all irreducible factors of $f$ of degree at most $d$ together with their respective multiplicities. Moreover, if $f$ is a sparse polynomial, then the algorithm runs in quasipolynomial time. Our results are based on a more fine-grained connection between polynomial identity testing (PIT) and polynomial factorization in the context of constant degree factors and rely on a clean connection between divisibility testing of polynomials and PIT due to Forbes and on subexponential time deterministic PIT algorithms for constant depth algebraic circuits from the recent work of Limaye, Srinivasan and Tavenas.
Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent ascent, which have strong last-iterate and ergodic convergence properties for zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence for numerical optimization reasons and relevance as modeling real-word learning in games, in this paper, we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: we prove that extragradient RM$^{+}$ and smooth Predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms, and show that they enjoy linear-rate last-iterate convergence.
It is well-known that the approximate factor models have the rotation indeterminacy. It has been considered that the principal component (PC) estimators estimate some rotations of the true factors and factor loadings, but the rotation matrix commonly used in the literature depends on the PC estimator itself. This raises a question: what does the PC estimator consistently estimate? This paper aims to explore the answer. We first show that, assuming a quite general weak factor model with the $r$ signal eigenvalues diverging possibly at different rates, there always exists a unique rotation matrix composed only of the true factors and loadings, such that it rotates the true model to the identifiable model satisfying the standard $r^2$ restrictions. We call the rotated factors and loadings the pseudo-true parameters. We next establish the consistency and asymptotic normality of the PC estimator for this pseudo-true parameter. The results give an answer for the question: the PC estimator consistently estimates the pseudo-true parameter. We also investigate similar problems in the factor augmented regression. Finite sample experiments confirm the excellent approximation of the theoretical results.
In this note, we connect two different topics from linear algebra and numerical analysis: hypocoercivity of semi-dissipative matrices and strong stability for explicit Runge--Kutta schemes. Linear autonomous ODE systems with a non-coercive matrix are called hypocoercive if they still exhibit uniform exponential decay towards the steady state. Strong stability is a property of time-integration schemes for ODEs that preserve the temporal monotonicity of the discrete solutions. It is proved that explicit Runge--Kutta schemes are strongly stable with respect to semi-dissipative, asymptotically stable matrices if the hypocoercivity index is sufficiently small compared to the order of the scheme. Otherwise, the Runge--Kutta schemes are in general not strongly stable. As a corollary, explicit Runge--Kutta schemes of order $p\in 4\N$ with $s=p$ stages turn out to be \emph{not} strongly stable. This result was proved in \cite{AAJ23}, filling a gap left open in \cite{SunShu19}. Here, we present an alternative, direct proof.
Fr\'echet regression has received considerable attention to model metric-space valued responses that are complex and non-Euclidean data, such as probability distributions and vectors on the unit sphere. However, existing Fr\'echet regression literature focuses on the classical setting where the predictor dimension is fixed, and the sample size goes to infinity. This paper proposes sparse Fr\'echet sufficient dimension reduction with graphical structure among high-dimensional Euclidean predictors. In particular, we propose a convex optimization problem that leverages the graphical information among predictors and avoids inverting the high-dimensional covariance matrix. We also provide the Alternating Direction Method of Multipliers (ADMM) algorithm to solve the optimization problem. Theoretically, the proposed method achieves subspace estimation and variable selection consistency under suitable conditions. Extensive simulations and a real data analysis are carried out to illustrate the finite-sample performance of the proposed method.
The Weisfeiler-Leman (WL) dimension of a graph parameter $f$ is the minimum $k$ such that, if $G_1$ and $G_2$ are indistinguishable by the $k$-dimensional WL-algorithm then $f(G_1)=f(G_2)$. The WL-dimension of $f$ is $\infty$ if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers from a fixed conjunctive query to the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$. The works of Dvor\'ak (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (ArXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph of $\varphi$. In this work, we give a characterisation that applies to all conjunctive qureies. Given any conjunctive query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathsf{sew}(\varphi)$, a novel width measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller than $\mathsf{sew}(\varphi)$.
Neural operators have been applied in various scientific fields, such as solving parametric partial differential equations, dynamical systems with control, and inverse problems. However, challenges arise when dealing with input functions that exhibit heterogeneous properties, requiring multiple sensors to handle functions with minimal regularity. To address this issue, discretization-invariant neural operators have been used, allowing the sampling of diverse input functions with different sensor locations. However, existing frameworks still require an equal number of sensors for all functions. In our study, we propose a novel distributed approach to further relax the discretization requirements and solve the heterogeneous dataset challenges. Our method involves partitioning the input function space and processing individual input functions using independent and separate neural networks. A centralized neural network is used to handle shared information across all output functions. This distributed methodology reduces the number of gradient descent back-propagation steps, improving efficiency while maintaining accuracy. We demonstrate that the corresponding neural network is a universal approximator of continuous nonlinear operators and present four numerical examples to validate its performance.
We present a structure-preserving Eulerian algorithm for solving $L^2$-gradient flows and a structure-preserving Lagrangian algorithm for solving generalized diffusions. Both algorithms employ neural networks as tools for spatial discretization. Unlike most existing methods that construct numerical discretizations based on the strong or weak form of the underlying PDE, the proposed schemes are constructed based on the energy-dissipation law directly. This guarantees the monotonic decay of the system's energy, which avoids unphysical states of solutions and is crucial for the long-term stability of numerical computations. To address challenges arising from nonlinear neural-network discretization, we first perform temporal discretization on these variational systems. This approach is computationally memory-efficient when implementing neural network-based algorithms. The proposed neural-network-based schemes are mesh-free, allowing us to solve gradient flows in high dimensions. Various numerical experiments are presented to demonstrate the accuracy and energy stability of the proposed numerical schemes.
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.