This article presents an in-depth educational overview of the latest mathematical developments in coupled cluster (CC) theory, beginning with Schneider's seminal work from 2009 that introduced the first local analysis of CC theory. We offer a tutorial review of second quantization and the CC ansatz, laying the groundwork for understanding the mathematical basis of the theory. This is followed by a detailed exploration of the most recent mathematical advancements in CC theory. Our review starts with an in-depth look at the local analysis pioneered by Schneider, which has since been applied to analyze various CC methods. We then move on to discuss the graph-based framework for CC methods developed by Csirik and Laestadius. This framework provides a comprehensive platform for comparing different CC methods, including multireference approaches. Next, we delve into the most recent numerical analysis of the single-reference CC method developed by Hassan, Maday, and Wang. This very general approach is based on the invertibility of the Fr\'echet derivative of the CC function. We conclude the article with a discussion of the recent incorporation of algebraic geometry into CC theory, highlighting how this novel and fundamentally different mathematical perspective has furthered our understanding and provides exciting pathways to new computational approaches.
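For orientation, the CC ansatz referred to above can be stated in its standard textbook form (this display is included for the reader's convenience and is not specific to any of the works surveyed):
\[
|\Psi_{\mathrm{CC}}\rangle \;=\; e^{\hat{T}}\,|\Phi_0\rangle,
\qquad
\hat{T} \;=\; \sum_{k=1}^{N} \hat{T}_k,
\qquad
\hat{T}_1 \;=\; \sum_{i,a} t_i^a\, \hat{a}_a^{\dagger}\hat{a}_i,\ \ldots
\]
where $|\Phi_0\rangle$ is a reference Slater determinant, $\hat{T}_k$ excites $k$ electrons from occupied to virtual orbitals, and the cluster amplitudes $t$ are the unknowns determined by the CC equations.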
We study existence and computability of finite bases for ideals of polynomials over infinitely many variables. In our setting, variables come from a countable logical structure A, and embeddings from A to A act on polynomials by renaming variables. First, we give a necessary and sufficient condition on A that guarantees the following generalisation of Hilbert's Basis Theorem: every polynomial ideal which is equivariant, i.e. invariant under renaming of variables, is finitely generated. Second, we develop an extension of the classical Buchberger algorithm to compute a Gr\"obner basis of a given equivariant ideal. This implies decidability of the membership problem for equivariant ideals. Finally, we sketch various applications of these results to register automata, Petri nets with data, orbit-finitely generated vector spaces, and orbit-finite systems of linear equations.
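As a minimal, purely classical illustration of the objects involved (finitely many variables only, using SymPy's built-in Gröbner routine; the equivariant, infinitely-many-variables extension described above is not implemented here):

```python
# Classical setting that the abstract generalises: compute a Groebner basis
# and decide ideal membership for an ideal in finitely many variables.
# The generators below are a hypothetical example.
from sympy import symbols, groebner, expand

x, y, z = symbols('x y z')
generators = [x**2 + y*z, x*y - z, y**2 - x*z]

G = groebner(generators, x, y, z, order='lex')
print(G)  # a lexicographic Groebner basis of the ideal

# Membership test: p lies in the ideal iff its normal form w.r.t. G is zero.
p = expand(x**3 + x*y*z)
print(G.contains(p))
```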
Effective application of mathematical models to interpret biological data and make accurate predictions often requires that model parameters are identifiable. Approaches to assess the so-called structural identifiability of models are well-established for ordinary differential equation models, yet there are no commonly adopted approaches that can be applied to assess the structural identifiability of the partial differential equation (PDE) models that are requisite to capture spatial features inherent to many phenomena. The differential algebra approach to structural identifiability has recently been demonstrated to be applicable to several specific PDE models. In this brief article, we present a general methodology for performing structural identifiability analysis on partially observed reaction-advection-diffusion (RAD) PDE models that are linear in the unobserved quantities. We show that the differential algebra approach can always, in theory, be applied to such models. Moreover, despite the perceived complexity introduced by the addition of advection and diffusion terms, the spatial analogues of non-spatial models are never less structurally identifiable than their non-spatial counterparts. We conclude by discussing future possibilities and the computational cost of performing structural identifiability analysis on more general PDE models.
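As a concrete (purely illustrative, not taken from the article) instance of the class of models covered, consider a partially observed RAD system in which the unobserved species $v$ enters linearly:
\[
\begin{aligned}
\partial_t u &= D_u\,\partial_x^2 u \;-\; a\,\partial_x u \;+\; \theta_1 u \;-\; \theta_2\, u\, v,\\
\partial_t v &= D_v\,\partial_x^2 v \;+\; \theta_3 u \;-\; \theta_4 v,\\
y(x,t) &= u(x,t).
\end{aligned}
\]
Structural identifiability asks whether the parameters $(D_u, D_v, a, \theta_1,\dots,\theta_4)$ are uniquely determined by perfect, noise-free knowledge of the observed output $y$; the differential algebra approach answers this by eliminating the unobserved $v$ to obtain input-output relations in $y$ alone.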
Generally, to apply the MUltiple SIgnal Classification (MUSIC) algorithm for the rapid imaging of small inhomogeneities, the complete set of elements of the multi-static response (MSR) matrix must be collected. However, in real-world applications such as microwave imaging or bistatic measurement configurations, the diagonal elements of the MSR matrix are unknown. Nevertheless, it is possible to obtain imaging results with the traditional approach, but the theoretical reason for its applicability has not yet been investigated. In this paper, we establish mathematical structures of the MUSIC imaging function obtained from an MSR matrix without diagonal elements in both transverse magnetic (TM) and transverse electric (TE) polarizations. The established structures explain why the locations and shapes of small inhomogeneities can be retrieved via MUSIC without the diagonal elements of the MSR matrix. In addition, they reveal the intrinsic properties of the imaging and its fundamental limitations. Results of numerical simulations are provided to support the identified structures.
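A schematic sketch of how a MUSIC imaging function is evaluated from an MSR matrix may help fix ideas. The configuration below (far-field plane-wave directions, a point-like test vector, a known number of inhomogeneities) is a hypothetical illustration of the setup, not a reproduction of the paper's analysis; the `drop_diagonal` switch simply zeroes the diagonal to mimic the missing-data setting:

```python
# Schematic single-frequency MUSIC imaging from an MSR matrix K, where
# K[m, n] is the scattered field for incident direction theta_n measured
# in direction theta_m (hypothetical TM far-field configuration).
import numpy as np

def music_image(K, wavenumber, directions, grid_points, n_targets, drop_diagonal=False):
    if drop_diagonal:
        K = K - np.diag(np.diag(K))          # remove the (unknown) diagonal entries
    U, s, Vh = np.linalg.svd(K)
    U_sig = U[:, :n_targets]                 # signal subspace from the leading singular vectors
    image = np.empty(len(grid_points))
    for i, z in enumerate(grid_points):
        f = np.exp(1j * wavenumber * directions @ z)   # test (steering) vector at point z
        f = f / np.linalg.norm(f)
        # Distance of f to the signal subspace; peaks of 1/distance indicate the targets.
        proj = U_sig @ (U_sig.conj().T @ f)
        image[i] = 1.0 / np.linalg.norm(f - proj)
    return image
```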
The Monte Carlo algorithm is increasingly widely used, with its central step being computer-based random sampling from stochastic models. While both Markov chain Monte Carlo (MCMC) and rejection Monte Carlo serve as sampling methods, the latter finds fewer applications than the former. Hence, this paper first provides a concise introduction to the theory of the rejection Monte Carlo algorithm and its implementation techniques, aiming to enhance conceptual understanding and ease of program implementation. Subsequently, a simplified rejection Monte Carlo algorithm is formulated. Furthermore, taking multivariate distribution sampling and multivariate integration as examples, the study explores specific applications of the algorithm in statistical inference.
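A minimal sketch of the accept-reject step at the core of rejection Monte Carlo, with illustrative choices of target density, proposal, and envelope constant:

```python
# Rejection (accept-reject) Monte Carlo sampling sketch. The target f is a
# bimodal mixture known only up to normalisation; the proposal g is uniform
# on [-4, 4]; M is chosen so that f(x) <= M * g(x) on the support.
import numpy as np

rng = np.random.default_rng(0)

def f(x):  # unnormalised target density
    return np.exp(-0.5 * (x - 1.5) ** 2) + 0.6 * np.exp(-0.5 * (x + 1.5) ** 2)

a, b = -4.0, 4.0
g = 1.0 / (b - a)          # uniform proposal density on [a, b]
M = 1.7 / g                # envelope constant: max of f is below 1.7 * g / g = 1.7

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        x = rng.uniform(a, b)
        u = rng.uniform(0.0, 1.0)
        if u < f(x) / (M * g):   # accept with probability f(x) / (M g(x))
            samples.append(x)
    return np.array(samples)

draws = rejection_sample(10_000)
print(draws.mean(), draws.std())   # Monte Carlo estimates of mean and spread
```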
This essay provides a comprehensive analysis of the optimization and performance evaluation of various routing algorithms within the context of computer networks. Routing algorithms are critical for determining the most efficient path for data transmission between nodes in a network. The efficiency, reliability, and scalability of a network depend heavily on the choice and optimization of its routing algorithm. The essay begins with an overview of fundamental routing strategies, including shortest path, flooding, distance vector, and link state algorithms, and extends to more sophisticated techniques.
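As a small illustration of the shortest-path strategy listed above (the computation underlying link-state routing), here is a Dijkstra sketch over a toy topology with hypothetical link costs:

```python
# Dijkstra's shortest-path algorithm over a toy network; link-state protocols
# such as OSPF build their forwarding tables from computations of this kind.
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, cost), ...]}. Returns the cheapest cost to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

network = {                               # toy topology with illustrative link costs
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
    'D': [],
}
print(dijkstra(network, 'A'))             # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```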
Quantum-inspired classical algorithms provide us with a new way to understand the computational power of quantum computers for practically relevant problems, especially in machine learning. In the past several years, numerous efficient algorithms for various tasks have been found, while an analysis of lower bounds is still missing. Using communication complexity, in this work we propose the first method to study lower bounds for these tasks. We mainly focus on lower bounds for solving linear regressions, supervised clustering, principal component analysis, recommendation systems, and Hamiltonian simulations. More precisely, we show that for linear regressions, in the row-sparse case, the lower bound is quadratic in the Frobenius norm of the underlying matrix, which is tight. In the dense case, with an extra assumption on the accuracy, we obtain that the lower bound is quartic in the Frobenius norm, which matches the upper bound. For supervised clustering, we obtain a tight lower bound that is quartic in the Frobenius norm. For the other three tasks, we obtain a lower bound that is quadratic in the Frobenius norm, while the known upper bound is quartic in the Frobenius norm. Through this research, we find that large quantum speedups can exist for sparse, high-rank, well-conditioned matrix-related problems. Finally, we extend our method to the lower-bound analysis of quantum query algorithms for matrix-related problems. Some applications are given.
This paper addresses a production scheduling problem derived from an industrial use case, focusing on unrelated parallel machine scheduling with a personnel availability constraint. The proposed model optimizes the production plan over a multi-period scheduling horizon, accommodating variations in personnel shift hours within each time period. It assumes personnel shared among machines, with one person required per machine for setup and supervision during job processing. The available personnel are fewer than the machines, limiting the number of machines that can operate in parallel. The model aims to minimize the total production time, considering machine-dependent processing times and sequence-dependent setup times, and handles practical scenarios such as machine eligibility constraints and production time windows. A Mixed Integer Linear Programming (MILP) model is introduced to formulate the problem, taking into account both continuous and discrete variables. A two-step solution approach enhances computational speed, first maximizing the number of accepted jobs and then minimizing the total production time. Validation with synthetic problem instances and a real industrial case study of a food processing plant demonstrates the performance of the model and its usefulness in personnel shift planning. The findings offer valuable insights for practical managerial decision-making in the context of production scheduling.
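A compressed sketch of the two-step idea, with hypothetical data and a deliberately simplified model (no sequence-dependent setups, shifts, eligibility, or time windows; personnel enter only as a cap on how many machines may be staffed), might look as follows in PuLP:

```python
# Two-step MILP sketch: step 1 maximises the number of accepted jobs, step 2
# fixes that number and minimises the makespan. Data below are hypothetical.
import pulp

jobs, machines = range(4), range(3)
p = [[5, 7, 6], [4, 6, 5], [8, 9, 7], [3, 4, 5]]   # p[j][m]: processing time of job j on machine m
personnel = 2                                       # at most 2 machines can be staffed
horizon = 12                                        # per-machine capacity in the period

x = pulp.LpVariable.dicts("x", (jobs, machines), cat="Binary")   # job j runs on machine m
use = pulp.LpVariable.dicts("use", machines, cat="Binary")       # machine m is staffed
cmax = pulp.LpVariable("cmax", lowBound=0)

def common_constraints(prob):
    for j in jobs:
        prob += pulp.lpSum(x[j][m] for m in machines) <= 1        # accept a job at most once
    for m in machines:
        load = pulp.lpSum(p[j][m] * x[j][m] for j in jobs)
        prob += load <= horizon * use[m]                           # capacity, only if staffed
        prob += load <= cmax                                       # makespan definition
    prob += pulp.lpSum(use[m] for m in machines) <= personnel     # personnel availability

# Step 1: maximise the number of accepted jobs.
step1 = pulp.LpProblem("accept", pulp.LpMaximize)
step1 += pulp.lpSum(x[j][m] for j in jobs for m in machines)
common_constraints(step1)
step1.solve(pulp.PULP_CBC_CMD(msg=False))
best_accept = round(pulp.value(step1.objective))

# Step 2: fix the acceptance level and minimise the makespan.
step2 = pulp.LpProblem("makespan", pulp.LpMinimize)
step2 += cmax
common_constraints(step2)
step2 += pulp.lpSum(x[j][m] for j in jobs for m in machines) == best_accept
step2.solve(pulp.PULP_CBC_CMD(msg=False))
print(best_accept, pulp.value(cmax))
```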
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
Deep learning is usually described as an experiment-driven field under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature, which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns about ethics and security and their relationships with generalizability.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of an undesirable spectrum, and then discuss practical solutions, including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods, and distributed methods, together with theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
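A toy numpy experiment, not drawn from the article, illustrating the explosion/vanishing issue and why scaled initialization helps:

```python
# Forward signal scale through a deep linear stack: with unscaled weights the
# scale blows up or dies out, while Xavier-style 1/sqrt(fan_in) scaling keeps
# it roughly stable. Purely schematic illustration.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 50

def forward_scale(weight_std):
    h = rng.standard_normal((100, width))
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * weight_std
        h = h @ W
    return np.abs(h).mean()

print("std = 1            :", forward_scale(1.0))                   # explodes (factor ~ sqrt(width) per layer)
print("std = 0.01         :", forward_scale(0.01))                  # vanishes towards zero
print("std = 1/sqrt(width):", forward_scale(1.0 / np.sqrt(width)))  # stays on a stable scale
```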