This paper enhances and develops bridges between statistics, mechanics, and geometry. For a given system of points in $\mathbb R^k$ representing a sample of full rank, we construct an explicit pencil of confocal quadrics with the following properties: (i) all hyperplanes for which the hyperplanar moments of inertia of the given system of points are equal are tangent to the same quadric from the confocal pencil; as an application, we develop regularization procedures for the orthogonal least squares method, analogues of the lasso and ridge methods from linear regression. (ii) For any given point $P$, among all hyperplanes that contain it, the best fit is the hyperplane tangent to the quadric from the confocal pencil corresponding to the maximal Jacobi coordinate of $P$, while the worst fit is the hyperplane tangent to the ellipsoid from the confocal pencil that contains $P$. The confocal pencil of quadrics thus provides a universal tool to solve the restricted principal component analysis at any given point. Both results (i) and (ii) can be seen as generalizations of Pearson's classical result on orthogonal regression, and they have natural and important applications in the statistics of errors-in-variables (EIV) models. For classical linear regressions, we provide a geometric characterization of the hyperplanes of least squares in a given direction among all hyperplanes containing a given point. The obtained results have applications in restricted regressions, both ordinary and orthogonal; for the latter, a new formula for the test statistic is derived. The developed methods and results are illustrated with natural statistical examples.
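For context (this is the generic form of a confocal family, not the paper's specific construction from the sample), a confocal pencil of quadrics in $\mathbb R^k$ can be written as
\[
\mathcal{Q}_{\lambda}:\quad \sum_{i=1}^{k}\frac{x_{i}^{2}}{a_{i}-\lambda}=1,\qquad a_{1}>a_{2}>\dots>a_{k}.
\]
A point $P$ in general position lies on exactly $k$ quadrics of such a pencil, and the corresponding values of $\lambda$ are its Jacobi (elliptic) coordinates; this is the sense in which the maximal Jacobi coordinate of $P$ is used in (ii).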
The Lov\'asz theta function $\theta(G)$ provides a very good upper bound on the stability number of a graph $G$. It can be computed in polynomial time by solving a semidefinite program (SDP), which also turns out to be fairly tractable in practice. Consequently, $\theta(G)$ achieves a hard-to-beat trade-off between computational effort and strength of the bound. Indeed, several attempts to improve on the theta bound have been documented, mainly based on applying the $N_+(\cdot)$ lifting operator of Lov\'asz and Schrijver to the classical formulation of the maximum stable set problem. Experience shows that solving the resulting SDPs often runs into practical intractability and requires highly specialized methods. We investigate the application of this operator to two different linear formulations, based on clique and nodal inequalities, respectively. These two formulations involve fewer inequalities, and yet they guarantee that the resulting SDP bound is at least as strong as $\theta(G)$. Our computational experience, including larger graphs than those previously documented, shows that upper bounds stronger than $\theta(G)$ can be obtained with a reasonable additional effort by using the clique-based formulation on sparse graphs and the nodal-based one on dense graphs.
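For reference (this is one standard formulation, not necessarily the one solved in the experiments above), the Lov\'asz theta function admits the SDP representation
\[
\theta(G)\;=\;\max\ \langle J,X\rangle\quad\text{s.t.}\quad \operatorname{tr}(X)=1,\quad X_{ij}=0\ \ \forall\,\{i,j\}\in E(G),\quad X\succeq 0,
\]
where $J$ is the all-ones matrix. The $N_+(\cdot)$ operator strengthens a linear relaxation by imposing positive semidefiniteness on a lifted matrix whose entries play the role of products of the original variables.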
In applied mathematics and related disciplines, the modeling-simulation-optimization workflow is a prominent scheme, with mathematical models and numerical algorithms playing a crucial role. For these types of mathematical research data, the Mathematical Research Data Initiative has developed, merged, and implemented ontologies and knowledge graphs. This contributes to making mathematical research data FAIR (findable, accessible, interoperable, and reusable) by introducing semantic technology and documenting the mathematical foundations accordingly. Using the concrete example of microfracture analysis of porous media, it is shown how knowledge of the underlying mathematical model and of the corresponding numerical algorithms for its solution can be represented by the ontologies.
The author introduced models of linear logic known as ``Interaction Graphs'', which generalise Girard's various geometry of interaction constructions. In this work, we establish how these models essentially rely on a deep connection between zeta functions and the execution of programs, expressed as a cocycle. This is first shown in the simple case of graphs, before being lifted to dynamical systems. Focussing on probabilistic models, we then explain how the notion of graphings used in Interaction Graphs captures a natural class of sub-Markov processes. We then extend the realisability constructions and the notion of zeta function to provide a realisability model of second-order linear logic over the set of all (discrete-time) sub-Markov processes.
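For orientation (the paper's definitions are more general), the zeta function attached to a finite graph with adjacency matrix $A$ can be written in the Bowen--Lanford form
\[
\zeta_{A}(z)\;=\;\exp\Big(\sum_{n\ge 1}\frac{z^{n}}{n}\,\operatorname{tr}(A^{n})\Big)\;=\;\det(I-zA)^{-1},
\]
a generating function of closed paths. The cocycle mentioned above relates such cycle-counting data to the execution (composition) of graphs, and this relation is then extended to graphings and sub-Markov processes.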
Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scaling law on two datasets (OpenWebText2 and RefinedWeb) and identifying three factors causing the difference: last layer computational cost, warmup duration, and scale-dependent optimizer tuning. With these factors corrected, we obtain excellent agreement with the Hoffmann et al. (i.e., "Chinchilla") scaling law. Counter to a hypothesis of Hoffmann et al., we find that careful learning rate decay is not essential for the validity of their scaling law. As a secondary result, we derive scaling laws for the optimal learning rate and batch size, finding that tuning the AdamW $\beta_2$ parameter is essential at lower batch sizes.
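Concretely, both papers posit a power law for the compute-optimal model size $N^{*}$ as a function of the compute budget $C$ (the exponents quoted here are the approximate published values, included only for orientation):
\[
N^{*}(C)\;\propto\;C^{a},\qquad a\approx 0.73\ \text{(Kaplan et al.)},\qquad a\approx 0.5\ \text{(Hoffmann et al.)},
\]
so that under the Chinchilla law the parameter count and the number of training tokens grow in roughly equal proportion with compute.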
We describe Bayes factors based on z, t, $\chi^2$, and F statistics when non-local moment prior distributions are used to define alternative hypotheses. The non-local alternative prior distributions are centered on standardized effects. The prior densities include a dispersion parameter that can be used to model prior precision and the variation of effect sizes across replicated experiments. We examine the convergence rates of Bayes factors under true null and true alternative hypotheses and show how these Bayes factors can be used to construct Bayes factor functions. An example illustrates the application of resulting Bayes factors to psychological experiments.
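For illustration only (the paper's prior family may differ in detail), one common non-local moment prior on a standardized effect $\delta$, with dispersion parameter $\tau$, is the normal moment prior
\[
\pi(\delta\mid\tau)\;=\;\frac{\delta^{2}}{\tau}\,\frac{1}{\sqrt{2\pi\tau}}\,\exp\!\Big(-\frac{\delta^{2}}{2\tau}\Big),
\]
whose density vanishes at $\delta=0$; this separation of the alternative from the null is what allows evidence to accumulate faster under a true null than with local alternative priors.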
The advent of quantum computing holds the potential to revolutionize various fields by solving complex problems more efficiently than classical computers. Despite this promise, practical quantum advantage is hindered by current hardware limitations, notably the small number of qubits and high noise levels. In this study, we leverage adiabatic quantum computers to optimize Kolmogorov-Arnold Networks, a powerful neural network architecture for representing complex functions with minimal parameters. By modifying the network to use Bezier curves as the basis functions and formulating the optimization problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem, we create a fixed-size solution space, independent of the number of training samples. Our approach demonstrates sparks of quantum advantage through faster training times compared to classical optimizers such as Adam, Stochastic Gradient Descent, Adaptive Gradient, and simulated annealing. Additionally, we introduce a novel rapid retraining capability, enabling the network to be retrained with new data without reprocessing old samples, thus enhancing learning efficiency in dynamic environments. Experimental results on initial training of classification and regression tasks validate the efficacy of our approach, showcasing significant speedups and performance comparable to classical methods. Experiments on retraining demonstrate a sixty-fold speedup of the adiabatic-quantum-computing-based optimization over the gradient-descent-based optimizers, and theoretical models suggest that this speedup could be even larger. Our findings suggest that with further advancements in quantum hardware and algorithm optimization, quantum-optimized machine learning models could have broad applications across various domains, with an initial focus on rapid retraining.
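As a rough illustration of how a linear-in-parameters fit becomes a QUBO (a minimal sketch, assuming a signed fixed-point binary encoding of the basis coefficients; the function name, encoding, and scaling below are illustrative, not the paper's exact formulation):

    import numpy as np

    def build_qubo(Phi, y, bits=4, scale=1.0):
        # Phi: (n_samples, n_basis) matrix of basis-function values, e.g. the
        # Bernstein polynomials underlying a Bezier parameterization; y: targets.
        # Each coefficient is encoded as w_j = scale * sum_k 2^{-k} (2 b_{jk} - 1),
        # so the squared loss ||y - Phi w||^2 is quadratic in the binary vector b.
        n_samples, n_basis = Phi.shape
        E = np.zeros((n_basis, n_basis * bits))
        for j in range(n_basis):
            for k in range(bits):
                E[j, j * bits + k] = scale * 2.0 ** (-k)
        A = Phi @ E                      # maps (2b - 1) to model predictions
        # ||y - A(2b-1)||^2 = 4 b^T (A^T A) b - 4 (y + A 1)^T A b + const
        Q = 4.0 * A.T @ A
        linear = -4.0 * A.T @ (y + A.sum(axis=1))
        Q[np.diag_indices_from(Q)] += linear   # b_i^2 = b_i, fold into diagonal
        return Q

The point matching the abstract is that $Q$ has side length (number of basis functions) $\times$ (bits per coefficient) regardless of the number of training samples, so the QUBO handed to the annealer has a fixed size.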
A rectangular drawing of a planar graph $G$ is a planar drawing of $G$ in which vertices are mapped to grid points, edges are mapped to horizontal and vertical straight-line segments, and faces are drawn as rectangles. Sometimes this latter constraint is relaxed for the outer face. In this paper, we study rectangular drawings in which the edges have unit length. We show a complexity dichotomy for the problem of deciding the existence of a unit-length rectangular drawing, depending on whether the outer face must also be drawn as a rectangle or not. Specifically, we prove that the problem is NP-complete for biconnected graphs when the drawing of the outer face is not required to be a rectangle, even if the sought drawing must respect a given planar embedding, whereas it is polynomial-time solvable, both in the fixed and the variable embedding settings, if the outer face is required to be drawn as a rectangle.
A generator matrix of a $k$-dimensional linear code $\mathcal{C}$ over $\mathrm{GF}(q)$ is also a matrix of rank $k$ over any extension field $\mathrm{GF}(q^\ell)$, and it generates a linear code of the same length, the same dimension, and the same minimum distance over $\mathrm{GF}(q^\ell)$, denoted by $\mathcal{C}(q|q^\ell)$ and called a lifted code of $\mathcal{C}$. Although $\mathcal{C}$ and its lifted codes $\mathcal{C}(q|q^\ell)$ have the same parameters, they have different weight distributions and different applications. Few results about lifted linear codes are known in the literature. This paper develops some fundamental theory for lifted linear codes, settles the weight distributions of lifted Hamming codes and lifted Simplex codes, and investigates the $2$-designs supported by the lifted Hamming and Simplex codes. Infinite families of $2$-designs are obtained. In addition, an infinite family of two-weight codes and an infinite family of three-weight codes are presented.
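In symbols, if $G$ is a generator matrix of the $[n,k]$ code $\mathcal{C}$ over $\mathrm{GF}(q)$, then the lifted code described above is simply the $\mathrm{GF}(q^{\ell})$-row space of the same matrix,
\[
\mathcal{C}(q|q^{\ell})\;=\;\{\,xG\ :\ x\in \mathrm{GF}(q^{\ell})^{k}\,\}\ \subseteq\ \mathrm{GF}(q^{\ell})^{n}.
\]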
We noisily observe solutions of an ordinary differential equation $\dot u = f(u)$ at given times, where $u$ takes values in a $d$-dimensional state space. The model function $f$ is unknown and belongs to a H\"older-type smoothness class with parameter $\beta$. For the nonparametric problem of estimating $f$, we provide lower bounds on the error in two complementary model specifications: the snake model, with few, long observed solutions, and the stubble model, with many short ones. The lower bounds are minimax optimal in some settings. They depend on various parameters, which in the optimal asymptotic regime lead to the same rate for the squared error in both models, characterized by the exponent $-2\beta/(2(\beta+1)+d)$ in the total number of observations $n$. To derive these results, we establish a master theorem for lower bounds in general nonparametric regression problems, which makes the proofs more comparable and should be a useful tool for future work.
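For orientation only (this comparison is not part of the paper's statement), the exponent above can be contrasted with the classical minimax rate for estimating a $\beta$-H\"older regression function in dimension $d$:
\[
n^{-2\beta/(2(\beta+1)+d)}\ \ \text{(model function $f$ of the ODE)}\qquad\text{vs.}\qquad n^{-2\beta/(2\beta+d)}\ \ \text{(standard nonparametric regression)};
\]
the extra $+2$ in the denominator is the same loss one incurs when estimating a first derivative of a $(\beta+1)$-smooth function.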
The fusion of causal models with deep learning, which introduces increasingly intricate data such as causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, broadening the original causal concepts and theories to such complex, non-statistical data has met with serious challenges. In response, our study proposes a redefinition of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to the statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time series, images, text, and others. Indefinite data is an emergent research sphere that we infer from the progression of data forms. To comprehensively present these three data paradigms, we elaborate on their formal definitions, the differences manifested in datasets, the resolution pathways, and the development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from a range of research undertakings, and we present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.