The field of category theory seeks to unify and generalize concepts and constructions across different areas of mathematics, from algebra, geometry, and topology to logic and theoretical computer science. Formalized $1$-category theory forms a core component of various libraries of mathematical proofs. However, more sophisticated results in fields from algebraic topology to theoretical physics, where objects carry "higher structure", rely on infinite-dimensional categories in place of $1$-dimensional categories, and $\infty$-category theory has so far resisted computer formalization. Using a new proof assistant called Rzk, which is designed to support Riehl and Shulman's simplicial extension of homotopy type theory for synthetic $\infty$-category theory, we provide the first formalizations of results from $\infty$-category theory. These include, in particular, a formalization of the Yoneda lemma, often regarded as the fundamental theorem of category theory; roughly, it states that an object of a given category is determined by its relationship to all the other objects of the category. A key feature of our framework is that, thanks to the synthetic theory, many constructions are automatically natural or functorial. We plan to use Rzk to formalize further results from $\infty$-category theory, such as the theory of limits, colimits, and adjunctions.
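For orientation, the classical $1$-categorical statement of the Yoneda lemma (the result whose $\infty$-categorical analogue is formalized in the work above) can be written as:

```latex
% Yoneda lemma, classical 1-categorical form: for a (locally small)
% category C, an object A of C, and a functor F : C -> Set,
\mathrm{Nat}\bigl(\mathcal{C}(A,-),\,F\bigr) \;\cong\; F(A),
% naturally in both A and F.  Taking F = C(B,-) recovers the full
% faithfulness of the Yoneda embedding.
```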
We show how to express intuitionistic Zermelo set theory in deduction modulo (i.e. by replacing its axioms by rewrite rules) in such a way that the corresponding notion of proof enjoys the normalization property. To do so, we first rephrase set theory as a theory of pointed graphs (following a paradigm due to P. Aczel) by interpreting set-theoretic equality as bisimilarity, and show that in this setting, Zermelo's axioms can be decomposed into graph-theoretic primitives that can be turned into rewrite rules. We then show that the theory we obtain in deduction modulo is a conservative extension of (a minor extension of) Zermelo set theory. Finally, we prove the normalization of the intuitionistic fragment of the theory.
A compelling feature of least-squares finite element methods is the built-in a posteriori error estimator for any conforming discretization. In order to generalize this property to discontinuous finite element ansatz functions, this paper introduces a least-squares principle on piecewise Sobolev functions for the solution of the Poisson model problem in 2D with mixed boundary conditions. It allows for fairly general discretizations, including standard piecewise polynomial ansatz spaces on triangular and polygonal meshes. The presented scheme enforces the interelement continuity of the piecewise polynomials by additional least-squares residuals. A side condition on the normal jumps of the flux variable requires a vanishing integral mean and enables a natural weighting of the jump in the least-squares functional in terms of the mesh size. This avoids the over-penalization, with its additional regularity assumptions on the exact solution, that is usually present in the literature on discontinuous LSFEM. The proof of the built-in a posteriori error estimation for the over-penalized scheme is presented as well. All results in this paper are robust with respect to the size of the domain, which is guaranteed by a suitable weighting of the residuals in the least-squares functional. Numerical experiments exhibit optimal convergence rates of the adaptive mesh-refining algorithm for various polynomial degrees.
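As orientation (and not the paper's exact piecewise functional), the classical conforming least-squares functional for the Poisson problem $-\Delta u = f$ in first-order system form reads:

```latex
% minimize over (u, \sigma) in H^1(\Omega) x H(div, \Omega):
LS(f;\,u,\sigma) \;=\; \|f + \operatorname{div}\sigma\|_{L^2(\Omega)}^2
  \;+\; \|\sigma - \nabla u\|_{L^2(\Omega)}^2 .
```

The discontinuous scheme described above augments such a functional with mesh-size-weighted least-squares residuals of the interelement jumps.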
In this paper, a linear second-order numerical scheme is developed and investigated for the Allen-Cahn equation with a general positive mobility. In particular, our fully discrete scheme is constructed from the Crank-Nicolson formula for temporal discretization and the central finite difference method for spatial approximation, with two extra stabilizing terms introduced to improve numerical stability. The proposed scheme is shown to unconditionally preserve the maximum bound principle (MBP) under mild restrictions on the stabilization parameters, which is of practical importance for achieving good accuracy and stability simultaneously. With the help of the uniform boundedness of the numerical solutions due to the MBP, we then derive $H^{1}$-norm and $L^{\infty}$-norm error estimates for the Allen-Cahn equation with a constant and a variable mobility, respectively. Moreover, the energy stability of the proposed scheme is also obtained, in the sense that the discrete free energy is uniformly bounded by its value at the initial time plus a constant. Finally, some numerical experiments are carried out to verify the theoretical results and illustrate the performance of the proposed scheme with a time-adaptive strategy.
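As a minimal sketch of such a stabilized Crank-Nicolson step (not the paper's exact scheme: 1D, periodic, constant mobility, and a single stabilizing term $S(u^{n+1}-u^{n})$ rather than two) for $u_t = \varepsilon^2 u_{xx} - (u^3 - u)$:

```python
import numpy as np

def allen_cahn_step_cn(u, u_old, eps=0.1, tau=0.01, S=2.0, h=None):
    """One stabilized Crank-Nicolson step for u_t = eps^2 u_xx - (u^3 - u)
    on a periodic 1D grid: implicit CN for diffusion, second-order
    extrapolation for the nonlinearity, plus a stabilizing term
    S*(u^{n+1} - u^n).  Illustrative sketch only, not the paper's scheme."""
    N = len(u)
    if h is None:
        h = 1.0 / N
    I = np.eye(N)
    # periodic second-difference (discrete Laplacian) matrix
    D2 = (np.roll(I, 1, axis=0) - 2.0 * I + np.roll(I, -1, axis=0)) / h**2
    ubar = 1.5 * u - 0.5 * u_old                 # extrapolated nonlinearity point
    L = I / tau - 0.5 * eps**2 * D2 + S * I      # implicit (left-hand) operator
    rhs = u / tau + 0.5 * eps**2 * D2 @ u - (ubar**3 - ubar) + S * u
    return np.linalg.solve(L, rhs)
```

Time stepping simply iterates `u_old, u = u, allen_cahn_step_cn(u, u_old)`; with $S \ge \max_{|u|\le 1}|f'(u)| = 2$, the discrete solution is expected to stay within the MBP bound $|u| \le 1$.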
We study least-squares trace regression when the parameter is the sum of an $r$-low-rank matrix and an $s$-sparse matrix and a fraction $\epsilon$ of the labels is corrupted. For subgaussian distributions, we highlight three design properties. The first, termed $\mathsf{PP}$, handles additive decomposition and follows from a product process inequality. The second, termed $\mathsf{IP}$, handles both label contamination and additive decomposition; it follows from Chevet's inequality. The third, termed $\mathsf{MP}$, handles the interaction between the design and feature-dependent noise; it follows from a multiplier process inequality. Jointly, these properties entail the near-optimality of a tractable estimator with respect to the effective dimensions $d_{\mathrm{eff},r}$ and $d_{\mathrm{eff},s}$ of the low-rank and sparse components, $\epsilon$, and the failure probability $\delta$. This rate has the form $$ \mathsf{r}(n,d_{\mathrm{eff},r}) + \mathsf{r}(n,d_{\mathrm{eff},s}) + \sqrt{(1+\log(1/\delta))/n} + \epsilon\log(1/\epsilon). $$ Here, $\mathsf{r}(n,d_{\mathrm{eff},r})+\mathsf{r}(n,d_{\mathrm{eff},s})$ is the optimal uncontaminated rate, independent of $\delta$. Our estimator is adaptive to $(s,r,\epsilon,\delta)$ and, for a fixed absolute constant $c>0$, attains the stated rate with probability $1-\delta$ uniformly over all $\delta\ge\exp(-cn)$. Setting aside the matrix decomposition, our analysis also yields optimal bounds for a robust estimator adapted to the noise variance. Finally, we consider robust matrix completion. We highlight a new property of this problem: one can robustly and optimally estimate the incomplete matrix regardless of the \emph{magnitude of the corruption}. Our estimators are based on ``sorted'' versions of Huber's loss. We present simulations matching the theory; in particular, they reveal the superiority of the ``sorted'' Huber loss over the classical Huber loss.
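The precise ``sorted'' Huber loss used above is not reproduced here; one plausible reading, sketched below under that assumption, pairs the decreasingly sorted per-observation Huber losses with a non-decreasing weight sequence, so the largest (possibly corrupted) residuals receive the smallest weights:

```python
import numpy as np

def huber(r, delta=1.0):
    """Classical Huber loss applied elementwise to residuals r."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def sorted_huber_loss(residuals, weights, delta=1.0):
    """Weighted sum of Huber losses sorted in decreasing order; `weights`
    is assumed non-decreasing, which down-weights the largest residuals.
    One plausible reading of a 'sorted' Huber loss, not necessarily the
    paper's exact definition."""
    losses = np.sort(huber(residuals, delta))[::-1]   # largest loss first
    return float(np.sum(np.asarray(weights) * losses))
```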
Target similarity tuning (TST) is a method for selecting relevant examples for natural language (NL)-to-code generation with large language models (LLMs) in order to improve performance. Its goal is to adapt a sentence embedding model so that the similarity between two NL inputs matches the similarity between their associated code outputs. In this paper, we propose different methods to apply and improve TST in the real world. First, we replace the sentence transformer with embeddings from a larger model, which reduces sensitivity to the language distribution and thus provides more flexibility in the synthetic generation of examples, and we train a tiny model that transforms these embeddings to a space where embedding similarity matches code similarity; this allows the larger model to remain a black box and requires only a few matrix multiplications at inference time. Second, we show how to efficiently select a smaller number of training examples to train the TST model. Third, we introduce a ranking-based evaluation for TST that does not require end-to-end code generation experiments, which can be expensive to perform.
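A minimal sketch of the embedding-transformation idea, under simplifying assumptions (a plain linear map `W`, unnormalized inner-product similarities, full-batch gradient descent; all names here are illustrative, not the paper's):

```python
import numpy as np

def train_tst_map(E, S, dim_out=2, lr=1e-5, epochs=500, seed=0):
    """Fit a linear map W so that inner products of transformed embeddings
    (E @ W) approximate target code similarities S.  Illustrative sketch:
    loss(W) = || E W W^T E^T - S ||_F^2, minimized by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = E.shape
    W = rng.normal(scale=0.1, size=(d, dim_out))
    losses = []
    for _ in range(epochs):
        R = E @ W @ W.T @ E.T - S            # residual similarity matrix
        losses.append(float(np.sum(R**2)))
        W -= lr * 4.0 * E.T @ R @ E @ W      # exact gradient of the Frobenius loss
    return W, losses
```

At inference time, only the frozen black-box embeddings `E` and the small matrix `W` are needed, i.e. a single matrix multiplication per query.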
The eigenvalue decomposition (EVD) of (a batch of) Hermitian matrices of order two has a role in many numerical algorithms, of which the one-sided Jacobi method for the singular value decomposition (SVD) is the prime example. In this paper the batched EVD is vectorized, with a vector-friendly data layout and the AVX-512 SIMD instructions of Intel CPUs, alongside other key components of a real and a complex OpenMP-parallel Jacobi-type SVD method, inspired by the sequential xGESVJ routines from LAPACK. These vectorized building blocks should be portable to other platforms that support similar vector operations. Unconditional numerical reproducibility is guaranteed for the batched EVD, sequential or threaded, and for the column transformations, which, like the scaled dot-products, are presently sequential but can be threaded if nested parallelism is desired. No avoidable overflow of the results can occur with the proposed EVD or the whole SVD. The measured accuracy of the proposed EVD often surpasses that of the xLAEV2 routines from LAPACK. While the batched EVD outperforms the matching sequence of xLAEV2 calls, the speedup of the parallel SVD is modest but can be improved, and is already beneficial with enough threads. Regardless of the number of threads, the proposed SVD method gives identical results, but of somewhat lower accuracy than xGESVJ.
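The eigenvalues of a $2 \times 2$ Hermitian matrix admit a closed form, which is what routines like xLAEV2 compute; a sketch (omitting the scaling that xLAEV2 performs to guard against overflow, which the paper's method addresses):

```python
import numpy as np

def herm2_eigvals(a, c, b):
    """Eigenvalues (ascending) of the 2x2 Hermitian matrix
    [[a, b], [conj(b), c]], with a, c real.  Closed form in the spirit of
    LAPACK's xLAEV2; the overflow-avoiding scaling is omitted here."""
    m = 0.5 * (a + c)                       # midpoint of the two eigenvalues
    r = np.hypot(0.5 * (a - c), abs(b))     # half the spectral gap, via hypot
    return m - r, m + r
```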
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect. Because achieving such interpretable representations is challenging, many causal learning algorithms rely on elements encoding prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such elements and prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our algorithm uses a new differentiable parametrization to learn a directed acyclic graph (DAG) together with the latent space of a variational autoencoder in an end-to-end differentiable framework, trained via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the mixture components with an outcome of the DAG nodes; this identification enables feature discovery with causal relationships. Tested on a synthetic and a scientific dataset, our results demonstrate the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
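The differentiable DAG parametrization above is the paper's own; as a point of reference, a widely used differentiable acyclicity characterization (a polynomial variant of the NOTEARS constraint) can be sketched as follows, where $h(A) = 0$ iff the weighted adjacency matrix $A$ encodes a DAG:

```python
import numpy as np

def acyclicity(A):
    """Differentiable acyclicity score for a weighted adjacency matrix A:
    h(A) = tr((I + (1/d) A*A)^d) - d, which is zero iff A has no directed
    cycle (polynomial variant of the NOTEARS constraint; illustrative,
    not causalPIMA's parametrization)."""
    d = A.shape[0]
    M = np.eye(d) + (A * A) / d        # elementwise square keeps entries >= 0
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)
```

Penalizing `acyclicity(A)` inside a differentiable loss lets gradient-based training steer a learned adjacency matrix toward a DAG.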
This work presents a comparative study to numerically compute impulse approximate controls for parabolic equations with various boundary conditions. Theoretical controllability results have recently been established using a logarithmic convexity estimate at a single time, based on a Carleman commutator approach. We propose a numerical algorithm for computing the impulse controls with minimal $L^2$-norms by adapting a penalized Hilbert Uniqueness Method (HUM) combined with a Conjugate Gradient (CG) method. We consider static boundary conditions (Dirichlet and Neumann) and dynamic boundary conditions. Some numerical experiments based on our developed algorithm are given to validate and compare the theoretical impulse controllability results.
This paper analyzes a $\theta$-method combined with a 3-point time filter. This approach adds only one additional line of code to an existing implementation of the $\theta$-method. We prove the method's $0$-stability, accuracy, and $A$-stability for both constant and variable time steps. Numerical tests are performed to validate the theoretical results.
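As a minimal illustration of the ``one extra line'' idea for the special case $\theta = 1$ (backward Euler) on the linear test problem $y' = -y$ (a sketch; the paper's filter coefficient for general $\theta$ may differ), the 3-point filter with coefficient $\nu = 1/3$ lifts backward Euler from first to second order:

```python
import math

def backward_euler_filtered(T=1.0, tau=0.01, nu=1.0 / 3.0, filtered=True):
    """Backward Euler for y' = -y, y(0) = 1, optionally followed by the
    3-point time filter  y_{n+1} <- yhat - nu*(yhat - 2*y_n + y_{n-1}),
    the 'one additional line' relative to the unfiltered method."""
    n = round(T / tau)
    y_prev, y = 1.0, math.exp(-tau)      # exact solution supplies the startup value
    for _ in range(n - 1):
        yhat = y / (1.0 + tau)           # backward Euler step (closed form for f(y) = -y)
        y_new = yhat - nu * (yhat - 2.0 * y + y_prev) if filtered else yhat
        y_prev, y = y, y_new
    return y
```

A Taylor expansion shows the filtered local error term $(\tau^2/2 - 3\nu\tau^2/2)\,y''$ vanishes exactly at $\nu = 1/3$, so the filtered solution is markedly closer to $e^{-1}$ at $T = 1$ than the unfiltered one.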
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.