We propose a superfast method for constructing orthogonal matrices $M\in\mathcal{O}(n,q)$ over finite fields $GF(q)$. It can be used to construct $n\times n$ orthogonal matrices over $\mathbb{Z}_p$ with very large values of $n$ and $p$, as well as orthogonal matrices with a certain circulant structure. One can equally well construct paraunitary filter banks or wavelet matrices over finite fields. The construction mechanism is highly efficient, allowing a complete screening for, and selection of, an orthogonal matrix that meets specific constraints. For instance, one can generate a complete list of orthogonal matrices with given $n$ and $q=p^m$, provided that the order of $\mathcal{O}(n,q)$ is not too large. Although the method is based on randomness, isolated cases of failure can be identified well before the basic procedure starts. The proposed procedures are based on the Janashia-Lagvilava method, which was developed for an entirely different task; its use here may therefore seem somewhat unexpected.
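The abstract does not spell out the Janashia-Lagvilava construction, so the sketch below does not implement it. As a hedged illustration of what membership in $\mathcal{O}(n,p)$ means computationally, it swaps in a different, classical technique: the Cayley transform $Q=(I-S)(I+S)^{-1}$ of a random skew-symmetric $S$ over a prime field $GF(p)$ with $p$ odd. It then checks $QQ^T=I$ modulo $p$. All function names are hypothetical.

```python
import random

def mat_mul(A, B, p):
    """Product of two square matrices over GF(p)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]

def mat_inv(A, p):
    """Inverse of A over GF(p), p prime, via Gauss-Jordan elimination; None if singular."""
    n = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] % p), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse (Python 3.8+)
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def random_orthogonal_cayley(n, p, rng=random):
    """Orthogonal matrix over GF(p), p an odd prime, via Q = (I - S)(I + S)^{-1}
    for a random skew-symmetric S; resamples S whenever I + S is singular."""
    while True:
        S = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                s = rng.randrange(p)
                S[i][j], S[j][i] = s, (-s) % p
        I_plus_S = [[(int(i == j) + S[i][j]) % p for j in range(n)] for i in range(n)]
        inv = mat_inv(I_plus_S, p)
        if inv is None:
            continue
        I_minus_S = [[(int(i == j) - S[i][j]) % p for j in range(n)] for i in range(n)]
        return mat_mul(I_minus_S, inv, p)

def is_orthogonal(Q, p):
    """Check Q Q^T = I over GF(p)."""
    n = len(Q)
    Qt = [list(col) for col in zip(*Q)]
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return mat_mul(Q, Qt, p) == identity
```

The Cayley transform only reaches orthogonal matrices $Q$ for which $I+Q$ is invertible, so it cannot enumerate all of $\mathcal{O}(n,p)$. It is meant as a correctness check. For example, `is_orthogonal(random_orthogonal_cayley(4, 7, random.Random(1)), 7)` returns `True`.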
Given an $n$ by $n$ matrix $A$ and an $n$-vector $b$, along with a rational function $R(z) := D(z)^{-1} N(z)$, we show how to find the optimal approximation to $R(A) b$ from the Krylov space $\mbox{span}( b, Ab, \ldots , A^{k-1} b)$ using the basis vectors produced by the Arnoldi algorithm. Finding this optimal approximation requires running $\max \{ \mbox{deg} (D) , \mbox{deg} (N) \} - 1$ extra Arnoldi steps and solving a $k + \max \{ \mbox{deg} (D) , \mbox{deg} (N) \}$ by $k$ least squares problem. Here {\em optimal} means optimal in the $D(A)^{*} D(A)$-norm. As in the case of linear systems, we show that eigenvalues alone cannot provide information about the convergence behavior of this algorithm, and we discuss other possible error bounds for highly nonnormal matrices.
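A brute-force reference for the quantity defined above can help when checking an implementation. Because $D(A)R(A)=N(A)$, minimizing the $D(A)^*D(A)$-norm error over $x=V_k y$ is the least squares problem $\min_y \|N(A)b - D(A)V_k y\|_2$. The NumPy sketch below forms $D(A)V_k$ explicitly, which the paper's procedure avoids by running extra Arnoldi steps. It is a check, not the paper's algorithm. It assumes polynomial coefficients are listed in increasing degree and that no Arnoldi breakdown occurs. Function names are hypothetical.

```python
import numpy as np

def arnoldi_basis(A, b, k):
    """k Arnoldi steps with modified Gram-Schmidt: orthonormal basis of
    span(b, Ab, ..., A^{k-1} b). Assumes no breakdown occurs."""
    V = np.zeros((len(b), k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        for i in range(j):
            w = w - (V[:, i] @ w) * V[:, i]
        V[:, j] = w / np.linalg.norm(w)
    return V

def poly_apply(coeffs, A, X):
    """Compute p(A) X for p(z) = coeffs[0] + coeffs[1] z + ..., using Horner's rule."""
    result = np.zeros(np.shape(X))
    for c in reversed(coeffs):
        result = A @ result + c * X
    return result

def optimal_rational_krylov(A, b, num, den, k):
    """x in K_k(A, b) minimizing ||D(A)(R(A) b - x)||_2 with R = D^{-1} N,
    i.e. min_y ||N(A) b - D(A) V y||_2 solved by dense least squares."""
    V = arnoldi_basis(A, b, k)
    DV = poly_apply(den, A, V)
    Nb = poly_apply(num, A, b)
    y, *_ = np.linalg.lstsq(DV, Nb, rcond=None)
    return V @ y
```

With $k=n$ and a nondegenerate Krylov space, the result coincides with $R(A)b$ up to rounding.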
Planar functions, introduced by Dembowski and Ostrom, are functions from a finite field to itself that give rise to finite projective planes. They exist, however, only over finite fields of odd characteristic. They have attracted much attention in the last decade owing to their theoretical interest and their deep and varied applications in many fields. This paper focuses on planar trinomials over cubic and quartic extensions of finite fields. Our results are obtained using connections with quadratic forms and classical algebraic tools over finite fields. Furthermore, given the generality of our approach, the methodology presented could be employed to derive more planar functions over some finite extension fields.
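The defining property of a planar function is that, for every nonzero $a$, the difference map $x\mapsto f(x+a)-f(x)$ is a bijection of the field. The sketch below is a minimal brute-force check of that property. For simplicity it is restricted to prime fields $GF(p)$. The paper's trinomials live in cubic and quartic extensions, which would need extension-field arithmetic not shown here. The function name is hypothetical.

```python
def is_planar(f, p):
    """Brute-force planarity test for f: GF(p) -> GF(p), p prime.
    f is planar iff x -> f(x + a) - f(x) is a bijection of GF(p) for every a != 0."""
    for a in range(1, p):
        images = {(f((x + a) % p) - f(x)) % p for x in range(p)}
        if len(images) != p:  # difference map is not injective, hence not bijective
            return False
    return True
```

For example, $x^2$ passes for every odd prime $p$ but fails for $p=2$. This is consistent with planar functions existing only in odd characteristic.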
A good automatic evaluation metric for language generation ideally correlates highly with human judgements of text quality. Yet there is a dearth of such metrics, which inhibits the rapid and efficient progress of language generators. One exception is the recently proposed Mauve. In theory, Mauve measures an information-theoretic divergence between two probability distributions over strings: one representing the language generator under evaluation, the other representing the true natural language distribution. Mauve's authors argue that its success comes from the qualitative properties of their proposed divergence. Yet in practice, as this divergence is uncomputable, Mauve approximates it by measuring the divergence between multinomial distributions over clusters instead, where cluster assignments are obtained by grouping strings based on a pre-trained language model's embeddings. As we show, however, this is not a tight approximation, in either theory or practice. This raises the question: why does Mauve work so well? In this work, we show that Mauve was right for the wrong reasons, and that its newly proposed divergence is not necessary for its high performance. In fact, classical divergences paired with its proposed cluster-based approximation may actually serve as better evaluation metrics. We finish the paper with a probing analysis, which leads us to conclude that, by encoding syntactic- and coherence-level features of text while ignoring surface-level features, such cluster-based substitutes for string distributions may simply be better for evaluating state-of-the-art language generators.
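The cluster-based approximation described above reduces each text distribution to a multinomial over clusters. The sketch below assumes cluster labels have already been produced, for example by clustering language-model embeddings, which is not shown. It compares the two histograms with a classical divergence, here Jensen-Shannon. This is not Mauve's own divergence, and the function names are hypothetical.

```python
import numpy as np

def cluster_distribution(labels, num_clusters):
    """Multinomial over clusters: fraction of strings assigned to each cluster."""
    counts = np.bincount(np.asarray(labels), minlength=num_clusters).astype(float)
    return counts / counts.sum()

def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if q is zero somewhere that p is positive."""
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in nats; bounded above by log 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Jensen-Shannon stays finite, at most $\log 2$ nats, even when a cluster is empty under one distribution. Plain KL can become infinite in that case, which makes a bounded divergence convenient here.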
Dynamic structural causal models (SCMs) are a powerful framework for reasoning in dynamic systems about direct effects, which measure how a change in one variable affects another variable while holding all other variables constant. The causal relations in a dynamic structural causal model can be qualitatively represented with a full-time causal graph. Assuming linearity and causal sufficiency, and given the full-time causal graph, the direct causal effect is always identifiable and can be estimated from data by adjusting for any set of variables given by the so-called single-door criterion. However, in many applications such a graph is not available for various reasons, but experts nevertheless have access to an abstraction of the full-time causal graph, the summary causal graph, which represents causal relations between time series while omitting temporal information. This paper presents a complete identifiability result that characterizes all cases in which the direct effect is graphically identifiable from summary causal graphs, and gives two sound finite adjustment sets that can be used to estimate the direct effect whenever it is identifiable.
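The single-door adjustment mentioned above can be illustrated on a full-time graph, the setting where identifiability is already known. The summary-graph result of the paper is not implemented here. The sketch simulates a hypothetical two-variable linear SCM with lag-1 edges $X_{t-1}\to X_t$, $Y_{t-1}\to X_t$, $X_{t-1}\to Y_t$ and $Y_{t-1}\to Y_t$. It then estimates the direct effect of $X_{t-1}$ on $Y_t$ (true value 0.5) by ordinary least squares. In this particular graph, $\{Y_{t-1}\}$ satisfies the single-door criterion. Once the edge $X_{t-1}\to Y_t$ is removed, it blocks the paths through $Y_{t-2}$ and $X_{t-2}$, while $X_t$ remains an unconditioned collider.

```python
import numpy as np

def simulate(T, rng):
    """Hypothetical stable linear dynamic SCM with standard Gaussian noise."""
    X, Y = np.zeros(T), np.zeros(T)
    for t in range(1, T):
        X[t] = 0.3 * X[t - 1] + 0.3 * Y[t - 1] + rng.standard_normal()
        Y[t] = 0.5 * X[t - 1] + 0.4 * Y[t - 1] + rng.standard_normal()
    return X, Y

def direct_effect(X, Y, adjust_for_lagged_y=True):
    """OLS coefficient of X_{t-1} when regressing Y_t on X_{t-1}
    (plus the adjustment set {Y_{t-1}} if requested)."""
    regressors = [X[:-1]] + ([Y[:-1]] if adjust_for_lagged_y else [])
    design = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(design, Y[1:], rcond=None)
    return float(coef[0])
```

Dropping $Y_{t-1}$ from the regression leaves the path $X_{t-1}\leftarrow Y_{t-2}\to Y_{t-1}\to Y_t$ open, which biases the estimate upward in this simulation.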
Lensless illumination single-pixel imaging with a multicore fiber (MCF) is a computational imaging technique that enables potential endoscopic observations of biological samples at cellular scale. In this work, we show that this technique is tantamount to collecting multiple symmetric rank-one projections (SROP) of an interferometric matrix, i.e., a matrix encoding the spectral content of the sample image. In this model, each SROP is induced by the complex sketching vector shaping the incident light wavefront with a spatial light modulator (SLM), while the projected interferometric matrix collects up to $O(Q^2)$ image frequencies for a $Q$-core MCF. While this scheme subsumes previous sensing modalities, such as raster scanning (RS) imaging with beamformed illumination, we demonstrate that collecting the measurements of $M$ random SLM configurations, and thus acquiring $M$ SROPs, allows us to estimate an image of interest if $M$ and $Q$ scale log-linearly with the image sparsity level. This demonstration is achieved both theoretically, with a specific restricted isometry analysis of the sensing scheme, and with extensive Monte Carlo experiments. On the practical side, we perform a single calibration of the sensing system that is robust to certain deviations from the theoretical model and independent of the sketching vectors used during the imaging phase. Experimental results obtained on an actual MCF system demonstrate the effectiveness of this imaging procedure on a benchmark image.
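A symmetric rank-one projection of a Hermitian matrix $H$ with sketching vector $a$ is the scalar $a^H H a=\langle aa^H, H\rangle$. The sketch below computes $M$ such measurements at once with NumPy. The random Hermitian matrix stands in for the interferometric matrix, and random complex columns stand in for the SLM-induced sketching vectors. Both are assumptions of this example, not the paper's calibrated optical model. The function name is hypothetical.

```python
import numpy as np

def srop_measurements(H, A):
    """Symmetric rank-one projections y_m = a_m^H H a_m = <a_m a_m^H, H>,
    one per column a_m of the sketching matrix A (shape Q x M)."""
    return np.einsum("im,ij,jm->m", A.conj(), H, A)
```

For a Hermitian $H$ every measurement is real. Using the canonical basis vectors as sketching vectors simply reads off the diagonal of $H$.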
The extremal theory of forbidden 0--1 matrices studies the asymptotic growth of the function $\mathrm{Ex}(P,n)$, which is the maximum weight of a matrix $A\in\{0,1\}^{n\times n}$ whose submatrices avoid a fixed pattern $P\in\{0,1\}^{k\times l}$. This theory has been wildly successful at resolving problems in combinatorics, discrete and computational geometry, structural graph theory, and the analysis of data structures, particularly corollaries of the dynamic optimality conjecture. All these applications use acyclic patterns, meaning that when $P$ is regarded as the adjacency matrix of a bipartite graph, the graph is acyclic. The biggest open problem in this area is to bound $\mathrm{Ex}(P,n)$ for acyclic $P$. Prior results have only ruled out the strict $O(n\log n)$ bound conjectured by Füredi and Hajnal. It is consistent with prior results that $\forall P. \mathrm{Ex}(P,n)\leq n\log^{1+o(1)} n$, and also consistent that $\forall \epsilon>0.\exists P. \mathrm{Ex}(P,n) \geq n^{2-\epsilon}$. In this paper we establish a stronger lower bound on the extremal functions of acyclic $P$. Specifically, we give a new construction of relatively dense 0--1 matrices with $\Theta(n(\log n/\log\log n)^t)$ 1s that avoid an acyclic pattern $X_t$. Pach and Tardos have conjectured that this type of result is the best possible, i.e., that no acyclic $P$ exists for which $\mathrm{Ex}(P,n)\geq n(\log n)^{\omega(1)}$.
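The definitions of containment and of $\mathrm{Ex}(P,n)$ above can be made concrete by exhaustive search for very small $n$. In the sketch below, $A$ contains $P$ when some $k$ rows and $l$ columns of $A$, kept in their original order, have a 1 wherever $P$ has a 1. Zeros of $P$ impose no constraint. The search enumerates all $2^{n^2}$ matrices, so it is only usable for $n\le 4$ or so and says nothing about asymptotics. Function names are hypothetical.

```python
from itertools import combinations, product

def contains(A, P):
    """True if 0-1 matrix A contains pattern P: some k rows and l columns of A
    (kept in order) have a 1 wherever P has a 1."""
    k, l = len(P), len(P[0])
    ones = [(i, j) for i in range(k) for j in range(l) if P[i][j]]
    for rows in combinations(range(len(A)), k):
        for cols in combinations(range(len(A[0])), l):
            if all(A[rows[i]][cols[j]] for i, j in ones):
                return True
    return False

def ex_bruteforce(P, n):
    """Ex(P, n) by exhaustive search over all 2^(n*n) matrices; tiny n only."""
    best = 0
    for bits in product((0, 1), repeat=n * n):
        weight = sum(bits)
        if weight <= best:  # cannot improve the current maximum
            continue
        A = [bits[r * n:(r + 1) * n] for r in range(n)]
        if not contains(A, P):
            best = weight
    return best
```

For the $2\times2$ identity pattern, which is acyclic, the search gives $\mathrm{Ex}(P,3)=5=2n-1$. For the all-ones $2\times2$ pattern, whose bipartite graph is a 4-cycle, it gives 6.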
Uniform sampling of bipartite graphs and hypergraphs with given degree sequences is necessary for building null models to statistically evaluate their topology. Because these graphs can be represented as binary matrices, the problem is equivalent to uniformly sampling $r \times c$ binary matrices with fixed row and column sums. The trade algorithm, which includes both the curveball and fastball implementations, is the state of the art for such sampling. Its mixing time is unknown, although $5r$ is currently used as a heuristic. In this paper we propose a new distribution-based approach that not only provides an estimate of the mixing time but also returns a sample of matrices that are guaranteed (within a user-chosen error tolerance) to be sampled uniformly at random. In numerical experiments on matrices that vary by size, fill, and row and column sum distributions, we find that the upper bound on mixing time is at least $10r$, and that it increases as a function of both $c$ and the fraction of cells containing a 1.
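A single trade of the curveball variant can be sketched as follows, assuming the binary matrix is stored as a list of row sets of column indices. The trade picks two rows, pools the columns that appear in exactly one of them, shuffles the pool, and redistributes it so that each row keeps its original size. This preserves all row and column sums. How many such trades are needed (for example, the $5r$ heuristic) is exactly the open mixing-time question, so the sketch carries no uniformity guarantee. Function names are hypothetical.

```python
import random

def curveball_trade(rows, rng=random):
    """One curveball trade on a binary matrix stored as a list of column-index sets.
    Modifies rows in place; row sums and column sums are preserved."""
    i, j = rng.sample(range(len(rows)), 2)
    shared = rows[i] & rows[j]
    only_i = rows[i] - rows[j]
    # Columns present in exactly one of the two rows can be traded.
    pool = list(only_i | (rows[j] - rows[i]))
    rng.shuffle(pool)
    rows[i] = shared | set(pool[:len(only_i)])
    rows[j] = shared | set(pool[len(only_i):])

def margins(rows, num_cols):
    """Row sums and column sums of the matrix."""
    row_sums = [len(r) for r in rows]
    col_sums = [sum(c in r for r in rows) for c in range(num_cols)]
    return row_sums, col_sums
```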
Using Stein's method, we derive upper bounds on the Wasserstein distance ($W_1$), with respect to the $\sup$-norm, between any continuous $\mathbb{R}^d$-valued random field indexed by the $n$-sphere and the Gaussian. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed using powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron-Martin or Reproducing Kernel Hilbert Space. This feature enables us to move beyond the one-dimensional interval-based index sets previously considered in the literature. Specializing our general result, we obtain the first bounds, at the random field level, on the Gaussian random field approximation of wide random neural networks of any depth with Lipschitz activation functions. Our bounds are explicitly expressed in terms of the widths of the network and moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.
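In the neural-network application, the limiting object is a Gaussian random field whose covariance is determined by the activation function. The Monte Carlo sketch below assumes a one-hidden-layer ReLU network with i.i.d. standard Gaussian weights, evaluated at points of the circle $S^1$. ReLU is Lipschitz, as the result requires. The sketch compares the empirical covariance of the network outputs with the degree-1 arc-cosine kernel $k(\theta)=\frac{1}{2\pi}\left(\sin\theta+(\pi-\theta)\cos\theta\right)$. It checks second moments only and does not compute the $W_1$ bounds. Function names are hypothetical.

```python
import numpy as np

def relu_arccos_kernel(theta):
    """E[relu(w.x) relu(w.y)] for w ~ N(0, I) and unit vectors x, y at angle theta."""
    return (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def sample_network_outputs(points, width, num_networks, rng):
    """Outputs f(x) = width^{-1/2} sum_i v_i relu(w_i . x) of independent random
    one-hidden-layer networks with N(0, 1) weights; shape (num_networks, num_points)."""
    d = points.shape[1]
    W = rng.standard_normal((num_networks, width, d))
    v = rng.standard_normal((num_networks, width))
    hidden = np.maximum(W @ points.T, 0.0)  # (num_networks, width, num_points)
    return np.einsum("sw,swp->sp", v, hidden) / np.sqrt(width)
```

With Gaussian output weights, the covariance already matches the kernel at any width. Width governs how close the joint law is to Gaussian, which is what the bounds quantify.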
PyVBMC is a Python implementation of the Variational Bayesian Monte Carlo (VBMC) algorithm for posterior and model inference for black-box computational models (Acerbi, 2018, 2020). VBMC is an approximate inference method designed for efficient parameter estimation and model assessment when model evaluations are mildly-to-very expensive (e.g., a second or more) and/or noisy. Specifically, VBMC computes:
- a flexible (non-Gaussian) approximate posterior distribution of the model parameters, from which statistics and posterior samples can be easily extracted;
- an approximation of the model evidence or marginal likelihood, a metric used for Bayesian model selection.

PyVBMC can be applied to any computational or statistical model with up to roughly 10-15 continuous parameters, the only requirement being that the user can provide a Python function that computes the target log likelihood of the model, or an approximation thereof (e.g., an estimate of the likelihood obtained via simulation or Monte Carlo methods). PyVBMC is particularly effective when the model takes more than about a second per evaluation, with dramatic speed-ups of 1-2 orders of magnitude compared to traditional approximate inference methods. Extensive benchmarks on both artificial test problems and a large number of real models from the computational sciences, particularly computational and cognitive neuroscience, show that VBMC generally, and often vastly, outperforms alternative methods for sample-efficient Bayesian inference, and is applicable to both exact and simulator-based models (Acerbi, 2018, 2019, 2020). PyVBMC brings this state-of-the-art inference algorithm to Python, along with an easy-to-use Pythonic interface for running the algorithm and manipulating and visualizing its results.
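The only user-supplied ingredient is a Python function that returns the target log density. The sketch below shows such a function for a hypothetical toy model: Gaussian data with unknown mean $\mu$ and log standard deviation. It adds standard normal priors to the log likelihood the abstract mentions, which is an assumption of this example. It uses only NumPy and does not call PyVBMC itself.

```python
import numpy as np

LOG_2PI = np.log(2 * np.pi)

def log_joint(theta, data):
    """Unnormalized log posterior for a hypothetical toy model.
    theta = [mu, log_sigma]; data ~ N(mu, sigma^2); standard normal priors on both."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    z = (data - mu) / sigma
    log_lik = np.sum(-0.5 * z**2 - log_sigma - 0.5 * LOG_2PI)  # Gaussian log likelihood
    log_prior = -0.5 * (mu**2 + log_sigma**2) - LOG_2PI        # two N(0, 1) priors
    return float(log_lik + log_prior)
```

In PyVBMC, a function of this kind, such as `lambda theta: log_joint(theta, data)`, is what gets passed to the `VBMC` object before calling its `optimize()` method. Consult the PyVBMC documentation for the exact constructor arguments (starting point and hard and plausible bounds).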
Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has been shown to be important, and several works consider fields in their models. In this paper, we propose a novel approach to model field information effectively and efficiently. The proposed approach is a direct improvement of FwFM and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new interpretation of FM and FwFM within the FmFM framework and compare it with FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of embedding vectors, which act as soft pruning. We also propose an efficient way to minimize the dimension while maintaining model performance. The FmFM model can be optimized further by caching the intermediate vectors, and it takes only thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that it can outperform FFM, which is more complex. The FmFM model's performance is also comparable to that of DNN models, which require many more FLOPs at runtime.
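One natural reading of the description above writes the FmFM interaction between fields $i$ and $j$ as $v_i^\top M_{ij} v_j$. Here $M_{ij}$ is a learnable field-pair matrix of shape $d_i\times d_j$, so embedding dimensions may differ by field. The NumPy sketch below computes the logit for a single example, assuming one active feature per field. Names are hypothetical. Setting $M_{ij}=I$ recovers FM, and $M_{ij}=r_{ij}I$ recovers FwFM, matching the interpretation stated in the abstract.

```python
import numpy as np

def fmfm_logit(embeddings, field_matrices, bias=0.0, linear_terms=()):
    """embeddings: list of 1-D arrays (active feature's embedding in each field);
    field_matrices: dict mapping (i, j), i < j, to an array of shape (d_i, d_j)."""
    logit = bias + float(np.sum(linear_terms))
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            # v_i^T M_ij v_j: M_ij maps field i's embedding into field j's space.
            logit += embeddings[i] @ field_matrices[(i, j)] @ embeddings[j]
    return float(logit)
```

The product $v_i^\top M_{ij}$ depends only on field $i$'s feature and the field pair, which makes it a natural candidate for the cached intermediate vectors. Whether this matches the paper's caching scheme is an assumption.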