The membership and threshold problems for recurrence sequences are fundamental open decision problems in automated verification. The former asks whether a chosen target is an element of a sequence, whilst the latter asks whether every term in a sequence is bounded from below by a given value. A rational-valued sequence $\langle u_n \rangle_n$ is hypergeometric if it satisfies a first-order linear recurrence of the form $p(n)u_{n+1} = q(n)u_{n}$ with polynomial coefficients $p,q\in\mathbb{Z}[x]$. In this note we establish decidability results for both problems for restricted classes of hypergeometric sequences. For example, we establish decidability under the assumption that the polynomial coefficients $p,q\in\mathbb{Z}[x]$ are monic and split over an imaginary quadratic extension of $\mathbb{Q}$. We also establish a conditional decidability result: assuming Schanuel's conjecture, both problems are decidable when the irreducible factors of the monic polynomial coefficients $p,q\in\mathbb{Z}[x]$ are either linear or quadratic.
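As a concrete illustration of the definition (our example, not taken from the note itself): the sequence $u_n = 1/n!$ is hypergeometric, since

$$(n+1)\,u_{n+1} = u_{n}, \qquad p(x) = x+1, \quad q(x) = 1.$$

For this sequence, the membership problem asks whether a given rational target $t$ equals $1/n!$ for some $n$, and the threshold problem asks whether $u_n \geq c$ holds for every $n$, for a given value $c$.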
We consider the Sobolev embedding operator $E_s : H^s(\Omega) \to L_2(\Omega)$ and its role in the solution of inverse problems. In particular, we collect various properties and investigate different characterizations of its adjoint operator $E_s^*$, which is a common component in both iterative and variational regularization methods. These include variational representations and connections to boundary value problems, Fourier and wavelet representations, as well as connections to spatial filters. Moreover, we consider characterizations in terms of Fourier series, singular value decompositions and frame decompositions, as well as representations in finite-dimensional settings. While many of these results are already known to researchers from different fields, a detailed and general overview or reference work containing rigorous mathematical proofs is still missing. Hence, in this paper we aim to fill this gap by collecting, introducing and generalizing a large number of characterizations of $E_s^*$, and by discussing their use in regularization methods for solving inverse problems. The resulting compilation can serve both as a reference and as a useful guide for its efficient numerical implementation in practice.
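For orientation, a minimal worked case (our illustration; the paper treats general domains $\Omega$ and many further characterizations): on $\Omega = \mathbb{R}^d$ with the inner product $\langle u, w \rangle_{H^s} = \int (1+|\xi|^2)^s \hat{u}(\xi) \overline{\hat{w}(\xi)}\, d\xi$, the defining relation

$$\langle E_s u, v \rangle_{L_2} = \langle u, E_s^* v \rangle_{H^s} \quad \text{for all } u \in H^s(\mathbb{R}^d)$$

yields the explicit Fourier-multiplier representation

$$\widehat{E_s^* v}(\xi) = (1+|\xi|^2)^{-s}\, \hat{v}(\xi),$$

so that $E_s^*$ acts as a smoothing low-pass filter, in line with the spatial-filter connection mentioned above.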
This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a. ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we provide an instance-wise matching risk lower bound for GLM-tron. Together, our upper and lower risk bounds give a sharp characterization of the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron, up to constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation. These results together suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.
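For reference, a minimal sketch of the GLM-tron update specialized to ReLU regression (the unit step size, full-batch form and fixed iteration count are our simplifications; the analyzed variant may differ in such details):

import numpy as np

def glmtron_relu(X, y, n_iters=100):
    # X: (n, d) design matrix; y: (n,) responses.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        preds = np.maximum(X @ w, 0.0)     # ReLU predictions
        w = w + (X.T @ (y - preds)) / n    # Perceptron-type update
    return w

Note that, unlike (S)GD on the squared loss, the update omits the derivative of the ReLU; this structural difference underlies the comparison between GLM-tron and SGD above.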
We develop a new model for spatial random field reconstruction of a binary-valued spatial phenomenon. In our model, sensors are deployed in a wireless sensor network across a large geographical region. Each sensor measures a non-Gaussian inhomogeneous temporal process which depends on the spatial phenomenon. Two types of sensors are employed: one collects point observations at specific time points, while the other collects integral observations over time intervals. Subsequently, the sensors transmit these time-series observations to a Fusion Center (FC), and the FC infers the spatial phenomenon from these observations. We show that the resulting posterior predictive distribution is intractable and develop a tractable two-step procedure to perform inference. Firstly, we develop algorithms to perform approximate Likelihood Ratio Tests on the time-series observations, compressing them to a single bit for both point sensors and integral sensors. Secondly, once the compressed observations are transmitted to the FC, we utilize a Spatial Best Linear Unbiased Estimator (S-BLUE) to reconstruct the binary spatial random field at any desired spatial location. The performance of the proposed approach is studied using simulation. We further illustrate the effectiveness of our method using a weather dataset from the National Environment Agency (NEA) of Singapore with fields including temperature and relative humidity.
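As a sketch of the second step (the generic best-linear-unbiased-estimator form; the mean and covariance inputs below are placeholders that the paper derives for the compressed one-bit observations):

import numpy as np

def s_blue(b, mu_b, mu_star, C_star_obs, C_obs):
    # b          : (m,) one-bit decisions received at the FC
    # mu_b       : (m,) prior mean of the observations
    # mu_star    : prior mean of the field at the target location
    # C_star_obs : (m,) cross-covariance of target field value and observations
    # C_obs      : (m, m) covariance matrix of the observations
    weights = np.linalg.solve(C_obs, C_star_obs)   # BLUE weights
    return mu_star + weights @ (b - mu_b)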
A series-parallel matrix is a binary matrix that can be obtained from an empty matrix by successively adjoining rows or columns that are parallel to an existing row/column or have at most one 1-entry. Equivalently, series-parallel matrices are representation matrices of graphic matroids of series-parallel graphs, which can be recognized in linear time. We propose an algorithm that, for an $m \times n$ matrix $A$ with $k$ nonzeros, determines in expected $\mathcal{O}(m + n + k)$ time whether $A$ is series-parallel, or returns a minimal non-series-parallel submatrix of $A$. We complement the algorithm with an efficient implementation and report on computational results.
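The definition suggests a simple reduction-based recognition sketch (ours, for illustration only: it runs in roughly quadratic time and does not extract a minimal non-series-parallel submatrix, whereas the proposed algorithm achieves expected $\mathcal{O}(m+n+k)$ time):

import numpy as np

def is_series_parallel(A):
    # Repeatedly delete a row/column that has at most one 1-entry or is
    # parallel to (i.e., a duplicate of) another row/column; A is
    # series-parallel iff it reduces to an empty matrix.
    A = np.array(A, dtype=bool)
    changed = True
    while changed and A.size:
        changed = False
        for axis in (0, 1):
            M = A if axis == 0 else A.T
            lines = [tuple(r) for r in M]
            for i, r in enumerate(lines):
                if sum(r) <= 1 or lines.count(r) > 1:
                    A = np.delete(A, i, axis=axis)
                    changed = True
                    break
            if changed:
                break
    return A.size == 0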
Parameter selection in high-dimensional models is typically fine-tuned in a way that keeps the (relative) number of false positives under control, because otherwise the few true positives may be dominated by the many possible false positives. This happens, for instance, when the selection follows from a naive optimisation of an information criterion, such as AIC or Mallows's Cp. It can be argued that the overestimation comes from the optimisation process itself changing the statistics of the selected variables, so that the information criterion no longer reflects the true divergence between the selection and the data-generating process. In the lasso, the overestimation can also be linked to the shrinkage estimator, which makes the selection too tolerant of false positives. For these reasons, this paper works on refined information criteria that carefully balance false positives and false negatives, for use with estimators without shrinkage. In particular, the paper develops corrected Mallows's Cp criteria for structured selection in trees and graphical models.
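For context, the classical criterion being refined (standard form; $S$ a candidate selection of $k$ variables, $\mathrm{RSS}_S$ its residual sum of squares, $\sigma^2$ the noise variance, $n$ the sample size):

$$C_p(S) = \frac{\mathrm{RSS}_S}{\sigma^2} - n + 2k.$$

Naive minimisation of such a criterion over all candidate selections is precisely the optimisation process that distorts the statistics of the selected variables, as described above.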
The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study of the problem have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting from both a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, this fundamental limitation disappears completely. We prove that with an arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structure. Our work brings a new perspective to stochastic network models and to semidefinite programming research.
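For concreteness, in the standard two-community formulation (a textbook statement, included here for orientation): with intra- and inter-community edge probabilities $a/n$ and $b/n$, the Kesten-Stigum threshold reads

$$(a-b)^2 > 2(a+b);$$

above it, detection from the graph alone is possible, while below it no estimator beats chance in the sparse unsupervised regime. The semi-supervised results above show that revealing any fraction of labels removes this barrier.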
For a given function $F$ from $\mathbb F_{p^n}$ to itself, determining whether there exists a function which is CCZ-equivalent but EA-inequivalent to $F$ is an important and interesting problem. For example, K\"olsch \cite{KOL21} showed that there is no function which is CCZ-equivalent but EA-inequivalent to the inverse function. On the other hand, for the Gold function $F(x)=x^{2^i+1}$ and the function $F(x)=x^3+{\rm Tr}(x^9)$ over $\mathbb F_{2^n}$, Budaghyan, Carlet and Pott (respectively, Budaghyan, Carlet and Leander) \cite{BCP06, BCL09FFTA} found functions which are CCZ-equivalent but EA-inequivalent to $F$. In this paper, when a given function $F$ has a component function admitting a linear structure, we present functions which are CCZ-equivalent to $F$ and which, under suitable conditions, are shown to be EA-inequivalent to $F$. As a consequence, for every quadratic function $F$ on $\mathbb F_{2^n}$ ($n\geq 4$) with nonlinearity $>0$ and differential uniformity $\leq 2^{n-3}$, we explicitly construct functions which are CCZ-equivalent but EA-inequivalent to $F$. Also, for every non-planar quadratic function on $\mathbb F_{p^n}$ $(p>2, n\geq 4)$ with $|\mathcal W_F|\leq p^{n-1}$ and differential uniformity $\leq p^{n-3}$, we explicitly construct functions which are CCZ-equivalent but EA-inequivalent to $F$.
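For readers outside the area, the two notions at issue (standard definitions): functions $F, G : \mathbb F_{p^n} \to \mathbb F_{p^n}$ are CCZ-equivalent if some affine permutation $\mathcal A$ of $\mathbb F_{p^n}^2$ maps the graph of $F$ onto the graph of $G$,

$$\mathcal A\big(\{(x, F(x)) : x \in \mathbb F_{p^n}\}\big) = \{(x, G(x)) : x \in \mathbb F_{p^n}\},$$

and EA-equivalent if $G = A_1 \circ F \circ A_2 + A$ for affine permutations $A_1, A_2$ and an affine map $A$. EA-equivalence implies CCZ-equivalence, and the question above is precisely when the converse fails.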
This paper presents a novel approach to Bayesian nonparametric spectral analysis of stationary multivariate time series. Starting with a parametric vector-autoregressive model, the parametric likelihood is nonparametrically adjusted in the frequency domain to account for potential deviations from the parametric assumptions. We show mutual contiguity of the nonparametrically corrected likelihood, the multivariate Whittle likelihood approximation and the exact likelihood for Gaussian time series. A multivariate extension of the nonparametric Bernstein-Dirichlet process prior for univariate spectral densities to the space of Hermitian positive definite spectral density matrices is specified directly on the correction matrices. An infinite series representation of this prior is then used to develop a Markov chain Monte Carlo algorithm to sample from the posterior distribution. The code is made publicly available for ease of use and reproducibility. With this novel approach we generalize the multivariate Whittle-likelihood-based method of Meier et al. (2020) and extend the nonparametrically corrected likelihood for univariate stationary time series of Kirch et al. (2019) to the multivariate case. We demonstrate that the nonparametrically corrected likelihood combines the efficiency of a parametric model with the robustness of a nonparametric one. Its numerical accuracy is illustrated in a comprehensive simulation study, and we demonstrate its practical advantages by a spectral analysis of two environmental time series data sets: a bivariate time series of the Southern Oscillation Index and fish recruitment, and wind speed data at six locations in California.
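The parametric backbone being corrected is the multivariate Whittle approximation; in its standard form (normalization conventions vary), with suitably normalized discrete Fourier transform coefficients $Z_k$ at the Fourier frequencies $\lambda_k$ and spectral density matrix $S(\lambda)$,

$$p_W(Z \mid S) \propto \prod_k \det\big(S(\lambda_k)\big)^{-1} \exp\!\big(-Z_k^* S(\lambda_k)^{-1} Z_k\big),$$

which treats the $Z_k$ as independent complex Gaussian vectors.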
We propose a model to flexibly estimate joint tail properties by exploiting the convergence of an appropriately scaled point cloud onto a compact limit set. Characteristics of the shape of the limit set correspond to key tail dependence properties. We directly model the shape of the limit set using B\'ezier splines, which allow flexible and parsimonious specification of shapes in two dimensions. We then fit the B\'ezier splines to data in pseudo-polar coordinates using Markov chain Monte Carlo, utilizing a limiting approximation to the conditional likelihood of the radii given the angles. By imposing appropriate constraints on the parameters of the B\'ezier splines, we guarantee that each posterior sample is a valid limit set boundary, allowing direct posterior analysis of any quantity derived from the shape of the curve. Furthermore, we obtain interpretable inference on the asymptotic dependence class by using mixture priors with point masses on the corner of the unit box. Finally, we apply our model to bivariate datasets of extremes of variables related to fire risk and air pollution.
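For reference, the spline building block (standard definition; the paper's contribution lies in the constraints placed on the control points so that the spline is a valid limit-set boundary): a B\'ezier segment of degree $m$ with control points $P_0, \dots, P_m \in \mathbb{R}^2$ is

$$B(t) = \sum_{j=0}^{m} \binom{m}{j} (1-t)^{m-j}\, t^{j}\, P_j, \qquad t \in [0,1],$$

so a small number of control points parsimoniously encodes the shape of the boundary.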
The generalization mystery in deep learning is the following: why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? We argue that the answer to both questions lies in the interaction of the gradients of different examples during training. Intuitively, if the per-example gradients are well-aligned, that is, if they are coherent, then one may expect GD to be (algorithmically) stable, and hence generalize well. We formalize this argument with an easy-to-compute and interpretable metric for coherence, and show that the metric takes on very different values on real and random datasets for several common vision networks. The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others, why early stopping works, and why it is possible to learn from noisy labels. Moreover, since the theory provides a causal explanation of how GD finds a well-generalizing solution when one exists, it motivates a class of simple modifications to GD that attenuate memorization and improve generalization. Generalization in deep learning is an extremely broad phenomenon, and therefore it requires an equally general explanation. We conclude with a survey of alternative lines of attack on this problem, and argue on this basis that the proposed approach is the most viable one.
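As one simple instantiation of such a metric (our illustration; not necessarily the exact quantity defined in the paper): the squared norm of the average per-example gradient, relative to the average squared norm of the individual gradients,

import numpy as np

def coherence(per_example_grads):
    # per_example_grads: (m, p) array, one flattened gradient per example.
    # Returns roughly 1/m for mutually orthogonal gradients of equal norm
    # and 1 for perfectly aligned gradients.
    G = np.asarray(per_example_grads)
    mean_grad = G.mean(axis=0)
    return float(mean_grad @ mean_grad / np.mean(np.sum(G * G, axis=1)))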