Recently, it was discovered that for a given function class $\mathbf{F}$ the error of best linear recovery in the square norm can be bounded above by the Kolmogorov width of $\mathbf{F}$ in the uniform norm. That analysis is based on deep results on the discretization of the square norm of functions from finite-dimensional subspaces. In this paper we show how very recent results on universal discretization of the square norm of functions from a collection of finite-dimensional subspaces lead to an inequality between optimal sparse recovery in the square norm and best sparse approximations in the uniform norm with respect to appropriate dictionaries.
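Schematically (in our notation, and up to the precise constants and assumptions of the cited works), the known linear result reads
$$ \varrho_{bn}(\mathbf{F}, L_2) \le B\, d_n(\mathbf{F}, L_\infty), $$
where $\varrho_m$ denotes the error of optimal recovery from $m$ function samples, $d_n$ the Kolmogorov width, and $b, B$ are absolute constants; the sparse analogue obtained here replaces $d_n$ by the error of best $n$-term approximation with respect to a dictionary.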
Many algorithms that exactly solve hard problems require branching on more or less complex structures in order to do their job. Those who design such algorithms often find themselves carrying out a meticulous analysis of numerous cases in order to identify these structures and design suitable branching rules, all by hand. This process tends to be error-prone, and the resulting algorithm may be difficult to implement in practice. In this work, we aim to automate part of this process while focusing on the simplicity of the resulting implementation. We showcase our approach on the following problem. For a constant $d$, the $d$-Path Vertex Cover problem ($d$-PVC) is as follows: given an undirected graph and an integer $k$, find a subset of at most $k$ vertices of the graph whose deletion results in a graph not containing a path on $d$ vertices as a subgraph. We develop a fully automated framework to generate parameterized branching algorithms for the problem and obtain algorithms outperforming those previously known for $3 \le d \le 8$. For example, we show that $5$-PVC can be solved in $O(2.7^k\cdot n^{O(1)})$ time.
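To make the branching idea concrete, here is a minimal Python sketch (our own illustration, not output of the paper's framework) of the naive branching scheme that such frameworks start from: find any path on $d$ vertices and branch on which of its vertices to delete, giving a simple $O(d^k \cdot n^{O(1)})$ algorithm that generated rules then improve upon.

```python
def find_path(adj, d):
    """Return a simple path on d vertices (as a list), or None (DFS enumeration)."""
    def dfs(path, visited):
        if len(path) == d:
            return list(path)
        for w in adj[path[-1]]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                res = dfs(path, visited)
                if res:
                    return res
                path.pop()
                visited.remove(w)
        return None
    for v in adj:
        res = dfs([v], {v})
        if res:
            return res
    return None

def d_pvc(adj, d, k):
    """Naive O(d^k) branching: True iff deleting some <= k vertices kills all P_d's."""
    path = find_path(adj, d)
    if path is None:
        return True
    if k == 0:
        return False
    for v in path:  # any solution must delete at least one vertex of this path
        sub = {u: [w for w in nbrs if w != v] for u, nbrs in adj.items() if u != v}
        if d_pvc(sub, d, k - 1):
            return True
    return False
```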
This paper proposes a novel signed $\beta$-model for directed signed networks, which are frequently encountered in application domains but largely neglected in the literature. The proposed signed $\beta$-model decomposes a directed signed network into the difference of two unsigned networks and embeds each node with two latent factors for its in-status and out-status. The presence of negative edges leads to a non-concave log-likelihood, and a one-step estimation algorithm is developed to facilitate parameter estimation, which is efficient both theoretically and computationally. We also develop an inferential procedure for pairwise and multiple node comparisons under the signed $\beta$-model, filling the gap left by the lack of uncertainty quantification for node ranking. Theoretical results are established for the coverage probabilities of the confidence intervals, as well as for false discovery rate (FDR) control in multiple node comparisons. The finite-sample performance of the signed $\beta$-model is also examined through extensive numerical experiments on both synthetic and real-life networks.
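As a minimal illustration of the decomposition (our own sketch; the paper's exact likelihood and estimator may differ), a signed adjacency matrix can be split into two unsigned networks, and a directed $\beta$-model attaches an out-status parameter $\alpha_i$ and an in-status parameter $\beta_j$ to each edge logit:

```python
import numpy as np

def decompose_signed(A):
    """Split a signed adjacency matrix into unsigned parts with A = A_plus - A_minus."""
    A_plus = (A > 0).astype(int)
    A_minus = (A < 0).astype(int)
    return A_plus, A_minus

def edge_probs(alpha, beta):
    """Directed beta-model edge probabilities: logit p_ij = alpha_i + beta_j
    (one natural parameterization; illustrative only)."""
    logits = alpha[:, None] + beta[None, :]
    return 1.0 / (1.0 + np.exp(-logits))
```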
We study sampling problems associated with potentials that lack smoothness; the potentials can be either convex or non-convex. Departing from the standard smooth setting, the potentials are only assumed to be weakly smooth or non-smooth, or a sum of multiple such functions. For this challenging sampling task, we develop a sampling algorithm that resembles proximal algorithms from optimization. Our algorithm is based on a special case of Gibbs sampling known as the alternating sampling framework (ASF). The key contribution of this work is a practical realization of the ASF based on rejection sampling for both non-convex and convex potentials that are not necessarily smooth. In almost all the sampling settings considered in this work, our proximal sampling algorithm achieves better complexity than all existing methods.
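A minimal sketch of the ASF, assuming a known lower bound `f_min` on the potential so that the restricted Gaussian oracle (RGO) can be realized by a simple rejection step (names and the naive proposal are our illustration; the paper's realization is more refined):

```python
import numpy as np

def rgo_rejection(f, y, eta, rng, f_min=0.0):
    """Sample x | y with density ∝ exp(-f(x)) * N(x; y, eta*I) by rejection.
    Proposal N(y, eta*I); accept w.p. exp(-(f(x) - f_min)), valid if f >= f_min.
    Efficiency degrades when the bound f_min is loose."""
    while True:
        x = y + np.sqrt(eta) * rng.standard_normal(y.shape)
        if rng.random() < np.exp(-(f(x) - f_min)):
            return x

def asf_sampler(f, x0, eta, n_iters, rng):
    """Alternating sampling framework for pi(x) ∝ exp(-f(x)):
    alternate y | x ~ N(x, eta*I) and x | y via the RGO."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        y = x + np.sqrt(eta) * rng.standard_normal(x.shape)
        x = rgo_rejection(f, y, eta, rng)
    return x
```

For instance, `asf_sampler(lambda x: np.abs(x).sum(), np.zeros(2), 0.1, 1000, np.random.default_rng(0))` targets a (non-smooth) Laplace-type density, where `f_min = 0` is exact.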
Linear wave equations sourced by a Dirac delta distribution $\delta(x)$ and its derivative(s) can serve as a model for many different phenomena. We describe a discontinuous Galerkin (DG) method to numerically solve such equations with source terms proportional to $\partial^n \delta /\partial x^n$. Despite the presence of singular source terms, which imply discontinuous or potentially singular solutions, our DG method achieves global spectral accuracy even at the source's location. Our DG method is developed for the wave equation written in fully first-order form. The first-order reduction is carried out using a distributional auxiliary variable that removes some of the source term's singular behavior. While this is helpful numerically, it gives rise to a distributional constraint. We show that a time-independent spurious solution can develop if the initial constraint violation is proportional to $\delta(x)$. Numerical experiments verify this behavior and our scheme's convergence properties by comparing against exact solutions.
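As a schematic illustration (our notation; the paper's precise reduction differs in details), the model problem has the form
$$ \partial_t^2 \psi - \partial_x^2 \psi = s(t)\,\frac{\partial^n \delta(x)}{\partial x^n}, $$
and the first-order reduction introduces $\pi = \partial_t \psi$ together with an auxiliary variable $\phi$ that agrees with $\partial_x \psi$ only up to a distributional correction. The difference $\phi - \partial_x \psi$ is then subject to a constraint that must be preserved during evolution; an initial violation of this constraint proportional to $\delta(x)$ produces the time-independent spurious solution mentioned above.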
In this paper, we discuss some numerical realizations of Shannon's sampling theorem. First, we show the poor convergence of classical Shannon sampling sums by presenting sharp upper and lower bounds on the norm of the Shannon sampling operator. In addition, it is known that in the presence of noise in the samples of a bandlimited function, the convergence of the Shannon sampling series may even break down completely. To overcome these drawbacks, one can use oversampling and regularization with a convenient window function. Such a window function can be chosen either in the frequency domain or in the time domain. We place particular emphasis on comparing these two approaches in terms of error decay rates. It turns out that the best numerical results are obtained by oversampling and regularization in the time domain using a $\sinh$-type window function or a continuous Kaiser--Bessel window function, which results in an interpolating approximation with localized sampling. Several numerical experiments illustrate the theoretical results.
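For concreteness, here is a minimal Python sketch of time-domain regularization with localized sampling and a $\sinh$-type window (our own illustration; in the literature the shape parameter $\beta$ is typically tied to the truncation parameter $m$ and the oversampling rate):

```python
import numpy as np

def sinh_window(t, beta):
    """sinh-type window, supported on |t| <= 1."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.sinh(beta * np.sqrt(1 - t[inside] ** 2)) / np.sinh(beta)
    return out

def regularized_shannon(f, x, h, m, beta):
    """Localized, windowed Shannon sum at a point x: only the 2m+1 samples
    nearest x enter (f must accept numpy arrays; h is the sample spacing)."""
    k0 = int(np.floor(x / h))
    ks = np.arange(k0 - m, k0 + m + 1)
    t = (x - ks * h) / h
    return np.sum(f(ks * h) * np.sinc(t) * sinh_window(t / m, beta))
```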
In this paper, practically computable low-order approximations of potentially high-dimensional differential equations driven by geometric rough paths are proposed and investigated. In particular, we study equations that cover the linear setting, but we also allow for a certain type of dissipative nonlinearity in the drift. As a first step, a linear subspace is found that contains the solution space of the underlying rough differential equation (RDE). This subspace is associated with the covariances of linear Itô stochastic differential equations, a connection established by exploiting a Gronwall lemma for matrix differential equations. Orthogonal projection onto the identified subspace leads to a first exact reduced-order system. Second, a linear map of the RDE solution (the quantity of interest) is analyzed for redundant information, i.e., state variables are identified that do not contribute to the quantity of interest. Once more, a link to Itô stochastic differential equations is used. Removing such unnecessary information from the RDE provides a further dimension reduction without introducing any error. Finally, we discretize a linear parabolic rough partial differential equation in space. The resulting high-dimensional RDE is subsequently tackled with the exact reduction techniques studied in this paper. We illustrate the enormous potential for complexity reduction in the corresponding numerical experiments.
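As a schematic illustration of the projection step (our own sketch, assuming a symmetric positive semidefinite Gramian $P$ obtained from the associated Itô SDE covariances), the exact reduced basis is an orthonormal basis of $\operatorname{range}(P)$:

```python
import numpy as np

def exact_projection_basis(P, tol=1e-12):
    """Orthonormal basis V of range(P) for a symmetric PSD Gramian P.
    If the RDE solution stays in range(P), the projection x ≈ V z is lossless."""
    w, V = np.linalg.eigh(P)
    keep = w > tol * max(w.max(), np.finfo(float).tiny)  # guard for P = 0
    return V[:, keep]

# Reduced dynamics (schematic, linear part only): with x ≈ V z, the system
#   z' = (V.T @ A @ V) z + (V.T @ B) u
# reproduces the full dynamics whenever range(P) contains the solution space.
```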
Consider the family of power divergence statistics based on $n$ trials, each leading to one of $r$ possible outcomes. This family includes the log-likelihood ratio and Pearson's statistic as important special cases. It is known that in certain regimes (e.g., when $r$ is of order $n^2$ and the allocation is asymptotically uniform as $n\to\infty$) the power divergence statistic converges in distribution to a linear transformation of a Poisson random variable. We establish explicit error bounds in the Kolmogorov (or uniform) metric to complement this convergence result; these bounds may be applied for any values of $n$, $r$ and the index parameter $\lambda$ for which such a finite-sample bound is meaningful. We further use this Poisson approximation result to derive error bounds for the Gaussian approximation of the power divergence statistics.
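For reference, in the standard Cressie--Read parameterization the family is
$$ T_\lambda = \frac{2}{\lambda(\lambda+1)} \sum_{i=1}^{r} X_i\left[\left(\frac{X_i}{n p_i}\right)^{\lambda} - 1\right], $$
where $X_i$ is the count of outcome $i$ and $p_i$ its cell probability; the limit $\lambda \to 0$ recovers the log-likelihood ratio statistic $2\sum_i X_i \log\!\big(X_i/(n p_i)\big)$, and $\lambda = 1$ recovers Pearson's chi-square statistic.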
Singularly perturbed problems present inherent difficulties due to the presence of a thin boundary layer in their solutions. To overcome this difficulty, we propose using deep operator networks (DeepONets), a method previously shown to be effective in approximating nonlinear operators between infinite-dimensional Banach spaces. In this paper, we demonstrate for the first time the application of DeepONets to one-dimensional singularly perturbed problems, achieving promising results that suggest their potential as a robust tool for solving this class of problems. We consider the convergence rate of the approximation error incurred by the operator networks in approximating the solution operator, and we examine the generalization gap and the empirical risk, all of which are shown to converge uniformly with respect to the perturbation parameter. By utilizing Shishkin mesh points as the collocation points of the loss function, we conduct several numerical experiments that provide further support for the effectiveness of operator networks in capturing the singular boundary layer behavior.
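A minimal sketch of a Shishkin mesh (here for a single boundary layer at $x = 1$; the layer location and the constant in the transition point depend on the problem class):

```python
import numpy as np

def shishkin_mesh(n, eps, sigma=2.0):
    """Piecewise-uniform Shishkin mesh on [0, 1] with n even:
    transition point tau = 1 - min(1/2, sigma * eps * ln n),
    n/2 uniform cells on [0, tau] and n/2 on [tau, 1]."""
    assert n % 2 == 0
    tau = 1.0 - min(0.5, sigma * eps * np.log(n))
    left = np.linspace(0.0, tau, n // 2 + 1)
    right = np.linspace(tau, 1.0, n // 2 + 1)
    return np.concatenate([left, right[1:]])
```

Half of the points are thus concentrated in the $O(\varepsilon \ln n)$-wide layer region, which is what lets collocation at these points resolve the layer uniformly in $\varepsilon$.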
Large language models (LLMs) have demonstrated remarkable capabilities out of the box for a wide range of applications, yet accuracy remains a major area for improvement, especially in mission-critical domains such as biomedicine. An effective method to calibrate the confidence level of LLM responses is essential for automatically detecting errors and facilitating human-in-the-loop verification. An important source of calibration signals is expert-stipulated programmatic supervision, which is often available at low cost but has its own limitations, such as noise and limited coverage. In this paper, we introduce a Pareto optimal self-supervision framework that can leverage available programmatic supervision to systematically calibrate LLM responses by producing a risk score for every response, without any additional manual effort. This is accomplished by learning a harmonizer model that aligns the LLM output with other available supervision sources, assigning higher risk scores to more uncertain LLM responses and facilitating error correction. Experiments on standard relation extraction tasks in the biomedical and general domains demonstrate the promise of this approach, with our proposed risk scores highly correlated with the actual error rate of the LLMs. For the most uncertain test instances, dynamic prompting based on our proposed risk scores yields significant accuracy improvements for off-the-shelf LLMs, boosting GPT-3 results past state-of-the-art (SOTA) weak supervision and GPT-4 results past SOTA supervised results on challenging evaluation datasets.
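A minimal sketch of how such a risk score can be read off a trained harmonizer (function and variable names are our own illustration; the paper's scalarization and harmonizer architecture may differ):

```python
import numpy as np

def scalarized_loss(probs, llm_label, weak_labels, weights):
    """Per-example multi-source objective: a weighted sum of cross-entropies of
    the harmonizer's class probabilities against the LLM label and each weak
    source; label -1 marks an abstaining source and is skipped. An illustrative
    scalarization of the multi-objective (Pareto) trade-off."""
    loss = weights[0] * -np.log(probs[llm_label] + 1e-12)
    for w, y in zip(weights[1:], weak_labels):
        if y != -1:
            loss += w * -np.log(probs[y] + 1e-12)
    return loss

def risk_score(probs, llm_label):
    """Risk score of an LLM response: low harmonizer mass on the LLM's label."""
    return 1.0 - probs[llm_label]
```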
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose an $\ell_1/\ell_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve better sample complexity for recovering the causal order (or topological order) than separate estimation. Moreover, the joint estimator is able to recover non-identifiable DAGs by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of recovering the union support of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer coincides with the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.
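For illustration, the coupling across tasks can be expressed as a group ($\ell_1/\ell_2$) penalty on the $K$ coefficient matrices of the linear structural equation models (a standard group-lasso form; our sketch, not necessarily the paper's exact objective):

```python
import numpy as np

def group_penalty(Bs, lam):
    """l1/l2 penalty coupling K coefficient matrices B^1, ..., B^K (each p x p):
    lam * sum_{i,j} ||(B^1_ij, ..., B^K_ij)||_2, which encourages the tasks to
    share a sparse union of supports."""
    stacked = np.stack(Bs)  # shape (K, p, p)
    return lam * np.sqrt((stacked ** 2).sum(axis=0)).sum()
```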