We provide a new approach for establishing hardness of approximation results, based on the theory recently introduced by the author. It allows one to show directly that approximating a problem beyond a certain threshold requires super-polynomial time. To illustrate the framework, we revisit two famous problems in this paper. The particular results we prove are: (i) MAX-3-SAT$(1,\frac{7}{8}+\epsilon)$ requires exponential time for any constant $\epsilon$ satisfying $\frac{1}{8} \geq \epsilon > 0$; in particular, the gap exponential time hypothesis (Gap-ETH) holds; and (ii) MAX-3-LIN-2$(1-\epsilon, \frac{1}{2}+\epsilon)$ requires exponential time for any constant $\epsilon$ satisfying $\frac{1}{4} \geq \epsilon > 0$.
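For orientation, the gap notation above refers to the standard promise problem; under one common convention (recalled here generically, not quoted from the paper),
\[
\text{MAX-3-SAT}(c,s):\ \text{given a 3-CNF formula }\varphi,\ \text{decide whether } \mathrm{val}(\varphi)\ge c \ \text{ or } \ \mathrm{val}(\varphi)< s,
\]
where $\mathrm{val}(\varphi)$ denotes the largest fraction of clauses satisfiable by a single assignment, and analogously for MAX-3-LIN-2 with linear equations over $\mathbb{F}_2$.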
Existing theories on deep nonparametric regression have shown that, when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structure. In real-world applications, the assumption that the data lie exactly on a low-dimensional manifold is stringent. This paper introduces a relaxed assumption that the input data are concentrated around a subset $\mathcal{S}$ of $\mathbb{R}^d$, whose intrinsic dimension can be characterized by a new complexity notion -- the effective Minkowski dimension. We prove that the sample complexity of deep nonparametric regression depends only on the effective Minkowski dimension of $\mathcal{S}$, denoted by $p$. We further illustrate our theoretical findings by considering nonparametric regression with an anisotropic Gaussian random design $N(0,\Sigma)$, where $\Sigma$ is full rank. When the eigenvalues of $\Sigma$ have an exponential or polynomial decay, the effective Minkowski dimension of such a Gaussian random design is $p=\mathcal{O}(\sqrt{\log n})$ or $p=\mathcal{O}(n^\gamma)$, respectively, where $n$ is the sample size and $\gamma\in(0,1)$ is a small constant depending on the polynomial decay rate. Our theory shows that, when the manifold assumption does not hold, deep neural networks can still adapt to the effective Minkowski dimension of the data and circumvent the curse of the ambient dimensionality for moderate sample sizes.
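As a toy numerical illustration (our own construction with assumed names, decay profile, and tolerance rule, not the paper's definition of the effective Minkowski dimension), the following sketch samples an anisotropic Gaussian design with exponentially decaying eigenvalues and counts how many directions carry non-negligible mass at a given tolerance:

```python
import numpy as np

# Toy illustration with assumed choices (decay profile, tolerance rule);
# not the paper's definition of the effective Minkowski dimension.
rng = np.random.default_rng(0)
d, n = 200, 1000                        # ambient dimension, sample size
eigvals = np.exp(-0.5 * np.arange(d))   # assumed exponentially decaying eigenvalues of Sigma
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)  # rows are draws from N(0, diag(eigvals))

# Crude proxy for an effective dimension at tolerance delta: the number of
# coordinate directions whose empirical standard deviation exceeds delta.
delta = n ** (-0.25)
emp_std = X.std(axis=0)
effective_dim = int(np.sum(emp_std > delta))
print(f"{effective_dim} of {d} directions exceed tolerance {delta:.3f}")
```

The point of the illustration is only that, although the ambient dimension is $d$, almost all of the probability mass is confined to far fewer directions, which is the regime the theory addresses.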
We present a short proof of a celebrated result of G\'acs and K\"orner giving a necessary and sufficient condition on the joint distribution of two discrete random variables $X$ and $Y$ for the case when their mutual information matches the extractable (in the limit) common information. Our proof is based on the observation that the mere existence of certain random variables jointly distributed with $X$ and $Y$ can impose restrictions on all random variables jointly distributed with $X$ and $Y$.
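For context, the G\'acs--K\"orner common information of a pair $(X,Y)$ is typically defined as
\[
C_{\mathrm{GK}}(X;Y)=\max\bigl\{H(Z)\ :\ Z=f(X)=g(Y)\ \text{almost surely, for some deterministic } f,g\bigr\},
\]
and one always has $C_{\mathrm{GK}}(X;Y)\le I(X;Y)$; the result discussed above characterizes exactly when this inequality is an equality. (This is a standard recall stated for orientation; the paper's own notation may differ.)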
We study the structural and statistical properties of $\mathcal{R}$-norm minimizing interpolants of datasets labeled by specific target functions. The $\mathcal{R}$-norm is the basis of an inductive bias for two-layer neural networks, recently introduced to capture the functional effect of controlling the size of network weights, independently of the network width. We find that these interpolants are intrinsically multivariate functions, even when there are ridge functions that fit the data, and also that the $\mathcal{R}$-norm inductive bias is not sufficient for achieving statistically optimal generalization for certain learning problems. Altogether, these results shed new light on an inductive bias that is connected to practical neural network training.
In data-driven optimization, sample average approximation is known to suffer from the so-called optimizer's curse, which causes an optimistic bias when evaluating the solution performance. This can be tackled by adding a "margin" to the estimated objective value, or via distributionally robust optimization (DRO), a fast-growing approach based on worst-case analysis, which gives a protective bound on the attained objective value. However, in all these existing approaches, a statistically guaranteed bound on the true solution performance either requires restrictive conditions and knowledge of the objective function's complexity, or otherwise exhibits an over-conservative rate that depends on the distribution dimension. We argue that a special type of DRO offers strong theoretical advantages with regard to these challenges: it attains a statistical bound on the true solution performance that is the tightest possible in terms of exponential decay rate, for a wide class of objective functions that notably does not hinge on function complexity. Correspondingly, its calibration also does not require any complexity information. This DRO uses an ambiguity set based on a KL divergence smoothed by the Wasserstein or L\'evy--Prokhorov distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized version using a smoothed $f$-divergence, is not much harder than standard DRO problems using the $f$-divergence or Wasserstein distance, thus supporting the strengths of such DRO as both statistically optimal and computationally viable.
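Schematically, and with the precise construction of the smoothed divergence left to the paper, the DRO problems in question take the generic form
\[
\min_{x\in\mathcal{X}}\ \sup_{Q\in\mathcal{U}(\hat P_n)} \mathbb{E}_{Q}\bigl[h(x,\xi)\bigr],
\qquad
\mathcal{U}(\hat P_n)=\bigl\{Q\ :\ D(Q\,\Vert\,\hat P_n)\le \rho\bigr\},
\]
where $\hat P_n$ is the empirical distribution, $h$ is the cost function, $\rho$ is a calibrated radius, and $D$ is the divergence defining the ambiguity set -- here a KL divergence smoothed by the Wasserstein or L\'evy--Prokhorov distance. (The symbols $h$, $\rho$, and $\mathcal{U}$ are introduced only for illustration.)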
The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study of the problem have established many profound results, among which the phase transition at the Kesten--Stigum threshold is particularly interesting from both a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, this fundamental limitation disappears completely. We prove that with an arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with the graph structure. Our work brings a new perspective to the study of stochastic network models and to semidefinite programming research.
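As a concrete instance (a standard recall, not a quotation of the paper's setup), for a balanced two-community block model with within-community edge probability $a/n$ and between-community edge probability $b/n$, the Kesten--Stigum threshold reads
\[
(a-b)^2 > 2(a+b),
\]
and below this threshold no estimator based on the graph alone can correlate with the true partition better than chance; the semi-supervised results above show that revealing a fraction of the labels removes this barrier.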
An implicit variable-step BDF2 scheme is established for solving the space-fractional Cahn--Hilliard equation, involving the fractional Laplacian, derived from a gradient flow in the negative-order Sobolev space $H^{-\alpha}$, $\alpha\in(0,1)$. The Fourier pseudo-spectral method is applied for the spatial approximation. The proposed scheme inherits the energy dissipation law in the form of a modified discrete energy under a sufficient restriction on the time-step ratios. The convergence of the fully discrete scheme is rigorously established using a newly proved discrete embedding-type convolution inequality for the fractional Laplacian. Besides, the mass conservation and the unique solvability are also theoretically guaranteed. Numerical experiments are carried out to show the accuracy and the energy dissipation for various interface widths. In particular, the multiple-time-scale evolution of the solution is captured by an adaptive time-stepping strategy in short-to-long-time simulations.
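For reference, writing $\tau_n=t_n-t_{n-1}$ for the step sizes and $r_n=\tau_n/\tau_{n-1}$ for the step ratios, the variable-step BDF2 approximation of $\partial_t u(t_n)$ takes the standard form (the paper's notation may differ)
\[
D_2 u^n=\frac{1+2r_n}{1+r_n}\,\frac{u^n-u^{n-1}}{\tau_n}-\frac{r_n^{2}}{1+r_n}\,\frac{u^{n-1}-u^{n-2}}{\tau_n},
\]
and the modified discrete energy dissipation law mentioned above holds under an upper bound on the ratios $r_n$.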
We consider the numerical approximation of second-order semilinear parabolic stochastic partial differential equations, interpreted in the mild sense, which we solve on general two-dimensional domains with $\mathcal{C}^2$ boundary subject to homogeneous Dirichlet boundary conditions. The equations are driven by Gaussian additive noise, and several Lipschitz-like conditions are imposed on the nonlinear function. We discretize in space with a spectral Galerkin method and in time using an explicit Euler-like scheme. For irregular shapes, the necessary Dirichlet eigenvalues and eigenfunctions are obtained from a boundary integral equation method. This yields a nonlinear eigenvalue problem, which is discretized using a boundary element collocation method and solved with the Beyn contour integral algorithm. We present an error analysis as well as numerical results on an exemplary asymmetric shape, and point out limitations of the approach.
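Schematically, and in generic notation introduced here only for illustration, such equations can be written in the abstract form
\[
\mathrm{d}X(t)=\bigl[AX(t)+F(X(t))\bigr]\,\mathrm{d}t+\mathrm{d}W(t),\qquad X(0)=X_0,
\]
where $A$ is the second-order elliptic operator with homogeneous Dirichlet boundary conditions, $F$ the Lipschitz-like nonlinearity, and $W$ the Gaussian additive noise; the mild solution is
\[
X(t)=e^{tA}X_0+\int_0^t e^{(t-s)A}F(X(s))\,\mathrm{d}s+\int_0^t e^{(t-s)A}\,\mathrm{d}W(s),
\]
and the spectral Galerkin method projects onto the span of the first $N$ Dirichlet eigenfunctions, which for irregular domains must themselves be computed numerically -- hence the boundary integral eigenvalue solver.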
The equilibrium configuration of a plasma in an axially symmetric reactor is described mathematically by a free boundary problem associated with the celebrated Grad--Shafranov equation. The presence of uncertainty in the model parameters introduces the need to quantify the variability in the predictions. This is often done by computing a large number of model solutions on a computational grid for an ensemble of parameter values and then obtaining estimates of the statistical properties of the solutions. In this study, we explore the savings that can be obtained using multilevel Monte Carlo methods, which reduce costs by performing the bulk of the computations on a sequence of spatial grids that are coarser than the one that would typically be used for a simple Monte Carlo simulation. We examine this approach using both a set of uniformly refined grids and a set of adaptively refined grids guided by a discrete error estimator. Numerical experiments show that multilevel methods dramatically reduce the cost of simulation, with cost reductions typically by factors of 60 or more and possibly as large as 200. Adaptive gridding results in more accurate computation of geometric quantities such as X-points associated with the model.
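For orientation, with $Q_\ell$ denoting the quantity of interest computed on grid level $\ell=0,\dots,L$ (coarsest to finest), the multilevel Monte Carlo estimator rests on the telescoping identity
\[
\mathbb{E}[Q_L]=\mathbb{E}[Q_0]+\sum_{\ell=1}^{L}\mathbb{E}\bigl[Q_\ell-Q_{\ell-1}\bigr],
\]
each term being estimated with an independent set of $N_\ell$ samples; since the correction terms $Q_\ell-Q_{\ell-1}$ have small variance, most samples can be taken on the cheap coarse grids, which is the source of the cost reductions reported above.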
Many multivariate data sets exhibit a form of positive dependence, which can appear either globally among all variables or only locally within particular subgroups. A popular notion of positive dependence that allows for localized positivity is positive association. In this work, we introduce the notion of extremal positive association for multivariate extremes arising from threshold exceedances. Via a sufficient condition for extremal association, we show that extremal association generalizes extremal tree models. For H\"usler--Reiss distributions, the sufficient condition permits a parametric description that we call the metric property. As the parameter of a H\"usler--Reiss distribution is a Euclidean distance matrix, the metric property relates to research in electrical network theory and Euclidean geometry. We show that the metric property can be localized with respect to a graph and study surrogate likelihood inference. This gives rise to a two-step estimation procedure for locally metrical H\"usler--Reiss graphical models. The second step allows for a simple dual problem, which is implemented via a gradient descent algorithm. Finally, we demonstrate our results on simulated and real data.
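For context, a H\"usler--Reiss distribution in dimension $d$ is parameterized by a symmetric variogram matrix $\Gamma$ with zero diagonal that is conditionally negative definite; by Schoenberg's theorem this is equivalent to
\[
\Gamma_{ij}=\lVert u_i-u_j\rVert^{2}\quad\text{for some points } u_1,\dots,u_d\in\mathbb{R}^{k},
\]
which is the sense in which the parameter is the (squared) Euclidean distance matrix referred to above.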
This paper presents a novel, efficient, high-order accurate, and stable spectral element-based model for computing the complete three-dimensional linear radiation and diffraction problem for floating offshore structures. We present a solution to a pseudo-impulsive formulation in the time domain, where the frequency-dependent quantities, such as added mass, radiation damping, and the wave excitation force for an arbitrary heading angle $\beta$, are evaluated using Fourier transforms of the tailored time-domain responses. The spatial domain is tessellated by an unstructured high-order hybrid-configured mesh and represented by piecewise polynomial basis functions in the spectral element space. Fourth-order accurate time integration is employed through an explicit four-stage Runge--Kutta method and complemented by fourth-order finite difference approximations for time differentiation. To reduce the computational burden, the model can make use of symmetry boundaries in the domain representation. The key piece of the numerical model -- the discrete Laplace solver -- is validated through $p$- and $h$-convergence studies. Moreover, to highlight the capabilities of the proposed model, we present proof-of-concept examples of simple floating bodies (a sphere and a box). Lastly, a much more involved case is considered: an oscillating water column, including generalized modes resembling the piston motion and wave sloshing effects inside the wave energy converter chamber. In this case, the spectral element model trivially computes the infinite-frequency added mass, which is a singular problem for conventional boundary element-type solvers.
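Schematically (sign and normalization conventions vary; this is a generic recall rather than the paper's exact formulation), for a prescribed pseudo-impulsive motion $x(t)$ producing the radiation force $F(t)$, the frequency-dependent coefficients follow from the ratio of Fourier transforms
\[
\omega^{2}A(\omega)-\mathrm{i}\,\omega\,B(\omega)=\frac{\hat F(\omega)}{\hat x(\omega)},
\]
so a single tailored time-domain simulation yields the added mass $A(\omega)$ and radiation damping $B(\omega)$ over a whole band of frequencies.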