Statistical inverse learning aims at recovering an unknown function $f$ from randomly scattered and possibly noisy point evaluations of another function $g$, connected to $f$ via an ill-posed mathematical model. In this paper we blend statistical inverse learning theory with the classical regularization strategy of applying finite-dimensional projections. Our key finding is that, by coupling the number of random point evaluations with the choice of projection dimension, one can derive probabilistic convergence rates for the reconstruction error of the maximum likelihood (ML) estimator. Convergence rates in expectation are derived for an ML estimator complemented with a norm-based cut-off operation. Moreover, we prove that the obtained rates are minimax optimal.
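For orientation only (the notation $A$, $V_m$, $x_i$, $\varepsilon_i$ below is our own illustration and not taken from the abstract), such settings are typically of the form
\[
  y_i = (Af)(x_i) + \varepsilon_i, \qquad i = 1,\dots,n,
\]
with an ill-posed forward operator $A$ and random design points $x_i$, and a projected estimator is obtained by minimizing the data misfit over a finite-dimensional subspace $V_m$ whose dimension $m$ is coupled to the number of evaluations $n$.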
Recently, a class of machine learning methods called physics-informed neural networks (PINNs) has been proposed and has gained popularity for solving various scientific computing problems. This approach enables the solution of partial differential equations (PDEs) by embedding physical laws into the loss function. Many inverse problems can be tackled by simply combining real-life measurement data with existing PINN algorithms. In this paper, we present a multi-task learning method using uncertainty weighting to improve the training efficiency and accuracy of PINNs for inverse problems in linear elasticity and hyperelasticity. Furthermore, we demonstrate an application of PINNs to a practical inverse problem in structural analysis: predicting the external loads of diverse engineering structures from a limited number of displacement monitoring points. To this end, we first determine a simplified loading scenario at the offline stage. By treating unknown boundary conditions as learnable parameters, PINNs can predict the external loads with the support of measured data. At the online stage in real engineering projects, transfer learning is employed to fine-tune the model pre-trained offline. Our results show that, even with noisy and gappy data, satisfactory results can still be obtained from the PINN model thanks to the dual regularization of physical laws and prior knowledge, which gives it better robustness than traditional analysis methods. Our approach bridges the gap between structures that differ in geometric scaling and loading scenario, and the convergence of training is greatly accelerated not only by layer freezing but also by inheriting the multi-task weights of pre-trained models, making it feasible to deploy such models as surrogates in actual engineering projects.
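A minimal sketch of uncertainty-based multi-task weighting in the spirit described above, assuming a PyTorch setup; the class name and the placeholder loss terms are ours, not the paper's:

```python
import torch

# Minimal sketch (assumption: PyTorch; the loss terms are placeholders) of
# uncertainty-based multi-task weighting for PINN training: each task i keeps a
# learnable log-variance s_i, and the total loss is sum_i exp(-s_i) * L_i + s_i.
class UncertaintyWeighting(torch.nn.Module):
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for s, loss in zip(self.log_vars, task_losses):
            total = total + torch.exp(-s) * loss + s
        return total

# Usage sketch: PDE-residual, boundary-condition and data-misfit losses are
# combined, and the weights are trained jointly with the network parameters.
# weighting = UncertaintyWeighting(num_tasks=3)
# loss = weighting([loss_pde, loss_bc, loss_data])
# loss.backward()
```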
Nonparametric estimation for semilinear SPDEs, namely stochastic reaction-diffusion equations in one space dimension, is studied. We consider observations of the solution field on a discrete grid in time and space with infill asymptotics in both coordinates. Firstly, we derive a nonparametric estimator for the reaction function of the underlying equation. The estimator is chosen from a finite-dimensional function space based on a least squares criterion. Oracle inequalities provide conditions under which the estimator achieves the usual nonparametric rate of convergence. Adaptivity is provided via model selection. Secondly, we show that the asymptotic properties of realized-quadratic-variation-based estimators for the diffusivity and volatility carry over from linear SPDEs. In particular, we obtain a rate-optimal joint estimator of the two parameters. The result relies on our precise analysis of the H\"older regularity of the solution process and its nonlinear component, which may be of independent interest. Both steps of the calibration can be carried out simultaneously without prior knowledge of the parameters.
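For orientation only (this is a generic form we supply as an illustration, not a statement quoted from the abstract), a realized quadratic variation in time at a fixed spatial point $x$, built from the grid observations, reads
\[
  \mathrm{RV}_n(x) = \sum_{i=1}^{n} \bigl( X_{t_i}(x) - X_{t_{i-1}}(x) \bigr)^2 ,
\]
and, under infill asymptotics, the scaling behaviour of such statistics and of their spatial analogues is what carries the information on the volatility and the diffusivity.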
This paper is devoted to the construction and analysis of immersed finite element (IFE) methods in three dimensions. Different from the 2D case, the points of intersection of the interface and the edges of a tetrahedron are usually not coplanar, which makes the extension of the original 2D IFE methods, based on a piecewise linear approximation of the interface, to the 3D case not straightforward. We address this coplanarity issue with an approach in which the interface is approximated via discrete level set functions. This approach is very convenient from a computational point of view, since in many practical applications the exact interface is often unknown and only a discrete level set function is available. As this approach has also not been considered in 2D IFE methods, in this paper we present a unified framework for both the 2D and 3D cases. We consider an IFE method based on the traditional Crouzeix-Raviart element using integral values on faces as degrees of freedom. The novelty of the proposed IFE is the unisolvence of the basis functions on arbitrary triangles/tetrahedra without any angle restrictions, even for anisotropic interface problems, which is advantageous over IFEs using nodal values as degrees of freedom. Optimal bounds for the IFE interpolation errors are proved on shape-regular triangulations. For the IFE method, optimal a priori error and condition number estimates are derived with constants independent of the location of the interface with respect to the unfitted mesh. The extension to anisotropic interface problems with tensor coefficients is also discussed. Numerical examples supporting the theoretical results are provided.
Using gradient descent (GD) with a fixed or decaying step size is standard practice in unconstrained optimization. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down, as it cannot adapt to the flat curvature of the loss function. To overcome this issue, we propose to exponentially increase the step size of the GD algorithm. Under homogeneity assumptions on the loss function, we demonstrate that the iterates of the proposed \emph{exponential step size gradient descent} (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm for solving parameter estimation under both regular and non-regular statistical models whose loss function becomes locally convex as the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius around the true parameter after a logarithmic number of iterations, in stark contrast to the \emph{polynomial} number of iterations required by GD in non-regular statistical models. Therefore, the total computational complexity of the EGD algorithm is \emph{optimal} and exponentially cheaper than that of GD for parameter estimation in non-regular statistical models, while being comparable to that of GD in regular statistical settings. To the best of our knowledge, this resolves a long-standing gap between the statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.
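A minimal sketch of the exponential step-size idea on a toy homogeneous loss; this is our own illustration under assumptions (the schedule constants and the quartic loss are ours, not the paper's):

```python
import numpy as np

# Gradient descent with an exponentially increasing step size eta_t = eta0 * alpha**t,
# alpha > 1, run on the toy loss f(x) = ||x||^4, whose curvature flattens near the
# minimizer and slows fixed-step GD down to a polynomial number of iterations.
def egd(grad, x0, eta0=1e-3, alpha=1.2, n_iters=100):
    x = np.asarray(x0, dtype=float)
    eta = eta0
    for _ in range(n_iters):
        x = x - eta * grad(x)
        eta *= alpha          # exponentially growing step size
    return x

grad_quartic = lambda x: 4.0 * np.linalg.norm(x) ** 2 * x   # gradient of ||x||^4
print(egd(grad_quartic, x0=[1.0, -0.5]))                    # close to the minimizer 0
```

For this loss the growth of the step size roughly compensates the shrinking gradient, so the effective contraction factor stabilizes and the iterates converge linearly, which is the behaviour the abstract describes.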
In the Colored Clustering problem, one is asked to cluster edge-colored (hyper-)graphs whose colors represent interaction types. More specifically, the goal is to select as many edges as possible without choosing two edges that share an endpoint and are colored differently. Equivalently, the goal can be described as assigning colors to the vertices in a way that fits the edge coloring as well as possible. As this problem is NP-hard, we build on previous work by studying its parameterized complexity. We give a $2^{\mathcal O(k)} \cdot n^{\mathcal O(1)}$-time algorithm, where $k$ is the number of edges to be selected and $n$ the number of vertices. We also prove the existence of a problem kernel of size $\mathcal O(k^{5/2})$, resolving an open problem posed in the literature. We further consider parameters smaller than $k$ and smaller than $r$, the number of edges that can be deleted; such parameters are obtained by taking the difference between $k$ or $r$ and some lower bound on these values. We give both algorithms and lower bounds for Colored Clustering with such parameterizations. Finally, we settle the parameterized complexity of Colored Clustering with respect to structural graph parameters by showing that it is $W[1]$-hard with respect to both vertex cover number and tree-cut width, but fixed-parameter tractable with respect to slim tree-cut width.
We show that independent and uniformly distributed sampling points are as good as optimal sampling points for the approximation of functions from the Sobolev space $W_p^s(\Omega)$ on bounded convex domains $\Omega\subset \mathbb{R}^d$ in the $L_q$-norm if $q<p$. More generally, we characterize the quality of arbitrary sampling points $P\subset \Omega$ via the $L_\gamma(\Omega)$-norm of the distance function $\operatorname{dist}(\cdot,P)$, where $\gamma=s(1/q-1/p)^{-1}$ if $q<p$ and $\gamma=\infty$ if $q\ge p$. This improves upon previous characterizations based on the covering radius of $P$.
Probabilistic programs are typically normal-looking programs that describe posterior probability distributions. They intrinsically code up randomized algorithms and have long been at the heart of modern machine learning and approximate computing. We explore the theory of generating functions [19] and investigate its use for exact quantitative reasoning about probabilistic programs. Important topics include the exact representation of program semantics [13], proving exact program equivalence [5], and -- as our main focus in this extended abstract -- exact probabilistic inference. In probabilistic programming, inference aims to derive a program's posterior distribution. In contrast to approximate inference, inferring exact distributions comes with several benefits [8], e.g., no loss of precision, natural support for symbolic parameters, and efficiency on models with certain structures. Exact probabilistic inference, however, is a notoriously hard task [6,12,17,18]. The challenges mainly arise from three program constructs: (1) unbounded while-loops and/or recursion, (2) infinite-support distributions, and (3) conditioning (via posterior observations). We present our ongoing research on addressing these challenges (with a focus on conditioning) by leveraging generating functions, and we show their potential to facilitate exact probabilistic inference for discrete probabilistic programs.
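A minimal sketch of the generating-function idea on a textbook example; this is our own illustration, not the authors' framework, and the program and symbols are ours:

```python
import sympy as sp

# The program "c := 0; while flip(p): c := c + 1" yields a geometric output with
# P(c = k) = (1 - p) * p**k, whose probability generating function is
#   G(x) = sum_k P(c = k) * x**k = (1 - p) / (1 - p*x).
p, x = sp.symbols('p x')
G = (1 - p) / (1 - p * x)

# Exact quantities are read off G symbolically: Taylor coefficients give the
# probabilities, and G'(1) gives the expected number of loop iterations.
print(sp.series(G.subs(p, sp.Rational(1, 2)), x, 0, 4))   # 1/2 + x/4 + x**2/8 + ...
print(sp.simplify(sp.diff(G, x).subs(x, 1)))              # p/(1 - p)
```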
In many industrial applications, obtaining labeled observations is not straightforward, as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden of training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about acquiring the label of a data point must be taken within an extremely short time frame. However, despite recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression on contaminated data streams. Our study shows that currently available query strategies are prone to sampling outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.
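A rough sketch of the flavor of such a query rule, written under our own assumptions rather than reproducing the authors' algorithm: a leverage-based (D-optimality-style) criterion decides whether an incoming point is informative, and a bound on the search region prevents gross outliers from ever being labeled.

```python
import numpy as np

# Online D-optimality-style query rule for linear regression (illustrative only):
# query x when its leverage x^T (X^T X)^{-1} x is large, but only if x lies inside
# a bounded search region, so that extreme points are not selected for labeling.
def should_query(x, XtX_inv, threshold=0.1, radius=5.0):
    inside_region = np.linalg.norm(x) <= radius      # bounded search area
    leverage = float(x @ XtX_inv @ x)                # information-gain proxy
    return inside_region and leverage >= threshold

# Usage sketch on a random stream of candidate points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                         # points labeled so far
XtX_inv = np.linalg.inv(X.T @ X)
stream = rng.normal(size=(10, 3))
queried = [i for i, x in enumerate(stream) if should_query(x, XtX_inv)]
print(queried)
```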
In this paper we consider mean-field optimal control problems with selective action of the control, where the constraint is a continuity equation involving a non-local term and diffusion. First-order optimality conditions are formally derived in a general framework, accounting for boundary conditions. The optimality system is then used to construct a reduced gradient method, for which we introduce a novel algorithm for the numerical realization of the forward and backward equations, based on exponential integrators. We present extensive numerical experiments on different control problems for collective motion in the contexts of opinion formation and pedestrian dynamics.
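A minimal sketch of an exponential (Euler) integrator for a semi-discretized equation $u' = Au + g(u)$; this is our own illustration of the general idea, not the authors' scheme for the mean-field forward/backward system:

```python
import numpy as np
from scipy.linalg import expm

# One exponential Euler step: u_{n+1} = e^{dt A} u_n + dt * phi1(dt A) g(u_n),
# with phi1(z) = (e^z - 1)/z, which treats the stiff linear part exactly.
def exponential_euler_step(u, A, g, dt):
    E = expm(dt * A)
    phi1 = np.linalg.solve(dt * A, E - np.eye(len(u)))   # phi1(dt*A); A assumed invertible
    return E @ u + dt * (phi1 @ g(u))

# Usage sketch: a 1D heat-type operator with a cubic nonlinearity.
n, dt = 50, 0.01
A = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)    # discrete Laplacian stencil
u = np.sin(np.linspace(0, np.pi, n))
u = exponential_euler_step(u, A, lambda v: -v**3, dt)
print(u[:5])
```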
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
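A small numerical sketch of the implicit-regularization point made above (our own illustration, not an example taken from the survey): for overparametrized linear regression, gradient descent started at zero fits the training data exactly and converges to the minimum-norm interpolant.

```python
import numpy as np

# Overparametrized least squares: more parameters than samples, so infinitely many
# interpolating solutions exist; GD from zero picks the minimum-norm one (pinv(X) @ y).
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, 2) ** 2        # step size below 1/L for this quadratic
for _ in range(20_000):
    w -= lr * X.T @ (X @ w - y)             # gradient of 0.5 * ||X w - y||^2

w_min_norm = np.linalg.pinv(X) @ y
print(np.linalg.norm(X @ w - y))            # ~0: perfect fit to the training data
print(np.linalg.norm(w - w_min_norm))       # ~0: GD found the minimum-norm solution
```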