By learning mappings between infinite-dimensional function spaces with carefully designed neural networks, operator learning has shown significantly greater efficiency than traditional methods in solving complex problems such as differential equations, but it faces concerns about accuracy and reliability. To overcome these limitations, a general neural architecture named spectral operator learning (SOL), which incorporates the structure of spectral numerical methods, is introduced, and one variant, the orthogonal polynomial neural operator (OPNO), is developed for PDEs with Dirichlet, Neumann, and Robin boundary conditions (BCs). The strict BC-satisfaction property and the universal approximation capacity of the OPNO are proven theoretically. A variety of numerical experiments with physical backgrounds show that the OPNO outperforms other existing deep learning methodologies, as well as the traditional second-order finite difference method (FDM) on a considerably fine mesh (with relative errors reaching the order of 1e-6), while running up to almost five orders of magnitude faster than the traditional method.
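As a rough, hedged illustration of the SOL idea (not the OPNO itself, whose orthogonal polynomial transform and basis recombination for exact BC enforcement are not reproduced here), a single spectral layer can transform the input to spectral space, scale a retained set of modes with learned weights, and transform back; the cosine transform below stands in for a Chebyshev-type transform, and all sizes are arbitrary choices.

```python
# Minimal sketch of one spectral operator layer in the spirit of SOL.
# Illustrative only: the actual OPNO uses orthogonal polynomial transforms
# with basis recombination so that the BCs are satisfied exactly.
import numpy as np
from scipy.fft import dct, idct  # cosine transform, related to Chebyshev coefficients

def spectral_layer(u, weights, n_modes):
    """Transform to spectral space, scale retained modes, transform back."""
    coeffs = dct(u, type=2, norm="ortho")        # physical -> spectral
    coeffs[:n_modes] *= weights                  # learned per-mode multipliers
    coeffs[n_modes:] = 0.0                       # truncate high modes
    return idct(coeffs, type=2, norm="ortho")    # spectral -> physical

rng = np.random.default_rng(0)
n, n_modes = 128, 16
u = np.sin(np.linspace(0.0, np.pi, n))           # a sample input function
w = rng.normal(size=n_modes)                     # stand-in for trained weights
print(spectral_layer(u, w, n_modes).shape)       # (128,)
```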
We derive well-posed boundary conditions for the linearized Serre equations in one spatial dimension by utilizing the energy method. An energy-stable and conservative discontinuous Galerkin spectral element method with simple upwind numerical fluxes is proposed for solving the initial boundary value problem. We derive discrete energy estimates for the numerical approximation and prove a priori error estimates in the energy norm. Detailed numerical examples are provided to verify the theoretical analysis and demonstrate the convergence of the numerical errors.
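For orientation, the energy method follows a standard template; the sketch below is generic (written for a symmetrizable linear hyperbolic system) rather than the paper's actual derivation for the linearized Serre equations:

```latex
% Generic energy-method template (illustrative, not the paper's derivation):
% for u_t + A u_x = 0 on (0, L) with a symmetrizer H = H^T > 0 such that HA = (HA)^T,
\frac{d}{dt} E(t)
  = \frac{d}{dt} \int_0^L u^\top H\, u \,\mathrm{d}x
  = -\Bigl[ u^\top H A\, u \Bigr]_{x=0}^{x=L}.
```

Well-posed boundary conditions are then precisely those that make the boundary terms non-positive, yielding $\frac{d}{dt}E(t) \le 0$ (energy stability), with equality for conservative choices; the discrete energy estimates mimic this computation at the scheme level.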
While interference in the time domain (caused by path differences) is mitigated by OFDM modulation, interference in the frequency domain (due to velocity differences) can be mitigated by OTFS modulation. However, in non-stationary channels, relative differences in acceleration cause Inter-Doppler Interference (IDI), and no modulation method for mitigating IDI exists in the literature. Both existing methods use carriers in a specific domain that achieve orthogonality in the target domain to mitigate interference. Moreover, neither modulation can directly incorporate the space domain, so additional precoding techniques are required to mitigate inter-user interference (IUI) in MU-MIMO channels. This work presents a generalized modulation for any multidimensional channel. Recently, the Higher Order Generalized Mercer's Theorem (HOGMT) [1] has been proposed to decompose multi-user non-stationary channels into independent fading subchannels (eigenwaves). Based on the HOGMT decomposition, we develop Multidimensional Eigenwaves Multiplexing (MEM) modulation, which uses the jointly orthogonal eigenwaves decomposed from the multidimensional channel as subcarriers. Data symbols modulated by these eigenwaves achieve orthogonality across each degree of freedom (e.g., space (users/antennas), time-frequency, and delay-Doppler). Consequently, the transmitted symbols remain independent over the high-dimensional channel, thereby avoiding interference from other symbols.
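As a toy, hedged illustration of the eigenwave idea (HOGMT itself decomposes higher-order, multi-user channel kernels; an ordinary matrix SVD plays the analogous role here for a single 2-D channel), symbols placed on right singular vectors pass through the channel without interfering with one another:

```python
# Toy illustration of eigen-decomposition based multiplexing.
# HOGMT decomposes higher-order, multi-user channel kernels; here a plain
# matrix SVD plays the same role for a single 2-D channel.
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(64, 64)) + 1j * rng.normal(size=(64, 64))  # channel matrix
U, s, Vh = np.linalg.svd(H)

k = 8                                    # number of "eigenwave" subchannels used
symbols = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], size=k)  # QPSK symbols

x = Vh[:k].conj().T @ symbols            # modulate onto right singular vectors
y = H @ x                                # pass through the channel
z = (U[:, :k].conj().T @ y) / s[:k]      # project onto left singular vectors

print(np.allclose(z, symbols))           # True: the subchannels do not interfere
```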
Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
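A minimal sketch of the setting being analyzed, under illustrative choices (squared-exponential kernel, ten noiseless observations, a small jitter for numerical stability): the lengthscale is selected by maximizing the log marginal likelihood, once for the data and once for a slightly perturbed copy, so the sensitivity of the fit can be inspected directly.

```python
# Sketch: maximum-likelihood lengthscale selection for a noiseless GP, with a
# check of how the fit responds to a tiny data perturbation. Illustrative of
# the kind of sensitivity the paper studies, not its formal construction.
import numpy as np
from scipy.optimize import minimize_scalar

def sqexp(x, xp, ell):
    return np.exp(-0.5 * (x[:, None] - xp[None, :])**2 / ell**2)

def neg_log_marginal_likelihood(log_ell, x, y, jitter=1e-8):
    K = sqexp(x, x, np.exp(log_ell)) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum()

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x)                      # noiseless observations

for data in (y, y + 1e-4 * rng.normal(size=len(y))):
    res = minimize_scalar(neg_log_marginal_likelihood, bounds=(-5, 2),
                          args=(x, data), method="bounded")
    print("fitted lengthscale:", np.exp(res.x))
```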
Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental question in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that may themselves be unbounded. This work identifies a set of assumptions describing a regime that seems to govern concentration in many estimation problems, in which the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems, and entropic-Wasserstein barycenters.
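A quick Monte Carlo sketch of the parametric rate in the simplest instance covered by such results: for the Euclidean barycenter, the risk minimizer is the population mean, the empirical minimizer is the sample mean, and the distance between them scales like $n^{-1/2}$ (all numbers below are illustrative choices).

```python
# Monte Carlo check of the parametric concentration rate for the simplest
# barycenter problem: in Euclidean space the empirical barycenter (sample
# mean) concentrates around the true barycenter at rate n^(-1/2).
import numpy as np

rng = np.random.default_rng(3)
true_mean = np.array([1.0, -2.0])
reps = 100

for n in (100, 1_000, 10_000):
    samples = rng.normal(loc=true_mean, scale=1.0, size=(reps, n, 2))
    empirical_barycenters = samples.mean(axis=1)
    dist = np.linalg.norm(empirical_barycenters - true_mean, axis=1)
    # sqrt(n) * E[dist] should stay roughly constant across n
    print(f"n={n:>6}: E[dist]={dist.mean():.4f}, sqrt(n)*E[dist]={np.sqrt(n)*dist.mean():.3f}")
```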
We study multivariate Gaussian statistical models whose maximum likelihood estimator (MLE) is a rational function of the observed data. We establish a one-to-one correspondence between such models and the solutions to a nonlinear first-order partial differential equation (PDE). Using our correspondence, we reinterpret familiar classes of models with rational MLE, such as directed (and decomposable undirected) Gaussian graphical models. We also find new models with rational MLE. For linear concentration models with rational MLE, we show that homaloidal polynomials from birational geometry lead to solutions to the PDE. We thus shed light on the problem of classifying Gaussian models with rational MLE by relating it to the open problem in birational geometry of classifying homaloidal polynomials.
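For orientation, a classical example of a Gaussian model with rational MLE, stated here from the standard graphical models literature rather than taken from the paper: for a decomposable undirected Gaussian graphical model with cliques $\mathcal{C}$ and separators $\mathcal{S}$, the MLE of the concentration matrix has the closed form

```latex
\widehat{K} \;=\; \sum_{C \in \mathcal{C}} \bigl[(S_C)^{-1}\bigr]^{V}
           \;-\; \sum_{D \in \mathcal{S}} \bigl[(S_D)^{-1}\bigr]^{V},
```

where $S_A$ denotes the sample covariance restricted to the variables in $A$ and $[\cdot]^{V}$ pads with zeros to the full variable set; every entry of $\widehat{K}$ is therefore a rational function of the data.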
The idea of embedding optimization problems into deep neural networks as optimization layers to encode constraints and inductive priors has taken hold in recent years. Most existing methods focus on implicitly differentiating Karush-Kuhn-Tucker (KKT) conditions in a way that requires expensive computations on the Jacobian matrix, which can be slow and memory-intensive. In this paper, we develop a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically convex optimization problems with polyhedral constraints) in a fast and recursive way. Alt-Diff decouples the differentiation procedure into a primal update and a dual update in an alternating fashion. Accordingly, Alt-Diff substantially reduces the dimensions of the Jacobian matrix, especially for optimization problems with large-scale constraints, and thus increases the computational speed of implicit differentiation. We show that the gradients obtained by Alt-Diff are consistent with those obtained by differentiating the KKT conditions. In addition, we propose truncating Alt-Diff to further accelerate computation. Under standard assumptions, we show that the truncation error of the gradients is bounded above by the same order as the estimation error of the variables. Therefore, Alt-Diff can be truncated to further increase computational speed without sacrificing much accuracy. A series of comprehensive experiments validates the superiority of Alt-Diff.
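The baseline that Alt-Diff is contrasted with can be sketched concretely: the code below implicitly differentiates the solution of a small equality-constrained QP through its dense KKT system (Alt-Diff's alternating primal/dual loop, which avoids this dense solve, is not reproduced here, and all problem data are arbitrary).

```python
# Baseline for comparison: implicit differentiation of an equality-constrained
# QP through its KKT system. Alt-Diff itself replaces this dense KKT solve
# with alternating primal/dual updates.
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 2
P = 2.0 * np.eye(n)                          # makes the QP strictly convex
q = rng.normal(size=n)
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# KKT system for  min 0.5 x'Px + q'x  s.t.  Ax = b
M = np.block([[P, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-q, b])
x = np.linalg.solve(M, rhs)[:n]              # primal solution

# dx/dq: differentiate M z = [-q; b] with respect to q
Minv = np.linalg.inv(M)
dx_dq = -Minv[:n, :n]

# finite-difference check on one coordinate of q
eps = 1e-6
q2 = q.copy(); q2[0] += eps
x2 = np.linalg.solve(M, np.concatenate([-q2, b]))[:n]
print(np.allclose((x2 - x) / eps, dx_dq[:, 0], atol=1e-5))   # True
```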
Reflectance losses on solar mirrors due to soiling pose a formidable challenge for Concentrating Solar Power (CSP) plants. Soiling can vary significantly from site to site, from fractions of a percentage point to several percentage points per day (pp/day), a fact that has motivated several studies in soiling predictive modelling. Yet, existing studies have so far neglected the characterization of statistical uncertainty in their parameters and predictions. In this paper, two reflectance loss models that account for uncertainty are proposed: an extension of a previously developed physical model and a simplified model. A novel uncertainty characterization enables maximum likelihood estimation of the parameters of both models and permits the estimation of parameter (and prediction) confidence intervals. The models are applied to data from ten soiling campaigns conducted at three Australian sites (Brisbane, Mount Isa, Wodonga). The simplified model produces high-quality predictions of soiling losses on novel data, while the semi-physical model's performance is mixed. The statistical distributions of daily losses were estimated for different dust loadings. Under median conditions, the daily soiling losses for Brisbane, Mount Isa, and Wodonga are estimated as $0.53 \pm 0.62$, $0.1 \pm 0.1$, and $0.57 \pm 0.14$ pp/day, respectively, yet higher observed dust loadings can drive average losses as high as $2.50$ pp/day. Overall, the results suggest that the statistical distributions of soiling losses can be characterized by a relatively simple approach that combines airborne dust measurements with short reflectance monitoring campaigns.
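A toy version of the statistical workflow, hedged: the paper's semi-physical and simplified models are more structured, so a plain normal model for daily losses stands in for them here, with an MLE fit and an asymptotic confidence interval for the mean (the synthetic numbers only mimic the Brisbane scale).

```python
# Toy statistical workflow: maximum-likelihood fit of a daily-loss
# distribution plus a parameter confidence interval. A plain normal model
# stands in for the paper's semi-physical and simplified models.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
losses = rng.normal(0.53, 0.62, size=60)       # synthetic pp/day observations

mu_hat = losses.mean()                         # normal MLEs
sigma_hat = losses.std(ddof=0)

# 95% CI for the mean from the asymptotic distribution of the MLE
se = sigma_hat / np.sqrt(len(losses))
lo, hi = mu_hat + np.array([-1, 1]) * stats.norm.ppf(0.975) * se
print(f"daily loss: {mu_hat:.2f} pp/day, 95% CI [{lo:.2f}, {hi:.2f}]")
```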
Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern is that, even if the randomized trial was well-executed (i.e., internal validity holds), the study participants may not represent a random sample of the target population (i.e., external validity fails), which may lead to policies that perform suboptimally on the target population. We consider a model where observable attributes can impact sample selection probabilities arbitrarily but the effect of unobservable attributes is bounded by a constant, and we aim to learn policies with the best possible performance guarantees that hold under any sampling bias of this type. In particular, we derive a partial identification result for the worst-case welfare in the presence of sampling bias and show that the optimal max-min, max-min gain, and minimax regret policies depend on both the conditional average treatment effect (CATE) and the conditional value-at-risk (CVaR) of potential outcomes given covariates. To avoid the finite-sample inefficiencies of plug-in estimates, we further provide an end-to-end procedure for learning the optimal max-min and max-min gain policies that does not require the separate estimation of nuisance parameters.
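As a stylized sketch of the two ingredients named above (ignoring covariates for brevity), the snippet below computes an empirical CATE and a CVaR of the treated potential outcome; the decision rule, treating when a CVaR-based pessimistic gain is positive, is purely illustrative and is not the paper's derived max-min policy.

```python
# Stylized sketch of the quantities named in the abstract: a CATE estimate
# and a conditional value-at-risk (CVaR) of potential outcomes. The decision
# rule below is illustrative only.
import numpy as np

def cvar(samples, alpha):
    """Empirical lower-tail CVaR_alpha: mean of the worst alpha-fraction."""
    cutoff = np.quantile(samples, alpha)
    return samples[samples <= cutoff].mean()

rng = np.random.default_rng(6)
y0 = rng.normal(0.0, 1.0, size=5_000)          # potential outcomes, control
y1 = rng.normal(0.3, 1.5, size=5_000)          # potential outcomes, treated

cate_hat = y1.mean() - y0.mean()
worst_case_gain = cvar(y1, 0.2) - y0.mean()    # pessimistic view of treatment
treat = worst_case_gain > 0
print(f"CATE ~ {cate_hat:.2f}, pessimistic gain ~ {worst_case_gain:.2f}, treat={treat}")
```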
PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate operators between infinite-dimensional function spaces. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction: First, a novel universal approximation result is derived, under minimal assumptions on the underlying operator and the data-generating distribution. Then, two potential obstacles to efficient operator learning with PCA-Net are identified, and made precise through lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues. The other obstacle relates to the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived. A suitable smoothness criterion is shown to ensure an algebraic decay of the PCA eigenvalues. Furthermore, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and the Navier-Stokes equations.
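A minimal PCA-Net-style pipeline can be sketched in a few lines: PCA encodes input and output functions, and a small neural network maps input coefficients to output coefficients. Everything below is an illustrative stand-in (the antiderivative operator replaces Darcy flow / Navier-Stokes, and the architecture and sizes are arbitrary).

```python
# Minimal PCA-Net-style pipeline: PCA on input and output functions, with a
# neural network mapping input PCA coefficients to output PCA coefficients.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
n_samples, n_grid = 500, 128
x = np.linspace(0.0, 1.0, n_grid)

# random smooth inputs and their antiderivatives as input/output function pairs
freqs = np.arange(1, 9)
coef = rng.normal(size=(n_samples, len(freqs)))
U = coef @ np.sin(np.outer(freqs, np.pi * x))                                  # inputs u(x)
V = coef @ (-np.cos(np.outer(freqs, np.pi * x)) / (np.pi * freqs)[:, None])   # outputs

pca_in, pca_out = PCA(n_components=16), PCA(n_components=16)
A = pca_in.fit_transform(U)                     # encode input functions
B = pca_out.fit_transform(V)                    # encode output functions

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(A, B)
V_pred = pca_out.inverse_transform(net.predict(A))
print("train rel. L2 error:", np.linalg.norm(V_pred - V) / np.linalg.norm(V))
```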
High-order implicit shock tracking (fitting) is a class of high-order, optimization-based numerical methods that approximate solutions of conservation laws with non-smooth features by aligning elements of the computational mesh with those features. This ensures that the non-smooth features are represented exactly by inter-element jumps and that high-order basis functions approximate the smooth regions of the solution without nonlinear stabilization, which leads to accurate approximations on traditionally coarse meshes. In this work, we introduce a robust implicit shock tracking framework specialized for problems with parameter-dependent lead shocks (i.e., shocks separating a farfield condition from the downstream flow), which commonly arise in high-speed aerodynamics and astrophysics applications. After a shock-aligned mesh is produced at one parameter configuration, all elements upstream of the lead shock are removed, and the nodes on the lead shock are positioned for new parameter configurations using the implicit shock tracking solver. The proposed framework can be used for many-query applications involving parametrized lead shocks, such as optimization, uncertainty quantification, parameter sweeps, "what-if" scenarios, and parameter-based continuation. We demonstrate the robustness and flexibility of the framework using a one-dimensional space-time Riemann problem and two- and three-dimensional supersonic and hypersonic benchmark problems.
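For context, implicit shock tracking methods are typically posed as a constrained optimization over both the discrete solution and the mesh; a common hedged form, drawn from the general literature rather than this paper's exact statement, is

```latex
\min_{u,\, x} \;\; f(u, x)
\qquad \text{subject to} \qquad r(u; x) = 0,
```

where $r$ is the discretized conservation law residual, $x$ collects the mesh node coordinates, and $f$ is an objective (often an enriched-residual norm) whose minimization drives the mesh toward alignment with non-smooth features. In the proposed framework, the same solver repositions only the lead-shock nodes as the parameter configuration changes.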