Many problems in computer science reduce to the recovery of an $n$-sparse measure from its (generalized) moments. Sparse measure recovery has been a research focus in super-resolution, tensor decomposition, and learning neural networks. Existing methods rely on either convex relaxations or overparameterization for recovery. Here, we propose recovery by non-convex optimization without overparameterization. Our algorithm is a (sub)gradient descent method that optimizes a non-convex energy function studied in physics. We establish the global convergence of gradient descent on this energy function. This result enables us to solve super-resolution in $O(n^2)$ time, a significant improvement over the $O(n^3)$ time required to solve convex relaxations. For a particular neural network, we prove the global convergence of subgradient descent on the population loss without overparameterization. The studied network has zero-one activations and inputs drawn from the unit sphere.
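As a schematic illustration of this recovery principle (not the paper's algorithm or energy function; the spike count, moment cutoff, and step size are assumptions), the sketch below recovers spike locations from low-order Fourier moments by plain gradient descent on a moment-matching energy:

```python
import numpy as np

# Illustrative sketch: recover an n-sparse measure (uniform weights) from
# its first K Fourier moments by gradient descent on a non-convex energy.
n, K = 3, 10
rng = np.random.default_rng(0)
true_x = np.sort(rng.uniform(0, 1, n))   # ground-truth spike locations
ks = np.arange(1, K + 1)

def moments(x):
    # generalized (Fourier) moments of the uniform-weight n-sparse measure
    return np.exp(-2j * np.pi * np.outer(ks, x)).mean(axis=1)

m_obs = moments(true_x)                  # observed moments

def energy(x):
    return np.sum(np.abs(moments(x) - m_obs) ** 2)

def grad(x, eps=1e-6):
    # central-difference gradient, for clarity of exposition
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (energy(x + e) - energy(x - e)) / (2 * eps)
    return g

x = rng.uniform(0, 1, n)                 # random initialization
for _ in range(30000):
    x = (x - 0.005 * grad(x)) % 1.0      # small-step gradient descent

print(true_x, np.sort(x))                # locations should roughly agree
```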
In this paper, an efficient ensemble domain decomposition algorithm is proposed for the fast solution of the fully mixed random Stokes-Darcy model with the physically realistic Beavers-Joseph (BJ) interface conditions. We utilize the Monte Carlo method to reduce the coupled model with random inputs to a collection of deterministic Stokes-Darcy numerical models and use the ensemble idea to compute these multiple problems quickly. One remarkable feature of the algorithm is that the multiple linear systems arising from the deterministic numerical models share a common coefficient matrix, which significantly reduces the computational cost while achieving accuracy comparable to that of traditional methods. Moreover, by domain decomposition, we naturally decouple the Stokes-Darcy system into two smaller sub-physics problems. Both mesh-dependent and mesh-independent convergence rates of the algorithm are rigorously derived by choosing suitable Robin parameters, and optimized Robin parameters are derived and analyzed to accelerate the convergence of the proposed algorithm. In particular, for the small hydraulic conductivity encountered in practice, almost optimal geometric convergence can be obtained by finite element discretization. Finally, two groups of numerical experiments are conducted to validate and illustrate the distinctive features of the proposed algorithm.
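The computational saving of the ensemble idea comes from the shared coefficient matrix: it can be factorized once and the factorization reused across all Monte Carlo samples. A minimal sketch of this reuse, assuming a generic dense system in place of the paper's finite element discretization:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# J Monte Carlo samples yield J linear systems A u_j = b_j with a common
# coefficient matrix A, so the expensive factorization is paid only once.
rng = np.random.default_rng(1)
m, J = 200, 50                                    # system size, sample count
A = rng.standard_normal((m, m)) + m * np.eye(m)   # shared, well-conditioned matrix
B = rng.standard_normal((m, J))                   # one right-hand side per sample

lu, piv = lu_factor(A)        # O(m^3) LU factorization, done once
U = lu_solve((lu, piv), B)    # cheap O(m^2) solves, one per sample

assert np.allclose(A @ U, B)  # all J systems solved with a single factorization
```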
This work introduces a method to select linear functional measurements of a vector-valued time series optimized for forecasting over distant time horizons. By formulating and solving the problem of sequential linear measurement design as an infinite-horizon problem with the time-averaged trace of the Cram\'{e}r-Rao lower bound (CRLB) for forecasting as the cost, the most informative data can be collected irrespective of the eventual forecasting algorithm. By introducing theoretical results regarding measurements under additive noise from natural exponential families, we construct an equivalent problem from which a local dimensionality reduction can be derived. This alternative formulation exploits the future collapse of dimensionality inherent in the limiting behavior of many differential equations, which can be directly observed in the low-rank structure of the CRLB for forecasting. Implementations of both an approximate dynamic programming formulation and the proposed alternative are illustrated using an extended Kalman filter for state estimation, with results on simulated systems with limit cycles and chaotic behavior demonstrating a linear improvement in the CRLB as a function of the number of collapsing dimensions of the system.
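A hedged sketch of the greedy core of one such design step, in the simplest linear-Gaussian case (the candidate set, prior information matrix, and noise level are illustrative assumptions; the paper's infinite-horizon, forecasting-weighted formulation is richer):

```python
import numpy as np

# One-step measurement design: among candidate linear functionals h, pick
# the one minimizing the trace of the CRLB (inverse Fisher information)
# after observing y = h^T x + Gaussian noise of variance sigma2.
rng = np.random.default_rng(2)
d, sigma2 = 4, 0.1
J = np.eye(d)                            # prior Fisher information (assumed)
H = rng.standard_normal((20, d))         # candidate measurement vectors

def crlb_trace(J, h):
    # rank-one Fisher information update for a linear-Gaussian measurement
    return np.trace(np.linalg.inv(J + np.outer(h, h) / sigma2))

best = min(range(len(H)), key=lambda i: crlb_trace(J, H[i]))
print("chosen measurement:", best, "CRLB trace:", crlb_trace(J, H[best]))
```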
A class of generative models that unifies flow-based and diffusion-based methods is introduced. These models extend the framework proposed in Albergo & Vanden-Eijnden (2023), enabling the use of a broad class of continuous-time stochastic processes called `stochastic interpolants' to bridge two arbitrary probability density functions exactly in finite time. These interpolants are built by combining data from the two prescribed densities with an additional latent variable that shapes the bridge in a flexible way. The time-dependent probability density function of the stochastic interpolant is shown to satisfy a first-order transport equation as well as a family of forward and backward Fokker-Planck equations with tunable diffusion. Upon consideration of the time evolution of an individual sample, this viewpoint immediately leads to both deterministic and stochastic generative models based on probability flow equations or stochastic differential equations with an adjustable level of noise. The drift coefficients entering these models are time-dependent velocity fields characterized as the unique minimizers of simple quadratic objective functions, one of which is a new objective for the score of the interpolant density. Remarkably, we show that minimization of these quadratic objectives leads to control of the likelihood for any of our generative models built upon stochastic dynamics. By contrast, we establish that generative models based upon deterministic dynamics must, in addition, control the Fisher divergence between the target and the model. We also construct estimators for the likelihood and the cross-entropy of interpolant-based generative models, discuss connections with other stochastic bridges, and demonstrate that such models recover the Schr\"odinger bridge between the two target densities when explicitly optimizing over the interpolant.
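A minimal sketch of learning such a drift, for one concrete choice of interpolant $x_t = (1-t)x_0 + t x_1 + \sqrt{2t(1-t)}\,z$ (the network, densities, and constants are illustrative assumptions; the squared-error objective below shares its minimizer with the quadratic objectives described above):

```python
import torch

# Learn a velocity field v(t, x) by regressing the time derivative of a
# stochastic interpolant between two assumed 2-D densities.
v = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2))
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

def sample_rho0(n): return torch.randn(n, 2)          # base density (assumed)
def sample_rho1(n): return torch.randn(n, 2) + 3.0    # target density (assumed)

for step in range(2000):
    x0, x1, z = sample_rho0(256), sample_rho1(256), torch.randn(256, 2)
    t = 0.01 + 0.98 * torch.rand(256, 1)              # stay off the endpoints
    gamma = torch.sqrt(2 * t * (1 - t))               # latent-variable coefficient
    xt = (1 - t) * x0 + t * x1 + gamma * z            # stochastic interpolant
    dot_xt = x1 - x0 + (1 - 2 * t) / gamma * z        # its time derivative
    loss = ((v(torch.cat([xt, t], 1)) - dot_xt) ** 2).mean()  # quadratic objective
    opt.zero_grad(); loss.backward(); opt.step()
```

Sampling then amounts to integrating the probability flow ODE $\dot{x} = v(t, x)$ from a draw of the base density.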
Measurement error (ME) and missing values in covariates are often unavoidable in disciplines that deal with data, and both problems have separately received considerable attention during the past decades. However, while most researchers are familiar with methods for treating missing data, accounting for ME in covariates of regression models is less common. In addition, ME and missing data are typically treated as two separate problems, despite practical and theoretical similarities. Here, we exploit the fact that missing data in a continuous covariate is an extreme case of classical ME, allowing us to use existing methodology that accounts for ME via a Bayesian framework that employs integrated nested Laplace approximations (INLA), and thus to simultaneously account for both ME and missing data in the same covariate. As a useful by-product, we present an approach to handle missing data in INLA, since this corresponds to the special case when no ME is present. In addition, we show how to account for Berkson ME in the same framework. In its broadest generality, the proposed joint Bayesian framework can thus account for Berkson ME, classical ME, and missing data, or for any combination of these in the same or different continuous covariates of the family of regression models that are feasible with INLA. The approach is exemplified using both simulated and real data. We provide extensive and fully reproducible Supplementary Material with thoroughly documented examples using {R-INLA} and {inlabru}.
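To make the key identity concrete, the schematic hierarchy below (in assumed, generic notation rather than the paper's) shows how a classical ME model subsumes a missing covariate value as the limit of infinite error variance:

```latex
% Schematic Bayesian hierarchy for a covariate x observed through a proxy w.
\begin{align*}
  y_i \mid x_i &\sim f\bigl(y_i \mid \beta_0 + \beta_x x_i + \dots\bigr)
      && \text{(regression model)}\\
  w_i \mid x_i &\sim \mathcal{N}(x_i, \tau^2)
      && \text{(classical ME model)}\\
  x_i &\sim \mathcal{N}(\alpha_0 + \dots, \sigma_x^2)
      && \text{(exposure model)}\\
  w_i \text{ missing} &\iff \tau^2 \to \infty
      && \text{($w_i$ carries no information; $x_i$ is imputed from the exposure model)}
\end{align*}
```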
The Divide and Distribute Fixed Weights algorithm (ddfw) is a dynamic local search SAT-solving algorithm that transfers weight from satisfied to falsified clauses in local minima. ddfw is remarkably effective on several hard combinatorial instances, yet despite this success, it has received little study since its debut in 2005. In this paper, we propose three modifications to the base algorithm: a linear weight transfer method that moves a dynamic amount of weight between clauses in local minima, an adjustment to how satisfied clauses are chosen to give weight in local minima, and a weighted-random method of selecting variables to flip. We implemented our modifications to ddfw on top of the solver yalsat. Our experiments show that our modifications boost performance relative to the original ddfw algorithm on multiple benchmarks, including those from the past three years of SAT competitions. Moreover, our improved solver exclusively solves hard combinatorial instances that refute a conjecture of Ahmed et al. (2014) on the lower bound of two van der Waerden numbers, and it performs well on a hard graph-coloring instance that has been open for over three decades.
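A compressed, illustrative sketch of the ddfw loop and its weight transfer in local minima follows; the donor selection and the transfer constants are stand-ins, not the tuned rules of the original algorithm or of our modifications:

```python
import random

def falsified(clauses, assign):
    # indices of clauses with no satisfied literal under the assignment
    return [i for i, c in enumerate(clauses)
            if not any(assign[abs(l)] == (l > 0) for l in c)]

def ddfw(clauses, n_vars, max_flips=100000, w0=8.0):
    assign = [None] + [random.random() < 0.5 for _ in range(n_vars)]
    w = [w0] * len(clauses)                      # one weight per clause
    for _ in range(max_flips):
        false_cls = falsified(clauses, assign)
        if not false_cls:
            return assign[1:]                    # satisfying assignment found
        cost = sum(w[i] for i in false_cls)
        # find the flip with the largest weighted-cost reduction
        cand = {abs(l) for i in false_cls for l in clauses[i]}
        best_v, best_gain = None, 0.0
        for v in cand:
            assign[v] = not assign[v]
            gain = cost - sum(w[i] for i in falsified(clauses, assign))
            assign[v] = not assign[v]
            if gain > best_gain:
                best_v, best_gain = v, gain
        if best_v is not None:
            assign[best_v] = not assign[best_v]
        else:
            # local minimum: each falsified clause pulls weight from the
            # heaviest satisfied clause (the real rules are more refined)
            sat = [i for i in range(len(clauses)) if i not in false_cls]
            if not sat:
                continue
            donor = max(sat, key=lambda i: w[i])
            for i in false_cls:
                delta = 0.1 * w[donor] + 1.0     # linear transfer (illustrative)
                w[donor] -= delta
                w[i] += delta
    return None

# tiny usage example: (x1 or x2) and (not x1 or x2) and (x1 or not x2)
print(ddfw([[1, 2], [-1, 2], [1, -2]], 2))
```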
Verifying relations between programs arises in various contexts such as optimizing transformations, relating new versions of programs with older versions (regression verification), and noninterference. However, relational verification for programs acting on dynamically allocated mutable state is not well supported by existing tools that provide a high level of automation at the cost of restricting the programs considered. Auto-active tools, on the other hand, require more user interaction but enable verification of a broader class of programs. This article presents WhyRel, a tool for the auto-active verification of relational properties of pointer programs based on relational region logic. WhyRel is evaluated through verification case studies, relying on SMT solvers orchestrated by the Why3 platform on which it builds. Case studies include establishing representation independence of ADTs, showing noninterference, and challenge problems from recent literature.
Quantization of a probability measure means representing it with a finite set of Dirac masses that approximates the input distribution well enough (in some metric space of probability measures). Various methods exist to do so, but quantizing a conditional law has been less explored. We propose a method, called DCMQ, that couples a Huber-energy kernel-based approach with a deep neural network architecture. The method is tested on several examples and obtains promising results.
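As a hedged sketch of the underlying idea in the unconditional case (the kernel constants and setup are assumptions, and DCMQ's conditional, neural-network component is not reproduced), one can place the Dirac masses by gradient descent on an energy-distance objective with a Huber-smoothed kernel:

```python
import torch

def dist(a, b, eps=1e-8):
    # pairwise distances, smoothed near zero so gradients stay finite
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return torch.sqrt(d2 + eps)

def huber(d, delta=0.1):
    # quadratic near zero, linear in the tail (illustrative Huber smoothing)
    return torch.where(d < delta, 0.5 * d ** 2 / delta, d - 0.5 * delta)

def energy(y, x):
    # sample version of 2 E k(Y,X) - E k(Y,Y'); the E k(X,X') term is
    # constant in y and therefore dropped
    return 2 * huber(dist(y, x)).mean() - huber(dist(y, y)).mean()

x = torch.randn(2000, 2)                     # sample from the input law
y = x[:10].clone().requires_grad_(True)      # n = 10 Dirac locations
opt = torch.optim.Adam([y], lr=0.05)
for _ in range(500):
    loss = energy(y, x)
    opt.zero_grad(); loss.backward(); opt.step()
print(y.detach())                            # the quantization points
```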
Deep learning surrogate models are increasingly used to accelerate scientific simulations as a replacement for costly conventional numerical techniques. However, their use remains a significant challenge for complex real-world examples. In this work, we demonstrate three types of neural network architectures for efficient learning of highly non-linear deformations of solid bodies. The first two architectures are based on the recently proposed CNN U-NET and MAgNET (graph U-NET) frameworks, which have shown promising performance for learning on mesh-based data. The third architecture is Perceiver IO, a very recent architecture that belongs to the family of attention-based neural networks, a class that has revolutionised diverse engineering fields but remains unexplored in computational mechanics. We study and compare the performance of all three networks on two benchmark examples, and show their capability to accurately predict the non-linear mechanical responses of soft bodies.
Deep Neural Networks (DNNs) have obtained impressive performance across tasks; however, they remain black boxes that are, for instance, hard to analyze theoretically. At the same time, Polynomial Networks (PNs) have emerged as an alternative with promising performance and improved interpretability, but they have yet to reach the performance of powerful DNN baselines. In this work, we aim to close this performance gap. We introduce a class of PNs that reaches the performance of ResNet across a range of six benchmarks. We demonstrate that strong regularization is critical and conduct an extensive study of the exact regularization schemes required to match performance. To further motivate the regularization schemes, we introduce D-PolyNets, which achieve a higher degree of expansion than previously proposed polynomial networks. D-PolyNets are more parameter-efficient while achieving performance similar to other polynomial networks. We expect that our new models can lead to an understanding of the role of elementwise activation functions (which are no longer required for training PNs). The source code is available at //github.com/grigorisg9gr/regularized_polynomials.
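A minimal sketch of a polynomial block in the spirit of such PNs (the exact parameterization of D-PolyNets and the regularization recipe are not reproduced; names and sizes are illustrative):

```python
import torch

class PolyBlock(torch.nn.Module):
    """Residual polynomial block: x + (Ux) * (Vx), with no activation."""
    def __init__(self, dim):
        super().__init__()
        self.U = torch.nn.Linear(dim, dim)
        self.V = torch.nn.Linear(dim, dim)

    def forward(self, x):
        # the Hadamard product raises the polynomial degree of the output,
        # roughly doubling it at each block
        return x + self.U(x) * self.V(x)

net = torch.nn.Sequential(PolyBlock(16), PolyBlock(16), torch.nn.Linear(16, 10))
print(net(torch.randn(4, 16)).shape)  # torch.Size([4, 10])
```

Stacking such blocks yields a high-degree polynomial of the input without any elementwise activation function, which is why regularization becomes the critical ingredient.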
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion using a strongly convex regularizer. This makes it possible to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
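The core device admits a compact sketch: with an entropic regularizer, the smoothed max is the log-sum-exp, whose gradient is the softmax, and substituting it into a DP recursion (here, a DTW-style alignment, with illustrative inputs) yields a smooth cost:

```python
import numpy as np

def smoothed_max(x, gamma=1.0):
    # entropic smoothed max: gamma * log(sum(exp(x / gamma))), stabilized;
    # its gradient is the softmax, which makes the recursion differentiable
    m = np.max(x)
    return m + gamma * np.log(np.sum(np.exp((x - m) / gamma)))

def smoothed_dtw(D, gamma=1.0):
    # DTW recursion with min replaced by -smoothed_max(-x);
    # D is a pairwise-cost matrix between two sequences
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = np.array([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]])
            R[i, j] = D[i - 1, j - 1] - smoothed_max(-prev, gamma)
    return R[n, m]

D = np.abs(np.subtract.outer(np.sin(np.linspace(0, 3, 8)),
                             np.sin(np.linspace(0, 3, 10))))
print(smoothed_dtw(D, gamma=0.1))  # smooth alignment cost, amenable to autodiff
```

As gamma tends to zero, the recursion recovers the hard DTW cost; larger gamma trades fidelity for smoothness.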