Online optimization is a well-established optimization paradigm that aims to make a sequence of correct decisions given knowledge of the correct answer to previous decision tasks. Bilevel programming involves a hierarchical optimization problem where the feasible region of the so-called outer problem is restricted by the graph of the solution set mapping of the inner problem. This paper brings these two ideas together and studies an online bilevel optimization setting in which a sequence of time-varying bilevel problems is revealed one after the other. We extend the known regret bounds for single-level online algorithms to the bilevel setting. Specifically, we introduce new notions of bilevel regret, develop an online alternating time-averaged gradient method that is capable of leveraging smoothness, and provide regret bounds in terms of the path-length of the inner and outer minimizer sequences.
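The flavor of an online alternating gradient loop can be sketched on a toy instance. The quadratic losses, inner problem, and step sizes below are illustrative assumptions, not the paper's actual method or analysis:

```python
def online_alternating_gradient(a_seq, alpha=0.1, beta=0.5):
    """Toy online bilevel loop. At each round t a new bilevel problem
    arrives; the learner suffers the outer loss, then takes one gradient
    step on the inner objective and one on the outer objective.
      inner:  g_t(x, y) = (y - x)^2          -> inner minimizer y*(x) = x
      outer:  f_t(x, y) = (x - a_t)^2 + (y - a_t)^2
    """
    x, y = 0.0, 0.0
    outer_losses = []
    for a in a_seq:
        outer_losses.append((x - a) ** 2 + (y - a) ** 2)  # loss suffered first
        y -= beta * 2.0 * (y - x)    # alternating step on the inner problem
        x -= alpha * 2.0 * (x - a)   # then a step on the outer problem
    return x, outer_losses
```

When the minimizer sequence has small path-length (here the target `a_t` is constant), the iterates track it and the per-round loss shrinks, which is the regime the path-length regret bounds describe.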
Structural monitoring for complex built environments often suffers from mismatches between design, laboratory-tested, and as-built parameters. In addition, real-world structural identification problems face many challenges: the lack of accurate baseline models, high dimensionality, and complex multivariate partial differential equations (PDEs) pose significant difficulties for training conventional data-driven algorithms. This paper explores a new framework, dubbed NeuralSI, for structural identification that augments the PDEs governing structural dynamics with neural networks. Our approach estimates nonlinear parameters from the governing equations. We consider the vibration of nonlinear beams with two unknown parameters, one representing geometric and material variations, and another capturing energy losses in the system, mainly through damping. The data for parameter estimation are obtained from a limited set of measurements, which suits applications in structural health monitoring, where the exact state of an existing structure is typically unknown and only a limited number of data samples can be collected in the field. The trained model can also be extrapolated to both standard and extreme conditions using the identified structural parameters. We compare our approach with purely data-driven neural networks and with classical physics-informed neural networks (PINNs). Our approach reduces both interpolation and extrapolation errors in the displacement distribution by two to five orders of magnitude over the baselines. Code is available at //github.com/human-analysis/neural-structural-identification
Mixed-integer nonlinear optimization is a broad class of problems featuring combinatorial structures and nonlinearities. Typical exact methods combine a branch-and-bound scheme with relaxation and separation subroutines. We investigate the properties and advantages of error-adaptive first-order methods based on the Frank-Wolfe algorithm for this setting, requiring only a gradient oracle for the objective function and linear optimization over the feasible set. In particular, we study the algorithmic consequences of a branch-and-bound approach in which the subproblems are solved over the convex hull of the mixed-integer feasible set via linear oracle calls, compared to solving them over the continuous relaxation of the same set. This novel approach computes feasible solutions while working on a single representation of the polyhedral constraints, leveraging the full power of Mixed-Integer Programming (MIP) solvers without an outer approximation scheme.
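To make the oracle model concrete, here is a minimal numpy sketch of vanilla Frank-Wolfe driven purely by gradient evaluations and linear-minimization-oracle (LMO) calls. The probability simplex below stands in for the convex hull of a feasible set and is only an illustrative assumption, not the paper's setting:

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, steps=2000):
    """Vanilla Frank-Wolfe: only gradient evaluations and LMO calls
    over the feasible set, never a projection."""
    x = x0.astype(float)
    for t in range(steps):
        v = lmo(grad(x))             # LMO: argmin_{v in feasible set} <g, v>
        gamma = 2.0 / (t + 2.0)      # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * v
    return x

def simplex_lmo(g):
    """LMO for the probability simplex: the minimizing vertex is the
    unit basis vector at the smallest gradient coordinate."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v
```

In the branch-and-bound setting described above, the LMO would instead be a call to a MIP solver over a single polyhedral representation of the mixed-integer set.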
Inference on the minimum clinically important difference, or MCID, is an important practical problem in medicine. The basic idea is that a statistically significant treatment effect may not translate into a noticeable improvement in patients' well-being. The MCID is defined as a threshold such that, if a diagnostic measure exceeds it, patients are more likely to notice an improvement. Typical formulations use an underspecified model, which puts a genuine Bayesian solution out of reach. Here, for a challenging personalized MCID problem, where the practically significant threshold depends on patients' profiles, we develop a novel generalized posterior distribution, based on a working binary quantile regression model, that can be used for estimation and inference. The advantage of this formulation is two-fold: we can theoretically control the bias of the misspecified model, and it admits a latent variable representation that we leverage for efficient Gibbs sampling. To ensure that the generalized Bayes inferences achieve a level of frequentist reliability, we propose a variation on the so-called generalized posterior calibration algorithm to suitably tune the spread of our proposed posterior.
We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies. This measure is strictly stronger than the standard static regret, which benchmarks the learner's performance against a fixed comparator policy. We consider three foundational models of online MDPs: episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs. For each of these models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees; the results for episodic (loop-free) SSP are provably minimax optimal in terms of the time horizon and a certain non-stationarity measure. Furthermore, when the online environments encountered by the learner are predictable, we design improved algorithms that achieve better dynamic regret bounds for episodic (loop-free) SSP; moreover, we present impossibility results for infinite-horizon MDPs.
For the nonlinear Richards equation, which models unsaturated flow through heterogeneous media, we build a new coarse-scale approximation algorithm utilizing numerical homogenization. The approach uses deep neural networks (DNNs) to quickly and repeatedly compute macroscopic parameters. More specifically, we train neural networks on a set of stochastic permeability realizations and the corresponding computed macroscopic targets (effective permeability tensor, homogenized stiffness matrix, and right-hand-side vector). Our deep learning scheme learns nonlinear maps between such permeability fields and macroscopic characteristics; a novelty is that the treatment of the Richards equation's nonlinearity is contained in the predicted coarse-scale homogenized stiffness matrix. The good performance of this strategy is demonstrated by several numerical tests on two-dimensional model problems, for predictions of the macroscopic properties and, consequently, the solutions.
We address the problem of recovering a probability measure $P$ over $\R^n$ (e.g. its density $f_P$, if one exists) from knowledge of the associated multivariate spatial rank $R_P$ alone. It was shown in \cite{Kol1997} that multivariate spatial ranks characterize probability measures. We strengthen this result by explicitly recovering $f_P$ from $R_P$ in the form of a (potentially fractional) partial differential equation $f_P = \LL_n (R_P)$, where $\LL_n$ is a differential operator given in closed form that depends on $n$. When $P$ admits no density, we further show that the equality $P=\LL_n (R_P)$ still holds in the sense of distributions (i.e. generalized functions). We thoroughly investigate the regularity properties of spatial ranks and use the established PDE to give qualitative results on depth contours and regions. We study the local properties of the operator $\LL_n$ and show that it is non-local when $n$ is even. We conclude the paper with a partial counterpart to the non-localizability in even dimensions.
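For intuition, the empirical version of the spatial rank is easy to compute. The convention $R_P(x) = \mathbb{E}\left[(x - X)/\lVert x - X \rVert\right]$ used below is an assumption about the exact normalization:

```python
import numpy as np

def spatial_rank(x, sample):
    """Empirical multivariate spatial rank of the point x: the average
    unit vector pointing from the sample points (rows of `sample`)
    towards x."""
    diffs = x - sample                                    # shape (m, n)
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)  # assumes x is not a sample point
    return (diffs / norms).mean(axis=0)
```

At the center of a symmetric distribution the rank vanishes, while far outside the support it approaches a unit vector, consistent with the idea that $R_P$ encodes $P$.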
Topology inference for network systems (NSs) plays a crucial role in many areas. This paper advocates a causality-based method built on noisy observations from a single trajectory of an NS, represented by a state-space model with general directed topology. Specifically, we first prove the method's close relationships with the ideal Granger estimator for multiple trajectories and with the traditional ordinary least squares (OLS) estimator for a single trajectory. Along this line, we analyze the non-asymptotic inference performance of the proposed method, taking the OLS estimator as a reference and covering both asymptotically and marginally stable systems. The derived convergence rates and accuracy results suggest that the proposed method better handles potentially correlated observations and achieves zero inference error asymptotically. In addition, an online/recursive version of our method is established for efficient computation and for time-varying cases. Extensions to NSs with nonlinear dynamics are also discussed. Comprehensive tests corroborate the theoretical findings, and comparisons with other algorithms highlight the superiority of the proposed method.
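As a concrete reference point, the OLS estimator mentioned above can be written in a few lines for a toy linear NS. The system matrix, trajectory length, and noise level below are illustrative assumptions:

```python
import numpy as np

def ols_topology_estimate(X):
    """OLS estimate of the state matrix A from a single trajectory
    X (shape T x n) of the linear NS  x_{t+1} = A x_t + w_t."""
    past, future = X[:-1], X[1:]
    # Stacked form: future = past @ A^T; solve in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(past, future, rcond=None)
    return A_T.T

# Simulate a small stable directed network and recover its topology.
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.2, 0.0],
              [0.0, 0.4, 0.1],
              [0.1, 0.0, 0.3]])
T, n = 20000, 3
X = np.zeros((T, n))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + 0.01 * rng.standard_normal(n)
A_hat = ols_topology_estimate(X)
```

The nonzero pattern of `A_hat` then reveals the directed topology; the abstract's point is that the causality-based estimator improves on this baseline under correlated observations.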
In this paper, we explore the theory and expand upon the practice of fitness landscape analysis for optimization problems over the space of permutations. Many of the computational and analytical tools for fitness landscape analysis, such as fitness distance correlation, require a distance metric for measuring the similarity of different solutions to the problem. We begin with a survey of the available distance metrics for permutations and then use principal component analysis to classify them. The result of this analysis aligns with existing classifications of permutation problem types produced through less formal means, including the A-permutation, R-permutation, and P-permutation types, which classify problems by whether the absolute positions of permutation elements, the relative positions of elements, or the general precedence of pairs of elements dominates solution fitness. Additionally, the formal analysis identifies subtypes within these problem categories. We show that the classification can assist in identifying appropriate metrics, based on optimization problem features, for use in fitness landscape analysis. Using optimization problems of each class, we also demonstrate how the classification scheme can inform the choice of mutation operator within an evolutionary algorithm. From this, we present a classification of a variety of mutation operators as a counterpart to that of the metrics. Our implementations of the permutation metrics, permutation mutation operators, and associated evolutionary algorithm are available in a pair of open-source Java libraries. All of the code necessary to recreate our analysis and experimental results is also available as open source.
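As an example of such a metric, Kendall tau distance counts the pairs of elements whose relative order differs between two permutations, i.e. pairwise precedence disagreements. A minimal sketch (not taken from the paper's Java libraries):

```python
def kendall_tau_distance(p, q):
    """Number of element pairs ordered differently by permutations p and q.
    Ranges from 0 (identical) to n*(n-1)/2 (reversed)."""
    pos = {e: i for i, e in enumerate(q)}   # position of each element in q
    mapped = [pos[e] for e in p]            # express p in q's coordinate frame
    n = len(mapped)
    # Counting inversions of `mapped` counts the discordant pairs.
    return sum(1 for i in range(n)
                 for j in range(i + 1, n) if mapped[i] > mapped[j])
```

A fitness-landscape tool such as fitness distance correlation would then pair this metric with problems where pairwise precedence drives fitness, per the classification above.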
Modeling multivariate time series has long attracted researchers from fields as diverse as economics, finance, and traffic. A basic assumption behind multivariate time series forecasting is that its variables depend on one another, but, on close inspection, existing methods fail to fully exploit the latent spatial dependencies between pairs of variables. In recent years, meanwhile, graph neural networks (GNNs) have shown strong capability in handling relational dependencies. However, GNNs require well-defined graph structures for information propagation, so they cannot be applied directly to multivariate time series in which the dependencies are not known in advance. In this paper, we propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module, into which external knowledge such as variable attributes can be easily integrated. A novel mix-hop propagation layer and a dilated inception layer are further proposed to capture the spatial and temporal dependencies within the time series. The graph learning, graph convolution, and temporal convolution modules are jointly learned in an end-to-end framework. Experimental results show that our proposed model outperforms state-of-the-art baseline methods on three of four benchmark datasets and achieves on-par performance with other approaches on two traffic datasets that provide extra structural information.
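A graph learning step of the kind described can be sketched in numpy. The specific form assumed below, an adjacency A = ReLU(tanh(alpha * (E1 E2^T - E2 E1^T))) over learned node embeddings with top-k sparsification, is one common formulation and may differ from the paper's exact module:

```python
import numpy as np

def learn_adjacency(E1, E2, alpha=3.0, k=2):
    """Derive a sparse uni-directed adjacency matrix from two sets of
    node embeddings E1, E2 (shape n x d). The antisymmetric term
    E1 E2^T - E2 E1^T guarantees that, for each node pair, at most one
    direction survives the ReLU, yielding uni-directed relations."""
    A = np.maximum(np.tanh(alpha * (E1 @ E2.T - E2 @ E1.T)), 0.0)
    for i in range(A.shape[0]):          # keep only the k strongest neighbours
        weak = np.argsort(A[i])[:-k]     # indices of all but the top-k entries
        A[i, weak] = 0.0
    return A
```

In an end-to-end framework the embeddings would be trainable parameters updated jointly with the propagation and temporal layers.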
Image segmentation is considered one of the critical tasks in hyperspectral remote sensing image processing. Recently, the convolutional neural network (CNN) has established itself as a powerful model for segmentation and classification by demonstrating excellent performance. The use of a graphical model such as a conditional random field (CRF) further helps capture contextual information and thus improves segmentation performance. In this paper, we propose a method to segment hyperspectral images that considers both spectral and spatial information via a combined framework consisting of a CNN and a CRF. We use multiple spectral cubes to learn deep features with the CNN and then formulate a deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and evaluate our proposed method on it, along with several widely adopted benchmark datasets, to assess the effectiveness of our method. Comparing our results with those of several state-of-the-art models shows the promising potential of our method.