Many problems in computational science and engineering can be described in terms of approximating a smooth function of $d$ variables, defined over an unknown domain of interest $\Omega\subset \mathbb{R}^d$, from sample data. Here both the curse of dimensionality ($d\gg 1$) and the lack of domain knowledge with $\Omega$ potentially irregular and/or disconnected are confounding factors for sampling-based methods. Na\"{i}ve approaches often lead to wasted samples and inefficient approximation schemes. For example, uniform sampling can result in upwards of 20\% wasted samples in some problems. In surrogate model construction in computational uncertainty quantification (UQ), the high cost of computing samples needs a more efficient sampling procedure. In the last years, methods for computing such approximations from sample data have been studied in the case of irregular domains. The advantages of computing sampling measures depending on an approximation space $P$ of $\dim(P)=N$ have been shown. In particular, such methods confer advantages such as stability and well-conditioning, with $\mathcal{O}(N\log(N))$ as sample complexity. The recently-proposed adaptive sampling for general domains (ASGD) strategy is one method to construct these sampling measures. The main contribution of this paper is to improve ASGD by adaptively updating the sampling measures over unknown domains. We achieve this by first introducing a general domain adaptivity strategy (GDAS), which approximates the function and domain of interest from sample points. Second, we propose adaptive sampling for unknown domains (ASUD), which generates sampling measures over a domain that may not be known in advance. Then, we derive least squares techniques for polynomial approximation on unknown domains. Numerical results show that the ASUD approach can reduce the computational cost by as 50\% when compared with uniform sampling.
We investigate optimal execution problems with instantaneous price impact and stochastic resilience. First, in the setting of linear price impact function we derive a closed-form recursion for the optimal strategy, generalizing previous results with deterministic transient price impact. Second, we develop a numerical algorithm for the case of nonlinear price impact. We utilize an actor-critic framework that constructs two neural-network surrogates for the value function and the feedback control. One advantage of such functional approximators is the ability to do parametric learning, i.e. to incorporate some of the model parameters as part of the input space. Precise calibration of price impact, resilience, etc., is known to be extremely challenging and hence it is critical to understand sensitivity of the strategy to these parameters. Our parametric neural network (NN) learner organically scales across 3-6 input dimensions and is shown to accurately approximate optimal strategy across a range of parameter configurations. We provide a fully reproducible Jupyter Notebook with our NN implementation, which is of independent pedagogical interest, demonstrating the ease of use of NN surrogates in (parametric) stochastic control problems.
Covariance estimation for matrix-valued data has received an increasing interest in applications. Unlike previous works that rely heavily on matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and a S&P 500 stock data analysis.
The naive importance sampling (IS) estimator generally does not work well in examples involving simultaneous inference on several targets, as the importance weights can take arbitrarily large values, making the estimator highly unstable. In such situations, alternative multiple IS estimators involving samples from multiple proposal distributions are preferred. Just like the naive IS, the success of these multiple IS estimators crucially depends on the choice of the proposal distributions. The selection of these proposal distributions is the focus of this article. We propose three methods: (i) a geometric space filling approach, (ii) a minimax variance approach, and (iii) a maximum entropy approach. The first two methods are applicable to any IS estimator, whereas the third approach is described in the context of Doss's (2010) two-stage IS estimator. For the first method, we propose a suitable measure of 'closeness' based on the symmetric Kullback-Leibler divergence, while the second and third approaches use estimates of asymptotic variances of Doss's (2010) IS estimator and Geyer's (1994) reverse logistic regression estimator, respectively. Thus, when samples from the proposal distributions are obtained by running Markov chains, we provide consistent spectral variance estimators for these asymptotic variances. The proposed methods for selecting proposal densities are illustrated using various detailed examples.
We define positive and strictly positive definite functions on a domain and study these functions on a list of regular domains. The list includes the unit ball, conic surface, hyperbolic surface, solid hyperboloid, and simplex. Each of these domains is embedded in a quadrant or a union of quadrants of the unit sphere by a distance preserving map, from which characterizations of positive definite and strictly positive definite functions are derived for these regular domains.
In the storied Colonel Blotto game, two colonels allocate $a$ and $b$ troops, respectively, to $k$ distinct battlefields. A colonel wins a battle if they assign more troops to that particular battle, and each colonel seeks to maximize their total number of victories. Despite the problem's formulation in 1921, the first polynomial-time algorithm to compute Nash equilibrium (NE) strategies for this game was discovered only quite recently. In 2016, \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} formulated a breakthrough algorithm to compute NE strategies for the Colonel Blotto game\footnote{To the best of our knowledge, the algorithm from \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} has computational complexity $O(k^{14}\max\{a,b\}^{13})$}, receiving substantial media coverage (e.g. \citep{Insider}, \citep{NSF}, \citep{ScienceDaily}). In this work, we present the first known $\epsilon$-approximation algorithm to compute NE strategies in the two-player Colonel Blotto game in runtime $\widetilde{O}(\epsilon^{-4} k^8 \max\{a,b\}^2)$ for arbitrary settings of these parameters. Moreover, this algorithm computes approximate coarse correlated equilibrium strategies in the multiplayer (continuous and discrete) Colonel Blotto game (when there are $\ell > 2$ colonels) with runtime $\widetilde{O}(\ell \epsilon^{-4} k^8 n^2 + \ell^2 \epsilon^{-2} k^3 n (n+k))$, where $n$ is the maximum troop count. Before this work, no polynomial-time algorithm was known to compute exact or approximate equilibrium (in any sense) strategies for multiplayer Colonel Blotto with arbitrary parameters. Our algorithm computes these approximate equilibria by a novel (to the author's knowledge) sampling technique with which we implicitly perform multiplicative weights update over the exponentially many strategies available to each player.
Bayesian model selection provides a powerful framework for objectively comparing models directly from observed data, without reference to ground truth data. However, Bayesian model selection requires the computation of the marginal likelihood (model evidence), which is computationally challenging, prohibiting its use in many high-dimensional Bayesian inverse problems. With Bayesian imaging applications in mind, in this work we present the proximal nested sampling methodology to objectively compare alternative Bayesian imaging models for applications that use images to inform decisions under uncertainty. The methodology is based on nested sampling, a Monte Carlo approach specialised for model comparison, and exploits proximal Markov chain Monte Carlo techniques to scale efficiently to large problems and to tackle models that are log-concave and not necessarily smooth (e.g., involving l_1 or total-variation priors). The proposed approach can be applied computationally to problems of dimension O(10^6) and beyond, making it suitable for high-dimensional inverse imaging problems. It is validated on large Gaussian models, for which the likelihood is available analytically, and subsequently illustrated on a range of imaging problems where it is used to analyse different choices of dictionary and measurement model.
One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.
Convolutional neural network (CNN) achieves impressive success in the field of computer vision during the past few decades. As the core of CNNs, image convolution operation helps CNNs to achieve good performance on image-related tasks. However, image convolution is hard to be implemented and parallelized. In this paper, we propose a novel neural network model, namely CEMNet, that can be trained in frequency domain. The most important motivation of this research is that we can use the very simple element-wise multiplication operation to replace the image convolution in frequency domain based on Cross-Correlation Theorem. We further introduce Weight Fixation Mechanism to alleviate over-fitting, and analyze the working behavior of Batch Normalization, Leaky ReLU and Dropout in frequency domain to design their counterparts for CEMNet. Also, to deal with complex inputs brought by DFT, we design two branch network structure for CEMNet. Experimental results imply that CEMNet works well in frequency domain, and achieve good performance on MNIST and CIFAR-10 databases. To our knowledge, CEMNet is the first model trained in Fourier Domain that achieves more than 70\% validation accuracy on CIFAR-10 database.
The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automatizes the choice of almost all model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.