We consider the problem of inferring the conditional independence graph (CIG) of a sparse, high-dimensional stationary multivariate Gaussian time series. A sparse-group lasso-based frequency-domain formulation of the problem based on frequency-domain sufficient statistic for the observed time series is presented. We investigate an alternating direction method of multipliers (ADMM) approach for optimization of the sparse-group lasso penalized log-likelihood. We provide sufficient conditions for convergence in the Frobenius norm of the inverse PSD estimators to the true value, jointly across all frequencies, where the number of frequencies are allowed to increase with sample size. This results also yields a rate of convergence. We also empirically investigate selection of the tuning parameters based on Bayesian information criterion, and illustrate our approach using numerical examples utilizing both synthetic and real data.
We develop and implement a novel fast bootstrap for dependent data. Our scheme is based on the i.i.d. resampling of the smoothed moment indicators. We characterize the class of parametric and semi-parametric estimation problems for which the method is valid. We show the asymptotic refinements of the proposed procedure, proving that it is higher-order correct under mild assumptions on the time series, the estimating functions, and the smoothing kernel. We illustrate the applicability and the advantages of our procedure for Generalized Empirical Likelihood estimation. As a by-product, our fast bootstrap provides higher-order correct asymptotic confidence distributions. Monte Carlo simulations on an autoregressive conditional duration model provide numerical evidence that the novel bootstrap yields higher-order accurate confidence intervals. A real-data application on dynamics of trading volume of stocks illustrates the advantage of our method over the routinely-applied first-order asymptotic theory, when the underlying distribution of the test statistic is skewed or fat-tailed.
Stochastic PDE eigenvalue problems often arise in the field of uncertainty quantification, whereby one seeks to quantify the uncertainty in an eigenvalue, or its eigenfunction. In this paper we present an efficient multilevel quasi-Monte Carlo (MLQMC) algorithm for computing the expectation of the smallest eigenvalue of an elliptic eigenvalue problem with stochastic coefficients. Each sample evaluation requires the solution of a PDE eigenvalue problem, and so tackling this problem in practice is notoriously computationally difficult. We speed up the approximation of this expectation in four ways: we use a multilevel variance reduction scheme to spread the work over a hierarchy of FE meshes and truncation dimensions; we use QMC methods to efficiently compute the expectations on each level; we exploit the smoothness in parameter space and reuse the eigenvector from a nearby QMC point to reduce the number of iterations of the eigensolver; and we utilise a two-grid discretisation scheme to obtain the eigenvalue on the fine mesh with a single linear solve. The full error analysis of a basic MLQMC algorithm is given in the companion paper [Gilbert and Scheichl, 2022], and so in this paper we focus on how to further improve the efficiency and provide theoretical justification for using nearby QMC points and two-grid methods. Numerical results are presented that show the efficiency of our algorithm, and also show that the four strategies we employ are complementary.
This paper is focused on the optimization approach to the solution of inverse problems. We introduce a stochastic dynamical system in which the parameter-to-data map is embedded, with the goal of employing techniques from nonlinear Kalman filtering to estimate the parameter given the data. The extended Kalman filter (which we refer to as ExKI in the context of inverse problems) can be effective for some inverse problems approached this way, but is impractical when the forward map is not readily differentiable and is given as a black box, and also for high dimensional parameter spaces because of the need to propagate large covariance matrices. Application of ensemble Kalman filters, for example use of the ensemble Kalman inversion (EKI) algorithm, has emerged as a useful tool which overcomes both of these issues: it is derivative free and works with a low-rank covariance approximation formed from the ensemble. In this paper, we work with the ExKI, EKI, and a variant on EKI which we term unscented Kalman inversion (UKI). The paper contains two main contributions. Firstly, we identify a novel stochastic dynamical system in which the parameter-to-data map is embedded. We present theory in the linear case to show exponential convergence of the mean of the filtering distribution to the solution of a regularized least squares problem. This is in contrast to previous work in which the EKI has been employed where the dynamical system used leads to algebraic convergence to an unregularized problem. Secondly, we show that the application of the UKI to this novel stochastic dynamical system yields improved inversion results, in comparison with the application of EKI to the same novel stochastic dynamical system.
Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation. Motivated by the advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper, we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification, through input transformation learning and output label mapping. Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 30 different time series tasks we show that V2S performs competitive results on 19 time series classification tasks. We further provide a theoretical justification of V2S by proving its population risk is upper bounded by the source risk and a Wasserstein distance accounting for feature alignment via reprogramming. Our results offer new and effective means to time series classification.
Partial Differential Equations (PDEs) describe several problems relevant to many fields of applied sciences, and their discrete counterparts typically involve the solution of sparse linear systems. In this context, we focus on the analysis of the computational aspects related to the solution of large and sparse linear systems with HPC solvers, by considering the performances of direct and iterative solvers in terms of computational efficiency, scalability, and numerical accuracy. Our aim is to identify the main criteria to support application-domain specialists in the selection of the most suitable solvers, according to the application requirements and available resources. To this end, we discuss how the numerical solver is affected by the regular/irregular discretisation of the input domain, the discretisation of the input PDE with piecewise linear or polynomial basis functions, which generally result in a higher/lower sparsity of the coefficient matrix, and the choice of different initial conditions, which are associated with linear systems with multiple right-hand side terms. Finally, our analysis is independent of the characteristics of the underlying computational architectures, and provides a methodological approach that can be applied to different classes of PDEs or with approximation problems.
Model complexity is a fundamental problem in deep learning. In this paper we conduct a systematic overview of the latest studies on model complexity in deep learning. Model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies on those two categories along four important factors, including model framework, model size, optimization process and data complexity. We also discuss the applications of deep learning model complexity including understanding model generalization capability, model optimization, and model selection and design. We conclude by proposing several interesting future directions.
This paper addresses the difficulty of forecasting multiple financial time series (TS) conjointly using deep neural networks (DNN). We investigate whether DNN-based models could forecast these TS more efficiently by learning their representation directly. To this end, we make use of the dynamic factor graph (DFG) from that we enhance by proposing a novel variable-length attention-based mechanism to render it memory-augmented. Using this mechanism, we propose an unsupervised DNN architecture for multivariate TS forecasting that allows to learn and take advantage of the relationships between these TS. We test our model on two datasets covering 19 years of investment funds activities. Our experimental results show that our proposed approach outperforms significantly typical DNN-based and statistical models at forecasting their 21-day price trajectory.
Importance sampling is one of the most widely used variance reduction strategies in Monte Carlo rendering. In this paper, we propose a novel importance sampling technique that uses a neural network to learn how to sample from a desired density represented by a set of samples. Our approach considers an existing Monte Carlo rendering algorithm as a black box. During a scene-dependent training phase, we learn to generate samples with a desired density in the primary sample space of the rendering algorithm using maximum likelihood estimation. We leverage a recent neural network architecture that was designed to represent real-valued non-volume preserving ('Real NVP') transformations in high dimensional spaces. We use Real NVP to non-linearly warp primary sample space and obtain desired densities. In addition, Real NVP efficiently computes the determinant of the Jacobian of the warp, which is required to implement the change of integration variables implied by the warp. A main advantage of our approach is that it is agnostic of underlying light transport effects, and can be combined with many existing rendering techniques by treating them as a black box. We show that our approach leads to effective variance reduction in several practical scenarios.
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces; when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretization error. To get around this, we propose the stochastic CIR process, which removes all discretization error and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within a SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
Image foreground extraction is a classical problem in image processing and vision, with a large range of applications. In this dissertation, we focus on the extraction of text and graphics in mixed-content images, and design novel approaches for various aspects of this problem. We first propose a sparse decomposition framework, which models the background by a subspace containing smooth basis vectors, and foreground as a sparse and connected component. We then formulate an optimization framework to solve this problem, by adding suitable regularizations to the cost function to promote the desired characteristics of each component. We present two techniques to solve the proposed optimization problem, one based on alternating direction method of multipliers (ADMM), and the other one based on robust regression. Promising results are obtained for screen content image segmentation using the proposed algorithm. We then propose a robust subspace learning algorithm for the representation of the background component using training images that could contain both background and foreground components, as well as noise. With the learnt subspace for the background, we can further improve the segmentation results, compared to using a fixed subspace. Lastly, we investigate a different class of signal/image decomposition problem, where only one signal component is active at each signal element. In this case, besides estimating each component, we need to find their supports, which can be specified by a binary mask. We propose a mixed-integer programming problem, that jointly estimates the two components and their supports through an alternating optimization scheme. We show the application of this algorithm on various problems, including image segmentation, video motion segmentation, and also separation of text from textured images.