Functional Principal Component Analysis is a reference method for dimension reduction of curve data. Its theoretical properties are now well understood in the simplified case where the sample curves are fully observed without noise. However, functional data are noisy and necessarily observed on a finite discretization grid. Common practice consists in smoothing the data and then computing the functional estimates, but the impact of this denoising step on the statistical performance of the procedure is rarely considered. Here we prove new convergence rates for functional principal component estimators. We introduce a double asymptotic framework: one asymptotic corresponds to the sample size and the other to the size of the discretization grid. We prove that estimates based on projection onto histograms achieve optimal rates in the minimax sense. The theoretical results are illustrated on simulated data, and the method is applied to the visualization of genomic data.
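As a rough illustration of the estimator class analyzed above, the following sketch projects noisy, discretized curves onto a histogram basis (bin averages) and then diagonalizes the empirical covariance. Python, the simulated curves, the grid and bin sizes, and the noise level are all assumptions made for illustration; they are not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_curves, grid_size, n_bins = 200, 100, 20
t = np.linspace(0, 1, grid_size)

# Simulate smooth random curves observed with noise on a finite grid.
scores = rng.normal(size=(n_curves, 2))
X = scores[:, :1] * np.sin(2 * np.pi * t) + scores[:, 1:] * np.cos(2 * np.pi * t)
Y = X + 0.2 * rng.normal(size=X.shape)

# Projection onto a histogram basis: average the noisy values within each bin.
H = Y.reshape(n_curves, n_bins, grid_size // n_bins).mean(axis=2)

# Functional PCA of the bin averages: eigendecomposition of the covariance.
Hc = H - H.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Hc.T @ Hc / n_curves)
phi = eigvecs[:, ::-1][:, :2] * np.sqrt(n_bins)  # two leading eigenfunctions
```

The double asymptotics then ask how the estimation error behaves as both n_curves and grid_size grow.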
In all areas of human knowledge, datasets are increasing in both size and complexity, creating the need for richer statistical models. This trend is also true for economic data, where high-dimensional and nonlinear/nonparametric inference is the norm in several fields of applied econometric work. The purpose of this paper is to introduce the reader to the world of Bayesian model determination by surveying modern shrinkage and variable selection algorithms and methodologies. Bayesian inference is a natural probabilistic framework for quantifying uncertainty and learning about model parameters, and this feature is particularly important for inference in modern models of high dimensions and increased complexity. We begin with a linear regression setting in order to introduce various classes of priors that lead to shrinkage/sparse estimators comparable to popular penalized likelihood estimators (e.g.\ ridge, lasso). We explore various methods of exact and approximate inference, and discuss their pros and cons. Finally, we explore how priors developed for the simple regression setting can be extended in a straightforward way to various classes of interesting econometric models. In particular, we consider the following case studies, which demonstrate the application of Bayesian shrinkage and variable selection strategies to popular econometric contexts: i) vector autoregressive models; ii) factor models; iii) time-varying parameter regressions; iv) confounder selection in treatment effects models; and v) quantile regression models. A MATLAB package and an accompanying technical manual allow the reader to replicate many of the algorithms described in this review.
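The simplest example of the priors surveyed is a Gaussian prior in linear regression, whose posterior mean coincides with the ridge estimator. The sketch below verifies this with the conjugate closed form; note that the paper's own package is in MATLAB, so Python, the simulated data, and the fixed variance settings here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.r_[np.ones(3), np.zeros(p - 3)]   # sparse truth
y = X @ beta_true + 0.5 * rng.normal(size=n)

sigma2, tau2 = 0.25, 1.0          # noise variance and prior variance
lam = sigma2 / tau2               # implied ridge penalty

# Conjugate Gaussian posterior for beta ~ N(0, tau2 I): closed-form moments.
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
post_mean = post_cov @ X.T @ y / sigma2

# The posterior mean equals the ridge solution (X'X + lam I)^{-1} X'y.
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
assert np.allclose(post_mean, ridge)
```

Sparsity-inducing priors (e.g.\ the Bayesian lasso or spike-and-slab) replace the Gaussian with heavier-tailed or mixture alternatives, at the cost of losing this closed form.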
In this study, we develop an asymptotic theory of nonparametric regression for locally stationary functional time series. First, we introduce the notion of a locally stationary functional time series (LSFTS) that takes values in a semi-metric space. Then, we propose a nonparametric regression model for LSFTS with a regression function that changes smoothly over time. We establish uniform convergence rates for a class of kernel estimators, namely the Nadaraya-Watson (NW) estimator of the regression function, and prove a central limit theorem for the NW estimator.
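A minimal version of the NW estimator in this setting combines a kernel in rescaled time (to track the smoothly varying regression function) with a kernel in the semi-metric between curves. The data-generating process, semi-metric, and bandwidths below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)
T, grid = 300, 50
u = np.linspace(0, 1, T)                     # rescaled time in [0, 1]
Xf = rng.normal(size=(T, grid)).cumsum(axis=1) / np.sqrt(grid)  # functional covariates
Y = np.sin(2 * np.pi * u) * Xf.mean(axis=1) + 0.1 * rng.normal(size=T)

def semimetric(x, z):
    # L2 semi-metric between discretized curves.
    return np.sqrt(np.mean((x - z) ** 2))

def nw(u0, x0, hu=0.1, hx=0.5):
    # Gaussian kernel weights in time and in the functional direction.
    Ku = np.exp(-0.5 * ((u - u0) / hu) ** 2)
    d = np.array([semimetric(x0, Xf[s]) for s in range(T)])
    Kx = np.exp(-0.5 * (d / hx) ** 2)
    w = Ku * Kx
    return np.sum(w * Y) / np.sum(w)

print(nw(0.5, Xf[T // 2]))   # regression estimate at mid-sample
```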
We consider the inverse conductivity problem with discontinuous conductivities. We show in a rigorous way, by a convergence analysis, that one can construct a completely discrete minimization problem whose solution is a good approximation of a solution to the inverse problem. The minimization problem contains a regularization term which is given by a total variation penalization and is characterized by a regularization parameter. The discretization involves at the same time the boundary measurements, through the use of the complete electrode model, the unknown conductivity, and the solution to the direct problem. The electrodes are characterized by a parameter related to their size, which in turn controls the number of electrodes to be used. The discretization of the unknown and of the solution to the direct problem is characterized by another parameter related to the size of the mesh involved. In our analysis we show how to precisely choose the regularization, electrode size, and mesh size parameters with respect to the noise level in such a way that the solution to the discrete regularized problem is meaningful. In particular, we obtain that the electrode size and mesh size parameters should decay polynomially with respect to the noise level.
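To give a feel for the total variation penalization (only this one ingredient; the electrode and mesh discretizations of the full inverse problem are not reproduced here), the toy below recovers a discontinuous one-dimensional profile from noisy data by gradient descent on a smoothed TV functional. All parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
truth = np.where(np.arange(n) < n // 2, 1.0, 2.0)   # piecewise-constant "conductivity"
y = truth + 0.1 * rng.normal(size=n)

alpha, eps, step = 0.5, 1e-6, 0.1   # regularization, smoothing, step size
x = y.copy()
for _ in range(2000):
    dx = np.diff(x)
    # Gradient of the smoothed TV term: sum_i sqrt(dx_i^2 + eps).
    w = dx / np.sqrt(dx ** 2 + eps)
    g_tv = np.zeros(n)
    g_tv[:-1] -= w
    g_tv[1:] += w
    x -= step * (2.0 * (x - y) + alpha * g_tv)   # fidelity + TV penalty
```

In the actual inverse problem, the regularization, electrode size, and mesh size parameters must be tied to the noise level, the latter two decaying polynomially with it, as the abstract indicates.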
This paper introduces SPDE bridges with observation noise and contains an analysis of their spatially semidiscrete approximations. The SPDEs are considered in the form of mild solutions in an abstract Hilbert space framework suitable for parabolic equations. They are assumed to be linear with additive noise in the form of a cylindrical Wiener process. The observation noise is also cylindrical, and SPDE bridges are formulated via conditional distributions of Gaussian random variables in Hilbert spaces. A general framework for the spatial discretization of these bridge processes is introduced. Explicit convergence rates are derived for a spectral method and a finite-element-based method. It is shown that for sufficiently rough observation noise, the rates are essentially the same as those of the corresponding discretization of the original SPDE.
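In finite dimensions, the bridge construction reduces to the standard formulas for conditioning a Gaussian vector on a noisy linear observation. The sketch below conditions a discretized Brownian path on a noisy measurement of its endpoint; the covariance, noise level, and observed value are illustrative assumptions standing in for the Hilbert-space setting.

```python
import numpy as np

n = 100
t = np.linspace(1.0 / n, 1.0, n)
C = np.minimum.outer(t, t)      # covariance of a discretized Brownian motion
sigma2 = 0.01                   # observation-noise variance
y_obs = 1.5                     # noisy observation of the endpoint X_1

# Y = X_n + noise, so Cov(X, Y) = C[:, -1] and Var(Y) = C[-1, -1] + sigma2.
c_xy = C[:, -1]
v_y = C[-1, -1] + sigma2

cond_mean = c_xy * (y_obs / v_y)                  # prior mean is zero
cond_cov = C - np.outer(c_xy, c_xy) / v_y

# Draw one bridge path from the conditional (bridge) distribution.
rng = np.random.default_rng(4)
path = rng.multivariate_normal(cond_mean, cond_cov)
```

The paper's question is then how fast spectral or finite element discretizations of such conditional laws converge as the spatial resolution increases.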
We consider the problem of estimating the difference between two functional undirected graphical models with shared structures. In many applications, data are naturally regarded as a vector of random functions rather than a vector of scalars. For example, electroencephalography (EEG) data are more appropriately treated as functions of time. In such a problem, not only can the number of functions measured per sample be large, but each function is itself an infinite-dimensional object, making estimation of model parameters challenging. This is further complicated by the fact that the curves are usually only observed at discrete time points. We first define a functional differential graph that captures the differences between two functional graphical models and formally characterize when the functional differential graph is well defined. We then propose a method, FuDGE, that directly estimates the functional differential graph without first estimating each individual graph. This is particularly beneficial in settings where the individual graphs are dense, but the differential graph is sparse. We show that FuDGE consistently estimates the functional differential graph even in a high-dimensional setting for both fully observed and discretely observed function paths. We illustrate the finite sample properties of our method through simulation studies. We also propose a competing method, the Joint Functional Graphical Lasso, which generalizes the Joint Graphical Lasso to the functional setting. Finally, we apply our method to EEG data to uncover differences in functional brain connectivity between a group of individuals with alcohol use disorder and a control group.
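As a scalar (non-functional) stand-in for the direct estimation idea, the sketch below estimates the difference of two precision matrices by proximal gradient descent on a D-trace-type loss with an l1 penalty, never forming either precision matrix alone. The loss, step size, penalty level, and simulated data are assumptions for illustration; FuDGE itself operates on functional principal component scores with a group penalty.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 10, 500
Theta_X = np.eye(p)
Theta_Y = np.eye(p)
Theta_Y[0, 1] = Theta_Y[1, 0] = 0.4          # sparse difference in one edge
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta_X), size=n)
Y = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta_Y), size=n)
Sx, Sy = np.cov(X.T), np.cov(Y.T)

lam, step = 0.05, 0.1
Delta = np.zeros((p, p))                      # estimate of Theta_Y - Theta_X
for _ in range(1000):
    grad = 0.5 * (Sx @ Delta @ Sy + Sy @ Delta @ Sx) - (Sx - Sy)
    Z = Delta - step * grad
    Delta = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)   # soft-threshold
```

Without the penalty, the stationary condition Sx Delta Sy = Sx - Sy is solved by Delta = Theta_Y - Theta_X, which is why the difference can be identified without estimating the individual graphs.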
In the functional linear regression model, many methods have been proposed and studied to estimate the slope function when the functional predictor is observed on the entire domain. However, functional linear regression models with partially observed trajectories have received less attention. In this paper, to fill this gap in the literature, we consider the scenario where each individual functional predictor may be observed only on a part of the domain. Depending on whether measurement error is present in the functional predictors, two methods are developed: one is based on linear functionals of the observed part of the trajectory, and the other uses conditional principal component scores. We establish the asymptotic properties of the two proposed methods. Finite sample simulations are conducted to verify their performance. Diffusion tensor imaging (DTI) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study are analyzed.
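A minimal sketch of the conditional-score idea: when the scores are Gaussian and the eigenstructure is known (fixed here by assumption; in practice it must be estimated), the best predictor of a curve's principal component scores given its observed segment is a closed-form linear conditional expectation. The basis, eigenvalues, noise level, and observed fraction below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
grid = np.linspace(0, 1, 100)
phi = np.vstack([np.sqrt(2) * np.sin(np.pi * grid),
                 np.sqrt(2) * np.sin(2 * np.pi * grid)])   # eigenfunctions
lam = np.array([1.0, 0.25])                                # eigenvalues
sigma2 = 0.04                                              # measurement-error variance

# One curve observed with noise only on the first 60% of the domain.
xi = rng.normal(scale=np.sqrt(lam))
obs = slice(0, 60)
y = xi @ phi[:, obs] + np.sqrt(sigma2) * rng.normal(size=60)

# Conditional scores: E[xi | y] = Lambda Phi_O (Phi_O' Lambda Phi_O + sigma2 I)^{-1} y.
Phi_O = phi[:, obs]
Cov_y = (Phi_O.T * lam) @ Phi_O + sigma2 * np.eye(60)
xi_hat = (lam[:, None] * Phi_O) @ np.linalg.solve(Cov_y, y)
print(xi, xi_hat)   # true vs. predicted scores
```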
We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector ${\boldsymbol X}$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_{T}$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E \left\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.
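The optimal tree can be found by a Chow-Liu-type step: estimate pairwise mutual informations and take a maximum-weight spanning tree. The sketch below does this with a crude discretization of each coordinate; the binning, sample size, and simulated chain dependence are assumptions for illustration, whereas the paper works with continuous density estimates.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(7)
n, d, bins = 2000, 4, 8
Z = rng.normal(size=(n, d))
Z[:, 1] += Z[:, 0]
Z[:, 2] += Z[:, 1]          # chain dependence 0 - 1 - 2, with coordinate 3 independent

# Discretize each coordinate at empirical quantiles to estimate mutual information.
Q = np.array([np.digitize(Z[:, j],
                          np.quantile(Z[:, j], np.linspace(0, 1, bins + 1)[1:-1]))
              for j in range(d)]).T

def mutual_info(a, b):
    joint = np.histogram2d(a, b, bins=bins)[0] / n
    px, py = joint.sum(1), joint.sum(0)
    nz = joint > 0
    return np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz]))

MI = np.zeros((d, d))
for i in range(d):
    for j in range(i + 1, d):
        MI[i, j] = mutual_info(Q[:, i], Q[:, j])
MI = MI + MI.T

# Maximum-weight spanning tree = minimum spanning tree on negated weights.
tree = minimum_spanning_tree(-MI).toarray()
print(list(zip(*np.nonzero(tree))))   # edges of the estimated optimal tree
```

The tree density estimate is then the product of estimated bivariate conditionals along the selected edges.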
The aim of this work is to develop a fully distributed algorithmic framework for training graph convolutional networks (GCNs). The proposed method is able to exploit the meaningful relational structure of the input data, which are collected by a set of agents that communicate over a sparse network topology. After formulating the centralized GCN training problem, we first show how to perform inference in a distributed scenario where the underlying data graph is split among different agents. Then, we propose a distributed gradient descent procedure to solve the GCN training problem. The resulting model distributes computation along three lines: during inference, during back-propagation, and during optimization. Convergence to stationary solutions of the GCN training problem is also established under mild conditions. Finally, we propose an optimization criterion to design the communication topology between agents so that it matches the graph describing the data relationships. A wide set of numerical results validates our proposal. To the best of our knowledge, this is the first work combining graph convolutional neural networks with distributed optimization.
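A toy version of the decentralized training loop: each agent holds a block of nodes, computes a gradient of its local loss for a shared one-layer GCN (assuming the neighbor features needed for aggregation have already been exchanged, which is what the distributed inference step provides), and the per-agent parameter copies are mixed through a doubly stochastic matrix. The graph, model, and mixing weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(8)
N, f, agents = 12, 3, 3
A = (rng.random((N, N)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                          # random undirected graph
Ahat = (A + np.eye(N)) / (A + np.eye(N)).sum(1, keepdims=True)  # normalized adjacency
X = rng.normal(size=(N, f))                             # node features
y = rng.normal(size=N)                                  # node targets
blocks = np.array_split(np.arange(N), agents)           # node split across agents

W = [0.1 * rng.normal(size=f) for _ in range(agents)]   # one parameter copy per agent
Mix = np.full((agents, agents), 1.0 / agents)           # doubly stochastic mixing
step = 0.05
for _ in range(200):
    grads = []
    for a in range(agents):
        idx = blocks[a]
        pred = (Ahat @ X) @ W[a]                 # GCN layer: aggregate, then linear map
        r = pred[idx] - y[idx]                   # residuals on this agent's nodes
        grads.append((Ahat @ X)[idx].T @ r / len(idx))
    # Decentralized update: local gradient step, then consensus mixing.
    W = list(Mix @ (np.array(W) - step * np.array(grads)))
```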
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
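The smoothing at the heart of DRS replaces f by f_gamma(w) = E[f(w + gamma Z)] with Gaussian Z, which is differentiable even when f is not, and whose gradient can be estimated from function values alone. The single-machine sketch below applies this to an l1 objective; the objective, gamma, sample sizes, and step schedule are illustrative assumptions, and the averaging over the communication network is omitted.

```python
import numpy as np

rng = np.random.default_rng(9)
d = 20

def f(w):
    return np.sum(np.abs(w))       # non-smooth convex objective

def smoothed_grad(w, gamma=0.1, m=64):
    # Monte Carlo estimate of grad f_gamma(w), where f_gamma(w) = E[f(w + gamma Z)].
    Z = rng.normal(size=(m, d))
    vals = np.array([f(w + gamma * z) for z in Z])
    return ((vals - f(w)) / gamma) @ Z / m

w = rng.normal(size=d)
for k in range(1, 501):
    w -= 0.1 / np.sqrt(k) * smoothed_grad(w)
print(f(w))                        # decreases toward the minimum value 0
```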
Many problems in signal processing reduce to nonparametric function estimation. We propose a new methodology, piecewise convex fitting (PCF), and give a two-stage adaptive estimate. In the first stage, the number and locations of the change points are estimated using strong smoothing. In the second stage, a constrained smoothing spline fit is performed, with the smoothing level chosen to minimize the MSE. The imposed constraint is that a single change point occurs in a region about each empirical change point of the first-stage estimate. This constraint is equivalent to requiring that the third derivative of the second-stage estimate have a single sign in a small neighborhood about each first-stage change point. We sketch how PCF may be applied to signal recovery, instantaneous frequency estimation, surface reconstruction, image segmentation, spectral estimation, and multivariate adaptive regression.
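A rough sketch of the two-stage idea (without the single-sign constraint, which would require a constrained solver): a heavily smoothed first-stage spline locates candidate change points as sign changes of its third derivative, and a second-stage spline is then fit at a lighter smoothing level. The test function, spline order, and smoothing levels are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(10)
x = np.linspace(0, 1, 400)
y = np.sin(3 * np.pi * x) + 0.02 * rng.normal(size=x.size)

# Stage 1: strong smoothing, then locate sign changes of the third derivative.
s1 = UnivariateSpline(x, y, k=5, s=1.0)
d3 = s1.derivative(3)(x)
change = x[np.where(np.diff(np.sign(d3)) != 0)[0]]
print(change)    # approximate locations of the sign changes near 1/6, 1/2, 5/6

# Stage 2: lighter smoothing; PCF would additionally constrain the third
# derivative to a single sign near each first-stage change point.
s2 = UnivariateSpline(x, y, k=5, s=0.05)
```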