Distribution-on-distribution regression considers the problem of formulating and estimating a regression relationship in which both the covariate and the response are probability distributions. The optimal transport distributional regression model postulates that the conditional Fr\'echet mean of the response distribution is linked to the covariate distribution via an optimal transport map. We establish the minimax rate of estimation of such a regression function by deriving a lower bound that matches the convergence rate attained by the Fr\'echet least squares estimator.
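For concreteness, one stylized way to write such a model (the abstract fixes no notation, so the symbols below are only illustrative) is
\[
\operatorname*{arg\,min}_{b}\; \mathbb{E}\!\left[ W_2^2(\nu, b) \,\middle|\, \mu \right] \;=\; T_0 \# \mu ,
\]
that is, the conditional Fr\'echet mean (with respect to the Wasserstein distance $W_2$) of the response distribution $\nu$ given the covariate distribution $\mu$ is the push-forward of $\mu$ under an optimal transport map $T_0$, the object whose minimax estimation rate is established.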
Asymmetry, together with heteroscedasticity or contamination, often arises as data dimensionality grows. In ultra-high dimensional data analysis, such irregular settings are usually overlooked for both theoretical and computational convenience. In this paper, we establish a framework for estimation in high-dimensional regression models using Penalized Robust Approximated quadratic M-estimators (PRAM). This framework allows general settings in which, for example, the random errors lack symmetry and homogeneity, or the covariates are not sub-Gaussian. To reduce the possible bias caused by the data's irregularity in mean regression, PRAM adopts a loss function with a flexible robustness parameter that grows with the sample size. Theoretically, we first show that, in the ultra-high dimensional setting, PRAM estimators have local estimation consistency at the minimax rate enjoyed by the LS-Lasso. We then show that PRAM with an appropriate non-convex penalty in fact agrees with the local oracle solution, and thus enjoys the oracle property. Computationally, we demonstrate the performance of six PRAM estimators using three types of loss functions for approximation (Huber, Tukey's biweight and Cauchy loss) combined with two types of penalty functions (Lasso and MCP). Our simulation studies and real data analysis demonstrate satisfactory finite-sample performance of the PRAM estimators under general irregular settings.
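As a rough illustration of the PRAM idea, the following minimal sketch (not the paper's algorithm: it uses only a plain proximal-gradient solver, the Huber loss and the Lasso penalty, with a robustness parameter growing with the sample size) fits a penalized robust M-estimator on toy data with skewed, heavy-tailed errors:

```python
import numpy as np

def huber_grad(r, tau):
    """Derivative of the Huber loss in the residual r, with robustness parameter tau."""
    return np.where(np.abs(r) <= tau, r, tau * np.sign(r))

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def huber_lasso(X, y, lam, tau, n_iter=1000):
    """Proximal-gradient (ISTA) solver for (1/n) sum huber_tau(y - Xb) + lam * ||b||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2          # 1/L for the smooth Huber part
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ beta, tau) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# toy data with asymmetric, heavy-tailed errors; tau grows with n as suggested above
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:5] = 2.0
y = X @ beta_true + rng.standard_t(df=2, size=n) ** 2     # skewed, heavy-tailed noise
beta_hat = huber_lasso(X, y, lam=0.5 * np.sqrt(np.log(p) / n), tau=np.sqrt(n / np.log(p)))
```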
In this work, a general approach to compute a compressed representation of the exponential $\exp(h)$ of a high-dimensional function $h$ is presented. Such exponential functions play an important role in several problems in Uncertainty Quantification, e.g. the approximation of log-normal random fields or the evaluation of Bayesian posterior measures. Usually, these high-dimensional objects are numerically intractable and can only be accessed pointwise by sampling methods. In contrast, the proposed method constructs a functional representation of the exponential by exploiting its nature as the solution of an ordinary differential equation. Applying a Petrov--Galerkin scheme to this equation yields a tensor train representation of the solution, for which we derive an efficient and reliable a posteriori error estimator. Numerical experiments with a log-normal random field and a Bayesian likelihood illustrate the performance of the approach in comparison to other recent low-rank representations for the respective applications. Although the present work considers only a specific differential equation, the method can be applied in a more general setting. We show that the composition of a generic holonomic function and a high-dimensional function corresponds to a differential equation that can be used in our method. Moreover, the differential equation can be modified to adapt the norm in the a posteriori error estimates to the problem at hand.
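The key observation, that $u(t)=\exp(t\,h)$ solves the ODE $u'(t)=h\,u(t)$ with $u(0)=1$, can be illustrated pointwise in a toy setting. The sketch below evaluates the ODE at a single sample point with SciPy; the paper instead solves the same equation in a functional tensor train format via a Petrov--Galerkin scheme, and the choice of $h$ here is a placeholder:

```python
import numpy as np
from scipy.integrate import solve_ivp

# placeholder high-dimensional function h; in UQ applications h could be, e.g.,
# a truncated Karhunen-Loeve expansion of a Gaussian random field
def h(x):
    return np.sum(np.sin(x) / (1.0 + np.arange(x.size)))

x = np.random.default_rng(0).normal(size=10)    # one sample point in R^10

# u(t) = exp(t * h(x)) solves u'(t) = h(x) * u(t), u(0) = 1; integrate up to t = 1
sol = solve_ivp(lambda t, u: h(x) * u, (0.0, 1.0), [1.0], rtol=1e-10, atol=1e-12)
print(sol.y[0, -1], np.exp(h(x)))               # the two values agree up to solver tolerance
```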
We propose a novel deep neural network (DNN) architecture for compressing an image when a correlated image is available as side information only at the decoder, a special case of the well-known and heavily studied distributed source coding (DSC) problem. In particular, we consider a pair of stereo images, which have overlapping fields of view and are captured by a synchronized and calibrated pair of cameras, and are therefore highly correlated. We assume that one image of the pair is to be compressed and transmitted, while the other is available only at the decoder. In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding. The proposed decoder extracts, solely from the available side information, information common to the two images, as well as a latent representation of the side information. Then, the latent representations of the two images, one received from the encoder and the other extracted locally, along with the locally generated common information, are fed to the respective decoders of the two images. We employ a cross-attention module (CAM) to align the feature maps obtained in the intermediate layers of the respective decoders, thus allowing better utilization of the side information. We train and evaluate the proposed algorithm on realistic setups, such as the KITTI and Cityscapes stereo image datasets. Our results show that the proposed architecture exploits the decoder-only side information more efficiently than previous works, which it outperforms. We also show that the proposed method provides significant gains even for uncalibrated and unsynchronized camera arrays.
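A minimal PyTorch sketch of a cross-attention alignment step is given below; the module name, head count, and residual fusion are assumptions made for illustration and do not reproduce the paper's exact CAM architecture:

```python
import torch
import torch.nn as nn

class CrossAttentionAlign(nn.Module):
    """Align decoder feature maps of the compressed image with features derived
    from the side-information image via cross-attention (illustrative sketch)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feat_main, feat_side):
        b, c, h, w = feat_main.shape
        q = feat_main.flatten(2).transpose(1, 2)    # queries from the main decoder, (B, H*W, C)
        kv = feat_side.flatten(2).transpose(1, 2)   # keys/values from the side info, (B, H*W, C)
        aligned, _ = self.attn(q, kv, kv)
        return feat_main + aligned.transpose(1, 2).reshape(b, c, h, w)  # residual fusion

fused = CrossAttentionAlign(64)(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))
```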
In Japan, the Housing and Land Survey (HLS) provides grouped data on household incomes at the municipality level. Although these data could serve effective local policy-making, analysing the HLS data poses several challenges, such as the scarcity of information due to the grouping, the presence of non-sampled areas, and the very low frequency of the survey. This paper tackles these challenges through a new spatio-temporal finite mixture model based on grouped data for modelling the income distributions of multiple spatial units at multiple points in time. The main idea of the proposed method is that all areas share common latent distributions, and the potential area-wise heterogeneity is captured by the mixing proportions, which include spatial and temporal effects. Including these effects can smooth the quantities of interest over time and space, impute missing values and predict future values. Applying the proposed method to the HLS data, we obtain complete maps of income and poverty measures at an arbitrary point in time, which can be used for fast and efficient policy-making at a fine granularity.
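One stylized way to write such a model (the abstract fixes neither the link function nor the form of the effects, so the notation below is only illustrative) is
\[
F_{it}(x) \;=\; \sum_{k=1}^{K} \pi_{itk}\, G_k(x), \qquad
\pi_{itk} \;=\; \frac{\exp(\phi_{itk})}{\sum_{l=1}^{K}\exp(\phi_{itl})}, \qquad
\phi_{itk} \;=\; s_{ik} + u_{tk},
\]
where $F_{it}$ is the income distribution of area $i$ at time $t$, the $G_k$ are latent component distributions shared by all areas, and $s_{ik}$ and $u_{tk}$ are spatial and temporal effects that enter only through the mixing proportions and thereby smooth the estimates across space and time.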
We propose a new framework for differentially private optimization of convex functions which are Lipschitz in an arbitrary norm $\normx{\cdot}$. Our algorithms are based on a regularized exponential mechanism which samples from the density $\propto \exp(-k(F+\mu r))$ where $F$ is the empirical loss and $r$ is a regularizer which is strongly convex with respect to $\normx{\cdot}$, generalizing a recent work of \cite{GLL22} to non-Euclidean settings. We show that this mechanism satisfies Gaussian differential privacy and solves both DP-ERM (empirical risk minimization) and DP-SCO (stochastic convex optimization), by using localization tools from convex geometry. Our framework is the first to apply to private convex optimization in general normed spaces, and directly recovers non-private SCO rates achieved by mirror descent, as the privacy parameter $\eps \to \infty$. As applications, for Lipschitz optimization in $\ell_p$ norms for all $p \in (1, 2)$, we obtain the first optimal privacy-utility tradeoffs; for $p = 1$, we improve tradeoffs obtained by the recent works \cite{AsiFKT21, BassilyGN21} by at least a logarithmic factor. Our $\ell_p$ norm and Schatten-$p$ norm optimization frameworks are complemented with polynomial-time samplers whose query complexity we explicitly bound.
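As a toy illustration of the regularized exponential mechanism, the one-dimensional sketch below samples from a grid discretization of the density $\propto \exp(-k(F+\mu r))$ with an absolute-value empirical loss and a quadratic regularizer; the paper's actual samplers are polynomial-time algorithms for general normed spaces, not this brute-force discretization:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, size=50)

# empirical loss F (1-Lipschitz in theta) and a 1-strongly-convex regularizer r
F = lambda theta: np.mean(np.abs(theta[:, None] - data[None, :]), axis=1)
r = lambda theta: 0.5 * theta ** 2

k, mu = 20.0, 0.1                         # inverse temperature and regularization weight
grid = np.linspace(-5.0, 5.0, 20001)      # crude discretization of the parameter space
logw = -k * (F(grid) + mu * r(grid))
w = np.exp(logw - logw.max())             # unnormalized density, shifted for stability
sample = rng.choice(grid, p=w / w.sum())  # one output drawn from the mechanism
```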
We consider regression estimation with modified ReLU neural networks, in which network weight matrices are first modified by a function $\alpha$ before being multiplied by input vectors. We give an example of a continuous, piecewise linear function $\alpha$ for which the empirical risk minimizers over the classes of modified ReLU networks with $l_1$ and squared $l_2$ penalties attain, up to a logarithmic factor, the minimax rate of prediction of an unknown $\beta$-smooth function.
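A minimal PyTorch sketch of such a modified layer is shown below; the specific piecewise linear $\alpha$ (an entrywise soft-threshold) and the layer sizes are hypothetical choices for illustration, not the particular $\alpha$ constructed in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def alpha(w, a=0.01):
    # a continuous, piecewise linear map applied entrywise to the weight matrix
    # (hypothetical soft-threshold choice; the paper constructs its own alpha)
    return torch.sign(w) * torch.clamp(w.abs() - a, min=0.0)

class ModifiedReLULayer(nn.Module):
    """ReLU layer in which the weight matrix is passed through alpha before use."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / d_in ** 0.5)

    def forward(self, x):
        return F.relu(F.linear(x, alpha(self.weight)))   # ReLU(alpha(W) x)

y = ModifiedReLULayer(8, 16)(torch.randn(4, 8))
```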
Functional linear regression has gained popularity as a statistical tool for studying the relationship between a function-valued response and exogenous explanatory variables. In practice, however, the explanatory variables of interest can hardly be expected to be perfectly exogenous, due to, for example, omitted variables and measurement error. Despite its empirical relevance, this issue of endogeneity was studied only recently in the literature on functional regression, and developments in this direction do not yet sufficiently meet practitioners' needs; for example, the issue has been discussed with particular attention paid to consistent estimation, so the distributional properties of the proposed estimators remain to be explored. To fill this gap, this paper proposes new consistent FPCA-based instrumental variable estimators and develops their asymptotic properties in detail. We also provide a novel test for examining whether various characteristics of the response variable depend on the explanatory variable in our model. Simulation experiments under a wide range of settings show that the proposed estimators and test perform considerably well. We apply our methodology to estimate the impact of immigration on native wages.
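A stylized formulation of the setting (the abstract fixes no notation, so the symbols below are only illustrative) is the functional linear model
\[
Y_t \;=\; \Lambda X_t + u_t, \qquad \mathbb{E}[u_t \otimes X_t] \neq 0, \qquad \mathbb{E}[u_t \otimes Z_t] = 0,
\]
where the response $Y_t$ is function-valued, $\Lambda$ is the regression operator of interest, the nonzero covariance between the error $u_t$ and the explanatory variable $X_t$ encodes endogeneity, and $Z_t$ is an instrument; FPCA-based instrumental variable estimation then typically replaces the infinite-dimensional quantities by projections onto leading principal components.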
Conformalized quantile regression is a procedure that inherits the advantages of conformal prediction and quantile regression. That is, we use quantile regression to estimate the true conditional quantiles and then apply a conformal step on a calibration set to ensure marginal coverage. In this way, we obtain adaptive prediction intervals that account for heteroscedasticity. However, the aforementioned conformal step lacks adaptiveness, as described by Romano et al. (2019). To overcome this limitation, instead of applying a single conformal step after estimating conditional quantiles with quantile regression, we propose to cluster the explanatory variables, weighted by their permutation importance, with an optimized k-means and to apply k conformal steps. To show that this improved version outperforms the classic version of conformalized quantile regression and is more adaptive to heteroscedasticity, we extensively compare the prediction intervals of both on open datasets.
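The procedure can be sketched as follows (a minimal illustration with scikit-learn; the choice of quantile models, the importance weighting, the fixed number of clusters, and the omission of the finite-sample quantile correction are all simplifications of the description above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(3000, 3))
y = X[:, 0] + rng.normal(scale=0.2 + np.abs(X[:, 1]))            # heteroscedastic noise

X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.4, random_state=0)
alpha, n_clusters = 0.1, 4                                       # here k is fixed, not optimized

# 1. quantile regression for the lower and upper conditional quantiles
q_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
q_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# 2. weight the explanatory variables by the permutation importance of a point predictor
point = GradientBoostingRegressor().fit(X_tr, y_tr)
imp = permutation_importance(point, X_cal, y_cal, n_repeats=5, random_state=0)
weights = np.maximum(imp.importances_mean, 0.0) + 1e-8

# 3. k-means on the importance-weighted features, then one conformal step per cluster
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_cal * weights)
scores = np.maximum(q_lo.predict(X_cal) - y_cal, y_cal - q_hi.predict(X_cal))
q_hat = np.array([np.quantile(scores[km.labels_ == c], 1 - alpha)  # finite-sample correction omitted
                  for c in range(n_clusters)])

def predict_interval(X_new):
    c = km.predict(X_new * weights)
    return q_lo.predict(X_new) - q_hat[c], q_hi.predict(X_new) + q_hat[c]
```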
This work deals with the asymptotic distribution of both potentials and couplings of entropic regularized optimal transport for compactly supported probabilities in $\R^d$. We first provide a central limit theorem for the Sinkhorn potentials -- the solutions of the dual problem -- as a Gaussian process in $\Cs$. We then obtain the weak limits of the couplings -- the solutions of the primal problem -- evaluated on integrable functions, proving a conjecture of \cite{ChaosDecom}. In both cases, the limit is a real Gaussian random variable. Finally, we consider the weak limit of the entropic Sinkhorn divergence under both hypotheses $H_0:\ {\rm P}={\rm Q}$ and $H_1:\ {\rm P}\neq{\rm Q}$. Under $H_0$ the limit is a quadratic form applied to a Gaussian process in a Sobolev space, while under $H_1$ the limit is Gaussian. We also provide a different characterisation of the limit under $H_0$ in terms of an infinite sum involving an i.i.d. sequence of standard Gaussian random variables. These results enable statistical inference based on entropic regularized optimal transport.
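For reference, the Sinkhorn potentials and coupling discussed above can be computed for empirical measures with a plain fixed-point iteration, as in the sketch below (quadratic cost, uniform weights; purely illustrative and unrelated to the proofs):

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn(x, y, eps=0.05, n_iter=500):
    """Entropic OT between empirical measures: dual potentials (f, g) and coupling pi."""
    n, m = len(x), len(y)
    loga, logb = np.full(n, -np.log(n)), np.full(m, -np.log(m))    # uniform weights
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)      # squared Euclidean cost
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * logsumexp(logb[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(loga[:, None] + (f[:, None] - C) / eps, axis=0)
    pi = np.exp(loga[:, None] + logb[None, :] + (f[:, None] + g[None, :] - C) / eps)
    return f, g, pi

rng = np.random.default_rng(0)
f, g, pi = sinkhorn(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)) + 0.5)
```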
Quantile regression is a fundamental problem in statistical learning motivated by the need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in econometrics, statistics, and machine learning. Rather than proposing yet another (new) algorithm for quantile regression, we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models, in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary not only over individual models, but also over quantile levels and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post sorting or post isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.
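A small sketch of the aggregation-plus-sorting step is given below (random stand-ins replace real base-model predictions, and the level-dependent weights are arbitrary); it illustrates a weighted ensemble over models and quantile levels and the post-sorting step that, as noted above, can only improve the weighted interval score:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, K = 3, 100, 9                                    # models, test points, quantile levels
taus = np.linspace(0.1, 0.9, K)                        # the K quantile levels being predicted

# stand-ins for base-model quantile predictions, monotone in the level for each model
preds = np.cumsum(rng.gamma(1.0, size=(M, n, K)), axis=-1) + rng.normal(size=(M, n, 1))

# weights may vary over models and quantile levels (and, more generally, features)
w = rng.dirichlet(np.ones(M), size=K).T                # shape (M, K), each column sums to 1
ensemble = np.einsum('mnk,mk->nk', preds, w)           # aggregated quantiles, shape (n, K)

# post sorting: level-dependent weights can produce quantile crossing, and re-sorting
# each point's quantiles restores monotonicity without hurting the weighted interval score
ensemble_sorted = np.sort(ensemble, axis=-1)
```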