We provide a new theory for nodewise regression when the residuals from a fitted factor model are used. We apply our results to analyze the consistency of Sharpe ratio estimators when there are many assets in a portfolio, allowing both the number of assets and the number of time observations to grow. Since nodewise regression is not feasible when the idiosyncratic errors are unknown, we provide a feasible residual-based nodewise regression to estimate the precision matrix of errors that is consistent even when the number of assets, p, exceeds the time span of the portfolio, n. In another new development, we show that the precision matrix of returns can also be estimated consistently, even with an increasing number of factors and p>n. We show that: (1) with p>n, the Sharpe ratio estimators are consistent in global minimum-variance and mean-variance portfolios; (2) with p>n, the maximum Sharpe ratio estimator is consistent when the portfolio weights sum to one; and (3) with p<<n, the maximum out-of-sample Sharpe ratio estimator is consistent.
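As a rough illustration of the feasible procedure, the sketch below extracts factors by PCA, treats the residuals as proxies for the unobserved idiosyncratic errors, and runs a nodewise lasso on them. The PCA step, the penalty level, and the tau-squared formula follow the standard Meinshausen-Buhlmann/van de Geer recipe and are our assumptions, not the paper's exact tuning procedure.

```python
# A minimal sketch of residual-based nodewise regression for the error
# precision matrix; the factor extraction and penalty choice are illustrative
# assumptions, not the paper's exact procedure.
import numpy as np
from sklearn.linear_model import Lasso

def residual_nodewise_precision(R, n_factors=3, lam=0.1):
    """R: n x p matrix of asset returns (n time periods, p assets)."""
    n, p = R.shape
    # Step 1: fit an approximate factor model by PCA and keep the residuals.
    Rc = R - R.mean(axis=0)
    U, s, Vt = np.linalg.svd(Rc, full_matrices=False)
    F = U[:, :n_factors] * s[:n_factors]            # estimated factors
    B = Vt[:n_factors]                              # estimated loadings
    E = Rc - F @ B                                  # residuals (error proxies)
    # Step 2: nodewise lasso -- regress each residual series on all others.
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        fit = Lasso(alpha=lam, fit_intercept=False).fit(E[:, idx], E[:, j])
        tau2 = np.mean((E[:, j] - E[:, idx] @ fit.coef_) ** 2) \
               + lam * np.sum(np.abs(fit.coef_))
        Theta[j, j] = 1.0 / tau2
        Theta[j, idx] = -fit.coef_ / tau2
    return Theta  # row-wise estimate of the error precision matrix
```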
Mixed-dimensional elliptic equations exhibiting a hierarchical structure are commonly used to model problems with high-aspect-ratio inclusions, such as flow in fractured porous media. We derive general abstract estimates based on the theory of functional a posteriori error estimates, which yield guaranteed upper bounds for the primal and dual variables and two-sided bounds for the primal-dual pair. We improve on the abstract results obtained with the functional approach by proposing four different ways of estimating the residual errors, based on the extent to which the approximate solution satisfies conservation properties, namely: (1) no conservation, (2) subdomain conservation, (3) grid-level conservation, and (4) exact conservation. This treatment results in sharper and fully computable estimates when mass is conserved either at the grid level or exactly, with a structure comparable to that of estimates obtained from grid-based a posteriori techniques. We demonstrate the practical effectiveness of our theoretical results through numerical experiments using four different discretization methods for synthetic problems and applications based on benchmarks of flow in fractured porous media.
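As a toy illustration of the functional approach, the following sketch computes a guaranteed energy-norm majorant for a 1D Poisson problem, -u'' = f on (0,1) with zero boundary values, using the classical two-term estimate ||(u - v)'|| <= ||y - v'|| + C_F ||f + y'||. The crude flux reconstruction is our assumption and does not implement any of the paper's four conservation-based variants.

```python
# A minimal sketch of a functional (Repin-type) a posteriori majorant for the
# 1D Poisson problem; the flux reconstruction is an illustrative choice only.
import numpy as np

def functional_majorant(v, f, x):
    """Bound ||(u - v)'||_{L2} <= ||y - v'|| + C_F ||f + y'|| for any flux y."""
    h = x[1] - x[0]
    dv = np.diff(v) / h                      # broken (cellwise) gradient of v
    # Simple flux reconstruction: average the broken gradient to the nodes.
    y = np.concatenate([[dv[0]], 0.5 * (dv[:-1] + dv[1:]), [dv[-1]]])
    dy = np.diff(y) / h                      # divergence of reconstructed flux
    xm = 0.5 * (x[:-1] + x[1:])              # cell midpoints
    C_F = 1.0 / np.pi                        # Friedrichs constant on (0, 1)
    term_flux = np.sqrt(h * np.sum((0.5 * (y[:-1] + y[1:]) - dv) ** 2))
    term_residual = C_F * np.sqrt(h * np.sum((f(xm) + dy) ** 2))
    return term_flux + term_residual         # guaranteed upper bound

x = np.linspace(0, 1, 101)
v = 0.9 * np.sin(np.pi * x)                  # deliberately inexact approximation
print(functional_majorant(v, lambda t: np.pi**2 * np.sin(np.pi * t), x))
```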
Super-resolution is the task of improving the quality of a low-resolution image by plausibly increasing its resolution. The computer vision community has explored super-resolution extensively. However, previous super-resolution methods require vast amounts of training data, which becomes problematic in domains where very few low-resolution/high-resolution pairs are available. One such area is statistical downscaling, where super-resolution is increasingly being used to obtain high-resolution climate information from low-resolution data. Acquiring high-resolution climate data is extremely expensive and challenging. To reduce the cost of generating high-resolution climate information, super-resolution algorithms should be able to train with a limited number of low-resolution/high-resolution pairs. This paper addresses this problem by introducing a semi-supervised way to perform super-resolution that can generate sharp, high-resolution images with as few as 500 paired examples. The proposed semi-supervised technique can be used as a plug-and-play module with any supervised GAN-based super-resolution method to enhance its performance. We quantitatively and qualitatively analyze the performance of the proposed model and compare it with fully supervised methods as well as other unsupervised techniques. Comprehensive evaluations show the superiority of our method over other methods on different metrics. We also demonstrate the applicability of our approach in statistical downscaling to obtain high-resolution climate images.
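The core semi-supervised idea, a supervised pixel loss on the few available pairs combined with an adversarial loss that also exploits unpaired data, can be sketched as follows. The toy generator/discriminator architectures and the loss weight are our assumptions; the paper's actual plug-and-play module and weighting are not reproduced here.

```python
# A minimal sketch of combining a paired (supervised) loss with an adversarial
# loss on unpaired inputs; architectures and weights are placeholders.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                  nn.Upsample(scale_factor=4, mode="bilinear"),
                  nn.Conv2d(64, 3, 3, padding=1))            # toy SR generator
D = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 3, stride=2, padding=1))  # toy discriminator
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def generator_loss(lr_paired, hr_paired, lr_unpaired, w_adv=1e-3):
    loss_sup = l1(G(lr_paired), hr_paired)           # uses the few LR-HR pairs
    logits = D(G(lr_unpaired))                       # no HR target needed here
    loss_adv = bce(logits, torch.ones_like(logits))  # fool the discriminator
    return loss_sup + w_adv * loss_adv
# D would be trained in alternation, with real HR images as positives.
```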
We study approaches for compressing the empirical measure in the context of finite-dimensional reproducing kernel Hilbert spaces (RKHSs). In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Under certain conditions, such an approximation gives rise to a coreset of data points. A key quantity that controls how large such a coreset has to be is the size of the largest ball around the empirical measure that is contained within the empirical convex set. The bulk of our work is concerned with deriving high-probability lower bounds on the size of such a ball under various conditions. We complement this derivation of the lower bound by developing techniques that allow us to apply the compression approach to concrete inference problems such as kernel ridge regression. We conclude with a construction of an infinite-dimensional RKHS for which the compression is poor, highlighting some of the difficulties one faces when trying to move to infinite-dimensional RKHSs.
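One concrete way to approximate the empirical measure by a small coreset with convex-optimization-style updates is a Frank-Wolfe / kernel-herding greedy selection, sketched below. The Gaussian kernel and the uniform coreset weights are illustrative assumptions, not the paper's construction.

```python
# A minimal sketch of compressing the empirical measure in an RKHS via a
# kernel-herding (Frank-Wolfe) greedy selection of a coreset.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def herd_coreset(X, m):
    """Greedily pick m points whose uniform measure matches the kernel mean."""
    n = X.shape[0]
    K = gaussian_kernel(X, X)
    mean_embed = K.mean(axis=1)       # <k(x_i, .), empirical mean embedding>
    running = np.zeros(n)             # <k(x_i, .), current coreset embedding>
    chosen = []
    for t in range(1, m + 1):
        # Frank-Wolfe vertex: the point most correlated with the residual.
        j = int(np.argmax(mean_embed - running / t))
        chosen.append(j)
        running += K[:, j]
    return chosen                     # coreset indices, uniform weights 1/m
```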
We consider the question of adaptive data analysis within the framework of convex optimization. We ask how many samples are needed in order to compute $\epsilon$-accurate estimates of $O(1/\epsilon^2)$ gradients queried by gradient descent, and we provide two intermediate answers to this question. First, we show that for a general analyst (not necessarily gradient descent) $\Omega(1/\epsilon^3)$ samples are required. This rules out the possibility of a foolproof mechanism. Our construction builds upon a new lower bound (that may be of interest in its own right) for an analyst that may ask several non-adaptive questions in a batch over a fixed and known number $T$ of rounds of adaptivity and requires a fraction of true discoveries. We show that for such an analyst $\Omega(\sqrt{T}/\epsilon^2)$ samples are necessary. Second, we show that, under certain assumptions on the oracle, in an interaction with gradient descent $\tilde \Omega(1/\epsilon^{2.5})$ samples are necessary. Our assumptions are that the oracle has only \emph{first-order access} and is \emph{post-hoc generalizing}. First-order access means that it can only compute the gradients of the sampled function at points queried by the algorithm. Our assumption of \emph{post-hoc generalization} follows from existing lower bounds for statistical queries. More generally, we provide a generic reduction from the standard setting of statistical queries to the problem of estimating gradients queried by gradient descent. These results stand in contrast with classical bounds showing that with $O(1/\epsilon^2)$ samples one can optimize the population risk to accuracy $O(\epsilon)$, but, as it turns out, with spurious gradients.
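To fix ideas, the interaction being studied can be sketched as follows: gradient descent issues $O(1/\epsilon^2)$ adaptive queries to an oracle with first-order access that answers with empirical gradients. The quadratic losses and step size below are illustrative assumptions; this shows only the query protocol, not the lower-bound constructions.

```python
# A minimal sketch of gradient descent interacting with a sample-based,
# first-order-access oracle; losses f_i(x) = 0.5 * (a_i . x)^2 are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 5, 1000, 0.05
A = rng.standard_normal((n, d))          # one sampled "function" per data point

def empirical_gradient(x):
    # First-order access: the oracle only returns gradients at queried points.
    grads = A * (A @ x)[:, None]         # gradient of 0.5*(a_i . x)^2 at x
    return grads.mean(axis=0)

x = np.ones(d)
for _ in range(int(1 / eps ** 2)):       # O(1/eps^2) adaptive gradient queries
    x = x - eps * empirical_gradient(x)
```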
We investigate feature compression for high-dimensional ridge regression using the optimal subsampling technique. Specifically, based on the basic framework of a random feature-sampling algorithm for ridge regression and the A-optimal design criterion, we first obtain a set of optimal subsampling probabilities. Since these probabilities are expensive to compute, we then propose nearly optimal ones. With these probabilities, we establish a two-step iterative algorithm with lower computational cost and higher accuracy. We provide theoretical analysis and numerical experiments to support the proposed methods, and the numerical results demonstrate their good performance.
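The basic feature-subsampling framework can be sketched as below. The column-norm importance probabilities stand in for the (nearly) A-optimal ones derived in the paper, which we do not reproduce, and the single-pass solve stands in for the two-step iteration.

```python
# A minimal sketch of importance-weighted feature subsampling for ridge
# regression; the probabilities are an illustrative stand-in.
import numpy as np

def subsampled_ridge(X, y, r, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    probs = (X ** 2).sum(axis=0)
    probs = probs / probs.sum()                   # importance of each feature
    idx = rng.choice(p, size=r, replace=False, p=probs)
    Xs = X[:, idx] / np.sqrt(r * probs[idx])      # rescale to reduce bias
    beta_s = np.linalg.solve(Xs.T @ Xs + lam * np.eye(r), Xs.T @ y)
    beta = np.zeros(p)
    beta[idx] = beta_s / np.sqrt(r * probs[idx])  # map back to original scale
    return beta
```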
On-demand delivery has become increasingly popular around the world. Motivated by a large grocery chain that offers fast on-demand delivery services, we model and solve a stochastic dynamic driver dispatching and routing problem for last-mile delivery systems where on-time performance is the main target. The system operator needs to dispatch a set of drivers and specify their delivery routes in the face of random demand that arrives over a fixed number of periods. The resulting stochastic dynamic program is challenging to solve due to the curse of dimensionality. We propose a novel structured approximation framework that approximates the value function via a parametrized dispatching and routing policy. We analyze the structural properties of the approximation framework and establish its performance guarantee under large-demand scenarios. We then develop efficient exact algorithms for the approximation problem based on Benders decomposition and column generation, which deliver verifiably optimal solutions within minutes. Evaluation on a real-world data set shows that our framework outperforms the company's current policy by 36.53% on average in terms of delivery time. We also perform several policy experiments to understand the value of dynamic dispatching and routing with varying fleet sizes and dispatch frequencies.
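To give a flavor of what "evaluating a parametrized dispatching policy" means, the sketch below simulates a simple threshold policy over a horizon of random demand. The Poisson demand, threshold rule, and capacities are our illustrative assumptions, not the paper's approximation scheme or its exact algorithms.

```python
# A minimal sketch of simulating a parametrized (threshold) dispatching policy
# under random period-by-period demand; all parameters are illustrative.
import numpy as np

def simulate_policy(theta, n_periods=48, fleet=5, capacity=4, seed=0):
    rng = np.random.default_rng(seed)
    waiting, total_delay = 0, 0
    for _ in range(n_periods):
        waiting += rng.poisson(2.0)              # random demand arrivals
        dispatched = 0
        while waiting >= theta and dispatched < fleet:
            waiting -= min(capacity, waiting)    # fill one delivery route
            dispatched += 1
        total_delay += waiting                   # orders carried to next period
    return total_delay

best_theta = min(range(1, 10), key=simulate_policy)  # crude policy search
```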
Existing inferential methods for small area data involve a trade-off between maintaining area-level frequentist coverage rates and improving inferential precision by incorporating indirect information. In this article, we propose a method for obtaining an area-level prediction region for a future observation that mitigates this trade-off. The proposed method takes a conformal prediction approach in which the conformity measure is the posterior predictive density of a working model that incorporates indirect information. The resulting prediction region has guaranteed frequentist coverage regardless of the working model and, if the working model assumptions are accurate, has minimum expected volume among regions with the same coverage rate. When the region is constructed under a normal working model, we prove that it is an interval and construct an efficient algorithm to obtain the exact interval. We illustrate the performance of our method through simulation studies and an application to EPA radon survey data.
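The basic recipe, full conformal prediction with the posterior predictive density of a normal working model as the conformity measure, can be sketched as follows. The known-variance conjugate working model and the grid search are our assumptions; the paper instead derives the interval exactly.

```python
# A minimal sketch of a conformal prediction region using the posterior
# predictive density of a normal working model as the conformity measure.
import numpy as np
from scipy.stats import norm

def conformal_interval(y, alpha=0.1, mu0=0.0, tau2=1.0, sigma2=1.0):
    n = len(y)
    grid = np.linspace(y.min() - 3, y.max() + 3, 2000)
    keep = []
    for z in grid:
        aug = np.append(y, z)                    # augmented data set
        # Posterior of the mean under the normal working model.
        post_var = 1.0 / (1.0 / tau2 + (n + 1) / sigma2)
        post_mean = post_var * (mu0 / tau2 + aug.sum() / sigma2)
        # Conformity score: posterior predictive density at each point.
        scores = norm.pdf(aug, post_mean, np.sqrt(post_var + sigma2))
        pval = np.mean(scores <= scores[-1])     # rank of the candidate
        if pval > alpha:                         # keep candidates that conform
            keep.append(z)
    return min(keep), max(keep)                  # the (approximate) interval
```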
Let $X^{(n)}$ be an observation sampled from a distribution $P_{\theta}^{(n)}$ with an unknown parameter $\theta,$ $\theta$ being a vector in a Banach space $E$ (most often, a high-dimensional space of dimension $d$). We study the problem of estimating $f(\theta)$ for a functional $f:E\mapsto {\mathbb R}$ of some smoothness $s>0$ based on an observation $X^{(n)}\sim P_{\theta}^{(n)}.$ Assuming that there exists an estimator $\hat \theta_n=\hat \theta_n(X^{(n)})$ of the parameter $\theta$ such that $\sqrt{n}(\hat \theta_n-\theta)$ is sufficiently close in distribution to a mean-zero Gaussian random vector in $E,$ we construct a functional $g:E\mapsto {\mathbb R}$ such that $g(\hat \theta_n)$ is an asymptotically normal estimator of $f(\theta)$ with $\sqrt{n}$ rate, provided that $s>\frac{1}{1-\alpha}$ and $d\leq n^{\alpha}$ for some $\alpha\in (0,1).$ We also derive general upper bounds on Orlicz norm error rates for the estimator $g(\hat \theta_n),$ depending on the smoothness $s,$ dimension $d,$ sample size $n,$ and the accuracy of normal approximation of $\sqrt{n}(\hat \theta_n-\theta).$ In particular, this approach yields asymptotically efficient estimators in some high-dimensional exponential models.
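One standard device for constructing such a bias-reducing functional $g$ in this literature is an iterated bootstrap-chain bias correction; whether the paper's $g$ takes exactly this form is our assumption, but the telescoping identity below shows why iterating the correction kills the bias to any fixed order:

```latex
% Bootstrap-chain bias reduction (sketch; our assumption as to the form of g).
\[
(\mathcal{T}f)(\theta) := \mathbb{E}_{\theta} f(\hat\theta_n), \qquad
\mathcal{B} := \mathcal{T} - I, \qquad
g := \sum_{j=0}^{k} (-1)^j (\mathcal{B}^j f),
\]
\[
\mathbb{E}_{\theta}\, g(\hat\theta_n) - f(\theta)
 = \bigl((I+\mathcal{B})g\bigr)(\theta) - f(\theta)
 = (-1)^k (\mathcal{B}^{k+1} f)(\theta),
\]
% with k chosen large enough (depending on s and alpha) that the remaining
% bias is o(n^{-1/2}), so that g(\hat\theta_n) is sqrt(n)-asymptotically normal.
```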
Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure with a dynamic programming algorithm. This work explores both low-rank approximation models, such as Principal Component Analysis and Nonnegative Matrix Factorization, and "piece-specific" auto-encoding neural networks, with the objective of learning latent representations specific to a given song. These approaches do not rely on supervision or annotations, which are well known to be tedious to collect and possibly ambiguous in MSA descriptions. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (at 3s tolerance) on the RWC-Pop dataset, highlighting the importance of barwise compression for MSA.
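The linear (PCA) variant of the pipeline, barwise compression followed by dynamic-programming segmentation over a bar-level self-similarity matrix, can be sketched as below. The homogeneity-based block cost is an illustrative stand-in for the paper's exact segmentation criterion.

```python
# A minimal sketch: PCA compression of barwise features, then DP segmentation
# over the resulting self-similarity matrix; the block cost is illustrative.
import numpy as np

def segment_song(B, n_components=16, max_len=16, lam=2.0):
    """B: (n_bars, n_features) array of barwise features, one row per bar."""
    Bc = B - B.mean(axis=0)
    U, s, _ = np.linalg.svd(Bc, full_matrices=False)
    Z = U[:, :n_components] * s[:n_components]    # compressed bar representations
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-9)
    S = Z @ Z.T                                   # bar-level self-similarity
    n = len(B)
    cost = np.full(n + 1, np.inf)
    cost[0] = 0.0
    cut = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):                     # DP over segment boundaries
        for i in range(max(0, j - max_len), j):
            c = lam - S[i:j, i:j].mean() * (j - i)  # reward homogeneous blocks
            if cost[i] + c < cost[j]:
                cost[j] = cost[i] + c
                cut[j] = i
    bounds, j = [], n                             # backtrack the optimal cuts
    while j > 0:
        bounds.append(j)
        j = cut[j]
    return sorted(bounds)                         # bar indices of segment ends
```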
State-of-the-art Convolutional Neural Networks (CNNs) benefit greatly from multi-task learning (MTL), which learns multiple related tasks simultaneously to obtain shared or mutually related representations for the different tasks. The most widely used MTL CNN structure is based on an empirical or heuristic split at a specific layer (e.g., the last convolutional layer) to minimize the different task-specific losses. However, this heuristic sharing/splitting strategy may harm the final performance of one or multiple tasks. In this paper, we propose a novel CNN structure for MTL that enables automatic feature fusing at every layer. Specifically, we first concatenate features from different tasks along the channel dimension and then formulate the feature fusing problem as discriminative dimensionality reduction. We show that this discriminative dimensionality reduction can be done with 1x1 convolution, batch normalization, and weight decay in one CNN, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). We perform a detailed ablation analysis for different configurations of training the network. Experiments carried out on different network structures and different task sets demonstrate the promising performance and desirable generalizability of our proposed method.
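An NDDR layer as described, channel-wise concatenation followed by 1x1 convolution and batch normalization, with weight decay supplied by the optimizer, can be sketched as follows. The two-task setting and initialization details are our assumptions.

```python
# A minimal sketch of an NDDR layer: concatenate per-task features along the
# channel dimension, then per-task 1x1 conv + batch norm.
import torch
import torch.nn as nn

class NDDRLayer(nn.Module):
    def __init__(self, channels, n_tasks=2):
        super().__init__()
        # One 1x1 conv per task maps the concatenated features back to
        # task-specific features (the discriminative dimensionality reduction).
        self.convs = nn.ModuleList(
            nn.Conv2d(channels * n_tasks, channels, kernel_size=1)
            for _ in range(n_tasks))
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(channels) for _ in range(n_tasks))

    def forward(self, feats):              # feats: list of (B, C, H, W) tensors
        fused = torch.cat(feats, dim=1)    # concatenate along channel dimension
        return [bn(conv(fused)) for conv, bn in zip(self.convs, self.bns)]

layer = NDDRLayer(channels=64)
f1, f2 = torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8)
out1, out2 = layer([f1, f2])
# Weight decay on the 1x1 convs comes from the optimizer, e.g.
# torch.optim.SGD(layer.parameters(), lr=0.01, weight_decay=1e-4)
```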