We study quantile trend filtering, a recently proposed method for nonparametric quantile regression, with the goal of generalizing the risk bounds known for the usual trend filtering estimators, which perform mean regression. We study both the penalized and the constrained version (of order $r \geq 1$) of univariate quantile trend filtering. Our results show that both versions attain the minimax rate up to log factors when the $(r-1)$th discrete derivative of the true vector of quantiles belongs to the class of bounded variation signals. Moreover, we show that if the true vector of quantiles is a discrete spline with a few polynomial pieces, then both versions attain a near-parametric rate of convergence. Corresponding results for the usual trend filtering estimators are known to hold only when the errors are sub-Gaussian. In contrast, our risk bounds are shown to hold under minimal assumptions on the error variables; in particular, no moment assumptions are needed, and our results hold under heavy-tailed errors. Our proof techniques are general and can therefore potentially be used to study other nonparametric quantile regression methods. To illustrate this generality, we also employ them to obtain new results for multivariate quantile total variation denoising and high-dimensional quantile linear regression.
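For concreteness, the penalized estimator of order $r$ can be written in the standard trend filtering form (the tuning parameter $\lambda$ and the $r$th-order discrete difference operator $D^{(r)}$ below are generic notation introduced here, not taken from the text above):
\[
\widehat{\theta} \in \operatorname*{arg\,min}_{\theta \in \mathbb{R}^n} \; \sum_{i=1}^{n} \rho_\tau\!\left(y_i - \theta_i\right) + \lambda \bigl\|D^{(r)}\theta\bigr\|_1,
\qquad \rho_\tau(u) = u\bigl(\tau - \mathbf{1}\{u < 0\}\bigr),
\]
where $\rho_\tau$ is the check (pinball) loss at quantile level $\tau$; the constrained version instead minimizes the check loss subject to $\|D^{(r)}\theta\|_1 \leq V$ for a variation budget $V$.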
List-decodable codes have been an active topic in theoretical computer science since the seminal papers of M. Sudan and V. Guruswami in 1997-1998. List-decodable codes have also been considered in the rank-metric, subspace-metric, cover-metric, pair-metric, and insdel-metric settings. In this paper we show that rates, list-decoding radii, and list sizes are closely related to the classical topic of covering codes. We prove new, simple, yet strong general upper bounds for list-decodable codes in general finite metric spaces, based on various covering codes of finite metric spaces. These general covering-code upper bounds apply even when the volumes of the balls depend on the centers, not only on the radius. Any good upper bound on the covering radius or on the size of a covering code then implies a good upper bound on the size of list-decodable codes. Our results give exponential improvements on the recent generalized Singleton upper bound of STOC 2020 for Hamming-metric list-decodable codes when the code lengths are large. Even in the list-size $L=1$ case, our covering-code upper bounds give highly non-trivial upper bounds on the sizes of codes with a given minimum distance. A generalized Singleton upper bound for average-radius list-decodable codes is also given. The asymptotic forms of our covering-code bounds partially recover the Blinovsky bound and the combinatorial bound of Guruswami-H{\aa}stad-Sudan-Zuckerman in the Hamming-metric setting. We also propose studying combinatorial covering list-decodable codes as a natural generalization of combinatorial list-decodable codes. We apply our general covering-code upper bounds to list-decodable rank-metric codes, list-decodable subspace codes, list-decodable insertion codes, and list-decodable deletion codes. New, stronger results on the non-list-decodability of rank-metric codes and subspace codes are obtained.
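As a schematic illustration of the counting step behind such covering-code bounds (stated here in our own notation; the bounds above are finer), suppose $D \subseteq X$ is a covering code of radius $d$ for a finite metric space $X$ and $C \subseteq X$ is $(d, L)$-list-decodable, i.e., every ball of radius $d$ contains at most $L$ codewords of $C$. Since every codeword of $C$ lies in some ball $B(u, d)$ with $u \in D$,
\[
|C| \;\leq\; \sum_{u \in D} \bigl|C \cap B(u, d)\bigr| \;\leq\; L\,|D|,
\]
so any upper bound on the size of a covering code immediately yields an upper bound on the size of a list-decodable code.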
Decision trees are important both as interpretable models amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well understood. The most cited prior works have focused on deriving pointwise consistency guarantees for CART in a classical nonparametric regression setting. We take a different approach, and advocate studying the generalization performance of decision trees with respect to different generative regression models. This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data, thereby guiding practitioners on when and how to apply these methods. In this paper, we focus on sparse additive generative models, which have both low statistical complexity and some nonparametric flexibility. We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with $C^1$ component functions. This bound is surprisingly much worse than the minimax rate for estimating such sparse additive models. The inefficiency is due not to greediness, but to the loss in power for detecting global structure when we average responses solely over each leaf, an observation that suggests opportunities to improve tree-based algorithms, for example, by hierarchical shrinkage. To prove these bounds, we develop new technical machinery, establishing a novel connection between decision tree estimation and rate-distortion theory, a sub-field of information theory.
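In standard notation (ours, chosen for illustration), the generative models in question are sparse additive regression models
\[
Y_i \;=\; \sum_{j \in S} f_j(X_{ij}) + \varepsilon_i, \qquad |S| = s \ll d, \qquad f_j \in C^1,
\]
and the lower bound concerns the squared-error generalization performance of decision trees fitted to data generated from such models, contrasted with the minimax rate for estimating them.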
In this paper, we present a new method for estimating the number of terms in a sum of exponentially damped sinusoids embedded in noise. In particular, we propose to combine the shift-invariance property of the Hankel matrix associated with the signal with a constraint on its singular values that penalizes small-order estimates. With this new methodology, both the algebraic and the statistical structure of the Hankel matrix are taken into account. The new order estimation technique shows significant improvements over subspace-based methods; in particular, when a good separation between the noise and the signal subspaces is not possible, the new methodology outperforms known techniques. We evaluate the performance of our method through numerical experiments and compare it with previous results from the literature.
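For context, the signal model and the Hankel structure referred to above can be written as follows (standard notation, not taken verbatim from the text): a noiseless sum of $K$ exponentially damped sinusoids
\[
x_n \;=\; \sum_{k=1}^{K} c_k\, z_k^{\,n}, \qquad z_k = e^{-d_k + \mathrm{i}\,\omega_k},
\]
has an associated Hankel matrix $H$ with entries $H_{i,j} = x_{i+j-2}$ whose rank is exactly $K$ (for distinct $z_k$, nonzero $c_k$, and a matrix of sufficient size), and whose signal subspace satisfies the shift-invariance relation exploited by ESPRIT-type subspace methods; in the noisy case, the constraint on the singular values of $H$ determines how many of them are attributed to signal components.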
We introduce a new algorithm, which we term G-PFSO, for expected log-likelihood maximization in situations where the objective function is multi-modal and/or has saddle points. The key idea underpinning G-PFSO is to define a sequence of probability distributions that (a) is shown to concentrate on the target parameter value and (b) can be efficiently estimated by means of a standard particle filter algorithm. These distributions depend on a learning rate: the faster the learning rate, the faster the distributions concentrate on the desired parameter value, but the weaker the ability of G-PFSO to escape from a local optimum of the objective function. To reconcile the ability to escape from local optima with a fast convergence rate, the proposed estimator exploits the acceleration property of averaging, well known in the stochastic gradient literature. Our numerical experiments, conducted on challenging estimation problems, suggest that the estimator introduced in this paper converges at the optimal rate, and they illustrate the practical usefulness of G-PFSO for parameter inference from large datasets. Although the focus of this work is expected log-likelihood maximization, the proposed approach and its theory apply more generally to optimizing a function defined through an expectation.
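One common way to construct such a sequence of distributions, sketched here purely as an illustration (the tempering scheme and notation below are assumptions on our part, not necessarily the construction used above), is to raise the likelihood contributions to a decaying learning rate $h_s$ and to average the resulting point estimates:
\[
\tilde{\pi}_t(\theta) \;\propto\; \pi_0(\theta) \prod_{s=1}^{t} f(y_s \mid \theta)^{h_s},
\qquad
\bar{\theta}_t \;=\; \frac{1}{t} \sum_{s=1}^{t} \hat{\theta}_s,
\]
where $\pi_0$ is an initial distribution, $\hat{\theta}_s$ is a point estimate (e.g., the mean) computed from the particle approximation of $\tilde{\pi}_s$, and the average $\bar{\theta}_t$ plays the role of the Polyak-Ruppert average from the stochastic gradient literature.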
In this paper, we propose a reduced-bias estimator of the extreme value index (EVI) for Pareto-type (heavy-tailed) distributions, derived using the weighted least squares method. It is shown that the estimator is unbiased, consistent, and asymptotically normal under second-order conditions on the underlying distribution of the data. The finite-sample properties of the proposed estimator are studied through a simulation study. The results show that it is competitive with existing estimators of the extreme value index in terms of bias and mean squared error. In addition, it yields estimates of $\gamma>0$ that are less sensitive to the number of top order statistics and can hence be used for selecting an optimal tail fraction. The proposed estimator is further illustrated on practical datasets from pedochemistry and insurance.
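For reference, the classical baseline whose sensitivity to the number $k$ of top order statistics motivates reduced-bias constructions is the Hill estimator (standard background, not the estimator proposed above):
\[
\widehat{\gamma}^{\mathrm{Hill}}_{k,n} \;=\; \frac{1}{k}\sum_{i=1}^{k} \log X_{(n-i+1)} \;-\; \log X_{(n-k)},
\]
where $X_{(1)} \leq \cdots \leq X_{(n)}$ are the order statistics of the sample; under second-order regular variation its bias grows with $k$, which is precisely the behaviour that reduced-bias estimators aim to correct.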
In an influential critique of empirical practice, Freedman \cite{freedman2008A,freedman2008B} showed that the linear regression estimator was biased for the analysis of randomized controlled trials under the randomization model. Under Freedman's assumptions, we derive exact closed-form bias corrections for the linear regression estimator with and without treatment-by-covariate interactions. We show that the limiting distribution of the bias-corrected estimator is identical to that of the uncorrected estimator, implying that the asymptotic gains from adjustment can be attained without introducing any risk of bias. Taken together with the results of Lin \cite{lin2013agnostic}, our results show that Freedman's theoretical arguments against the use of regression adjustment can be completely resolved with minor modifications to practice.
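For reference, the two regression-adjusted estimators in question are the coefficients $\hat{\tau}$ on the treatment indicator $T_i$ in the OLS fits (standard notation; the closed-form bias corrections themselves are not reproduced here)
\[
Y_i = \alpha + \tau T_i + \beta^{\top} X_i + \varepsilon_i
\qquad\text{and}\qquad
Y_i = \alpha + \tau T_i + \beta^{\top}(X_i - \bar{X}) + \delta^{\top} T_i (X_i - \bar{X}) + \varepsilon_i,
\]
where the covariates are centered in the interacted specification so that $\hat{\tau}$ estimates the average treatment effect, as in Lin \cite{lin2013agnostic}.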
In this paper, we are interested in nonparametric kernel estimation of a generalized regression function, including conditional cumulative distribution and conditional quantile functions, based on an incomplete sample $(X_t, Y_t, \zeta_t)_{t\in \mathbb{R}^+}$ drawn from a continuous-time stationary ergodic process $(X, Y, \zeta)$. The predictor $X$ takes values in some infinite-dimensional space, whereas the real-valued process $Y$ is observed when $\zeta = 1$ and missing whenever $\zeta = 0$. Pointwise and uniform consistency (with rates) of these estimators, as well as a central limit theorem, are established. The conditional bias and asymptotic quadratic error are also provided. Asymptotic and bootstrap-based confidence intervals for the generalized regression function are discussed as well. A first simulation study is performed to compare the discrete-time and continuous-time estimators. A second simulation study is conducted to discuss the selection of the optimal sampling mesh in the continuous-time case. Finally, it is worth noting that our results are stated under an ergodicity assumption, without requiring any classical mixing conditions.
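A plausible form of such an estimator, written under a missing-at-random assumption and with generic notation (the semi-metric $d$, kernel $K$, and bandwidth $h_T$ below are our notation), restricts the functional Nadaraya-Watson weights to the observed responses:
\[
\widehat{m}_{\psi}(x) \;=\; \frac{\displaystyle\int_0^T \zeta_t\, \psi(Y_t)\, K\!\bigl(d(x, X_t)/h_T\bigr)\, \mathrm{d}t}{\displaystyle\int_0^T \zeta_t\, K\!\bigl(d(x, X_t)/h_T\bigr)\, \mathrm{d}t},
\]
where taking $\psi(y) = \mathbf{1}\{y \leq s\}$ gives an estimator of the conditional cumulative distribution function, whose generalized inverse yields conditional quantiles.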
Partial least squares (PLS) is a dimensionality reduction technique used as an alternative to ordinary least squares (OLS) in situations where the data are collinear or high-dimensional. Both PLS and OLS provide mean-based estimates, which are extremely sensitive to the presence of outliers or heavy-tailed distributions. In contrast, quantile regression is an alternative to OLS that computes robust quantile-based estimates. In this work, multivariate PLS is extended to the quantile regression framework, yielding a theoretical formulation of the problem and a robust dimensionality reduction technique, which we call fast partial quantile regression (fPQR), that provides quantile-based estimates. An efficient implementation of fPQR is also derived, and its performance is studied through simulation experiments and the well-known biscuit dough dataset from chemometrics, a real high-dimensional example.
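For orientation, the first pair of directions in classical multivariate PLS solves (standard formulation; how exactly fPQR replaces the mean-based ingredients is described in the paper, and the reading below is only an assumption based on the text above)
\[
(w_1, c_1) \;=\; \operatorname*{arg\,max}_{\|w\|_2 = \|c\|_2 = 1} \; \operatorname{Cov}(Xw,\, Yc),
\]
after which the scores are regressed on the responses by least squares; a quantile analogue keeps the dimensionality reduction step but replaces the covariance and the least-squares fit by quantile-based counterparts, such as a check-loss regression of the responses on the scores.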
Quantile regression has been successfully used to study heterogeneous and heavy-tailed data. Varying-coefficient models are frequently used to capture changes in the effect of input variables on the response as a function of an index or time. In this work, we study high-dimensional varying-coefficient quantile regression models and develop new tools for statistical inference. We focus on the development of valid confidence intervals and honest tests for nonparametric coefficients at a fixed time point and quantile level, while allowing for a high-dimensional setting in which the number of input variables exceeds the sample size. Performing statistical inference in this regime is challenging due to the use of model selection techniques in estimation. Nevertheless, we develop valid inferential tools that are applicable to a wide range of data-generating processes and do not suffer from the biases introduced by model selection. We perform numerical simulations to demonstrate the finite-sample performance of our method and illustrate its application with a real data example.
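In standard notation (ours, for illustration), the class of models in question specifies the conditional quantile of the response as
\[
Q_{Y_i \mid X_i, U_i}(\tau \mid x, u) \;=\; \sum_{j=1}^{p} x_j\, \beta_j(u, \tau), \qquad p \gg n,
\]
where $U_i$ is the index (e.g., time), only a few of the coefficient functions $\beta_j(\cdot, \tau)$ are nonzero, and inference targets $\beta_j(u_0, \tau)$ at a fixed point $u_0$ and quantile level $\tau$.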
We consider the exploration-exploitation trade-off in reinforcement learning, and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft Q-learning, and maximum entropy policy gradient; it is also closely related to optimism- and count-based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action pair and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
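Schematically, and with the caveat that the recursion and bonus terms below are an illustrative simplification in our notation rather than the precise algorithm, the K-values of a finite-horizon problem can be viewed as solving a soft Bellman equation with temperature $\tau$ and exploration bonus $b_t(s,a)$, with the induced policy Boltzmann in the K-values:
\[
K_t(s,a) \;=\; r(s,a) + b_t(s,a) + \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\left[\tau \log \sum_{a'} \exp\bigl(K_{t+1}(s',a')/\tau\bigr)\right],
\qquad
\pi_t(a \mid s) \;\propto\; \exp\bigl(K_t(s,a)/\tau\bigr),
\]
with $K_{L+1} \equiv 0$ at the horizon.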