We consider sparse principal component analysis for high-dimensional stationary processes. Standard principal component analysis performs poorly when the dimension of the process is large. We establish oracle inequalities for penalized principal component estimators of such processes, including heavy-tailed time series, and derive the rate of convergence of the estimators. We also elucidate the theoretical rate for choosing the tuning parameter in the penalized estimators. The performance of sparse principal component analysis is demonstrated by numerical simulations, and its utility for time series data is exemplified by an application to average temperature data.
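The abstract does not spell out the particular penalty; as a rough, hypothetical illustration of how an $\ell_1$-type penalty induces sparsity in the leading principal component, here is a minimal soft-thresholded power-iteration sketch in Python, where `lam` plays the role of the tuning parameter discussed above.

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_leading_pc(X, lam=0.1, n_iter=200):
    """Soft-thresholded power iteration for a sparse leading principal
    component. X: (T, p) array of observations, rows indexed by time."""
    S = np.cov(X, rowvar=False)           # sample covariance matrix
    v = np.linalg.eigh(S)[1][:, -1]       # start from the ordinary leading eigenvector
    for _ in range(n_iter):
        v = soft_threshold(S @ v, lam)    # power step followed by shrinkage
        norm = np.linalg.norm(v)
        if norm == 0.0:                   # penalty too large: all entries shrunk away
            return v
        v /= norm
    return v
```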
This paper investigates pooling strategies for tail index and extreme quantile estimation from heavy-tailed data. To fully exploit the information contained in several samples, we present general weighted pooled Hill estimators of the tail index and weighted pooled Weissman estimators of extreme quantiles calculated through a nonstandard geometric averaging scheme. We develop their large-sample asymptotic theory across a fixed number of samples, covering the general framework of heterogeneous sample sizes with different and asymptotically dependent distributions. Our results include optimal choices of pooling weights based on asymptotic variance and asymptotic mean squared error (AMSE) minimization. In the important application of distributed inference, we prove that the variance-optimal distributed estimators are asymptotically equivalent to the benchmark Hill and Weissman estimators based on the infeasible combination of subsamples, while the AMSE-optimal distributed estimators enjoy a smaller AMSE than the benchmarks in the case of large bias. We consider additional scenarios where the number of subsamples grows with the total sample size and effective subsample sizes can be low. We extend our methodology to handle serial dependence and the presence of covariates. Simulations confirm that our pooled estimators perform virtually as well as the benchmark estimators. Two applications to real weather and insurance data are showcased.
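As a minimal sketch of one plausible reading of the pooling schemes above (the weights `w` are assumed nonnegative and summing to one; in the paper they would be chosen by the variance- or AMSE-minimization criteria), the following Python code combines per-sample Hill and Weissman estimators:

```python
import numpy as np

def hill(x, k):
    """Hill estimator of the tail index from the k largest observations
    (observations assumed positive, heavy right tail)."""
    xs = np.sort(x)[::-1]                       # descending order statistics
    return np.mean(np.log(xs[:k])) - np.log(xs[k])

def weissman(x, k, p):
    """Weissman extrapolation estimator of the quantile of order 1 - p, p small."""
    xs = np.sort(x)[::-1]
    return xs[k] * (k / (len(x) * p)) ** hill(x, k)

def pooled_hill(samples, ks, w):
    """Weighted pooling of per-sample Hill estimators."""
    return sum(wi * hill(xi, ki) for xi, ki, wi in zip(samples, ks, w))

def pooled_weissman(samples, ks, w, p):
    """Weighted geometric pooling of per-sample Weissman quantile estimators."""
    return np.exp(sum(wi * np.log(weissman(xi, ki, p))
                      for xi, ki, wi in zip(samples, ks, w)))
```

Here `ks[j]` is the number of top order statistics retained in sample `j`; geometric averaging of the Weissman estimators amounts to arithmetic averaging on the log scale.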
We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis of the effect of mini-batches through the lens of differential operator splitting, which revises results from the previous literature. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(\eta^2)$, with $\eta$ being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.
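For concreteness, a generic stochastic-gradient Hamiltonian update with friction is sketched below; this is not necessarily the splitting integrator analyzed in the paper, and `grad_minibatch` and `gamma` are illustrative placeholders. The mini-batch gradient replaces the full-data gradient, which is exactly the noise source whose effect is bounded above.

```python
import numpy as np

def sghmc_step(theta, r, grad_minibatch, eta, gamma=0.1, rng=np.random):
    """One Euler-type step of a Hamiltonian SDE with friction, driven by a
    noisy mini-batch gradient estimate instead of the full-data gradient."""
    g = grad_minibatch(theta)                          # noisy gradient of the potential
    noise = np.sqrt(2.0 * gamma * eta) * rng.standard_normal(np.shape(theta))
    r = r - eta * g - eta * gamma * r + noise          # momentum update with friction
    theta = theta + eta * r                            # position update
    return theta, r
```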
We introduce locality: a new property of multi-bidder auctions that formally separates the simplicity of optimal single-dimensional multi-bidder auctions from the complexity of optimal multi-dimensional multi-bidder auctions. Specifically, consider the revenue-optimal, Bayesian Incentive Compatible auction for buyers with valuations drawn from $\vec{D}:=\times_i D_i$, where each distribution has support size $n$. This auction takes as input a valuation profile $\vec{v}$ and produces as output an allocation of the items and prices to charge, $Opt_{\vec{D}}(\vec{v})$. When each $D_i$ is single-dimensional, this mapping is locally-implementable: defining each input $v_i$ requires $\Theta(\log n)$ bits, and $Opt_{\vec{D}}(\vec{v})$ can be fully determined using just $\Theta(\log n)$ bits from each $D_i$. This follows immediately from Myerson's virtual value theory [Mye81]. Our main result establishes that optimal multi-dimensional mechanisms are not locally-implementable: in order to determine the output $Opt_{\vec{D}}(\vec{v})$ on one particular input $\vec{v}$, one still needs to know (essentially) the entire distribution $\vec{D}$. Formally, $\Omega(n)$ bits from each $D_i$ are necessary: (essentially) enough to fully describe $D_i$, and exponentially more than the $\Theta(\log n)$ needed to define the input $v_i$. We show that this phenomenon already occurs with just two bidders, even when one bidder is single-dimensional and the other bidder is barely multi-dimensional. More specifically, the multi-dimensional bidder is ``inter-dimensional'' from the FedEx setting with just two days [FGKK16]. Our techniques are fairly robust: we additionally establish that optimal mechanisms for single-dimensional buyers with budget constraints are not locally-implementable. This occurs with just two bidders, even when one has no budget constraint, and even when the other's budget is public.
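For intuition on the single-dimensional side of this separation, recall the continuous-distribution form of Myerson's virtual value (the abstract's setting uses discrete supports of size $n$, where the analogous quantity is a discrete, ironed virtual value):
\[
\varphi_i(v_i) \;=\; v_i \;-\; \frac{1 - F_i(v_i)}{f_i(v_i)} .
\]
The revenue-optimal single-dimensional auction allocates to the bidder with the highest nonnegative (ironed) virtual value and charges threshold prices, so the outcome on a profile $\vec{v}$ depends on each $D_i$ only through a small amount of local information; this is the intuition behind the $\Theta(\log n)$-bit local implementability stated above.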
Principal Component Analysis (PCA) is a transform for finding the principal components (PCs) that represent features of random data. PCA also provides a reconstruction of the original data from the PCs. We consider an extension of PCA which allows us to improve the associated accuracy and diminish the numerical load in comparison with known techniques. This is achieved owing to the special structure of the proposed transform, which contains two matrices $T_0$ and $T_1$ and a special transformation $\mathcal{f}$ of the so-called auxiliary random vector $\mathbf w$. For this reason, we call it the three-term PCA. In particular, we show that the three-term PCA always exists, i.e., it is applicable even to the case of singular data. Both a rigorous theoretical justification of the three-term PCA and simulations with real-world data are provided.
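The three-term transform itself (the matrices $T_0$, $T_1$ and the map of the auxiliary vector $\mathbf w$) is not specified in the abstract; for reference, the standard PCA reconstruction that it is compared against can be sketched in Python as follows.

```python
import numpy as np

def pca_reconstruct(X, r):
    """Baseline PCA: project the centered data onto the top-r principal
    directions and map back, giving the usual rank-r reconstruction."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, V = np.linalg.eigh(np.cov(Xc, rowvar=False))   # eigenvectors, ascending order
    Vr = V[:, -r:]                                    # top-r principal directions
    pcs = Xc @ Vr                                     # principal components (scores)
    return pcs @ Vr.T + mu                            # rank-r reconstruction of the data
```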
We prove an optimal $O(n \log n)$ mixing time of the Glauber dynamics for the Ising model with edge activity $\beta \in \left(\frac{\Delta-2}{\Delta}, \frac{\Delta}{\Delta-2}\right)$. This mixing time bound holds even if the maximum degree $\Delta$ is unbounded. We refine the boosting technique developed in [CFYZ21] and prove a new boosting theorem by utilizing the entropic independence defined in [AJK+21]. The theorem relates the modified log-Sobolev (MLS) constant of the Glauber dynamics for a near-critical Ising model to that for an Ising model in a sub-critical regime.
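A single step of the Glauber dynamics in the edge-activity parameterization used above (pick a uniformly random vertex and resample its spin conditionally on its neighbors) can be sketched as follows; `adj` is an adjacency-list representation and spins take values $\pm 1$.

```python
import random

def glauber_step(spins, adj, beta, rng=random):
    """One step of Glauber dynamics for the Ising model with edge activity beta:
    pick a uniformly random vertex and resample its spin from the conditional
    distribution given the spins of its neighbors."""
    v = rng.randrange(len(spins))
    agree_plus = sum(1 for u in adj[v] if spins[u] == +1)
    agree_minus = len(adj[v]) - agree_plus
    w_plus, w_minus = beta ** agree_plus, beta ** agree_minus
    spins[v] = +1 if rng.random() < w_plus / (w_plus + w_minus) else -1
    return spins
```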
We propose a novel method for sampling and optimization tasks based on a stochastic interacting particle system. We explain how this method can be used for the following two goals: (i) generating approximate samples from a given target distribution; (ii) optimizing a given objective function. The approach is derivative-free and affine invariant, and is therefore well-suited for solving inverse problems defined by complex forward models: (i) allows generation of samples from the Bayesian posterior and (ii) allows determination of the maximum a posteriori estimator. We investigate the properties of the proposed family of methods in terms of various parameter choices, both analytically and by means of numerical simulations. The analysis and numerical simulations establish that the method has potential for general-purpose optimization tasks over Euclidean space; contraction properties of the algorithm are established under suitable conditions, and computational experiments demonstrate wide basins of attraction for various specific problems. The analysis and experiments also demonstrate the potential of the sampling methodology in regimes in which the target distribution is unimodal and close to Gaussian; indeed, we prove that the method recovers a Laplace approximation to the measure in certain parametric regimes, and we provide numerical evidence that this Laplace approximation attracts a large set of initial conditions in a number of examples.
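As a hedged illustration of one representative derivative-free, affine-invariant interacting particle scheme of this flavor (consensus-type dynamics driven by a Gibbs-weighted ensemble mean and covariance; the paper's exact dynamics and parameter choices may differ), consider the following Python sketch, where `f` is the objective or negative log-posterior and `beta` plays the role of an inverse temperature:

```python
import numpy as np

def consensus_step(X, f, beta=10.0, dt=0.05, sample=True, rng=np.random):
    """One step of a derivative-free consensus-type particle update.
    X: (J, d) array of particles; f: objective / negative log-posterior."""
    fv = np.array([f(x) for x in X])
    w = np.exp(-beta * (fv - fv.min()))            # Gibbs weights (shifted for stability)
    w /= w.sum()
    mean = w @ X                                   # weighted ensemble mean
    cov = (w[:, None] * (X - mean)).T @ (X - mean) # weighted ensemble covariance
    step = -(X - mean) * dt                        # drift towards the weighted mean
    if sample:                                     # covariance-scaled noise for sampling
        L = np.linalg.cholesky(cov + 1e-10 * np.eye(X.shape[1]))
        step += np.sqrt(2.0 * dt) * rng.standard_normal(X.shape) @ L.T
    return X + step
```

With `sample=False` the particles simply contract towards the weighted mean, which is the optimization mode; with `sample=True` the covariance-scaled noise keeps the ensemble spread out for approximate sampling.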
A fundamental algorithm for data analytics at the edge of wireless networks is distributed principal component analysis (DPCA), which finds the most important information embedded in a distributed high-dimensional dataset by distributed computation of a reduced-dimension data subspace, called principal components (PCs). In this paper, to support one-shot DPCA in wireless systems, we propose a framework of analog MIMO transmission featuring the uncoded analog transmission of local PCs for estimating the global PCs. To cope with channel distortion and noise, two maximum-likelihood (global) PC estimators are presented, corresponding to the cases with and without receive channel state information (CSI). The first design, termed the coherent PC estimator, is derived by solving a Procrustes problem and takes the form of a regularized channel inversion, where the regularization attempts to alleviate the effects of both channel noise and data noise. The second, termed the blind PC estimator, is designed based on the subspace channel-rotation-invariance property and computes a centroid of the received local PCs on a Grassmann manifold. Using manifold perturbation theory, tight bounds on the mean square subspace distance (MSSD) of both estimators are derived for performance evaluation. The results reveal simple scaling laws of the MSSD with respect to device population, data and channel signal-to-noise ratios (SNRs), and array sizes. More importantly, both estimators are found to have identical scaling laws, suggesting the dispensability of CSI for accelerating DPCA. Simulation results validate the derived results and demonstrate the promising latency performance of the proposed analog MIMO framework.
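One standard way to realize a rotation-invariant centroid of received local PC subspaces is the chordal-metric mean on the Grassmannian: average the projection matrices and keep the dominant eigenspace. The sketch below is offered only as a hypothetical illustration of the blind estimator's flavor, not the paper's exact construction.

```python
import numpy as np

def blind_pc_estimate(local_pcs, r):
    """Rotation-invariant centroid of received local PC subspaces: average the
    projection matrices and keep the dominant r-dimensional eigenspace."""
    d = local_pcs[0].shape[0]
    P = np.zeros((d, d), dtype=complex)
    for U in local_pcs:
        Q, _ = np.linalg.qr(U)            # re-orthonormalize the noisy received PCs
        P += Q @ Q.conj().T               # depends only on the column span of U
    _, eigvecs = np.linalg.eigh(P)        # Hermitian eigendecomposition, ascending
    return eigvecs[:, -r:]                # top-r eigenvectors span the centroid
```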
We define a notion called a leftmost separator of size at most $k$: a minimal separator $S$ that separates two given sets of vertices $X$ and $Y$ and that cannot be "moved further towards $X$" while keeping $|S|$ below the threshold $k$. One motivation is that leftmost separators can be used to improve the time complexity of treewidth approximation. Treewidth approximation is known to admit an FPT algorithm that is linear in the input size and only single exponential in the parameter, the treewidth, and it is not known whether this can be improved theoretically. However, the coefficient of the parameter $k$ (the treewidth) in the exponent is large. Hence, our goal is to decrease the coefficient of $k$ in the exponent in order to obtain a more practical algorithm; to do so, we trade a linear-time algorithm for an $\mathcal{O}(n \log n)$-time algorithm. The previously known $\mathcal{O}(f(k)\, n \log n)$-time algorithms have dependences on $k$ of $2^{24k}k!$, $2^{8.766k}k^2$ (a sharper analysis shows that it is $2^{7.671k}k^2$), and higher. In this paper, we present an algorithm for treewidth approximation which runs in time $\mathcal{O}(2^{6.755k}\, n \log n)$. Furthermore, we count the number of leftmost separators and give a tight upper bound: the number of leftmost separators of size at most $k$ is at most $C_{k-1}$, the $(k-1)$-st Catalan number. We then present an algorithm which outputs all leftmost separators in time $\mathcal{O}(\frac{4^k}{\sqrt{k}}\, n)$.
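The Catalan bound and the enumeration time are consistent: $C_{k-1}$ grows like $4^{k}/k^{3/2}$ up to constants, so the stated $\mathcal{O}(\frac{4^k}{\sqrt{k}}\, n)$ running time corresponds to roughly $O(nk)$ work per enumerated separator. A small Python check of the Catalan asymptotics:

```python
from math import comb, pi, sqrt

def catalan(n):
    """n-th Catalan number C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# C_{k-1} bounds the number of leftmost separators of size <= k; its growth is
# Theta(4^k / k^{3/2}), consistent with the O(4^k / sqrt(k) * n) enumeration time.
for k in (5, 10, 20):
    print(k, catalan(k - 1), round(4 ** (k - 1) / ((k - 1) ** 1.5 * sqrt(pi))))
```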
We study the $c$-approximate near neighbor problem under the continuous Fr\'echet distance: Given a set of $n$ polygonal curves with $m$ vertices, a radius $\delta > 0$, and a parameter $k \leq m$, we want to preprocess the curves into a data structure that, given a query curve $q$ with $k$ vertices, either returns an input curve with Fr\'echet distance at most $c\cdot \delta$ to $q$, or returns that there exists no input curve with Fr\'echet distance at most $\delta$ to $q$. We focus on the case where the input and the queries are one-dimensional polygonal curves -- also called time series -- and we give a comprehensive analysis for this case. We obtain new upper bounds that provide different tradeoffs between approximation factor, preprocessing time, and query time. Our data structures improve upon the state of the art in several ways. We show that for any $0 < \varepsilon \leq 1$ an approximation factor of $(1+\varepsilon)$ can be achieved within the same asymptotic time bounds as the previously best result for $(2+\varepsilon)$. Moreover, we show that an approximation factor of $(2+\varepsilon)$ can be obtained by using preprocessing time and space $O(nm)$, which is linear in the input size, and query time in $O(\frac{1}{\varepsilon})^{k+2}$, where the previously best result used preprocessing time in $n \cdot O(\frac{m}{\varepsilon k})^k$ and query time in $O(1)^k$. We complement our upper bounds with matching conditional lower bounds based on the Orthogonal Vectors Hypothesis. Interestingly, some of our lower bounds already hold for any super-constant value of $k$. This is achieved by proving hardness of a one-sided sparse version of the Orthogonal Vectors problem as an intermediate problem, which we believe to be of independent interest.
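The distance being queried can be made concrete on a toy example. The paper treats the continuous Fréchet distance on one-dimensional curves; the discrete variant below (a standard dynamic program over pairs of vertex indices) is only meant to illustrate the kind of alignment the distance measures, not the data structures themselves.

```python
from functools import lru_cache

def discrete_frechet(p, q):
    """Discrete Frechet distance between two one-dimensional curves (time
    series) given as sequences of vertex values, via the standard dynamic
    program over pairs of vertex indices."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = abs(p[i] - q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
    return c(len(p) - 1, len(q) - 1)
```

For example, `discrete_frechet([0, 1, 0], [0, 1, 1, 0])` returns `0`, since the second curve merely repeats a vertex of the first.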
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition number.
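For reference, the accelerated method named above can be sketched as follows in the strongly convex and smooth case (constant-momentum form); in the distributed setting it would be applied to the dual of the consensus-constrained problem, with each dual gradient evaluation realized through local computation and neighbor communication. The function names and constants here are illustrative.

```python
import numpy as np

def nesterov(grad, x0, L, mu, n_iter=100):
    """Nesterov's accelerated gradient method for an L-smooth, mu-strongly
    convex function, in its constant-momentum form."""
    momentum = (np.sqrt(L / mu) - 1) / (np.sqrt(L / mu) + 1)
    x, y = x0.copy(), x0.copy()
    for _ in range(n_iter):
        x_new = y - grad(y) / L               # gradient step from the extrapolated point
        y = x_new + momentum * (x_new - x)    # extrapolation (momentum) step
        x = x_new
    return x
```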