Approximate K nearest neighbor (AKNN) search is a fundamental and challenging problem. We observe that in high-dimensional space, the time consumption of nearly all AKNN algorithms is dominated by that of the distance comparison operations (DCOs). For each operation, it scans full dimensions of an object and thus, runs in linear time wrt the dimensionality. To speed it up, we propose a randomized algorithm named ADSampling which runs in logarithmic time wrt to the dimensionality for the majority of DCOs and succeeds with high probability. In addition, based on ADSampling we develop one general and two algorithm-specific techniques as plugins to enhance existing AKNN algorithms. Both theoretical and empirical studies confirm that: (1) our techniques introduce nearly no accuracy loss and (2) they consistently improve the efficiency.
We study the problem of overcoming exponential sample complexity in differential entropy estimation under Gaussian convolutions. Specifically, we consider the estimation of the differential entropy $h(X+Z)$ via $n$ independently and identically distributed samples of $X$, where $X$ and $Z$ are independent $D$-dimensional random variables with $X$ subgaussian with bounded second moment and $Z\sim\mathcal{N}(0,\sigma^2I_D)$. Under the absolute-error loss, the above problem has a parametric estimation rate of $\frac{c^D}{\sqrt{n}}$, which is exponential in data dimension $D$ and often problematic for applications. We overcome this exponential sample complexity by projecting $X$ to a low-dimensional space via principal component analysis (PCA) before the entropy estimation, and show that the asymptotic error overhead vanishes as the unexplained variance of the PCA vanishes. This implies near-optimal performance for inherently low-dimensional structures embedded in high-dimensional spaces, including hidden-layer outputs of deep neural networks (DNN), which can be used to estimate mutual information (MI) in DNNs. We provide numerical results verifying the performance of our PCA approach on Gaussian and spiral data. We also apply our method to analysis of information flow through neural network layers (c.f. information bottleneck), with results measuring mutual information in a noisy fully connected network and a noisy convolutional neural network (CNN) for MNIST classification.
Convex optimization with equality and inequality constraints is a ubiquitous problem in several optimization and control problems in large-scale systems. Recently there has been a lot of interest in establishing accelerated convergence of the loss function. A class of high-order tuners was recently proposed in an effort to lead to accelerated convergence for the case when no constraints are present. In this paper, we propose a new high-order tuner that can accommodate the presence of equality constraints. In order to accommodate the underlying box constraints, time-varying gains are introduced in the high-order tuner which leverage convexity and ensure anytime feasibility of the constraints. Numerical examples are provided to support the theoretical derivations.
In this paper, we focus our attention on the high-dimensional double sparse linear regression, that is, a combination of element-wise and group-wise sparsity.To address this problem, we propose an IHT-style (iterative hard thresholding) procedure that dynamically updates the threshold at each step. We establish the matching upper and lower bounds for parameter estimation, showing the optimality of our proposal in the minimax sense. Coupled with a novel sparse group information criterion, we develop a fully adaptive procedure to handle unknown group sparsity and noise levels.We show that our adaptive procedure achieves optimal statistical accuracy with fast convergence. Finally, we demonstrate the superiority of our method by comparing it with several state-of-the-art algorithms on both synthetic and real-world datasets.
Treatment effect estimation under unconfoundedness is a fundamental task in causal inference. In response to the challenge of analyzing high-dimensional datasets collected in substantive fields such as epidemiology, genetics, economics, and social sciences, many methods for treatment effect estimation with high-dimensional nuisance parameters (the outcome regression and the propensity score) have been developed in recent years. However, it is still unclear what is the necessary and sufficient sparsity condition on the nuisance parameters for the treatment effect to be $\sqrt{n}$-estimable. In this paper, we propose a new Double-Calibration strategy that corrects the estimation bias of the nuisance parameter estimates computed by regularized high-dimensional techniques and demonstrate that the corresponding Doubly-Calibrated estimator achieves $1 / \sqrt{n}$-rate as long as one of the nuisance parameters is sparse with sparsity below $\sqrt{n} / \log p$, where $p$ denotes the ambient dimension of the covariates, whereas the other nuisance parameter can be arbitrarily complex and completely misspecified. The Double-Calibration strategy can also be applied to settings other than treatment effect estimation, e.g. regression coefficient estimation in the presence of diverging number of controls in a semiparametric partially linear model.
Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
Factor analysis provides a canonical framework for imposing lower-dimensional structure such as sparse covariance in high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance in reproducing studies across research groups. In such cases, it is natural to seek to learn the shared versus condition-specific structure. Existing hierarchical extensions of factor analysis have been proposed, but face practical issues including identifiability problems. To address these shortcomings, we propose a class of SUbspace Factor Analysis (SUFA) models, which characterize variation across groups at the level of a lower-dimensional subspace. We prove that the proposed class of SUFA models lead to identifiability of the shared versus group-specific components of the covariance, and study their posterior contraction properties. Taking a Bayesian approach, these contributions are developed alongside efficient posterior computation algorithms. Our sampler fully integrates out latent variables, is easily parallelizable and has complexity that does not depend on sample size. We illustrate the methods through application to integration of multiple gene expression datasets relevant to immunology.
In this paper, we investigate the impact of fading channel correlation on the performance of dual-hop decode-and-forward (DF) simultaneous wireless information and power transfer (SWIPT) relay networks. More specifically, by considering the power splitting-based relaying (PSR) protocol for the energy harvesting (EH) process, we quantify the effect of positive and negative dependency between the source-to-relay (SR) and relay-to-destination (RD) links on key performance metrics such as ergodic capacity and outage probability. To this end, we first represent general formulations for the cumulative distribution function (CDF) of the product of two arbitrary random variables, exploiting copula theory. This is used to derive the closed-form expressions of the ergodic capacity and outage probability in a SWIPT relay network under correlated Nakagami-m fading channels. Monte-Carlo (MC) simulation results are provided throughout to validate the correctness of the developed analytical results, showing that the system performance significantly improves under positive dependence in the SR-RD links, compared to the case of negative dependence and independent links. Results further demonstrate that the efficiency of the ergodic capacity and outage probability ameliorates as the fading severity reduces for the PSR protocol.
Answering complex queries on knowledge graphs is important but particularly challenging because of the data incompleteness. Query embedding methods address this issue by learning-based models and simulating logical reasoning with set operators. Previous works focus on specific forms of embeddings, but scoring functions between embeddings are underexplored. In contrast to existing scoring functions motivated by local comparison or global transport, this work investigates the local and global trade-off with unbalanced optimal transport theory. Specifically, we embed sets as bounded measures in $\real$ endowed with a scoring function motivated by the Wasserstein-Fisher-Rao metric. Such a design also facilitates closed-form set operators in the embedding space. Moreover, we introduce a convolution-based algorithm for linear time computation and a block-diagonal kernel to enforce the trade-off. Results show that WFRE can outperform existing query embedding methods on standard datasets, evaluation sets with combinatorially complex queries, and hierarchical knowledge graphs. Ablation study shows that finding a better local and global trade-off is essential for performance improvement.
Linear inverse problems arise in diverse engineering fields especially in signal and image reconstruction. The development of computational methods for linear inverse problems with sparsity tool is one of the recent trends in this area. The so-called optimal $k$-thresholding is a newly introduced method for sparse optimization and linear inverse problems. Compared to other sparsity-aware algorithms, the advantage of optimal $k$-thresholding method lies in that it performs thresholding and error metric reduction simultaneously and thus works stably and robustly for solving medium-sized linear inverse problems. However, the runtime of this method remains high when the problem size is relatively large. The purpose of this paper is to propose an acceleration strategy for this method. Specifically, we propose a heavy-ball-based optimal $k$-thresholding (HBOT) algorithm and its relaxed variants for sparse linear inverse problems. The convergence of these algorithms is shown under the restricted isometry property. In addition, the numerical performance of the heavy-ball-based relaxed optimal $k$-thresholding pursuit (HBROTP) has been studied, and simulations indicate that HBROTP admits robust capability for signal and image reconstruction even in noisy environments.
Knowledge graphs capture structured information and relations between a set of entities or items. As such they represent an attractive source of information that could help improve recommender systems. However existing approaches in this domain rely on manual feature engineering and do not allow for end-to-end training. Here we propose knowledge-aware graph neural networks with label smoothness regularization to provide better recommendations. Conceptually, our approach computes user-specific item embeddings by first applying a trainable function that identifies important knowledge graph relationships for a given user. This way we transform the knowledge graph into a user-specific weighted graph and then applies a graph neural network to compute personalized item embeddings. To provide better inductive bias, we use label smoothness, which assumes that adjacent items in the knowledge graph are likely to have similar user relevance labels/scores. Label smoothness provides regularization over edge weights and we prove that it is equivalent to a label propagation scheme on a graph. Finally, we combine knowledge-aware graph neural networks and label smoothness and present the unified model. Experiment results show that our method outperforms strong baselines in four datasets. It also achieves strong performance in the scenario where user-item interactions are sparse.