When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This leads to very large quantities of missing data which, especially when combined with high dimensionality, make the application of conditional imputation methods computationally infeasible. We introduce a new meta-learning imputation method based on stacked penalized logistic regression (StaPLR), which performs imputation in a dimension-reduced space. We evaluate the new imputation method with several imputation algorithms using simulations. The results show that meta-level imputation of missing values leads to good results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.
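The idea of imputing in a dimension-reduced space can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that each view has already been summarized by a single view-level prediction per observation, so that a missing view appears as one missing entry in an n-by-V matrix rather than as many missing features. The function name `impute_meta_level` is hypothetical, and simple mean imputation stands in for whichever imputation algorithm one would actually use.

```python
def impute_meta_level(Z):
    """Mean-impute missing entries (None) in an n-by-V matrix of view-level predictions.

    Assumption: every column has at least one observed value.
    Mean imputation is only a placeholder for a more advanced algorithm.
    """
    n_views = len(Z[0])
    col_means = []
    for v in range(n_views):
        observed = [row[v] for row in Z if row[v] is not None]
        col_means.append(sum(observed) / len(observed))
    # Replace each missing entry with the mean of its column.
    return [
        [col_means[v] if row[v] is None else row[v] for v in range(n_views)]
        for row in Z
    ]


# Three observations, two views; None marks a view that was entirely missing.
Z = [[0.2, None], [0.4, 0.6], [None, 0.8]]
Z_imputed = impute_meta_level(Z)
```

The point of the sketch is the size of the problem: the imputation step works on V columns, however many features each view contains.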
Function approximation is widely used in reinforcement learning to handle the computational difficulties associated with very large state spaces. However, function approximation introduces errors which may lead to instabilities when using approximate dynamic programming techniques to obtain the optimal policy. Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation. We quantitatively characterize, for the first time, the impact of lookahead and m-step rollout on the performance of approximate dynamic programming (DP) with function approximation: (i) without a sufficient combination of lookahead and m-step rollout, approximate DP may not converge, (ii) both lookahead and m-step rollout improve the convergence rate of approximate DP, and (iii) lookahead helps mitigate the effect of function approximation and the discount factor on the asymptotic performance of the algorithm. Our results are presented for two approximate DP methods: one which uses least-squares regression to perform function approximation and another which performs several steps of gradient descent of the least-squares objective in each iteration.
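The two ingredients named above can be sketched for a small tabular MDP. This is a minimal illustration under textbook definitions, not the paper's algorithms, which additionally involve function approximation. The data layout `P[a][s][t]` (transition probabilities), `R[s][a]` (rewards), and all function names are hypothetical.

```python
def q_value(V, P, R, gamma, s, a):
    """One-step backup of V for state s and action a."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[a][s], V))


def bellman_optimal(V, P, R, gamma):
    """Apply the Bellman optimality operator T once to V."""
    return [
        max(q_value(V, P, R, gamma, s, a) for a in range(len(P)))
        for s in range(len(V))
    ]


def lookahead_policy(V, P, R, gamma, H):
    """H-step lookahead (H >= 1): act greedily with respect to T^(H-1) V."""
    for _ in range(H - 1):
        V = bellman_optimal(V, P, R, gamma)
    return [
        max(range(len(P)), key=lambda a: q_value(V, P, R, gamma, s, a))
        for s in range(len(V))
    ]


def rollout(V, policy, P, R, gamma, m):
    """m-step rollout: apply the Bellman operator of `policy` m times to V."""
    for _ in range(m):
        V = [q_value(V, P, R, gamma, s, policy[s]) for s in range(len(V))]
    return V


# Two states, two actions: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 0.0]]  # reward only for staying in state 1
greedy = lookahead_policy([0.0, 0.0], P, R, 0.9, 1)
deeper = lookahead_policy([0.0, 0.0], P, R, 0.9, 2)
```

In this toy example, starting from an all-zero value estimate, one-step lookahead has no reason to leave state 0, whereas two-step lookahead sees the reward available in state 1.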
The Gaussian graphical model is routinely employed to model the joint distribution of multiple random variables. The graph it induces is not only useful for describing the relationship between random variables but also critical for improving statistical estimation precision. In high-dimensional data analysis, despite an abundant literature on estimating this graph structure, tests for the adequacy of its specification at a global level are severely underdeveloped. To make progress, this paper proposes a novel goodness-of-fit test that is computationally easy and theoretically tractable. Under the null hypothesis, we show that the proposed test statistic asymptotically follows a Gumbel distribution. Interestingly, the location parameter of this limiting Gumbel distribution depends on the dependence structure under the null. We further develop a novel consistency-empowered test statistic when the true structure is nested in the postulated structure, by amplifying the noise incurred in estimation. Extensive simulation illustrates that the proposed test procedure has the right size under the null, and is powerful under the alternative. As an application, we apply the test to the analysis of a COVID-19 data set, demonstrating that our test can serve as a valuable tool in choosing a graph structure to improve estimation efficiency.
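To illustrate how a Gumbel limit is used in practice, the sketch below turns an observed statistic into an approximate p-value using the standard Gumbel distribution function. The function names are hypothetical, and `loc` and `scale` are placeholders: the abstract says only that the location parameter depends on the null dependence structure, and its actual form is not reproduced here.

```python
import math


def gumbel_cdf(x, loc=0.0, scale=1.0):
    """CDF of the Gumbel distribution: exp(-exp(-(x - loc) / scale))."""
    z = (x - loc) / scale
    if z < -700.0:
        # exp(-z) would overflow; the CDF is numerically zero here.
        return 0.0
    return math.exp(-math.exp(-z))


def gumbel_p_value(stat, loc=0.0, scale=1.0):
    """Upper-tail p-value: probability of a value at least as large as `stat`."""
    return 1.0 - gumbel_cdf(stat, loc, scale)


# Reject at level alpha when the p-value falls below alpha.
reject = gumbel_p_value(5.0) < 0.05
```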
The rise of variational autoencoders for image and video compression has opened the door to many elaborate coding techniques. One example is the possibility of conditional interframe coding. Here, instead of transmitting the residual between the original frame and the predicted frame (often obtained by motion compensation), the current frame is transmitted under the condition of knowing the prediction signal. In practice, conditional coding can be straightforwardly implemented using a conditional autoencoder, which has also shown good results in recent works. In this paper, we provide an information-theoretic analysis of conditional coding for inter frames and show in which cases gains compared to traditional residual coding can be expected. We also show the effect of information bottlenecks which can occur in practical video coders in the prediction signal path due to the network structure, as a consequence of the data-processing theorem or due to quantization. We demonstrate that conditional coding has theoretical benefits over residual coding but that there are cases in which the benefits are quickly canceled by small information bottlenecks of the prediction signal.
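The basic entropy argument behind this comparison can be sketched as follows, assuming a discrete current frame $x$ and prediction signal $\hat{x}$ (our notation, not taken from the paper) and ideal lossless coding:

$$H(x \mid \hat{x}) \;=\; H(x - \hat{x} \mid \hat{x}) \;\le\; H(x - \hat{x}).$$

The equality holds because, once $\hat{x}$ is known, $x$ and $x - \hat{x}$ determine each other. The inequality holds because conditioning never increases entropy. Under these assumptions an ideal conditional coder needs no more bits than an ideal residual coder, with equality exactly when the residual $x - \hat{x}$ is independent of the prediction $\hat{x}$.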
We initiate the study of coresets for clustering in graph metrics, i.e., the shortest-path metric of edge-weighted graphs. Such clustering problems are essential to data analysis and are used, for example, in road networks and data visualization. A coreset is a compact summary of the data that approximately preserves the clustering objective for every possible center set, and it offers significant efficiency improvements in terms of running time, storage, and communication, including in streaming and distributed settings. Our main result is a near-linear time construction of a coreset for k-Median in a general graph $G$, with size $O_{\epsilon, k}(\mathrm{tw}(G))$ where $\mathrm{tw}(G)$ is the treewidth of $G$, and we complement the construction with a nearly-tight size lower bound. The construction is based on the framework of Feldman and Langberg [STOC 2011], and our main technical contribution, as required by this framework, is a uniform bound of $O(\mathrm{tw}(G))$ on the shattering dimension under any point weights. We validate our coreset on real-world road networks, and our scalable algorithm constructs tiny coresets with high accuracy, which translates to a massive speedup of existing approximation algorithms such as local search for graph k-Median.
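The coreset guarantee described above can be made concrete with a short sketch that evaluates the weighted k-Median objective under the shortest-path metric and measures how well a candidate summary preserves it for one center set. This does not construct a coreset. It assumes a connected undirected graph stored as an adjacency list `{node: [(neighbor, weight), ...]}` with non-negative weights, and all function names are hypothetical.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra's algorithm: shortest-path distance from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def k_median_cost(graph, weighted_points, centers):
    """Sum over points of weight times distance to the nearest center."""
    per_center = [shortest_paths(graph, c) for c in centers]
    return sum(
        w * min(d.get(p, float("inf")) for d in per_center)
        for p, w in weighted_points.items()
    )


def relative_error(graph, full, summary, centers):
    """Relative gap between the summary's cost and the full data's cost."""
    exact = k_median_cost(graph, full, centers)
    return abs(k_median_cost(graph, summary, centers) - exact) / exact


# Path graph 0 - 1 - 2 - 3 with unit edge weights.
graph = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)], 3: [(2, 1.0)]}
full = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
summary = {1: 2.0, 2: 2.0}  # an arbitrary weighted subset, not a real coreset
err = relative_error(graph, full, summary, [1])
```

A weighted subset is an $\epsilon$-coreset only if this relative error is at most $\epsilon$ for every center set of size k, which is why a summary that looks accurate for one choice of centers can still fail for another.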
Wasserstein distributionally robust optimization (DRO) has found success in operations research and machine learning applications as a powerful means to obtain solutions with favourable out-of-sample performance. Two compelling explanations for the success are the generalization bounds derived from Wasserstein DRO and the equivalency between Wasserstein DRO and the regularization scheme commonly applied in machine learning. Existing results on generalization bounds and the equivalency to regularization are largely limited to the setting where the Wasserstein ball is of a certain type and the decision criterion takes certain forms of an expected function. In this paper, we show that by focusing on Wasserstein DRO problems with affine decision rules, it is possible to obtain generalization bounds and the equivalency to regularization in a significantly broader setting where the Wasserstein ball can be of a general type and the decision criterion can be a general measure of risk, i.e., nonlinear in distributions. This allows for accommodating many important classification, regression, and risk minimization applications that have not been addressed to date using Wasserstein DRO. Our results are strong in that the generalization bounds do not suffer from the curse of dimensionality and the equivalency to regularization is exact. As a byproduct, our regularization results broaden considerably the class of Wasserstein DRO models that can be solved efficiently via regularization formulations.
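For orientation, a commonly cited special case of this equivalence can be written as follows. The notation is ours and the conditions are assumptions made for illustration, not statements from the paper: a type-1 Wasserstein ball of radius $\varepsilon$ around the empirical distribution $\hat{P}_N$, a transport cost given by a norm $\|\cdot\|$ on $\mathbb{R}^d$, unconstrained support, and a convex loss $\ell$ with Lipschitz constant $L$ applied to a linear decision rule $\beta^\top \xi$:

$$\sup_{Q:\, W_1(Q, \hat{P}_N) \le \varepsilon} \mathbb{E}_{Q}\big[\ell(\beta^\top \xi)\big] \;=\; \mathbb{E}_{\hat{P}_N}\big[\ell(\beta^\top \xi)\big] \;+\; \varepsilon\, L\, \|\beta\|_{*},$$

where $\|\cdot\|_{*}$ denotes the dual norm. The worst-case expected loss is thus the empirical loss plus a norm penalty on $\beta$, which is the sense in which the DRO problem is equivalent to a regularized one.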
Evaluating a building's wireless performance during the building design process is a new paradigm for wireless communications and building design. This paper proposes the earliest building wireless performance (BWP) evaluation theory focusing on the channel delay spread (DS). The novel contributions of this paper lie in the following aspects. 1) We define a new metric called DS gain, which is the first metric for evaluating the channel DS performance of a building under design. 2) We propose an analytical model to compute the metric quickly and accurately. 3) The proposed scheme is validated via Monte-Carlo simulations under typical indoor scenarios. Numerical results reveal that building design has a clear impact on the root mean square (RMS) DS. In the future, architects will need to take a building's DS gain carefully into account during design. Otherwise, indoor networks in the building will suffer from severe inter-symbol interference (ISI) due to an overly large DS gain.
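The RMS delay spread referred to above has a standard definition as the power-weighted standard deviation of the path delays, sketched below. The sketch does not compute the paper's DS gain metric, whose definition is not given in the abstract. The function name is hypothetical, and powers are assumed to be on a linear scale (not dB).

```python
import math


def rms_delay_spread(delays, powers):
    """RMS delay spread of a power delay profile.

    delays: path delays (any consistent time unit)
    powers: matching path powers on a linear scale
    """
    total = sum(powers)
    mean_delay = sum(p * t for p, t in zip(powers, delays)) / total
    second_moment = sum(p * t * t for p, t in zip(powers, delays)) / total
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(second_moment - mean_delay ** 2, 0.0))


# Two equal-power paths 100 ns apart.
ds_ns = rms_delay_spread([0.0, 100.0], [1.0, 1.0])
```

A single-path channel has zero RMS delay spread, and strong late-arriving reflections increase it, which is how the building layout enters the picture.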
Particle dynamics and multi-agent systems provide accurate dynamical models for studying and forecasting the behavior of complex interacting systems. They often take the form of a high-dimensional system of differential equations parameterized by an interaction kernel that models the underlying attractive or repulsive forces between agents. We consider the problem of constructing a data-based approximation of the interacting forces directly from noisy observations of the paths of the agents in time. The learned interaction kernels are then used to predict the agents' behavior over a longer time interval. The approximation developed in this work uses a randomized feature algorithm and a sparse randomized feature approach. Sparsity-promoting regression provides a mechanism for pruning the randomly generated features, which was observed to be beneficial when one has limited data, in particular, leading to less overfitting than other approaches. In addition, imposing sparsity reduces the kernel evaluation cost, which significantly lowers the simulation cost for forecasting the multi-agent systems. Our method is applied to various examples, including first-order systems with homogeneous and heterogeneous interactions, second-order homogeneous systems, and a new sheep swarming system.
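As a rough sketch of the setting, a first-order system of $N$ homogeneous agents is often written as below, and a random feature model replaces the unknown kernel with a linear combination of randomly drawn features. The symbols $x_i$, $\phi$, $c_k$, $\omega_k$, $b_k$, $\lambda$, the cosine features, and the $\ell_1$ penalty are illustrative assumptions and are not taken from the paper:

$$\dot{x}_i(t) = \frac{1}{N} \sum_{j=1}^{N} \phi\big(\|x_j(t) - x_i(t)\|\big)\,\big(x_j(t) - x_i(t)\big), \qquad \phi(r) \approx \sum_{k=1}^{K} c_k \cos(\omega_k r + b_k).$$

With the random parameters $\omega_k$ and $b_k$ held fixed, the right-hand side is linear in the coefficients $c = (c_1, \dots, c_K)$, so they can be fitted to observed (or numerically estimated) velocities at times $t_m$ by a sparsity-promoting least-squares problem such as

$$\min_{c \in \mathbb{R}^K} \; \sum_{m} \sum_{i=1}^{N} \Big\| \dot{x}_i(t_m) - \frac{1}{N} \sum_{j=1}^{N} \sum_{k=1}^{K} c_k \cos\big(\omega_k r_{ij}(t_m) + b_k\big)\,\big(x_j(t_m) - x_i(t_m)\big) \Big\|^2 + \lambda \|c\|_1, \qquad r_{ij} = \|x_j - x_i\|.$$

Coefficients driven to zero correspond to pruned features, which is also why the learned kernel becomes cheaper to evaluate during forecasting.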
Imputing missing potential outcomes using an estimated regression function is a natural idea for estimating causal effects. In the literature, estimators that combine imputation and regression adjustments are believed to be comparable to augmented inverse probability weighting. Accordingly, it has long been conjectured that such estimators, while avoiding directly constructing the weights, are also doubly robust (Imbens, 2004; Stuart, 2010). Generalizing an earlier result of the authors (Lin et al., 2021), this paper formalizes this conjecture, showing that a large class of regression-adjusted imputation methods are indeed doubly robust for estimating the average treatment effect. In addition, they are provably semiparametrically efficient as long as both the density and regression models are correctly specified. Notable examples of imputation methods covered by our theory include kernel matching, (weighted) nearest neighbor matching, local linear matching, and (honest) random forests.
The term ``neuromorphic'' refers to systems that closely resemble the architecture and/or the dynamics of biological neural networks. Typical examples are novel computer chips designed to mimic the architecture of a biological brain, or sensors that take inspiration from, e.g., the visual or olfactory systems in insects and mammals to acquire information about the environment. This approach is not without ambition as it promises to enable engineered devices able to reproduce the level of performance observed in biological organisms -- the main immediate advantage being the efficient use of scarce resources, which translates into low power requirements. The emphasis on low power and energy efficiency of neuromorphic devices is a perfect match for space applications. Spacecraft -- especially miniaturized ones -- have strict energy constraints as they need to operate in an environment which is scarce in resources and extremely hostile. In this work we present an overview of early attempts made to study a neuromorphic approach in a space context at the European Space Agency's (ESA) Advanced Concepts Team (ACT).
This paper makes three contributions. First, it generalizes the Lindeberg\textendash Feller and Lyapunov Central Limit Theorems to Hilbert spaces by way of $L^2$. Second, it generalizes these results to spaces in which sample failure and missingness can occur. Finally, it shows that satisfaction of the Lindeberg\textendash Feller condition in such spaces guarantees the consistency of all inferences from the partial functional data with respect to the completely observed data. These latter two results are especially important given the increasing attention to statistical inference with partially observed functional data. This paper goes beyond previous research by providing simple boundedness conditions which guarantee that \textit{all} inferences, as opposed to some proper subset of them, will be consistently estimated. This is shown primarily by aggregating conditional expectations with respect to the space of missingness patterns. This paper appears to be the first to apply this technique.
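For reference, the classical real-valued conditions being generalized can be stated as follows, for independent, mean-zero random variables $X_1, \dots, X_n$ with finite variances $\sigma_k^2$ and $s_n^2 = \sum_{k=1}^{n} \sigma_k^2$ (notation ours). The Lindeberg condition requires that, for every $\varepsilon > 0$,

$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n} \mathbb{E}\big[ X_k^2 \, \mathbf{1}\{ |X_k| > \varepsilon s_n \} \big] = 0,$$

while the Lyapunov condition requires that, for some $\delta > 0$,

$$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} \mathbb{E}\,|X_k|^{2+\delta} = 0.$$

The Lyapunov condition implies the Lindeberg condition, and the Lindeberg condition implies that $\frac{1}{s_n} \sum_{k=1}^{n} X_k$ converges in distribution to a standard normal random variable.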