Estimating the maximum mean finds a variety of applications in practice. In this paper, we study estimation of the maximum mean using an upper confidence bound (UCB) approach in which the sampling budget is adaptively allocated across the candidate systems. We study in depth the existing grand average (GA) estimator and propose a new largest-size average (LSA) estimator. Specifically, we establish statistical guarantees, including strong consistency, asymptotic mean squared errors, and central limit theorems (CLTs), for both estimators, which are new to the literature. We show that LSA is preferable to GA, as the bias of the former decays much faster than that of the latter as the sample size increases. Using the CLTs, we further construct asymptotically valid confidence intervals for the maximum mean and propose a single hypothesis test for a multiple comparison problem, with application to clinical trials. The statistical efficiency of the resulting point and interval estimates and of the proposed single hypothesis test is demonstrated via numerical examples.
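The setup above can be illustrated with a minimal sketch. Reading "grand average" as the average of all samples drawn under the UCB allocation and "largest-size average" as the average of the samples from the most-sampled system (an interpretation of the abstract's names, not the paper's exact definitions), with Gaussian rewards and a UCB1-style index:

```python
import math
import random

def ucb_max_mean(means, budget, seed=0):
    """UCB budget allocation with two max-mean estimates: the grand
    average (GA) over all samples, and the largest-size average (LSA)
    over the samples of the most-sampled system."""
    rng = random.Random(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for i in range(k):                      # sample each system once
        counts[i] += 1
        sums[i] += rng.gauss(means[i], 1.0)
    for t in range(k, budget):              # UCB1-style allocation
        i = max(range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2 * math.log(t + 1) / counts[j]))
        counts[i] += 1
        sums[i] += rng.gauss(means[i], 1.0)
    ga = sum(sums) / budget                          # grand average
    j = max(range(k), key=lambda i: counts[i])
    lsa = sums[j] / counts[j]                        # largest-size average
    return ga, lsa
```

GA mixes in samples from suboptimal systems taken during exploration, which is the source of its slowly decaying bias; LSA discards them, keeping only the system the allocation settled on.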
In this paper, we introduce the Maximum Distance Sublattice Problem (MDSP). We observe that solving an instance of the Closest Vector Problem (CVP) in a lattice $\mathcal{L}$ is equivalent to solving an instance of MDSP in the dual lattice of $\mathcal{L}$. We also give an alternative reduction between CVP and MDSP that does not use the concept of the dual lattice.
In this paper, we address the optimal sampling of a Wiener process under sampling and transmission costs, with the samples forwarded to a remote estimator over a channel with random IID delay. The goal of the estimator is to reconstruct an estimate of the real-time signal value from causally received samples. Our study focuses on the optimal online strategy for both sampling and transmission, aiming to minimize the mean square estimation error. We establish that the optimal strategy involves threshold policies for both sampling and transmission, and we derive the optimal thresholds. Methodologically, we use Lagrange relaxation and backward induction, reducing the problem to one of minimizing the estimation error under the assumption that sampling and transmission times are independent of the observed Wiener process. Our comparative analysis demonstrates that the estimation error achieved by the optimal joint sampling and transmission policy is significantly lower than that of age-optimal sampling, zero-wait sampling, periodic sampling, and policies that optimize only the sampling times.
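A threshold sampling policy of the kind described above can be sketched in a few lines. The simulation below is a simplification, not the paper's model: it assumes zero delay and no transmission stage (a sample is delivered the instant the error hits the threshold), and the function name and parameters are illustrative.

```python
import random

def simulate_threshold_sampling(beta, horizon=200_000, dt=0.01, seed=1):
    """Simulate a signal-aware threshold policy on a discretized
    Wiener process: take (and, in this zero-delay sketch, instantly
    deliver) a new sample whenever the estimation error
    |W_t - last delivered sample| reaches beta."""
    rng = random.Random(seed)
    w = est = 0.0
    sq_err = 0.0
    n_samples = 0
    for _ in range(horizon):
        w += rng.gauss(0.0, dt ** 0.5)      # Wiener increment
        if abs(w - est) >= beta:            # threshold rule
            est = w
            n_samples += 1
        sq_err += (w - est) ** 2
    return sq_err / horizon, n_samples
```

Because sampling reacts to the realized error rather than to elapsed time, the error stays confined near $[-\beta, \beta]$, which is the intuition behind threshold policies beating periodic and zero-wait sampling at a comparable sampling rate.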
The Artificial Intelligence field is flooded with optimisation methods. In this paper, we change the focus to developing modelling methods, with the aim of getting us closer to Artificial General Intelligence. To do so, we propose a novel way to interpret reality as an information source, which is then translated into a computational framework able to capture and represent such information. This framework is able to build elements of classical cognitive architectures, like Long Term Memory and Working Memory, starting from a simple primitive that only processes Spatial Distributed Representations. Moreover, it achieves this level of verticality in a seamlessly scalable, hierarchical way.
In this paper, we observe an interesting phenomenon for a hybridizable discontinuous Galerkin (HDG) method for eigenvalue problems. Specifically, using the same finite element method, we may achieve both upper and lower eigenvalue bounds simultaneously, simply by fine-tuning the stabilization parameter. Based on this observation, a high-accuracy algorithm for computing eigenvalues is designed to yield a higher convergence rate at a lower computational cost. Meanwhile, we demonstrate that certain types of HDG methods can only provide upper bounds. As a by-product, the asymptotic upper bound property of the Brezzi-Douglas-Marini mixed finite element is also established. Numerical results supporting our theory are given.
In this work, we establish theoretical and practical connections between vertex indexing for sparse graph/network compression and matrix ordering for sparse matrix-vector multiplication and variable elimination. We present a fundamental analysis of adjacency access locality in vertex ordering from the perspective of graph composition of, or decomposition into, elementary compact graphs. We introduce an algebraic indexing approach that maintains the advantageous features of existing methods, mitigates their shortcomings, and adapts to the degree distribution. The new method demonstrates superior and versatile performance in graph compression across diverse types of graphs. It also yields a proportional improvement in the efficiency of matrix-vector multiplications for subspace iterations in response to random walk queries on a large network.
In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split $j$ that divides the data into two sets such that the mean squared error (for regression) or the misclassification rate or Gini impurity (for classification) is minimized. We provide several fast streaming algorithms that use sublinear space and a small number of passes for these problems. These algorithms can also be extended to the massively parallel computation model. Our work, while not directly comparable, complements the seminal work of Domingos-Hulten (KDD 2000) and Hulten-Spencer-Domingos (KDD 2001).
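To make the split objective concrete, here is a minimal one-pass sketch for the Gini-impurity case (not the paper's algorithm): it assumes binary labels and values pre-bucketed into a small number of bins, so a single pass builds a histogram and a subsequent scan over prefix counts finds the best split.

```python
def gini(p, n):
    """Gini impurity of a set with p positive and n negative labels."""
    t = p + n
    return 0.0 if t == 0 else 1.0 - (p / t) ** 2 - (n / t) ** 2

def best_gini_split(stream, n_bins):
    """One-pass sketch: histogram binary labels into n_bins value
    bins, then scan prefix counts for the split x <= j minimizing the
    weighted Gini impurity. Space is O(n_bins), independent of the
    stream length."""
    pos, neg = [0] * n_bins, [0] * n_bins
    for x, y in stream:                     # single pass over the stream
        (pos if y == 1 else neg)[x] += 1
    tp, tn = sum(pos), sum(neg)
    lp = ln = 0
    best_j, best_cost = None, float("inf")
    for j in range(n_bins - 1):             # candidate split: x <= j
        lp += pos[j]
        ln += neg[j]
        rp, rn = tp - lp, tn - ln
        cost = (lp + ln) * gini(lp, ln) + (rp + rn) * gini(rp, rn)
        if cost < best_cost:
            best_j, best_cost = j, cost
    return best_j, best_cost
```

The streaming difficulty the paper addresses is precisely what this sketch assumes away: when values are not pre-bucketed, maintaining enough information to locate the optimal split in sublinear space is nontrivial.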
In this paper, we introduce a low-cost and low-power tiny supervised on-device learning (ODL) core that can address the distributional shift of input data for human activity recognition. Although ODL for resource-limited edge devices has been studied recently, how exactly to provide the training labels to these devices at runtime remains an open issue. To address this problem, we propose to combine automatic data pruning with supervised ODL to reduce the number of queries needed to acquire predicted labels from a nearby teacher device, and thus save power consumption during model retraining. The data pruning threshold is tuned automatically, eliminating manual threshold tuning. As a tinyML solution at a few mW for human activity recognition, we design a supervised ODL core that supports our automatic data pruning using a 45nm CMOS process technology. We show that the required memory size for the core is smaller than that of a same-shaped multilayer perceptron (MLP) and that the power consumption is only 3.39mW. Experiments using a human activity recognition dataset show that the proposed automatic data pruning reduces the communication volume by 55.7%, and power consumption accordingly, with only a 0.9% accuracy loss.
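The query-reduction idea can be sketched in software. The abstract does not specify the automatic tuning rule, so the sketch below stands in a running-quantile heuristic: the threshold tracks a quantile of recent confidence scores, so roughly a target fraction of samples trigger a teacher query, with no hand-set threshold. All names and parameters are illustrative.

```python
from collections import deque

def teacher_queries(scores, target_rate=0.5, window=100, warmup=10):
    """For each confidence score, decide whether to query the teacher
    for a label: the pruning threshold is the running target_rate
    quantile of the last `window` scores, so roughly target_rate of
    the samples trigger a query, with no manual threshold tuning."""
    hist = deque(maxlen=window)
    decisions = []
    for s in scores:
        if len(hist) < warmup:
            thr = float("inf")              # warm-up: always query
        else:
            srt = sorted(hist)
            thr = srt[int(target_rate * (len(srt) - 1))]
        decisions.append(s <= thr)          # low confidence -> query
        hist.append(s)
    return decisions
```

Samples the model is already confident about are pruned, which is where the communication-volume and power savings come from.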
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
Model complexity is a fundamental problem in deep learning. In this paper, we present a systematic overview of the latest studies on model complexity in deep learning. Model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies on these two categories along four important factors: model framework, model size, optimization process, and data complexity. We also discuss the applications of deep learning model complexity, including understanding model generalization capability, model optimization, and model selection and design. We conclude by proposing several interesting future directions.
Textual entailment is a fundamental task in natural language processing. Most approaches for solving the problem use only the textual content present in training data. A few approaches have shown that information from external knowledge sources like knowledge graphs (KGs) can add value, in addition to the textual content, by providing background knowledge that may be critical for a task. However, the proposed models do not fully exploit the information in the usually large and noisy KGs, and it is not clear how it can be effectively encoded to be useful for entailment. We present an approach that complements text-based entailment models with information from KGs by (1) using Personalized PageRank to generate contextual subgraphs with reduced noise and (2) encoding these subgraphs using graph convolutional networks to capture KG structure. Our technique extends the capability of text models by exploiting structural and semantic information found in KGs. We evaluate our approach on multiple textual entailment datasets and show that the use of external knowledge helps improve prediction accuracy. This is particularly evident in the challenging BreakingNLI dataset, where we see an absolute improvement of 5-20% over multiple text-based entailment models.
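Step (1) above can be sketched directly: run Personalized PageRank restarted at the concept nodes mentioned in the text, then keep only the top-scoring nodes as the contextual subgraph. This is a generic power-iteration sketch (dangling nodes simply leak mass here), not the paper's implementation; the function names and parameters are illustrative.

```python
def personalized_pagerank(adj, seeds, alpha=0.15, iters=50):
    """Power-iteration Personalized PageRank on an adjacency dict;
    restart mass (probability alpha) is spread over the seed nodes,
    i.e. the KG concepts mentioned in the text pair."""
    nodes = list(adj)
    p = {v: (1 / len(seeds) if v in seeds else 0.0) for v in nodes}
    r = dict(p)
    for _ in range(iters):
        nxt = {v: alpha * p[v] for v in nodes}      # restart term
        for u in nodes:
            out = adj[u]
            if out:                                  # spread rank mass
                share = (1 - alpha) * r[u] / len(out)
                for v in out:
                    nxt[v] += share
        r = nxt
    return r

def contextual_subgraph(adj, seeds, k):
    """Keep the k highest-PPR nodes and the edges among them."""
    ppr = personalized_pagerank(adj, seeds)
    keep = set(sorted(ppr, key=ppr.get, reverse=True)[:k])
    return {u: [v for v in adj[u] if v in keep] for u in keep}
```

Because PPR scores decay with distance from the seeds, far-away (and therefore likely irrelevant) KG nodes are pruned, which is how the subgraphs end up with reduced noise before being fed to the graph convolutional encoder.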