Sequential change detection is a classical problem with a variety of applications. However, the majority of prior work has been parametric, for example, focusing on exponential families. We develop a fundamentally new and general framework for sequential change detection when the pre- and post-change distributions are nonparametrically specified (and thus composite). Our procedures come with clean, nonasymptotic bounds on the average run length (frequency of false alarms). In certain nonparametric cases (like sub-Gaussian or sub-exponential), we also provide near-optimal bounds on the detection delay following a changepoint. The primary technical tool that we introduce is called an \emph{e-detector}, which is composed of sums of e-processes -- a fundamental generalization of nonnegative supermartingales -- that are started at consecutive times. We first introduce simple Shiryaev-Roberts and CUSUM-style e-detectors, and then show how to design their mixtures in order to achieve both statistical and computational efficiency. Our e-detector framework can be instantiated to recover classical likelihood-based procedures for parametric problems, as well as yielding the first change detection method for many nonparametric problems. As a running example, we tackle the problem of detecting changes in the mean of a bounded random variable without i.i.d. assumptions, with an application to tracking the performance of a basketball team over multiple seasons.
Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing smooth nonconvex objectives, where each parameter block is constrained within a subset of a Riemannian manifold. We establish that this algorithm converges asymptotically to the set of stationary points, and attains an $\epsilon$-stationary point within $\widetilde{O}(\epsilon^{-2})$ iterations. In particular, the assumptions for our complexity results are completely Euclidean when the underlying manifold is a product of Euclidean or Stiefel manifolds, although our analysis makes explicit use of the Riemannian geometry. Our general analysis applies to a wide range of algorithms with Riemannian constraints: Riemannian MM, block projected gradient descent, optimistic likelihood estimation, geodesically constrained subspace tracking, robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate that our algorithm converges faster than standard Euclidean algorithms applied to the Riemannian setting.
We develop a machine learning approach to identifying parameters with steady-state solutions, locating such solutions, and determining their linear stability for systems of ordinary differential equations and dynamical systems with parameters. Our approach begins with the construction of target functions that can be used to identify parameters with steady-state solution and the linear stability of such solutions. We design a parameter-solution neural network (PSNN) that couples a parameter neural network and a solution neural network to approximate the target function, and develop efficient algorithms to train the PSNN and to locate steady-state solutions. We also present a theory of approximation of the target function by our PSNN based on the neural network kernel decomposition. Numerical results are reported to show that our approach is robust in identifying the phase boundaries separating different regions in the parameter space corresponding to no solution or different numbers of solutions and in classifying the stability of solutions. These numerical results also validate our analysis. Although the primary focus in this study centers on steady states of parameterized dynamical systems, our approach is applicable generally to finding solutions for parameterized nonlinear systems of algebraic equations. Some potential improvements and future work are discussed.
Spatial data can come in a variety of different forms, but two of the most common generating models for such observations are random fields and point processes. Whilst it is known that spectral analysis can unify these two different data forms, specific methodology for the related estimation is yet to be developed. In this paper, we solve this problem by extending multitaper estimation, to estimate the spectral density matrix function for multivariate spatial data, where processes can be any combination of either point processes or random fields. We discuss finite sample and asymptotic theory for the proposed estimators, as well as specific details on the implementation, including how to perform estimation on non-rectangular domains and the correct implementation of multitapering for processes sampled in different ways, e.g. continuously vs on a regular grid.
This work concerns the construction and characterization of product kernels for multivariate approximation from a finite set of discrete samples. To this end, we consider composing different component kernels, each acting on a low-dimensional Euclidean space. Due to Aronszajn (1950), the product of positive semi-definite kernel functions is again positive semi-definite, where, moreover, the corresponding native space is a particular instance of a tensor product, referred to as Hilbert tensor product. We first analyze the general problem of multivariate interpolation by product kernels. Then, we further investigate the tensor product structure, in particular for grid-like samples. We use this case to show that the product of strictly positive definite kernel functions is again strictly positive definite. Moreover, we develop an efficient computation scheme for the well-known Newton basis. Supporting numerical examples show the good performance of product kernels, especially for their flexibility.
Compressive learning is an emerging approach to drastically reduce the memory footprint of large-scale learning, by first summarizing a large dataset into a low-dimensional sketch vector, and then decoding from this sketch the latent information needed for learning. In light of recent progress on information preservation guarantees for sketches based on random features, a major objective is to design easy-to-tune algorithms (called decoders) to robustly and efficiently extract this information. To address the underlying non-convex optimization problems, various heuristics have been proposed. In the case of compressive clustering, the standard heuristic is CL-OMPR, a variant of sliding Frank-Wolfe. Yet, CL-OMPR is hard to tune, and the examination of its robustness was overlooked. In this work, we undertake a scrutinized examination of CL-OMPR to circumvent its limitations. In particular, we show how this algorithm can fail to recover the clusters even in advantageous scenarios. To gain insight, we show how the deficiencies of this algorithm can be attributed to optimization difficulties related to the structure of a correlation function appearing at core steps of the algorithm. To address these limitations, we propose an alternative decoder offering substantial improvements over CL-OMPR. Its design is notably inspired from the mean shift algorithm, a classic approach to detect the local maxima of kernel density estimators. The proposed algorithm can extract clustering information from a sketch of the MNIST dataset that is 10 times smaller than previously.
Text prompts are crucial for generalizing pre-trained open-set object detection models to new categories. However, current methods for text prompts are limited as they require manual feedback when generalizing to new categories, which restricts their ability to model complex scenes, often leading to incorrect detection results. To address this limitation, we propose a novel visual prompt method that learns new category knowledge from a few labeled images, which generalizes the pre-trained detection model to the new category. To allow visual prompts to represent new categories adequately, we propose a statistical-based prompt construction module that is not limited by predefined vocabulary lengths, thus allowing more vectors to be used when representing categories. We further utilize the category dictionaries in the pre-training dataset to design task-specific similarity dictionaries, which make visual prompts more discriminative. We evaluate the method on the ODinW dataset and show that it outperforms existing prompt learning methods and performs more consistently in combinatorial inference.
While question answering over knowledge bases (KBQA) has shown progress in addressing factoid questions, KBQA with numerical reasoning remains relatively unexplored. In this paper, we focus on the complex numerical reasoning in KBQA and propose a new task, NR-KBQA, which necessitates the ability to perform both multi-hop reasoning and numerical reasoning. We design a logic form in Python format called PyQL to represent the reasoning process of numerical reasoning questions. To facilitate the development of NR-KBQA, we present a large dataset called MarkQA, which is automatically constructed from a small set of seeds. Each question in MarkQA is equipped with its corresponding SPARQL query, alongside the step-by-step reasoning process in the QDMR format and PyQL program. Experimental results of some state-of-the-art QA methods on the MarkQA show that complex numerical reasoning in KBQA faces great challenges.
We study computational aspects of repulsive Gibbs point processes, which are probabilistic models of interacting particles in a finite-volume region of space. We introduce an approach for reducing a Gibbs point process to the hard-core model, a well-studied discrete spin system. Given an instance of such a point process, our reduction generates a random graph drawn from a natural geometric model. We show that the partition function of a hard-core model on graphs generated by the geometric model concentrates around the partition function of the Gibbs point process. Our reduction allows us to use a broad range of algorithms developed for the hard-core model to sample from the Gibbs point process and approximate its partition function. This is, to the extend of our knowledge, the first approach that deals with pair potentials of unbounded range. We compare the resulting algorithms with recently established results and study further properties of the random geometric graphs with respect to the hard-core model.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap in the performance between the detection of small and large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors; (1) only a few images are containing small objects, and (2) small objects do not appear enough even within each image containing them. We thus propose to oversample those images with small objects and augment each of those images by copy-pasting small objects many times. It allows us to trade off the quality of the detector on large objects with that on small objects. We evaluate different pasting augmentation strategies, and ultimately, we achieve 9.7\% relative improvement on the instance segmentation and 7.1\% on the object detection of small objects, compared to the current state of the art method on MS COCO.