The Gaussian mixed-effects model driven by a stationary integrated Ornstein-Uhlenbeck process has been used for analyzing longitudinal data having an explicit and simple serial-correlation structure in each individual. However, the theoretical aspect of its asymptotic inference is yet to be elucidated. We prove the local asymptotics for the associated log-likelihood function, which in particular guarantees the asymptotic optimality of the suitably chosen maximum-likelihood estimator. We illustrate the obtained asymptotic normality result through some simulations for both balanced and unbalanced datasets.
A fundamental problem of causal discovery is cause-effect inference, learning the correct causal direction between two random variables. Significant progress has been made through modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional models (LSNMs) combine expressive power with identifiability guarantees. LSNM model selection based on maximizing likelihood achieves state-of-the-art accuracy, when the noise distributions are correctly specified. However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.
The importance of unspanned macroeconomic variables for Dynamic Term Structure Models has been intensively discussed in the literature. To our best knowledge the earlier studies considered only linear interactions between the economy and the real-world dynamics of interest rates in DTSMs. We propose a generalized modelling setup for Gaussian DTSMs which allows for unspanned nonlinear associations between the two and we exploit it in forecasting. Specifically, we construct a custom sequential Monte Carlo estimation and forecasting scheme where we introduce Gaussian Process priors to model nonlinearities. Sequential scheme we propose can also be used with dynamic portfolio optimization to assess the potential of generated economic value to investors. The methodology is presented using US Treasury data and selected macroeconomic indices. Namely, we look at core inflation and real economic activity. We contrast the results obtained from the nonlinear model with those stemming from an application of a linear model. Unlike for real economic activity, in case of core inflation we find that, compared to linear models, application of nonlinear models leads to statistically significant gains in economic value across considered maturities.
Generalized approximate message passing (GAMP) is a computationally efficient algorithm for estimating an unknown signal $w_0\in\mathbb{R}^N$ from a random linear measurement $y= Xw_0 + \epsilon\in\mathbb{R}^M$, where $X\in\mathbb{R}^{M\times N}$ is a known measurement matrix and $\epsilon$ is the noise vector. The salient feature of GAMP is that it can provide an unbiased estimator $\hat{r}^{\rm G}\sim\mathcal{N}(w_0, \hat{s}^2I_N)$, which can be used for various hypothesis-testing methods. In this study, we consider the bootstrap average of an unbiased estimator of GAMP for the elastic net. By numerically analyzing the state evolution of \emph{approximate message passing with resampling}, which has been proposed for computing bootstrap statistics of the elastic net estimator, we investigate when the bootstrap averaging reduces the variance of the unbiased estimator and the effect of optimizing the size of each bootstrap sample and hyperparameter of the elastic net regularization in the asymptotic setting $M, N\to\infty, M/N\to\alpha\in(0,\infty)$. The results indicate that bootstrap averaging effectively reduces the variance of the unbiased estimator when the actual data generation process is inconsistent with the sparsity assumption of the regularization and the sample size is small. Furthermore, we find that when $w_0$ is less sparse, and the data size is small, the system undergoes a phase transition. The phase transition indicates the existence of the region where the ensemble average of unbiased estimators of GAMP for the elastic net norm minimization problem yields the unbiased estimator with the minimum variance.
A growing line of work shows how learned predictions can be used to break through worst-case barriers to improve the running time of an algorithm. However, incorporating predictions into data structures with strong theoretical guarantees remains underdeveloped. This paper takes a step in this direction by showing that predictions can be leveraged in the fundamental online list labeling problem. In the problem, n items arrive over time and must be stored in sorted order in an array of size Theta(n). The array slot of an element is its label and the goal is to maintain sorted order while minimizing the total number of elements moved (i.e., relabeled). We design a new list labeling data structure and bound its performance in two models. In the worst-case learning-augmented model, we give guarantees in terms of the error in the predictions. Our data structure provides strong guarantees: it is optimal for any prediction error and guarantees the best-known worst-case bound even when the predictions are entirely erroneous. We also consider a stochastic error model and bound the performance in terms of the expectation and variance of the error. Finally, the theoretical results are demonstrated empirically. In particular, we show that our data structure has strong performance on real temporal data sets where predictions are constructed from elements that arrived in the past, as is typically done in a practical use case.
The likelihood ratio is a crucial quantity for statistical inference in science that enables hypothesis testing, construction of confidence intervals, reweighting of distributions, and more. Many modern scientific applications, however, make use of data- or simulation-driven models for which computing the likelihood ratio can be very difficult or even impossible. By applying the so-called ``likelihood ratio trick,'' approximations of the likelihood ratio may be computed using clever parametrizations of neural network-based classifiers. A number of different neural network setups can be defined to satisfy this procedure, each with varying performance in approximating the likelihood ratio when using finite training data. We present a series of empirical studies detailing the performance of several common loss functionals and parametrizations of the classifier output in approximating the likelihood ratio of two univariate and multivariate Gaussian distributions as well as simulated high-energy particle physics datasets.
It is well known, that Fr\'echet means on non-Euclidean spaces may exhibit nonstandard asymptotic rates depending on curvature. Even for distributions featuring standard asymptotic rates, there are non-Euclidean effects, altering finite sampling rates up to considerable sample sizes. These effects can be measured by the variance modulation function proposed by Pennec (2019). Among others, in view of statistical inference, it is important to bound this function on intervals of sampling sizes. In a first step into this direction, for the special case of a K-spider we give such an interval, based only on folded moments and total probabilities of spider legs and illustrate the method by simulations.
The Mean Field Variational Bayes (MFVB) method is one of the most computationally efficient techniques for Bayesian inference. However, its use has been restricted to models with conjugate priors or those that require analytical calculations. This paper proposes a novel particle-based MFVB approach that greatly expands the applicability of the MFVB method. We establish the theoretical basis of the new method by leveraging the connection between Wasserstein gradient flows and Langevin diffusion dynamics, and demonstrate the effectiveness of this approach using Bayesian logistic regression, stochastic volatility, and deep neural networks.
Extended Dynamic Mode Decomposition (EDMD) is a data-driven tool for forecasting and model reduction of dynamics, which has been extensively taken up in the physical sciences. While the method is conceptually simple, in deterministic chaos it is unclear what its properties are or even what it converges to. In particular, it is not clear how EDMD's least-squares approximation treats the classes of regular functions needed to make sense of chaotic dynamics. In this paper we develop a general, rigorous theory of EDMD on the simplest examples of chaotic maps: analytic expanding maps of the circle. To do this, we prove a new result in the theory of orthogonal polynomials on the unit circle (OPUC) and apply methods from transfer operator theory. We show that in the infinite-data limit, the least-squares projection is exponentially efficient for trigonometric polynomial observable dictionaries. As a result, we show that the forecasts and Koopman spectral data produced using EDMD in this setting converge to the physically meaningful limits, exponentially quickly in the size of the dictionary. This demonstrates that with only a relatively small polynomial dictionary, EDMD can be very effective, even when the sampling measure is not uniform. Furthermore, our OPUC result suggests that data-based least-squares projections may be a very effective approximation strategy.
Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotations and therefore is labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions into the traditional framework. We show the recognition backbone can be substantially enhanced for more robust representation learning, without any cost of extra annotation and inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category. We then design a spatial context learning module for modeling the internal structures of the object, through predicting the relative positions within the extent. These two modules can be easily plugged into any backbone networks during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.
Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have drawn considerable attention and achieved state-of-the-art performance in numerous graph related tasks. However, existing GNN models mainly focus on designing graph convolution operations. The graph pooling (or downsampling) operations, that play an important role in learning hierarchical representations, are usually overlooked. In this paper, we propose a novel graph pooling operator, called Hierarchical Graph Pooling with Structure Learning (HGP-SL), which can be integrated into various graph neural network architectures. HGP-SL incorporates graph pooling and structure learning into a unified module to generate hierarchical representations of graphs. More specifically, the graph pooling operation adaptively selects a subset of nodes to form an induced subgraph for the subsequent layers. To preserve the integrity of graph's topological information, we further introduce a structure learning mechanism to learn a refined graph structure for the pooled graph at each layer. By combining HGP-SL operator with graph neural networks, we perform graph level representation learning with focus on graph classification task. Experimental results on six widely used benchmarks demonstrate the effectiveness of our proposed model.