Latent variable models serve as powerful tools to infer underlying dynamics from observed neural activity. However, due to the absence of ground truth data, prediction benchmarks are often employed as proxies. In this study, we reveal the limitations of the widely-used 'co-smoothing' prediction framework and propose an improved few-shot prediction approach that encourages more accurate latent dynamics. Utilizing a student-teacher setup with Hidden Markov Models, we demonstrate that the high co-smoothing model space can encompass models with arbitrary extraneous dynamics within their latent representations. To address this, we introduce a secondary metric -- a few-shot version of co-smoothing. This involves performing regression from the latent variables to held-out channels in the data using fewer trials. Our results indicate that among models with near-optimal co-smoothing, those with extraneous dynamics underperform in the few-shot co-smoothing compared to 'minimal' models devoid of such dynamics. We also provide analytical insights into the origin of this phenomenon. We further validate our findings on real neural data using two state-of-the-art methods: LFADS and STNDT. In the absence of ground truth, we suggest a proxy measure to quantify extraneous dynamics. By cross-decoding the latent variables of all model pairs with high co-smoothing, we identify models with minimal extraneous dynamics. We find a correlation between few-shot co-smoothing performance and this new measure. In summary, we present a novel prediction metric designed to yield latent variables that more accurately reflect the ground truth, offering a significant improvement for latent dynamics inference.
Our goal is to highlight some of the deep links between numerical splitting methods and control theory. We consider evolution equations of the form $\dot{x} = f_0(x) + f_1(x)$, where $f_0$ encodes a non-reversible dynamic, so that one is interested in schemes only involving forward flows of $f_0$. In this context, a splitting method can be interpreted as a trajectory of the control-affine system $\dot{x}(t)=f_0(x(t))+u(t)f_1(x(t))$, associated with a control~$u$ which is a finite sum of Dirac masses. The general goal is then to find a control such that the flow of $f_0 + u(t) f_1$ is as close as possible to the flow of $f_0+f_1$. Using this interpretation and classical tools from control theory, we revisit well-known results concerning numerical splitting methods, and we prove a handful of new ones, with an emphasis on splittings with additional positivity conditions on the coefficients. First, we show that there exist numerical schemes of any arbitrary order involving only forward flows of $f_0$ if one allows complex coefficients for the flows of $f_1$. Equivalently, for complex-valued controls, we prove that the Lie algebra rank condition is equivalent to the small-time local controllability of a system. Second, for real-valued coefficients, we show that the well-known order restrictions are linked with so-called "bad" Lie brackets from control theory, which are known to yield obstructions to small-time local controllability. We use our recent basis of the free Lie algebra to precisely identify the conditions under which high-order methods exist.
We show that the limiting variance of a sequence of estimators for a structured covariance matrix has a general form that appears as the variance of a scaled projection of a random matrix that is of radial type and a similar result is obtained for the corresponding sequence of estimators for the vector of variance components. These results are illustrated by the limiting behavior of estimators for a linear covariance structure in a variety of multivariate statistical models. We also derive a characterization for the influence function of corresponding functionals. Furthermore, we derive the limiting distribution and influence function of scale invariant mappings of such estimators and their corresponding functionals. As a consequence, the asymptotic relative efficiency of different estimators for the shape component of a structured covariance matrix can be compared by means of a single scalar and the gross error sensitivity of the corresponding influence functions can be compared by means of a single index. Similar results are obtained for estimators of the normalized vector of variance components. We apply our results to investigate how the efficiency, gross error sensitivity, and breakdown point of S-estimators for the normalized variance components are affected simultaneously by varying their cutoff value.
We deal with a model selection problem for structural equation modeling (SEM) with latent variables for diffusion processes. Based on the asymptotic expansion of the marginal quasi-log likelihood, we propose two types of quasi-Bayesian information criteria of the SEM. It is shown that the information criteria have model selection consistency. Furthermore, we examine the finite-sample performance of the proposed information criteria by numerical experiments.
We address the problem of the best uniform approximation of a continuous function on a convex domain. The approximation is by linear combinations of a finite system of functions (not necessarily Chebyshev) under arbitrary linear constraints. By modifying the concept of alternance and of the Remez iterative procedure we present a method, which demonstrates its efficiency in numerical problems. The linear rate of convergence is proved under some favourable assumptions. A special attention is paid to systems of complex exponents, Gaussian functions, lacunar algebraic and trigonometric polynomials. Applications to signal processing, linear ODE, switching dynamical systems, and to Markov-Bernstein type inequalities are considered.
A functional nonlinear regression approach, incorporating time information in the covariates, is proposed for temporal strong correlated manifold map data sequence analysis. Specifically, the functional regression parameters are supported on a connected and compact two--point homogeneous space. The Generalized Least--Squares (GLS) parameter estimator is computed in the linearized model, having error term displaying manifold scale varying Long Range Dependence (LRD). The performance of the theoretical and plug--in nonlinear regression predictors is illustrated by simulations on sphere, in terms of the empirical mean of the computed spherical functional absolute errors. In the case where the second--order structure of the functional error term in the linearized model is unknown, its estimation is performed by minimum contrast in the functional spectral domain. The linear case is illustrated in the Supplementary Material, revealing the effect of the slow decay velocity in time of the trace norms of the covariance operator family of the regression LRD error term. The purely spatial statistical analysis of atmospheric pressure at high cloud bottom, and downward solar radiation flux in Alegria et al. (2021) is extended to the spatiotemporal context, illustrating the numerical results from a generated synthetic data set.
Many economic panel and dynamic models, such as rational behavior and Euler equations, imply that the parameters of interest are identified by conditional moment restrictions. We introduce a novel inference method without any prior information about which conditioning instruments are weak or irrelevant. Building on Bierens (1990), we propose penalized maximum statistics and combine bootstrap inference with model selection. Our method optimizes asymptotic power by solving a data-dependent max-min problem for tuning parameter selection. Extensive Monte Carlo experiments, based on an empirical example, demonstrate the extent to which our inference procedure is superior to those available in the literature.
An essential problem in statistics and machine learning is the estimation of expectations involving PDFs with intractable normalizing constants. The self-normalized importance sampling (SNIS) estimator, which normalizes the IS weights, has become the standard approach due to its simplicity. However, the SNIS has been shown to exhibit high variance in challenging estimation problems, e.g, involving rare events or posterior predictive distributions in Bayesian statistics. Further, most of the state-of-the-art adaptive importance sampling (AIS) methods adapt the proposal as if the weights had not been normalized. In this paper, we propose a framework that considers the original task as estimation of a ratio of two integrals. In our new formulation, we obtain samples from a joint proposal distribution in an extended space, with two of its marginals playing the role of proposals used to estimate each integral. Importantly, the framework allows us to induce and control a dependency between both estimators. We propose a construction of the joint proposal that decomposes in two (multivariate) marginals and a coupling. This leads to a two-stage framework suitable to be integrated with existing or new AIS and/or variational inference (VI) algorithms. The marginals are adapted in the first stage, while the coupling can be chosen and adapted in the second stage. We show in several examples the benefits of the proposed methodology, including an application to Bayesian prediction with misspecified models.
Blocking, a special case of rerandomization, is routinely implemented in the design stage of randomized experiments to balance the baseline covariates. This study proposes a regression adjustment method based on the least absolute shrinkage and selection operator (Lasso) to efficiently estimate the average treatment effect in randomized block experiments with high-dimensional covariates. We derive the asymptotic properties of the proposed estimator and outline the conditions under which this estimator is more efficient than the unadjusted one. We provide a conservative variance estimator to facilitate valid inferences. Our framework allows one treated or control unit in some blocks and heterogeneous propensity scores across blocks, thus including paired experiments and finely stratified experiments as special cases. We further accommodate rerandomized experiments and a combination of blocking and rerandomization. Moreover, our analysis allows both the number of blocks and block sizes to tend to infinity, as well as heterogeneous treatment effects across blocks without assuming a true outcome data-generating model. Simulation studies and two real-data analyses demonstrate the advantages of the proposed method.
Weakly Supervised Semantic Segmentation (WSSS) employs weak supervision, such as image-level labels, to train the segmentation model. Despite the impressive achievement in recent WSSS methods, we identify that introducing weak labels with high mean Intersection of Union (mIoU) does not guarantee high segmentation performance. Existing studies have emphasized the importance of prioritizing precision and reducing noise to improve overall performance. In the same vein, we propose ORANDNet, an advanced ensemble approach tailored for WSSS. ORANDNet combines Class Activation Maps (CAMs) from two different classifiers to increase the precision of pseudo-masks (PMs). To further mitigate small noise in the PMs, we incorporate curriculum learning. This involves training the segmentation model initially with pairs of smaller-sized images and corresponding PMs, gradually transitioning to the original-sized pairs. By combining the original CAMs of ResNet-50 and ViT, we significantly improve the segmentation performance over the single-best model and the naive ensemble model, respectively. We further extend our ensemble method to CAMs from AMN (ResNet-like) and MCTformer (ViT-like) models, achieving performance benefits in advanced WSSS models. It highlights the potential of our ORANDNet as a final add-on module for WSSS models.
This note discusses a simple modification of cross-conformal prediction inspired by recent work on e-values. The precursor of conformal prediction developed in the 1990s by Gammerman, Vapnik, and Vovk was also based on e-values and is called conformal e-prediction in this note. Replacing e-values by p-values led to conformal prediction, which has important advantages over conformal e-prediction without obvious disadvantages. The situation with cross-conformal prediction is, however, different: whereas for cross-conformal prediction validity is only an empirical fact (and can be broken with excessive randomization), this note draws the reader's attention to the obvious fact that cross-conformal e-prediction enjoys a guaranteed property of validity.