Reverse Unrestricted MIxed DAta Sampling (RU-MIDAS) regressions are used to model high-frequency responses by means of low-frequency variables. However, due to the periodic structure of RU-MIDAS regressions, the dimensionality grows quickly if the frequency mismatch between the high- and low-frequency variables is large. Additionally, the number of high-frequency observations available for estimation decreases. We propose to counteract this reduction in sample size by pooling the high-frequency coefficients and to further reduce the dimensionality through a sparsity-inducing convex regularizer that accounts for the temporal ordering among the different lags. To this end, the regularizer prioritizes the inclusion of lagged coefficients according to the recency of the information they contain. We demonstrate the proposed method on an empirical application for daily realized volatility forecasting where we explore whether modeling high-frequency volatility data in terms of low-frequency macroeconomic data pays off.
This paper studies an $N$-agent cost-coupled game where the agents are connected via an unreliable capacity-constrained network. Each agent receives state information over that network, which loses packets with probability $p$. A base station (BS) actively schedules agent communications over the network by minimizing a weighted Age of Information (WAoI) based cost function under a capacity limit $\mathcal{C} < N$ on the number of transmission attempts at each instant. Under a standard information structure, we show that the problem can be decoupled into a scheduling problem for the BS and a game problem for the $N$ agents. Since the scheduling problem is an NP-hard combinatorial problem, we propose an approximately optimal solution which approaches the optimal solution as $N \rightarrow \infty$. In the process, we also provide some insights on the case without channel erasure. Next, to solve the large population game problem, we use the mean-field game framework to compute an approximate decentralized Nash equilibrium. Finally, we validate the theoretical results using a numerical example.
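To make the scheduling setting concrete, the following is a minimal simulation sketch. It assumes a simple greedy max-weight rule (schedule the $\mathcal{C}$ agents with the largest weighted age), which is a common heuristic and not necessarily the approximately optimal policy proposed in the paper. The function name, the age-reset convention (age returns to 1 on delivery), and all parameter values are illustrative assumptions.

```python
import random


def simulate_greedy_waoi(weights, capacity, loss_prob, steps, seed=0):
    """Simulate a greedy max-weight scheduler; return the time-averaged weighted AoI.

    Assumptions (not taken from the paper): ages start at 1, a delivered
    packet resets the age to 1, and an undelivered one increases it by 1.
    """
    rng = random.Random(seed)
    n = len(weights)
    ages = [1] * n
    total_cost = 0.0
    for _ in range(steps):
        # Schedule the `capacity` agents with the largest weighted age.
        order = sorted(range(n), key=lambda i: weights[i] * ages[i], reverse=True)
        scheduled = set(order[:capacity])
        for i in range(n):
            # A scheduled transmission is lost with probability `loss_prob`.
            delivered = i in scheduled and rng.random() >= loss_prob
            ages[i] = 1 if delivered else ages[i] + 1
        total_cost += sum(w * a for w, a in zip(weights, ages))
    return total_cost / steps


# Hypothetical example: 4 agents, 2 transmission attempts per instant, p = 0.2.
avg_cost = simulate_greedy_waoi([1.0, 2.0, 3.0, 4.0], capacity=2, loss_prob=0.2, steps=1000)
print(avg_cost)
```

Setting `loss_prob=0.0` gives the erasure-free case mentioned in the abstract, where the schedule becomes deterministic.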
Large-scale linear models are ubiquitous throughout machine learning, with contemporary application as surrogate models for neural network uncertainty quantification; that is, the linearised Laplace method. Alas, the computational cost associated with Bayesian linear models constrains this method's application to small networks, small output spaces and small datasets. We address this limitation by introducing a scalable sample-based Bayesian inference method for conjugate Gaussian multi-output linear models, together with a matching method for hyperparameter (regularisation) selection. Furthermore, we use a classic feature normalisation method (the g-prior) to resolve a previously highlighted pathology of the linearised Laplace method. Together, these contributions allow us to perform linearised neural network inference with ResNet-18 on CIFAR100 (11M parameters, 100 outputs x 50k datapoints), with ResNet-50 on ImageNet (50M parameters, 1000 outputs x 1.2M datapoints) and with a U-Net on a high-resolution tomographic reconstruction task (2M parameters, 251k output~dimensions).
Gaussian graphical models typically assume a homogeneous structure across all subjects, which is often restrictive in applications. In this article, we propose a weighted pseudo-likelihood approach for graphical modeling which allows different subjects to have different graphical structures depending on extraneous covariates. The pseudo-likelihood approach replaces the joint distribution by a product of the conditional distributions of each variable. We cast the conditional distribution as a heteroscedastic regression problem, with covariate-dependent variance terms, to enable information borrowing directly from the data instead of a hierarchical framework. This allows independent graphical modeling for each subject, while retaining the benefits of a hierarchical Bayes model and being computationally tractable. An efficient embarrassingly parallel variational algorithm is developed to approximate the posterior and obtain estimates of the graphs. Using a fractional variational framework, we derive asymptotic risk bounds for the estimate in terms of a novel variant of the $\alpha$-R\'{e}nyi divergence. We theoretically demonstrate the advantages of information borrowing across covariates over independent modeling. We show the practical advantages of the approach through simulation studies and illustrate the dependence structure in protein expression levels on breast cancer patients using CNV information as covariates.
We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle constraints on loadings and factor correlations. For GOF assessment, we explore simulation-based tests and indices that extend the classifier two-sample test (C2ST), a method that tests whether a deep neural network can distinguish between observed data and synthetic data sampled from a fitted IFA model. Proposed extensions include a test of approximate fit wherein the user specifies what percentage of observed and synthetic data should be distinguishable as well as a relative fit index (RFI) that is similar in spirit to the RFIs used in structural equation modeling. Via simulation studies, we show that: (1) the confirmatory extension of Urban and Bauer's (2021) algorithm obtains comparable estimates to a state-of-the-art estimation procedure in less time; (2) C2ST-based GOF tests control the empirical type I error rate and detect when the latent dimensionality is misspecified; and (3) the sampling distribution of the C2ST-based RFI depends on the sample size.
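The basic C2ST idea can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a nearest-centroid classifier in place of a deep neural network, a 50/50 train/test split, and a normal approximation to the null distribution of the held-out accuracy. The function `c2st` and the toy data are hypothetical.

```python
import math
import random


def c2st(observed, synthetic, seed=0):
    """Return (held-out accuracy, one-sided p-value) for a classifier two-sample test."""
    rng = random.Random(seed)
    # Label observed rows 0 and synthetic rows 1, then split into train/test halves.
    data = [(x, 0) for x in observed] + [(x, 1) for x in synthetic]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]

    def centroid(label):
        rows = [x for x, y in train if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 0 if d0 <= d1 else 1

    acc = sum(predict(x) == y for x, y in test) / len(test)
    # Under H0 (samples indistinguishable), acc is approximately N(0.5, 0.25 / n_test).
    z = (acc - 0.5) / math.sqrt(0.25 / len(test))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return acc, p_value


# Toy data: the "synthetic" sample comes from a deliberately misspecified model.
rng = random.Random(1)
observed = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
synthetic = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(100)]
acc, p_value = c2st(observed, synthetic)
print(acc, p_value)
```

A test of approximate fit, as described in the abstract, would replace the null accuracy of 0.5 with a user-specified tolerance; that variant is not shown here.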
Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we introduce a general-purpose hierarchical learning architecture that is based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with increasing number of subsets. We show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates. As a consequence, a function approximation problem can be defined within each subset of the partition and solved using the theory of two-timescale stochastic approximation algorithms. This simulates an annealing process and defines a robust and interpretable heuristic method to gradually increase the complexity of the learning architecture in a task-agnostic manner, giving emphasis to regions of the data space that are considered more important according to a predefined criterion. Finally, by imposing a tree structure in the progression of the partitions, we provide a means to incorporate potential multi-resolution structure of the data space into this approach, significantly reducing its complexity, while introducing hierarchical variable-rate feature extraction properties similar to certain classes of deep learning architectures. Asymptotic convergence analysis and experimental results are provided for supervised and unsupervised learning problems.
Many real-world systems are described not only by data from a single source but via multiple data views. In genomic medicine, for instance, patients can be characterized by data from different molecular layers. Latent variable models with structured sparsity are a commonly used tool for disentangling variation within and across data views. However, interpreting them is cumbersome, since each factor must be inspected and interpreted directly by domain experts. Here, we propose MuVI, a novel multi-view latent variable model based on a modified horseshoe prior for modeling structured sparsity. This facilitates the incorporation of limited and noisy domain knowledge, thereby allowing for an analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) outperforms state-of-the-art approaches for modeling structured sparsity in terms of the reconstruction error and the precision/recall, (ii) robustly integrates noisy domain expertise in the form of feature sets, (iii) promotes the identifiability of factors and (iv) infers interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
Model parameter regularization is a widely used technique to improve generalization, but it can also be used to shape the weight distributions for various purposes. In this work, we shed light on how weight regularization can assist model quantization and compression techniques, and then propose range regularization (R^2) to further boost the quality of model optimization by focusing on outlier prevention. By effectively regulating the minimum and maximum weight values from a distribution, we mold the overall distribution into a tight shape so that model compression and quantization techniques can better utilize their limited numeric representation powers. We introduce L-inf regularization, its extension margin regularization and a new soft-min-max regularization to be used as a regularization loss during full-precision model training. Coupled with state-of-the-art quantization and compression techniques, models trained with R^2 perform better on average, specifically at lower bit weights with 16x compression ratio. We also demonstrate that R^2 helps parameter-constrained models like MobileNetV1 achieve significant improvement of around 8% for 2-bit quantization and 7% for 1-bit compression.
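As a rough illustration of the three regularizers named above, the sketch below gives one plausible form of each penalty for a flat list of weights. The exact loss definitions in the paper may differ (for example, the margin may be learnable there); the function names, the fixed `margin`, and the `sharpness` parameter are assumptions made for this example.

```python
import math


def linf_penalty(weights):
    """L-inf penalty: the largest absolute weight (penalizes only the worst outlier)."""
    return max(abs(w) for w in weights)


def margin_penalty(weights, margin):
    """Margin penalty: total amount by which weights exceed the range [-margin, margin]."""
    return sum(max(0.0, abs(w) - margin) for w in weights)


def soft_min_max_penalty(weights, sharpness=1.0):
    """Smooth surrogate of (max - min) using softmax-weighted averages.

    As `sharpness` grows, the value approaches the exact weight range.
    """
    def soft_max(values):
        m = max(values)  # subtract the max for numerical stability
        exps = [math.exp((v - m) * sharpness) for v in values]
        total = sum(exps)
        return sum(v * e for v, e in zip(values, exps)) / total

    # soft_min(w) = -soft_max(-w), so the soft range is soft_max(w) + soft_max(-w).
    return soft_max(weights) + soft_max([-w for w in weights])


weights = [-2.0, 0.5, 1.0]
print(linf_penalty(weights), margin_penalty(weights, 1.0), soft_min_max_penalty(weights, 50.0))
```

In training, a penalty of this kind would be scaled and added to the task loss, so that outlier weights are pulled in before quantization.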
Event-based cameras have recently shown great potential for high-speed motion estimation owing to their ability to capture temporally rich information asynchronously. Spiking Neural Networks (SNNs), with their neuro-inspired event-driven processing, can efficiently handle such asynchronous data, while neuron models such as the leaky integrate-and-fire (LIF) model can keep track of the quintessential timing information contained in the inputs. SNNs achieve this by maintaining a dynamic state in the neuron memory, retaining important information while forgetting redundant data over time. Thus, we posit that SNNs would allow for better performance on sequential regression tasks compared to similarly sized Analog Neural Networks (ANNs). However, deep SNNs are difficult to train due to vanishing spikes at later layers. To that effect, we propose an adaptive fully-spiking framework with learnable neuronal dynamics to alleviate the spike vanishing problem. We utilize surrogate gradient-based backpropagation through time (BPTT) to train our deep SNNs from scratch. We validate our approach for the task of optical flow estimation on the Multi-Vehicle Stereo Event-Camera (MVSEC) dataset and the DSEC-Flow dataset. Our experiments on these datasets show an average reduction of 13% in average endpoint error (AEE) compared to state-of-the-art ANNs. We also explore several down-scaled models and observe that our SNN models consistently outperform similarly sized ANNs, offering 10%-16% lower AEE. These results demonstrate the importance of SNNs for smaller models and their suitability at the edge. In terms of efficiency, our SNNs offer substantial savings in network parameters (48.3x) and computational energy (10.2x) while attaining ~10% lower endpoint error (EPE) compared to the state-of-the-art ANN implementations.
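The leaky integrate-and-fire dynamics referred to above can be illustrated with a minimal discrete-time sketch. It assumes a single neuron, a multiplicative leak, and a hard reset to zero after a spike; in the proposed framework the leak and threshold are learnable, whereas here they are fixed illustrative arguments.

```python
def lif_neuron(inputs, leak=0.9, threshold=1.0):
    """Simulate one discrete-time LIF neuron and return its binary spike train."""
    v = 0.0  # membrane potential: the dynamic state kept in neuron memory
    spikes = []
    for x in inputs:
        # Leak part of the previous potential, then integrate the new input.
        v = leak * v + x
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after firing (an assumption of this sketch)
        else:
            spikes.append(0)
    return spikes


# With a weak leak the two inputs accumulate and trigger a spike;
# with a strong leak the first input is mostly forgotten and no spike occurs.
print(lif_neuron([0.6, 0.6, 0.0, 0.0], leak=0.9))
print(lif_neuron([0.6, 0.6], leak=0.5))
```

If the leak is too strong or the threshold too high, neurons rarely fire, which gives some intuition for the vanishing-spike problem in deep SNNs.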
This paper studies inference in randomized controlled trials with multiple treatments, where treatment status is determined according to a "matched tuples" design. Here, by a matched tuples design, we mean an experimental design where units are sampled i.i.d. from the population of interest, grouped into "homogeneous" blocks with cardinality equal to the number of treatments, and finally, within each block, each treatment is assigned exactly once uniformly at random. We first study estimation and inference for matched tuples designs in the general setting where the parameter of interest is a vector of linear contrasts over the collection of average potential outcomes for each treatment. Parameters of this form include standard average treatment effects used to compare one treatment relative to another, but also include parameters which may be of interest in the analysis of factorial designs. We establish conditions under which a sample analogue estimator is asymptotically normal and construct a consistent estimator of its corresponding asymptotic variance. Combining these results establishes the asymptotic exactness of tests based on these estimators. In contrast, we show that, for two common testing procedures based on t-tests constructed from linear regressions, one test is generally conservative while the other is generally invalid. We go on to apply our results to study the asymptotic properties of what we call "fully-blocked" $2^K$ factorial designs, which are simply matched tuples designs applied to a full factorial experiment. Leveraging our previous results, we establish that our estimator achieves a lower asymptotic variance under the fully-blocked design than that under any stratified factorial design which stratifies the experimental sample into a finite number of "large" strata. A simulation study and empirical application illustrate the practical relevance of our results.
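The assignment mechanism of a matched tuples design can be sketched as follows. The sketch assumes that "homogeneous" blocks are formed by sorting units on a single scalar covariate, which is only one possible blocking rule; the function name and example covariate values are hypothetical.

```python
import random


def matched_tuples_assignment(covariate, num_treatments, seed=0):
    """Assign treatments 0..num_treatments-1 so each appears exactly once per block."""
    rng = random.Random(seed)
    n = len(covariate)
    if n % num_treatments != 0:
        raise ValueError("sample size must be a multiple of the number of treatments")
    # Form homogeneous blocks by grouping units that are adjacent in covariate order.
    order = sorted(range(n), key=lambda i: covariate[i])
    assignment = [None] * n
    for start in range(0, n, num_treatments):
        block = order[start:start + num_treatments]
        # Within the block, assign every treatment exactly once, uniformly at random.
        treatments = list(range(num_treatments))
        rng.shuffle(treatments)
        for unit, treatment in zip(block, treatments):
            assignment[unit] = treatment
    return assignment


# Six units, two treatments: the blocks are units {0, 2}, {4, 5} and {3, 1}.
print(matched_tuples_assignment([0.1, 5.0, 0.2, 4.9, 2.5, 2.6], num_treatments=2))
```

For a fully-blocked $2^K$ factorial design, `num_treatments` would be $2^K$, with each label standing for one combination of factor levels.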
Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable; thus, causal inference tries to leverage any exposed information. Graph neural networks (GNN), as universal approximators on structured input, pose a viable candidate for causal learning, suggesting a tighter integration with SCM. To this effect, we present a theoretical analysis from first principles that establishes a novel connection between GNN and SCM while providing an extended view on general neural-causal models. We then establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustration on simulations and standard benchmarks validates our theoretical proofs.