This paper introduces graph-based mutually exciting processes (GB-MEP) to model event times in network point processes, with a focus on docked bike-sharing systems. GB-MEP incorporates known relationships between nodes in a graph within the intensity function of a node-based multivariate Hawkes process. This approach reduces the number of parameters to a quantity proportional to the number of nodes in the network, yielding significant advantages in computational scalability over traditional methods. The model is applied to event data from the Santander Cycles network in central London, demonstrating that exploiting network-wide information on the geographical location of stations improves the performance of node-based models in bike-sharing applications. More generally, the proposed GB-MEP framework is applicable to any network point process for which a distance function between nodes is available.
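As a concrete illustration, one natural form for such an intensity (a sketch, not necessarily the paper's exact parameterisation) lets events at station $j$ excite station $i$ with strength decaying in the distance $d_{ij}$ between them: $\lambda_i(t) = \mu_i + \sum_{j} w(d_{ij}) \sum_{t_k \in \mathcal{H}_j(t)} \alpha\, e^{-\beta (t - t_k)}$, where $\mathcal{H}_j(t)$ denotes the events at node $j$ before time $t$, $w$ is a decreasing weight function of distance, and $(\mu_i, \alpha, \beta)$ are node-level or shared parameters, which keeps the total parameter count proportional to the number of nodes.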
We consider the problem of approximating a function in $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\varphi$, using evaluations of the function at random points $x_1,\dots,x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPPs) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\varphi(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, meaning that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Assuming further that the function belongs to some normed vector space $H$ continuously embedded in $L^2$, we prove that the approximation error is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy based on independent repetitions of projection DPP (or volume sampling), yielding error bounds similar to those for i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies.
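As a point of reference for the i.i.d. baseline (a minimal sketch, not the paper's code, in an assumed setting where $V_m$ is spanned by the first $m$ Legendre polynomials on $[-1,1]$), optimal weighted least-squares draws points from the density $\frac{1}{m}\sum_j \varphi_j(x)^2$ and weights the residuals by its inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 200                                   # dim(V_m) and number of samples (theory: n ~ m log m)

def phi(x):
    # Orthonormal Legendre features phi_0, ..., phi_{m-1} w.r.t. the uniform measure on [-1, 1]
    return np.polynomial.legendre.legvander(x, m - 1) * np.sqrt(2 * np.arange(m) + 1)

def sample_optimal(n):
    # Rejection sampling from the optimal density (1/m) * sum_j phi_j(x)^2 (w.r.t. uniform);
    # for Legendre features this density is bounded by m, attained at x = +/- 1.
    pts = []
    while len(pts) < n:
        x = rng.uniform(-1.0, 1.0, size=1000)
        dens = (phi(x) ** 2).mean(axis=1)
        pts.extend(x[rng.uniform(0.0, m, size=1000) < dens])
    return np.array(pts[:n])

f = lambda x: np.exp(np.sin(3.0 * x))           # arbitrary target function for illustration
x = sample_optimal(n)
A = phi(x)
w = m / (A ** 2).sum(axis=1)                    # optimal weights w(x_i) = m / sum_j phi_j(x_i)^2
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], sw * f(x), rcond=None)
approx = lambda t: phi(t) @ coef                # weighted least-squares approximation in V_m
```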
Any experiment with climate models relies on a potentially large set of spatio-temporal boundary conditions. These can represent both the initial state of the system and/or forcings driving the model output throughout the experiment. Whilst these boundary conditions are typically fixed using available reconstructions in climate modelling studies, they are highly uncertain, this uncertainty is typically unquantified, and its effect on the output of the experiment can be considerable. We develop an efficient quantification of these uncertainties that combines relevant data from multiple models and observations. Starting from the coexchangeability model, we develop a coexchangeable process model to capture multiple correlated spatio-temporal fields of variables. We demonstrate that further exchangeability judgements over the parameters within this representation lead to a Bayes linear analogue of a hierarchical model. We use the framework to provide a joint reconstruction of sea-surface temperature and sea-ice concentration boundary conditions at the last glacial maximum (19-23 ka) and use it to force an ensemble of ice-sheet simulations with the FAMOUS-Ice coupled atmosphere and ice-sheet model. We demonstrate that the existing boundary conditions typically used in these experiments are implausible given our uncertainties, and we demonstrate the impact of using more plausible boundary conditions on the ice-sheet simulations.
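For readers unfamiliar with the Bayes linear setting, the standard second-order exchangeability representation underlying such judgements (background, not a result of the paper) states that a second-order exchangeable sequence of random vectors $X_1, X_2, \dots$ can be written as $X_i = \mathcal{M} + \mathcal{R}_i$, where $\mathcal{M}$ is a common mean component and the residuals $\mathcal{R}_i$ are mutually uncorrelated, uncorrelated with $\mathcal{M}$, and share a common covariance; the coexchangeable process model extends judgements of this type across multiple correlated spatio-temporal fields.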
Deep learning techniques have dominated the literature on aspect-based sentiment analysis (ABSA), achieving state-of-the-art performance. However, deep models generally suffer from spurious correlations between input features and output labels, which substantially hurt robustness and generalization. In this paper, we propose to reduce spurious correlations for ABSA via a novel Contrastive Variational Information Bottleneck framework (CVIB). The proposed CVIB framework is composed of an original network and a self-pruned network, and the two networks are optimized simultaneously via contrastive learning. Concretely, we employ the Variational Information Bottleneck (VIB) principle to learn an informative and compressed network (the self-pruned network) from the original network, which discards superfluous patterns and spurious correlations between input features and prediction labels. Then, self-pruning contrastive learning is devised to pull semantically similar positive pairs together and push dissimilar pairs apart: the representations of the same sentence learned by the original and self-pruned networks are regarded as a positive pair, while the representations of two different sentences within a mini-batch are treated as a negative pair. To verify the effectiveness of our CVIB method, we conduct extensive experiments on five benchmark ABSA datasets; the results show that our approach outperforms strong competitors in terms of overall prediction performance, robustness, and generalization. Code and data to reproduce the results in this paper are available at: //github.com/shesshan/CVIB.
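As an illustration of the self-pruning contrastive component (a minimal sketch, not the authors' implementation), an InfoNCE-style loss treats the original-network and self-pruned-network representations of the same sentence as a positive pair and other sentences in the mini-batch as negatives:

```python
import torch
import torch.nn.functional as F

def self_pruning_contrastive_loss(z_orig, z_pruned, temperature=0.1):
    """Illustrative InfoNCE-style loss (a sketch, not the paper's exact objective).

    z_orig, z_pruned: (batch, dim) representations of the same sentences produced by
    the original and the self-pruned network. For each anchor z_orig[i] the positive
    is z_pruned[i]; representations of other sentences in the mini-batch act as negatives.
    """
    z_orig = F.normalize(z_orig, dim=-1)
    z_pruned = F.normalize(z_pruned, dim=-1)
    logits = z_orig @ z_pruned.t() / temperature      # (batch, batch) cosine similarities
    targets = torch.arange(z_orig.size(0), device=z_orig.device)
    return F.cross_entropy(logits, targets)           # diagonal entries are the positive pairs
```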
We consider unregularized robust M-estimators for linear models under Gaussian design and heavy-tailed noise, in the proportional asymptotics regime where the sample size $n$ and the number of features $p$ both increase such that $p/n \to \gamma\in (0,1)$. An estimator of the out-of-sample error of a robust M-estimator is analysed and proved to be consistent for a large family of loss functions that includes the Huber loss. As an application of this result, we propose an adaptive procedure for tuning the scale parameter $\lambda>0$ of a given loss function $\rho$: choosing $\hat\lambda$ in a given interval $I$ to minimize the out-of-sample error estimate of the M-estimator constructed with loss $\rho_\lambda(\cdot) = \lambda^2 \rho(\cdot/\lambda)$ leads to the optimal out-of-sample error over $I$. The proof relies on a smoothing argument: the unregularized M-estimation objective function is perturbed, or smoothed, with a Ridge penalty that vanishes as $n\to+\infty$, and we show that the unregularized M-estimator of interest inherits properties of its smoothed version.
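To make the rescaling concrete: if $\rho$ is the Huber loss with threshold $1$, i.e. $\rho(x) = x^2/2$ for $|x|\le 1$ and $\rho(x) = |x| - 1/2$ otherwise, then $\rho_\lambda(x) = \lambda^2\rho(x/\lambda)$ equals $x^2/2$ for $|x|\le\lambda$ and $\lambda|x| - \lambda^2/2$ otherwise, which is exactly the Huber loss with threshold $\lambda$; tuning $\lambda$ therefore amounts to tuning the transition point between quadratic and absolute-value behaviour.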
Measurement-based quantum computation (MBQC) is a paradigm for quantum computation in which the computation is driven by local measurements on a suitably entangled resource state. In this work we show that MBQC is related to a model of quantum computation based on Clifford quantum cellular automata (CQCA). Specifically, we show that certain MBQCs can be directly constructed from CQCAs, which yields a simple and intuitive circuit-model representation of MBQC in terms of CQCAs. We apply this description to construct various MBQC-based Ans\"atze for parameterized quantum circuits, demonstrating that the different Ans\"atze may lead to significantly different performance on different learning tasks. In this way, MBQC yields a family of hardware-efficient Ans\"atze that may be adapted to specific problem settings and is particularly well suited to architectures with translationally invariant gates, such as neutral atoms.
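As one purely illustrative example of a hardware-efficient, translationally invariant ansatz (not the specific MBQC/CQCA-derived construction of the paper), the following Qiskit sketch repeats the same parameterized single-qubit rotation on every qubit of a layer, followed by a brickwork pattern of entangling gates:

```python
from qiskit.circuit import QuantumCircuit, Parameter

def brickwork_ansatz(n_qubits, n_layers):
    """Generic translationally invariant, hardware-efficient ansatz (illustrative only)."""
    qc = QuantumCircuit(n_qubits)
    for layer in range(n_layers):
        theta = Parameter(f"theta_{layer}")
        for q in range(n_qubits):
            qc.ry(theta, q)               # same parameter on every qubit: translation invariance
        for q in range(0, n_qubits - 1, 2):
            qc.cz(q, q + 1)               # even "brick" of entangling gates
        for q in range(1, n_qubits - 1, 2):
            qc.cz(q, q + 1)               # odd "brick"
    return qc
```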
While generalized linear mixed models (GLMMs) are a fundamental tool in applied statistics, many specifications -- such as those involving categorical factors with many levels or interaction terms -- can be computationally challenging to estimate due to the need to compute or approximate high-dimensional integrals. Variational inference (VI) methods are a popular way to perform such computations, especially in the Bayesian context. However, naive VI methods can provide unreliable uncertainty quantification. We show that this is indeed the case in the GLMM context, proving that standard VI (i.e., mean-field) dramatically underestimates posterior uncertainty in high dimensions. We then show how appropriately relaxing the mean-field assumption leads to VI methods whose uncertainty quantification does not deteriorate in high dimensions and whose total computational cost scales linearly with the number of parameters and observations. Our theoretical and numerical results focus on GLMMs with Gaussian or binomial likelihoods, and rely on connections to random graph theory to obtain sharp high-dimensional asymptotic analysis. We also provide generic results, of independent interest, relating the accuracy of variational inference to the convergence rate of the corresponding coordinate ascent variational inference (CAVI) algorithm for Gaussian targets. Our proposed partially-factorized VI (PF-VI) methodology for GLMMs is implemented in the R package vglmer; see //github.com/mgoplerud/vglmer. Numerical results with simulated and real data examples illustrate the favourable computational cost versus accuracy trade-off of PF-VI.
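Schematically (only a sketch of the general idea, not necessarily the exact factorization used by PF-VI): with fixed effects $\beta$ and random-effect blocks $u_1,\dots,u_K$, mean-field VI posits $q(\beta,u) = q(\beta)\prod_{k=1}^K q(u_k)$, whereas a partially factorized scheme retains some joint factors, e.g. $q(\beta, u) = q(\beta, u_1,\dots,u_J)\prod_{k>J} q(u_k)$, trading a modest increase in computational cost for uncertainty quantification that does not collapse in high dimensions.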
Stepped wedge cluster randomized trials (SW-CRTs), a popular design choice in public health and implementation science research, are randomized trials in which clusters are progressively transitioned from control to intervention, with the timing of transition randomized for each cluster. An important task at the design stage is to ensure that the planned trial has sufficient power to detect a clinically meaningful effect size. While methods for determining study power are well developed for SW-CRTs with continuous and binary outcomes, few methods are available for SW-CRTs with censored time-to-event outcomes. In this article, we propose a stratified marginal Cox model to account for secular trends in cross-sectional SW-CRTs, and derive an explicit expression for the robust sandwich variance to facilitate power calculations without the need for computationally intensive simulations. Power formulas based on both the Wald and robust score tests are developed and compared via simulation, generally demonstrating the superiority of the robust score procedures across different finite-sample scenarios. Finally, we illustrate our methods using an SW-CRT testing the effect of a new electronic reminder system on time to catheter removal in hospital settings. We also offer an R Shiny application to facilitate sample size and power calculations using the proposed methods.
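For orientation, a Wald-type power formula in this setting has the generic structure (a sketch of the standard form, not the paper's exact expression): for a log-hazard-ratio effect $\theta$, two-sided level $\alpha$, and robust variance $\sigma^2_{\mathrm{rob}}$ of $\hat\theta$, $\text{power} \approx \Phi\!\left(|\theta|/\sigma_{\mathrm{rob}} - z_{1-\alpha/2}\right)$; the explicit sandwich-variance expression derived in the paper supplies $\sigma^2_{\mathrm{rob}}$ under the stratified marginal Cox model, avoiding simulation.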
This paper integrates nonlinear-manifold reduced order models (NM-ROMs) with domain decomposition (DD). NM-ROMs approximate the full-order model (FOM) state on a nonlinear manifold by training a shallow, sparse autoencoder on FOM snapshot data. These NM-ROMs can be advantageous over linear-subspace ROMs (LS-ROMs) for problems with slowly decaying Kolmogorov $n$-width. However, the number of NM-ROM parameters that need to be trained scales with the size of the FOM. Moreover, for "extreme-scale" problems, the storage of high-dimensional FOM snapshots alone can make ROM training expensive. To alleviate the training cost, this paper applies DD to the FOM, computes NM-ROMs on each subdomain, and couples them to obtain a global NM-ROM. This approach has several advantages: subdomain NM-ROMs can be trained in parallel, involve fewer parameters than a global NM-ROM, require training data only of the smaller subdomain FOM dimension, and can be tailored to subdomain-specific features of the FOM. The shallow, sparse architecture of the autoencoder used in each subdomain NM-ROM allows the application of hyper-reduction (HR), reducing the complexity caused by nonlinearity and yielding computational speedup of the NM-ROM. This paper provides the first application of NM-ROMs (with HR) to a DD problem. In particular, it details an algebraic DD reformulation of the FOM, trains an NM-ROM with HR for each subdomain, and develops a sequential quadratic programming (SQP) solver to evaluate the coupled global NM-ROM. Theoretical convergence results for the SQP method and a priori and a posteriori error estimates for the DD NM-ROM with HR are provided. The proposed DD NM-ROM with HR approach is numerically compared to a DD LS-ROM with HR on the 2D steady-state Burgers' equation, showing an order-of-magnitude improvement in accuracy of the proposed DD NM-ROM over the DD LS-ROM.
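To indicate the kind of architecture involved (a sketch under assumed sizes and activations, not the paper's exact network, which additionally enforces sparsity in the decoder), a shallow autoencoder for compressing FOM snapshot vectors might look as follows in PyTorch:

```python
import torch
import torch.nn as nn

class ShallowAutoencoder(nn.Module):
    """Illustrative shallow autoencoder for snapshot compression (a sketch only)."""
    def __init__(self, fom_dim, latent_dim, width=128):
        super().__init__()
        # One hidden layer each for encoder and decoder: a "shallow" architecture
        self.encoder = nn.Sequential(nn.Linear(fom_dim, width), nn.ELU(), nn.Linear(width, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, width), nn.ELU(), nn.Linear(width, fom_dim))

    def forward(self, x):
        # Reconstruct the FOM state from its low-dimensional latent coordinates
        return self.decoder(self.encoder(x))

# Training minimizes the snapshot reconstruction error ||x - decoder(encoder(x))||^2,
# e.g. with Adam over the available FOM snapshot data.
```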
An integrated Equation of State (EOS) and strength/pore-crush/damage model framework is provided for modeling near-to-source (near-field) ground-shock response, where large deformations and pressures necessitate coupling the EOS with pressure-dependent plastic yield and damage. Nonlinear pressure dependence of strength up to high pressures is combined with a Modified Cam-Clay-like cap-plasticity model in a way that allows degradation of strength from pore-crush damage, in what we call the ``Yp-Cap'' model. Nonlinear hardening under compaction allows modeling the crush-out of pores in combination with a fully saturated EOS, i.e., modeling partially saturated ground-shock response in which air-filled voids crush. Attention is given to algorithmic clarity and efficiency of the provided model, and the model is employed in example numerical simulations, including finite element simulations of underground explosions, to exemplify its robustness and utility.
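For reference, the classical Modified Cam-Clay yield surface to which the cap component alludes (standard background, not the paper's exact model) can be written as $f(p,q) = q^2/M^2 + p\,(p - p_c) = 0$, where $p$ is the mean (compressive) pressure, $q$ the von Mises equivalent stress, $M$ the slope of the critical-state line, and $p_c$ the preconsolidation (cap) pressure that hardens as pores crush out; the Yp-Cap model modifies this structure with nonlinear pressure-dependent strength and pore-crush damage.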
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve on existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
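As a point of comparison (standard background, not a result of this paper): classical information-theoretic bounds of this kind control the expected generalization gap through the mutual information between the learned hypothesis $W$ and the training sample $S$, e.g. $\bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr| \le \sqrt{2\sigma^2 I(W;S)/n}$ for $\sigma$-sub-Gaussian losses and $n$ training examples; such bounds can be vacuous for deterministic algorithms (where $I(W;S)$ may be infinite) and are hard to estimate, which are precisely the two issues the prediction-based bounds address.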