The problem of sequential anomaly detection and identification is considered in the presence of a sampling constraint. Specifically, multiple data streams are generated by distinct sources and the goal is to quickly identify those that exhibit ``anomalous'' behavior, when it is not possible to sample every source at each time instant. Thus, in addition to a stopping rule, which determines when to stop sampling, and a decision rule, which indicates which sources to identify as anomalous upon stopping, one needs to specify a sampling rule that determines which sources to sample at each time instant. The focus of this work is on ordering sampling rules, which sample those sources whose local test statistics are the smallest among the sources currently estimated as anomalous and the largest among those currently estimated as non-anomalous. It is shown that with an appropriate design, which is specified explicitly, an ordering sampling rule leads to the optimal expected time for stopping, among all policies that satisfy the same sampling and error constraints, to a first-order asymptotic approximation as the false positive and false negative error rates under control both go to zero. This is the first asymptotic optimality result for ordering sampling rules when multiple sources can be sampled per time instant. Moreover, this result is established under a general setup in which the number of anomalies is not required to be known a priori. A novel proof technique is introduced, which unifies different versions of the problem regarding the homogeneity of the sources and prior information on the number of anomalies.
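A minimal sketch of the ordering idea may help fix intuition. Everything below is schematic and hypothetical: the decision threshold (zero), the form of the local statistics, and the way the sampling budget is split between the two groups are placeholder assumptions, not the paper's exact specification.

```python
import numpy as np

def ordering_sampling_rule(stats, k):
    """Pick k sources to sample next, given local test statistics.

    Schematic convention (an assumption, not the paper's exact rule):
    source i is currently estimated anomalous iff stats[i] > 0. Among
    estimated anomalous sources we favor the SMALLEST statistics, among
    estimated non-anomalous sources the LARGEST, i.e. we keep sampling
    the sources whose classification is least settled.
    """
    idx = np.arange(len(stats))
    anomalous = idx[stats > 0]
    normal = idx[stats <= 0]
    # Order each group by closeness to the decision boundary.
    anomalous = anomalous[np.argsort(stats[anomalous])]   # ascending
    normal = normal[np.argsort(stats[normal])[::-1]]      # descending
    # Interleave the two ordered groups and take the first k sources.
    merged = [i for pair in zip(anomalous, normal) for i in pair]
    leftover = list(anomalous[len(normal):]) + list(normal[len(anomalous):])
    return np.array((merged + leftover)[:k])

# Example: 6 sources, sampling budget of 3 per time instant.
stats = np.array([2.5, -1.0, 0.3, -0.1, 4.0, -3.2])
print(ordering_sampling_rule(stats, k=3))
```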
We investigate the use of multilevel Monte Carlo (MLMC) methods for estimating the expectation of discretized random fields. Specifically, we consider a setting in which the input and output vectors of the numerical simulators have inconsistent dimensions across the multilevel hierarchy. This requires the introduction of grid transfer operators borrowed from multigrid methods. Starting from a simple 1D illustration, we demonstrate numerically that the resulting MLMC estimator degrades the estimation of high-frequency components of the discretized expectation field compared to a Monte Carlo (MC) estimator. By adapting mathematical tools initially developed for multigrid methods, we perform a theoretical spectral analysis of the MLMC estimator of the expectation of discretized random fields, in the specific case of linear, symmetric and circulant simulators. This analysis provides a spectral decomposition of the variance into contributions associated with each scale component of the discretized field. We then propose improved MLMC estimators using a filtering mechanism similar to the smoothing process of multigrid methods. The filtering operators improve the estimation of both the small- and large-scale components of the variance, resulting in a reduction of the total variance of the estimator. These improvements are quantified for the specific class of simulators considered in our spectral analysis. The resulting filtered MLMC (F-MLMC) estimator is applied to the problem of estimating the discretized variance field of a diffusion-based covariance operator, which amounts to estimating the expectation of a discretized random field. The numerical experiments support the conclusions of the theoretical analysis even with non-linear simulators, and demonstrate the improvements brought by the proposed F-MLMC estimator compared to both a crude MC and an unfiltered MLMC estimator.
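To make the setting concrete, here is a minimal two-level sketch with a hypothetical circulant "simulator" and a linear-interpolation prolongation standing in for the grid transfer operators; none of these specifics are the paper's actual simulators or operators.

```python
import numpy as np
rng = np.random.default_rng(0)

def simulator(x):
    """Placeholder 1D circulant simulator: a simple local smoothing."""
    return 0.25 * np.roll(x, 1) + 0.5 * x + 0.25 * np.roll(x, -1)

def prolong(xc, nf):
    """Grid transfer (coarse -> fine) by linear interpolation."""
    return np.interp(np.linspace(0, 1, nf), np.linspace(0, 1, len(xc)), xc)

def mlmc_mean(nf=64, nc=32, n_fine=50, n_coarse=500):
    """Two-level MLMC estimate of E[simulator(X)] on the fine grid.

    Coarse level: many cheap samples on nc points, prolongated to nf.
    Fine level: few samples of the correction u_f - P(u_c), with the
    coarse input coupled to the fine one by restriction.
    """
    coarse = np.mean([prolong(simulator(rng.standard_normal(nc)), nf)
                      for _ in range(n_coarse)], axis=0)
    corr = np.zeros(nf)
    for _ in range(n_fine):
        xf = rng.standard_normal(nf)
        xc = xf[::nf // nc]              # restriction to the coarse grid
        corr += simulator(xf) - prolong(simulator(xc), nf)
    return coarse + corr / n_fine

print(mlmc_mean()[:5])
```

The interpolation step is exactly where the high-frequency content is at risk: a coarse-grid sample carries no information above the coarse Nyquist frequency, which is the behavior the paper's spectral analysis quantifies and its filtering mechanism mitigates.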
This paper presents methods to study the causal effect of a binary treatment on a functional outcome with observational data. We define a Functional Average Treatment Effect (FATE) and develop an outcome regression estimator. We show how to obtain valid inference on the FATE using simultaneous confidence bands, which cover the FATE with a given probability over the entire domain. Simulation experiments illustrate how the simultaneous confidence bands take the multiple comparison problem into account. Finally, we use the methods to infer the effect of early adult location on subsequent income development for one Swedish birth cohort.
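A rough sketch of an outcome-regression estimator for a functional estimand, with pointwise linear models standing in for the paper's regression specification (the model class, grid, and data below are illustrative assumptions):

```python
import numpy as np

def fate_outcome_regression(Y, A, X):
    """Outcome-regression estimate of a FATE on a discretized domain.

    Y : (n, T) functional outcomes on a grid of T points
    A : (n,)   binary treatment
    X : (n, p) confounders
    Schematic: fit a separate linear outcome model per arm and per grid
    point (a stand-in for the paper's estimator), then average the
    predicted counterfactual contrast over the empirical X-distribution.
    """
    n, T = Y.shape
    Xd = np.column_stack([np.ones(n), X])       # add intercept
    tau = np.empty(T)
    for t in range(T):
        b1, *_ = np.linalg.lstsq(Xd[A == 1], Y[A == 1, t], rcond=None)
        b0, *_ = np.linalg.lstsq(Xd[A == 0], Y[A == 0, t], rcond=None)
        tau[t] = np.mean(Xd @ b1 - Xd @ b0)     # plug-in FATE at grid point t
    return tau

# Toy usage with synthetic data.
rng = np.random.default_rng(1)
n, T, p = 200, 30, 2
X = rng.standard_normal((n, p))
A = rng.integers(0, 2, n)
Y = np.outer(A, np.linspace(0, 1, T)) + X @ rng.standard_normal((p, T)) * 0.1 \
    + rng.standard_normal((n, T)) * 0.1
print(fate_outcome_regression(Y, A, X).round(2)[:5])
```

The simultaneous confidence bands then need to control coverage jointly over all T grid points at once, rather than pointwise, which is the multiple comparison issue the abstract refers to.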
One of the main challenges of multimodal learning is the need to combine heterogeneous modalities (e.g., video, audio, text). For example, video and audio are obtained at much higher rates than text and are roughly aligned in time. They are often not synchronized with text, which comes as a global context, e.g., a title or a description. Furthermore, video and audio inputs are of much larger volumes, and grow as the video length increases, which naturally requires more compute dedicated to these modalities and makes modeling of long-range dependencies harder. We here decouple the multimodal modeling, dividing it into separate, focused autoregressive models, processing the inputs according to the characteristics of the modalities. We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video), and an autoregressive component for the context modalities which are not necessarily aligned in time but are still sequential. To address the long sequences of the video-audio inputs, we propose to further partition the video and audio sequences into consecutive snippets and autoregressively process their representations. To that end, we propose a Combiner mechanism, which models the audio-video information jointly within a timeframe. The Combiner learns to extract audio and video features from raw spatio-temporal signals, and then learns to fuse these features producing compact but expressive representations per snippet. Our approach achieves the state-of-the-art on well-established multimodal benchmarks, outperforming much larger models. It effectively addresses the high computational demand of media inputs by learning compact representations, controlling the sequence length of the audio-video feature representations, and modeling their dependencies in time.
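A highly simplified sketch of the Combiner idea, reduced to a single attention-pooling step in numpy: a small set of learned query tokens compresses the many raw tokens of one snippet into a few compact ones. The dimensions, the single-head attention, and the token counts are illustrative assumptions, not the actual Mirasol3B architecture.

```python
import numpy as np
rng = np.random.default_rng(0)

def combiner(snippet_feats, queries):
    """Schematic Combiner step: m learned query tokens attend over the
    raw audio+video tokens of one snippet, returning m compact tokens.
    snippet_feats : (n_tokens, d) concatenated audio/video features
    queries       : (m, d)        learned latent tokens (m << n_tokens)
    """
    d = queries.shape[1]
    att = queries @ snippet_feats.T / np.sqrt(d)   # (m, n_tokens)
    att = np.exp(att - att.max(axis=1, keepdims=True))
    att /= att.sum(axis=1, keepdims=True)          # softmax over tokens
    return att @ snippet_feats                     # (m, d)

# A clip split into 5 snippets; each snippet's 256 raw audio+video
# tokens are compressed to 8 tokens before the autoregressive model.
d, m = 64, 8
queries = rng.standard_normal((m, d)) * 0.02
snippets = [rng.standard_normal((256, d)) for _ in range(5)]
compact = [combiner(s, queries) for s in snippets]   # 5 x (8, 64)
print(compact[0].shape)
```

The key property is that the autoregressive model over time then sees a sequence whose length grows with the number of snippets, not with the number of raw audio-video tokens.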
We consider the general problem of Bayesian binary regression and we introduce a new class of distributions, the Perturbed Unified Skew Normal (pSUN, henceforth), which generalizes the Unified Skew-Normal (SUN) class. We show that the new class is conjugate to any binary regression model, provided that the link function may be expressed as a scale mixture of Gaussian densities. We discuss in detail the popular logit case, and we show that, when a logistic regression model is combined with a Gaussian prior, posterior summaries such as cumulants and normalizing constants can be easily obtained through the use of an importance sampling approach, opening the way to straightforward variable selection procedures. For more general priors, the proposed methodology is based on a simple Gibbs sampler algorithm. We also claim that, in the $p > n$ case, the proposed methodology shows better performance, in terms of both mixing and accuracy, than existing methods. We illustrate the performance through several simulation studies and two data analyses.
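As a minimal sketch of the importance sampling claim, consider the crudest possible choice of proposal, the Gaussian prior itself (the paper's pSUN-based construction is more refined; the prior scale and data below are placeholders):

```python
import numpy as np
rng = np.random.default_rng(0)

def is_posterior_summaries(X, y, tau=1.0, n_draws=20000):
    """Prior-as-proposal importance sampling for Bayesian logistic
    regression with a N(0, tau^2 I) prior. With the prior as proposal,
    the importance weights are just the likelihoods, so the weight mean
    estimates the normalizing constant (marginal likelihood)."""
    p = X.shape[1]
    beta = rng.standard_normal((n_draws, p)) * tau   # draws from the prior
    eta = beta @ X.T                                 # (n_draws, n)
    # log-likelihood of each draw: sum_i [ y*eta - log(1 + exp(eta)) ]
    loglik = (y * eta - np.logaddexp(0.0, eta)).sum(axis=1)
    w = np.exp(loglik - loglik.max())
    marglik = w.mean() * np.exp(loglik.max())        # normalizing constant
    post_mean = (w[:, None] * beta).sum(axis=0) / w.sum()
    return marglik, post_mean

X = rng.standard_normal((50, 2))
y = (X @ np.array([1.0, -0.5]) + 0.5 * rng.standard_normal(50) > 0).astype(float)
print(is_posterior_summaries(X, y))
```

Normalizing constants computed this way are exactly what variable selection via Bayes factors requires, which is the link the abstract draws to straightforward variable selection procedures.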
Many practical medical problems involve data that combine sparse and non-sparse structures. Traditional penalized regularization techniques, primarily designed for promoting sparsity, are inadequate for capturing the optimal solutions in such scenarios. To address these challenges, this paper introduces a novel algorithm named Non-sparse Iteration (NSI). The NSI algorithm allows for the existence of both sparse and non-sparse structures and estimates them simultaneously and accurately. We provide theoretical guarantees that the proposed algorithm converges to the oracle solution and achieves the optimal rate for the upper bound of the $l_2$-norm error. Through simulations and practical applications, NSI consistently exhibits superior statistical performance in terms of estimation accuracy, prediction efficacy, and variable selection compared to several existing methods. The proposed method is also applied to breast cancer data, revealing repeated selection of specific genes for in-depth analysis.
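To illustrate the sparse-plus-non-sparse setting (this is explicitly not the NSI algorithm, only a generic alternating scheme for the same kind of decomposition, with all penalties and data hypothetical):

```python
import numpy as np
rng = np.random.default_rng(0)

def sparse_plus_dense(X, y, lam=0.1, mu=1.0, n_iter=500):
    """Generic illustration of the sparse-plus-non-sparse problem (NOT
    NSI): decompose the coefficients as theta = beta + gamma with an l1
    penalty on beta (sparse part) and an l2 penalty on gamma (dense
    part), fitted by alternating proximal and ridge updates."""
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant
    for _ in range(n_iter):
        r = y - X @ (beta + gamma)
        z = beta + step * (X.T @ r)
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        # exact ridge update for the dense part, given beta
        gamma = np.linalg.solve(X.T @ X + mu * np.eye(p), X.T @ (y - X @ beta))
    return beta, gamma

X = rng.standard_normal((100, 10))
theta = np.r_[np.zeros(8), [3.0, -2.0]] + 0.1 * rng.standard_normal(10)
y = X @ theta + 0.1 * rng.standard_normal(100)
print(sparse_plus_dense(X, y))
```

A pure lasso applied to such data must either shrink away the dense component or abandon sparsity, which is the inadequacy of sparsity-only penalties that the abstract points to.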
Brain simulation builds dynamical models to mimic the structure and functions of the brain, while brain-inspired computing (BIC) develops intelligent systems by learning from the structure and functions of the brain. The two fields are intertwined and should share a common programming framework to facilitate each other's development. However, none of the existing software in the fields can achieve this goal, because traditional brain simulators lack differentiability for training, while existing deep learning (DL) frameworks fail to capture the biophysical realism and complexity of brain dynamics. In this paper, we introduce BrainPy, a differentiable brain simulator developed using JAX and XLA, with the aim of bridging the gap between brain simulation and BIC. BrainPy expands upon the functionalities of JAX, a powerful AI framework, by introducing complete capabilities for flexible, efficient, and scalable brain simulation. It offers a range of sparse and event-driven operators for efficient and scalable brain simulation, an abstraction for managing the intricacies of synaptic computations, a modular and flexible interface for constructing multi-scale brain models, and an object-oriented just-in-time compilation approach to handle the memory-intensive nature of brain dynamics. We showcase the efficiency and scalability of BrainPy on benchmark tasks, highlight its differentiable simulation for biologically plausible spiking models, and discuss its potential to support research at the intersection of brain simulation and BIC.
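For a sense of the dynamics such a simulator integrates, here is a minimal leaky integrate-and-fire network in plain NumPy. This is schematic only: BrainPy itself builds models like this on JAX, with JIT compilation, event-driven sparse operators, and surrogate gradients for training, none of which appear in this sketch.

```python
import numpy as np
rng = np.random.default_rng(0)

def simulate_lif(W, n_steps=1000, dt=0.1, tau=10.0, v_th=1.0, i_ext=0.12):
    """Minimal leaky integrate-and-fire network (schematic).
    W : (n, n) synaptic weight matrix."""
    n = W.shape[0]
    v = np.zeros(n)
    spikes = np.zeros((n_steps, n), dtype=bool)
    for t in range(n_steps):
        # leak + external drive + recurrent input from last step's spikes
        rec = W @ spikes[t - 1] if t > 0 else 0.0
        v += dt / tau * (-v) + i_ext + rec
        spikes[t] = v >= v_th
        v[spikes[t]] = 0.0                 # reset membrane after a spike
    return spikes

W = 0.05 * (rng.random((100, 100)) < 0.1)  # sparse random coupling
print(simulate_lif(W).sum(), "spikes")
```

The recurrent term `W @ spikes` is where the memory and compute pressure comes from at scale, since spikes are sparse and event-driven; this is the kind of operation the abstract's dedicated sparse operators target.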
The development of cubical type theory inspired the idea of "extension types", which has been found to have applications in other type theories that are unrelated to homotopy type theory or cubical type theory. This article describes these applications, including records, metaprogramming, the control of unfolding, and some more exotic ones.
The expanding hardware diversity in high performance computing adds enormous complexity to scientific software development. Developers who aim to write maintainable software have two options: 1) use a so-called data locality abstraction that handles portability internally, thereby trading off performance against productivity. Such abstractions usually come in the form of libraries, domain-specific languages, and run-time systems. 2) Use generic programming, where performance, productivity and portability are subject to software design. Following the second option, this work describes a design approach that allows the integration of low-level and verbose programming tools into high-level generic algorithms based on template meta-programming in C++. This enables the development of performance-portable applications targeting host-device computer architectures, such as CPUs and GPUs. With a suitable design in place, the extensibility of generic algorithms to new hardware becomes a well-defined procedure that can be developed in isolation from other parts of the code. This allows scientific software to remain maintainable and efficient in a period of diversifying hardware in HPC. As proof of concept, a finite-difference modelling algorithm for the acoustic wave equation is developed and benchmarked using roofline model analysis on an Intel Xeon Gold 6248 CPU, an Nvidia Tesla V100 GPU, and an AMD MI100 GPU.
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms rely on prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our algorithm uses a new differentiable parametrization to learn a directed acyclic graph (DAG) together with a latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the mixtures with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested on a synthetic dataset and a scientific dataset, our results demonstrate the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
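To see how a DAG constraint can enter a differentiable loss at all, a standard device from the literature is the smooth acyclicity characterization of NOTEARS (Zheng et al., 2018); the paper introduces its own, different parametrization, so the following is background rather than the method itself.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(A):
    """NOTEARS-style smooth acyclicity measure:
    h(A) = tr(exp(A * A)) - d  (elementwise square inside the matrix
    exponential), which is zero iff the weighted adjacency matrix A
    encodes a DAG. Because h is differentiable, a DAG structure can be
    learned jointly with a differentiable objective such as an ELBO."""
    d = A.shape[0]
    return np.trace(expm(A * A)) - d

dag = np.array([[0.0, 1.0], [0.0, 0.0]])   # 0 -> 1, acyclic
cyc = np.array([[0.0, 1.0], [1.0, 0.0]])   # 0 <-> 1, cyclic
print(acyclicity(dag), acyclicity(cyc))    # ~0 vs. > 0
```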
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
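Shepard's law states that generalization decays exponentially with distance in a psychological similarity space. A minimal sketch of that formalization follows; the Euclidean distance, the sensitivity parameter, and the toy saliency vectors are illustrative assumptions, not the study's actual similarity space.

```python
import numpy as np

def generalization(x, y, c=1.0):
    """Shepard's universal law of generalization: the probability that a
    response to stimulus x generalizes to stimulus y decays
    exponentially with their distance in psychological space (here
    Euclidean, with sensitivity c; both are modeling assumptions)."""
    return np.exp(-c * np.linalg.norm(np.asarray(x) - np.asarray(y)))

# An explainee compares the AI's saliency map to the one they would
# produce themselves: close maps -> high predicted agreement.
own_map = np.array([0.9, 0.1, 0.0])
ai_map = np.array([0.7, 0.2, 0.1])
print(generalization(own_map, ai_map))   # ~0.78
```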