Self-Healing Cyber-Physical Systems (SH-CPS) effectively recover from system perceived failures without human intervention. They ensure a level of resilience and tolerance to unforeseen situations that arise from intrinsic system and component degradation, errors, or malicious attacks. Implicit redundancy can be exploited in SH-CPS to structurally adapt without the need to explicitly duplicate components. However, implicitly redundant components do not guarantee the same level of dependability as the primary component used to provide for a given function. Additional processes are needed to restore critical system functionalities as desired. This work introduces implicit guarantees to ensure the dependability of implicitly redundant components and processes. Implicit guarantees can be obtained through inheritance and decomposition. Therefore, a level of dependability can be guaranteed in SH-CPS after adaptation and recovery while complying with requirements. We demonstrate compliance with the requirement guarantees while ensuring resilience in SH-CPS.
Black-box zero-th order optimization is a central primitive for applications in fields as diverse as finance, physics, and engineering. In a common formulation of this problem, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. In this paper, we study scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions that are tried throughout the optimization process. Focusing on methods based on Bayesian optimization (BO), prior art has introduced an optimization scheme -- referred to as SAFEOPT -- that is guaranteed not to select any unsafe solution with a controllable probability over feedback noise as long as strict assumptions on the safety constraint function are met. In this paper, a novel BO-based approach is introduced that satisfies safety requirements irrespective of properties of the constraint function. This strong theoretical guarantee is obtained at the cost of allowing for an arbitrary, controllable but non-zero, rate of violation of the safety constraint. The proposed method, referred to as SAFE-BOCP, builds on online conformal prediction (CP) and is specialized to the cases in which feedback on the safety constraint is either noiseless or noisy. Experimental results on synthetic and real-world data validate the advantages and flexibility of the proposed SAFE-BOCP.
Magnitude pruning is one of the mainstream methods in lightweight architecture design whose goal is to extract subnetworks with the largest weight connections. This method is known to be successful, but under very high pruning regimes, it suffers from topological inconsistency which renders the extracted subnetworks disconnected, and this hinders their generalization ability. In this paper, we devise a novel magnitude pruning method that allows extracting subnetworks while guarantying their topological consistency. The latter ensures that only accessible and co-accessible -- impactful -- connections are kept in the resulting lightweight networks. Our solution is based on a novel reparametrization and two supervisory bi-directional networks which implement accessibility/co-accessibility and guarantee that only connected subnetworks will be selected during training. This solution allows enhancing generalization significantly, under very high pruning regimes, as corroborated through extensive experiments, involving graph convolutional networks, on the challenging task of skeleton-based action recognition.
Randomized controlled trials (RCTs) are the gold standard for causal inference, but they are often powered only for average effects, making estimation of heterogeneous treatment effects (HTEs) challenging. Conversely, large-scale observational studies (OS) offer a wealth of data but suffer from confounding bias. Our paper presents a novel framework to leverage OS data for enhancing the efficiency in estimating conditional average treatment effects (CATEs) from RCTs while mitigating common biases. We propose an innovative approach to combine RCTs and OS data, expanding the traditionally used control arms from external sources. The framework relaxes the typical assumption of CATE invariance across populations, acknowledging the often unaccounted systematic differences between RCT and OS participants. We demonstrate this through the special case of a linear outcome model, where the CATE is sparsely different between the two populations. The core of our framework relies on learning potential outcome means from OS data and using them as a nuisance parameter in CATE estimation from RCT data. We further illustrate through experiments that using OS findings reduces the variance of the estimated CATE from RCTs and can decrease the required sample size for detecting HTEs.
Given two strings $A[1..n]$ and $B[1..m]$, and a set of operations allowed to edit the strings, the edit distance between $A$ and $B$ is the minimum number of operations required to transform $A$ into $B$. Sequentially, a standard Dynamic Programming (DP) algorithm solves edit distance with $\Theta(nm)$ cost. In many real-world applications, the strings to be compared are similar and have small edit distances. To achieve highly practical implementations, we focus on output-sensitive parallel edit-distance algorithms, i.e., to achieve asymptotically better cost bounds than the standard $\Theta(nm)$ algorithm when the edit distance is small. We study four algorithms in the paper, including three algorithms based on Breadth-First Search (BFS) and one algorithm based on Divide-and-Conquer (DaC). Our BFS-based solution is based on the Landau-Vishkin algorithm. We implement three different data structures for the longest common prefix (LCP) queries needed in the algorithm: the classic solution using parallel suffix array, and two hash-based solutions proposed in this paper. Our DaC-based solution is inspired by the output-insensitive solution proposed by Apostolico et al., and we propose a non-trivial adaption to make it output-sensitive. All our algorithms have good theoretical guarantees, and they achieve different tradeoffs between work (total number of operations), span (longest dependence chain in the computation), and space. We test and compare our algorithms on both synthetic data and real-world data. Our BFS-based algorithms outperform the existing parallel edit-distance implementation in ParlayLib in all test cases. By comparing our algorithms, we also provide a better understanding of the choice of algorithms for different input patterns. We believe that our paper is the first systematic study in the theory and practice of parallel edit distance.
This work proposes a discretization of the acoustic wave equation with possibly oscillatory coefficients based on a superposition of discrete solutions to spatially localized subproblems computed with an implicit time discretization. Based on exponentially decaying entries of the global system matrices and an appropriate partition of unity, it is proved that the superposition of localized solutions is appropriately close to the solution of the (global) implicit scheme. It is thereby justified that the localized (and especially parallel) computation on multiple overlapping subdomains is reasonable. Moreover, a re-start is introduced after a certain amount of time steps to maintain a moderate overlap of the subdomains. Overall, the approach may be understood as a domain decomposition strategy (in space and time) that completely avoids inner iterations. Numerical examples are presented.
We propose a causal framework for decomposing a group disparity in an outcome in terms of an intermediate treatment variable. Our framework captures the contributions of group differences in baseline potential outcome, treatment prevalence, average treatment effect, and selection into treatment. This framework is counterfactually formulated and readily informs policy interventions. The decomposition component for differential selection into treatment is particularly novel, revealing a new mechanism for explaining and ameliorating disparities. This framework reformulates the classic Kitagawa-Blinder-Oaxaca decomposition in causal terms, supplements causal mediation analysis by explaining group disparities instead of group effects, and resolves conceptual difficulties of recent random equalization decompositions. We also provide a conditional decomposition that allows researchers to incorporate covariates in defining the estimands and corresponding interventions. We develop nonparametric estimators based on efficient influence functions of the decompositions. We show that, under mild conditions, these estimators are $\sqrt{n}$-consistent, asymptotically normal, semiparametrically efficient, and doubly robust. We apply our framework to study the causal role of education in intergenerational income persistence. We find that both differential prevalence of and differential selection into college graduation significantly contribute to the disparity in income attainment between income origin groups.
Mobile digital health (mHealth) studies often collect multiple within-day self-reported assessments of participants' behaviour and health. Indexed by time of day, these assessments can be treated as functional observations of continuous, truncated, ordinal, and binary type. We develop covariance estimation and principal component analysis for mixed-type functional data like that. We propose a semiparametric Gaussian copula model that assumes a generalized latent non-paranormal process generating observed mixed-type functional data and defining temporal dependence via a latent covariance. The smooth estimate of latent covariance is constructed via Kendall's Tau bridging method that incorporates smoothness within the bridging step. The approach is then extended with methods for handling both dense and sparse sampling designs, calculating subject-specific latent representations of observed data, latent principal components and principal component scores. Importantly, the proposed framework handles all four mixed types in a unified way. Simulation studies show a competitive performance of the proposed method under both dense and sparse sampling designs. The method is applied to data from 497 participants of National Institute of Mental Health Family Study of the Mood Disorder Spectrum to characterize the differences in within-day temporal patterns of mood in individuals with the major mood disorder subtypes including Major Depressive Disorder, and Type 1 and 2 Bipolar Disorder.
Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of LLMs for planning and invoking models or APIs to address more general multi-modal user queries. Despite this progress, complex visual-based tasks still remain challenging due to the diverse nature of visual tasks. This diversity is reflected in two aspects: 1) Reasoning paths. For many real-life applications, it is hard to accurately decompose a query simply by examining the query itself. Planning based on the specific visual content and the results of each step is usually required. 2) Flexible inputs and intermediate results. Input forms could be flexible for in-the-wild cases, and involves not only a single image or video but a mixture of videos and images, e.g., a user-view image with some reference videos. Besides, a complex reasoning process will also generate diverse multimodal intermediate results, e.g., video narrations, segmented video clips, etc. To address such general cases, we propose a multi-modal AI assistant, AssistGPT, with an interleaved code and language reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate LLMs with various tools. Specifically, the Planner is capable of using natural language to plan which tool in Executor should do next based on the current reasoning progress. Inspector is an efficient memory manager to assist the Planner to feed proper visual information into a specific tool. Finally, since the entire reasoning process is complex and flexible, a Learner is designed to enable the model to autonomously explore and discover the optimal solution. We conducted experiments on A-OKVQA and NExT-QA benchmarks, achieving state-of-the-art results. Moreover, showcases demonstrate the ability of our system to handle questions far more complex than those found in the benchmarks.
Simulation-based inference (SBI) methods tackle complex scientific models with challenging inverse problems. However, SBI models often face a significant hurdle due to their non-differentiable nature, which hampers the use of gradient-based optimization techniques. Bayesian Optimal Experimental Design (BOED) is a powerful approach that aims to make the most efficient use of experimental resources for improved inferences. While stochastic gradient BOED methods have shown promising results in high-dimensional design problems, they have mostly neglected the integration of BOED with SBI due to the difficult non-differentiable property of many SBI simulators. In this work, we establish a crucial connection between ratio-based SBI inference algorithms and stochastic gradient-based variational inference by leveraging mutual information bounds. This connection allows us to extend BOED to SBI applications, enabling the simultaneous optimization of experimental designs and amortized inference functions. We demonstrate our approach on a simple linear model and offer implementation details for practitioners.
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.