We provide a refined characterization of the super-Turing computational power of analog, evolving, and stochastic neural networks based on the Kolmogorov complexity of their real weights, evolving weights, and real probabilities, respectively. First, we retrieve an infinite hierarchy of classes of analog networks defined in terms of the Kolmogorov complexity of their underlying real weights. This hierarchy is located between the complexity classes $\mathbf{P}$ and $\mathbf{P/poly}$. Then, we generalize this result to the case of evolving networks. A similar hierarchy of Kolomogorov-based complexity classes of evolving networks is obtained. This hierarchy also lies between $\mathbf{P}$ and $\mathbf{P/poly}$. Finally, we extend these results to the case of stochastic networks employing real probabilities as source of randomness. An infinite hierarchy of stochastic networks based on the Kolmogorov complexity of their probabilities is therefore achieved. In this case, the hierarchy bridges the gap between $\mathbf{BPP}$ and $\mathbf{BPP/log^*}$. Beyond proving the existence and providing examples of such hierarchies, we describe a generic way of constructing them based on classes of functions of increasing complexity. For the sake of clarity, this study is formulated within the framework of echo state networks. Overall, this paper intends to fill the missing results and provide a unified view about the refined capabilities of analog, evolving and stochastic neural networks.
Randomized experiments are a powerful methodology for data-driven evaluation of decisions or interventions. Yet, their validity may be undermined by network interference. This occurs when the treatment of one unit impacts not only its outcome but also that of connected units, biasing traditional treatment effect estimations. Our study introduces a new framework to accommodate complex and unknown network interference, moving beyond specialized models in the existing literature. Our framework, which we term causal message-passing, is grounded in a high-dimensional approximate message passing methodology and is specifically tailored to experimental design settings with prevalent network interference. Utilizing causal message-passing, we present a practical algorithm for estimating the total treatment effect and demonstrate its efficacy in four numerical scenarios, each with its unique interference structure.
We developed a new approach comprised of different visualizations for the comparative spatio-temporal analysis of displacement processes in porous media. We aim to analyze and compare ensemble datasets from experiments to gain insight into the influence of different parameters on fluid flow. To capture the displacement of a defending fluid by an invading fluid, we first condense an input image series to a single time map. From this map, we generate a spatio-temporal flow graph covering the whole process. This graph is further simplified to only reflect topological changes in the movement of the invading fluid. Our interactive tools allow the visual analysis of these processes by visualizing the graph structure and the context of the experimental setup, as well as by providing charts for multiple metrics. We apply our approach to analyze and compare ensemble datasets jointly with domain experts, where we vary either fluid properties or the solid structure of the porous medium. We finally report the generated insights from the domain experts and discuss our contribution's advantages, generality, and limitations.
Differentiable Filters, as recursive Bayesian estimators, possess the ability to learn complex dynamics by deriving state transition and measurement models exclusively from data. This data-driven approach eliminates the reliance on explicit analytical models while maintaining the essential algorithmic components of the filtering process. However, the gain mechanism remains non-differentiable, limiting its adaptability to specific task requirements and contextual variations. To address this limitation, this paper introduces an innovative approach called {\alpha}-MDF (Attention-based Multimodal Differentiable Filter). {\alpha}-MDF leverages modern attention mechanisms to learn multimodal latent representations for accurate state estimation in soft robots. By incorporating attention mechanisms, {\alpha}-MDF offers the flexibility to tailor the gain mechanism to the unique nature of the task and context. The effectiveness of {\alpha}-MDF is validated through real-world state estimation tasks on soft robots. Our experimental results demonstrate significant reductions in state estimation errors, consistently surpassing differentiable filter baselines by up to 45% in the domain of soft robotics.
This paper considers the secure aggregation problem for federated learning under an information theoretic cryptographic formulation, where distributed training nodes (referred to as users) train models based on their own local data and a curious-but-honest server aggregates the trained models without retrieving other information about users' local data. Secure aggregation generally contains two phases, namely key sharing phase and model aggregation phase. Due to the common effect of user dropouts in federated learning, the model aggregation phase should contain two rounds, where in the first round the users transmit masked models and, in the second round, according to the identity of surviving users after the first round, these surviving users transmit some further messages to help the server decrypt the sum of users' trained models. The objective of the considered information theoretic formulation is to characterize the capacity region of the communication rates in the two rounds from the users to the server in the model aggregation phase, assuming that key sharing has already been performed offline in prior. In this context, Zhao and Sun completely characterized the capacity region under the assumption that the keys can be arbitrary random variables. More recently, an additional constraint, known as "uncoded groupwise keys," has been introduced. This constraint entails the presence of multiple independent keys within the system, with each key being shared by precisely S users. The capacity region for the information-theoretic secure aggregation problem with uncoded groupwise keys was established in our recent work subject to the condition S > K - U, where K is the number of total users and U is the designed minimum number of surviving users. In this paper we fully characterize of the the capacity region for this problem by proposing a new converse bound and an achievable scheme.
Research into methods for improving the performance of large language models (LLMs) through fine-tuning, retrieval-augmented generation (RAG) and soft-prompting has tended to focus on the use of highly technical or high-cost techniques, making many of the newly discovered approaches comparatively inaccessible to non-technical users. In this paper we tested an unmodified version of GPT 3.5, a fine-tuned version, and the same unmodified model when given access to a vectorised RAG database, both in isolation and in combination with a basic, non-algorithmic soft prompt. In each case we tested the model's ability to answer a set of 100 questions relating primarily to events that occurred after September 2021 (the point at which GPT 3.5's training data set ends). We found that if commercial platforms are used and default settings are applied with no iteration in order to establish a baseline set of outputs, a fine-tuned model outperforms GPT 3.5 Turbo, while the RAG approach out-performed both. The application of a soft prompt significantly improved the performance of each approach.
We propose a multi-agent reinforcement learning dynamics, and analyze its convergence in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized setting, where players do not have knowledge of the game model and cannot coordinate. In each stage, players update their estimate of Q-function that evaluates their total contingent payoff based on the realized one-stage reward in an asynchronous manner. Then, players independently update their policies by incorporating an optimal one-stage deviation strategy based on the estimated Q-function. A key feature of the learning dynamics is that the Q-function estimates are updated at a faster timescale than the policies. We prove that the policies induced by our learning dynamics converge to the set of stationary Nash equilibria in Markov potential games with probability 1. Our results highlight the efficacy of simple learning dynamics in reaching to the set of stationary Nash equilibrium even in environments with minimal information available.
We propose a new Bayesian heteroskedastic Markov-switching structural vector autoregression with data-driven time-varying identification. The model selects alternative exclusion restrictions over time and, as a condition for the search, allows to verify identification through heteroskedasticity within each regime. Based on four alternative monetary policy rules, we show that a monthly six-variable system supports time variation in US monetary policy shock identification. In the sample-dominating first regime, systematic monetary policy follows a Taylor rule extended by the term spread and is effective in curbing inflation. In the second regime, occurring after 2000 and gaining more persistence after the global financial and COVID crises, the Fed acts according to a money-augmented Taylor rule. This regime's unconventional monetary policy provides economic stimulus, features the liquidity effect, and is complemented by a pure term spread shock. Absent the specific monetary policy of the second regime, inflation would be over one percentage point higher on average after 2008.
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks and develop a unified framework called Scientific Information Extractor (SciIE) for with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.