This paper introduces a novel bio-mimetic approach for distributed control of robotic swarms, inspired by collective behaviors found in nature, such as schools of fish and flocks of birds. The agents are assumed to have limited sensory perception, lack memory, be identical and anonymous, and operate without explicit inter-agent communication. Despite these limitations, we demonstrate that collaborative exploration and task allocation can be achieved by applying simple local rules of interaction between the agents. A comprehensive model comprising agent, formation, and swarm layers is proposed, where each layer performs a specific function in shaping the swarm's collective behavior, thereby contributing to the emergence of the anticipated behaviors. We combine four principles in the design of the distributed control process: Cohesiveness, Flexibility, Attraction-Repulsion, and Peristaltic Motion. We design the control algorithms as reactive behaviors that enable the swarm to maintain connectivity, adapt to dynamic environments, spread out and cover a region whose size is determined by the number of agents, and respond to various local task requirements. We explore simple broadcast-control-based steering methods that induce "anonymous ad-hoc leaders" among the agents, capable of guiding the swarm toward as-yet-unexplored regions containing further tasks. Our analysis is complemented by simulations validating the efficacy of our algorithms. Experiments with various scenarios showcase the swarm's capability to self-organize and perform tasks effectively under the proposed framework. Possible implementations include domains that necessitate emergent coordination and control in multi-agent systems, without the need for advanced individual abilities or direct communication.
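To make the local-rule idea concrete, the following is a minimal sketch (not the paper's controller) of an attraction-repulsion interaction rule; the sensing range, equilibrium distance, and gains are illustrative parameters introduced for this example.

```python
import numpy as np

# Minimal attraction-repulsion sketch: each agent senses only neighbors within
# range R and moves under a spring-like attraction toward the equilibrium
# distance D_EQ plus an inverse-distance repulsion. All constants are
# illustrative, not taken from the paper.
R, D_EQ, K_ATT, K_REP, DT = 3.0, 1.0, 0.1, 0.1, 0.05

def step(pos):
    """One synchronous update of all agent positions (pos: N x 2 array)."""
    new_pos = pos.copy()
    for i in range(len(pos)):
        vel = np.zeros(2)
        for j in range(len(pos)):
            if i == j:
                continue
            diff = pos[j] - pos[i]
            dist = np.linalg.norm(diff)
            if dist < R:  # anonymous, memoryless, purely local sensing
                unit = diff / dist
                vel += (K_ATT * (dist - D_EQ) - K_REP / dist) * unit
        new_pos[i] += DT * vel
    return new_pos

swarm = np.random.default_rng(0).uniform(0.0, 5.0, size=(20, 2))
for _ in range(200):
    swarm = step(swarm)
```

Under such a rule the swarm contracts when dispersed and spreads when crowded, the mechanism underlying the Cohesiveness and Attraction-Repulsion principles above.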
In a recent work, Gryaznov, Pudl\'{a}k, and Talebanfard (CCC 2022) introduced a stronger version of affine extractors known as directional affine extractors, together with a generalization of $\mathsf{ROBP}$s where each node can make linear queries, and showed that the former implies strong lower bounds for a certain type of the latter known as strongly read-once linear branching programs ($\mathsf{SROLBP}$s). Their main result gives explicit constructions of directional affine extractors for entropy $k > 2n/3$, which implies average-case complexity $2^{n/3-o(n)}$ against $\mathsf{SROLBP}$s with exponentially small correlation. A follow-up work by Chattopadhyay and Liao (ECCC 2022) improves the hardness to $2^{n-o(n)}$ at the price of allowing polynomially large correlation. In this paper we show: an explicit construction of directional affine extractors with $k=o(n)$ and exponentially small error, which gives average-case complexity $2^{n-o(n)}$ against $\mathsf{SROLBP}$s with exponentially small correlation, thus answering the two open questions raised in previous works; and an explicit function in $\mathsf{AC}^0$ that gives average-case complexity $2^{(1-\delta)n}$ against $\mathsf{ROBP}$s with negligible correlation, for any constant $\delta>0$. Previously, no such average-case hardness was known, and the best size lower bound for any function in $\mathsf{AC}^0$ against $\mathsf{ROBP}$s was $2^{\Omega(n)}$. One of the key ingredients in our constructions is a new linear somewhere condenser for affine sources, based on dimension expanders. The condenser also leads to an unconditional improvement of the entropy requirement of explicit affine extractors with negligible error. We further show that the condenser also works for general weak random sources, under the Polynomial Freiman-Ruzsa Theorem in $\mathsf{F}_2^n$.
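For context, the central notion can be stated as follows (a paraphrase from memory; the cited works give the precise formulation): a function $f:\mathsf{F}_2^n \to \mathsf{F}_2$ is a directional affine extractor for entropy $k$ with error $\varepsilon$ if for every nonzero direction $a \in \mathsf{F}_2^n$ and every affine source $X$ over $\mathsf{F}_2^n$ with entropy at least $k$,
\[
\Bigl|\,\Pr\bigl[f(X) + f(X+a) = 1\bigr] - \tfrac{1}{2}\,\Bigr| \;\le\; \varepsilon .
\]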
With the advent of large models based on the Transformer architecture, researchers have observed an anomalous phenomenon in the attention mechanism: attention concentrates very heavily on the first element of the sequence, a pattern prevalent across Transformer-based models. Understanding it is crucial for techniques that depend on the attention distribution, such as Key-Value (KV) cache compression and infinite extrapolation; however, the underlying cause remains unknown. In this paper, we analyze it through the lens of the waiver phenomenon, in which the internal values of certain elements in the sequence are reduced, allowing them to absorb excess attention without affecting their contribution to the information. Owing to differences in positional encoding and attention patterns across specific models, we find that the model's selection of waiver elements falls into two categories: one based on positional encoding and one based on the feature distribution within elements.
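The waiver mechanism can be illustrated with a toy computation (our reading of the abstract, not the paper's code): an element whose value vector is near zero can soak up a large share of attention while contributing almost nothing to the output.

```python
import numpy as np

# Toy illustration of a "waiver" element: position 0 gets a key aligned with
# the query (hence a high attention score) but a near-zero value vector, so it
# absorbs attention without injecting information. All numbers are synthetic.
rng = np.random.default_rng(0)
d, T = 8, 6
q = rng.normal(size=d)            # a single query vector
K = rng.normal(size=(T, d))       # keys for T sequence elements
V = rng.normal(size=(T, d))       # values for T sequence elements

K[0] = 4.0 * q / np.linalg.norm(q)   # align key 0 with the query
V[0] = 0.0                           # "waive" element 0's contribution

scores = K @ q / np.sqrt(d)
attn = np.exp(scores - scores.max())
attn /= attn.sum()                   # softmax attention weights
out = attn @ V

print(f"attention on element 0: {attn[0]:.2f}")   # large share of the mass
print(f"norm of element 0's contribution: {np.linalg.norm(attn[0] * V[0]):.2f}")  # ~0
```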
This paper introduces a novel combination of two tasks previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these requirements, they differ in how they achieve phoneme-related predictions: one is based on frame classification, the other on a two-stage training procedure and forced alignment. We achieve competitive performance of 0.73 mean correlation for the AAI task and up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme forced aligner.
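A minimal sketch of the frame-classification variant, as we read the abstract, is a shared speech encoder with two heads: one regressing articulatory trajectories (AAI) and one classifying a phoneme per frame. The BLSTM encoder and all dimensions below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Shared encoder with two heads: articulatory regression (AAI) and per-frame
# phoneme classification. Dimensions (80 input features, 12 articulatory
# channels, 40 phoneme classes) are illustrative placeholders.
class APTAISketch(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, n_artic=12, n_phones=40):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.artic_head = nn.Linear(2 * hidden, n_artic)    # AAI regression
        self.phone_head = nn.Linear(2 * hidden, n_phones)   # frame classification

    def forward(self, x):              # x: (batch, frames, feat_dim)
        h, _ = self.encoder(x)
        return self.artic_head(h), self.phone_head(h)

model = APTAISketch()
feats = torch.randn(4, 100, 80)                    # dummy acoustic features
artic, phone_logits = model(feats)
loss = nn.functional.mse_loss(artic, torch.randn_like(artic)) \
     + nn.functional.cross_entropy(phone_logits.transpose(1, 2),
                                   torch.randint(0, 40, (4, 100)))
loss.backward()                                    # joint multi-task update
```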
Central to the digital transformation of the process industry are Digital Twins (DTs): virtual replicas of physical manufacturing systems that combine sensor data with sophisticated data-based models, physics-based models, or a combination thereof, to tackle a variety of industrially relevant tasks such as process monitoring, predictive control, or decision support. The backbone of a DT, i.e. the concrete modelling methodologies and architectural frameworks supporting these models, is complex, diverse, and fast-evolving, necessitating a thorough understanding of the latest state-of-the-art methods and trends to stay on top of a highly competitive market. From a research perspective, despite the high interest in reviewing various aspects of DTs, structured literature reports specifically focused on unravelling the utilized learning paradigms (e.g. self-supervised learning) for DT-creation in the process industry are a novel contribution in this field. This study aims to address these gaps by (1) systematically analyzing the modelling methodologies (e.g. Convolutional Neural Network, Encoder-Decoder, Hidden Markov Model) and paradigms (e.g. data-driven, physics-based, hybrid) used for DT-creation; (2) assessing the utilized learning strategies (e.g. supervised, unsupervised, self-supervised); (3) analyzing the type of modelling task (e.g. regression, classification, clustering); and (4) identifying the challenges and research gaps, as well as discussing potential resolutions.
This paper concerns diffraction-tomographic reconstruction of an object characterized by its scattering potential. We establish a rigorous generalization of the Fourier diffraction theorem in arbitrary dimension, giving a precise relation in the Fourier domain between measurements of the scattered wave and reconstructions of the scattering potential. With this theorem in hand, Fourier coverages for different experimental setups are investigated, taking into account parameters such as object orientation, direction of incidence, and frequency of illumination. Allowing for simultaneous and discontinuous variation of these parameters, a general filtered backpropagation formula is derived, resulting in an explicit approximation of the scattering potential for a large class of experimental setups.
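For orientation, the classical two-dimensional instance of the theorem, stated schematically and up to convention-dependent constants, relates the one-dimensional Fourier transform of the scattered wave $u_{\mathrm{sc}}$, measured on the line $y = r_M$ under plane-wave illumination with wavenumber $k_0$, to the Fourier transform of the scattering potential $f$ on a semicircle (the Ewald circle):
\[
\widehat{u_{\mathrm{sc}}}(k_x, r_M) \;\propto\; \frac{e^{\mathrm{i}\kappa r_M}}{\kappa}\,\widehat{f}\bigl(k_x,\ \kappa - k_0\bigr), \qquad \kappa = \sqrt{k_0^2 - k_x^2}, \quad |k_x| < k_0 .
\]
The generalization in the paper extends this relation to arbitrary dimension and to varying experimental parameters.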
This paper addresses the need for deep learning models to integrate well-defined constraints into their outputs, driven by their application in surrogate models, learning with limited data and partial information, and scenarios requiring flexible model behavior to incorporate non-data sample information. We introduce Bayesian Entropy Neural Networks (BENN), a framework grounded in Maximum Entropy (MaxEnt) principles, designed to impose constraints on Bayesian Neural Network (BNN) predictions. BENN is capable of constraining not only the predicted values but also their derivatives and variances, ensuring a more robust and reliable model output. To achieve simultaneous uncertainty quantification and constraint satisfaction, we employ the method of multipliers. This allows for the concurrent estimation of the neural network parameters and the Lagrange multipliers associated with the constraints. Our experiments, spanning diverse applications such as beam deflection modeling and microstructure generation, demonstrate the effectiveness of BENN. The results highlight significant improvements over traditional BNNs and showcase competitive performance relative to contemporary constrained deep learning methods.
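The method of multipliers can be sketched as follows (a minimal illustration under stand-in choices: the quadratic loss, the scalar equality constraint, and the penalty schedule are placeholders, not the paper's objective). The inner loop minimizes an augmented Lagrangian over the parameters; the outer loop updates the multiplier.

```python
import torch

# Method of multipliers (augmented Lagrangian) for an equality constraint
# c(theta) = 0 on a network objective. Loss, constraint, and schedule are
# placeholders introduced for illustration.
theta = torch.randn(5, requires_grad=True)

def loss_fn(t):        # stand-in for the (Bayesian) training objective
    return (t ** 2).sum()

def constraint(t):     # stand-in constraint, e.g. a pinned prediction value
    return t.sum() - 1.0

lam, rho = 0.0, 1.0    # multiplier and penalty weight
opt = torch.optim.Adam([theta], lr=0.05)
for outer in range(10):
    for _ in range(100):                       # inner minimization
        opt.zero_grad()
        c = constraint(theta)
        aug = loss_fn(theta) + lam * c + 0.5 * rho * c ** 2
        aug.backward()
        opt.step()
    with torch.no_grad():                      # multiplier (dual) update
        lam += rho * constraint(theta).item()

print(f"constraint residual: {constraint(theta).item():.4f}")
```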
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities and domains, along with an analysis of factors that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset). We also review the potential pitfalls of IT and criticism against it, point out current deficiencies of existing strategies, and suggest avenues for fruitful research.
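To make the training objective concrete, here is a minimal sketch of the supervised IT loss as described above: next-token prediction on a concatenated (instruction, output) pair, with the loss masked to output tokens only. The `model` below is assumed to be any causal LM returning logits of shape (batch, seq, vocab); masking conventions vary across implementations.

```python
import torch
import torch.nn.functional as F

# Instruction-tuning loss: next-token prediction over the concatenated
# (instruction, output) sequence, counting only output positions.
# `model` is an assumed causal LM returning (batch, seq, vocab) logits.
def instruction_tuning_loss(model, instr_ids, out_ids):
    ids = torch.cat([instr_ids, out_ids], dim=1)     # (B, T)
    logits = model(ids)                              # (B, T, V)
    logits, targets = logits[:, :-1], ids[:, 1:]     # predict token t+1 from prefix
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1), reduction="none")
    mask = torch.zeros_like(targets, dtype=torch.float)
    mask[:, instr_ids.size(1) - 1:] = 1.0            # only output-token targets
    return (loss * mask.reshape(-1)).sum() / mask.sum()
```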
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
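A schematic reading of the decompose-then-reconstruct pipeline is sketched below; the FRN relation modules are simplified here to a single learned attention weighting, and all dimensions are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

# FDN maps backbone features to M latent facial action-aware features; a
# stand-in for FRN reweights and sums them back into an expression feature.
class FDRLSketch(nn.Module):
    def __init__(self, feat_dim=512, latent_dim=64, n_latent=9, n_classes=7):
        super().__init__()
        self.decompose = nn.ModuleList(
            [nn.Linear(feat_dim, latent_dim) for _ in range(n_latent)])  # FDN
        self.relation = nn.Linear(latent_dim, 1)     # stand-in for FRN modules
        self.classifier = nn.Linear(latent_dim, n_classes)

    def forward(self, x):                            # x: (B, feat_dim) backbone features
        latents = torch.stack([f(x) for f in self.decompose], dim=1)   # (B, M, D)
        weights = torch.softmax(self.relation(latents), dim=1)         # (B, M, 1)
        expr = (weights * latents).sum(dim=1)        # reconstructed expression feature
        return self.classifier(expr)

logits = FDRLSketch()(torch.randn(8, 512))           # 7-way expression scores
```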
In order to answer natural language questions over knowledge graphs, most processing pipelines involve entity and relation linking. Traditionally, entity linking and relation linking have been performed either as dependent sequential tasks or as independent parallel tasks. In this paper, we propose a framework called "EARL", which performs entity linking and relation linking as a single joint task. EARL uses a graph-connection-based solution to the problem. We model the linking task as an instance of the Generalised Travelling Salesman Problem (GTSP) and use approximate GTSP solvers. We then develop a variant of EARL that uses a pairwise graph-distance-based solution to the problem. The system determines the best semantic connection between all keywords of the question by referring to a knowledge graph, exploiting the "connection density" between entity candidates and relation candidates. The connection-density-based solution performs on par with the approximate GTSP solution. We have empirically evaluated the framework on a dataset with 5000 questions. Our system surpasses state-of-the-art scores on the entity linking task, reporting an accuracy of 0.65 compared to 0.40 for the next best entity linker.
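The connection-density idea can be illustrated with a toy example (our reading of the abstract; the miniature knowledge graph, the candidate lists, and the use of networkx are ours, purely for illustration): score each joint assignment of entity and relation candidates by how close its members sit in the graph, and keep the densest one.

```python
import itertools
import networkx as nx

# Toy knowledge graph mixing entity and relation nodes, made up for this demo.
kg = nx.Graph([("Obama", "president_of"), ("president_of", "USA"),
               ("Obama", "born_in"), ("born_in", "Hawaii")])

entity_cands = [["Obama", "Hawaii"]]          # one keyword, two candidates
relation_cands = [["born_in", "president_of"]]

def density(assignment):
    """Negated sum of pairwise graph distances: higher = denser connection."""
    pairs = itertools.combinations(assignment, 2)
    return -sum(nx.shortest_path_length(kg, a, b) for a, b in pairs)

# Enumerate joint assignments and keep the most densely connected one.
best = max(itertools.product(*entity_cands, *relation_cands), key=density)
print(best)
```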
In this paper, we propose joint learning of attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
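A compact sketch of the attention+LSTM interplay described above follows; dimensions, the fixed step count, and the extra stop label are illustrative assumptions, not the paper's design. At each step the LSTM state attends over spatial image features, and the attended context drives the next label prediction, so label co-occurrence is captured by the recurrence.

```python
import torch
import torch.nn as nn

# At each step: score regions against the current hidden state, pool an
# attended context, advance the LSTM, and emit label logits (+1 stop label).
class AttnLSTMTagger(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, n_labels=20, max_steps=5):
        super().__init__()
        self.attn = nn.Linear(feat_dim + hidden, 1)
        self.cell = nn.LSTMCell(feat_dim, hidden)
        self.out = nn.Linear(hidden, n_labels + 1)   # +1 for a stop label
        self.max_steps = max_steps

    def forward(self, feats):                        # feats: (B, R, feat_dim)
        B, R, _ = feats.shape
        h = feats.new_zeros(B, self.cell.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for _ in range(self.max_steps):
            a = self.attn(torch.cat([feats, h[:, None].expand(B, R, -1)], -1))
            ctx = (torch.softmax(a, dim=1) * feats).sum(dim=1)  # attended context
            h, c = self.cell(ctx, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)            # (B, steps, n_labels + 1)

scores = AttnLSTMTagger()(torch.randn(2, 49, 512))   # e.g. 7x7 spatial grid
```

At inference, beam search over these stepwise label distributions yields the multi-label prediction without committing to a single greedy ordering.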