In this paper, we develop a theoretical framework for goal-oriented communication assisted by reconfigurable meta-surfaces in the context of networked control systems. The relation to goal-oriented communication stems from the fact that optimization of the phase shifts of the meta-surfaces is guided by the performance of networked control systems tasks. To that end, we consider a networked control system in which a set of sensors observe the states of a set of physical processes, and communicate this information over an unreliable wireless channel assisted by a reconfigurable intelligent surface with multiple reflecting elements to a set of controllers that correct the behaviors of the physical processes based on the received information. Our objective is to find the optimal control policy for the controllers and the optimal phase policy for the reconfigurable intelligent surface that jointly minimize a regulation cost function associated with the networked control system. We characterize these policies, and also propose an approximate solution based on a semi-definite relaxation technique.
In this paper, we propose a computationally efficient framework for interval reachability of systems with neural network controllers. Our approach leverages inclusion functions for the open-loop system and the neural network controller to embed the closed-loop system into a larger-dimensional embedding system, where a single trajectory over-approximates the original system's behavior under uncertainty. We propose two methods for constructing closed-loop embedding systems, which account for the interactions between the system and the controller in different ways. The interconnection-based approach considers the worst-case evolution of each coordinate separately by substituting the neural network inclusion function into the open-loop inclusion function. The interaction-based approach uses novel Jacobian-based inclusion functions to capture the first-order interactions between the open-loop system and the controller by leveraging state-of-the-art neural network verifiers. Finally, we implement our approach in a Python framework called ReachMM to demonstrate its efficiency and scalability on benchmarks and examples ranging to $200$ state dimensions.
In this paper, we concentrate on decentralized optimization problems with nonconvex and nonsmooth objective functions, especially on the decentralized training of nonsmooth neural networks. We introduce a unified framework to analyze the global convergence of decentralized stochastic subgradient-based methods. We prove the global convergence of our proposed framework under mild conditions, by establishing that the generated sequence asymptotically approximates the trajectories of its associated differential inclusion. Furthermore, we establish that our proposed framework covers a wide range of existing efficient decentralized subgradient-based methods, including decentralized stochastic subgradient descent (DSGD), DSGD with gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGD-M). In addition, we introduce the sign map to regularize the update directions in DSGD-M, and show it is enclosed in our proposed framework. Consequently, our convergence results establish, for the first time, global convergence of these methods when applied to nonsmooth nonconvex objectives. Preliminary numerical experiments demonstrate that our proposed framework yields highly efficient decentralized subgradient-based methods with convergence guarantees in the training of nonsmooth neural networks.
In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (ASR) and speaker diarization systems are represented as a compact textual format, which is included in the prompt to an optionally finetuned LLM. The outputs of the LLM can be used as the refined diarization results with the desired enhancement. As a post-processing step, this framework can be easily applied to any off-the-shelf ASR and speaker diarization systems without retraining existing components. Our experiments show that a finetuned PaLM 2-S model can reduce the WDER by rel. 55.5% on the Fisher telephone conversation dataset, and rel. 44.9% on the Callhome English dataset.
In this paper, we present a novel algorithm for classifying ex vivo tissue that comprises multi-channel bioimpedance analysis and a hardware neural network. When implemented in a mixed-signal 180 nm CMOS process, the classifier has an estimated power budget of 39 mW and an area of 30 mm2. This means that the classifier can be integrated into the tip of a surgical margin assessment probe, for in vivo use during radical prostatectomy. We tested our classifier on digital phantoms of prostate tissue and also on an animal model of ex vivo bovine tissue. The classifier achieved an accuracy of 90% on the prostate tissue phantoms, and an accuracy of 84% on the animal model.
In this work, we revisit some combinatorial and information-theoretic extension techniques for detecting non-algebraic matroids. These are the Dress-Lov\'asz and Ahlswede-K\"orner extension properties. We provide optimizations of these techniques to reduce their computational complexity, finding new non-algebraic matroids on 9 and 10 points. In addition, we use the Ahlswede-K\"orner extension property to find better lower bounds on the information ratio of secret sharing schemes for ports of non-algebraic matroids.
In this paper, we focus on inferring whether the given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish it between ambiguous or infeasible commands leveraging LLMs with situational aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could lead to a decrease in malfunction and undesired actions of the robot, enhancing the reliability of interactive robot agents. We present a dataset for robotic situational awareness, consisting pair of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset, pick-and-place tabletop simulation. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments, i.e., handover scenarios.
In this short paper, we present an innovative approach to limit the required bandwidth when transferring data during in transit analysis. This approach is called hybrid because it combines existing in situ and in transit solutions. It leverages the stable ABI of Catalyst version 2 and the Catalyst-ADIOS2 implementation to seamlessly switch from in situ, in transit and hybrid analysis without modifying the numerical simulation code. The typical use case is to perform data reduction in situ then generate a visualization in transit on the reduced data. This approach makes the numerical simulation workflows very flexible depending on the size of the data, the available computing resources or the analysis type. Our experiment with this hybrid approach, reducing data before sending it, demonstrated large cost reductions for some visualization pipelines compared to in situ and in transit solutions. The implementation is available under an open source permissive license to be usable broadly in any scientific community.
In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm in the original problem and incrementally constructs a solution. The proposed framework utilizes Graph Convolutional Network (GCN) to generate node embeddings that predicts the potential nodes in the solution set from the entire node set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, robust and scalable to large dynamic networks.
In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.
In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.