Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a policy learned in a simulated environment to a real-world environment by dynamically transforming actions in the simulation, guided by uncertainty, to mitigate the domain gap in transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.
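To make the action-transformation idea above concrete, here is a minimal Python sketch of an uncertainty-gated action transformation step; the ensemble-based uncertainty estimate, the model interfaces, and the threshold are illustrative assumptions rather than UGAT's exact formulation.

```python
# Minimal sketch of uncertainty-gated action transformation for sim-to-real RL
# (illustrative only; the gating rule and model interfaces are assumptions).
import numpy as np

def transform_action(state, sim_action, forward_ensemble, inverse_model, tau=0.1):
    """Map a simulator action to one whose simulated effect mimics real dynamics.

    forward_ensemble: list of models predicting the next real-world state from
                      (state, action); their disagreement serves as an uncertainty
                      estimate (assumed interface).
    inverse_model:    model returning the simulator action that reproduces a desired
                      next state from the current state (assumed interface).
    tau:              uncertainty threshold; above it the action is left untouched.
    """
    # Predict the next state under real-world dynamics with each ensemble member.
    preds = np.stack([m.predict(state, sim_action) for m in forward_ensemble])
    uncertainty = preds.std(axis=0).mean()       # ensemble disagreement
    if uncertainty > tau:
        return sim_action                        # low confidence: keep the original action
    target_next_state = preds.mean(axis=0)
    # Ask the inverse dynamics model which simulator action reaches that state.
    return inverse_model.predict(state, target_next_state)
```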
Segmentation of planar regions from a single RGB image is a particularly important task in the perception of complex scenes. To utilize both visual and geometric properties in images, recent approaches often formulate the problem as a joint estimation of planar instances and dense depth through feature fusion mechanisms and geometric constraint losses. Despite promising results, these methods do not consider cross-task feature distillation and perform poorly in boundary regions. To overcome these limitations, we propose X-PDNet, a framework for multi-task learning of plane instance segmentation and depth estimation with improvements in two aspects. First, we design a cross-task distillation mechanism that promotes early information sharing between the two tasks for task-specific improvements. Second, we highlight the limitations of using ground-truth boundaries to formulate a boundary regression loss, and propose a novel method that exploits depth information to support precise segmentation of boundary regions. Finally, we manually annotate more than 3,000 images from the Stanford 2D-3D-Semantics dataset and make them available for evaluating plane instance segmentation. In experiments, our methods outperform the baseline by large margins in the quantitative results on the ScanNet and Stanford 2D-3D-S datasets, demonstrating the effectiveness of our proposals.
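As a rough illustration of cross-task feature distillation between the two task branches, the sketch below gates depth features into the segmentation branch and vice versa; the gated 1x1-projection design is an assumption for illustration, not X-PDNet's actual module.

```python
# Illustrative PyTorch sketch of a cross-task distillation block that lets the depth
# branch share early features with the plane-segmentation branch, and vice versa.
import torch
import torch.nn as nn

class CrossTaskDistillation(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 projections map features from the "source" task into the "target" task space.
        self.proj_depth_to_seg = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_seg_to_depth = nn.Conv2d(channels, channels, kernel_size=1)
        # Gates decide, per pixel, how much distilled information to admit.
        self.gate_seg = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, feat_seg, feat_depth):
        distilled_for_seg = self.proj_depth_to_seg(feat_depth)
        distilled_for_depth = self.proj_seg_to_depth(feat_seg)
        g_seg = self.gate_seg(torch.cat([feat_seg, distilled_for_seg], dim=1))
        g_depth = self.gate_depth(torch.cat([feat_depth, distilled_for_depth], dim=1))
        return feat_seg + g_seg * distilled_for_seg, feat_depth + g_depth * distilled_for_depth
```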
Learning meaningful frame-wise features from a partially labeled dataset is crucial to semi-supervised sound event detection. Prior works either enforce consistency on frame-level predictions or seek feature-level similarity among neighboring frames, which does not fully exploit the potential of unlabeled data. In this work, we design a Local and Global Consistency (LGC) regularization scheme that enhances the model at both the label and feature levels. Audio CutMix is introduced to change the contextual information of clips. Local consistency then encourages the model to leverage local features for frame-level predictions, while global consistency forces features to align with global prototypes through a specially designed contrastive loss. Experiments on the DESED dataset indicate the superiority of LGC, which substantially surpasses its competitors under the same settings as the baseline system. Moreover, combining LGC with existing methods yields further improvements. The code will be released soon.
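A minimal sketch of the audio CutMix operation on spectrograms with frame-level labels is given below; the segment-selection policy is an illustrative assumption.

```python
# Minimal sketch of audio CutMix on log-mel spectrograms with frame-level labels.
import torch

def audio_cutmix(spec_a, labels_a, spec_b, labels_b, max_frac=0.5):
    """spec_*: (n_mels, T) spectrograms; labels_*: (T, n_classes) frame-level targets.
    A random time span of clip B is pasted into clip A, so the pasted frames inherit
    B's frame-level labels and the clip's contextual information changes."""
    T = spec_a.shape[1]
    span = int(torch.randint(1, int(max_frac * T) + 1, (1,)))
    start = int(torch.randint(0, T - span + 1, (1,)))
    mixed_spec = spec_a.clone()
    mixed_labels = labels_a.clone()
    mixed_spec[:, start:start + span] = spec_b[:, start:start + span]
    mixed_labels[start:start + span] = labels_b[start:start + span]
    return mixed_spec, mixed_labels
```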
Many hearables contain an in-ear microphone, which may be used to capture its user's own voice in noisy environments. Since the in-ear microphone mostly records body-conducted speech due to ear canal occlusion, it suffers from band-limitation effects while capturing only a limited amount of external noise. To enhance the quality of the in-ear microphone signal with algorithms aiming at joint bandwidth extension, equalization, and noise reduction, it is desirable to have an accurate model of the own voice transfer characteristics between the entrance of the ear canal and the in-ear microphone. Such a model can be used, e.g., to simulate a large number of in-ear recordings to train supervised learning-based algorithms. Since previous research on ear canal occlusion suggests that own voice transfer characteristics depend on speech content, in this contribution we propose a speech-dependent system identification model based on phoneme recognition. We assess the accuracy of simulating own voice speech with speech-dependent and speech-independent modeling and investigate how well both modeling approaches generalize to different talkers. Simulation results show that the proposed speech-dependent model is preferable to a speech-independent model for simulating in-ear recordings.
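As a simplified illustration of speech-dependent modeling, the sketch below estimates one frequency-domain transfer function per phoneme class from paired recordings at the ear-canal entrance and the in-ear microphone; the estimator and the phoneme-segmentation interface are assumptions, not the proposed system identification model.

```python
# Illustrative per-phoneme transfer-function estimation via cross- and auto-spectra
# (H1-style estimate); segment boundaries are assumed to come from a phoneme recognizer.
import numpy as np
from scipy.signal import csd, welch

def estimate_phoneme_rtfs(x_entrance, y_inear, segments, fs=16000, nperseg=256):
    """segments: dict mapping phoneme label -> list of (start, stop) sample indices.
    Returns one frequency-domain transfer estimate H(f) = S_xy / S_xx per phoneme."""
    rtfs = {}
    for ph, spans in segments.items():
        x = np.concatenate([x_entrance[a:b] for a, b in spans])
        y = np.concatenate([y_inear[a:b] for a, b in spans])
        _, s_xy = csd(x, y, fs=fs, nperseg=nperseg)
        _, s_xx = welch(x, fs=fs, nperseg=nperseg)
        rtfs[ph] = s_xy / s_xx
    return rtfs
```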
Gaussian processes (GPs) have emerged as a prominent technique for machine learning and signal processing. A key component in GP modeling is the choice of kernel, and linear multiple kernels (LMKs) have become an attractive kernel class due to their powerful modeling capacity and interpretability. This paper focuses on the grid spectral mixture (GSM) kernel, an LMK that can approximate arbitrary stationary kernels. Specifically, we propose a novel GSM kernel formulation for multi-dimensional data that reduces the number of hyper-parameters compared to existing formulations, while retaining a favorable optimization structure and approximation capability. In addition, to make large-scale hyper-parameter optimization of the GSM kernel tractable, we first introduce the distributed successive convex approximation (DSCA) algorithm. Building on this, we propose the doubly distributed SCA (D$^2$SCA) algorithm based on the alternating direction method of multipliers (ADMM) framework, which allows us to learn the GSM kernel cooperatively in the big-data setting while maintaining data privacy. Furthermore, we address the inherent communication bandwidth restriction of distributed frameworks by quantizing the hyper-parameters in D$^2$SCA, resulting in the quantized doubly distributed SCA (QD$^2$SCA) algorithm. Theoretical analysis establishes convergence guarantees for the proposed algorithms, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our methods.
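For intuition, a one-dimensional sketch of a GSM-style kernel is shown below: spectral frequencies are fixed on a grid and only the non-negative weights are learned, which is what makes the kernel a linear multiple kernel. The grid choice and the Gaussian envelope width are illustrative assumptions, not the paper's multi-dimensional formulation.

```python
# Sketch of a 1-D grid spectral mixture (GSM) style kernel evaluation.
import numpy as np

def gsm_kernel(x1, x2, weights, grid_freqs, lengthscale=1.0):
    """x1: (n,), x2: (m,) inputs; weights: (Q,) non-negative mixture weights;
    grid_freqs: (Q,) fixed grid of spectral frequencies."""
    tau = x1[:, None] - x2[None, :]                      # pairwise differences, (n, m)
    envelope = np.exp(-2.0 * (np.pi * tau / lengthscale) ** 2)
    # Linear combination of fixed basis kernels: only `weights` would be optimized.
    K = sum(w * envelope * np.cos(2.0 * np.pi * f * tau)
            for w, f in zip(weights, grid_freqs))
    return K
```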
The shortest path network interdiction (SPNI) problem poses significant computational challenges due to its NP-hardness. Current solutions, primarily based on integer programming, are inefficient for large-scale instances. In this paper, we introduce a novel hybrid algorithm that can utilize Ising Processing Units (IPUs) alongside classical solvers. The approach decomposes the problem into manageable sub-problems, which are offloaded either to slower but high-quality classical solvers or to the IPU. The results are then recombined into a global solution. Our method achieves solution quality comparable to existing whole-problem solvers while reducing computation time for large-scale instances. Furthermore, our approach is amenable to parallelization, allowing decomposed sub-problems to be processed simultaneously.
Stochastic Computing (SC) is an unconventional computing paradigm that processes data in the form of random bit-streams. The accuracy and energy efficiency of SC systems depend strongly on the stochastic number generator (SNG) unit that converts data from conventional binary to stochastic bit-streams. Recent work has shown significant improvements in the efficiency of SC systems by employing low-discrepancy (LD) sequences such as Sobol and Halton sequences in the SNG unit. Still, many well-known random sequences remain unexplored for SC. This work studies several new random sequences for potential application in SC. Our design space exploration identifies a promising random number generator for accurate and energy-efficient SC. We propose P2LSG, a low-cost and energy-efficient low-discrepancy sequence generator derived from powers-of-2 Van der Corput (VDC) sequences. We evaluate our novel bit-stream generator on two SC image and video processing case studies: image scaling and scene merging. For scene merging, we propose the first SC design for this task. Our experimental results show higher accuracy and lower hardware cost and energy consumption compared to the state-of-the-art.
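The following minimal sketch shows a base-2 Van der Corput sequence and its use in a stochastic number generator, where a stream bit is 1 whenever the next sequence value falls below the target probability; this is a software illustration, not the P2LSG hardware design.

```python
# Base-2 Van der Corput (VDC) sequence and a simple comparator-style SNG model.
def vdc_base2(i):
    """i-th base-2 Van der Corput value: reflect the bits of i around the radix point."""
    value, denom = 0.0, 1.0
    while i:
        denom *= 2.0
        value += (i & 1) / denom
        i >>= 1
    return value

def to_bitstream(p, length):
    """Encode probability p (0..1) as a stochastic bit-stream of the given length."""
    return [1 if vdc_base2(i + 1) < p else 0 for i in range(length)]

# Example: p = 0.25 encoded with 8 bits yields exactly 2 ones thanks to the LD sequence.
assert sum(to_bitstream(0.25, 8)) == 2
```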
In a real-time transmission scenario, messages are transmitted through a channel subject to packet loss, and the destination must recover them within required deadlines. In this paper, we consider a setup where two types of messages with distinct decoding deadlines are transmitted through a channel that can introduce either a burst erasure of length at most $B$ or $N$ random erasures. The message with the short decoding deadline $T_u$ is referred to as the urgent message, while the one with decoding deadline $T_v$ ($T_v > T_u$) is referred to as the less urgent message. We propose a merging method that encodes the two message streams of different urgency levels into a single flow, and we consider the regime $T_v > T_u + B$. We derive a closed-form upper bound on the sum rate achievable by any coding strategy based on this merging approach. Moreover, we present explicit constructions over a finite field whose size scales quadratically with the imposed delay and which attain the upper bound. For a given parameter configuration, we rigorously show that the sum rate of our proposed streaming codes consistently surpasses that of separate encoding, which serves as a baseline for comparison.
Industrial Time-Sensitive Networking (TSN) provides deterministic mechanisms for real-time, reliable flow transmission. Increasing attention has been paid to efficient scheduling of time-sensitive flows with stringent requirements such as ultra-low latency and jitter. In TSN, the fine-grained traffic shaping protocol, cyclic queuing and forwarding (CQF), eliminates uncertain delay and frame loss through cyclic traffic forwarding and queuing, but it inevitably incurs high scheduling complexity, and this complexity is highly sensitive to flow attributes and network scale. The problem stems in part from the lack of an attribute mining mechanism in existing frame-based scheduling. For time-critical industrial networks with large-scale complex flows, we propose a hyper-flow graph based scheduling scheme that improves scheduling scalability in terms of schedulability, scheduling efficiency, and latency and jitter. The hyper-flow graph is built by aggregating similar flow sets into hyper-flow nodes within a hierarchical scheduling framework. Flow-attribute-sensitive scheduling information is embedded into condensed maximal cliques and precisely reverse-mapped to congested flow portions for re-scheduling, while parallel scheduling reduces the complexity induced by network scale. The complete scheme is realized as a comprehensive scheduling algorithm, GH^2, which improves the three scalability criteria along a Pareto front. Extensive simulation studies demonstrate its superiority: notably, GH^2 schedules stably with a runtime of less than 100 ms for 1000 flows, and its runtime is roughly 1/430 that of the state-of-the-art FITS method for 2000 flows.
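As a rough illustration of the hyper-flow idea, the sketch below aggregates flows whose scheduling-relevant attributes coincide into hyper-flow nodes; the choice of attributes used as the grouping key is an assumption for illustration, not the paper's exact aggregation rule.

```python
# Minimal sketch: group flows with matching attributes into hyper-flow nodes so the
# scheduler reasons over far fewer entities than individual flows.
from collections import defaultdict

def build_hyper_flows(flows):
    """flows: iterable of dicts with 'id', 'period_us', 'deadline_us', 'path' (tuple of links)."""
    groups = defaultdict(list)
    for f in flows:
        key = (f["period_us"], f["deadline_us"], f["path"])   # illustrative grouping key
        groups[key].append(f["id"])
    # Each hyper-flow node carries the shared attributes plus the member flows it represents.
    return [{"attributes": k, "members": v} for k, v in groups.items()]
```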
Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous features - which occur regularly in real-world input domains and within the hidden layers of GNNs - and we demonstrate the requirement for multiple aggregation functions in this context. Accordingly, we propose Principal Neighbourhood Aggregation (PNA), a novel architecture combining multiple aggregators with degree-scalers (which generalize the sum aggregator). Finally, we compare the capacity of different models to capture and exploit the graph structure via a novel benchmark containing multiple tasks taken from classical graph theory, alongside existing benchmarks from real-world domains, all of which demonstrate the strength of our model. With this work, we hope to steer some of the GNN research towards new aggregation methods which we believe are essential in the search for powerful and robust models.
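A simplified, dense version of the PNA aggregation step might look as follows: several aggregators (mean, min, max, standard deviation) are each rescaled by degree-scalers (identity, amplification, attenuation) and concatenated before the usual learned update. This is an illustrative sketch, not the reference implementation.

```python
# Sketch of Principal Neighbourhood Aggregation for a single node.
import math
import torch

def pna_aggregate(neighbor_feats, degree, delta):
    """neighbor_feats: (d, F) features of a node's d neighbors;
    delta: average of log(degree + 1) over the training graphs."""
    aggs = torch.stack([
        neighbor_feats.mean(dim=0),
        neighbor_feats.min(dim=0).values,
        neighbor_feats.max(dim=0).values,
        neighbor_feats.std(dim=0, unbiased=False),
    ])                                                  # (4, F)
    s = math.log(degree + 1.0) / delta
    scalers = torch.tensor([1.0, s, 1.0 / s])           # identity, amplification, attenuation
    # Outer product of scalers and aggregators yields 12 views of the neighbourhood.
    return (scalers[:, None, None] * aggs[None, :, :]).reshape(-1)   # (12 * F,)
```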
Within the rapidly developing Internet of Things (IoT), numerous and diverse physical devices, Edge devices, Cloud infrastructure, and their quality of service (QoS) requirements need to be represented within a unified specification in order to enable rapid IoT application development, monitoring, and dynamic reconfiguration. However, heterogeneity among different configuration knowledge representation models limits the acquisition, discovery, and curation of configuration knowledge for coordinated IoT applications. This paper proposes a unified data model to represent IoT resource configuration knowledge artifacts. It also proposes IoT-CANE (Context-Aware recommendatioN systEm) to facilitate incremental knowledge acquisition and declarative, context-driven knowledge recommendation.
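A small sketch of what such a unified configuration data model could look like is shown below as Python dataclasses; the field names and layering are assumptions for illustration, not IoT-CANE's actual schema.

```python
# Illustrative unified data model for IoT resource configuration knowledge, covering
# device/Edge/Cloud resources and their QoS requirements.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class QoSRequirement:
    max_latency_ms: float
    min_availability: float          # e.g. 0.999
    max_energy_mw: float

@dataclass
class IoTResource:
    resource_id: str
    layer: str                       # "device", "edge", or "cloud"
    capabilities: Dict[str, str] = field(default_factory=dict)
    qos: List[QoSRequirement] = field(default_factory=list)

@dataclass
class ApplicationConfiguration:
    app_id: str
    context: Dict[str, str]          # deployment context used for recommendation
    resources: List[IoTResource] = field(default_factory=list)
```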