In this paper, we explore low-power custom quantised Multi-Layer Perceptrons (MLPs) as an Intrusion Detection System (IDS) for automotive controller area network (CAN). We utilise the FINN framework from AMD/Xilinx to quantise, train and generate hardware IP of our MLP to detect denial of service (DoS) and fuzzying attacks on CAN network, using ZCU104 (XCZU7EV) FPGA as our target ECU architecture with integrated IDS capabilities. Our approach achieves significant improvements in latency (0.12 ms per-message processing latency) and inference energy consumption (0.25 mJ per inference) while achieving similar classification performance as state-of-the-art approaches in the literature.
In this paper, we examine the role of stochastic quantizers for privacy preservation. We first employ a static stochastic quantizer and investigate its corresponding privacy-preserving properties. Specifically, we demonstrate that a sufficiently large quantization step guarantees $(0, \delta)$ differential privacy. Additionally, the degradation of control performance caused by quantization is evaluated as the tracking error of output regulation. These two analyses characterize the trade-off between privacy and control performance, determined by the quantization step. This insight enables us to use quantization intentionally as a means to achieve the seemingly conflicting two goals of maintaining control performance and preserving privacy at the same time; towards this end, we further investigate a dynamic stochastic quantizer. Under a stability assumption, the dynamic stochastic quantizer can enhance privacy, more than the static one, while achieving the same control performance. We further handle the unstable case by additionally applying input Gaussian noise.
In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception. To improve the content naturalness of converted audio, we have developed an end-to-end EVC architecture inspired by the high audio quality of VITS. By seamlessly integrating an acoustic converter and vocoder, we effectively address the common issue of mismatch between emotional prosody training and run-time conversion that is prevalent in existing EVC models. To further enhance the emotional naturalness, we introduce an emotion descriptor to model the subtle prosody variations of different speech emotions. Additionally, we propose a prosody predictor, which predicts prosody features from text based on the provided emotion label. Notably, we introduce a prosody alignment loss to establish a connection between latent prosody features from two distinct modalities, ensuring effective training. Experimental results show that the performance of PAVITS is superior to the state-of-the-art EVC methods. Speech Samples are available at //jeremychee4.github.io/pavits4EVC/ .
In this paper, we propose novel Gaussian process-gated hierarchical mixtures of experts (GPHMEs). Unlike other mixtures of experts with gating models linear in the input, our model employs gating functions built with Gaussian processes (GPs). These processes are based on random features that are non-linear functions of the inputs. Furthermore, the experts in our model are also constructed with GPs. The optimization of the GPHMEs is performed by variational inference. The proposed GPHMEs have several advantages. They outperform tree-based HME benchmarks that partition the data in the input space, and they achieve good performance with reduced complexity. Another advantage is the interpretability they provide for deep GPs, and more generally, for deep Bayesian neural networks. Our GPHMEs demonstrate excellent performance for large-scale data sets, even with quite modest sizes.
In this work, we present novel protocols over rings for semi-honest secure three-party computation (3-PC) and malicious four-party computation (4-PC) with one corruption. Compared to state-of-the-art protocols in the same setting, our protocols require fewer low-latency and high-bandwidth links between the parties to achieve high throughput. Our protocols also reduce the computational complexity by requiring up to 50 percent fewer basic instructions per gate. Further, our protocols achieve the currently best-known communication complexity (3, resp. 5 elements per multiplication gate) with an optional preprocessing phase to reduce the communication complexity of the online phase to 2 (resp. 3) elements per multiplication gate. In homogeneous network settings, i.e. all links between the parties share similar network bandwidth and latency, our protocols achieve up to two times higher throughput than state-of-the-art protocols. In heterogeneous network settings, i.e. all links between the parties share different network bandwidth and latency, our protocols achieve even larger performance improvements. We implemented our protocols and multiple other state-of-the-art protocols (Replicated 3-PC, Astra, Fantastic Four, Tetrad) in a novel open-source C++ framework optimized for achieving high throughput. Five out of six implemented 3-PC and 4-PC protocols achieve more than one billion 32-bit multiplication or more than 32 billion AND gates per second using our implementation in a 25 Gbit/s LAN environment. This is the highest throughput achieved in 3-PC and 4-PC so far and between two and three orders of magnitude higher than the throughput MP-SPDZ achieves in the same settings.
Recent advances in LLMs have sparked a debate on whether they understand text. In this position paper, we argue that opponents in this debate hold different definitions for understanding, and particularly differ in their view on the role of consciousness. To substantiate this claim, we propose a thought experiment involving an open-source chatbot $Z$ which excels on every possible benchmark, seemingly without subjective experience. We ask whether $Z$ is capable of understanding, and show that different schools of thought within seminal AI research seem to answer this question differently, uncovering their terminological disagreement. Moving forward, we propose two distinct working definitions for understanding which explicitly acknowledge the question of consciousness, and draw connections with a rich literature in philosophy, psychology and neuroscience.
In this paper, we revisit the Power Curves in ANOVA Simultaneous Component Analysis (ASCA) based on permutation testing, and introduce the Population Curves derived from population parameters describing the relative effect among factors and interactions. We distinguish Relative from Absolute Population Curves, where the former represent statistical power in terms of the normalized effect size between structure and noise, and the latter in terms of the sample size. Relative Population Curves are useful to find the optimal ASCA model (e.g., fixed/random factors, crossed/nested relationships, interactions, the test statistic, transformations, etc.) for the analysis of an experimental design at hand. Absolute Population Curves are useful to determine the sample size and the optimal number of levels for each factor during the planning phase on an experiment. We illustrate both types of curves through simulation. We expect Population Curves to become the go-to approach to plan the optimal analysis pipeline and the required sample size in an omics study analyzed with ASCA.
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.
In this paper, we propose Latent Relation Language Models (LRLMs), a class of language models that parameterizes the joint distribution over the words in a document and the entities that occur therein via knowledge graph relations. This model has a number of attractive properties: it not only improves language modeling performance, but is also able to annotate the posterior probability of entity spans for a given text through relations. Experiments demonstrate empirical improvements over both a word-based baseline language model and a previous approach that incorporates knowledge graph information. Qualitative analysis further demonstrates the proposed model's ability to learn to predict appropriate relations in context.
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing to past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages to predict a more acceptable answer so as to address the convergence suppression problem occurred in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.