In this paper, we propose a novel layer-adaptive weight-pruning approach for Deep Neural Networks (DNNs) that addresses the challenge of optimizing the output distortion minimization while adhering to a target pruning ratio constraint. Our approach takes into account the collective influence of all layers to design a layer-adaptive pruning scheme. We discover and utilize a very important additivity property of output distortion caused by pruning weights on multiple layers. This property enables us to formulate the pruning as a combinatorial optimization problem and efficiently solve it through dynamic programming. By decomposing the problem into sub-problems, we achieve linear time complexity, making our optimization algorithm fast and feasible to run on CPUs. Our extensive experiments demonstrate the superiority of our approach over existing methods on the ImageNet and CIFAR-10 datasets. On CIFAR-10, our method achieves remarkable improvements, outperforming others by up to 1.0% for ResNet-32, 0.5% for VGG-16, and 0.7% for DenseNet-121 in terms of top-1 accuracy. On ImageNet, we achieve up to 4.7% and 4.6% higher top-1 accuracy compared to other methods for VGG-16 and ResNet-50, respectively. These results highlight the effectiveness and practicality of our approach for enhancing DNN performance through layer-adaptive weight pruning. Code will be available on //github.com/Akimoto-Cris/RD_VIT_PRUNE.
In this paper, with the goal of quantifying the qualitative image outputs of a Vision-based Tactile Sensor (VTS), we present the design, fabrication, and characterization of a novel Quantitative Surface Tactile Sensor (called QS-TS). QS-TS directly estimates the sensor's gel layer deformation in real-time enabling safe and autonomous tactile manipulation and servoing of delicate objects using robotic manipulators. The core of the proposed sensor is the utilization of miniature 1.5 mm x 1.5 mm synthetic square markers with inner binary patterns and a broad black border, called ArUco Markers. Each ArUco marker can provide real-time camera pose estimation that, in our design, is used as a quantitative measure for obtaining deformation of the QS-TS gel layer. Moreover, thanks to the use of ArUco markers, we propose a unique fabrication procedure that mitigates various challenges associated with the fabrication of the existing marker-based VTSs and offers an intuitive and less-arduous method for the construction of the VTS. Remarkably, the proposed fabrication facilitates the integration and adherence of markers with the gel layer to robustly and reliably obtain a quantitative measure of deformation in real-time regardless of the orientation of ArUco Markers. The performance and efficacy of the proposed QS-TS in estimating the deformation of the sensor's gel layer were experimentally evaluated and verified. Results demonstrate the phenomenal performance of the QS-TS in estimating the deformation of the gel layer with a relative error of <5%.
This paper makes two theoretical contributions. First, we establish a novel class of Hamiltonian systems, called arc Hamiltonian systems, for saddle Hamiltonian functions over infinite-dimensional metric spaces. Arc Hamiltonian systems generate a flow that satisfies the law of conservation of energy everywhere in a metric space. They are governed by an extension of Hamilton's equation formulated based on (i) the framework of arc fields and (ii) an infinite-dimensional gradient, termed the arc gradient, of a Hamiltonian function. We derive conditions for the existence of a flow generated by an arc Hamiltonian system, showing that they reduce to local Lipschitz continuity of the arc gradient under sufficient regularity. Second, we present two Hamiltonian functions, called the cumulant generating functional and the centred cumulant generating functional, over a metric space of log-likelihoods and measures. The former characterises the posterior of Bayesian inference as a part of the arc gradient that induces a flow of log-likelihoods and non-negative measures. The latter characterises the difference of the posterior and the prior as a part of the arc gradient that induces a flow of log-likelihoods and probability measures. Our results reveal an implication of the belief updating mechanism from the prior to the posterior as an infinitesimal change of a measure in the infinite-dimensional Hamiltonian flows.
Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs). However, it is unclear how to collect and incorporate feedback in a way that is efficient, effective and unbiased, especially for highly subjective human preferences and values. In this paper, we survey existing approaches for learning from human feedback, drawing on 95 papers primarily from the ACL and arXiv repositories.First, we summarise the past, pre-LLM trends for integrating human feedback into language models. Second, we give an overview of present techniques and practices, as well as the motivations for using feedback; conceptual frameworks for defining values and preferences; and how feedback is collected and from whom. Finally, we encourage a better future of feedback learning in LLMs by raising five unresolved conceptual and practical challenges.
In this paper, we propose a novel formation controller for nonholonomic agents to form general parametric curves. First, we derive a unified parametric representation for both open and closed curves. Then, a leader-follower formation controller is designed to form the parametric curves. We consider directed communications and constant input disturbances rejection in the controller design. Rigorous Lyapunov-based stability analysis proves the asymptotic stability of the proposed controller. Detailed numerical simulations and experimental studies are conducted to verify the performance of the proposed method.
In this paper, we revisit the problem of sparse linear regression in the local differential privacy (LDP) model. Existing research in the non-interactive and sequentially local models has focused on obtaining the lower bounds for the case where the underlying parameter is $1$-sparse, and extending such bounds to the more general $k$-sparse case has proven to be challenging. Moreover, it is unclear whether efficient non-interactive LDP (NLDP) algorithms exist. To address these issues, we first consider the problem in the $\epsilon$ non-interactive LDP model and provide a lower bound of $\Omega(\frac{\sqrt{dk\log d}}{\sqrt{n}\epsilon})$ on the $\ell_2$-norm estimation error for sub-Gaussian data, where $n$ is the sample size and $d$ is the dimension of the space. We propose an innovative NLDP algorithm, the very first of its kind for the problem. As a remarkable outcome, this algorithm also yields a novel and highly efficient estimator as a valuable by-product. Our algorithm achieves an upper bound of $\tilde{O}({\frac{d\sqrt{k}}{\sqrt{n}\epsilon}})$ for the estimation error when the data is sub-Gaussian, which can be further improved by a factor of $O(\sqrt{d})$ if the server has additional public but unlabeled data. For the sequentially interactive LDP model, we show a similar lower bound of $\Omega({\frac{\sqrt{dk}}{\sqrt{n}\epsilon}})$. As for the upper bound, we rectify a previous method and show that it is possible to achieve a bound of $\tilde{O}(\frac{k\sqrt{d}}{\sqrt{n}\epsilon})$. Our findings reveal fundamental differences between the non-private case, central DP model, and local DP model in the sparse linear regression problem.
In this paper, we explore the potential of Large Language Models (LLMs) to reason about threats, generate information about tools, and automate cyber campaigns. We begin with a manual exploration of LLMs in supporting specific threat-related actions and decisions. We proceed by automating the decision process in a cyber campaign. We present prompt engineering approaches for a plan-act-report loop for one action of a threat campaign and and a prompt chaining design that directs the sequential decision process of a multi-action campaign. We assess the extent of LLM's cyber-specific knowledge w.r.t the short campaign we demonstrate and provide insights into prompt design for eliciting actionable responses. We discuss the potential impact of LLMs on the threat landscape and the ethical considerations of using LLMs for accelerating threat actor capabilities. We report a promising, yet concerning, application of generative AI to cyber threats. However, the LLM's capabilities to deal with more complex networks, sophisticated vulnerabilities, and the sensitivity of prompts are open questions. This research should spur deliberations over the inevitable advancements in LLM-supported cyber adversarial landscape.
In this paper, we consider the one-bit precoding problem for the multiuser downlink massive multiple-input multiple-output (MIMO) system with phase shift keying (PSK) modulation. We focus on the celebrated constructive interference (CI)-based problem formulation. We first establish the NP-hardness of the problem (even in the single-user case), which reveals the intrinsic difficulty of globally solving the problem. Then, we propose a novel negative $\ell_1$ penalty model for the considered problem, which penalizes the one-bit constraint into the objective by a negative $\ell_1$-norm term, and show the equivalence between (global and local) solutions of the original problem and the penalty problem when the penalty parameter is sufficiently large. We further transform the penalty model into an equivalent min-max problem and propose an efficient alternating proximal/projection gradient descent ascent (APGDA) algorithm for solving it, which performs a proximal gradient decent over one block of variables and a projection gradient ascent over the other block of variables alternately. The APGDA algorithm enjoys a low per-iteration complexity and is guaranteed to converge to a stationary point of the min-max problem and a local minimizer of the penalty problem. To further reduce the computational cost, we also propose a low-complexity implementation of the APGDA algorithm, where the values of the variables will be fixed in later iterations once they satisfy the one-bit constraint. Numerical results show that, compared to the state-of-the-art CI-based algorithms, both of the proposed algorithms generally achieve better bit-error-rate (BER) performance with lower computational cost.
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing to past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages to predict a more acceptable answer so as to address the convergence suppression problem occurred in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.