Processors with extensible instruction sets are often used today as programmable hardware accelerators for various domains. When extending RISC-V and other similar extensible processor architectures, the task of designing specialized instructions arises. This task can be solved automatically by using instruction synthesis algorithms. In this paper, we consider algorithms that can be used in addition to the known approaches and that improve the synthesized instruction sets by recomputing common operations of a program (operations whose result is consumed by multiple other operations) inside clustered synthesized instructions (the common operations clustering algorithm), and by identifying redundant synthesized instructions, i.e., those that have equivalents among the other instructions (the subsuming functions algorithm). Experimental evaluations of the developed algorithms are presented for tests from the domains of cryptography and three-dimensional graphics. For the Magma cipher test, the common operations clustering algorithm reduces the size of the compiled code by 9%, and the subsuming functions algorithm reduces the size of the synthesized instruction set extension by a factor of 2. For the AES cipher test, the common operations clustering algorithm reduces the size of the compiled code by 10%, and the subsuming functions algorithm reduces the size of the synthesized instruction set extension by a factor of 2.5. Finally, for the instruction set extension from the Volume Ray-Casting test, the additional use of the subsuming functions algorithm reduces the problem-specific instruction set extension from 5 to only 2 instructions without losing functionality.
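As a rough illustration of the redundancy check behind the subsuming functions idea (not the paper's actual algorithm), the following Python sketch flags a candidate instruction as potentially subsumed when another candidate reproduces its outputs on a large random sample of operands; all instruction definitions here are hypothetical examples.

```python
# Hedged sketch: a sampling-based heuristic for spotting candidate "subsumed"
# instructions, i.e. synthesized instructions whose function is reproduced by
# another instruction. The candidate instructions below are made up for illustration.
import random

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

# Each candidate instruction is modeled as a function of two 32-bit operands.
candidates = {
    "add_rotl8":     lambda a, b: rotl32((a + b) & 0xFFFFFFFF, 8),
    "add_rotl8_dup": lambda a, b: rotl32((a + b) & 0xFFFFFFFF, 8),  # equivalent to add_rotl8
    "xor_rotl11":    lambda a, b: rotl32(a ^ b, 11),
}

def find_subsumed(cands, samples=10_000, seed=0):
    rng = random.Random(seed)
    tests = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(samples)]
    names = list(cands)
    subsumed = set()
    for i, ni in enumerate(names):
        for nj in names[i + 1:]:
            if all(cands[ni](a, b) == cands[nj](a, b) for a, b in tests):
                subsumed.add(nj)  # nj behaves identically to ni on all sampled operands
    return subsumed

print(find_subsumed(candidates))  # {'add_rotl8_dup'}
```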
In today's software projects, enhancements, fixes, and patches need to be delivered to clients almost daily. Weekly and daily releases are pretty much the norm and sit alongside larger feature upgrades and quarterly releases. Software delivery has to be more agile now than ever before. Companies that were, in the past, experimenting with agile delivery models are now looking to scale them to enterprise grade. This shifts the need from the ability to build and execute tests rapidly to using different means, technologies, and procedures to provide rapid and insightful validation sequences and tests that establish quality within the manufacturing cycle. This book addresses the need to effectively embed quality engineering throughout the agile development cycle, thereby supporting enterprise-scale, high-quality agile development.
Goal recognition is a fundamental cognitive process that enables individuals to infer intentions based on available cues. Current goal recognition algorithms often take only observed actions as input, but here we use a Bayesian framework to explore the role of actions, timing, and goal solvability in goal recognition. We analyze human responses to goal-recognition problems in the Sokoban domain, and find that actions are assigned most importance, but that timing and solvability also influence goal recognition in some cases, especially when actions are uninformative. We leverage these findings to develop a goal recognition model that matches human inferences more closely than do existing algorithms. Our work provides new insight into human goal recognition and takes a step towards more human-like AI models.
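A minimal sketch of the kind of Bayesian inference the framework describes: the posterior over goals combines a prior with likelihoods of the observed cues (actions, timing, solvability) under each goal. The goals, cue names, and probability values below are illustrative assumptions, not numbers from the paper.

```python
# Hedged sketch of Bayesian goal inference: P(goal | cues) ∝ P(cues | goal) * P(goal).
# Goals, cues, and likelihood values are illustrative only.
def goal_posterior(prior, likelihoods, observed_cues):
    """prior: {goal: P(goal)}; likelihoods: {goal: {cue: P(cue | goal)}}."""
    unnorm = {}
    for goal, p in prior.items():
        for cue in observed_cues:
            p *= likelihoods[goal][cue]
        unnorm[goal] = p
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

prior = {"goal_A": 0.5, "goal_B": 0.5}
likelihoods = {
    # An uninformative action cue, but telling timing and solvability cues.
    "goal_A": {"action_left": 0.5, "long_pause": 0.7, "still_solvable": 0.9},
    "goal_B": {"action_left": 0.5, "long_pause": 0.2, "still_solvable": 0.4},
}
print(goal_posterior(prior, likelihoods, ["action_left", "long_pause", "still_solvable"]))
```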
Sequential recommender systems identify user preferences from past interactions to accurately predict subsequent items. Although traditional deep-learning-based models and modern transformer-based models in previous studies capture unidirectional and bidirectional patterns within user-item interactions, the importance of temporal contexts, such as individual behavioral and societal trend patterns, remains underexplored. Notably, recent models often neglect similarities in users' actions that occur implicitly among users during analogous timeframes, a concept we term vertical temporal proximity. These models primarily adapt the transformer's self-attention mechanism to consider the temporal context of individual user actions. However, this adaptation remains limited in considering horizontal temporal proximity within item interactions, such as distinguishing between subsequent item purchases within a week versus within a month. To address these gaps, we propose a sequential recommendation model called TemProxRec, which combines contrastive learning and self-attention to consider temporal proximities both across and within user-item interactions. The proposed contrastive learning method learns item representations so that items selected in close temporal periods across different users are embedded close together. Simultaneously, the proposed self-attention mechanism encodes temporal and positional contexts in a user sequence using both absolute and relative embeddings. This way, TemProxRec accurately predicts the relevant items based on the user-item interactions within a specific timeframe. We validate this work through comprehensive experiments, in which TemProxRec consistently outperforms existing models on benchmark datasets and which show the significance of incorporating vertical and horizontal temporal proximities into sequential recommendation.
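A hedged PyTorch sketch of the vertical-proximity idea: item embeddings of interactions falling in the same time bucket, even when they come from different users, are pulled together with an InfoNCE-style contrastive loss. The embedding dimensions, bucket assignment, and temperature are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: InfoNCE-style loss pulling together item embeddings whose
# interactions fall in the same time bucket across different users.
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(item_emb, time_bucket, temperature=0.1):
    """item_emb: (N, d) embeddings of N interactions (possibly from many users).
    time_bucket: (N,) integer bucket id per interaction (e.g., week index)."""
    z = F.normalize(item_emb, dim=-1)
    sim = z @ z.t() / temperature                       # (N, N) similarity matrix
    same_bucket = time_bucket.unsqueeze(0) == time_bucket.unsqueeze(1)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = same_bucket & ~eye                       # positives: same bucket, not self
    logits = sim.masked_fill(eye, float("-inf"))        # exclude self-similarity
    log_prob = F.log_softmax(logits, dim=1)
    n_pos = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / n_pos
    return loss[pos_mask.any(dim=1)].mean()             # skip rows with no positive

emb = torch.randn(8, 64, requires_grad=True)
buckets = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(temporal_contrastive_loss(emb, buckets))
```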
Signature-based Intrusion Detection Systems (SIDSs) are traditionally used to detect malicious activity in networks. A notable example of such a system is Snort, which compares network traffic against a series of rules that match known exploits. Current SIDS rules are designed to minimize the amount of legitimate traffic flagged incorrectly, reducing the burden on network administrators. However, use cases other than the traditional one, such as researchers studying trends or analyzing modified versions of known exploits, may require SIDSs to be less constrained in their operation. In this paper, we demonstrate that applying modifications to real-world SIDS rules allows for relaxing some constraints and characterizing the performance space of modified rules. We develop an iterative approach for exploring the space of modifications to SIDS rules. By taking the modifications that expand the ROC curve of performance and altering them further, we show how to modify rules in a directed manner. Using traffic collected from a cloud telescope and labeled as benign or malicious, we find that the removal of a single component from SIDS rules has the largest impact on the performance space. Effectively modifying SIDS rules to reduce constraints can enable a broader range of detection for various objectives, from increased security to research purposes.
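A hedged, toy sketch of the directed exploration idea: treat a rule as a set of required components, try removing one component at a time, score each variant's true/false positive rates against labeled traffic, and keep the variants that expand detection. The rule components and traffic records below are fabricated stand-ins, not Snort rules or telescope data.

```python
# Toy sketch: explore single-component removals from a rule and keep variants
# that improve the TPR at acceptable FPR. Rules and traffic are illustrative.
from itertools import combinations

def matches(rule_components, flow):
    return all(flow.get(k) == v for k, v in rule_components)

def score(rule_components, traffic):
    tp = sum(1 for flow, mal in traffic if mal and matches(rule_components, flow))
    fp = sum(1 for flow, mal in traffic if not mal and matches(rule_components, flow))
    n_mal = sum(1 for _, mal in traffic if mal)
    n_ben = len(traffic) - n_mal
    return fp / max(n_ben, 1), tp / max(n_mal, 1)        # (FPR, TPR)

rule = (("dport", 80), ("payload_sig", "abc"), ("flags", "S"))
traffic = [
    ({"dport": 80, "payload_sig": "abc", "flags": "S"}, True),
    ({"dport": 80, "payload_sig": "abc", "flags": "A"}, True),   # modified exploit
    ({"dport": 80, "payload_sig": "xyz", "flags": "S"}, False),
    ({"dport": 443, "payload_sig": "abc", "flags": "S"}, False),
]

base_fpr, base_tpr = score(rule, traffic)
for variant in combinations(rule, len(rule) - 1):         # drop one component
    fpr, tpr = score(variant, traffic)
    if tpr > base_tpr:                                     # variant expands detection
        print("keep variant", variant, "FPR/TPR:", fpr, tpr)
```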
Security is a growing problem that needs hardware support. Memristors provide an alternative technology for hardware-supported security implementation. This paper presents a specific technique that utilizes the benefits of hybrid CMOS-memristor technology, demonstrated with SHA3, over implementations that use only memristor technology. In the proposed technique, SHA3 is implemented in a set of perpendicular crossbar arrays structured to facilitate logic implementation and circular bit rotation (the Rho operation), which is perhaps the most complex operation in SHA3 when carried out in memristor arrays. The Rho operation itself is implemented with CMOS multiplexers (MUXs). The proposed accelerator is free of standby power and circumvents the memory access bottleneck of conventional computers. In addition, our design obscures the intermediate values from the I/O interface and outperforms state-of-the-art memristor-based designs in terms of size and energy. Demonstrating the memristor implementation of SHA3 provides an impetus for utilizing memristors in information security applications.
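To make the Rho step concrete, here is a plain-Python reference sketch of Keccak's circular lane rotations, following the rho "walk" from the Keccak specification; the paper's contribution is mapping these rotations onto CMOS MUXes, which is not modeled here.

```python
# Hedged reference sketch of SHA-3's Rho step (circular rotations of 64-bit lanes).
# Offsets are generated by the rho walk from the Keccak specification.
def rotl64(lane, n):
    n %= 64
    return ((lane << n) | (lane >> (64 - n))) & 0xFFFFFFFFFFFFFFFF

def rho(state):
    """state: dict mapping (x, y) -> 64-bit lane, for x, y in 0..4."""
    out = {(0, 0): state[(0, 0)]}          # lane (0,0) is not rotated
    x, y = 1, 0
    for t in range(24):
        out[(x, y)] = rotl64(state[(x, y)], (t + 1) * (t + 2) // 2)
        x, y = y, (2 * x + 3 * y) % 5      # move to the next lane in the walk
    return out

state = {(x, y): (x + 5 * y + 1) for x in range(5) for y in range(5)}
rotated = rho(state)
print(hex(rotated[(1, 0)]))  # lane (1,0) rotated left by 1
```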
Decentralized bilevel optimization has been actively studied in the past few years because of its widespread applications in machine learning. However, existing algorithms suffer from large communication complexity caused by the estimation of the stochastic hypergradient, limiting their applicability to real-world tasks. To address this issue, we develop a novel decentralized stochastic bilevel gradient descent algorithm for the heterogeneous setting, which enjoys a small communication cost in each round and a small number of communication rounds. As such, it achieves a much better communication complexity than existing algorithms without any strong assumptions regarding heterogeneity. To the best of our knowledge, this is the first stochastic algorithm achieving these theoretical results under the heterogeneous setting. Finally, the experimental results confirm the efficacy of our algorithm.
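A minimal NumPy sketch of the decentralized communication pattern only (plain gossip-based decentralized SGD, not the paper's bilevel algorithm): in each round every node takes a local stochastic gradient step and then averages its iterate with neighbors through a doubly stochastic mixing matrix. The objective, topology, and step size are illustrative assumptions.

```python
# Hedged sketch: gossip-based decentralized stochastic updates over a ring topology.
# Illustrates the per-round communication pattern, not the paper's bilevel method.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, rounds, lr = 4, 5, 200, 0.1

# Heterogeneous local objectives: f_i(x) = 0.5 * ||x - target_i||^2
targets = rng.normal(size=(n_nodes, dim))
x = np.zeros((n_nodes, dim))                       # one local iterate per node

# Ring topology with a symmetric, doubly stochastic mixing matrix
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

for _ in range(rounds):
    grads = (x - targets) + 0.01 * rng.normal(size=x.shape)   # stochastic gradients
    x = W @ (x - lr * grads)                                   # local step, then gossip

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))
```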
Integrated sensing and communication (ISAC) is widely recognized as a fundamental enabler for future wireless communications. In this paper, we present a joint communication and radar beamforming framework for maximizing the sum spectral efficiency (SE) while guaranteeing the desired radar performance with imperfect channel state information (CSI) in multi-user, multi-target ISAC systems. To this end, we adopt either the radar transmit beam mean square error (MSE) or the receive signal-to-clutter-plus-noise ratio (SCNR) as the radar performance constraint of a sum SE maximization problem. To resolve inherent challenges such as non-convexity and imperfect CSI, we reformulate the problems and identify first-order optimality conditions for the joint radar and communication beamformer. Turning the condition into a nonlinear eigenvalue problem with eigenvector dependency (NEPv), we develop an alternating method that finds the joint beamformer through power iteration and the Lagrangian multiplier through binary search. The proposed framework encompasses both radar metrics and is robust to channel estimation error with low complexity. Simulations validate the proposed methods. In particular, we observe that the MSE and SCNR constraints exhibit complementary performance depending on the operating environment, which demonstrates the importance of the proposed comprehensive and robust optimization framework.
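A hedged toy sketch of the NEPv mechanism mentioned above: a fixed-point, power-iteration-style update v ← H(v)v / ||H(v)v|| for an eigenvector-dependent matrix H(v). The matrix below is a made-up symmetric example, not the ISAC beamforming operator, and the binary search over the Lagrangian multiplier is omitted.

```python
# Hedged toy sketch: solving H(v) v = lambda * v, where H depends on v,
# by repeated power-style updates until a fixed point is reached.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
A = A @ A.T + 6 * np.eye(6)              # symmetric positive definite base matrix

def H(v):
    return A + 0.5 * np.diag(v ** 2)     # illustrative eigenvector-dependent term

v = np.ones(6) / np.sqrt(6)
for _ in range(200):
    w = H(v) @ v                          # power-iteration style update
    v_new = w / np.linalg.norm(w)
    if np.linalg.norm(v_new - v) < 1e-10:
        break
    v = v_new

lam = v @ H(v) @ v                        # Rayleigh quotient at the fixed point
print("residual:", np.linalg.norm(H(v) @ v - lam * v))
```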
Training advanced AI models requires large investments in computational resources, or compute. Yet, as hardware innovation reduces the price of compute and algorithmic advances make its use more efficient, the cost of training an AI model to a given performance falls over time - a concept we describe as increasing compute efficiency. We find that while an access effect increases the number of actors who can train models to a given performance over time, a performance effect simultaneously increases the performance available to each actor. This potentially enables large compute investors to pioneer new capabilities, maintaining a performance advantage even as capabilities diffuse. Since large compute investors tend to develop new capabilities first, it will be particularly important that they share information about their AI models, evaluate them for emerging risks, and, more generally, make responsible development and release decisions. Further, as compute efficiency increases, governments will need to prepare for a world where dangerous AI capabilities are widely available - for instance, by developing defenses against harmful AI models or by actively intervening in the diffusion of particularly dangerous capabilities.
Although large language models (LLMs) are impressive in solving various tasks, they can quickly become outdated after deployment. Maintaining their up-to-date status is a pressing concern in the current era. This paper provides a comprehensive review of recent advances in aligning LLMs with the ever-changing world knowledge without re-training from scratch. We categorize research works systematically and provide in-depth comparisons and discussion. We also discuss existing challenges and highlight future directions to facilitate research in this field. We release the paper list at https://github.com/hyintell/awesome-refreshing-llms
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc., and 2) the instance-level shift, such as object appearance, size, etc. We build our approach on the recent state-of-the-art Faster R-CNN model and design two domain adaptation components, at the image level and the instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory and are implemented by learning a domain classifier in an adversarial training manner. The domain classifiers at different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets, including Cityscapes, KITTI, and SIM10K. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.
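A common way to realize the adversarial domain classifier described above is a gradient reversal layer (GRL); the PyTorch sketch below shows that pattern, with the feature extractor and classifier left as toy placeholders (the original approach builds on Faster R-CNN, which is not reproduced here).

```python
# Hedged sketch: gradient reversal layer (GRL) for adversarial training of a
# domain classifier, so that the shared features become domain-invariant.
# The feature extractor and classifier are toy stand-ins, not Faster R-CNN.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # flip the gradient flowing to features

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

features = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # stand-in backbone
domain_clf = nn.Sequential(nn.Linear(64, 2))               # image- or instance-level classifier

x = torch.randn(32, 128)                                    # a batch of pooled features
domain_labels = torch.randint(0, 2, (32,))                  # 0 = source, 1 = target

feats = features(x)
domain_logits = domain_clf(grad_reverse(feats, lambd=0.1))
loss = nn.functional.cross_entropy(domain_logits, domain_labels)
loss.backward()  # domain_clf learns to discriminate; the backbone receives reversed gradients
```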