A hybrid encryption (HE) system is an efficient public key encryption system for arbitrarily long messages. An HE system consists of a public key component called key encapsulation mechanism (KEM), and a symmetric key component called data encapsulation mechanism (DEM). The HE encryption algorithm uses a KEM generated key k to encapsulate the message using DEM, and send the ciphertext together with the encapsulaton of k, to the decryptor who decapsulates k and uses it to decapsulate the message using the corresponding KEM and DEM components. The KEM/DEM composition theorem proves that if KEM and DEM satisfy well-defined security notions, then HE will be secure with well defined security. We introduce HE in correlated randomness model where the encryption and decryption algorithms have samples of correlated random variables that are partially leaked to the adversary. Security of the new KEM/DEM paradigm is defined against computationally unbounded or polynomially bounded adversaries. We define iKEM and cKEM with respective information theoretic computational security, and prove a composition theorem for them and a computationally secure DEM, resulting in secure HEs with proved computational security (CPA and CCA) and without any computational assumption. We construct two iKEMs that provably satisfy the required security notions of the composition theorem. The iKEMs are used to construct two efficient quantum-resistant HEs when used with an AES based DEM. We also define and construct combiners with proved security that combine the new KEM/DEM paradigm of HE with the traditional public key based paradigm of HE.
Coded caching (CC) can substantially enhance network performance by leveraging memory as an additional communication resource. However, the use of CC is challenging in various practical applications due to dynamic user behavior. The existing solutions, based on shared caching, cannot directly handle all scenarios where users freely enter and depart the network at any time as they are constrained by specific conditions on network parameters. This paper proposes a universally applicable shared-caching scheme for dynamic setups without any restriction on network parameters. The closed-form expressions for the achievable degrees of freedom (DoF) are computed for the resulting generalized scheme, and are shown to achieve the existing optimal bounds of the shared-cache model. Furthermore, a successive-interference-cancellation-free extension based on a fast iterative optimized beamformer design is devised to optimize the use of excess spatial dimensions freed by cache-aided interference cancellation. Extensive numerical experiments are carried out to assess the performance of the proposed scheme. In particular, the results demonstrate that while a dynamic setup may achieve a DoF substantially lower than the optimal DoF of shared caching, our proposed scheme significantly improves the performance at the finite signal-to-noise ratio compared to unicasting, which only benefits from the local caching gain.
High-level synthesis, source-to-source compilers, and various Design Space Exploration techniques for pragma insertion have significantly improved the Quality of Results of generated designs. These tools offer benefits such as reduced development time and enhanced performance. However, achieving high-quality results often requires additional manual code transformations and tiling selections, which are typically performed separately or as pre-processing steps. Although DSE techniques enable code transformation upfront, the vastness of the search space often limits the exploration of all possible code transformations, making it challenging to determine which transformations are necessary. Additionally, ensuring correctness remains challenging, especially for complex transformations and optimizations. To tackle this obstacle, we first propose a comprehensive framework leveraging HLS compilers. Our system streamlines code transformation, pragma insertion, and tiles size selection for on-chip data caching through a unified optimization problem, aiming to enhance parallelization, particularly beneficial for computation-bound kernels. Them employing a novel Non-Linear Programming (NLP) approach, we simultaneously ascertain transformations, pragmas, and tile sizes, focusing on regular loop-based kernels. Our evaluation demonstrates that our framework adeptly identifies the appropriate transformations, including scenarios where no transformation is necessary, and inserts pragmas to achieve a favorable Quality of Results.
The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting vs. contradicting perspectives, for the downstream system to make a fair judgment call. In this work, we study whether retrievers can recognize and respond to different perspectives of the queries -- beyond finding relevant documents for a claim, can retrievers distinguish supporting vs. opposing documents? We reform and extend six existing tasks to create a benchmark for retrieval, where we have diverse perspectives described in free-form text, besides root, neutral queries. We show that current retrievers covered in our experiments have limited awareness of subtly different perspectives in queries and can also be biased toward certain perspectives. Motivated by the observation, we further explore the potential to leverage geometric features of retriever representation space to improve the perspective awareness of retrievers in a zero-shot manner. We demonstrate the efficiency and effectiveness of our projection-based methods on the same set of tasks. Further analysis also shows how perspective awareness improves performance on various downstream tasks, with 4.2% higher accuracy on AmbigQA and 29.9% more correlation with designated viewpoints on essay writing, compared to non-perspective-aware baselines.
The rapid advancement in Large Language Models (LLMs) has markedly enhanced the capabilities of language understanding and generation. However, the substantial model size poses hardware challenges, affecting both memory size for serving and inference latency for token generation. To address those challenges, we propose Dependency-aware Semi-structured Sparsity (DaSS), a novel method for the recent prevalent SwiGLU-based LLMs pruning. Our approach incorporates structural dependency into the weight magnitude-based unstructured pruning. We introduce an MLP-specific pruning metric that evaluates the importance of each weight by jointly considering its magnitude and its corresponding MLP intermediate activation norms. DaSS facilitates a balance between the adaptability offered by unstructured pruning and the structural consistency inherent in dependency-based structured pruning. Empirical evaluations on Mistral and LLaMA2 model families demonstrate that DaSS not only outperforms both SparseGPT and Wanda in achieving hardware-friendly N:M sparsity patterns but also maintains the computational efficiency of Wanda.
Programming recurrent spiking neural networks (RSNNs) to robustly perform multi-timescale computation remains a difficult challenge. To address this, we show how the distributed approach offered by vector symbolic architectures (VSAs), which uses high-dimensional random vectors as the smallest units of representation, can be leveraged to embed robust multi-timescale dynamics into attractor-based RSNNs. We embed finite state machines into the RSNN dynamics by superimposing a symmetric autoassociative weight matrix and asymmetric transition terms. The transition terms are formed by the VSA binding of an input and heteroassociative outer-products between states. Our approach is validated through simulations with highly non-ideal weights; an experimental closed-loop memristive hardware setup; and on Loihi 2, where it scales seamlessly to large state machines. This work demonstrates the effectiveness of VSA representations for embedding robust computation with recurrent dynamics into neuromorphic hardware, without requiring parameter fine-tuning or significant platform-specific optimisation. This advances VSAs as a high-level representation-invariant abstract language for cognitive algorithms in neuromorphic hardware.
Initial alignment is one of the key technologies in strapdown inertial navigation system (SINS) to provide initial state information for vehicle attitude and navigation. For some situations, such as the attitude heading reference system, the position is not necessarily required or even available, then the self-alignment that does not rely on any external aid becomes very necessary. This study presents a new self-alignment method under swaying conditions, which can determine the latitude and attitude simultaneously by utilizing all observation vectors without solving the Wahba problem, and it is different from the existing methods. By constructing the dyadic tensor of each observation and reference vector itself, all equations related to observation and reference vectors are accumulated into one equation, where the latitude variable is extracted and solved according to the same eigenvalues of similar matrices on both sides of the equation, meanwhile the attitude is obtained by eigenvalue decomposition. Simulation and experiment tests verify the effectiveness of the proposed methods, and the alignment result is better than TRIAD in convergence speed and stability and comparable with OBA method in alignment accuracy with or without latitude. It is useful for guiding the design of initial alignment in autonomous vehicle applications.
The quest for robust Person re-identification (Re-ID) systems capable of accurately identifying subjects across diverse scenarios remains a formidable challenge in surveillance and security applications. This study presents a novel methodology that significantly enhances Person Re-Identification (Re-ID) by integrating Uncertainty Feature Fusion (UFFM) with Wise Distance Aggregation (WDA). Tested on benchmark datasets - Market-1501, DukeMTMC-ReID, and MSMT17 - our approach demonstrates substantial improvements in Rank-1 accuracy and mean Average Precision (mAP). Specifically, UFFM capitalizes on the power of feature synthesis from multiple images to overcome the limitations imposed by the variability of subject appearances across different views. WDA further refines the process by intelligently aggregating similarity metrics, thereby enhancing the system's ability to discern subtle but critical differences between subjects. The empirical results affirm the superiority of our method over existing approaches, achieving new performance benchmarks across all evaluated datasets. Code is available on Github.
Answering questions that require reading texts in an image is challenging for current models. One key difficulty of this task is that rare, polysemous, and ambiguous words frequently appear in images, e.g., names of places, products, and sports teams. To overcome this difficulty, only resorting to pre-trained word embedding models is far from enough. A desired model should utilize the rich information in multiple modalities of the image to help understand the meaning of scene texts, e.g., the prominent text on a bottle is most likely to be the brand. Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively. Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes. The updated nodes have better features for the downstream question answering module. Experimental evaluations show that our MM-GNN represents the scene texts better and obviously facilitates the performances on two VQA tasks that require reading scene texts.
Manually labeling objects by tracing their boundaries is a laborious process. In Polygon-RNN++ the authors proposed Polygon-RNN that produces polygonal annotations in a recurrent manner using a CNN-RNN architecture, allowing interactive correction via humans-in-the-loop. We propose a new framework that alleviates the sequential nature of Polygon-RNN, by predicting all vertices simultaneously using a Graph Convolutional Network (GCN). Our model is trained end-to-end. It supports object annotation by either polygons or splines, facilitating labeling efficiency for both line-based and curved objects. We show that Curve-GCN outperforms all existing approaches in automatic mode, including the powerful PSP-DeepLab and is significantly more efficient in interactive mode than Polygon-RNN++. Our model runs at 29.3ms in automatic, and 2.6ms in interactive mode, making it 10x and 100x faster than Polygon-RNN++.
We propose a novel single shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector, by a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground-truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16 based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.