The adoption of processing-in-memory (PiM) architectures has been gaining momentum because they provide high performance and low energy consumption by alleviating the data movement bottleneck. Yet, the security of such architectures has not been thoroughly explored. The adoption of PiM solutions provides a new way to directly access main memory, which can be potentially exploited by malicious user applications. We show that this new way to access main memory opens opportunities for high-throughput timing attack vectors that are hard-to-mitigate without significant performance overhead. We introduce IMPACT, a set of high-throughput main memory-based timing attacks that leverage characteristics of PiM architectures to establish covert and side channels. IMPACT enables high-throughput communication and private information leakage. To achieve this, IMPACT (i) eliminates expensive cache bypassing steps required by processor-centric main memory and cache-based timing attacks and (ii) leverages the intrinsic parallelism of PiM operations. First, we showcase two covert-channel attack variants that run on the host CPU and leverage PiM architectures to gain direct and fast access to main memory and establish high-throughput communication covert channels. Second, we showcase a side-channel attack on a DNA sequence analysis application that leaks the private characteristics of a user's sample genome by leveraging PiM operations. Our results demonstrate that (i) our covert channels achieve up to 14.16 Mb/s communication throughput, which is 6.38x faster than the state-of-the-art main memory-based covert channels, and (ii) our side-channel attack allows the attacker to determine the properties of a sample genome at a throughput of 7.5 Mb/s with 96% accuracy. We discuss and evaluate several countermeasures for IMPACT to enable secure and robust PiM architectures.
Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks, which aims at designing a simple prompt like ``Let's think step by step'' or multiple in-context exemplars with well-designed rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often come with mistakes, making unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, where we aim at eliciting LLMs to generate explicit pieces of knowledge evidence in the form of structure triple. This is inspired by our human behaviors, i.e., we can draw a mind map or knowledge map as the reasoning evidence in the brain before answering a complex question. Benefiting from CoK, we additionally introduce a F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For the unreliable response, the wrong evidence can be indicated to prompt the LLM to rethink. Extensive experiments demonstrate that our method can further improve the performance of commonsense, factual, symbolic, and arithmetic reasoning tasks.
Recent strides in nonlinear model predictive control (NMPC) underscore a dependence on numerical advancements to efficiently and accurately solve large-scale problems. Given the substantial number of variables characterizing typical whole-body optimal control (OC) problems - often numbering in the thousands - exploiting the sparse structure of the numerical problem becomes crucial to meet computational demands, typically in the range of a few milliseconds. Addressing the linear-quadratic regulator (LQR) problem is a fundamental building block for computing Newton or Sequential Quadratic Programming (SQP) steps in direct optimal control methods. This paper concentrates on equality-constrained problems featuring implicit system dynamics and dual regularization, a characteristic of advanced interiorpoint or augmented Lagrangian solvers. Here, we introduce a parallel algorithm for solving an LQR problem with dual regularization. Leveraging a rewriting of the LQR recursion through block elimination, we first enhanced the efficiency of the serial algorithm and then subsequently generalized it to handle parametric problems. This extension enables us to split decision variables and solve multiple subproblems concurrently. Our algorithm is implemented in our nonlinear numerical optimal control library ALIGATOR. It showcases improved performance over previous serial formulations and we validate its efficacy by deploying it in the model predictive control of a real quadruped robot.
Non-fungible tokens (NFTs) are becoming increasingly popular in Play-to-Earn (P2E) Web3 applications as a means of incentivizing user engagement. In Web3, users with NFTs ownership are entitled to monetize them. However, due to lack of objective NFT valuation, which makes NFT value determination challenging, P2E applications ecosystems have experienced inflation. In this paper, we propose a method that enables NFT inflation value management in P2E applications. Our method leverages the contribution-rewards model proposed by Curve Finance and the automated market maker (AMM) of decentralized exchanges. In decentralized systems, P2E Web3 applications inclusive, not all participants contribute in good faith. Therefore, rewards are provided to incentivize contribution. Our mechanism proves that burning NFTs, indicating the permanent removal of NFTs, contributes to managing inflation by reducing the number of NFTs in circulation. As a reward for this contribution, our method mints a compensation (CP) token as an ERC-20 token, which can be exchanged for NFTs once enough tokens have been accumulated. To further increase the value of the CP token, we suggest using governance tokens and CP tokens to create liquidity pools for AMM. The value of the governance token is determined by the market, and the CP token derives its value from the governance token in AMM. The CP token can determine its worth based on the market value of the governance token. Additionally, since CP tokens are used for exchanging NFTs, the value of the NFT is ultimately determined by the value of the CP token. To further illustrate our concept, we show how to adjust burning rewards based on factors such as the probability of upgrading NFTs' rarity or the current swap ratio of governance and CP tokens in AMM.
Yoshimura origami is a classical folding pattern that has inspired many deployable structure designs. Its applications span from space exploration, kinetic architectures, and soft robots to even everyday household items. However, despite its wide usage, Yoshimura has been fixated on a set of design constraints to ensure its flat-foldability. Through extensive kinematic analysis and prototype tests, this study presents a new Yoshimura that intentionally defies these constraints. Remarkably, one can impart a unique meta-stability by using the Golden Ratio angle to define the triangular facets of a generalized Yoshimura. As a result, when its facets are strategically popped out, a ``Golden Ratio Yoshimura'' boom with $m$ modules can be theoretically reconfigured into $8^m$ geometrically unique and load-bearing shapes. This result not only challenges the existing design norms but also opens up a new avenue to create deployable and versatile structural systems.
Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. This becomes especially crucial in varied and challenging environments where solely relying on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over past three years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this article, we present a systematic survey of recent progress to bridge this gap. We begin by briefly introducing a background that formally defines various 3D multi-modal tasks and summarizes their inherent challenges. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.
Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e.g., GPT-3 and Swin Transformer. Although originally designed for prediction problems, it is natural to inquire about their suitability for sequential decision-making and reinforcement learning problems, which are typically beset by long-standing issues involving sample efficiency, credit assignment, and partial observability. In recent years, sequence models, especially the Transformer, have attracted increasing interest in the RL communities, spawning numerous approaches with notable effectiveness and generalizability. This survey presents a comprehensive overview of recent works aimed at solving sequential decision-making tasks with sequence models such as the Transformer, by discussing the connection between sequential decision-making and sequence modeling, and categorizing them based on the way they utilize the Transformer. Moreover, this paper puts forth various potential avenues for future research intending to improve the effectiveness of large sequence models for sequential decision-making, encompassing theoretical foundations, network architectures, algorithms, and efficient training systems. As this article has been accepted by the Frontiers of Computer Science, here is an early version, and the most up-to-date version can be found at //journal.hep.com.cn/fcs/EN/10.1007/s11704-023-2689-5
Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) achieves so by leveraging less training data needed by finetuning or distillation. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with much fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task.
Molecular design and synthesis planning are two critical steps in the process of molecular discovery that we propose to formulate as a single shared task of conditional synthetic pathway generation. We report an amortized approach to generate synthetic pathways as a Markov decision process conditioned on a target molecular embedding. This approach allows us to conduct synthesis planning in a bottom-up manner and design synthesizable molecules by decoding from optimized conditional codes, demonstrating the potential to solve both problems of design and synthesis simultaneously. The approach leverages neural networks to probabilistically model the synthetic trees, one reaction step at a time, according to reactivity rules encoded in a discrete action space of reaction templates. We train these networks on hundreds of thousands of artificial pathways generated from a pool of purchasable compounds and a list of expert-curated templates. We validate our method with (a) the recovery of molecules using conditional generation, (b) the identification of synthesizable structural analogs, and (c) the optimization of molecular structures given oracle functions relevant to drug discovery.
Deep neural network architectures have traditionally been designed and explored with human expertise in a long-lasting trial-and-error process. This process requires huge amount of time, expertise, and resources. To address this tedious problem, we propose a novel algorithm to optimally find hyperparameters of a deep network architecture automatically. We specifically focus on designing neural architectures for medical image segmentation task. Our proposed method is based on a policy gradient reinforcement learning for which the reward function is assigned a segmentation evaluation utility (i.e., dice index). We show the efficacy of the proposed method with its low computational cost in comparison with the state-of-the-art medical image segmentation networks. We also present a new architecture design, a densely connected encoder-decoder CNN, as a strong baseline architecture to apply the proposed hyperparameter search algorithm. We apply the proposed algorithm to each layer of the baseline architectures. As an application, we train the proposed system on cine cardiac MR images from Automated Cardiac Diagnosis Challenge (ACDC) MICCAI 2017. Starting from a baseline segmentation architecture, the resulting network architecture obtains the state-of-the-art results in accuracy without performing any trial-and-error based architecture design approaches or close supervision of the hyperparameters changes.
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis for attention to be considered. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state-of-the-art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge.