Autonomous vehicles necessitate a delicate balance between safety, efficiency, and user preferences in trajectory planning. Existing traditional or learning-based methods face challenges in adequately addressing all these aspects. In response, this paper proposes a novel component termed the Logical Guidance Layer (LGL), designed for seamless integration into autonomous driving trajectory planning frameworks, specifically tailored for highway scenarios. The LGL guides the trajectory planning with a local target area determined through scenario reasoning, scenario evaluation, and guidance area calculation. Integrating the Responsibility-Sensitive Safety (RSS) model, the LGL ensures formal safety guarantees while accommodating various user preferences defined by logical formulae. Experimental validation demonstrates the effectiveness of the LGL in achieving a balance between safety and efficiency, and meeting user preferences in autonomous highway driving scenarios.
Despite careful safety alignment, current large language models (LLMs) remain vulnerable to various attacks. To further unveil the safety risks of LLMs, we introduce a Safety Concept Activation Vector (SCAV) framework, which effectively guides the attacks by accurately interpreting LLMs' safety mechanisms. We then develop an SCAV-guided attack method that can generate both attack prompts and embedding-level attacks with automatically selected perturbation hyperparameters. Both automatic and human evaluations demonstrate that our attack method significantly improves the attack success rate and response quality while requiring less training data. Additionally, we find that our generated attack prompts may be transferable to GPT-4, and the embedding-level attacks may also be transferred to other white-box LLMs whose parameters are known. Our experiments further uncover the safety risks present in current LLMs. For example, we find that six out of seven open-source LLMs that we attack consistently provide relevant answers to more than 85\% malicious instructions. Finally, we provide insights into the safety mechanism of LLMs.
Advanced AI systems may be developed which exhibit capabilities that present significant risks to public safety or security. They may also exhibit capabilities that may be applied defensively in a wide set of domains, including (but not limited to) developing societal resilience against AI threats. We propose Coordinated Disclosure of Dual-Use Capabilities (CDDC) as a process to guide early information-sharing between advanced AI developers, US government agencies, and other private sector actors about these capabilities. The process centers around an information clearinghouse (the "coordinator") which receives evidence of dual-use capabilities from finders via mandatory and/or voluntary reporting pathways, and passes noteworthy reports to defenders for follow-up (i.e., further analysis and response). This aims to provide the US government, dual-use foundation model developers, and other actors with an overview of AI capabilities that could significantly impact public safety and security, as well as maximal time to respond.
Large language models (LLMs) show inherent brittleness in their safety mechanisms, as evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This study explores this brittleness of safety alignment by leveraging pruning and low-rank modifications. We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels. Surprisingly, the isolated regions we find are sparse, comprising about $3\%$ at the parameter level and $2.5\%$ at the rank level. Removing these regions compromises safety without significantly impacting utility, corroborating the inherent brittleness of the model's safety mechanisms. Moreover, we show that LLMs remain vulnerable to low-cost fine-tuning attacks even when modifications to the safety-critical regions are restricted. These findings underscore the urgent need for more robust safety strategies in LLMs.
In analog circuits, process variation can cause unpredictability in circuit performance. Common-centroid (CC) type layouts have been shown to mitigate process-induced variations and are widely used to match circuit elements. Nevertheless, selecting the most suitable CC topology necessitates careful consideration of important layout constraints. Manual handling of these constraints becomes challenging, especially with large size problems. State-of-the-art CC placement methods lack an optimization framework to handle important layout constraints collectively. They also require manual efforts and consequently, the solutions can be suboptimal. To address this, we propose a unified framework based on multi-objective optimization for CC placement of analog transistors. Our method handles various constraints, including degree of dispersion, routing complexity, diffusion sharing, and layout dependent effects. The multi-objective optimization provides better handling of the objectives when compared to single-objective optimization. Moreover, compared to existing methods, our method explores more CC topologies. Post-layout simulation results show better performance compared to state-of-the-art techniques in generating CC layouts.
The significant advancements in large language models (LLMs) give rise to a promising research direction, i.e., leveraging LLMs as recommenders (LLMRec). The efficacy of LLMRec arises from the open-world knowledge and reasoning capabilities inherent in LLMs. LLMRec acquires the recommendation capabilities through instruction tuning based on user interaction data. However, in order to protect user privacy and optimize utility, it is also crucial for LLMRec to intentionally forget specific user data, which is generally referred to as recommendation unlearning. In the era of LLMs, recommendation unlearning poses new challenges for LLMRec in terms of \textit{inefficiency} and \textit{ineffectiveness}. Existing unlearning methods require updating billions of parameters in LLMRec, which is costly and time-consuming. Besides, they always impact the model utility during the unlearning process. To this end, we propose \textbf{E2URec}, the first \underline{E}fficient and \underline{E}ffective \underline{U}nlearning method for LLM\underline{Rec}. Our proposed E2URec enhances the unlearning efficiency by updating only a few additional LoRA parameters, and improves the unlearning effectiveness by employing a teacher-student framework, where we maintain multiple teacher networks to guide the unlearning process. Extensive experiments show that E2URec outperforms state-of-the-art baselines on two real-world datasets. Specifically, E2URec can efficiently forget specific data without affecting recommendation performance. The source code is at \url{//github.com/justarter/E2URec}.
Estimating the probability of failure is an important step in the certification of safety-critical systems. Efficient estimation methods are often needed due to the challenges posed by high-dimensional input spaces, risky test scenarios, and computationally expensive simulators. This work frames the problem of black-box safety validation as a Bayesian optimization problem and introduces a method that iteratively fits a probabilistic surrogate model to efficiently predict failures. The algorithm is designed to search for failures, compute the most-likely failure, and estimate the failure probability over an operating domain using importance sampling. We introduce three acquisition functions that aim to reduce uncertainty by covering the design space, optimize the analytically derived failure boundaries, and sample the predicted failure regions. Results show this Bayesian safety validation approach provides a more accurate estimate of failure probability with orders of magnitude fewer samples and performs well across various safety validation metrics. We demonstrate this approach on three test problems, a stochastic decision making system, and a neural network-based runway detection system. This work is open sourced (//github.com/sisl/BayesianSafetyValidation.jl) and currently being used to supplement the FAA certification process of the machine learning components for an autonomous cargo aircraft.
Modern microservice systems have gained widespread adoption due to their high scalability, flexibility, and extensibility. However, the characteristics of independent deployment, decentralization, and frequent dynamic interactions also introduce the risk of cascading failures, making it challenging to achieve accurate failure diagnosis and rapid system recovery. These issues severely impact operation efficiency and user experience. Recognizing the crucial role of failure diagnosis in enhancing the stability and reliability of microservice systems, researchers have conducted extensive studies and achieved a series of significant outcomes. This survey provides a comprehensive review and primary analysis of 94 papers from 2003 to the present, including an overview of the fundamental concepts, a research framework, and problem statements. These insights aim to help researchers understand the latest research progress in failure diagnosis. Publicly available datasets, toolkits, and evaluation metrics are also compiled to assist practitioners in selecting and validating various techniques, providing a foundation to advance the domain beyond current practices.
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
Autonomic computing investigates how systems can achieve (user) specified control outcomes on their own, without the intervention of a human operator. Autonomic computing fundamentals have been substantially influenced by those of control theory for closed and open-loop systems. In practice, complex systems may exhibit a number of concurrent and inter-dependent control loops. Despite research into autonomic models for managing computer resources, ranging from individual resources (e.g., web servers) to a resource ensemble (e.g., multiple resources within a data center), research into integrating Artificial Intelligence (AI) and Machine Learning (ML) to improve resource autonomy and performance at scale continues to be a fundamental challenge. The integration of AI/ML to achieve such autonomic and self-management of systems can be achieved at different levels of granularity, from full to human-in-the-loop automation. In this article, leading academics, researchers, practitioners, engineers, and scientists in the fields of cloud computing, AI/ML, and quantum computing join to discuss current research and potential future directions for these fields. Further, we discuss challenges and opportunities for leveraging AI and ML in next generation computing for emerging computing paradigms, including cloud, fog, edge, serverless and quantum computing environments.