When humans and autonomous systems operate together as what we refer to as a hybrid team, we wish to ensure that the team operates successfully and effectively. We refer to team members as agents. In our proposed framework, we address the case of hybrid teams in which, at any time, only one team member (the control agent) is authorized to act as control for the team. To determine the best selection of a control agent, we propose the addition of an AI manager, trained via reinforcement learning, which learns as an outside observer of the team. The manager learns a model of behavior linking observations of agent performance to the environment in which the team is operating, and from these observations selects the most desirable control agent. We restrict the manager's task by introducing a set of constraints. The constraints define acceptable team operation, so a violation occurs when the team enters an unacceptable condition that requires manager intervention. To keep the added complexity and potential inefficiency for the team minimal, the manager should minimize the number of times the team reaches a constraint violation and requires subsequent intervention. Our manager therefore optimizes its selection of authorized agents to boost overall team performance while minimizing the frequency of manager intervention. We demonstrate the manager's performance in a simulated driving scenario representing a hybrid team composed of a human driver and an autonomous driving system. We perform experiments in this driving scenario with interfering vehicles, which create the need for collision avoidance and proper speed control. Our results indicate a positive impact of our manager, with some cases reaching team performance of up to ~187% of the best solo agent's performance.
In this paper, we consider standard quantum information decoupling, in which Alice aims to decouple her system from the environment by local operations and by discarding some of her systems. To achieve an $\varepsilon$-decoupling with trace distance as the error criterion, we establish a near-optimal one-shot characterization of the largest dimension of the remainder system in terms of the conditional $(1-\varepsilon)$-hypothesis-testing entropy. When the underlying system is independent and identically prepared, our result leads to the matched second-order rate as well as the matched moderate deviation rate. As an application, we find an achievability bound for entanglement distillation, where the objective is for Alice and Bob to transform their quantum state into a maximally entangled state of the largest possible dimension using only local operations and one-way classical communication.
Lengthy evaluation times are common in many optimization problems such as direct policy search tasks, especially when they involve conducting evaluations in the physical world, e.g. in robotics applications. Often, when evaluating a solution over a fixed time period, it becomes clear that the objective value will not increase with additional computation time (for example, when a two-wheeled robot continuously spins on the spot). In such cases, it makes sense to stop the evaluation early to save computation time. However, most approaches to stopping the evaluation are problem-specific and need to be designed for the task at hand. Therefore, we propose an early stopping method for direct policy search. The proposed method only looks at the objective value at each time step and requires no problem-specific knowledge. We test the introduced stopping criterion in five direct policy search environments drawn from games, robotics, and classic control domains, and show that it can save up to 75% of the computation time. We also compare it with problem-specific stopping criteria and show that it performs comparably, while being more generally applicable.
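To make the idea of a problem-agnostic stopping rule concrete, the following is a minimal sketch of one possible criterion that looks only at the per-step objective value. It is not the criterion proposed in the abstract, which is not specified there; the patience rule and the names `evaluate_with_early_stopping`, `grace_period`, `min_improvement`, and `spinning_robot` are assumptions made purely for illustration.

```python
from typing import Iterable, Iterator, Tuple


def evaluate_with_early_stopping(
    step_objective: Iterable[float],
    grace_period: int = 20,        # hypothetical patience, in time steps
    min_improvement: float = 1e-6,  # hypothetical improvement threshold
) -> Tuple[float, int]:
    """Consume per-step objective values and stop once the best value has
    not improved for `grace_period` consecutive steps.

    Returns (best objective seen, number of steps actually evaluated).
    """
    best = float("-inf")
    steps_since_improvement = 0
    steps = 0
    for value in step_objective:
        steps += 1
        if value > best + min_improvement:
            best = value
            steps_since_improvement = 0
        else:
            steps_since_improvement += 1
            if steps_since_improvement >= grace_period:
                break  # the objective has stalled; stop the episode early
    return best, steps


def spinning_robot(horizon: int = 200) -> Iterator[float]:
    """Toy episode: the robot makes progress for 10 steps, then spins."""
    for t in range(1, horizon + 1):
        yield float(min(t, 10))


best, steps = evaluate_with_early_stopping(spinning_robot())
print(best, steps)  # prints 10.0 30
```

In this toy episode the evaluation ends after 30 of the 200 available steps, because the objective stalls at step 10 and the patience window of 20 steps then expires.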
In this study, we introduce a toll lane framework that optimizes the mixed flow of autonomous and high-occupancy vehicles on freeways, where human-driven and autonomous vehicles of varying commuter occupancy share a segment. Autonomous vehicles, with their ability to maintain shorter headways, boost traffic throughput. Our framework designates a toll lane for autonomous vehicles with high occupancy to use free of charge, while others pay a toll. We explore the lane choice equilibria when all vehicles minimize travel costs, and characterize the equilibria by ranking vehicles by their mobility enhancement potential, a concept we term the mobility degree. Through numerical examples, we demonstrate the framework's utility in addressing design challenges such as setting optimal tolls, determining occupancy thresholds, and designing lane policies, showing how it facilitates the integration of high-occupancy and autonomous vehicles. We also propose an algorithm for assigning rational tolls to decrease total commuter delay and examine the effects of toll non-compliance. Our findings suggest that self-interest-driven behavior mitigates moderate non-compliance impacts, highlighting the framework's resilience. This work presents a pioneering comprehensive analysis of a toll lane framework that emphasizes the coexistence of autonomous and high-occupancy vehicles, offering insights for traffic management improvements and the integration of autonomous vehicles into existing transportation infrastructures.
In this work, we present a reward-driven automated curriculum reinforcement learning approach for interaction-aware self-driving at unsignalized intersections, taking into account the uncertainties associated with surrounding vehicles (SVs). These uncertainties encompass the uncertainty of SVs' driving intention and also the quantity of SVs. To deal with this problem, the curriculum set is specifically designed to accommodate a progressively increasing number of SVs. By implementing an automated curriculum selection mechanism, the importance weights are rationally allocated across various curricula, thereby facilitating improved sample efficiency and training outcomes. Furthermore, the reward function is meticulously designed to guide the agent towards effective policy exploration. Thus the proposed framework could proactively address the above uncertainties at unsignalized intersections by employing the automated curriculum learning technique that progressively increases task difficulty, and this ensures safe self-driving through effective interaction with SVs. Comparative experiments are conducted in $Highway\_Env$, and the results indicate that our approach achieves the highest task success rate, attains strong robustness to initialization parameters of the curriculum selection module, and exhibits superior adaptability to diverse situational configurations at unsignalized intersections. Furthermore, the effectiveness of the proposed method is validated using the high-fidelity CARLA simulator.
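As an illustration of what an automated curriculum selection mechanism with importance weights could look like, here is a minimal sketch under stated assumptions. The abstract does not describe the actual mechanism, so the softmax sampling, the additive update driven by a reward-based progress signal, and the names `CurriculumSelector`, `learning_rate`, and `progress_signal` are all hypothetical.

```python
import math
import random
from typing import List


class CurriculumSelector:
    """Keeps one preference per curriculum and samples curricula with
    softmax probabilities (an assumed scheme, not the paper's)."""

    def __init__(self, num_curricula: int, learning_rate: float = 0.1, seed: int = 0):
        self.preferences: List[float] = [0.0] * num_curricula
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(self.preferences)
        exps = [math.exp(p - top) for p in self.preferences]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self) -> int:
        """Draw the index of the curriculum to train on next."""
        indices = range(len(self.preferences))
        return self.rng.choices(indices, weights=self.probabilities())[0]

    def update(self, index: int, progress_signal: float) -> None:
        """Raise (or lower) a curriculum's preference according to a
        reward-derived progress signal observed after training on it."""
        self.preferences[index] += self.learning_rate * progress_signal


# Hypothetical usage: three curricula with an increasing number of SVs.
selector = CurriculumSelector(num_curricula=3)
chosen = selector.sample()
selector.update(chosen, progress_signal=1.0)
print(selector.probabilities())
```

The design choice in this sketch is that curricula which currently yield more reward-based progress are sampled more often, while every curriculum keeps a nonzero probability of being revisited.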
Computer vision techniques play a central role in the perception stack of autonomous vehicles. Such methods are employed to perceive the vehicle's surroundings given sensor data. 3D LiDAR sensors are commonly used to collect sparse 3D point clouds from the scene. However, compared to human perception, such systems struggle to deduce the unseen parts of the scene from those sparse point clouds. In this context, the scene completion task aims to fill the gaps in the LiDAR measurements to achieve a more complete scene representation. Given the promising results of recent diffusion models as generative models for images, we propose extending them to achieve scene completion from a single 3D LiDAR scan. Previous works used diffusion models over range images extracted from LiDAR data, directly applying image-based diffusion methods. In contrast, we propose to operate directly on the points, reformulating the noising and denoising diffusion process such that it can work efficiently at scene scale. Together with our approach, we propose a regularization loss to stabilize the noise predicted during the denoising process. Our experimental evaluation shows that our method can complete the scene given a single LiDAR scan as input, producing a scene with more detail than state-of-the-art scene completion methods. We believe that our proposed diffusion process formulation can support further research on diffusion models applied to scene-scale point cloud data.
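For background, the sketch below shows the standard DDPM forward (noising) step applied directly to 3D points, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. This is the textbook formulation only, not the reformulated scene-scale process proposed in the abstract, whose details are not given there. The linear schedule values and the function names are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Tuple

Point = List[float]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances (assumed schedule values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    result, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        result.append(product)
    return result


def noise_points(points: List[Point], t: int, alpha_bars: List[float],
                 rng: random.Random) -> Tuple[List[Point], List[Point]]:
    """Apply the closed-form forward step to every point coordinate.

    Returns (noisy points, the Gaussian noise that was added); the noise
    is what a denoising network would be trained to predict.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in p] for p in points]
    noisy = [[signal * c + sigma * e for c, e in zip(p, n)]
             for p, n in zip(points, noise)]
    return noisy, noise


# Hypothetical usage on a tiny point cloud of three (x, y, z) points.
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
cloud = [[1.0, 2.0, 0.5], [-3.0, 0.0, 1.2], [10.0, -4.0, 0.0]]
noisy_cloud, added_noise = noise_points(cloud, t=500, alpha_bars=alpha_bars,
                                        rng=random.Random(0))
print(noisy_cloud)
```

Note that the standard step shrinks every coordinate toward the origin as $t$ grows, which is one reason a formulation tailored to scene-scale point clouds may be desirable.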
With the advancement of quantum technologies, there is a potential threat to traditional encryption systems based on integer factorization. Therefore, developing techniques for accurately measuring the performance of associated quantum algorithms is crucial, as it can provide insights into the practical feasibility from the current perspective. In this chapter, we aim to analyze the time required for integer factorization tasks using Shor's algorithm within a gate-based quantum circuit simulator of the matrix product state type. Additionally, we observe the impact of parameter pre-selection in Shor's algorithm. Specifically, this pre-selection is expected to increase the success rate of integer factorization by reducing the number of iterations and facilitating performance measurement under fixed conditions, thus enabling scalable performance evaluation even on real quantum hardware.
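To illustrate why the choice of parameters matters in Shor's algorithm, the following sketch runs the classical pre- and post-processing around the period-finding step. The period is found here by brute force, as a classical stand-in for the quantum subroutine, and reading "parameter pre-selection" as choosing the base `a` in advance is an assumption made for this example; the function names are hypothetical.

```python
from math import gcd
from typing import Optional, Tuple


def find_period_classically(a: int, n: int) -> int:
    """Smallest r > 0 with a**r = 1 (mod n); requires gcd(a, n) == 1.

    Brute-force stand-in for the quantum period-finding subroutine.
    """
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_base(a: int, n: int) -> Optional[Tuple[int, int]]:
    """Try to split n using base a (with 1 < a < n).

    Returns a pair of nontrivial factors, or None if this base fails and
    another iteration with a different base would be needed.
    """
    shared = gcd(a, n)
    if shared != 1:
        return shared, n // shared  # the base already shares a factor with n
    r = find_period_classically(a, n)
    if r % 2 == 1:
        return None  # odd period: this base cannot be used
    half_power = pow(a, r // 2, n)
    if half_power == n - 1:
        return None  # a**(r/2) = -1 (mod n): only trivial factors result
    p = gcd(half_power - 1, n)
    return p, n // p


print(factor_from_base(7, 15))   # prints (3, 5)
print(factor_from_base(14, 15))  # prints None

# Bases that would succeed in a single iteration for n = 15.
good_bases = [a for a in range(2, 15) if factor_from_base(a, 15) is not None]
print(good_bases)
```

Fixing a base that is known to succeed, such as 7 for n = 15, removes the retry loop and makes each run comparable, which matches the fixed-condition measurement described above.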
We consider a two-player dynamic information design problem between a principal and a receiver -- a game is played between the two agents on top of a Markovian system controlled by the receiver's actions, where the principal obtains and strategically shares some information about the underlying system with the receiver in order to influence their actions. In our setting, both players have long-term objectives, and the principal commits to their strategies sequentially rather than once at the beginning. Further, the principal cannot directly observe the system state, but at every turn they can choose randomized experiments to observe the system partially. The principal can share details about the experiments with the receiver. For our analysis we impose the truthful disclosure rule: the principal is required to truthfully announce the details and the result of each experiment to the receiver immediately after the experiment result is revealed. Based on the received information, the receiver takes an action when it is their turn, with the action influencing the state of the underlying system. We show that there exist Perfect Bayesian equilibria in this game where both agents play Canonical Belief Based (CBB) strategies, using a compressed version of their information, rather than full information, to choose experiments (for the principal) or actions (for the receiver). We also provide a backward inductive procedure to solve for an equilibrium in CBB strategies.
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; and yields stronger domain generalization performance as well. Code is available at //github.com/KaiyangZhou/CoOp.
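The difference between static and input-conditional prompts can be sketched as follows. This is a simplified illustration in plain Python, not the released implementation: it assumes that the image-conditional token is added to every learnable context vector, and the two-layer network, its fixed weights, and the names `meta_net`, `conditional_context`, and `params` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def meta_net(image_feature: Vector, params: Dict[str, list]) -> Vector:
    """Lightweight network mapping an image feature to one token."""
    hidden = relu(linear(params["w1"], params["b1"], image_feature))
    return linear(params["w2"], params["b2"], hidden)


def conditional_context(context_vectors: Matrix, image_feature: Vector,
                        params: Dict[str, list]) -> Matrix:
    """Shift every context vector by the image-conditional token
    (assumed combination rule), giving a per-image prompt context."""
    token = meta_net(image_feature, params)
    return [[c + t for c, t in zip(vec, token)] for vec in context_vectors]


# Hypothetical fixed weights: feature dim 4, hidden dim 2, token dim 4.
params = {
    "w1": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    "b1": [0.0, 0.0],
    "w2": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    "b2": [0.0, 0.0, 0.0, 0.0],
}
static_context = [[0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]

print(conditional_context(static_context, [1.0, 2.0, 0.0, 0.0], params))
print(conditional_context(static_context, [0.0, 0.0, 5.0, 5.0], params))
```

With a static prompt, both images would share `static_context` unchanged; here each image yields its own shifted context, which is the property the abstract credits for reduced sensitivity to class shift.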
In the era of deep learning, modeling for most NLP tasks has converged on several mainstream paradigms. For example, we usually adopt the sequence labeling paradigm to solve a range of tasks such as POS tagging, NER, and chunking, and adopt the classification paradigm to solve tasks like sentiment analysis. With the rapid progress of pre-trained language models, recent years have seen a rising trend of paradigm shift, that is, solving one NLP task by reformulating it as another. Paradigm shift has achieved great success on many tasks, becoming a promising way to improve model performance. Moreover, some of these paradigms have shown great potential to unify a large number of NLP tasks, making it possible to build a single model to handle diverse tasks. In this paper, we review this phenomenon of paradigm shift in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.