Inferring unknown constraints is a challenging and crucial problem in many robotics applications. When only expert demonstrations are available, it becomes essential to infer the unknown domain constraints to deploy additional agents effectively. In this work, we propose an approach to infer affine constraints in control tasks after observing expert demonstrations. We formulate the constraint inference problem as an inverse optimization problem, and we propose an alternating optimization scheme that infers the unknown constraints by minimizing a KKT residual objective. We demonstrate the effectiveness of our method in a number of simulations and show that it infers less conservative constraints than a recent baseline method while maintaining comparable safety guarantees.
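As a rough illustration of how an alternating KKT-residual scheme of this kind could be set up, the following minimal NumPy sketch infers a single affine constraint a^T x <= b from demonstrations assumed optimal under a known cost. The function name, penalty weight, and update rules are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def infer_affine_constraint(xs, grad_f, iters=500, lr=1e-2, mu=1.0):
    """Alternating minimisation of a KKT residual (illustrative sketch).

    xs     : (N, d) demonstrated states, assumed optimal under the constraint
    grad_f : callable returning the known cost gradient at a state
    Returns an estimate (a, b) of an affine constraint a^T x <= b.
    """
    N, d = xs.shape
    rng = np.random.default_rng(0)
    a = rng.normal(size=d); a /= np.linalg.norm(a)
    b = float(np.max(xs @ a))                  # start with a feasible offset
    G = np.stack([grad_f(x) for x in xs])      # (N, d) cost gradients

    for _ in range(iters):
        # Step 1: multipliers given (a, b): least-squares minimiser of the
        # stationarity residual, projected onto lam_i >= 0.
        lam = np.maximum(0.0, -(G @ a) / (a @ a))

        # Step 2: (a, b) given lam: gradient step on the stationarity
        # residual plus a complementary-slackness penalty.
        slack = xs @ a - b
        grad_a = 2 * (G.T @ lam + (lam ** 2).sum() * a) + 2 * mu * ((lam * slack) @ xs)
        grad_b = -2 * mu * np.sum(lam * slack)
        a -= lr * grad_a
        b -= lr * grad_b
        a /= np.linalg.norm(a)                 # remove the scale ambiguity

    return a, b

# usage with a known quadratic cost f(x) = ||x||^2 and synthetic demonstrations
xs = np.array([[1.0, 1.0], [1.0, 0.5], [1.0, -0.5]])
a_hat, b_hat = infer_affine_constraint(xs, grad_f=lambda x: 2 * x)
```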
Social metaverse is a shared digital space combining a series of interconnected virtual worlds for users to play, shop, work, and socialize. In parallel with the advances of artificial intelligence (AI) and growing awareness of data privacy concerns, federated learning (FL) is promoted as a paradigm shift towards a privacy-preserving, AI-empowered social metaverse. However, challenges including the privacy-utility tradeoff, learning reliability, and AI model theft hinder the deployment of FL in real metaverse applications. In this paper, we exploit the pervasive social ties among users/avatars to advance a social-aware hierarchical FL framework, named SocialFL, for a better privacy-utility tradeoff in the social metaverse. Then, an aggregator-free robust FL mechanism based on blockchain is devised with a new block structure and an improved consensus protocol featuring on/off-chain collaboration. Furthermore, based on smart contracts and digital watermarks, an automatic federated AI (FedAI) model ownership provenance mechanism is designed to prevent AI model theft and collusion among avatars in the social metaverse. Experimental findings validate the feasibility and effectiveness of the proposed framework. Finally, we envision promising future research directions in this emerging area.
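To make the hierarchical-aggregation idea concrete, here is a minimal sketch of two-level federated averaging over social clusters. It is only an illustration under assumed inputs (the function `hierarchical_fedavg` and the cluster lists are hypothetical) and deliberately omits the blockchain-based aggregator-free mechanism, privacy machinery, and watermark-based provenance described above.

```python
import numpy as np

def hierarchical_fedavg(client_weights, client_sizes, clusters):
    """Two-level FedAvg: average clients within each social cluster first,
    then average the cluster models into a global model."""
    cluster_models, cluster_sizes = [], []
    for members in clusters:                        # e.g. [[0, 2], [1, 3, 4]]
        sizes = np.array([client_sizes[i] for i in members], dtype=float)
        stacked = np.stack([client_weights[i] for i in members])
        cluster_models.append(np.average(stacked, axis=0, weights=sizes))
        cluster_sizes.append(sizes.sum())
    return np.average(np.stack(cluster_models), axis=0, weights=cluster_sizes)

# usage: 5 clients with toy 3-D "models", grouped into two social clusters
weights = [np.full(3, i, dtype=float) for i in range(5)]
sizes = [10, 20, 10, 5, 5]
global_model = hierarchical_fedavg(weights, sizes, clusters=[[0, 2], [1, 3, 4]])
```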
The prominence of embodied Artificial Intelligence (AI), which empowers robots to navigate, perceive, and engage within virtual environments, has attracted significant attention, owing to the remarkable advancements in computer vision and large language models. Privacy emerges as a pivotal concern within the realm of embodied AI, as robots access substantial personal information. However, the issue of privacy leakage in embodied AI tasks, particularly in relation to decision-making algorithms, has not received adequate consideration in research. This paper aims to address this gap by proposing an attack on the Deep Q-Learning algorithm, utilizing gradient inversion to reconstruct states, actions, and Q-values. The choice of using gradients for the attack is motivated by the fact that commonly employed federated learning techniques utilize only gradients computed from private user data to optimize models, without storing or transmitting the data to public servers. Nevertheless, these gradients contain sufficient information to potentially expose private data. To validate our approach, we conduct experiments on the AI2THOR simulator and evaluate our algorithm on active perception, a prevalent task in embodied AI. The experimental results convincingly demonstrate the effectiveness of our method in successfully recovering all information from the data across all 120 room layouts.
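A minimal PyTorch sketch of the gradient-inversion idea applied to a DQN-style loss. The toy network, hyperparameters, and the assumption that the attacker already knows the discrete action (in practice it can be recovered by enumerating the small action set) are simplifications, not the paper's setup.

```python
import torch
import torch.nn as nn

# toy Q-network; a real attack would target the victim's architecture
qnet = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))

def dqn_loss(net, state, action, target_q):
    q = net(state).gather(1, action.view(-1, 1)).squeeze(1)
    return ((q - target_q) ** 2).mean()

# --- victim side: gradient shared during federated training --------------
true_state = torch.randn(1, 8)
true_action = torch.tensor([2])          # assumed known here for brevity
true_target = torch.tensor([1.5])
shared_grads = torch.autograd.grad(
    dqn_loss(qnet, true_state, true_action, true_target),
    list(qnet.parameters()))

# --- attacker side: recover state and Q-target by matching gradients -----
dummy_state = torch.randn(1, 8, requires_grad=True)
dummy_target = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([dummy_state, dummy_target], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    g = torch.autograd.grad(
        dqn_loss(qnet, dummy_state, true_action, dummy_target),
        list(qnet.parameters()), create_graph=True)
    match = sum(((gi - si) ** 2).sum() for gi, si in zip(g, shared_grads))
    match.backward()
    opt.step()

print("state reconstruction error:", (dummy_state - true_state).norm().item())
```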
We consider robot learning in the context of shared autonomy, where control of the system can switch between a human teleoperator and autonomous control. In this setting we address reinforcement learning and learning from demonstration, where there is a cost associated with human time. This cost represents the human time required to teleoperate the robot or to recover the robot from failures. For each episode, the agent must choose between requesting human teleoperation or using one of its autonomous controllers. In our approach, we learn to predict the success probability of each controller given the initial state of an episode. This prediction is used in a contextual multi-armed bandit algorithm to choose the controller for the episode. A controller is learnt online from demonstrations and reinforcement learning so that autonomous performance improves and the system becomes less reliant on the teleoperator with more experience. We show that our approach to controller selection reduces the human cost of performing two simulated tasks and a single real-world task.
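A minimal sketch of one way such success-prediction-driven controller selection could look: per-arm logistic models over the initial state, with an epsilon-greedy choice that trades predicted failure cost against a fixed human cost for teleoperation. The class name, cost convention, and exploration scheme are illustrative assumptions, not the paper's exact bandit algorithm.

```python
import numpy as np

class ControllerBandit:
    """Contextual bandit over controllers; arm 0 is the human teleoperator."""

    def __init__(self, n_controllers, state_dim, human_cost=0.3, explore=0.1):
        self.w = np.zeros((n_controllers, state_dim))  # per-arm logistic weights
        self.human_cost = human_cost                   # cost of requesting teleop
        self.explore = explore

    def success_prob(self, state):
        return 1.0 / (1.0 + np.exp(-self.w @ state))   # one probability per arm

    def choose(self, state):
        if np.random.rand() < self.explore:
            return np.random.randint(len(self.w))
        cost = 1.0 - self.success_prob(state)          # expected failure cost
        cost[0] = self.human_cost                      # teleop assumed to succeed
        return int(np.argmin(cost))

    def update(self, arm, state, success, lr=0.1):
        p = self.success_prob(state)[arm]
        self.w[arm] += lr * (success - p) * state      # logistic-regression SGD step

# usage: teleop plus two autonomous controllers, 5-D initial-state features
bandit = ControllerBandit(n_controllers=3, state_dim=5)
s = np.random.randn(5)
arm = bandit.choose(s)
bandit.update(arm, s, success=1.0)
```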
Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy. To overcome various learning and policy adaptation problems, we propose using a latent space to index a demonstration dataset, instantly access similar relevant experiences, and copy behaviour from these situations. Actions from a selected similar situation can be performed by the agent until the representations of the agent's current situation and the selected experience diverge in the latent space. Thus, we formulate our control problem as a search problem over a dataset of experts' demonstrations. We test our approach on the MineRL BASALT dataset in the latent representation of a Video PreTraining model. We compare our model to state-of-the-art Minecraft agents. Our approach effectively recovers meaningful demonstrations and exhibits human-like agent behaviour in the Minecraft environment across a wide variety of scenarios. Experimental results reveal that the performance of our search-based approach is comparable to that of trained models, while allowing zero-shot task adaptation by changing the demonstration examples.
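A minimal sketch of the search-and-copy control loop, written against assumed callables (`encode` standing in for a latent encoder such as VPT, `get_obs` and `act` for environment observation and stepping); the divergence threshold is illustrative.

```python
import numpy as np

def latent_search_control(encode, demo_latents, demo_actions,
                          get_obs, act, divergence_threshold=2.0, max_steps=1000):
    """Index demonstrations by latent code and copy actions from the nearest
    situation until the agent's latent state diverges from it."""
    t = 0
    while t < max_steps:
        z = encode(get_obs())
        # nearest demonstration frame in latent space
        dists = np.linalg.norm(demo_latents - z, axis=1)
        idx = int(np.argmin(dists))
        # replay the expert's actions from that frame onward
        while idx < len(demo_actions) and t < max_steps:
            act(demo_actions[idx])
            t += 1
            idx += 1
            if idx >= len(demo_latents):
                break
            z_now = encode(get_obs())
            if np.linalg.norm(z_now - demo_latents[idx]) > divergence_threshold:
                break   # the agent's situation no longer matches; search again
```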
Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference-time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC aspects, we develop an activation-function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between $4\times$ and $90\times$ speedups in inference time on large models with up to $23$ million parameters while maintaining competitive inference accuracy.
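As a toy illustration of the polynomial-replacement idea (the actual degree, fitting interval, coefficients, and MPC protocol used in the paper are not reproduced here), a least-squares quadratic fit to ReLU can serve as a drop-in activation whose only non-linear operation is a single multiplication:

```python
import numpy as np

# least-squares degree-2 polynomial fit to ReLU on an illustrative interval
xs = np.linspace(-5, 5, 1001)
a2, a1, a0 = np.polyfit(xs, np.maximum(xs, 0), deg=2)  # highest power first

def poly_act(x):
    """Drop-in ReLU replacement a2*x^2 + a1*x + a0; the squaring is one
    multiplication, so it maps to a single round of an MPC multiplication."""
    return a2 * x * x + a1 * x + a0

print(poly_act(np.array([-2.0, 0.0, 2.0])))
```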
The function or performance of a network is strongly dependent on its robustness, which quantifies the ability of the network to continue functioning under perturbations. While a wide variety of robustness metrics have been proposed, each has its limitations. In this paper, we propose using the forest index as a measure of network robustness, which overcomes the deficiencies of existing metrics. Using this measure as an optimization criterion, we propose and study the problem of breaking down a network by attacking some key edges. We show that the objective function of the problem is monotonic but not submodular, which makes the problem more challenging. We thus resort to greedy algorithms extended to non-submodular functions, iteratively deleting the most promising edges. We first propose a simple greedy algorithm with a proven approximation-ratio bound and cubic time complexity. To address the computational challenge posed by large networks, we further propose an improved greedy algorithm that runs in nearly linear time, significantly speeding up edge selection while sacrificing little accuracy. Extensive experimental results on a large set of real-world networks verify the effectiveness and efficiency of our algorithms, demonstrating that they outperform several baseline schemes.
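A minimal sketch of the simple (cubic-time-style) greedy attack loop; the nearly-linear-time variant is not shown. The robustness score below uses the trace of (I + L)^{-1} purely as a placeholder, so substitute the paper's exact forest-index definition.

```python
import numpy as np
import networkx as nx

def attack_objective(G):
    # Placeholder score that grows as the graph fragments; replace with the
    # paper's forest-index-based objective.
    L = nx.laplacian_matrix(G).toarray().astype(float)
    return float(np.trace(np.linalg.inv(np.eye(G.number_of_nodes()) + L)))

def greedy_edge_attack(G, k):
    """At each step, delete the single edge whose removal increases the
    attack objective the most (simple greedy loop)."""
    G = G.copy()
    removed = []
    for _ in range(k):
        best_edge, best_score = None, -np.inf
        for e in list(G.edges()):
            G.remove_edge(*e)
            score = attack_objective(G)
            G.add_edge(*e)
            if score > best_score:
                best_edge, best_score = e, score
        G.remove_edge(*best_edge)
        removed.append(best_edge)
    return removed

# usage on a small random graph
print(greedy_edge_attack(nx.gnp_random_graph(30, 0.15, seed=1), k=3))
```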
Robots operating alongside humans often encounter unfamiliar environments that make autonomous task completion challenging. Though improving models and increasing dataset size can enhance a robot's performance in unseen environments, dataset generation and model refinement may be impractical in every unfamiliar environment. Approaches that utilize human demonstration through manual operation can aid in generalizing to these unfamiliar environments, but they often require significant human effort and expertise to achieve satisfactory task performance. To address these challenges, we propose leveraging part-time human interaction to redirect robots during failed task execution. We train a lightweight help policy that allows robots to learn when to proceed autonomously or request human assistance at times of uncertainty. By incorporating part-time human intervention, robots can recover quickly from their mistakes. Our best-performing policy yields a 20 percent increase in path-length-weighted success with only a 21 percent human interaction ratio. This approach provides a practical means for robots to interact with and learn from humans in real-world settings, facilitating effective task completion without the need for significant human intervention.
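A minimal sketch of what a lightweight help policy could look like: a small classifier over a handful of uncertainty features that decides whether to continue autonomously or request redirection. The feature set, architecture, and `HelpPolicy` name are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class HelpPolicy(nn.Module):
    """Binary classifier: 0 = continue autonomously, 1 = request human help."""
    def __init__(self, feat_dim=4, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, features):
        return self.net(features)

policy = HelpPolicy()
# illustrative features: action entropy, collision flag, steps stuck, progress
features = torch.tensor([[1.2, 0.0, 3.0, 0.4]])
ask_for_help = policy(features).argmax(dim=-1).item() == 1
```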
The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning. Various breakthroughs ranging from games to robotics have spurred interest in designing sophisticated RL algorithms and systems. However, the prevailing workflow in RL is to learn tabula rasa, which may incur computational inefficiency. This precludes continuous deployment of RL algorithms and potentially excludes researchers without large-scale computing resources. In many other areas of machine learning, the pretraining paradigm has been shown to be effective in acquiring transferable knowledge, which can be utilized for a variety of downstream tasks. Recently, there has been a surge of interest in pretraining for deep RL, with promising results. However, much of the research has been based on different experimental settings. Due to the nature of RL, pretraining in this field faces unique challenges and hence requires new design principles. In this survey, we seek to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.
Multi-view networks are ubiquitous in real-world applications. In order to extract knowledge or business value, it is of interest to transform such networks into representations that are easily machine-actionable. Meanwhile, network embedding has emerged as an effective approach for generating distributed network representations. Therefore, we are motivated to study the problem of multi-view network embedding, with a focus on the characteristics that are specific and important in embedding this type of network. In our practice of embedding real-world multi-view networks, we identify two such characteristics, which we refer to as preservation and collaboration. We then explore the feasibility of achieving better embedding quality by simultaneously modeling preservation and collaboration, and propose the mvn2vec algorithms. With experiments on a series of synthetic datasets, an internal Snapchat dataset, and two public datasets, we further confirm the presence and importance of preservation and collaboration. These experiments also demonstrate that better embedding can be obtained by simultaneously modeling the two characteristics, without over-complicating the model or requiring additional supervision.
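A toy parameterisation in the spirit of modelling preservation and collaboration jointly: each node keeps one embedding per view, softly tied to a shared component. The mixing scheme, hyperparameter, and class name are illustrative and not the exact mvn2vec formulation.

```python
import torch
import torch.nn as nn

class MultiViewEmbedding(nn.Module):
    """Per-view embeddings (preservation) mixed with a shared embedding
    (collaboration); theta controls the trade-off."""
    def __init__(self, n_nodes, n_views, dim, theta=0.5):
        super().__init__()
        self.view_emb = nn.Parameter(0.01 * torch.randn(n_views, n_nodes, dim))
        self.shared_emb = nn.Parameter(0.01 * torch.randn(n_nodes, dim))
        self.theta = theta  # 0 = pure preservation, 1 = pure collaboration

    def forward(self, view, nodes):
        return (1 - self.theta) * self.view_emb[view, nodes] \
               + self.theta * self.shared_emb[nodes]

# usage: look up three nodes in view 0
emb = MultiViewEmbedding(n_nodes=100, n_views=2, dim=16)
vectors = emb(view=0, nodes=torch.tensor([3, 7, 42]))
```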
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions that fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
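A simplified, unlearned variant of such a deep-feature distance, for illustration: unit-normalise VGG-16 activations at a few layers and average the squared differences. The layer choice and the absence of learned per-channel weights are simplifications relative to the metric described above.

```python
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
layer_ids = {3, 8, 15, 22}  # relu1_2, relu2_2, relu3_3, relu4_3 (illustrative)

def deep_feature_distance(x, y):
    """Average squared difference of unit-normalised activations."""
    d, fx, fy = 0.0, x, y
    for i, layer in enumerate(vgg):
        fx, fy = layer(fx), layer(fy)
        if i in layer_ids:
            nx = fx / (fx.norm(dim=1, keepdim=True) + 1e-10)
            ny = fy / (fy.norm(dim=1, keepdim=True) + 1e-10)
            d = d + ((nx - ny) ** 2).mean()
    return d

# usage: two random "images"
with torch.no_grad():
    a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    print(deep_feature_distance(a, b).item())
```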