Fully Observable Non-Deterministic (FOND) planning models uncertainty through actions with non-deterministic effects. Existing FOND planning algorithms are effective and employ a wide range of techniques, but most of them are not robust when dealing with both non-determinism and task size. In this paper, we develop a novel iterative depth-first search algorithm that solves FOND planning tasks and produces strong cyclic policies. Our algorithm is explicitly designed for FOND planning, addressing the non-deterministic aspect of the problem more directly, and it exploits heuristic functions to make the iterative search more effective. We compare our proposed algorithm to well-known FOND planners and show that it performs robustly across several distinct types of FOND domains under different metrics.
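To make the notion of a strong cyclic policy concrete, here is a minimal sketch on a tiny explicit state space using the classical fixpoint construction, not the iterative depth-first search proposed in the abstract above; the states, actions, and non-deterministic effects are hypothetical.

```python
# Illustrative only: computes a strong cyclic policy on a toy FOND task via the
# classical fixpoint construction, not the paper's iterative depth-first search.

# Non-deterministic effects: (state, action) -> set of possible successor states.
effects = {
    ("s0", "move"): {"s1", "s0"},    # "move" may slip and stay in s0
    ("s1", "jump"): {"goal", "s0"},  # "jump" may fall back to s0
}
goal_states = {"goal"}

def strong_cyclic_policy(effects, goal_states):
    pairs = set(effects)  # candidate (state, action) pairs
    while True:
        covered = goal_states | {s for s, _ in pairs}
        # Keep pairs whose every outcome stays inside the covered region.
        pruned = {(s, a) for (s, a) in pairs if effects[(s, a)] <= covered}
        # Keep only pairs from whose state the goal is reachable under the policy.
        reachable = set(goal_states)
        changed = True
        while changed:
            changed = False
            for (s, a) in pruned:
                if s not in reachable and effects[(s, a)] & reachable:
                    reachable.add(s)
                    changed = True
        pruned = {(s, a) for (s, a) in pruned if s in reachable}
        if pruned == pairs:
            return dict(pairs)  # state -> action (one action per state here)
        pairs = pruned

print(strong_cyclic_policy(effects, goal_states))  # {'s0': 'move', 's1': 'jump'}
```

The resulting policy may revisit states (hence "cyclic"), but every execution keeps the goal reachable and eventually reaches it under fairness assumptions.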
The availability of large labeled datasets is a key component of the success of deep learning. However, annotating labels on large datasets is generally time-consuming and expensive. Active learning addresses the cost of labeling by selecting the most important samples to label. Diversity-based sampling algorithms are integral components of representation-based approaches to active learning. In this paper, we introduce a new diversity-based initial dataset selection algorithm that selects the most informative set of samples for initial labeling in the active learning setting. Self-supervised representation learning is used to account for the diversity of samples in the initial dataset selection. We also propose a novel active learning query strategy that applies diversity-based sampling to consistency-based embeddings. By combining consistency information with diversity in the consistency-based embedding scheme, the proposed method can select more informative samples for labeling in the semi-supervised learning setting. Comparative experiments show that the proposed method achieves compelling results on the CIFAR-10 and Caltech-101 datasets compared with previous active learning approaches by exploiting the diversity of unlabeled data.
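As an illustration of diversity-based initial selection on self-supervised embeddings, the sketch below uses a k-center greedy (core-set style) rule; the embedding source, budget, and selection rule are assumptions, not necessarily the paper's exact algorithm.

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, budget: int, seed: int = 0) -> list:
    """Greedy diversity-based selection: repeatedly pick the point farthest
    from the already-selected set (a core-set style heuristic)."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]  # arbitrary first pick
    # Distance of every point to its nearest selected point.
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dists))    # farthest point = most "novel"
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected

# Hypothetical usage: features from a self-supervised encoder.
features = np.random.randn(10_000, 128).astype(np.float32)
initial_labeled_indices = k_center_greedy(features, budget=100)
```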
This paper describes a resilient navigation and planning algorithm for high-speed autonomous racing in the Indy Autonomous Challenge (IAC). The IAC is a competition featuring full-scale autonomous race cars that drive at speeds of up to 290 km/h (180 mph). Due to the high speed and heavy vibration of the race car, the GPS/INS system is prone to degradation, and degraded GPS measurements can cause critical localization errors that lead to serious crashes. To this end, we propose a robust navigation system built on a multi-sensor fusion Kalman filter. We present how to identify measurement degradation using a probabilistic approach and, based on it, compute optimal measurement values for the Kalman filter correction step. We also present a complementary resilient navigation system that allows the race car to follow the race track even under fatal localization failures. In addition, this paper covers an optimal path planning algorithm for obstacle avoidance. To account for the original optimal racing line, obstacles, and vehicle dynamics, we propose a road-graph-based path planning algorithm that guarantees the race car stays within track bounds. In the experiments, we evaluate how the designed localization system handles degraded data and prevents serious crashes during high-speed driving, and we describe how we successfully completed the obstacle avoidance challenge.
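One common probabilistic way to flag a degraded GPS measurement inside a Kalman filter is an innovation (Mahalanobis) gate in the correction step. The sketch below illustrates that idea with a simple position-only measurement model; the state, noise values, and gating rule are assumptions, not the authors' exact formulation.

```python
import numpy as np
from scipy.stats import chi2

def kalman_update_with_gating(x, P, z, H, R, gate_prob=0.997):
    """Standard Kalman correction that rejects a measurement whose innovation
    is improbably large under the predicted uncertainty."""
    y = z - H @ x                            # innovation
    S = H @ P @ H.T + R                      # innovation covariance
    d2 = float(y @ np.linalg.inv(S) @ y)     # squared Mahalanobis distance
    if d2 > chi2.ppf(gate_prob, df=z.shape[0]):
        # Measurement looks degraded: skip the update (or inflate R instead).
        return x, P, False
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new, True

# Hypothetical 2D position state with a direct GPS position measurement.
x, P = np.zeros(2), np.eye(2) * 4.0
H, R = np.eye(2), np.eye(2) * 1.0
x, P, accepted = kalman_update_with_gating(x, P, z=np.array([50.0, -3.0]), H=H, R=R)
print(accepted)  # False: a 50 m jump is gated out as degraded
```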
Multi-view stereo is an important yet still challenging task in computer vision. In recent years, deep learning-based methods have shown superior performance on this task. Cost volume pyramid network-based methods, which progressively refine the depth map in a coarse-to-fine manner, have yielded promising results while consuming less memory. However, these methods do not fully consider the characteristics of the cost volume at each stage and therefore adopt similar depth range search strategies at every stage. In this work, we present a novel cost volume pyramid-based network with stage-specific search strategies for multi-view stereo. By choosing different depth range sampling strategies and applying adaptive unimodal filtering, we obtain more accurate depth estimates at low-resolution stages and iteratively upsample the depth map to arbitrary resolutions. We conducted extensive experiments on both the DTU and BlendedMVS datasets, and the results show that our method outperforms most state-of-the-art methods.
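A minimal sketch of the coarse-to-fine idea behind cost volume pyramids: the next stage samples depth hypotheses in a narrowed, per-pixel range around the upsampled coarse estimate. The stage count, interval width, and array shapes below are illustrative, not the paper's exact sampling strategy.

```python
import numpy as np

def next_stage_hypotheses(coarse_depth, num_hypotheses=8, interval=2.0):
    """Per-pixel depth hypotheses for the next (finer) pyramid stage,
    centred on the upsampled coarse estimate with a narrowed search range."""
    # Upsample the coarse depth map 2x with nearest-neighbour repetition.
    up = np.repeat(np.repeat(coarse_depth, 2, axis=0), 2, axis=1)
    # Offsets spanning [-interval/2, +interval/2] around each pixel's estimate.
    offsets = (np.arange(num_hypotheses) - (num_hypotheses - 1) / 2.0)
    offsets = offsets / (num_hypotheses - 1) * interval
    return up[None, :, :] + offsets[:, None, None]   # shape (D, 2h, 2w)

coarse = np.full((4, 5), 425.0)              # hypothetical coarse-stage depths (dataset units)
hyps = next_stage_hypotheses(coarse)
print(hyps.shape, hyps[:, 0, 0])             # (8, 8, 10) and 8 depths around 425
```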
Person search is a challenging task that aims at joint pedestrian detection and person re-identification (ReID). Previous works have made significant advances under fully and weakly supervised settings. However, existing methods ignore the generalization ability of person search models. In this paper, we take a further step and present Domain Adaptive Person Search (DAPS), which aims to generalize a model from a labeled source domain to an unlabeled target domain. Two major challenges arise under this new setting: one is how to simultaneously solve the domain misalignment issue for both the detection and ReID tasks, and the other is how to train the ReID subtask without reliable detection results on the target domain. To address these challenges, we propose a strong baseline framework with two dedicated designs. 1) We design a domain alignment module, including image-level and task-sensitive instance-level alignments, to minimize the domain discrepancy. 2) We take full advantage of the unlabeled data with a dynamic clustering strategy and employ pseudo bounding boxes to support ReID and detection training on the target domain. With the above designs, our framework achieves 34.7% mAP and 80.6% top-1 accuracy on the PRW dataset, surpassing the direct transfer baseline by a large margin. Surprisingly, the performance of our unsupervised DAPS model even surpasses some of the fully and weakly supervised methods. The code is available at //github.com/caposerenity/DAPS.
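To illustrate what a dynamic clustering step for the target domain can look like, here is a minimal sketch that re-clusters target-domain ReID embeddings with DBSCAN at the start of each epoch and treats cluster ids as pseudo identity labels; the epsilon, feature source, and outlier handling are assumptions rather than DAPS's exact settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize

def pseudo_identity_labels(embeddings: np.ndarray, eps: float = 0.6, min_samples: int = 4):
    """Cluster unlabeled target-domain ReID features; each cluster id becomes a
    pseudo identity, and noise points (label -1) are dropped from ReID training."""
    feats = normalize(embeddings)  # L2-normalize for cosine-like geometry
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    keep = labels != -1
    return labels, keep

# Hypothetical usage at the start of every training epoch, on features of
# pseudo bounding boxes detected on the target domain.
feats = np.random.randn(2000, 256).astype(np.float32)
labels, keep = pseudo_identity_labels(feats)
print(int(keep.sum()), "boxes kept,", len(set(labels[keep])), "pseudo identities")
```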
Face clustering is a promising way to scale up face recognition systems using large-scale unlabeled face images. It remains challenging to identify small or sparse face image clusters, which we call hard clusters; the difficulty is caused by the heterogeneity of the clusters, i.e., high variations in size and sparsity. Consequently, the conventional practice of using a uniform threshold to identify clusters often leads to severe misclassification of samples that belong to hard clusters. We tackle this problem by leveraging the neighborhood information of samples and inferring their cluster memberships in a probabilistic way. We introduce two novel modules, Neighborhood-Diffusion-based Density (NDDe) and Transition-Probability-based Distance (TPDi), on top of which we can simply apply the standard Density Peak Clustering algorithm with a uniform threshold. Our experiments on multiple benchmarks show that each module contributes to the final performance of our method, and that incorporating them into other advanced face clustering methods boosts those methods to a new state of the art. Code is available at //github.com/echoanran/On-Mitigating-Hard-Clusters.
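For readers unfamiliar with Density Peak Clustering, the sketch below shows a single-threshold variant in which the per-sample density and pairwise distance would be supplied by NDDe and TPDi; here they are simple stand-ins computed from raw features, so this is illustrative rather than the paper's implementation.

```python
import numpy as np

def density_peak_clustering(density, dist, tau):
    """Density Peak Clustering with one uniform threshold tau: a sample starts
    a new cluster if its nearest higher-density neighbour is farther than tau;
    otherwise it joins that neighbour's cluster."""
    n = len(density)
    order = np.argsort(-density)          # process from high to low density
    labels = np.full(n, -1)
    next_label = 0
    for i in order:
        higher = np.where(density > density[i])[0]
        if higher.size == 0:
            labels[i] = next_label; next_label += 1
            continue
        j = higher[np.argmin(dist[i, higher])]   # nearest higher-density sample
        if dist[i, j] > tau:
            labels[i] = next_label; next_label += 1
        else:
            labels[i] = labels[j]
    return labels

# Stand-in density and pairwise distance (the paper would use NDDe and TPDi here).
feats = np.random.randn(200, 16)
dist = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
density = 1.0 / (np.sort(dist, axis=1)[:, 1:6].mean(axis=1) + 1e-8)
print(len(set(density_peak_clustering(density, dist, tau=4.0))), "clusters")
```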
Manufacturing companies face challenges when it comes to quickly adapting their production control to fluctuating demands or changing requirements. Control approaches that encapsulate production functions as services have proven promising for increasing the flexibility of Cyber-Physical Production Systems. However, a remaining challenge of such approaches is finding a production plan based on the provided functionalities for a demanded product, especially when there is no direct (i.e., syntactic) match between demanded and provided functions. While there is a variety of approaches to production planning, flexible production poses specific requirements that are not covered by existing research. In this contribution, we first capture these requirements for flexible production environments. Afterwards, we give an overview of current Artificial Intelligence approaches that can be utilized to overcome the aforementioned challenges. For this purpose, we focus on planning algorithms, but also consider models of production systems that can act as inputs to these algorithms. Approaches from symbolic AI planning as well as approaches based on Machine Learning are discussed and finally compared against the requirements. Based on this comparison, a research agenda is derived.
Photorealistic avatars of human faces have come a long way in recent years, yet research in this area is limited by a lack of publicly available, high-quality datasets covering both dense multi-view camera captures and rich facial expressions of the captured subjects. In this work, we present Multiface, a new multi-view, high-resolution human face dataset collected from 13 identities at Reality Labs Research for neural face rendering. We introduce Mugsy, a large-scale multi-camera apparatus that captures high-resolution synchronized video of a facial performance. The goal of Multiface is to close the gap in accessibility to high-quality data in the academic community and to enable research in VR telepresence. Along with the release of the dataset, we conduct ablation studies on the influence of different model architectures on the model's capacity to interpolate to novel viewpoints and expressions. With a conditional VAE model serving as our baseline, we find that adding a spatial bias, a texture warp field, and residual connections improves performance on novel view synthesis. Our code and data are available at //github.com/facebookresearch/multiface
Neural networks tend to forget previously learned knowledge when continuously trained on datasets with varying distributions, a phenomenon known as catastrophic forgetting. Larger distribution shifts among datasets lead to more forgetting. Recently, parameter-isolation-based approaches have shown great potential in overcoming forgetting under significant distribution shifts. However, they suffer from poor generalization because they fix the neural path for each dataset during training and require dataset labels during inference. In addition, they do not support backward knowledge transfer, as they prioritize past data over future data. In this paper, we propose a new adaptive learning method, named AdaptCL, that fully reuses and grows on learned parameters to overcome catastrophic forgetting and enables positive backward transfer without requiring dataset labels. The proposed technique adaptively grows on the same neural path by allowing optimal reuse of frozen parameters, and it uses parameter-level data-driven pruning to assign equal priority to the data. We conduct extensive experiments on MNIST Variants, DomainNet, and Food Freshness Detection datasets under different intensities of distribution shift, without requiring dataset labels. Results demonstrate that our proposed method outperforms alternative baselines in minimizing forgetting and enabling positive backward knowledge transfer.
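The general parameter-isolation mechanism that AdaptCL builds on can be sketched in a few lines: after each task, prune the layer to keep free capacity, freeze the surviving weights, and mask their gradients for later tasks. The magnitude-pruning criterion, keep ratio, and single-layer setup below are illustrative assumptions, not the authors' adaptive reuse-and-grow method.

```python
import torch

def magnitude_prune_mask(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep the largest-magnitude fraction of weights for the current task."""
    k = max(1, int(weight.numel() * keep_ratio))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

# Hypothetical single layer trained over a sequence of tasks.
layer = torch.nn.Linear(64, 64)
frozen_mask = torch.zeros_like(layer.weight)  # 1 where a past task owns the weight

# After finishing task t: prune, then mark the surviving weights as frozen.
task_mask = magnitude_prune_mask(layer.weight.data, keep_ratio=0.3) * (1 - frozen_mask)
frozen_mask = torch.clamp(frozen_mask + task_mask, max=1.0)

# While training task t+1: zero gradients on frozen weights so old knowledge
# is reused (forward pass) but never overwritten (backward pass).
def freeze_hook(grad, mask=frozen_mask):
    return grad * (1 - mask)

layer.weight.register_hook(freeze_hook)
```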
Weight sharing is a popular approach to reducing the cost of neural architecture search (NAS) by reusing the weights of shared operators from previously trained child models. However, the rank correlation between the estimated accuracy and the ground-truth accuracy of those child models is low due to the interference among different child models caused by weight sharing. In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and we observe that: 1) the interference on a shared operator between two child models is positively correlated with the number of operators in which they differ; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. Inspired by these two observations, we propose two approaches to mitigate the interference: 1) MAGIC-T: rather than randomly sampling child models for optimization, we propose a gradual modification scheme that changes one operator between adjacent optimization steps to minimize the interference on the shared operators; 2) MAGIC-A: forcing the inputs and outputs of the shared operator to be similar across all child models to reduce the interference. Experiments on a BERT search space verify that mitigating interference with either of our proposed methods improves the rank correlation of the supernet, and combining both methods achieves better results. Our discovered architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 points and ELECTRA$_{\rm base}$ by 1.6 and 1.1 points on the dev and test sets of the GLUE benchmark. Extensive results on BERT compression, reading comprehension, and ImageNet demonstrate the effectiveness and generality of our proposed methods.
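To make the interference measurement concrete, the sketch below computes the gradient of one shared operator under two child models that differ in a single other operator and reports the cosine similarity of those gradients; the toy layers, data, and loss are hypothetical, not the paper's search space.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
shared = torch.nn.Linear(16, 16)      # operator whose weights both child models share
branch_a = torch.nn.Linear(16, 4)     # child A's private operator
branch_b = torch.nn.Conv1d(1, 4, 1)   # child B differs in this one operator

x = torch.randn(32, 16)
y = torch.randint(0, 4, (32,))

def grad_of_shared(child_branch, is_conv=False):
    """Gradient of the shared operator's weight under one child model."""
    shared.zero_grad()
    h = torch.relu(shared(x))
    logits = child_branch(h.unsqueeze(1)).mean(dim=2) if is_conv else child_branch(h)
    F.cross_entropy(logits, y).backward()
    return shared.weight.grad.detach().flatten().clone()

g_a = grad_of_shared(branch_a)
g_b = grad_of_shared(branch_b, is_conv=True)
print(F.cosine_similarity(g_a, g_b, dim=0).item())  # low similarity = high interference
```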
Incorporating knowledge graphs into recommender systems has attracted increasing attention in recent years. By exploring the interlinks within a knowledge graph, the connectivity between users and items can be discovered as paths, which provide rich and complementary information to user-item interactions. Such connectivity not only reveals the semantics of entities and relations, but also helps to comprehend a user's interests. However, existing efforts have not fully explored this connectivity to infer user preferences, especially in terms of modeling the sequential dependencies within a path and the holistic semantics of a path. In this paper, we contribute a new model named Knowledge-aware Path Recurrent Network (KPRN) that exploits the knowledge graph for recommendation. KPRN generates path representations by composing the semantics of both entities and relations. By leveraging the sequential dependencies within a path, we enable effective reasoning on paths to infer the underlying rationale of a user-item interaction. Furthermore, we design a new weighted pooling operation to discriminate the strengths of different paths in connecting a user with an item, endowing our model with a degree of explainability. We conduct extensive experiments on two datasets about movies and music, demonstrating significant improvements over state-of-the-art solutions such as Collaborative Knowledge Base Embedding and Neural Factorization Machine.
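As a rough illustration of what a weighted pooling over per-path scores can look like, here is a temperature-controlled log-sum-exp sketch whose softmax weights indicate how much each path contributes; the path scorer, temperature value, and exact pooling form are placeholders and may differ from KPRN's operation.

```python
import torch

def weighted_pool(path_scores: torch.Tensor, gamma: float = 0.1) -> torch.Tensor:
    """Pool per-path scores for one user-item pair into a single prediction.
    A temperature-controlled log-sum-exp lets every path contribute, with a
    sharper gamma concentrating weight on the strongest paths."""
    pooled = gamma * torch.logsumexp(path_scores / gamma, dim=-1)
    return torch.sigmoid(pooled)

def path_attributions(path_scores: torch.Tensor, gamma: float = 0.1) -> torch.Tensor:
    """Softmax weights over paths; these per-path weights are what make the
    prediction interpretable (which path explains the interaction)."""
    return torch.softmax(path_scores / gamma, dim=-1)

# Hypothetical scores of three knowledge-graph paths connecting one user to one item,
# e.g., produced by an LSTM over entity and relation embeddings along each path.
scores = torch.tensor([1.2, -0.3, 0.8])
print(weighted_pool(scores).item(), path_attributions(scores))
```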