One of the difficulties of conversion rate (CVR) prediction is that the conversions can delay and take place long after the clicks. The delayed feedback poses a challenge: fresh data are beneficial to continuous training but may not have complete label information at the time they are ingested into the training pipeline. To balance model freshness and label certainty, previous methods set a short waiting window or even do not wait for the conversion signal. If conversion happens outside the waiting window, this sample will be duplicated and ingested into the training pipeline with a positive label. However, these methods have some issues. First, they assume the observed feature distribution remains the same as the actual distribution. But this assumption does not hold due to the ingestion of duplicated samples. Second, the certainty of the conversion action only comes from the positives. But the positives are scarce as conversions are sparse in commercial systems. These issues induce bias during the modeling of delayed feedback. In this paper, we propose DElayed FEedback modeling with Real negatives (DEFER) method to address these issues. The proposed method ingests real negative samples into the training pipeline. The ingestion of real negatives ensures the observed feature distribution is equivalent to the actual distribution, thus reducing the bias. The ingestion of real negatives also brings more certainty information of the conversion. To correct the distribution shift, DEFER employs importance sampling to weigh the loss function. Experimental results on industrial datasets validate the superiority of DEFER. DEFER have been deployed in the display advertising system of Alibaba, obtaining over 6.0% improvement on CVR in several scenarios. The code and data in this paper are now open-sourced {//github.com/gusuperstar/defer.git}.
Path integrals with complex actions are encountered for many physical systems ranging from spin- or mass-imbalanced atomic gases and graphene to quantum chromo-dynamics at finite density to the non-equilibrium evolution of quantum systems. Many computational approaches have been developed for tackling the sign problem emerging for complex actions. Among these, complex Langevin dynamics has the appeal of general applicability. One of its key challenges is the potential convergence of the dynamics to unphysical fixed points. The statistical sampling process at such a fixed point is not based on the physical action and hence leads to wrong predictions. Moreover, its unphysical nature is hard to detect due to the implicit nature of the process. In the present work we set up a general approach based on a Markov chain Monte Carlo scheme in an extended state space. In this approach we derive an explicit real sampling process for generalized complex Langevin dynamics. Subject to a set of constraints, this sampling process is the physical one. These constraints originate from the detailed-balance equations satisfied by the Monte Carlo scheme. This allows us to re-derive complex Langevin dynamics from a new perspective and establishes a framework for the explicit construction of new sampling schemes for complex actions.
Being able to explore unknown environments is a requirement for fully autonomous robots. Many learning-based methods have been proposed to learn an exploration strategy. In the frontier-based exploration, learning algorithms tend to learn the optimal or near-optimal frontier to explore. Most of these methods represent the environments as fixed size images and take these as inputs to neural networks. However, the size of environments is usually unknown, which makes these methods fail to generalize to real world scenarios. To address this issue, we present a novel state representation method based on 4D point-clouds-like information, including the locations, frontier, and distance information. We also design a neural network that can process these 4D point-clouds-like information and generate the estimated value for each frontier. Then this neural network is trained using the typical reinforcement learning framework. We test the performance of our proposed method by comparing it with other five methods and test its scalability on a map that is much larger than maps in the training set. The experiment results demonstrate that our proposed method needs shorter average traveling distances to explore whole environments and can be adopted in maps with arbitrarily sizes.
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates. We show that the rate of convergence in all cases consists of two terms: (i) a stochastic term which is not affected by the delay, and (ii) a higher order deterministic term which is only linearly slowed down by the delay. Thus, in the presence of noise, the effects of the delay become negligible after a few iterations and the algorithm converges at the same optimal rate as standard SGD. This result extends a line of research that showed similar results in the asymptotic regime or for strongly-convex quadratic functions only. We further show similar results for SGD with more intricate form of delayed gradients---compressed gradients under error compensation and for local~SGD where multiple workers perform local steps before communicating with each other. In all of these settings, we improve upon the best known rates. These results show that SGD is robust to compressed and/or delayed stochastic gradient updates. This is in particular important for distributed parallel implementations, where asynchronous and communication efficient methods are the key to achieve linear speedups for optimization with multiple devices.
Online learning with expert advice is widely used in various machine learning tasks. It considers the problem where a learner chooses one from a set of experts to take advice and make a decision. In many learning problems, experts may be related, henceforth the learner can observe the losses associated with a subset of experts that are related to the chosen one. In this context, the relationship among experts can be captured by a feedback graph, which can be used to assist the learner's decision making. However, in practice, the nominal feedback graph often entails uncertainties, which renders it impossible to reveal the actual relationship among experts. To cope with this challenge, the present work studies various cases of potential uncertainties, and develops novel online learning algorithms to deal with uncertainties while making use of the uncertain feedback graph. The proposed algorithms are proved to enjoy sublinear regret under mild conditions. Experiments on real datasets are presented to demonstrate the effectiveness of the novel algorithms.
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes -- a crucial component in CL -- remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
We consider the question: how can you sample good negative examples for contrastive learning? We argue that, as with metric learning, learning contrastive representations benefits from hard negative samples (i.e., points that are difficult to distinguish from an anchor point). The key challenge toward using hard negatives is that contrastive methods must remain unsupervised, making it infeasible to adopt existing negative sampling strategies that use label information. In response, we develop a new class of unsupervised methods for selecting hard negative samples where the user can control the amount of hardness. A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible. The proposed method improves downstream performance across multiple modalities, requires only few additional lines of code to implement, and introduces no computational overhead.
Knowledge Graph (KG) embedding is a fundamental problem in data mining research with many real-world applications. It aims to encode the entities and relations in the graph into low dimensional vector space, which can be used for subsequent algorithms. Negative sampling, which samples negative triplets from non-observed ones in the training data, is an important step in KG embedding. Recently, generative adversarial network (GAN), has been introduced in negative sampling. By sampling negative triplets with large scores, these methods avoid the problem of vanishing gradient and thus obtain better performance. However, using GAN makes the original model more complex and hard to train, where reinforcement learning must be used. In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with the cache. However, how to sample from and update the cache are two important questions. We carefully design the solutions, which are not only efficient but also achieve a good balance between exploration and exploitation. In this way, our method acts as a "distilled" version of previous GA-based methods, which does not waste training time on additional parameters to fit the full distribution of negative triplets. The extensive experiments show that our method can gain significant improvement in various KG embedding models, and outperform the state-of-the-art negative sampling methods based on GAN.
We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough analysis of the available explicit user judgments---five-star ratings of translation quality---and show that they are not reliable enough to yield significant improvements in bandit learning. In contrast, we successfully utilize implicit task-based feedback collected in a cross-lingual search task to improve task-specific and machine translation quality metrics.
Recommender systems (RSs) provide an effective way of alleviating the information overload problem by selecting personalized items for different users. Latent factors based collaborative filtering (CF) has become the popular approaches for RSs due to its accuracy and scalability. Recently, online social networks and user-generated content provide diverse sources for recommendation beyond ratings. Although {\em social matrix factorization} (Social MF) and {\em topic matrix factorization} (Topic MF) successfully exploit social relations and item reviews, respectively, both of them ignore some useful information. In this paper, we investigate the effective data fusion by combining the aforementioned approaches. First, we propose a novel model {\em \mbox{MR3}} to jointly model three sources of information (i.e., ratings, item reviews, and social relations) effectively for rating prediction by aligning the latent factors and hidden topics. Second, we incorporate the implicit feedback from ratings into the proposed model to enhance its capability and to demonstrate its flexibility. We achieve more accurate rating prediction on real-life datasets over various state-of-the-art methods. Furthermore, we measure the contribution from each of the three data sources and the impact of implicit feedback from ratings, followed by the sensitivity analysis of hyperparameters. Empirical studies demonstrate the effectiveness and efficacy of our proposed model and its extension.
Being intensively studied, visual object tracking has witnessed great advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features). Real-time and high accuracy tracking algorithms, however, remain scarce. In this paper we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, by taking advantage of the ubiquity of multi-thread techniques and borrowing ideas from the success of parallel tracking and mapping in visual SLAM. The proposed PTAV framework is typically composed of two components, a (base) tracker T and a verifier V, working in parallel on two separate threads. The tracker T aims to provide a super real-time tracking inference and is expected to perform well most of the time; by contrast, the verifier V validates the tracking results and corrects T when needed. The key innovation is that, V does not work on every frame but only upon the requests from T; on the other end, T may adjust the tracking according to the feedback from V. With such collaboration, PTAV enjoys both the high efficiency provided by T and the strong discriminative power by V. Meanwhile, to adapt V to object appearance changes over time, we maintain a dynamic target template pool for adaptive verification, resulting in further performance improvements. In our extensive experiments on popular benchmarks including OTB2015, TC128, UAV20L and VOT2016, PTAV achieves the best tracking accuracy among all real-time trackers, and in fact even outperforms many deep learning based algorithms. Moreover, as a general framework, PTAV is very flexible with great potentials for future improvement and generalization.