Click-Through Rate (CTR) prediction is the most critical task in product and content recommendation, and learning effective feature interaction is the key challenge to exploiting user preferences for products. Some recent research works focus on investigating more sophisticated feature interactions based on soft attention or gate mechanism, while some redundant or contradictory feature combinations are still introduced. According to Global Workspace Theory in conscious processing, human clicks on advertisements ``consciously'': only a specific subset of product features are considered, and the rest are not involved in conscious processing. Therefore, we propose a CTR model that \textbf{D}irectly \textbf{E}nhances the embeddings and \textbf{L}everages \textbf{T}runcated Conscious \textbf{A}ttention during feature interaction, termed DELTA, which contains two key components: (I) conscious truncation module (CTM), which utilizes curriculum learning to apply adaptive truncation on attention weights to select the most critical feature combinations; (II) direct embedding enhancement module (DEM), which directly and independently propagates gradient from the loss layer to the embedding layer to enhance the crucial embeddings via linear feature crossing without introducing any extra cost during inference. Extensive experiments on five challenging CTR datasets demonstrate that DELTA achieves cutting-edge performance among current state-of-the-art CTR methods.
The paradigm of pre-training followed by fine-tuning on downstream tasks has become the mainstream method in natural language processing tasks. Although pre-trained models have the advantage of generalization, their performance may still vary significantly across different domain tasks. This is because the data distribution in different domains varies. For example, the different parts of the sentence 'He married Smt. Dipali Ghosh in 1947 and led a very happy married life' may have different impact for downstream tasks. For similarity calculations, words such as 'led' and 'life' are more important. On the other hand, for sentiment analysis, the word 'happy' is crucial. This indicates that different downstream tasks have different levels of sensitivity to sentence components. Our starting point is to scale information of the model and data according to the specifics of downstream tasks, enhancing domain information of relevant parts for these tasks and reducing irrelevant elements for different domain tasks, called SIFTER. In the experimental part, we use the SIFTER to improve SimCSE by constructing positive sample pairs based on enhancing the sentence stem and reducing the unimportant components in the sentence, and maximize the similarity between three sentences. Similarly, SIFTER can improve the gate mechanism of the LSTM model by short-circuiting the input gate of important words so that the LSTM model remembers the important parts of the sentence. Our experiments demonstrate that SIFTER outperforms the SimCSE and LSTM baselines.
Intertemporal choices involve making decisions that require weighing the costs in the present against the benefits in the future. One specific type of intertemporal choice is the decision between purchasing an individual item or opting for a bundle that includes that item. Previous research assumes that individuals have accurate expectations of the factors involved in these choices. However, in reality, users' perceptions of these factors are often biased, leading to irrational and suboptimal decision-making. In this work, we specifically focus on two commonly observed biases: projection bias and the reference-point effect. To address these biases, we propose a novel bias-embedded preference model called Probe. The Probe incorporates a weight function to capture users' projection bias and a value function to account for the reference-point effect, and introduce prospect theory from behavioral economics to combine the weight and value functions. This allows us to determine the probability of users selecting the bundle or a single item. We provide a thorough theoretical analysis to demonstrate the impact of projection bias on the design of bundle sales strategies. Through experimental results, we show that the proposed Probe model outperforms existing methods and contributes to a better understanding of users' irrational behaviors in bundle purchases. This investigation can facilitate a deeper comprehension of users' decision-making mechanisms, enable the provision of personalized services, and assist users in making more rational and optimal decisions.
Recommender systems have become essential tools for enhancing user experiences across various domains. While extensive research has been conducted on recommender systems for movies, music, and e-commerce, the rapidly growing and economically significant Non-Fungible Token (NFT) market remains underexplored. The unique characteristics and increasing prominence of the NFT market highlight the importance of developing tailored recommender systems to cater to its specific needs and unlock its full potential. In this paper, we examine the distinctive characteristics of NFTs and propose the first recommender system specifically designed to address NFT market challenges. In specific, we develop a Multi-Attention Recommender System for NFTs (NFT-MARS) with three key characteristics: (1) graph attention to handle sparse user-item interactions, (2) multi-modal attention to incorporate feature preference of users, and (3) multi-task learning to consider the dual nature of NFTs as both artwork and financial assets. We demonstrate the effectiveness of NFT-MARS compared to various baseline models using the actual transaction data of NFTs collected directly from blockchain for four of the most popular NFT collections. The source code and data are available at //anonymous.4open.science/r/RecSys2023-93ED.
Negative sampling plays a crucial role in training successful sequential recommendation models. Instead of merely employing random negative sample selection, numerous strategies have been proposed to mine informative negative samples to enhance training and performance. However, few of these approaches utilize structural information. In this work, we observe that as training progresses, the distributions of node-pair similarities in different groups with varying degrees of neighborhood overlap change significantly, suggesting that item pairs in distinct groups may possess different negative relationships. Motivated by this observation, we propose a Graph-based Negative sampling approach based on Neighborhood Overlap (GNNO) to exploit structural information hidden in user behaviors for negative mining. GNNO first constructs a global weighted item transition graph using training sequences. Subsequently, it mines hard negative samples based on the degree of overlap with the target item on the graph. Furthermore, GNNO employs curriculum learning to control the hardness of negative samples, progressing from easy to difficult. Extensive experiments on three Amazon benchmarks demonstrate GNNO's effectiveness in consistently enhancing the performance of various state-of-the-art models and surpassing existing negative sampling strategies. The code will be released at \url{//github.com/floatSDSDS/GNNO}.
The real-world data tends to be heavily imbalanced and severely skew the data-driven deep neural networks, which makes Long-Tailed Recognition (LTR) a massive challenging task. Existing LTR methods seldom train Vision Transformers (ViTs) with Long-Tailed (LT) data, while the off-the-shelf pretrain weight of ViTs always leads to unfair comparisons. In this paper, we systematically investigate the ViTs' performance in LTR and propose LiVT to train ViTs from scratch only with LT data. With the observation that ViTs suffer more severe LTR problems, we conduct Masked Generative Pretraining (MGP) to learn generalized features. With ample and solid evidence, we show that MGP is more robust than supervised manners. In addition, Binary Cross Entropy (BCE) loss, which shows conspicuous performance with ViTs, encounters predicaments in LTR. We further propose the balanced BCE to ameliorate it with strong theoretical groundings. Specially, we derive the unbiased extension of Sigmoid and compensate extra logit margins to deploy it. Our Bal-BCE contributes to the quick convergence of ViTs in just a few epochs. Extensive experiments demonstrate that with MGP and Bal-BCE, LiVT successfully trains ViTs well without any additional data and outperforms comparable state-of-the-art methods significantly, e.g., our ViT-B achieves 81.0% Top-1 accuracy in iNaturalist 2018 without bells and whistles. Code is available at //github.com/XuZhengzhuo/LiVT.
Recently, neural networks have been widely used in e-commerce recommender systems, owing to the rapid development of deep learning. We formalize the recommender system as a sequential recommendation problem, intending to predict the next items that the user might be interacted with. Recent works usually give an overall embedding from a user's behavior sequence. However, a unified user embedding cannot reflect the user's multiple interests during a period. In this paper, we propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec. Our multi-interest module captures multiple interests from user behavior sequences, which can be exploited for retrieving candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance the recommendation accuracy and diversity. We conduct experiments for the sequential recommendation on two real-world datasets, Amazon and Taobao. Experimental results demonstrate that our framework achieves significant improvements over state-of-the-art models. Our framework has also been successfully deployed on the offline Alibaba distributed cloud platform.
Properly handling missing data is a fundamental challenge in recommendation. Most present works perform negative sampling from unobserved data to supply the training of recommender models with negative signals. Nevertheless, existing negative sampling strategies, either static or adaptive ones, are insufficient to yield high-quality negative samples --- both informative to model training and reflective of user real needs. In this work, we hypothesize that item knowledge graph (KG), which provides rich relations among items and KG entities, could be useful to infer informative and factual negative samples. Towards this end, we develop a new negative sampling model, Knowledge Graph Policy Network (KGPolicy), which works as a reinforcement learning agent to explore high-quality negatives. Specifically, by conducting our designed exploration operations, it navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender. We tested on a matrix factorization (MF) model equipped with KGPolicy, and it achieves significant improvements over both state-of-the-art sampling methods like DNS and IRGAN, and KG-enhanced recommender models like KGAT. Further analyses from different angles provide insights of knowledge-aware sampling. We release the codes and datasets at //github.com/xiangwang1223/kgpolicy.
In recent years, Graph Neural Networks (GNNs), which can naturally integrate node information and topological structure, have been demonstrated to be powerful in learning on graph data. These advantages of GNNs provide great potential to advance social recommendation since data in social recommender systems can be represented as user-user social graph and user-item graph; and learning latent factors of users and items is the key. However, building social recommender systems based on GNNs faces challenges. For example, the user-item graph encodes both interactions and their associated opinions; social relations have heterogeneous strengths; users involve in two graphs (e.g., the user-user social graph and the user-item graph). To address the three aforementioned challenges simultaneously, in this paper, we present a novel graph neural network framework (GraphRec) for social recommendations. In particular, we provide a principled approach to jointly capture interactions and opinions in the user-item graph and propose the framework GraphRec, which coherently models two graphs and heterogeneous strengths. Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed framework GraphRec. Our code is available at \url{//github.com/wenqifan03/GraphRec-WWW19}
To provide more accurate, diverse, and explainable recommendation, it is compulsory to go beyond modeling user-item interactions and take side information into account. Traditional methods like factorization machine (FM) cast it as a supervised learning problem, which assumes each interaction as an independent instance with side information encoded. Due to the overlook of the relations among instances or items (e.g., the director of a movie is also an actor of another movie), these methods are insufficient to distill the collaborative signal from the collective behaviors of users. In this work, we investigate the utility of knowledge graph (KG), which breaks down the independent interaction assumption by linking items with their attributes. We argue that in such a hybrid structure of KG and user-item graph, high-order relations --- which connect two items with one or multiple linked attributes --- are an essential factor for successful recommendation. We propose a new method named Knowledge Graph Attention Network (KGAT) which explicitly models the high-order connectivities in KG in an end-to-end fashion. It recursively propagates the embeddings from a node's neighbors (which can be users, items, or attributes) to refine the node's embedding, and employs an attention mechanism to discriminate the importance of the neighbors. Our KGAT is conceptually advantageous to existing KG-based recommendation methods, which either exploit high-order relations by extracting paths or implicitly modeling them with regularization. Empirical results on three public benchmarks show that KGAT significantly outperforms state-of-the-art methods like Neural FM and RippleNet. Further studies verify the efficacy of embedding propagation for high-order relation modeling and the interpretability benefits brought by the attention mechanism.
Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios, therefore, researchers and engineers usually use side information to address the issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes knowledge graph embedding task to assist recommendation task. The two tasks are associated by cross&compress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that cross&compress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation, over state-of-the-art baselines. MKR is also shown to be able to maintain a decent performance even if user-item interactions are sparse.