Sequential recommendation involves automatically recommending the next item to users based on their historical item sequence. While most prior research employs RNN- or Transformer-based methods to glean information from the item sequence, generating a probability for each user-item pair and recommending the top-ranked items, these approaches often overlook the challenge posed by spurious relationships. This paper specifically addresses these spurious relations. We introduce a novel sequential recommendation framework named Irl4Rec. This framework harnesses invariant learning and employs a new objective that factors in the relationship between spurious variables and adjustment variables during model training, which aids in identifying spurious relations. Comparative analyses reveal that our framework outperforms three typical methods, underscoring its effectiveness. Moreover, an ablation study further demonstrates the critical role our model plays in detecting spurious relations.
Sequential location recommendation plays a major role in modern life: it can enhance user experience, bring more profit to businesses, and assist in government administration. Although methods for location recommendation have evolved significantly thanks to the development of recommendation systems, geographic information is still underused and data sparsity remains an ongoing challenge. In response, we introduce a Proximity-aware based region representation for Sequential Recommendation (PASR for short), built upon the Self-Attention Network architecture. We tackle the sparsity issue through a novel loss function employing importance sampling, which emphasizes informative negative samples during optimization. Moreover, PASR enhances the integration of geographic information by applying a self-attention-based geography encoder to the hierarchical grid and the proximity grid at each GPS point. To further leverage geographic information, we employ proximity-aware negative samplers to improve the quality of negative samples. We conducted evaluations on three real-world Location-Based Social Networking (LBSN) datasets, demonstrating that PASR surpasses state-of-the-art sequential location recommendation methods.
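A minimal sketch (our own illustration, not the paper's code) of a negative-sampling loss that uses importance weights so that informative, hard negatives dominate the gradient; the BPR-style form, the names pos_scores/neg_scores, and the temperature parameter are assumptions for illustration only.

    import torch
    import torch.nn.functional as F

    def importance_weighted_loss(pos_scores, neg_scores, temperature=1.0):
        """Pairwise ranking loss where each negative is re-weighted by a
        softmax over its score, so harder (higher-scoring) negatives weigh more.

        pos_scores: (batch,)        score of the observed next location
        neg_scores: (batch, n_neg)  scores of the sampled negative locations
        """
        # importance weights: harder negatives receive larger weights
        weights = F.softmax(neg_scores / temperature, dim=-1)      # (batch, n_neg)
        # BPR-style terms against every negative
        diff = pos_scores.unsqueeze(-1) - neg_scores               # (batch, n_neg)
        pairwise = -F.logsigmoid(diff)                             # (batch, n_neg)
        # expectation of the pairwise loss under the importance distribution
        return (weights.detach() * pairwise).sum(dim=-1).mean()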
Deep generative models are a key enabling technology for computer vision, text generation, and large language models. Denoising diffusion probabilistic models (DDPMs) have recently gained much attention due to their ability to generate diverse and high-quality samples in many computer vision tasks, as well as their flexible model architectures and relatively simple training scheme. Quantum generative models, empowered by entanglement and superposition, have brought new insight into learning classical and quantum data. Inspired by the classical counterpart, we propose the quantum denoising diffusion probabilistic model (QuDDPM) to enable efficiently trainable generative learning of quantum data. QuDDPM adopts sufficiently many layers of circuits to guarantee expressivity, while introducing multiple intermediate training tasks that interpolate between the target distribution and noise to avoid barren plateaus and guarantee efficient training. We demonstrate QuDDPM's capability in learning a correlated quantum noise model and in learning the topological structure of a nontrivial distribution of quantum data.
While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally evaluate our models on a dataset that has a human baseline and find that, after training, models approach human performance.
Large language models based on self-attention mechanisms have achieved astonishing performance not only on natural language itself but also on a variety of tasks of a very different nature. However, when processing language, the human brain may not operate on the same principle, which raises a debate about the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike-and-slab distribution and that only the distribution, rather than specific weights, is trained. This meta predictive learning is successfully validated on classifying handwritten digits whose pixels are input to the network in sequence, and moreover on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections have a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, further improving with more training data, in analogy with the emergent behavior of large language models. Our model therefore provides a starting point for investigating the connection among brain computation, next-token prediction, and general intelligence.
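A rough sketch, not the paper's exact parameterization, of what it means to train the distribution of each synaptic weight rather than the weights themselves: each connection carries a spike-and-slab distribution (slab probability pi, Gaussian mean mu and scale sigma), and a mean-field forward pass propagates the resulting moments. The layer name and initialization values are assumptions.

    import torch

    class SpikeAndSlabLinear(torch.nn.Module):
        """Each weight w_ij ~ pi_ij * N(mu_ij, sigma_ij^2) + (1 - pi_ij) * delta_0.
        Only the distribution parameters (pi, mu, sigma) are trainable."""

        def __init__(self, n_in, n_out):
            super().__init__()
            self.mu = torch.nn.Parameter(0.01 * torch.randn(n_out, n_in))
            self.log_sigma = torch.nn.Parameter(-3.0 * torch.ones(n_out, n_in))
            self.logit_pi = torch.nn.Parameter(torch.zeros(n_out, n_in))  # slab probability

        def forward(self, x):
            pi = torch.sigmoid(self.logit_pi)
            sigma = torch.exp(self.log_sigma)
            # first two moments of each weight under the spike-and-slab distribution
            mean_w = pi * self.mu
            var_w = pi * (sigma ** 2 + (1 - pi) * self.mu ** 2)
            # mean-field moments of the pre-activation, then a reparameterized sample
            mean_out = x @ mean_w.t()
            var_out = (x ** 2) @ var_w.t()
            return mean_out + torch.sqrt(var_out + 1e-8) * torch.randn_like(mean_out)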
We present a coordination method for multiple mobile manipulators to sort objects in clutter. We consider the object rearrangement problem in which objects must be sorted into different groups in a particular order. In clutter, the order constraints cannot easily be satisfied, since some objects occlude others and the occluded objects are not directly accessible to the robots. The occluding objects need to be moved more than once to make the occluded objects accessible. Such rearrangement problems fall into the class of nonmonotone rearrangement problems, which are computationally intractable. While nonmonotone problems with order constraints are already hard, involving multiple robots additionally requires computing a task allocation. The proposed method first finds a sequence of objects to be sorted, using a search that ensures the order constraint in each group is satisfied. The search can solve nonmonotone instances that require temporary relocation of some objects to access the next object to be sorted. Once a complete sorting sequence is found, the objects in the sequence are assigned to the mobile manipulators using a greedy allocation method. We develop four versions of the method with different search strategies. In the experiments, we show that our method can find a sorting sequence quickly (e.g., 4.6 sec with 20 objects sorted into five groups) even though the solved instances include hard nonmonotone ones. Extensive tests and simulation experiments show the ability of the method to solve real-world sorting problems using multiple mobile manipulators.
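A minimal sketch of one plausible greedy allocation step: objects, already ordered by the sorting-sequence search, are assigned one by one to whichever robot would finish handling them earliest. The cost function, robot names, and object labels here are placeholders, not the paper's actual model.

    def greedy_allocate(sorting_sequence, robots, cost):
        """Assign objects (in the already-found sorting order) to robots greedily.
        cost(robot, obj) -> estimated handling time; a stand-in for the travel
        and manipulation model, which is not specified here."""
        finish_time = {r: 0.0 for r in robots}
        assignment = []
        for obj in sorting_sequence:
            # pick the robot that would complete this object earliest
            best = min(robots, key=lambda r: finish_time[r] + cost(r, obj))
            finish_time[best] += cost(best, obj)
            assignment.append((obj, best))
        return assignment

    # toy usage with made-up costs
    objs = ["a", "b", "c", "d", "e"]
    robots = ["r1", "r2"]
    plan = greedy_allocate(objs, robots, cost=lambda r, o: 1.0 + 0.5 * (r == "r2"))
    print(plan)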
Electrical circuits are present in a variety of technologies, making their design an important part of computer-aided engineering. The growing number of tunable parameters that affect the final design leads to a need for new approaches to quantifying their impact. Machine learning may play a key role in this regard; however, current approaches often make suboptimal use of existing knowledge about the system at hand. For circuits, the description via modified nodal analysis is well understood. This particular formulation leads to systems of differential-algebraic equations (DAEs), which bring with them a number of peculiarities, e.g. hidden constraints that the solution needs to fulfill. We aim to use the recently introduced dissection concept for DAEs, which can decouple a given system into ordinary differential equations, depending only on the differential variables, and purely algebraic equations that describe the relations between the differential and algebraic variables. The idea is then to learn only the differential variables and reconstruct the algebraic ones using the relations from the decoupling. This approach guarantees that the algebraic constraints are fulfilled up to the accuracy of the nonlinear system solver, which is the main benefit highlighted in this article.
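A toy sketch of the learn-then-reconstruct idea on a semi-explicit DAE (not the paper's dissection machinery): a surrogate predicts only the differential variable, and the algebraic variable is recovered by solving the algebraic relation with a nonlinear solver, so the constraint holds up to solver tolerance. The specific DAE, the surrogate, and all names below are illustrative assumptions.

    import numpy as np
    from scipy.optimize import fsolve

    # Toy semi-explicit DAE:  x' = f(x, y),  0 = g(x, y)
    # Here: x' = -x + y,  0 = y - x**2  (y is determined algebraically by x).

    def g(y, x):
        return y - x**2          # algebraic relation between x (differential) and y (algebraic)

    def reconstruct_algebraic(x):
        # solve g(y, x) = 0 for y; the constraint then holds up to solver tolerance
        return fsolve(g, x0=np.atleast_1d(0.9 * x**2), args=(x,))[0]

    def learned_x(t):
        # placeholder for a trained surrogate that predicts only the differential variable
        return np.exp(-0.5 * t)

    for t in np.linspace(0.0, 1.0, 5):
        x = learned_x(t)
        y = reconstruct_algebraic(x)
        print(f"t={t:.2f}  x={x:.4f}  y={y:.4f}  |g|={abs(g(y, x)):.2e}")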
Accessibility measures how well a location is connected to surrounding opportunities. We focus on the accessibility provided by Public Transit (PT). Accessibility is distributed very unequally: it is high in city centers and close to main transportation corridors, and poor in the suburbs, where weak PT service leads to chronic car dependency. Demand-Responsive Transit (DRT) is better suited to low-density areas than conventional fixed-route PT, but its potential to tackle accessibility inequality has not yet been exploited. On the contrary, planning DRT without attention to inequality (as in the methods proposed so far) can further widen the accessibility gap in urban areas. To the best of our knowledge, this paper is the first to propose a DRT planning strategy, which we call AccEq-DRT, aimed at reducing accessibility inequality while ensuring overall efficiency. To this aim, we combine a graph representation of conventional PT with a Continuous Approximation (CA) model of DRT. The two are combined in the same multi-layer graph, on which we compute accessibility. We then devise a scoring function that estimates each area's need for improvement, appropriately weighting population density and accessibility. Finally, we provide a bilevel optimization method, in which the upper level is a heuristic that allocates DRT buses guided by the scoring function, and the lower level performs traffic assignment. Numerical results on a simplified model of Montreal show that inequality, measured with the Atkinson index, is reduced by up to 34%. Keywords: DRT, Public Transportation, Accessibility, Continuous Approximation, Network Design
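For reference, the Atkinson index used as the inequality measure can be computed as below; the epsilon value and the per-zone accessibility numbers in the example are made up for illustration, not taken from the paper.

    import numpy as np

    def atkinson_index(values, epsilon=0.5):
        """Atkinson inequality index over positive accessibility values.
        0 = perfect equality; values closer to 1 = higher inequality."""
        y = np.asarray(values, dtype=float)
        mu = y.mean()
        if epsilon == 1.0:
            ede = np.exp(np.log(y).mean())                              # geometric mean
        else:
            ede = np.mean(y ** (1.0 - epsilon)) ** (1.0 / (1.0 - epsilon))
        return 1.0 - ede / mu

    # hypothetical accessibility per zone before/after adding DRT in underserved zones
    before = [10, 12, 80, 95, 5, 7]
    after  = [25, 28, 80, 95, 20, 22]
    print(atkinson_index(before), atkinson_index(after))   # the index drops as the gap narrows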
The emergence of complex structures in systems governed by a simple set of rules is among the most fascinating aspects of Nature. A particularly powerful and versatile model for investigating this phenomenon is provided by cellular automata, with the Game of Life being one of the most prominent examples. However, this simplified model can be too limiting as a tool for modelling real systems. To address this, we introduce and study an extended version of the Game of Life in which a dynamical process governs the rule selection at each step. We show that the introduced modification significantly alters the behaviour of the game. We also demonstrate that the choice of the synchronization policy can be used to control the trade-off between stability and growth in the system.
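A minimal sketch of a Life-like automaton where the rule applied at each step is chosen by a simple dynamical process; the two rules (B3/S23 and B36/S23) and the periodic-switching policy are hypothetical stand-ins for the paper's rule-selection dynamics and synchronization policies.

    import numpy as np

    RULES = {
        "life":     {"birth": {3},    "survive": {2, 3}},   # Conway's Game of Life (B3/S23)
        "highlife": {"birth": {3, 6}, "survive": {2, 3}},   # HighLife (B36/S23)
    }

    def step(grid, rule):
        # count the 8 neighbours with periodic boundary conditions
        n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
        birth = (grid == 0) & np.isin(n, list(rule["birth"]))
        survive = (grid == 1) & np.isin(n, list(rule["survive"]))
        return (birth | survive).astype(np.uint8)

    rng = np.random.default_rng(0)
    grid = (rng.random((64, 64)) < 0.3).astype(np.uint8)
    for t in range(100):
        # dynamical rule selection: a simple (hypothetical) alternation policy
        rule = RULES["life"] if (t // 10) % 2 == 0 else RULES["highlife"]
        grid = step(grid, rule)
    print("live cells after 100 steps:", int(grid.sum()))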
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
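The statement that gradient methods find minimal-norm interpolating solutions can be illustrated in overparametrized linear regression, where gradient descent initialized at zero converges to the minimum-l2-norm interpolant given by the pseudoinverse. The dimensions and random data below are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 200                       # overparametrized: more features than samples
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # Minimum-l2-norm interpolant: theta = X^T (X X^T)^{-1} y = pinv(X) y.
    # Gradient descent/flow on the squared loss started at zero converges to this solution.
    theta = np.linalg.pinv(X) @ y

    print("training error:", np.linalg.norm(X @ theta - y))   # ~0: perfect fit to the data
    print("solution norm :", np.linalg.norm(theta))           # smallest among all interpolants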
In recent years, object detection has made impressive progress. Despite these improvements, there is still a significant gap between detection performance on small and on large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors: (1) only a few images contain small objects, and (2) even within the images that contain them, small objects do not appear in enough locations. We thus propose to oversample the images containing small objects and to augment each of those images by copy-pasting the small objects many times. This allows us to trade off the quality of the detector on large objects against that on small objects. We evaluate different pasting augmentation strategies and ultimately achieve a 9.7% relative improvement on instance segmentation and 7.1% on object detection of small objects, compared to the current state-of-the-art method on MS COCO.
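A simplified sketch of the copy-paste idea: given one small object's binary mask, its pixels are pasted at random locations in the same image and new boxes are recorded for the detector targets. The paper additionally studies blending, overlap checks against existing ground truth, and the number of copies; the function name and arguments here are illustrative only.

    import numpy as np

    def paste_small_object(image, mask, n_copies=3, rng=None):
        """Copy the masked object and paste it at random locations in the same image.
        image: (H, W, 3) uint8 array, mask: (H, W) bool mask of one small object."""
        rng = rng or np.random.default_rng()
        out = image.copy()
        ys, xs = np.nonzero(mask)
        y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
        patch, patch_mask = image[y0:y1, x0:x1], mask[y0:y1, x0:x1]
        h, w = patch.shape[:2]
        new_boxes = []
        for _ in range(n_copies):
            ty = rng.integers(0, image.shape[0] - h)
            tx = rng.integers(0, image.shape[1] - w)
            region = out[ty:ty + h, tx:tx + w]
            region[patch_mask] = patch[patch_mask]          # paste only the object pixels
            new_boxes.append((tx, ty, tx + w, ty + h))      # new box for the detector targets
        return out, new_boxes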