Feature interaction selection is a fundamental problem in commercial recommender systems. Most approaches equally enumerate all features and interactions by the same pre-defined operation under expert guidance. Their recommendation is unsatisfactory sometimes due to the following issues: (1)~They cannot ensure the learning abilities of models because their architectures are poorly adaptable to tasks and data; (2)~Useless features and interactions can bring unnecessary noise and complicate the training process. In this paper, we aim to adaptively evolve the model to select appropriate operations, features, and interactions under task guidance. Inspired by the evolution and functioning of natural organisms, we propose a novel \textsl{Cognitive EvoLutionary Learning (CELL)} framework, where cognitive ability refers to a property of organisms that allows them to react and survive in diverse environments. It consists of three stages, i.e., DNA search, genome search, and model functioning. Specifically, if we regard the relationship between models and tasks as the relationship between organisms and natural environments, interactions of feature pairs can be analogous to double-stranded DNA, of which relevant features and interactions can be analogous to genomes. Along this line, we diagnose the fitness of the model on operations, features, and interactions to simulate the survival rates of organisms for natural selection. We show that CELL can adaptively evolve into different models for different tasks and data, which enables practitioners to access off-the-shelf models. Extensive experiments on four real-world datasets demonstrate that CELL significantly outperforms state-of-the-art baselines. Also, we conduct synthetic experiments to ascertain that CELL can consistently discover the pre-defined interaction patterns for feature pairs.
Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
Identifying causal effects is a key problem of interest across many disciplines. The two long-standing approaches to estimate causal effects are observational and experimental (randomized) studies. Observational studies can suffer from unmeasured confounding, which may render the causal effects unidentifiable. On the other hand, direct experiments on the target variable may be too costly or even infeasible to conduct. A middle ground between these two approaches is to estimate the causal effect of interest through proxy experiments, which are conducted on variables with a lower cost to intervene on compared to the main target. Akbari et al. [2022] studied this setting and demonstrated that the problem of designing the optimal (minimum-cost) experiment for causal effect identification is NP-complete and provided a naive algorithm that may require solving exponentially many NP-hard problems as a sub-routine in the worst case. In this work, we provide a few reformulations of the problem that allow for designing significantly more efficient algorithms to solve it as witnessed by our extensive simulations. Additionally, we study the closely-related problem of designing experiments that enable us to identify a given effect through valid adjustments sets.
Multimodal analysis has recently drawn much interest in affective computing, since it can improve the overall accuracy of emotion recognition over isolated uni-modal approaches. The most effective techniques for multimodal emotion recognition efficiently leverage diverse and complimentary sources of information, such as facial, vocal, and physiological modalities, to provide comprehensive feature representations. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos, where complex spatiotemporal relationships may be captured. Most of the existing fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complimentary nature of audio-visual (A-V) modalities. We introduce a cross-attentional fusion approach to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal. Our new cross-attentional A-V fusion model efficiently leverages the inter-modal relationships. In particular, it computes cross-attention weights to focus on the more contributive features across individual modalities, and thereby combine contributive feature representations, which are then fed to fully connected layers for the prediction of valence and arousal. The effectiveness of the proposed approach is validated experimentally on videos from the RECOLA and Fatigue (private) data-sets. Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches. Code is available: \url{//github.com/praveena2j/Cross-Attentional-AV-Fusion}
Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.
Cohesive subgraph mining is a fundamental problem in bipartite graph analysis. In reality, relationships between two types of entities often occur at some specific timestamps, which can be modeled as a temporal bipartite graph. However, the temporal information is widely neglected by previous studies. Moreover, directly extending the existing models may fail to find some critical groups in temporal bipartite graphs, which appear in a unilateral (i.e., one-layer) form. To fill the gap, in this paper, we propose a novel model, called maximal \lambda-frequency group (MFG). Given a temporal bipartite graph G=(U,V,E), a vertex set V_S \subseteq V is an MFG if i) there are no less than \lambda timestamps, at each of which V_S can form a (t_U,t_V)-biclique with some vertices in U at the corresponding snapshot, and ii) it is maximal. To solve the problem, a filter-and-verification (FilterV) method is proposed based on the Bron-Kerbosch framework, incorporating novel filtering techniques to reduce the search space and array-based strategy to accelerate the frequency and maximality verification. Nevertheless, the cost of frequency verification in each valid candidate set computation and maximality check could limit the scalability of FilterV to larger graphs. Therefore, we further develop a novel verification-free (VFree) approach by leveraging the advanced dynamic counting structure proposed. Theoretically, we prove that VFree can reduce the cost of each valid candidate set computation in FilterV by a factor of O(|V|). Furthermore, VFree can avoid the explicit maximality verification because of the developed search paradigm. Finally, comprehensive experiments on 15 real-world graphs are conducted to demonstrate the efficiency and effectiveness of the proposed techniques and model.
We propose a constructive approach to building temporal point processes that incorporate dependence on their history. The dependence is modeled through the conditional density of the duration, i.e., the interval between successive event times, using a mixture of first-order conditional densities for each one of a specific number of lagged durations. Such a formulation for the conditional duration density accommodates high-order dynamics, and it thus enables flexible modeling for point processes with memory. The implied conditional intensity function admits a representation as a local mixture of first-order hazard functions. By specifying appropriate families of distributions for the first-order conditional densities, with different shapes for the associated hazard functions, we can obtain either self-exciting or self-regulating point processes. From the perspective of duration processes, we develop a method to specify a stationary marginal density. The resulting model, interpreted as a dependent renewal process, introduces high-order Markov dependence among identically distributed durations. Furthermore, we provide extensions to cluster point processes. These can describe duration clustering behaviors attributed to different factors, thus expanding the scope of the modeling framework to a wider range of applications. Regarding implementation, we develop a Bayesian approach to inference, model checking, and prediction. We investigate point process model properties analytically, and illustrate the methodology with both synthetic and real data examples.
A new method called the aggregated sure independence screening is proposed for the computational challenges in variable selection of interactions when the number of explanatory variables is much higher than the number of observations (i.e., $p\gg n$). In this problem, the two main challenges are the strong hierarchical restriction and the number of candidates for the main effects and interactions. If $n$ is a few hundred and $p$ is ten thousand, then the memory needed for the augmented matrix of the full model is more than $100{\rm GB}$ in size, beyond the memory capacity of a personal computer. This issue can be solved by our proposed method but not by our competitors. Two advantages are that the proposed method can include important interactions even if the related main effects are weak or absent, and it can be combined with an arbitrary variable selection method for interactions. The research addresses the main concern for variable selection of interactions because it makes previous methods applicable to the case when $p$ is extremely large.
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.
Sequential recommendation as an emerging topic has attracted increasing attention due to its important practical significance. Models based on deep learning and attention mechanism have achieved good performance in sequential recommendation. Recently, the generative models based on Variational Autoencoder (VAE) have shown the unique advantage in collaborative filtering. In particular, the sequential VAE model as a recurrent version of VAE can effectively capture temporal dependencies among items in user sequence and perform sequential recommendation. However, VAE-based models suffer from a common limitation that the representational ability of the obtained approximate posterior distribution is limited, resulting in lower quality of generated samples. This is especially true for generating sequences. To solve the above problem, in this work, we propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. Specifically, we first introduce the adversarial training for sequence generation under the Adversarial Variational Bayes (AVB) framework, which enables our model to generate high-quality latent variables. Then, we employ the contrastive loss. The latent variables will be able to learn more personalized and salient characteristics by minimizing the contrastive loss. Besides, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence. Finally, we conduct extensive experiments on four real-world datasets. The experimental results show that our proposed ACVAE model outperforms other state-of-the-art methods.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.