Constrained variational inequality problems are recognized for their broad applications across various fields including machine learning and operations research. First-order methods have emerged as the standard approach for solving these problems due to their simplicity and scalability. However, they typically rely on projection or linear minimization oracles to navigate the feasible set, which becomes computationally expensive in practical scenarios featuring multiple functional constraints. Existing efforts to tackle such functional constrained variational inequality problems have centered on primal-dual algorithms grounded in the Lagrangian function. These algorithms along with their theoretical analysis often require the existence and prior knowledge of the optimal Lagrange multipliers. In this work, we propose a simple primal method, termed Constrained Gradient Method (CGM), for addressing functional constrained variational inequality problems, without necessitating any information on the optimal Lagrange multipliers. We establish a non-asymptotic convergence analysis of the algorithm for variational inequality problems with monotone operators under smooth constraints. Remarkably, our algorithms match the complexity of projection-based methods in terms of operator queries for both monotone and strongly monotone settings, while utilizing significantly cheaper oracles based on quadratic programming. Furthermore, we provide several numerical examples to evaluate the efficacy of our algorithms.
The integration of path reasoning with language modeling in recommender systems has shown promise for enhancing explainability but often struggles with the authenticity of the explanations provided. Traditional models modify their architecture to produce entities and relations alternately--for example, employing separate heads for each in the model--which does not ensure the authenticity of paths reflective of actual Knowledge Graph (KG) connections. This misalignment can lead to user distrust due to the generation of corrupted paths. Addressing this, we introduce PEARLM (Path-based Explainable-Accurate Recommender based on Language Modelling), which innovates with a Knowledge Graph Constraint Decoding (KGCD) mechanism. This mechanism ensures zero incidence of corrupted paths by enforcing adherence to valid KG connections at the decoding level, agnostic of the underlying model architecture. By integrating direct token embedding learning from KG paths, PEARLM not only guarantees the generation of plausible and verifiable explanations but also highly enhances recommendation accuracy. We validate the effectiveness of our approach through a rigorous empirical assessment, employing a newly proposed metric that quantifies the integrity of explanation paths. Our results demonstrate a significant improvement over existing methods, effectively eliminating the generation of inaccurate paths and advancing the state-of-the-art in explainable recommender systems.
Deep neural networks are an attractive alternative for simulating complex dynamical systems, as in comparison to traditional scientific computing methods, they offer reduced computational costs during inference and can be trained directly from observational data. Existing methods, however, cannot extrapolate accurately and are prone to error accumulation in long-time integration. Herein, we address this issue by combining neural operators with recurrent neural networks, learning the operator mapping, while offering a recurrent structure to capture temporal dependencies. The integrated framework is shown to stabilize the solution and reduce error accumulation for both interpolation and extrapolation of the Korteweg-de Vries equation.
Addressing real-world optimization problems becomes particularly challenging when analytic objective functions or constraints are unavailable. While numerous studies have addressed the issue of unknown objectives, limited research has focused on scenarios where feasibility constraints are not given explicitly. Overlooking these constraints can lead to spurious solutions that are unrealistic in practice. To deal with such unknown constraints, we propose to perform optimization within the data manifold using diffusion models. To constrain the optimization process to the data manifold, we reformulate the original optimization problem as a sampling problem from the product of the Boltzmann distribution defined by the objective function and the data distribution learned by the diffusion model. To enhance sampling efficiency, we propose a two-stage framework that begins with a guided diffusion process for warm-up, followed by a Langevin dynamics stage for further correction. Theoretical analysis shows that the initial stage results in a distribution focused on feasible solutions, thereby providing a better initialization for the later stage. Comprehensive experiments on a synthetic dataset, six real-world black-box optimization datasets, and a multi-objective optimization dataset show that our method achieves better or comparable performance with previous state-of-the-art baselines.
One well motivated explanation method for classifiers leverages counterfactuals which are hypothetical events identical to real observations in all aspects except for one categorical feature. Constructing such counterfactual poses specific challenges for texts, however, as some attribute values may not necessarily align with plausible real-world events. In this paper we propose a simple method for generating counterfactuals by intervening in the space of text representations which bypasses this limitation. We argue that our interventions are minimally disruptive and that they are theoretically sound as they align with counterfactuals as defined in Pearl's causal inference framework. To validate our method, we conducted experiments first on a synthetic dataset and then on a realistic dataset of counterfactuals. This allows for a direct comparison between classifier predictions based on ground truth counterfactuals - obtained through explicit text interventions - and our counterfactuals, derived through interventions in the representation space. Eventually, we study a real world scenario where our counterfactuals can be leveraged both for explaining a classifier and for bias mitigation.
Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible.
We consider the problem of an autonomous agent equipped with multiple sensors, each with different sensing precision and energy costs. The agent's goal is to explore the environment and gather information subject to its resource constraints in unknown, partially observable environments. The challenge lies in reasoning about the effects of sensing and movement while respecting the agent's resource and dynamic constraints. We formulate the problem as a trajectory optimization problem and solve it using a projection-based trajectory optimization approach where the objective is to reduce the variance of the Gaussian process world belief. Our approach outperforms previous approaches in long horizon trajectories by achieving an overall variance reduction of up to 85% and reducing the root-mean square error in the environment belief by 50%. This approach was developed in support of rover path planning for the NASA VIPER Mission.
We develop a conformal inference method to construct joint confidence regions for structured groups of missing entries within a sparsely observed matrix. This method is useful to provide reliable uncertainty estimation for group-level collaborative filtering; for example, it can be applied to help suggest a movie for a group of friends to watch together. Unlike standard conformal techniques, which make inferences for one individual at a time, our method achieves stronger group-level guarantees by carefully assembling a structured calibration data set mimicking the patterns expected among the test group of interest. We propose a generalized weighted conformalization framework to deal with the lack of exchangeability arising from such structured calibration, and in this process we introduce several innovations to overcome computational challenges. The practicality and effectiveness of our method are demonstrated through extensive numerical experiments and an analysis of the MovieLens 100K data set.
This work addresses the problem of simulating Gaussian random fields that are continuously indexed over a class of metric graphs, termed graphs with Euclidean edges, being more general and flexible than linear networks. We introduce three general algorithms that allow to reconstruct a wide spectrum of random fields having a covariance function that depends on a specific metric, called resistance metric, and proposed in recent literature. The algorithms are applied to a synthetic case study consisting of a street network. They prove to be fast and accurate in that they reproduce the target covariance function and provide random fields whose finite-dimensional distributions are approximately Gaussian.
Minimum spanning trees are important tools in the analysis and design of networks. Many practical applications require their computation, ranging from biology and linguistics to economy and telecommunications. The set of cycles of a network has a vector space structure. Given a spanning tree, the set of non-tree edges defines cycles that determine a basis. The intersection of two such cycles is the number of edges they have in common and the intersection number -- denoted $\cap(G)$ -- is the number of non-empty pairwise intersections of the cycles of the basis. The Minimum Spanning Tree Cycle Intersection problem consists in finding a spanning tree such that the intersection number is minimum. This problem is relevant in order to integrate discrete differential forms. In this paper, we present two lower bounds of the intersection number of an arbitrary connected graph $G=(V,E)$. In the first part, we prove the following statement: $$\frac{1}{2}\left(\frac{\nu^2}{n-1} - \nu\right) \leq \cap(G),$$ where $n = |V|$ and $\nu$ is the \emph{cyclomatic number} of $G$. In the second part, based on some experimental results and a new observation, we conjecture the following improved tight lower bound: $$(n-1) \binom{q}{2} + q \ r\leq \cap(G),$$ where $2 \nu = q (n-1) + r$ is the integer division of $2 \nu$ and $n-1$. This is the first result in a general context, that is for an arbitrary connected graph.
A community reveals the features and connections of its members that are different from those in other communities in a network. Detecting communities is of great significance in network analysis. Despite the classical spectral clustering and statistical inference methods, we notice a significant development of deep learning techniques for community detection in recent years with their advantages in handling high dimensional network data. Hence, a comprehensive overview of community detection's latest progress through deep learning is timely to both academics and practitioners. This survey devises and proposes a new taxonomy covering different categories of the state-of-the-art methods, including deep learning-based models upon deep neural networks, deep nonnegative matrix factorization and deep sparse filtering. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders. The survey also summarizes the popular benchmark data sets, model evaluation metrics, and open-source implementations to address experimentation settings. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing deep learning field.