Sequential model synchronisation is the task of propagating changes from one model to another correlated one to restore consistency. It is challenging to perform this propagation in a least-changing way that avoids unnecessary deletions (which might cause information loss). From a theoretical point of view, so-called short-cut (SC) rules have been developed that enable provably correct propagation of changes while avoiding information loss. However, to be able to react to every possible change, an infinite set of such rules might be necessary. In practice, only small sets of pre-computed basic SC rules have been used, severely restricting the kinds of changes that can be propagated without loss of information. In this work, we close that gap by developing an approach to compute the more complex SC rules that are required on-the-fly during synchronisation. These higher-order SC rules allow us to cope with more complex scenarios in which multiple changes must be handled in one step. We implemented our approach in the model transformation tool eMoflon. An evaluation shows that the overhead of computing higher-order SC rules on-the-fly is tolerable and at times even improves the overall performance. Beyond that, completely new scenarios can be dealt with without loss of information.
We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships using recurrent neural networks (RNNs). This is a so-called Bernstein-type result in approximation theory, which deduces properties of a target function under the assumption that it can be effectively approximated by a hypothesis space. In particular, we show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponentially decaying memory structure - a notion that can be made precise. This extends the previously identified curse of memory in linear RNNs to the general nonlinear setting, and quantifies the essential limitations of the RNN architecture for learning sequential relationships with long-term memory. Based on the analysis, we propose a principled reparameterization method to overcome the limitations. Our theoretical results are confirmed by numerical experiments. The code has been released at //github.com/radarFudan/Curse-of-memory
With the rapid development of large models, the need for data has become increasingly crucial. Especially in 3D object detection, costly manual annotations have hindered further advancements. To reduce the burden of annotation, we study the problem of achieving 3D object detection solely based on 2D annotations. Thanks to advanced 3D reconstruction techniques, it is now feasible to reconstruct the overall static 3D scene. However, extracting precise object-level annotations from the entire scene and generalizing these limited annotations to the entire scene remain challenges. In this paper, we introduce a novel paradigm called BA$^2$-Det, encompassing pseudo label generation and multi-stage generalization. We devise the DoubleClustering algorithm to obtain object clusters from reconstructed scene-level points, and further enhance the model's detection capabilities by developing three stages of generalization: progressing from complete to partial, static to dynamic, and close to distant. Experiments conducted on the large-scale Waymo Open Dataset show that the performance of BA$^2$-Det is on par with the fully-supervised methods using 10% annotations. Additionally, using large raw videos for pretraining, BA$^2$-Det can achieve a 20% relative improvement on the KITTI dataset. The method also has great potential for detecting open-set 3D objects in complex scenes. Project page: //ba2det.site.
We study an extension of the cardinality-constrained knapsack problem wherein each item has a concave piecewise linear utility structure (CCKP), which is motivated by applications such as resource management problems in monitoring and surveillance tasks. Our main contributions are combinatorial algorithms for the offline CCKP and an online version of the CCKP. For the offline problem, we present a fully polynomial-time approximation scheme and show that it can be cast as the maximization of a submodular function with cardinality constraints; the latter property allows us to derive a greedy $(1 - \frac{1}{e})$-approximation algorithm. For the online CCKP in the random order model, we derive a $\frac{10.427}{\alpha}$-competitive algorithm based on $\alpha$-approximation algorithms for the offline CCKP; moreover, we derive stronger guarantees for the cases wherein the cardinality capacity is very small or relatively large. Finally, we investigate the empirical performance of the proposed algorithms in numerical experiments.
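The greedy step mentioned in the abstract above can be illustrated with the textbook greedy algorithm for maximizing a monotone submodular function under a cardinality constraint, which is the setting where the $(1 - \frac{1}{e})$ guarantee is classically obtained. This is a minimal sketch, not the authors' CCKP algorithm: the function `greedy_submodular`, the toy `SENSORS` data, and the `coverage` objective are all hypothetical names chosen for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical toy instance: each sensor covers a set of locations.
SENSORS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}


def coverage(selected: FrozenSet[str]) -> float:
    """Number of locations covered; a monotone submodular set function."""
    covered = set()
    for name in selected:
        covered |= SENSORS[name]
    return float(len(covered))


def greedy_submodular(
    ground_set: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Pick at most k items, each time taking the largest marginal gain."""
    chosen: List[str] = []
    current: FrozenSet[str] = frozenset()
    for _ in range(k):
        base = value(current)
        best_item, best_gain = None, 0.0
        for item in ground_set:
            if item in current:
                continue
            gain = value(current | {item}) - base
            if gain > best_gain:  # strict: ties keep the earlier item
                best_item, best_gain = item, gain
        if best_item is None:  # no item improves the objective
            break
        chosen.append(best_item)
        current = current | {best_item}
    return chosen


picked = greedy_submodular(list(SENSORS), coverage, k=2)
```

On this toy instance the first pick is the sensor with the largest coverage and the second is the one adding the most new locations. Each round evaluates the objective once per remaining item, so the sketch needs on the order of k times the ground-set size many value queries.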
We consider a model convection-diffusion problem and present useful connections between the finite difference and finite element discretization methods. We introduce a general upwinding Petrov-Galerkin discretization based on bubble modification of the test space and connect the method with the general upwinding approach used in finite difference discretization. We write the finite difference and the finite element systems such that the two corresponding linear systems have the same stiffness matrices, and compare the right-hand side load vectors for the two methods. This new approach allows for improving well-known upwinding finite difference methods and for obtaining new error estimates. We prove that the exponential bubble Petrov-Galerkin discretization can recover the interpolant of the exact solution. As a consequence, we estimate the closeness of the related finite difference solutions to the interpolant. The ideas we present in this work can lead to building efficient new discretization methods for multidimensional convection-dominated problems.
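For readers unfamiliar with upwinding, the following sketch shows the standard first-order upwind finite difference scheme on a one-dimensional model problem. It is an illustration under stated assumptions, not the discretization analysed in the abstract above: the model problem $-\varepsilon u'' + u' = f$ on $(0,1)$ with $u(0) = u(1) = 0$, the uniform mesh, and the names `solve_upwind` and `exact_solution` are all choices made here.

```python
import math
from typing import Callable, List


def solve_upwind(
    eps: float, n: int, f: Callable[[float], float] = lambda x: 1.0
) -> List[float]:
    """Upwind scheme for -eps*u'' + u' = f on (0,1), u(0)=u(1)=0, n cells."""
    h = 1.0 / n
    m = n - 1  # number of interior unknowns
    # Central difference for u'', backward (upwind) difference for u'.
    lower = -eps / h**2 - 1.0 / h
    diag = 2.0 * eps / h**2 + 1.0 / h
    upper = -eps / h**2
    rhs = [f((i + 1) * h) for i in range(m)]

    # Thomas algorithm for the constant-coefficient tridiagonal system.
    c = [0.0] * m
    d = [0.0] * m
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, m):
        denom = diag - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return [0.0] + u + [0.0]  # include the boundary values


def exact_solution(x: float, eps: float) -> float:
    """Exact solution for f = 1, written to avoid overflow for small eps."""
    layer = (math.exp((x - 1.0) / eps) - math.exp(-1.0 / eps)) / (
        1.0 - math.exp(-1.0 / eps)
    )
    return x - layer


# Convection-dominated case: the mesh does not resolve the boundary layer.
sol = solve_upwind(eps=0.01, n=50)
```

The backward difference makes the system matrix an M-matrix for every mesh size, so the discrete solution stays free of the oscillations that a central difference for $u'$ produces when $h$ is large relative to $\varepsilon$. The price is first-order accuracy, which is the kind of limitation that improved upwinding schemes aim to address.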
In social recommender systems, it is crucial that the recommendation models provide equitable visibility for different demographic groups, such as gender or race. Most existing research has addressed this problem by only studying individual static snapshots of networks that typically change over time. To address this gap, we study the evolution of recommendation fairness over time and its relation to dynamic network properties. We examine three real-world dynamic networks by evaluating the fairness of six recommendation algorithms and analyzing the association between fairness and network properties over time. We further study how interventions on network properties influence fairness by examining counterfactual scenarios with alternative evolution outcomes and differing network properties. Our results on empirical datasets suggest that recommendation fairness improves over time, regardless of the recommendation method. We also find that two network properties, minority ratio and homophily ratio, exhibit stable correlations with fairness over time. Our counterfactual study further suggests that an extreme homophily ratio potentially contributes to unfair recommendations even with a balanced minority ratio. Our work provides insights into the evolution of fairness within dynamic networks in social science. We believe that our findings will help system operators and policymakers to better comprehend the implications of temporal changes and interventions targeting fairness in social networks.
We consider a variant of matrix completion where entries are revealed in a biased manner, adopting a model akin to that introduced by Ma and Chen. Instead of treating this observation bias as a disadvantage, as is typically the case, the goal is to exploit the shared information between the bias and the outcome of interest to improve predictions. Towards this, we consider a natural model where the observation pattern and outcome of interest are driven by the same set of underlying latent or unobserved factors. This leads to a two-stage matrix completion algorithm: first, recover (distances between) the latent factors by utilizing matrix completion for the fully observed noisy binary matrix corresponding to the observation pattern; second, utilize the recovered latent factors as features and sparsely observed noisy outcomes as labels to perform non-parametric supervised learning. The finite-sample error rate analysis suggests that, ignoring logarithmic factors, this approach is competitive with the corresponding supervised learning parametric rates. This implies that, by exploiting the shared information between the bias and outcomes, the two-stage method performs comparably to having access to the unobserved latent factors. Through empirical evaluation using a real-world dataset, we find that with this two-stage algorithm, the estimates have 30x smaller mean squared error compared to traditional matrix completion methods, suggesting the utility of the model and the method proposed in this work.
Machine unlearning has emerged as a new paradigm to deliberately forget data samples from a given model in order to adhere to stringent regulations. However, existing machine unlearning methods have been primarily focused on classification models, leaving the landscape of unlearning for generative models relatively unexplored. This paper serves as a bridge, addressing the gap by providing a unifying framework of machine unlearning for image-to-image generative models. Within this framework, we propose a computationally-efficient algorithm, underpinned by rigorous theoretical analysis, that demonstrates negligible performance degradation on the retain samples, while effectively removing the information from the forget samples. Empirical studies on two large-scale datasets, ImageNet-1K and Places-365, further show that our algorithm does not rely on the availability of the retain samples, which further complies with data retention policy. To the best of our knowledge, this work is the first that presents systematic, theoretical, and empirical explorations of machine unlearning specifically tailored for image-to-image generative models. Our code is available at //github.com/jpmorganchase/l2l-generator-unlearning.
Modeling the correlations among errors is closely associated with how accurately the model can quantify predictive uncertainty in probabilistic time series forecasting. Recent multivariate models have made significant progress in accounting for contemporaneous correlations among errors, while a common assumption on these errors is that they are temporally independent for the sake of statistical simplicity. However, real-world observations often deviate from this assumption, since errors usually exhibit substantial autocorrelation due to various factors such as the exclusion of temporally correlated covariates. In this work, we propose an efficient method, based on a low-rank-plus-diagonal parameterization of the covariance matrix, which can effectively characterize the autocorrelation of errors. The proposed method possesses several desirable properties: the complexity does not scale with the number of time series, the resulting covariance can be used for calibrating predictions, and it can seamlessly integrate with any model with Gaussian-distributed errors. We empirically demonstrate these properties using two distinct neural forecasting models -- GPVar and Transformer. Our experimental results confirm the effectiveness of our method in enhancing predictive accuracy and the quality of uncertainty quantification on multiple real-world datasets.
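To make concrete why a low-rank-plus-diagonal covariance is cheap to work with, the sketch below evaluates a zero-mean Gaussian log-likelihood for the simplest such structure, $\Sigma = \mathrm{diag}(d) + v v^\top$, using the Sherman-Morrison formula and the matrix determinant lemma. It is a simplified illustration and not the method of the abstract above: the rank-one restriction, the zero mean, and the name `gaussian_loglik_rank_one` are assumptions made here.

```python
import math
from typing import Sequence


def gaussian_loglik_rank_one(
    errors: Sequence[float], diag: Sequence[float], factor: Sequence[float]
) -> float:
    """Log-density of N(0, diag(d) + v v^T) at `errors`, in O(n) time.

    `diag` holds the positive diagonal entries d, `factor` the vector v.
    """
    n = len(errors)
    # Inner products weighted by D^{-1}; no n-by-n matrix is ever formed.
    e_dinv_e = sum(e * e / d for e, d in zip(errors, diag))
    v_dinv_e = sum(v * e / d for v, e, d in zip(factor, errors, diag))
    v_dinv_v = sum(v * v / d for v, d in zip(factor, diag))

    # Sherman-Morrison: e^T Sigma^{-1} e.
    quad = e_dinv_e - v_dinv_e**2 / (1.0 + v_dinv_v)
    # Matrix determinant lemma: log det(Sigma).
    logdet = sum(math.log(d) for d in diag) + math.log1p(v_dinv_v)

    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


# Two error terms with a shared factor inducing positive correlation.
value = gaussian_loglik_rank_one([1.0, 0.0], [1.0, 2.0], [1.0, 1.0])
```

For a general rank-r factor the same idea applies through the Woodbury identity, with an r-by-r system replacing the scalar denominator, so the cost remains linear in the dimension of the covariance for fixed r.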
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
Multi-relation Question Answering is a challenging task, as it requires elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.