In this work, we propose a causal collider-structured model to describe the data-generating process assumptions underlying disentangled representation learning. This extends the conventional i.i.d. factorization assumption $p(\mathbf{y}) = \prod_{i} p(\mathbf{y}_i)$, which is inadequate for learning from biased datasets (e.g., those affected by sampling selection bias). The collider structure explains how conditional dependencies between the underlying generating variables may arise, even when these variables are in reality unrelated, complicating disentanglement. Under the rubric of causal inference, we show that this issue can be resolved under the condition of causal identification, attainable from data together with a combination of constraints aimed at controlling the dependencies characteristic of the \textit{collider} model. To this end, we propose regularization by identification (ReI), a modular regularization engine designed to align the behavior of large-scale generative models with the disentanglement constraints imposed by causal identification. Empirical evidence on standard benchmarks demonstrates the superiority of ReI in learning disentangled representations in a variational framework. On a real-world dataset, we additionally show that our framework yields interpretable representations that are robust to out-of-distribution examples and that align with the true expected effect from domain knowledge.
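As a toy illustration of the collider effect invoked above (a sketch with hypothetical Gaussian factors, not the paper's model or data), the snippet below shows how selecting samples on a function of two independent generative factors induces a spurious dependence between them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y1 = rng.normal(size=n)   # two generative factors, independent by construction
y2 = rng.normal(size=n)
print(np.corrcoef(y1, y2)[0, 1])   # ~0: independent in the full population

# Selection on a collider: keep a sample only when y1 + y2 exceeds a
# threshold, mimicking a dataset collected under sampling-selection bias.
keep = (y1 + y2) > 1.0
print(np.corrcoef(y1[keep], y2[keep])[0, 1])   # markedly negative correlation
```

Conditioning on the selection variable (a child of both factors) is exactly what creates the dependence a naive factorized prior cannot represent.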
We propose two extensions to existing importance-sampling-based methods for lossy compression. First, we introduce an importance-sampling-based compression scheme that is a variant of ordered random coding (Theis and Ahmed, 2022) and is amenable to direct evaluation of the achievable compression rate for a finite number of samples. Our second and major contribution is the importance matching lemma, a finite-proposal counterpart of the recently introduced Poisson matching lemma (Li and Anantharam, 2021). By integrating it with deep learning, we provide a new coding scheme for distributed lossy compression with side information at the decoder. We demonstrate the effectiveness of the proposed scheme through experiments on synthetic Gaussian sources, distributed image compression with MNIST, and vertical federated learning with CIFAR-10.
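To make the common-randomness mechanics concrete, here is a minimal importance-sampling coder in the spirit of (but not identical to) the schemes above; the Gaussian target/proposal pair, the sample count, and the function names are illustrative assumptions:

```python
import numpy as np

def encode(shared_seed, mu, n=1024):
    # Encoder and decoder share n proposal samples from N(0, 1) via a
    # common random seed; only the chosen index is transmitted, costing
    # about log2(n) bits. `mu` parametrizes an assumed target N(mu, 1).
    z = np.random.default_rng(shared_seed).standard_normal(n)
    log_w = -0.5 * (z - mu) ** 2 + 0.5 * z ** 2   # log p(z)/q(z), up to a constant
    w = np.exp(log_w - log_w.max())
    return np.random.default_rng(0).choice(n, p=w / w.sum())

def decode(shared_seed, idx, n=1024):
    z = np.random.default_rng(shared_seed).standard_normal(n)
    return z[idx]   # an approximate sample from the target N(mu, 1)
```

The finite-sample rate/bias trade-off of exactly this kind of scheme is what the abstract's first contribution makes directly evaluable.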
In this work, we address the design of tracking controllers that drive a mechanical system's state asymptotically towards a reference trajectory. Motivated by aerospace and robotics applications, we consider fully-actuated systems evolving on the broad class of homogeneous spaces (encompassing all vector spaces, Lie groups, and spheres of any dimension). In this setting, the transitive action of a Lie group on the configuration manifold enables an intrinsic description of the tracking error as an element of the state space, even in the absence of a group structure on the configuration manifold itself (e.g., for $\mathbb{S}^2$). Such an error state facilitates the design of a generalized control policy depending smoothly on state and time that drives this geometric tracking error to a designated origin from almost every initial condition, thereby guaranteeing almost global convergence to the reference trajectory. Moreover, the proposed controller simplifies naturally when specialized to a Lie group or the $n$-sphere. In summary, we propose a unified, intrinsic controller guaranteeing almost global asymptotic trajectory tracking for fully-actuated mechanical systems evolving on a broader class of manifolds. We apply the method to an axisymmetric satellite and an omnidirectional aerial robot.
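As a concrete low-dimensional instance of an intrinsic tracking error on a homogeneous space, the toy sketch below implements a kinematic controller on $\mathbb{S}^2$ for a fixed reference. It is a simplified illustration of almost-global convergence, not the paper's general construction; the gain, step size, and first-order kinematic model are assumptions:

```python
import numpy as np

def sphere_tracking_step(q, q_ref, k=2.0, dt=1e-2):
    # Kinematics on the unit sphere: q' = omega x q. The feedback
    # omega = k (q x q_ref) gives d/dt (q . q_ref) = k |q x q_ref|^2 >= 0,
    # so q converges to q_ref from every initial state except the antipode.
    omega = k * np.cross(q, q_ref)
    q = q + dt * np.cross(omega, q)
    return q / np.linalg.norm(q)   # renormalize after the explicit Euler step

q = np.array([0.0, 1.0, 0.0])
q_ref = np.array([0.0, 0.0, 1.0])
for _ in range(1000):
    q = sphere_tracking_step(q, q_ref)
print(q)   # approaches q_ref
```

The antipodal exception is the topological obstruction that makes "almost global" the best one can guarantee with a continuous feedback on the sphere.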
In this work, we present the first algorithm to compute expander decompositions in an $m$-edge directed graph in near-optimal time $\tilde{O}(m)$. Further, our algorithm can maintain such a decomposition in a dynamic graph, again achieving near-optimal update times. Our result improves over the previous algorithms of Bernstein, Probst Gutenberg, and Saranurak (FOCS 2020) and of Hua, Kyng, Probst Gutenberg, and Wu (SODA 2023), which were optimal only up to subpolynomial factors. At the same time, our algorithm is much simpler and more accessible than previous work. To obtain our new algorithm, we present a new push-pull-relabel flow framework that generalizes the classic push-relabel flow algorithm of Goldberg and Tarjan (JACM 1988), which was later dynamized for computing expander decompositions in undirected graphs by Henzinger, Rao, and Wang (SIAM J. Comput. 2020) and by Saranurak and Wang (SODA 2019). We then show that the flow problems formulated in the recent work of Hua, Kyng, Probst Gutenberg, and Wu (SODA 2023) to decompose directed graphs can be solved much more efficiently in the push-pull-relabel flow framework.
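For readers unfamiliar with the classic primitive being generalized, below is a compact sketch of the generic push-relabel algorithm of Goldberg and Tarjan for static max flow; it is a textbook version, without the pull operation or any of the dynamic machinery described above:

```python
from collections import defaultdict

def push_relabel_max_flow(n, edge_list, s, t):
    # Generic push-relabel: maintain a preflow, push excess along
    # admissible arcs (exactly one level downhill), relabel when stuck.
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v, c in edge_list:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)          # residual (reverse) arc

    flow = defaultdict(int)
    height = [0] * n
    excess = [0] * n

    def residual(u, v):
        return cap[(u, v)] - flow[(u, v)]

    # Initialization: lift the source and saturate all its outgoing arcs.
    height[s] = n
    for v in adj[s]:
        d = residual(s, v)
        if d > 0:
            flow[(s, v)] += d
            flow[(v, s)] -= d
            excess[v] += d

    active = [v for v in range(n) if v not in (s, t) and excess[v] > 0]
    while active:
        u = active.pop()
        while excess[u] > 0:
            pushed = False
            for v in adj[u]:
                if residual(u, v) > 0 and height[u] == height[v] + 1:
                    d = min(excess[u], residual(u, v))
                    flow[(u, v)] += d
                    flow[(v, u)] -= d
                    excess[u] -= d
                    excess[v] += d
                    if v not in (s, t) and v not in active:
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: raise u just above its lowest residual neighbor.
                height[u] = 1 + min(height[v] for v in adj[u] if residual(u, v) > 0)
    return excess[t]

# Example: a small diamond network; the max flow from 0 to 3 is 2.
print(push_relabel_max_flow(4, [(0, 1, 1), (0, 2, 1), (1, 3, 1), (2, 3, 1)], 0, 3))
```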
In this paper, we propose new techniques for solving geometric optimization problems involving interpoint distances of a point set in the plane. Given a set $P$ of $n$ points in the plane and an integer $1 \leq k \leq \binom{n}{2}$, the distance selection problem is to find the $k$-th smallest interpoint distance among all pairs of points of $P$. The previously best deterministic algorithm solves the problem in $O(n^{4/3} \log^2 n)$ time [Katz and Sharir, SIAM J. Comput. 1997 and SoCG 1993]. In this paper, we improve their algorithm to $O(n^{4/3} \log n)$ time. Using similar techniques, we also give improved algorithms for both the two-sided and the one-sided discrete Fr\'{e}chet distance with shortcuts problem for two point sets in the plane. For the two-sided problem (resp., one-sided problem), we improve the previous work [Avraham, Filtser, Kaplan, Katz, and Sharir, ACM Trans. Algorithms 2015 and SoCG 2014] by a factor of roughly $\log^2(m+n)$ (resp., $(m+n)^{\epsilon}$), where $m$ and $n$ are the sizes of the two input point sets, respectively. Other problems whose solutions can be improved by our techniques include the reverse shortest path problems for unit-disk graphs. Our techniques are quite general and we believe they will find many other applications in the future.
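For reference, the problem statement admits a trivial $O(n^2 \log n)$ brute-force baseline, sketched below; it only fixes notation and is far from the $O(n^{4/3} \log n)$ bound achieved here:

```python
import math

def kth_smallest_distance(points, k):
    # Brute force: enumerate all pairwise distances and sort.
    # This is a reference for the problem statement, not the paper's algorithm.
    dists = sorted(math.dist(p, q)
                   for i, p in enumerate(points)
                   for q in points[i + 1:])
    return dists[k - 1]

print(kth_smallest_distance([(0, 0), (3, 0), (0, 4)], 2))  # -> 4.0
```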
Deep reinforcement learning algorithms are usually impeded by sample inefficiency, depending heavily on repeated interactions with the environment to acquire accurate decision-making capabilities. In contrast, humans rely on their hippocampus to retrieve relevant information from past experiences of related tasks, which guides their decision-making when learning a new task, rather than depending exclusively on environmental interactions. Nevertheless, designing a hippocampus-like module that lets an agent incorporate past experiences into established reinforcement learning algorithms presents two challenges. The first is selecting the most relevant past experiences for the current task, and the second is integrating such experiences into the decision network. To address these challenges, we propose a novel method that utilizes a retrieval network based on a task-conditioned hypernetwork, which adapts the retrieval network's parameters to the current task. At the same time, a dynamic modification mechanism strengthens the collaboration between the retrieval and decision networks. We evaluate the proposed method on various tasks within a multitask scenario in the Minigrid environment. The experimental results demonstrate that our proposed method significantly outperforms strong baselines.
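A minimal sketch of the task-conditioned hypernetwork idea follows: a small network maps a task embedding to the weights of a linear retrieval head, so retrieval adapts per task. All dimensions, layer sizes, and names are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TaskConditionedHypernet(nn.Module):
    def __init__(self, task_dim=16, obs_dim=32, key_dim=8):
        super().__init__()
        self.obs_dim, self.key_dim = obs_dim, key_dim
        # The hypernetwork emits the weight matrix and bias of the head.
        self.hyper = nn.Sequential(
            nn.Linear(task_dim, 64), nn.ReLU(),
            nn.Linear(64, obs_dim * key_dim + key_dim),
        )

    def forward(self, task_emb, obs):
        params = self.hyper(task_emb)
        W = params[: self.obs_dim * self.key_dim].view(self.key_dim, self.obs_dim)
        b = params[self.obs_dim * self.key_dim:]
        return obs @ W.T + b   # task-adapted retrieval query

# Usage sketch (assumed shapes): score stored experience keys with the query.
net = TaskConditionedHypernet()
query = net(torch.randn(16), torch.randn(32))
memory_keys = torch.randn(100, 8)        # hypothetical past-experience keys
scores = memory_keys @ query             # higher score = more relevant memory
```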
In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Motivated by rigorous error and complexity estimates for ReLU neural networks, given an inexpensive low-fidelity and an expensive high-fidelity computational model, we present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function, a possibly non-linear mapping between 1) the shared input space of the models together with the low-fidelity model output and 2) the discrepancy between the two model outputs. To accomplish this, we train two neural networks to work in concert. The first network learns the residual function on a small set of high-fidelity and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of a second network. This second network, once trained, acts as our surrogate for the high-fidelity quantity of interest. We present three numerical examples to demonstrate the power of the proposed framework. In particular, we show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
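The two-stage recipe can be sketched in a few lines. Below, a hypothetical cheap/expensive model pair stands in for the low- and high-fidelity simulators; the network sizes and training details are placeholder assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

# Toy stand-ins: an expensive "high-fidelity" f_hi and a cheap f_lo.
f_hi = lambda x: np.sin(2 * np.pi * x) + 0.3 * x
f_lo = lambda x: np.sin(2 * np.pi * x)

def mlp(d_in):
    return nn.Sequential(nn.Linear(d_in, 64), nn.Tanh(), nn.Linear(64, 1))

def fit(net, X, Y, steps=2000):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        ((net(X) - Y) ** 2).mean().backward()
        opt.step()

# Stage 1: learn the residual f_hi - f_lo from a few paired samples,
# taking (x, f_lo(x)) as the residual network's input, as in the abstract.
x_small = np.random.rand(20, 1)
X1 = torch.tensor(np.hstack([x_small, f_lo(x_small)]), dtype=torch.float32)
R = torch.tensor(f_hi(x_small) - f_lo(x_small), dtype=torch.float32)
res_net = mlp(2); fit(res_net, X1, R)

# Stage 2: synthesize abundant high-fidelity data from cheap f_lo plus the
# learned residual, then train the final surrogate on the synthetic set.
x_big = np.random.rand(2000, 1)
X2 = torch.tensor(np.hstack([x_big, f_lo(x_big)]), dtype=torch.float32)
y_synth = torch.tensor(f_lo(x_big), dtype=torch.float32) + res_net(X2).detach()
surrogate = mlp(1); fit(surrogate, torch.tensor(x_big, dtype=torch.float32), y_synth)
```

The cost saving comes from Stage 2: only 20 expensive evaluations are needed, while the surrogate trains on 2000 synthetic high-fidelity samples.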
Designing and generating new data with targeted properties has attracted attention in critical applications such as molecule design, image editing, and speech synthesis. Traditional hand-crafted approaches rely heavily on domain expertise and intensive human effort, yet still suffer from insufficient scientific knowledge and low throughput, limiting effective and efficient data generation. Recently, advances in deep learning have produced expressive methods that can learn the underlying representation and properties of data. This capability provides new opportunities for uncovering the mutual relationship between the structural patterns and functional properties of data, and for leveraging this relationship to generate structured data with the desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. First, the potential challenges are raised and preliminaries are provided. Then controllable deep data generation is formally defined, a taxonomy of the various techniques is proposed, and the evaluation metrics in this specific domain are summarized. After that, exciting applications of controllable deep data generation are introduced, and existing works are experimentally analyzed and compared. Finally, promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.
In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm for the original problem and incrementally constructs a solution. The proposed framework utilizes a Graph Convolutional Network (GCN) to generate node embeddings that predict, out of the entire node set, the nodes likely to belong to the solution set. These embeddings enable an efficient training process that learns the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, and robust and scalable to large dynamic networks.
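The greedy construction loop that such a framework learns to imitate can be sketched as follows, with an assumed Q-network and precomputed node embeddings standing in for GCOMB's trained components:

```python
import torch

def greedy_construct(q_net, embeddings, budget):
    # Sketch of the learned greedy loop: score every remaining node with a
    # Q-function of (partial-solution summary, node embedding), pick argmax.
    # `q_net` maps a concatenated vector to a scalar; it and `embeddings`
    # are assumptions, not GCOMB's exact components.
    solution, remaining = [], set(range(embeddings.shape[0]))
    for _ in range(budget):
        state = (embeddings[solution].mean(0) if solution
                 else torch.zeros(embeddings.shape[1]))
        scores = {v: q_net(torch.cat([state, embeddings[v]])).item()
                  for v in remaining}
        best = max(scores, key=scores.get)
        solution.append(best)
        remaining.remove(best)
    return solution
```

During training, Q-learning fits `q_net` so that these greedy choices approximate the marginal gain of adding each node to the solution.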
Object detection is considered one of the most challenging problems in computer vision, since it requires correct prediction of both the classes and the locations of objects in images. In this study, we define a more difficult scenario, namely zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes. We present a novel approach to tackle this ZSD problem, in which a convex combination of embeddings is used in conjunction with a detection framework. For the evaluation of ZSD methods, we propose a simple dataset constructed from Fashion-MNIST images as well as a custom zero-shot split for the Pascal VOC detection challenge. The experimental results suggest that our method yields promising results for ZSD.
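The convex-combination mechanism can be sketched as follows: the detector's seen-class probabilities mix the seen-class word embeddings into a region embedding, which is then matched against unseen-class embeddings. The array names and the cosine-similarity choice are illustrative assumptions:

```python
import numpy as np

def zero_shot_classify(p_seen, word_vecs_seen, word_vecs_unseen):
    # p_seen: (S,) detector probabilities over seen classes for one region.
    # word_vecs_seen: (S, d), word_vecs_unseen: (U, d) class embeddings.
    region_emb = p_seen @ word_vecs_seen            # convex combination
    region_emb /= np.linalg.norm(region_emb)
    sims = (word_vecs_unseen @ region_emb
            / np.linalg.norm(word_vecs_unseen, axis=1))
    return np.argmax(sims)                          # best-matching unseen class
```

Because the region embedding lives in the same space as the class embeddings, unseen classes can be recognized without any visual training examples.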
In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
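A generic beam search over label sequences, of the kind advanced above, can be sketched as below; the `step_logprob` callback standing in for the LSTM's per-step output, the stop-symbol convention, and the beam width are all assumptions:

```python
def beam_search_labels(step_logprob, num_labels, beam=3, max_len=5):
    # step_logprob(seq) is an assumed callback returning log-probabilities
    # over num_labels + 1 symbols given the labels emitted so far
    # (index num_labels is the stop symbol that closes the label set).
    beams, finished = [([], 0.0)], []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            logp = step_logprob(seq)
            for lbl in range(num_labels + 1):
                if lbl in seq:
                    continue                 # each label appears at most once
                if lbl == num_labels:
                    finished.append((seq, score + logp[lbl]))
                else:
                    candidates.append((seq + [lbl], score + logp[lbl]))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam]
    finished += beams
    return max(finished, key=lambda c: c[1])[0]   # highest-scoring label set
```

Keeping several hypotheses alive at each step is what makes the final label set robust to a single early misprediction.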