We study the DAG sorting problem: a partial order $\mathcal{P}$ on $n$ keys is to be discovered by querying as few edges of an input graph $G=(V=[n],E)$ as possible. The graph $G$ only contains edges between ordered pairs, and $G$ is promised to contain the transitive reduction of the DAG describing $\mathcal{P}$. We present two technical results and one conceptual result. We first show that DAG sorting is closely related to the fundamental problem of sorting with priced information. \emph{Our first technical result} is an algorithm with a $\widetilde{O}(n^{3/4})$ competitive ratio for the $\{0,1,n,\infty\}$ cost version. Thus the $\Omega(n)$ lower bound for finding the maximum cannot extend to sorting, reopening the question of whether an $o(n)$-competitive algorithm exists for the general version. \emph{As our main conceptual contribution}, we define a notion of instance-optimality for the specific problem of DAG sorting, and unify the existing landscape of instance-optimal algorithms for other static problems in the literature, including sorting [Estivill-Castro and Wood, ACM Comput. Surv. 1992], convex hull [Afshani, Barbay and Chan, JACM 2017], and adaptive joins [Demaine, L\'{o}pez-Ortiz and Munro, SODA 2000]. Our unified notion of instance-optimality is also related to FPT algorithms and algorithms with predictions. We then consider the special case of DAG sorting where the input graph is bipartite. \emph{As our second technical result}, we show that a recent algorithm for bichromatic sorting [Goswami and Jacob, ITCS 2024] yields an algorithm for bipartite DAG sorting that is instance-optimal up to a factor of $O(\log^{3}n)$. This generalizes the famous nuts-and-bolts problem to the setting where the numbers of nuts and bolts may differ, there is no promise of a matching between them, and the resulting order might not be total.
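For intuition, here is a minimal Python sketch of the query model; the names and the query-everything baseline are our own illustration, not the paper's algorithm (which aims to query far fewer edges):

```python
def sort_dag_naively(edges, query):
    """Recover the hidden partial order by querying every candidate edge of G.

    edges : candidate ordered pairs (u, v) of the input graph G
    query : oracle answering True iff u precedes v in the hidden order P
    The transitive closure of the returned set contains P, since G is
    promised to contain P's transitive reduction.
    """
    return {(u, v) for (u, v) in edges if query(u, v)}

# Toy usage: hidden total order 0 < 1 < 2 on n = 3 keys.
hidden = {(0, 1), (1, 2), (0, 2)}
G = [(0, 1), (1, 0), (1, 2), (0, 2)]
print(sort_dag_naively(G, lambda u, v: (u, v) in hidden))
# confirmed edges of P -- but at the cost of querying all of E
```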
Data consisting of a graph with a function to $\mathbb{R}^d$ arise in many applications, encompassing structures such as Reeb graphs, geometric graphs, and knot embeddings. As such, the ability to compare and cluster such objects is required in a data analysis pipeline, leading to a need for distances or metrics between them. In this work, we study the interleaving distance on discretizations of these objects, $\mathbb{R}^d$-mapper graphs, where functor representations of the data can be compared by finding pairs of natural transformations between them. However, in many cases, computation of the interleaving distance is NP-hard. For this reason, we take inspiration from the work of Robinson to find quality measures for families of maps that do not rise to the level of a natural transformation, called assignments. We then endow the functor images with the extra structure of a metric space and define a loss function which measures how far an assignment is from making the required diagrams of an interleaving commute. Finally, we show that the loss function can be computed in polynomial time. We believe this idea is both powerful and translatable, with the potential to be used for approximation and bounds on interleavings in a broad array of contexts.
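To make the loss-function idea concrete, here is a toy sketch under our own assumptions (a single square diagram, a finite set of points, and worst-case aggregation; the paper's exact definition may differ), measuring how far two path compositions are from commuting:

```python
# The loss is the worst-case disagreement, under the chosen metric, between
# the two ways of traversing one square of the interleaving diagram.
def commutativity_loss(points, path_one, path_two, metric):
    """How far an assignment is from making one square commute."""
    return max(metric(path_one(x), path_two(x)) for x in points)

# Toy usage on a discretized value space with the absolute-value metric.
points = [0.0, 0.5, 1.0]
loss = commutativity_loss(points,
                          lambda x: x + 0.1,   # go right, then down
                          lambda x: x,         # go down, then right
                          lambda a, b: abs(a - b))
print(loss)  # 0.1: the square commutes up to a defect of 0.1
```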
Bayesian Optimization (BO) is typically used to optimize an unknown function $f$ that is noisy and costly to evaluate, by exploiting an acquisition function that must be maximized at each optimization step. Although provably asymptotically optimal BO algorithms are efficient at optimizing low-dimensional functions, scaling them to high-dimensional spaces remains an open problem, often tackled by assuming an additive structure for $f$. By doing so, BO algorithms typically introduce additional restrictive assumptions on the additive structure that reduce their domain of applicability. This paper makes two main contributions: (i) we relax the restrictive assumptions on the additive structure of $f$ without weakening the maximization guarantees of the acquisition function, and (ii) we address the over-exploration problem for decentralized BO algorithms. To these ends, we propose DuMBO, an asymptotically optimal decentralized BO algorithm that achieves very competitive performance against state-of-the-art BO algorithms, especially when the additive structure of $f$ comprises high-dimensional factors.
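As a toy illustration of the kind of additive structure in question (our own example, not DuMBO itself), $f$ decomposes into factors over possibly overlapping, possibly high-dimensional subsets of coordinates:

```python
import numpy as np

# f(x) = sum of factors, each acting on a subset of the coordinates.
# Overlapping and high-dimensional factors are exactly the cases that
# restrictive additive-structure assumptions usually rule out.
factors = [((0, 1, 2), lambda z: -np.sum(z ** 2)),     # factor on x_0..x_2
           ((2, 3),    lambda z: np.sin(z).prod())]    # overlaps on x_2

def f(x):
    return sum(g(x[list(dims)]) for dims, g in factors)

x = np.array([0.1, -0.2, 0.3, 1.0])
print(f(x))
```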
Let $G$ be a planar graph and $I_s$ and $I_t$ be two independent sets in $G$, each of size $k$. We begin with a ``token'' on each vertex of $I_s$ and seek to move all tokens to $I_t$, by repeated ``token jumping'', removing a single token from one vertex and placing it on another vertex. We require that each intermediate arrangement of tokens again specifies an independent set of size $k$. Given $G$, $I_s$, and $I_t$, we ask whether there exists a sequence of token jumps that transforms $I_s$ to $I_t$. When $k$ is part of the input, this problem is known to be PSPACE-complete. However, it was shown by Ito, Kami\'nski, and Ono to be fixed-parameter tractable. That is, when $k$ is fixed, the problem can be solved in time polynomial in the order of $G$. Here we strengthen the upper bound on the running time in terms of $k$ by showing that the problem has a kernel of size linear in $k$. More precisely, we transform an arbitrary input problem on a planar graph into an equivalent problem on a (planar) graph with order $O(k)$.
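For concreteness, here is a brute-force search illustrating the reconfiguration question itself; it is exponential in $k$, and the paper's actual contribution, a linear kernel, is not implemented here:

```python
from itertools import combinations
from collections import deque

def is_independent(S, adj):
    return all(v not in adj[u] for u, v in combinations(S, 2))

def token_jumping(adj, I_s, I_t):
    """BFS over independent sets of size k reachable by single token jumps."""
    start, goal = frozenset(I_s), frozenset(I_t)
    seen, queue = {start}, deque([start])
    while queue:
        S = queue.popleft()
        if S == goal:
            return True
        for u in S:                      # remove one token ...
            for v in adj:                # ... and place it on another vertex
                T = (S - {u}) | {v}
                if v not in S and is_independent(T, adj) and T not in seen:
                    seen.add(T)
                    queue.append(T)
    return False

# Path on 4 vertices: move {0, 2} to {1, 3} via {0, 3}.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(token_jumping(adj, {0, 2}, {1, 3}))  # True
```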
This paper considers the problem of completing a rating matrix based on sub-sampled matrix entries as well as observed social graphs and hypergraphs. We show that there exists a \emph{sharp threshold} on the sample probability for the task of exactly completing the rating matrix -- the task is achievable when the sample probability is above the threshold, and is impossible otherwise -- demonstrating a phase transition phenomenon. The threshold can be expressed as a function of the ``quality'' of hypergraphs, enabling us to \emph{quantify} the amount of reduction in sample probability due to the exploitation of hypergraphs. This also highlights the usefulness of hypergraphs in the matrix completion problem. En route to discovering the sharp threshold, we develop a computationally efficient matrix completion algorithm that effectively exploits the observed graphs and hypergraphs. Theoretical analyses show that our algorithm succeeds with high probability as long as the sample probability exceeds the aforementioned threshold, and this theoretical result is further validated by synthetic experiments. Moreover, our experiments on a real social network dataset (with both graphs and hypergraphs) show that our algorithm outperforms other state-of-the-art matrix completion algorithms.
A common pipeline in functional data analysis is to first convert the discretely observed data to smooth functions, and then represent the functions by a finite-dimensional vector of coefficients summarizing the information. Existing methods for data smoothing and dimension reduction mainly focus on learning linear mappings from the data space to the representation space; however, learning only linear representations may not be sufficient. In this study, we propose to learn nonlinear representations of functional data using neural network autoencoders designed to process data in the form in which it is usually collected, without the need for preprocessing. We design the encoder to employ a projection layer that computes the weighted inner product of the functional data and functional weights over the observed timestamps, and the decoder to apply a recovery layer that maps the finite-dimensional vector extracted from the functional data back to functional space using a set of predetermined basis functions. The developed architecture can accommodate both regularly and irregularly spaced data. Our experiments demonstrate that the proposed method outperforms functional principal component analysis in terms of prediction and classification, and maintains superior smoothing ability and better computational efficiency than conventional autoencoders under both linear and nonlinear settings.
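A minimal NumPy sketch of the two layers described above, assuming trapezoidal quadrature for the inner product and sinusoidal stand-ins for the functional weights and the predetermined basis (the paper's exact architecture may differ):

```python
import numpy as np

def projection_layer(t, x, weight_funcs):
    """Encoder: weighted inner products <x, w_j> over observed timestamps t."""
    return np.array([np.trapz(x * w(t), t) for w in weight_funcs])

def recovery_layer(coeffs, basis_funcs, t_grid):
    """Decoder: map the coefficient vector back to a function on t_grid."""
    return sum(c * b(t_grid) for c, b in zip(coeffs, basis_funcs))

# Irregularly spaced observations of x(t) = sin(2*pi*t).
t = np.sort(np.random.default_rng(0).uniform(0, 1, 40))
x = np.sin(2 * np.pi * t)
codes = projection_layer(t, x, [np.sin, np.cos])    # stand-in functional weights
x_hat = recovery_layer(codes, [np.sin, np.cos], t)  # predetermined basis
```

In the actual architecture the functional weights are learned and nonlinear layers sit between encoder and decoder; the point of the sketch is that both layers consume the timestamps directly, which is how irregular spacing is accommodated.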
This work considers the combinatorial multi-access coded caching problem introduced in the recent work by Muralidhar \textit{et al.} [P. N. Muralidhar, D. Katyal, and B. S. Rajan, ``Maddah-Ali-Niesen scheme for multi-access coded caching,'' in \textit{IEEE Inf. Theory Workshop (ITW)}, 2021]. The problem setting consists of a central server having a library of $N$ files and $C$ caches, each with capacity $M$. Each user in the system can access a unique set of $r<C$ caches, and there exists a user corresponding to every distinct set of $r$ caches; therefore, the number of users in the system is $\binom{C}{r}$. For this combinatorial multi-access setting, we propose a coded caching scheme with an MDS code-based coded placement. This novel placement technique helps to achieve a better rate in the delivery phase compared to the optimal scheme under uncoded placement when $M> N/C$. For a lower memory regime, we present another scheme with coded placement, which outperforms the optimal scheme under uncoded placement if the number of files is no more than the number of users. Further, we derive an information-theoretic lower bound on the optimal rate-memory trade-off of combinatorial multi-access coded caching. In addition, using the derived lower bound, we show that the first scheme is optimal in the higher memory regime, and that the second scheme is optimal if $N\leq \binom{C}{r}$. Finally, we show that the performance of the first scheme is within a constant factor of the optimal performance when $r=2$.
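The access topology is easy to state in code: one user per distinct $r$-subset of the $C$ caches, hence $\binom{C}{r}$ users in total. A small illustration:

```python
from itertools import combinations

# Combinatorial multi-access topology: every distinct r-subset of the C
# caches corresponds to exactly one user.
C, r = 4, 2
users = list(combinations(range(C), r))
print(len(users), users)
# 6 [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```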
We propose a new framework to design and analyze accelerated methods that solve general monotone equation (ME) problems $F(x)=0$. Traditional approaches include generalized steepest descent methods and inexact Newton-type methods. If $F$ is uniformly monotone and twice differentiable, the former achieve local convergence rates, while the latter are globally convergent thanks to line search and hyperplane projection; however, no global rate is known for either class. Variational inequality methods can be applied to yield a global rate expressed in terms of $\|F(x)\|$, but these results are restricted to first-order methods and a Lipschitz continuous operator, and it has not been clear how to obtain global acceleration using high-order Lipschitz continuity. This paper takes a continuous-time perspective in which accelerated methods are viewed as the discretization of dynamical systems. Our contribution is to propose accelerated rescaled gradient systems and prove that they are equivalent to closed-loop control systems. Based on this connection, we establish the properties of solution trajectories. Moreover, we provide a unified algorithmic framework obtained from discretization of our system, which together with two approximation subroutines yields both existing high-order methods and new first-order methods. We prove that the $p^{th}$-order method achieves a global rate of $O(k^{-p/2})$ in terms of $\|F(x)\|$ if $F$ is $p^{th}$-order Lipschitz continuous, and that the first-order method achieves the same rate if $F$ is $p^{th}$-order strongly Lipschitz continuous. If $F$ is strongly monotone, the restarted versions achieve local convergence of order $p$ when $p \geq 2$. Our discrete-time analysis is largely motivated by the continuous-time analysis and demonstrates the fundamental role that rescaled gradients play in global acceleration for solving ME problems.
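For context, here is a minimal sketch of the classical extragradient iteration, a representative of the first-order variational-inequality methods the abstract contrasts with (illustrative only; this is not the paper's rescaled-gradient scheme):

```python
import numpy as np

def extragradient(F, x0, eta=0.5, iters=200):
    """Classical extragradient for F(x) = 0 with monotone, Lipschitz F."""
    x = x0
    for _ in range(iters):
        y = x - eta * F(x)    # extrapolation step
        x = x - eta * F(y)    # update using the extrapolated point
    return x

# Monotone but non-symmetric example: A = 0.1*I plus a skew-symmetric part,
# so F(x) = A x is not the gradient of any function.
A = np.array([[0.1, 1.0], [-1.0, 0.1]])
x_star = extragradient(lambda x: A @ x, np.array([1.0, 1.0]))
print(np.linalg.norm(A @ x_star))  # the residual ||F(x)|| is driven to ~0
```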
While existing work in robust deep learning has focused on small pixel-level $\ell_p$ norm-based perturbations, these may not account for perturbations encountered in several real-world settings. In many such cases, although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. but deviates from the training domain. While this deviation may not be exactly known, its broad characterization is specified a priori in terms of attributes. We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attribute space, without having access to data from the test domain. Our adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial perturbations and the outer minimization finding model parameters by optimizing the loss on the adversarial perturbations generated by the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations -- object-related shifts, geometric transformations, and common image corruptions -- and show that it enables deep neural networks to be robust against a wide range of such perturbations. We demonstrate the usefulness of the proposed approach by showing the robustness gains of deep neural networks trained using our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.
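A schematic NumPy sketch of the min-max loop, with our own toy stand-ins: a linear model, a squared loss, and grid search over a shift attribute in place of the paper's learned sample generator:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, a):
    """Toy attribute-space perturbation: a shift by `a` (a stand-in for
    rotations or corruptions; purely illustrative)."""
    return np.roll(x, a, axis=-1)

def loss(w, x, y):
    return np.mean((x @ w - y) ** 2)

def grad(w, x, y):
    return 2 * x.T @ (x @ w - y) / len(y)

# Data and model: a linear predictor on 8-dimensional inputs.
x, y = rng.normal(size=(64, 8)), rng.normal(size=64)
w = np.zeros(8)
shifts = range(-2, 3)   # broad a-priori attribute specification

for _ in range(100):
    # Inner maximization: worst attribute value for the current model.
    a_star = max(shifts, key=lambda a: loss(w, perturb(x, a), y))
    # Outer minimization: gradient step on the worst-case perturbed batch.
    w -= 0.05 * grad(w, perturb(x, a_star), y)
```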
The problem of Multiple Object Tracking (MOT) consists in following the trajectories of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that provide a solution to this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey of works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning has been employed at each of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and the solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
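The core smoothing step is easy to state: regularizing the max operator with the negative entropy (strength $\gamma$) yields the log-sum-exp, whose gradient is a softmax, making the operator differentiable. A small sketch:

```python
import numpy as np

def smoothed_max(z, gamma=1.0):
    """Entropy-regularized max: value is gamma * logsumexp(z / gamma),
    and its gradient with respect to z is a softmax distribution."""
    m = z.max()                                   # stabilized log-sum-exp
    val = m + gamma * np.log(np.sum(np.exp((z - m) / gamma)))
    grad = np.exp((z - val) / gamma)              # softmax, sums to 1
    return val, grad

z = np.array([1.0, 2.0, 3.0])
print(smoothed_max(z, gamma=0.1))  # close to (3.0, one-hot on the argmax)
print(smoothed_max(z, gamma=5.0))  # smoother value and gradient
```

Plugging this operator into the Viterbi or DTW recursion in place of the hard max is what yields the smoothed, differentiable variants described above.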