Given a conjunctive query $Q$ and a database $\mathbf{D}$, a direct access to the answers of $Q$ over $\mathbf{D}$ is the operation of returning, given an index $j$, the $j^{\mathsf{th}}$ answer for some order on its answers. While this problem is $\#\mathsf{P}$-hard in general with respect to combined complexity, many conjunctive queries have an underlying structure that allows for a direct access to their answers for some lexicographical ordering that takes polylogarithmic time in the size of the database after a polynomial time precomputation. Previous work has precisely characterised the tractable classes and given fine-grained lower bounds on the precomputation time needed depending on the structure of the query. In this paper, we generalise these tractability results to the case of signed conjunctive queries, that is, conjunctive queries that may contain negative atoms. Our technique is based on a class of circuits that can represent relational data. We first show that this class supports tractable direct access after a polynomial time preprocessing. We then give bounds on the size of the circuit needed to represent the answer set of signed conjunctive queries depending on their structure. Both results combined together allow us to prove the tractability of direct access for a large class of conjunctive queries. On the one hand, we recover the known tractable classes from the literature in the case of positive conjunctive queries. On the other hand, we generalise and unify known tractability results about negative conjunctive queries -- that is, queries having only negated atoms. In particular, we show that the class of $\beta$-acyclic negative conjunctive queries and the class of bounded nest set width negative conjunctive queries admit tractable direct access.
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$. Standard approaches amount to searching for a precision matrix $\Theta$ representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^{2}$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting. In this work, we adopt a compressive viewpoint and aim to estimate a sparse $\Theta$ from a \emph{sketch} of the data, i.e. a low-dimensional vector of size $m \ll d^{2}$ carefully designed from $X$ using non-linear random features. Under certain assumptions on the spectrum of $\Theta$ (or its condition number), we show that it is possible to estimate it from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by compressed sensing theory and involve restricted isometry properties and instance optimal decoders. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Federated Learning (FL) has emerged as a promising solution to perform deep learning on different data owners without exchanging raw data. However, non-IID data has been a key challenge in FL, which could significantly degrade the accuracy of the final model. Among different non-IID types, label skews have been challenging and common in image classification and other tasks. Instead of averaging the local models in most previous studies, we propose FedConcat, a simple and effective approach that concatenates these local models as the base of the global model to effectively aggregate the local knowledge. To reduce the size of the global model, we adopt the clustering technique to group the clients by their label distributions and collaboratively train a model inside each cluster. We theoretically analyze the advantage of concatenation over averaging by analyzing the information bottleneck of deep neural networks. Experimental results demonstrate that FedConcat achieves significantly higher accuracy than previous state-of-the-art FL methods in various heterogeneous label skew distribution settings and meanwhile has lower communication costs. Our code is publicly available.
We consider the Distinct Shortest Walks problem. Given two vertices $s$ and $t$ of a graph database $\mathcal{D}$ and a regular path query, enumerate all walks of minimal length from $s$ to $t$ that carry a label that conforms to the query. Usual theoretical solutions turn out to be inefficient when applied to graph models that are closer to real-life systems, in particular because edges may carry multiple labels. Indeed, known algorithms may repeat the same answer exponentially many times. We propose an efficient algorithm for multi-labelled graph databases. The preprocessing runs in $O{|\mathcal{D}|\times|\mathcal{A}|}$ and the delay between two consecutive outputs is in $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is a nondeterministic automaton representing the query and $\lambda$ is the minimal length. The algorithm can handle $\varepsilon$-transitions in $\mathcal{A}$ or queries given as regular expressions at no additional cost.
We introduce a \emph{gain function} viewpoint of information leakage by proposing \emph{maximal $g$-leakage}, a rich class of operationally meaningful leakage measures that subsumes recently introduced leakage measures -- {maximal leakage} and {maximal $\alpha$-leakage}. In maximal $g$-leakage, the gain of an adversary in guessing an unknown random variable is measured using a {gain function} applied to the probability of correctly guessing. In particular, maximal $g$-leakage captures the multiplicative increase, upon observing $Y$, in the expected gain of an adversary in guessing a randomized function of $X$, maximized over all such randomized functions. We also consider the scenario where an adversary can make multiple attempts to guess the randomized function of interest. We show that maximal leakage is an upper bound on maximal $g$-leakage under multiple guesses, for any non-negative gain function $g$. We obtain a closed-form expression for maximal $g$-leakage under multiple guesses for a class of concave gain functions. We also study maximal $g$-leakage measure for a specific class of gain functions related to the $\alpha$-loss. In particular, we first completely characterize the minimal expected $\alpha$-loss under multiple guesses and analyze how the corresponding leakage measure is affected with the number of guesses. Finally, we study two variants of maximal $g$-leakage depending on the type of adversary and obtain closed-form expressions for them, which do not depend on the particular gain function considered as long as it satisfies some mild regularity conditions. We do this by developing a variational characterization for the R\'{e}nyi divergence of order infinity which naturally generalizes the definition of pointwise maximal leakage to incorporate arbitrary gain functions.
In 2023, Kuznetsov and Speranski introduced infinitary action logic with multiplexing $!^m\nabla \mathrm{ACT}_\omega$ and proved that the derivability problem for it lies between the $\omega$ and $\omega^\omega$ levels of the hyperarithmetical hierarchy. We prove that this problem is $\Delta^0_{\omega^\omega}$-complete under Turing reductions. Namely, we prove that it is recursively isomorphic to the satisfaction predicate for computable infinitary formulas of rank less than $\omega^\omega$ in the language of arithmetic. We also prove this result for the fragment of $!^m\nabla \mathrm{ACT}_\omega$ where Kleene star is not allowed to be in the scope of the subexponential. Finally, we present a family of logics, which are fragments of $!^m\nabla \mathrm{ACT}_\omega$, such that the complexity of the $k$-th logic is between $\Delta^0_{\omega^k}$ and $\Delta^0_{\omega^{k+1}}$.
We propose a new framework to design and analyze accelerated methods that solve general monotone equation (ME) problems $F(x)=0$. Traditional approaches include generalized steepest descent methods and inexact Newton-type methods. If $F$ is uniformly monotone and twice differentiable, these methods achieve local convergence rates while the latter methods are globally convergent thanks to line search and hyperplane projection. However, a global rate is unknown for these methods. The variational inequality methods can be applied to yield a global rate that is expressed in terms of $\|F(x)\|$ but these results are restricted to first-order methods and a Lipschitz continuous operator. It has not been clear how to obtain global acceleration using high-order Lipschitz continuity. This paper takes a continuous-time perspective where accelerated methods are viewed as the discretization of dynamical systems. Our contribution is to propose accelerated rescaled gradient systems and prove that they are equivalent to closed-loop control systems. Based on this connection, we establish the properties of solution trajectories. Moreover, we provide a unified algorithmic framework obtained from discretization of our system, which together with two approximation subroutines yields both existing high-order methods and new first-order methods. We prove that the $p^{th}$-order method achieves a global rate of $O(k^{-p/2})$ in terms of $\|F(x)\|$ if $F$ is $p^{th}$-order Lipschitz continuous and the first-order method achieves the same rate if $F$ is $p^{th}$-order strongly Lipschitz continuous. If $F$ is strongly monotone, the restarted versions achieve local convergence with order $p$ when $p \geq 2$. Our discrete-time analysis is largely motivated by the continuous-time analysis and demonstrates the fundamental role that rescaled gradients play in global acceleration for solving ME problems.
A homomorphism from a graph $G$ to a graph $H$ is an edge-preserving mapping from $V(G)$ to $V(H)$. In the graph homomorphism problem, denoted by $Hom(H)$, the graph $H$ is fixed and we need to determine if there exists a homomorphism from an instance graph $G$ to $H$. We study the complexity of the problem parameterized by the cutwidth of $G$. We aim, for each $H$, for algorithms for $Hom(H)$ running in time $c_H^k n^{\mathcal{O}(1)}$ and matching lower bounds that exclude $c_H^{k \cdot o(1)}n^{\mathcal{O}(1)}$ or $c_H^{k(1-\Omega(1))}n^{\mathcal{O}(1)}$ time algorithms under the (Strong) Exponential Time Hypothesis. In the paper we introduce a new parameter that we call $\mathrm{mimsup}(H)$. Our main contribution is strong evidence of a close connection between $c_H$ and $\mathrm{mimsup}(H)$: * an information-theoretic argument that the number of states needed in a natural dynamic programming algorithm is at most $\mathrm{mimsup}(H)^k$, * lower bounds that show that for almost all graphs $H$ indeed we have $c_H \geq \mathrm{mimsup}(H)$, assuming the (Strong) Exponential-Time Hypothesis, and * an algorithm with running time $\exp ( {\mathcal{O}( \mathrm{mimsup}(H) \cdot k \log k)}) n^{\mathcal{O}(1)}$. The parameter $\mathrm{mimsup}(H)$ can be thought of as the $p$-th root of the maximum induced matching number in the graph obtained by multiplying $p$ copies of $H$ via certain graph product, where $p$ tends to infinity. It can also be defined as an asymptotic rank parameter of the adjacency matrix of $H$. Our results tightly link the parameterized complexity of a problem to such an asymptotic rank parameter for the first time.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.