The circuit class $\mathsf{QAC}^0$ was introduced by Moore (1999) as a model for constant-depth quantum circuits where the gate set includes many-qubit Toffoli gates. Proving lower bounds against such circuits is a longstanding challenge in quantum circuit complexity; in particular, showing that polynomial-size $\mathsf{QAC}^0$ cannot compute the parity function has remained an open question for over 20 years. In this work, we identify a notion of the Pauli spectrum of $\mathsf{QAC}^0$ circuits, which can be viewed as the quantum analogue of the Fourier spectrum of classical $\mathsf{AC}^0$ circuits. We conjecture that the Pauli spectrum of $\mathsf{QAC}^0$ circuits satisfies low-degree concentration, in analogy to the famous Linial-Nisan-Mansour theorem on the low-degree Fourier concentration of $\mathsf{AC}^0$ circuits. If true, this conjecture immediately implies that polynomial-size $\mathsf{QAC}^0$ circuits cannot compute parity. We prove this conjecture for the class of depth-$d$, polynomial-size $\mathsf{QAC}^0$ circuits with at most $n^{O(1/d)}$ auxiliary qubits. As applications, we obtain new circuit lower bounds and learning results: this class of circuits cannot correctly compute
- the $n$-bit parity function on more than a $(\frac{1}{2} + 2^{-\Omega(n^{1/d})})$-fraction of inputs, and
- the $n$-bit majority function on more than a $(1 - 1/\mathrm{poly}(n))$-fraction of inputs.
Additionally, we show that this class of $\mathsf{QAC}^0$ circuits with limited auxiliary qubits can be learned with quasipolynomial sample complexity, giving the first learning result for $\mathsf{QAC}^0$ circuits. More broadly, our results add evidence that "Pauli-analytic" techniques can be a powerful tool in studying quantum circuits.
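For context, here is a sketch of the standard notion of Pauli spectrum and of low-degree concentration that the conjecture refers to (our phrasing under the usual conventions; the paper's exact normalization may differ). Expanding an $n$-qubit observable $M$ (for instance, the Heisenberg evolution of the output measurement through the circuit) in the Pauli basis,
\[
M \;=\; \sum_{P \in \{I,X,Y,Z\}^{\otimes n}} \widehat{M}(P)\, P,
\qquad
\widehat{M}(P) \;=\; \frac{1}{2^{n}} \operatorname{Tr}\!\left(P M\right),
\]
and writing $|P|$ for the number of non-identity tensor factors of $P$, low-degree concentration at level $k$ asks that
\[
\sum_{|P| > k} \big|\widehat{M}(P)\big|^{2} \;\le\; \varepsilon,
\]
in direct analogy with the low-degree Fourier concentration of $\mathsf{AC}^0$ functions.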
We propose and study a framework for quantifying the importance of the choices of parameter values to the result of a query over a database. These parameters occur as constants in logical queries, such as conjunctive queries. In our framework, the importance of a parameter is its SHAP score. This score is a popular instantiation of the game-theoretic Shapley value for measuring the importance of feature values in machine learning models. We make the case for the rationale of using this score by explaining the intuition behind SHAP, and by showing that we arrive at this score in two different, apparently opposing, approaches to quantifying the contribution of a parameter. The application of the SHAP score requires two components in addition to the query and the database: (a) a probability distribution over the combinations of parameter values, and (b) a utility function that measures the similarity between the result for the original parameters and the result for hypothetical parameters. The main question addressed in the paper is the complexity of calculating the SHAP score for different distributions and similarity measures. We first address the case of probabilistically independent parameters. As one would expect, the problem is hard for query fragments that are hard to evaluate, and it remains hard even for the fragment of acyclic conjunctive queries. In some cases, though, one can efficiently list all relevant parameter combinations, and then the SHAP score can be computed in polynomial time under reasonably general conditions. Also tractable is the case of full acyclic conjunctive queries for certain (natural) similarity functions. We extend our results to conjunctive queries with inequalities between variables and parameters. Finally, we discuss a simple approximation technique for the case of correlated parameters.
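To make the object being computed concrete, here is a minimal sketch of an exact SHAP-score computation for query parameters, assuming independent parameters each drawn uniformly from a finite domain; eval_query, similarity, and param_dists are illustrative placeholders, not the paper's API.

# Minimal SHAP-score sketch for query parameters (illustrative only).
import math
from itertools import combinations
from random import choice

def shap_scores(params, param_dists, eval_query, similarity, n_samples=200):
    """params: dict name -> actual value; param_dists: dict name -> list of candidate values."""
    names = list(params)
    n = len(names)
    reference = eval_query(params)  # query result under the actual parameter values

    def value(coalition):
        # Expected similarity to the reference result when the parameters in `coalition`
        # are fixed to their actual values and the others are drawn independently
        # (here: uniformly) from their distributions, estimated by sampling.
        total = 0.0
        for _ in range(n_samples):
            assignment = {p: (params[p] if p in coalition else choice(param_dists[p]))
                          for p in names}
            total += similarity(eval_query(assignment), reference)
        return total / n_samples

    scores = {}
    for i in names:
        others = [p for p in names if p != i]
        phi = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                phi += weight * (value(set(S) | {i}) - value(set(S)))
        scores[i] = phi  # Shapley value of parameter i
    return scores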
A Particle Swarm Optimizer for the search of balanced Boolean functions with good cryptographic properties is proposed in this paper. The algorithm is a modified version of the permutation PSO by Hu, Eberhart and Shi which preserves the Hamming weight of the particles' positions, coupled with the Hill Climbing method devised by Millan, Clark and Dawson to improve the nonlinearity and reduce the deviation from correlation immunity of Boolean functions. The parameters of the PSO velocity equation are tuned by means of two meta-optimization techniques, namely Local Unimodal Sampling (LUS) and Continuous Genetic Algorithms (CGA), and we find that CGA produces better results. Using the CGA-evolved parameters, the PSO algorithm is then run on the spaces of Boolean functions from $n=7$ to $n=12$ variables. The results of the experiments show that this new PSO algorithm generates Boolean functions featuring similar or better combinations of nonlinearity, correlation immunity and propagation criterion with respect to the ones obtained by other optimization methods.
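As a concrete illustration of one fitness ingredient such searches rely on, the sketch below computes the nonlinearity of a Boolean function from its truth table via the fast Walsh-Hadamard transform; this is the standard definition, not code from the paper.

def walsh_transform(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f; truth_table is a 0/1 list of length 2^n."""
    w = [1 - 2 * b for b in truth_table]  # map 0/1 to +1/-1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                x, y = w[j], w[j + h]
                w[j], w[j + h] = x + y, x - y
        h *= 2
    return w

def nonlinearity(truth_table):
    """nl(f) = 2^(n-1) - max_a |W_f(a)| / 2."""
    w = walsh_transform(truth_table)
    return len(truth_table) // 2 - max(abs(v) for v in w) // 2

# Example: the balanced 3-variable majority function.
maj3 = [0, 0, 0, 1, 0, 1, 1, 1]
print(nonlinearity(maj3))  # -> 2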
Knowledge-grounded dialogue (KGD) learns to generate an informative response based on a given dialogue context and external knowledge (\emph{e.g.}, knowledge graphs; KGs). Recently, the emergence of large language models (LLMs) and pre-training techniques has brought great success to knowledge-grounded dialogue. However, when building KGD systems in real applications, various kinds of real-world noise are inevitable. For example, the dialogue context might involve perturbations such as misspellings and abbreviations. In addition, KGs typically suffer from incompleteness and might also contain erroneous and outdated facts. Such real-world noise poses a challenge to the robustness of KGD systems and hinders their application in the real world. In this paper, we propose an entity-based contrastive learning framework for improving the robustness of KGD. Specifically, we make use of the entity information in a KGD sample to create both its positive and negative samples, which involve semantic-irrelevant and semantic-relevant perturbations, respectively. The contrastive learning framework makes the KGD model aware of these two types of perturbations, so that it generates informative responses even with the potentially noisy inputs found in real applications. Experimental results on three benchmark datasets show that our method achieves new state-of-the-art performance in terms of automatic evaluation scores, verifying its effectiveness and potential. Furthermore, we show that our method generates better responses than comparison models in both the noisy and the few-shot settings.
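The abstract does not spell out the training objective, so the following is only a generic InfoNCE-style sketch (in PyTorch) of contrasting an anchor representation against one positive (a semantic-irrelevant perturbation) and several negatives (semantic-relevant perturbations); all names are illustrative and not the paper's exact loss.

import torch
import torch.nn.functional as F

def entity_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor: (B, H); positive: (B, H); negatives: (B, K, H) encoder representations."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_sim = (anchor * positive).sum(-1, keepdim=True) / temperature      # (B, 1)
    neg_sim = torch.einsum("bh,bkh->bk", anchor, negatives) / temperature  # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1)                          # positive is class 0
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)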
We propose a new approach for approximating functions in $C([0,1]^d)$ via the Kolmogorov superposition theorem (KST), based on a linear spline approximation of the K-outer function in the Kolmogorov superposition representation. We improve the results in \cite{LaiShenKST21} by showing that the optimal approximation rate based on our proposed approach is $\mathcal{O}(\frac{1}{n^2})$, with $n$ being the number of knots over $[0,1]$, and that the approximation constant increases linearly in $d$. We show that there is a dense subclass in $C([0,1]^d)$ whose approximation can achieve this optimal rate, and that the number of parameters needed in such an approximation is at most $\mathcal{O}(nd)$. Moreover, for $d\geq 4$, we apply the tensor product spline denoising technique to smooth the KB-splines and obtain the corresponding LKB-splines. We use those LKB-splines as a basis to approximate functions for the cases $d=4$ and $d=6$, which extends the results in \cite{LaiShenKST21} for $d=2$ and $d=3$. Based on the idea of pivotal data locations introduced in \cite{LaiShenKST21}, we validate via numerical experiments that fewer than $\mathcal{O}(nd)$ function values are enough to achieve approximation rates such as $\mathcal{O}(\frac{1}{n})$ or $\mathcal{O}(\frac{1}{n^2})$, depending on the smoothness of the K-outer function. Finally, we demonstrate that our approach can be applied to numerically solving partial differential equations such as the Poisson equation, with accurate approximation results.
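For context, a common single-outer-function form of the Kolmogorov superposition theorem (a sketch of the setting as we understand it; the exact inner functions and normalizations used in \cite{LaiShenKST21} may differ) states that there are continuous inner functions $\phi_q:[0,1]\to\mathbb{R}$ and constants $\lambda_p>0$ such that every $f\in C([0,1]^d)$ can be written with a single continuous K-outer function $g$ as
\[
f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} g\!\Big(\sum_{p=1}^{d} \lambda_p\, \phi_q(x_p)\Big),
\]
so that approximating the multivariate $f$ reduces to approximating the univariate function $g$, here by a linear spline with $n$ knots over its one-dimensional domain.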
The characteristics of data, such as its distribution and heterogeneity, become more complex and counterintuitive as the dimensionality increases. This phenomenon is known as the curse of dimensionality: common patterns and relationships (e.g., internal and boundary patterns) that hold in low-dimensional space may be invalid in higher-dimensional space, which degrades the performance of regression, classification and clustering models and algorithms. The curse of dimensionality can be attributed to many causes. In this paper, we first summarize five challenges associated with manipulating high-dimensional data and explain the potential causes of the failure of regression, classification or clustering tasks. Subsequently, we delve into two major causes of the curse of dimensionality, distance concentration and the manifold effect, by performing theoretical and empirical analyses. The results demonstrate that nearest neighbor search (NNS) using three typical distance measures, Minkowski distance, Chebyshev distance, and cosine distance, becomes meaningless as the dimensionality increases. Meanwhile, the data incorporates more redundant features, and the variance contribution of principal component analysis (PCA) is skewed towards a few dimensions. By interpreting the causes of the curse of dimensionality, we can better understand the limitations of current models and algorithms, and improve the performance of data analysis and machine learning tasks in high-dimensional space.
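A small numerical illustration of distance concentration (not taken from the paper; sample sizes are arbitrary): the relative contrast between the farthest and nearest neighbor of a random query shrinks as the dimension grows, for all three distance measures.

import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dist, dim, n_points=1000):
    # Draw uniform data and a query point, then measure (d_max - d_min) / d_min.
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    d = dist(data, query)
    return (d.max() - d.min()) / d.min()

euclidean = lambda X, q: np.linalg.norm(X - q, axis=1)   # Minkowski, p = 2
chebyshev = lambda X, q: np.abs(X - q).max(axis=1)       # Minkowski, p = infinity
cosine    = lambda X, q: 1 - (X @ q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))

for dim in (2, 10, 100, 1000):
    print(dim, [round(relative_contrast(f, dim), 3) for f in (euclidean, chebyshev, cosine)])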
This paper investigates the spectral norm version of the column subset selection problem. Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a positive integer $k\leq\text{rank}(\mathbf{A})$, the objective is to select exactly $k$ columns of $\mathbf{A}$ that minimize the spectral norm of the residual matrix after projecting $\mathbf{A}$ onto the space spanned by the selected columns. We use the method of interlacing polynomials introduced by Marcus-Spielman-Srivastava to derive a new upper bound on the minimal approximation error. This new bound is asymptotically sharp when the matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ obeys a spectral power-law decay. The relevant expected characteristic polynomials can be written as an extension of the expected polynomial for the restricted invertibility problem, incorporating two extra variable substitution operators. Finally, we propose a deterministic polynomial-time algorithm that achieves this error bound up to a computational error.
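As a concrete reference point for the objective, the following numpy sketch evaluates the spectral-norm residual of a given column subset and brute-forces the optimum for a tiny random matrix; it illustrates the quantity being bounded, not the paper's deterministic polynomial-time algorithm.

from itertools import combinations
import numpy as np

def spectral_residual(A, cols):
    # Orthonormal basis for the span of the selected columns, then || (I - P_S) A ||_2.
    Q, _ = np.linalg.qr(A[:, list(cols)])
    return np.linalg.norm(A - Q @ (Q.T @ A), 2)

def best_subset(A, k):
    # Exhaustive search; only feasible for very small matrices.
    return min(combinations(range(A.shape[1]), k), key=lambda S: spectral_residual(A, S))

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
S = best_subset(A, 2)
print(S, spectral_residual(A, S))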
An infinite sequence of sets $\left\{B_{n}\right\}_{n\in\mathbb{N}}$ is said to be a heterochromatic sequence from an infinite sequence of families $\left\{ \mathcal{F}_{n} \right\}_{n \in \mathbb{N}}$ if there exists a strictly increasing sequence of natural numbers $\left\{ i_{n}\right\}_{n \in \mathbb{N}}$ such that for all $n \in \mathbb{N}$ we have $B_{n} \in \mathcal{F}_{i_{n}}$. In this paper, we prove that if, for each $n\in\mathbb{N}$, $\mathcal{F}_n$ is a family of {\em nicely shaped} convex sets in $\mathbb{R}^d$ such that each heterochromatic sequence $\left\{B_{n}\right\}_{n\in\mathbb{N}}$ from $\left\{ \mathcal{F}_{n} \right\}_{n \in \mathbb{N}}$ contains at least $k+2$ sets that can be pierced by a single $k$-flat ($k$-dimensional affine space), then all but finitely many families in $\left\{\mathcal{F}_{n}\right\}_{n\in \mathbb{N}}$ can be pierced by finitely many $k$-flats. This result can be considered a {\em countably colorful} generalization of the $(\aleph_0, k+2)$-theorem proved by Keller and Perles (Symposium on Computational Geometry 2022). We also establish the tightness of our result by proving a number of no-go theorems.
In an instance of the weighted Nash Social Welfare problem, we are given a set of $m$ indivisible items, $\mathscr{G}$, and $n$ agents, $\mathscr{A}$, where each agent $i \in \mathscr{A}$ has a valuation $v_{ij}\geq 0$ for each item $j\in \mathscr{G}$. In addition, every agent $i$ has a non-negative weight $w_i$, and the weights collectively sum to $1$. The goal is to find an assignment $\sigma:\mathscr{G}\rightarrow \mathscr{A}$ that maximizes $\prod_{i\in \mathscr{A}} \left(\sum_{j\in \sigma^{-1}(i)} v_{ij}\right)^{w_i}$, the product of the weighted valuations of the agents. When all the weights equal $\frac1n$, the problem reduces to the classical Nash Social Welfare problem, which has recently received much attention. In this work, we present a $5\cdot\exp\left(2\cdot D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})\right) = 5\cdot\exp\left(2\log{n} + 2\sum_{i=1}^n w_i \log{w_i}\right)$-approximation algorithm for the weighted Nash Social Welfare problem, where $D_{\text{KL}}(\mathbf{w}\, ||\, \frac{\vec{\mathbf{1}}}{n})$ denotes the KL-divergence between the distribution induced by $\mathbf{w}$ and the uniform distribution on $[n]$. We show a novel connection between the convex programming relaxations for the unweighted variant of Nash Social Welfare presented in \cite{cole2017convex, anari2017nash}, and we generalize these programs to two different mathematical programs for the weighted case. The first program is convex and is necessary for computational efficiency, while the second program is a non-convex relaxation that can be rounded efficiently. The approximation factor derives from the difference in the objective values of the convex and non-convex relaxations.
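To fix notation, the sketch below evaluates the weighted NSW objective of an assignment and the stated approximation factor $5\cdot\exp\left(2\cdot D_{\text{KL}}(\mathbf{w}\,||\,\frac{\vec{\mathbf{1}}}{n})\right)$ on a toy instance; it illustrates the definitions only, not the rounding algorithm.

import numpy as np

def weighted_nsw(values, weights, assignment):
    """values: (n, m) matrix v_ij; weights: length-n vector w_i; assignment[j] = agent receiving item j."""
    n, m = values.shape
    bundles = np.zeros(n)
    for j, i in enumerate(assignment):
        bundles[i] += values[i, j]
    # Objective: prod_i (sum_{j in sigma^-1(i)} v_ij)^{w_i}
    return float(np.prod(bundles ** weights))

def approx_factor(weights):
    # D_KL(w || uniform) = log n + sum_i w_i log w_i = sum_i w_i log(n * w_i)
    kl = np.sum(weights * np.log(weights * len(weights)))
    return 5.0 * np.exp(2.0 * kl)

w = np.array([0.5, 0.3, 0.2])
v = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 1.0], [2.0, 2.0, 5.0]])
print(weighted_nsw(v, w, assignment=[0, 1, 2]), approx_factor(w))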
Neural machine translation (NMT) is a deep learning based approach to machine translation, which yields state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although high-quality, domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation, which leverages both out-of-domain parallel corpora and monolingual corpora for in-domain translation, is therefore very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.
We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model, combining both text and image features, outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images (both photographs and memes) on social networks.