We compute the weight distribution of the Reed-Muller code ${\mathcal R}(4,9)$, i.e., the code of length $2^9$ whose codewords are the value tables of Boolean functions of algebraic degree at most $4$ in $9$ variables, by combining the approach described in D. V. Sarwate's 1973 Ph.D. thesis with knowledge of the affine equivalence classification of Boolean functions. To solve this problem, posed, e.g., in the book of MacWilliams and Sloane [p. 447], we apply a refined approach based on the classification of Boolean quartic forms in $8$ variables due to Ph. Langevin and G. Leander, and on recent results on the classification of the quotient space ${\mathcal R}(4,7)/{\mathcal R}(2,7)$ due to V. Gillot and Ph. Langevin.
Despite the fundamental role the Quantum Satisfiability (QSAT) problem has played in quantum complexity theory, a central question remains open: at which local dimension does the complexity of QSAT transition from "easy" to "hard"? Here, we study QSAT with each constraint acting on a pair of qudits of dimensions $k$ and $l$, denoted $(k,l)$-QSAT. Our first main result shows that, surprisingly, QSAT on qubits can remain $\mathsf{QMA}_1$-hard, in that $(2,5)$-QSAT is $\mathsf{QMA}_1$-complete. In contrast, $2$-SAT on qubits is well known to be poly-time solvable [Bravyi, 2006]. Our second main result proves that $(3,d)$-QSAT on the 1D line with $d\in O(1)$ is also $\mathsf{QMA}_1$-hard. Finally, we initiate the study of 1D $(2,d)$-QSAT by giving a frustration-free 1D Hamiltonian with a unique, entangled ground state. Our first result uses a direct embedding, combining a novel clock construction with the 2D circuit-to-Hamiltonian construction of [Gosset, Nagaj, 2013]. Of note is a new, simplified and fully analytic proof for the latter (as opposed to the partially numeric proof in [GN13]). It exploits Unitary Labelled Graphs [Bausch, Cubitt, Ozols, 2017] together with a new "Nullspace Connection Lemma", which allows us to break low-energy analyses into small patches of projectors and to improve the soundness bound of [GN13] from $\Omega(1/T^6)$ to $\Omega(1/T^2)$, where $T$ is the number of gates. Our second result goes via a black-box reduction: given an arbitrary 1D Hamiltonian $H$ on $d'$-dimensional qudits, we show how to embed it into an effective null space of a 1D $(3,d)$-QSAT instance, for $d\in O(1)$. Our approach may be viewed as a weaker notion of "simulation" (\`a la [Bravyi, Hastings 2017], [Cubitt, Montanaro, Piddock 2018]). As far as we are aware, this gives the first "black-box simulation"-based $\mathsf{QMA}_1$-hardness result, i.e., for frustration-free Hamiltonians.
In this paper, we introduce LLaVA-$\phi$ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently released small language model Phi-2 to facilitate multi-modal dialogues. LLaVA-Phi marks a notable advancement in the realm of compact multi-modal models. It demonstrates that even smaller language models, with as few as 2.7B parameters, can effectively engage in intricate dialogues that integrate both textual and visual elements, provided they are trained on high-quality corpora. Our model delivers commendable performance on publicly available benchmarks that encompass visual comprehension, reasoning, and knowledge-based perception. Beyond its strong performance in multi-modal dialogue tasks, our model opens new avenues for applications in time-sensitive environments and systems that require real-time interaction, such as embodied agents. It highlights the potential of smaller language models to achieve sophisticated levels of understanding and interaction while maintaining greater resource efficiency. The project is available at https://github.com/zhuyiche/llava-phi.
We solve the problem of estimating the distribution of presumed i.i.d. observations under the total variation loss. Our approach is based on density models and is versatile enough to cope with many different ones, including some density models for which the Maximum Likelihood Estimator (MLE for short) does not exist. We mainly illustrate the properties of our estimator on models of densities on the line that satisfy a shape constraint. We show that it possesses optimality properties similar, with regard to some global rates of convergence, to those of the MLE when the latter exists. It also enjoys adaptation properties with respect to some specific target densities in the model, for which our estimator is proven to converge at parametric rate. More importantly, our estimator is robust not only to model misspecification but also to contamination, to the presence of outliers in the dataset, and to departures from the equidistribution assumption. This means that the estimator performs almost as well as if the data were i.i.d. with density $p$ in a situation where these data are only independent and most of their marginals are close enough in total variation to a distribution with density $p$. We also show that our estimator converges to the average density of the data, when this density belongs to the model, even when none of the marginal densities belongs to it. Our main result on the risk of the estimator takes the form of a non-asymptotic exponential deviation inequality with explicit numerical constants. We deduce from it several global rates of convergence, including bounds for the minimax $\mathbb{L}_{1}$-risks over the sets of concave and log-concave densities. These bounds derive from specific results on the approximation of densities that are monotone, convex, concave or log-concave. Such results may be of independent interest.
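For reference, the total variation loss used above is the standard one (a textbook definition, recalled here only to fix notation): for distributions $P$ and $Q$ with densities $p$ and $q$ on the line,
\[
\|P-Q\|_{TV} \;=\; \sup_{A}\,\bigl|P(A)-Q(A)\bigr| \;=\; \frac{1}{2}\int_{\mathbb{R}}\bigl|p(x)-q(x)\bigr|\,dx,
\]
which is why rates for this loss are naturally stated as $\mathbb{L}_{1}$-risks, as in the minimax bounds above.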
Factor analysis based on the multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors from heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, and (ii) robustness may be lost, as vectorizing matrix data typically results in a high data dimension, which can easily lead to the breakdown of $t$fa. To address these issues, starting from the intrinsic matrix structure of matrix data, a novel robust factor analysis model, namely bilinear factor analysis built on the matrix-variate $t$ distribution ($t$bfa), is proposed in this paper. Its novelty is that it can simultaneously extract common factors for both the row and column variables of interest from heavy-tailed or contaminated matrix data. Two efficient algorithms for maximum likelihood estimation of $t$bfa are developed. Closed-form expressions for the Fisher information matrix, used to assess the accuracy of the parameter estimates, are derived. Empirical studies are conducted to understand the proposed $t$bfa model and to compare it with related competitors. The results demonstrate the superiority and practicality of $t$bfa. Importantly, $t$bfa exhibits a significantly higher breakdown point than $t$fa, making it more suitable for matrix data.
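To make the bilinear structure concrete, a generic bilinear factor model for a $p\times q$ matrix observation $X$ takes the form below; the notation is illustrative only, and the exact specification of $t$bfa, including the matrix-variate $t$ assumptions on factors and errors, is the paper's:
\[
X \;=\; M \;+\; A\,Z\,B^{\top} \;+\; E,
\]
where $M$ is a $p\times q$ mean matrix, $A$ ($p\times r_1$) and $B$ ($q\times r_2$) are row and column factor loading matrices, $Z$ is an $r_1\times r_2$ matrix of common factors, and $E$ is a matrix of errors. Having two loading matrices is what allows common factors to be extracted simultaneously for the row and column variables.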
Given a set of objects $O$ in the plane, the corresponding intersection graph is defined as follows. A vertex is created for each object, and an edge joins two vertices whenever the corresponding objects intersect. We study here the case of unit segments and of polylines with exactly $k$ bends. In the recognition problem, we are given a graph and want to decide whether it can be represented as the intersection graph of certain geometric objects. Previous work showed that various recognition problems of this type are $\exists\mathbb{R}$-complete, leaving unit segments and polylines as among the few remaining natural cases. We show that recognition for both families of objects is $\exists\mathbb{R}$-complete.
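As a concrete illustration of the intersection-graph construction just defined, here is a minimal Python sketch that builds the intersection graph of a set of segments given by their endpoints. The orientation-based intersection test is standard computational geometry, not part of the paper, and assumes general position (no collinear overlaps):

from itertools import combinations

def orient(a, b, c):
    # Sign of the cross product (b - a) x (c - a).
    return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])

def segments_intersect(s, t):
    # Closed segments s = (p, q) and t = (r, u) intersect iff each
    # straddles the supporting line of the other (general position).
    (p, q), (r, u) = s, t
    return (orient(p, q, r) * orient(p, q, u) <= 0
            and orient(r, u, p) * orient(r, u, q) <= 0)

def intersection_graph(segments):
    # One vertex per segment index; an edge per intersecting pair.
    return [(i, j)
            for (i, s), (j, t) in combinations(enumerate(segments), 2)
            if segments_intersect(s, t)]

Note the recognition problem goes in the opposite direction: given only the graph, decide whether some placement of unit segments (or $k$-bend polylines) realizes it, which is what the $\exists\mathbb{R}$-completeness result concerns.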
In large-scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique that provides provable guarantees on the quality of the approximation while reducing the number of examples to the product of the VC dimension $d$ and the total sensitivity $\mathfrak S$ in remarkably general settings. However, despite intense study of sensitivity sampling in prior work, guarantees going beyond this general $\mathfrak S d$ bound are known in perhaps only one setting: $\ell_2$ subspace embeddings. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings with $p > 2$ that improve over the general $\mathfrak S d$ bound, achieving a bound of roughly $\mathfrak S^{2-2/p}$ for $2<p<\infty$. Furthermore, our techniques yield further new results in the study of sampling algorithms: we show that the root leverage score sampling algorithm achieves a bound of roughly $d$ for $1\leq p<2$, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p}\mathfrak S^{2-4/p}$ for $2<p<\infty$. Our sensitivity sampling results yield the best known sample complexity for a wide class of structured matrices that have small $\ell_p$ sensitivity.
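To fix ideas, here is a minimal Python sketch of generic sensitivity sampling (the textbook importance-sampling scheme; the refined estimators and reweightings analyzed in the paper differ). Row $i$ is kept with probability $p_i = \min(1, k\,s_i/\mathfrak S)$ and reweighted by $1/p_i$, which makes each term of a sampled sum unbiased:

import numpy as np

def sensitivity_sample(A, s, k, rng=None):
    # A: (n, d) data matrix; s: per-row sensitivity scores (length n);
    # k: expected sample size. Keep row i with prob p_i = min(1, k*s_i/S)
    # and attach weight 1/p_i, so that sum_i w_i f(a_i) over the sample
    # is an unbiased estimate of sum_i f(a_i) over the full data.
    rng = rng or np.random.default_rng()
    p = np.minimum(1.0, k * s / s.sum())
    keep = rng.random(len(s)) < p
    return A[keep], 1.0 / p[keep]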
In deep learning, classification tasks are formalized as optimization problems solved via the minimization of the cross-entropy. However, recent advancements in the design of objective functions allow the $f$-divergence to generalize the formulation of the optimization problem for classification. With this goal in mind, we adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence, from which we extract a list of five posterior probability estimators leveraging well-known $f$-divergences. In addition, driven by the challenge of improving on the state-of-the-art approach, we propose a bottom-up method that leads us to the formulation of a new objective function (and posterior probability estimator) corresponding to a novel $f$-divergence referred to as shifted log (SL). First, we theoretically prove the convergence of the posterior probability estimators. Then, we numerically test the set of proposed objective functions in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. These experiments demonstrate the effectiveness of the proposed estimators and show that the SL divergence achieves the highest classification accuracy in almost all the scenarios.
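For context, the variational representation of an $f$-divergence on which such objectives are built is the standard Fenchel-conjugate one:
\[
D_f(P\|Q) \;=\; \sup_{T}\;\Bigl\{\mathbb{E}_{x\sim P}\bigl[T(x)\bigr] \;-\; \mathbb{E}_{x\sim Q}\bigl[f^{*}\bigl(T(x)\bigr)\bigr]\Bigr\},
\]
where $f^{*}$ is the Fenchel conjugate of $f$ and the supremum runs over measurable functions $T$; parameterizing $T$ by a neural network and maximizing the right-hand side yields a trainable objective. How the paper's five estimators and the SL divergence instantiate this template is specified in the paper itself.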
We develop the theory of illfounded and cyclic proof systems in the context of the modal $\mu$-calculus. A fine analysis of provability and admissibility bridges the finitary, cyclic and illfounded notions of proof for this logic and reinforces the subtlety of two important normal form theorems: guardedness and disjunctiveness.
The generalized inverse Gaussian, denoted $\mathrm{GIG}(p, a, b)$, is a flexible family of distributions that includes the gamma, inverse gamma, and inverse Gaussian distributions as special cases. In this article, we derive two novel mixture representations for the $\mathrm{GIG}(p, a, b)$: one that expresses the distribution as a continuous mixture of inverse Gaussians and another one that expresses it as a continuous mixture of truncated exponentials. Beyond their conceptual interest, these representations are useful for random number generation. We use the first representation to derive a geometrically ergodic Gibbs sampler whose stationary distribution is $\mathrm{GIG}(p, a, b)$, and the second one to define a recursive algorithm to generate exact independent draws from the distribution for half-integer $p$. Additionally, the second representation gives rise to a recursive algorithm for evaluating the cumulative distribution function of the $\mathrm{GIG}(p, a, b)$ for half-integer $p$. The algorithms are simple and can be easily implemented in standard programming languages.
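For reference, in the common $(p,a,b)$ parameterization (assumed here; recalled only to fix notation), the $\mathrm{GIG}(p,a,b)$ density is
\[
f(x\mid p,a,b) \;=\; \frac{(a/b)^{p/2}}{2K_{p}\bigl(\sqrt{ab}\bigr)}\; x^{\,p-1} \exp\!\Bigl(-\tfrac12\bigl(ax + b/x\bigr)\Bigr), \qquad x>0,
\]
where $K_{p}$ is the modified Bessel function of the second kind; the gamma, inverse gamma, and inverse Gaussian special cases mentioned above arise from particular choices or limits of $(p,a,b)$.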
Fix a positive integer $n$, a real number $p\in (0,1]$, and a (perhaps random) hypergraph $\mathcal{H}$ on $[n]$. We introduce and investigate the following random multigraph model, which we denote $\mathbb{G}(n,p\,;\,\mathcal{H})$: begin with an empty graph on $n$ vertices, labelled by the set $[n]$. For every $H\in \mathcal{H}$ choose, independently of all previous choices, a doubleton $D = \{i,j\} \subset H$ uniformly at random, and then introduce an edge between the vertices $i$ and $j$ with probability $p$, each edge being introduced independently of all other edges.
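The sampling procedure just described translates directly into code. Here is a minimal Python sketch (vertices are 0-indexed rather than labelled by $[n]$, and the multigraph is returned as a list of edges so that repeated edges are kept):

import random

def sample_multigraph(n, p, hyperedges, rng=random):
    # Sample from G(n, p; H): for every hyperedge H (a subset of
    # {0, ..., n-1} with |H| >= 2), pick a doubleton {i, j} of H
    # uniformly at random, then keep the edge independently with
    # probability p. Repeated edges are retained (multigraph).
    edges = []
    for H in hyperedges:
        i, j = rng.sample(sorted(H), 2)  # uniform doubleton from H
        if rng.random() < p:
            edges.append((i, j))
    return edges

# Example: a deterministic hypergraph of two triples on 5 vertices:
# sample_multigraph(5, 0.5, [{0, 1, 2}, {2, 3, 4}])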