In this paper we show derivations among logarithmic-space-bounded counting classes, based on closure properties of $\#L$, that lead us to the result that $NL = PL = C_{=}L$.
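For context, the classes involved in this collapse are usually defined via $\#L$ and its closure under subtraction; a standard formulation (stated here for the reader's convenience, not quoted from the paper) is
$$\mathrm{GapL} = \{ f - g : f, g \in \#L \}, \qquad PL = \{ A : \exists f \in \mathrm{GapL},\ x \in A \Leftrightarrow f(x) > 0 \}, \qquad C_{=}L = \{ A : \exists f \in \mathrm{GapL},\ x \in A \Leftrightarrow f(x) = 0 \},$$
so the claimed equality asserts that nondeterministic logspace already captures both threshold counting and exact counting at the logspace level.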
We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored to the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Initial experiments with GPT-4 underscore the limitations of current methodologies, indicating that there is a substantial journey ahead before satisfactory IMO-level automated theorem proving is achieved.
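To make the dataset format concrete, a FIMO-style entry pairs an informal statement (and its informal proof) with a formal Lean statement. The following is a minimal hypothetical illustration of such a pairing, written against Lean 3 with mathlib; it is not an actual FIMO problem.

```lean
import data.real.basic
import tactic

/-- Informal statement (hypothetical example, not from the dataset):
    for all real numbers `a` and `b`, we have `a^2 + b^2 ≥ 2ab`. -/
theorem sq_sum_ge_two_mul (a b : ℝ) : a ^ 2 + b ^ 2 ≥ 2 * a * b :=
by nlinarith [sq_nonneg (a - b)]
```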
We present new large-scale algorithms for fitting a subgradient regularized multivariate convex regression function to $n$ samples in $d$ dimensions -- a key problem in shape constrained nonparametric regression with applications in statistics, engineering and the applied sciences. The infinite-dimensional learning task can be expressed via a convex quadratic program (QP) with $O(nd)$ decision variables and $O(n^2)$ constraints. While instances with $n$ in the lower thousands can be addressed with current algorithms within reasonable runtimes, solving larger problems (e.g., $n\approx 10^4$ or $10^5$) is computationally challenging. To this end, we present an active-set type algorithm on the dual QP. For computational scalability, we allow for approximate optimization of the reduced sub-problems and propose randomized augmentation rules for expanding the active set. We derive novel computational guarantees for our algorithms. We demonstrate that our framework can approximately solve instances of the subgradient regularized convex regression problem with $n=10^5$ and $d=10$ within minutes, and that it shows strong computational performance compared to earlier approaches.
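For concreteness, a standard finite-dimensional formulation of subgradient regularized convex regression, consistent with the variable and constraint counts above though not necessarily identical to the paper's exact objective, is the least-squares QP
$$\min_{\theta \in \mathbb{R}^n,\ \xi_1,\dots,\xi_n \in \mathbb{R}^d}\ \frac{1}{2}\sum_{i=1}^{n}(y_i-\theta_i)^2 \;+\; \frac{\lambda}{2}\sum_{i=1}^{n}\|\xi_i\|_2^2 \quad \text{s.t.}\quad \theta_j \;\ge\; \theta_i + \xi_i^{\top}(x_j - x_i)\ \ \text{for all}\ i\neq j,$$
where $\theta_i$ estimates the function value at $x_i$, $\xi_i$ is a subgradient at $x_i$, and the $O(n^2)$ pairwise constraints enforce convexity of the fitted piecewise-linear function.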
We consider the $k$-diameter clustering problem, where the goal is to partition a set of points in a metric space into $k$ clusters, minimizing the maximum distance between any two points in the same cluster. In general metrics, $k$-diameter is known to be NP-hard, while it has a $2$-approximation algorithm (Gonzalez'85). Complementing this algorithm, it is known that $k$-diameter is NP-hard to approximate within a factor better than $2$ in the $\ell_1$ and $\ell_\infty$ metrics, and within a factor of $1.969$ in the $\ell_2$ metric (Feder-Greene'88). When $k\geq 3$ is fixed, $k$-diameter remains NP-hard to approximate within a factor better than $2$ in the $\ell_\infty$ metric (Megiddo'90). However, its approximability in this setting has not previously been studied in the $\ell_1$ and $\ell_2$ metrics, though a $1.415$-approximation algorithm in the $\ell_2$ metric follows from a known result (Badoiu et al.'02). In this paper, we address the remaining gap by showing new hardness of approximation results that hold even when $k=3$. Specifically, we prove that $3$-diameter is NP-hard to approximate within a factor better than $1.5$ in the $\ell_1$ metric, and within a factor of $1.304$ in the $\ell_2$ metric.
This paper addresses the output-sensitive complexity of linear multi-objective integer minimum cost flow (MOIMCF) problems and provides insights into the time complexity of enumerating all supported nondominated vectors. The paper shows that, unless P = NP, there cannot exist an output-polynomial-time algorithm that enumerates all supported nondominated vectors in an ordered way in the outcome space. Moreover, novel methods for identifying supported nondominated vectors in bi-objective integer minimum cost flow (BOIMCF) problems are proposed, accompanied by a numerical comparison between decision- and objective-space methods. A novel, equivalent, and more compact version of the minimum cost flow ILP formulation used in the $\varepsilon$-constraint scalarization approach is introduced, demonstrating enhanced efficiency in the numerical tests.
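As a point of reference, the classical $\varepsilon$-constraint scalarization of a bi-objective minimum cost flow problem on a network $(V,A)$ with capacities $u$ and balances $b$ (the textbook scalarization, not the paper's compact reformulation) reads
$$\min_{x}\ c_1^{\top}x \quad \text{s.t.}\quad c_2^{\top}x \le \varepsilon, \qquad \sum_{(v,w)\in A} x_{vw} - \sum_{(w,v)\in A} x_{wv} = b_v \ \ \forall v\in V, \qquad 0 \le x_{vw} \le u_{vw},\ \ x_{vw} \in \mathbb{Z} \ \ \forall (v,w)\in A,$$
and solving it for a sweep of $\varepsilon$ values traces out nondominated vectors of the bi-objective problem.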
We investigate a fundamental vertex-deletion problem called (Induced) Subgraph Hitting: given a graph $G$ and a set $\mathcal{F}$ of forbidden graphs, the goal is to compute a minimum-sized set $S$ of vertices of $G$ such that $G-S$ does not contain any graph in $\mathcal{F}$ as an (induced) subgraph. This is a generic problem that encompasses many well-known problems that have been extensively studied on their own, particularly (but not only) from the perspectives of approximation and parameterization. In this paper, we study the approximability of the problem on a large variety of graph classes. Our first result is a linear-time $(1+\varepsilon)$-approximation reduction from (Induced) Subgraph Hitting on any graph class $\mathcal{G}$ of bounded expansion to the same problem on bounded-degree graphs within $\mathcal{G}$. This directly yields linear-size $(1+\varepsilon)$-approximation lossy kernels for the problems on any graph class of bounded expansion. Our second result is a linear-time approximation scheme for (Induced) Subgraph Hitting on any graph class $\mathcal{G}$ of polynomial expansion, based on the local-search framework of Har-Peled and Quanrud [SICOMP 2017]. This approximation scheme can be applied to a more general family of problems that aim to hit all subgraphs satisfying a certain property $\pi$ that is efficiently testable and has bounded diameter. Both of our results have applications to Subgraph Hitting (not induced) on wide classes of geometric intersection graphs, resulting in linear-size lossy kernels and (near-)linear-time approximation schemes for the problem.
The classical way of extending an $[n, k, d]$ linear code $\mathcal{C}$ is to add an overall parity-check coordinate to each codeword of the linear code $\mathcal{C}$. This extended code, denoted by $\overline{\mathcal{C}}(-\mathbf{1})$ and called the standardly extended code of $\mathcal{C}$, is a linear code with parameters $[n+1, k, \bar{d}]$, where $\bar{d}=d$ or $\bar{d}=d+1$. This is one of the two extending techniques for linear codes in the literature. The standardly extended codes of some families of binary linear codes have been studied to some extent. However, not much is known about the standardly extended codes of nonbinary codes. For example, the minimum distances of the standardly extended codes of the nonbinary Hamming codes have remained open for over 70 years. The first objective of this paper is to introduce the nonstandardly extended codes of a linear code and develop some general theory for this type of extended linear codes. The second objective is to study this type of extended codes for a number of families of linear codes, including cyclic codes and nonbinary Hamming codes. Four families of distance-optimal or dimension-optimal linear codes are obtained with this extending technique. The parameters of certain extended codes of many families of linear codes are settled in this paper.
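For reference, the standardly extended code admits the classical description (a textbook definition; the vector $-\mathbf{1}$ in the notation $\overline{\mathcal{C}}(-\mathbf{1})$ presumably records the parity-check vector employed)
$$\overline{\mathcal{C}} \;=\; \Big\{ (c_1,\dots,c_n,c_{n+1}) \in \mathrm{GF}(q)^{n+1} \;:\; (c_1,\dots,c_n)\in\mathcal{C},\ \ c_1+c_2+\cdots+c_n+c_{n+1}=0 \Big\},$$
so every extended codeword satisfies an overall parity check over $\mathrm{GF}(q)$.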
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.
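As a rough illustration of the training signal described above, the sketch below computes a next-token cross-entropy loss over a flattened sequence of discrete visual tokens; all names, shapes, and the vocabulary size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def next_token_cross_entropy(logits: np.ndarray, tokens: np.ndarray) -> float:
    """logits: (T, V) scores at each position; tokens: (T,) visual token ids.
    Position t's logits predict token t+1, so the last position is unused."""
    pred_logits = logits[:-1]             # predictions for positions 1..T-1
    targets = tokens[1:]                  # ground-truth next tokens
    # numerically stable log-softmax
    shifted = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll.mean())

# toy usage: a "visual sentence" of 16 tokens from a hypothetical 8192-entry visual vocabulary
rng = np.random.default_rng(0)
tokens = rng.integers(0, 8192, size=16)
logits = rng.standard_normal((16, 8192))   # stand-in for an autoregressive model's outputs
print(next_token_cross_entropy(logits, tokens))
```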
Let $G$ be a graph with $n$ vertices and $m$ edges. One of several hierarchies of bounds converging towards the stability number of $G$ is the exact subgraph hierarchy (ESH). On the first level it computes the Lov\'{a}sz theta function $\vartheta(G)$ as a semidefinite program (SDP) with a matrix variable of order $n+1$ and $n+m+1$ constraints. On the $k$-th level it adds all exact subgraph constraints (ESCs) for subgraphs of order $k$ to the SDP. An ESC ensures that the submatrix of the matrix variable corresponding to the subgraph lies in the correct polytope. By including only some ESCs in the SDP, the ESH can be exploited computationally. In this paper we introduce a variant of the ESH that computes $\vartheta(G)$ through an SDP with a matrix variable of order $n$ and $m+1$ constraints. We show that it makes sense to include the ESCs in this SDP and introduce the compressed ESH (CESH) analogously to the ESH. Computationally the CESH seems favorable as the SDP is smaller. However, we prove that the bounds based on the ESH are always at least as good as those of the CESH. In computational experiments, they are sometimes significantly better. We also introduce scaled ESCs (SESCs), which are a more natural way to include exactness constraints in the smaller SDP, and we prove that including an SESC is equivalent to including an ESC for every subgraph.
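For orientation, the smaller SDP of order $n$ with $m+1$ constraints alluded to above presumably corresponds to the classical formulation of the Lov\'{a}sz theta function (a standard formulation, stated here for context)
$$\vartheta(G) \;=\; \max\Big\{ \langle J, X\rangle \;:\; \mathrm{tr}(X)=1,\ \ X_{ij}=0\ \ \forall \{i,j\}\in E(G),\ \ X \succeq 0 \Big\},$$
where $J$ is the all-ones matrix of order $n$: the trace condition and the $m$ edge conditions give $m+1$ constraints in total.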
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at https://github.com/happynear/AMSoftmax.
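To make the loss concrete, the sketch below implements an additive-margin softmax on normalized features and class weights: the target-class cosine $\cos\theta_{y}$ is replaced by $\cos\theta_{y}-m$ and all logits are rescaled by $s$. The defaults $s=30$ and $m=0.35$ are common choices for this loss and are illustrative assumptions rather than a statement about the paper's exact configuration.

```python
import numpy as np

def am_softmax_loss(features, weights, labels, s=30.0, m=0.35):
    """features: (B, D) embeddings, weights: (C, D) class weights, labels: (B,) int ids.
    Additive margin softmax: subtract m from the target-class cosine, scale by s."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)   # L2-normalize features
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)     # L2-normalize class weights
    cos = f @ w.T                                # (B, C) cosine similarities
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m   # margin applied only to the target class
    logits = s * (cos - margin)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# toy usage with hypothetical sizes: batch of 4, 128-d embeddings, 10 identities
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 128))
W = rng.standard_normal((10, 128))
y = rng.integers(0, 10, size=4)
print(am_softmax_loss(feats, W, y))
```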