We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations. Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $\Omega(d^{3/2})$ where $d$ is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity $O(d\delta^{-1}\epsilon^{-3})$, which is optimal (up to numerical constants) with respect to $d$ and also optimal with respect to the accuracy parameters $\delta,\epsilon$, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, nonsmooth optimization is as easy as smooth optimization. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability. Our analysis is based on a simple yet powerful geometric lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.
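For reference, the stationarity notion used above can be written out as follows. This is the standard definition based on the Goldstein $\delta$-subdifferential; the symbols $\partial f$ (Clarke subdifferential), $B(x,\delta)$ (closed Euclidean ball), and $\mathrm{conv}$ (convex hull) are notation assumed here, not taken from the abstract.

```latex
\[
  \partial_\delta f(x) \;=\; \mathrm{conv}\Big( \bigcup_{y \in B(x,\delta)} \partial f(y) \Big),
  \qquad
  x \text{ is } (\delta,\epsilon)\text{-stationary}
  \;\iff\;
  \min_{g \in \partial_\delta f(x)} \|g\| \le \epsilon .
\]
```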
We propose a new algorithm for efficiently solving linear systems involving the damped Fisher matrix in large-scale scenarios where the number of parameters significantly exceeds the number of available samples. This problem is fundamental for natural gradient descent and stochastic reconfiguration. Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that the algorithm is significantly faster than existing methods.
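To illustrate why the regime with few samples and many parameters is tractable, here is a minimal sketch of one well-known approach: the Woodbury identity combined with a Cholesky factorization of the small sample-space Gram matrix. It is not necessarily the algorithm proposed in the paper. The factorization $F = S^\top S$ (rows of `S` are per-sample score vectors, already scaled), the function name `solve_damped_fisher`, and the parameter `damping` are assumptions made for this example.

```python
import numpy as np

def solve_damped_fisher(S, g, damping):
    """Solve (S.T @ S + damping * I) x = g without forming the large matrix.

    Assumed setup (not from the source): S has shape (n_samples, n_params)
    with n_samples << n_params, and the Fisher matrix is F = S.T @ S.
    Uses the Woodbury identity:
        (S.T S + d I)^{-1} = (I - S.T (S S.T + d I)^{-1} S) / d
    """
    n_samples = S.shape[0]
    # Small (n_samples x n_samples) symmetric positive definite matrix.
    gram = S @ S.T + damping * np.eye(n_samples)
    # Cholesky factor: gram = L @ L.T with L lower triangular.
    L = np.linalg.cholesky(gram)
    # Solve gram @ y = S @ g via the two factors. np.linalg.solve is used for
    # simplicity; scipy.linalg.cho_solve would exploit the triangular structure.
    y = np.linalg.solve(L.T, np.linalg.solve(L, S @ g))
    return (g - S.T @ y) / damping

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.standard_normal((5, 50))   # 5 samples, 50 parameters
    g = rng.standard_normal(50)
    x = solve_damped_fisher(S, g, damping=0.1)
    residual = (S.T @ S + 0.1 * np.eye(50)) @ x - g
    print("max residual:", np.abs(residual).max())
```

The only factorization performed is on an `n_samples` by `n_samples` matrix, so the cost scales linearly with the number of parameters.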
We study the problem of $\textit{vector set search}$ with $\textit{vector set queries}$. This task is analogous to traditional near-neighbor search, with the exception that both the query and each element in the collection are $\textit{sets}$ of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. To this end, we present a new approximate search algorithm, DESSERT (${\bf D}$ESSERT ${\bf E}$fficiently ${\bf S}$earches ${\bf S}$ets of ${\bf E}$mbeddings via ${\bf R}$etrieval ${\bf T}$ables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal.
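To make the problem statement concrete, the following is a minimal brute-force sketch of vector set search. It is the slow exact baseline, not DESSERT itself. The scoring rule (sum over query vectors of the maximum inner product with any target vector, in the style of ColBERT's late interaction) is an assumption chosen for illustration, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def sum_of_max_score(query_set, target_set):
    """Assumed set-to-set score: each query vector is matched to its most
    similar target vector, and the best similarities are summed."""
    return sum(max(dot(q, t) for t in target_set) for q in query_set)

def brute_force_set_search(query_set, collection, top_k=1):
    """Return indices of the top_k highest-scoring sets in the collection.

    Scores every set exhaustively, so the cost grows with
    (number of sets) x (query set size) x (target set size).
    """
    scores = [sum_of_max_score(query_set, s) for s in collection]
    ranked = sorted(range(len(collection)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    collection = [
        [[1.0, 0.0]],               # matches only the first query vector
        [[1.0, 0.0], [0.0, 1.0]],   # matches both query vectors
        [[0.5, 0.5]],               # partial match to both
    ]
    print(brute_force_set_search(query, collection, top_k=2))
```

An approximate method replaces the exhaustive inner loop with a cheaper estimate of the same score.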
Solving partial differential equations with machine learning methods has drawn increasing attention in scientific computing and engineering applications. In this work, we first propose a coupled Extreme Learning Machine (CELM) method that incorporates the physical laws to solve a class of fourth-order biharmonic equations by reformulating each equation as two well-posed Poisson problems. In addition, several activation functions, including tangent, Gaussian, sine, and trigonometric (sin+cos) functions, are introduced to assess our CELM method. Notably, the sine and trigonometric functions are remarkably effective at reducing the approximation error of the CELM model. Finally, several numerical experiments are performed to study initialization approaches for both the weights and biases of the hidden units in our CELM model and to explore the required number of hidden units. Numerical results show that the proposed CELM algorithm is highly accurate and efficient in solving the biharmonic equation in both regular and irregular domains.
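The reformulation into two Poisson problems can be sketched as follows. The boundary conditions shown (prescribed $u$ and $\Delta u$ on $\partial\Omega$, often called Navier-type conditions) and the symbols $f$, $g$, $h$, $w$ are assumptions made for this illustration; the abstract does not state which boundary conditions are treated.

```latex
\[
  \begin{cases}
    \Delta^2 u = f & \text{in } \Omega,\\
    u = g,\quad \Delta u = h & \text{on } \partial\Omega,
  \end{cases}
  \qquad\Longrightarrow\qquad
  \begin{cases}
    \Delta w = f & \text{in } \Omega,\\
    w = h & \text{on } \partial\Omega,
  \end{cases}
  \quad\text{and}\quad
  \begin{cases}
    \Delta u = w & \text{in } \Omega,\\
    u = g & \text{on } \partial\Omega,
  \end{cases}
\]
```

Here the auxiliary variable is $w = \Delta u$, so that $\Delta w = \Delta^2 u = f$ and each subproblem is a second-order Poisson problem with Dirichlet data.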
Given some binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, are we able to recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time with high probability and in expectation for random $n \times n$ binary matrices with i.i.d.\ Bernoulli$(p)$ entries $(m_{ij})_{i,j=1}^n$ such that $\frac{c\log^2(n)}{n(\log\log(n))^2} \leq p \leq \frac{1}{2}$.
For some $\epsilon > 10^{-36}$ we give a randomized $3/2-\epsilon$ approximation algorithm for metric TSP.
We study the recursion-theoretic complexity of Positive Almost-Sure Termination ($\mathsf{PAST}$) in an imperative programming language with rational variables, bounded nondeterministic choice, and discrete probabilistic choice. A program terminates positive almost-surely if, for every scheduler, the program terminates almost-surely and the expected runtime to termination is finite. We show that $\mathsf{PAST}$ for our language is complete for the (lightface) co-analytic sets ($\Pi^1_1$-complete); this is in contrast to the related notions of Almost-Sure Termination ($\mathsf{AST}$) and Bounded Termination ($\mathsf{BAST}$), both of which are arithmetical ($\Pi^0_2$-complete and $\Sigma^0_2$-complete, respectively). Our upper bound implies an effective procedure to reduce reasoning about probabilistic termination to non-probabilistic fair termination in a model with bounded nondeterminism, and to simple program termination in models with unbounded nondeterminism. Our lower bound shows the opposite: for every program with unbounded nondeterministic choice, there is an effectively computable probabilistic program with bounded choice such that the original program is terminating if and only if the transformed program is $\mathsf{PAST}$. We show that every program has an effectively computable normal form, in which each probabilistic choice either continues or terminates execution. For normal form programs, we provide the first sound and complete proof rule for $\mathsf{PAST}$. Our proof rule uses transfinite ordinals. We show that reasoning about $\mathsf{PAST}$ requires transfinite ordinals up to $\omega^{CK}_1$; thus, existing techniques for probabilistic termination based on ranking supermartingales that map program states to reals do not suffice to reason about $\mathsf{PAST}$.
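The gap between $\mathsf{AST}$ and $\mathsf{PAST}$ can be illustrated with a textbook example that is not taken from the paper: the loop `while x > 0: x += 1 or x -= 1, each with probability 1/2`, started at `x = 1`. This symmetric random walk terminates almost surely, yet its expected runtime is infinite, so it is $\mathsf{AST}$ but not $\mathsf{PAST}$. The sketch below computes the exact distribution of the termination time up to a step bound; the function names are hypothetical.

```python
from fractions import Fraction

def hitting_time_pmf(max_steps):
    """Exact P(T = t) for t = 1..max_steps, where T is the first time a
    symmetric +/-1 random walk started at position 1 reaches 0."""
    alive = {1: Fraction(1)}  # position -> probability mass still running
    pmf = []
    for _ in range(max_steps):
        nxt = {}
        absorbed = Fraction(0)
        for pos, mass in alive.items():
            half = mass / 2
            if pos == 1:
                absorbed += half  # stepping down reaches 0: the loop exits
            else:
                nxt[pos - 1] = nxt.get(pos - 1, Fraction(0)) + half
            nxt[pos + 1] = nxt.get(pos + 1, Fraction(0)) + half
        pmf.append(absorbed)
        alive = nxt
    return pmf

def summarize(max_steps):
    """Return (P(T <= max_steps), E[T restricted to T <= max_steps]) as floats."""
    pmf = hitting_time_pmf(max_steps)
    terminated = sum(pmf)
    truncated_mean = sum(t * p for t, p in enumerate(pmf, start=1))
    return float(terminated), float(truncated_mean)

if __name__ == "__main__":
    for n in (25, 100, 400):
        terminated, truncated_mean = summarize(n)
        print(f"steps<={n}: P(terminated)={terminated:.4f}, "
              f"truncated mean runtime={truncated_mean:.2f}")
```

As the step bound grows, the termination probability approaches 1 while the truncated mean runtime keeps increasing without bound, which is the behaviour that separates the two notions.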
We study the existence of finite characterisations for modal formulas. A finite characterisation of a modal formula $\varphi$ is a finite collection of positive and negative examples that distinguishes $\varphi$ from every other, non-equivalent modal formula, where an example is a finite pointed Kripke structure. This definition can be restricted to specific frame classes and to fragments of the modal language: a modal fragment $L$ admits finite characterisations with respect to a frame class $F$ if every formula $\varphi\in L$ has a finite characterisation with respect to $L$ consisting of examples that are based on frames in $F$. Finite characterisations are useful for illustration, interactive specification, and debugging of formal specifications, and their existence is a precondition for exact learnability with membership queries. We show that the full modal language admits finite characterisations with respect to a frame class $F$ only when the modal logic of $F$ is locally tabular. We then study which modal fragments, freely generated by some set of connectives, admit finite characterisations. Our main result is that the positive modal language without the truth-constants $\top$ and $\bot$ admits finite characterisations w.r.t. the class of all frames. This result is essentially optimal: finite characterisability fails when the language is extended with the truth constant $\top$ or $\bot$ or with all but very limited forms of negation.
Given a conjunctive query $Q$ and a database $\mathbf{D}$, a direct access to the answers of $Q$ over $\mathbf{D}$ is the operation of returning, given an index $j$, the $j^{\mathsf{th}}$ answer for some order on its answers. While this problem is $\#\mathsf{P}$-hard in general with respect to combined complexity, many conjunctive queries have an underlying structure that allows for a direct access to their answers for some lexicographical ordering that takes polylogarithmic time in the size of the database after a polynomial time precomputation. Previous work has precisely characterised the tractable classes and given fine-grained lower bounds on the precomputation time needed depending on the structure of the query. In this paper, we generalise these tractability results to the case of signed conjunctive queries, that is, conjunctive queries that may contain negative atoms. Our technique is based on a class of circuits that can represent relational data. We first show that this class supports tractable direct access after a polynomial time preprocessing. We then give bounds on the size of the circuit needed to represent the answer set of signed conjunctive queries depending on their structure. Combined, these two results allow us to prove the tractability of direct access for a large class of conjunctive queries. On the one hand, we recover the known tractable classes from the literature in the case of positive conjunctive queries. On the other hand, we generalise and unify known tractability results about negative conjunctive queries -- that is, queries having only negated atoms. In particular, we show that the class of $\beta$-acyclic negative conjunctive queries and the class of bounded nest set width negative conjunctive queries admit tractable direct access.
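As a small illustration of what direct access means, the sketch below handles one fixed positive query, $Q(x,y,z) \leftarrow R(x,y), S(y,z)$, under the lexicographic order on $(x,y,z)$. It uses sorting, prefix counts, and binary search rather than the circuit-based technique of the paper, and it does not cover negative atoms. The query, the relation names, and the function names `preprocess` and `access` are assumptions made for this example.

```python
from bisect import bisect_right
from itertools import accumulate

def preprocess(R, S):
    """Build an index for Q(x, y, z) :- R(x, y), S(y, z).

    R and S are iterables of pairs. Preprocessing takes quasi-linear time.
    """
    zs_by_y = {}
    for y, z in sorted(set(S)):
        zs_by_y.setdefault(y, []).append(z)   # z values end up sorted per y
    # Keep only R-tuples that join with S, in lexicographic (x, y) order.
    r_sorted = [t for t in sorted(set(R)) if t[1] in zs_by_y]
    # cumulative[i] = number of answers using r_sorted[0..i].
    cumulative = list(accumulate(len(zs_by_y[y]) for _, y in r_sorted))
    return r_sorted, zs_by_y, cumulative

def access(index, j):
    """Return the j-th answer (0-based) in lexicographic (x, y, z) order."""
    r_sorted, zs_by_y, cumulative = index
    total = cumulative[-1] if cumulative else 0
    if not 0 <= j < total:
        raise IndexError("answer index out of range")
    i = bisect_right(cumulative, j)            # binary search: logarithmic time
    before = cumulative[i - 1] if i > 0 else 0
    x, y = r_sorted[i]
    return (x, y, zs_by_y[y][j - before])

if __name__ == "__main__":
    R = [(2, 1), (1, 2), (1, 1), (3, 9)]
    S = [(1, 5), (1, 4), (2, 7)]
    index = preprocess(R, S)
    print([access(index, j) for j in range(index[2][-1])])
```

Each call to `access` avoids enumerating the preceding answers, which is the property that direct access asks for.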
The searching efficiency of the quantum approximate optimization algorithm is dependent on both the classical and quantum sides of the algorithm. Recently a quantum approximate Bayesian optimization algorithm (QABOA) that includes two mixers was developed, where surrogate-based Bayesian optimization is applied to improve the sampling efficiency of the classical optimizer. A continuous-time quantum walk mixer is used to enhance exploration, and the generalized Grover mixer is also applied to improve exploitation. In this paper, an extension of QABOA is proposed to further improve its searching efficiency. The searching efficiency is enhanced in two ways. First, two mixers, including one for exploration and the other for exploitation, are applied in an alternating fashion. Second, uncertainty of the quantum circuit is quantified with a new quantum Mat\'ern kernel based on the kurtosis of the basis state distribution, which increases the chance of obtaining the optimum. The proposed new two-mixer QABOAs with and without uncertainty quantification are compared with three single-mixer QABOAs on five discrete and four mixed-integer problems. The results show that the proposed two-mixer QABOA with uncertainty quantification has the best performance in efficiency and consistency for five out of the nine tested problems. The results also show that QABOA with the generalized Grover mixer performs the best among the single-mixer algorithms, thereby demonstrating the benefit of exploitation and the importance of dynamic exploration-exploitation balance in improving searching efficiency.
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects of varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain for analyzing a region at a higher resolution and another model (Q-net) that sequentially selects regions to zoom in. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high resolution test set collected from the YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.