Hamiltonian simulation is one of the most important problems in the field of quantum computing. There have been extended efforts on designing algorithms for faster simulation, and the evolution time $T$ for the simulation turns out to largely affect algorithm runtime. While there are some specific types of Hamiltonians that can be fast-forwarded, i.e., simulated within time $o(T)$, for large enough classes of Hamiltonians (e.g., all local/sparse Hamiltonians), existing simulation algorithms require running time at least linear in the evolution time $T$. On the other hand, while there exist lower bounds of $\Omega(T)$ circuit size for some large classes of Hamiltonian, these lower bounds do not rule out the possibilities of Hamiltonian simulation with large but "low-depth" circuits by running things in parallel. Therefore, it is intriguing whether we can achieve fast Hamiltonian simulation with the power of parallelism. In this work, we give a negative result for the above open problem, showing that sparse Hamiltonians and (geometrically) local Hamiltonians cannot be parallelly fast-forwarded. In the oracle model, we prove that there are time-independent sparse Hamiltonians that cannot be simulated via an oracle circuit of depth $o(T)$. In the plain model, relying on the random oracle heuristic, we show that there exist time-independent local Hamiltonians and time-dependent geometrically local Hamiltonians that cannot be simulated via an oracle circuit of depth $o(T/n^c)$, where the Hamiltonians act on $n$-qubits, and $c$ is a constant.
We observe n possibly dependent random variables, the distribution of which is presumed to be stationary even though this might not be true, and we aim at estimating the stationary distribution. We establish a non-asymptotic deviation bound for the Hellinger distance between the target distribution and our estimator. If the dependence within the observations is small, the estimator performs as good as if the data were independent and identically distributed. In addition our estimator is robust to misspecification and contamination. If the dependence is too high but the observed process is mixing, we can select a subset of observations that is almost independent and retrieve results similar to what we have in the i.i.d. case. We apply our procedure to the estimation of the invariant distribution of a diffusion process and to finite state space hidden Markov models.
Positive linear programs (LPs) model many graph and operations research problems. One can solve for a $(1+\epsilon)$-approximation for positive LPs, for any selected $\epsilon$, in polylogarithmic depth and near-linear work via variations of the multiplicative weight update (MWU) method. Despite extensive theoretical work on these algorithms through the decades, their empirical performance is not well understood. In this work, we implement and test an efficient parallel algorithm for solving positive LP relaxations, and apply it to graph problems such as densest subgraph, bipartite matching, vertex cover and dominating set. We accelerate the algorithm via a new step size search heuristic. Our implementation uses sparse linear algebra optimization techniques such as fusion of vector operations and use of sparse format. Furthermore, we devise an implicit representation for graph incidence constraints. We demonstrate the parallel scalability with the use of threading OpenMP and MPI on the Stampede2 supercomputer. We compare this implementation with exact libraries and specialized libraries for the above problems in order to evaluate MWU's practical standing for both accuracy and performance among other methods. Our results show this implementation is faster than general purpose LP solvers (IBM CPLEX, Gurobi) in all of our experiments, and in some instances, outperforms state-of-the-art specialized parallel graph algorithms.
We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.
Random walks (or Markov chains) are models extensively used in theoretical computer science. Several tools, including analysis of quantities such as hitting and mixing times, are helpful for devising randomized algorithms. A notable example is Sch\"oning's algorithm for the satisfiability (SAT) problem. In this work, we use the density-matrix formalism to define a quantum Markov chain model which directly generalizes classical walks, and we show that a common tools such as hitting times can be computed with a similar formula as the one found in the classical theory, which we then apply to known quantum settings such as Grover's algorithm.
Gaussian variational inference and the Laplace approximation are popular alternatives to Markov chain Monte Carlo that formulate Bayesian posterior inference as an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of both methods is that the solution to the optimization problem is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees -- which all involve the (data) asymptotic properties of the global optimum -- are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference with a Gaussian family and the maximum a posteriori (MAP) problem required by the Laplace approximation; and two algorithms -- consistent Laplace approximation (CLA) and consistent stochastic variational inference (CSVI) -- that exploit these properties to find the optimal approximation in the asymptotic regime. Both CLA and CSVI involve a tractable initialization procedure that finds the local basin of the optimum, and CSVI further includes a scaled gradient descent algorithm that provably stays locally confined to that basin. Experiments on nonconvex synthetic and real-data examples show that compared with standard variational and Laplace approximations, both CSVI and CLA improve the likelihood of obtaining the global optimum of their respective optimization problems.
The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning. To address this, many studies explore parameter-efficient tuning methods, also framed as "delta tuning", which updates only a small subset of parameters, known as "delta modules", while keeping the backbone model's parameters fixed. However, the practicality and flexibility of delta tuning have been limited due to existing implementations that directly modify the code of the backbone PTMs and hard-code specific delta tuning methods for each PTM. In this paper, we present OpenDelta, an open-source library that overcomes these limitations by providing a plug-and-play implementation of various delta tuning methods. Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs. OpenDelta is designed to be simple, modular, and extensible, providing a comprehensive platform for researchers and practitioners to adapt large PTMs efficiently.
The Independent Cutset problem asks whether there is a set of vertices in a given graph that is both independent and a cutset. Such a problem is $\textsf{NP}$-complete even when the input graph is planar and has maximum degree five. In this paper, we first present a $\mathcal{O}^*(1.4423^{n})$-time algorithm for the problem. We also show how to compute a minimum independent cutset (if any) in the same running time. Since the property of having an independent cutset is MSO$_1$-expressible, our main results are concerned with structural parameterizations for the problem considering parameters that are not bounded by a function of the clique-width of the input. We present $\textsf{FPT}$-time algorithms for the problem considering the following parameters: the dual of the maximum degree, the dual of the solution size, the size of a dominating set (where a dominating set is given as an additional input), the size of an odd cycle transversal, the distance to chordal graphs, and the distance to $P_5$-free graphs. We close by introducing the notion of $\alpha$-domination, which allows us to identify more fixed-parameter tractable and polynomial-time solvable cases.
For a graph class $\mathcal{G}$, we define the $\mathcal{G}$-modular cardinality of a graph $G$ as the minimum size of a vertex partition of $G$ into modules that each induces a graph in $\mathcal{G}$. This generalizes other module-based graph parameters such as neighborhood diversity and iterated type partition. Moreover, if $\mathcal{G}$ has bounded modular-width, the W[1]-hardness of a problem in $\mathcal{G}$-modular cardinality implies hardness on modular-width, clique-width, and other related parameters. On the other hand, fixed-parameter tractable (FPT) algorithms in $\mathcal{G}$-modular cardinality may provide new ideas for algorithms using such parameters. Several FPT algorithms based on modular partitions compute a solution table in each module, then combine each table into a global solution. This works well when each table has a succinct representation, but as we argue, when no such representation exists, the problem is typically W[1]-hard. We illustrate these ideas on the generic $(\alpha, \beta)$-domination problem, which asks for a set of vertices that contains at least a fraction $\alpha$ of the adjacent vertices of each unchosen vertex, plus some (possibly negative) amount $\beta$. This generalizes known domination problems such as Bounded Degree Deletion, $k$-Domination, and $\alpha$-Domination. We show that for graph classes $\mathcal{G}$ that require arbitrarily large solution tables, these problems are W[1]-hard in the $\mathcal{G}$-modular cardinality, whereas they are fixed-parameter tractable when they admit succinct solution tables. This leads to several new positive and negative results for many domination problems parameterized by known and novel structural graph parameters such as clique-width, modular-width, and $cluster$-modular cardinality.
Tensor networks have been an important concept and technique in many research areas, such as quantum computation and machine learning. We study the exponential complexity of contracting tensor networks on two special graph structures: planar graphs and finite element graphs. We prove that any finite element graph has a $O(d\sqrt{\max\{\Delta,d\}N})$ size edge separator. Furthermore, we develop a $2^{O(d\sqrt{\max\{\Delta,d\}N})}$ time algorithm to contracting a tensor network consisting of $N$ Boolean tensors, whose underlying graph is a finite element graph with maximum degree $\Delta$ and has no face with more than $d$ boundary edges in the planar skeleton, based on the $2^{O(\sqrt{\Delta N})}$ time algorithm \cite{fastcounting} for planar Boolean tensor network contractions. We use two methods to accelerate the exponential algorithms by transferring high-dimensional tensors to low-dimensional tensors. We put up a $O(k)$ size planar gadget for any Boolean symmetric tensor of dimension $k$, where the gadget only consists of Boolean tensors with dimension no more than $5$. Another method is decomposing any tensor into a series of vectors (unary functions), according to its \emph{CP decomposition} \cite{tensor-rank}. We also prove the sub-exponential time lower bound for contracting tensor networks under the counting \emph{Exponential Time Hypothesis} (\#ETH) holds.
Transshipment, also known under the names of earth mover's distance, uncapacitated min-cost flow, or Wasserstein's metric, is an important and well-studied problem that asks to find a flow of minimum cost that routes a general demand vector. Adding to its importance, recent advancements in our understanding of algorithms for transshipment have led to breakthroughs for the fundamental problem of computing shortest paths. Specifically, the recent near-optimal $(1+\varepsilon)$-approximate single-source shortest path algorithms in the parallel and distributed settings crucially solve transshipment as a central step of their approach. The key property that differentiates transshipment from other similar problems like shortest path is the so-called \emph{boosting}: one can boost a (bad) approximate solution to a near-optimal $(1 + \varepsilon)$-approximate solution. This conceptually reduces the problem to finding an approximate solution. However, not all approximations can be boosted -- there have been several proposed approaches that were shown to be susceptible to boosting, and a few others where boosting was left as an open question. The main takeaway of our paper is that any black-box $\alpha$-approximate transshipment solver that computes a \emph{dual} solution can be boosted to an $(1 + \varepsilon)$-approximate solver. Moreover, we significantly simplify and decouple previous approaches to transshipment (in sequential, parallel, and distributed settings) by showing all of them (implicitly) obtain approximate dual solutions. Our analysis is very simple and relies only on the well-known multiplicative weights framework. Furthermore, to keep the paper completely self-contained, we provide a new (and arguably much simpler) analysis of multiplicative weights that leverages well-known optimization tools to bypass the ad-hoc calculations used in the standard analyses.