A permutation $\pi: [n] \rightarrow [n]$ is a Baxter permutation if and only if it contains neither of the patterns $2-41-3$ and $3-14-2$. Baxter permutations form one of the most widely studied subclasses of general permutations due to their connections with various combinatorial objects such as plane bipolar orientations and mosaic floorplans. In this paper, we introduce a novel succinct representation (i.e., one using $o(n)$ bits beyond the information-theoretic lower bound) for Baxter permutations of size $n$ that supports $\pi(i)$ and $\pi^{-1}(j)$ queries for any $i, j \in [n]$ in $O(f_1(n))$ and $O(f_2(n))$ time, respectively. Here, $f_1(n)$ and $f_2(n)$ are arbitrary increasing functions in $\omega(\log n)$ and $\omega(\log^2 n)$, respectively. This is the first succinct representation of Baxter permutations with sub-linear worst-case query times. Additionally, we consider a subclass of Baxter permutations called \textit{separable permutations}, which contain neither of the patterns $2-4-1-3$ and $3-1-4-2$. We provide the first succinct representation of a separable permutation $\rho: [n] \rightarrow [n]$ that supports both $\rho(i)$ and $\rho^{-1}(j)$ queries in $O(1)$ time. In particular, this result circumvents Golynski's [SODA 2009] lower bound on the trade-off between redundancy and the time for $\rho(i)$ and $\rho^{-1}(j)$ queries. Moreover, as applications of these permutations with the queries, we introduce the first succinct representations of mosaic/slicing floorplans and of plane bipolar orientations, which also support specific navigational queries efficiently.
We demonstrate the inter-translatability of proofs between the most prominent sequent-based formalisms for G\"odel-L\"ob provability logic. In particular, we consider Sambin and Valentini's sequent system GLseq, Shamkanov's non-wellfounded and cyclic sequent systems GL$\infty$ and GLcirc, Poggiolesi's tree-hypersequent system CSGL, and Negri's labeled sequent system G3GL. Shamkanov showed how to transform proofs among GLseq, GL$\infty$, and GLcirc, and Gor\'e and Ramanayake showed how to transform proofs between CSGL and G3GL; however, the exact nature of proof transformations between the former three systems and the latter two has remained an open problem. We solve this open problem by showing how to restructure tree-hypersequent proofs into an end-active form and by introducing a novel linearization technique that transforms such proofs into linear nested sequent proofs. As a result, we obtain a new proof-theoretic tool for extracting linear nested sequent systems from tree-hypersequent systems, which yields the first cut-free linear nested sequent calculus LNGL for G\"odel-L\"ob provability logic. We show how to transform proofs in LNGL into a certain normal form, where proofs proceed in repeating stages of modal and local rule applications, and which is translatable into GLseq and G3GL proofs. These new syntactic transformations, together with those mentioned above, establish full proof-theoretic correspondences between GLseq, GL$\infty$, GLcirc, CSGL, G3GL, and LNGL, while also giving (to the best of the author's knowledge) the first constructive proof mappings between structural (viz. labeled, tree-hypersequent, and linear nested sequent) systems and a cyclic sequent system.
The singular value decomposition (SVD) factors a matrix into a product of three matrices: a matrix of left singular vectors, a diagonal matrix of non-negative singular values, and a matrix of right singular vectors. There are two main approaches to computing the SVD: the classical method and the randomized method. The classical approach yields accurate singular values. The randomized approach is used especially for high-dimensional matrices and trades accuracy of the approximation for avoiding the computation of all singular values. In this paper, the SVD computation is formalized as an optimization problem solved by a gradient search algorithm. This results in a power method that computes either all singular values or only the largest ones, together with their associated right singular vectors. In this iterative search, the accuracy of the singular values and of the associated singular vectors depends on user-chosen settings. Two applications of the SVD are principal component analysis and the autoencoders used in neural network models.
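To make the iterative flavor of this approach concrete, the following NumPy sketch runs a classical power iteration on $A^\top A$ to recover the largest singular value and its right singular vector; the function name, stopping rule, and tolerances are illustrative choices of ours, and the paper's gradient-based derivation may differ in its details.

```python
import numpy as np

def top_singular_triplet(A, iters=500, tol=1e-12, seed=0):
    # Power iteration on A^T A: v converges to the top right singular
    # vector, and ||A^T A v|| converges to the squared top singular
    # value. The achieved accuracy depends on the user-chosen
    # iteration budget and tolerance, as described above.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    sigma_sq = 0.0
    for _ in range(iters):
        w = A.T @ (A @ v)
        new_sigma_sq = np.linalg.norm(w)
        v = w / new_sigma_sq
        if abs(new_sigma_sq - sigma_sq) <= tol * new_sigma_sq:
            break
        sigma_sq = new_sigma_sq
    s = np.sqrt(new_sigma_sq)     # top singular value
    u = (A @ v) / s               # associated left singular vector
    return u, s, v

A = np.random.default_rng(1).standard_normal((40, 25))
u, s, v = top_singular_triplet(A)
print(s, np.linalg.svd(A, compute_uv=False)[0])  # the two should agree
```

Subsequent singular triplets can then be obtained by deflation, i.e., repeating the iteration on $A - s\,u v^\top$.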
Suppose we are given an integer or non-negative integer solution $x$ to a system $Ax = b$, where the number of non-zero components of $x$ is at most $n$. This paper addresses the following question: how closely can we approximate $b$ by $Ay$, where $y$ is an integer or non-negative integer solution constrained to have at most $k$ non-zero components, for $k<n$? We establish upper and lower bounds for this question in general, and in specific cases these bounds match. The key finding is that the quality of the approximation improves exponentially as $k$ approaches $n$.
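The question can be explored on toy instances by brute force, as in the sketch below: for each support of size $k$ and each bounded non-negative integer assignment, we measure how well $Ay$ approximates $b$. The matrix, the norm, and the bound on the entries are arbitrary illustrative choices, not the regime of the paper's bounds.

```python
import numpy as np
from itertools import combinations, product

def best_sparse_approx(A, b, k, box=3):
    # Brute force over non-negative integer y with at most k non-zero
    # components (each entry at most `box`), minimizing ||Ay - b||_inf.
    n = A.shape[1]
    best_err, best_y = np.inf, None
    for support in combinations(range(n), k):
        for vals in product(range(box + 1), repeat=k):
            y = np.zeros(n, dtype=int)
            y[list(support)] = vals
            err = np.max(np.abs(A @ y - b))
            if err < best_err:
                best_err, best_y = err, y
    return best_err, best_y

A = np.array([[1, 2, 1, 0],
              [0, 1, 3, 1]])
x = np.array([1, 1, 1, 1])          # a solution with n = 4 non-zeros
b = A @ x
for k in (1, 2, 3):
    print(k, *best_sparse_approx(A, b, k))
```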
In this paper, we investigate 1D elliptic equations $-\nabla\cdot (a\nabla u)=f$ with rough diffusion coefficients $a$ that satisfy $0<a_{\min}\le a\le a_{\max}<\infty$ and $f\in L_2(\Omega)$. To achieve an accurate and robust numerical solution on a coarse mesh of size $H$, we introduce a derivative-orthogonal wavelet-based framework. This approach incorporates both regular and specialized basis functions constructed through a novel technique, defining a basis function space that enables effective approximation. We develop a derivative-orthogonal wavelet multiscale method tailored for this framework, proving that the condition number $\kappa$ of the stiffness matrix satisfies $\kappa\le a_{\max}/a_{\min}$, independently of $H$. For the error analysis, we establish that the energy and $L_2$-norm errors of our method converge at first-order and second-order rates, respectively, for any coarse mesh size $H$. Specifically, the energy and $L_2$-norm errors are bounded by $2 a_{\min}^{-1/2} \|f\|_{L_2(\Omega)} H$ and $4 a_{\min}^{-1}\|f\|_{L_2(\Omega)} H^2$. Moreover, the numerical solution possesses the interpolation property at all grid points. We present a range of challenging test cases with continuous, discontinuous, high-frequency, and high-contrast coefficients $a$ to evaluate errors in $u$, $u'$, and $a u'$ in both the $l_2$ and $l_\infty$ norms. We also provide a numerical example in which both the coefficient $a$ and the source term $f$ contain discontinuous, high-frequency, and high-contrast oscillations. Additionally, we compare our method with the standard second-order finite element method to assess error behaviors and condition numbers when the mesh is not fine enough to resolve coefficient oscillations. Numerical results confirm the bounded condition numbers and convergence rates, affirming the effectiveness of our approach.
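As a point of reference for these comparisons, the sketch below assembles the standard piecewise-linear finite element stiffness matrix for a 1D problem with a high-contrast piecewise-constant coefficient and reports its condition number; this is the classical baseline whose conditioning degrades with contrast and mesh refinement, not the wavelet construction proposed in the paper.

```python
import numpy as np

def stiffness_p1(a_vals):
    # Standard P1 FEM stiffness matrix for -(a u')' = f on [0,1] with
    # homogeneous Dirichlet conditions; a_vals holds one coefficient
    # value per element, so the mesh size is H = 1/len(a_vals).
    n = len(a_vals)
    H = 1.0 / n
    K = np.zeros((n - 1, n - 1))
    for e in range(n):                 # element e joins nodes e and e+1
        k_loc = a_vals[e] / H * np.array([[1.0, -1.0], [-1.0, 1.0]])
        for i in (0, 1):
            for j in (0, 1):
                gi, gj = e - 1 + i, e - 1 + j   # interior node indices
                if 0 <= gi < n - 1 and 0 <= gj < n - 1:
                    K[gi, gj] += k_loc[i, j]
    return K

rng = np.random.default_rng(0)
a = 10.0 ** rng.uniform(0, 4, size=64)   # high-contrast rough coefficient
print(np.linalg.cond(stiffness_p1(a)))   # large, unlike kappa <= a_max/a_min
```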
We study the asymptotic discrepancy of $m \times m$ matrices $A_1,\ldots,A_n$ drawn from the Gaussian orthogonal ensemble, a class of random symmetric matrices whose entries on and above the diagonal are independent and normally distributed. In the setting $m^2 = o(n)$, our results show that there exists a signing $x \in \{\pm1\}^n$ such that the spectral norm of $\sum_{i=1}^n x_iA_i$ is $\Theta(\sqrt{nm}\,4^{-(1 + o(1))n/m^2})$ with high probability. This is best possible and settles a recent conjecture of Kunisky and Zhang.
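On tiny instances the statement can be checked empirically; the sketch below samples GOE matrices (under one common normalization, stated in the comments) and finds the minimizing signing by exhausting all $2^n$ choices, which is feasible only for very small $n$.

```python
import numpy as np
from itertools import product

def goe(m, rng):
    # One common GOE normalization: symmetric, with N(0, 1) entries
    # above the diagonal and N(0, 2) entries on it.
    g = rng.standard_normal((m, m))
    return (g + g.T) / np.sqrt(2)

def min_discrepancy(As):
    # Exhaustive search over all signings x in {-1, +1}^n for the one
    # minimizing the spectral norm of sum_i x_i A_i.
    best = np.inf
    for signs in product((-1, 1), repeat=len(As)):
        s = sum(x * A for x, A in zip(signs, As))
        best = min(best, np.linalg.norm(s, 2))  # ord=2: spectral norm
    return best

rng = np.random.default_rng(0)
m, n = 3, 12                    # toy scale; the theorem needs m^2 = o(n)
print(min_discrepancy([goe(m, rng) for _ in range(n)]))
```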
The class of type-two basic feasible functionals ($\mathtt{BFF}_2$) is the analogue of $\mathtt{FP}$ (polynomial-time functions) for type-2 functionals, that is, functionals that can take (first-order) functions as arguments. $\mathtt{BFF}_2$ can be defined through oracle Turing machines with running time bounded by second-order polynomials. On the other hand, higher-order term rewriting provides an elegant formalism for expressing higher-order computation. We address the problem of characterizing $\mathtt{BFF}_2$ by higher-order term rewriting. Various kinds of interpretations for first-order term rewriting have been introduced in the literature for proving termination and characterizing first-order complexity classes. In this paper, we consider a recently introduced notion of cost-size interpretations for higher-order term rewriting and view second-order rewriting systems as a way of computing type-2 functionals. We then prove that the class of functionals represented by higher-order terms admitting polynomially bounded cost-size interpretations corresponds exactly to $\mathtt{BFF}_2$.
We study the density estimation problem defined as follows: given $k$ distributions $p_1, \ldots, p_k$ over a discrete domain $[n]$, as well as a collection of samples chosen from a ``query'' distribution $q$ over $[n]$, output a $p_i$ that is ``close'' to $q$. Recently, \cite{aamand2023data} gave the first and only known result that achieves sublinear bounds in {\em both} the sampling complexity and the query time while preserving polynomial data structure space. However, their improvement over linear samples and time is only by subpolynomial factors. Our main result is a lower bound showing that, for a broad class of data structures, their bounds cannot be significantly improved. In particular, if an algorithm uses $O(n/\log^c k)$ samples for some constant $c>0$ and polynomial space, then the query time of the data structure must be at least $k^{1-O(1)/\log \log k}$, i.e., close to linear in the number of distributions $k$. This is a novel \emph{statistical-computational} trade-off for density estimation, demonstrating that any data structure must either use close to a linear number of samples or take close to linear query time. The lower bound holds even in the realizable case where $q=p_i$ for some $i$, and when the distributions are flat (specifically, all distributions are uniform over half of the domain $[n]$). We also give a simple data structure for our lower bound instance with asymptotically matching upper bounds. Experiments show that the data structure is quite efficient in practice.
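For orientation, the trivial linear-scan strategy that the trade-off above essentially rules out improving (without many more samples) looks as follows; the function and its total-variation test are an illustrative baseline, not the paper's data structure.

```python
import numpy as np

def nearest_distribution(ps, samples, n):
    # Linear-scan baseline: form the empirical distribution of the
    # samples from q and return the index of the closest p_i in total
    # variation distance. Query time is O(k * n).
    emp = np.bincount(samples, minlength=n) / len(samples)
    tv_dists = [0.5 * np.abs(p - emp).sum() for p in ps]
    return int(np.argmin(tv_dists))

rng = np.random.default_rng(0)
n, k = 100, 5
ps = [rng.dirichlet(np.ones(n)) for _ in range(k)]   # k known distributions
q_index = 2                                          # realizable case: q = p_2
samples = rng.choice(n, size=2000, p=ps[q_index])
print(nearest_distribution(ps, samples, n))          # expect 2
```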
We study the problem of $(\epsilon,\delta)$-certified machine unlearning for minimax models. Most existing work focuses on unlearning from standard statistical learning models that have a single variable, and the corresponding unlearning steps hinge on a direct Hessian-based conventional Newton update. We develop a new $(\epsilon,\delta)$-certified machine unlearning algorithm for minimax models whose minimax unlearning step consists of a total-Hessian-based complete Newton update and the Gaussian mechanism borrowed from differential privacy. To obtain the unlearning certification, our method injects calibrated Gaussian noise by carefully analyzing the ``sensitivity'' of the minimax unlearning step (i.e., the closeness between the minimax unlearning variables and the retraining-from-scratch variables). We derive the generalization rates in terms of population strong and weak primal-dual risk for three different cases of loss functions, i.e., (strongly-)convex-(strongly-)concave losses. We also provide the deletion capacity, which guarantees that a desired population risk can be maintained as long as the number of deleted samples does not exceed the derived amount. With $n$ training samples and model dimension $d$, the deletion capacity is of order $\mathcal O(n/d^{1/4})$, a strict improvement over the baseline method of differentially private minimax learning, which achieves $\mathcal O(n/d^{1/2})$. In addition, our rates of generalization and deletion capacity match the state-of-the-art rates derived previously for standard statistical learning models.
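Schematically, such an update has the following shape for a parameter vector stacking the primal and dual variables: a complete-Newton correction computed from the total Hessian on the retained data, followed by calibrated Gaussian noise. All names, and the precise forms of the correction and noise scale, are illustrative placeholders rather than the paper's algorithm.

```python
import numpy as np

def unlearning_step(theta, grad_deleted, hessian_retained, noise_scale, rng):
    # theta: stacked minimax variables (w, v) after training on all data.
    # grad_deleted: summed loss gradient of the deleted samples at theta.
    # hessian_retained: total Hessian (over both blocks) on retained data.
    # One Newton step removes the deleted samples' first-order influence;
    # Gaussian noise (scale set by a sensitivity analysis) certifies
    # (epsilon, delta)-indistinguishability from retraining.
    correction = np.linalg.solve(hessian_retained, grad_deleted)
    return theta + correction + noise_scale * rng.standard_normal(theta.shape)
```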
We study the average-case version of the Orthogonal Vectors problem, in which one is given as input $n$ vectors from $\{0,1\}^d$ chosen randomly so that each coordinate is $1$ independently with probability $p$. Kane and Williams [ITCS 2019] showed how to solve this problem in time $O(n^{2 - \delta_p})$ for a constant $\delta_p > 0$ that depends only on $p$. However, it was previously unclear how to solve the problem faster in the hardest parameter regime, where $p$ may depend on $d$. The best prior algorithm was the worst-case algorithm of Abboud, Williams and Yu [SODA 2014], which, in dimension $d = c \cdot \log n$, solves the problem in time $n^{2 - \Omega(1/\log c)}$. In this paper, we give a new algorithm which improves this to $n^{2 - \Omega(\log\log c /\log c)}$ in the average case for any parameter $p$. As in the prior work, our algorithm uses the polynomial method. We make use of a very simple polynomial over the reals, and analyze its performance with a new method based on computing how its value degrades as the input vectors get farther from orthogonal. To demonstrate the generality of our approach, we also solve the average-case version of the closest pair problem in the same running time.
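For concreteness, the sketch below generates an average-case instance and runs the quadratic-time pairwise check; the paper's subquadratic algorithm replaces this double loop with evaluations of a simple real polynomial.

```python
import numpy as np
from itertools import combinations

def random_ov_instance(n, d, p, rng):
    # n random vectors in {0,1}^d, each coordinate 1 independently with
    # probability p (which may depend on d).
    return (rng.random((n, d)) < p).astype(int)

def has_orthogonal_pair(vectors):
    # Quadratic-time baseline: test every pair for a zero inner product.
    return any(np.dot(u, v) == 0 for u, v in combinations(vectors, 2))

rng = np.random.default_rng(0)
V = random_ov_instance(200, 30, 0.3, rng)
print(has_orthogonal_pair(V))
```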
The problem of recovering a signal $\boldsymbol x\in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol x^\top\boldsymbol A_i\boldsymbol x,\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol A_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol A_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol x$. First, we consider a $k$-sparse $\boldsymbol x$ and introduce the thresholded Wirtinger flow (TWF) algorithm, which does not require knowledge of the sparsity level $k$. TWF comprises two steps: a spectral initialization that identifies a point sufficiently close to $\boldsymbol x$ (up to a sign flip) when $m=O(k^2\log n)$, and a thresholded gradient descent which, given a good initialization, produces a sequence converging linearly to $\boldsymbol x$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol x$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. Given an estimate correlated with the signal, we develop the projected gradient descent (PGD) algorithm, which also comprises two steps: a projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and a projected gradient descent that refines the $\ell_2$-error to $O(\delta)$ at a geometric rate when $m=O(k\log\frac{Lrn}{\delta^2})$. Experimental results corroborate our theoretical findings and show that: (i) our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization; (ii) leveraging the generative prior allows for precise image recovery on the MNIST dataset from a small number of quadratic measurements.
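A minimal sketch of the thresholded gradient stage of TWF is given below; the fixed relative threshold `tau` is an illustrative stand-in for the paper's thresholding rule (which avoids knowledge of $k$), and `x0` is assumed to come from the spectral initialization described above.

```python
import numpy as np

def thresholded_gradient_descent(As, y, x0, step=0.02, tau=0.1, iters=200):
    # Gradient descent on the least-squares loss
    #   (1/m) * sum_i (x^T A_i x - y_i)^2,
    # followed by hard-thresholding of small coordinates at each step.
    x = x0.copy()
    for _ in range(iters):
        residuals = np.array([x @ A @ x for A in As]) - y
        grad = (2.0 / len(As)) * sum(
            r * ((A + A.T) @ x) for r, A in zip(residuals, As))
        x = x - step * grad
        x[np.abs(x) < tau * np.max(np.abs(x))] = 0.0   # keep large entries
    return x
```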