We propose a new alignment-free algorithm that constructs a compact vector representation in $\mathbb{R}^{24}$ of a DNA sequence of arbitrary length. Each component of this vector is obtained from a representative sequence whose elements are the values realized by a function $\Gamma$. $\Gamma$ acts on neighborhoods of arbitrary radius located at strategic positions within the DNA sequence and, as a consequence of the uniqueness of the prime factorization of integers, carries complete information about the local distribution of nucleotide frequencies. The algorithm exhibits linear time complexity and consumes very little memory. The two natural parameters characterizing the radius and location of the neighborhoods are fixed by comparing the resulting phylogenetic tree with the benchmark tree for full genome sequences of fish mtDNA datasets. Using these fitted parameters, the method is applied to analyze a number of genome sequences from benchmark and other standard datasets. The algorithm proves to be computationally efficient compared to Co-phylog and CD-MAWS when applied over a certain range of a simulated dataset.
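To illustrate the prime-factorization idea, here is a minimal sketch (Python); the encoding below is an assumption made for illustration only, not the paper's exact definition of $\Gamma$: a neighborhood's nucleotide counts are packed into a single integer whose unique prime factorization recovers them exactly.
\begin{verbatim}
# Minimal sketch: encode the nucleotide counts of a neighborhood as one integer
# using distinct primes, so the counts are recoverable by prime factorization.
# This illustrates the uniqueness argument only; the actual function Gamma in
# the paper may be defined differently.
PRIME = {'A': 2, 'C': 3, 'G': 5, 'T': 7}

def encode_neighborhood(seq, center, radius):
    """Encode nucleotide counts in seq[center-radius : center+radius+1]."""
    window = seq[max(0, center - radius): center + radius + 1]
    value = 1
    for base in window:
        value *= PRIME[base]
    return value

def decode_counts(value):
    """Recover the nucleotide counts from the encoded integer."""
    counts = {}
    for base, p in PRIME.items():
        c = 0
        while value % p == 0:
            value //= p
            c += 1
        counts[base] = c
    return counts

if __name__ == "__main__":
    v = encode_neighborhood("ACGTTACG", center=3, radius=2)
    print(v, decode_counts(v))  # counts of A, C, G, T in the window
\end{verbatim}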
We introduce a bicategorical model of linear logic which is a novel variation of the bicategory of groupoids, profunctors, and natural transformations. Our model is obtained by endowing groupoids with additional structure, called a kit, to stabilize the profunctors by controlling the freeness of the groupoid action on profunctor elements. The theory of generalized species of structures, based on profunctors, is refined to a new theory of \emph{stable species} of structures between groupoids with Boolean kits. Generalized species are in correspondence with analytic functors between presheaf categories; in our refined model, stable species are shown to be in correspondence with restrictions of analytic functors, which we characterize as being stable, to full subcategories of stabilized presheaves. Our motivating example is the class of finitary polynomial functors between categories of indexed sets, also known as normal functors, that arises from kits enforcing free actions. We show that the bicategory of groupoids with Boolean kits, stable species, and natural transformations is cartesian closed. This makes essential use of the logical structure of Boolean kits and explains the well-known failure of cartesian closure for the bicategory of finitary polynomial functors between categories of set-indexed families and cartesian natural transformations. The paper additionally develops the model of classical linear logic underlying the cartesian closed structure and clarifies the connection to stable domain theory.
Suppose that one can construct a valid $(1-\delta)$-confidence interval (CI) for each of $K$ parameters of potential interest. If a data analyst uses an arbitrary data-dependent criterion to select some subset $S$ of parameters, then the aforementioned CIs for the selected parameters are no longer valid due to selection bias. We design a new method to adjust the intervals in order to control the false coverage rate (FCR). The main established method is the "BY procedure" by Benjamini and Yekutieli (JASA, 2005). The BY guarantees require certain restrictions on the selection criterion and on the dependence between the CIs. We propose a new simple method which, in contrast, is valid under any dependence structure between the original CIs and any (unknown) selection criterion, but which applies only to a special, yet broad, class of CIs that we call e-CIs. To elaborate, our procedure simply reports $(1-\delta|S|/K)$-CIs for the selected parameters, and we prove that it controls the FCR at $\delta$ for confidence intervals that implicitly invert e-values; examples include those constructed via supermartingale methods, via universal inference, or via Chernoff-style bounds, among others. Our procedure, which we call the e-BY procedure, is admissible and recovers the BY procedure as a special case via a particular calibrator. Our work also has implications for post-selection inference in sequential settings, since it applies at stopping times, to continuously monitored confidence sequences, and under bandit sampling. We demonstrate the efficacy of our procedure using numerical simulations and real A/B testing data from Twitter.
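A minimal sketch of the reporting step (Python; the grid-based e-CIs and helper names below are hypothetical placeholders, not the paper's implementation): since an e-CI at level $1-\alpha$ is the set of parameter values whose e-value stays below $1/\alpha$, reporting the $(1-\delta|S|/K)$-CI amounts to thresholding e-values at $K/(\delta|S|)$.
\begin{verbatim}
# Sketch of the adjustment: for each selected parameter, report the
# (1 - delta*|S|/K)-level e-CI, i.e. the parameter values whose e-value is
# below K / (delta * |S|). The e-value grids and selection set are placeholders.
import numpy as np

def adjusted_e_cis(e_value_grids, theta_grid, selected, K, delta):
    """e_value_grids: dict {parameter index: array of e-values over theta_grid}."""
    if not selected:
        return {}
    threshold = K / (delta * len(selected))   # invert e-values at level delta*|S|/K
    cis = {}
    for j in selected:
        inside = theta_grid[e_value_grids[j] < threshold]
        cis[j] = (inside.min(), inside.max()) if inside.size else None
    return cis
\end{verbatim}
In practice, e-CIs built from supermartingales, universal inference, or Chernoff-style bounds come with explicit formulas, so no grid search is needed; the sketch only shows where the adjusted level $\delta|S|/K$ enters.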
We prove that the medial axis of closed sets is Hausdorff stable in the following sense: Let $\mathcal{S} \subseteq \mathbb{R}^d$ be a (fixed) closed set (that contains a bounding sphere). Consider the space of $C^{1,1}$ diffeomorphisms of $\mathbb{R}^d$ to itself which keep the bounding sphere invariant. The map from this space of diffeomorphisms (endowed with some Banach norm) to the space of closed subsets of $\mathbb{R}^d$ (endowed with the Hausdorff distance), mapping a diffeomorphism $F$ to the closure of the medial axis of $F(\mathcal{S})$, is Lipschitz. This extends a previous result of Chazal and Soufflet on the stability of the medial axis of $C^2$ manifolds under $C^2$ ambient diffeomorphisms.
We describe a `discretize-then-relax' strategy to globally minimize integral functionals over functions $u$ in a Sobolev space subject to Dirichlet boundary conditions. The strategy applies whenever the integral functional depends polynomially on $u$ and its derivatives, even if it is nonconvex. The `discretize' step uses a bounded finite element scheme to approximate the integral minimization problem with a convergent hierarchy of polynomial optimization problems over a compact feasible set, indexed by the decreasing size $h$ of the finite element mesh. The `relax' step employs sparse moment-sum-of-squares relaxations to approximate each polynomial optimization problem with a hierarchy of convex semidefinite programs, indexed by an increasing relaxation order $\omega$. We prove that, as $\omega\to\infty$ and $h\to 0$, solutions of such semidefinite programs provide approximate minimizers that converge in a suitable sense (including in certain $L^p$ norms) to the global minimizer of the original integral functional if it is unique. We also report computational experiments showing that our numerical strategy works well even when technical conditions required by our theoretical analysis are not satisfied.
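For intuition about the `relax' step, here is a toy illustration (Python with cvxpy, assuming an SDP-capable solver is installed); it shows the basic sum-of-squares/semidefinite idea on a single unconstrained polynomial, not the authors' sparse moment-sum-of-squares hierarchy: a lower bound on $\min_x p(x)$ is the largest $\gamma$ such that $p-\gamma$ is a sum of squares, which is a small semidefinite program.
\begin{verbatim}
# Toy sketch of the SOS/SDP relaxation idea for p(x) = x^4 - 3x^2 + 1:
# maximize gamma such that p(x) - gamma = z(x)^T Q z(x) with z(x) = (1, x, x^2)
# and Q positive semidefinite. For univariate polynomials the bound is tight
# (here gamma* = -1.25).
import cvxpy as cp

c = [1.0, 0.0, -3.0, 0.0, 1.0]            # coefficients c0 + c1 x + ... + c4 x^4
Q = cp.Variable((3, 3), PSD=True)          # Gram matrix of the SOS certificate
gamma = cp.Variable()

constraints = [                            # match coefficients of p(x) - gamma
    Q[0, 0] == c[0] - gamma,               # constant term
    2 * Q[0, 1] == c[1],                   # x
    2 * Q[0, 2] + Q[1, 1] == c[2],         # x^2
    2 * Q[1, 2] == c[3],                   # x^3
    Q[2, 2] == c[4],                       # x^4
]
prob = cp.Problem(cp.Maximize(gamma), constraints)
prob.solve()
print("SOS lower bound:", gamma.value)     # approximately -1.25
\end{verbatim}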
We analyze the Schr\"odingerisation method for quantum simulation of a general class of non-unitary dynamics with inhomogeneous source terms. The Schr\"odingerisation technique, introduced in \cite{JLY22a,JLY23}, transforms any linear ordinary or partial differential equation with non-unitary dynamics into a system under unitary dynamics via a warped phase transformation that maps the equation into a higher dimension, making it suitable for quantum simulation. Through extra equations in the system, the technique also applies to equations with inhomogeneous terms modeling sources, forcing, or boundary and interface conditions, as well as to discrete dynamical systems such as iterative methods in numerical linear algebra. Difficulty arises in the presence of inhomogeneous terms, since they can change the stability of the original system. In this paper, we systematically study--both theoretically and numerically--the important issue of recovering the original variables from the Schr\"odingerized equations, even when the evolution operator contains unstable modes. We show that, even with unstable modes, one can still construct a stable scheme, yet to recover the original variable one needs to use suitable data in the extended space. We analyze and compare both the discrete and continuous Fourier transforms used in the extended dimension and derive the corresponding error estimates, which allows one to use the more appropriate transform for specific equations. We also provide a smoother initialization for the Schr\"odingerized system to gain higher-order accuracy in the extended space. We homogenize the inhomogeneous terms with a stretch transformation, making it easier to recover the original variable. Our recovery technique also provides a simple and generic framework for solving general ill-posed problems in a computationally stable way.
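For orientation, here is a minimal sketch of the homogeneous warped phase transformation as we read it from \cite{JLY22a,JLY23} (the Hermitian decomposition and recovery point below are illustrative; the present paper treats the inhomogeneous case and the recovery step in detail). For $\frac{\mathrm{d}u}{\mathrm{d}t}=Au$ with $A=H_1+iH_2$ and $H_1,H_2$ Hermitian, setting $v(t,p)=e^{-p}u(t)$ for $p>0$ gives $\partial_p v=-v$, hence
\[
\partial_t v = (H_1+iH_2)\,v = -H_1\,\partial_p v + iH_2\,v,
\]
which evolves unitarily after a Fourier transform in $p$; the original variable is recovered from $u(t)=e^{p}\,v(t,p)$ at suitable values of $p$, which is precisely where unstable modes require care.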
We consider the additive version of the matrix denoising problem, where a random symmetric matrix $S$ of size $n$ has to be inferred from the observation of $Y=S+Z$, with $Z$ an independent random matrix modeling a noise. For prior distributions of $S$ and $Z$ that are invariant under conjugation by orthogonal matrices we determine, using results from first and second order free probability theory, the Bayes-optimal (in terms of the mean square error) polynomial estimators of degree at most $D$, asymptotically in $n$, and show that as $D$ increases they converge towards the estimator introduced by Bun, Allez, Bouchaud and Potters in [IEEE Transactions on Information Theory {\bf 62}, 7475 (2016)]. We conjecture that this optimality holds beyond strictly orthogonally invariant priors, and provide partial evidence for this universality phenomenon when $S$ is an arbitrary Wishart matrix and $Z$ is drawn from the Gaussian Orthogonal Ensemble, a case motivated by the related extensive-rank matrix factorization problem.
Let $1<t<n$ be integers, where $t$ is a divisor of $n$. An R-$q^t$-partially scattered polynomial is a $\mathbb F_q$-linearized polynomial $f$ in $\mathbb F_{q^n}[X]$ that satisfies the following condition: for all $x,y\in\mathbb F_{q^n}^*$ such that $x/y\in\mathbb F_{q^t}$, if $f(x)/x=f(y)/y$, then $x/y\in\mathbb F_q$; $f$ is called scattered if this implication holds for all $x,y\in\mathbb F_{q^n}^*$. Two polynomials in $\mathbb F_{q^n}[X]$ are said to be equivalent if their graphs are in the same orbit under the action of the group $\Gamma L(2,q^n)$. For $n>8$ only three families of scattered polynomials in $\mathbb F_{q^n}[X]$ are known: $(i)$~monomials of pseudoregulus type, $(ii)$~binomials of Lunardon-Polverino type, and $(iii)$~a family of quadrinomials defined in [1,10] and extended in [8,13]. In this paper we prove that the polynomial $\varphi_{m,q^J}=X^{q^{J(t-1)}}+X^{q^{J(2t-1)}}+m(X^{q^J}-X^{q^{J(t+1)}})\in\mathbb F_{q^{2t}}[X]$, with $q$ odd and $t\ge3$, is R-$q^t$-partially scattered for every value of $m\in\mathbb F_{q^t}^*$ and every $J$ coprime with $2t$. Moreover, for every $t>4$ and $q>5$ there exist values of $m$ for which $\varphi_{m,q}$ is scattered and new with respect to the polynomials mentioned in $(i)$, $(ii)$ and $(iii)$ above. The related linear sets are of $\Gamma L$-class at least two.
A constraint satisfaction problem (CSP), $\textsf{Max-CSP}(\mathcal{F})$, is specified by a finite set of constraints $\mathcal{F} \subseteq \{[q]^k \to \{0,1\}\}$ for positive integers $q$ and $k$. An instance of the problem on $n$ variables is given by $m$ applications of constraints from $\mathcal{F}$ to subsequences of the $n$ variables, and the goal is to find an assignment to the variables that satisfies the maximum number of constraints. In the $(\gamma,\beta)$-approximation version of the problem for parameters $0 \leq \beta < \gamma \leq 1$, the goal is to distinguish instances where at least a $\gamma$ fraction of the constraints can be satisfied from instances where at most a $\beta$ fraction of the constraints can be satisfied. In this work we consider the approximability of this problem in the context of sketching algorithms and give a dichotomy result. Specifically, for every family $\mathcal{F}$ and every $\beta < \gamma$, we show that either a linear sketching algorithm solves the problem in polylogarithmic space, or the problem is not solvable by any sketching algorithm in $o(\sqrt{n})$ space. In particular, we give non-trivial approximation algorithms using polylogarithmic space for infinitely many constraint satisfaction problems. We also extend previously known lower bounds for general streaming algorithms to a wide variety of problems, in particular to the case $q=k=2$, where we obtain a dichotomy, and to the case when the satisfying assignments of the constraints of $\mathcal{F}$ support a distribution on $[q]^k$ with uniform marginals. Prior to this work, apart from sporadic examples, the only classes of CSPs analyzed systematically were restricted to Boolean variables ($q=2$), binary constraints ($k=2$), and singleton families ($|\mathcal{F}|=1$), and only in the setting where constraints are placed on literals rather than variables.
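To fix ideas, a toy sketch (Python, with hypothetical instance data) of a $\textsf{Max-CSP}(\mathcal{F})$ instance for $q=k=2$ with a single constraint function, evaluated by brute force; this only illustrates the problem definition, not the sketching algorithms studied in the paper.
\begin{verbatim}
# Toy Max-CSP(F) instance: q = k = 2, F = {f} with f(a, b) = a AND (NOT b)
# (Max-DICUT). An instance is a list of constraint applications to pairs of
# variables; brute force finds the maximum fraction of satisfied constraints.
from itertools import product

def f(a, b):                  # the lone constraint in F: [2]^2 -> {0, 1}
    return int(a == 1 and b == 0)

n = 4                         # variables x_0, ..., x_3
instance = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # m = 5 applications

best = max(
    sum(f(x[i], x[j]) for i, j in instance)
    for x in product([0, 1], repeat=n)
)
print("max fraction satisfied:", best / len(instance))
\end{verbatim}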
We consider the problem of the exact computation of the marginal eigenvalue distributions in the Laguerre and Jacobi $\beta$ ensembles. In the case $\beta=1$ this is a question of long standing in the mathematical statistics literature. A recursive procedure to accomplish this task is given for $\beta$ a positive integer and the parameter $\lambda_1$ a non-negative integer. This case is special in that there is a finite basis of elementary functions, with coefficients that are polynomials. In the Laguerre case with $\beta = 1$ and $\lambda_1 + 1/2$ a non-negative integer, some evidence is given of there again being a finite basis, now consisting of elementary functions and the error function multiplied by elementary functions. Moreover, it follows that the corresponding distributions in the fixed trace case admit a finite basis of power functions, as they also do for $\lambda_1$ a non-negative integer. The fixed trace case in this setting is relevant to quantum information theory and quantum transport problems, allowing in particular the exact determination of Landauer conductance distributions in a previously intractable parameter regime. Our findings also aid in analyzing zeros of the generating function for specific gap probabilities, supporting the validity of an associated large $N$ local central limit theorem.
The fixed length Levenshtein (FLL) distance between two words $\mathbf{x,y} \in \mathbb{Z}_m^n$ is the smallest integer $t$ such that $\mathbf{x}$ can be transformed into $\mathbf{y}$ by $t$ insertions and $t$ deletions. Determining the size of a ball in the FLL metric is a fundamental but challenging problem. Very recently, Bar-Lev, Etzion, and Yaakobi explicitly determined the minimum, maximum and average sizes of the FLL balls of radius one. In this paper, building on these results, we further prove via Azuma's inequality that the size of an FLL ball of radius one is highly concentrated around its mean.
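For illustration, a minimal sketch (Python) computing the FLL distance; it uses the standard fact that for equal-length words the indel-only edit distance is $2(n-\mathrm{LCS})$, so the FLL distance equals $n$ minus the length of a longest common subsequence. The ball-size questions studied above then concern how many $\mathbf{y}$ fall within a given distance of $\mathbf{x}$.
\begin{verbatim}
# Minimal sketch: FLL distance between equal-length words over Z_m via a
# longest common subsequence (LCS). Keeping a common subsequence of length L
# and replacing the rest uses n - L deletions and n - L insertions, so the
# FLL distance is n - LCS(x, y). The radius-one ball around x is
# {y of length n : fll_distance(x, y) <= 1}.
def lcs(x, y):
    n, k = len(x), len(y)
    dp = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][k]

def fll_distance(x, y):
    assert len(x) == len(y)
    return len(x) - lcs(x, y)

print(fll_distance([0, 1, 1, 0], [1, 1, 0, 0]))  # 1: delete the leading 0, append a 0
\end{verbatim}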