We consider the problem of linearizing a pseudo-Boolean function $f : \{0,1\}^n \to \mathbb{R}$ by means of $k$ Boolean functions. Such a linearization yields an integer linear programming formulation with only $k$ auxiliary variables. This motivates the definition of the linearization complexity of $f$ as the minimum such $k$. Our theoretical contributions are the proof that random polynomials almost surely have a high linearization complexity and characterizations of its value depending on whether or not the set of admissible Boolean functions is restricted. The practical relevance is shown by devising and evaluating integer linear programming models of two such linearizations for the low auto-correlation binary sequences problem. Still, many problems around this new concept remain open.
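To make the notion of an auxiliary-variable linearization concrete, here is a minimal sketch (not the paper's construction): the single monomial $x_1 x_2$ can be linearized with one auxiliary Boolean variable $y = x_1 \wedge x_2$ via the standard McCormick inequalities, suggesting a linearization with a single auxiliary variable for this example.

```python
# Minimal illustration (not the paper's construction): the monomial x1*x2
# can be linearized with one auxiliary Boolean variable y via the standard
# McCormick constraints  y <= x1,  y <= x2,  y >= x1 + x2 - 1.
from itertools import product

def mccormick_feasible(x1, x2, y):
    """Check the three linear constraints that model y = x1 AND x2."""
    return y <= x1 and y <= x2 and y >= x1 + x2 - 1

for x1, x2 in product((0, 1), repeat=2):
    feasible_ys = [y for y in (0, 1) if mccormick_feasible(x1, x2, y)]
    assert feasible_ys == [x1 * x2]   # the constraints force y = x1 * x2
print("one auxiliary variable suffices to linearize f(x) = x1*x2")
```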
Given a rectangle $R$ with area $A$ and a set of areas $L=\{A_1,\ldots,A_n\}$ with $\sum_{i=1}^n A_i = A$, we consider the problem of partitioning $R$ into $n$ sub-regions $R_1,\ldots,R_n$ with areas $A_1,\ldots,A_n$ such that the total perimeter of all sub-regions is minimized. The goal is to create square-like sub-regions, which are often preferred in practice. We propose a divide-and-conquer algorithm for this problem that finds factor-$1.2$ approximate solutions in $\mathcal{O}(n\log n)$ time.
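To illustrate the objective only (this is not the paper's $1.2$-approximation algorithm), the sketch below compares a naive partition of the unit square into vertical strips with the square-like $2\times 2$ grid partition for four equal areas; the strip names and layout are assumptions for illustration.

```python
# A small sketch of the objective only (not the paper's algorithm): slice a
# W x H rectangle into vertical strips with prescribed areas and compute the
# total perimeter of the resulting sub-rectangles.
def total_perimeter_vertical_strips(W, H, areas):
    assert abs(sum(areas) - W * H) < 1e-9
    perim = 0.0
    for A_i in areas:
        w_i = A_i / H              # each strip spans the full height H
        perim += 2 * (w_i + H)
    return perim

# Four equal areas in the unit square: vertical strips give 4 * 2*(0.25 + 1) = 10.0,
# while a 2x2 grid of 0.5 x 0.5 squares would give 4 * 2*(0.5 + 0.5) = 8.0,
# illustrating why square-like sub-regions are preferred.
print(total_perimeter_vertical_strips(1.0, 1.0, [0.25] * 4))   # 10.0
```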
We provide a framework to prove convergence rates for discretizations of kinetic Langevin dynamics for $M$-$\nabla$Lipschitz, $m$-log-concave densities. Our approach provides convergence rates of $\mathcal{O}(m/M)$, with explicit stepsize restrictions, which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration methods which are popular in the molecular dynamics and machine learning communities. Finally, we introduce the property ``$\gamma$-limit convergent'' (GLC) to characterise underdamped Langevin schemes that converge to overdamped dynamics in the high-friction limit and whose stepsize restrictions are independent of the friction parameter; we show that this property is not generic by exhibiting methods both within this class and in its complement.
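For readers unfamiliar with the setup, the sketch below discretizes kinetic (underdamped) Langevin dynamics with a plain Euler-Maruyama step for a one-dimensional Gaussian target $U(x)=x^2/2$ (so $m=M=1$). It is a hedged illustration of the dynamics being discretized and is not necessarily one of the schemes analyzed in the paper.

```python
# Euler-Maruyama discretization of kinetic Langevin dynamics
#   dX = V dt,  dV = -grad U(X) dt - gamma V dt + sqrt(2 gamma) dW
# for the 1D Gaussian target U(x) = x^2 / 2 (illustration only).
import numpy as np

def kinetic_langevin_euler(grad_U, x0, v0, gamma, h, n_steps, rng):
    x, v = x0, v0
    for _ in range(n_steps):
        noise = rng.standard_normal()
        x_new = x + h * v
        v = v - h * grad_U(x) - h * gamma * v + np.sqrt(2 * gamma * h) * noise
        x = x_new
    return x, v

rng = np.random.default_rng(0)
samples = [kinetic_langevin_euler(lambda x: x, 3.0, 0.0,
                                  gamma=1.0, h=0.05, n_steps=2000, rng=rng)[0]
           for _ in range(500)]
print(np.mean(samples), np.var(samples))   # roughly 0 and 1 for the N(0,1) target
```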
We consider a Cauchy problem for the inhomogeneous differential equation given in terms of an unbounded linear operator $A$ and the Caputo fractional derivative of order $\alpha \in (0, 2)$ in time. The previously known representation of the mild solution to such a problem does not have a conventional variation-of-constants-like form, with the propagator derived from the associated homogeneous problem. Instead, it relies on the existence of two propagators with different analytical properties. This fact limits the theoretical and, especially, numerical applicability of the existing solution representation. Here, we propose an alternative representation of the mild solution to the given problem that consolidates the solution formulas for sub-parabolic, parabolic, and sub-hyperbolic equations with a positive sectorial operator $A$ and non-zero initial data. The new representation is based solely on the propagator of the homogeneous problem and can therefore be regarded as a more natural fractional extension of the solution to the classical parabolic Cauchy problem. By exploiting a trade-off between the regularity assumptions on the initial data in terms of the fractional powers of $A$ and the regularity assumptions on the right-hand side in time, we show that the proposed solution formula is strongly convergent for $t \geq 0$ under considerably weaker assumptions compared to the standard results from the literature. Crucially, the achieved relaxation of the spatial regularity assumptions ensures that the new solution representation is practically feasible for any $\alpha \in (0, 2)$ and is amenable to numerical evaluation using uniformly accurate quadrature-based algorithms.
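For orientation, and under the common sign convention $u'(t) + A u(t) = f(t)$, $u(0) = u_0$, with a sectorial operator $A$ generating the semigroup $e^{-tA}$, the classical parabolic case ($\alpha = 1$) mentioned above admits the variation-of-constants representation
\[
  u(t) = e^{-tA} u_0 + \int_0^t e^{-(t-s)A} f(s)\,\mathrm{d}s,
\]
and the representation proposed in the paper is intended to play the analogous role for $\alpha \in (0,2)$, built from the propagator of the homogeneous problem alone.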
We study the \textsc{$\alpha$-Fixed Cardinality Graph Partitioning ($\alpha$-FCGP)} problem, the generic local graph partitioning problem introduced by Bonnet et al. [Algorithmica 2015]. In this problem, we are given a graph $G$, two numbers $k,p$, and $0\leq\alpha\leq 1$; the question is whether there is a set $S\subseteq V$ of size $k$ whose coverage $cov_{\alpha}(S)$ is at least $p$ (or at most $p$ for the minimization version). The coverage function $cov_{\alpha}(\cdot)$ counts edges with exactly one endpoint in $S$ with weight $\alpha$ and edges with both endpoints in $S$ with weight $1 - \alpha$. $\alpha$-FCGP generalizes a number of fundamental graph problems such as \textsc{Densest $k$-Subgraph}, \textsc{Max $k$-Vertex Cover}, and \textsc{Max $(k,n-k)$-Cut}. A natural question in the study of $\alpha$-FCGP is whether the algorithmic results known for its special cases, like \textsc{Max $k$-Vertex Cover}, can be extended to more general settings. One of the simple but powerful methods for obtaining parameterized approximation [Manurangsi, SOSA 2019] and subexponential algorithms [Fomin et al. IPL 2011] for \textsc{Max $k$-Vertex Cover} is based on greedy vertex degree orderings. The main insight of our work is that greedy vertex degree orderings can be used to design fixed-parameter approximation schemes (FPT-AS) for $\alpha > 0$ as well as subexponential-time algorithms on apex-minor-free graphs for maximization with $\alpha > 1/3$ and minimization with $\alpha < 1/3$.
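The coverage function is easy to state in code; the following is a direct transcription of the definition above, for illustration only. With $\alpha=0$ it counts only internal edges (as in \textsc{Densest $k$-Subgraph}), with $\alpha=1$ only cut edges (as in \textsc{Max $(k,n-k)$-Cut}), and with $\alpha=1/2$ it equals half the number of covered edges.

```python
# Direct transcription of cov_alpha(S) from the definition above (illustration only).
def cov_alpha(edges, S, alpha):
    S = set(S)
    total = 0.0
    for u, v in edges:
        endpoints_in_S = (u in S) + (v in S)
        if endpoints_in_S == 1:
            total += alpha          # cut edge: exactly one endpoint in S
        elif endpoints_in_S == 2:
            total += 1 - alpha      # internal edge: both endpoints in S
    return total

# A triangle {1,2,3} with a pendant vertex 4 attached to 3, and S = {1,2,3}.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(cov_alpha(edges, {1, 2, 3}, alpha=0.0))   # 3.0 internal edges
print(cov_alpha(edges, {1, 2, 3}, alpha=1.0))   # 1.0 cut edge
print(cov_alpha(edges, {1, 2, 3}, alpha=0.5))   # 2.0 = half of the 4 covered edges
```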
A cut-down de Bruijn sequence is a cyclic string of length $L$, where $1 \leq L \leq k^n$, such that every substring of length $n$ appears at most once. Etzion [Theor. Comput. Sci. 44 (1986)] gives an algorithm to construct binary cut-down de Bruijn sequences that requires $o(n)$ simple $n$-bit operations per symbol generated. In this paper, we simplify the algorithm and improve the running time to $\mathcal{O}(n)$ time per symbol generated using $\mathcal{O}(n)$ space. We then provide the first successor-rule approach for constructing a binary cut-down de Bruijn sequence by leveraging recent ranking algorithms for fixed-density Lyndon words. Finally, we develop an algorithm to generate cut-down de Bruijn sequences for $k>2$ that runs in $\mathcal{O}(n)$ time per symbol using $\mathcal{O}(n)$ space after some initialization. While our $k$-ary algorithm is based on our simplified version of Etzion's binary algorithm, a number of non-trivial adaptations are required to generalize to larger alphabets.
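The defining property is easy to verify by brute force; the sketch below checks that all cyclic length-$n$ windows of a string are distinct. It is for intuition only and has nothing to do with the efficient constructions developed in the paper.

```python
# Brute-force check of the defining property: every length-n substring of the
# cyclic string appears at most once (illustration only).
def is_cut_down_db(seq, n):
    L = len(seq)
    windows = [tuple(seq[(i + j) % L] for j in range(n)) for i in range(L)]
    return len(windows) == len(set(windows))

print(is_cut_down_db("00010111", 3))  # True: a full binary de Bruijn sequence, L = 2^3
print(is_cut_down_db("001011", 3))    # True: a cut-down sequence with L = 6 < 2^3
print(is_cut_down_db("00100", 3))     # False: the substring 000 appears twice
```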
A 2-packing set for an undirected graph $G=(V,E)$ is a subset $\mathcal{S} \subset V$ such that any two vertices $v_1,v_2 \in \mathcal{S}$ have no common neighbors. Finding a 2-packing set of maximum cardinality is an NP-hard problem. We develop a new approach to solve this problem on arbitrary graphs using its close relation to the independent set problem. Our algorithm, red2pack, combines new data reduction rules specific to the 2-packing set problem with a graph transformation. Our experiments show that we outperform the state of the art for arbitrary graphs in terms of solution quality and compute solutions multiple orders of magnitude faster than previously possible. For example, we solve 63% of our graphs to optimality in less than a second, while the competitor for arbitrary graphs can only solve 5% of the graphs in the data set to optimality even with a 10-hour time limit. Moreover, our approach can solve a wide range of large instances that were previously unsolved.
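The following sketch checks the stated condition directly (no two chosen vertices share a common neighbor); it is an illustration of the definition, not of the red2pack algorithm or its data reductions.

```python
# Check the 2-packing condition from the definition above (illustration only).
from itertools import combinations

def is_2_packing(adj, S):
    """Return True iff no two vertices of S share a common neighbor."""
    for u, v in combinations(S, 2):
        if adj[u] & adj[v]:
            return False
    return True

# The path 1-2-3-4-5 as an adjacency dict of neighbor sets.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(is_2_packing(adj, {1, 4}))   # True: vertices 1 and 4 share no neighbor
print(is_2_packing(adj, {1, 3}))   # False: both are adjacent to vertex 2
```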
A code $C \colon \{0,1\}^k \to \{0,1\}^n$ is a $q$-locally decodable code ($q$-LDC) if one can recover any chosen bit $b_i$ of the message $b \in \{0,1\}^k$ with good confidence by randomly querying the encoding $x := C(b)$ on at most $q$ coordinates. Existing constructions of $2$-LDCs achieve $n = \exp(O(k))$, and lower bounds show that this is in fact tight. However, when $q = 3$, far less is known: the best constructions achieve $n = \exp(k^{o(1)})$, while the best known lower bound on the blocklength is only quadratic, $n \geq \tilde{\Omega}(k^2)$. In this paper, we prove a near-cubic lower bound of $n \geq \tilde{\Omega}(k^3)$ on the blocklength of $3$-query LDCs. This improves on the best known prior lower bound by a polynomial factor in $k$. Our proof relies on a new connection between LDCs and refuting constraint satisfaction problems with limited randomness. Our quantitative improvement builds on the new techniques for refuting semirandom instances of CSPs developed in [GKM22, HKM23] and, in particular, relies on bounding the spectral norm of appropriate Kikuchi matrices.
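For orientation only (the paper concerns $3$-query lower bounds), the classical Hadamard code is the standard example behind the $n = \exp(O(k))$ bound for $2$-LDCs: encode $b$ by all parities $\langle b, y\rangle$ and recover $b_i$ from the two positions $y$ and $y \oplus e_i$. The sketch below shows this decoder on an uncorrupted codeword.

```python
# Hadamard code as a 2-query LDC with n = 2^k (classical example, not the paper's topic).
import random

def hadamard_encode(b):
    k = len(b)
    # coordinate y of the codeword is the parity <b, y> mod 2, for y in {0,1}^k
    return [sum(bi * ((y >> i) & 1) for i, bi in enumerate(b)) % 2
            for y in range(2 ** k)]

def decode_bit(x, k, i, rng=random):
    # two queries: positions y and y XOR e_i; their XOR equals b_i
    y = rng.randrange(2 ** k)
    return x[y] ^ x[y ^ (1 << i)]

b = [1, 0, 1, 1]
x = hadamard_encode(b)
print([decode_bit(x, len(b), i) for i in range(len(b))])   # recovers [1, 0, 1, 1]
```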
The goal of the trace reconstruction problem is to recover a string $x\in\{0,1\}^n$ given many independent {\em traces} of $x$, where a trace is a subsequence obtained by deleting each bit of $x$ independently with some given probability $p\in [0,1)$. A recent result of Chase (STOC 2021) shows how $x$ can be determined (in exponential time) from $\exp(\widetilde{O}(n^{1/5}))$ traces, which is the state-of-the-art bound on the sample complexity of trace reconstruction. In this paper we consider two kinds of algorithms for the trace reconstruction problem. Our first, and technically more involved, result shows that any $k$-mer-based algorithm for trace reconstruction must use $\exp(\Omega(n^{1/5}))$ traces, under the assumption that the estimator requires $\mathrm{poly}(2^k, 1/\varepsilon)$ traces, thus establishing the optimality of this number of traces for this class of algorithms. The analysis also shows that the technique used by Chase (STOC 2021) is essentially tight, and hence new techniques are needed in order to improve the worst-case upper bound. Our second, simpler, result considers the performance of the Maximum Likelihood Estimator (MLE), which outputs the source string under which the observed traces are most likely. We show that the MLE uses a nearly optimal number of traces, i.e., within a factor of $n$ of the number of samples needed by an optimal algorithm, and that this factor-of-$n$ loss may be necessary under general ``model estimation'' settings.
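The trace model itself is simple to simulate; the sketch below is a direct transcription of the definition above (each bit deleted independently with probability $p$), not of any reconstruction algorithm.

```python
# Generate traces of x through the deletion channel (illustration of the model only).
import random

def trace(x, p, rng=random):
    """Delete each bit of x independently with probability p; keep the rest in order."""
    return "".join(bit for bit in x if rng.random() >= p)

rng = random.Random(0)
x = "1011001110"
print([trace(x, 0.3, rng) for _ in range(3)])   # three independent traces of x
```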
The Shift Equivalence Testing (SET) problem for polynomials is to decide whether two polynomials $p(x_1, \ldots, x_m)$ and $q(x_1, \ldots, x_m)$ satisfy the relation $p(x_1 + a_1, \ldots, x_m + a_m) = q(x_1, \ldots, x_m)$ for some $a_1, \ldots, a_m$ in the coefficient field. SET is one of the basic computational problems in computer algebra and algebraic complexity theory, and it was reduced by Dvir, Oliveira and Shpilka in 2014 to the Polynomial Identity Testing (PIT) problem. This paper presents a general scheme for designing algorithms to solve the SET problem which includes Dvir-Oliveira-Shpilka's algorithm as a special case. With the algorithms for the SET problem over the integers, we give complete solutions to two challenging problems in symbolic summation of multivariate rational functions, namely the rational summability problem and the existence problem of telescopers for multivariate rational functions. Our approach is based on the structure of isotropy groups of polynomials introduced by Sato in the 1960s. Our results can be used to detect the applicability of the Wilf-Zeilberger method to multivariate rational functions.
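For small instances the relation can be tested by brute force with a computer algebra system; the sketch below is an illustration only (not the paper's scheme and far from its complexity guarantees): it substitutes $x_i \mapsto x_i + a_i$, expands, and solves the resulting coefficient equations for the shift.

```python
# Brute-force SET check for small examples with sympy (illustration only).
import sympy as sp

def shift_equivalent(p, q, xs):
    a = sp.symbols(f"a0:{len(xs)}")
    shifted = p.subs({x: x + ai for x, ai in zip(xs, a)}, simultaneous=True)
    diff = sp.Poly(sp.expand(shifted - q), *xs)
    # p and q are shift equivalent iff these coefficient equations have a solution
    return sp.solve(diff.coeffs(), a, dict=True)

x, y = sp.symbols("x y")
print(shift_equivalent((x + 1)**2 + (y - 2)**2, x**2 + y**2, (x, y)))
# -> [{a0: -1, a1: 2}], i.e. p(x - 1, y + 2) = q(x, y)
```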
Lifetime models with a non-monotone hazard rate function have a wide range of applications in engineering and lifetime data analysis. Several bathtub-shaped failure rate models are available in the reliability literature. Kavya and Manoharan (2021) introduced a new transformation, called the KM-transformation, which has proved useful in reliability and lifetime data analysis. The power generalization technique is well suited to systems whose components are connected in series and whose component lifetimes follow the KM-transformation of a baseline lifetime model. In this article, we introduce a new lifetime model, the power generalized KM-transformation (PGKM) for non-monotone failure rate distributions, whose hazard rate function exhibits both monotone and non-monotone behavior for different choices of parameter values. We derive the moments, moment generating function, characteristic function, quantiles, and entropy of the proposed distribution. The distributions of the minimum and maximum are obtained. The parameters of the distribution are estimated via the maximum likelihood method, and a simulation study is performed to validate the maximum likelihood estimator (MLE). Analyses of three real data sets are presented.
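The simulation study described above follows a standard pattern: simulate data from known parameter values, fit by maximum likelihood, and compare the estimates with the truth. The sketch below illustrates this pattern with a placeholder Weibull density standing in for the PGKM model (whose exact form is not reproduced here); the density, parameter names, and sample sizes are assumptions for illustration only.

```python
# Generic MLE simulation study with a placeholder Weibull lifetime density
# (stand-in for the PGKM model; illustration of the workflow only).
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, t):
    shape, scale = theta
    if shape <= 0 or scale <= 0:
        return np.inf
    z = t / scale
    # Weibull log-density: log(k/lambda) + (k-1) log(t/lambda) - (t/lambda)^k
    return -np.sum(np.log(shape / scale) + (shape - 1) * np.log(z) - z**shape)

rng = np.random.default_rng(1)
true_shape, true_scale = 1.5, 2.0
estimates = []
for _ in range(200):                               # 200 simulation replications
    t = true_scale * rng.weibull(true_shape, size=300)
    fit = minimize(neg_log_lik, x0=[1.0, 1.0], args=(t,), method="Nelder-Mead")
    estimates.append(fit.x)
print(np.mean(estimates, axis=0))                  # close to (1.5, 2.0): MLE validated
```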