In Linear Hashing ($\mathsf{LH}$) with $\beta$ bins on a size $u$ universe ${\mathcal{U}=\{0,1,\ldots, u-1\}}$, items $\{x_1,x_2,\ldots, x_n\}\subset \mathcal{U}$ are placed in bins by the hash function $$x_i\mapsto (ax_i+b)\mod p \mod \beta$$ for some prime $p\in [u,2u]$ and randomly chosen integers $a,b \in [1,p]$. The "maxload" of $\mathsf{LH}$ is the number of items assigned to the fullest bin. Expected maxload for a worst-case set of items is a natural measure of how well $\mathsf{LH}$ distributes items amongst the bins. Fix $\beta=n$. Despite $\mathsf{LH}$'s simplicity, bounding $\mathsf{LH}$'s worst-case maxload is extremely challenging. It is well-known that on random inputs $\mathsf{LH}$ achieves maxload $\Omega\left(\frac{\log n}{\log\log n}\right)$; this is currently the best lower bound for $\mathsf{LH}$'s expected maxload. Recently Knudsen established an upper bound of $\widetilde{O}(n^{1 / 3})$. The question "Is the worst-case expected maxload of $\mathsf{LH}$ $n^{o(1)}$?" is one of the most basic open problems in discrete math. In this paper we propose a set of intermediate open questions to help researchers make progress on this problem. We establish the relationship between these intermediate open questions and make some partial progress on them.
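To make the hash family above concrete, here is a minimal Python sketch that estimates the expected maxload of $\mathsf{LH}$ empirically by averaging over random choices of $a$ and $b$ (the function name and interface are ours, for illustration only; this measures average-case behaviour on a fixed input set, not the worst case over all inputs):

```python
import random
from collections import Counter

def lh_maxload(items, n_bins, p, trials=1000):
    """Estimate the expected maxload of Linear Hashing on `items`.

    `p` must be a prime with p >= u for a universe {0, ..., u-1};
    a and b are drawn uniformly from [1, p] as in the definition.
    """
    total = 0
    for _ in range(trials):
        a = random.randint(1, p)
        b = random.randint(1, p)
        # Count how many items land in each bin under x -> ((a*x + b) mod p) mod n_bins.
        counts = Counter(((a * x + b) % p) % n_bins for x in items)
        total += max(counts.values())
    return total / trials
```

For $n$ items hashed into $\beta = n$ bins, the pigeonhole principle forces at least one bin to receive an item, so the estimate always lies between $1$ and $n$.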
An \emph{eight-partition} of a finite set of points (respectively, of a continuous mass distribution) in $\mathbb{R}^3$ consists of three planes that divide the space into $8$ octants, such that each open octant contains at most $1/8$ of the points (respectively, of the mass). In 1966, Hadwiger showed that any mass distribution in $\mathbb{R}^3$ admits an eight-partition; moreover, one can prescribe the normal direction of one of the three planes. The analogous result for finite point sets follows by a standard limit argument. We prove the following variant of this result: Any mass distribution (or point set) in $\mathbb{R}^3$ admits an eight-partition for which the intersection of two of the planes is a line with a prescribed direction. Moreover, we present an efficient algorithm for calculating an eight-partition of a set of $n$ points in~$\mathbb{R}^3$ (with prescribed normal direction of one of the planes) in time $O^{*}(n^{5/2})$.
A new $H(\textrm{divdiv})$-conforming finite element is presented, which avoids the need for super-smoothness by redistributing the degrees of freedom to edges and faces. This leads to a hybridizable mixed method with superconvergence for the biharmonic equation. Moreover, new finite element divdiv complexes are established. Finally, new weak Galerkin and $C^0$ discontinuous Galerkin methods for the biharmonic equation are derived.
Given a finite set $A \subseteq \mathbb{R}^2$ and a subset $B \subseteq A$, the \emph{MST-ratio} is the combined length of the minimum spanning trees of $B$ and $A \setminus B$ divided by the length of the minimum spanning tree of $A$. The supremum, over all sets $A$, of the maximum MST-ratio over all subsets $B$ is related to the Steiner ratio, and we prove this sup-max lies between $2.154$ and $2.427$. Restricting ourselves to $2$-dimensional lattices, we prove that the sup-max is $2.0$, while the inf-max is $1.25$. By some margin the most difficult of these results is the upper bound for the inf-max, which we prove by showing that the hexagonal lattice cannot have MST-ratio larger than $1.25$.
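To make the definition concrete, the following Python sketch computes the MST-ratio of a subset $B \subseteq A$, using Prim's algorithm for the Euclidean minimum spanning tree (helper names are ours, for illustration only):

```python
import math

def mst_length(points):
    """Total edge length of the Euclidean MST of `points`, via Prim's algorithm."""
    if len(points) < 2:
        return 0.0
    # dist[i] = current distance from point i to the growing tree.
    dist = {i: math.dist(points[0], points[i]) for i in range(1, len(points))}
    total = 0.0
    while dist:
        j = min(dist, key=dist.get)   # attach the closest outside point
        total += dist.pop(j)
        for k in dist:                # relax distances through the new tree vertex
            dist[k] = min(dist[k], math.dist(points[j], points[k]))
    return total

def mst_ratio(A, B):
    """(|MST(B)| + |MST(A \\ B)|) / |MST(A)| for lists of point tuples."""
    rest = [p for p in A if p not in B]
    return (mst_length(list(B)) + mst_length(rest)) / mst_length(list(A))
```

For example, splitting the four corners of the unit square into the two diagonal pairs gives an MST-ratio of $2\sqrt{2}/3 \approx 0.94$: each diagonal pair is joined by an edge of length $\sqrt{2}$, while the MST of all four corners uses three sides of length $1$.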
A partition $\mathcal{P}$ of a weighted graph $G$ is $(\sigma,\tau,\Delta)$-sparse if every cluster has diameter at most $\Delta$, and every ball of radius $\Delta/\sigma$ intersects at most $\tau$ clusters. Similarly, $\mathcal{P}$ is $(\sigma,\tau,\Delta)$-scattering if instead of balls we require that every shortest path of length at most $\Delta/\sigma$ intersects at most $\tau$ clusters. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-sparse partition for all $\Delta>0$, Jia et al. [STOC05] constructed a solution for the Universal Steiner Tree problem (and also Universal TSP) with stretch $O(\tau\sigma^2\log_\tau n)$. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-scattering partition for all $\Delta>0$, we construct a solution for the Steiner Point Removal problem with stretch $O(\tau^3\sigma^3)$. We then construct sparse and scattering partitions for various graph families, obtaining many new results for the Universal Steiner Tree and Steiner Point Removal problems.
We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho < 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$, if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/\sigma$, at the cost of a higher sample complexity.
In dimension $d$, Mutually Unbiased Bases (MUBs) are a collection of orthonormal bases over $\mathbb{C}^d$ such that for any two vectors $v_1, v_2$ belonging to different bases, the scalar product satisfies $|\braket{v_1|v_2}| = \frac{1}{\sqrt{d}}$. The upper bound on the number of such bases is $d+1$. Constructions achieving this bound are known when $d$ is a power of a prime. The situation is more restrictive in other cases, and also when we consider the problem over the reals rather than the complex numbers. Thus, certain relaxations of this model are considered in the literature, and consequently Approximate MUBs (AMUBs) are studied. This enables one to construct a potentially large number of such objects for $\mathbb{C}^d$ as well as $\mathbb{R}^d$. In this regard, we propose the concept of Almost Perfect MUBs (APMUBs), where we restrict the absolute value of the inner product $|\braket{v_1|v_2}|$ to be two-valued, one value being 0 and the other $\leq \frac{1+\mathcal{O}(d^{-\lambda})}{\sqrt{d}}$, such that $\lambda > 0$ and the numerator $1 + \mathcal{O}(d^{-\lambda}) \leq 2$. Each vector so constructed has the important feature that a large number of its components are zero and the non-zero components are of equal magnitude. Our techniques are based on combinatorial structures related to resolvable block designs (RBDs). We show that for several composite dimensions $d$ one can construct $\mathcal{O}(\sqrt{d})$ many APMUBs, in cases where the number of known MUBs is significantly smaller. To be specific, this result works for $d$ of the form $(q-e)(q+f)$, $q, e, f \in \mathbb{N}$, with $0 \leq f \leq e$ for constant $e, f$ and $q$ a power of a prime. We also show that such APMUBs provide sets of bi-angular vectors which are $\mathcal{O}(d^{\frac{3}{2}})$ in number and have high angular distances among them. Finally, just as MUBs are equivalent to sets of Hadamard matrices, we show that APMUBs are equivalent to sets of weighing matrices.
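As a small illustration of the defining condition, the following pure-Python sketch (function name ours, for illustration only) checks whether two orthonormal bases, given as lists of vectors, are mutually unbiased; the standard basis and the normalized Hadamard basis form the classic example in dimension $2$:

```python
import math

def is_mutually_unbiased(B1, B2, tol=1e-9):
    """B1, B2: lists of basis vectors (lists of complex numbers) in C^d.

    Checks |<v1|v2>| = 1/sqrt(d) for every pair of vectors drawn
    from the two different bases.
    """
    d = len(B1[0])
    target = 1 / math.sqrt(d)
    for v1 in B1:
        for v2 in B2:
            # |<v1|v2>| with the conjugate-linear first argument.
            ip = abs(sum(a.conjugate() * b for a, b in zip(v1, v2)))
            if abs(ip - target) > tol:
                return False
    return True

# The standard basis and the normalized Hadamard basis in dimension 2.
std = [[1, 0], [0, 1]]
had = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
       [1 / math.sqrt(2), -1 / math.sqrt(2)]]
```

Note that a basis is never unbiased with itself: the check fails for `(std, std)` because inner products within a basis are $0$ or $1$, not $1/\sqrt{d}$.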
Most diffusion models assume that the reverse process adheres to a Gaussian distribution. However, this approximation has not been rigorously validated, especially at the singularities $t=0$ and $t=1$. Improperly dealing with such singularities leads to an average-brightness issue in applications and limits the generation of images with extreme brightness or darkness. We primarily focus on tackling singularities from both theoretical and practical perspectives. We first establish error bounds for the reverse-process approximation and show its Gaussian characteristics at the singular time steps. Based on this theoretical insight, we confirm that the singularity at $t=1$ is conditionally removable, while the singularity at $t=0$ is an inherent property. Building on these conclusions, we propose SingDiffusion, a novel plug-and-play method that addresses sampling at the initial singular time step; it not only effectively resolves the average-brightness issue for a wide range of diffusion models without extra training effort, but also enhances their generation capability, achieving notably lower FID scores. Code and models are released at //github.com/PangzeCheung/SingDiffusion.
Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) . \end{equation*} When $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\epsilon$ is achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside of prior work is that evaluating their embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in~$m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $O^* (n^{1-\Theta(\epsilon^2)} + d)$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
A subset $S$ of vertices in a graph $G$ is a secure dominating set of $G$ if $S$ is a dominating set of $G$ and, for each vertex $u \not\in S$, there is a vertex $v \in S$ such that $uv$ is an edge and $(S \setminus \{v\}) \cup \{u\}$ is also a dominating set of $G$. The secure domination number of $G$, denoted by $\gamma_{s}(G)$, is the cardinality of a smallest secure dominating set of $G$. In this paper, we prove that for any outerplanar graph $G$ with $n \geq 4$ vertices, $\gamma_{s}(G) \geq (n+4)/5$, and this bound is tight.
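The defining condition can be verified directly by brute force; a minimal Python sketch (function names are ours, with the graph given as a dict mapping each vertex to its neighbour set):

```python
def is_dominating(adj, S):
    """True if every vertex is in S or has a neighbour in S."""
    return all(v in S or adj[v] & S for v in adj)

def is_secure_dominating(adj, S):
    """Check the secure domination condition: S dominates G, and every
    vertex u outside S has a neighbour v in S such that swapping v for u,
    i.e. (S \\ {v}) | {u}, still dominates G."""
    if not is_dominating(adj, S):
        return False
    for u in adj:
        if u in S:
            continue
        if not any(v in adj[u] and is_dominating(adj, (S - {v}) | {u})
                   for v in S):
            return False
    return True
```

For example, on the path $P_4$ with vertices $0\!-\!1\!-\!2\!-\!3$, the set $\{1,3\}$ is secure dominating: vertex $0$ can be defended by $1$, and vertex $2$ by $3$.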
For a $P$-indexed persistence module ${\sf M}$, the (generalized) rank of ${\sf M}$ is defined as the rank of the limit-to-colimit map for ${\sf M}$ over the poset $P$. For $2$-parameter persistence modules, a zigzag-persistence-based algorithm has recently been proposed that takes advantage of the fact that the generalized rank of a $2$-parameter module equals the number of full intervals in a zigzag module defined on the boundary of the poset. An analogous definition of the boundary for $d$-parameter or general $P$-indexed persistence modules does not seem plausible. To overcome this difficulty, we first unfold a given $P$-indexed module ${\sf M}$ into a zigzag module ${\sf M}_{ZZ}$ and then check how many full interval modules in a decomposition of ${\sf M}_{ZZ}$ can be folded back to remain full in ${\sf M}$. This number determines the generalized rank of ${\sf M}$. For the special case of degree-$d$ homology of $d$-complexes, we obtain a more efficient algorithm, including a linear-time algorithm for degree-$1$ homology in graphs.