The combined universal probability $\mathbf{m}(D)$ of strings $x$ in sets $D$ is close to $\max_{x\in D}\mathbf{m}(x)$: their logs differ by at most $D$'s information $\mathbf{I}(D:\mathcal{H})$ about the halting sequence $\mathcal{H}$.
We study functions that produce a ranking of $n$ individuals from $n$ such rankings and are impartial in the sense that the position of an individual in the output ranking does not depend on the input ranking submitted by that individual. When $n \geq 4$, two properties concerning the quality of the output in relation to the input can be achieved in addition to impartiality: individual full rank, which requires that each individual can appear in any position of the output ranking; and monotonicity, which requires that an individual cannot move down in the output ranking if it moves up in an input ranking. When $n \geq 5$, monotonicity can be dropped in order to strengthen individual full rank to weak unanimity, which requires that if every individual submits the same ranking, that ranking must be chosen as the output ranking. Mechanisms achieving these results can be implemented in polynomial time. Both results are best possible in terms of their dependence on $n$. The second result cannot be strengthened further to a notion of unanimity that requires agreement on pairwise comparisons to be preserved.
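For very small $n$, impartiality can be checked exhaustively. The sketch below is a hypothetical helper, not one of the mechanisms studied here: it represents a ranking as a tuple listing individuals from top to bottom, and a mechanism as a function from a profile of $n$ rankings to one output ranking. It enumerates all $(n!)^n$ profiles, so it is only practical for $n \le 3$.

```python
from itertools import permutations, product

def is_impartial(mechanism, n):
    """Brute-force check: no individual can change its own output position
    by changing only its own submitted ranking. Feasible only for tiny n."""
    rankings = list(permutations(range(n)))
    for profile in product(rankings, repeat=n):
        output = mechanism(profile)
        for i in range(n):
            position = output.index(i)
            for alt in rankings:
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if mechanism(deviated).index(i) != position:
                    return False
    return True

def constant(profile):
    # Ignores all input: trivially impartial.
    return tuple(range(len(profile)))

def dictator(profile):
    # Outputs individual 0's ranking: 0 can move itself, so not impartial.
    return profile[0]

def zero_first(profile):
    # Individual 0 is always first; the others follow in the order 0 submitted.
    # Nobody else's input matters, and 0's own position never changes.
    return (0,) + tuple(j for j in profile[0] if j != 0)

print(is_impartial(constant, 3), is_impartial(dictator, 3), is_impartial(zero_first, 3))
# True False True
```

The hypothetical `zero_first` rule passes the check but violates individual full rank, since individual 0 can only ever appear in the first position; this is the kind of trivially impartial rule that the full-rank requirement rules out.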
In the impartial selection problem, a subset of agents up to a fixed size $k$ among a group of $n$ is to be chosen based on votes cast by the agents themselves. A selection mechanism is impartial if no agent can influence its own chance of being selected by changing its vote. It is $\alpha$-optimal if, for every instance, the votes received by the selected subset amount to at least an $\alpha$ fraction of the votes received by the subset of size $k$ with the highest number of votes. We study deterministic impartial mechanisms in a more general setting with arbitrarily weighted votes and provide the first approximation guarantee, roughly $1/\lceil 2n/k\rceil$. When the number of agents to select is large enough compared to the total number of agents, this yields an improvement on the previously best known approximation ratio of $1/k$ for the unweighted setting. We further show that our mechanism can be adapted to the impartial assignment problem, in which multiple sets of up to $k$ agents are to be selected, with a loss in the approximation ratio of $1/2$.
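To make the $\alpha$-optimality ratio concrete, the following sketch assumes a weighted vote matrix `W` in which `W[i][j]` is the weight of agent `i`'s vote for agent `j` (self-votes ignored). The function `partition_mechanism` is a simple, hypothetical deterministic rule chosen only because its impartiality is easy to verify: it never selects agents from a fixed group `group_a` and ranks the remaining agents using votes from that group alone. It is not the mechanism of this paper, and its guarantee is much weaker.

```python
def scores(W):
    """Total weighted votes received by each agent; self-votes are ignored."""
    n = len(W)
    return [sum(W[i][j] for i in range(n) if i != j) for j in range(n)]

def alpha_ratio(W, selected, k):
    """Votes of the selected set divided by those of the best set of size k."""
    s = scores(W)
    best = sum(sorted(s, reverse=True)[:k])
    got = sum(s[j] for j in selected)
    return 1.0 if best == 0 else got / best

def partition_mechanism(W, k, group_a):
    """Hypothetical impartial rule: agents in group_a are never selected, and
    the others are ranked only by the votes they receive from group_a, so no
    candidate's own vote affects the outcome. Ties are broken by index."""
    a = set(group_a)
    candidates = [j for j in range(len(W)) if j not in a]
    support = {j: sum(W[i][j] for i in a) for j in candidates}
    return sorted(candidates, key=lambda j: (-support[j], j))[:k]

W = [[0, 3, 1, 0],
     [2, 0, 0, 1],
     [0, 1, 0, 2],
     [1, 0, 4, 0]]
chosen = partition_mechanism(W, 1, group_a=[2, 3])
print(chosen, alpha_ratio(W, chosen, 1))  # [0] 0.6
```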
Traditionally in the turnstile model of data streams, there is a state vector $x=(x_1,x_2,\ldots,x_n)$ which is updated through a stream of pairs $(i,k)$ where $i\in [n]$ and $k\in \mathbb{Z}$. Upon receiving $(i,k)$, $x_i\gets x_i + k$. A distinct count algorithm in the turnstile model takes one pass of the stream and then estimates $\|x\|_0 = |\{i\in[n]\mid x_i\neq 0\}|$ (aka $L_0$, the Hamming norm). In this paper, we define a finite-field version of the turnstile model. Let $F$ be any finite field. Then in the $F$-turnstile model, for each $i\in [n]$, $x_i\in F$; for each update $(i,k)$, $k\in F$. The update $x_i\gets x_i+k$ is then computed in the field $F$. A distinct count algorithm in the $F$-turnstile model takes one pass of the stream and estimates $\|x\|_{0;F} = |\{i\in[n]\mid x_i\neq 0_F\}|$. We present a simple distinct count algorithm, called $F$-PCSA, in the $F$-turnstile model for any finite field $F$. The new $F$-PCSA algorithm takes $m\log(n)\log (|F|)$ bits of memory and estimates $\|x\|_{0;F}$ with $O(\frac{1}{\sqrt{m}})$ relative error where the hidden constant depends on the order of the field. $F$-PCSA is straightforward to implement and has several applications in the real world with different choices of $F$. Most notably, it makes distinct count with deletions as simple as distinct count without deletions.
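As an exact reference point for the $F$-turnstile model, the sketch below stores $x$ explicitly over $F = \mathrm{GF}(p)$ for a prime $p$ (an assumption; general finite fields need extension-field arithmetic). It uses $n\log|F|$ bits rather than the memory of $F$-PCSA, and it is not the $F$-PCSA algorithm; it only shows how updates and $\|x\|_{0;F}$ behave. The class name is hypothetical.

```python
class FTurnstileExact:
    """Exact (non-sketching) F-turnstile state over F = GF(p), p prime."""

    def __init__(self, n, p):
        self.p = p
        self.x = [0] * n  # one field element per coordinate

    def update(self, i, k):
        # x_i <- x_i + k, computed in GF(p); negative k acts as a deletion.
        self.x[i] = (self.x[i] + k) % self.p

    def l0(self):
        # ||x||_{0;F}: number of coordinates not equal to 0_F.
        return sum(1 for v in self.x if v != 0)

s = FTurnstileExact(n=5, p=7)
for i, k in [(0, 3), (1, 2), (0, 4)]:
    s.update(i, k)
print(s.l0())  # 1, because 3 + 4 = 0 in GF(7)
```

With $p = 2$ each coordinate records the parity of its updates, so two insertions of the same index cancel; with a large prime, a deletion is simply an update with $k = -1$ reduced modulo $p$.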
We study several polygonal curve problems under the Fr\'{e}chet distance via algebraic geometric methods. Let $\mathbb{X}_m^d$ and $\mathbb{X}_k^d$ be the spaces of all polygonal curves of $m$ and $k$ vertices in $\mathbb{R}^d$, respectively. We assume that $k \leq m$. Let $\mathcal{R}^d_{k,m}$ be the set of ranges in $\mathbb{X}_m^d$ for all possible metric balls of polygonal curves in $\mathbb{X}_k^d$ under the Fr\'{e}chet distance. We prove a nearly optimal bound of $O(dk\log (km))$ on the VC dimension of the range space $(\mathbb{X}_m^d,\mathcal{R}_{k,m}^d)$, improving on the previous $O(d^2k^2\log(dkm))$ upper bound and approaching the current $\Omega(dk\log k)$ lower bound. Our upper bound also holds for the weak Fr\'{e}chet distance. We also obtain exact solutions, hitherto unknown, for curve simplification, range searching, nearest neighbor search, and distance oracles.
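Each range in $\mathcal{R}^d_{k,m}$ is a ball: a curve $P \in \mathbb{X}_m^d$ belongs to the ball of radius $r$ around $Q \in \mathbb{X}_k^d$ when its Fr\'{e}chet distance to $Q$ is at most $r$. Computing the continuous Fr\'{e}chet distance takes some machinery, so the sketch below swaps in the simpler discrete Fr\'{e}chet distance, the classical dynamic program over couplings of vertices, which is an upper bound on the continuous distance used in the text. The function name is hypothetical.

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between vertex sequences P and Q in R^d."""
    n, m = len(P), len(Q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: best coupling of P[:i+1], Q[:j+1]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]

P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (2, 1)]
print(discrete_frechet(P, Q))  # 1.4142135623730951, i.e. sqrt(2)
```

The table has $m \cdot k$ entries, so one distance evaluation costs $O(mk)$ time for curves with $m$ and $k$ vertices.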
A cubic hypermatrix of order $d$ can be considered as the structure matrix of a tensor with covariant order $r$ and contravariant order $s=d-r$. Corresponding to this matrix expression of the hypermatrix, a notion of eigenvector $x$ with eigenvalue $\lambda$ is proposed, called the universal eigenvector and universal eigenvalue of the hypermatrix. According to the action of tensors, if $x$ is decomposable, it is called a universal hyper- (UH-)eigenvector. In particular, if all decomposed components are the same, $x$ is called a universal diagonal hyper- (UDH-)eigenvector, which covers most existing definitions of eigenvalues/eigenvectors of hypermatrices. Using the semi-tensor product (STP) of matrices, the properties of universal eigenvalues/eigenvectors are investigated. Algorithms are developed to calculate universal eigenvalues/eigenvectors of hypermatrices. Particular effort has been devoted to UDH-eigenvalues/eigenvectors, because they cover most of the existing eigenvalues/eigenvectors for hypermatrices. Some numerical examples are presented to illustrate that the proposed technique is universal and efficient.
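The semi-tensor product has a short closed form: for $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times q}$ with $t = \mathrm{lcm}(n,p)$, the (left) STP is $A \ltimes B = (A \otimes I_{t/n})(B \otimes I_{t/p})$, which reduces to the ordinary matrix product when $n = p$. A minimal NumPy sketch follows; the function name is hypothetical, and it does not compute universal eigenvalues themselves.

```python
import numpy as np

def stp(A, B):
    """Left semi-tensor product: kron(A, I_{t/n}) @ kron(B, I_{t/p}), t = lcm(n, p)."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    n, p = A.shape[1], B.shape[0]
    t = int(np.lcm(n, p))
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(np.allclose(stp(A, B), A @ B))                      # True: n == p
print(stp(np.array([1, 2, 3, 4]), np.array([[5], [6]])))  # [[23. 34.]]
```

In the second call a $1\times 4$ row meets a $2\times 1$ column, so $t = 4$ and $B$ is expanded to $B \otimes I_2$; the dimension mismatch that would block an ordinary product is what the STP resolves.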
Large-scale matrix data have been widely collected and continuously studied in various fields recently. Exploiting the multilevel factor structure and the matrix structure of the data, we propose a multilevel matrix factor model with both global and local factors. The global factors can affect all matrix time series, whereas each local factor is only allowed to affect one specific matrix time series. The estimation procedures can consistently estimate the factor loadings and determine the number of factors. We establish the asymptotic properties of the estimators. Simulation studies are presented to illustrate the performance of the proposed estimation method. We apply the model to analyze eight indicators across 200 stocks from ten distinct industries, demonstrating the empirical utility of the proposed approach.
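The global/local distinction can be illustrated with a small simulation. The specification below is an assumption made for illustration, not necessarily the paper's model: for group $g$ and time $t$, $X_{g,t} = R F_t C^\top + R_g F_{g,t} C_g^\top + E_{g,t}$, where the global factor matrix $F_t$ and the loadings $R, C$ are shared by all groups, while $R_g$, $C_g$ and $F_{g,t}$ are group-specific. All function and parameter names are hypothetical.

```python
import numpy as np

def simulate_multilevel(G=3, T=50, p=10, q=8, k_glob=(2, 2), k_loc=(1, 1),
                        noise=1.0, seed=0):
    """Simulate G matrix time series (each T x p x q) from an assumed
    two-level matrix factor model with global and local factors."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((p, k_glob[0]))      # global row loadings
    C = rng.standard_normal((q, k_glob[1]))      # global column loadings
    F_glob = rng.standard_normal((T, *k_glob))   # global factors, shared by all groups
    X = np.empty((G, T, p, q))
    for g in range(G):
        R_g = rng.standard_normal((p, k_loc[0]))     # local loadings for group g only
        C_g = rng.standard_normal((q, k_loc[1]))
        F_loc = rng.standard_normal((T, *k_loc))     # local factors for group g only
        E = noise * rng.standard_normal((T, p, q))
        # Broadcasting matmul: (p,k1) @ (T,k1,k2) @ (k2,q) -> (T,p,q)
        X[g] = R @ F_glob @ C.T + R_g @ F_loc @ C_g.T + E
    return X

X = simulate_multilevel(noise=0.0)
print(X.shape)  # (3, 50, 10, 8)
```

With `noise=0.0` and the default dimensions, every simulated matrix $X_{g,t}$ has rank at most 3, being the sum of a rank-2 global term and a rank-1 local term.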
Constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$ is the first step of many modern clustering algorithms. However, typical constructions of a similarity graph have high time complexity and a space requirement quadratic in $|X|$. We address this limitation and present a new algorithmic framework that constructs a sparse approximation of the fully connected similarity graph while preserving its cluster structure. Our algorithm is based on the kernel density estimation problem and is applicable to arbitrary kernel functions. We compare our algorithm with the well-known implementations from the scikit-learn library and the FAISS library, and find that our method significantly outperforms the implementations from both libraries on a variety of datasets.
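For contrast, the sketch below builds the kind of dense construction the text refers to: the fully connected Gaussian-kernel similarity graph, which costs $O(|X|^2)$ time and memory, followed by a plain $k$-nearest-neighbor sparsification. It is a generic baseline under assumed choices (Gaussian kernel, bandwidth `sigma`, symmetrization by maximum), with hypothetical function names; it is not the KDE-based algorithm of the paper, which avoids forming the dense matrix.

```python
import numpy as np

def gaussian_similarity(X, sigma):
    """Dense fully connected similarity graph: O(|X|^2) time and memory."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_sparsify(W, k):
    """Keep each node's k strongest edges, then symmetrize by taking the max."""
    n = W.shape[0]
    nbrs = np.argsort(-W, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    S = np.zeros_like(W)
    S[rows, nbrs.ravel()] = W[rows, nbrs.ravel()]
    return np.maximum(S, S.T)

X = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
S = knn_sparsify(gaussian_similarity(X, sigma=1.0), k=1)
print(S[0, 1] > 0, S[0, 2] == 0)  # True True: only within-cluster edges survive
```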
In this paper, we consider the counting function $E_P(y) = |P_{y} \cap \mathbb{Z}^{n_x}|$ for a parametric polyhedron $P_{y} = \{x \in \mathbb{R}^{n_x} \colon A x \leq b + B y\}$, where $y \in \mathbb{R}^{n_y}$. We give a new representation of $E_P(y)$, called a \emph{piece-wise step-polynomial with periodic coefficients}, which generalizes piece-wise step-polynomials and integer/rational Ehrhart quasi-polynomials. In terms of computational complexity, our result gives the fastest way to calculate $E_P(y)$ in certain scenarios. The most remarkable cases are the following: 1) Consider a parametric polyhedron $P_y$ defined by a standard-form system $A x = y,\, x \geq 0$ with a fixed number of equalities. We show that there exists a $\mathrm{poly}\bigl(n, \|A\|_{\infty}\bigr)$-time preprocessing algorithm that returns a polynomial-time computable representation of $E_P(y)$. That is, $E_P(y)$ can be computed by a polynomial-time algorithm for any given $y \in \mathbb{Q}^k$; 2) again assuming that the co-dimension is fixed, we show that integer/rational Ehrhart quasi-polynomials of a polytope can be computed by FPT-algorithms, parameterized by sub-determinants of $A$ or its elements; 3) our representation of $E_P(y)$ is more efficient than other known approaches if the matrix $A$ has bounded elements, especially if $A$ is also sparse. Additionally, we discuss possible applications in the area of compiler optimization. Under some "natural" assumptions on the program code, our approach has the fastest complexity bounds.
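A brute-force counter is useful for sanity-checking any representation of $E_P(y)$ on small instances. The sketch below (hypothetical function name) enumerates the integer points of a user-supplied bounding cube, which is assumed to contain $P_y$. For the triangle $x_1, x_2 \ge 0$, $x_1 + x_2 \le y$, it reproduces $E_P(y) = (\lfloor y\rfloor+1)(\lfloor y\rfloor+2)/2$ for $y \ge 0$, a simple step-polynomial in $y$.

```python
from itertools import product

def count_points(A, b, B, y, box):
    """E_P(y) = |{x in Z^{n_x} : A x <= b + B y}| by enumerating the integer
    points of the cube [lo, hi]^{n_x}, which must contain P_y."""
    rhs = [bi + sum(Bij * yj for Bij, yj in zip(Brow, y)) for bi, Brow in zip(b, B)]
    lo, hi = box
    n_x = len(A[0])
    count = 0
    for x in product(range(lo, hi + 1), repeat=n_x):
        if all(sum(a * xi for a, xi in zip(row, x)) <= r for row, r in zip(A, rhs)):
            count += 1
    return count

# Triangle x1 >= 0, x2 >= 0, x1 + x2 <= y, written as A x <= b + B y.
A = [[-1, 0], [0, -1], [1, 1]]
b = [0, 0, 0]
B = [[0], [0], [1]]
print([count_points(A, b, B, [y], box=(-1, 6)) for y in range(5)])  # [1, 3, 6, 10, 15]
print(count_points(A, b, B, [2.5], box=(-1, 6)))                     # 6, since floor(2.5) = 2
```

Enumeration is exponential in $n_x$, which is exactly why closed-form representations of $E_P(y)$ matter; the brute force only serves as a check.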
A universal partial cycle (or upcycle) for $\mathcal{A}^n$ is a cyclic sequence that covers each word of length $n$ over the alphabet $\mathcal{A}$ exactly once -- like a De Bruijn cycle, except that we also allow a wildcard symbol $\mathord{\diamond}$ that can represent any letter of $\mathcal{A}$. Chen et al. in 2017 and Goeckner et al. in 2018 showed that the existence and structure of upcycles are highly constrained, unlike those of De Bruijn cycles, which exist for any alphabet size and word length. Moreover, it was not known whether any upcycles existed for $n \ge 5$. We present several examples of upcycles over both binary and non-binary alphabets for $n = 8$. We generalize two graph-theoretic representations of De Bruijn cycles to upcycles. We then introduce novel approaches to constructing new upcycles from old ones. Notably, given any upcycle for an alphabet of size $a$, we show how to construct an upcycle for an alphabet of size $ak$ for any $k \in \mathbb{N}$, so each example generates an infinite family of upcycles. We also define folds and lifts of upcycles, which relate upcycles with differing densities of $\mathord{\diamond}$ characters. In particular, we show that every upcycle lifts to a De Bruijn cycle. Our constructions rely on a different generalization of De Bruijn cycles known as perfect necklaces, and we introduce several new examples of perfect necklaces. We extend the definitions of certain pseudorandomness properties to partial words and determine which are satisfied by all upcycles, then draw a conclusion about linear feedback shift registers. Finally, we prove new nonexistence results based on the word length $n$, alphabet size, and $\mathord{\diamond}$ density.
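The covering condition can be verified directly: slide a window of length $n$ cyclically around the sequence, expand each $\mathord{\diamond}$ into every letter of $\mathcal{A}$, and check that each word of $\mathcal{A}^n$ is produced exactly once. In the sketch below the wildcard $\mathord{\diamond}$ is written as `*` (an assumption for ASCII input), and the function name is hypothetical. The example `*001*110` is a binary upcycle for $n=4$, which can be confirmed by hand from its eight windows.

```python
from itertools import product

def is_upcycle(cycle, alphabet, n, wildcard="*"):
    """Check that the cyclic partial word `cycle` covers every word of
    alphabet^n exactly once, where `wildcard` matches any letter."""
    L = len(cycle)
    seen = set()
    for start in range(L):
        window = [cycle[(start + j) % L] for j in range(n)]
        choices = [alphabet if c == wildcard else c for c in window]
        for letters in product(*choices):
            word = "".join(letters)
            if word in seen:  # some word is covered twice
                return False
            seen.add(word)
    return len(seen) == len(alphabet) ** n  # every word is covered

print(is_upcycle("*001*110", "01", 4))  # True
print(is_upcycle("0011", "01", 2))      # True: an ordinary De Bruijn cycle
print(is_upcycle("0101", "01", 2))      # False: the word 01 is covered twice
```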
Generalising the concept of a complete permutation polynomial over a finite field, we define completeness to level $k$ for $k\ge1$ in fields of odd characteristic. We construct two families of polynomials that satisfy the condition of high-level completeness for all finite fields, and two more families that are complete to the maximum possible level for a large collection of finite fields. Under composition of functions, one family of polynomials forms an abelian group isomorphic to the additive group, while the other is isomorphic to the multiplicative group.
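For prime fields, the base notion being generalised is easy to test: $f$ is a complete permutation polynomial of $\mathbb{F}_p$ when both $f(x)$ and $f(x)+x$ permute $\mathbb{F}_p$. The sketch below checks only this level-one condition over $\mathbb{F}_p$ with $p$ prime (an assumption; non-prime fields need extension-field arithmetic) and does not implement completeness to level $k$ as defined here. Function names are hypothetical; coefficients are listed from the constant term upward.

```python
def poly_eval(coeffs, x, p):
    """Evaluate sum(coeffs[i] * x**i) mod p with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def is_permutation(coeffs, p):
    """True if the polynomial maps GF(p) onto itself bijectively."""
    return len({poly_eval(coeffs, x, p) for x in range(p)}) == p

def is_complete(coeffs, p):
    """True if f and f(x) + x both permute GF(p), p prime."""
    shifted = list(coeffs) + [0] * max(0, 2 - len(coeffs))
    shifted[1] = (shifted[1] + 1) % p  # coefficients of f(x) + x
    return is_permutation(coeffs, p) and is_permutation(shifted, p)

print(is_complete([0, 2], 5))        # True: 2x and 3x both permute GF(5)
print(is_complete([0, 0, 0, 1], 5))  # False: x^3 permutes GF(5), x^3 + x does not
```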