This paper considers a massive connectivity setting in which a base-station (BS) aims to communicate sources $(X_1,\cdots,X_k)$ to a randomly activated subset of $k$ users, among a large pool of $n$ users, via a common downlink message. Although the identities of the $k$ active users are assumed to be known at the BS, each active user knows only whether it is itself active and does not know the identities of the other active users. A naive coding strategy is to transmit the sources alongside the identities of the users for whom the source information is intended, which requires $H(X_1,\cdots,X_k) + k\log(n)$ bits, because the cost of specifying the identity of a user is $\log(n)$ bits. For large $n$, this overhead can be significant. This paper shows that it is possible to develop coding techniques that eliminate the dependence of the overhead on $n$, provided that the source distribution exhibits certain symmetry. Specifically, if the source distribution is independent and identically distributed (i.i.d.), then the overhead can be reduced to at most $O(\log(k))$ bits; in the case of uniform i.i.d. sources, the overhead can be further reduced to $O(1)$ bits. For sources that follow a more general exchangeable distribution, the overhead is at most $O(k)$ bits; in the case of finite-alphabet exchangeable sources, the overhead can be further reduced to $O(\log(k))$ bits. The downlink massive random access problem is closely connected to the study of finite exchangeable sequences. The proposed coding strategy yields bounds on the relative entropy distance between finite exchangeable distributions and i.i.d. mixture distributions, and gives a new relative entropy version of the finite de Finetti theorem that is scaling-optimal.
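As a back-of-the-envelope illustration of the overhead comparison above (a sketch with arbitrary example values of $n$ and $k$, not taken from the paper; the constants hidden in the $O(\log k)$ and $O(1)$ bounds are unspecified, so only the orders are compared):

```python
import math

# Naive identity overhead k*log2(n) versus the n-independent scales above.
n = 10**6   # size of the user pool
k = 100     # number of active users

naive = k * math.log2(n)   # bits to name each active user explicitly
print(f"naive identity overhead: {naive:.0f} bits")
print(f"log2(k) scale for i.i.d. sources: {math.log2(k):.1f} bits")
```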
We study the problems of data compression, gambling, and prediction of a sequence $x^n = x_1 x_2 \cdots x_n$ from an alphabet ${\cal X}$, in terms of regret and expected regret (redundancy) with respect to various smooth families of probability distributions. We evaluate the regret of Bayes mixture distributions compared to maximum likelihood, under the condition that the maximum likelihood estimate is in the interior of the parameter space. For general exponential families (including the non-i.i.d.\ case) the asymptotically minimax value is achieved when variants of the Jeffreys prior are used. Interestingly, we also obtain a modification of the Jeffreys prior that puts measure outside the given family of densities, to achieve minimax regret with respect to non-exponential-type families. This modification enlarges the family using local exponential tilting (a fiber bundle). Our conditions are confirmed for certain non-exponential families, including curved families and mixture families (where either the mixture components or their combination weights are parameterized) as well as contamination models. Furthermore, for mixture families we show how to deal with the full simplex of parameters. These results also provide a characterization of Rissanen's stochastic complexity.
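A small numerical illustration of the Bernoulli case (our sketch, not from the paper): the Bayes mixture under the Jeffreys prior, i.e. $\mathrm{Beta}(1/2,1/2)$, is the Krichevsky-Trofimov estimator, and its regret against maximum likelihood should approach the asymptotic minimax value $\tfrac12\log\tfrac{n}{2\pi} + \log\pi$ (in nats) whenever the ML estimate stays in the interior of $[0,1]$.

```python
import math

def kt_log_prob(x):
    """Log-probability of x under the Jeffreys (Beta(1/2,1/2)) mixture."""
    lp, ones = 0.0, 0
    for t, xt in enumerate(x):
        p1 = (ones + 0.5) / (t + 1.0)   # KT predictive probability of a 1
        lp += math.log(p1 if xt == 1 else 1.0 - p1)
        ones += xt
    return lp

def ml_log_prob(x):
    n, n1 = len(x), sum(x)
    if n1 in (0, n):                     # boundary ML estimate: prob = 1
        return 0.0
    return n1 * math.log(n1 / n) + (n - n1) * math.log(1 - n1 / n)

n = 4096
x = [t % 2 for t in range(n)]            # ML estimate 1/2, safely interior
regret = ml_log_prob(x) - kt_log_prob(x)
asym = 0.5 * math.log(n / (2 * math.pi)) + math.log(math.pi)
print(f"regret = {regret:.3f} nats, asymptotic minimax value = {asym:.3f} nats")
```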
We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (global corruptions) and the remaining perturbations have average magnitude bounded by $\rho$ (local corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(\Pi_\# \hat{P}_n, \Pi_\# P)$ for all orthogonal projections $\Pi \in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(\Pi) = k$. This allows us to account simultaneously for mean estimation ($k=1$) and distribution estimation ($k=d$), as well as for the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. This guarantee holds uniformly in $k$ and is minimax optimal up to the sub-optimality of the plug-in estimator when $\rho = \varepsilon = 0$. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.
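To make the projected criterion concrete, here is an evaluation-only sketch of the $k = 1$ case using SciPy's one-dimensional Wasserstein distance; this measures $\mathsf{W}_1(\Pi_\#\hat{P}_n, \Pi_\# P)$ for a rank-one projection and is not the paper's trace-norm estimator (all parameter values are illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
d, n, eps = 5, 2000, 0.05
clean = rng.normal(size=(n, d))
corrupted = clean.copy()
idx = rng.choice(n, int(eps * n), replace=False)
corrupted[idx] += 20.0                  # global corruptions on an eps-fraction

u = rng.normal(size=d)
u /= np.linalg.norm(u)                  # rank-1 projection direction
print("projected W1:", wasserstein_distance(corrupted @ u, clean @ u))
```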
It is common in machine learning to estimate a response $y$ given covariate information $x$. However, these point predictions alone do not quantify the uncertainty associated with them. One way to overcome this deficiency is with conformal inference methods, which construct a set containing the unobserved response $y$ with a prescribed probability. Unfortunately, even with a one-dimensional response, conformal inference is computationally expensive despite recent encouraging advances. In this paper, we explore multi-output regression, delivering exact derivations of conformal inference $p$-values when the predictive model can be described as a linear function of $y$. Additionally, we propose \texttt{unionCP} and a multivariate extension of \texttt{rootCP} as efficient ways of approximating the conformal prediction region for a wide array of multi-output predictors, both linear and nonlinear, while preserving computational advantages. We provide theoretical and empirical evidence of the effectiveness of these methods on both real-world and simulated data.
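For orientation, a minimal split-conformal baseline for multi-output regression (a deliberately simpler construction than the full-conformal $p$-values and the \texttt{unionCP}/\texttt{rootCP} regions studied in the paper; the residual-norm score and all dimensions below are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, q, alpha = 500, 3, 2, 0.1
X = rng.normal(size=(n, d))
B = rng.normal(size=(d, q))
Y = X @ B + 0.1 * rng.normal(size=(n, q))

tr, cal = slice(0, 250), slice(250, 500)
Bhat, *_ = np.linalg.lstsq(X[tr], Y[tr], rcond=None)  # fit on the training split
scores = np.linalg.norm(Y[cal] - X[cal] @ Bhat, axis=1)  # calibration residual norms
k = int(np.ceil((1 - alpha) * (scores.size + 1)))        # conformal quantile index
radius = np.sort(scores)[k - 1]
# The region {y : ||y - x @ Bhat|| <= radius} covers a new response with
# probability >= 1 - alpha under exchangeability.
print(f"prediction region: ||y - x @ Bhat|| <= {radius:.3f}")
```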
We consider a wireless distributed computing system based on the MapReduce framework, which consists of three phases: \textit{Map}, \textit{Shuffle}, and \textit{Reduce}. The system comprises a set of distributed nodes assigned to compute arbitrary output functions that depend on a file library. The computation of the output functions is decomposed into Map and Reduce functions, and the Shuffle phase, which involves the data exchange between nodes, links the two. In our model, the Shuffle-phase communication happens over a full-duplex wireless interference channel. For this setting, a coded wireless MapReduce distributed computing scheme exists in the literature, achieving optimal performance under one-shot linear schemes. However, that scheme requires the number of input files to be very large, growing exponentially with the number of nodes. We present schemes that require the number of files to be only on the order of the number of nodes and achieve the same performance as the existing scheme. The schemes are obtained by designing a structure called a wireless MapReduce array, which succinctly represents all three phases in a single array. Wireless MapReduce arrays can also be obtained from the extended placement delivery arrays known for multi-antenna coded caching schemes.
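To see the scale of the contrast, the sketch below plots the $\binom{K}{r}$ file-count granularity typical of classical coded distributed computing placements (an illustration of exponential-in-$K$ growth; the exact requirement of the existing wireless scheme is stated in the paper, and $r = K/2$ is shown only as a representative computation load):

```python
from math import comb

for K in (8, 16, 24, 32):
    r = K // 2   # representative computation load
    print(f"K = {K:2d}: C(K, r) = {comb(K, r):,} files vs O(K) for the new schemes")
```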
The method of occupation kernels has been used to learn ordinary differential equations from data in a non-parametric way. We propose a two-step method for learning the drift and diffusion of a stochastic differential equation given snapshots of the process. In the first step, we learn the drift by applying the occupation kernel algorithm to the expected value of the process. In the second step, we learn the diffusion, given the drift, using a semi-definite program. Specifically, we learn the squared diffusion as a non-negative function in an RKHS associated with the square of a kernel. We illustrate the method with examples and simulations.
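A much-simplified one-dimensional illustration of the two-step idea (plain finite-difference regression in place of the occupation-kernel and SDP machinery of the paper; the Ornstein-Uhlenbeck truth and all constants are our choices): step 1 recovers the drift from the ensemble mean, step 2 recovers the squared diffusion from the residual quadratic variation.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, T, m = 1e-3, 1.0, 4000                 # step size, horizon, sample paths
steps = int(T / dt)
theta, sigma = 2.0, 0.5                    # truth: dX = -theta X dt + sigma dW
X = np.empty((steps + 1, m)); X[0] = 1.0
for t in range(steps):
    X[t + 1] = X[t] - theta * X[t] * dt + sigma * np.sqrt(dt) * rng.normal(size=m)

# Step 1: d/dt E[X_t] = -theta E[X_t], so regress mean increments on the mean.
mean = X.mean(axis=1)
dmean = np.diff(mean) / dt
theta_hat = -np.linalg.lstsq(mean[:-1, None], dmean, rcond=None)[0][0]

# Step 2: de-drift the increments; their quadratic variation estimates sigma^2.
resid = np.diff(X, axis=0) + theta_hat * X[:-1] * dt
sigma2_hat = (resid ** 2).sum() / (m * T)
print(f"theta_hat = {theta_hat:.3f}, sigma_hat = {np.sqrt(sigma2_hat):.3f}")
```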
In symmetric cryptography, maximum distance separable (MDS) matrices with computationally simple inverses have wide applications. Many block ciphers like AES, SQUARE, SHARK, and hash functions like PHOTON use an MDS matrix in the diffusion layer. In this article, we first characterize all $3 \times 3$ irreducible semi-involutory matrices over the finite field of characteristic $2$. Using this matrix characterization, we provide a necessary and sufficient condition to construct MDS semi-involutory matrices using only their diagonal entries and the entries of an associated diagonal matrix. Finally, we count the number of $3 \times 3$ semi-involutory MDS matrices over any finite field of characteristic $2$.
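As a companion sanity check (our utility, not the paper's characterization): a square matrix over a finite field is MDS if and only if every square submatrix is nonsingular, which for $3 \times 3$ matrices amounts to checking all entries, all $2 \times 2$ minors, and the determinant. The field GF$(2^4)$ with primitive polynomial $x^4 + x + 1$ and the example matrices below are illustrative choices.

```python
from itertools import combinations

MOD = 0b10011  # x^4 + x + 1, defining GF(2^4)

def gf_mul(a, b):
    """Carry-less multiplication modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def det2(m):  # 2x2 determinant; subtraction is XOR in characteristic 2
    return gf_mul(m[0][0], m[1][1]) ^ gf_mul(m[0][1], m[1][0])

def is_mds3(M):
    idx = range(3)
    if any(M[i][j] == 0 for i in idx for j in idx):
        return False                              # 1x1 minors
    for ri in combinations(idx, 2):
        for ci in combinations(idx, 2):
            if det2([[M[i][j] for j in ci] for i in ri]) == 0:
                return False                      # 2x2 minors
    d = 0                                         # 3x3 determinant, first-row cofactors
    for j in idx:
        minor = [[M[i][c] for c in idx if c != j] for i in (1, 2)]
        d ^= gf_mul(M[0][j], det2(minor))
    return d != 0

print(is_mds3([[1, 1, 2], [1, 2, 1], [2, 1, 1]]))  # True: all minors nonzero
print(is_mds3([[1, 2, 3], [3, 1, 2], [2, 3, 1]]))  # False: determinant vanishes
```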
We present a novel space-efficient graph coarsening technique for $n$-vertex planar graphs $G$, called cloud partition, which partitions the vertices $V(G)$ into disjoint sets $C$ of size $O(\log n)$ such that each $C$ induces a connected subgraph of $G$. Using this partition $P$, we construct a so-called structure-maintaining minor $F$ of $G$ via specific contractions within the disjoint sets such that $F$ has $O(n/\log n)$ vertices. The combination $(F, P)$ is referred to as a cloud decomposition. For planar graphs we show that a cloud decomposition can be constructed in $O(n)$ time using $O(n)$ bits. Given a cloud decomposition $(F, P)$ constructed for a planar graph $G$, we are able to find a balanced separator of $G$ in $O(n/\log n)$ time. In contrast to related work, we do not make use of an embedding of the planar input graph. We generalize our cloud decomposition from planar graphs to $H$-minor-free graphs for any fixed graph $H$. This allows us to construct the succinct encoding scheme for $H$-minor-free graphs due to Blelloch and Farzan (CPM 2010) in $O(n)$ time and $O(n)$ bits, improving both runtime and space by a factor of $\Theta(\log n)$. As an additional application of our cloud decomposition we show that, for $H$-minor-free graphs, a tree decomposition of width $O(n^{1/2 + \epsilon})$ for any $\epsilon > 0$ can be constructed in $O(n)$ bits and in time linear in the size of the tree decomposition. Finally, we implemented our cloud decomposition algorithm and experimentally verified its practical effectiveness on both randomly generated graphs and real-world graphs such as road networks. The data show that a simplified version of our algorithms suffices in practice, as many of the theoretical worst-case scenarios do not arise in the graphs we encountered.
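A simplified greedy sketch of the partition step (our illustration only: the actual construction in the paper distinguishes several cloud types and meets the $O(n)$-bit space bound, whereas this version merely demonstrates "connected pieces of size at most $\lceil \log_2 n \rceil$" on an adjacency-list graph):

```python
from collections import deque
import math

def cloud_partition(adj):
    n = len(adj)
    limit = max(1, math.ceil(math.log2(max(n, 2))))
    seen = [False] * n
    clouds = []
    for s in range(n):
        if seen[s]:
            continue
        cloud, queue = [], deque([s])
        seen[s] = True
        while queue and len(cloud) < limit:
            v = queue.popleft()
            cloud.append(v)          # v joins via its BFS parent, keeping the cloud connected
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        for v in queue:              # return the unused frontier to the pool
            seen[v] = False
        clouds.append(cloud)
    return clouds

# Example: a path on 10 vertices splits into connected clouds of size <= 4.
adj = [[j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)]
print(cloud_partition(adj))
```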
This paper builds and strengthens bridges between statistics, mechanics, and geometry. For a given system of points in $\mathbb R^k$ representing a sample of full rank, we construct an explicit pencil of confocal quadrics with the following properties: (i) all hyperplanes for which the hyperplanar moments of inertia of the given system of points are equal are tangent to the same quadric from the pencil of quadrics. As an application, we develop regularization procedures for the orthogonal least squares method, analogous to the lasso and ridge methods of linear regression. (ii) For any given point $P$, among all hyperplanes that contain it, the best fit is the tangent hyperplane to the quadric from the confocal pencil corresponding to the maximal Jacobi coordinate of $P$; the worst fit among the hyperplanes containing $P$ is the tangent hyperplane to the ellipsoid from the confocal pencil that contains $P$. The confocal pencil of quadrics thus provides a universal tool for principal component analysis restricted at any given point. Both results (i) and (ii) can be seen as generalizations of the classical result of Pearson on orthogonal regression, and they have natural and important applications in the statistics of errors-in-variables (EIV) models. For classical linear regression, we provide a geometric characterization of hyperplanes of least squares in a given direction among all hyperplanes that contain a given point. The obtained results have applications in restricted regressions, both ordinary and orthogonal; for the latter, a new formula for the test statistic is derived. The developed methods and results are illustrated with natural statistical examples.
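For reference, Pearson's classical orthogonal (total least squares) fit, which results (i) and (ii) generalize, can be computed directly: the best-fit hyperplane passes through the centroid with normal along the smallest principal axis of the point cloud. A minimal sketch (the synthetic flat cloud in $\mathbb R^3$ is our example):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])  # flat cloud in R^3

c = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - c, full_matrices=False)
normal = Vt[-1]                    # direction of least variance
offset = normal @ c                # hyperplane: normal . x = offset
dist = np.abs((X - c) @ normal)    # orthogonal distances to the fitted plane
print(f"normal ~ {np.round(normal, 3)}, mean orthogonal distance = {dist.mean():.3f}")
```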
We consider the dunking problem: a solid body at uniform initial temperature $T_{\text i}$ is placed in an environment characterized by a farfield temperature $T_\infty$ and a spatially uniform, time-independent heat transfer coefficient. We permit heterogeneous material composition: spatially dependent density, specific heat, and thermal conductivity. Mathematically, the problem is described by a heat equation with Robin boundary conditions. The crucial parameter is the Biot number -- a nondimensional heat transfer (Robin) coefficient; we consider the limit of small Biot number. We introduce first-order and second-order asymptotic approximations (in Biot number) for several quantities of interest, notably the spatial domain average temperature as a function of time; the first-order approximation is simply the standard engineering `lumped' model. We then provide asymptotic error estimates for the first-order and second-order approximations for small Biot number, and also, for the first-order approximation, alternative strict bounds valid for all Biot numbers. Companion numerical solutions of the heat equation confirm the effectiveness of the error estimates for small Biot number. The second-order approximation and the first-order and second-order error estimates depend on several functional outputs associated with an elliptic partial differential equation; the latter is derived from Biot-sensitivity analysis of the heat equation eigenproblem in the limit of small Biot number. Most important is $\phi$, the only functional output required for the first-order error estimates; $\phi$ admits a simple physical interpretation in terms of a conduction length scale. We investigate the domain and property dependence of $\phi$: most notably, we characterize spatial domains for which the standard lumped-model error criterion -- small Biot number (based on the volume-to-area length scale) -- is deficient.
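For a homogeneous body, the first-order `lumped' approximation referenced above reduces to the ODE $\rho c V \, dT/dt = -hA(T - T_\infty)$, with exponential solution and time constant $\tau = \rho c V / (hA)$; it is trusted when the Biot number $hL/\kappa$ with $L = V/A$ is small. A sketch with illustrative aluminum-like property values (our choices, not the paper's):

```python
import numpy as np

h, kappa, rho, c = 25.0, 200.0, 2700.0, 900.0  # W/m^2K, W/mK, kg/m^3, J/kgK
V, A = 1e-3, 6e-2                              # 10 cm cube
T_i, T_inf = 400.0, 300.0

L = V / A
Bi = h * L / kappa                             # Biot number, volume-to-area length scale
tau = rho * c * V / (h * A)                    # lumped time constant
t = np.linspace(0.0, 4 * tau, 5)
T = T_inf + (T_i - T_inf) * np.exp(-t / tau)   # lumped-model temperature history
print(f"Bi = {Bi:.4f} (small, so the lumped model is expected to apply)")
print("T(t) =", np.round(T, 1))
```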
In this paper, we consider the counting function $E_P(y) = |P_{y} \cap \mathbb{Z}^{n_x}|$ for a parametric polyhedron $P_{y} = \{x \in \mathbb{R}^{n_x} \colon A x \leq b + B y\}$, where $y \in \mathbb{R}^{n_y}$. We give a new representation of $E_P(y)$, called a \emph{piece-wise step-polynomial with periodic coefficients}, which generalizes both piece-wise step-polynomials and integer/rational Ehrhart quasi-polynomials. It gives the fastest known way to calculate $E_P(y)$ in certain scenarios. The most important cases are the following: 1) we show that, for the parametric polyhedron $P_y$ defined by a standard-form system $A x = y,\, x \geq 0$ with a fixed number of equalities, the function $E_P(y)$ can be represented by a polynomial-time computable function; in turn, such a representation of $E_P(y)$ can be constructed by a $\mathrm{poly}\bigl(n, \|A\|_{\infty}\bigr)$-time algorithm; 2) assuming again that the number of equalities is fixed, we show that integer/rational Ehrhart quasi-polynomials of a polytope can be computed by FPT-algorithms, parameterized by the sub-determinants of $A$ or its elements; 3) our representation of $E_P$ is more efficient than other known approaches if $A$ has bounded elements, especially if it is additionally sparse. Additionally, we discuss possible applications in the area of compiler optimization; under some "natural" assumptions on the program code, our approach has the fastest known complexity bounds.
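A brute-force check of $E_P(y)$ for a tiny standard-form instance $Ax = y,\, x \geq 0$ with $A = (2\ 3)$ (our toy example): the count of solutions is a quasi-polynomial in $y$ with period $6$, i.e. a step-polynomial with periodic coefficients in the sense described above.

```python
from itertools import product

def E(y, a=(2, 3)):
    """Count nonnegative integer solutions of a . x = y by enumeration."""
    bound = y // min(a) + 1
    return sum(1 for x in product(range(bound), repeat=len(a))
               if sum(ai * xi for ai, xi in zip(a, x)) == y)

for y in range(13):
    closed = y // 6 + (0 if y % 6 == 1 else 1)   # closed-form quasi-polynomial
    assert E(y) == closed
    print(y, E(y), closed)
```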