We consider networked sources that generate update messages with a defined rate and we investigate the age of that information at the receiver. Typical applications are in cyber-physical systems that depend on timely sensor updates. We phrase the age of information in the min-plus algebra of the network calculus. This facilitates a variety of models including wireless channels and schedulers with random cross-traffic, as well as sources with periodic and random updates, respectively. We show how the age of information depends on the network service where, e.g., outages of a wireless channel cause delays. Further, our analytical expressions show two regimes depending on the update rate, where the age of information is either dominated by congestive delays or by idle waiting. We find that the optimal update rate strikes a balance between these two effects.
This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. Analytic properties of the RGM distance are derived; statistical rate of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.
We study the distributed minimum spanning tree (MST) problem, a fundamental problem in distributed computing. It is well-known that distributed MST can be solved in $\tilde{O}(D+\sqrt{n})$ rounds in the standard CONGEST model (where $n$ is the network size and $D$ is the network diameter) and this is essentially the best possible round complexity (up to logarithmic factors). However, in resource-constrained networks such as ad hoc wireless and sensor networks, nodes spending so much time can lead to significant spending of resources such as energy. Motivated by the above consideration, we study distributed algorithms for MST under the \emph{sleeping model} [Chatterjee et al., PODC 2020], a model for design and analysis of resource-efficient distributed algorithms. In the sleeping model, a node can be in one of two modes in any round -- \emph{sleeping} or \emph{awake} (unlike the traditional model where nodes are always awake). Only the rounds in which a node is \emph{awake} are counted, while \emph{sleeping} rounds are ignored. A node spends resources only in the awake rounds and hence the main goal is to minimize the \emph{awake complexity} of a distributed algorithm, the worst-case number of rounds any node is awake. We present deterministic and randomized distributed MST algorithms that have an \emph{optimal} awake complexity of $O(\log n)$ time with a matching lower bound. We also show that our randomized awake-optimal algorithm has essentially the best possible round complexity by presenting a lower bound of $\tilde{\Omega}(n)$ on the product of the awake and round complexity of any distributed algorithm (including randomized) that outputs an MST, where $\tilde{\Omega}$ hides a $1/(\text{polylog } n)$ factor.
This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the level of the cluster; by non-ignorable cluster sizes we mean that "large" clusters and "small" clusters may be heterogeneous, and, in particular, the effects of the treatment may vary across clusters of differing sizes. In order to permit this sort of flexibility, we consider a sampling framework in which cluster sizes themselves are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect, and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using simple random sampling. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster and demonstrate the implications of such sampling for some commonly used estimators. A small simulation study shows the practical relevance of our theoretical results.
Weighted automata are a generalization of nondeterministic automata that associate a weight drawn from a semiring $K$ with every transition and every state. Their behaviours can be formalized either as weighted language equivalence or weighted bisimulation. In this paper we explore the properties of weighted automata in the framework of coalgebras over (i) the category $\mathsf{SMod}$ of semimodules over a semiring $K$ and $K$-linear maps, and (ii) the category $\mathsf{Set}$ of sets and maps. We show that the behavioural equivalences defined by the corresponding final coalgebras in these two cases characterize weighted language equivalence and weighted bisimulation, respectively. These results extend earlier work by Bonchi et al. using the category $\mathsf{Vect}$ of vector spaces and linear maps as the underlying model for weighted automata with weights drawn from a field $K$. The key step in our work is generalizing the notions of linear relation and linear bisimulation of Boreale from vector spaces to semimodules using the concept of the kernel of a $K$-linear map in the sense of universal algebra. We also provide an abstract procedure for forward partition refinement for computing weighted language equivalence. Since for weighted automata defined over semirings the problem is undecidable in general, it is guaranteed to halt only in special cases. We provide sufficient conditions for the termination of our procedure. Although the results are similar to those of Bonchi et al., many of our proofs are new, especially those about the coalgebra in $\mathsf{SMod}$ characterizing weighted language equivalence.
We provide a decision theoretic analysis of bandit experiments. The setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for bandit experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distribution of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit experiment with a PDE which can be efficiently solved using sparse matrix routines. We derive the optimal Bayes and minimax policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations to the bandit problem such as time discounting and pure exploration motives.
We study the acceleration of the Local Polynomial Interpolation-based Gradient Descent method (LPI-GD) recently proposed for the approximate solution of empirical risk minimization problems (ERM). We focus on loss functions that are strongly convex and smooth with condition number $\sigma$. We additionally assume the loss function is $\eta$-H\"older continuous with respect to the data. The oracle complexity of LPI-GD is $\tilde{O}\left(\sigma m^d \log(1/\varepsilon)\right)$ for a desired accuracy $\varepsilon$, where $d$ is the dimension of the parameter space, and $m$ is the cardinality of an approximation grid. The factor $m^d$ can be shown to scale as $O((1/\varepsilon)^{d/2\eta})$. LPI-GD has been shown to have better oracle complexity than gradient descent (GD) and stochastic gradient descent (SGD) for certain parameter regimes. We propose two accelerated methods for the ERM problem based on LPI-GD and show an oracle complexity of $\tilde{O}\left(\sqrt{\sigma} m^d \log(1/\varepsilon)\right)$. Moreover, we provide the first empirical study on local polynomial interpolation-based gradient methods and corroborate that LPI-GD has better performance than GD and SGD in some scenarios, and the proposed methods achieve acceleration.
Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also is statistically optimal, be the noise Gaussian or heavy-tailed. Convergence theory is established for a general framework and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in a typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator which is already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise to the non-smooth robust losses in an area close but not too close to the truth. Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed noise where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
Convection-diffusion-reaction equations model the conservation of scalar quantities. From the analytic point of view, solution of these equations satisfy under certain conditions maximum principles, which represent physical bounds of the solution. That the same bounds are respected by numerical approximations of the solution is often of utmost importance in practice. The mathematical formulation of this property, which contributes to the physical consistency of a method, is called Discrete Maximum Principle (DMP). In many applications, convection dominates diffusion by several orders of magnitude. It is well known that standard discretizations typically do not satisfy the DMP in this convection-dominated regime. In fact, in this case, it turns out to be a challenging problem to construct discretizations that, on the one hand, respect the DMP and, on the other hand, compute accurate solutions. This paper presents a survey on finite element methods, with a main focus on the convection-dominated regime, that satisfy a local or a global DMP. The concepts of the underlying numerical analysis are discussed. The survey reveals that for the steady-state problem there are only a few discretizations, all of them nonlinear, that at the same time satisfy the DMP and compute reasonably accurate solutions, e.g., algebraically stabilized schemes. Moreover, most of these discretizations have been developed in recent years, showing the enormous progress that has been achieved lately. Methods based on algebraic stabilization, nonlinear and linear ones, are currently as well the only finite element methods that combine the satisfaction of the global DMP and accurate numerical results for the evolutionary equations in the convection-dominated situation.
One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM) can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.
This study investigates whether the phonological features derived from the Featurally Underspecified Lexicon model can be applied in text-to-speech systems to generate native and non-native speech in English and Mandarin. We present a mapping of ARPABET/pinyin to SAMPA/SAMPA-SC and then to phonological features. This mapping was tested for whether it could lead to the successful generation of native, non-native, and code-switched speech in the two languages. We ran two experiments, one with a small dataset and one with a larger dataset. The results supported that phonological features could be used as a feasible input system for languages in or not in the train data, although further investigation is needed to improve model performance. The results lend support to FUL by presenting successfully synthesised output, and by having the output carrying a source-language accent when synthesising a language not in the training data. The TTS process stimulated human second language acquisition process and thus also confirm FUL's ability to account for acquisition.