Inspired by several delay-bounded mission-critical applications, optimizing the end-to-end reliability of multi-hop networks is an important problem subject to end-to-end delay constraints on the packets. Towards that direction, Automatic Repeat Request (ARQ) based strategies have been recently proposed wherein the problem statement is to distribute a certain total number of ARQs (that capture end-to-end delay) across the nodes such that the end-to-end reliability is optimized. Although such strategies provide a fine control to trade end-to-end delay with end-to-end reliability, their performance degrades in slowly-varying channel conditions. Pointing at this drawback, in this work, we propose a Chase Combing Hybrid ARQ (CC-HARQ) based multi-hop network addressing the problem statement of how to distribute a certain total number of ARQs such that the end-to-end reliability is optimized. Towards solving the problem, first we identify that the objective function of the optimization problem is intractable due to the presence of Marcum-Q functions in it. As a result, we propose an approximation on the objective function and then prove a set of necessary and sufficient conditions on the near-optimal ARQ distribution. Subsequently, we propose a low-complexity algorithm to solve the problem for any network size. We show that CC-HARQ based strategies are particularly appealing in slow-fading channels wherein the existing ARQ strategies fail.
This paper is devoted to the robust approximation with a variational phase field approach of multiphase mean curvature flows with possibly highly contrasted mobilities. The case of harmonically additive mobilities has been addressed recently using a suitable metric to define the gradient flow of the phase field approximate energy. We generalize this approach to arbitrary nonnegative mobilities using a decomposition as sums of harmonically additive mobilities. We establish the consistency of the resulting method by analyzing the sharp interface limit of the flow: a formal expansion of the phase field shows that the method is of second order. We propose a simple numerical scheme to approximate the solutions to our new model. Finally, we present some numerical experiments in dimensions 2 and 3 that illustrate the interest and effectiveness of our approach, in particular for approximating flows in which the mobility of some phases is zero.
Modeling and control of high-dimensional, nonlinear robotic systems remains a challenging task. While various model- and learning-based approaches have been proposed to address these challenges, they broadly lack generalizability to different control tasks and rarely preserve the structure of the dynamics. In this work, we propose a new, data-driven approach for extracting low-dimensional models from data using Spectral Submanifold Reduction (SSMR). In contrast to other data-driven methods which fit dynamical models to training trajectories, we identify the dynamics on generic, low-dimensional attractors embedded in the full phase space of the robotic system. This allows us to obtain computationally-tractable models for control which preserve the system's dominant dynamics and better track trajectories radically different from the training data. We demonstrate the superior performance and generalizability of SSMR in dynamic trajectory tracking tasks vis-a-vis the state of the art, including Koopman operator-based approaches.
The $h$-index is a metric used to measure the impact of a user in a publication setting, such as a member of a social network with many highly liked posts or a researcher in an academic domain with many highly cited publications. Specifically, the $h$-index of a user is the largest integer $h$ such that at least $h$ publications of the user have at least $h$ units of positive feedback. We design an algorithm that, given query access to the $n$ publications of a user and each publication's corresponding positive feedback number, outputs a $(1\pm \varepsilon)$-approximation of the $h$-index of this user with probability at least $1-\delta$ in time \[ O(\frac{n \cdot \ln{(1/\delta)}}{\varepsilon^2 \cdot h}), \] where $h$ is the actual $h$-index which is unknown to the algorithm a-priori. We then design a novel lower bound technique that allows us to prove that this bound is in fact asymptotically optimal for this problem in all parameters $n,h,\varepsilon,$ and $\delta$. Our work is one of the first in sublinear time algorithms that addresses obtaining asymptotically optimal bounds, especially in terms of the error and confidence parameters. As such, we focus on designing novel techniques for this task. In particular, our lower bound technique seems quite general -- to showcase this, we also use our approach to prove an asymptotically optimal lower bound for the problem of estimating the number of triangles in a graph in sublinear time, which now is also optimal in the error and confidence parameters. This result improves upon prior lower bounds of Eden, Levi, Ron, and Seshadhri (FOCS'15) for this problem, as well as multiple follow-ups that extended this lower bound to other subgraph counting problems.
Binary stars undergo a variety of interactions and evolutionary phases, critical for predicting and explaining observed properties. Binary population synthesis with full stellar-structure and evolution simulations are computationally expensive requiring a large number of mass-transfer sequences. The recently developed binary population synthesis code POSYDON incorporates grids of MESA binary star simulations which are then interpolated to model large-scale populations of massive binaries. The traditional method of computing a high-density rectilinear grid of simulations is not scalable for higher-dimension grids, accounting for a range of metallicities, rotation, and eccentricity. We present a new active learning algorithm, psy-cris, which uses machine learning in the data-gathering process to adaptively and iteratively select targeted simulations to run, resulting in a custom, high-performance training set. We test psy-cris on a toy problem and find the resulting training sets require fewer simulations for accurate classification and regression than either regular or randomly sampled grids. We further apply psy-cris to the target problem of building a dynamic grid of MESA simulations, and we demonstrate that, even without fine tuning, a simulation set of only $\sim 1/4$ the size of a rectilinear grid is sufficient to achieve the same classification accuracy. We anticipate further gains when algorithmic parameters are optimized for the targeted application. We find that optimizing for classification only may lead to performance losses in regression, and vice versa. Lowering the computational cost of producing grids will enable future versions of POSYDON to cover more input parameters while preserving interpolation accuracies.
In machine learning, training data often capture the behaviour of multiple subgroups of some underlying human population. When the nature of training data for subgroups are not controlled carefully, under-representation bias arises. To counter this effect we introduce two natural notions of subgroup fairness and instantaneous fairness to address such under-representation bias in time-series forecasting problems. Here we show globally convergent methods for the fairness-constrained learning problems using hierarchies of convexifications of non-commutative polynomial optimisation problems. Our empirical results on a biased data set motivated by insurance applications and the well-known COMPAS data set demonstrate the efficacy of our methods. We also show that by exploiting sparsity in the convexifications, we can reduce the run time of our methods considerably.
In this paper, we derive variational formulas for the asymptotic exponents of the concentration and isoperimetric functions in the product Polish probability space. These formulas are expressed in terms of relative entropies (which are from information theory) and optimal transport cost functionals (which are from optimal transport theory). Our results verify an intimate connection among information theory, optimal transport, and concentration of measure or isoperimetric inequalities. In the concentration regime, the corresponding variational formula is in fact a dimension-free bound on the exponent of the concentration function. The proofs in this paper are based on information-theoretic and optimal transport techniques. Our results generalize Alon, Boppana, and Spencer's in \cite{alon1998asymptotic}, Gozlan and L\'eonard's \cite{gozlan2007large}, and Ahlswede and Zhang's in \cite{ahlswede1999asymptotical}.
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
Designing learning systems which are invariant to certain data transformations is critical in machine learning. Practitioners can typically enforce a desired invariance on the trained model through the choice of a network architecture, e.g. using convolutions for translations, or using data augmentation. Yet, enforcing true invariance in the network can be difficult, and data invariances are not always known a piori. State-of-the-art methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems, which are complex to solve and often computationally demanding. In this work we investigate new ways of learning invariances only from the training data. Using learnable augmentation layers built directly in the network, we demonstrate that our method is very versatile. It can incorporate any type of differentiable augmentation and be applied to a broad class of learning problems beyond computer vision. We provide empirical evidence showing that our approach is easier and faster to train than modern automatic data augmentation techniques based on bilevel optimization, while achieving comparable results. Experiments show that while the invariances transferred to a model through automatic data augmentation are limited by the model expressivity, the invariance yielded by our approach is insensitive to it by design.
When is heterogeneity in the composition of an autonomous robotic team beneficial and when is it detrimental? We investigate and answer this question in the context of a minimally viable model that examines the role of heterogeneous speeds in perimeter defense problems, where defenders share a total allocated speed budget. We consider two distinct problem settings and develop strategies based on dynamic programming and on local interaction rules. We present a theoretical analysis of both approaches and our results are extensively validated using simulations. Interestingly, our results demonstrate that the viability of heterogeneous teams depends on the amount of information available to the defenders. Moreover, our results suggest a universality property: across a wide range of problem parameters the optimal ratio of the speeds of the defenders remains nearly constant.
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.