In this paper, we study the problem of relaying a single bit of information across a series of binary symmetric channels, and the associated trade-off between the number of hops $m$, the transmission time $n$, and the error probability. We introduce a simple, efficient, and deterministic protocol that attains positive information velocity (i.e., a non-vanishing ratio $\frac{m}{n}$ and small error probability) and is significantly simpler than existing protocols that do so. In addition, we characterize the optimal low-noise and high-noise scaling laws of the information velocity, and we adapt our 1-bit protocol to transmit $k$ bits over $m$ hops with $O(m+k)$ transmission time.
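To make the trade-off concrete, here is a minimal simulation sketch of a naive relay baseline (our own illustration, not the protocol from the paper): each hop repeats the bit several times over a binary symmetric channel and the next relay decodes by majority vote, so the total transmission time is $n = m \cdot \mathrm{reps}$. All parameter values are illustrative.

```python
# Hypothetical illustration, not the paper's protocol: relay one bit across
# m binary symmetric channels (BSCs) using per-hop repetition coding and
# majority voting, and estimate the end-to-end error probability.
import random

def bsc(bit, p):
    """One use of a binary symmetric channel: flip the bit with probability p."""
    return bit ^ (random.random() < p)

def relay_once(bit, m, reps, p):
    """Forward `bit` over m hops; each hop sends `reps` copies and the next
    relay decodes by majority vote, so the transmission time is n = m * reps."""
    current = bit
    for _ in range(m):
        votes = sum(bsc(current, p) for _ in range(reps))
        current = int(votes > reps // 2)
    return current

def error_rate(m, reps, p, trials=2000):
    return sum(relay_once(0, m, reps, p) for _ in range(trials)) / trials

if __name__ == "__main__":
    p = 0.1
    for reps in (1, 5, 9):
        print(f"m=50 hops, reps={reps}: error ~ {error_rate(50, reps, p):.3f}")
```

With a fixed number of repetitions per hop this baseline keeps the ratio $\frac{m}{n}$ constant, but its error probability degrades with $m$ unless the repetitions grow, which is exactly the tension that a protocol with positive information velocity must resolve.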
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information that is lost in the intermediate textual representation and by preventing cascading errors from automatic speech recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset, the largest and most complex publicly available SLU dataset. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we release a TTS-generated version to benchmark low-resource domain adaptation of end-to-end SLU systems. Initial experiments show end-to-end SLU models performing slightly worse than their cascaded counterparts, which we hope encourages future work in this direction.
In this paper, we study a sequential decision-making problem faced by e-commerce carriers: when to send out a vehicle from the central depot to serve customer requests, and in which order to serve them, under the assumption that the times at which parcels arrive at the depot are stochastic and dynamic. The objective is to maximize the number of parcels that can be delivered during the service hours. We propose two reinforcement learning approaches for solving this problem, one based on a policy function approximation (PFA) and the other on a value function approximation (VFA). Both methods are combined with a look-ahead strategy in which future release dates are sampled in a Monte-Carlo fashion and a tailored batch approach is used to approximate the value of future states. Our PFA and VFA make good use of branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into PFA/VFA. In an empirical study based on 720 benchmark instances, we conduct a competitive analysis using upper bounds with perfect information, and we show that PFA and VFA greatly outperform two alternative myopic approaches. Overall, PFA provides the best solutions, while VFA (which benefits from a two-stage stochastic optimization model) achieves a better trade-off between solution quality and computing time.
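A minimal sketch of the kind of Monte-Carlo look-ahead decision rule described above follows. All names, the Poisson release model, and the crude delivery count are hypothetical stand-ins; a real implementation would use the tailored batch approach and the branch-and-cut routing evaluation mentioned in the abstract.

```python
# Hypothetical sketch of a Monte-Carlo look-ahead for the dispatch decision:
# sample future parcel release times and compare the expected number of
# delivered parcels if the vehicle leaves now versus if it waits one period.
import random

def sampled_releases(now, horizon, rate):
    """Sample future release times from a Poisson process (illustrative model)."""
    times, t = [], now
    while True:
        t += random.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

def delivered_if_dispatch(dispatch_time, release_times, horizon, service_time):
    """Parcels released before dispatch can be loaded; count how many fit before closing."""
    loaded = [t for t in release_times if t <= dispatch_time]
    budget = horizon - dispatch_time
    return min(len(loaded), int(budget // service_time))

def lookahead_decision(now, pending, horizon, rate, service_time, wait=1.0, samples=200):
    gain_now, gain_wait = 0.0, 0.0
    for _ in range(samples):
        future = pending + sampled_releases(now, horizon, rate)
        gain_now += delivered_if_dispatch(now, future, horizon, service_time)
        gain_wait += delivered_if_dispatch(now + wait, future, horizon, service_time)
    return "dispatch" if gain_now >= gain_wait else "wait"

print(lookahead_decision(now=2.0, pending=[0.5, 1.2], horizon=10.0,
                         rate=1.5, service_time=0.8))
```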
Finding meaningful concepts in engineering application datasets, which allow for a sensible grouping of designs, is very helpful in many contexts. It allows for determining groups of designs with similar properties and provides useful knowledge for the engineering decision-making process. It also opens the route to further refinement of specific design candidates that exhibit certain characteristic features. In this work, an approach to define meaningful and consistent concepts in an existing engineering dataset is presented. The designs in the dataset are characterized by a multitude of features such as design parameters, geometrical properties, or performance values of the design for various boundary conditions. In the proposed approach, the complete feature set is partitioned into several subsets called description spaces. The definition of the concepts respects this partitioning, which leads to several desired properties of the identified concepts that cannot be achieved with state-of-the-art clustering or concept identification approaches. A novel concept quality measure is proposed, which provides an objective value for a given definition of concepts in a dataset. The usefulness of the measure is demonstrated on a realistic engineering dataset consisting of about 2500 airfoil profiles, for which the performance values (lift and drag) for three different operating conditions were obtained by a computational fluid dynamics simulation. A numerical optimization procedure is employed that maximizes the concept quality measure and finds meaningful concepts for different setups of the description spaces, while also incorporating user preference. It is demonstrated how these concepts can be used to select archetypal representatives of the dataset that exhibit the characteristic features of each concept.
We study the problem of providing channel state information (CSI) at the transmitter in multi-user massive MIMO systems operating in frequency division duplexing (FDD). The wideband MIMO channel is a vector-valued random process correlated in time, space (antennas), and frequency (subcarriers). The base station (BS) periodically broadcasts $\beta_{\rm tr}$ pilot symbols from its $M$ antenna ports to $K$ single-antenna users (UEs). Correspondingly, the $K$ UEs send feedback messages about their channel state using $\beta_{\rm fb}$ symbols in the uplink (UL). Using results from remote rate-distortion theory, we show that, as ${\sf snr} \to \infty$, the optimal feedback strategy achieves a channel state estimation mean squared error (MSE) that behaves as $\Theta(1)$ if $\beta_{\rm tr} < r$ and as $\Theta({\sf snr}^{-\alpha})$ if $\beta_{\rm tr} \geq r$, where $\alpha = \min(\beta_{\rm fb}/r, 1)$ and $r$ is the rank of the channel covariance matrix. The MSE-optimal rate-distortion strategy implies encoding of long sequences of channel states, which would yield completely stale CSI and therefore poor multiuser precoding performance. Hence, we consider three practical one-shot CSI strategies with minimum one-slot delay and analyze their large-SNR channel estimation MSE behavior: (1) digital feedback via entropy-coded scalar quantization (ECSQ), (2) analog feedback (AF), and (3) local channel estimation at the UEs with digital feedback. These schemes have different requirements in terms of knowledge of the channel statistics at the UE and at the BS. In particular, the last strategy requires no statistical knowledge and is closely inspired by a CSI feedback scheme currently proposed in 3GPP standardization.
The monotone minimal perfect hash function (MMPHF) problem is the following indexing problem. Given a set $S= \{s_1,\ldots,s_n\}$ of $n$ distinct keys from a universe $U$ of size $u$, create a data structure $DS$ that answers the following query: \[ \mathrm{RankOp}(q) = \begin{cases} \text{rank of } q \text{ in } S & \text{if } q\in S, \\ \text{arbitrary answer} & \text{otherwise.} \end{cases} \] Solutions to the MMPHF problem are in widespread use in both theory and practice. The best known upper bound for the problem encodes $DS$ in $O(n\log\log\log u)$ bits and performs queries in $O(\log u)$ time. It has been an open problem to either improve the space upper bound or to show that this somewhat odd-looking bound is tight. In this paper, we show the latter: specifically, any data structure (deterministic or randomized) for monotone minimal perfect hashing of any collection of $n$ elements from a universe of size $u$ requires $\Omega(n \cdot \log\log\log{u})$ expected bits to answer every query correctly. We achieve our lower bound by defining a graph $\mathbf{G}$ whose nodes are the $\binom{u}{n}$ possible inputs and in which two nodes are adjacent if they cannot share the same $DS$. The size of $DS$ is then lower bounded by the logarithm of the chromatic number of $\mathbf{G}$. Finally, we show that the fractional chromatic number (and hence the chromatic number) of $\mathbf{G}$ is lower bounded by $2^{\Omega(n \log\log\log u)}$.
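For contrast with the succinct bound discussed above, here is a naive baseline (our own illustration, not the data structure from the paper): storing the sorted keys explicitly answers rank queries correctly for keys in $S$, but uses $O(n \log u)$ bits rather than $O(n\log\log\log u)$.

```python
# Naive MMPHF baseline for illustration only: store the sorted keys and answer
# RankOp(q) by binary search. This is exact for q in S (and returns some
# arbitrary rank otherwise, which MMPHF permits), but costs O(n log u) bits,
# far above the O(n logloglog u) bound discussed above.
import bisect

class NaiveMMPHF:
    def __init__(self, keys):
        self.keys = sorted(keys)          # n words of log(u) bits each

    def rank(self, q):
        # Exact rank for q in S; arbitrary (insertion position) otherwise.
        return bisect.bisect_left(self.keys, q)

ds = NaiveMMPHF([3, 17, 42, 1000])
print(ds.rank(17))   # 1
print(ds.rank(42))   # 2
```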
It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics and then use this information for making predictions because of undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed light on these issues by exploring new methodology for subgroup analysis in image-based disease detection models. We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test-set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. We confirm subgroup disparities in terms of shifted true and false positive rates, which are partially removed after correcting for population and prevalence shifts in the test sets. We further find a previously used transfer learning method to be insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable new insights into the way protected characteristics are encoded in the feature representations of deep neural networks.
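One of the ingredients above, test-set resampling, can be sketched as follows. The column names and parameters are hypothetical; the idea is simply to draw, for every subgroup, a test set of the same size and disease prevalence before comparing subgroup metrics.

```python
# Illustrative sketch of test-set resampling (column names are hypothetical):
# resample each subgroup to a common size and prevalence so that subgroup
# metrics are compared without population and prevalence shifts.
import pandas as pd

def resample_test_set(df, group_col, label_col, n_per_group, prevalence, seed=0):
    n_pos = int(round(n_per_group * prevalence))
    n_neg = n_per_group - n_pos
    parts = []
    for _, g in df.groupby(group_col):
        parts.append(g[g[label_col] == 1].sample(n_pos, replace=True, random_state=seed))
        parts.append(g[g[label_col] == 0].sample(n_neg, replace=True, random_state=seed))
    return pd.concat(parts).reset_index(drop=True)

# Example usage with a hypothetical metadata frame:
# balanced = resample_test_set(test_df, group_col="race", label_col="disease",
#                              n_per_group=1000, prevalence=0.2)
```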
When learning disconnected distributions, generative adversarial networks (GANs) are known to face model misspecification. Indeed, a continuous mapping from a unimodal latent distribution to a disconnected one is impossible, so GANs necessarily generate samples outside the support of the target distribution. This raises a fundamental question: what is the latent space partition that minimizes the measure of these areas? Building on a recent result from geometric measure theory, we prove that an optimal GAN must structure its latent space as a 'simplicial cluster', a Voronoi partition whose cells are convex cones, when the dimension of the latent space is larger than the number of modes. In this configuration, each Voronoi cell maps to a distinct mode of the data. We derive both an upper and a lower bound on the optimal precision of GANs learning disconnected manifolds. Interestingly, these two bounds have the same order of decrease: $\sqrt{\log m}$, $m$ being the number of modes. Finally, we perform several experiments to exhibit the geometry of the latent space and experimentally show that GANs have a geometry with properties similar to the theoretical one.
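A small numerical illustration (our own, not from the paper) of a latent partition whose cells are convex cones: assigning each latent vector to the mode direction it aligns with best yields scale-invariant, conical, Voronoi-like cells, one per mode. The directions below are arbitrary examples.

```python
# Illustrative sketch: partition the latent space into conical cells, one per
# mode, by assigning each latent vector z to the mode direction with the
# largest inner product. The cells are convex cones (membership is invariant
# to positive scaling of z), loosely mimicking the 'simplicial cluster'
# structure discussed above.
import numpy as np

rng = np.random.default_rng(0)
d, m = 8, 3                                   # latent dimension > number of modes
directions = rng.normal(size=(m, d))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

def cell(z):
    """Index of the conical cell containing latent vector z."""
    return int(np.argmax(directions @ z))

z = rng.normal(size=d)
assert cell(z) == cell(2.5 * z)               # scale-invariant: cells are cones
counts = np.bincount([cell(rng.normal(size=d)) for _ in range(10000)], minlength=m)
print("fraction of latent mass per cell:", counts / counts.sum())
```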
$n$-cycle permutations with small $n$, i.e., permutations $f$ satisfying $f^{(n)}(x)=x$, have the advantage that their compositional inverses ($f^{-1}=f^{(n-1)}$) are efficient in terms of implementation. They can also be used in constructing bent functions and designing codes. Since the AGW criterion was proposed, the permuting property of several forms of polynomials has been studied. In this paper, characterizations of several types of $n$-cycle permutations are investigated. Three criteria for $n$-cycle permutations of the forms $xh(\lambda(x))$, $h(\psi(x)) \varphi(x)+g(\psi(x))$, and $g\left( x^{q^i} -x +\delta \right) +bx$ with general $n$ are provided. We demonstrate these criteria by providing explicit constructions. For the form $x^rh(x^s)$, several new explicit triple-cycle permutations are also provided. Finally, we consider triple-cycle permutations of the form $x^t + c\,\mathrm{Tr}_{q^m/q}(x^s)$ and provide one explicit construction. Many of our constructions are new with respect to both the $n$-cycle property and the permutation property.
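As a quick sanity check of the $n$-cycle property (our own illustration, not one of the new families from the paper): over $\mathbb{F}_{2^3}$ the Frobenius map $x \mapsto x^2$ satisfies $f^{(3)}(x) = x^{8} = x$, so it is a triple-cycle permutation, and a brute-force check confirms it.

```python
# Verify the n-cycle property of a candidate permutation on a small finite
# field. Example: the Frobenius map x -> x^2 on GF(2^3) is a 3-cycle
# permutation, since applying it three times gives x^(2^3) = x.

IRRED = 0b1011  # x^3 + x + 1, irreducible over GF(2)

def gf8_mul(a, b):
    """Carry-less multiplication in GF(2^3) modulo x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:          # reduce once the degree reaches 3
            a ^= IRRED
    return r

def frobenius(x):
    return gf8_mul(x, x)        # x -> x^2

def is_n_cycle(f, domain, n):
    """Check that f permutes `domain` and that f composed n times is the identity."""
    if sorted(f(x) for x in domain) != sorted(domain):
        return False
    for x in domain:
        y = x
        for _ in range(n):
            y = f(y)
        if y != x:
            return False
    return True

field = range(8)                               # elements of GF(2^3) as bitmasks
print(is_n_cycle(frobenius, field, 3))         # True: a triple-cycle permutation
print(is_n_cycle(frobenius, field, 2))         # False: not an involution
```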
The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no 'one-size-fits-all' causal method that performs optimally universally. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such data-generating procedures can be of limited value because they are typically stylized models of reality, simplified for tractability and lacking the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for the data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework's novelty stems from its ability to generate synthetic data anchored at the empirical distribution of the observed sample, and therefore virtually indistinguishable from the latter. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. The simulated data sets are then used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence's ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and in two real-world data applications from the LaLonde and Project STAR studies.
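The evaluation loop enabled by such a framework can be sketched as follows. This is purely illustrative: the linear generator with a constant treatment effect is a trivial stand-in for Credence's deep generative model and user-specified effect/confounding functions.

```python
# Illustrative evaluation loop: given a synthetic-data generator with a known
# ground-truth effect, compare causal estimators by how far their estimates
# fall from the truth. The generator and the constant effect tau are stand-ins.
import numpy as np

def generate(n, tau, seed):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    propensity = 1 / (1 + np.exp(-x))                 # confounding via x
    t = rng.binomial(1, propensity)
    y = 2.0 * x + tau * t + rng.normal(size=n)
    return x, t, y

def naive_difference(x, t, y):
    return y[t == 1].mean() - y[t == 0].mean()

def regression_adjustment(x, t, y):
    X = np.column_stack([np.ones_like(x), x, t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[2]                                    # coefficient on treatment

tau_true = 1.5
for estimator in (naive_difference, regression_adjustment):
    errors = [abs(estimator(*generate(2000, tau_true, s)) - tau_true) for s in range(50)]
    print(estimator.__name__, "mean abs error:", round(float(np.mean(errors)), 3))
```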
Link prediction on knowledge graphs (KGs) is a key research topic. Previous work mainly focused on binary relations, paying less attention to higher-arity relations even though they are ubiquitous in real-world KGs. This paper considers link prediction upon n-ary relational facts and proposes a graph-based approach to this task. The key to our approach is to represent the n-ary structure of a fact as a small heterogeneous graph, and to model this graph with edge-biased fully-connected attention. The fully-connected attention captures universal inter-vertex interactions, while edge-aware attentive biases are employed to specifically encode the graph structure and its heterogeneity. In this fashion, our approach fully models global and local dependencies in each n-ary fact, and hence can more effectively capture the associations therein. Extensive evaluation verifies the effectiveness and superiority of our approach. It performs substantially and consistently better than the current state of the art across a variety of n-ary relational benchmarks. Our code is publicly available.
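A compact sketch of edge-biased fully-connected attention follows (our own illustration with hypothetical shapes and names; the actual model's parameterization may differ): standard scaled dot-product attention over all vertex pairs of one fact's graph, plus a learned additive bias indexed by the edge type connecting each pair.

```python
# Illustrative sketch of edge-biased fully-connected attention over the small
# heterogeneous graph of one n-ary fact: every vertex attends to every other
# vertex, and an additive bias looked up from the edge type between each pair
# encodes the graph structure and its heterogeneity.
import numpy as np

def edge_biased_attention(H, edge_types, bias_table, Wq, Wk, Wv):
    """H: (V, d) vertex embeddings; edge_types: (V, V) int ids; bias_table: (T,) biases."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # fully-connected interactions
    scores = scores + bias_table[edge_types]          # edge-aware attentive bias
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over all vertices
    return weights @ V

rng = np.random.default_rng(0)
V, d, T = 5, 16, 4                                    # vertices of one fact, dim, edge types
H = rng.normal(size=(V, d))
edge_types = rng.integers(0, T, size=(V, V))
out = edge_biased_attention(H, edge_types, rng.normal(size=T),
                            *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)                                      # (5, 16)
```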