In this paper, we consider a general K-user Gaussian multiple-input multiple-output (MIMO) broadcast channel (BC). We assume that the channel state is deterministic and known to all the nodes. While the private-message capacity region is well known to be achievable with dirty paper coding (DPC), we are interested in the simpler linearly precoded transmission schemes. In particular, we focus on linear precoding schemes combined with rate-splitting (RS). First, we derive an achievable rate region with minimum mean square error (MMSE) precoding at the transmitter and joint decoding of the sub-messages at the receivers. Then, we study the achievable sum rate of this scheme and obtain two findings: 1) an analytically tractable upper bound on the sum rate that is shown numerically to be a close approximation, and 2) how to reduce the number of active streams -- crucial to the overall complexity -- while preserving the sum rate to within a constant loss. The latter results in two practical algorithms: a stream elimination algorithm and a stream ordering algorithm. Finally, we investigate the constant-gap optimality of linearly precoded RS with respect to the capacity. Our result reveals that, while the achievable rate of linear precoding alone can be arbitrarily far from the capacity, the introduction of RS can help achieve the capacity region to within a constant gap in the two-user case. Nevertheless, we prove that the RS scheme's constant-gap optimality does not extend to the three-user case. Specifically, we show, through a pathological example, that the gap between the sum rate and the sum capacity can be unbounded.
In this work, we investigate a novel simultaneous transmission and reflection reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output downlink system, where three practical transmission protocols, namely, energy splitting (ES), mode selection (MS), and time splitting (TS), are studied. For the system under consideration, we maximize the weighted sum rate with multiple coupled variables. To solve this optimization problem, a block coordinate descent algorithm is proposed to reformulate this problem and design the precoding matrices and the transmitting and reflecting coefficients (TARCs) in an alternate manner. Specifically, for the ES scheme, the precoding matrices are solved using the Lagrange dual method, while the TARCs are obtained using the penalty concave-convex method. Additionally, the proposed method is extended to the MS scheme by solving a mixed-integer problem. Moreover, we solve the formulated problem for the TS scheme using a one-dimensional search and the Majorization-Minimization technique. Our simulation results reveal that: 1) Simultaneous transmission and reflection RIS (STAR-RIS) can achieve better performance than reflecting-only RIS; 2) In unicast communication, TS scheme outperforms the ES and MS schemes, while in broadcast communication, ES scheme outperforms the TS and MS schemes.
This paper studies a memoryless state-dependent multiple access channel (MAC) where two transmitters wish to convey a message to a receiver under the assumption of causal and imperfect channel state information at transmitters (CSIT) and imperfect channel state information at receiver (CSIR). In order to emphasize the limitation of transmitter cooperation between physically distributed nodes, we focus on the so-called distributed CSIT assumption, i.e. where each transmitter has its individual channel knowledge, while the message can be assumed to be partially or entirely shared a priori between transmitters by exploiting some on-board memory. Under this setup, the first part of the paper characterizes the common message capacity of the channel at hand for arbitrary CSIT and CSIR structure. The optimal scheme builds on Shannon strategies, i.e. optimal codes are constructed by letting the channel inputs be a function of current CSIT only. For a special case when CSIT is a deterministic function of CSIR, the considered scheme also achieves the capacity region of a common message and two private messages. The second part addresses an important instance of the previous general result in a context of a cooperative multi-antenna Gaussian channel under i.i.d. fading operating in frequency-division duplex mode, such that CSIT is acquired via an explicit feedback of perfect CSIR. The capacity of the channel at hand is achieved by distributed linear precoding applied to Gaussian codes. Surprisingly, we demonstrate that it is suboptimal to send a number of data streams bounded by the number of transmit antennas as typically considered in a centralized CSIT setup. Finally, numerical examples are provided to evaluate the sum capacity of the binary MAC with binary states as well as the Gaussian MAC with i.i.d. fading.
Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback. Our algorithms -- DORM, DORM+, and AdaHedgeD -- arise from a novel reduction of delayed online learning to optimistic online learning that reveals how optimistic hints can mitigate the regret penalty caused by delay. We pair this delay-as-optimism perspective with a new analysis of optimistic learning that exposes its robustness to hinting errors and a new meta-algorithm for learning effective hinting strategies in the presence of delay. We conclude by benchmarking our algorithms on four subseasonal climate forecasting tasks, demonstrating low regret relative to state-of-the-art forecasting models.
Communication over a quantum broadcast channel with cooperation between the receivers is considered. The first form of cooperation addressed is classical conferencing, where Receiver 1 can send classical messages to Receiver 2. Another cooperation setting involves quantum conferencing, where Receiver 1 can teleport a quantum state to Receiver 2. When Receiver 1 is not required to recover information and its sole purpose is to help the transmission to Receiver 2, the model reduces to the quantum primitive relay channel. The quantum conferencing setting is intimately related to quantum repeaters, as the sender, Receiver 1, and Receiver 2 can be viewed as the transmitter, the repeater, and the destination receiver, respectively. We develop lower and upper bounds on the capacity region in each setting. In particular, the cutset upper bound and the decode-forward lower bound are derived for the primitive relay channel. Furthermore, we present an entanglement-formation lower bound, where a virtual channel is simulated through the conference link. At last, we show that as opposed to the multiple access channel with entangled encoders, entanglement between decoders does not increase the classical communication rates for the broadcast dual.
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic, convergence rates. We show that the rate of convergence in all cases consists of two terms: (i) a stochastic term which is not affected by the delay, and (ii) a higher order deterministic term which is only linearly slowed down by the delay. Thus, in the presence of noise, the effects of the delay become negligible after a few iterations and the algorithm converges at the same optimal rate as standard SGD. This result extends a line of research that showed similar results in the asymptotic regime or for strongly-convex quadratic functions only. We further show similar results for SGD with more intricate form of delayed gradients---compressed gradients under error compensation and for local~SGD where multiple workers perform local steps before communicating with each other. In all of these settings, we improve upon the best known rates. These results show that SGD is robust to compressed and/or delayed stochastic gradient updates. This is in particular important for distributed parallel implementations, where asynchronous and communication efficient methods are the key to achieve linear speedups for optimization with multiple devices.
We consider the problem of estimating a $d$-dimensional $s$-sparse discrete distribution from its samples observed under a $b$-bit communication constraint. The best-known previous result on $\ell_2$ estimation error for this problem is $O\left( \frac{s\log\left( {d}/{s}\right)}{n2^b}\right)$. Surprisingly, we show that when sample size $n$ exceeds a minimum threshold $n^*(s, d, b)$, we can achieve an $\ell_2$ estimation error of $O\left( \frac{s}{n2^b}\right)$. This implies that when $n>n^*(s, d, b)$ the convergence rate does not depend on the ambient dimension $d$ and is the same as knowing the support of the distribution beforehand. We next ask the question: ``what is the minimum $n^*(s, d, b)$ that allows dimension-free convergence?''. To upper bound $n^*(s, d, b)$, we develop novel localization schemes to accurately and efficiently localize the unknown support. For the non-interactive setting, we show that $n^*(s, d, b) = O\left( \min \left( {d^2\log^2 d}/{2^b}, {s^4\log^2 d}/{2^b}\right) \right)$. Moreover, we connect the problem with non-adaptive group testing and obtain a polynomial-time estimation scheme when $n = \tilde{\Omega}\left({s^4\log^4 d}/{2^b}\right)$. This group testing based scheme is adaptive to the sparsity parameter $s$, and hence can be applied without knowing it. For the interactive setting, we propose a novel tree-based estimation scheme and show that the minimum sample-size needed to achieve dimension-free convergence can be further reduced to $n^*(s, d, b) = \tilde{O}\left( {s^2\log^2 d}/{2^b} \right)$.
We study the problem of boosting the accuracy of a weak learner in the (distribution-independent) PAC model with Massart noise. In the Massart noise model, the label of each example $x$ is independently misclassified with probability $\eta(x) \leq \eta$, where $\eta<1/2$. The Massart model lies between the random classification noise model and the agnostic model. Our main positive result is the first computationally efficient boosting algorithm in the presence of Massart noise that achieves misclassification error arbitrarily close to $\eta$. Prior to our work, no non-trivial booster was known in this setting. Moreover, we show that this error upper bound is best possible for polynomial-time black-box boosters, under standard cryptographic assumptions. Our upper and lower bounds characterize the complexity of boosting in the distribution-independent PAC model with Massart noise. As a simple application of our positive result, we give the first efficient Massart learner for unions of high-dimensional rectangles.
Consider a (multiple-access) wireless communication system where users are connected to a unique base station over a shared-spectrum radio links. Each user has a fixed number $k$ of bits to send to the base station, and his signal gets attenuated by a random channel gain (quasi-static fading). In this paper we consider the many-user asymptotics of Chen-Chen-Guo'2017, where the number of users grows linearly with the blocklength. Differently, though, we adopt a per-user probability of error (PUPE) criterion (as opposed to classical joint-error probability criterion). Under PUPE the finite energy-per-bit communication is possible, and we are able to derive bounds on the tradeoff between energy and spectral efficiencies. We reconfirm the curious behaviour (previously observed for non-fading MAC) of the possibility of almost perfect multi-user interference (MUI) cancellation for user densities below a critical threshold. Further, we demonstrate the suboptimality of standard solutions such as orthogonalization (i.e., TDMA/FDMA) and treating interference as noise (i.e. pseudo-random CDMA without multi-user detection). Notably, the problem treated here can be seen as a variant of support recovery in compressed sensing for the unusual definition of sparsity with one non-zero entry per each contiguous section of $2^k$ coordinates. This identifies our problem with that of the sparse regression codes (SPARCs) and hence our results can be equivalently understood in the context of SPARCs with sections of length $2^{100}$. Finally, we discuss the relation of the almost perfect MUI cancellation property and the replica-method predictions.
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen's empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.