Communication over a classical multiple-access channel (MAC) with entanglement resources is considered, whereby two transmitters share entanglement resources before communication begins. Leditzky et al. (2020) presented an example of a classical MAC, defined in terms of a pseudo-telepathy game, such that the sum rate with entangled transmitters is strictly higher than the best achievable sum rate without such resources. Here, we derive a full characterization of the capacity region for the general MAC with entangled transmitters, and show that the previous result can be obtained as a special case. A single-letter formula is established, involving auxiliary variables and ancillas of finite dimensions. This, in turn, yields a sufficient entanglement rate for achieving the rate region. Furthermore, it has long been known that the capacity region of the classical MAC under a message-average error criterion can be strictly larger than under a maximal error criterion (Dueck, 1978). We observe that given entanglement resources, the two regions coincide.
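For reference, the unentangled baseline against which these gains are measured is the classical MAC capacity region of Ahlswede and Liao: the closure of the set of rate pairs $(R_1, R_2)$ satisfying

\[
R_1 \le I(X_1;Y|X_2,Q), \qquad
R_2 \le I(X_2;Y|X_1,Q), \qquad
R_1 + R_2 \le I(X_1,X_2;Y|Q)
\]

for some input distribution $p(q)\,p(x_1|q)\,p(x_2|q)$. The single-letter formula for the entangled-transmitter region replaces this classical auxiliary structure with finite-dimensional quantum ancillas and is not reproduced here.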
Dataset distillation is the task of synthesizing small datasets from large ones while retaining predictive accuracy comparable to that of the original uncompressed dataset. Despite significant empirical progress in recent years, there is little understanding of the theoretical limitations/guarantees of dataset distillation; specifically, what excess risk does distillation incur compared to the original dataset, and how small can distilled datasets be? In this work, we take a theoretical view of kernel ridge regression (KRR) based methods of dataset distillation, such as Kernel Inducing Points. By reformulating ridge regression in random Fourier features (RFF) space, we provide the first proof of the existence of small distilled datasets and their corresponding excess risk for shift-invariant kernels. We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution on the original data. We further show that a KRR solution can be generated using this distilled set of instances, which approximates the KRR solution optimized on the full input data. The size of this set is linear in the dimension of the RFF space of the input set or, alternatively, near-linear in the number of effective degrees of freedom, which is a function of the kernel, the number of data points, and the regularization parameter $\lambda$. The error bound of this distilled set is also a function of $\lambda$. We verify our bounds analytically and empirically.
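To make the RFF reformulation concrete, here is a minimal sketch (not the paper's distillation algorithm; all sizes and constants are illustrative) of kernel ridge regression solved in a $D$-dimensional random Fourier feature space for the shift-invariant RBF kernel. The solution depends on the data only through the statistics $Z^\top Z$ and $Z^\top y$, which is the structural fact behind distilled sets of size linear in $D$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D, lam, gamma = 500, 5, 200, 1e-2, 0.5

# Toy regression data (sizes and constants are illustrative).
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

# RFF map for the RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2 / 2):
# z(x) = sqrt(2/D) * cos(W x + b), W ~ N(0, gamma I), b ~ U[0, 2*pi],
# so that z(x)^T z(x') approximates k(x, x').
W = rng.normal(scale=np.sqrt(gamma), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Ridge regression in RFF space: the solution depends on the data only
# through Z^T Z and Z^T y, so O(D) well-chosen instances can reproduce it.
theta = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

x_new = rng.normal(size=(1, d))
z_new = np.sqrt(2.0 / D) * np.cos(x_new @ W.T + b)
print("RFF-KRR prediction:", z_new @ theta)
```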
Symbiotic radio (SR) is a promising technology for spectrum- and energy-efficient wireless systems, whose key idea is to use cognitive backscattering communication to achieve mutualistic spectrum and energy sharing with passive backscatter devices (BDs). In this paper, a reconfigurable intelligent surface (RIS) based SR system is considered, where the RIS is used not only to assist the primary active communication, but also for passive communication to transmit its own information. For the considered system, we investigate the energy efficiency (EE) trade-off between active and passive communications by characterizing the EE region. To gain insights, we first derive the maximum achievable individual EEs of the primary transmitter (PT) and the RIS, respectively, and then analyze the asymptotic performance by exploiting the channel hardening effect. To characterize the non-trivial EE trade-off, we formulate an optimization problem to find the Pareto boundary of the EE region by jointly optimizing the transmit beamforming, the power allocation, and the passive beamforming of the RIS. The formulated problem is non-convex, and an efficient algorithm is proposed by decomposing it into a series of subproblems using alternating optimization (AO) and successive convex approximation (SCA) techniques. Finally, simulation results are presented to validate the effectiveness of the proposed algorithm.
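As a deliberately simplified illustration of the EE trade-off (a scalar toy model with made-up gains and circuit powers, standing in for the paper's joint beamforming and power-allocation problem), one can sweep the transmit power and keep the non-dominated (EE_PT, EE_RIS) pairs to trace a Pareto boundary:

```python
import numpy as np

# Scalar toy model of the active/passive EE trade-off. All constants are
# made up for illustration; the paper optimizes beamforming vectors and
# power allocation, not a single scalar power p.
a, c = 5.0, 0.8                    # effective channel gains, PT and RIS links
P_pt_circ, P_ris_circ = 0.5, 0.1   # static circuit powers (W)

def ee_pair(p):
    r_pt = np.log2(1.0 + a * p)     # primary (active) rate, bits/s/Hz
    r_ris = np.log2(1.0 + c * p)    # passive backscatter rate
    ee_pt = r_pt / (p + P_pt_circ)  # PT spends transmit + circuit power
    ee_ris = r_ris / P_ris_circ     # passive RIS spends circuit power only
    return ee_pt, ee_ris

# Sweep the transmit power and keep the non-dominated (Pareto) points.
points = [ee_pair(p) for p in np.linspace(0.01, 10.0, 400)]
pareto = [q for q in points
          if not any(o[0] >= q[0] and o[1] >= q[1] and o != q for o in points)]
print(f"{len(pareto)} Pareto points on the (EE_PT, EE_RIS) boundary")
```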
Fault-aware retraining has emerged as a prominent technique for mitigating permanent faults in Deep Neural Network (DNN) hardware accelerators. However, retraining incurs substantial overhead, especially when used for fine-tuning large DNNs designed to solve complex problems. Moreover, as each fabricated chip can have a distinct fault pattern, fault-aware retraining must be performed for each chip individually, considering its unique fault map, which further aggravates the problem. To reduce the overall retraining cost, in this work we introduce the concept of resilience-driven retraining amount selection. To realize this concept, we propose a novel framework, Reduce, that first computes the resilience of the given DNN to faults at different fault rates and with different amounts of retraining. Then, based on this resilience, it computes the amount of retraining required for each chip considering its unique fault map. We demonstrate the effectiveness of our methodology for a systolic-array-based DNN accelerator experiencing permanent faults in the computational array.
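A minimal sketch of resilience-driven retraining amount selection, under the assumption (with entirely hypothetical numbers) that resilience profiling yields accuracy as a function of fault rate and retraining budget; the chip's fault map then selects the smallest budget meeting a target accuracy:

```python
# Hypothetical resilience profile: accuracy after a given number of
# fault-aware retraining epochs at a given fault rate. All numbers are
# made up for illustration; Reduce would measure these on the real DNN.
resilience = {
    0.01: {0: 0.88, 1: 0.95, 2: 0.97},
    0.05: {0: 0.70, 1: 0.90, 2: 0.95, 4: 0.97},
    0.10: {0: 0.40, 2: 0.85, 4: 0.93, 8: 0.96},
}

def retraining_amount(fault_map, target_acc=0.95):
    """Pick the smallest profiled retraining budget whose accuracy at the
    chip's (conservatively rounded-up) fault rate meets the target."""
    fault_rate = sum(fault_map) / len(fault_map)
    rate = min((r for r in resilience if r >= fault_rate),
               default=max(resilience))
    for epochs in sorted(resilience[rate]):
        if resilience[rate][epochs] >= target_acc:
            return epochs
    return max(resilience[rate])  # fall back to the largest profiled budget

# Example: a chip whose fault map marks 3 faulty PEs out of 100.
chip_fault_map = [1] * 3 + [0] * 97
print("retraining epochs:", retraining_amount(chip_fault_map))  # -> 2
```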
Most existing studies on joint activity detection and channel estimation for grant-free massive random access (RA) systems assume perfect synchronization among all active users, which is hard to achieve in practice. Therefore, this paper considers asynchronous grant-free massive RA systems and develops novel algorithms for joint user activity detection, synchronization delay detection, and channel estimation. In particular, the framework of orthogonal approximate message passing (OAMP) is first utilized to handle the non-independent and identically distributed (non-i.i.d.) pilot matrix arising in asynchronous grant-free massive RA systems, and an OAMP-based algorithm capable of leveraging the common sparsity among the received pilot signals from multiple base station antennas is developed. To reduce the computational complexity, a memory AMP (MAMP)-based algorithm is further proposed that eliminates the matrix inversions in the OAMP-based algorithm. Simulation results demonstrate the effectiveness of the two proposed algorithms over the baseline methods. Moreover, the MAMP-based algorithm reduces the computations by 37% while maintaining comparable detection/estimation accuracy compared with the OAMP-based algorithm.
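For intuition about the message-passing machinery, the following is a textbook AMP iteration for sparse recovery under an i.i.d. Gaussian measurement matrix; it is not the paper's OAMP or MAMP algorithm (which are designed precisely for the non-i.i.d. pilot matrices of the asynchronous setting), but it shows the estimate/residual structure they build on:

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding denoiser for sparse signals.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp(y, A, iters=30, theta=1.5):
    """Textbook AMP for y = A x + noise with sparse x. Unlike the paper's
    OAMP/MAMP algorithms, this assumes an i.i.d. Gaussian A; theta is an
    illustrative threshold-tuning constant."""
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for _ in range(iters):
        tau = np.linalg.norm(z) / np.sqrt(m)        # noise-level estimate
        x_new = soft(x + A.T @ z, theta * tau)      # componentwise denoising
        z = y - A @ x_new + z * np.count_nonzero(x_new) / m  # Onsager term
        x = x_new
    return x

rng = np.random.default_rng(1)
m, n, k = 120, 300, 10
A = rng.normal(size=(m, n)) / np.sqrt(m)            # i.i.d. Gaussian "pilots"
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x0 + 0.01 * rng.normal(size=m)
x_hat = amp(y, A)
print("relative error:", np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))
```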
Suppose that we have $n$ agents and $n$ items which lie in a shared metric space. We would like to match the agents to items such that the total distance from agents to their matched items is as small as possible. However, instead of having direct access to distances in the metric, we only have each agent's ranking of the items in order of distance. Given this limited information, what is the minimum possible worst-case approximation ratio (known as the distortion) that a matching mechanism can guarantee? Previous work by Caragiannis et al. proved that the (deterministic) Serial Dictatorship mechanism has distortion at most $2^n - 1$. We improve this by providing a simple deterministic mechanism that has distortion $O(n^2)$. We also provide the first nontrivial lower bound on this problem, showing that any matching mechanism (deterministic or randomized) must have worst-case distortion $\Omega(\log n)$. In addition to these new bounds, we show that a large class of truthful mechanisms derived from Deferred Acceptance all have worst-case distortion at least $2^n - 1$, and we find an intriguing connection between thin matchings (analogous to the well-known thin trees conjecture) and the distortion gap between deterministic and randomized mechanisms.
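For concreteness, here is the Serial Dictatorship mechanism from the $2^n - 1$ upper bound of Caragiannis et al., which uses only the ordinal rankings (the improved $O(n^2)$-distortion mechanism of this work is different and is not sketched here):

```python
def serial_dictatorship(preferences):
    """preferences[i] is agent i's ranking of item indices, closest first.
    Agents pick in a fixed order; each takes its favorite remaining item.
    This is the mechanism Caragiannis et al. showed has distortion at
    most 2^n - 1; the O(n^2)-distortion mechanism of this paper differs."""
    taken, match = set(), {}
    for agent, ranking in enumerate(preferences):
        for item in ranking:
            if item not in taken:
                match[agent] = item
                taken.add(item)
                break
    return match

# Agents report only ordinal rankings; metric distances stay hidden.
prefs = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
print(serial_dictatorship(prefs))  # -> {0: 0, 1: 2, 2: 1}
```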
While recent studies on semi-supervised learning have shown remarkable progress in leveraging both labeled and unlabeled data, most of them presume a basic setting in which the model is randomly initialized. In this work, we consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm that can utilize both powerful pre-trained models from the source domain and labeled/unlabeled data in the target domain. To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization, which consists of two complementary components: Adaptive Knowledge Consistency (AKC) between the source and target models on selected examples, and Adaptive Representation Consistency (ARC) on the target model between labeled and unlabeled examples. Examples involved in the consistency regularization are adaptively selected according to their potential contributions to the target task. We conduct extensive experiments on several popular benchmarks, including CUB-200-2011, MIT Indoor-67, and MURA, by fine-tuning the ImageNet pre-trained ResNet-50 model. The results show that our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch. Moreover, our algorithm is orthogonal to existing methods and can thus gain additional improvements on top of MixMatch and FixMatch. Our code is available at //github.com/SHI-Labs/Semi-Supervised-Transfer-Learning.
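A simplified sketch of the two consistency terms, written in PyTorch; the confidence threshold, KL form, and mean-alignment ARC surrogate below are illustrative choices, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def akc_loss(src_logits, tgt_logits, conf_thresh=0.8):
    """Adaptive Knowledge Consistency (simplified): keep the pre-trained
    source model's predictions only on examples it is confident about,
    and pull the target model toward them via a KL term."""
    src_prob = F.softmax(src_logits, dim=1)
    mask = src_prob.max(dim=1).values > conf_thresh   # adaptive selection
    kl = F.kl_div(F.log_softmax(tgt_logits, dim=1), src_prob,
                  reduction="none").sum(dim=1)
    return (kl * mask).sum() / mask.sum().clamp(min=1)

def arc_loss(feat_labeled, feat_unlabeled):
    """Adaptive Representation Consistency (simplified surrogate): align
    the mean target-model representations of labeled and unlabeled data."""
    return (feat_labeled.mean(dim=0) - feat_unlabeled.mean(dim=0)).pow(2).sum()

# Usage inside a training step (weights are illustrative hyperparameters):
#   loss = ce_loss + lambda_akc * akc_loss(src_out, tgt_out) \
#        + lambda_arc * arc_loss(feats_l, feats_u)
```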
The information bottleneck (IB) method is a technique for extracting, from a source random variable, the information that is relevant for predicting a target random variable; it is typically implemented by optimizing the IB Lagrangian, which balances compression and prediction terms. However, the IB Lagrangian is hard to optimize, and multiple trials are required to tune the value of the Lagrange multiplier. Moreover, we show that, when optimizing the IB Lagrangian, the prediction performance strictly decreases as the compression gets stronger. In this paper, we implement the IB method from the perspective of supervised disentangling. Specifically, we introduce the Disentangled Information Bottleneck (DisenIB), which consistently compresses the source maximally without loss of target prediction performance (maximum compression). Theoretical and experimental results demonstrate that our method consistently achieves maximum compression and performs well in terms of generalization, robustness to adversarial attack, out-of-distribution detection, and supervised disentangling.
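The IB Lagrangian referred to above is the standard objective of Tishby et al., with representation $Z$, compression term $I(X;Z)$, prediction term $I(Z;Y)$, and multiplier $\beta > 0$ whose value must be tuned by repeated trials:

\[
\min_{p(z|x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;Z) \;-\; \beta\, I(Z;Y).
\]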
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence their estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators such that this error is greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and proposed estimators in estimating the causal quantities. The comparison is tested across a wide range of models, including linear regression, tree-based, and neural network-based models, under simulated datasets that exhibit different levels of causality, degrees of nonlinearity, and distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly large when the causal effects are accounted for correctly.
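To illustrate why adjusting for confounding matters here, the following sketch contrasts the naive difference of means with a standard inverse-propensity-weighted (IPW) estimator on synthetic lending-style data; IPW is a textbook adjustment shown for illustration, not necessarily the estimator proposed in the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_effect(X, t, y):
    """Inverse-propensity-weighted estimate of the average effect of a
    binary credit decision t on repayment y, adjusting for confounders X."""
    e = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]  # propensities
    e = np.clip(e, 0.01, 0.99)                                 # stabilize weights
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

# Synthetic example: a confounder drives both the credit decision and the
# repayment, so the naive difference of means is biased (true effect: 0.5).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 1))
t = (rng.random(5000) < 1 / (1 + np.exp(-2 * X[:, 0]))).astype(int)
y = 0.5 * t + X[:, 0] + rng.normal(scale=0.1, size=5000)
print("naive:", y[t == 1].mean() - y[t == 0].mean(),
      "ipw:", ipw_effect(X, t, y))
```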
Deep learning techniques have received much attention in the area of image denoising. However, there are substantial differences among the various types of deep learning methods dealing with image denoising. Specifically, discriminative learning based on deep learning can ably address the issue of Gaussian noise, while optimization models based on deep learning are effective in estimating real noise. However, there has thus far been little research summarizing the different deep learning techniques for image denoising. In this paper, we offer a comparative study of deep learning techniques for image denoising. We first classify the methods into four categories: deep convolutional neural networks (CNNs) for additive white noisy images, deep CNNs for real noisy images, deep CNNs for blind denoising, and deep CNNs for hybrid noisy images (i.e., combinations of noisy, blurred, and low-resolution images). Then, we analyze the motivations and principles of the different types of deep learning methods. Next, we compare the state-of-the-art methods on public denoising datasets through quantitative and qualitative analysis. Finally, we point out some potential challenges and directions for future research.
Federated learning is a new distributed machine learning framework in which a set of heterogeneous clients collaboratively train a model without sharing their training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such intermittent client availability can significantly degrade the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains a convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg on the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed achieves a sublinear speedup and 4.23% higher test accuracy than FedAvg.
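A minimal sketch of the FedLaAvg update rule (client objectives, the availability pattern, and the constant learning rate below are illustrative; the paper's analysis uses a specific schedule): the server caches each client's most recent gradient and averages over all clients every round, reusing stale gradients for unavailable clients.

```python
import numpy as np

def fed_la_avg(clients, rounds, lr=0.1, dim=10):
    """Sketch of Federated Latest Averaging: cache each client's most
    recent gradient and average over ALL clients every round, reusing
    stale gradients for clients that are currently unavailable."""
    w = np.zeros(dim)
    latest = [np.zeros(dim) for _ in clients]   # cached latest gradients
    for t in range(rounds):
        for i, (grad_fn, avail_fn) in enumerate(clients):
            if avail_fn(t):                     # only available clients
                latest[i] = grad_fn(w)          # refresh their gradients
        w -= lr * np.mean(latest, axis=0)       # average over ALL clients
    return w

# Toy run: client i minimizes ||w - c_i||^2 and is unavailable every
# third round (shifted by i), mimicking intermittent availability.
rng = np.random.default_rng(0)
targets = [rng.normal(size=10) for _ in range(5)]
clients = [(lambda w, c=c: 2 * (w - c), lambda t, i=i: (t + i) % 3 != 0)
           for i, c in enumerate(targets)]
w = fed_la_avg(clients, rounds=200)
print("distance to optimum:", np.linalg.norm(w - np.mean(targets, axis=0)))
```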