This paper investigates adaptive streaming codes over a three-node relayed network. In this setting, a source transmits a sequence of message packets through a relay under a delay constraint of $T$ time slots per packet. The source-to-relay and relay-to-destination links are unreliable and introduce at most $N_1$ and $N_2$ packet erasures, respectively. Recent work has proposed adaptive (time-variant) and nonadaptive (time-invariant) code constructions for this setting and has shown that adaptive codes can achieve higher rates. However, the adaptive construction has to account for many possibilities, leading to an impractical code with very large block lengths. In this work, we propose a simplified adaptive code construction which greatly improves the practicality of the code, with only a small cost to the achievable rates. We analyze the construction in terms of the achievable rates and field size requirements, and perform numerical simulations over statistical channels to estimate packet loss probabilities.
We propose a new method for cloth digitalization. Unlike existing methods, which learn from data captured under relatively casual settings, we propose to learn from data captured under strictly tested measuring protocols and to find plausible physical parameters of the cloths. However, such data is currently absent, so we first propose a new dataset with accurate cloth measurements. Further, because of the nature of the data capture process, the dataset is considerably smaller than those typical in current deep learning. To learn from small data, we propose a new Bayesian differentiable cloth model to estimate the complex material heterogeneity of real cloths. It can provide highly accurate digitalization from very limited data samples. Through exhaustive evaluation and comparison, we show our method is accurate in cloth digitalization, efficient in learning from limited data samples, and general in capturing material variations. Code and data are available at https://github.com/realcrane/Bayesian-Differentiable-Physics-for-Cloth-Digitalization
Transport layer data leaks metadata unintentionally -- such as who communicates with whom. While tools for strong transport layer privacy exist, they have adoption obstacles, including performance overheads incompatible with mobile devices. We posit that by changing the objective of metadata privacy for $\textit{all traffic}$, we can open up a new design space for pragmatic approaches to transport layer privacy. As a first step in this direction, we propose using techniques from information flow control and present a principled approach to constructing formal models of systems with metadata privacy for $\textit{some}$, deniable, traffic. We prove that deniable traffic achieves metadata privacy against strong adversaries -- to our knowledge, this constitutes the first bridging of information flow control and anonymous communication. Additionally, we show that existing state-of-the-art protocols can be extended to support metadata privacy, by designing a novel protocol for $\textit{deniable instant messaging}$ (DenIM), which is a variant of the Signal protocol. To show the efficacy of our approach, we implement and evaluate a proof-of-concept instant messaging system running DenIM on top of unmodified Signal. We empirically show that DenIM on Signal can maintain low latency for unmodified Signal traffic without breaking existing features, while at the same time supporting deniable Signal traffic.
In the rapidly evolving landscape of 5G and beyond-5G (B5G) mobile cellular communications, efficient data compression and reconstruction strategies become paramount, especially in massive multiple-input multiple-output (MIMO) systems. A critical challenge in these systems is the capacity-limited fronthaul, particularly in the context of the Ethernet-based common public radio interface (eCPRI) connecting baseband units (BBUs) and remote radio units (RRUs). This capacity limitation hinders the effective handling of increased traffic and data flows. We propose a novel two-stage compression approach to address this bottleneck. The first stage employs sparse Tucker decomposition, targeting the weight tensor's low-rank components for compression. The second stage further compresses these components using complex Givens decomposition and run-length encoding, substantially improving the compression ratio. Our approach specifically targets the Zero-Forcing (ZF) beamforming weights in BBUs. By reconstructing these weights in RRUs, we significantly alleviate the burden on eCPRI traffic, enabling a higher number of concurrent streams in the radio access network (RAN). Through comprehensive evaluations, we demonstrate the superior effectiveness of our method in Channel State Information (CSI) compression, paving the way for more efficient 5G/B5G fronthaul links.
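The run-length encoding step named in the second stage can be illustrated with a minimal sketch. It assumes the components have already been quantized to integers so that repeated values (for example, runs of zeros from a sparse decomposition) appear consecutively; the function names `rle_encode` and `rle_decode` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Same value as the current run: extend it by one.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            # New value: start a fresh run of length one.
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original sequence."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


# Assumed input: quantized coefficients with long runs of zeros.
quantized = [0, 0, 0, 5, 5, 0]
encoded = rle_encode(quantized)
restored = rle_decode(encoded)
```

The encoding is lossless, so any compression gain from this step depends entirely on how many consecutive repeats the preceding stages produce.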
Dense retrieval methods have demonstrated promising performance in multilingual information retrieval, where queries and documents can be in different languages. However, dense retrievers typically require a substantial amount of paired data, which poses even greater challenges in multilingual scenarios. This paper introduces UMR, an Unsupervised Multilingual dense Retriever trained without any paired data. Our approach leverages the sequence likelihood estimation capabilities of multilingual language models to acquire pseudo labels for training dense retrievers. We propose a two-stage framework which iteratively improves the performance of multilingual dense retrievers. Experimental results on two benchmark datasets show that UMR outperforms supervised baselines, showcasing the potential of training multilingual retrievers without paired data, thereby enhancing their practicality. Our source code, data, and models are publicly available at https://github.com/MiuLab/UMR
Mobile networks have increased spectral efficiency through advanced multiplexing strategies that are coordinated by base stations (BS) in licensed spectrum. However, external interference on clients leads to significant performance degradation during dynamic (unlicensed) spectrum access (DSA). We introduce the notion of network tomography for DSA, whereby clients are transformed into spectrum sensors, whose joint access statistics are measured and used to account for interfering sources. Although promising, performing such tomography naively incurs an impractical overhead that scales exponentially with the multiplexing order of the strategies deployed -- which will only continue to grow with 5G/6G technologies. To this end, we propose a novel, scalable network tomography framework called NeTo-X that estimates joint client access statistics with just linear overhead, and forms a blueprint of the interference, thus enabling efficient DSA for future networks. NeTo-X's design incorporates intelligent algorithms that leverage multi-channel diversity and the spatial locality of interference impact on clients to accurately estimate the desired interference statistics from just pairwise measurements of its clients. The merits of the framework are showcased in the context of resource management and jammer localization applications, where it significantly outperforms baseline approaches and closely approximates optimal performance at a scalable overhead.
Real-world data tends to be heavily imbalanced and severely skews data-driven deep neural networks, which makes Long-Tailed Recognition (LTR) a massively challenging task. Existing LTR methods seldom train Vision Transformers (ViTs) with Long-Tailed (LT) data, while the off-the-shelf pretrained weights of ViTs always lead to unfair comparisons. In this paper, we systematically investigate the performance of ViTs in LTR and propose LiVT to train ViTs from scratch with LT data only. Observing that ViTs suffer more severe LTR problems, we conduct Masked Generative Pretraining (MGP) to learn generalized features. With ample and solid evidence, we show that MGP is more robust than supervised pretraining. In addition, Binary Cross Entropy (BCE) loss, which shows conspicuous performance with ViTs, encounters predicaments in LTR. We further propose the balanced BCE (Bal-BCE) to ameliorate it with strong theoretical groundings. Specifically, we derive the unbiased extension of Sigmoid and compensate extra logit margins to deploy it. Our Bal-BCE contributes to the quick convergence of ViTs in just a few epochs. Extensive experiments demonstrate that with MGP and Bal-BCE, LiVT successfully trains ViTs without any additional data and significantly outperforms comparable state-of-the-art methods, e.g., our ViT-B achieves 81.0% Top-1 accuracy on iNaturalist 2018 without bells and whistles. Code is available at https://github.com/XuZhengzhuo/LiVT.
This paper presents a new approach for assembling graph neural networks based on framelet transforms. Framelet transforms provide a multi-scale representation for graph-structured data. With the framelet system, we can decompose the graph feature into low-pass and high-pass frequencies as extracted features for network training, which then defines a framelet-based graph convolution. The framelet decomposition naturally induces a graph pooling strategy by aggregating the graph feature into low-pass and high-pass spectra, which considers both the feature values and geometry of the graph data and conserves the total information. The graph neural networks with the proposed framelet convolution and pooling achieve state-of-the-art performance in many types of node and graph prediction tasks. Moreover, we propose shrinkage as a new activation for the framelet convolution, which thresholds the high-frequency information at different scales. Compared to ReLU, shrinkage in framelet convolution improves the graph neural network model in terms of denoising and signal compression: noise in both nodes and structure can be significantly reduced by accurately cutting off the high-pass coefficients from the framelet decomposition, and the signal can be compressed to less than half its original size with the prediction performance well preserved.
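To make the idea of a shrinkage activation concrete, the sketch below applies soft-thresholding to a list of high-pass coefficients. This is an illustration under assumptions: the abstract does not say which shrinkage rule is used, so soft-thresholding is swapped in as a common choice, and the name `soft_shrink` and the threshold value are hypothetical.

```python
from typing import List


def soft_shrink(coeffs: List[float], threshold: float) -> List[float]:
    """Soft-thresholding: shrink each coefficient toward zero by `threshold`.

    Coefficients with magnitude at or below the threshold become zero;
    larger ones keep their sign and lose `threshold` in magnitude.
    """
    out: List[float] = []
    for c in coeffs:
        magnitude = max(abs(c) - threshold, 0.0)
        out.append(magnitude if c >= 0 else -magnitude)
    return out


# Assumed example: small coefficients (treated as noise) are zeroed out.
high_pass = [-2.0, -0.3, 0.0, 0.5, 1.5]
shrunk = soft_shrink(high_pass, threshold=0.5)
```

Unlike ReLU, which discards all negative values, this rule is symmetric in sign and removes only small-magnitude coefficients, which is what makes it usable for both denoising and compression.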
Recent advances in maximizing mutual information (MI) between the source and target have demonstrated its effectiveness in text generation. However, previous works paid little attention to modeling the backward network of MI (i.e., dependency from the target to the source), which is crucial to the tightness of the variational information maximization lower bound. In this paper, we propose Adversarial Mutual Information (AMI): a text generation framework which is formed as a novel saddle point (min-max) optimization aiming to identify joint interactions between the source and target. Within this framework, the forward and backward networks are able to iteratively promote or demote each other's generated instances by comparing the real and synthetic data distributions. We also develop a latent noise sampling strategy that leverages random variations in the high-level semantic space to enhance the long-term dependency in the generation process. Extensive experiments on different text generation tasks demonstrate that the proposed AMI framework can significantly outperform several strong baselines, and we also show that AMI has the potential to lead to a tighter lower bound of maximum mutual information for the variational information maximization problem.
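For reference, the standard variational information maximization bound can be written as follows. The notation is an assumption introduced here for illustration: $S$ and $T$ denote the source and target, $p$ the true joint distribution, and $q_\phi(s \mid t)$ a backward model from target to source; the abstract does not state the exact form used in the paper.

```latex
\begin{aligned}
I(S;T) &= H(S) - H(S \mid T) \\
       &= H(S) + \mathbb{E}_{p(s,t)}\big[\log p(s \mid t)\big] \\
       &= H(S) + \mathbb{E}_{p(s,t)}\big[\log q_\phi(s \mid t)\big]
          + \mathbb{E}_{p(t)}\Big[\mathrm{KL}\big(p(\cdot \mid t)\,\|\,q_\phi(\cdot \mid t)\big)\Big] \\
       &\ge H(S) + \mathbb{E}_{p(s,t)}\big[\log q_\phi(s \mid t)\big].
\end{aligned}
```

The gap is the expected KL divergence between the true posterior and the backward model, so the bound tightens as $q_\phi(s \mid t)$ approaches $p(s \mid t)$. This is why the quality of the backward network matters for the tightness of the bound.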
This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part. Our method does not require additional annotations of object parts or textures for supervision. Instead, we use the same training data as traditional CNNs. During the learning process, our method automatically assigns an object part of a certain category to each interpretable filter in a high conv-layer. Such explicit knowledge representations in the conv-layers of a CNN help people clarify the logic encoded in the CNN, i.e., answering what patterns the CNN extracts from an input image and uses for prediction. We have tested our method using different benchmark CNNs with various structures to demonstrate its broad applicability. Experiments have shown that our interpretable filters are much more semantically meaningful than traditional filters.
We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at a low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at a high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pinpointed as contributions of our SlowFast concept. We report 79.0% accuracy on the Kinetics dataset without using any pre-training, largely surpassing the previous best results of this kind. On AVA action detection we achieve a new state-of-the-art of 28.3 mAP. Code will be made publicly available.
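The two frame rates can be illustrated by how each pathway might sample frame indices from the same clip. This is a sketch under assumptions: the abstract gives no stride values, so `slow_stride`, `speed_ratio`, and the function name `sample_pathways` are hypothetical parameters chosen for illustration only.

```python
from typing import List, Tuple


def sample_pathways(
    num_frames: int, slow_stride: int = 16, speed_ratio: int = 8
) -> Tuple[List[int], List[int]]:
    """Return frame indices for the Slow and Fast pathways of one clip.

    The Slow pathway takes every `slow_stride`-th frame. The Fast pathway
    samples `speed_ratio` times more densely. Both defaults are assumed
    values, not figures stated in the abstract.
    """
    if slow_stride % speed_ratio != 0:
        raise ValueError("speed_ratio must divide slow_stride evenly")
    fast_stride = slow_stride // speed_ratio
    slow_indices = list(range(0, num_frames, slow_stride))
    fast_indices = list(range(0, num_frames, fast_stride))
    return slow_indices, fast_indices


# Assumed example: a 64-frame clip.
slow, fast = sample_pathways(64)
```

With these assumed values, the Fast pathway processes eight times as many frames as the Slow pathway, which is why it needs a much smaller channel capacity to remain lightweight.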