We study the problem of optimal sampling in an edge-based video analytics system (VAS), where sensor samples collected at a terminal device are offloaded to a back-end server that processes them and generates feedback for a user. Sampling the system at the maximum allowed frequency results in the timely detection of relevant events with minimum delay, but it incurs high energy costs and wastes network and compute resources on communicating and processing redundant samples. On the other hand, infrequent sampling results in a higher delay in detecting the relevant event, thus increasing the idle energy usage and degrading the quality of experience in terms of the responsiveness of the system. We quantify this sampling-frequency trade-off as a weighted function of the number of samples and the responsiveness. We propose an energy-optimal aperiodic sampling policy that improves over the state-of-the-art optimal periodic sampling policy. Numerically, we show that the proposed policy provides a consistent improvement of more than 10\% over the state-of-the-art.
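A minimal sketch of the trade-off described above, assuming a toy model in which an event arrives at a random time and the cost is a weighted sum of the number of samples taken and the detection delay; the weights, event distribution, and schedules are illustrative assumptions, not the paper's model or policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_cost(sample_times, event_time, w_s=1.0, w_d=5.0):
    """Assumed cost of one episode: w_s per sample taken up to detection,
    plus w_d per unit of delay between the event and the next sample."""
    sample_times = np.asarray(sample_times)
    after = sample_times[sample_times >= event_time]
    delay = (after[0] - event_time) if after.size else np.inf
    n_samples = np.searchsorted(sample_times, event_time, side="right") + 1
    return w_s * n_samples + w_d * delay

# Compare a periodic schedule against an (arbitrary) aperiodic one whose
# gaps grow over time; both are purely illustrative.
event = rng.exponential(scale=10.0)
periodic = np.arange(0.0, 60.0, 2.0)
aperiodic = np.cumsum(np.linspace(0.5, 4.0, 40))
print("periodic cost :", episode_cost(periodic, event))
print("aperiodic cost:", episode_cost(aperiodic, event))
```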
We consider a multi-process remote estimation system observing $K$ independent Ornstein-Uhlenbeck processes. In this system, a shared sensor samples the $K$ processes so that the long-term average sum mean square error (MSE) is minimized, subject to a total sampling frequency constraint $f_{\max}$. The samples from all processes incur random processing delays in a shared queue and are then transmitted over an erasure channel with erasure probability $\epsilon$. We study two variants of the problem: first, when the samples are scheduled according to a Maximum-Age-First (MAF) policy and the receiver provides erasure status feedback; and second, when the samples are scheduled according to a Round-Robin (RR) policy and there is no erasure status feedback from the receiver. Aided by structural results, we show that the optimal sampling policy for both settings, under some conditions, is a \emph{threshold policy}. We characterize the optimal threshold and the corresponding optimal long-term average sum MSE as functions of $K$, $f_{\max}$, $\epsilon$, and the statistical properties of the observed processes. Our results show that, with exponentially distributed service times, the optimal threshold $\tau^*$ increases as the number of processes $K$ increases, for both settings. Additionally, we show that the optimal threshold is an \emph{increasing} function of $\epsilon$ when erasure status feedback is \emph{available}, while it exhibits the \emph{opposite behavior}, i.e., $\tau^*$ is a \emph{decreasing} function of $\epsilon$, when erasure status feedback is \emph{absent}.
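A minimal simulation sketch of a threshold sampling policy for a single Ornstein-Uhlenbeck process over an erasure channel; it omits the shared queue and service delays and uses assumed parameter values, so it illustrates the policy structure rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, dt, T = 0.5, 1.0, 0.01, 200.0   # OU dX = -theta*X dt + sigma dW
tau, eps = 2.0, 0.2                            # assumed threshold and erasure prob.

x, x_hat, age = 0.0, 0.0, 0.0
mse_acc = 0.0
for _ in range(int(T / dt)):
    x += -theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    age += dt
    if age >= tau:                 # threshold policy: sample when the age exceeds tau
        if rng.random() > eps:     # sample survives the erasure channel
            x_hat, age = x, 0.0
    # MMSE estimate between samples decays toward the process mean (zero here)
    mse_acc += (x - x_hat * np.exp(-theta * age)) ** 2 * dt
print("time-average MSE:", mse_acc / T)
```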
As the Metaverse emerges as the next-generation Internet paradigm, the ability to efficiently generate content is paramount. AI-Generated Content (AIGC) offers a promising solution to this challenge. However, training and deploying large AI models requires significant resources. To address this issue, we introduce an AIGC-as-a-Service (AaaS) architecture, which deploys AIGC models in wireless edge networks to ensure ubiquitous access to AIGC services for Metaverse users. Nonetheless, providing personalized user experiences requires the careful selection of AIGC service providers (ASPs) capable of effectively executing user tasks. This selection process is complicated by environmental uncertainty and variability, a challenge not yet well addressed in the existing literature. Therefore, we first propose a diffusion-model-based AI-generated optimal decision (AGOD) algorithm, which generates optimal ASP selection decisions. We then apply AGOD to deep reinforcement learning (DRL), resulting in the Deep Diffusion Soft Actor-Critic (D2SAC) algorithm, which achieves efficient and effective ASP selection. Our comprehensive experiments demonstrate that D2SAC outperforms seven leading DRL algorithms. Furthermore, the proposed AGOD algorithm can be extended to various optimization problems in wireless networks, positioning it as a promising approach for future research on AIGC-driven services in the Metaverse. The implementation of our proposed method is available at: //github.com/Lizonghang/AGOD.
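A rough sketch of the underlying idea of a diffusion-based decision generator: starting from Gaussian noise, a small network iteratively denoises an action-logit vector conditioned on the environment state. The network shape, noise schedule, update rule, and the name `DenoiseNet` are illustrative assumptions and do not reproduce the released AGOD/D2SAC implementation.

```python
import torch
import torch.nn as nn

class DenoiseNet(nn.Module):
    """Predicts a denoising direction for action logits, conditioned on state."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, noisy_logits, t):
        t_feat = torch.full((state.shape[0], 1), float(t))   # scalar step index
        return self.net(torch.cat([state, noisy_logits, t_feat], dim=-1))

@torch.no_grad()
def generate_decision(model, state, n_actions, steps=10):
    x = torch.randn(state.shape[0], n_actions)    # start from pure noise
    for t in reversed(range(steps)):              # crude reverse (denoising) loop
        x = x - 0.1 * model(state, x, t)
    return torch.softmax(x, dim=-1)               # selection probabilities over ASPs

model = DenoiseNet(state_dim=8, n_actions=5)
probs = generate_decision(model, torch.randn(4, 8), n_actions=5)
print(probs.shape)  # torch.Size([4, 5])
```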
In the media industry, the demand for SDR-to-HDRTV up-conversion arises because users own HDR-WCG (high dynamic range, wide color gamut) TVs while most off-the-shelf footage is still in SDR (standard dynamic range). The research community has started tackling this low-level vision task with learning-based approaches. Yet, when applied to real SDR content, current methods tend to produce dim and desaturated results, offering almost no improvement in viewing experience. Unlike other network-oriented methods, we attribute this deficiency to the training set (the HDR-SDR pairs). Consequently, we propose a new HDRTV dataset (dubbed HDRTV4K) and new HDR-to-SDR degradation models, which are then used to train a luminance-segmented network (LSN) consisting of a global mapping trunk and two Transformer branches operating on the bright and dark luminance ranges. We also update the assessment criteria with tailored metrics and a subjective experiment. Finally, ablation studies are conducted to demonstrate the effectiveness of our design. Our work is available at: //github.com/AndreGuo/HDRTVDM.
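A structural sketch of the luminance-segmented idea only: a global mapping trunk plus two range-specific branches whose outputs are blended by soft dark/bright masks computed from luminance. The masks, blending rule, and the placeholder convolutional branches (standing in for the paper's Transformer branches) are assumptions for illustration.

```python
import torch
import torch.nn as nn

def luminance(x):                       # x: (B, 3, H, W), RGB in [0, 1]
    r, g, b = x[:, 0:1], x[:, 1:2], x[:, 2:3]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

class LSNSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Conv2d(3, 3, 3, padding=1)    # global mapping trunk
        self.dark = nn.Conv2d(3, 3, 3, padding=1)     # dark-range branch (placeholder)
        self.bright = nn.Conv2d(3, 3, 3, padding=1)   # bright-range branch (placeholder)

    def forward(self, x):
        y = luminance(x)
        m_dark = torch.clamp(1.0 - 2.0 * y, 0.0, 1.0)     # soft mask for dark pixels
        m_bright = torch.clamp(2.0 * y - 1.0, 0.0, 1.0)   # soft mask for bright pixels
        return self.trunk(x) + m_dark * self.dark(x) + m_bright * self.bright(x)

out = LSNSketch()(torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```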
Efficient integration of reconfigurable intelligent surfaces (RISs) into the current wireless network standard is not a trivial task due to the overhead generated by channel estimation (CE) and phase-shift optimization. In this paper, we propose a framework enabling the coexistence of orthogonal frequency-division multiplexing (OFDM) and RIS technologies. Instead of spending communication symbols on CE and optimization, the proposed framework exploits the localization information obtainable from RIS-aided communications to provide a robust allocation strategy for user multiplexing. The results demonstrate the effectiveness of the proposed approach with respect to CE-based transmission methods.
We introduce and compare computational techniques for sharp extreme event probability estimates in stochastic differential equations with small additive Gaussian noise. In particular, we focus on strategies that are scalable, i.e., their efficiency does not degrade upon spatial and temporal refinement. For that purpose, we extend algorithms based on the Laplace method for estimating the probability of an extreme event to infinite dimensions. The method estimates the limiting exponential scaling using a single realization of the random variable, the large deviation minimizer. Finding this minimizer amounts to solving an optimization problem governed by a differential equation. The probability estimate becomes sharp when it additionally includes prefactor information, which necessitates computing the determinant of a second derivative operator to evaluate a Gaussian integral around the minimizer. We present an approach in infinite dimensions based on Fredholm determinants, and develop numerical algorithms to compute these determinants efficiently for the high-dimensional systems that arise upon discretization. We also give an interpretation of this approach using Gaussian process covariances and transition tubes. An example model problem, for which we also provide an open-source Python implementation, is used throughout the paper to illustrate all methods discussed. To study the performance of the methods, we consider examples of stochastic differential and stochastic partial differential equations, including the randomly forced incompressible three-dimensional Navier-Stokes equations.
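For orientation, the finite-dimensional Laplace asymptotics that this approach generalizes are shown below (schematic; the notation is not taken from the paper): the exponential rate is set by the minimizer, and the prefactor by the determinant of the second derivative, which becomes a Fredholm determinant in the infinite-dimensional setting.

```latex
% Standard Laplace asymptotics with a unique nondegenerate minimizer x* of S:
\int_{\mathbb{R}^n} e^{-S(x)/\varepsilon}\,\mathrm{d}x
\;\sim\;
(2\pi\varepsilon)^{n/2}\,
\frac{e^{-S(x^{*})/\varepsilon}}{\sqrt{\det \nabla^{2} S(x^{*})}},
\qquad \varepsilon \to 0 .
```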
Interactive image segmentation aims at obtaining a segmentation mask for an image using simple user annotations. During each round of interaction, the segmentation result from the previous round serves as feedback to guide the user's annotation and provides dense prior information for the segmentation model, effectively acting as a bridge between interactions. Existing methods overlook the importance of feedback or simply concatenate it with the original input, leading to underutilization of feedback and an increase in the number of required annotations. To address this, we propose an approach called Focused and Collaborative Feedback Integration (FCFI) to fully exploit the feedback for click-based interactive image segmentation. FCFI first focuses on a local area around the new click and corrects the feedback based on the similarities of high-level features. It then alternately and collaboratively updates the feedback and deep features to integrate the feedback into the features. The efficacy and efficiency of FCFI were validated on four benchmarks, namely GrabCut, Berkeley, SBD, and DAVIS. Experimental results show that FCFI achieved new state-of-the-art performance with less computational overhead than previous methods. The source code is available at //github.com/veizgyauzgyauz/FCFI.
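An illustrative sketch of the "focused correction" step only: inside a window around the new click, the previous-round feedback is pushed toward the click label for pixels whose high-level features are similar to the clicked pixel's feature. The window size, feature map, and blending rule are assumptions, not the FCFI implementation.

```python
import torch
import torch.nn.functional as F

def focused_correction(feedback, feats, click_yx, label, win=24):
    # feedback: (H, W) in [0, 1]; feats: (C, H, W); label: 1.0 (fg) or 0.0 (bg)
    H, W = feedback.shape
    y, x = click_yx
    y0, y1 = max(0, y - win), min(H, y + win)
    x0, x1 = max(0, x - win), min(W, x + win)
    ref = feats[:, y, x].view(-1, 1, 1)                        # clicked pixel's feature
    sim = F.cosine_similarity(feats[:, y0:y1, x0:x1], ref, dim=0).clamp(min=0)
    patch = feedback[y0:y1, x0:x1]
    feedback[y0:y1, x0:x1] = (1 - sim) * patch + sim * label   # blend toward click label
    return feedback

fb = focused_correction(torch.rand(128, 128), torch.randn(64, 128, 128),
                        (40, 60), label=1.0)
print(fb.shape)  # torch.Size([128, 128])
```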
Debiased recommendation with a randomized dataset has shown very promising results in mitigating system-induced biases. However, it still lacks theoretical insights and an ideal optimization objective function compared with the other, more well-studied route that does not use a randomized dataset. To bridge this gap, we study the debiasing problem from a new perspective and propose to directly minimize an upper bound of an ideal objective function, which facilitates a better potential solution to system-induced biases. Firstly, we formulate a new ideal optimization objective function with a randomized dataset. Secondly, according to the prior constraints that an adopted loss function may satisfy, we derive two different upper bounds of the objective function, i.e., a generalization error bound based on the triangle inequality and a generalization error bound based on separability. Thirdly, we show that most existing related methods can be regarded as insufficient optimizations of these two upper bounds. Fourthly, we propose a novel method, called debiasing approximate upper bound with a randomized dataset (DUB), which optimizes these upper bounds more sufficiently. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of our DUB.
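A generic instance of the triangle-inequality bounding pattern, shown only to clarify the reasoning step; the symbols $L_{\mathrm{ideal}}$ and $L_{R}$ are placeholders and the expression is not the paper's specific bound.

```latex
% For any predictor f, an intractable ideal risk is upper-bounded by a
% tractable risk on available (e.g., randomized) data plus a discrepancy term,
% and the right-hand side is then minimized as a surrogate objective:
L_{\mathrm{ideal}}(f)
  \;=\; L_{R}(f) + \bigl(L_{\mathrm{ideal}}(f) - L_{R}(f)\bigr)
  \;\le\; L_{R}(f) + \bigl|\,L_{\mathrm{ideal}}(f) - L_{R}(f)\,\bigr| .
```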
Recent works have explored the use of weight sparsity to improve the training efficiency (test accuracy w.r.t. training FLOPs) of deep neural networks (DNNs). These works aim to reduce training FLOPs, but training with sparse weights often leads to accuracy loss or requires longer training schedules, making the resulting training efficiency unclear. In contrast, we focus on using sparsity to increase accuracy while using the same FLOPs as the dense model, and show training efficiency gains through higher accuracy. In this work, we introduce SIFT, a family of Sparse Iso-FLOP Transformations used as drop-in replacements for dense layers to improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single hyperparameter (the sparsity level) and provides a larger search space in which to find optimal sparse masks. Without changing any training hyperparameters, replacing dense layers with SIFT leads to significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants that use 2x or more FLOPs. To the best of our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models via a simple-to-use set of sparse transformations. Code is available at: //github.com/CerebrasResearch/SIFT.
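A sketch of the iso-FLOP principle only (not the released SIFT code): widen the hidden dimension of an MLP block by $1/(1-s)$ and sparsify its weights with sparsity $s$, so the number of nonzero weights, and hence FLOPs counted on nonzeros, roughly matches the dense block while the representational width grows. The fixed random masks below are a simplification; practical methods select or evolve the masks.

```python
import torch
import torch.nn as nn

class SparseIsoFLOPMLP(nn.Module):
    def __init__(self, d_model, d_hidden, sparsity=0.75):
        super().__init__()
        wide = int(d_hidden / (1.0 - sparsity))          # widened hidden size
        self.w1 = nn.Parameter(torch.randn(wide, d_model) * d_model ** -0.5)
        self.w2 = nn.Parameter(torch.randn(d_model, wide) * wide ** -0.5)
        # Fixed random masks for illustration only.
        self.register_buffer("m1", (torch.rand_like(self.w1) > sparsity).float())
        self.register_buffer("m2", (torch.rand_like(self.w2) > sparsity).float())

    def forward(self, x):
        h = torch.relu(x @ (self.w1 * self.m1).t())
        return h @ (self.w2 * self.m2).t()

block = SparseIsoFLOPMLP(d_model=256, d_hidden=1024, sparsity=0.75)
y = block(torch.randn(8, 256))
nonzeros = int(block.m1.sum() + block.m2.sum())
print(nonzeros, "vs dense", 2 * 256 * 1024)   # roughly equal nonzero weight counts
```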
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
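A toy illustration of the "compress the large model" step: global magnitude pruning followed by simple symmetric int8 weight quantization. The 90% sparsity level, the thresholding rule, and the model used here are arbitrary choices for the example, not the compression recipes evaluated in the work above.

```python
import torch
import torch.nn as nn

def magnitude_prune(model, sparsity=0.9):
    """Zero out the smallest-magnitude weights globally across weight matrices."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    thresh = torch.quantile(weights, sparsity)
    for p in model.parameters():
        if p.dim() > 1:
            p.data.mul_((p.detach().abs() > thresh).float())

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization of a weight matrix."""
    scale = w.abs().max() / 127.0
    return torch.round(w / scale).to(torch.int8), scale

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
magnitude_prune(model, sparsity=0.9)
q, scale = quantize_int8(model[0].weight.detach())
print("kept weights:", int((model[0].weight != 0).sum()), "| int8 scale:", float(scale))
```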
To address the sparsity and cold-start problems of collaborative filtering, researchers usually make use of side information, such as social networks or item attributes, to improve recommendation performance. This paper considers the knowledge graph as the source of side information. To address the limitations of existing embedding-based and path-based methods for knowledge-graph-aware recommendation, we propose Ripple Network, an end-to-end framework that naturally incorporates the knowledge graph into recommender systems. Similar to actual ripples propagating on the surface of water, Ripple Network stimulates the propagation of user preferences over the set of knowledge entities by automatically and iteratively extending a user's potential interests along links in the knowledge graph. The multiple "ripples" activated by a user's historically clicked items are thus superposed to form the preference distribution of the user with respect to a candidate item, which can then be used to predict the final click probability. Through extensive experiments on real-world datasets, we demonstrate that Ripple Network achieves substantial gains in a variety of scenarios, including movie, book, and news recommendation, over several state-of-the-art baselines.
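A compact sketch of the ripple propagation idea under simplified dimensions and scoring: for each hop, the relevance of the triples in the user's ripple set to the candidate item is computed, tail entities are aggregated with those weights, and the hop responses are summed into a user vector used to score the item. Tensor shapes and the specific relevance function are illustrative assumptions.

```python
import torch

def ripple_score(item_emb, ripple_sets):
    # item_emb: (d,); ripple_sets: list over hops of (heads, rels, tails),
    # with heads/tails of shape (n, d) and rels of shape (n, d, d).
    user_vec = torch.zeros_like(item_emb)
    for heads, rels, tails in ripple_sets:
        rh = torch.einsum("nij,nj->ni", rels, heads)        # R_i h_i for each triple
        weights = torch.softmax(rh @ item_emb, dim=0)       # relevance to the item
        user_vec = user_vec + (weights.unsqueeze(-1) * tails).sum(dim=0)
    return torch.sigmoid(user_vec @ item_emb)               # predicted click probability

d, n = 16, 32
hops = [(torch.randn(n, d), torch.randn(n, d, d), torch.randn(n, d)) for _ in range(2)]
print(float(ripple_score(torch.randn(d), hops)))
```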