In this work, we tackle the problem of bandwidth estimation (BWE) for real-time communication systems through expert personalization. While expert heuristic-based methods have been widely adopted, tailoring these methods for each and every end user environment is cumbersome due to the level of domain expertise and manual effort required to adjust the carefully tuned heuristic parameters. Thus. we propose Merlin, a data-driven solution to BWE that harnesses expert demonstrations from prior heuristic-based methods to extract an expert BWE policy. The extracted policy can then be finetuned to end user network conditions to improve user quality of experience (QoE). In real-world videoconferencing calls, Merlin matches our expert's policy with no statistically significant movements in terms of objective QoE metrics. Additionally, we show that personalizing Merlin's control policy is possible through a small number of online data-driven parameter updates.
In this work, we show a fundamental limitation in vocabulary adaptation approaches that use Byte-Pair Encoding (BPE) tokenization scheme for fine-tuning pretrained language models (PLMs) to expert domains. Current approaches trivially append the target domain-specific vocabulary at the end of the PLM vocabulary. This approach leads to a lower priority score and causes sub-optimal tokenization in BPE that iteratively uses merge rules to tokenize a given text. To mitigate this issue, we propose AdaptBPE where the BPE tokenization initialization phase is modified to first perform the longest string matching on the added (target) vocabulary before tokenizing at the character level. We perform an extensive evaluation of AdaptBPE versus the standard BPE over various classification and summarization tasks; AdaptBPE improves by 3.57% (in terms of accuracy) and 1.87% (in terms of Rouge-L), respectively. AdaptBPE for MEDVOC works particularly well when reference summaries have high OOV concentration or are longer in length. We also conduct a human evaluation, revealing that AdaptBPE generates more relevant and more faithful summaries as compared to MEDVOC. We make our codebase publicly available at //github.com/gb-kgp/adaptbpe.
In this work, we tackle the task of learning 3D human Gaussians from a single image, focusing on recovering detailed appearance and geometry including unobserved regions. We introduce a single-view generalizable Human Gaussian Model (HGM), which employs a novel generate-then-refine pipeline with the guidance from human body prior and diffusion prior. Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians, then uses the refined image along with the input image to reconstruct refined human Gaussians. To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch, propagating image features from the SMPL-X volume to the image Gaussians using sparse convolution and attention mechanisms. Given that the initial SMPL-X estimation might be inaccurate, we gradually refine it with our HGM model. We validate our approach on several publicly available datasets. Our method surpasses previous methods in both novel view synthesis and surface reconstruction. Our approach also exhibits strong generalization for cross-dataset evaluation and in-the-wild images.
Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high dimensional data. In this paper, we propose a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex. This class contains a wide range of practically relevant loss functions including linear, quadratic, and generalized linear models. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an algorithm with optimal regret for bandit LQR/LQG problems under a fully adversarial noise model, thereby resolving an open question posed in \citep{gradu2020non} and \citep{sun2023optimal}. Finally, we show that the more general problem of BCO with (non-affine) memory is harder. We derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even under the assumption of smooth and quadratic losses.
We consider the problem of human-focused driver support. State-of-the-art personalization concepts allow to estimate parameters for vehicle control systems or driver models. However, there are currently few approaches proposed that use personalized models and evaluate the effectiveness in the form of general risk warning. In this paper, we therefore propose a warning system that estimates a personalized risk factor for the given driver based on the driver's behavior. The system afterwards is able to adapt the warning signal with personalized Risk Maps. In experiments, we show examples for longitudinal following and intersection scenarios in which the novel warning system can effectively reduce false negative errors and false positive errors compared to a baseline approach which does not use personalized driver considerations. This underlines the potential of personalization for reducing warning errors in risk warning and driver support.
This paper analyzes the outage probability of orthogonal time frequency space (OTFS) modulation under a lossy communication scenario. First of all, we introduce the channel model and the vector form representation of OTFS this paper uses. Then, we derive an exact expression of the OTFS outage probability in lossy communication scenarios, using Shannon's lossy source-channel separation theorem. Because the channel is time-varying, calculating the exact outage probability is computationally expensive. Therefore, this paper aims to derive a lower bound of the outage probability, which can relatively easily be calculated. Thus, given the distortion requirement and number of the resolvable paths, we can obtain a performance limit under the optimal condition as a reference. Finally, the experimental results of outage probability are obtained by Monte-Carlo method, and compared with the theoretical results calculated by the closed-from expression of the lower bound.
This letter introduces weighted sum power (WSP), a new performance metric for wireless resource allocation during cooperative spectrum sharing in cognitive radio networks, where the primary and secondary nodes have different priorities and quality of service (QoS) requirements. Compared to using energy efficiency (EE) and weighted sum energy efficiency (WSEE) as performance metrics and optimization objectives of wireless resource allocation towards green communication, the linear character of WSP can reduce the complexity of optimization problems. Meanwhile, the weights assigned to different nodes are beneficial for managing their power budget. Using WSP as the optimization objective, a suboptimal resource allocation scheme is proposed, leveraging linear programming and Newton's method. Simulations verify that the proposed scheme provides near-optimal performance with low computation time. Furthermore, the initial approximate value selection in Newton's method is also optimized to accelerate the proposed scheme.
Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments. However, existing policies continue to struggle with out-of-distribution scenarios due to the inherent difficulty of collecting sufficient action data to cover extensively diverse domains. While fine-tuning offers a practical way to quickly adapt a GMPs to novel domains and tasks with limited samples, we observe that the performance of the resulting GMPs differs significantly with respect to the design choices of fine-tuning strategies. In this work, we first conduct an in-depth empirical study to investigate the effect of key factors in GMPs fine-tuning strategies, covering the action space, policy head, supervision signal and the choice of tunable parameters, where 2,500 rollouts are evaluated for a single configuration. We systematically discuss and summarize our findings and identify the key design choices, which we believe give a practical guideline for GMPs fine-tuning. We observe that in a low-data regime, with carefully chosen fine-tuning strategies, a GMPs significantly outperforms the state-of-the-art imitation learning algorithms. The results presented in this work establish a new baseline for future studies on fine-tuned GMPs, and provide a significant addition to the GMPs toolbox for the community.
Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.
Recommender System (RS) is a hot area where artificial intelligence (AI) techniques can be effectively applied to improve performance. Since the well-known Netflix Challenge, collaborative filtering (CF) has become the most popular and effective recommendation method. Despite their success in CF, various AI techniques still have to face the data sparsity and cold start problems. Previous works tried to solve these two problems by utilizing auxiliary information, such as social connections among users and meta-data of items. However, they process different types of information separately, leading to information loss. In this work, we propose to utilize Heterogeneous Information Network (HIN), which is a natural and general representation of different types of data, to enhance CF-based recommending methods. HIN-based recommender systems face two problems: how to represent high-level semantics for recommendation and how to fuse the heterogeneous information to recommend. To address these problems, we propose to applying meta-graph to HIN-based RS and solve the information fusion problem with a "matrix factorization (MF) + factorization machine (FM)" framework. For the "MF" part, we obtain user-item similarity matrices from each meta-graph and adopt low-rank matrix approximation to get latent features for both users and items. For the "FM" part, we propose to apply FM with Group lasso (FMG) on the obtained features to simultaneously predict missing ratings and select useful meta-graphs. Experimental results on two large real-world datasets, i.e., Amazon and Yelp, show that our proposed approach is better than that of the state-of-the-art FM and other HIN-based recommending methods.
In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.