Motivated by the wide range of modern applications of the Erlang-B blocking model beyond communication networks and call centers to sizing and pricing in design production systems, messaging systems, and app-based parking systems, we study admission control for such a system but with unknown arrival and service rates. In our model, at every job arrival, a dispatcher decides to assign the job to an available server or block it. Every served job yields a fixed reward for the dispatcher, but it also results in a cost per unit time of service. Our goal is to design a dispatching policy that maximizes the long-term average reward for the dispatcher based on observing only the arrival times and the state of the system at each arrival that reflects a realistic sampling of such systems. Critically, the dispatcher observes neither the service times nor departure times so that standard reinforcement learning-based approaches that use reward signals do not apply. Hence, we develop our learning-based dispatch scheme as a parametric learning problem a'la self-tuning adaptive control. In our problem, certainty equivalent control switches between an always admit if room policy (explore infinitely often) and a never admit policy (immediately terminate learning), which is distinct from the adaptive control literature. Hence, our learning scheme judiciously uses the always admit if room policy so that learning doesn't stall. We prove that for all service rates, the proposed policy asymptotically learns to take the optimal action and present finite-time regret guarantees. The extreme contrast in the certainty equivalent optimal control policies leads to difficulties in learning that show up in our regret bounds for different parameter regimes: constant regret in one regime versus regret growing logarithmically in the other.
We present LoD-NeuS, an efficient neural representation for high-frequency geometry detail recovery and anti-aliased novel view rendering. Drawing inspiration from voxel-based representations with the level of detail (LoD), we introduce a multi-scale tri-plane-based scene representation that is capable of capturing the LoD of the signed distance function (SDF) and the space radiance. Our representation aggregates space features from a multi-convolved featurization within a conical frustum along a ray and optimizes the LoD feature volume through differentiable rendering. Additionally, we propose an error-guided sampling strategy to guide the growth of the SDF during the optimization. Both qualitative and quantitative evaluations demonstrate that our method achieves superior surface reconstruction and photorealistic view synthesis compared to state-of-the-art approaches.
Extracting semantic representations from mobile user interfaces (UI) and using the representations for designers' decision-making processes have shown the potential to be effective computational design support tools. Current approaches rely on machine learning models trained on small-sized mobile UI datasets to extract semantic vectors and use screenshot-to-screenshot comparison to retrieve similar-looking UIs given query screenshots. However, the usability of these methods is limited because they are often not open-sourced and have complex training pipelines for practitioners to follow, and are unable to perform screenshot set-to-set (i.e., app-to-app) retrieval. To this end, we (1) employ visual models trained with large web-scale images and test whether they could extract a UI representation in a zero-shot way and outperform existing specialized models, and (2) use mathematically founded methods to enable app-to-app retrieval and design consistency analysis. Our experiments show that our methods not only improve upon previous retrieval models but also enable multiple new applications.
Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains. Recently, there has been escalating interest in deploying LLMs for robotics, aiming to harness the power of foundation models in real-world settings. However, this approach faces significant challenges, particularly in grounding these models in the physical world and in generating dynamic robot motions. To address these issues, we introduce a novel paradigm in which we use few-shot prompts collected from the physical environment, enabling the LLM to autoregressively generate low-level control commands for robots without task-specific fine-tuning. Experiments across various robots and environments validate that our method can effectively prompt a robot to walk. We thus illustrate how LLMs can proficiently function as low-level feedback controllers for dynamic motion control even in high-dimensional robotic systems. The project website and source code can be found at: //prompt2walk.github.io/ .
In 5G cellular networks, frequency range 2 (FR2) introduces higher frequencies that cause rapid signal degradation and challenge user mobility. In recent studies, a conditional handover procedure has been adopted as an enhancement to baseline handover to enhance user mobility robustness. In this article, the mobility performance of conditional handover is analyzed for a 5G mm-wave network in FR2 that employs beamforming. In addition, a resource-efficient random access procedure is proposed that increases the probability of contention-free random access during a handover. Moreover, a simple yet effective decision tree-based supervised learning method is proposed to minimize the handover failures that are caused by the beam preparation phase of the random access procedure. Results have shown that a tradeoff exists between contention-free random access and handover failures. It is also seen that the optimum operation point of random access is achievable with the proposed learning algorithm for conditional handover. Moreover, a mobility performance comparison of conditional handover with baseline handover is also carried out. Results have shown that while baseline handover causes fewer handover failures than conditional handover, the total number of mobility failures in the latter is less due to the decoupling of the handover preparation and execution phases.
Desensitization addresses safe optimal planning under parametric uncertainties by providing sensitivity function-based risk measures. This paper expands upon the existing work on desensitization to address safe planning for a class of two-player differential games. In the proposed game, parametric uncertainties correspond to variations in a vector of model parameters about its nominal value. The two players in the proposed formulation are assumed to have information about the nominal value of the parameter vector. However, only one of the players is assumed to have complete knowledge of parametric variation, creating a form of information asymmetry in the proposed game. The lack of knowledge regarding the parametric variations is expected to result in state constraint violations for the player with an information disadvantage. In this regard, a desensitized feedback strategy that provides safe trajectories is proposed for the player with incomplete information. The proposed feedback strategy is evaluated in instances involving one pursuer and one evader with an uncertain dynamic obstacle, where the pursuer is assumed to know only the nominal value of the obstacle's speed. At the same time, the evader knows the obstacle's true speed, and also the fact that the pursuer possesses only the nominal value. Subsequently, deceptive strategies are proposed for the evader, who has an information advantage, and these strategies are assessed against the pursuer's desensitized strategy.
We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions. Our analysis does not require the regression function to be contained in the hypothesis class, and most notably does not employ the conventional \textit{a priori} assumptions on kernel eigendecay. Using the theory of interpolation, we demonstrate that the spectrum of the Mercer operator can be inferred in the presence of ``tight'' $L^{\infty}(\mathcal{X})$ embeddings of suitable Hilbert scales. Our analysis utilizes a new Fourier isocapacitary condition, which captures the interplay of the kernel Dirichlet capacities and small ball probabilities via the optimal Hilbert scale function.
While standard speaker diarization attempts to answer the question "who spoken when", most of relevant applications in reality are more interested in determining "who spoken what". Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words. In this paper, we propose Word-level End-to-End Neural Diarization (WEEND) with auxiliary network, a multi-task learning algorithm that performs end-to-end ASR and speaker diarization in the same neural architecture. That is, while speech is being recognized, speaker labels are predicted simultaneously for each recognized word. Experimental results demonstrate that WEEND outperforms the turn-based diarization baseline system on all 2-speaker short-form scenarios and has the capability to generalize to audio lengths of 5 minutes. Although 3+speaker conversations are harder, we find that with enough in-domain training data, WEEND has the potential to deliver high quality diarized text.
We introduce a distinctive real-time, causal, neural network-based active speaker detection system optimized for low-power edge computing. This system drives a virtual cinematography module and is deployed on a commercial device. The system uses data originating from a microphone array and a 360-degree camera. Our network requires only 127 MFLOPs per participant, for a meeting with 14 participants. Unlike previous work, we examine the error rate of our network when the computational budget is exhausted, and find that it exhibits graceful degradation, allowing the system to operate reasonably well even in this case. Departing from conventional DOA estimation approaches, our network learns to query the available acoustic data, considering the detected head locations. We train and evaluate our algorithm on a realistic meetings dataset featuring up to 14 participants in the same meeting, overlapped speech, and other challenging scenarios.
Quantum density matrix represents all the information of the entire quantum system, and novel models of meaning employing density matrices naturally model linguistic phenomena such as hyponymy and linguistic ambiguity, among others in quantum question answering tasks. Naturally, we argue that applying the quantum density matrix into classical Question Answering (QA) tasks can show more effective performance. Specifically, we (i) design a new mechanism based on Long Short-Term Memory (LSTM) to accommodate the case when the inputs are matrixes; (ii) apply the new mechanism to QA problems with Convolutional Neural Network (CNN) and gain the LSTM-based QA model with the quantum density matrix. Experiments of our new model on TREC-QA and WIKI-QA data sets show encouraging results. Similarly, we argue that the quantum density matrix can also enhance the image feature information and the relationship between the features for the classical image classification. Thus, we (i) combine density matrices and CNN to design a new mechanism; (ii) apply the new mechanism to some representative classical image classification tasks. A series of experiments show that the application of quantum density matrix in image classification has the generalization and high efficiency on different datasets. The application of quantum density matrix both in classical question answering tasks and classical image classification tasks show more effective performance.
Loop closing and relocalization are crucial techniques to establish reliable and robust long-term SLAM by addressing pose estimation drift and degeneration. This article begins by formulating loop closing and relocalization within a unified framework. Then, we propose a novel multi-head network LCR-Net to tackle both tasks effectively. It exploits novel feature extraction and pose-aware attention mechanism to precisely estimate similarities and 6-DoF poses between pairs of LiDAR scans. In the end, we integrate our LCR-Net into a SLAM system and achieve robust and accurate online LiDAR SLAM in outdoor driving environments. We thoroughly evaluate our LCR-Net through three setups derived from loop closing and relocalization, including candidate retrieval, closed-loop point cloud registration, and continuous relocalization using multiple datasets. The results demonstrate that LCR-Net excels in all three tasks, surpassing the state-of-the-art methods and exhibiting a remarkable generalization ability. Notably, our LCR-Net outperforms baseline methods without using a time-consuming robust pose estimator, rendering it suitable for online SLAM applications. To our best knowledge, the integration of LCR-Net yields the first LiDAR SLAM with the capability of deep loop closing and relocalization. The implementation of our methods will be made open-source.