Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
We consider channel coding for discrete memoryless channels (DMCs) with a novel cost constraint that constrains both the mean and the variance of the cost of the codewords. We show that the maximum (asymptotically) achievable rate under the new cost formulation is equal to the capacity-cost function; in particular, the strong converse holds. We further characterize the optimal second-order coding rate of these cost-constrained codes; in particular, the optimal second-order coding rate is finite. We then show that the second-order coding performance is strictly improved with feedback using a new variation of timid/bold coding, significantly broadening the applicability of timid/bold coding schemes from unconstrained compound-dispersion channels to all cost-constrained channels. Equivalent results on the minimum average probability of error are also given.
Reconfigurable intelligent surfaces, with their large number of antennas, offer an interesting opportunity for high spatial-resolution imaging. In this paper, we propose a novel RIS-aided integrated imaging and communication system that can reduce the RIS beam training overhead for communication by leveraging the imaging of the surrounding environment. In particular, using the RIS as a wireless imaging device, our system constructs the scene depth map of the environment, including the mobile user. Then, we develop a user detection algorithm that subtracts the background and extracts the mobile user attributes from the depth map. These attributes are then utilized to design the RIS interaction vector and the beam selection strategy with low overhead. Simulation results show that the proposed approach can achieve comparable beamforming gain to the optimal/exhaustive beam selection solution while requiring 1000 times less beam training overhead.
Feature attributions are ubiquitous tools for understanding the predictions of machine learning models. However, popular methods for scoring input variables such as SHAP and LIME suffer from high instability due to random sampling. Leveraging ideas from multiple hypothesis testing, we devise attribution methods that correctly rank the most important features with high probability. Our algorithm RankSHAP guarantees that the $K$ highest Shapley values have the proper ordering with probability exceeding $1-\alpha$. Empirical results demonstrate its validity and impressive computational efficiency. We also build on previous work to yield similar results for LIME, ensuring the most important features are selected in the right order.
Numerical difference computation is one of the cores and indispensable in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first- and second TGD. This property is used to denoise one-dimensional signals, where the noise is the non-smooth points in the sequence. Meanwhile, the center of the gradient in a finite interval can be accurately location via TGD calculation. This solves a traditional challenge in computer vision, which is the precise localization of image edges with noise robustness. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in discrete domain and the significant promise of TGD for the computation across signal processing, image analysis, and video analytic.
Mobile edge computing (MEC) is powerful to alleviate the heavy computing tasks in integrated sensing and communication (ISAC) systems. In this paper, we investigate joint beamforming and offloading design in a three-tier integrated sensing, communication and computation (ISCC) framework comprising one cloud server, multiple mobile edge servers, and multiple terminals. While executing sensing tasks, the user terminals can optionally offload sensing data to either MEC server or cloud servers. To minimize the execution latency, we jointly optimize the transmit beamforming matrices and offloading decision variables under the constraint of sensing performance. An alternating optimization algorithm based on multidimensional fractional programming is proposed to tackle the non-convex problem. Simulation results demonstrates the superiority of the proposed mechanism in terms of convergence and task execution latency reduction, compared with the state-of-the-art two-tier ISCC framework.
Calibrating simulation models that take large quantities of multi-dimensional data as input is a hard simulation optimization problem. Existing adaptive sampling strategies offer a methodological solution. However, they may not sufficiently reduce the computational cost for estimation and solution algorithm's progress within a limited budget due to extreme noise levels and heteroskedasticity of system responses. We propose integrating stratification with adaptive sampling for the purpose of efficiency in optimization. Stratification can exploit local dependence in the simulation inputs and outputs. Yet, the state-of-the-art does not provide a full capability to adaptively stratify the data as different solution alternatives are evaluated. We devise two procedures for data-driven calibration problems that involve a large dataset with multiple covariates to calibrate models within a fixed overall simulation budget. The first approach dynamically stratifies the input data using binary trees, while the second approach uses closed-form solutions based on linearity assumptions between the objective function and concomitant variables. We find that dynamical adjustment of stratification structure accelerates optimization and reduces run-to-run variability in generated solutions. Our case study for calibrating a wind power simulation model, widely used in the wind industry, using the proposed stratified adaptive sampling, shows better-calibrated parameters under a limited budget.
In current text-based task-oriented dialogue (TOD) systems, user emotion detection (ED) is often overlooked or is typically treated as a separate and independent task, requiring additional training. In contrast, our work demonstrates that seamlessly unifying ED and TOD modeling brings about mutual benefits, and is therefore an alternative to be considered. Our method consists in augmenting SimpleToD, an end-to-end TOD system, by extending belief state tracking to include ED, relying on a single language model. We evaluate our approach using GPT-2 and Llama-2 on the EmoWOZ benchmark, a version of MultiWOZ annotated with emotions. Our results reveal a general increase in performance for ED and task results. Our findings also indicate that user emotions provide useful contextual conditioning for system responses, and can be leveraged to further refine responses in terms of empathy.
Graph Neural Networks (GNNs) have received considerable attention on graph-structured data learning for a wide variety of tasks. The well-designed propagation mechanism which has been demonstrated effective is the most fundamental part of GNNs. Although most of GNNs basically follow a message passing manner, litter effort has been made to discover and analyze their essential relations. In this paper, we establish a surprising connection between different propagation mechanisms with a unified optimization problem, showing that despite the proliferation of various GNNs, in fact, their proposed propagation mechanisms are the optimal solution optimizing a feature fitting function over a wide class of graph kernels with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view on surveying the relations between different GNNs, but also further opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naive graph convolutional kernels for feature fitting function, and we further develop two novel objective functions considering adjustable graph kernels showing low-pass or high-pass filtering capabilities respectively. Moreover, we provide the convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform the state-of-the-art methods but also have good ability to alleviate over-smoothing, and further verify the feasibility for designing GNNs with our unified optimization framework.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.