We consider the paradigm of unsupervised anomaly detection, which involves identifying anomalies within a dataset in the absence of labeled examples. Although distance-based methods rank among the top performers for unsupervised anomaly detection, they are highly sensitive to the choice of the number of nearest neighbors. In this paper, we propose a new distance-based algorithm called bagged regularized $k$-distances for anomaly detection (BRDAD), which converts the unsupervised anomaly detection problem into a convex optimization problem. Our BRDAD algorithm selects the weights by minimizing the surrogate risk, i.e., the finite-sample bound on the empirical risk of the bagged weighted $k$-distances for density estimation (BWDDE). This approach addresses the sensitivity of distance-based algorithms to hyperparameter choice. Moreover, the bagging technique incorporated in BRDAD mitigates efficiency issues when dealing with large-scale datasets. On the theoretical side, we establish fast convergence rates for the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces computational complexity. On the practical side, numerical experiments on anomaly detection benchmarks illustrate that our algorithm is far less sensitive to parameter selection than other state-of-the-art distance-based methods, and applying the bagging technique yields promising improvements on real-world datasets.
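As a rough illustration of the bagging and $k$-distance ingredients, the following sketch averages $k$-th nearest-neighbor distances over random subsamples. It does not reproduce BRDAD's weight selection via surrogate-risk minimization; the subsample size `s`, bag count `B`, and neighbor count `k` are illustrative choices.

```python
# Minimal sketch: bagged k-distance anomaly scores (not the full BRDAD method).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def bagged_k_distance_scores(X, k=5, B=10, s=None, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    s = s or max(k + 1, n // B)              # subsample size per bag
    scores = np.zeros(n)
    for _ in range(B):
        idx = rng.choice(n, size=s, replace=False)
        nn = NearestNeighbors(n_neighbors=k).fit(X[idx])
        # distance from every point to its k-th nearest neighbor in the bag;
        # in-bag points match themselves at distance 0, which a sketch tolerates
        dists, _ = nn.kneighbors(X)
        scores += dists[:, -1]
    return scores / B                        # larger score => more anomalous

X = np.vstack([np.random.randn(200, 2), [[8.0, 8.0]]])
print(bagged_k_distance_scores(X, k=5, B=8).argmax())  # likely 200, the outlier
```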
The pathfinding problem, which aims to identify a collision-free path between two points, is crucial for many applications, such as robot navigation and autonomous driving. Classic methods, such as A$^*$ search, perform well on small-scale maps but have difficulty scaling up. Conversely, data-driven approaches can improve pathfinding efficiency but require extensive data labeling and lack theoretical guarantees, which hinders their practical application. To combine the strengths of the two approaches, we adopt the imperative learning (IL) strategy and propose a novel self-supervised pathfinding framework, termed imperative learning-based A$^*$ (iA$^*$). Specifically, iA$^*$ is a bilevel optimization process in which the lower-level optimization finds the optimal path with a differentiable A$^*$ search module, and the upper-level optimization narrows the search space to improve efficiency by setting suitable initial values from a data-driven model. The upper-level model is a fully convolutional network trained with the loss computed in the lower-level optimization, so the framework avoids extensive data labeling and can be applied in diverse environments. Comprehensive experiments on public datasets and simulation environments demonstrate that iA$^*$ surpasses both classical and data-driven methods in pathfinding efficiency and shows superior robustness across different tasks.
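A hedged PyTorch sketch of the bilevel structure: an upper-level network predicts a cost map, and a differentiable lower-level planner returns a loss used to train the network. The paper uses a differentiable A$^*$ module; here a soft value-iteration relaxation stands in for it, and the network, loss weighting, and map size are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_plan_cost(cost, start, goal, iters=60, tau=0.5, big=50.0):
    """Differentiable stand-in for A*: soft value iteration on a 4-connected grid."""
    V = torch.full_like(cost, big)
    goal_mask = torch.zeros_like(cost, dtype=torch.bool)
    goal_mask[goal] = True
    for _ in range(iters):
        Vp = F.pad(V[None, None], (1, 1, 1, 1), value=big)[0, 0]
        nbrs = torch.stack([Vp[:-2, 1:-1], Vp[2:, 1:-1],
                            Vp[1:-1, :-2], Vp[1:-1, 2:]])
        V = -tau * torch.logsumexp(-(nbrs + cost) / tau, dim=0)  # soft-min step
        V = torch.where(goal_mask, torch.zeros_like(V), V)       # pin V(goal)=0
    return V[start]                         # soft cost-to-go from the start cell

fcn = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Softplus())
opt = torch.optim.Adam(fcn.parameters(), lr=1e-3)

grid = torch.zeros(1, 1, 16, 16)            # toy map: 0 = free space
grid[0, 0, 5:12, 8] = 1.0                   # one wall of obstacles
start, goal = (0, 0), (15, 15)

for step in range(100):                     # upper-level (self-supervised) loop
    cost = fcn(grid)[0, 0] + 10.0 * grid[0, 0]   # obstacles stay expensive
    loss = soft_plan_cost(cost, start, goal)     # lower-level planning loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```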
Block orthogonal sparse superposition (BOSS) codes are a class of joint coded modulation methods that can closely approach the finite-blocklength capacity with a low-complexity decoder at a few coding rates over Gaussian channels. For fading channels, however, code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth novel joint demodulation and decoding methods for BOSS codes under fading channels. For a fast fading channel, we present a minimum mean square error approximate maximum a posteriori (MMSE-A-MAP) algorithm for joint demodulation and decoding when channel state information is available at the receiver (CSIR). For a block fading channel scenario, we propose a joint demodulation and decoding method that requires no CSIR, which we refer to as the non-coherent sphere decoding (NSD) algorithm. Simulation results demonstrate that BOSS codes with MMSE-A-MAP decoding outperform CRC-aided polar codes, while NSD achieves performance comparable to quasi-maximum likelihood decoding with significantly reduced complexity. Both decoding algorithms are suitable for parallelization and thus satisfy low-latency constraints. Additionally, real-time simulations on a software-defined radio testbed validate the feasibility of using BOSS codes for low-power transmission.
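For orientation, the sketch below shows only the standard per-symbol MMSE equalization front end under a fast fading channel with CSIR, i.e., $y_i = h_i x_i + n_i$ equalized before any decoding; the approximate-MAP stage of MMSE-A-MAP is paper-specific and not reproduced. The BPSK symbols, SNR, and block length are illustrative.

```python
# Minimal sketch: MMSE equalization under Rayleigh fast fading with CSIR.
import numpy as np

rng = np.random.default_rng(0)
n, snr_db = 128, 10.0
sigma2 = 10 ** (-snr_db / 10)

x = rng.choice([-1.0, 1.0], size=n)                      # toy BPSK symbols
h = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = h * x + noise                                        # per-symbol fading

# Per-symbol MMSE estimate given channel state information at the receiver
x_mmse = np.conj(h) * y / (np.abs(h) ** 2 + sigma2)
print(np.mean(np.sign(x_mmse.real) != x))                # hard-decision BER
```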
This paper develops a new variant of two-time-scale stochastic approximation for finding the roots of two coupled nonlinear operators when only noisy samples of these operators can be observed. Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators from their samples; the estimates produced by these averaging steps are then used in the two-time-scale stochastic approximation updates to find the desired solution. Our main theoretical result shows that, under a strong monotonicity condition on the underlying nonlinear operators, the mean-squared errors of the iterates generated by the proposed method converge to zero at the optimal rate $O(1/k)$, where $k$ is the number of iterations. This significantly improves on existing results for two-time-scale stochastic approximation, where the best known finite-time convergence rate is $O(1/k^{2/3})$. We illustrate the result by applying the proposed method to develop new reinforcement learning algorithms with improved performance.
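The following toy sketch conveys the shape of the scheme: Ruppert-Polyak-style running averages of the noisy operator samples drive two updates on different time scales. The step-size and averaging schedules here are illustrative and need not match the paper's; the target is the root $(x, y) = (1, 1)$ of $F(x, y) = 2x + y - 3$ and $G(x, y) = x + 2y - 3$ observed under Gaussian noise.

```python
# Minimal sketch: averaged two-time-scale stochastic approximation.
import numpy as np

rng = np.random.default_rng(0)
x, y = 0.0, 0.0
f_bar, g_bar = 0.0, 0.0                 # running averages of noisy operators
for k in range(1, 200001):
    F = 2 * x + y - 3 + rng.standard_normal()   # noisy sample of F(x, y)
    G = x + 2 * y - 3 + rng.standard_normal()   # noisy sample of G(x, y)
    lam = 1.0 / k                               # averaging weight
    f_bar += lam * (F - f_bar)
    g_bar += lam * (G - g_bar)
    x -= (0.5 / k ** 0.9) * f_bar               # fast time scale
    y -= (0.5 / k) * g_bar                      # slow time scale
print(round(x, 3), round(y, 3))                 # both should drift toward 1.0
```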
We prove that the single-site Glauber dynamics for sampling proper $q$-colorings mixes in $O_\Delta(n\log n)$ time on line graphs with $n$ vertices and maximum degree $\Delta$ when $q>(1+o(1))\Delta$. The main tool in our proof is the matrix trickle-down theorem developed by Abdolazimi, Liu and Oveis Gharan (FOCS, 2021).
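For concreteness, single-site Glauber dynamics for proper $q$-colorings resamples one vertex at a time from the colors its neighbors do not use. The sketch below runs the chain on a tiny example; the graph, $q$, and step count are illustrative, while the theorem concerns line graphs with $q > (1+o(1))\Delta$.

```python
# Minimal sketch: single-site Glauber dynamics on proper q-colorings.
import random

def glauber_step(coloring, adj, q):
    v = random.randrange(len(adj))
    forbidden = {coloring[u] for u in adj[v]}
    allowed = [c for c in range(q) if c not in forbidden]
    coloring[v] = random.choice(allowed)    # nonempty whenever q > max degree

# The line graph of a triangle is itself a triangle: max degree 2, q = 4
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
coloring = [0, 1, 2]                        # start from a proper coloring
for _ in range(1000):
    glauber_step(coloring, adj, q=4)
print(coloring)                             # an (approximately uniform) sample
```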
Shared gaze visualizations have been found to enhance collaboration and communication outcomes in diverse HCI scenarios, including computer-supported collaborative work and learning contexts. Despite the importance of gaze in surgical operations, especially when a surgeon trainer and trainee need to coordinate their actions, research on using gaze to facilitate intra-operative coordination and instruction has been limited and shows mixed implications. We performed a field observation of 8 surgeries and an interview study with 14 surgeons to understand their visual needs during operations, informing ways to leverage and augment gaze to enhance intra-operative coordination and instruction. We found that trainees have varying needs for visual guidance, which are often unmet by trainers' instructions, and that it is critical for surgeons to control the timing of gaze-based visualizations and to interpret gaze data effectively. We suggest overlay technologies, e.g., gaze-based summaries and depth sensing, to augment raw gaze in support of surgical coordination and instruction.
In this work, maximal $\alpha$-leakage is introduced to quantify how much a quantum adversary can learn about any sensitive information of data upon observing its disturbed version via a quantum privacy mechanism. We first show that an adversary's maximal expected $\alpha$-gain under an optimal measurement is characterized by the measured conditional R\'enyi entropy. This can be viewed as a parametric generalization of K\"onig et al.'s famous guessing probability formula [IEEE Trans. Inf. Theory, 55(9), 2009]. We then prove that the $\alpha$-leakage and maximal $\alpha$-leakage of a quantum privacy mechanism are determined by the measured Arimoto information and the measured R\'enyi capacity, respectively. Various properties of maximal $\alpha$-leakage, such as the data processing inequality and a composition property, are established as well. Moreover, we show that the regularized $\alpha$-leakage and regularized maximal $\alpha$-leakage for identical and independent quantum privacy mechanisms coincide with the $\alpha$-tilted sandwiched R\'enyi information and the sandwiched R\'enyi capacity, respectively.
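For reference, the sandwiched R\'enyi quantities in the last sentence are built on the sandwiched R\'enyi divergence, whose standard definition (well known in the literature, not specific to this paper) is

\[
  \widetilde{D}_\alpha(\rho\,\|\,\sigma)
  \;=\; \frac{1}{\alpha-1}\,
  \log \operatorname{Tr}\!\Bigl[\bigl(\sigma^{\frac{1-\alpha}{2\alpha}}
        \,\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\bigr)^{\alpha}\Bigr],
  \qquad \alpha \in (0,1)\cup(1,\infty).
\]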
Machine unlearning, the process of selectively eliminating the influence of certain data examples used during a model's training, has gained significant attention as a means for practitioners to comply with recent data protection regulations. However, existing unlearning methods face critical drawbacks, including prohibitively high cost, often tied to a large number of hyperparameters, and the limitation of forgetting only relatively small portions of the data. This often makes retraining the model from scratch a quicker and more effective solution. In this study, we introduce Gradient-based and Task-Agnostic machine Unlearning ($\nabla \tau$), an optimization framework designed to remove the influence of a subset of training data efficiently. It applies adaptive gradient ascent to the data to be forgotten while using standard gradient descent on the remaining data. $\nabla \tau$ offers multiple benefits over existing approaches: it can unlearn large sections of the training dataset (up to 30%); it is versatile, supporting various unlearning tasks (such as subset forgetting or class removal) across different domains (images, text, etc.); and, importantly, it requires no hyperparameter adjustments, making it a more appealing option than retraining the model from scratch. We evaluate the framework's effectiveness using well-established membership inference attack metrics, demonstrating performance improvements of up to 10% over state-of-the-art methods without compromising the original model's accuracy.
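A hedged sketch of the core step, ascent on the forget set and descent on the retain set, is given below. The paper's adaptive ascent schedule is not reproduced; `ascent_scale` and the toy linear classifier are illustrative assumptions.

```python
# Minimal sketch: one gradient-based unlearning step (not the full method).
import torch
import torch.nn.functional as F

def unlearning_step(model, opt, forget_batch, retain_batch, ascent_scale=1.0):
    xf, yf = forget_batch
    xr, yr = retain_batch
    loss = (-ascent_scale * F.cross_entropy(model(xf), yf)   # ascend on forget
            + F.cross_entropy(model(xr), yr))                # descend on retain
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with a hypothetical 3-class linear classifier
model = torch.nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
forget = (torch.randn(8, 10), torch.randint(0, 3, (8,)))
retain = (torch.randn(32, 10), torch.randint(0, 3, (32,)))
unlearning_step(model, opt, forget, retain)
```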
Generative commonsense reasoning, which aims to empower machines to generate sentences that reason over a set of concepts, is a critical bottleneck for text generation. Even state-of-the-art pre-trained language generation models struggle at this task and often produce implausible and anomalous sentences. One reason is that they rarely incorporate knowledge graphs, which can provide rich relational information among commonsense concepts. To promote commonsense reasoning in text generation, we propose KG-BART, a novel knowledge graph-augmented pre-trained language generation model that encompasses the complex relations among concepts through the knowledge graph and produces more logical and natural sentences. Moreover, KG-BART leverages graph attention to aggregate rich concept semantics, which enhances generalization to unseen concept sets. Experiments on the CommonGen benchmark verify the effectiveness of our approach against several strong pre-trained language generation models; in particular, KG-BART outperforms BART by 5.80 and 4.60 points in terms of BLEU-3 and BLEU-4, respectively. We also show that context generated by our model can serve as background scenarios to benefit downstream commonsense QA tasks.
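To illustrate the graph-attention aggregation idea, the sketch below lets each concept embedding attend over its knowledge-graph neighbors and absorb their semantics. This is a generic GAT-style layer rather than KG-BART's exact module; the dimensions, scoring function, and residual connection are illustrative.

```python
# Minimal sketch: graph attention over concept embeddings.
import torch
import torch.nn as nn

class ConceptGraphAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, h, adj):
        # h: (n, dim) concept embeddings; adj: (n, n) 0/1 adjacency mask
        z = self.proj(h)
        n = h.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = self.score(pairs).squeeze(-1)            # (n, n) attention logits
        e = e.masked_fill(adj == 0, float('-inf'))   # attend only to neighbors
        attn = torch.nan_to_num(torch.softmax(e, dim=-1))  # isolated nodes -> 0
        return h + attn @ z                          # residual aggregation

layer = ConceptGraphAttention(dim=16)
h = torch.randn(4, 16)                               # 4 concept embeddings
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]])
print(layer(h, adj).shape)                           # torch.Size([4, 16])
```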
Most existing knowledge graphs suffer from incompleteness, which can be alleviated by inferring missing links from known facts. One popular approach is to learn low-dimensional embeddings of entities and relations and use them to make inferences. ConvE, a recently proposed model, applies convolutional filters on 2D reshapings of entity and relation embeddings to capture rich interactions between their components. However, the number of interactions that ConvE can capture is limited. In this paper, we analyze how increasing the number of these interactions affects link prediction performance, and we use our observations to propose InteractE, which is based on three key ideas: feature permutation, a novel feature reshaping, and circular convolution. Through extensive experiments, we find that InteractE outperforms state-of-the-art convolutional link prediction baselines on FB15k-237, and it achieves MRR scores that are 9%, 7.5%, and 23% higher than ConvE's on FB15k-237, WN18RR, and YAGO3-10, respectively. These results validate our central hypothesis that increasing feature interactions benefits link prediction performance. We make the source code of InteractE available to encourage reproducible research.
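The circular-convolution ingredient can be sketched in a few lines: a 2D convolution with wrap-around (circular) padding over a reshaped stacking of entity and relation embeddings. The paper's checkered reshaping and feature permutation are simplified away here; the embedding size and channel count are illustrative.

```python
# Minimal sketch: circular convolution over reshaped embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

ent = torch.randn(1, 200)                    # one entity embedding
rel = torch.randn(1, 200)                    # one relation embedding
stacked = torch.cat([ent, rel], dim=1).view(1, 1, 20, 20)  # simple 2D reshaping

conv = nn.Conv2d(1, 32, kernel_size=3, bias=False)
x = F.pad(stacked, (1, 1, 1, 1), mode='circular')   # wrap-around padding
feats = conv(x)                                     # (1, 32, 20, 20) features
print(feats.shape)
```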
Most existing work on visual question answering (VQA) is dedicated to improving the accuracy of predicted answers while disregarding explanations. We argue that the explanation for an answer is as important as, or even more important than, the answer itself, since it makes the question-answering process more understandable and traceable. To this end, we propose a new task, VQA-E (VQA with Explanation), in which models are required to generate an explanation alongside the predicted answer. We first construct a new dataset and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. A user study validates the quality of the explanations synthesized by our method. We quantitatively show that the additional supervision from explanations not only produces insightful textual sentences justifying the answers but also improves answer prediction, with our model outperforming state-of-the-art methods by a clear margin on the VQA v2 dataset.
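A hedged sketch of the multi-task setup: shared fused image-question features feed both an answer classifier and an explanation decoder, and the two cross-entropy losses are summed. The heads, dimensions, and unit loss weighting are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch: joint answer-prediction and explanation-generation losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQAEModel(nn.Module):
    def __init__(self, fused_dim=256, n_answers=1000, vocab=5000):
        super().__init__()
        self.answer_head = nn.Linear(fused_dim, n_answers)
        self.expl_decoder = nn.GRU(fused_dim, fused_dim, batch_first=True)
        self.word_head = nn.Linear(fused_dim, vocab)

    def forward(self, fused, expl_len):
        ans_logits = self.answer_head(fused)
        steps = fused.unsqueeze(1).expand(-1, expl_len, -1)  # repeat fused state
        out, _ = self.expl_decoder(steps)
        return ans_logits, self.word_head(out)

model = VQAEModel()
fused = torch.randn(4, 256)                  # joint image-question features
ans_logits, word_logits = model(fused, expl_len=12)
answers = torch.randint(0, 1000, (4,))
expl = torch.randint(0, 5000, (4, 12))
loss = (F.cross_entropy(ans_logits, answers)                       # answer task
        + F.cross_entropy(word_logits.reshape(-1, 5000),           # explanation
                          expl.reshape(-1)))
loss.backward()
```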