Haptic feedback provides a sensory stimulus that is crucial for interacting with and analyzing three-dimensional spatio-temporal phenomena on surface visualizations. Given its ability to provide enhanced spatial perception and scene maneuverability, virtual reality (VR) catalyzes haptic interactions on surface visualizations. Various interaction modes, encompassing both mid-air and on-surface interactions -- with or without the application of assisting force stimuli -- have been explored using haptic force feedback devices. In this paper, we evaluate on-surface and assisted on-surface haptic interaction modes against a no-haptic interaction mode. A force-based haptic stylus is used for all three modalities; the on-surface mode uses collision-based forces, whereas the assisted on-surface mode adds a snapping force. We conducted a within-subjects user study involving fundamental interaction tasks performed on surface visualizations. Keeping a consistent visual design across all three modes, our study incorporates tasks that require locating the highest, lowest, and random points on surfaces, as well as tasks that focus on brushing curves on surfaces with varying complexity and occlusion levels. Our findings show that participants took almost the same time to brush curves across all interaction modes. They drew smoother curves with the on-surface interaction modes than with the no-haptic mode, and the assisted on-surface mode provided better accuracy than the on-surface mode. The on-surface mode was slower for point localization, but accuracy depended on the visual cues and occlusions associated with the tasks. Finally, we discuss participant feedback on using haptic force feedback as a tangible input modality and share takeaways to aid the design of haptics-based tangible interactions for surface visualizations.
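A minimal sketch of how an assisted on-surface force might be composed is shown below; the spring stiffness values, the surface-height function, and the choice of snapping target are illustrative assumptions, not the study's actual force model.

```python
import numpy as np

def assisted_surface_force(stylus_pos, surface_height, snap_target,
                           k_contact=800.0, k_snap=60.0):
    """Illustrative force model: a collision (penalty) force pushing the stylus
    out of the surface, plus a weaker snapping force pulling it toward a feature
    point (e.g., a local maximum). All constants are assumptions."""
    x, y, z = stylus_pos
    h = surface_height(x, y)                   # surface elevation under the stylus
    force = np.zeros(3)
    penetration = h - z
    if penetration > 0:                        # stylus is below the surface
        force[2] += k_contact * penetration    # spring-like penalty force upward
    force += k_snap * (np.asarray(snap_target) - np.asarray(stylus_pos))
    return force

# Example: a sinusoidal test surface with a snapping target near its crest.
surface = lambda x, y: 0.1 * np.sin(x) * np.cos(y)
print(assisted_surface_force((0.2, 0.1, 0.0), surface, (np.pi / 2, 0.0, 0.1)))
```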
The paper introduces an asymptotically optimal lifelong sampling-based path planning algorithm that combines the merits of lifelong planning algorithms and lazy search algorithms for rapid replanning in dynamic environments where edge evaluation is expensive. By evaluating only sub-path candidates for the optimal solution, the algorithm saves considerable evaluation time and thereby reduces the overall planning cost. It employs a novel informed rewiring cascade to efficiently repair the search tree when the underlying search graph changes. Simulation results demonstrate that the algorithm outperforms various state-of-the-art sampling-based planners in addressing both static and dynamic motion planning problems.
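The following sketch illustrates only the lazy-evaluation idea behind such planners (checking expensive edges solely along the current candidate path and caching the results); the helper names are assumptions and this is not the paper's algorithm.

```python
def lazy_validate(path_edges, edge_cost, is_edge_valid, evaluated):
    """Illustrative lazy edge evaluation: check only the edges on the current
    candidate path, caching results so repeated replanning queries stay cheap.
    Returns (total_cost, first_invalid_edge)."""
    total = 0.0
    for edge in path_edges:
        if edge not in evaluated:              # evaluate expensive edges on demand
            evaluated[edge] = is_edge_valid(edge)
        if not evaluated[edge]:
            return float("inf"), edge          # planner must repair around this edge
        total += edge_cost(edge)
    return total, None
```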
Sequential recommendation systems predict the next interaction item based on users' past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches have sought to apply LLMs to sequential recommendation. A common paradigm is converting user behavior sequences into instruction data and fine-tuning the LLM with parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaptation (LoRA). However, the uniform application of LoRA across diverse user behaviors is insufficient to capture individual variability, resulting in negative transfer between disparate sequences. To address this, we propose Instance-wise LoRA (iLoRA). We treat the sequential recommendation task as a form of multi-task learning, integrating LoRA with the Mixture of Experts (MoE) framework. This approach encourages different experts to capture various aspects of user behavior. Additionally, we introduce a sequence-representation-guided gate function that generates customized expert participation weights for each user sequence, allowing dynamic parameter adjustment for instance-wise recommendations. In sequential recommendation, iLoRA achieves an average relative improvement of 11.4% over basic LoRA in the hit ratio metric, with less than a 1% relative increase in trainable parameters. Extensive experiments on three benchmark datasets demonstrate the effectiveness of iLoRA, highlighting its superior performance compared to existing methods in mitigating negative transfer and improving recommendation accuracy. Our data and code are available at https://github.com/AkaliKong/iLoRA.
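A minimal sketch of the gating idea follows, assuming a PyTorch-style module; the expert count, rank, and the way the sequence representation is obtained are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class InstanceWiseLoRA(nn.Module):
    """Illustrative instance-wise LoRA: several low-rank experts whose outputs
    are mixed by a gate conditioned on a per-sequence representation."""
    def __init__(self, d_in, d_out, rank=8, n_experts=4, d_seq=64):
        super().__init__()
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.gate = nn.Linear(d_seq, n_experts)

    def forward(self, x, seq_repr):
        # x: (batch, d_in); seq_repr: (batch, d_seq) summarizing the user sequence
        weights = torch.softmax(self.gate(seq_repr), dim=-1)      # (batch, n_experts)
        expert_out = torch.einsum("bi,eir,ero->beo", x, self.A, self.B)
        return torch.einsum("be,beo->bo", weights, expert_out)    # weighted LoRA delta

x = torch.randn(2, 128)
seq_repr = torch.randn(2, 64)
delta = InstanceWiseLoRA(128, 128)(x, seq_repr)   # added to the frozen layer's output
```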
As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label columns of a table with one or more semantic types. With the recent development of Large Language Models (LLMs), researchers have started to explore the possibility of using LLMs for CTA, leveraging their strong zero-shot capabilities. In this paper, we build on this promising work and improve LLM-based methods for CTA by showing how to use a Knowledge Graph (KG) to augment the context information provided to the LLM. Our approach, called RACOON, combines both pre-trained parametric and non-parametric knowledge during generation to improve LLMs' performance on CTA. Our experiments show that RACOON achieves up to a 0.21 micro F-1 improvement over vanilla LLM inference.
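A rough sketch of how KG-derived context might be spliced into a CTA prompt is given below; the retrieval callable and the prompt wording are placeholders, not RACOON's actual pipeline.

```python
def build_cta_prompt(column_values, kg_lookup, candidate_types):
    """Illustrative KG-augmented prompt for column type annotation: kg_lookup
    (a placeholder) returns knowledge-graph facts for a sample cell value."""
    facts = []
    for value in column_values[:5]:                 # a few sample cells suffice
        facts.extend(kg_lookup(value))
    context = "\n".join(f"- {fact}" for fact in facts) or "- (no KG matches)"
    cells = ", ".join(column_values[:5])
    return (
        f"Column sample values: {cells}\n"
        f"Knowledge graph context:\n{context}\n"
        f"Choose the column's semantic type from: {', '.join(candidate_types)}."
    )

prompt = build_cta_prompt(
    ["Oslo", "Bergen", "Trondheim"],
    lambda v: [f"{v} is an instance of city"],      # stand-in for a real KG query
    ["city", "country", "person"],
)
```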
Many real-world applications of tabular data involve using historic events to predict properties of new ones, for example, whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use cases, for example, that we know the labels of all historic events or that we only predict a pre-specified label and not the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.
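A minimal sketch of the serialization idea follows, assuming events are flattened into field tokens for a standard causal LM; the field names, separators, and toy data are assumptions.

```python
def serialize_events(events):
    """Illustrative flattening of a user's event history into one token stream
    that a causal language model can train on with next-token prediction."""
    tokens = []
    for event in events:                       # events are already time-ordered
        for field, value in event.items():
            tokens.append(f"{field}={value}")
        tokens.append("<eoe>")                 # end-of-event marker
    return tokens

history = [
    {"amount": "23.10", "merchant": "grocery", "label": "legit"},
    {"amount": "980.00", "merchant": "electronics", "label": "fraud"},
]
print(serialize_events(history))
# Next-token prediction over such streams lets one model predict labels,
# impute missing fields, or roll out entire future events.
```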
Remotely sensed imagery interpretation (RSII) faces three major problems: (1) objective representation of spatial distribution patterns; (2) edge uncertainty caused by downsampling encoders and intrinsic edge noise (e.g., mixed pixels and edge occlusion); and (3) false detections caused by geometric registration errors in change detection. To address these problems, we propose UDHF2-Net, the first uncertainty-diffusion-model-based high-frequency transformer network, with the following contributions: (1) a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP) enhances the interaction between spatially stationary and non-stationary frequency-wise features to yield high-fidelity edge extraction. Inspired by HRFormer, SHCP replaces HRFormer's high-resolution stream with a high-frequency stream throughout the encoder-decoder, keeping parallel frequency-wise high-to-low streams, and thus improves edge extraction accuracy by continuously retaining high-frequency information; (2) a mask-and-geo-knowledge-based uncertainty diffusion module (MUDM), a self-supervised learning strategy, improves edge accuracy in both extraction and change detection by gradually removing simulated spectrum noise based on geo-knowledge and the generated diffused spectrum noise; (3) a frequency-wise semi-pseudo-Siamese UDHF2-Net, the first of its kind, balances accuracy and complexity for change detection. Beyond the aforementioned spectrum noise in semantic segmentation, MUDM also serves as a self-supervised strategy to reduce false edge change detections arising from geometric registration errors in the generated imagery.
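As a loose illustration only (not UDHF2-Net itself), one simple way to split a feature map into low- and high-frequency streams is a blur-and-residual decomposition; the kernel size and pooling choice below are assumptions.

```python
import torch
import torch.nn.functional as F

def split_frequency_streams(feat, kernel_size=5):
    """Illustrative frequency split: a local average approximates the
    low-frequency component; the residual keeps edges and fine texture."""
    pad = kernel_size // 2
    low = F.avg_pool2d(feat, kernel_size, stride=1, padding=pad)
    high = feat - low                          # high-frequency (edge-like) residual
    return low, high

feat = torch.randn(1, 32, 64, 64)              # (batch, channels, H, W) feature map
low, high = split_frequency_streams(feat)
```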
Differential privacy upper-bounds the information leakage of machine learning models, yet providing meaningful privacy guarantees has proven challenging in practice. The private prediction setting, where model outputs are privatized, is being investigated as an alternative way to provide formal guarantees at prediction time. Most current private prediction algorithms, however, rely on global sensitivity for noise calibration, which often results in large amounts of noise being added to the predictions. Data-specific noise calibration, such as smooth sensitivity, could significantly reduce the amount of noise added, but has so far been infeasible to compute exactly for modern machine learning models. In this work, we provide a novel and practical approach based on convex relaxation and bound propagation to compute a provable upper bound on the local and smooth sensitivity of a prediction. This bound allows us to reduce the magnitude of the added noise or improve privacy accounting in the private prediction setting. We validate our framework on datasets from financial services, medical image classification, and natural language processing, and across models, and find that our approach reduces the added noise by up to an order of magnitude.
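A small sketch of the calibration step follows, assuming a plain Laplace mechanism with the sensitivity bound taken as given (e.g., from bound propagation); it glosses over the extra adjustments that genuine smooth-sensitivity mechanisms require, and the function names are placeholders.

```python
import numpy as np

def private_prediction(score, sensitivity_bound, epsilon, rng=None):
    """Illustrative Laplace mechanism: the noise scale is proportional to the
    (upper-bounded) sensitivity, so a tighter data-specific bound adds less noise."""
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity_bound / epsilon
    return score + rng.laplace(loc=0.0, scale=scale)

# With a global bound of 1.0 versus a data-specific bound of 0.05, the noise
# scale shrinks 20x at the same privacy budget epsilon = 1.0.
print(private_prediction(0.73, sensitivity_bound=1.0, epsilon=1.0))
print(private_prediction(0.73, sensitivity_bound=0.05, epsilon=1.0))
```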
Frequently updating Large Language Model (LLM)-based recommender systems to adapt to new user interests -- as is done for traditional ones -- is impractical due to high training costs, even with acceleration methods. This work explores adapting to dynamic user interests without any model updates by leveraging In-Context Learning (ICL), which allows LLMs to learn new tasks from few-shot examples provided in the input. By using new-interest examples as the ICL few-shot examples, LLMs can learn real-time interests directly, avoiding the need for model updates. However, existing LLM-based recommenders often lose their in-context learning ability during recommendation tuning, while the original LLM's in-context learning lacks recommendation-specific focus. To address this, we propose RecICL, which customizes recommendation-specific in-context learning for real-time recommendations. RecICL organizes training examples in an in-context learning format, ensuring that the in-context learning ability is preserved and aligned with the recommendation task during tuning. Extensive experiments demonstrate RecICL's effectiveness in delivering real-time recommendations without requiring model updates. Our code is available at https://github.com/ym689/rec_icl.
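A rough sketch of assembling recommendation few-shot examples in an ICL format is shown below; the exact template RecICL uses is not specified here, so the wording and item domain are assumptions.

```python
def build_recommendation_prompt(recent_examples, current_history, candidates):
    """Illustrative in-context-learning prompt: recent (history -> chosen item)
    pairs act as few-shot demonstrations of the user's current interests."""
    demos = []
    for history, chosen in recent_examples:
        demos.append(f"Watched: {', '.join(history)}\nNext choice: {chosen}")
    return (
        "\n\n".join(demos)
        + f"\n\nWatched: {', '.join(current_history)}"
        + f"\nCandidates: {', '.join(candidates)}"
        + "\nNext choice:"
    )

prompt = build_recommendation_prompt(
    [(["Dune", "Arrival"], "Interstellar")],
    ["Blade Runner 2049", "Interstellar"],
    ["Tenet", "The Martian", "Gravity"],
)
print(prompt)
```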
In task-oriented dialogue systems, intent detection is crucial for interpreting user queries and providing appropriate responses. Existing research primarily addresses simple queries with a single intent, lacking effective systems for handling complex queries with multiple intents and extracting the corresponding intent spans. Additionally, there is a notable absence of multilingual, multi-intent datasets. This study addresses three critical tasks: extracting multiple intent spans from queries, detecting multiple intents, and developing a multilingual multi-label intent dataset. We introduce a novel multi-label multi-class intent detection dataset (MLMCID-dataset) curated from existing benchmark datasets. We also propose a pointer network-based architecture (MLMCID) to extract intent spans and detect multiple intents with coarse- and fine-grained labels in the form of sextuplets. Comprehensive analysis demonstrates the superiority of our pointer network-based system over baseline approaches in terms of accuracy and F1-score across various datasets.
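A minimal sketch of a pointer-style span head follows, assuming token encodings are already available from some encoder; it illustrates only the start/end pointer idea, not the MLMCID architecture.

```python
import torch
import torch.nn as nn

class SpanPointerHead(nn.Module):
    """Illustrative pointer head: scores every token as a span start and end,
    from which one span per predicted intent can be decoded."""
    def __init__(self, d_model):
        super().__init__()
        self.start_scorer = nn.Linear(d_model, 1)
        self.end_scorer = nn.Linear(d_model, 1)

    def forward(self, token_states):
        # token_states: (batch, seq_len, d_model) from any encoder
        start_logits = self.start_scorer(token_states).squeeze(-1)
        end_logits = self.end_scorer(token_states).squeeze(-1)
        return start_logits, end_logits

states = torch.randn(2, 12, 256)
start_logits, end_logits = SpanPointerHead(256)(states)
span = (start_logits[0].argmax().item(), end_logits[0].argmax().item())
```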
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
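A small sketch of endpoint-based span representations shared by several task heads is given below; the dimensions and the pooling choice are assumptions rather than SciIE's exact formulation.

```python
import torch
import torch.nn as nn

def span_representation(token_states, spans):
    """Illustrative shared span representation: concatenate the start and end
    token states of each span; entity, relation, and coreference heads can all
    consume the same tensor."""
    starts = token_states[:, spans[:, 0], :]    # (batch, n_spans, d_model)
    ends = token_states[:, spans[:, 1], :]
    return torch.cat([starts, ends], dim=-1)    # (batch, n_spans, 2 * d_model)

token_states = torch.randn(1, 20, 128)
spans = torch.tensor([[2, 4], [7, 7], [10, 15]])
reps = span_representation(token_states, spans)
entity_head = nn.Linear(256, 6)                 # e.g., 6 entity types incl. "none"
entity_logits = entity_head(reps)
```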
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristics of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we propose an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we use a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of the large amount of unlabeled data, we adopt a conditional random field to refine the preliminary classification results generated by the GAN. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieves encouraging classification accuracy using a small number of training samples.
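As a loose illustration of spectral-spatial processing (not the paper's network), the sketch below shows a discriminator-style classifier whose 3-D convolutions slide over both the spectral axis and the spatial patch of a hyperspectral cube; the patch size, band count, and channel widths are assumptions.

```python
import torch
import torch.nn as nn

class SpectralSpatialDiscriminator(nn.Module):
    """Illustrative spectral-spatial discriminator: 3-D convolutions jointly
    process bands and spatial neighborhoods; the output has one logit per land
    cover class plus a "fake" logit, as in semi-supervised GAN classification."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.LeakyReLU(0.2),
            nn.Conv3d(8, 16, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(16, n_classes + 1)   # classes + "fake" output

    def forward(self, cube):
        # cube: (batch, 1, bands, height, width) hyperspectral patch
        return self.classifier(self.features(cube).flatten(1))

logits = SpectralSpatialDiscriminator(n_classes=9)(torch.randn(2, 1, 103, 9, 9))
```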