Social norms fundamentally shape interpersonal communication. We present NormDial, a high-quality dyadic dialogue dataset with turn-by-turn annotations of social norm adherences and violations for Chinese and American cultures. We introduce the task of social norm observance detection; our dataset is synthetically generated in both Chinese and English using a human-in-the-loop pipeline that prompts large language models with a small collection of expert-annotated social norms. Human evaluation shows that our generated dialogues are of high quality, and we further evaluate the performance of existing large language models on this task. Our findings point towards new directions for understanding the nuances of social norms as they manifest in conversational contexts across languages and cultures.
Safety is the primary concern in autonomous driving. Nevertheless, no published dataset currently supports direct and explainable safety evaluation for autonomous driving. In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset with 40K annotated samples. In addition, we propose a new task, end-to-end motion and accident prediction, which can be used to directly evaluate the accident prediction ability of different autonomous driving algorithms. Furthermore, for each scenario, we deploy four vehicles along with one infrastructure unit to record data, thus providing diverse viewpoints for accident scenarios and enabling V2X (vehicle-to-everything) research on perception and prediction tasks. Finally, we present a baseline V2X model named V2XFormer that demonstrates superior performance on motion and accident prediction and 3D object detection compared to the single-vehicle model.
This research presents ADOD, a novel approach to address domain generalization in underwater object detection. Our method enhances the model's ability to generalize across diverse and unseen domains, ensuring robustness in various underwater environments. The first key contribution is Residual Attention YOLOv3, a novel variant of the YOLOv3 framework empowered by residual attention modules. These modules enable the model to focus on informative features while suppressing background noise, leading to improved detection accuracy and adaptability to different domains. The second contribution is the attention-based domain classification module, which is vital during training. This module helps the model identify domain-specific information, facilitating the learning of domain-invariant features. Consequently, ADOD can generalize effectively to underwater environments with distinct visual characteristics. Extensive experiments on diverse underwater datasets demonstrate ADOD's superior performance compared to state-of-the-art domain generalization methods, particularly in challenging scenarios. The proposed model achieves exceptional detection performance in both seen and unseen domains, showcasing its effectiveness in handling domain shifts in underwater object detection tasks. ADOD represents a significant advancement in adaptive object detection, providing a promising solution for real-world applications in underwater environments. With domain shifts prevalent in such settings, the model's strong generalization ability is a valuable asset for practical underwater surveillance and marine research.
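To make the residual attention idea concrete, the following is a minimal PyTorch sketch of a residual attention block: a lightweight mask branch produces per-location weights that modulate the trunk features through a residual connection, so attention refines rather than replaces them. The specific structure (a 1x1 convolution followed by a sigmoid) is an illustrative assumption, not the paper's exact module design.

```python
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    """Minimal residual attention block: a mask branch reweights features,
    and the residual form keeps the original signal intact (generic sketch;
    the paper's actual module may differ)."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual attention: x + x * mask(x)
        return x * (1.0 + self.mask(x))
```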
While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, which contains deidentified structured data from the electronic health records (EHRs) of 6,739 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of CLMBR-T-base, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaptation. Our model and dataset are available via a research data use agreement from our website: //ehrshot.stanford.edu. Code to reproduce our results is available at our GitHub repo: //github.com/som-shahlab/ehrshot-benchmark
Robotic Process Automation (RPA) has gained widespread adoption in corporate organizations, streamlining work processes while also introducing additional maintenance tasks. Effective governance of RPA can be achieved through the reusability of RPA components. However, refactoring RPA processes poses challenges when dealing with larger development teams, outsourcing, and staff turnover. This research explores the possibility of identifying similarities in RPA processes for refactoring. To address this issue, we have developed Similarity Discovering Techniques for RPA (SiDiTeR). SiDiTeR utilizes source code or process logs from RPA automations to search for similar or identical parts within RPA processes. The techniques introduced are specifically tailored to the RPA domain. We have expanded the potential matches by introducing a dictionary feature, which helps identify different activities that produce the same output; this has led to improved results in the RPA domain. Through our analysis, we have discovered 655 matches across 156 processes, with the longest match spanning 163 occurrences in 15 processes. Process similarity within the RPA domain proves to be a viable solution for mitigating the maintenance burden associated with RPA, underscoring the significance of process similarity in the RPA domain.
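As a rough illustration of the matching idea, the sketch below first normalizes activity names through a hypothetical alias dictionary (different activities that produce the same output map onto one canonical token) and then finds the longest shared block of activities between two processes. The dictionary entries, the activity names, and the use of Python's difflib are assumptions for illustration only, not SiDiTeR's actual implementation.

```python
from difflib import SequenceMatcher

# Hypothetical alias dictionary: activities with the same effect are mapped
# to one canonical token before matching (names are purely illustrative).
ALIASES = {"Write Cell": "set_value", "Set Variable": "set_value", "Assign": "set_value"}

def normalize(activities):
    """Replace each activity name by its canonical alias, if one exists."""
    return [ALIASES.get(a, a) for a in activities]

def longest_common_block(proc_a, proc_b):
    """Longest run of identical (alias-normalized) activities shared by two
    RPA processes -- one simple way to surface refactoring candidates."""
    a, b = normalize(proc_a), normalize(proc_b)
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[match.a: match.a + match.size]
```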
Emotion recognition in conversations (ERC), which aims to detect the emotions expressed by speakers during a conversation, is a rapidly evolving task within the natural language processing community. Recently, a growing number of ERC methods have focused on leveraging supervised contrastive learning (SCL) to enhance the robustness and generalizability of learned features. However, current SCL-based approaches in ERC are impeded by the constraint of large batch sizes and the lack of compatibility with most existing ERC models. To address these challenges, we propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL), which eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions. Specifically, we introduce a novel perspective on utilizing label representations by projecting discrete labels into dense embeddings through a shallow multilayer perceptron, and formulate the training objective to maximize the similarity between sample features and their corresponding ground-truth label embeddings, while minimizing the similarity between sample features and label embeddings of disparate classes. Moreover, we innovatively adopt the Soft-HGR maximal correlation as the measure of similarity between sample features and label embeddings, leading to significant performance improvements over conventional similarity measures. Additionally, multimodal cues of utterances are effectively leveraged by SSLCL as data augmentations to boost model performance. Extensive experiments on two ERC benchmark datasets, IEMOCAP and MELD, demonstrate the compatibility and superiority of our proposed SSLCL framework over existing state-of-the-art SCL methods. Our code is available at \url{//github.com/TaoShi1998/SSLCL}.
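A minimal PyTorch sketch of the two ingredients described above: a shallow MLP that projects discrete labels into dense embeddings, and a Soft-HGR-style correlation used as the similarity between sample features and label embeddings. The hidden sizes and the single-random-negative scheme in the loss are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelEmbedding(nn.Module):
    """Shallow MLP projecting one-hot labels into dense embeddings
    (layer sizes are illustrative, not the paper's values)."""
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.num_classes = num_classes
        self.net = nn.Sequential(nn.Linear(num_classes, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        return self.net(F.one_hot(labels, self.num_classes).float())

def soft_hgr_similarity(f: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Soft-HGR-style correlation: mean inner product of centred features
    minus half the trace of the product of their covariance matrices."""
    f = f - f.mean(dim=0, keepdim=True)
    g = g - g.mean(dim=0, keepdim=True)
    inner = (f * g).sum(dim=1).mean()
    cov_f = f.T @ f / (f.size(0) - 1)
    cov_g = g.T @ g / (g.size(0) - 1)
    return inner - 0.5 * torch.trace(cov_f @ cov_g)

def sslcl_loss(features, labels, label_encoder, num_classes):
    """Pull features toward their ground-truth label embeddings and push them
    away from a randomly drawn wrong class (simplified negative sampling)."""
    pos = soft_hgr_similarity(features, label_encoder(labels))
    wrong = (labels + torch.randint(1, num_classes, labels.shape, device=labels.device)) % num_classes
    neg = soft_hgr_similarity(features, label_encoder(wrong))
    return neg - pos
```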
We investigate multiuser uplink communication from multiple single-antenna users to a base station (BS), which is equipped with a movable-antenna (MA) array and adopts zero-forcing receivers to decode the multiple signals. We aim to optimize the MAs' positions at the BS so as to minimize the total transmit power of all users subject to a minimum rate requirement. After applying transformations, we show that the problem is equivalent to minimizing the sum of the reciprocals of the eigenvalues of a matrix, which is a function of all MAs' positions. Subsequently, the projected gradient descent (PGD) method is utilized to find a locally optimal solution. In particular, different from the latest related work, we exploit the eigenvalue decomposition to derive a closed-form gradient for the PGD, which greatly facilitates practical implementation. We demonstrate by simulations that, via careful optimization of all MAs' positions in our proposed design, the total transmit power of all users can be decreased significantly compared with competitive benchmarks.
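To illustrate the reformulated objective, note that the sum of the reciprocals of the eigenvalues of a positive-definite matrix A = H^H H equals tr(A^{-1}), which under zero-forcing tracks the total transmit power. The sketch below evaluates this objective under an assumed line-of-sight steering-vector channel parameterised by the MA positions and takes one projected-gradient step with a finite-difference gradient; the channel model and the numerical gradient are illustrative assumptions, and the closed-form eigen-decomposition gradient derived in the work is not reproduced here.

```python
import numpy as np

def channel(positions: np.ndarray, user_angles: np.ndarray, wavelength: float = 1.0) -> np.ndarray:
    """Toy line-of-sight channel: one steering-vector column per user,
    parameterised by the 1-D MA positions (illustrative assumption)."""
    k = 2 * np.pi / wavelength
    return np.exp(1j * k * np.outer(positions, np.sin(user_angles)))

def objective(positions: np.ndarray, user_angles: np.ndarray) -> float:
    """Sum of reciprocal eigenvalues of A = H^H H, i.e. tr(A^{-1}),
    which tracks the total zero-forcing transmit power."""
    H = channel(positions, user_angles)
    A = H.conj().T @ H
    return float(np.sum(1.0 / np.linalg.eigvalsh(A)))

def pgd_step(positions, user_angles, lr=1e-2, region=(0.0, 10.0), eps=1e-5):
    """One projected-gradient-descent step with a finite-difference gradient
    (a numerical surrogate for the closed-form gradient, for illustration only)."""
    base = objective(positions, user_angles)
    grad = np.zeros_like(positions)
    for i in range(len(positions)):
        shifted = positions.copy()
        shifted[i] += eps
        grad[i] = (objective(shifted, user_angles) - base) / eps
    # Project the updated positions back onto the feasible antenna region.
    return np.clip(positions - lr * grad, *region)
```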
Time series forecasting (TSF) holds significant importance in modern society, spanning numerous domains. Previous representation learning-based TSF algorithms typically embrace a contrastive learning paradigm featuring segregated trend-periodicity representations. Yet, these methodologies disregard the inherent high-impact noise embedded within time series data, resulting in representation inaccuracies and seriously degrading forecasting performance. To address this issue, we propose CLeaRForecast, a novel contrastive learning framework that learns high-purity time series representations through proposed sample, feature, and architecture purifying methods. More specifically, to avoid introducing additional noise through transformations of the original samples (series), transformations are applied separately to the trend and periodic parts, providing better positive samples with markedly less noise. Moreover, we introduce a channel-independent training manner to mitigate noise originating from unrelated variables in the multivariate series. By employing a streamlined deep-learning backbone and a comprehensive global contrastive loss function, we prevent noise introduction due to redundant or uneven learning of periodicity and trend. Experimental results show the superior performance of CLeaRForecast in various downstream TSF tasks.
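One plausible reading of the component-wise augmentation, sketched below under assumed choices (a moving-average trend estimate and mild, component-specific perturbations): the series is decomposed into trend and periodic parts, each part is transformed separately, and the parts are recombined so that positive views avoid the extra noise that transforming the raw series would add. The decomposition window and perturbation magnitudes are illustrative, not the framework's actual settings.

```python
import numpy as np

def decompose(series: np.ndarray, window: int = 25):
    """Split a 1-D series into a moving-average trend and a residual periodic
    part (a standard decomposition; the window length is an assumed choice)."""
    kernel = np.ones(window) / window
    trend = np.convolve(series, kernel, mode="same")
    return trend, series - trend

def purified_views(series: np.ndarray, rng=np.random.default_rng(0)):
    """Build two positive views by perturbing the trend and periodic parts
    separately (jitter on the trend, small scaling on the periodic part)
    instead of transforming the raw series directly."""
    trend, periodic = decompose(series)
    view1 = (trend + 0.01 * rng.standard_normal(len(trend))) + periodic
    view2 = trend + periodic * (1.0 + 0.05 * rng.standard_normal())
    return view1, view2
```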
The pervasive deployment of surveillance cameras produces a massive volume of data, requiring nuanced interpretation. This study thoroughly examines data representation and visualization techniques tailored to AI surveillance data within current infrastructures. It delves into essential data metrics, methods for situational awareness, and various visualization techniques, highlighting their potential to enhance safety and guide urban development. The study builds upon real-world research conducted in a community college environment, utilizing eight cameras over eight days. We present tools such as the Occupancy Indicator, Statistical Anomaly Detection, Bird's Eye View, and Heatmaps to elucidate pedestrian behavior and support surveillance and public safety. Given the intricate data produced by smart video surveillance, such as bounding boxes and segmented images, we aim to convert these computer vision results into intuitive visualizations and actionable insights for stakeholders, including law enforcement, urban planners, and social scientists. The results emphasize the crucial impact of visualizing AI surveillance data on emergency handling, public health protocols, crowd control, resource distribution, predictive modeling, city planning, and informed decision-making.
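As a concrete example of turning raw detections into one of the visualizations mentioned above, the sketch below aggregates bounding boxes into a coarse occupancy heatmap by binning each box's foot point into a grid. The grid size and the foot-point convention are illustrative assumptions rather than the study's exact pipeline.

```python
import numpy as np

def occupancy_heatmap(boxes, frame_shape, grid=(32, 32)):
    """Count the bottom-centre (foot) point of each detection per grid cell,
    producing a simple occupancy heatmap from bounding boxes
    (grid resolution and foot-point choice are assumed)."""
    h, w = frame_shape
    heat = np.zeros(grid)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, y2  # foot point of the pedestrian box
        gx = min(int(cx / w * grid[1]), grid[1] - 1)
        gy = min(int(cy / h * grid[0]), grid[0] - 1)
        heat[gy, gx] += 1
    return heat
```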
Click-through rate (CTR) prediction is a vital task in industrial advertising systems. Most existing methods focus on the structural design of neural networks for better accuracy and suffer from the data sparsity problem. Especially in industrial advertising systems, the negative-sample downsampling widely applied due to resource limitations worsens the problem, resulting in a decline in performance. In this paper, we propose \textbf{A}uxiliary Match \textbf{T}asks for enhancing \textbf{C}lick-\textbf{T}hrough \textbf{R}ate performance (AT4CTR) to alleviate the data sparsity problem. Specifically, we design two match tasks inspired by collaborative filtering to enhance the relevance between the user and the item. As the "click" action is a strong signal that directly indicates a user's preference for an item, the first match task pulls the representations of the user and the item closer together for positive samples. Since a user's past click behaviors can also be treated as a representation of the user, we adopt next-item prediction as the second match task. For both match tasks, we choose the InfoNCE loss from contrastive learning as the loss function. The two match tasks provide meaningful training signals that speed up the model's convergence and alleviate data sparsity. We conduct extensive experiments on a public dataset and a large-scale industrial advertising dataset. The results demonstrate the effectiveness of the proposed auxiliary match tasks. AT4CTR has been deployed in a real industrial advertising system and has yielded remarkable revenue gains.
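A minimal sketch of the InfoNCE objective behind the match tasks, assuming in-batch negatives and a fixed temperature (both illustrative choices rather than the paper's exact configuration): each clicked user-item pair is treated as a positive, and the other items in the batch serve as negatives.

```python
import torch
import torch.nn.functional as F

def infonce_match_loss(user_emb: torch.Tensor, item_emb: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE auxiliary match loss: for each clicked (user, item) pair in the
    batch, the clicked item is the positive and all other in-batch items act
    as negatives (a common formulation; the negative scheme is an assumption)."""
    user = F.normalize(user_emb, dim=-1)
    item = F.normalize(item_emb, dim=-1)
    logits = user @ item.T / temperature            # [B, B] similarity matrix
    targets = torch.arange(user.size(0), device=user.device)
    return F.cross_entropy(logits, targets)         # diagonal entries are positives
```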
Deep neural networks (DNNs) demonstrate outstanding performance across most computer vision tasks. Some critical applications, such as autonomous driving or medical imaging, also require investigation into their behavior and the reasons behind the decisions they make. In this vein, DNN attribution studies the relationship between the predictions of a DNN and its inputs. Attribution methods have been adapted to highlight the most relevant weights or neurons in a DNN, allowing more efficient selection of which weights or neurons can be pruned. However, a limitation of these approaches is that weights are typically compared within each layer separately, while some layers may be more critical than others. In this work, we propose to investigate DNN layer importance, i.e. to estimate the sensitivity of the accuracy w.r.t. perturbations applied at the layer level. To do so, we propose a novel dataset to evaluate our method as well as future work. We benchmark a number of criteria and draw conclusions regarding how to assess DNN layer importance and, consequently, how to allocate a budget across layers for increased DNN efficiency (with applications to DNN pruning and quantization), as well as robustness to hardware failure (e.g. bit swaps).
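One simple instance of a layer-level sensitivity criterion, sketched under assumed choices (Gaussian weight noise scaled by the layer's own standard deviation, accuracy as the metric): a layer's importance is measured as the accuracy drop after perturbing only that layer's weights. The parameter name, noise scale, and evaluation loop are illustrative, not one of the benchmarked criteria verbatim.

```python
import copy
import torch

@torch.no_grad()
def layer_sensitivity(model, layer_name: str, loader, noise_std: float = 0.05, device: str = "cpu"):
    """Estimate a layer's importance as the accuracy drop caused by adding
    Gaussian noise to that layer's weights only (one simple perturbation
    criterion; noise scale and metric are assumed)."""
    def accuracy(m):
        correct = total = 0
        for x, y in loader:
            pred = m(x.to(device)).argmax(dim=-1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
        return correct / total

    baseline = accuracy(model)
    perturbed = copy.deepcopy(model)
    param = dict(perturbed.named_parameters())[layer_name]
    param.add_(noise_std * param.std() * torch.randn_like(param))
    return baseline - accuracy(perturbed)  # larger drop => more important layer
```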