VTubers, or Virtual YouTubers, are live streamers who create streaming content using animated 2D or 3D virtual avatars. In recent years, there has been a significant increase in the number of VTuber creators and viewers across the globe. This practise has drawn research attention into topics such as viewers' engagement behaviors and perceptions, however, as animated avatars offer more identity and performance flexibility than traditional live streaming where one uses their own body, little research has focused on how this flexibility influences how creators present themselves. This research thus seeks to fill this gap by presenting results from a qualitative study of 16 Chinese-speaking VTubers' streaming practices. The data revealed that the virtual avatars that were used while live streaming afforded creators opportunities to present themselves using inflated presentations and resulted in inclusive interactions with viewers. The results also unveiled the inflated, and often sexualized, gender expressions of VTubers while they were situated in misogynistic environments. The socio-technical facets of VTubing were found to potentially reduce sexual harassment and sexism, whilst also raising self-objectification concerns.
This paper presents a novel approach to address the Entity Recognition and Linking Challenge at NLPCC 2015. The task involves extracting named entity mentions from short search queries and linking them to entities within a reference Chinese knowledge base. To tackle this problem, we first expand the existing knowledge base and utilize external knowledge to identify candidate entities, thereby improving the recall rate. Next, we extract features from the candidate entities and utilize Support Vector Regression and Multiple Additive Regression Tree as scoring functions to filter the results. Additionally, we apply rules to further refine the results and enhance precision. Our method is computationally efficient and achieves an F1 score of 0.535.
We propose MAMo, a novel memory and attention frame-work for monocular video depth estimation. MAMo can augment and improve any single-image depth estimation networks into video depth estimation models, enabling them to take advantage of the temporal information to predict more accurate depth. In MAMo, we augment model with memory which aids the depth prediction as the model streams through the video. Specifically, the memory stores learned visual and displacement tokens of the previous time instances. This allows the depth network to cross-reference relevant features from the past when predicting depth on the current frame. We introduce a novel scheme to continuously update the memory, optimizing it to keep tokens that correspond with both the past and the present visual information. We adopt attention-based approach to process memory features where we first learn the spatio-temporal relation among the resultant visual and displacement memory tokens using self-attention module. Further, the output features of self-attention are aggregated with the current visual features through cross-attention. The cross-attended features are finally given to a decoder to predict depth on the current frame. Through extensive experiments on several benchmarks, including KITTI, NYU-Depth V2, and DDAD, we show that MAMo consistently improves monocular depth estimation networks and sets new state-of-the-art (SOTA) accuracy. Notably, our MAMo video depth estimation provides higher accuracy with lower latency, when omparing to SOTA cost-volume-based video depth models.
Hardware Reverse Engineering (HRE) is a technique for analyzing Integrated Circuits (ICs). Experts employ HRE for various security-critical tasks, such as design verification or the detection of intellectual property violations. However, HRE also enables threat actors to subvert the security of an IC. Previous studies have shown that analysts rely heavily on their cognitive abilities to perform HRE as no fully automated solutions exist. Therefore, conducting controlled experimental studies to assess the cognitive processes involved in HRE could open new avenues for hardware protection. However, researchers have faced the methodological challenge that HRE experts are largely unavailable for such empirical research. To address this scarcity, we have developed REVERSIM, a game-based simulation that mimics realistic HRE subprocesses and is specifically designed to require no prior knowledge. To support these claims, we conducted two empirical studies: First, we performed semi-structured interviews with 14 professionals and researchers from the HRE domain, who attested to the comparability of REVERSIM to real-world HRE problems. Second, we conducted a user study involving 89 non-expert participants, demonstrating that participants could engage in the simulation without prior knowledge in HRE or related domains. Finally, we outline several research directions for experiments with REVERSIM, highlighting its potential in advancing HRE research.
Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit meticulously crafted prompts to elicit content that violates service guidelines, have captured the attention of research communities. While model owners can defend against individual jailbreak prompts through safety training strategies, this relatively passive approach struggles to handle the broader category of similar jailbreaks. To tackle this issue, we introduce FuzzLLM, an automated fuzzing framework designed to proactively test and discover jailbreak vulnerabilities in LLMs. We utilize templates to capture the structural integrity of a prompt and isolate key features of a jailbreak class as constraints. By integrating different base classes into powerful combo attacks and varying the elements of constraints and prohibited questions, FuzzLLM enables efficient testing with reduced manual effort. Extensive experiments demonstrate FuzzLLM's effectiveness and comprehensiveness in vulnerability discovery across various LLMs.
We present Serberus, the first comprehensive mitigation for hardening constant-time (CT) code against Spectre attacks (involving the PHT, BTB, RSB, STL and/or PSF speculation primitives) on existing hardware. Serberus is based on three insights. First, some hardware control-flow integrity (CFI) protections restrict transient control-flow to the extent that it may be comprehensively considered by software analyses. Second, conformance to the accepted CT code discipline permits two code patterns that are unsafe in the post-Spectre era. Third, once these code patterns are addressed, all Spectre leakage of secrets in CT programs can be attributed to one of four classes of taint primitives--instructions that can transiently assign a secret value to a publicly-typed register. We evaluate Serberus on cryptographic primitives in the OpenSSL, Libsodium, and HACL* libraries. Serberus introduces 21.3% runtime overhead on average, compared to 24.9% for the next closest state-of-the-art software mitigation, which is less secure.
We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to utilize standard hugepages (hp), but further investigation allowed us to utilize hp for multiple compilers by linking to the Fujitsu library libmpg and transparent hugepages (thp) by enabling it at the node level. By comparing the results of hardware counters and in-code timers, we found that hp and thp do not significantly impact the runtime performance of FLASH. Interestingly, there is a significant reduction in the TLB misses, differences in cache and memory access counters, and strange behavior is observed when using thp.
Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.
The Right to be Forgotten (RTBF) was first established as the result of the ruling of Google Spain SL, Google Inc. v AEPD, Mario Costeja Gonz\'alez, and was later included as the Right to Erasure under the General Data Protection Regulation (GDPR) of European Union to allow individuals the right to request personal data be deleted by organizations. Specifically for search engines, individuals can send requests to organizations to exclude their information from the query results. It was a significant emergent right as the result of the evolution of technology. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. But they are not excluded from the RTBF. Compared with the indexing approach used by search engines, LLMs store, and process information in a completely different way. This poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of differential privacy, machine unlearning, model editing, and prompt engineering. With the rapid advancement of AI and the increasing need of regulating this powerful technology, learning from the case of RTBF can provide valuable lessons for technical practitioners, legal experts, organizations, and authorities.
We present a novel method, Aerial Diffusion, for generating aerial views from a single ground-view image using text guidance. Aerial Diffusion leverages a pretrained text-image diffusion model for prior knowledge. We address two main challenges corresponding to domain gap between the ground-view and the aerial view and the two views being far apart in the text-image embedding manifold. Our approach uses a homography inspired by inverse perspective mapping prior to finetuning the pretrained diffusion model. Additionally, using the text corresponding to the ground-view to finetune the model helps us capture the details in the ground-view image at a relatively low bias towards the ground-view image. Aerial Diffusion uses an alternating sampling strategy to compute the optimal solution on complex high-dimensional manifold and generate a high-fidelity (w.r.t. ground view) aerial image. We demonstrate the quality and versatility of Aerial Diffusion on a plethora of images from various domains including nature, human actions, indoor scenes, etc. We qualitatively prove the effectiveness of our method with extensive ablations and comparisons. To the best of our knowledge, Aerial Diffusion is the first approach that performs ground-to-aerial translation in an unsupervised manner.
Food consumption is one of the biggest contributors to climate change. However, online grocery shoppers often lack the time, motivation, or knowledge to contemplate a food's environmental impact. At the same time, they are concerned with their own well-being. To empower grocery shoppers in making nutritionally and environmentally informed decisions, we investigate the efficacy of the Scale-Score, a label combining nutritional and environmental information to highlight a product's benefit to both the consumer's and the planet's health, without obscuring either information. We conducted an online survey to understand user needs and requirements regarding a joint food label, we developed an open-source mock online grocery environment, and assessed label efficacy. We find that the Scale-Score supports nutritious purchases, yet needs improving regarding sustainability support. Our research shows first insights into design considerations and performance of a combined yet disjoint food label, potentially altering the label design space.