
Left, Center, Right is a popular dice game. We analyze the game using Markov chain and Monte Carlo methods. We compute the expected game length for two to eight players and determine the probability of winning for each player in the game. We discuss the surprising conclusions of which players have the highest and lowest chance of winning, and we propose a small rule change that makes the game a little more fair.
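The Monte Carlo side of such an analysis can be sketched in a few lines. The abstract does not state the rules, so this sketch assumes the commonly played version: every player starts with three chips, rolls one die per chip held (at most three), and each die shows L, C, R with probability 1/6 each or a dot with probability 1/2. The game is assumed to end as soon as only one player holds chips. The seating direction, the turn-counting convention, and the function names `play_lcr` and `estimate` are illustrative choices, not the authors' implementation.

```python
import random


def play_lcr(num_players, rng, chips_each=3):
    """Simulate one game; return (winner_index, turns_played)."""
    chips = [chips_each] * num_players
    player = 0
    turns = 0
    # Assumed stopping rule: stop once exactly one player holds chips.
    while sum(1 for c in chips if c > 0) > 1:
        dice = min(chips[player], 3)  # fixed at the start of the turn
        for _ in range(dice):
            face = rng.randrange(6)
            if face == 0:    # L: pass one chip to the left neighbour
                chips[player] -= 1
                chips[(player + 1) % num_players] += 1
            elif face == 1:  # R: pass one chip to the right neighbour
                chips[player] -= 1
                chips[(player - 1) % num_players] += 1
            elif face == 2:  # C: one chip goes to the centre pot
                chips[player] -= 1
            # faces 3, 4, 5 are dots: the chip stays where it is
        turns += 1  # assumption: a chipless player's pass counts as a turn
        player = (player + 1) % num_players
    winner = next(i for i, c in enumerate(chips) if c > 0)
    return winner, turns


def estimate(num_players, games=2000, seed=0):
    """Estimate per-seat win probabilities and the mean game length."""
    rng = random.Random(seed)
    wins = [0] * num_players
    total_turns = 0
    for _ in range(games):
        winner, turns = play_lcr(num_players, rng)
        wins[winner] += 1
        total_turns += turns
    return [w / games for w in wins], total_turns / games


if __name__ == "__main__":
    for n in range(2, 9):
        probs, mean_turns = estimate(n)
        print(n, [round(p, 3) for p in probs], round(mean_turns, 1))
```

Seat 0 rolls first here, so comparing the entries of the returned list shows how the win probability depends on turn order under these assumed rules.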

Related content

Markov


Constraints · Analysis · Optimizer · Point cloud · Better

August 21, 2024

Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild

Turcan Tuna, Julian Nubert, Patrick Pfreundschuh, Cesar Cadena, Shehryar Khattak, Marco Hutter

from arxiv, Submitted to IEEE Transactions on Field Robotics

The ICP registration algorithm has been a preferred method for LiDAR-based robot localization for nearly a decade. However, even in modern SLAM solutions, ICP can degrade and become unreliable in geometrically ill-conditioned environments. Current solutions primarily focus on utilizing additional sources of information, such as external odometry, to either replace the degenerate directions of the optimization solution or add additional constraints in a sensor-fusion setup afterward. In response, this work investigates and compares new and existing degeneracy mitigation methods for robust LiDAR-based localization and analyzes the efficacy of these approaches in degenerate environments for the first time in the literature at this scale. Specifically, this work proposes and investigates i) the incorporation of different types of constraints into the ICP algorithm, ii) the effect of using active or passive degeneracy mitigation techniques, and iii) the choice of utilizing global point cloud registration methods on the ill-conditioned ICP problem in LiDAR degenerate environments. The study results are validated through multiple real-world field and simulated experiments. The analysis shows that active optimization degeneracy mitigation is necessary and advantageous in the absence of reliable external estimate assistance for LiDAR-SLAM. Furthermore, introducing degeneracy-aware hard constraints in the optimization before or during the optimization is shown to perform better in the wild than by including the constraints after. Moreover, with heuristic fine-tuned parameters, soft constraints can provide equal or better results in complex ill-conditioned scenarios. The implementations used in the analysis of this work are made publicly available to the community.

Performer · Category · Periodic · Mutually independent · Logistic regression

August 21, 2024

Are Scientists Changing their Research Productivity Classes When They Move Up the Academic Ladder?

Marek Kwiek, Wojciech Roszka

from arxiv, 36 pages, 9 tables, 4 figures

We approach productivity in science in a longitudinal fashion: We track careers over time, up to 40 years. We first allocate scientists to decile-based publishing productivity classes, from the bottom 10% to the top 10%. Then, we seek patterns of mobility between the classes in two career stages: assistant professorship and associate professorship. Our findings confirm that radically changing publishing productivity levels (upward or downward) almost never happens. Scientists with a very weak past track record in publications emerge as having marginal chances of becoming scientists with a very strong future track record across all science, technology, engineering, mathematics, and medicine (STEMM) fields. Hence, our research shows a long-term character of careers in science, with publishing productivity during the apprenticeship period of assistant professorship heavily influencing productivity during the more independent period of associate professorship. We use individual-level microdata on academic careers (from a national registry of scientists) and individual-level metadata on publications (from the Scopus raw dataset). Polish associate professors tend to be stuck in their productivity classes for years: High performers tend to remain high performers, and low performers tend to remain low performers over their careers. Logistic regression analysis powerfully supports our two-dimensional results. We examine all internationally visible Polish associate professors in five fields of science in STEMM fields (N = 4,165 with N art = 71,841 articles).
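The decile-based classes and the mobility between them can be illustrated with a small sketch. It assumes a plain mapping from scientist id to publication count at each career stage. The tie-breaking rule and the function names `decile_classes` and `transition_counts` are hypothetical, and the paper's actual procedure (for example, ranking within fields or using a different productivity measure) may differ.

```python
def decile_classes(counts):
    """Map each scientist id to a class from 1 (bottom 10%) to 10 (top 10%)."""
    # Assumption: ties in publication count are broken by id for determinism.
    ranked = sorted(counts, key=lambda sid: (counts[sid], sid))
    n = len(ranked)
    return {sid: (rank * 10) // n + 1 for rank, sid in enumerate(ranked)}


def transition_counts(before, after):
    """Count moves from class i (row) to class j (column) between two stages."""
    table = [[0] * 10 for _ in range(10)]
    for sid, cls in before.items():
        if sid in after:
            table[cls - 1][after[sid] - 1] += 1
    return table


if __name__ == "__main__":
    # Toy data: 20 scientists whose ranking is identical in both stages.
    assistant = {f"s{i:02d}": i for i in range(20)}
    associate = {f"s{i:02d}": 3 * i for i in range(20)}
    table = transition_counts(decile_classes(assistant), decile_classes(associate))
    for row in table:
        print(row)
```

In this toy input every scientist stays in the same class, so all counts fall on the diagonal. Off-diagonal mass far from the diagonal would correspond to the radical upward or downward moves that the abstract reports as almost never happening.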

Noise · Pivotal (company) · Performance · Model evaluation · Basis

August 21, 2024

Privacy Preservation in Delay-Based Localization Systems: Artificial Noise or Artificial Multipath?

Yuchen Zhang, Hui Chen, Henk Wymeersch

from arxiv, 6 pages, conference paper

Localization plays an increasingly pivotal role in 5G/6G systems, enabling various applications. This paper focuses on the privacy concerns associated with delay-based localization, where unauthorized base stations attempt to infer the location of the end user. We propose a method to disrupt localization at unauthorized nodes by injecting artificial components into the pilot signal, exploiting model mismatches inherent in these nodes. Specifically, we investigate the effectiveness of two techniques, namely artificial multipath (AM) and artificial noise (AN), in mitigating location leakage. By leveraging the misspecified Cramér-Rao bound framework, we evaluate the impact of these techniques on unauthorized localization performance. Our results demonstrate that pilot manipulation significantly degrades the accuracy of unauthorized localization while minimally affecting legitimate localization. Moreover, we find that the superiority of AM over AN varies depending on the specific scenario.

Multimodal · MoDELS · Language modeling · Prompt · Large language models

August 17, 2024

BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger

Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song

Multimodal Large Language Models (MLLMs) have showcased impressive performance in a variety of multimodal tasks. On the other hand, the integration of an additional image modality may allow malicious users to inject harmful content inside the images for jailbreaking. Unlike text-based LLMs, where adversaries need to select discrete tokens to conceal their malicious intent using specific algorithms, the continuous nature of image signals provides a direct opportunity for adversaries to inject harmful intentions. In this work, we propose BaThe (Backdoor Trigger Shield), a simple yet effective jailbreak defense mechanism. Our work is motivated by recent research on jailbreak backdoor attack and virtual prompt backdoor attack in generative language models. Jailbreak backdoor attack uses harmful instructions combined with manually crafted strings as triggers to make the backdoored model generate prohibited responses. We assume that harmful instructions can function as triggers, and if we alternatively set rejection responses as the triggered response, the backdoored model then can defend against jailbreak attacks. We achieve this by utilizing a virtual rejection prompt, similar to the virtual prompt backdoor attack. We embed the virtual rejection prompt into the soft text embeddings, which we call the "wedge". Our comprehensive experiments demonstrate that BaThe effectively mitigates various types of jailbreak attacks and is adaptable to defend against unseen attacks, with minimal impact on MLLMs' performance.

Pair · MoDELS · Identifiable · CASES · INFORMS

August 14, 2024

Only One Relation Possible? Modeling the Ambiguity in Event Temporal Relation Extraction

Yutong Hu, Quzhe Huang, Yansong Feng

Event Temporal Relation Extraction (ETRE) aims to identify the temporal relationship between two events, which plays an important role in natural language understanding. Most previous works follow a single-label classification style, classifying an event pair into either a specific temporal relation (e.g., Before, After), or a special label Vague when there may be multiple possible temporal relations between the pair. In our work, instead of directly making predictions on Vague, we propose a multi-label classification solution for ETRE (METRE) to infer the possibility of each temporal relation independently, where we treat Vague as the cases when there is more than one possible relation between two events. We design a speculation mechanism to explore the possible relations hidden behind Vague, which enables the latent information to be used efficiently. Experiments on TB-Dense, MATRES and UDS-T show that our method can effectively utilize the Vague instances to improve the recognition for specific temporal relations and outperforms most state-of-the-art methods.
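The multi-label reading of Vague can be made concrete with a toy decoding step. The sketch assumes a model that outputs an independent probability for each specific relation. The threshold value, the fallback when no relation clears it, the relation name `Simultaneous`, and the function name `decode` are illustrative assumptions and do not reproduce METRE's actual inference or its speculation mechanism.

```python
def decode(probs, threshold=0.5):
    """Turn independent per-relation probabilities into a single label.

    probs: dict mapping a specific relation name to its probability.
    """
    possible = [rel for rel, p in probs.items() if p >= threshold]
    if len(possible) > 1:
        # More than one relation is judged possible: report Vague.
        return "Vague"
    if len(possible) == 1:
        return possible[0]
    # Assumption: if nothing clears the threshold, fall back to the best guess.
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print(decode({"Before": 0.91, "After": 0.04, "Simultaneous": 0.10}))
    print(decode({"Before": 0.72, "After": 0.08, "Simultaneous": 0.66}))
```

Scoring each relation separately keeps the information about which relations contribute to a Vague pair, instead of collapsing them into one opaque class.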

Engineering · BERT · Mean · Underestimation · Nuance

August 13, 2024

BERT's Conceptual Cartography: Mapping the Landscapes of Meaning

Nina Haket, Ryan Daniels

Conceptual Engineers want to make words better. However, they often underestimate how varied our usage of words is. In this paper, we take the first steps in exploring the contextual nuances of words by creating conceptual landscapes -- 2D surfaces representing the pragmatic usage of words -- that conceptual engineers can use to inform their projects. We use the spoken component of the British National Corpus and BERT to create contextualised word embeddings, and use Gaussian Mixture Models, a selection of metrics, and qualitative analysis to visualise and numerically represent lexical landscapes. Such an approach has not yet been used in the conceptual engineering literature and provides a detailed examination of how different words manifest in various contexts that is potentially useful to conceptual engineering projects. Our findings highlight the inherent complexity of conceptual engineering, revealing that each word exhibits a unique and intricate landscape. Conceptual Engineers cannot, therefore, use a one-size-fits-all approach when improving words -- a task that may be practically intractable at scale.

MoDELS · Language modeling · Performer · Processing (programming language) · Large language models

August 13, 2024

Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models

Nunzio Lore, Alireza Sepehr Ilami, Babak Heydari

from arxiv, 18 pages, 6 figures

As the performance of larger, newer Large Language Models continues to improve for strategic Theory of Mind (ToM) tasks, the demand for these state-of-the-art models increases commensurately. However, their deployment is costly both in terms of processing power and time. In this paper, we investigate the feasibility of creating smaller, simulation-ready agents by way of fine-tuning. To do this, we present a large pre-trained model with 20 unique scenarios that combine a social context with a social dilemma, recording its answers, and using them for Q&A fine-tuning on a smaller model of the same family. Our focus is on in-context game-theoretic decision-making, the same domain within which human interaction occurs and that requires both a theory of mind (or a semblance thereof) and an understanding of social dynamics. We find that the fine-tuned smaller language model performed significantly closer to its larger relative, and that its improvements extended to areas and contexts beyond the ones provided in the training examples. On average for all games, through fine-tuning, the smaller model showed a 46% improvement in aligning with the behavior of the larger model, with 100% representing complete alignment. This suggests that our pipeline represents an efficient method to transmit some form of theory of mind to smaller models, creating improved and cheaply deployable algorithms in the process. Despite their simplicity and their associated shortcomings and limitations, our findings represent a stepping stone in the pursuit and training of specialized models for strategic and social decision making.

Question answering · MoDELS · AIM · Pivotal (company) · Interpretability

August 8, 2024

VideoQA in the Era of LLMs: An Empirical Study

Junbin Xiao, Nanxin Huang, Hangyu Qin, Dongyang Li, Yicong Li, Fengbin Zhu, Zhulin Tao, Jianxing Yu, Liang Lin, Tat-Seng Chua, Angela Yao

from arxiv, Preprint. Under Review

Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) plays a pivotal role in Video-LLM development. This work conducts a timely and comprehensive study of Video-LLMs' behavior in VideoQA, aiming to elucidate their success and failure modes, and provide insights towards more human-like video understanding and question answering. Our analyses demonstrate that Video-LLMs excel in VideoQA; they can correlate contextual cues and generate plausible responses to questions about varied video contents. However, models falter in handling video temporality, both in reasoning about temporal content ordering and grounding QA-relevant temporal moments. Moreover, the models behave unintuitively: they are unresponsive to adversarial video perturbations while being sensitive to simple variations of candidate answers and questions. Also, they do not necessarily generalize better. The findings demonstrate Video-LLMs' QA capability in standard conditions yet highlight their severe deficiency in robustness and interpretability, suggesting the urgent need for rationales in Video-LLM development.

MoDELS · Performer · INFORMS · Multi-hop · Trace

August 6, 2024

RULER: What's the Real Context Size of Your Long-Context Language Models?

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg

from arxiv, COLM 2024; Code is available at //github.com/hsiehjackson/RULER

The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simple retrieval-based test is indicative of only a superficial form of long-context understanding. To provide a more comprehensive evaluation of long-context LMs, we create a new synthetic benchmark, RULER, with flexible configurations for customized sequence length and task complexity. RULER expands upon the vanilla NIAH test to encompass variations with diverse types and quantities of needles. Moreover, RULER introduces new task categories, multi-hop tracing and aggregation, to test behaviors beyond searching from context. We evaluate 17 long-context LMs with 13 representative tasks in RULER. Despite achieving nearly perfect accuracy in the vanilla NIAH test, almost all models exhibit large performance drops as the context length increases. While these models all claim context sizes of 32K tokens or greater, only half of them can maintain satisfactory performance at the length of 32K. Our analysis of Yi-34B, which supports a context length of 200K, reveals large room for improvement as we increase input length and task complexity. We open source RULER to spur comprehensive evaluation of long-context LMs.
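To make the vanilla NIAH setup concrete, here is a minimal sketch of how a single test case could be assembled and scored. It is not the RULER code: the function names `build_niah_prompt` and `exact_match`, the filler sentence, and the substring-based scoring rule are all assumptions chosen for illustration.

```python
def build_niah_prompt(needle, filler, num_fillers, depth):
    """Place one needle sentence inside a haystack of repeated filler.

    depth is the relative position of the needle: 0.0 puts it at the start
    of the haystack, 1.0 at the end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0, 1]")
    sentences = [filler] * num_fillers
    sentences.insert(round(depth * num_fillers), needle)
    return " ".join(sentences)


def exact_match(response, answer):
    """Assumed scoring rule: 1.0 if the answer string appears in the response."""
    return 1.0 if answer in response else 0.0


if __name__ == "__main__":
    needle = "The special magic number is 7421."
    filler = "The grass is green and the sky is blue."
    prompt = build_niah_prompt(needle, filler, num_fillers=200, depth=0.5)
    question = "What is the special magic number?"
    # A real evaluation would send prompt + question to a model; a stand-in
    # response is scored here so that the sketch runs on its own.
    print(len(prompt.split()), exact_match("It is 7421.", "7421"))
```

Sweeping `num_fillers` and `depth` gives the usual grid of context length against needle position. The multi-needle, multi-hop tracing, and aggregation tasks described in the abstract go beyond this single-needle case.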

MoDELS · CRAFT · Language modeling · Learning · Black box

August 5, 2024

Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?

Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad

from arxiv, Accepted to AI4CYBER - KDD 2024

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language tasks, but their safety and morality remain contentious due to their training on internet text corpora. To address these concerns, alignment techniques have been developed to improve the public usability and safety of LLMs. Yet, the potential for generating harmful content through these models seems to persist. This paper explores the concept of jailbreaking LLMs, that is, reversing their alignment through adversarial triggers. Previous methods, such as soft embedding prompts, manually crafted prompts, and gradient-based automatic prompts, have had limited success on black-box models due to their requirements for model access and the low variety of manually crafted prompts, which makes them susceptible to being blocked. This paper introduces a novel approach using reinforcement learning to optimize adversarial triggers, requiring only inference API access to the target model and a small surrogate model. Our method, which leverages a BERTScore-based reward function, enhances the transferability and effectiveness of adversarial triggers on new black-box models. We demonstrate that this approach improves the performance of adversarial triggers on a previously untested language model.