东京热加勒比中文无码_国产日韩精品全集在线观看_最新中文字幕国产不卡在线_最新2021精品视频自拍_无码精品国产DVD在线观看久9_国产精品中文第一字幕_男女无遮挡WWW免费网站

Encryption on the internet with the shift to HTTPS has been an important step to improve the privacy of internet users. However, there is an increasing body of work about extracting information from encrypted internet traffic without having to decrypt it. Such attacks bypass security guarantees assumed to be given by HTTPS and thus need to be understood. Prior works showed that the variable bitrates of video streams are sufficient to identify which video someone is watching. These works generally have to make trade-offs in aspects such as accuracy, scalability, robustness, etc. These trade-offs complicate the practical use of these attacks. To that end, we propose a deep metric learning framework based on the triplet loss method. Through this framework, we achieve robust, generalisable, scalable and transferable encrypted video stream detection. First, the triplet loss is better able to deal with video streams not seen during training. Second, our approach can accurately classify videos not seen during training. Third, we show that our method scales well to a dataset of over 1000 videos. Finally, we show that a model trained on video streams over Chrome can also classify streams over Firefox. Our results suggest that this side-channel attack is more broadly applicable than originally thought. We provide our code alongside a diverse and up-to-date dataset for future research.

相關內容

流(liu)

關注 1

state-of-the-art · 可約的 · 可辨認的 · Performer · 統計量 ·

2024 年 6 月 21 日

Landscape More Secure Than Portrait? Zooming Into the Directionality of Digital Images With Security Implications

Benedikt Lorch,Rainer B?hme

The orientation in which a source image is captured can affect the resulting security in downstream applications. One reason for this is that many state-of-the-art methods in media security assume that image statistics are similar in the horizontal and vertical directions, allowing them to reduce the number of features (or trainable weights) by merging coefficients. We show that this artificial symmetrization tends to suppress important properties of natural images and common processing operations, causing a loss of performance. We also observe the opposite problem, where unaddressed directionality causes learning-based methods to overfit to a single orientation. These are vulnerable to manipulation if an adversary chooses inputs with the less common orientation. This paper takes a comprehensive approach, identifies and systematizes causes of directionality at several stages of a typical acquisition pipeline, measures their effect, and demonstrates for three selected security applications (steganalysis, forensic source identification, and the detection of synthetic images) how the performance of state-of-the-art methods can be improved by properly accounting for directionality.

INTERACT · Chatbot · INFORMS · 模態 · 語言模型化 ·

2024 年 6 月 21 日

Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

Lichao Zhang,Jia Yu,Shuai Zhang,Long Li,Yangyang Zhong,Guanbao Liang,Yuming Yan,Qing Ma,Fangsheng Weng,Fayu Pan,Jing Li,Renjun Xu,Zhenzhong Lan

Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.

Attention · 稀疏 · MoDELS · 語言模型化 · UniFormer ·

2024 年 6 月 21 日

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Tianyu Fu,Haofeng Huang,Xuefei Ning,Genghan Zhang,Boju Chen,Tianqi Wu,Hongyi Wang,Zixiao Huang,Shiyao Li,Shengen Yan,Guohao Dai,Huazhong Yang,Yu Wang

from arxiv, 10 pages

Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring their distinct accuracy-latency trade-offs. To address this challenge, we propose the Mixture of Attention (MoA), which automatically tailors distinct sparse attention configurations to different heads and layers. MoA constructs and navigates a search space of various attention patterns and their scaling rules relative to input sequence lengths. It profiles the model, evaluates potential configurations, and pinpoints the optimal sparse attention compression plan. MoA adapts to varying input sizes, revealing that some attention heads expand their focus to accommodate longer sequences, while other heads consistently concentrate on fixed-length local contexts. Experiments show that MoA increases the effective context length by $3.9\times$ with the same average attention span, boosting retrieval accuracy by $1.5-7.1\times$ over the uniform-attention baseline across Vicuna-7B, Vicuna-13B, and Llama3-8B models. Moreover, MoA narrows the capability gaps between sparse and dense models, reducing the maximum relative performance drop from $9\%-36\%$ to within $5\%$ across two long-context understanding benchmarks. MoA achieves a $1.2-1.4\times$ GPU memory reduction and boosts decode throughput by $5.5-6.7 \times$ for 7B and 13B dense models on a single GPU, with minimal impact on performance.

可約的 · Elevate · Pivotal（公司） · Continuity · Integration ·

2024 年 6 月 21 日

Securing the Future: Proactive Threat Hunting for Sustainable IoT Ecosystems

Saeid Ghasemshirazi,Ghazaleh Shirvani

In the rapidly evolving landscape of the IoT, the security of connected devices has become a paramount concern. This paper explores the concept of proactive threat hunting as a pivotal strategy for enhancing the security and sustainability of IoT systems. Proactive threat hunting is an alternative to traditional reactive security measures that analyses IoT networks continuously and in advance to find and eliminate threats before they occure. By improving the security posture of IoT devices this approach significantly contributes to extending IoT operational lifespan and reduces environmental impact. By integrating security metrics similar to the Common Vulnerability Scoring System (CVSS) into consumer platforms, this paper argues that proactive threat hunting can elevate user awareness about the security of IoT devices. This has the potential to impact consumer choices and encourage a security-conscious mindset in both the manufacturing and user communities. Through a comprehensive analysis, this study demonstrates how proactive threat hunting can contribute to the development of a more secure, sustainable, and user-aware IoT ecosystem.

INFORMS · 信息檢索 · Extensibility · Performer · Performance ·

2024 年 6 月 20 日

DIRAS: Efficient LLM-Assisted Annotation of Document Relevance in Retrieval Augmented Generation

Jingwei Ni,Tobias Schimanski,Meihong Lin,Mrinmaya Sachan,Elliott Ash,Markus Leippold

Retrieval Augmented Generation (RAG) is widely employed to ground responses to queries on domain-specific documents. But do RAG implementations leave out important information or excessively include irrelevant information? To allay these concerns, it is necessary to annotate domain-specific benchmarks to evaluate information retrieval (IR) performance, as relevance definitions vary across queries and domains. Furthermore, such benchmarks should be cost-efficiently annotated to avoid annotation selection bias. In this paper, we propose DIRAS (Domain-specific Information Retrieval Annotation with Scalability), a manual-annotation-free schema that fine-tunes open-sourced LLMs to annotate relevance labels with calibrated relevance probabilities. Extensive evaluation shows that DIRAS fine-tuned models achieve GPT-4-level performance on annotating and ranking unseen (query, document) pairs, and is helpful for real-world RAG development.

Performer · ChatGPT · Extensibility · 支持向量機 · Prompt ·

2024 年 6 月 19 日

Evaluating the Performance of ChatGPT for Spam Email Detection

Shijing Si,Yuwei Wu,Le Tang,Yugui Zhang,Jedrek Wosik

from arxiv, 12 pages, 4 figures

Email continues to be a pivotal and extensively utilized communication medium within professional and commercial domains. Nonetheless, the prevalence of spam emails poses a significant challenge for users, disrupting their daily routines and diminishing productivity. Consequently, accurately identifying and filtering spam based on content has become crucial for cybersecurity. Recent advancements in natural language processing, particularly with large language models like ChatGPT, have shown remarkable performance in tasks such as question answering and text generation. However, its potential in spam identification remains underexplored. To fill in the gap, this study attempts to evaluate ChatGPT's capabilities for spam identification in both English and Chinese email datasets. We employ ChatGPT for spam email detection using in-context learning, which requires a prompt instruction and a few demonstrations. We also investigate how the number of demonstrations in the prompt affects the performance of ChatGPT. For comparison, we also implement five popular benchmark methods, including naive Bayes, support vector machines (SVM), logistic regression (LR), feedforward dense neural networks (DNN), and BERT classifiers. Through extensive experiments, the performance of ChatGPT is significantly worse than deep supervised learning methods in the large English dataset, while it presents superior performance on the low-resourced Chinese dataset.

跡 · 大語言模型 · 優化器 · ReQuEST · Microsoft Azure ·

2024 年 6 月 17 日

BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems

Yuxin Wang,Yuhan Chen,Zeyu Li,Xueze Kang,Zhenheng Tang,Xin He,Rui Guo,Xin Wang,Qiang Wang,Amelie Chi Zhou,Xiaowen Chu

Serving systems for Large Language Models (LLMs) are often optimized to improve quality of service (QoS) and throughput. However, due to the lack of open-sourced LLM serving workloads, these systems are frequently evaluated under unrealistic workload assumptions. Consequently, performance may degrade when these systems are deployed in real-world scenarios. This work presents BurstGPT, an LLM serving workload with 5.29 million traces from regional Azure OpenAI GPT services over 121 days. BurstGPT captures realistic LLM serving characteristics through detailed tracing of: (1) Concurrency of requests: It traces burstiness variations of requests in Azure OpenAI GPT services, revealing diversified concurrency patterns in different services and model types. (2) Response Lengths of requests: It traces the auto-regressive serving processes of GPT models, showing statistical relations between requests and their responses. (3) Failures of requests: It traces failures of conversation and API services, showing intensive resource needs and limited resource availability of such services in Azure. Details of the characteristics can serve multiple purposes in LLM serving optimizations, such as system evaluation and trace provisioning. In our demo evaluation with BurstGPT, we observe that frequent variations in BurstGPT reveal declines in efficiency, stability, or reliability in realistic LLM serving. We identify that the generalization of KV cache management and request scheduling optimization is not guaranteed for different workloads, especially when systems are poorly optimized for unrealistic workloads. We have made the dataset publicly available to encourage further research at //github.com/HPMLL/BurstGPT.

MoDELS · 數據集 · 語言模型化 · 情景 · 大語言模型 ·

2024 年 6 月 17 日

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Joao Monteiro,Pierre-Andre Noel,Etienne Marcotte,Sai Rajeswar,Valentina Zantedeschi,David Vazquez,Nicolas Chapados,Christopher Pal,Perouz Taslakian

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: //huggingface.co/datasets/ServiceNow/repliqa.

語言模型化 · MoDELS · HTTPS · 穩健性 · 大語言模型 ·

2024 年 6 月 16 日

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Patrick Chao,Edoardo Debenedetti,Alexander Robey,Maksym Andriushchenko,Francesco Croce,Vikash Sehwag,Edgar Dobriban,Nicolas Flammarion,George J. Pappas,Florian Tramer,Hamed Hassani,Eric Wong

from arxiv, JailbreakBench v1.0: more attack artifacts, more test-time defenses, a more accurate jailbreak judge (Llama-3-70B with a custom prompt), a larger dataset of human preferences for selecting a jailbreak judge (300 examples), an over-refusal evaluation dataset (100 benign/borderline behaviors), a semantic refusal judge based on Llama-3-8B

Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address. First, there is no clear standard of practice regarding jailbreaking evaluation. Second, existing works compute costs and success rates in incomparable ways. And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at //github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at //jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. We have carefully considered the potential ethical implications of releasing this benchmark, and believe that it will be a net positive for the community.

SR · 圖形處理器 · 注意力機制 · Neural Networks · Weight ·

2021 年 1 月 29 日

RetaGNN: Relational Temporal Attentive Graph Neural Networks for Holistic Sequential Recommendation

Cheng Hsu,Cheng-Te Li

from arxiv, Accepted to The Web Conference (WWW) 2021

Sequential recommendation (SR) is to accurately recommend a list of items for a user based on her current accessed ones. While new-coming users continuously arrive in the real world, one crucial task is to have inductive SR that can produce embeddings of users and items without re-training. Given user-item interactions can be extremely sparse, another critical task is to have transferable SR that can transfer the knowledge derived from one domain with rich data to another domain. In this work, we aim to present the holistic SR that simultaneously accommodates conventional, inductive, and transferable settings. We propose a novel deep learning-based model, Relational Temporal Attentive Graph Neural Networks (RetaGNN), for holistic SR. The main idea of RetaGNN is three-fold. First, to have inductive and transferable capabilities, we train a relational attentive GNN on the local subgraph extracted from a user-item pair, in which the learnable weight matrices are on various relations among users, items, and attributes, rather than nodes or edges. Second, long-term and short-term temporal patterns of user preferences are encoded by a proposed sequential self-attention mechanism. Third, a relation-aware regularization term is devised for better training of RetaGNN. Experiments conducted on MovieLens, Instagram, and Book-Crossing datasets exhibit that RetaGNN can outperform state-of-the-art methods under conventional, inductive, and transferable settings. The derived attention weights also bring model explainability.