
Confidential computing on GPUs such as the NVIDIA H100 mitigates the security risks of outsourced Large Language Models (LLMs) by implementing strong isolation and data encryption. Nonetheless, this encryption incurs a significant performance overhead, with throughput drops of up to 52.8 percent and 88.2 percent when serving OPT-30B and OPT-66B, respectively. To address this challenge, we introduce PipeLLM, a user-transparent runtime system. PipeLLM removes the overhead by overlapping encryption with GPU computation through pipelining (an idea inspired by CPU instruction pipelining), thereby effectively hiding the latency added by encryption. The primary technical challenge is that, unlike in CPUs, the encryption module does not know which data needs to be encrypted until the GPU requests it. To this end, we propose speculative pipelined encryption, which predicts the data requiring encryption by analyzing the serving patterns of LLMs. Further, we have developed an efficient, low-cost pipeline relinquishing approach for handling incorrect predictions. Our experiments on an NVIDIA H100 GPU show that, compared with vanilla systems without confidential computing (e.g., vLLM, PEFT, and FlexGen), PipeLLM incurs modest overhead (less than 19.6 percent in throughput) across various LLM sizes, from 13B to 175B.
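To make the idea of speculative pipelined encryption concrete, the following is a minimal sketch, not PipeLLM's actual implementation. It assumes a hypothetical `predict_next` function that guesses the next requested block, and it uses a toy XOR routine (`toy_encrypt`) as a stand-in for real encryption; the class name `SpeculativeEncryptor` and all other names are illustrative only. A correct guess lets the request reuse ciphertext prepared in the background; a wrong guess discards the speculative work and encrypts on demand.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_encrypt(block: bytes, key: int = 0x5A) -> bytes:
    """Stand-in for real encryption. XOR is NOT secure; it only marks the step."""
    return bytes(b ^ key for b in block)


class SpeculativeEncryptor:
    """Hypothetical sketch: pre-encrypt the block we guess will be requested next."""

    def __init__(self, blocks, predict_next):
        self.blocks = blocks              # block_id -> plaintext bytes
        self.predict_next = predict_next  # assumed predictor: last_id -> guessed next id
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.pending = None               # (guessed_id, future) or None
        self.hits = 0
        self.misses = 0

    def _speculate(self, last_id):
        # Start encrypting the predicted block in the background.
        guess = self.predict_next(last_id)
        if guess in self.blocks:
            future = self.pool.submit(toy_encrypt, self.blocks[guess])
            self.pending = (guess, future)
        else:
            self.pending = None

    def request(self, block_id):
        if self.pending is not None and self.pending[0] == block_id:
            # Correct prediction: the ciphertext was prepared ahead of time.
            ciphertext = self.pending[1].result()
            self.hits += 1
        else:
            # Misprediction: drop the speculative work and encrypt on demand.
            if self.pending is not None:
                self.pending[1].cancel()  # best effort; the result is ignored either way
            ciphertext = toy_encrypt(self.blocks[block_id])
            self.misses += 1
        self._speculate(block_id)
        return ciphertext

    def close(self):
        self.pool.shutdown(wait=True)


# Usage: four blocks, with a predictor that assumes sequential access.
blocks = {i: bytes([i]) * 8 for i in range(4)}
enc = SpeculativeEncryptor(blocks, predict_next=lambda i: (i + 1) % 4)
outputs = [enc.request(i) for i in [0, 1, 2, 0, 1]]
enc.close()
print(enc.hits, enc.misses)
```

In this toy access pattern the first request and the out-of-order jump back to block 0 are mispredictions, while the sequential requests are served from speculative work. A real system would also have to deal with encryption state such as nonces or counters, which this sketch leaves out.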

Related content

Language modeling


3D · Modality · Dataset · Principle · Round

December 18, 2024

MobiFuse: A High-Precision On-device Depth Perception System with Multi-Data Fusion

Jinrui Zhang, Deyu Zhang, Tingting Long, Wenxin Chen, Ju Ren, Yunxin Liu, Yudong Zhao, Yaoxue Zhang, Youngki Lee

We present MobiFuse, a high-precision depth perception system on mobile devices that combines dual RGB and Time-of-Flight (ToF) cameras. To achieve this, we leverage physical principles from various environmental factors to propose the Depth Error Indication (DEI) modality, which characterizes the depth error of ToF and stereo matching. Furthermore, we employ a progressive fusion strategy, merging geometric features from ToF and stereo depth maps with depth error features from the DEI modality to create precise depth maps. Additionally, we create a new ToF-Stereo depth dataset, RealToF, to train and validate our model. Our experiments demonstrate that MobiFuse outperforms baselines, reducing depth measurement errors by up to 77.7%. It also shows strong generalization across diverse datasets and proves effective in two downstream tasks: 3D reconstruction and 3D segmentation. The demo video of MobiFuse in real-life scenarios is available at the de-identified YouTube link (//youtu.be/jy-Sp7T1LVs).

Inference · MoDELS · Tokenizer · Cost · Distillation

December 18, 2024

TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation

Alfredo Garrachón Ruiz, Tomás de la Rosa, Daniel Borrajo

from arXiv, 12 pages

The inference cost of Large Language Models (LLMs) is a significant challenge due to their computational demands, especially on tasks requiring long outputs. However, natural language often contains redundancy, which presents an opportunity for optimization. We have observed that LLMs, when prompted appropriately, can generate distilled language: concise outputs that retain the essential meaning. We propose TRIM, a pipeline for saving computational cost in which a shorter distilled output from the LLM is reconstructed into a full narrative by a smaller model with lower inference costs. Our experiments show promising results, particularly in general knowledge domains, with 20.58% of tokens saved on average and only a small decrease in evaluation metrics, suggesting that this approach can effectively balance efficiency and accuracy in language processing tasks.

Language modeling · Layer · MoDELS · Decoding · Bootstrapping

December 17, 2024

MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Yue Zhao, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji

from arXiv, 19 pages, 7 figures

Large language models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Several recent studies enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, motivated by cognitive mechanisms in the natural world, we design a novel model architecture called TaS, which allows the model to first consider its thoughts and then express a response to the query. We design several pipelines to annotate or generate thought contents from prompt-response samples, then add language heads to a middle layer, which acts as the thinking layer. We train the language model on the thought-augmented data, so that the thinking layer automatically generates reasonable thoughts and the model finally outputs more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at //anonymous.4open.science/r/TadE.

Extensibility · Diversity · HTTPS · MoDELS · Language modeling

December 17, 2024

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Shuting Wang, Jiejun Tan, Zhicheng Dou, Ji-Rong Wen

As a typical and practical application of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) techniques have gained extensive attention, particularly in vertical domains where LLMs may lack domain-specific knowledge. In this paper, we introduce an omnidirectional and automatic RAG benchmark, OmniEval, in the financial domain. Our benchmark is characterized by its multi-dimensional evaluation framework, including (1) a matrix-based RAG scenario evaluation system that categorizes queries into five task classes and 16 financial topics, leading to a structured assessment of diverse query scenarios; (2) a multi-dimensional evaluation data generation approach, which combines GPT-4-based automatic generation and human annotation, achieving an 87.47% acceptance ratio in human evaluations of generated instances; (3) a multi-stage evaluation system that evaluates both retrieval and generation performance, resulting in a comprehensive evaluation of the RAG pipeline; and (4) robust evaluation metrics derived from rule-based and LLM-based ones, enhancing the reliability of assessments through manual annotations and supervised fine-tuning of an LLM evaluator. Our experiments demonstrate the comprehensiveness of OmniEval, which includes extensive test datasets and highlights the performance variations of RAG systems across diverse topics and tasks, revealing significant opportunities for RAG models to improve their capabilities in vertical domains. We open source the code of our benchmark at //github.com/RUC-NLPIR/OmniEval.

3D · Guidance · Transformation · Forward · Constraint

December 17, 2024

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

Wonseok Roh, Hwanhee Jung, Jong Wook Kim, Seunggwan Lee, Innfarn Yoo, Andreas Lugmayr, Seunggeun Chi, Karthik Ramani, Sangpil Kim

Recently, generalizable feed-forward methods based on 3D Gaussian Splatting have gained significant attention for their potential to reconstruct 3D scenes using finite resources. These approaches create a 3D radiance field, parameterized by per-pixel 3D Gaussian primitives, from just a few images in a single forward pass. However, unlike multi-view methods that benefit from cross-view correspondences, 3D scene reconstruction with a single-view image remains an underexplored area. In this work, we introduce CATSplat, a novel generalizable transformer-based framework designed to break through the inherent constraints in monocular settings. First, we propose leveraging textual guidance from a visual-language model to complement insufficient information from a single image. By incorporating scene-specific contextual details from text embeddings through cross-attention, we pave the way for context-aware 3D scene reconstruction beyond relying solely on visual cues. Moreover, we advocate utilizing spatial guidance from 3D point features toward comprehensive geometric understanding under single-view settings. With 3D priors, image features can capture rich structural insights for predicting 3D Gaussians without multi-view techniques. Extensive experiments on large-scale datasets demonstrate the state-of-the-art performance of CATSplat in single-view 3D scene reconstruction with high-quality novel view synthesis.

Language modeling · MoDELS · Model evaluation · Large language model · state-of-the-art

December 17, 2024

Citekit: A Modular Toolkit for Large Language Model Citation Generation

Jiajun Shen, Tong Zhou, Yubo Chen, Kang Liu

from arXiv, 7 pages, 14 figures

Enabling Large Language Models (LLMs) to generate citations in Question-Answering (QA) tasks is an emerging paradigm aimed at enhancing the verifiability of their responses when LLMs use external references to generate an answer. However, there is currently no unified framework to standardize and fairly compare different citation generation methods, which makes it difficult to reproduce the methods and to assess them comprehensively. To address these problems, we introduce Citekit, an open-source and modular toolkit designed to facilitate the implementation and evaluation of existing citation generation methods, while also fostering the development of new approaches to improve citation quality in LLM outputs. The tool is highly extensible, allowing users to combine 4 main modules and 14 components into a pipeline for evaluating an existing method or an innovative design. Our experiments with two state-of-the-art LLMs and 11 citation generation baselines demonstrate the varying strengths of different modules in improving answer accuracy and citation quality, as well as the challenge of enhancing granularity. Based on our analysis of the effectiveness of the components, we propose a new method, self-RAG Snippet, which obtains a balance of answer accuracy and citation quality. Citekit is released at //github.com/SjJ1017/Citekit.

Object detection · Pyramid · Networking · Performance · Proportional

December 13, 2024

HS-FPN: High Frequency and Spatial Perception FPN for Tiny Object Detection

Zican Shi, Jing Hu, Jie Ren, Hengkang Ye, Xuyang Yuan, Yan Ouyang, Jia He, Bo Ji, Junyu Guo

from arXiv, 13 pages, 12 figures, 7 tables

The introduction of Feature Pyramid Network (FPN) has significantly improved object detection performance. However, substantial challenges remain in detecting tiny objects, as their features occupy only a very small proportion of the feature maps. Although FPN integrates multi-scale features, it does not directly enhance or enrich the features of tiny objects. Furthermore, FPN lacks spatial perception ability. To address these issues, we propose a novel High Frequency and Spatial Perception Feature Pyramid Network (HS-FPN) with two innovative modules. First, we designed a high frequency perception module (HFP) that generates high frequency responses through high pass filters. These high frequency responses are used as mask weights from both spatial and channel perspectives to enrich and highlight the features of tiny objects in the original feature maps. Second, we developed a spatial dependency perception module (SDP) to capture the spatial dependencies that FPN lacks. Our experiments demonstrate that detectors based on HS-FPN exhibit competitive advantages over state-of-the-art models on the AI-TOD dataset for tiny object detection.
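As a rough illustration of the high frequency perception idea described above, the sketch below applies a high-pass filter to a single feature map and uses the response as a spatial mask. It is not the HS-FPN implementation: the FFT-based filter, the `cutoff` parameter, the max-normalised mask, and the function names `high_pass` and `spatial_mask_enhance` are all assumptions made for illustration, and the channel-wise mask mentioned in the abstract is omitted.

```python
import numpy as np


def high_pass(feature: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Remove the low-frequency centre of the spectrum of an (H, W) map.

    `cutoff` is a hypothetical parameter: the fraction of each axis of the
    shifted spectrum (around the DC component) that is zeroed out.
    """
    h, w = feature.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feature))
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    low = (np.abs(yy - cy) <= cutoff * h / 2) & (np.abs(xx - cx) <= cutoff * w / 2)
    spectrum[low] = 0  # drop DC and nearby low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def spatial_mask_enhance(feature: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Re-weight a feature map with a mask built from its high-frequency response."""
    response = np.abs(high_pass(feature))
    mask = response / (response.max() + eps)  # assumed normalisation to [0, 1)
    return feature + feature * mask           # emphasise high-frequency locations


# Usage: a flat map with one bright "tiny object" at position (4, 4).
flat = np.ones((8, 8))
tiny = np.zeros((8, 8))
tiny[4, 4] = 1.0
enhanced = spatial_mask_enhance(tiny)
print(np.unravel_index(np.argmax(np.abs(high_pass(tiny))), tiny.shape))
```

A flat map has no high-frequency content, so it passes through essentially unchanged, while the isolated bright pixel produces the strongest response and is amplified most.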

Amazon · INFORMS · Amazon AWS · Echo (mobile app) · AIM

December 13, 2024

The PET Paradox: How Amazon Instrumentalises PETs in Sidewalk to Entrench Its Infrastructural Power

Thijmen van Gend, Donald Jay Bertulfo, Seda Gürses

Recent applications of Privacy Enhancing Technologies (PETs) reveal a paradox. PETs aim to alleviate power asymmetries, but can actually entrench the infrastructural power of companies implementing them vis-à-vis other public and private organisations. We investigate whether and how this contradiction manifests with an empirical study of Amazon's cloud connectivity service called Sidewalk. In 2021, Amazon remotely updated Echo and Ring devices in consumers' homes, to transform them into Sidewalk "gateways". Compatible Internet of Things (IoT) devices, called "endpoints", can connect to an associated "Application Server" in Amazon Web Services (AWS) through these gateways. We find that Sidewalk is not just a connectivity service, but an extension of Amazon's cloud infrastructure as a software production environment for IoT manufacturers. PETs play a prominent role in this pursuit: we observe a two-faceted PET paradox. First, suppressing some information flows allows Amazon to promise narrow privacy guarantees to owners of Echo and Ring devices when "flipping" them into gateways. Once flipped, these gateways constitute a crowdsourced connectivity infrastructure that covers 90% of the US population and expands their AWS offerings. We show how novel information flows, enabled by Sidewalk connectivity, raise greater surveillance and competition concerns. Second, Amazon governs the implementation of these PETs, requiring manufacturers to adjust their device hardware, operating system and software; cloud use; factory lines; and organisational processes. Together, these changes turn manufacturers' endpoints into accessories of Amazon's computational infrastructure, further entrenching Amazon's infrastructural power. We argue that power analyses undergirding PET design should go beyond analysing information flows. We propose future steps for policy and tech research.

MoDELS · Language modeling · INTERACT · state-of-the-art · Optimizer

December 12, 2024

LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models

Anoop Cherian, Radu Corcodel, Siddarth Jain, Diego Romeres

Physical reasoning is an important skill for robotic agents operating in the real world. However, solving such reasoning problems often involves hypothesizing and reflecting over complex multi-body interactions under the effect of a multitude of physical forces, and thus learning all such interactions poses a significant hurdle for state-of-the-art machine learning frameworks, including large language models (LLMs). To study this problem, we propose a new physical reasoning task and a dataset, dubbed TraySim. Our task involves predicting the dynamics of several objects on a tray that is given an external impact. The domino effect of the ensuing object interactions and their dynamics offers a challenging yet controlled setup, and the goal of the reasoning is to infer the stability of the objects after the impact. To solve this complex physical reasoning task, we present LLMPhy, a zero-shot black-box optimization framework that leverages the physics knowledge and program synthesis abilities of LLMs, and synergizes these abilities with the world models built into modern physics engines. Specifically, LLMPhy uses an LLM to generate code that iteratively estimates the physical hyperparameters of the system (friction, damping, layout, etc.) via an implicit analysis-by-synthesis approach with a (non-differentiable) simulator in the loop, and uses the inferred parameters to imagine the dynamics of the scene in order to solve the reasoning task. To show the effectiveness of LLMPhy, we present experiments on our TraySim dataset to predict the steady-state poses of the objects. Our results show that the combination of the LLM and the physics engine leads to state-of-the-art zero-shot physical reasoning performance, while demonstrating superior convergence against standard black-box optimization methods and better estimation of the physical parameters.

Speech recognition · Google Voice · Institute for AI Industry Research, Tsinghua University · CRAFT · Cortana

January 24, 2018

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, Carl A. Gunter

ASR (automatic speech recognition) systems like Siri, Alexa, Google Voice or Cortana have become quite popular recently. One of the key techniques enabling the practical use of such systems in people's daily lives is deep learning. Though deep learning in computer vision is known to be vulnerable to adversarial perturbations, little is known about whether such perturbations remain effective against practical speech recognition. In this paper, we not only demonstrate that such attacks can happen in reality, but also show that they can be conducted systematically. To minimize users' attention, we choose to embed the voice commands into a song, called CommandSong. In this way, the song carrying the command can spread through radio, TV or any media player installed on portable devices like smartphones, potentially impacting millions of users over long distances. In particular, we overcome two major challenges: minimizing the revision of a song in the process of embedding commands, and letting the CommandSong spread through the air without losing the voice "command". Our evaluation demonstrates that we can craft random songs to "carry" any command and that the modification is extremely difficult to notice. In particular, the physical attack, in which we play the CommandSongs over the air and record them, succeeds 94 percent of the time.