
The virtualization of Radio Access Networks (vRAN) is well on its way to becoming a reality, driven by advantages such as flexibility and cost-effectiveness. However, virtualization comes at a high price: virtual Base Stations (vBSs) sharing the same computing platform incur a significant computing overhead due to the extreme consumption of shared cache memory resources. Consequently, vRAN suffers from increased energy consumption, which fuels the already high operational costs in 5G networks. This paper investigates the effectiveness of cache memory allocation mechanisms in reducing total energy consumption. Using an experimental vRAN platform, we profile the energy consumption and CPU utilization of a vBS as a function of the network state (e.g., traffic demand, modulation scheme). Then, we address the high dimensionality of the problem by decomposing it per vBS, which is possible thanks to the Last-Level Cache (LLC) isolation implemented in our system. Based on this, we train a vBS digital twin, which allows us to train a classifier offline, avoiding performance degradation of the system during training. Our results show that our approach performs very close to an offline optimal oracle, outperforming standard approaches used in today's deployments.

Related content

cache


Multimodal · MoDELS · Performer · Task dialogue systems · HTTPS

June 14, 2024

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

Yuhang Wu,Wenmeng Yu,Yean Cheng,Yan Wang,Xiaohan Zhang,Jiazheng Xu,Ming Ding,Yuxiao Dong

Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically designed for emerging Chinese VLMs. This benchmark is meticulously curated from real-world scenarios and Chinese Internet sources, encompassing thirteen specific tasks across three categories, and includes both single-turn and multi-turn dialogue scenarios. Incorporating a prompt rewrite strategy, AlignMMBench encompasses 1,054 images and 4,978 question-answer pairs. To facilitate the evaluation pipeline, we propose CritiqueVLM, a rule-calibrated evaluator that exceeds GPT-4's evaluation ability. Finally, we report the performance of representative VLMs on AlignMMBench, offering insights into the capabilities and limitations of different VLM architectures. All evaluation code and data are available at //alignmmbench.github.io.

INTERACT · MoDELS · Scenario · Controller · Language modeling

June 13, 2024

Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction

Danyang Zhang,Zhennan Shen,Rui Xie,Situo Zhang,Tianbao Xie,Zihan Zhao,Siyuan Chen,Lu Chen,Hongshen Xu,Ruisheng Cao,Kai Yu

The Graphical User Interface (GUI) is pivotal for human interaction with the digital world, enabling efficient device control and the completion of complex tasks. Recent progress in Large Language Models (LLMs) and Vision Language Models (VLMs) offers the chance to create advanced GUI agents. To ensure their effectiveness, there is a pressing need for qualified benchmarks that provide trustworthy and reproducible evaluations -- a challenge current benchmarks often fail to address. To tackle this issue, we introduce Mobile-Env, a comprehensive toolkit tailored for creating GUI benchmarks in the Android mobile environment. Mobile-Env offers an isolated and controllable setting for reliable evaluations, and accommodates intermediate instructions and rewards to reflect real-world usage more naturally. Using Mobile-Env, we collect an open-world task set across various real-world apps and a fixed-world set, WikiHow, which captures a significant amount of dynamic online content for fully controllable and reproducible evaluation. We conduct comprehensive evaluations of LLM agents using these benchmarks. Our findings reveal that even advanced models (e.g., GPT-4V and LLaMA-3) struggle with tasks that are relatively simple for humans. This highlights a crucial gap in current models and underscores the importance of developing more capable foundation models and more effective GUI agent frameworks.

MoDELS · Weight · IR · CASES · Performer

June 12, 2024

Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

Benjamin Biggs,Arjun Seshadri,Yang Zou,Achin Jain,Aditya Golatkar,Yusheng Xie,Alessandro Achille,Ashwin Swaminathan,Stefano Soatto

We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs, since models corresponding to data shards can be added or removed by re-averaging. We show that Diffusion Soup samples from a point in weight space that approximates the geometric mean of the distributions of constituent datasets, which offers anti-memorization guarantees and enables zero-shot style mixing. Empirically, Diffusion Soup outperforms a paragon model trained on the union of all data shards and achieves a 30% improvement in Image Reward (.34 $\to$ .44) on domain sharded data, and a 59% improvement in IR (.37 $\to$ .59) on aesthetic data. In both cases, souping also prevails in TIFA score (respectively, 85.5 $\to$ 86.5 and 85.6 $\to$ 86.8). We demonstrate robust unlearning -- removing any individual domain shard only lowers performance by 1% in IR (.45 $\to$ .44) -- and validate our theoretical insights on anti-memorization using real data. Finally, we showcase Diffusion Soup's ability to blend the distinct styles of models finetuned on different shards, resulting in the zero-shot generation of hybrid styles.
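The re-averaging idea in the abstract above can be illustrated with a minimal sketch. It assumes a uniform average over models that share the same parameter names and shapes; the abstract does not state the exact weighting, and the `soup` helper and the toy shard values are hypothetical, not taken from the paper.

```python
from typing import Dict, List

# Parameter name -> flattened parameter values (toy stand-in for a checkpoint).
Weights = Dict[str, List[float]]


def soup(models: List[Weights]) -> Weights:
    """Uniformly average the parameters of models with identical parameter names.

    Assumption: a plain arithmetic mean; the paper's exact scheme may differ.
    """
    if not models:
        raise ValueError("need at least one model")
    names = models[0].keys()
    for m in models:
        if m.keys() != names:
            raise ValueError("all models must share the same parameter names")
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in names
    }


# Hypothetical models, one per data shard.
shard_a = {"w": [1.0, 2.0], "b": [0.0]}
shard_b = {"w": [3.0, 4.0], "b": [1.0]}
shard_c = {"w": [5.0, 6.0], "b": [2.0]}

# Adding a shard: average over all three shard models.
merged = soup([shard_a, shard_b, shard_c])

# Removing a shard: re-average without shard_c, no retraining involved.
unlearned = soup([shard_a, shard_b])
```

Because the merged model has the same shape as each constituent, inference cost stays that of a single model, which matches the "no additional memory or inference costs" claim in the abstract.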

Language modeling · MoDELS · Large language models · Taxonomy · Comprehensibility

June 12, 2024

Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

Zhen Li,Xiaohan Xu,Tao Shen,Can Xu,Jia-Chen Gu,Yuxuan Lai,Chongyang Tao,Shuai Ma

From arXiv; 21 pages, 5 figures

In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This paper aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. Our detailed exploration includes critically assessing various LLM-based methodologies, as well as comparing their strengths and limitations in evaluating NLG outputs. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.

Conditionally independent · Mutually independent · INFORMS · Correlation coefficient · Covariance matrix

June 11, 2024

Methods for Recovering Conditional Independence Graphs: A Survey

Harsh Shrivastava,Urszula Chajewska

Conditional Independence (CI) graphs are a type of probabilistic graphical model primarily used to gain insights into feature relationships. Each edge represents the partial correlation between the connected features, which gives information about their direct dependence. In this survey, we list different methods and study the advances in techniques developed to recover CI graphs. We cover traditional optimization methods as well as recently developed deep learning architectures, along with their recommended implementations. To facilitate wider adoption, we include preliminaries that consolidate associated operations, for example techniques to obtain the covariance matrix for mixed datatypes.
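To make the link between edges and partial correlations concrete, here is a minimal sketch of one standard route: invert the covariance matrix and normalize the resulting precision matrix, using rho_ij = -P_ij / sqrt(P_ii * P_jj). This is not a method taken from the survey; the helper names, the toy covariance matrix, and the edge threshold are illustrative assumptions.

```python
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov: Matrix) -> Matrix:
    """Partial correlation of each pair given all other variables."""
    prec = invert(cov)
    n = len(prec)
    return [
        [1.0 if i == j else -prec[i][j] / sqrt(prec[i][i] * prec[j][j])
         for j in range(n)]
        for i in range(n)
    ]


def ci_edges(cov: Matrix, threshold: float = 1e-6) -> List[Tuple[int, int]]:
    """Keep an edge (i, j) when the partial correlation is non-negligible."""
    pc = partial_correlations(cov)
    n = len(pc)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pc[i][j]) > threshold]


# Toy chain X0 - X1 - X2: covariance rho^|i-j| with rho = 0.5 (illustrative values).
cov = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]

pc = partial_correlations(cov)
edges = ci_edges(cov)
```

In this toy chain, X0 and X2 are marginally correlated (0.25), yet their partial correlation given X1 vanishes, so the recovered graph keeps only the edges (0, 1) and (1, 2). With real data the estimated covariance is noisy, which is why sparsity-inducing estimators are typically used instead of a plain inverse and a hard threshold.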

Multimodal · Language modeling · MoDELS · Range · Large language models

June 11, 2024

EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning

Yi Chen,Yuying Ge,Yixiao Ge,Mingyu Ding,Bohao Li,Rui Wang,Ruifeng Xu,Ying Shan,Xihui Liu

From arXiv. Project released at: //github.com/ChenYi99/EgoPlan

The pursuit of artificial general intelligence (AGI) has been accelerated by Multimodal Large Language Models (MLLMs), which exhibit superior reasoning, generalization capabilities, and proficiency in processing multimodal inputs. A crucial milestone in the evolution of AGI is the attainment of human-level planning, a fundamental ability for making informed decisions in complex environments, and solving a wide range of real-world problems. Despite the impressive advancements in MLLMs, a question remains: How far are current MLLMs from achieving human-level planning? To shed light on this question, we introduce EgoPlan-Bench, a comprehensive benchmark to evaluate the planning abilities of MLLMs in real-world scenarios from an egocentric perspective, mirroring human perception. EgoPlan-Bench emphasizes the evaluation of planning capabilities of MLLMs, featuring realistic tasks, diverse action plans, and intricate visual observations. Our rigorous evaluation of a wide range of MLLMs reveals that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning. To facilitate this advancement, we further present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench. We have made all codes, data, and a maintained benchmark leaderboard available to advance future research.

Link prediction · Networking · GPU · MoDELS · Neural Networks

June 7, 2024

GENIE: Watermarking Graph Neural Networks for Link Prediction

Venkata Sai Pranav Bachina,Ankit Gangwal,Aaryan Ajay Sharma,Charu Sharma

From arXiv; 20 pages, 12 figures

Graph Neural Networks (GNNs) have advanced the field of machine learning by utilizing graph-structured data, which is ubiquitous in the real world. GNNs have applications in various fields, ranging from social network analysis to drug discovery. GNN training is strenuous, requiring significant computational resources and human expertise. It makes a trained GNN an indispensable Intellectual Property (IP) for its owner. Recent studies have shown GNNs to be vulnerable to model-stealing attacks, which raises concerns over IP rights protection. Watermarking has been shown to be effective at protecting the IP of a GNN model. Existing efforts to develop a watermarking scheme for GNNs have only focused on the node classification and the graph classification tasks. To the best of our knowledge, we introduce the first-ever watermarking scheme for GNNs tailored to the Link Prediction (LP) task. We call our proposed watermarking scheme GENIE (watermarking Graph nEural Networks for lInk prEdiction). We design GENIE using a novel backdoor attack to create a trigger set for two key methods of LP: (1) node representation-based and (2) subgraph-based. In GENIE, the watermark is embedded into the GNN model by training it on both the trigger set and a modified training set, resulting in a watermarked GNN model. To assess a suspect model, we verify the watermark against the trigger set. We extensively evaluate GENIE across 3 model architectures (i.e., SEAL, GCN, and GraphSAGE) and 7 real-world datasets. Furthermore, we validate the robustness of GENIE against 11 state-of-the-art watermark removal techniques and 3 model extraction attacks. We also demonstrate that GENIE is robust against ownership piracy attack. Our ownership demonstration scheme statistically guarantees both False Positive Rate (FPR) and False Negative Rate (FNR) to be less than $10^{-6}$.

Comprehensibility · Performer · Reviewer · Extensibility · Decomposed

June 6, 2024

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

Junjie Zhou,Yan Shu,Bo Zhao,Boya Wu,Shitao Xiao,Xi Yang,Yongping Xiong,Bo Zhang,Tiejun Huang,Zheng Liu

The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and the inappropriateness for evaluating LVU performances. To address the above problems, we propose a new benchmark, called MLVU (Multi-task Long Video Understanding Benchmark), for the comprehensive and in-depth evaluation of LVU. MLVU presents the following critical values: 1) The substantial and flexible extension of video lengths, which enables the benchmark to evaluate LVU performance across a wide range of durations. 2) The inclusion of various video genres, e.g., movies, surveillance footage, egocentric videos, cartoons, game videos, etc., which reflects the models' LVU performances in different scenarios. 3) The development of diversified evaluation tasks, which enables a comprehensive examination of MLLMs' key abilities in long-video understanding. The empirical study with 20 latest MLLMs reveals significant room for improvement in today's technique, as all existing methods struggle with most of the evaluation tasks and exhibit severe performance degradation when handling longer videos. Additionally, it suggests that factors such as context length, image-understanding quality, and the choice of LLM backbone can play critical roles in future advancements. We anticipate that MLVU will advance the research of long video understanding by providing a comprehensive and in-depth analysis of MLLMs.

Learning · Reinforcement learning · Automator · state-of-the-art · Optimizer

June 6, 2024

GOOSE: Goal-Conditioned Reinforcement Learning for Safety-Critical Scenario Generation

Joshua Ransiek,Johannes Plaum,Jacob Langner,Eric Sax

Scenario-based testing is considered state-of-the-art for verifying and validating Advanced Driver Assistance Systems (ADASs) and Automated Driving Systems (ADSs). However, the practical application of scenario-based testing requires an efficient method to generate or collect the scenarios that are needed for the safety assessment. In this paper, we propose Goal-conditioned Scenario Generation (GOOSE), a goal-conditioned reinforcement learning (RL) approach that automatically generates safety-critical scenarios to challenge ADASs or ADSs. In order to simultaneously set up and optimize scenarios, we propose to control vehicle trajectories at the scenario level. Each step in the RL framework corresponds to a scenario simulation. We use Non-Uniform Rational B-Splines (NURBS) for trajectory modeling. To guide the goal-conditioned agent, we formulate test-specific, constraint-based goals inspired by the OpenScenario Domain Specific Language (DSL). Through experiments conducted on multiple pre-crash scenarios derived from UN Regulation No. 157 for Automated Lane Keeping Systems (ALKS), we demonstrate the effectiveness of GOOSE in generating scenarios that lead to safety-critical events.

Comprehensibility · Identifiable · TOOLS · state-of-the-art · HTTPS

June 5, 2024

Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

Bowen Jiang,Yangxinyu Xie,Xiaomeng Wang,Weijie J. Su,Camillo J. Taylor,Tanwi Mallick

Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios involving multiple layers of context. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this paper aims to understand whether multi-modal and multi-agent systems are advancing toward rationality by surveying the state-of-the-art works, identifying advancements over single-agent and single-modal systems in terms of rationality, and discussing open problems and future directions. We maintain an open repository at //github.com/bowen-upenn/MMMA_Rationality.