精品亚洲中文一区二区三区,日本成年黄色一区二区三区,两个黑人挺进校花体内NP

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, there is little work on investigating whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multi-modal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges harvested from 10 code competition websites, presenting significant challenges due to the extreme demand for reasoning abilities. Our experiment results show that current state-of-the-art models struggle to solve these problems. The results highlight the lack of powerful vision-code models, and we hope MMCode can serve as an inspiration for future works in this domain. The data and code are publicly available at //github.com/happylkx/MMCode.

相關內容

MoDELS

關注 0

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · Performer · 全 · 有偏 · 知識 (knowledge) ·

2024 年 5 月 24 日

Erase to Enhance: Data-Efficient Machine Unlearning in MRI Reconstruction

Yuyang Xue,Jingshuai Liu,Steven McDonagh,Sotirios A. Tsaftaris

from arxiv, The paper is accpeted by MIDL 2024

Machine unlearning is a promising paradigm for removing unwanted data samples from a trained model, towards ensuring compliance with privacy regulations and limiting harmful biases. Although unlearning has been shown in, e.g., classification and recommendation systems, its potential in medical image-to-image translation, specifically in image recon-struction, has not been thoroughly investigated. This paper shows that machine unlearning is possible in MRI tasks and has the potential to benefit for bias removal. We set up a protocol to study how much shared knowledge exists between datasets of different organs, allowing us to effectively quantify the effect of unlearning. Our study reveals that combining training data can lead to hallucinations and reduced image quality in the reconstructed data. We use unlearning to remove hallucinations as a proxy exemplar of undesired data removal. Indeed, we show that machine unlearning is possible without full retraining. Furthermore, our observations indicate that maintaining high performance is feasible even when using only a subset of retain data. We have made our code publicly accessible.

大語言模型 · 知識 (knowledge) · INFORMS · 推薦系統 · 語言模型化 ·

2024 年 5 月 24 日

TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance

Weiqing Luo,Chonggang Song,Lingling Yi,Gong Cheng

from arxiv, 8 pages, 3 figures

Combining semantic information with behavioral data is a crucial research area in recommender systems. A promising approach involves leveraging external knowledge to enrich behavioral-based recommender systems with abundant semantic information. However, this approach faces two primary challenges: denoising raw external knowledge and adapting semantic representations. To address these challenges, we propose an External Knowledge-Enhanced Recommendation method with LLM Assistance (TRAWL). This method utilizes large language models (LLMs) to extract relevant recommendation knowledge from raw external data and employs a contrastive learning strategy for adapter training. Experiments on public datasets and real-world online recommender systems validate the effectiveness of our approach.

推斷 · MoDELS · INFORMS · 層 · 近似 ·

2024 年 5 月 24 日

RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference

Lianming Huang,Shangyu Wu,Yufei Cui,Ying Xiong,Xue Liu,Tei-Wei Kuo,Nan Guan,Chun Jason Xue

Deploying large language model inference remains challenging due to their high computational overhead. Early exiting accelerates model inference by adaptively reducing the number of inference layers. Existing methods require training internal classifiers to determine whether to exit at each intermediate layer. However, such classifier-based early exiting frameworks require significant effort to design and train the classifiers. To address these limitations, this paper proposes RAEE, a training-free Retrieval-Augmented Early Exiting framework for efficient inference. First, this paper demonstrates that the early exiting problem can be modeled as a distribution prediction problem, where the distribution is approximated using similar data's existing information. Next, the paper details the process of collecting existing information to build the retrieval database. Finally, based on the pre-built retrieval database, RAEE leverages the retrieved similar data's exiting information to guide the backbone model to exit at the layer, which is predicted by the approximated distribution. Experimental results demonstrate that the proposed RAEE can significantly accelerate inference. RAEE also achieves state-of-the-art zero-shot performance on 8 classification tasks.

Tensor · 優化器 · state-of-the-art · 原點 · 稀疏 ·

2024 年 5 月 23 日

cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores

Zixuan Li,Mingxing Duan,Huizhang Luo,Wangdong Yang,Kenli Li,Keqin Li

Sparse tensors are prevalent in real-world applications, often characterized by their large-scale, high-order, and high-dimensional nature. Directly handling raw tensors is impractical due to the significant memory and computational overhead involved. The current mainstream approach involves compressing or decomposing the original tensor. One popular tensor decomposition algorithm is the Tucker decomposition. However, existing state-of-the-art algorithms for large-scale Tucker decomposition typically relax the original optimization problem into multiple convex optimization problems to ensure polynomial convergence. Unfortunately, these algorithms tend to converge slowly. In contrast, tensor decomposition exhibits a simple optimization landscape, making local search algorithms capable of converging to a global (approximate) optimum much faster. In this paper, we propose the FastTuckerPlus algorithm, which decomposes the original optimization problem into two non-convex optimization problems and solves them alternately using the Stochastic Gradient Descent method. Furthermore, we introduce cuFastTuckerPlus, a fine-grained parallel algorithm designed for GPU platforms, leveraging the performance of tensor cores. This algorithm minimizes memory access overhead and computational costs, surpassing the state-of-the-art algorithms. Our experimental results demonstrate that our method achieves a speedup of $3X$ to $5X$ compared to state-of-the-art algorithms.

3D · Prompt · MoDELS · HTTPS · 控制器 ·

2024 年 5 月 23 日

IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts

Bohan Zeng,Shanglin Li,Yutang Feng,Ling Yang,Hong Li,Sicheng Gao,Jiaming Liu,Conghui He,Wentao Zhang,Jianzhuang Liu,Baochang Zhang,Shuicheng Yan

from arxiv, 22 pages, 16 figures

Recent advances in 3D generation have been remarkable, with methods such as DreamFusion leveraging large-scale text-to-image diffusion-based models to supervise 3D object generation. These methods enable the synthesis of detailed and photorealistic textured objects. However, the appearance of 3D objects produced by these text-to-3D models is unpredictable, and it is hard for the single-image-to-3D methods to deal with complex images, thus posing a challenge in generating appearance-controllable 3D objects. To achieve controllable complex 3D object synthesis, we propose IPDreamer, a novel approach that incorporates image prompt adaption to extract detailed and comprehensive appearance features from complex images, which are then utilized for 3D object generation. Our results demonstrate that IPDreamer effectively generates high-quality 3D objects that are consistent with both the provided text and the appearance of complex image prompts, demonstrating its promising capability in appearance-controllable 3D object generation. Our code is available at //github.com/zengbohan0217/IPDreamer.

Agent · INTERACT · 多峰值 · AI Agent · Extensibility ·

2024 年 5 月 23 日

MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs

Xianhao Yu,Jiaqi Fu,Renjia Deng,Wenjuan Han

from arxiv, Project website: //github.com/cocacola-lab/MineLand

While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of an interactive artificial society that reflects collective behavior. However, these existing simulators face significant limitations. Firstly, they struggle with handling large numbers of agents due to high resource demands. Secondly, they often assume agents possess perfect information and limitless capabilities, hindering the ecological validity of simulated social interactions. To bridge this gap, we propose a multi-agent Minecraft simulator, MineLand, that bridges this gap by introducing three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. Additionally, we further introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecological and nuanced collective behavior.The source code of MineLand and Alex is openly available at //github.com/cocacola-lab/MineLand.

3D · 穩健性 · Performer · CARS · 稀疏化 ·

2024 年 5 月 22 日

RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

Fangqiang Ding,Xiangyu Wen,Yunzhou Zhu,Yiming Li,Chris Xiaoxuan Lu

from arxiv, 16 pages, 3 figures

3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment of self-driving cars. To improve perception robustness, we leverage the recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. Our method, RadarOcc, circumvents the limitations of sparse radar point clouds by directly processing the 4D radar tensor, thus preserving essential scene details. RadarOcc innovatively addresses the challenges associated with the voluminous and noisy 4D radar data by employing Doppler bins descriptors, sidelobe-aware spatial sparsification, and range-wise self-attention mechanisms. To minimize the interpolation errors associated with direct coordinate transformations, we also devise a spherical-based feature encoding followed by spherical-to-Cartesian feature aggregation. We benchmark various baseline methods based on distinct modalities on the public K-Radar dataset. The results demonstrate RadarOcc's state-of-the-art performance in radar-based 3D occupancy prediction and promising results even when compared with LiDAR- or camera-based methods. Additionally, we present qualitative evidence of the superior performance of 4D radar in adverse weather conditions and explore the impact of key pipeline components through ablation studies.

INFORMS · MoDELS · 示例 · Extensibility · Performer ·

2024 年 5 月 22 日

DEGAP: Dual Event-Guided Adaptive Prefixes for Templated-Based Event Argument Extraction Model with Slot Querying

Guanghui Wang,Dexi Liu,Qizhi Wan,Xiping Liu,Wanlong Liu

Recent advancements in event argument extraction (EAE) involve incorporating beneficial auxiliary information into models during training and inference, such as retrieved instances and event templates. Additionally, some studies introduce learnable prefix vectors to models. These methods face three challenges: (1) insufficient utilization of relevant event instances due to deficiencies in retrieval; (2) neglect of important information provided by relevant event templates; (3) the advantages of prefixes are constrained due to their inability to meet the specific informational needs of EAE. In this work, we propose DEGAP, which addresses the above challenges through two simple yet effective components: (1) dual prefixes, where the instance-oriented prefix and template-oriented prefix are trained to learn information from different event instances and templates, respectively, and then provide relevant information as cues to EAE model without retrieval; (2) event-guided adaptive gating mechanism, which guides the prefixes based on the target event to fully leverage their advantages. Extensive experiments demonstrate that our method achieves new state-of-the-art performance on four datasets (ACE05, RAMS, WIKIEVENTS, and MLEE). Further analysis verifies the importance of the proposed design and the effectiveness of the main components.

Extensibility · 代碼 · JavaScript · ForCES · Engineering ·

2024 年 5 月 21 日

FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques

Nikolaos Pantelaios,Alexandros Kapravelos

from arxiv, Usenix Security Symposium 2024 -- DOI To Be Announced soon

Evasion techniques allow malicious code to never be observed. This impacts significantly the detection capabilities of tools that rely on either dynamic or static analysis, as they never get to process the malicious code. The dynamic nature of JavaScript, where code is often injected dynamically, makes evasions particularly effective. Yet, we lack tools that can detect evasive techniques in a challenging environment such as JavaScript. In this paper, we present FV8, a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code. FV8 selectively enforces code execution on APIs that conditionally inject dynamic code, thus enhancing code coverage and consequently improving visibility into malicious code. We integrate our tool in both the Node.js engine and the Chromium browser, compelling code execution in npm packages and Chrome browser extensions. Our tool increases code coverage by 11% compared to default V8 and detects 28 unique evasion categories, including five previously unreported techniques. In data confirmed as malicious from both ecosystems, our tool identifies 1,443 (14.6%) npm packages and 164 (82%) extensions containing at least one type of evasion. In previously unexamined extensions (39,592), our tool discovered 16,471 injected third-party scripts, and a total of 8,732,120 lines of code executed due to our forced execution instrumentation. Furthermore, it tagged a total of 423 extensions as both evasive and malicious and we manually verify 110 extensions (26%) to actually be malicious, impacting two million users. Our tool is open-source and serves both as an in-browser and standalone dynamic analysis tool, capable of detecting evasive code, bypassing obfuscation in certain cases, offering improved access to malicious code, and supporting recursive analysis of dynamic code injections

Dataflow · Performer · MoDELS · FPGA · 優化器 ·

2024 年 5 月 21 日

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

Jianming Tong,Anirudh Itagi,Prasanth Chatarasi,Tushar Krishna

from arxiv, 17 pages, 14 figures. International Symposium on Computer Architecture (ISCA), Jun 2024

The inference of ML models composed of diverse structures, types, and sizes boils down to the execution of different dataflows (i.e. different tiling, ordering, parallelism, and shapes). Using the optimal dataflow for every layer of workload can reduce latency by up to two orders of magnitude over a suboptimal dataflow. Unfortunately, reconfiguring hardware for different dataflows involves on-chip data layout reordering and datapath reconfigurations, leading to non-trivial overhead that hinders ML accelerators from exploiting different dataflows, resulting in suboptimal performance. To address this challenge, we propose FEATHER, an innovative accelerator that leverages a novel spatial array termed Nest and a novel multi-stage reduction network called BIRRD for performing flexible data reduction with layout reordering under the hood, enabling seamless switching between optimal dataflows with negligible latency and resources overhead. For systematically evaluating the performance interaction between dataflows and layouts, we enhance Timeloop, a state-of-the-art dataflow cost modeling and search framework, with layout assessment capabilities, and term it as Layoutloop. We model FEATHER into Layoutloop and also deploy FEATHER end-to-end on the edge ZCU104 FPGA. FEATHER delivers 1.27~2.89x inference latency speedup and 1.3~6.43x energy efficiency improvement compared to various SoTAs like NVDLA, SIGMA and Eyeriss under ResNet-50 and MobiletNet-V3 in Layoutloop. On practical FPGA devices, FEATHER achieves 2.65/3.91x higher throughput than Xilinx DPU/Gemmini. Remarkably, such performance and energy efficiency enhancements come at only 6% area over a fixed-dataflow Eyeriss-like accelerator. Our code is released at //github.com/maeri-project/FEATHER.