
We propose a novel end-to-end pipeline for online long-range vectorized high-definition (HD) map construction using on-board camera sensors. The vectorized representation of HD maps, which employs polylines and polygons to represent map elements, is widely used by downstream tasks. However, previous schemes designed with reference to dynamic object detection overlook the structural constraints within linear map elements, resulting in performance degradation in long-range scenarios. In this paper, we exploit the properties of map elements to improve the performance of map construction. We extract more accurate bird's eye view (BEV) features guided by their linear structure. We then propose a hierarchical sparse map representation to further leverage the scalability of vectorized map elements, and design a progressive decoding mechanism and a supervision strategy based on this representation. Our approach, ScalableMap, demonstrates superior performance on the nuScenes dataset, especially in long-range scenarios, surpassing the previous state-of-the-art model by 6.5 mAP while achieving 18.3 FPS. Code is available at //github.com/jingy1yu/ScalableMap.
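To illustrate what "scalability" of a vectorized map element can mean, the sketch below resamples one polyline at several vertex densities, from coarse to fine. It is only an illustration under stated assumptions: the function name `resample_polyline`, the uniform arc-length sampling, and the vertex counts 3, 5, and 9 are hypothetical choices, not the representation defined in the paper.

```python
import math


def resample_polyline(points, num_vertices):
    """Resample a 2-D polyline to num_vertices points evenly spaced by arc length.

    Hypothetical helper for illustration; not taken from the ScalableMap code.
    """
    if num_vertices < 2 or len(points) < 2:
        raise ValueError("need at least two input points and two output vertices")
    seg = [math.dist(a, b) for a, b in zip(points, points[1:])]
    total = sum(seg)
    out = []
    idx, walked = 0, 0.0  # walked = arc length at the start of segment idx
    for k in range(num_vertices):
        target = total * k / (num_vertices - 1)
        # Advance to the segment that contains the target arc length.
        while idx < len(seg) - 1 and walked + seg[idx] < target:
            walked += seg[idx]
            idx += 1
        t = 0.0 if seg[idx] == 0 else min(1.0, (target - walked) / seg[idx])
        (x0, y0), (x1, y1) = points[idx], points[idx + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# An L-shaped lane boundary described at three assumed densities.
boundary = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
levels = [resample_polyline(boundary, n) for n in (3, 5, 9)]
```

With vertex counts of the form 2^l + 1, every vertex of a coarser level reappears in the finer ones, so a decoder could in principle refine a shape progressively rather than predicting all vertices at once.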

Related content

Vectorization


Extensibility · HTTPS · Round · Sample · Diversity

February 22, 2024

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, Xiaolong Wang

We introduce CyberDemo, a novel approach to robotic imitation learning that leverages simulated human demonstrations for real-world tasks. By incorporating extensive data augmentation in a simulated environment, CyberDemo outperforms traditional in-domain real-world demonstrations when transferred to the real world, handling diverse physical and visual conditions. Beyond its affordability and convenience in data collection, CyberDemo outperforms baseline methods in terms of success rates across various tasks and generalizes to previously unseen objects. For example, it can rotate novel tetra-valves and penta-valves, even though the human demonstrations involve only tri-valves. Our research demonstrates the significant potential of simulated human demonstrations for real-world dexterous manipulation tasks. More details can be found at //cyber-demo.github.io

Performer · Microsoft Surface · Basis · Network architecture

February 22, 2024

Eavesdropping with Intelligent Reflective Surfaces: Near-Optimal Configuration Cycling

Francesco Malandrino, Alessandro Nordio, Carla Fabiana Chiasserini

From arXiv. arXiv admin note: text overlap with arXiv:2108.00149

Intelligent reflecting surfaces (IRSs) have several prominent advantages, including improving the level of wireless communication security and privacy. In this work, we focus on the latter aspect and introduce a strategy to counteract the presence of passive eavesdroppers overhearing transmissions from a base station towards legitimate users that are facilitated by the presence of IRSs. Specifically, we envision a transmission scheme that cycles across a number of IRS-to-user assignments, and we select them in a near-optimal fashion, thus guaranteeing both a high data rate and a good secrecy rate. Unlike most of the existing works addressing passive eavesdropping, the strategy we envision has low complexity and is suitable for scenarios where nodes are equipped with a limited number of antennas. Through our performance evaluation, we highlight the trade-off between the legitimate users' data rate and secrecy rate, and how the system parameters affect such a trade-off.
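The trade-off between data rate and secrecy rate can be made concrete with the textbook Gaussian wiretap definition, in which the secrecy rate is the non-negative gap between the legitimate user's rate and the eavesdropper's rate. The sketch below assumes that definition, equal time shares across the cycled configurations, and made-up SNR values; none of these details are taken from the paper, and the function names are hypothetical.

```python
import math


def data_rate(snr):
    """Shannon rate in bit/s/Hz for a linear (not dB) SNR."""
    return math.log2(1.0 + snr)


def secrecy_rate(snr_user, snr_eve):
    """Textbook secrecy rate: user rate minus eavesdropper rate, floored at zero."""
    return max(0.0, data_rate(snr_user) - data_rate(snr_eve))


def cycle_averages(configs):
    """Average data rate and secrecy rate over a cycle of configurations.

    configs holds one (snr_user, snr_eve) pair per configuration; each
    configuration is assumed to be active for an equal share of the time.
    """
    n = len(configs)
    avg_rate = sum(data_rate(u) for u, _ in configs) / n
    avg_secrecy = sum(secrecy_rate(u, e) for u, e in configs) / n
    return avg_rate, avg_secrecy


# Illustrative SNR pairs only: one configuration favours the user,
# the other happens to favour the eavesdropper.
avg_rate, avg_secrecy = cycle_averages([(3.0, 1.0), (1.0, 3.0)])
```

Under these assumptions, a configuration that raises the user's SNR improves both averages only if it does not raise the eavesdropper's SNR by a comparable factor, which is the tension the abstract describes.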

Rank · Optimizer · Cascade · MoDELS · Full

February 21, 2024

Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems

Yunli Wang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Dongying Kong, Han Li, Kun Gai

From arXiv, 12 pages, accepted by WWW 2024

Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems, and learning-to-rank is an important way to optimize the models in cascade ranking. Previous works on learning-to-rank usually focus on letting the model learn the complete order or the top-k order, and adopt the corresponding rank metrics (e.g., OPA and NDCG@k) as optimization targets. However, these targets cannot adapt to various cascade ranking scenarios with varying data complexities and model capabilities. Moreover, existing metric-driven methods such as the Lambda framework can only optimize a rough upper bound of limited metrics, potentially resulting in sub-optimal results and performance misalignment. To address these issues, we propose a novel perspective on optimizing cascade ranking systems by highlighting the adaptability of optimization targets to data complexities and model capabilities. Concretely, we employ multi-task learning to adaptively combine the optimization of relaxed and full targets, which refer to the metrics Recall@m@k and OPA, respectively. We also introduce a permutation matrix to represent the rank metrics and employ differentiable sorting techniques to relax the hard permutation matrix with a controllable approximation error bound. This enables us to optimize both the relaxed and full targets directly and more appropriately. We name this method the Adaptive Neural Ranking Framework (ARF). Furthermore, we give a specific practice under ARF: we use NeuralSort to obtain the relaxed permutation matrix and draw on a variant of the uncertainty weight method in multi-task learning to optimize the proposed losses jointly. Experiments on a total of 4 public and industrial benchmarks show the effectiveness and generalization of our method, and an online experiment shows that our method has significant application value.
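The relaxed permutation matrix mentioned above can be sketched with the published NeuralSort operator, whose i-th row is a softmax over ((n + 1 - 2i) s - A_s 1) / tau, where A_s holds the pairwise absolute score differences. The code below is a minimal pure-Python sketch of that operator only; the function name `neuralsort` and the example scores are illustrative, and the ARF losses built on top of the matrix are not reproduced here.

```python
import math


def neuralsort(scores, tau=1.0):
    """Row-stochastic relaxation of the descending-sort permutation matrix.

    Row i (1-based) is softmax(((n + 1 - 2i) * s - A_s 1) / tau), where
    (A_s 1)[j] = sum_k |s_j - s_k|. As tau shrinks, row i concentrates on
    the index of the i-th largest score.
    """
    n = len(scores)
    abs_sums = [sum(abs(sj - sk) for sk in scores) for sj in scores]
    matrix = []
    for i in range(1, n + 1):
        logits = [((n + 1 - 2 * i) * scores[j] - abs_sums[j]) / tau
                  for j in range(n)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        norm = sum(exps)
        matrix.append([e / norm for e in exps])
    return matrix


# Illustrative scores; a small temperature gives a nearly hard permutation.
relaxed = neuralsort([1.0, 3.0, 2.0], tau=0.1)
order = [row.index(max(row)) for row in relaxed]
```

Because every entry is a smooth function of the scores, metrics expressed through the permutation matrix become differentiable, at the cost of an approximation error controlled by the temperature `tau`.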

MoDELS · Multimodal · Language modeling · Large language models · Performer

February 21, 2024

CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpreted within a broader context. In this work, we introduce a new benchmark, named CODIS, designed to assess the ability of models to use context provided in free-form text to enhance visual comprehension. Our findings indicate that MLLMs consistently fall short of human performance on this benchmark. Further analysis confirms that these models struggle to effectively extract and utilize contextual information to improve their understanding of images. This underscores the pressing need to enhance the ability of MLLMs to comprehend visuals in a context-dependent manner. View our project website at //thunlp-mt.github.io/CODIS.

Language modeling · MoDELS · Large language models · Automator · Performer

February 21, 2024

OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

Shuai Wang, Liang Ding, Li Shen, Yong Luo, Bo Du, Dacheng Tao

From arXiv, 20 pages, 15 figures

Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favor of functional programming (FP), e.g., HumanEval and MBPP. To address this, our study introduces a pioneering OOP-focused benchmark, featuring 431 Python programs that encompass essential OOP concepts and features like classes and encapsulation methods. We propose a novel evaluation metric, pass@o, tailored for OOP, enhancing traditional pass@k measures. Our evaluation of 23 leading large language models (LLMs), including both general and code-specialized models, reveals three key insights: 1) pass@o offers a more relevant and comprehensive assessment for OOP code generation; 2) Despite excelling in FP, code-specialized LLMs like WizardCoder lag in OOP compared to models like ChatGPT; 3) The poor performance of all advanced LLMs on our OOP benchmark highlights a critical need for improvements in this field. Our benchmark and scripts are publicly released at: //github.com/alphadl/OOP-eval.
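For context on the pass@k measure that pass@o is said to enhance, the sketch below implements the commonly used unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n samples are drawn per problem and c of them pass the tests. The abstract does not define pass@o, so it is not reproduced here; the function name `pass_at_k` and the example counts are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number of samples that pass,
    k: evaluation budget (k <= n). Returns the probability that at least
    one of k samples drawn without replacement from the n passes.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts: 10 samples per problem, 3 of which pass.
estimate_k1 = pass_at_k(10, 3, 1)
estimate_k5 = pass_at_k(10, 3, 5)
```

A benchmark-level score is then the mean of this estimate over all problems.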

Statistic · Functional · Normalized · Processing (programming language) · Learning

February 20, 2024

The Normalized Cross Density Functional: A Framework to Quantify Statistical Dependence for Random Processes

Bo Hu, Jose C. Principe

This paper presents a novel approach to measuring statistical dependence between two random processes (r.p.) using a positive-definite function called the Normalized Cross Density (NCD). NCD is derived directly from the probability density functions of two r.p. and constructs a data-dependent Hilbert space, the Normalized Cross-Density Hilbert Space (NCD-HS). By Mercer's Theorem, the NCD norm can be decomposed into its eigenspectrum, which we name the Multivariate Statistical Dependence (MSD) measure, and their sum, the Total Dependence Measure (TSD). Hence, the NCD-HS eigenfunctions serve as a novel embedded feature space, suitable for quantifying r.p. statistical dependence. In order to apply NCD directly to r.p. realizations, we introduce an architecture with two multiple-output neural networks, a cost function, and an algorithm named the Functional Maximal Correlation Algorithm (FMCA). With FMCA, the two networks learn concurrently by approximating each other's outputs, extending the Alternating Conditional Expectation (ACE) for multivariate functions. We mathematically prove that FMCA learns the dominant eigenvalues and eigenfunctions of NCD directly from realizations. Preliminary results with synthetic data and medium-sized image datasets corroborate the theory. Different strategies for applying NCD are proposed and discussed, demonstrating the method's versatility and stability beyond supervised learning. Specifically, when the two r.p. are high-dimensional real-world images and a white uniform noise process, FMCA learns factorial codes, i.e., the occurrence of a code guarantees that a specific training set image was present, which is important for feature learning.

Similarity · Learning · MoDELS · Approximation · Reducible

February 20, 2024

Xling: A Learned Filter Framework for Accelerating High-Dimensional Approximate Similarity Join

Yifan Wang, Vyom Pathak, Daisy Zhe Wang

Similarity join finds all pairs of close points within a given distance threshold. Many similarity join methods have been proposed, but they are usually inefficient in high-dimensional spaces due to the curse of dimensionality and data-unawareness. We investigate the possibility of using the metric space Bloom filter (MSBF), a family of data structures that check whether a query point has neighbors in a multi-dimensional space, to speed up similarity join. However, there are several challenges when applying MSBF to similarity join, including excessive information loss, data-unawareness, and a hard constraint on the distance metric. In this paper, we propose Xling, a generic framework to build a learning-based metric space filter with any existing regression model, aiming at accurately predicting whether a query point has a sufficient number of neighbors. The framework provides a suite of optimization strategies to further improve the prediction quality based on the learning model, which has demonstrated significantly higher prediction quality than existing MSBF. We also propose XJoin, one of the first filter-based similarity join methods, based on Xling. By predicting and skipping those queries without enough neighbors, XJoin can effectively reduce unnecessary neighbor searching and therefore achieves a remarkable acceleration. Benefiting from the generalization capability of deep learning models, XJoin can be easily transferred to a new dataset (with a similar distribution) without re-training. Furthermore, Xling is not limited to XJoin; instead, it acts as a flexible plugin that can be inserted into any loop-based similarity join method for a speedup.
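The idea of plugging a filter into a loop-based similarity join can be sketched as follows. This is a toy illustration, not Xling or XJoin: the learned regression model is replaced by a hypothetical, conservative grid-occupancy check (`build_grid_filter`), and all function names, points, and the threshold are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def build_grid_filter(points, eps):
    """Stand-in for a learned filter: says whether a query may have neighbors.

    Points are bucketed into cells of side eps. Any point within distance eps
    of a query lies in the query's cell or an adjacent one, so this check
    never rejects a query that has a true neighbor.
    """
    def cell(p):
        return tuple(math.floor(x / eps) for x in p)

    occupied = Counter(cell(p) for p in points)

    def may_have_neighbors(q):
        base = cell(q)
        return any(
            occupied.get(tuple(b + o for b, o in zip(base, offset)), 0) > 0
            for offset in product((-1, 0, 1), repeat=len(base))
        )

    return may_have_neighbors


def loop_join(queries, points, eps, query_filter=None):
    """Nested-loop similarity join; returns (matching index pairs, searches run)."""
    pairs, searched = [], 0
    for qi, q in enumerate(queries):
        if query_filter is not None and not query_filter(q):
            continue  # skip the neighbor search for this query
        searched += 1
        for pi, p in enumerate(points):
            if math.dist(q, p) <= eps:
                pairs.append((qi, pi))
    return pairs, searched


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
queries = [(0.5, 0.0), (50.0, 50.0), (10.0, 10.2)]
plain_pairs, plain_searches = loop_join(queries, points, eps=1.0)
fast_pairs, fast_searches = loop_join(
    queries, points, eps=1.0, query_filter=build_grid_filter(points, 1.0))
```

A learned filter, unlike this exact stand-in, can also reject queries with too few neighbors and may occasionally reject a query wrongly, which is why such a join is approximate.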

Visual question answering · Question answering · CLUES · Extensibility · 3D

February 20, 2024

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario

Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang

From arXiv, accepted to AAAI 2024

We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. Compared to traditional VQA tasks, VQA in the autonomous driving scenario presents more challenges. First, the raw visual data are multi-modal, including images and point clouds captured by camera and LiDAR, respectively. Second, the data are multi-frame due to the continuous, real-time acquisition. Third, the outdoor scenes exhibit both moving foreground and static background. Existing VQA benchmarks fail to adequately address these complexities. To bridge this gap, we propose NuScenes-QA, the first benchmark for VQA in the autonomous driving scenario, encompassing 34K visual scenes and 460K question-answer pairs. Specifically, we leverage existing 3D detection annotations to generate scene graphs and design question templates manually. Subsequently, the question-answer pairs are generated programmatically based on these templates. Comprehensive statistics show that NuScenes-QA is a balanced large-scale benchmark with diverse question formats. Built upon it, we develop a series of baselines that employ advanced 3D detection and VQA techniques. Our extensive experiments highlight the challenges posed by this new task. Code and dataset are available at //github.com/qiantianwen/NuScenes-QA.

PDE · MoDELS · Graph · INFORMS · Graph Transformer

February 20, 2024

PDEformer: Towards a Foundation Model for One-Dimensional Partial Differential Equations

Zhanhong Ye, Xiang Huang, Leheng Chen, Hongsheng Liu, Zidong Wang, Bin Dong

This paper introduces PDEformer, a neural solver for partial differential equations (PDEs) capable of simultaneously addressing various types of PDEs. We advocate representing the PDE in the form of a computational graph, facilitating the seamless integration of both symbolic and numerical information inherent in a PDE. A graph Transformer and an implicit neural representation (INR) are employed to generate mesh-free predicted solutions. Following pretraining on data exhibiting a certain level of diversity, our model achieves zero-shot accuracies on benchmark datasets that surpass those of adequately trained expert models. Additionally, PDEformer demonstrates promising results in the inverse problem of PDE coefficient recovery.

Extensibility · GM · MoDELS · Category · Multi-agent model

February 9, 2021

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge

From arXiv, accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.