久久久久精品电影,国产无遮挡又黄又爽不要VIP软

This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer questions requiring external knowledge effectively. A common limitation of such approaches is that they consist of relatively complicated pipelines and often heavily rely on accessing GPT-3 API. Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline which, in a nutshell, is based on efficient in-context learning by prompting LLaMA (1 and 2) using question-informative captions as contextual information. Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and yet achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets. Finally, we perform several ablation studies to understand important aspects of our method. Our code is publicly available at //github.com/alexandrosXe/ASimple-Baseline-For-Knowledge-Based-VQA

多峰值 · Performer · Attention · 可理解性 · Extensibility ·

2023 年 10 月 20 日

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

Andrea Burns,Krishna Srinivasan,Joshua Ainslie,Geoff Brown,Bryan A. Plummer,Kate Saenko,Jianmo Ni,Mandy Guo

from arxiv, Accepted in EMNLP 2023, revision contains camera ready edits. Data can be downloaded at //github.com/google-research-datasets/wit/blob/main/wikiweb2m.md

Webpages have been a rich, scalable resource for vision-language and language only tasks. Yet only pieces of webpages are kept in existing datasets: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage tasks have resultingly received little attention and structured image-text data left underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage suite (WikiWeb2M) containing 2M pages with all of the associated image, text, and structure data. We verify its utility on three generative tasks: page description generation, section summarization, and contextual image captioning. We design a novel attention mechanism Prefix Global, which selects the most relevant image and text content as global tokens to attend to the rest of the webpage for context. By using page structure to separate such tokens, it performs better than full attention with lower computational complexity. Extensive experiments show that the new data in WikiWeb2M improves task performance compared to prior work.

CASE · Taxonomy · NLP · 可辨認的 · MoDELS ·

2023 年 10 月 19 日

From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification

Shanshan Xu,Santosh T. Y. S. S,Oana Ichim,Isabella Risini,Barbara Plank,Matthias Grabmair

from arxiv, Accepted to EMNLP 2023

In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well-known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset RAVE: Rationale Variation in ECHR1, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level task-independent taxonomy, supplemented with COC-specific subcategories. To our knowledge, this is the first work in the legal NLP that focuses on human label variation. We quantitatively assess different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. We further assess the explainablility of SOTA COC models on RAVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case's facts supposedly relevant to its outcome.

無偏 · Weight · 路徑 · 多樣性 · 哈爾濱工業大學（HIT） ·

2023 年 10 月 19 日

Toward Unbiased Multiple-Target Fuzzing with Path Diversity

Huanyao Rong,Wei You,Xiaofeng Wang,Tianhao Mao

In this paper, we propose a novel directed fuzzing solution named AFLRun, which features target path-diversity metric and unbiased energy assignment. Firstly, we develop a new coverage metric by maintaining extra virgin map for each covered target to track the coverage status of seeds that hit the target. This approach enables the storage of waypoints into the corpus that hit a target through interesting path, thus enriching the path diversity for each target. Additionally, we propose a corpus-level energy assignment strategy that guarantees fairness for each target. AFLRun starts with uniform target weight and propagates this weight to seeds to get a desired seed weight distribution. By assigning energy to each seed in the corpus according to such desired distribution, a precise and unbiased energy assignment can be achieved. We built a prototype system and assessed its performance using a standard benchmark and several extensively fuzzed real-world applications. The evaluation results demonstrate that AFLRun outperforms state-of-the-art fuzzers in terms of vulnerability detection, both in quantity and speed. Moreover, AFLRun uncovers 29 previously unidentified vulnerabilities, including 8 CVEs, across four distinct programs.

估計/估計量 · 控制器 · 在線 · 機器人 · 可約的 ·

2023 年 10 月 19 日

Object-Aware Impedance Control for Human-Robot Collaborative Task with Online Object Parameter Estimation

Jinseong Park,Yong-Sik Shin,Sanghyun Kim

from arxiv, 11 pages, 5 figures, for associated video, see //youtu.be/bGH6GAFlRgA?si=wXj_SRzEE8BYoV2a

Physical human-robot interactions (pHRIs) can improve robot autonomy and reduce physical demands on humans. In this paper, we consider a collaborative task with a considerably long object and no prior knowledge of the object's parameters. An integrated control framework with an online object parameter estimator and a Cartesian object-aware impedance controller is proposed to realize complicated scenarios. During the transportation task, the object parameters are estimated online while a robot and human lift an object. The perturbation motion is incorporated into the null space of the desired trajectory to enhance the estimator accuracy. An object-aware impedance controller is designed using the real-time estimation results to effectively transmit the intended human motion to the robot through the object. Experimental demonstrations of collaborative tasks, including object transportation and assembly tasks, are implemented to show the effectiveness of our proposed method.

稀疏 · Ray · Microsoft Surface · 偽標記 · INFORMS ·

2023 年 10 月 18 日

Geometry-Guided Ray Augmentation for Neural Surface Reconstruction with Sparse Views

Jiawei Yao,Chen Wang,Tong Wu,Chuming Li

In this paper, we propose a novel method for 3D scene and object reconstruction from sparse multi-view images. Different from previous methods that leverage extra information such as depth or generalizable features across scenes, our approach leverages the scene properties embedded in the multi-view inputs to create precise pseudo-labels for optimization without any prior training. Specifically, we introduce a geometry-guided approach that improves surface reconstruction accuracy from sparse views by leveraging spherical harmonics to predict the novel radiance while holistically considering all color observations for a point in the scene. Also, our pipeline exploits proxy geometry and correctly handles the occlusion in generating the pseudo-labels of radiance, which previous image-warping methods fail to avoid. Our method, dubbed Ray Augmentation (RayAug), achieves superior results on DTU and Blender datasets without requiring prior training, demonstrating its effectiveness in addressing the problem of sparse view reconstruction. Our pipeline is flexible and can be integrated into other implicit neural reconstruction methods for sparse views.

CASE · Taxonomy · 數據集 · NLP · 可辨認的 ·

2023 年 10 月 18 日

From Dissonance to Insights: Dissecting Disagreements in Rationale Dataset Construction for Case Outcome Classification

Shanshan Xu,Santosh T. Y. S. S,Oana Ichim,Isabella Risini,Barbara Plank,Matthias Grabmair

from arxiv, Accepted to EMNLP 2023

In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well-known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset RAVE: Rationale Variation in ECHR1, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level task-independent taxonomy, supplemented with COC-specific subcategories. To our knowledge, this is the first work in the legal NLP that focuses on human label variation. We quantitatively assess different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. We further assess the explainablility of SOTA COC models on RAVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case's facts supposedly relevant to its outcome.

確切的 · STOC · 稀疏 · 近似 · 相互獨立的 ·

2023 年 10 月 17 日

Simpler Reductions from Exact Triangle

Timothy M. Chan,Yinzhan Xu

from arxiv, To appear in SOSA 2024

In this paper, we provide simpler reductions from Exact Triangle to two important problems in fine-grained complexity: Exact Triangle with Few Zero-Weight $4$-Cycles and All-Edges Sparse Triangle. Exact Triangle instances with few zero-weight $4$-cycles was considered by Jin and Xu [STOC 2023], who used it as an intermediate problem to show $3$SUM hardness of All-Edges Sparse Triangle with few $4$-cycles (independently obtained by Abboud, Bringmann and Fischer [STOC 2023]), which is further used to show $3$SUM hardness of a variety of problems, including $4$-Cycle Enumeration, Offline Approximate Distance Oracle, Dynamic Approximate Shortest Paths and All-Nodes Shortest Cycles. We provide a simple reduction from Exact Triangle to Exact Triangle with few zero-weight $4$-cycles. Our new reduction not only simplifies Jin and Xu's previous reduction, but also strengthens the conditional lower bounds from being under the $3$SUM hypothesis to the even more believable Exact Triangle hypothesis. As a result, all conditional lower bounds shown by Jin and Xu [STOC 2023] and by Abboud, Bringmann and Fischer [STOC 2023] using All-Edges Sparse Triangle with few $4$-cycles as an intermediate problem now also hold under the Exact Triangle hypothesis. We also provide two alternative proofs of the conditional lower bound of the All-Edges Sparse Triangle problem under the Exact Triangle hypothesis, which was originally proved by Vassilevska Williams and Xu [FOCS 2020]. Both of our new reductions are simpler, and one of them is also deterministic -- all previous reductions from Exact Triangle or 3SUM to All-Edges Sparse Triangle (including P\u{a}tra\c{s}cu's seminal work [STOC 2010]) were randomized.

FRN · INFORMS · Networking · MoDELS · 學成 ·

2021 年 4 月 12 日

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

Delian Ruan, YanYan,Shenqi Lai,Zhenhua Chai,Chunhua Shen,Hanzi Wang

from arxiv, IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (CVPR 2021)

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

元學習 · 語音識別 · MAML · 學成 · 端到端 ·

2019 年 10 月 26 日

Meta Learning for End-to-End Low-Resource Speech Recognition

Jui-Yang Hsu,Yuan-Jui Chen,Hung-yi Lee

from arxiv, 5 pages, submitted to ICASSP 2020

In this paper, we proposed to apply meta learning approach for low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on unseen target language, via recently proposed model-agnostic meta learning algorithm (MAML). We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, since MAML's model-agnostic property, this paper also opens new research direction of applying meta learning to more speech-related applications.

亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

相關內容