
Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may find snippets difficult to understand, as they lack context from the original document. In this work, we use language models to rewrite snippets from scientific documents so that they can be read on their own. First, we define the requirements and challenges for this user-facing decontextualization task, such as clarifying where edits occur and handling references to other documents. Second, we propose a framework that decomposes the task into three stages: question generation, question answering, and rewriting. Using this framework, we collect gold decontextualizations from experienced scientific article readers. We then conduct a range of experiments across state-of-the-art commercial and open-source language models to identify how to best provide missing-but-relevant information to models for our task. Finally, we develop QaDecontext, a simple prompting strategy inspired by our framework that improves over end-to-end prompting. We conclude with an analysis showing that, while rewriting is easy, question generation and answering remain challenging for today's models.
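
To make the three-stage framework concrete, here is a minimal Python sketch of such a pipeline, assuming a generic `llm(prompt)` chat-completion helper; all names are hypothetical, and this is a sketch of the idea, not the paper's released code.

```python
# Minimal sketch of a three-stage decontextualization pipeline
# (question generation -> question answering -> rewriting), assuming
# a generic `llm(prompt: str) -> str` chat-completion helper.

def generate_questions(snippet: str, llm) -> list[str]:
    """Ask the model what a reader would need clarified to understand
    the snippet outside its source document."""
    prompt = (
        "List the questions a reader would need answered to understand "
        f"this snippet on its own:\n\n{snippet}"
    )
    return [q.strip("- ") for q in llm(prompt).splitlines() if q.strip()]

def answer_questions(questions, document: str, llm) -> dict[str, str]:
    """Answer each clarifying question using the full source document."""
    return {
        q: llm(f"Using this document, answer concisely:\n{document}\n\nQ: {q}")
        for q in questions
    }

def rewrite(snippet: str, qa_pairs: dict[str, str], llm) -> str:
    """Rewrite the snippet, weaving in the retrieved context."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs.items())
    return llm(
        "Rewrite the snippet so it reads on its own, using the Q/A pairs "
        f"below for missing context:\n{context}\n\nSnippet: {snippet}"
    )
```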

Related Content

Question answering (QA) is the task of using computers to automatically answer questions posed by users in order to satisfy their information needs. Unlike existing search engines, a QA system is an advanced form of information service: instead of returning a list of documents ranked by keyword matching, it returns precise natural-language answers. In recent years, with the rapid development of artificial intelligence, automatic question answering has become a prominent research direction with broad prospects.


While numerous 3D reconstruction and novel-view synthesis methods allow for photorealistic rendering of a scene from multi-view images easily captured with consumer cameras, they bake illumination into their representations and fall short of supporting advanced applications like material editing, relighting, and virtual object insertion. The reconstruction of physically based material properties and lighting via inverse rendering promises to enable such applications. However, most inverse rendering techniques require high dynamic range (HDR) images as input, a setting that is inaccessible to most users. We present a method that recovers the physically based material properties and spatially-varying HDR lighting of a scene from multi-view, low-dynamic-range (LDR) images. We model the LDR image formation process in our inverse rendering pipeline and propose a novel optimization strategy for material, lighting, and a camera response model. We evaluate our approach on synthetic and real scenes against state-of-the-art inverse rendering methods that take either LDR or HDR input. Our method outperforms existing methods taking LDR images as input, and allows for highly realistic relighting and object insertion.
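
As a rough illustration of the LDR image-formation process such a pipeline models, here is a toy Python sketch; the gamma-style response curve is an assumption for illustration, since real camera response models are typically learned or calibrated.

```python
import numpy as np

def ldr_from_hdr(radiance: np.ndarray, exposure: float, gamma: float = 2.2):
    """Toy LDR image formation: scale scene radiance by exposure,
    apply a gamma-style camera response curve, then clip and quantize
    to 8-bit. Real camera response models are typically learned."""
    irradiance = radiance * exposure                        # sensor irradiance
    response = np.power(np.clip(irradiance, 0.0, None), 1.0 / gamma)
    ldr = np.clip(response, 0.0, 1.0)                       # saturation / clipping
    return np.round(ldr * 255.0).astype(np.uint8)
```

Inverting this lossy mapping (clipped highlights, unknown response curve) is what makes recovering HDR lighting from LDR inputs ill-posed.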

Recent advances in large language models elicit reasoning in a chain-of-thought that allows models to decompose problems in a human-like fashion. Though this paradigm improves multi-step reasoning ability in language models, it is limited by being unimodal and by being applied mainly to question-answering tasks. We claim that incorporating visual augmentation into reasoning is essential, especially for complex, imaginative tasks. Consequently, we introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to recursively bridge the logical gaps within sequential data. Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information to reduce the logical gaps in downstream tasks that benefit from temporal reasoning, and to provide interpretability into models' multi-step reasoning. We apply VCoT to the Visual Storytelling and WikiHow summarization datasets and demonstrate through human evaluation that VCoT offers novel and consistent synthetic data augmentation, outperforming chain-of-thought baselines, which can be used to enhance downstream performance.
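
A minimal sketch of recursive infilling in this spirit, assuming hypothetical `llm`, `caption`, and `generate_image` model wrappers (not the paper's actual components):

```python
# Hedged sketch of recursive visual chain-of-thought infilling between
# two adjacent steps; `caption(img)` and `generate_image(text)` stand in
# for a vision-language model and a text-to-image model (hypothetical).

def infill(step_a, step_b, depth: int, llm, caption, generate_image):
    """Recursively bridge the logical gap between two steps by asking
    the model for an intermediate step grounded in both images."""
    if depth == 0:
        return []
    prompt = (
        "Given these two consecutive steps, describe one intermediate "
        f"step that connects them:\nA: {caption(step_a)}\nB: {caption(step_b)}"
    )
    mid_text = llm(prompt)
    mid = (generate_image(mid_text), mid_text)
    # Recurse on both halves to fill finer-grained gaps.
    return (infill(step_a, mid, depth - 1, llm, caption, generate_image)
            + [mid]
            + infill(mid, step_b, depth - 1, llm, caption, generate_image))
```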

Unmanned Aerial Vehicles (UAVs) are evolving into adaptable platforms for a wide range of applications such as precise inspections, emergency response, and remote sensing. Autonomous UAV swarms require efficient and stable communication during deployment for successful mission execution. For instance, the periodic exchange of telemetry data between all swarm members provides the foundation for formation flight and collision avoidance. However, due to the mobility of the vehicles and the instability of wireless transmissions, maintaining secure and reliable all-to-all communication remains challenging. This paper investigates encrypted and authenticated multi-hop broadcast communication based on the transmission of custom IEEE 802.11 Wi-Fi data frames.
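
For illustration, a hedged Python sketch of injecting an encrypted, authenticated broadcast 802.11 data frame with scapy and AES-GCM; the key handling, hop counting, and frame layout here are assumptions for the sketch, not the paper's protocol.

```python
# Hedged sketch: encrypted, authenticated broadcast 802.11 data frame.
# Requires a wireless interface in monitor mode and root privileges.
import os
from scapy.all import RadioTap, Dot11, Raw, sendp
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY = AESGCM.generate_key(bit_length=128)   # pre-shared swarm key (assumed)
BROADCAST = "ff:ff:ff:ff:ff:ff"

def broadcast_telemetry(payload: bytes, src_mac: str, iface: str):
    nonce = os.urandom(12)
    # AES-GCM gives confidentiality plus an authentication tag in one pass.
    ciphertext = AESGCM(KEY).encrypt(nonce, payload, associated_data=None)
    frame = (RadioTap()
             / Dot11(type=2, subtype=0,                 # data frame
                     addr1=BROADCAST, addr2=src_mac, addr3=src_mac)
             / Raw(nonce + ciphertext))
    sendp(frame, iface=iface, verbose=False)            # raw frame injection
```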

The manipulation of deformable objects by robotic systems presents a significant challenge due to their complex and infinite-dimensional configuration spaces. This paper introduces a novel approach to Deformable Object Manipulation (DOM) by emphasizing the identification and manipulation of Structures of Interest (SOIs) in deformable fabric bags. We propose a bimanual manipulation framework that leverages a Graph Neural Network (GNN)-based latent dynamics model to succinctly represent and predict the behavior of these SOIs. Our approach involves constructing a graph representation from partial point cloud data of the object and learning the latent dynamics model that effectively captures the essential deformations of the fabric bag within a reduced computational space. By integrating this latent dynamics model with Model Predictive Control (MPC), we empower robotic manipulators to perform precise and stable manipulation tasks focused on the SOIs. We have validated our framework through various empirical experiments demonstrating its efficacy in bimanual manipulation of fabric bags. Our contributions not only address the complexities inherent in DOM but also provide new perspectives and methodologies for enhancing robotic interactions with deformable objects by concentrating on their critical structural elements. Experimental videos can be obtained from //sites.google.com/view/bagbot.
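
A minimal sketch of sampling-based MPC over a learned latent dynamics model, in the spirit of the framework described above; the dynamics function, cost, and shapes are stand-ins for the paper's GNN-based components.

```python
# Hedged sketch of random-shooting MPC over a learned latent dynamics
# model f(z, a) -> z'. Names and dimensions here are hypothetical.
import numpy as np

def mpc_plan(z0, dynamics, cost, horizon=10, n_samples=256, action_dim=6):
    """Sample action sequences, roll them out in latent space, and
    return the first action of the cheapest rollout."""
    actions = np.random.uniform(-1, 1, size=(n_samples, horizon, action_dim))
    total = np.zeros(n_samples)
    for i in range(n_samples):
        z = z0
        for t in range(horizon):
            z = dynamics(z, actions[i, t])   # latent one-step prediction
            total[i] += cost(z)              # e.g., distance of SOIs to target
    return actions[np.argmin(total), 0]      # receding-horizon control
```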

Integrating different functionalities, conventionally implemented as dedicated systems, into a single platform allows the available resources to be utilised more efficiently. We consider an integrated sensing and power transfer (ISAPT) system and propose the joint optimisation of the rectangular pulse-shaped transmit signal and the beamforming vector to combine sensing and wireless power transfer (WPT) functionalities efficiently. In contrast to prior works, we adopt an accurate non-linear circuit-based energy harvesting (EH) model. We formulate and solve a non-convex optimisation problem for a general number of EH receivers to maximise a weighted sum of the average harvested powers at the EH receivers while ensuring that the echo signal reflected by a sensing target (ST) has sufficient power for estimating the range to the ST with a prescribed accuracy within the considered coverage region. The average harvested power is shown to monotonically increase with the pulse duration when the average transmit power budget is sufficiently large. We discuss the trade-off between sensing performance and power transfer for the considered ISAPT system. The proposed approach significantly outperforms a heuristic baseline scheme based on a linear EH model, which linearly combines energy beamforming with the beamsteering vector in the direction of the ST as its transmit strategy.
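
Schematically, the optimisation has the following shape (our notation, not the paper's exact formulation), with beamforming vector w, pulse duration tau, weights alpha_k, non-linear harvested powers, and an echo-power constraint over the coverage region:

```latex
\begin{aligned}
\max_{\mathbf{w},\,\tau}\quad & \sum_{k=1}^{K} \alpha_k\,\bar{P}^{\mathrm{EH}}_k(\mathbf{w},\tau) \\
\text{s.t.}\quad & P^{\mathrm{echo}}(\mathbf{w},\theta) \ge P_{\min} \quad \forall\, \theta \in \Theta, \\
& \frac{\tau}{T}\,\lVert\mathbf{w}\rVert^2 \le P_{\mathrm{avg}}, \qquad 0 < \tau \le T
\end{aligned}
```

Here $T$ is the pulse repetition interval; the coupling of $\mathbf{w}$ and $\tau$ with the non-linear form of $\bar{P}^{\mathrm{EH}}_k$ is what makes the problem non-convex.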

Particle flow filters solve Bayesian inference problems by smoothly transforming a set of particles into samples from the posterior distribution. Particles move in state space under the flow of a McKean-Vlasov-Ito process. This work introduces the Variational Fokker-Planck (VFP) framework for data assimilation, a general approach that includes previously known particle flow filters as special cases. The McKean-Vlasov-Ito process that transforms particles is defined via an optimal drift that depends on the selected diffusion term. It is established that the underlying probability density, sampled by the ensemble of particles, converges to the Bayesian posterior probability density. For a finite number of particles, the optimal drift contains a regularization term that nudges particles toward becoming independent random variables. Based on this analysis, we derive computationally feasible approximate regularization approaches that penalize the mutual information between pairs of particles and avoid particle collapse. Moreover, the diffusion plays a role akin to particle rejuvenation, further alleviating particle collapse. The VFP framework is very flexible. Different assumptions on prior and intermediate probability distributions can be used to implement the optimal drift, and localization and covariance shrinkage can be applied to alleviate the curse of dimensionality. A robust implicit-explicit method is discussed for the efficient integration of stiff McKean-Vlasov-Ito processes. The effectiveness of the VFP framework is demonstrated on three progressively more challenging test problems, namely the Lorenz '63, Lorenz '96, and quasi-geostrophic equations.
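
Schematically, each particle evolves under a McKean-Vlasov-Ito process of the form (notation ours, not the paper's):

```latex
\mathrm{d}X_t = a\!\left(X_t, \pi_t\right)\mathrm{d}t
              + \sigma\!\left(X_t, \pi_t\right)\mathrm{d}W_t,
\qquad \pi_t = \mathrm{Law}(X_t),
```

where the drift and diffusion depend on the current law $\pi_t$ of the particles themselves; for a selected diffusion $\sigma$, the optimal drift $a$ is chosen so that $\pi_t$ is transported from the prior to the Bayesian posterior.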

Large language models (LLMs) with enormous numbers of pre-training tokens and parameters exhibit diverse emergent abilities, including math reasoning, code generation, and instruction following. These abilities are further enhanced by supervised fine-tuning (SFT). While the open-source community has explored ad-hoc SFT for enhancing individual capabilities, proprietary LLMs exhibit versatility across various skills. Therefore, understanding how SFT facilitates multiple abilities is paramount. In this study, we focus specifically on the interplay of data composition between mathematical reasoning, code generation, and general human-alignment abilities during SFT. We propose four intriguing research questions to explore the association between model performance and various factors, including data amount, composition ratio, model size, and SFT strategies. Our experiments reveal that distinct capabilities scale differently, and larger models generally show superior performance with the same amount of data. Mathematical reasoning and code generation consistently improve with increasing data amount, whereas general abilities plateau after roughly a thousand samples. Moreover, we observe that data composition appears to enhance various abilities under limited data conditions, yet can lead to performance conflicts when data is plentiful. Our findings also suggest that the amount of composition data influences performance more than the composition ratio. In our analysis of SFT strategies, we find that sequentially learning multiple skills risks catastrophic forgetting. Our proposed Dual-stage Mixed Fine-tuning (DMT) strategy offers a promising solution for learning multiple abilities with different scaling patterns.
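
A hedged sketch of a DMT-style data schedule as the abstract describes it: specialized data first, then general data mixed with a small replay fraction of the specialized data to limit catastrophic forgetting. The mixing fraction here is an assumption, not the paper's value.

```python
# Hedged sketch of a dual-stage mixed fine-tuning (DMT) data schedule.
import random

def dmt_stages(math, code, general, specialized_frac=0.05, seed=0):
    """Return two datasets to fine-tune on sequentially: specialized
    skills first, then general data with a small specialized replay."""
    random.seed(seed)
    stage1 = math + code                      # stage 1: specialized skills
    k = int(specialized_frac * len(general))  # small replay of stage-1 data
    stage2 = general + random.sample(stage1, min(k, len(stage1)))
    random.shuffle(stage2)
    return stage1, stage2
```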

Digital audio signal reconstruction of a lost or corrupt segment using deep learning algorithms has been explored intensively in recent years. Nevertheless, traditional methods based on linear interpolation, phase coding, and tone insertion are still in vogue. However, we found no prior work on reconstructing audio signals with a fusion of dithering, steganography, and machine learning regressors. Therefore, this paper proposes combining steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods. The results (including comparisons with SPAIN, autoregressive, deep learning-based, graph-based, and other methods) are evaluated with three different metrics. The results show that the proposed solution is effective and can enhance the reconstruction of audio signals using the side information (e.g., a latent representation) that steganography provides. Moreover, this paper proposes a novel framework for reconstruction from heavily compressed embedded audio data using halftoning (i.e., dithering) and machine learning, which we term HCR (halftone-based compression and reconstruction). This work may trigger interest in optimising this approach and/or transferring it to other domains (e.g., image reconstruction). Compared to existing methods, we show improvement in inpainting performance in terms of signal-to-noise ratio (SNR), the objective difference grade (ODG), and Hansen's audio quality metric. In particular, our proposed framework outperforms the learning-based methods (D2WGAN and SG) and the traditional statistical algorithms (e.g., SPAIN, TDC, WCP).
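
For reference, the first of the reported metrics is standard; a minimal sketch of SNR in decibels between an original signal and its reconstruction:

```python
import numpy as np

def snr_db(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Signal-to-noise ratio in dB: signal energy over residual energy."""
    noise = original - reconstructed
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))
```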

Most object recognition approaches predominantly focus on learning discriminative visual patterns while overlooking the holistic object structure. Though important, structure modeling usually requires significant manual annotation and is therefore labor-intensive. In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision into the traditional framework. We show that the recognition backbone can be substantially enhanced for more robust representation learning, without any cost in extra annotation or inference speed. Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among instances of the same category. We then design a spatial context learning module for modeling the internal structure of the object by predicting relative positions within the extent. These two modules can be easily plugged into any backbone network during training and detached at inference time. Extensive experiments show that our look-into-object approach (LIO) achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft). We also show that this learning paradigm is highly generalizable to other tasks such as object detection and segmentation (MS COCO). Project page: //github.com/JDAI-CV/LIO.
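
A hedged PyTorch sketch of a spatial-context head that predicts relative positions within the object extent; the shapes and exact architecture are assumptions, and such a head would be trained with a regression loss against ground-truth offsets, then detached at inference.

```python
# Hedged sketch: predict the relative (dx, dy) offset of each spatial
# cell from a reference cell, given flattened backbone features.
import torch
import torch.nn as nn

class SpatialContextHead(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Pairwise features (cell, reference cell) -> relative position.
        self.predict = nn.Sequential(
            nn.Linear(2 * channels, channels), nn.ReLU(),
            nn.Linear(channels, 2),            # (dx, dy)
        )

    def forward(self, feats: torch.Tensor, ref_idx: int):
        # feats: (B, H*W, C) backbone features flattened over space
        ref = feats[:, ref_idx : ref_idx + 1].expand_as(feats)
        return self.predict(torch.cat([feats, ref], dim=-1))
```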

We propose a new method for the event extraction (EE) task based on an imitation learning framework, specifically inverse reinforcement learning (IRL) via a generative adversarial network (GAN). The GAN estimates proper rewards according to the difference between the actions committed by the expert (or ground truth) and the agent across complicated states in the environment. The EE task benefits from these dynamic rewards because instances and labels vary in difficulty and the gains are expected to be diverse (e.g., an ambiguous but correctly detected trigger or argument should receive high gains), while traditional RL models usually neglect such differences and pay equal attention to all instances. Our experiments also demonstrate that the proposed framework outperforms state-of-the-art methods, without explicit feature engineering.
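
A hedged PyTorch sketch of GAN-style dynamic reward estimation in this spirit, where a discriminator scores state-action features and more expert-like actions earn higher reward; the featurization and networks are stand-ins, not the paper's architecture.

```python
# Hedged sketch: discriminator-derived reward (AIRL-style form
# log D - log(1 - D)) over state-action features.
import torch
import torch.nn as nn

class RewardEstimator(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.disc = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 1))

    def reward(self, state_action: torch.Tensor) -> torch.Tensor:
        # Higher when the discriminator believes the action is the
        # expert's; hard-but-correct decisions earn larger gains.
        d = torch.sigmoid(self.disc(state_action))
        return torch.log(d + 1e-8) - torch.log(1 - d + 1e-8)
```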
