青柠在线观看免费高清1,99热日韩这里只有国产中文精品,日本少妇一区二区在线观看,美女扒开尿孔全身100%裸露,羞羞影院两性午夜免费视频

We propose a novel digital backpropagation (DBP) technique that combines perturbation theory, subband processing, and splitting ratio optimization. We obtain 0.23 dB, 0.47 dB, or 0.91 dB gains w.r.t. dispersion compensation with only 74, 161, or 681 real multiplications/2D-symbol, improving significantly on existing DBP techniques.

相關內容

反向傳播

關注 354

反向傳播一詞嚴格來說僅指用于計算梯度的算法，而不是指如何使用梯度。但是該術語通常被寬松地指整個學習算法，包括如何使用梯度，例如通過隨機梯度下降。反向傳播將增量計算概括為增量規則中的增量規則，該規則是反向傳播的單層版本，然后通過自動微分進行廣義化，其中反向傳播是反向累積（或“反向模式”）的特例。在機器學習中，反向傳播（backprop）是一種廣泛用于訓練前饋神經網絡以進行監督學習的算法。對于其他人工神經網絡（ANN）都存在反向傳播的一般化–一類算法，通常稱為“反向傳播”。反向傳播算法的工作原理是，通過鏈規則計算損失函數相對于每個權重的梯度，一次計算一層，從最后一層開始向后迭代，以避免鏈規則中中間項的冗余計算。

2024 年 6 月 25 日

Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

Woo-Jin Chung,Hong-Goo Kang

from arxiv, Accepted to INTERSPEECH 2024

We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern information in Electromagnetic Articulography (EMA) signals during the AAI process. We train our model using an adversarial approach and introduce an attention-based Multi-duration phoneme discriminator (MDPD) designed to fully capture the intricate relationship among multi-channel articulatory signals. Our method achieves a Pearson correlation coefficient of 0.847, marking state-of-the-art performance in speaker-independent AAI models. The implementation details and code can be found online.

圖像分割 · DIS · Processing（編程語言） · Extensibility · Performance ·

2024 年 6 月 25 日

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

Peng Zheng,Dehong Gao,Deng-Ping Fan,Li Liu,Jorma Laaksonen,Wanli Ouyang,Nicu Sebe

from arxiv, Version 5, with updated DIS performance, accuracy-efficiency comparison, and 3rd-party applications

We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance focus on regions with finer details. Furthermore, we outline practical training strategies tailored for DIS to improve map quality and training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks to evince that BiRefNet exhibits remarkable performance, outperforming task-specific cutting-edge methods across all benchmarks. Our codes are available at //github.com/ZhengPeng7/BiRefNet.

MoDELS · 獎勵函數 · 反向傳播 · 泛函 · Performer ·

2024 年 6 月 21 日

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

Kevin Clark,Paul Vicol,Kevin Swersky,David J Fleet

from arxiv, Published at ICLR 2024

We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming reinforcement learning-based approaches. We then propose more efficient variants of DRaFT: DRaFT-K, which truncates backpropagation to only the last K steps of sampling, and DRaFT-LV, which obtains lower-variance gradient estimates for the case when K=1. We show that our methods work well for a variety of reward functions and can be used to substantially improve the aesthetic quality of images generated by Stable Diffusion 1.4. Finally, we draw connections between our approach and prior work, providing a unifying perspective on the design space of gradient-based fine-tuning algorithms.

去噪 · Performer · Networking · 圖像降噪 · SR ·

2024 年 6 月 20 日

Zero-Shot Image Denoising for High-Resolution Electron Microscopy

Xuanyu Tian,Zhuoya Dong,Xiyue Lin,Yue Gao,Hongjiang Wei,Yanhang Ma,Jingyi Yu,Yuyao Zhang

from arxiv, 12 pages, 12 figures

High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we propose a super-resolution (SR) based self-supervised training strategy, incorporating the Random Sub-sampler module. The Random Sub-sampler is designed to generate approximate infinite noisy pairs from a single noisy image, serving as an effective data augmentation in zero-shot denoising. Noise2SR trains the network with paired noisy images of different resolutions, which is conducted via SR strategy. The SR-based training facilitates the network adopting more pixels for supervision, and the random sub-sampling helps compel the network to learn continuous signals enhancing the robustness. Meanwhile, we mitigate the uncertainty caused by random-sampling by adopting minimum mean squared error (MMSE) estimation for the denoised results. With the distinctive integration of training strategy and proposed designs, Noise2SR can achieve superior denoising performance using a single noisy HREM image. We evaluate the performance of Noise2SR in both simulated and real HREM denoising tasks. It outperforms state-of-the-art ZS-SSL methods and achieves comparable denoising performance with supervised methods. The success of Noise2SR suggests its potential for improving the SNR of images in material imaging domains.

Taxonomy · RecSys · 大語言模型 · Performer · Unstructured ·

2024 年 6 月 20 日

Taxonomy-Guided Zero-Shot Recommendations with LLMs

Yueqing Liang,Liangwei Yang,Chen Wang,Xiongxiao Xu,Philip S. Yu,Kai Shu

With the emergence of large language models (LLMs) and their ability to perform a variety of tasks, their application in recommender systems (RecSys) has shown promise. However, we are facing significant challenges when deploying LLMs into RecSys, such as limited prompt length, unstructured item information, and un-constrained generation of recommendations, leading to sub-optimal performance. To address these issues, we propose a novel method using a taxonomy dictionary. This method provides a systematic framework for categorizing and organizing items, improving the clarity and structure of item information. By incorporating the taxonomy dictionary into LLM prompts, we achieve efficient token utilization and controlled feature generation, leading to more accurate and contextually relevant recommendations. Our Taxonomy-guided Recommendation (TaxRec) approach features a two-step process: one-time taxonomy categorization and LLM-based recommendation, enabling zero-shot recommendations without the need for domain-specific fine-tuning. Experimental results demonstrate TaxRec significantly enhances recommendation quality compared to traditional zero-shot approaches, showcasing its efficacy as personal recommender with LLMs. Code is available at //github.com/yueqingliang1/TaxRec.

MoDELS · 圖 · Networking · 圖注意力網絡 · Performer ·

2024 年 6 月 17 日

Composing Object Relations and Attributes for Image-Text Matching

Khoi Pham,Chuong Huynh,Ser-Nam Lim,Abhinav Shrivastava

from arxiv, Accepted to CVPR'24

We study the visual semantic embedding problem for image-text matching. Most existing work utilizes a tailored cross-attention mechanism to perform local alignment across the two image and text modalities. This is computationally expensive, even though it is more powerful than the unimodal dual-encoder approach. This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges. Utilizing a graph attention network, our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system. Representing caption as a scene graph offers the ability to utilize the strong relational inductive bias of graph neural networks to learn object-attribute and object-object relations effectively. To train the model, we propose losses that align the image and caption both at the holistic level (image-caption) and the local level (image-object entity), which we show is key to the success of the model. Our model is termed Composition model for Object Relations and Attributes, CORA. Experimental results on two prominent image-text retrieval benchmarks, Flickr30K and MSCOCO, demonstrate that CORA outperforms existing state-of-the-art computationally expensive cross-attention methods regarding recall score while achieving fast computation speed of the dual encoder.

優化器 · 約束 · 機器人 · 雅克比 · 論文 ·

2024 年 6 月 17 日

Propagative Distance Optimization for Constrained Inverse Kinematics

Yu Chen,Yilin Cai,Jinyun Xu,Zhongqiang Ren,Guanya Shi,Howie Choset

This paper investigates a constrained inverse kinematic (IK) problem that seeks a feasible configuration of an articulated robot under various constraints such as joint limits and obstacle collision avoidance. Due to the high-dimensionality and complex constraints, this problem is often solved numerically via iterative local optimization. Classic local optimization methods take joint angles as the decision variable, which suffers from non-linearity caused by the trigonometric constraints. Recently, distance-based IK methods have been developed as an alternative approach that formulates IK as an optimization over the distances among points attached to the robot and the obstacles. Although distance-based methods have demonstrated unique advantages, they still suffer from low computational efficiency, since these approaches usually ignore the chain structure in the kinematics of serial robots. This paper proposes a new method called propagative distance optimization for constrained inverse kinematics (PDO-IK), which captures and leverages the chain structure in the distance-based formulation and expedites the optimization by computing forward kinematics and the Jacobian propagatively along the kinematic chain. Test results show that PDO-IK runs up to two orders of magnitude faster than the existing distance-based methods under joint limits constraints and obstacle avoidance constraints. It also achieves up to three times higher success rates than the conventional joint-angle-based optimization methods for IK problems. The high runtime efficiency of PDO-IK allows the real-time computation (10$-$1500 Hz) and enables a simulated humanoid robot with 19 degrees of freedom (DoFs) to avoid moving obstacles, which is otherwise hard to achieve with the baselines.

NeRF · 貝葉斯網/貝葉斯網絡 · INFORMS · Extensibility · Networking ·

2024 年 6 月 15 日

Neural Visibility Field for Uncertainty-Driven Active Mapping

Shangjie Xue,Jesse Dill,Pranay Mathur,Frank Dellaert,Panagiotis Tsiotras,Danfei Xu

from arxiv, Accepted to CVPR 2024. More details can be found at //sites.google.com/view/nvf-cvpr24/

This paper presents Neural Visibility Field (NVF), a novel uncertainty quantification method for Neural Radiance Fields (NeRF) applied to active mapping. Our key insight is that regions not visible in the training views lead to inherently unreliable color predictions by NeRF at this region, resulting in increased uncertainty in the synthesized views. To address this, we propose to use Bayesian Networks to composite position-based field uncertainty into ray-based uncertainty in camera observations. Consequently, NVF naturally assigns higher uncertainty to unobserved regions, aiding robots to select the most informative next viewpoints. Extensive evaluations show that NVF excels not only in uncertainty quantification but also in scene reconstruction for active mapping, outperforming existing methods.

學成 · Networking · INFORMS · Performer · Neural Networks ·

2020 年 2 月 27 日

Meta-Transfer Learning for Zero-Shot Super-Resolution

Jae Woong Soh,Sunwoo Cho,Nam Ik Cho

from arxiv, Will be presented in CVPR 2020

Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data that they are supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results. (See Figure 1). With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.

state-of-the-art · 學成 · Extensibility · INTERACT · Networking ·

2018 年 1 月 28 日

Multi-Pointer Co-Attention Networks for Recommendation

Yi Tay,Luu Anh Tuan,Siu Cheung Hui

Many recent state-of-the-art recommender systems such as D-ATT, TransNet and DeepCoNN exploit reviews for representation learning. This paper proposes a new neural architecture for recommendation with reviews. Our model operates on a multi-hierarchical paradigm and is based on the intuition that not all reviews are created equal, i.e., only a select few are important. The importance, however, should be dynamically inferred depending on the current target. To this end, we propose a review-by-review pointer-based learning scheme that extracts important reviews, subsequently matching them in a word-by-word fashion. This enables not only the most informative reviews to be utilized for prediction but also a deeper word-level interaction. Our pointer-based method operates with a novel gumbel-softmax based pointer mechanism that enables the incorporation of discrete vectors within differentiable neural architectures. Our pointer mechanism is co-attentive in nature, learning pointers which are co-dependent on user-item relationships. Finally, we propose a multi-pointer learning scheme that learns to combine multiple views of interactions between user and item. Overall, we demonstrate the effectiveness of our proposed model via extensive experiments on \textbf{24} benchmark datasets from Amazon and Yelp. Empirical results show that our approach significantly outperforms existing state-of-the-art, with up to 19% and 71% relative improvement when compared to TransNet and DeepCoNN respectively. We study the behavior of our multi-pointer learning mechanism, shedding light on evidence aggregation patterns in review-based recommender systems.