In this work, we propose a new transformer-based regularization to better localize objects for weakly supervised semantic segmentation (WSSS). In image-level WSSS, the Class Activation Map (CAM) is adopted to generate object localization as pseudo segmentation labels. To address the partial activation issue of CAMs, consistency regularization is employed to maintain activation intensity invariance across various image augmentations. However, such methods ignore the pair-wise relations among regions within each CAM, which capture context and should also be invariant across image views. To this end, we propose a new all-pairs consistency regularization (ACR). Given a pair of augmented views, our approach regularizes the activation intensities between the two views, while also ensuring that the affinity across regions within each view remains consistent. We adopt vision transformers, whose self-attention mechanism naturally embeds pair-wise affinity. This enables us to simply regularize the distance between the attention matrices of augmented image pairs. Additionally, we introduce a novel class-wise localization method that leverages the gradients of the class token. Our method can be seamlessly integrated into existing transformer-based WSSS methods without modifying the architectures. We evaluate our method on the PASCAL VOC and MS COCO datasets. Our method produces noticeably better class localization maps (67.3% mIoU on the PASCAL VOC train set), resulting in superior WSSS performance.
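As a hedged illustration of the two-term objective described above, the sketch below combines an intensity-consistency term over the two views' activation maps with an affinity-consistency term over their attention matrices. The function name acr_loss, the weight w_aff, and the choice of an L1 distance are assumptions for illustration only; the sketch also assumes the two augmented views have already been spatially aligned so their maps and attention matrices are comparable.

```python
import torch
import torch.nn.functional as F

def acr_loss(cam_a, cam_b, attn_a, attn_b, w_aff=1.0):
    """All-pairs consistency sketch: match CAM intensities across two
    (spatially aligned) augmented views, and match their self-attention
    matrices, which encode pair-wise region affinity.
    cam_*: (B, C, H, W) activation maps; attn_*: (B, heads, N, N)."""
    l_intensity = F.l1_loss(cam_a, cam_b)    # activation-intensity consistency
    l_affinity = F.l1_loss(attn_a, attn_b)   # all-pairs affinity consistency
    return l_intensity + w_aff * l_affinity
```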
Randomized algorithms, such as randomized sketching or projections, are a promising approach to easing the computational burden of analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs, leading to the problem of evaluating their accuracy. In this paper, we develop a statistical inference framework for quantifying the uncertainty of the outputs of randomized algorithms. We develop appropriate statistical methods -- sub-randomization, multi-run plug-in, and multi-run aggregation inference -- by using multiple runs of the same randomized algorithm, or by estimating the unknown parameters of the limiting distribution. As an example, we develop methods for statistical inference for least squares parameters via random sketching using matrices with i.i.d. entries, or uniform partial orthogonal matrices. For this, we characterize the limiting distribution of estimators obtained via sketch-and-solve as well as partial sketching methods. The analysis of i.i.d. sketches uses a trigonometric interpolation argument to establish a differential equation for the limiting expected characteristic function and to find the dependence on the kurtosis of the entries of the sketching matrix. The results are supported by a broad range of simulations.
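To make the sketch-and-solve pipeline concrete, here is a minimal NumPy sketch assuming a Gaussian i.i.d. sketching matrix (one of the i.i.d. cases covered above). The multi-run loop at the end only illustrates the general idea behind the multi-run methods, namely using the spread across independent sketches to quantify the algorithmic randomness; it is not the paper's exact construction.

```python
import numpy as np

def sketch_and_solve(X, y, m, rng):
    """Sketch-and-solve least squares: solve the m-row sketched problem
    min ||S X b - S y||_2 instead of the full n-row problem, with S
    having i.i.d. N(0, 1/m) entries."""
    n = X.shape[0]
    S = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    b_sketch, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return b_sketch

# Multiple runs with independent sketches: the variability of the
# estimates reflects the randomness introduced by the algorithm itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
y = X @ np.ones(5) + rng.normal(size=10000)
estimates = np.stack([sketch_and_solve(X, y, 500, rng) for _ in range(20)])
print(estimates.mean(axis=0), estimates.std(axis=0))
```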
This work presents a hierarchical framework for bipedal locomotion that combines a Reinforcement Learning (RL)-based high-level (HL) planner policy for the online generation of task space commands with a model-based low-level (LL) controller to track the desired task space trajectories. Unlike traditional end-to-end learning approaches, our HL policy takes insights from the angular momentum-based linear inverted pendulum (ALIP) to carefully design the observation and action spaces of the Markov Decision Process (MDP). This simple yet effective design creates an insightful mapping between a low-dimensional state that effectively captures the complex dynamics of bipedal locomotion and a set of task space outputs that shape the walking gait of the robot. The HL policy is agnostic to the task space LL controller, which increases the flexibility of the design and the generalization of the framework to other bipedal robots. This hierarchical design results in a learning-based framework with improved performance, data efficiency, and robustness compared with the ALIP model-based approach and state-of-the-art learning-based frameworks for bipedal locomotion. The proposed hierarchical controller is tested on three different robots: Rabbit, a five-link underactuated planar biped; Walker2D, a seven-link fully-actuated planar biped; and Digit, a 3D humanoid robot with 20 actuated joints. The trained policy naturally learns human-like locomotion behaviors and is able to effectively track a wide range of walking speeds while preserving the robustness and stability of the walking gait even under adversarial conditions.
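The HL/LL split can be pictured with the toy sketch below. Everything here is a hypothetical stand-in: hl_policy replaces the trained network, ll_tracker replaces the model-based task space controller, and the reduced observation (CoM velocity, angular momentum, target speed) and the foot-placement command only illustrate the kind of ALIP-inspired interface described above, not the paper's actual quantities or gains.

```python
def hl_policy(obs):
    """Stand-in for the learned HL policy: maps a reduced ALIP-style
    observation to a task space command, here a desired swing-foot
    landing offset (a trained network would replace this heuristic)."""
    com_vel, ang_momentum, v_des = obs
    return 0.3 * (com_vel - v_des) + 0.05 * ang_momentum

def ll_tracker(cmd, q, qd):
    """Stand-in for the model-based LL controller: PD tracking of the
    task space command (a real controller would use inverse dynamics)."""
    kp, kd = 50.0, 5.0
    return kp * (cmd - q) - kd * qd

obs = (0.8, 0.1, 1.0)   # CoM velocity, angular momentum, target speed
tau = ll_tracker(hl_policy(obs), q=0.0, qd=0.2)
print(tau)
```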
Despite having the same basic prophet inequality setup and model of loss aversion, the conclusions in our multi-dimensional model differ considerably from those in the one-dimensional model of Kleinberg et al. For example, Kleinberg et al. give a tight closed-form expression for the competitive ratio that an online decision-maker can achieve as a function of $\lambda$, for any $\lambda \geq 0$. In our multi-dimensional model, there is a sharp phase transition: if $k$ denotes the number of dimensions, then when $\lambda \cdot (k-1) \geq 1$, no non-trivial competitive ratio is possible. On the other hand, when $\lambda \cdot (k-1) < 1$, we give a tight bound on the achievable competitive ratio (similar to Kleinberg et al.). As another example, Kleinberg et al. uncover an exponential improvement in their competitive ratio for the random-order vs. worst-case prophet inequality problem. In our model with $k \geq 2$ dimensions, the gap is at most a constant factor. We uncover several additional key differences between the multi- and single-dimensional models.
This work studies the pure-exploration setting for the convex hull membership (CHM) problem, where one aims to efficiently and accurately determine if a given point lies in the convex hull of the means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHM problem in the one-dimensional setting. We present the first asymptotically optimal algorithm, called Thompson-CHM, whose modular design consists of a stopping rule and a sampling rule. In addition, we extend the algorithm to settings that generalize several important problems in the multi-armed bandit literature. Furthermore, we discuss the extension of Thompson-CHM to higher dimensions. Finally, we provide numerical experiments to demonstrate that the empirical behavior of the algorithm matches our theoretical results for realistic time horizons.
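In one dimension the CHM question reduces to testing whether the point lies in the interval from the smallest to the largest mean, which makes the modular stopping-rule/sampling-rule design easy to picture. The loop below is an illustrative sketch in that spirit only: the posterior-draw sampling rule, the crude 3-sigma stopping widths, and the Gaussian model are all assumptions, not the paper's exact Thompson-CHM rules.

```python
import numpy as np

def chm_1d(pull, K, x, horizon=5000, sigma=1.0):
    """Illustrative 1-D CHM loop: decide if x lies in [min_i mu_i,
    max_i mu_i]. `pull(i)` returns one reward sample from arm i."""
    rng = np.random.default_rng(0)
    n, s = np.zeros(K), np.zeros(K)
    for i in range(K):                  # initialize: one pull per arm
        s[i] += pull(i); n[i] += 1
    for _ in range(horizon):
        # Sampling rule (Thompson-style): draw posterior means and pull
        # the arm whose draw is closest to the query point x.
        draws = rng.normal(s / n, sigma / np.sqrt(n))
        i = int(np.argmin(np.abs(draws - x)))
        s[i] += pull(i); n[i] += 1
        # Stopping rule: crude confidence intervals around each mean.
        mu, half = s / n, 3.0 * sigma / np.sqrt(n)
        if np.any(mu + half <= x) and np.any(mu - half >= x):
            return True                 # means confidently straddle x
        if np.all(mu - half > x) or np.all(mu + half < x):
            return False                # all means confidently on one side
    return None                         # undecided within the horizon

means = [0.0, 1.0, 2.0]
gen = np.random.default_rng(1)
print(chm_1d(lambda i: gen.normal(means[i], 1.0), K=3, x=1.5))  # True
```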
In constrained parameter estimation, the classical constrained Cramer-Rao bound (CCRB) and the recent Lehmann-unbiased CCRB (LU-CCRB) are lower bounds on the performance of mean-unbiased and Lehmann-unbiased estimators, respectively. Both the CCRB and the LU-CCRB require differentiability of the likelihood function, which can be a restrictive assumption. Additionally, these bounds are local bounds that are inappropriate for predicting the threshold phenomena of the constrained maximum likelihood (CML) estimator. The constrained Barankin-type bound (CBTB) is a nonlocal mean-squared-error (MSE) lower bound for constrained parameter estimation that does not require differentiability of the likelihood function. However, this bound requires a restrictive mean-unbiasedness condition in the constrained set. In this work, we propose the Lehmann-unbiased CBTB (LU-CBTB) on the weighted MSE (WMSE). This bound does not require differentiability of the likelihood function and assumes uniform Lehmann-unbiasedness, which is less restrictive than the CBTB uniform mean-unbiasedness. We show that the LU-CBTB is tighter than or equal to the LU-CCRB and coincides with the CBTB for linear constraints. For nonlinear constraints, the LU-CBTB and the CBTB differ, and the LU-CBTB can be a lower bound on the WMSE of constrained estimators in cases where the CBTB is not. In the simulations, we consider direction-of-arrival estimation of an unknown constant modulus discrete signal. In this case, the likelihood function is not differentiable and constrained Cramer-Rao-type bounds do not exist, while CBTBs exist. It is shown that the LU-CBTB better predicts the CML estimator performance than the CBTB, since the CML estimator is Lehmann-unbiased but not mean-unbiased.
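For readers less familiar with the two notions contrasted above, the following block restates the standard textbook definitions of the WMSE and of Lehmann-unbiasedness with respect to a cost function; the paper's specific weighting matrix and cost are not restated here, so this is a generic reminder rather than the paper's exact setup.

```latex
% Weighted MSE of \hat{\theta} with a positive semidefinite weighting W:
\mathrm{WMSE}(\hat{\theta},\theta)
  = \mathbb{E}_{\theta}\!\left[(\hat{\theta}-\theta)^{\top} W\,
      (\hat{\theta}-\theta)\right].
% Lehmann-unbiasedness of \hat{\theta} w.r.t. a cost C: for all
% \theta,\vartheta in the (constrained) parameter set,
\mathbb{E}_{\theta}\!\left[C(\hat{\theta},\vartheta)\right]
  \;\geq\; \mathbb{E}_{\theta}\!\left[C(\hat{\theta},\theta)\right].
```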
This paper considers the problem of developing suitable behavior models of human evacuees during a robot-guided emergency evacuation. We describe our recent research developing behavior models of evacuees and potential future uses of these models. We also discuss how such behavior models can contribute to the development and design of emergency evacuation simulations in order to improve social navigation during an evacuation.
Recently, self-attention-based transformers and conformers have been introduced as alternatives to RNNs for ASR acoustic modeling. Nevertheless, the full-sequence attention mechanism is non-streamable and computationally expensive, thus requiring modifications, such as chunking and caching, for efficient streaming ASR. In this paper, we propose to apply RWKV, a variant of the linear attention transformer, to streaming ASR. RWKV combines the superior performance of transformers with the inference efficiency of RNNs, which is well suited to streaming ASR scenarios where the budget for latency and memory is restricted. Experiments at varying scales (100h to 10000h) demonstrate that RWKV-Transducer and RWKV-Boundary-Aware-Transducer achieve accuracy comparable to or even better than that of the chunk conformer transducer, with minimal latency and inference memory cost.
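To see why RWKV streams like an RNN, here is a simplified per-channel version of its time-mixing (WKV) recurrence: a constant-size state (a, b) replaces full-sequence attention, so each frame costs O(1) regardless of history length. This sketch omits the log-space numerical stabilization used in real implementations, and the scalar decay w and current-token bonus u stand in for learned per-channel parameters.

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """Simplified RWKV time-mixing in recurrent form for one channel.
    w: positive decay, u: bonus for the current token;
    k, v: (T,) key and value sequences."""
    T = len(k)
    a, b = 0.0, 0.0               # running numerator / denominator state
    out = np.zeros(T)
    for t in range(T):
        # Current token gets the extra bonus u; past tokens have been
        # decayed by e^{-w} per step via the state updates below.
        out[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return out

k = np.random.randn(10); v = np.random.randn(10)
print(wkv_recurrent(w=0.5, u=0.3, k=k, v=v))
```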
In the field of monocular 3D detection (M3D), it is common practice to utilize scene geometric clues to enhance the detector's performance. However, many existing works exploit these clues explicitly, e.g., by estimating a depth map and back-projecting it into 3D space. This explicit methodology induces sparsity in the 3D representations due to the increased dimensionality from 2D to 3D, and leads to substantial information loss, especially for distant and occluded objects. To alleviate this issue, we propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy. Specifically, we model scenes with Signed Distance Functions (SDF), facilitating the production of dense 3D representations. We treat these representations as Neural Radiance Fields (NeRF) and then employ volume rendering to recover RGB images and depth maps. To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and it demonstrates the potential of implicit reconstruction for image-based 3D perception. Extensive experiments conducted on the KITTI-3D benchmark and the Waymo Open Dataset demonstrate the effectiveness of MonoNeRD. Code is available at https://github.com/cskkxjk/MonoNeRD.
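The volume-rendering step referred to above follows the standard NeRF quadrature: per-sample opacities and accumulated transmittance yield weights that composite color and expected depth along each ray. The sketch below shows that generic quadrature only; MonoNeRD's own pipeline additionally converts SDF values into densities before rendering, which is not reproduced here.

```python
import numpy as np

def volume_render(sigma, rgb, z):
    """Standard volume-rendering quadrature along one ray.
    sigma: (N,) densities, rgb: (N, 3) colors, z: (N,) sample depths.
    Returns composited color and expected depth."""
    delta = np.diff(z, append=z[-1] + 1e10)    # inter-sample distances
    alpha = 1.0 - np.exp(-sigma * delta)       # per-segment opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    w = trans * alpha                          # per-sample weights
    color = (w[:, None] * rgb).sum(axis=0)
    depth = (w * z).sum()
    return color, depth

z = np.linspace(2.0, 40.0, 64)
sigma = np.where((z > 9) & (z < 11), 5.0, 0.0)  # a "wall" near depth 10
rgb = np.tile([0.2, 0.5, 0.8], (64, 1))
print(volume_render(sigma, rgb, z))             # depth lands near 10
```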
In this study, we propose a digital over-the-air computation (OAC) scheme for achieving continuous-valued (analog) aggregation for federated edge learning (FEEL). We show that the average of a set of real-valued parameters can be calculated approximately by using the average of the corresponding numerals, where the numerals are obtained based on a balanced number system. By exploiting this key property, the proposed scheme encodes the local stochastic gradients into a set of numerals. Next, it determines the positions of the activated orthogonal frequency division multiplexing (OFDM) subcarriers by using the values of the numerals. To eliminate the need for precise sample-level time synchronization, channel estimation overhead, and channel inversion, the proposed scheme also uses a non-coherent receiver at the edge server (ES) and does not utilize pre-equalization at the edge devices (EDs). We theoretically analyze the mean squared error (MSE) performance of the proposed scheme and the convergence rate for a non-convex loss function. To improve the test accuracy of FEEL with the proposed scheme, we introduce the concept of adaptive absolute maximum (AAM). Our numerical results show that when the proposed scheme is used with AAM for FEEL, the test accuracy can reach up to 98% for a heterogeneous data distribution.
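The key averaging property can be checked numerically: if each real value is written as balanced fractional digits, then averaging digit-wise and re-evaluating the digit expansion approximately recovers the average of the original values, since the expansion is linear in the digits. The greedy balanced-ternary encoder below is an illustrative construction for values in roughly [-0.5, 0.5], not the paper's exact encoding, and the mapping from numerals to activated OFDM subcarriers is omitted.

```python
import numpy as np

def to_balanced_digits(x, base=3, n_digits=8):
    """Encode x into balanced base-`base` fractional digits in
    {-(base-1)/2, ..., (base-1)/2} by greedy nearest-digit rounding."""
    digits, r = [], x
    for _ in range(n_digits):
        r *= base
        d = int(np.round(r))   # nearest balanced digit
        digits.append(d)
        r -= d                 # remainder stays in [-0.5, 0.5]
    return digits

xs = np.array([0.21, -0.37, 0.05, 0.40])
digit_mat = np.array([to_balanced_digits(x) for x in xs])  # (K, D)
avg_digits = digit_mat.mean(axis=0)                        # digit-wise average
powers = 3.0 ** -np.arange(1, digit_mat.shape[1] + 1)
print(avg_digits @ powers, xs.mean())                      # approximately equal
```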
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for the latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
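The decompose-reweight-mix pattern described above can be sketched in a few lines of PyTorch. All layer choices below (linear projections, sigmoid importance weights, a single mixing layer, and the sizes dim and m) are illustrative placeholders for the FDN and the two FRN relation-modeling modules, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class FDRLSketch(nn.Module):
    """Minimal sketch of the FDN/FRN idea: decompose a backbone feature
    into m latent features (shared information), reweight each one
    (intra-feature relations), then mix them into a reconstructed
    expression feature (inter-feature relations)."""
    def __init__(self, dim=512, m=9):
        super().__init__()
        self.decompose = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(m)])   # FDN stand-in
        self.intra = nn.Linear(dim, 1)                 # per-latent importance
        self.inter = nn.Linear(m * dim, dim)           # mix latents together

    def forward(self, x):                              # x: (B, dim)
        latents = [torch.relu(f(x)) for f in self.decompose]
        weighted = [torch.sigmoid(self.intra(z)) * z for z in latents]
        return self.inter(torch.cat(weighted, dim=1))  # reconstructed feature

feat = torch.randn(4, 512)
print(FDRLSketch()(feat).shape)  # torch.Size([4, 512])
```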