
Due to their robustness and flexibility, neural-driven beamformers are a popular choice for speech separation in challenging environments with a varying number of simultaneous speakers alongside noise and reverberation. Time-frequency masks and the directions of the speakers relative to a fixed spatial grid can be used to estimate the beamformer's parameters. Speaker independence is achieved, to some degree, by ensuring a greater number of spatial partitions than speech sources. In this work, we analyze how to encode both mask and position into such a grid to enable joint estimation of both quantities. We propose mask-weighted spatial likelihood coding and show that it achieves strong performance in both tasks compared to baseline encodings optimized for either localization or mask estimation alone. In the same setup, we demonstrate its superiority for joint estimation of both quantities. Finally, we propose a universal approach that can replace an upstream sound source localization system solely by adapting the training framework, making it highly relevant in performance-critical scenarios.
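
As a rough illustration of the idea, the sketch below constructs one plausible mask-weighted spatial training target: each partition of a fixed direction grid carries the time-frequency mask of the speaker nearest to it. The function name, grid layout, and tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: encode TF masks and speaker positions into one grid target.
import numpy as np

def mask_weighted_spatial_target(masks, doas, grid_deg):
    """masks: (S, T, F) time-frequency masks, one per speaker.
    doas: (S,) speaker directions of arrival in degrees.
    grid_deg: (G,) fixed spatial grid of candidate directions.
    Returns a (G, T, F) target: each grid partition carries the mask of the
    speaker assigned to it, and zero if no speaker maps to that partition."""
    G = len(grid_deg)
    target = np.zeros((G,) + masks.shape[1:])
    for s in range(masks.shape[0]):
        # Assign each speaker to its nearest spatial partition.
        g = np.argmin(np.abs(grid_deg - doas[s]))
        # Weight that partition's spatial likelihood by the speaker's TF mask,
        # so mask estimation and localization share a single encoding.
        target[g] = np.maximum(target[g], masks[s])
    return target

rng = np.random.default_rng(0)
masks = rng.random((2, 50, 257))       # two speakers, 50 frames, 257 bins
doas = np.array([30.0, 120.0])         # ground-truth directions
grid = np.linspace(0.0, 180.0, 19)     # 19 partitions on a half-circle
print(mask_weighted_spatial_target(masks, doas, grid).shape)   # (19, 50, 257)
```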

Related Content

Agglomerative models have recently emerged as a powerful approach to training vision foundation models, leveraging multi-teacher distillation from existing models such as CLIP, DINO, and SAM. This strategy enables the efficient creation of robust models, combining the strengths of individual teachers while significantly reducing computational and resource demands. In this paper, we thoroughly analyze state-of-the-art agglomerative models, identifying critical challenges including resolution mode shifts, teacher imbalance, idiosyncratic teacher artifacts, and an excessive number of output tokens. To address these issues, we propose several novel solutions: multi-resolution training, mosaic augmentation, and improved balancing of teacher loss functions. Additionally, in the context of Vision Language Models, we introduce a token compression technique to maintain high-resolution information within a fixed token count. We release our top-performing models, available in multiple scales (-B, -L, -H, and -g), alongside inference code and pretrained weights.
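
The sketch below shows the general shape of multi-teacher distillation with explicit loss balancing, one of the fixes described above. The module names, per-teacher heads, cosine loss, and weights are illustrative assumptions; they are not the paper's released implementation.

```python
# Hypothetical sketch: agglomerative distillation with per-teacher loss weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgglomerativeStudent(nn.Module):
    def __init__(self, dim=768, teacher_dims=(512, 768, 256)):
        super().__init__()
        self.backbone = nn.Linear(3 * 16 * 16, dim)   # stand-in for a ViT patch encoder
        # One lightweight head per teacher maps the shared student features
        # into that teacher's embedding space.
        self.heads = nn.ModuleList(nn.Linear(dim, d) for d in teacher_dims)

    def forward(self, patches):
        feats = self.backbone(patches)
        return [head(feats) for head in self.heads]

def balanced_distill_loss(student_outs, teacher_outs, weights):
    # Re-weighting keeps any single teacher from dominating the gradients.
    loss = 0.0
    for s, t, w in zip(student_outs, teacher_outs, weights):
        loss = loss + w * (1 - F.cosine_similarity(s, t, dim=-1)).mean()
    return loss

patches = torch.randn(4, 196, 3 * 16 * 16)   # batch of patchified images
student = AgglomerativeStudent()
teachers = [torch.randn(4, 196, d) for d in (512, 768, 256)]  # mocked frozen teacher outputs
loss = balanced_distill_loss(student(patches), teachers, weights=(1.0, 1.0, 0.5))
loss.backward()
print(float(loss))
```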

We introduce RLT4Rec, a new sequential transformer reinforcement learning architecture, and demonstrate that it achieves excellent performance in a range of item recommendation tasks. RLT4Rec uses a relatively simple transformer architecture that takes the user's (item, rating) history as input and outputs the next item to present to the user. Unlike existing RL approaches, there is no need to input a state observation or estimate. RLT4Rec handles new and established users within the same consistent framework and automatically balances the "exploration" needed to discover a new user's preferences with the "exploitation" that is more appropriate for established users. Training of RLT4Rec is robust, fast, and insensitive to the choice of training data: the model learns to generate "good" personalised sequences that users tend to rate highly even when trained on "bad" data.
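
A minimal sketch of the kind of sequence model the abstract describes appears below: a causal transformer over summed (item, rating) embeddings whose final hidden state scores the next item. Vocabulary sizes, layer counts, and the class name are illustrative assumptions.

```python
# Hypothetical sketch: transformer over (item, rating) history -> next-item logits.
import torch
import torch.nn as nn

class ItemRatingTransformer(nn.Module):
    def __init__(self, n_items=1000, n_ratings=5, dim=64, depth=2):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.rating_emb = nn.Embedding(n_ratings, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.next_item = nn.Linear(dim, n_items)   # policy head over items

    def forward(self, items, ratings):
        # Each step is the sum of item and rating embeddings; no explicit
        # state estimate is needed, the raw history itself is the input.
        x = self.item_emb(items) + self.rating_emb(ratings)
        # Causal mask so position t only attends to the history up to t.
        mask = nn.Transformer.generate_square_subsequent_mask(items.size(1))
        h = self.encoder(x, mask=mask)
        return self.next_item(h[:, -1])            # logits for the next item

model = ItemRatingTransformer()
items = torch.randint(0, 1000, (8, 20))            # 8 users, 20 interactions each
ratings = torch.randint(0, 5, (8, 20))
print(model(items, ratings).shape)                 # torch.Size([8, 1000])
```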

Preconditioning techniques are crucial for enhancing the efficiency of solving large-scale linear equation systems that arise from partial differential equation (PDE) discretization. These techniques, such as Incomplete Cholesky factorization (IC) and data-driven neural network methods, accelerate the convergence of iterative solvers like Conjugate Gradient (CG) by approximating the original matrices. This paper introduces a novel approach that integrates Graph Neural Networks (GNNs) with traditional IC, addressing the shortcomings of direct GNN-based generation methods and achieving significant improvements in computational efficiency and scalability. Experimental results demonstrate an average reduction in iteration counts of 24.8% compared to IC and a two-order-of-magnitude increase in training scale compared to previous methods. The approach was validated on three-dimensional static structural analyses discretized with finite element methods, with training sparse matrices of up to 5 million dimensions and inference scales of up to 10 million. Furthermore, the approach demonstrates robust generalization across scales, enabling the effective acceleration of CG solvers for large-scale linear equations using small-scale data on modest hardware. The method's robustness and scalability make it a practical solution for computational science.
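
For context on the final step of such a pipeline, the sketch below runs CG with and without a preconditioner built from an incomplete factorization. SciPy's incomplete LU stands in here for the paper's GNN-corrected IC factor; the test matrix and tolerances are illustrative assumptions.

```python
# Hypothetical sketch: preconditioned CG with an incomplete-factorization preconditioner.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small SPD system from a 1D Laplacian; the paper targets FEM matrices
# with millions of rows.
n = 200
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-3)              # stand-in for the IC / GNN factor
M = spla.LinearOperator((n, n), matvec=ilu.solve)

iters = {"plain": 0, "prec": 0}
def count(key):
    def cb(xk):                                  # called once per CG iteration
        iters[key] += 1
    return cb

x_plain, _ = spla.cg(A, b, callback=count("plain"))
x_prec, _ = spla.cg(A, b, M=M, callback=count("prec"))
print(iters)                                     # preconditioning cuts iteration counts
```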

This paper investigates trajectory planning for autonomous vehicles at unsignalized intersections, focusing on scenarios where the vehicle lacks the right of way and yet must cross safely. To address this problem, we employ a method based on the Partially Observable Markov Decision Process (POMDP) framework, designed for planning under uncertainty, with the Adaptive Belief Tree (ABT) algorithm as an approximate POMDP solver. We outline the POMDP formulation, beginning with the discretization of the intersection's topology. We then present a dynamics model for predicting the evolving states of the vehicles, such as their positions and velocities, and an observation model that connects those states with the imperfect (noisy) available measurements. Our results confirm that the method plans collision-free trajectories in a series of simulations based on real-world traffic data from aerial footage of two distinct intersections. Furthermore, we study the impact of the ABT algorithm's parameter settings on the method's performance, providing guidance on reasonable parameter choices for future applications.
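
The sketch below shows the three generative ingredients such a POMDP formulation needs: a state on a discretized lane, a noisy transition model, and a noisy observation model. The noise scales, time step, conflict zone, and reward values are illustrative assumptions, not the paper's calibrated parameters.

```python
# Hypothetical sketch: generative models for an intersection-crossing POMDP.
import numpy as np

rng = np.random.default_rng(1)
DT = 0.5                        # planning step [s], assumed

def transition(state, accel):
    """state = (s, v): arc-length position along the discretized lane and
    speed. The ego action is a discrete acceleration command."""
    s, v = state
    v_next = max(0.0, v + accel * DT + rng.normal(0, 0.1))   # process noise
    s_next = s + v * DT + 0.5 * accel * DT ** 2
    return (s_next, v_next)

def observe(state):
    """Imperfect measurement: position and speed corrupted by sensor noise."""
    s, v = state
    return (s + rng.normal(0, 0.5), v + rng.normal(0, 0.2))

def reward(ego, other, conflict_zone=(20.0, 25.0)):
    """Heavily penalize both vehicles occupying the conflict zone at once."""
    in_zone = lambda x: conflict_zone[0] <= x[0] <= conflict_zone[1]
    return -1000.0 if in_zone(ego) and in_zone(other) else -1.0  # step cost

ego, other = (0.0, 5.0), (10.0, 4.0)
for _ in range(5):
    ego, other = transition(ego, 1.0), transition(other, 0.0)
    print(observe(ego), reward(ego, other))
```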

For existing near-field multiuser communications based on hybrid beamforming (HBF) architectures, high-quality effective channel estimation is required to obtain the channel state information (CSI) for the design of the digital beamformer. To simplify the system configuration and eliminate the pilot overhead required by effective channel estimation, we consider an analog-only beamforming (AoBF) architecture in this study. The AoBF is designed with the aim of maximizing the sum rate, which is then transformed into the problem of maximizing the power transmitted to the target user equipment (UE) while minimizing the power leaked to the other UEs. To solve this problem, we use beam focusing and beam nulling and propose two AoBF schemes based on the majorization-minimization (MM) algorithm. First, an AoBF scheme based on perfect CSI is proposed, focusing on beamforming performance and setting aside CSI acquisition. Then, an AoBF scheme based on imperfect CSI is proposed, where the low-dimensional imperfect CSI is obtained by beam sweeping over a near-field codebook. Simulation results demonstrate that the two AoBF schemes approach the sum rate of HBF schemes while outperforming them in terms of energy efficiency (EE).
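
One standard way to realize the "focus minus leakage" objective under a unit-modulus (analog phase shifter) constraint is a power-method-like MM iteration, sketched below: maximize w^H Q w with Q built from the target and interfering channels, shifting Q so the minorizer is valid. The channel dimensions, leakage weight mu, and iteration count are illustrative assumptions; this is a generic MM instance, not the paper's exact scheme.

```python
# Hypothetical sketch: MM iteration for unit-modulus analog beamforming.
import numpy as np

rng = np.random.default_rng(2)
N, K = 64, 3                                    # antennas, interfered UEs (assumed)
h_t = rng.normal(size=N) + 1j * rng.normal(size=N)            # target UE channel
H_o = rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))  # other UEs' channels
mu = 1.0                                        # leakage penalty weight (assumed)

# Objective: maximize w^H Q w, focusing power on the target, nulling the rest.
Q = np.outer(h_t, h_t.conj()) - mu * H_o.conj().T @ H_o
lam = np.linalg.eigvalsh(Q).min()               # shift so Q - lam*I is PSD

w = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # unit-modulus initialization
for _ in range(100):
    # MM step: the linear minorizer of w^H Q w over |w_i| = 1 is maximized
    # in closed form by aligning phases with (Q - lam*I) w.
    w = np.exp(1j * np.angle(Q @ w - lam * w))

focus = np.abs(h_t.conj() @ w) ** 2             # power toward the target UE
leak = np.linalg.norm(H_o @ w) ** 2             # power leaked to other UEs
print(focus, leak)
```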

Zero-shot optimization involves optimizing a target task that was not seen during training, aiming to provide the optimal solution with no or minimal adjustments to the optimizer; this capability is crucial for reliable and robust performance across applications. Current optimizers often struggle with zero-shot optimization and require intricate hyperparameter tuning to adapt to new tasks. To address this, we propose a Pretrained Optimization Model (POM) that leverages knowledge gained from optimizing diverse tasks, offering efficient zero-shot solutions through direct application or fine-tuning with few-shot samples. Evaluation on the BBOB benchmark and two robot control tasks demonstrates that POM outperforms state-of-the-art black-box optimization methods, especially for high-dimensional tasks. Fine-tuning POM with a small number of samples and a small budget yields significant performance improvements. Moreover, POM generalizes robustly across diverse task distributions, dimensions, population sizes, and optimization horizons. For the code implementation, see //github.com/ninja-wm/POM/.
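
To make the interface concrete, the sketch below mocks how a pretrained, population-based optimization model could be applied zero-shot: the model maps the current population and its fitness values to the next population. The network here is an untrained stand-in; POM's actual architecture, update rule, and pretrained weights are not reproduced.

```python
# Hypothetical sketch: interface of a learned population-based optimizer.
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(),
                                 nn.Linear(64, dim))

    def step(self, X, f):
        # Condition each candidate's update on its normalized fitness.
        f = (f - f.mean()) / (f.std() + 1e-8)
        inp = torch.cat([X, f.unsqueeze(-1)], dim=-1)
        return X + 0.1 * self.net(inp)           # proposed next population

def sphere(X):                                   # a BBOB-style test function
    return (X ** 2).sum(dim=-1)

dim, pop = 10, 32
opt = LearnedOptimizer(dim)                      # pretrained weights would load here
X = torch.randn(pop, dim)
with torch.no_grad():
    for _ in range(50):
        X = opt.step(X, sphere(X))
print(float(sphere(X).min()))
```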

Despite significant advancements in vision-language models (VLMs), effective approaches for enhancing response quality by scaling inference-time computation are still lacking. This capability is known to be a core step towards self-improving models in recent large language model studies. In this paper, we present the Vision Value Model (VisVM), which can guide VLM inference-time search to generate responses with better visual comprehension. Specifically, VisVM not only evaluates the quality of the sentence generated in the current search step, but also anticipates the quality of subsequent sentences that may result from it, thus providing a long-term value. In this way, VisVM steers VLMs away from generating sentences prone to hallucinations or insufficient detail, thereby producing higher-quality responses. Experimental results demonstrate that VisVM-guided search significantly enhances VLMs' ability to generate descriptive captions with richer visual details and fewer hallucinations, compared with greedy decoding and search methods with other visual reward signals. Furthermore, we find that self-training the model with VisVM-guided captions improves the VLM's performance across a wide range of multimodal benchmarks, indicating the potential for developing self-improving VLMs. Our value model and code are available at //github.com/si0wang/VisVM.
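
The search loop itself is easy to picture; the sketch below shows sentence-level, value-guided decoding in this spirit: sample several candidate sentences per step and keep the one the value model scores highest. Both models are mocked callables with assumed names; the real captioner, value network, and scores are not reproduced here.

```python
# Hypothetical sketch: value-guided sentence-level search for captioning.
import random

random.seed(0)

def vlm_sample_sentence(context):
    # Stand-in for one sentence sampled from the VLM given the caption so far.
    return f"sentence-{random.randint(0, 999)}"

def value_model(image, context, sentence):
    # Stand-in for a VisVM-like scorer: predicted long-term quality of the
    # candidate (current sentence plus the quality of likely continuations).
    return random.random()

def guided_caption(image, n_steps=4, n_candidates=5):
    context = []
    for _ in range(n_steps):
        candidates = [vlm_sample_sentence(context) for _ in range(n_candidates)]
        best = max(candidates, key=lambda s: value_model(image, context, s))
        context.append(best)                     # greedy over value, not likelihood
    return " ".join(context)

print(guided_caption(image=None))
```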

Applications from manipulation to autonomous vehicles rely on robust and general object tracking to safely perform tasks in dynamic environments. We propose the first certifiably optimal category-level approach for simultaneous shape estimation and pose tracking of an object of known category (e.g. a car). Our approach uses 3D semantic keypoint measurements extracted from an RGB-D image sequence and formulates the estimation as a fixed-lag smoothing problem. Temporal constraints enforce the object's rigidity (fixed shape) and smooth motion according to a constant-twist motion model. The solutions to this problem are the estimates of the object's state (poses, velocities) and shape (parameterized according to the active shape model) over the smoothing horizon. Our key contribution is to show that, despite the non-convexity of the fixed-lag smoothing problem, we can solve it to certifiable optimality using a small-size semidefinite relaxation. We also present a fast outlier rejection scheme that filters out incorrect keypoint detections with shape and time compatibility tests, and we wrap our certifiable solver in a graduated non-convexity scheme. We evaluate the proposed approach on synthetic and real data, showcasing its performance in a table-top manipulation scenario and a drone-based vehicle tracking application.
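
To make the active shape model term concrete, the sketch below writes the keypoint residual that a fixed-lag smoother of this kind would minimize: keypoints are a mean shape plus a linear combination of basis shapes, transformed by the object pose. The basis size, noise level, and function names are illustrative assumptions; the paper's semidefinite relaxation and solver are not shown.

```python
# Hypothetical sketch: active-shape-model keypoint residuals for smoothing.
import numpy as np

rng = np.random.default_rng(3)
K, B = 12, 4                                     # keypoints, shape basis size (assumed)
mean_shape = rng.normal(size=(K, 3))
basis = rng.normal(size=(B, K, 3))

def shape_from_coeffs(c):
    """Active shape model: S(c) = mean_shape + sum_b c_b * basis_b."""
    return mean_shape + np.tensordot(c, basis, axes=1)

def keypoint_residuals(R, t, c, measurements):
    """Residuals of measured 3D keypoints against the posed shape; the
    smoother minimizes the sum of their squared norms over the horizon,
    subject to rigidity (one c for all frames) and smooth-motion constraints."""
    predicted = shape_from_coeffs(c) @ R.T + t
    return measurements - predicted

c_true = rng.normal(size=B)
R, t = np.eye(3), np.array([1.0, 0.0, 0.5])
meas = shape_from_coeffs(c_true) @ R.T + t + 0.01 * rng.normal(size=(K, 3))
print(np.linalg.norm(keypoint_residuals(R, t, c_true, meas)))   # small: near-perfect fit
```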

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset; and yields stronger domain generalization performance as well. Code is available at //github.com/KaiyangZhou/CoOp.
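
The conditional mechanism is compact enough to sketch: a small meta-net maps each image's feature to a token that shifts the shared learnable context vectors, so every instance gets its own prompt. Dimensions and the meta-net layout below are illustrative assumptions rather than the released code.

```python
# Hypothetical sketch: CoCoOp-style instance-conditional prompt generation.
import torch
import torch.nn as nn

class ConditionalPrompt(nn.Module):
    def __init__(self, n_ctx=4, ctx_dim=512, img_dim=512):
        super().__init__()
        # Shared learnable context vectors (as in CoOp).
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Lightweight meta-net: image feature -> one conditioning token.
        self.meta_net = nn.Sequential(nn.Linear(img_dim, img_dim // 16),
                                      nn.ReLU(),
                                      nn.Linear(img_dim // 16, ctx_dim))

    def forward(self, img_feat):
        # Each image gets its own prompt: static context + dynamic shift,
        # which is what makes the prompt less sensitive to class shift.
        bias = self.meta_net(img_feat).unsqueeze(1)       # (B, 1, ctx_dim)
        return self.ctx.unsqueeze(0) + bias               # (B, n_ctx, ctx_dim)

img_feat = torch.randn(8, 512)                    # mocked CLIP image features
prompts = ConditionalPrompt()(img_feat)
print(prompts.shape)                              # torch.Size([8, 4, 512])
```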

Conventional methods for object detection typically require a substantial amount of training data, and preparing such high-quality training data is very labor-intensive. In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. Central to our method are our Attention-RPN, Multi-Relation Detector, and Contrastive Training strategy, which exploit the similarity between the few-shot support set and the query set to detect novel objects while suppressing false detections in the background. To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations. To the best of our knowledge, this is one of the first datasets specifically designed for few-shot object detection. Once our few-shot network is trained, it can detect objects of unseen categories without further training or fine-tuning. Our method is general and has a wide range of potential applications. We achieve new state-of-the-art performance on different datasets in the few-shot setting. The dataset is available at //github.com/fanq15/Few-Shot-Object-Detection-Dataset.
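
As a rough illustration of the attention mechanism such an RPN can use, the sketch below pools the support feature to a per-channel kernel and depth-wise correlates it with the query feature map, so proposals attend to support-like locations. Channel counts, kernel size, and the function name are illustrative assumptions.

```python
# Hypothetical sketch: depth-wise support-query correlation for an attention RPN.
import torch
import torch.nn.functional as F

def attention_feature(query_feat, support_feat):
    """query_feat: (B, C, H, W) query-image features.
    support_feat: (B, C, h, w) support-image features for the novel class."""
    B, C, H, W = query_feat.shape
    # Pool the support feature into a 1x1 depth-wise kernel per channel.
    kernel = F.adaptive_avg_pool2d(support_feat, 1)       # (B, C, 1, 1)
    kernel = kernel.view(B * C, 1, 1, 1)
    x = query_feat.view(1, B * C, H, W)
    # groups=B*C performs per-channel (depth-wise) cross-correlation.
    attn = F.conv2d(x, kernel, groups=B * C).view(B, C, H, W)
    return attn                                   # attention map fed to the RPN head

query = torch.randn(2, 256, 32, 32)
support = torch.randn(2, 256, 8, 8)
print(attention_feature(query, support).shape)    # torch.Size([2, 256, 32, 32])
```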
