亚洲黄色网站不卡免费_欧美丰满美乳XXⅩ高潮WWW_国产日韩欧美亚洲综合专区_91国语精品自产拍在线_亚洲不卡在线精品国产高清_日韩精品综合_亚洲不卡一区二区三区

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this paper, we study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence. By connecting the constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance and establish its exponential convergence rate. We also establish a Central Limit Theory for Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates. Moreover, we provide an explicit expansion of the asymptotic bias of the averaged iterate in stepsize. Specifically, the bias is proportional to the stepsize up to higher-order terms and we provide an explicit expression for the linear coefficient. This precise characterization of the bias allows the application of Richardson-Romberg (RR) extrapolation technique to construct a new estimate that is provably closer to the optimal Q function. Numerical results corroborate our theoretical finding on the improvement of the RR extrapolation method.

相關內容

有偏

關注 0

多峰值 · SOTA · Extensibility · INFORMS · state-of-the-art ·

2024 年 3 月 7 日

RED-DOT: Multimodal Fact-checking via Relevant Evidence Detection

Stefanos-Iordanis Papadopoulos,Christos Koutlis,Symeon Papadopoulos,Panagiotis C. Petrantonakis

Online misinformation is often multimodal in nature, i.e., it is caused by misleading associations between texts and accompanying images. To support the fact-checking process, researchers have been recently developing automatic multimodal methods that gather and analyze external information, evidence, related to the image-text pairs under examination. However, prior works assumed all external information collected from the web to be relevant. In this study, we introduce a "Relevant Evidence Detection" (RED) module to discern whether each piece of evidence is relevant, to support or refute the claim. Specifically, we develop the "Relevant Evidence Detection Directed Transformer" (RED-DOT) and explore multiple architectural variants (e.g., single or dual-stage) and mechanisms (e.g., "guided attention"). Extensive ablation and comparative experiments demonstrate that RED-DOT achieves significant improvements over the state-of-the-art (SotA) on the VERITE benchmark by up to 33.7%. Furthermore, our evidence re-ranking and element-wise modality fusion led to RED-DOT surpassing the SotA on NewsCLIPings+ by up to 3% without the need for numerous evidence or multiple backbone encoders. We release our code at: //github.com/stevejpapad/relevant-evidence-detection

近似 · 離散化 · 正交 · CASE · 稀疏 ·

2024 年 3 月 6 日

The Lanczos Tau Framework for Time-Delay Systems: Padé Approximation and Collocation Revisited

Evert Provoost,Wim Michiels

from arxiv, 19 pages, 4 figures

We reformulate the Lanczos tau method for the discretization of time-delay systems in terms of a pencil of operators, allowing for new insights into this approach. As a first main result, we show that, for the choice of a shifted Legendre basis, this method is equivalent to Pad\'e approximation in the frequency domain. We illustrate that Lanczos tau methods straightforwardly give rise to sparse, self nesting discretizations. Equivalence is also demonstrated with pseudospectral collocation, where the non-zero collocation points are chosen as the zeroes of orthogonal polynomials. The importance of such a choice manifests itself in the approximation of the $H^2$-norm, where, under mild conditions, super-geometric convergence is observed and, for a special case, super convergence is proved; both significantly faster than the algebraic convergence reported in previous work.

回合 · Networking · Learning · 深度強化學習 · 強化學習 ·

2024 年 3 月 6 日

Deep Reinforcement Learning with Task-Adaptive Retrieval via Hypernetwork

Yonggang Jin,Chenxu Wang,Tianyu Zheng,Liuyu Xiang,Yaodong Yang,Junge Zhang,Jie Fu,Zhaofeng He

Deep reinforcement learning algorithms are usually impeded by sampling inefficiency, heavily depending on multiple interactions with the environment to acquire accurate decision-making capabilities. In contrast, humans rely on their hippocampus to retrieve relevant information from past experiences of relevant tasks, which guides their decision-making when learning a new task, rather than exclusively depending on environmental interactions. Nevertheless, designing a hippocampus-like module for an agent to incorporate past experiences into established reinforcement learning algorithms presents two challenges. The first challenge involves selecting the most relevant past experiences for the current task, and the second challenge is integrating such experiences into the decision network. To address these challenges, we propose a novel method that utilizes a retrieval network based on task-conditioned hypernetwork, which adapts the retrieval network's parameters depending on the task. At the same time, a dynamic modification mechanism enhances the collaborative efforts between the retrieval and decision networks. We evaluate the proposed method across various tasks within a multitask scenario in the Minigrid environment. The experimental results demonstrate that our proposed method significantly outperforms strong baselines.

邊 · MoDELS · INTERACT · 損失函數（機器學習） · HTTPS ·

2024 年 3 月 6 日

RouteExplainer: An Explanation Framework for Vehicle Routing Problem

Daisuke Kikuta,Hiroki Ikeuchi,Kengo Tajiri,Yuusuke Nakano

from arxiv, Accepted at PAKDD 2024. This extended version includes more comprehensive explanations and appendices

The Vehicle Routing Problem (VRP) is a widely studied combinatorial optimization problem and has been applied to various practical problems. While the explainability for VRP is significant for improving the reliability and interactivity in practical VRP applications, it remains unexplored. In this paper, we propose RouteExplainer, a post-hoc explanation framework that explains the influence of each edge in a generated route. Our framework realizes this by rethinking a route as the sequence of actions and extending counterfactual explanations based on the action influence model to VRP. To enhance the explanation, we additionally propose an edge classifier that infers the intentions of each edge, a loss function to train the edge classifier, and explanation-text generation by Large Language Models (LLMs). We quantitatively evaluate our edge classifier on four different VRPs. The results demonstrate its rapid computation while maintaining reasonable accuracy, thereby highlighting its potential for deployment in practical applications. Moreover, on the subject of a tourist route, we qualitatively evaluate explanations generated by our framework. This evaluation not only validates our framework but also shows the synergy between explanation frameworks and LLMs. See //ntt-dkiku.github.io/xai-vrp for our code, datasets, models, and demo.

Performer · 分離的 · Reverberation · 位置編碼 · 層 ·

2024 年 3 月 6 日

CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

Vahid Ahmadi Kalkhorani,DeLiang Wang

from arxiv, 9 pages

We introduce CrossNet, a complex spectral mapping approach to speaker separation and enhancement in reverberant and noisy conditions. The proposed architecture comprises an encoder layer, a global multi-head self-attention module, a cross-band module, a narrow-band module, and an output layer. CrossNet captures global, cross-band, and narrow-band correlations in the time-frequency domain. To address performance degradation in long utterances, we introduce a random chunk positional encoding. Experimental results on multiple datasets demonstrate the effectiveness and robustness of CrossNet, achieving state-of-the-art performance in tasks including reverberant and noisy-reverberant speaker separation. Furthermore, CrossNet exhibits faster and more stable training in comparison to recent baselines. Additionally, CrossNet's high performance extends to multi-microphone conditions, demonstrating its versatility in various acoustic scenarios.

優化器 · 動量 · 情景 · 方差減小 · 隨機梯度下降 ·

2024 年 3 月 5 日

Non-Convex Stochastic Composite Optimization with Polyak Momentum

Yuan Gao,Anton Rodomanov,Sebastian U. Stich

The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is notoriously known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e. when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of the Polyak momentum in the composite optimization setting and we show the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results.

prototype · 噪聲 · 可辨認的 · Performer · 圖像分割 ·

2024 年 3 月 5 日

ProCNS: Progressive Prototype Calibration and Noise Suppression for Weakly-Supervised Medical Image Segmentation

Y. Liu,L. Lin,K. K. Y. Wong,X. Tang

Weakly-supervised segmentation (WSS) has emerged as a solution to mitigate the conflict between annotation cost and model performance by adopting sparse annotation formats (e.g., point, scribble, block, etc.). Typical approaches attempt to exploit anatomy and topology priors to directly expand sparse annotations into pseudo-labels. However, due to a lack of attention to the ambiguous edges in medical images and insufficient exploration of sparse supervision, existing approaches tend to generate erroneous and overconfident pseudo proposals in noisy regions, leading to cumulative model error and performance degradation. In this work, we propose a novel WSS approach, named ProCNS, encompassing two synergistic modules devised with the principles of progressive prototype calibration and noise suppression. Specifically, we design a Prototype-based Regional Spatial Affinity (PRSA) loss to maximize the pair-wise affinities between spatial and semantic elements, providing our model of interest with more reliable guidance. The affinities are derived from the input images and the prototype-refined predictions. Meanwhile, we propose an Adaptive Noise Perception and Masking (ANPM) module to obtain more enriched and representative prototype representations, which adaptively identifies and masks noisy regions within the pseudo proposals, reducing potential erroneous interference during prototype computation. Furthermore, we generate specialized soft pseudo-labels for the noisy regions identified by ANPM, providing supplementary supervision. Extensive experiments on three medical image segmentation tasks involving different modalities demonstrate that the proposed framework significantly outperforms representative state-of-the-art methods

Performer · 表示 · state-of-the-art · 向量化 · Performance ·

2024 年 3 月 4 日

DaReNeRF: Direction-aware Representation for Dynamic Scenes

Ange Lou,Benjamin Planche,Zhongpai Gao,Yamin Li,Tianyu Luan,Hao Ding,Terrence Chen,Jack Noble,Ziyan Wu

from arxiv, Accepted at CVPR 2024. Paper + supplementary material

Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training time issues associated with methods like Neural Radiance Fields (NeRF) and implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D plane-based representations proves insufficient for re-rendering high-fidelity scenes with complex motions. In response, we present a novel direction-aware representation (DaRe) approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information. DaReNeRF computes features for each space-time point by fusing vectors from these recovered planes. Combining DaReNeRF with a tiny MLP for color regression and leveraging volume rendering in training yield state-of-the-art performance in novel view synthesis for complex dynamic scenes. Notably, to address redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach, mitigating storage issues without significant performance decline. Moreover, DaReNeRF maintains a 2x reduction in training time compared to prior art while delivering superior performance.

Learning · Processing（編程語言） · Machine Learning · 評論員 · Vision ·

2022 年 6 月 30 日

Causal Machine Learning: A Survey and Open Problems

Jean Kaddour,Aengus Lynch,Qi Liu,Matt J. Kusner,Ricardo Silva

Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This allows one to reason about the effects of changes to this process (i.e., interventions) and what would have happened in hindsight (i.e., counterfactuals). We categorize work in \causalml into five groups according to the problems they tackle: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, (5) causal reinforcement learning. For each category, we systematically compare its methods and point out open problems. Further, we review modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.

簇 · 無監督 · Re-ID · MoDELS · Extensibility ·

2021 年 12 月 3 日

Mind Your Clever Neighbours: Unsupervised Person Re-identification via Adaptive Clustering Relationship Modeling

Lianjie Jia,Chenyang Yu,Xiehao Ye,Tianyu Yan,Yinjie Lei,Pingping Zhang

from arxiv, This work has been accepted by AAAI-2022. Some modifications may be performed for the final version

Unsupervised person re-identification (Re-ID) attracts increasing attention due to its potential to resolve the scalability problem of supervised Re-ID models. Most existing unsupervised methods adopt an iterative clustering mechanism, where the network was trained based on pseudo labels generated by unsupervised clustering. However, clustering errors are inevitable. To generate high-quality pseudo-labels and mitigate the impact of clustering errors, we propose a novel clustering relationship modeling framework for unsupervised person Re-ID. Specifically, before clustering, the relation between unlabeled images is explored based on a graph correlation learning (GCL) module and the refined features are then used for clustering to generate high-quality pseudo-labels.Thus, GCL adaptively mines the relationship between samples in a mini-batch to reduce the impact of abnormal clustering when training. To train the network more effectively, we further propose a selective contrastive learning (SCL) method with a selective memory bank update policy. Extensive experiments demonstrate that our method shows much better results than most state-of-the-art unsupervised methods on Market1501, DukeMTMC-reID and MSMT17 datasets. We will release the code for model reproduction.