
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, bringing out issues such as violation of constraints and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that updates the policy using an entropy-regularized policy gradient, and the dual via a quadratic-regularized gradient ascent, simultaneously. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point with a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update primal/dual variables, simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, with a linear rate. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
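The simultaneous primal-dual update can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes a single-state constrained MDP (a two-action bandit) with a hypothetical reward vector `r`, utility vector `g`, and threshold `b`, so the goal is to maximize `r @ pi` subject to `g @ pi >= b`. The policy step is written in a multiplicative-weights form of an entropy-regularized update, and the dual step is a projected gradient step on a quadratically regularized Lagrangian. The function name `rpg_pd_bandit`, the step size `eta`, the regularization weight `tau`, and the dual bound `lam_max` are illustrative choices.

```python
import numpy as np


def rpg_pd_bandit(r, g, b, eta=0.1, tau=0.05, lam_max=10.0, steps=10_000):
    """Toy regularized primal-dual iteration on a one-state constrained problem.

    Maximizes r @ pi subject to g @ pi >= b over probability vectors pi.
    All names and default values are assumptions made for illustration.
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    log_pi = np.full(r.shape, -np.log(r.size))  # start from the uniform policy
    lam = 0.0                                   # dual variable (Lagrange multiplier)

    for _ in range(steps):
        pi = np.exp(log_pi)

        # Primal step: entropy-regularized multiplicative-weights update on the
        # Lagrangian payoff r + lam * g, computed from the current (pi, lam).
        new_log_pi = (1.0 - eta * tau) * log_pi + eta * (r + lam * g)
        new_log_pi -= np.log(np.sum(np.exp(new_log_pi)))  # renormalize

        # Dual step: gradient step on the quadratically regularized Lagrangian,
        # projected onto [0, lam_max]; it also uses the current (pi, lam).
        new_lam = (1.0 - eta * tau) * lam - eta * (g @ pi - b)

        # Both variables are updated simultaneously (single time scale).
        log_pi = new_log_pi
        lam = float(np.clip(new_lam, 0.0, lam_max))

    return np.exp(log_pi), lam


# Hypothetical instance: action 0 earns reward, action 1 earns utility,
# and at least half of the probability mass must go to action 1.
pi, lam = rpg_pd_bandit(r=[1.0, 0.0], g=[0.0, 1.0], b=0.5)
print("policy:", pi, "multiplier:", lam)
```

In this toy setting the iterates settle near a regularized saddle point, so the constraint is met only up to a small bias controlled by `tau`.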

Related content

Policy iteration


Performer · Estimation/Estimator · INFORMS · Noise · University

August 10, 2023

Universal Performance Bounds for Joint Self-Interference Cancellation and Data Detection in Full-Duplex Communications

Meng He, Chuan Huang

This paper studies joint digital self-interference (SI) cancellation and data detection in an orthogonal-frequency-division-multiplexing (OFDM) full-duplex (FD) system, considering the effect of phase noise introduced by the oscillators at both the local transmitter and receiver. In particular, a universal iterative two-stage joint SI cancellation and data detection framework is considered, and its performance bound, independent of any specific estimation and detection methods, is derived. First, the channel and phase noise estimation mean square error (MSE) lower bounds in each iteration are derived by analyzing the Fisher information of the received signal. Then, by substituting the derived MSE lower bound into the SINR expression, which is related to the channel and phase noise estimation MSE, the SINR upper bound in each iteration is computed. Finally, by exploiting the SINR upper bound and the transition information of the detection errors between two adjacent iterations, the universal bit error rate (BER) lower bound for data detection is derived.

INTERACT · INFORMS · Transformation · Pairwise · Vim

August 9, 2023

Joint-Relation Transformer for Multi-Person Motion Prediction

Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya Zhang, Yanfeng Wang

Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people. Transformer-based methods have shown promising results on this task, but they miss the explicit relation representation between joints, such as skeleton structure and pairwise distance, which is crucial for accurate interaction modeling. In this paper, we propose the Joint-Relation Transformer, which utilizes relation information to enhance interaction modeling and improve future motion prediction. Our relation information contains the relative distance and the intra-/inter-person physical constraints. To fuse relation and joint information, we design a novel joint-relation fusion layer with relation-aware attention to update both features. Additionally, we supervise the relation information by forecasting future distance. Experiments show that our method achieves a 13.4% improvement in 900 ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3 s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.

Graph attention network · Speaker recognition · Attention · Performer · Graph

August 9, 2023

Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation

Zirui Ge, Xinzhou Xu, Haiyan Guo, Tingting Wang, Zhen Yang

from arxiv, 9 pages, 4 figures

The emergence of self-supervised representation (e.g., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of the representation requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Despite improved strategies that consider graph learning and graph attention factors, non-injective aggregation still exists in these approaches, which may affect speaker recognition performance. In this regard, we propose a speaker recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on self-supervised representation. The proposed approach contains three modules, namely representation learning, graph attention, and aggregation, jointly considering learning on the self-supervised representation and the IsoGAT. We then perform experiments on speaker recognition tasks using the VoxCeleb1&2 datasets; the results demonstrate the recognition performance of the proposed approach in comparison with existing pooling approaches on the self-supervised representation.

Homomorphic encryption · MoDELS · Learning · Machine Learning · Extensibility

August 9, 2023

Communication-Efficient Search under Fully Homomorphic Encryption for Federated Machine Learning

Dongfang Zhao

Homomorphic encryption (HE) has found extensive utilization in federated learning (FL) systems, capitalizing on its dual advantages: (i) ensuring the confidentiality of shared models contributed by participating entities, and (ii) enabling algebraic operations directly on ciphertexts representing encrypted models. Particularly, the approximate fully homomorphic encryption (FHE) scheme, known as CKKS, has emerged as the de facto encryption scheme, notably supporting decimal numbers. While recent research predominantly focuses on enhancing CKKS's encryption rate and evaluation speed in the context of FL, the search operation has been relatively disregarded due to the tendency of some applications to discard intermediate encrypted models. Yet, emerging studies emphasize the importance of managing and searching intermediate models for specific applications like large-scale scientific computing, necessitating robust data provenance and auditing support. To address this, our paper introduces an innovative approach that efficiently searches for a target encrypted value, incurring only a logarithmic number of network interactions. The proposed method capitalizes on CKKS's additive and multiplicative properties on encrypted models, propagating equality comparisons between values through a balanced binary tree structure to ultimately reach a single aggregate. A comprehensive analysis of the proposed algorithm underscores its potential to significantly broaden FL's applicability and impact.
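One way to picture the tree-structured aggregation is with plaintext integers standing in for ciphertexts. The sketch below is an assumption about how such a search could be organized, not the paper's protocol: each value is compared to the target by subtraction (an additive operation), and the differences are multiplied pairwise up a balanced binary tree (a multiplicative operation), so the single aggregate at the root is zero exactly when some value equals the target. The function name `tree_search` and the use of a product as the aggregate are hypothetical. No encryption library is involved, and with an approximate scheme such as CKKS the final zero test would need a tolerance.

```python
def tree_search(values, target):
    """Plaintext stand-in for a tree-structured equality search.

    Returns (found, rounds), where rounds is the number of tree levels,
    i.e. the number of pairwise-combination rounds needed.
    """
    if not values:
        return False, 0

    # Leaf level: a difference is zero exactly where the value matches.
    level = [v - target for v in values]
    rounds = 0

    # Combine neighbours pairwise until a single aggregate remains.
    while len(level) > 1:
        combined = [level[i] * level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            combined.append(level[-1])  # odd element is carried up unchanged
        level = combined
        rounds += 1

    # The root is zero if and only if at least one leaf was zero.
    return level[0] == 0, rounds


data = [3, 7, 1, 9, 4, 8, 2, 6]
print(tree_search(data, 9))  # a value that is present
print(tree_search(data, 5))  # a value that is absent
```

For eight values the loop runs three rounds, which is the logarithmic depth the abstract refers to; the sketch does not model the network interactions themselves.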

Networking · Extensibility · INTERACT · Design · Performer

August 8, 2023

Two-Stream Regression Network for Dental Implant Position Prediction

Xinquan Yang, Xuguang Li, Xuechen Li, Wenting Chen, Linlin Shen, Xin Li, Yongqiang Deng

In implant prosthesis treatment, the design of the surgical guide relies heavily on manually locating the implant position, which is subjective and dependent on the doctor's experience. Although deep learning based methods have started to be applied to this problem, the spacing between teeth varies, and some gaps may present texture characteristics similar to those of the actual implant region. Both problems pose a big challenge for implant position prediction. In this paper, we develop a two-stream implant position regression framework (TSIPR), which consists of an implant region detector (IRD) and a multi-scale patch embedding regression network (MSPENet), to address this issue. For the training of IRD, we extend the original annotation to provide additional supervisory information, which contains much richer characteristics and does not introduce extra labeling costs. A multi-scale patch embedding module is designed for the MSPENet to adaptively extract features from images with various tooth spacing. The global-local feature interaction block is designed to build the encoder of MSPENet, which combines the transformer and convolution for enriched feature representation. During inference, the RoI mask extracted from the IRD is used to refine the prediction results of the MSPENet. Extensive experiments on a dental implant dataset through five-fold cross-validation demonstrate that the proposed TSIPR achieves superior performance to existing methods.

Learning · Agent · INTERACT · Deep reinforcement learning · motivation

August 2, 2022

Deep Reinforcement Learning for Multi-Agent Interaction

Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht

from arxiv, Published in AI Communications Special Issue on Multi-Agent Systems Research in the UK

The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.

Estimation/Estimator · contrastive · INFORMS · Mutual information · Representation learning

June 25, 2021

Decomposed Mutual Information Estimation for Contrastive Representation Learning

Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet

from arxiv, ICML 2021

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
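The decomposition described above rests on the chain rule of mutual information. As a sketch, assume two views $x$ and $y$ and a single subview $x'$ that is a deterministic function of $x$ (for example, a more heavily masked version of it). These symbols are chosen here for illustration and are not taken from the abstract.

```latex
\begin{align}
I(x; y) &= I(x, x'; y) && \text{since } x' \text{ is a function of } x \\
        &= I(x'; y) + I(x; y \mid x') && \text{chain rule of mutual information}
\end{align}
```

Both terms on the right are non-negative and sum to the total, so each is at most $I(x; y)$, which is consistent with the abstract's point that smaller terms are easier to approximate with contrastive bounds. Repeating the split over a sequence of progressively more informed subviews gives one unconditional term and several conditional terms.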

Graph · Networking · INTERACT · INFORMS · GPU

November 25, 2020

Time-Series Event Prediction with Evolutionary State Graph

Wenjie Hu, Yang Yang, Ziqiang Cheng, Carl Yang, Xiang Ren

from arxiv, A long version of EvoNet (WSDM 2021)

The accurate and interpretable prediction of future events in time-series data often requires capturing the representative patterns (referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present the evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) along time. We conduct analysis on the dynamic graphs constructed from the time-series data and show that changes in the graph structures (e.g., edges connecting certain state nodes) can inform the occurrences of events (i.e., time-series fluctuation). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, Evolutionary State Graph Network models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results based on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.

Question answering · MoDELS · Networking · Processing (programming language) · state-of-the-art

June 1, 2018

An Interpretable Reasoning Network for Multi-Relation Question Answering

Mantong Zhou, Minlie Huang, Xiaoyan Zhu

from arxiv, COLING 2018, 13pages

Multi-relation Question Answering is a challenging task because it requires elaborate analysis of questions and reasoning over multiple fact triples in a knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.

Image classification · Generative adversarial networks · Networking · Unlabeled · GANs

February 10, 2018

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Zilong Zhong, Jonathan Li

from arxiv, Accepted by AAAI-18

High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small amount of training data.