Real-world sequential decision making is characterized by sparse rewards and large decision spaces, posing significant difficulty for experiential learning systems like tabula rasa reinforcement learning (RL) agents. Large Language Models (LLMs), with a wealth of world knowledge, can help RL agents learn quickly and adapt to distribution shifts. In this work, we introduce the Language Guided Exploration (LGE) framework, which uses a pre-trained language model (called GUIDE) to provide decision-level guidance to an RL agent (called EXPLORER). We observe that on ScienceWorld (Wang et al., 2022), a challenging text environment, LGE significantly outperforms vanilla RL agents and also outperforms more sophisticated methods such as Behaviour Cloning and Text Decision Transformer.

Related content

Annotation · SOFT · Constraints · Data points · Extensibility

April 15, 2024

Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks

Yanbo Wang,Jian Liang,Ran He

from arXiv, ICLR 2024 poster

Gradient inversion attacks aim to reconstruct local training data from intermediate gradients exposed in the federated learning framework. Despite successful attacks, all previous methods, starting from reconstructing a single data point and then relaxing the single-image limit to batch level, are only tested under hard label constraints. Even for single-image reconstruction, we still lack an analysis-based algorithm to recover augmented soft labels. In this work, we change the focus from enlarging batch size to investigating the hard label constraints, considering a more realistic circumstance where label smoothing and mixup techniques are used in the training process. In particular, we are the first to propose an algorithm that simultaneously recovers the ground-truth augmented label and the input feature of the last fully-connected layer from single-input gradients, and we provide a necessary condition for any analysis-based label recovery method. Extensive experiments testify to the label recovery accuracy, as well as the benefits to the subsequent image reconstruction. We believe soft labels in classification tasks are worth further attention in gradient inversion attacks.
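To make the setting concrete, the sketch below uses a textbook identity rather than the paper's algorithm. For a fully-connected layer followed by softmax cross-entropy, the bias gradient equals `p - y` and the weight gradient equals `(p - y) x^T`, so dividing a row of the weight gradient by the matching bias gradient returns the input `x`. The sketch assumes the layer has a bias term and that the attacker knows `W` and `b`; all function names and numbers are illustrative.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fc_forward(W, b, x):
    """Logits of a fully-connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def fc_gradients(W, b, x, y):
    """Gradients of cross-entropy(softmax(W x + b), y) w.r.t. W and b."""
    p = softmax(fc_forward(W, b, x))
    grad_b = [pi - yi for pi, yi in zip(p, y)]        # dL/db = p - y
    grad_W = [[g * xi for xi in x] for g in grad_b]   # dL/dW = (p - y) x^T
    return grad_W, grad_b


def recover_input_and_label(W, b, grad_W, grad_b):
    """Recover x and the soft label y from single-input gradients."""
    # Use the row with the largest bias gradient to avoid dividing by ~0.
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    x_hat = [g / grad_b[i] for g in grad_W[i]]
    # With x known, recompute p and read the label off the bias gradient.
    p_hat = softmax(fc_forward(W, b, x_hat))
    y_hat = [pi - gi for pi, gi in zip(p_hat, grad_b)]
    return x_hat, y_hat


# Illustrative values only (not taken from the paper).
W = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2], [-0.3, 0.1, 0.6]]
b = [0.05, -0.1, 0.0]
x_true = [1.0, -2.0, 0.5]
y_true = [0.9, 0.05, 0.05]  # a label-smoothed target

grad_W, grad_b = fc_gradients(W, b, x_true, y_true)
x_hat, y_hat = recover_input_and_label(W, b, grad_W, grad_b)
```

The division breaks down when the layer has no bias or when every entry of `p - y` is close to zero, which is one reason a dedicated analysis is needed for the general case.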

MoDELS · Inference · Online · Monte Carlo · Markov

April 12, 2024

A State-Space Perspective on Modelling and Inference for Online Skill Rating

Samuel Duffield,Samuel Power,Lorenzo Rimella

We summarise popular methods used for skill rating in competitive sports, along with their inferential paradigms and introduce new approaches based on sequential Monte Carlo and discrete hidden Markov models. We advocate for a state-space model perspective, wherein players' skills are represented as time-varying, and match results serve as observed quantities. We explore the steps to construct the model and the three stages of inference: filtering, smoothing and parameter estimation. We examine the challenges of scaling up to numerous players and matches, highlighting the main approximations and reductions which facilitate statistical and computational efficiency. We additionally compare approaches in a realistic experimental pipeline that can be easily reproduced and extended with our open-source Python package, //github.com/SamDuffield/abile.

PageRank · Graph · GPU · Processing (programming language) · Partitioning

April 12, 2024

Efficient GPU Implementation of Static and Incrementally Expanding DF-P PageRank for Dynamic Graphs

Subhajit Sahu

from arXiv, 23 pages, 13 figures, 4 tables

PageRank is a widely used centrality measure that "ranks" vertices in a graph by considering the connections and their importance. In this report, we first introduce one of the most efficient GPU implementations of Static PageRank, which recomputes PageRank scores from scratch. It uses a synchronous pull-based atomics-free PageRank computation, with the low and high in-degree vertices being partitioned and processed by two separate kernels. Next, we present our GPU implementation of incrementally expanding (and contracting) Dynamic Frontier with Pruning (DF-P) PageRank, which processes only a subset of vertices likely to change ranks. It is based on Static PageRank, and uses an additional partitioning between low and high out-degree vertices for incremental expansion of the set of affected vertices with two additional kernels. On a server with an NVIDIA A100 GPU, our Static PageRank outperforms Hornet and Gunrock's PageRank implementations by 31x and 5.9x respectively. On top of the above, DF-P PageRank outperforms Static PageRank by 2.1x on real-world dynamic graphs, and by 3.1x on large static graphs with random batch updates.
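As a CPU-side illustration of what "synchronous pull-based" means, the sketch below has each vertex read (pull) the previous iteration's ranks of its in-neighbors, so no vertex writes to another vertex's entry and no atomics are needed. It is a minimal sketch, not the report's GPU code: the function name, the damping factor of 0.85, the L1 convergence test, and the uniform redistribution of dangling-vertex rank are all assumptions, and the two-kernel split by in-degree is not reproduced.

```python
def pagerank_pull(num_vertices, edges, damping=0.85, tol=1e-10, max_iter=500):
    """Synchronous pull-based PageRank on a directed edge list of (u, v) pairs."""
    n = num_vertices
    out_degree = [0] * n
    in_neighbors = [[] for _ in range(n)]
    for u, v in edges:
        out_degree[u] += 1
        in_neighbors[v].append(u)

    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        # Assumption: rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(ranks[u] for u in range(n) if out_degree[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        # Each vertex pulls contributions from its in-neighbors, reading only
        # the previous iteration's ranks (synchronous update).
        new_ranks = [
            base + damping * sum(ranks[u] / out_degree[u] for u in in_neighbors[v])
            for v in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Small illustrative graph: 0 -> 2, 1 -> 2, 2 -> 0.
ranks = pagerank_pull(3, [(0, 2), (1, 2), (2, 0)])
```

In this toy graph vertex 2 receives links from both other vertices and ends up with the highest rank, while vertex 1, which has no in-links, keeps only the teleport share.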

Exchangeable · Scenario · Stationary · Exact · Inference

April 10, 2024

Least Squares-Based Permutation Tests in Time Series

Joseph P. Romano,Marius A. Tirlea

from arXiv, 30 pages

This paper studies permutation tests for regression parameters in a time series setting, where the time series is assumed stationary but may exhibit an arbitrary (but weak) dependence structure. In such a setting, it is perhaps surprising that permutation tests can offer any type of inference guarantees, since permuting covariates can destroy their relationship with the response. Indeed, the fundamental assumption of exchangeability of errors, which is required for the finite-sample exactness of permutation tests, can easily fail. However, we show that permutation tests may be constructed which are asymptotically valid for a wide class of stationary processes, but remain exact when exchangeability holds. We also consider the problem of testing for no monotone trend and we construct asymptotically valid permutation tests in this setting as well.
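For reference, the basic procedure that the abstract starts from can be sketched as follows: compute the least-squares slope, recompute it after randomly permuting the covariate, and report how often the permuted slope is at least as extreme. This is a minimal sketch of the plain permutation test, which is exact only under exchangeability; it does not include the modifications the paper develops for dependent data. The function names, the number of permutations, and the simulated data are assumptions.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, num_permutations=999, seed=0):
    """Two-sided Monte Carlo permutation p-value for H0: slope = 0."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    x_perm = list(x)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(x_perm)  # break any link between covariate and response
        if abs(ols_slope(x_perm, y)) >= observed:
            exceed += 1
    # Counting the observed statistic itself keeps the p-value strictly positive.
    return (exceed + 1) / (num_permutations + 1)


# Simulated data with a linear trend plus independent noise (illustrative only).
noise_rng = random.Random(1)
x = list(range(30))
y = [0.5 * xi + noise_rng.gauss(0.0, 1.0) for xi in x]
p_trend = permutation_p_value(x, y)
```

Taking `x` to be the time index turns the same routine into a simple check for a linear trend, which is related to, but narrower than, the monotone-trend problem mentioned in the abstract.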

Branch · Pruning · Point cloud · Performer · Robot

April 9, 2024

3D Branch Point Cloud Completion for Robotic Pruning in Apple Orchards

Tian Qiu,Alan Zoubi,Nikolai Spine,Lailiang Cheng,Yu Jiang

from arXiv, Submitted to IROS 2024

Robotic branch pruning is a rapidly growing research area that addresses the labor shortage in agriculture. One fundamental requirement in robotic pruning is the perception of detailed geometry and topology of branches. However, the point clouds obtained in agricultural settings are often incomplete due to several constraints, which restricts the accuracy of downstream robotic pruning. In this work, we addressed the issue of point cloud quality through a simulation-based deep neural network, leveraging a Real-to-Simulation (Real2Sim) data generation pipeline that not only eliminates the need for manual parameterization but also guarantees the realism of simulated data. The simulation-based neural network was applied to jointly perform point cloud completion and skeletonization on real-world partial branches, without additional real-world training. The Sim2Real qualitative completion and skeletonization results showed the model's remarkable capability for geometry reconstruction and topology prediction. Additionally, we quantitatively evaluated the Sim2Real performance by comparing branch-level trait characterization errors using raw incomplete data and complete data. The Mean Absolute Error (MAE) was reduced by 75% and 8% for branch diameter and branch angle estimation, respectively, using the best complete data, which indicates the effectiveness of the Real2Sim data in a zero-shot generalization setting. The characterization improvements contributed to the precision and efficacy of robotic branch pruning.

Learning · SSL · Specialization · Recommender systems · Supervision

April 4, 2024

A Comprehensive Survey on Self-Supervised Learning for Recommendation

Xubin Ren,Wei Wei,Lianghao Xia,Chao Huang

Recommender systems play a crucial role in tackling the challenge of information overload by delivering personalized recommendations based on individual user preferences. Deep learning techniques, such as RNNs, GNNs, and Transformer architectures, have significantly propelled the advancement of recommender systems by enhancing their comprehension of user behaviors and preferences. However, supervised learning methods encounter challenges in real-life scenarios due to data sparsity, resulting in limitations in their ability to learn representations effectively. To address this, self-supervised learning (SSL) techniques have emerged as a solution, leveraging inherent data structures to generate supervision signals without relying solely on labeled data. By leveraging unlabeled data and extracting meaningful representations, recommender systems utilizing SSL can make accurate predictions and recommendations even when confronted with data sparsity. In this paper, we provide a comprehensive review of self-supervised learning frameworks designed for recommender systems, encompassing a thorough analysis of over 170 papers. We conduct an exploration of nine distinct scenarios, enabling a comprehensive understanding of SSL-enhanced recommenders in different contexts. For each domain, we elaborate on different self-supervised learning paradigms, namely contrastive learning, generative learning, and adversarial learning, so as to present technical details of how SSL enhances recommender systems in various contexts. We consistently maintain the related open-source materials at //github.com/HKUDS/Awesome-SSLRec-Papers.

Object recognition · Decoding · Model evaluation · Attention · Learning

April 4, 2024

Decoding Natural Images from EEG for Object Recognition

Yonghao Song,Bingchuan Liu,Xiang Li,Nanlin Shi,Yijun Wang,Xiaorong Gao

from arXiv, ICLR 2024

Electroencephalography (EEG) signals, known for convenient non-invasive acquisition but low signal-to-noise ratio, have recently gained substantial attention due to the potential to decode natural images. This paper presents a self-supervised framework to demonstrate the feasibility of learning image representations from EEG signals, particularly for object recognition. The framework utilizes image and EEG encoders to extract features from paired image stimuli and EEG responses. Contrastive learning aligns these two modalities by constraining their similarity. With the framework, we attain significantly above-chance results on a comprehensive EEG-image dataset, achieving a top-1 accuracy of 15.6% and a top-5 accuracy of 42.8% in challenging 200-way zero-shot tasks. Moreover, we perform extensive experiments to explore the biological plausibility by resolving the temporal, spatial, spectral, and semantic aspects of EEG signals. Besides, we introduce attention modules to capture spatial correlations, providing implicit evidence of the brain activity perceived from EEG data. These findings yield valuable insights for neural decoding and brain-computer interfaces in real-world scenarios. The code will be released on //github.com/eeyhsong/NICE-EEG.
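The alignment step described above can be illustrated with a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings: matched EEG and image embeddings are pushed toward high cosine similarity and mismatched pairs toward low similarity. This is a minimal sketch under assumptions, not the released NICE-EEG code: the exact loss form, the temperature of 0.1, the function names, and the toy two-dimensional embeddings are all illustrative.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def similarity_logits(eeg_embeddings, image_embeddings, temperature):
    """Temperature-scaled cosine similarity between every EEG and image embedding."""
    eeg = [l2_normalize(v) for v in eeg_embeddings]
    img = [l2_normalize(v) for v in image_embeddings]
    return [[sum(a * b for a, b in zip(e, i)) / temperature for i in img] for e in eeg]


def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[k] for row k, i.e. the matched pair is the target."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(eeg_embeddings, image_embeddings, temperature=0.1):
    """Average of the EEG-to-image and image-to-EEG contrastive losses."""
    logits = similarity_logits(eeg_embeddings, image_embeddings, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy embeddings (illustrative only): row k of each list is meant to be a pair.
eeg = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
images_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
images_shuffled = images_aligned[1:] + images_aligned[:1]

loss_aligned = symmetric_contrastive_loss(eeg, images_aligned)
loss_shuffled = symmetric_contrastive_loss(eeg, images_shuffled)
```

The loss is lower when row k of the EEG batch actually corresponds to row k of the image batch, which is the signal an encoder pair would be trained to exploit.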

MoDELS · Performer · Language modeling · Large language models · Notability

April 3, 2024

Investigating Data Contamination in Modern Benchmarks for Large Language Models

Chunyuan Deng,Yilun Zhao,Xiangru Tang,Mark Gerstein,Arman Cohan

from arXiv, NAACL 2024 Version

Recent observations have underscored a disparity between the inflated benchmark scores and the actual performance of LLMs, raising concerns about potential contamination of evaluation benchmarks. This issue is especially critical for closed-source models and certain open-source models where training data transparency is lacking. In this paper we study data contamination by proposing two methods tailored for both open-source and proprietary LLMs. We first introduce a retrieval-based system to explore potential overlaps between evaluation benchmarks and pretraining corpora. We further present a novel investigation protocol named Testset Slot Guessing (TS-Guessing), applicable to both open and proprietary models. This approach entails masking a wrong answer in a multiple-choice question and prompting the model to fill in the gap. Additionally, it involves obscuring an unlikely word in an evaluation example and asking the model to produce it. We find that certain commercial LLMs could surprisingly guess the missing option in various test sets. Specifically, in the TruthfulQA benchmark, we find that LLMs exhibit notable performance improvement when provided with additional metadata in the benchmark. Further, in the MMLU benchmark, ChatGPT and GPT-4 demonstrated an exact match rate of 52% and 57%, respectively, in guessing the missing options in benchmark test data. We hope these results underscore the need for more robust evaluation methodologies and benchmarks in the field.
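The option-masking variant of the protocol can be sketched as a small prompt builder plus an exact-match scorer. This is a hypothetical sketch based only on the description above: the prompt wording, the `[MASK]` token, the function names, and the sample question are assumptions rather than the paper's actual templates, and no model is called.

```python
import random

MASK = "[MASK]"  # placeholder token; the paper's actual template is not reproduced here


def build_option_guessing_prompt(question, options, correct_key, rng):
    """Hide one wrong option and ask for it; returns (prompt, hidden option text)."""
    wrong_keys = [key for key in sorted(options) if key != correct_key]
    masked_key = rng.choice(wrong_keys)
    lines = [f"Question: {question}"]
    for key in sorted(options):
        text = MASK if key == masked_key else options[key]
        lines.append(f"{key}. {text}")
    lines.append(f"Fill in the {MASK} option. Reply with the option text only.")
    return "\n".join(lines), options[masked_key]


def exact_match(prediction, target):
    """Case- and whitespace-insensitive exact match."""
    def normalize(s):
        return " ".join(s.lower().split())
    return normalize(prediction) == normalize(target)


# Made-up example item, not drawn from any benchmark.
question = "Which planet is closest to the Sun?"
options = {"A": "Venus", "B": "Mercury", "C": "Mars", "D": "Jupiter"}
prompt, hidden = build_option_guessing_prompt(question, options, "B", random.Random(0))

# A model's reply would be scored like this; here a stand-in string is used.
is_hit = exact_match("  " + hidden.upper() + " ", hidden)
```

Because a wrong option is arbitrary and hard to infer from the question alone, a high exact-match rate on such prompts is the kind of signal the abstract interprets as possible exposure to the test set.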

INFORMS · Smoothing · Reducible · Functional · Optimizer

March 31, 2024

Smooth Information Gathering in Two-Player Noncooperative Games

Fernando Palafox,Jesse Milzman,Dong Ho Lee,Ryan Park,David Fridovich-Keil

from arXiv, //github.com/CLeARoboticsLab/GamesVoI.jl

We present a mathematical framework for modeling two-player noncooperative games in which one player (the defender) is uncertain of the costs of the game and the second player's (the attacker's) intention but can preemptively allocate information-gathering resources to reduce this uncertainty. We obtain the defender's decisions by solving a two-stage problem. In Stage 1, the defender allocates information-gathering resources, and in Stage 2, the information-gathering resources output a signal that informs the defender about the costs of the game and the attacker's intent, and then both players play a noncooperative game. We provide a gradient-based algorithm to solve the two-stage game and apply this framework to a tower-defense game which can be interpreted as a variant of a Colonel Blotto game with smooth payoff functions and uncertainty over battlefield valuations. Finally, we analyze how optimal decisions shift with changes in information-gathering allocations and perturbations in the cost functions.

Diversity · Learned · state-of-the-art · MoDELS · Spanned subspace

March 14, 2021

Modelling Behavioural Diversity for Learning in Open-Ended Games

Nicolas Perez Nieves,Yaodong Yang,Oliver Slumbers,David Henry Mguni,Jun Wang

from arXiv, corresponds to <[email protected]>

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracle for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the gamescape, the convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.