一级a视频免费一区二区_久久人人爽人人爽人人片69AV_G高清精品一区二区三区_好吊色综合中文字幕二区_欧美伊人久久综合热线大杳蕉_免费人成网址在线观看国内_丁香婷婷五月天激情综合网

Program synthesis aims to create accurate, executable programs from problem specifications, specifically from natural language descriptions in our context. Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs), significantly enhancing code generation capabilities. The application of RL focuses on directly optimizing for functional correctness, offering an advantage over conventional supervised methods. Despite policy-based RL methods dominating the literature on RL for program synthesis, the nature of program synthesis tasks hints at a natural alignment with value-based methods. This stems from the rich collection of off-policy programs, including those developed by human programmers and also historical samples, coupled with the straightforward verification of generated programs through automated unit testing, meaning rewards are easy to obtain. Diverging from the dominant use of policy-based algorithms, our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder (pronounced Bellman coder). Yet, training value-based methods presents challenges due to the enormous search space inherent to program synthesis. To this end, we introduce an initialization protocol for RL agents utilizing pre-trained LMs and a conservative Bellman operator to reduce training complexities. Moreover, we demonstrate how to leverage the learned value functions as a dual strategy to post-process generated programs. Our empirical evaluations demonstrated $\mathcal{B}$-Coder's capability in achieving state-of-the-art performance when compared to policy-based methods. Remarkably, this achievement is reached with minimal reward engineering effort, highlighting the effectiveness of value-based RL, independent of reward designs.

相關內容

Learning

關注 12

經驗風險 · 風險函數 · 散度 · 泛函 · 穩健性 ·

2024 年 5 月 1 日

Robust Semi-supervised Learning via $f$-Divergence and $α$-Rényi Divergence

Gholamali Aminian,Amirhossien Bagheri,Mahyar JafariNodeh,Radmehr Karimian,Mohammad-Hossein Yassaee

from arxiv, Accepted in ISIT 2024

This paper investigates a range of empirical risk functions and regularization methods suitable for self-training methods in semi-supervised learning. These approaches draw inspiration from various divergence measures, such as $f$-divergences and $\alpha$-R\'enyi divergences. Inspired by the theoretical foundations rooted in divergences, i.e., $f$-divergences and $\alpha$-R\'enyi divergence, we also provide valuable insights to enhance the understanding of our empirical risk functions and regularization techniques. In the pseudo-labeling and entropy minimization techniques as self-training methods for effective semi-supervised learning, the self-training process has some inherent mismatch between the true label and pseudo-label (noisy pseudo-labels) and some of our empirical risk functions are robust, concerning noisy pseudo-labels. Under some conditions, our empirical risk functions demonstrate better performance when compared to traditional self-training methods.

平滑 · MoDELS · 正則化項 · 語言模型化 · Networking ·

2024 年 4 月 30 日

The Role of $n$-gram Smoothing in the Age of Neural Networks

Luca Malagutti,Andrius Buinovskij,Anej Svete,Clara Meister,Afra Amini,Ryan Cotterell

from arxiv, NAACL 2024

For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task. The key to their success lay in the application of various smoothing techniques that served to combat overfitting. However, when neural language models toppled $n$-gram models as the best performers, $n$-gram smoothing techniques became less relevant. Indeed, it would hardly be an understatement to suggest that the line of inquiry into $n$-gram smoothing techniques became dormant. This paper re-opens the role classical $n$-gram smoothing techniques may play in the age of neural language models. First, we draw a formal equivalence between label smoothing, a popular regularization technique for neural language models, and add-$\lambda$ smoothing. Second, we derive a generalized framework for converting any $n$-gram smoothing technique into a regularizer compatible with neural language models. Our empirical results find that our novel regularizers are comparable to and, indeed, sometimes outperform label smoothing on language modeling and machine translation.

Shuffle · MoDELS · 噪聲 · Performer · state-of-the-art ·

2024 年 4 月 29 日

Tight Differential Privacy Guarantees for the Shuffle Model with $k$-Randomized Response

Sayan Biswas,Kangsoo Jung,Catuscia Palamidessi

Most differentially private (DP) algorithms assume a central model in which a reliable third party inserts noise to queries made on datasets, or a local model where the users locally perturb their data. However, the central model is vulnerable via a single point of failure, and in the local model, the utility of the data deteriorates significantly. The recently proposed shuffle model is an intermediate framework between the central and the local paradigms where the users send their locally privatized data to a server where messages are shuffled, effacing the link between a privatized message and the corresponding user, giving a better trade-off between privacy and utility than the local model, as its privacy gets amplified without adding more noise. In this paper, we theoretically derive the strictest known bound for DP guarantee for the shuffle models with $k$-Randomized Response local randomizers. There on, we focus on the utility of the shuffle model for histogram queries. Leveraging on the matrix inversion method, which is used to approximate the original distribution from the empirical one produced by the $k$-RR mechanism, we de-noise the histogram produced by the shuffle model to evaluate the total variation distance of the resulting histogram from the true one, which we regard as the measure of utility of the privacy mechanism. We perform experiments on both synthetic and real data to compare the privacy-utility trade-off of the shuffle model with that of the central one privatized by adding the state-of-the-art Gaussian noise to each bin. Although the experimental results stay consistent with the literature that favour the central model, we see that, the difference in statistical utilities between the central and the shuffle models is very small, showing that they are almost comparable under the same level of DP.

帶符號距離 · 規范化的 · 近似誤差 · 值域 · 近似 ·

2024 年 4 月 29 日

N$^{3}$-Mapping: Normal Guided Neural Non-Projective Signed Distance Fields for Large-scale 3D Mapping

Shuangfu Song,Junqiao Zhao,Kai Huang,Jiaye Lin,Chen Ye,Tiantian Feng

from arxiv, 8 pages, 10 figures. Accepted by RAL2024

Accurate and dense mapping in large-scale environments is essential for various robot applications. Recently, implicit neural signed distance fields (SDFs) have shown promising advances in this task. However, most existing approaches employ projective distances from range data as SDF supervision, introducing approximation errors and thus degrading the mapping quality. To address this problem, we introduce N$^{3}$-Mapping, an implicit neural mapping system featuring normal-guided neural non-projective signed distance fields. Specifically, we directly sample points along the surface normal, instead of the ray, to obtain more accurate non-projective distance values from range data. Then these distance values are used as supervision to train the implicit map. For large-scale mapping, we apply a voxel-oriented sliding window mechanism to alleviate the forgetting issue with a bounded memory footprint. Besides, considering the uneven distribution of measured point clouds, a hierarchical sampling strategy is designed to improve training efficiency. Experiments demonstrate that our method effectively mitigates SDF approximation errors and achieves state-of-the-art mapping quality compared to existing approaches.

向量化 · 近似 · 路徑 · 代價 · GROUP ·

2024 年 4 月 26 日

Approximation Algorithms for $\ell_p$-Shortest Path and $\ell_p$-Group Steiner Tree

Yury Makarychev,Max Ovsiankin,Erasmo Tani

We present polylogarithmic approximation algorithms for variants of the Shortest Path, Group Steiner Tree, and Group ATSP problems with vector costs. In these problems, each edge e has a non-negative vector cost $c_e \in \mathbb{R}^{\ell}_{\ge 0}$. For a feasible solution - a path, subtree, or tour (respectively) - we find the total vector cost of all the edges in the solution and then compute the $\ell_p$-norm of the obtained cost vector (we assume that $p \ge 1$ is an integer). Our algorithms for series-parallel graphs run in polynomial time and those for arbitrary graphs run in quasi-polynomial time. To obtain our results, we introduce and use new flow-based Sum-of-Squares relaxations. We also obtain a number of hardness results.

目標領域 · 極大 · 可約的 · 類別 · 源領域 ·

2024 年 4 月 26 日

Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation

Xiang Gu,Xi Yu,Yan Yang,Jian Sun,Zongben Xu

from arxiv, To appear in IJCV

The practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with $\alpha$-Power Maximization (ARPM), for PDA where the source domain contains private classes absent in target domain. In ARPM, we propose a novel adversarial reweighting model that adversarially learns to reweight source domain data to identify source-private class samples by assigning smaller weights to them, for mitigating potential negative transfer. Based on the adversarial reweighting, we train the transferable recognition model on the reweighted source distribution to be able to classify common class data. To reduce the prediction uncertainty of the recognition model on the target domain for PDA, we present an $\alpha$-power maximization mechanism in ARPM, which enriches the family of losses for reducing the prediction uncertainty for PDA. Extensive experimental results on five PDA benchmarks, i.e., Office-31, Office-Home, VisDA-2017, ImageNet-Caltech, and DomainNet, show that our method is superior to recent PDA methods. Ablation studies also confirm the effectiveness of components in our approach. To theoretically analyze our method, we deduce an upper bound of target domain expected error for PDA, which is approximately minimized in our approach. We further extend ARPM to open-set DA, universal DA, and test time adaptation, and verify the usefulness through experiments.

泛函 · 在線 · INFORMS · 全 · 賭博機/老虎機 ·

2024 年 4 月 26 日

Online $\mathrm{L}^{\natural}$-Convex Minimization

Ken Yokoyama,Shinji Ito,Tatsuya Matsuoka,Kei Kimura,Makoto Yokoo

An online decision-making problem is a learning problem in which a player repeatedly makes decisions in order to minimize the long-term loss. These problems that emerge in applications often have nonlinear combinatorial objective functions, and developing algorithms for such problems has attracted considerable attention. An existing general framework for dealing with such objective functions is the online submodular minimization. However, practical problems are often out of the scope of this framework, since the domain of a submodular function is limited to a subset of the unit hypercube. To manage this limitation of the existing framework, we in this paper introduce the online $\mathrm{L}^{\natural}$-convex minimization, where an $\mathrm{L}^{\natural}$-convex function generalizes a submodular function so that the domain is a subset of the integer lattice. We propose computationally efficient algorithms for the online $\mathrm{L}^{\natural}$-convex function minimization in two major settings: the full information and the bandit settings. We analyze the regrets of these algorithms and show in particular that our algorithm for the full information setting obtains a tight regret bound up to a constant factor. We also demonstrate several motivating examples that illustrate the usefulness of the online $\mathrm{L}^{\natural}$-convex minimization.

相互獨立的 · 圖 · 查準率/準確率 · 泛化理論 · 團 ·

2024 年 4 月 26 日

Solving NP-hard Problems on \textsc{GaTEx} Graphs: Linear-Time Algorithms for Perfect Orderings, Cliques, Colorings, and Independent Sets

Marc Hellmuth,Guillaume E. Scholz

The class of $\mathsf{Ga}$lled-$\mathsf{T}$ree $\mathsf{Ex}$plainable ($\mathsf{GaTEx}$) graphs has recently been discovered as a natural generalization of cographs. Cographs are precisely those graphs that can be uniquely represented by a rooted tree where the leaves correspond to the vertices of the graph. As a generalization, $\mathsf{GaTEx}$ graphs are precisely those that can be uniquely represented by a particular rooted acyclic network, called a galled-tree. This paper explores the use of galled-trees to solve combinatorial problems on $\mathsf{GaTEx}$ graphs that are, in general, NP-hard. We demonstrate that finding a maximum clique, an optimal vertex coloring, a perfect order, as well as a maximum independent set in $\mathsf{GaTEx}$ graphs can be efficiently done in linear time. The key idea behind the linear-time algorithms is to utilize the galled-trees that explain the $\mathsf{GaTEx}$ graphs as a guide for computing the respective cliques, colorings, perfect orders, or independent sets.

Analysis · 可理解性 · contrastive · Performer · 輸入分布 ·

2024 年 4 月 26 日

Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps

Fuxiao Liu,Paiheng Xu,Zongxia Li,Yue Feng,Hyemi Song

from arxiv, 10 pages, 5 figures

We investigate the role of various demonstration components in the in-context learning (ICL) performance of large language models (LLMs). Specifically, we explore the impacts of ground-truth labels, input distribution, and complementary explanations, particularly when these are altered or perturbed. We build on previous work, which offers mixed findings on how these elements influence ICL. To probe these questions, we employ explainable NLP (XNLP) methods and utilize saliency maps of contrastive demonstrations for both qualitative and quantitative analysis. Our findings reveal that flipping ground-truth labels significantly affects the saliency, though it's more noticeable in larger LLMs. Our analysis of the input distribution at a granular level reveals that changing sentiment-indicative terms in a sentiment analysis task to neutral ones does not have as substantial an impact as altering ground-truth labels. Finally, we find that the effectiveness of complementary explanations in boosting ICL performance is task-dependent, with limited benefits seen in sentiment analysis tasks compared to symbolic reasoning tasks. These insights are critical for understanding the functionality of LLMs and guiding the development of effective demonstrations, which is increasingly relevant in light of the growing use of LLMs in applications such as ChatGPT. Our research code is publicly available at //github.com/paihengxu/XICL.

INTERACT · Agent · Learning · 大語言模型 · MoDELS ·

2024 年 4 月 25 日

SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents

Ruiyi Wang,Haofei Yu,Wenxin Zhang,Zhengyang Qi,Maarten Sap,Graham Neubig,Yonatan Bisk,Hao Zhu

Humans learn social skills through both imitation and social interaction. This social learning process is largely understudied by existing research on building language agents. Motivated by this gap, we propose an interactive learning method, SOTOPIA-$\pi$, improving the social intelligence of language agents. This method leverages behavior cloning and self-reinforcement training on filtered social interaction data according to large language model (LLM) ratings. We show that our training method allows a 7B LLM to reach the social goal completion ability of an expert model (GPT-4-based agent), while improving the safety of language agents and maintaining general QA ability on the MMLU benchmark. We also find that this training paradigm uncovers some difficulties in LLM-based evaluation of social intelligence: LLM-based evaluators overestimate the abilities of the language agents trained specifically for social interaction.