男男网站网址视频免费观看,欧美精品A在线观看,亚洲天堂精品免费观看,国产欧美日韩精品二区图片

Recently, the large language models (LLMs) have shown extraordinary ability in understanding natural language and generating programming code. It has been a common practice of software engineers to consult LLMs when encountering coding questions. Although efforts have been made to avoid syntax errors and align the code with the intended semantics, the reliability and robustness of the code generationfrom LLMs have not yet been thoroughly studied. The executable code is not equivalent to the reliable and robust code, especially in the context of real-world software development. The misuse of APIs in the generated code could lead to severe problem, such as resource leaks, program crashes. To make things worse, the users of LLM code generation services are actually the developers that are most vulnerable to these code that seems right -- They are always novice developers that are not familiar with the APIs that LLMs generate code for them. Therefore, they could hardly tell the misuse in the code generated by LLMs, which further facilitates the incorrect code applied in real-world software. Existing code evaluation benchmark and datasets focus on crafting small tasks such as programming questions in coding interviews, which however deviates from the problem that developers would ask LLM for real-world coding help. To fill the missing piece, in this work, we propose a dataset RobustAPI for evaluating the reliability and robustness of code generated by LLMs. We collect 1208 coding questions from StackOverflow on 24 representative Java APIs. We summarize thecommon misuse patterns of these APIs and evaluate them oncurrent popular LLMs. The evaluation results show that evenfor GPT-4, 62% of the generated code contains API misuses,which would cause unexpected consequences if the code isintroduced into real-world software.

相關內容

代碼

關注 10

代碼（Code）是專知網的一個重要知識資料文檔板塊，旨在整理收錄論文源代碼、復現代碼，經典工程代碼等，便于用戶查閱下載使用。

多峰值 · MoDELS · Performer · GPT-4V · 語言模型化 ·

2023 年 11 月 30 日

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

Jonathan Roberts,Timo Lüddecke,Rehan Sheikh,Kai Han,Samuel Albanie

from arxiv, V2: Minor formatting changes and added missing subfigure captions

Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response. We conduct a series of experiments exploring various vision capabilities of MLLMs within these domains, particularly focusing on the frontier model GPT-4V, and benchmark its performance against open-source counterparts. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity. The analysis uncovers not only where such models excel, including instances where they outperform humans, but also where they falter, providing a balanced view of their capabilities in the geographic domain. To enable the comparison and evaluation of future models, our benchmark will be publicly released.

MoDELS · 縮放 · 語言模型化 · GPT-4 · 生成模型 ·

2023 年 11 月 30 日

CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation

Pei Ke,Bosi Wen,Zhuoer Feng,Xiao Liu,Xuanyu Lei,Jiale Cheng,Shengyuan Wang,Aohan Zeng,Yuxiao Dong,Hongning Wang,Jie Tang,Minlie Huang

from arxiv, 18 pages, 5 figures

Since the natural language processing (NLP) community started to make large language models (LLMs), such as GPT-4, act as a critic to evaluate the quality of generated texts, most of them only train a critique generation model of a specific scale on specific datasets. We argue that a comprehensive investigation on the key factor of LLM-based evaluation models, such as scaling properties, is lacking, so that it is still inconclusive whether these models have potential to replace GPT-4's evaluation in practical scenarios. In this paper, we propose a new critique generation model called CritiqueLLM, which includes a dialogue-based prompting method for high-quality referenced / reference-free evaluation data. Experimental results show that our model can achieve comparable evaluation performance to GPT-4 especially in system-level correlations, and even outperform GPT-4 in 3 out of 8 tasks in a challenging reference-free setting. We conduct detailed analysis to show promising scaling properties of our model in the quality of generated critiques. We also demonstrate that our generated critiques can act as scalable feedback to directly improve the generation quality of LLMs.

Attention · 穩健性 · MoDELS · 語言模型化 · Extensibility ·

2023 年 11 月 30 日

Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention

Lujia Shen,Yuwen Pu,Shouling Ji,Changjiang Li,Xuhong Zhang,Chunpeng Ge,Ting Wang

Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP) due to their exceptional performance. However, recent studies show their vulnerability to textual adversarial attacks where the model's output can be misled by intentionally manipulating the text inputs. Despite various methods that have been proposed to enhance the model's robustness and mitigate this vulnerability, many require heavy consumption resources (e.g., adversarial training) or only provide limited protection (e.g., defensive dropout). In this paper, we propose a novel method called dynamic attention, tailored for the transformer architecture, to enhance the inherent robustness of the model itself against various adversarial attacks. Our method requires no downstream task knowledge and does not incur additional costs. The proposed dynamic attention consists of two modules: (I) attention rectification, which masks or weakens the attention value of the chosen tokens, and (ii) dynamic modeling, which dynamically builds the set of candidate tokens. Extensive experiments demonstrate that dynamic attention significantly mitigates the impact of adversarial attacks, improving up to 33\% better performance than previous methods against widely-used adversarial attacks. The model-level design of dynamic attention enables it to be easily combined with other defense methods (e.g., adversarial training) to further enhance the model's robustness. Furthermore, we demonstrate that dynamic attention preserves the state-of-the-art robustness space of the original model compared to other dynamic modeling methods.

通道 · INFORMS · SimPLe · 代碼 · 操作 ·

2023 年 11 月 29 日

Simple and Tighter Derivation of Achievability for Classical Communication over Quantum Channels

Hao-Chung Cheng

from arxiv, v2: Introduction revised

Achievability in information theory refers to demonstrating a coding strategy that accomplishes a prescribed performance benchmark for the underlying task. In quantum information theory, the crafted Hayashi-Nagaoka operator inequality is an essential technique in proving a wealth of one-shot achievability bounds since it effectively resembles a union bound in various problems. In this work, we show that the pretty-good measurement naturally plays a role as the union bound as well. A judicious application of it considerably simplifies the derivation of one-shot achievability for classical-quantum (c-q) channel coding via an elegant three-line proof. The proposed analysis enjoys the following favorable features. (i) The established one-shot bound admits a closed-form expression as in the celebrated Holevo-Helstrom Theorem. Namely, the error probability of sending $M$ messages through a c-q channel is upper bounded by the minimum error of distinguishing the joint channel input-output state against $(M-1)$ decoupled products states. (ii) Our bound directly yields asymptotic results in the large deviation, small deviation, and moderate deviation regimes in a unified manner. (iii) The coefficients incurred in applying the Hayashi-Nagaoka operator inequality are no longer needed. Hence, the derived one-shot bound sharpens existing results relying on the Hayashi-Nagaoka operator inequality. In particular, we obtain the tightest achievable $\epsilon$-one-shot capacity for c-q channel coding heretofore, improving the third-order coding rate in the asymptotic scenario. (iv) Our result holds for infinite-dimensional Hilbert space. (v) The proposed method applies to deriving one-shot achievability for classical data compression with quantum side information, entanglement-assisted classical communication over quantum channels, and various quantum network information-processing protocols.

穩健性 · 估計/估計量 · 稀疏 · 統計量 · 優化器 ·

2023 年 11 月 29 日

Efficient Computation of Sparse and Robust Maximum Association Estimators

Pia Pfeiffer,Andreas Alfons,Peter Filzmoser

Although robust statistical estimators are less affected by outlying observations, their computation is usually more challenging. This is particularly the case in high-dimensional sparse settings. The availability of new optimization procedures, mainly developed in the computer science domain, offers new possibilities for the field of robust statistics. This paper investigates how such procedures can be used for robust sparse association estimators. The problem can be split into a robust estimation step followed by an optimization for the remaining decoupled, (bi-)convex problem. A combination of the augmented Lagrangian algorithm and adaptive gradient descent is implemented to also include suitable constraints for inducing sparsity. We provide results concerning the precision of the algorithm and show the advantages over existing algorithms in this context. High-dimensional empirical examples underline the usefulness of this procedure. Extensions to other robust sparse estimators are possible.

GNN · 圖形處理器 · Neural Networks · Networking · 優化器 ·

2023 年 11 月 29 日

A Comprehensive Survey on Distributed Training of Graph Neural Networks

Haiyang Lin,Mingyu Yan,Xiaochun Ye,Dongrui Fan,Shirui Pan,Wenguang Chen,Yuan Xie

from arxiv, To Appear in Proceedings of the IEEE

Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training which distributes the workload of training across multiple computing nodes. At present, the volume of related research on distributed GNN training is exceptionally vast, accompanied by an extraordinarily rapid pace of publication. Moreover, the approaches reported in these studies exhibit significant divergence. This situation poses a considerable challenge for newcomers, hindering their ability to grasp a comprehensive understanding of the workflows, computational patterns, communication strategies, and optimization techniques employed in distributed GNN training. As a result, there is a pressing need for a survey to provide correct recognition, analysis, and comparisons in this field. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to their workflows. In addition, their computational patterns and communication patterns, as well as the optimization techniques proposed by recent work are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.

估計/估計量 · 對數似然 · MoDELS · Performer · 狀態估計 ·

2023 年 11 月 28 日

On the Impact of Sampling on Deep Sequential State Estimation

Helena Calatrava,Ricardo Augusto Borsoi,Tales Imbiriba,Pau Closas

from arxiv, To appear in the Proceedings of the Asilomar Conference on Signals, Systems, and Computers, October 2023, 5 pages, 3 figures, 3 tables

State inference and parameter learning in sequential models can be successfully performed with approximation techniques that maximize the evidence lower bound to the marginal log-likelihood of the data distribution. These methods may be referred to as Dynamical Variational Autoencoders, and our specific focus lies on the deep Kalman filter. It has been shown that the ELBO objective can oversimplify data representations, potentially compromising estimation quality. Tighter Monte Carlo objectives have been proposed in the literature to enhance generative modeling performance. For instance, the IWAE objective uses importance weights to reduce the variance of marginal log-likelihood estimates. In this paper, importance sampling is applied to the DKF framework for learning deep Markov models, resulting in the IW-DKF, which shows an improvement in terms of log-likelihood estimates and KL divergence between the variational distribution and the transition model. The framework using the sampled DKF update rule is also accommodated to address sequential state and parameter estimation when working with highly non-linear physics-based models. An experiment with the 3-space Lorenz attractor shows an enhanced generative modeling performance and also a decrease in RMSE when estimating the model parameters and latent states, indicating that tighter MCOs lead to improved state inference performance.

語言模型化 · contrastive · 語義鴻溝 · MoDELS · Integration ·

2023 年 11 月 28 日

ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation

Junyan Qiu,Haitao Wang,Zhaolin Hong,Yiping Yang,Qiang Liu,Xingxing Wang

from arxiv, 11 pages, 7 figures

The successful integration of large language models (LLMs) into recommendation systems has proven to be a major breakthrough in recent studies, paving the way for more generic and transferable recommendations. However, LLMs struggle to effectively utilize user and item IDs, which are crucial identifiers for successful recommendations. This is mainly due to their distinct representation in a semantic space that is different from the natural language (NL) typically used to train LLMs. To tackle such issue, we introduce ControlRec, an innovative Contrastive prompt learning framework for Recommendation systems. ControlRec treats user IDs and NL as heterogeneous features and encodes them individually. To promote greater alignment and integration between them in the semantic space, we have devised two auxiliary contrastive objectives: (1) Heterogeneous Feature Matching (HFM) aligning item description with the corresponding ID or user's next preferred ID based on their interaction sequence, and (2) Instruction Contrastive Learning (ICL) effectively merging these two crucial data sources by contrasting probability distributions of output sequences generated by diverse tasks. Experimental results on four public real-world datasets demonstrate the effectiveness of the proposed method on improving model performance.

MoDELS · 講稿 · Learning · Sphering · 表示 ·

2023 年 11 月 2 日

A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations

Hang Chen,Keqing Du,Chenguang Li,Xinyu Yang

from arxiv, under review

The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.

Processing（編程語言） · 學成 · 自然語言處理 · 語言處理 · 深度學習 ·

2019 年 9 月 11 日

A Survey of the Usages of Deep Learning in Natural Language Processing

Daniel W. Otter,Julian R. Medina,Jugal K. Kalita

Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This survey provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to a number of applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.