国产日黄色大片一区二区,高清国产三级在线播放,不卡一区二区三区视频在线,无码中文在线二区免费,亚洲色坉天堂久久

The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of these models remain pressing concerns. To this end, we investigate iterative prompting, a strategy hypothesized to refine LLM responses, assessing its impact on LLM truthfulness, an area which has not been thoroughly explored. Our extensive experiments delve into the intricacies of iterative prompting variants, examining their influence on the accuracy and calibration of model responses. Our findings reveal that naive prompting methods significantly undermine truthfulness, leading to exacerbated calibration errors. In response to these challenges, we introduce several prompting variants designed to address the identified issues. These variants demonstrate marked improvements over existing baselines, signaling a promising direction for future research. Our work provides a nuanced understanding of iterative prompting and introduces novel approaches to enhance the truthfulness of LLMs, thereby contributing to the development of more accurate and trustworthy AI systems.

相關內容

Prompt

關注 10

Weight · 泛函 · MoDELS · 推斷 · 相互獨立的 ·

2024 年 3 月 22 日

Magic for the Age of Quantized DNNs

Yoshihide Sawada,Ryuji Saiin,Kazuma Suetake

from arxiv, 14 pages, 5 figures, 4 tables

Recently, the number of parameters in DNNs has explosively increased, as exemplified by LLMs (Large Language Models), making inference on small-scale computers more difficult. Model compression technology is, therefore, essential for integration into products. In this paper, we propose a method of quantization-aware training. We introduce a novel normalization (Layer-Batch Normalization) that is independent of the mini-batch size and does not require any additional computation cost during inference. Then, we quantize the weights by the scaled round-clip function with the weight standardization. We also quantize activation functions using the same function and apply surrogate gradients to train the model with both quantized weights and the quantized activation functions. We call this method Magic for the age of Quantised DNNs (MaQD). Experimental results show that our quantization method can be achieved with minimal accuracy degradation.

DIS · Analysis · Taxonomy · 優化器 · 損失 ·

2024 年 3 月 21 日

An Analysis of the Preferences of Distribution Indicators in Evolutionary Multi-Objective Optimization

Jesús Guillermo Falcón-Cardona,Mahboubeh Nezhadmoghaddam,Emilio Bernal-Zubieta

The distribution of objective vectors in a Pareto Front Approximation (PFA) is crucial for representing the associated manifold accurately. Distribution Indicators (DIs) assess the distribution of a PFA numerically, utilizing concepts like distance calculation, Biodiversity, Entropy, Potential Energy, or Clustering. Despite the diversity of DIs, their strengths and weaknesses across assessment scenarios are not well-understood. This paper introduces a taxonomy for classifying DIs, followed by a preference analysis of nine DIs, each representing a category in the taxonomy. Experimental results, considering various PFAs under controlled scenarios (loss of coverage, loss of uniformity, pathological distributions), reveal that some DIs can be misleading and need cautious use. Additionally, DIs based on Biodiversity and Potential Energy show promise for PFA evaluation and comparison of Multi-Objective Evolutionary Algorithms.

圖 · 大語言模型 · 語言模型化 · MoDELS · Extensibility ·

2024 年 3 月 21 日

Exploring the Potential of Large Language Models in Graph Generation

Yang Yao,Xin Wang,Zeyang Zhang,Yijian Qin,Ziwei Zhang,Xu Chu,Yuekui Yang,Wenwu Zhu,Hong Mei

Large language models (LLMs) have achieved great success in many fields, and recent works have studied exploring LLMs for graph discriminative tasks such as node classification. However, the abilities of LLMs for graph generation remain unexplored in the literature. Graph generation requires the LLM to generate graphs with given properties, which has valuable real-world applications such as drug discovery, while tends to be more challenging. In this paper, we propose LLM4GraphGen to explore the ability of LLMs for graph generation with systematical task designs and extensive experiments. Specifically, we propose several tasks tailored with comprehensive experiments to address key questions regarding LLMs' understanding of different graph structure rules, their ability to capture structural type distributions, and their utilization of domain knowledge for property-based graph generation. Our evaluations demonstrate that LLMs, particularly GPT-4, exhibit preliminary abilities in graph generation tasks, including rule-based and distribution-based generation. We also observe that popular prompting methods, such as few-shot and chain-of-thought prompting, do not consistently enhance performance. Besides, LLMs show potential in generating molecules with specific properties. These findings may serve as foundations for designing good LLMs based models for graph generation and provide valuable insights and further research.

dynamic programming · 離散化 · SimPLe · 離散數學 ·

2024 年 3 月 21 日

An Improved Lower Bound on the Number of Pseudoline Arrangements

Fernando Cortés Kühnast,Justin Dallant,Stefan Felsner,Manfred Scheucher

from arxiv, This article is to appear in the proceedings of the 40th International Symposium on Computational Geometry (SoCG 2024). It is a merge of the following two independent submissions: 1) Justin Dallant -- Improved Lower Bound on the Number of Pseudoline Arrangements 2) Fernando Cort\'es K\"uhnast, Stefan Felsner, and Manfred Scheucher -- An Improved Lower Bound on the Number of Pseudoline Arrangements

Arrangements of pseudolines are classic objects in discrete and computational geometry. They have been studied with increasing intensity since their introduction almost 100 years ago. The study of the number $B_n$ of non-isomorphic simple arrangements of $n$ pseudolines goes back to Goodman and Pollack, Knuth, and others. It is known that $B_n$ is in the order of $2^{\Theta(n^2)}$ and finding asymptotic bounds on $b_n = \frac{\log_2(B_n)}{n^2}$ remains a challenging task. In 2011, Felsner and Valtr showed that $0.1887 \leq b_n \le 0.6571$ for sufficiently large $n$. The upper bound remains untouched but in 2020 Dumitrescu and Mandal improved the lower bound constant to $0.2083$. Their approach utilizes the known values of $B_n$ for up to $n=12$. We tackle the lower bound by utilizing dynamic programming and the Lindstr\"om-Gessel-Viennot lemma. Our new bound is $b_n \geq 0.2721$ for sufficiently large $n$. The result is based on a delicate interplay of theoretical ideas and computer assistance.

Networking · 分離的 · 漢明距離 · CCC · 生成器網絡 ·

2024 年 3 月 21 日

On the Power of Quantum Distributed Proofs

Atsuya Hasegawa,Srijita Kundu,Harumichi Nishimura

from arxiv, 49 pages

Quantum nondeterministic distributed computing was recently introduced as dQMA (distributed quantum Merlin-Arthur) protocols by Fraigniaud, Le Gall, Nishimura and Paz (ITCS 2021). In dQMA protocols, with the help of quantum proofs and local communication, nodes on a network verify a global property of the network. Fraigniaud et al. showed that, when the network size is small, there exists an exponential separation in proof size between distributed classical and quantum verification protocols, for the equality problem, where the verifiers check if all the data owned by a subset of them are identical. In this paper, we further investigate and characterize the power of the dQMA protocols for various decision problems. First, we give a more efficient dQMA protocol for the equality problem with a simpler analysis. This is done by adding a symmetrization step on each node and exploiting properties of the permutation test, which is a generalization of the SWAP test. We also show a quantum advantage for the equality problem on path networks still persists even when the network size is large, by considering ``relay points'' between extreme nodes. Second, we show that even in a general network, there exist efficient dQMA protocols for the ranking verification problem, the Hamming distance problem, and more problems that derive from efficient quantum one-way communication protocols. Third, in a line network, we construct an efficient dQMA protocol for a problem that has an efficient two-party QMA communication protocol. Finally, we obtain the first lower bounds on the proof and communication cost of dQMA protocols. To prove a lower bound on the equality problem, we show any dQMA protocol with an entangled proof between nodes can be simulated with a dQMA protocol with a separable proof between nodes by using a QMA communication-complete problem introduced by Raz and Shpilka (CCC 2004).

Prompt · Performer · MoDELS · Analysis · ChatGPT ·

2024 年 3 月 20 日

On Prompt Sensitivity of ChatGPT in Affective Computing

Mostafa M. Amin,Bj?rn W. Schuller

from arxiv, 2 Tables, 1 Figure, preprint submission to ACII 2024

Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing. However, accessing these emerging capabilities is facilitated through prompt engineering. Despite the existence of some prompting techniques, the field is still rapidly evolving and many prompting ideas still require investigation. In this work, we introduce a method to evaluate and investigate the sensitivity of the performance of foundation models based on different prompts or generation parameters. We perform our evaluation on ChatGPT within the scope of affective computing on three major problems, namely sentiment analysis, toxicity detection, and sarcasm detection. First, we carry out a sensitivity analysis on pivotal parameters in auto-regressive text generation, specifically the temperature parameter $T$ and the top-$p$ parameter in Nucleus sampling, dictating how conservative or creative the model should be during generation. Furthermore, we explore the efficacy of several prompting ideas, where we explore how giving different incentives or structures affect the performance. Our evaluation takes into consideration performance measures on the affective computing tasks, and the effectiveness of the model to follow the stated instructions, hence generating easy-to-parse responses to be smoothly used in downstream applications.

Extensibility · 在線 · Learning · Agent · 強化學習 ·

2024 年 3 月 20 日

Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study

Luca Giamattei,Matteo Biagiola,Roberto Pietrantuono,Stefano Russo,Paolo Tonella

In a recent study, Reinforcement Learning (RL) used in combination with many-objective search, has been shown to outperform alternative techniques (random search and many-objective search) for online testing of Deep Neural Network-enabled systems. The empirical evaluation of these techniques was conducted on a state-of-the-art Autonomous Driving System (ADS). This work is a replication and extension of that empirical study. Our replication shows that RL does not outperform pure random test generation in a comparison conducted under the same settings of the original study, but with no confounding factor coming from the way collisions are measured. Our extension aims at eliminating some of the possible reasons for the poor performance of RL observed in our replication: (1) the presence of reward components providing contrasting or useless feedback to the RL agent; (2) the usage of an RL algorithm (Q-learning) which requires discretization of an intrinsically continuous state space. Results show that our new RL agent is able to converge to an effective policy that outperforms random testing. Results also highlight other possible improvements, which open to further investigations on how to best leverage RL for online ADS testing.

Microsoft Windows · 詞元分析器 · Attention · 可約的 · 稀疏 ·

2024 年 3 月 20 日

Investigating the Effects of Sparse Attention on Cross-Encoders

Ferdinand Schlatt,Maik Fr?be,Matthias Hagen

from arxiv, Accepted at ECIR'24

Cross-encoders are effective passage and document re-rankers but less efficient than other neural or classic retrieval models. A few previous studies have applied windowed self-attention to make cross-encoders more efficient. However, these studies did not investigate the potential and limits of different attention patterns or window sizes. We close this gap and systematically analyze how token interactions can be reduced without harming the re-ranking effectiveness. Experimenting with asymmetric attention and different window sizes, we find that the query tokens do not need to attend to the passage or document tokens for effective re-ranking and that very small window sizes suffice. In our experiments, even windows of 4 tokens still yield effectiveness on par with previous cross-encoders while reducing the memory requirements by at least 22% / 59% and being 1% / 43% faster at inference time for passages / documents.

MoDELS · Performer · 代碼 · 蒸餾 · 語言模型化 ·

2024 年 3 月 20 日

Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

Zhihong Sun,Chen Lyu,Bolun Li,Yao Wan,Hongyu Zhang,Ge Li,Zhi Jin

from arxiv, Accepted for LREC-COLING 2024

Large Language Models (LLMs) have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique empowers the model to autonomously devise "solution plans" to tackle intricate programming challenges, thereby improving its performance in code generation. Nevertheless, smaller models have been struggling to keep up with LLMs in deducing these plans, adversely affecting their code generation capabilities. Given the considerable size and associated deployment costs, along with concerns about data security, many teams opt for deploying smaller models for code generation. Consequently, there arises a compelling need for transferring LLMs' code generation reasoning abilities to the smaller models. In this paper, we propose the CodePLAN framework, which aims to transfer LLMs' reasoning capabilities to smaller models through distillation. We adopt a multi-task learning approach, jointly undertaking code generation and solution plan generation tasks, to enhance the code generation capabilities of the smaller model. To ensure the superior quality of the solution plans, we advocate for the utilization of backward reasoning and plan sampling strategies. Our experiments show that in comparison to the conventional fine-tuning approach, our approach improves the smaller model's code generation performance (measured in pass@1 metric) by over 130% on the challenging APPS benchmark.

Principle · MLOps · Machine Learning · Continuity · Learning ·

2024 年 3 月 19 日

Professional Insights into Benefits and Limitations of Implementing MLOps Principles

Gabriel Araujo,Marcos Kalinowski,Markus Endler,Fabio Calefato

from arxiv, Author version of paper accepted for publication at ICEIS 2024

Context: Machine Learning Operations (MLOps) has emerged as a set of practices that combines development, testing, and operations to deploy and maintain machine learning applications. Objective: In this paper, we assess the benefits and limitations of using the MLOps principles in online supervised learning. Method: We conducted two focus group sessions on the benefits and limitations of applying MLOps principles for online machine learning applications with six experienced machine learning developers. Results: The focus group revealed that machine learning developers see many benefits of using MLOps principles but also that these do not apply to all the projects they worked on. According to experts, this investment tends to pay off for larger applications with continuous deployment that require well-prepared automated processes. However, for initial versions of machine learning applications, the effort taken to implement the principles could enlarge the project's scope and increase the time needed to deploy a first version to production. The discussion brought up that most of the benefits are related to avoiding error-prone manual steps, enabling to restore the application to a previous state, and having a robust continuous automated deployment pipeline. Conclusions: It is important to balance the trade-offs of investing time and effort in implementing the MLOps principles considering the scope and needs of the project, favoring such investments for larger applications with continuous model deployment requirements.