成人艳情一二三区按摩_91成人精品爽啪在线观看_看全色黄大色黄大片女爽一黄_人人插人人摸精品在线视频_国产精品自在在线午夜精华在线_隔壁的少妇波多野结衣_亚洲免费AV一区二区三区

This paper analyzes Large Language Models (LLMs) with regard to their programming exercise generation capabilities. Through a survey study, we defined the state of the art, extracted their strengths and weaknesses and finally proposed an evaluation matrix, helping researchers and educators to decide which LLM is the best fitting for the programming exercise generation use case. We also found that multiple LLMs are capable of producing useful programming exercises. Nevertheless, there exist challenges like the ease with which LLMs might solve exercises generated by LLMs. This paper contributes to the ongoing discourse on the integration of LLMs in education.

相關內容

語言模型化

關注 9

邊 · Networking · MoDELS · 模型評估 · Neural Networks ·

2024 年 7 月 11 日

Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware

James Seekings,Peyton Chandarana,Mahsa Ardakani,MohammadReza Mohammadi,Ramtin Zand

This paper explores the synergistic potential of neuromorphic and edge computing to create a versatile machine learning (ML) system tailored for processing data captured by dynamic vision sensors. We construct and train hybrid models, blending spiking neural networks (SNNs) and artificial neural networks (ANNs) using PyTorch and Lava frameworks. Our hybrid architecture integrates an SNN for temporal feature extraction and an ANN for classification. We delve into the challenges of deploying such hybrid structures on hardware. Specifically, we deploy individual components on Intel's Neuromorphic Processor Loihi (for SNN) and Jetson Nano (for ANN). We also propose an accumulator circuit to transfer data from the spiking to the non-spiking domain. Furthermore, we conduct comprehensive performance analyses of hybrid SNN-ANN models on a heterogeneous system of neuromorphic and edge AI hardware, evaluating accuracy, latency, power, and energy consumption. Our findings demonstrate that the hybrid spiking networks surpass the baseline ANN model across all metrics and outperform the baseline SNN model in accuracy and latency.

優化器 · 語言模型化 · 大語言模型 · MoDELS · Integration ·

2024 年 7 月 11 日

Towards Explainable Evolution Strategies with Large Language Models

Jill Baumann,Oliver Kramer

from arxiv, Accepted at ESANN 2024

This paper introduces an approach that integrates self-adaptive Evolution Strategies (ES) with Large Language Models (LLMs) to enhance the explainability of complex optimization processes. By employing a self-adaptive ES equipped with a restart mechanism, we effectively navigate the challenging landscapes of benchmark functions, capturing detailed logs of the optimization journey, including fitness evolution, step-size adjustments, and restart events due to stagnation. An LLM is then utilized to process these logs, generating concise, user-friendly summaries that highlight key aspects such as convergence behavior, optimal fitness achievements, and encounters with local optima. Our case study on the Rastrigin function demonstrates how our approach makes the complexities of ES optimization transparent and accessible. Our findings highlight the potential of using LLMs to bridge the gap between advanced optimization algorithms and their interpretability.

Performer · 大語言模型 · 線性的 · state-of-the-art · Performance ·

2024 年 7 月 11 日

A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters

Long Hei Matthew Lam,Ramya Keerthy Thatikonda,Ehsan Shareghi

from arxiv, Code and data are publicly available at: //github.com/Mattylam/Logic_Symbolic_Solvers_Experiment

The emergence of Large Language Models (LLMs) has demonstrated promising progress in solving logical reasoning tasks effectively. Several recent approaches have proposed to change the role of the LLM from the reasoner into a translator between natural language statements and symbolic representations which are then sent to external symbolic solvers to resolve. This paradigm has established the current state-of-the-art result in logical reasoning (i.e., deductive reasoning). However, it remains unclear whether the variance in performance of these approaches stems from the methodologies employed or the specific symbolic solvers utilized. There is a lack of consistent comparison between symbolic solvers and how they influence the overall reported performance. This is important, as each symbolic solver also has its own input symbolic language, presenting varying degrees of challenge in the translation process. To address this gap, we perform experiments on 3 deductive reasoning benchmarks with LLMs augmented with widely used symbolic solvers: Z3, Pyke, and Prover9. The tool-executable rates of symbolic translation generated by different LLMs exhibit a near 50% performance variation. This highlights a significant difference in performance rooted in very basic choices of tools. The almost linear correlation between the executable rate of translations and the accuracy of the outcomes from Prover9 highlight a strong alignment between LLMs ability to translate into Prover9 symbolic language, and the correctness of those translations.

可辨認的 · MoDELS · 大語言模型 · INTERACT · 估計/估計量 ·

2024 年 7 月 10 日

On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments

Jingchao Fang,Nikos Arechiga,Keiichi Namaoshi,Nayeli Bravo,Candice Hogan,David A. Shamma

from arxiv, To be published in ACM IVA 2024

The Wizard of Oz (WoZ) method is a widely adopted research approach where a human Wizard ``role-plays'' a not readily available technology and interacts with participants to elicit user behaviors and probe the design space. With the growing ability for modern large language models (LLMs) to role-play, one can apply LLMs as Wizards in WoZ experiments with better scalability and lower cost than the traditional approach. However, methodological guidance on responsibly applying LLMs in WoZ experiments and a systematic evaluation of LLMs' role-playing ability are lacking. Through two LLM-powered WoZ studies, we take the first step towards identifying an experiment lifecycle for researchers to safely integrate LLMs into WoZ experiments and interpret data generated from settings that involve Wizards role-played by LLMs. We also contribute a heuristic-based evaluation framework that allows the estimation of LLMs' role-playing ability in WoZ experiments and reveals LLMs' behavior patterns at scale.

Performer · 語言模型化 · 相同 · MoDELS · 大語言模型 ·

2024 年 7 月 10 日

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Mosh Levy,Alon Jacoby,Yoav Goldberg

from arxiv, Accepted to ACL 2024

This paper explores the impact of extending input lengths on the capabilities of Large Language Models (LLMs). Despite LLMs advancements in recent times, their performance consistency across different input lengths is not well understood. We investigate this aspect by introducing a novel QA reasoning framework, specifically designed to assess the impact of input length. We isolate the effect of input length using multiple versions of the same sample, each being extended with padding of different lengths, types and locations. Our findings show a notable degradation in LLMs' reasoning performance at much shorter input lengths than their technical maximum. We show that the degradation trend appears in every version of our dataset, although at different intensities. Additionally, our study reveals that the traditional metric of next word prediction correlates negatively with performance of LLMs' on our reasoning dataset. We analyse our results and identify failure modes that can serve as useful guides for future research, potentially informing strategies to address the limitations observed in LLMs.

優化器 · 多峰值 · CASES · 統計量 · Engineering ·

2024 年 7 月 9 日

In Search of Excellence: SHOA as a Competitive Shrike Optimization Algorithm for Multimodal Problems

Hanan K. AbdulKarim,Tarik A. Rashid

from arxiv, 25 pages

In this paper, a swarm intelligence optimization algorithm is proposed as the Shrike Optimization Algorithm (SHOA). Many creatures living in a group and surviving for the next generation randomly search for food; they follow the best one in the swarm, called swarm intelligence. Swarm-based algorithms are designed to mimic creatures' behaviours, but in multimodal problem competition, they cannot find optimal solutions in some difficult cases. The main inspiration for the proposed algorithm is taken from the swarming behaviours of shrike birds in nature. The shrike birds are migrating from their territory to survive. However, the SHOA mimics the surviving behaviour of shrike birds for living, adaptation, and breeding. Two parts of optimization exploration and exploitation are designed by modelling shrike breeding and searching for foods to feed nestlings until they get ready to fly and live independently. This paper is a mathematical model for the SHOA to perform optimization. The SHOA benchmarked 19 well-known mathematical test functions, 10 from CEC-2019, and 12 from CEC-2022 most recent test functions, a total of 41 competitive mathematical test functions benchmarked and four real-world engineering problems with different conditions, both constrained and unconstrained. The statistical results obtained from the Wilcoxon sum ranking and Fridman test show that SHOA has a significant statistical superiority in handling the test benchmarks compared to competitor algorithms in multi-modal problems. The results for engineering optimization problems show the SHOA outperforms other nature-inspired algorithms in many cases.

Machine Learning · Learning · Analysis · Extensibility · 潛在狄利克雷分配 ·

2024 年 7 月 8 日

Predictive Analysis of CFPB Consumer Complaints Using Machine Learning

Dhwani Vaishnav,Manimozhi Neethinayagam,Akanksha Khaire,Jongwook Woo

from arxiv, 4 pages, 3 figures, 4 tables

This paper introduces the Consumer Feedback Insight & Prediction Platform, a system leveraging machine learning to analyze the extensive Consumer Financial Protection Bureau (CFPB) Complaint Database, a publicly available resource exceeding 4.9 GB in size. This rich dataset offers valuable insights into consumer experiences with financial products and services. The platform itself utilizes machine learning models to predict two key aspects of complaint resolution: the timeliness of company responses and the nature of those responses (e.g., closed, closed with relief etc.). Furthermore, the platform employs Latent Dirichlet Allocation (LDA) to delve deeper, uncovering common themes within complaints and revealing underlying trends and consumer issues. This comprehensive approach empowers both consumers and regulators. Consumers gain valuable insights into potential response wait times, while regulators can utilize the platform's findings to identify areas where companies may require further scrutiny regarding their complaint resolution practices.

多峰值 · 語言模型化 · MoDELS · 可理解性 · 模態 ·

2023 年 11 月 10 日

How to Bridge the Gap between Modalities: A Comprehensive Survey on Multimodal Large Language Model

Shezheng Song,Xiaopeng Li,Shasha Li

This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities like generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing potential risks to society. Choosing the appropriate modality alignment method is crucial, as improper methods might require more parameters with limited performance improvement. This paper aims to explore modality alignment methods for LLMs and their existing capabilities. Implementing modality alignment allows LLMs to address environmental issues and enhance accessibility. The study surveys existing modal alignment methods in MLLMs into four groups: (1) Multimodal Converters that change data into something LLMs can understand; (2) Multimodal Perceivers to improve how LLMs perceive different types of data; (3) Tools Assistance for changing data into one common format, usually text; and (4) Data-Driven methods that teach LLMs to understand specific types of data in a dataset. This field is still in a phase of exploration and experimentation, and we will organize and update various existing research methods for multimodal information alignment.

Learning · 不完美信息 · Agent · 強化學習 · Self-Play ·

2022 年 6 月 30 日

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Julien Perolat,Bart de Vylder,Daniel Hennes,Eugene Tarassov,Florian Strub,Vincent de Boer,Paul Muller,Jerome T. Connor,Neil Burch,Thomas Anthony,Stephen McAleer,Romuald Elie,Sarah H. Cen,Zhe Wang,Audrunas Gruslys,Aleksandra Malysheva,Mina Khan,Sherjil Ozair,Finbarr Timbers,Toby Pohlen,Tom Eccles,Mark Rowland,Marc Lanctot,Jean-Baptiste Lespiau,Bilal Piot,Shayegan Omidshafiei,Edward Lockhart,Laurent Sifre,Nathalie Beauguerlange,Remi Munos,David Silver,Satinder Singh,Demis Hassabis,Karl Tuyls

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.

MoDELS · Performer · Processing（編程語言） · 學成 · 穩健性 ·

2021 年 9 月 3 日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Paul Michel

from arxiv, PhD thesis

The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.