This paper investigates the performance of the Large Language Models (LLMs) ChatGPT-3.5 and GPT-4 in solving introductory programming tasks. Based on the performance, implications for didactic scenarios and assessment formats utilizing LLMs are derived. For the analysis, 72 Python tasks for novice programmers were selected from the free site CodingBat. Full task descriptions were used as input to the LLMs, while the generated replies were evaluated using CodingBat's unit tests. In addition, the general availability of textual explanations and program code was analyzed. The results show high scores of 94.4 to 95.8% correct responses and reliable availability of textual explanations and program code, which opens new ways to incorporate LLMs into programming education and assessment.
This paper introduces FedMLSecurity, a benchmark designed to simulate adversarial attacks and corresponding defense mechanisms in Federated Learning (FL). As an integral module of the open-sourced library FedML that facilitates FL algorithm development and performance comparison, FedMLSecurity enhances FedML's capabilities to evaluate security issues and potential remedies in FL. FedMLSecurity comprises two major components: FedMLAttacker that simulates attacks injected during FL training, and FedMLDefender that simulates defensive mechanisms to mitigate the impacts of the attacks. FedMLSecurity is open-sourced and can be customized to a wide range of machine learning models (e.g., Logistic Regression, ResNet, GAN, etc.) and federated optimizers (e.g., FedAVG, FedOPT, FedNOVA, etc.). FedMLSecurity can also be applied to Large Language Models (LLMs) easily, demonstrating its adaptability and applicability in various scenarios.
This chapter provides a comprehensive discussion on AI regulation in the European Union, contrasting it with the more sectoral and self-regulatory approach in the UK. It argues for a hybrid regulatory strategy that combines elements from both philosophies, emphasizing the need for agility and safe harbors to ease compliance. The paper examines the AI Act as a pioneering legislative effort to address the multifaceted challenges posed by AI, asserting that, while the Act is a step in the right direction, it has shortcomings that could hinder the advancement of AI technologies. The paper also anticipates upcoming regulatory challenges, such as the management of toxic content, environmental concerns, and hybrid threats. It advocates for immediate action to create protocols for regulated access to high-performance, potentially open-source AI systems. Although the AI Act is a significant legislative milestone, it needs additional refinement and global collaboration for the effective governance of rapidly evolving AI technologies.
This paper presents a scalable multigrid preconditioner targeting large-scale systems arising from discontinuous Petrov-Galerkin (DPG) discretizations of high-frequency wave operators. This work is built on previously developed multigrid preconditioning techniques of Petrides and Demkowicz (Comput. Math. Appl. 87 (2021) pp. 12-26) and extends the convergence results from $\mathcal{O}(10^7)$ degrees of freedom (DOFs) to $\mathcal{O}(10^9)$ DOFs using a new scalable parallel MPI/OpenMP implementation. Novel contributions of this paper include an alternative definition of coarse-grid systems based on restriction of fine-grid operators, yielding superior convergence results. In the uniform refinement setting, a detailed convergence study is provided, demonstrating h and p robust convergence and linear dependence with respect to the wave frequency. The paper concludes with numerical results on hp-adaptive simulations including a large-scale seismic modeling benchmark problem with high material contrast.
This paper studies MapReduce-based heterogeneous coded distributed computing (CDC) where, besides different computing capabilities at workers, input files to be accessed by computing jobs have nonuniform popularity. We propose a file placement strategy that can handle an arbitrary number of input files. Furthermore, we design a nested coded shuffling strategy that can efficiently manage the nonuniformity of file popularity to maximize the coded multicasting opportunity. We then formulate the joint optimization of the proposed file placement and nested shuffling design variables to optimize the proposed CDC scheme. To reduce the high computational complexity in solving the resulting mixed-integer linear programming (MILP) problem, we propose a simple two-file-group-based file placement approach to obtain an approximate solution. Numerical results show that the optimized CDC scheme outperforms other alternatives. Also, the proposed two-file-group-based approach achieves nearly the same performance as the conventional branch-and-cut method in solving the MILP problem but with substantially lower computational complexity that is scalable over the number of files and workers. For computing jobs with aggregate target functions that commonly appear in machine learning applications, we propose a heterogeneous compressed CDC (C-CDC) scheme to further improve the shuffling efficiency. The C-CDC scheme uses a local data aggregation technique to compress the data to be shuffled for the shuffling load reduction. We again optimize the proposed C-CDC scheme and explore the two-file-group-based low-complexity approach for an approximate solution. Numerical results show the proposed C-CDC scheme provides a considerable shuffling load reduction over the CDC scheme, and also, the two-file-group-based file placement approach maintains good performance.
This paper studies Hoeffding's inequality for Markov chains under the generalized concentrability condition defined via integral probability metric (IPM). The generalized concentrability condition establishes a framework that interpolates and extends the existing hypotheses of Markov chain Hoeffding-type inequalities. The flexibility of our framework allows Hoeffding's inequality to be applied beyond the ergodic Markov chains in the traditional sense. We demonstrate the utility by applying our framework to several non-asymptotic analyses arising from the field of machine learning, including (i) a generalization bound for empirical risk minimization with Markovian samples, (ii) a finite sample guarantee for Ployak-Ruppert averaging of SGD, and (iii) a new regret bound for rested Markovian bandits with general state space.
In this paper, we hypothesize that gradient-based meta-learning (GBML) implicitly suppresses the Hessian along the optimization trajectory in the inner loop. Based on this hypothesis, we introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory) that minimizes the distance between the parameters of the target and reference models to suppress the Hessian in the inner loop. Despite dealing with high-order terms, SHOT does not increase the computational complexity of the baseline model much. It is agnostic to both the algorithm and architecture used in GBML, making it highly versatile and applicable to any GBML baseline. To validate the effectiveness of SHOT, we conduct empirical tests on standard few-shot learning tasks and qualitatively analyze its dynamics. We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline. Code is available at: //github.com/JunHoo-Lee/SHOT
The paper presents a comprehensive performance evaluation of some heuristic search algorithms in the context of autonomous systems and robotics. The objective of the study is to evaluate and compare the performance of different search algorithms in different problem settings on the pathfinding domain. Experiments give us insight into the behavior of the evaluated heuristic search algorithms, over the variation of different parameters: domain size, obstacle density, and distance between the start and the goal states. Results are then used to design a selection algorithm that, on the basis of problem characteristics, suggests the best search algorithm to use.
This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We employ eight distinct datasets that encompass aspects including entity, relation and event extraction, link prediction, and question answering. Empirically, our findings suggest that GPT-4 outperforms ChatGPT in the majority of tasks and even surpasses fine-tuned models in certain reasoning and question-answering datasets. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, which culminates in the presentation of the Virtual Knowledge Extraction task and the development of the VINE dataset. Drawing on these empirical findings, we further propose AutoKG, a multi-agent-based approach employing LLMs for KG construction and reasoning, which aims to chart the future of this field and offer exciting opportunities for advancement. We anticipate that our research can provide invaluable insights for future undertakings of KG\footnote{Code and datasets will be available in //github.com/zjunlp/AutoKG.
Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (\emph{e.g.,} social network analysis and recommender systems), computer vision (\emph{e.g.,} object detection and point cloud learning), and natural language processing (\emph{e.g.,} relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, \emph{i.e.,} 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.