亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In the realm of Large Language Models, the balance between instruction data quality and quantity has become a focal point. Recognizing this, we introduce a self-guided methodology for LLMs to autonomously discern and select cherry samples from vast open-source datasets, effectively minimizing manual curation and potential cost for instruction tuning an LLM. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal tool to identify discrepancies between a model's expected responses and its autonomous generation prowess. Through the adept application of IFD, cherry samples are pinpointed, leading to a marked uptick in model training efficiency. Empirical validations on renowned datasets like Alpaca and WizardLM underpin our findings; with a mere 10% of conventional data input, our strategy showcases improved results. This synthesis of self-guided cherry-picking and the IFD metric signifies a transformative leap in the optimization of LLMs, promising both efficiency and resource-conscious advancements.

相關內容

Foundation models exhibit significant capabilities in decision-making and logical deductions. Nonetheless, a continuing discourse persists regarding their genuine understanding of the world as opposed to mere stochastic mimicry. This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process. This paper further elucidates the interplay between the linear world representation and causal decision-making, and their dependence on layer depth and model complexity. We have made the code public.

Estimates of energy usage in layers of computing from devices to algorithms have been determined and analyzed. Building on the previous analysis [3], energy needed from single devices and systems including three large-scale computing applications such as Artificial Intelligence (AI)/Machine Learning for Natural Language Processing, Scientific Simulations, and Cryptocurrency Mining have been estimated. In contrast to the bit-level switching, in which transistors achieved energy efficiency due to geometrical scaling, higher energy is expended both at the at the instructions and simulations levels of an application. Additionally, the analysis based on AI/ML Accelerators indicate that changes in architectures using an older semiconductor technology node have comparable energy efficiency with a different architecture using a newer technology. Further comparisons of the energy in computing systems with the thermodynamic and biological limits, indicate that there is a 27-36 orders of magnitude higher energy requirements for total simulation of an application. These energy estimates underscore the need for serious considerations of energy efficiency in computing by including energy as a design parameter, enabling growing needs of compute-intensive applications in a digital world.

We consider a voting model, where a number of candidates need to be selected subject to certain feasibility constraints. The model generalises committee elections (where there is a single constraint on the number of candidates that need to be selected), various elections with diversity constraints, the model of public decisions (where decisions needs to be taken on a number of independent issues), and the model of collective scheduling. A critical property of voting is that it should be fair -- not only to individuals but also to groups of voters with similar opinions on the subject of the vote; in other words, the outcome of an election should proportionally reflect the voters' preferences. We formulate axioms of proportionality in this general model. Our axioms do not require predefining groups of voters; to the contrary, we ensure that the opinion of every subset of voters whose preferences are cohesive-enough are taken into account to the extent that is proportional to the size of the subset. Our axioms generalise the strongest known satisfiable axioms for the more specific models. We explain how to adapt two prominent committee election rules, Proportional Approval Voting (PAV) and Phragm\'{e}n Sequential Rule, as well as the concept of stable-priceability to our general model. The two rules satisfy our proportionality axioms if and only if the feasibility constraints are matroids.

The generalized quadratic assignment problem (GQAP) is one of the hardest problems to solve in the operations research area. The GQAP addressed in this work is defined as the task of minimizing the assignment and transportation costs of assigning a set of facilities to a set of locations. The facilities have different space requirements, and the locations have different space capacities. Multiple facilities can be assigned to each location if the space capacity is not violated. In this work, three instances of GQAP in different situations are presented. Then, a genetic algorithm is developed to solve the GQAP instances. Finally, the local neighborhood search with the steepest descend strategy is constructed and applied to the final solution obtained by the GA, and the final solution is compared with the best solution found by MPL/CPLEX software and reference papers. The results show that the developed GA heuristic is effective for solving the GQAP.

Large Language Models (LLMs) have achieved remarkable success in reasoning tasks with the development of prompting methods. However, existing prompting approaches cannot reuse insights of solving similar problems and suffer from accumulated errors in multi-step reasoning, since they prompt LLMs to reason \textit{from scratch}. To address these issues, we propose \textbf{\textit{Thought Propagation} (TP)}, which explores the analogous problems and leverages their solutions to enhance the complex reasoning ability of LLMs. These analogous problems are related to the input one, with reusable solutions and problem-solving strategies. Thus, it is promising to propagate insights of solving previous analogous problems to inspire new problem-solving. To achieve this, TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one. Then, TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch. TP is compatible with existing prompting approaches, allowing plug-and-play generalization and enhancement in a wide range of tasks without much labor in task-specific prompt engineering. Experiments across three challenging tasks demonstrate TP enjoys a substantial improvement over the baselines by an average of 12\% absolute increase in finding the optimal solutions in Shortest-path Reasoning, 13\% improvement of human preference in Creative Writing, and 15\% enhancement in the task completion rate of LLM-Agent Planning.

Kinetic equations are crucial for modeling non-equilibrium phenomena, but their computational complexity is a challenge. This paper presents a data-driven approach using reduced order models (ROM) to efficiently model non-equilibrium flows in kinetic equations by comparing two ROM approaches: Proper Orthogonal Decomposition (POD) and autoencoder neural networks (AE). While AE initially demonstrate higher accuracy, POD's precision improves as more modes are considered. Notably, our work recognizes that the classical POD-MOR approach, although capable of accurately representing the non-linear solution manifold of the kinetic equation, may not provide a parsimonious model of the data due to the inherently non-linear nature of the data manifold. We demonstrate how AEs are used in finding the intrinsic dimension of a system and to allow correlating the intrinsic quantities with macroscopic quantities that have a physical interpretation.

Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how one would expand prompt tuning to handle -- concomitantly -- heterogeneous tasks and data distributions is a widely open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, associated with smart gating functionality: the latter -- whose design is one of the contributions of this paper -- can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collection of prompts), based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied -- for efficiency reasons -- as well as instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications from model approximations. As a highlight, MoPs manage to decrease final perplexity from $\sim20\%$ up to $\sim70\%$, as compared to baselines, in the federated scenario, and from $\sim 3\%$ up to $\sim30\%$ in the centralized scenario.

Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.

Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increased interests in recent years. Domain generalization deals with a challenging setting where one or several different but related domain(s) are given, and the goal is to learn a model that can generalize to an unseen test domain. For years, great progress has been achieved. This paper presents the first review for recent advances in domain generalization. First, we provide a formal definition of domain generalization and discuss several related fields. Next, we thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. Then, we categorize recent algorithms into three classes and present them in detail: data manipulation, representation learning, and learning strategy, each of which contains several popular algorithms. Third, we introduce the commonly used datasets and applications. Finally, we summarize existing literature and present some potential research topics for the future.

Deep neural models in recent years have been successful in almost every field, including extremely complex problem statements. However, these models are huge in size, with millions (and even billions) of parameters, thus demanding more heavy computation power and failing to be deployed on edge devices. Besides, the performance boost is highly dependent on redundant labeled data. To achieve faster speeds and to handle the problems caused by the lack of data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called `Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper is about KD and S-T learning, which are being actively studied in recent years. First, we aim to provide explanations of what KD is and how/why it works. Then, we provide a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically for vision tasks. In general, we consider some fundamental questions that have been driving this research area and thoroughly generalize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potentials and open challenges of existing methods and prospect the future directions of KD and S-T learning.

北京阿比特科技有限公司