亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Recently, there has been a trend of evaluating the Large Language Model (LLM) quality in the flavor of LLM-as-a-Judge, namely leveraging another LLM to evaluate the current output quality. However, existing judges are proven to be biased, namely they would favor answers which present better superficial quality (such as verbosity, fluency) while ignoring the instruction following ability. In this work, we propose systematic research about the bias of LLM-as-a-Judge. Specifically, for closed-source judge models, we apply calibration to mitigate the significance of superficial quality, both on probability level and prompt level. For open-source judge models, we propose to mitigate the bias by contrastive training, with curated negative samples that deviate from instruction but present better superficial quality. We apply our methods on the bias evaluation benchmark, and experiment results show our methods mitigate the bias by a large margin while maintaining a satisfactory evaluation accuracy.

相關內容

Context: As the diversity and complexity of regulations affecting Software-Intensive Products and Services (SIPS) is increasing, software engineers need to address the growing regulatory scrutiny. As with any other non-negotiable requirements, SIPS compliance should be addressed early in SIPS engineering - i.e., during requirements engineering (RE). Objectives: In the conditions of the expanding regulatory landscape, existing research offers scattered insights into regulatory compliance of SIPS. This study addresses the pressing need for a structured overview of the state of the art in software RE and its contribution to regulatory compliance of SIPS. Method: We conducted a systematic mapping study to provide an overview of the current state of research regarding challenges, principles and practices for regulatory compliance of SIPS related to RE. We focused on the role of RE and its contribution to other SIPS lifecycle phases. We retrieved 6914 studies published from 2017 until 2023 from four academic databases, which we filtered down to 280 relevant primary studies. Results: We identified and categorized the RE-related challenges in regulatory compliance of SIPS and their potential connection to six types of principles and practices. We found that about 13.6% of the primary studies considered the involvement of both software engineers and legal experts. About 20.7% of primary studies considered RE in connection to other process areas. Most primary studies focused on a few popular regulation fields and application domains. Our results suggest that there can be differences in terms of challenges and involvement of stakeholders across different fields of regulation. Conclusion: Our findings highlight the need for an in-depth investigation of stakeholders' roles, relationships between process areas, and specific challenges for distinct regulatory fields to guide research and practice.

Collaborative Filtering (CF) methods dominate real-world recommender systems given their ability to learn high-quality, sparse ID-embedding tables that effectively capture user preferences. These tables scale linearly with the number of users and items, and are trained to ensure high similarity between embeddings of interacted user-item pairs, while maintaining low similarity for non-interacted pairs. Despite their high performance, encouraging dispersion for non-interacted pairs necessitates expensive regularization (e.g., negative sampling), hurting runtime and scalability. Existing research tends to address these challenges by simplifying the learning process, either by reducing model complexity or sampling data, trading performance for runtime. In this work, we move beyond model-level modifications and study the properties of the embedding tables under different learning strategies. Through theoretical analysis, we find that the singular values of the embedding tables are intrinsically linked to different CF loss functions. These findings are empirically validated on real-world datasets, demonstrating the practical benefits of higher stable rank, a continuous version of matrix rank which encodes the distribution of singular values. Based on these insights, we propose an efficient warm-start strategy that regularizes the stable rank of the user and item embeddings. We show that stable rank regularization during early training phases can promote higher-quality embeddings, resulting in training speed improvements of up to 66%. Additionally, stable rank regularization can act as a proxy for negative sampling, allowing for performance gains of up to 21% over loss functions with small negative sampling ratios. Overall, our analysis unifies current CF methods under a new perspective, their optimization of stable rank, motivating a flexible regularization method.

This study develops an algorithm to solve a variation of the Shortest Common Superstring (SCS) problem. There are two modifications to the base SCS problem. First, one string in the set S is allowed to have up to K mistakes, defined as not matching the SCS in at most K positions. Second, no string in S can be a substring of another in S. The algorithm proposed for the problem is exact.

In this position paper, we present a prototype of a visualizer for functional programs. Such programs, whose evaluation model is the reduction of an expression to a value through repeated application of rewriting rules, and which tend to make little or no use of mutable state, are amenable to visualization in the same fashion as simple mathematical expressions, with which every schoolchild is familiar. We show how such visualizations may be produced for the strict functional language OCaml, by direct interpretation of the abstract syntax tree and appropriate pretty-printing. We describe (and begin to address) the challenges of presenting such program traces in limited space and of identifying their essential elements, so that our methods will one day be practical for more than toy programs. We consider the problems posed by the parts of modern functional programming which are not purely functional such as mutable state, input/output and exceptions. We describe initial work on the use of such visualizations to address the problem of program debugging, which is our ultimate aim.

We initiate an investigation into the optimization properties of next-token prediction (NTP), the dominant training paradigm for modern language models. Specifically, we study the structural properties of the solutions selected by gradient-based optimizers among the many possible minimizers of the NTP objective. By framing NTP as cross-entropy minimization across distinct contexts, each tied with a sparse conditional probability distribution across a finite vocabulary of tokens, we introduce "NTP-separability conditions" that enable reaching the data-entropy lower bound. With this setup, and focusing on linear models with fixed context embeddings, we characterize the optimization bias of gradient descent (GD): Within the data subspace defined by the sparsity patterns of distinct contexts, GD selects parameters that equate the logits' differences of in-support tokens to their log-odds. In the orthogonal subspace, the GD parameters diverge in norm and select the direction that maximizes a margin specific to NTP. These findings extend previous research on implicit bias in one-hot classification to the NTP setting, highlighting key differences and prompting further research into the optimization and generalization properties of NTP, irrespective of the specific architecture used to generate the context embeddings.

Impact assessments have emerged as a common way to identify the negative and positive implications of AI deployment, with the goal of avoiding the downsides of its use. It is undeniable that impact assessments are important - especially in the case of rapidly proliferating technologies such as generative AI. But it is also essential to critically interrogate the current literature and practice on impact assessment, to identify its shortcomings, and to develop new approaches that are responsive to these limitations. In this provocation, we do just that by first critiquing the current impact assessment literature and then proposing a novel approach that addresses our concerns: Scenario-Based Sociotechnical Envisioning.

Reward allocation, also known as the credit assignment problem, has been an important topic in economics, engineering, and machine learning. An important concept in reward allocation is the core, which is the set of stable allocations where no agent has the motivation to deviate from the grand coalition. In previous works, computing the core requires either knowledge of the reward function in deterministic games or the reward distribution in stochastic games. However, this is unrealistic, as the reward function or distribution is often only partially known and may be subject to uncertainty. In this paper, we consider the core learning problem in stochastic cooperative games, where the reward distribution is unknown. Our goal is to learn the expected core, that is, the set of allocations that are stable in expectation, given an oracle that returns a stochastic reward for an enquired coalition each round. Within the class of strictly convex games, we present an algorithm named \texttt{Common-Points-Picking} that returns a point in the expected core given a polynomial number of samples, with high probability. To analyse the algorithm, we develop a new extension of the separation hyperplane theorem for multiple convex sets.

Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization. However, their optimization properties are less well understood. We take the first step towards analyzing GNN training by studying the gradient dynamics of GNNs. First, we analyze linearized GNNs and prove that despite the non-convexity of training, convergence to a global minimum at a linear rate is guaranteed under mild assumptions that we validate on real-world graphs. Second, we study what may affect the GNNs' training speed. Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution. Empirical results confirm that our theoretical results for linearized GNNs align with the training behavior of nonlinear GNNs. Our results provide the first theoretical support for the success of GNNs with skip connections in terms of optimization, and suggest that deep GNNs with skip connections would be promising in practice.

Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNN is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representation from the data. Availability of a large amount of data and improvements in the hardware processing units have accelerated the research in CNNs and recently very interesting deep CNN architectures are reported. The recent race in deep CNN architectures for achieving high performance on the challenging benchmarks has shown that the innovative architectural ideas, as well as parameter optimization, can improve the CNN performance on various vision-related tasks. In this regard, different ideas in the CNN design have been explored such as use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by the restructuring of the processing units. Especially, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of the CNN components and sheds light on the current challenges and applications of CNNs.

Deep Convolutional Neural Networks have pushed the state-of-the art for semantic segmentation provided that a large amount of images together with pixel-wise annotations is available. Data collection is expensive and a solution to alleviate it is to use transfer learning. This reduces the amount of annotated data required for the network training but it does not get rid of this heavy processing step. We propose a method of transfer learning without annotations on the target task for datasets with redundant content and distinct pixel distributions. Our method takes advantage of the approximate content alignment of the images between two datasets when the approximation error prevents the reuse of annotation from one dataset to another. Given the annotations for only one dataset, we train a first network in a supervised manner. This network autonomously learns to generate deep data representations relevant to the semantic segmentation. Then the images in the new dataset, we train a new network to generate a deep data representation that matches the one from the first network on the previous dataset. The training consists in a regression between feature maps and does not require any annotations on the new dataset. We show that this method reaches performances similar to a classic transfer learning on the PASCAL VOC dataset with synthetic transformations.

北京阿比特科技有限公司