
Under uncertainty, AI machines are designed to make decisions so as to reach the best expected outcomes. Expectations are based on true facts about the objective environment the machines interact with, and those facts can be encoded into AI models in the form of true objective probability functions. Accordingly, AI models involve probabilistic machine learning in which the probabilities should be objectively interpreted. We prove, under some basic assumptions, when machines can learn the true objective probabilities, if any, and when they cannot.
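As a rough illustration of choosing the action with the best expected outcome, the sketch below computes expected utilities under a given probability function. The action names, utility values, and probabilities are hypothetical and are not taken from the paper.

```python
def expected_utility(probs, utilities):
    """Expected utility of one action: sum over states of P(state) * utility."""
    return sum(p * u for p, u in zip(probs, utilities))


def best_action(actions, probs):
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(probs, actions[name]))


# Hypothetical example with two states (dry, rain) and two actions.
# Each list holds the utility of the action in the dry and rainy state.
ACTIONS = {
    "carry_umbrella": [-1.0, 2.0],
    "leave_umbrella": [1.0, -5.0],
}

if __name__ == "__main__":
    print(best_action(ACTIONS, [0.7, 0.3]))
    print(best_action(ACTIONS, [0.95, 0.05]))
```

The preferred action changes when the probability of rain changes, which is why the decision depends on how well the probabilities match the environment.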

Related content

Learning


Performer · Category · Periodic · Mutually independent · Logistic regression

August 21, 2024

Are Scientists Changing their Research Productivity Classes When They Move Up the Academic Ladder?

Marek Kwiek, Wojciech Roszka

from arxiv, 36 pages, 9 tables, 4 figures

We approach productivity in science in a longitudinal fashion: We track careers over time, up to 40 years. We first allocate scientists to decile-based publishing productivity classes, from the bottom 10% to the top 10%. Then, we seek patterns of mobility between the classes in two career stages: assistant professorship and associate professorship. Our findings confirm that radically changing publishing productivity levels (upward or downward) almost never happens. Scientists with a very weak past track record in publications emerge as having marginal chances of becoming scientists with a very strong future track record across all science, technology, engineering, mathematics, and medicine (STEMM) fields. Hence, our research shows a long-term character of careers in science, with publishing productivity during the apprenticeship period of assistant professorship heavily influencing productivity during the more independent period of associate professorship. We use individual-level microdata on academic careers (from a national registry of scientists) and individual-level metadata on publications (from the Scopus raw dataset). Polish associate professors tend to be stuck in their productivity classes for years: High performers tend to remain high performers, and low performers tend to remain low performers over their careers. Logistic regression analysis powerfully supports our two-dimensional results. We examine all internationally visible Polish associate professors in five STEMM fields (N = 4,165 with N_art = 71,841 articles).
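The decile-based classes and the mobility between them can be sketched as follows. This is a minimal illustration that assumes raw publication counts per scientist and a simple rank-based split. The function names and the toy data are hypothetical, and the paper's actual productivity measure may differ.

```python
def decile_classes(counts):
    """Map each scientist id to a class from 1 (bottom 10%) to 10 (top 10%).

    Scientists are ranked by publication count; ties are broken by id so
    the result is deterministic.
    """
    ranked = sorted(counts, key=lambda sid: (counts[sid], sid))
    n = len(ranked)
    return {sid: (i * 10) // n + 1 for i, sid in enumerate(ranked)}


def transition_counts(before, after):
    """Count moves between classes as a 10x10 matrix.

    matrix[b - 1][a - 1] is the number of scientists in class b in the
    earlier career stage and class a in the later one.
    """
    matrix = [[0] * 10 for _ in range(10)]
    for sid, b in before.items():
        if sid in after:
            matrix[b - 1][after[sid] - 1] += 1
    return matrix


if __name__ == "__main__":
    # Hypothetical counts for ten scientists in two career stages.
    assistant = {f"s{i}": i for i in range(10)}
    associate = {f"s{i}": 3 * i for i in range(10)}
    moves = transition_counts(decile_classes(assistant), decile_classes(associate))
    for row in moves:
        print(row)
```

A matrix with most of its mass on or near the diagonal would correspond to the low mobility between classes that the abstract describes.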

MoDELS · Language modeling · Constraint · Large language models · Notability

August 20, 2024

SysBench: Can Large Language Models Follow System Messages?

Yanzhao Qin, Tao Zhang, Tao Zhang, Yanjun Shen, Wenjing Luo, Haoze Sun, Yan Zhang, Yujing Qiao, Weipeng Chen, Zenan Zhou, Wentao Zhang, Bin Cui

Large Language Models (LLMs) have become instrumental across various applications, with the customization of these models to specific scenarios becoming increasingly critical. The system message, a fundamental component of LLMs, consists of carefully crafted instructions that guide the model's behavior to meet intended goals. Despite the recognized potential of system messages to optimize AI-driven solutions, there is a notable absence of a comprehensive benchmark for evaluating how well different LLMs follow these system messages. To fill this gap, we introduce SysBench, a benchmark that systematically analyzes system-message-following ability in terms of three challenging aspects: constraint complexity, instruction misalignment and multi-turn stability. To enable effective evaluation, SysBench constructs multi-turn user conversations covering various interaction relationships, based on six common types of constraints from system messages in real-world scenarios. Our dataset contains 500 system messages from various domains, each paired with 5 turns of user conversations, which have been manually formulated and checked to guarantee high quality. SysBench provides extensive evaluation across various LLMs, measuring their ability to follow specified constraints given in system messages. The results highlight both the strengths and weaknesses of existing models, offering key insights and directions for future research. The open source library SysBench is available at //github.com/PKU-Baichuan-MLSystemLab/SysBench.

Language modeling · MoDELS · Large language models · Examples · Processing (programming language)

August 18, 2024

Could a Large Language Model be Conscious?

David J. Chalmers

from arxiv, Invited lecture at NeurIPS, November 28, 2022

There has recently been widespread discussion of whether large language models might be sentient. Should we take this idea seriously? I will break down the strongest reasons for and against. Given mainstream assumptions in the science of consciousness, there are significant obstacles to consciousness in current models: for example, their lack of recurrent processing, a global workspace, and unified agency. At the same time, it is quite possible that these obstacles will be overcome in the next decade or so. I conclude that while it is somewhat unlikely that current large language models are conscious, we should take seriously the possibility that successors to large language models may be conscious in the not-too-distant future.

Functional · MoDELS · Performer · Atom (text editor) · Processing (programming language)

August 16, 2024

Achieving Complex Image Edits via Function Aggregation with Diffusion Models

Mohammadreza Samadi, Fred X. Han, Mohammad Salameh, Hao Wu, Fengyu Sun, Chunhua Zhou, Di Niu

Diffusion models have demonstrated strong performance in generative tasks, making them ideal candidates for image editing. Recent studies highlight their ability to apply desired edits effectively by following textual instructions, yet two key challenges persist. First, these models struggle to apply multiple edits simultaneously, resulting in computational inefficiencies due to their reliance on sequential processing. Second, relying on textual prompts to determine the editing region can lead to unintended alterations in other parts of the image. In this work, we introduce FunEditor, an efficient diffusion model designed to learn atomic editing functions and perform complex edits by aggregating simpler functions. This approach enables complex editing tasks, such as object movement, by aggregating multiple functions and applying them simultaneously to specific areas. FunEditor achieves 5 to 24 times faster inference than existing methods on complex tasks like object movement. Our experiments demonstrate that FunEditor significantly outperforms recent baselines, including both inference-time optimization methods and fine-tuned models, across various metrics, such as image quality assessment (IQA) and object-background consistency.

Analysis · MoDELS · Performer · Hacking · Twitter

August 11, 2024

A Diamond Model Analysis on Twitter's Biggest Hack

Chaitanya Rahalkar

Cyberattacks have increased markedly over the past few years and have targeted actors from a wide variety of domains. Understanding the motivation, infrastructure, attack vectors, etc. behind such attacks is vital for proactively preventing them in the future and for analyzing their economic and social impact. In this paper, we leverage the diamond model to perform an intrusion analysis case study of the 2020 Twitter account hijacking cyberattack. We follow this standardized incident response model to map the adversary, capability, infrastructure, and victim, and perform a comprehensive analysis of the attack and its impact from a cybersecurity policy standpoint.

Integration · MoDELS · Ring · on the fly · state-of-the-art

August 9, 2024

Integrating Loop Acceleration into Bounded Model Checking

Florian Frohn, Jürgen Giesl

Bounded Model Checking (BMC) is a powerful technique for proving unsafety. However, finding deep counterexamples that require a large bound is challenging for BMC. On the other hand, acceleration techniques compute "shortcuts" that "compress" many execution steps into a single one. In this paper, we tightly integrate acceleration techniques into SMT-based bounded model checking. By adding suitable "shortcuts" on the fly, our approach can quickly detect deep counterexamples. Moreover, using so-called blocking clauses, our approach can prove safety of examples where BMC diverges. An empirical comparison with other state-of-the-art techniques shows that our approach is highly competitive for proving unsafety, and orthogonal to existing techniques for proving safety.
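The idea of a "shortcut" that compresses many execution steps into one can be illustrated with a toy loop that repeatedly adds a constant to a counter. The sketch below is not SMT-based and is not the authors' algorithm. It only contrasts step-by-step unrolling up to a bound with a closed-form jump, and all function names and numbers are hypothetical.

```python
def unrolled_steps(x0, step, target, bound):
    """Unroll the loop 'x += step' one iteration at a time.

    Returns the number of iterations needed until x >= target, or None
    if the state is not reached within 'bound' iterations.
    """
    x = x0
    for k in range(bound + 1):
        if x >= target:
            return k
        x += step
    return None


def accelerated_steps(x0, step, target):
    """Closed form for the same loop, assuming step > 0.

    After k iterations the value is x0 + k * step, so the smallest k with
    x0 + k * step >= target is the ceiling of (target - x0) / step.
    """
    if x0 >= target:
        return 0
    return -(-(target - x0) // step)  # ceiling division on integers


if __name__ == "__main__":
    print(unrolled_steps(0, 3, 1000, bound=100))
    print(accelerated_steps(0, 3, 1000))
```

With a small bound the unrolled search gives up before reaching the deep state, while the closed form finds the required number of iterations directly.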

Analysis · MoDELS · Recommender systems · Identifiable · Extensibility

August 7, 2024

A Reproducible Analysis of Sequential Recommender Systems

Filippo Betello, Antonio Purificato, Federico Siciliano, Giovanni Trappolini, Andrea Bacciu, Nicola Tonellotto, Fabrizio Silvestri

from arxiv, 8 pages, 5 figures

Sequential Recommender Systems (SRSs) have emerged as a highly efficient approach to recommendation systems. By leveraging sequential data, SRSs can identify temporal patterns in user behaviour, significantly improving recommendation accuracy and relevance. Ensuring the reproducibility of these models is paramount for advancing research and facilitating comparisons between them. Existing works exhibit shortcomings in reproducibility and replicability of results, leading to inconsistent statements across papers. Our work fills these gaps by standardising data pre-processing and model implementations, providing a comprehensive code resource, including a framework for developing SRSs and establishing a foundation for consistent and reproducible experimentation. We conduct extensive experiments on several benchmark datasets, comparing various SRSs implemented in our resource. We challenge prevailing performance benchmarks, offering new insights into the SR domain. For instance, SASRec does not consistently outperform GRU4Rec. On the contrary, when the number of model parameters becomes substantial, SASRec starts to clearly dominate all the other SRSs. This discrepancy underscores the significant impact that experimental configuration has on the outcomes and the importance of setting it up to ensure precise and comprehensive results. Failure to do so can lead to significantly flawed conclusions, highlighting the need for rigorous experimental design and analysis in SRS research. Our code is available at //github.com/antoniopurificato/recsys_repro_conf.

Code · Language modeling · Taxonomy · MoDELS · Large language models

August 6, 2024

Where Do Large Language Models Fail When Generating Code?

Zhijie Wang, Zijie Zhou, Da Song, Yuheng Huang, Shengmai Chen, Lei Ma, Tianyi Zhang

from arxiv, Extended from our MAPS 2023 paper. Our data is available at //llm-code-errors.cs.purdue.edu

Large Language Models (LLMs) have shown great potential in code generation. However, current LLMs still cannot reliably generate correct code. Moreover, it is unclear what kinds of code generation errors LLMs can make. To address this, we conducted an empirical study to analyze incorrect code snippets generated by six popular LLMs on the HumanEval dataset. We analyzed these errors along two dimensions of error characteristics (semantic characteristics and syntactic characteristics) to derive a comprehensive code generation error taxonomy for LLMs through open coding and thematic analysis. We then labeled all 557 incorrect code snippets based on this taxonomy. Our results showed that the six LLMs exhibited similar distributions of syntactic characteristics but different distributions of semantic characteristics. Furthermore, we analyzed the correlation between different error characteristics and factors such as task complexity, code length, and test-pass rate. Finally, we highlight the challenges that LLMs may encounter when generating code and propose implications for future research on reliable code generation with LLMs.

MoDELS · CRAFT · Language modeling · Learning · Black box

August 5, 2024

Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?

Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad

from arxiv, Accepted to AI4CYBER - KDD 2024

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language tasks, but their safety and morality remain contentious due to their training on internet text corpora. To address these concerns, alignment techniques have been developed to improve the public usability and safety of LLMs. Yet, the potential for generating harmful content through these models seems to persist. This paper explores the concept of jailbreaking LLMs, that is, reversing their alignment through adversarial triggers. Previous methods, such as soft embedding prompts, manually crafted prompts, and gradient-based automatic prompts, have had limited success on black-box models because they require model access and produce a low variety of manually crafted prompts, making them susceptible to being blocked. This paper introduces a novel approach using reinforcement learning to optimize adversarial triggers, requiring only inference API access to the target model and a small surrogate model. Our method, which leverages a BERTScore-based reward function, enhances the transferability and effectiveness of adversarial triggers on new black-box models. We demonstrate that this approach improves the performance of adversarial triggers on a previously untested language model.

Mode collapse · Adversarial autoencoder · Autoencoder · Peak · Better

March 23, 2018

Generative Adversarial Autoencoder Networks

Ngoc-Trung Tran, Tuan-Anh Bui, Ngai-Man Cheung

We introduce an effective model to overcome the problem of mode collapse when training Generative Adversarial Networks (GANs). Firstly, we propose a new generator objective that is better suited to tackling mode collapse. We also apply an independent autoencoder (AE) to constrain the generator and treat its reconstructed samples as "real" samples to slow down the convergence of the discriminator, which reduces the vanishing gradient problem and stabilizes the model. Secondly, using the mappings between latent and data spaces provided by the AE, we further regularize the AE by the relative distance between the latent and data samples to explicitly prevent the generator from falling into mode collapse. This idea arose when we found a new way to visualize mode collapse on the MNIST dataset. To the best of our knowledge, our method is the first to propose and successfully apply the relative distance of latent and data samples for stabilizing GANs. Thirdly, our proposed model, namely Generative Adversarial Autoencoder Networks (GAAN), is stable and suffers from neither vanishing gradients nor mode collapse, as empirically demonstrated on synthetic, MNIST, MNIST-1K, CelebA and CIFAR-10 datasets. Experimental results show that our method can approximate multi-modal distributions well and achieve better results than state-of-the-art methods on these benchmark datasets. Our model implementation is published here: //github.com/tntrung/gaan