亚洲精品无码黄色网站在线观看,亚洲高清日韩国产一区二区三区,国产精品第一页在线观看,百合互慰无码免费视频在线观看

We consider scheduling in the M/G/1 queue with unknown job sizes. It is known that the Gittins policy minimizes mean response time in this setting. However, the behavior of the tail of response time under Gittins is poorly understood, even in the large-response-time limit. Characterizing Gittins's asymptotic tail behavior is important because if Gittins has optimal tail asymptotics, then it simultaneously provides optimal mean response time and good tail performance. In this work, we give the first comprehensive account of Gittins's asymptotic tail behavior. For heavy-tailed job sizes, we find that Gittins always has asymptotically optimal tail. The story for light-tailed job sizes is less clear-cut: Gittins's tail can be optimal, pessimal, or in between. To remedy this, we show that a modification of Gittins avoids pessimal tail behavior while achieving near-optimal mean response time.

相關內容

優化器

關注 4

SimPLe · MoDELS · CASES · INTERACT · 語言模型化 ·

2024 年 3 月 8 日

Can LLMs Follow Simple Rules?

Norman Mu,Sarah Chen,Zifan Wang,Sizhe Chen,David Karamardian,Lulwa Aljeraisy,Basel Alomair,Dan Hendrycks,David Wagner

from arxiv, Project website: //eecs.berkeley.edu/~normanmu/llm_rules; revised content

As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner. Model developers may wish to set explicit rules for the model, such as "do not generate abusive content", but these may be circumvented by jailbreaking techniques. Existing evaluations of adversarial attacks and defenses on LLMs generally require either expensive manual review or unreliable heuristic checks. To address this issue, we propose Rule-following Language Evaluation Scenarios (RuLES), a programmatic framework for measuring rule-following ability in LLMs. RuLES consists of 14 simple text scenarios in which the model is instructed to obey various rules while interacting with the user. Each scenario has a programmatic evaluation function to determine whether the model has broken any rules in a conversation. Our evaluations of proprietary and open models show that almost all current models struggle to follow scenario rules, even on straightforward test cases. We also demonstrate that simple optimization attacks suffice to significantly increase failure rates on test cases. We conclude by exploring two potential avenues for improvement: test-time steering and supervised fine-tuning.

Performer · RAVEN · 多樣性 · CLUES · 多跳 ·

2024 年 3 月 8 日

How Far Are We from Intelligent Visual Deductive Reasoning?

Yizhe Zhang,He Bai,Ruixiang Zhang,Jiatao Gu,Shuangfei Zhai,Josh Susskind,Navdeep Jaitly

from arxiv, ICLR 2024 AGI workshop. //github.com/apple/ml-rpm-bench

Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks. We dig into vision-based deductive reasoning, a more sophisticated but less explored realm, and find previously unexposed blindspots in the current SOTA VLMs. Specifically, we leverage Raven's Progressive Matrices (RPMs), to assess VLMs' abilities to perform multi-hop relational and deductive reasoning relying solely on visual clues. We perform comprehensive evaluations of several popular VLMs employing standard strategies such as in-context learning, self-consistency, and Chain-of-thoughts (CoT) on three diverse datasets, including the Mensa IQ test, IntelligenceTest, and RAVEN. The results reveal that despite the impressive capabilities of LLMs in text-based reasoning, we are still far from achieving comparable proficiency in visual deductive reasoning. We found that certain standard strategies that are effective when applied to LLMs do not seamlessly translate to the challenges presented by visual reasoning tasks. Moreover, a detailed analysis reveals that VLMs struggle to solve these tasks mainly because they are unable to perceive and comprehend multiple, confounding abstract patterns in RPM examples.

motivation · Spotify · COVID-19 · 相同 · 論文 ·

2024 年 3 月 7 日

What Attracts Employees to Work Onsite in Times of Increased Remote Working?

Darja Smite,Eriks Klotins,Nils Brede Moe

from arxiv, 5 pages, 1 table, 2 figures. Accepted by IEEE Software

COVID-19 pandemic has irreversibly changed the attitude towards office presence. While previously remote workers were met with skepticism and distrust, today the same applies to companies prohibiting remote working. Albeit many workspaces are half empty. In this paper, we offer insights into the role of the office, corporate policies and actions regarding remote work in eight companies: Ericsson, Knowit, SpareBank 1 Utvikling, Spotify, Storebrand, Telenor, Company-X, Company-Y, and their sites in Sweden, Norway and the UK. Our findings are twofold. First, we found that companies indeed struggle with office presence and a large share of corporate space (35-67%) is underutilized. Second, we found that the main motivator for office presence is Connection and community, followed by Material offerings, Preference and Duty. Finally, we summarize actionable advice to promote onsite work, which is likely to help many other companies to rejuvenate life in their offices.

大語言模型 · 可理解性 · 語言模型化 · MoDELS · entity ·

2024 年 3 月 7 日

Do Large Language Model Understand Multi-Intent Spoken Language ?

Shangjian Yin,Peijie Huang,Yuhong Xu,Haojing Huang,Jiatian Chen

This study marks a significant advancement by harnessing Large Language Models (LLMs) for multi-intent spoken language understanding (SLU), proposing a unique methodology that capitalizes on the generative power of LLMs within an SLU context. Our innovative technique reconfigures entity slots specifically for LLM application in multi-intent SLU environments and introduces the concept of Sub-Intent Instruction (SII), enhancing the dissection and interpretation of intricate, multi-intent communication within varied domains. The resultant datasets, dubbed LM-MixATIS and LM-MixSNIPS, are crafted from pre-existing benchmarks. Our research illustrates that LLMs can match and potentially excel beyond the capabilities of current state-of-the-art multi-intent SLU models. It further explores LLM efficacy across various intent configurations and dataset proportions. Moreover, we introduce two pioneering metrics, Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA), to provide an in-depth analysis of LLM proficiency in this complex field.

語言模型化 · 大語言模型 · MoDELS · CASE · 人工智能 ·

2024 年 3 月 7 日

Can Large Language Models Reason and Plan?

Subbarao Kambhampati

from arxiv, arXiv admin note: text overlap with arXiv:2402.01817

While humans sometimes do show the capability of correcting their own erroneous guesses with self-critiquing, there seems to be no basis for that assumption in the case of LLMs.

多樣性 · MoDELS · 可約的 · 語言模型化 · 大語言模型 ·

2024 年 3 月 6 日

Does Writing with Language Models Reduce Content Diversity?

Vishakh Padmakumar,He He

from arxiv, ICLR 2024

Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write argumentative essays in three setups -- using a base LLM (GPT3), a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not the GPT3) results in a statistically significant reduction in diversity. Specifically, it increases the similarity between the writings of different authors and reduces the overall lexical and content diversity. We additionally find that this effect is mainly attributable to InstructGPT contributing less diverse text to co-written essays. In contrast, the user-contributed text remains unaffected by model collaboration. This suggests that the recent improvement in generation quality from adapting models to human feedback might come at the cost of more homogeneous and less diverse content.

MoDELS · 語言模型化 · 大語言模型 · Performer · GPT-3.5 ·

2024 年 3 月 6 日

Can Large Language Models do Analytical Reasoning?

Yebowen Hu,Kaiqiang Song,Sangwoo Cho,Xiaoyang Wang,Hassan Foroosh,Dong Yu,Fei Liu

This paper explores the cutting-edge Large Language Model with analytical reasoning on sports. Our analytical reasoning embodies the tasks of letting large language models count how many points each team scores in a quarter in the NBA and NFL games. Our major discoveries are in two folds. Firstly, we find among all the models we employed, GPT-4 stands out in effectiveness, followed by Claude-2.1, with GPT-3.5, Gemini-Pro, and Llama-2-70b lagging behind. Specifically, we compare three different prompting techniques and a divide-and-conquer approach, we find that the latter was the most effective. Our divide-and-conquer approach breaks down play-by-play data into smaller, more manageable segments, solves each piece individually, and then aggregates them together. Besides the divide-and-conquer approach, we also explore the Chain of Thought (CoT) strategy, which markedly improves outcomes for certain models, notably GPT-4 and Claude-2.1, with their accuracy rates increasing significantly. However, the CoT strategy has negligible or even detrimental effects on the performance of other models like GPT-3.5 and Gemini-Pro. Secondly, to our surprise, we observe that most models, including GPT-4, struggle to accurately count the total scores for NBA quarters despite showing strong performance in counting NFL quarter scores. This leads us to further investigate the factors that impact the complexity of analytical reasoning tasks with extensive experiments, through which we conclude that task complexity depends on the length of context, the information density, and the presence of related information. Our research provides valuable insights into the complexity of analytical reasoning tasks and potential directions for developing future large language models.

相關系數 · 穩健性 · 泛函 · 數據轉換 · 異常點 ·

2024 年 3 月 6 日

Is Distance Correlation Robust?

Sarah Leyder,Jakob Raymaekers,Peter J. Rousseeuw

Distance correlation is a popular measure of dependence between random variables. It has some robustness properties, but not all. We prove that the influence function of the usual distance correlation is bounded, but that its breakdown value is zero. Moreover, it has an unbounded sensitivity function, converging to the bounded influence function for increasing sample size. To address this sensitivity to outliers we construct a more robust version of distance correlation, which is based on a new data transformation. Simulations indicate that the resulting method is quite robust, and has good power in the presence of outliers. We illustrate the method on genetic data. Comparing the classical distance correlation with its more robust version provides additional insight.

估計/估計量 · 推斷 · 清華大學智能產業研究院 · MoDELS · binary ·

2024 年 3 月 6 日

Bayesian Generalized Distributed Lag Regression with Variable Selection

Daniel Dempsey,Jason Wyse

Distributed Lag Models (DLMs) and similar regression approaches such as MIDAS have been used for many decades in econometrics, and more recently in the study of air quality and its impact on human health. They are useful not only for quantifying accumulating and delayed effects, but also for estimating the lags that are most susceptible to these effects. Among other things, they have been used to infer the period of exposure to poor air quality which might negatively impact child birth weight. The increased attention DLMs have received in recent years is reflective of their potential to help us understand a great many issues, particularly in the investigation of how the environment affects human health. In this paper we describe how to expand the utility of these models for Bayesian inference by leveraging latent-variables. In particular we explain how to perform binary regression to better handle imbalanced data, how to incorporate negative binomial regression, and how to estimate the probability of predictor inclusion. Extra parameters introduced through the DLM framework may require calibration for the MCMC algorithm, but this will not be the case in DLM-based analyses often seen in pollution exposure literature. In these cases, the parameters are inferred through a fully automatic Gibbs sampling procedure.

surge · Microsoft Surface · 統計量 · 均值 · 可辨認的 ·

2024 年 3 月 6 日

Is a Recent Surge in Global Warming Detectable?

Claudie Beaulieu,Colin Gallagher,Rebecca Killick,Robert Lund,Xueheng Shi

The global mean surface temperature is widely studied to monitor climate change. A current debate centers around whether there has been a recent (post-1970s) surge/acceleration in the warming rate. This paper addresses whether an acceleration in the warming rate is detectable from a statistical perspective. We use changepoint models, which are statistical techniques specifically designed for identifying structural changes in time series. Four global mean surface temperature records over 1850-2023 are scrutinized within. Our results show limited evidence for a warming surge; in most surface temperature time series, no change in the warming rate beyond the 1970s is detected. As such, we estimate minimum changes in the warming trend for a surge to be detectable in the near future.