
Standard mixed-integer programming formulations for the stable set problem on $n$-node graphs require $n$ integer variables. We prove that this is almost optimal: we give a family of $n$-node graphs for which every polynomial-size MIP formulation requires $\Omega(n/\log^2 n)$ integer variables. By a polyhedral reduction we obtain an analogous result for $n$-item knapsack problems. In both cases, this improves the previously known bounds of $\Omega(\sqrt{n}/\log n)$ by Cevallos, Weltge & Zenklusen (SODA 2018). To this end, we show that there exists a family of $n$-node graphs whose stable set polytopes satisfy the following: any $(1+\varepsilon/n)$-approximate extended formulation for these polytopes, for some constant $\varepsilon > 0$, has size $2^{\Omega(n/\log n)}$. Our proof extends and simplifies the information-theoretic methods due to Göös, Jain & Watson (FOCS 2016, SIAM J. Comput. 2018), who showed the same result for the case of exact extended formulations (i.e., $\varepsilon = 0$).
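
For concreteness, the usual edge formulation of maximum stable set uses one binary variable per node. It maximizes $\sum_v x_v$ subject to $x_u + x_v \le 1$ for every edge $uv$, with $x \in \{0,1\}^n$. The sketch below is an illustration assumed here and is not taken from the paper. It solves that formulation by brute force over its $n$ integer variables, so it is exponential in $n$ and only meant to make the variable count tangible.

```python
from itertools import product

def max_stable_set_edge_formulation(n, edges):
    """Brute-force the edge formulation of maximum stable set.

    Variables: x[v] in {0, 1} for v = 0..n-1 (n integer variables).
    Constraints: x[u] + x[v] <= 1 for every edge (u, v).
    Objective: maximize sum(x). Returns (optimal value, one optimal x).
    """
    best_value, best_x = 0, (0,) * n
    for x in product((0, 1), repeat=n):
        if all(x[u] + x[v] <= 1 for u, v in edges):
            value = sum(x)
            if value > best_value:
                best_value, best_x = value, x
    return best_value, best_x

# 5-cycle: its maximum stable set has size 2.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
print(max_stable_set_edge_formulation(5, cycle5)[0])  # 2
```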

Related content

Performer · GPT-3.5 · Round · Robustness · MoDELS

April 25, 2024

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

From arXiv; 16 pages of main text, 11 pages of appendices, 15 figures, 9 tables. Updated scoring scheme.

Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games in which more than two agents participate simultaneously. We then introduce our framework, GAMA-Bench, which includes eight classical multi-agent games, and design a scoring scheme to assess a model's performance in these games quantitatively. Through GAMA-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfactory robustness, its generalizability is relatively limited; its performance can, however, be improved through approaches such as Chain-of-Thought. Additionally, we evaluate various LLMs and find that GPT-4 outperforms the other models on GAMA-Bench, achieving a score of 60.5, while Gemini-1.0-Pro and GPT-3.5 (0613, 1106, 0125) demonstrate similar intelligence on GAMA-Bench. The code and experimental results are publicly available via //github.com/CUHK-ARISE/GAMABench.
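
The abstract does not list GAMA-Bench's eight games or its scoring formula. As a hypothetical illustration of a classical multi-player game of the kind described, the sketch below implements one round of "Guess 2/3 of the Average". Every player picks a number in [0, 100], and whoever is closest to 2/3 of the mean wins. The helper name and the tie rule are assumptions, and this is not GAMA-Bench's scoring scheme.

```python
import math

def guess_two_thirds_winners(guesses, ratio=2 / 3):
    """One round of 'guess ratio x the average' (hypothetical helper).

    guesses: dict mapping player name -> chosen number in [0, 100].
    Returns (target, winners); all players tied for the smallest distance
    to the target win (ties compared with a small float tolerance).
    """
    if not guesses:
        raise ValueError("need at least one player")
    target = ratio * sum(guesses.values()) / len(guesses)
    distances = {p: abs(g - target) for p, g in guesses.items()}
    best = min(distances.values())
    winners = sorted(p for p, d in distances.items()
                     if math.isclose(d, best, rel_tol=1e-12, abs_tol=1e-12))
    return target, winners

target, winners = guess_two_thirds_winners({"A": 50, "B": 30, "C": 10})
print(round(target, 6), winners)  # 20.0 ['B', 'C']
```

In repeated play, rational agents are expected to drift toward 0, which is why this game is often used to probe how far an agent reasons about other agents' reasoning.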

Learning · Diversity · Performer · INFORMS · Better

April 25, 2024

Structure in Deep Reinforcement Learning: A Survey and Open Problems

Aditya Mohan, Amy Zhang, Marius Lindauer

From arXiv; published in the Journal of Artificial Intelligence Research, Volume 79, pages 1167-1236

Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications. However, its practicality in addressing various real-world scenarios, characterized by diverse and unpredictable dynamics, noisy signals, and large state and action spaces, remains limited. This limitation stems from poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability, among other factors. To overcome these challenges and improve performance across these crucial metrics, one promising avenue is to incorporate additional structural information about the problem into the learning process. Various sub-fields of RL have proposed methods for incorporating such inductive biases. We amalgamate these diverse methodologies under a unified framework, shedding light on the role of structure in the learning problem, and classify these methods into distinct patterns of incorporating structure. By leveraging this comprehensive framework, we provide insights into the challenges of structured RL and lay the groundwork for a design-pattern perspective on RL research. This perspective paves the way for future advancements and aids in developing more effective and efficient RL algorithms that can potentially handle real-world scenarios better.

QoE · Streaming · ReQuEST · INTERACT · Optimizer

April 25, 2024

Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services

Jiachen Liu, Zhiyu Wu, Jae-Won Chung, Fan Lai, Myungjin Lee, Mosharaf Chowdhury

From arXiv; 16 pages, 22 figures

The advent of large language models (LLMs) has transformed text-based services, enabling capabilities ranging from real-time translation to AI-driven chatbots. However, existing serving systems primarily focus on optimizing server-side aggregate metrics like token generation throughput, ignoring individual user experience with streamed text. As a result, under high and/or bursty load, a significant number of users can receive unfavorable service quality or poor Quality-of-Experience (QoE). In this paper, we first formally define QoE of text streaming services, where text is delivered incrementally and interactively to users, by considering the end-to-end token delivery process throughout the entire interaction with the user. Thereafter, we propose Andes, a QoE-aware serving system that enhances user experience for LLM-enabled text streaming services. At its core, Andes strategically allocates contended GPU resources among multiple requests over time to optimize their QoE. Our evaluations demonstrate that, compared to the state-of-the-art LLM serving systems like vLLM, Andes improves the average QoE by up to 3.2$\times$ under high request rate, or alternatively, it attains up to 1.6$\times$ higher request rate while preserving high QoE.
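
The abstract says that QoE accounts for the end-to-end token delivery process but does not give Andes's formula. As a hypothetical proxy only, the sketch below scores one streamed response by the fraction of tokens delivered no later than a user reading at a constant speed would need them. The expected timeline (a time-to-first-token budget plus a fixed reading rate) and all parameter names are assumptions.

```python
def streaming_qoe_proxy(delivery_times, ttft_budget=1.0, tokens_per_sec=5.0):
    """Hypothetical QoE proxy in [0, 1] for one streamed response.

    delivery_times: seconds after the request at which each token arrived,
    in order. Token i is 'on time' if it arrives by the assumed deadline
    ttft_budget + i / tokens_per_sec (a user reading at constant speed).
    """
    if not delivery_times:
        return 1.0  # convention for this sketch: nothing to wait for
    on_time = sum(
        1
        for i, t in enumerate(delivery_times)
        if t <= ttft_budget + i / tokens_per_sec
    )
    return on_time / len(delivery_times)

# Smooth stream: every token arrives ahead of the reader.
print(streaming_qoe_proxy([0.5 + 0.1 * i for i in range(10)]))  # 1.0
# Stalled stream: a pause of about 3 s after the first four tokens.
stalled = [0.5, 0.6, 0.7, 0.8] + [3.8 + 0.1 * i for i in range(6)]
print(streaming_qoe_proxy(stalled))  # 0.4
```

A binary on-time test is the simplest choice. A finer metric could weight how late each token is, which is closer in spirit to the abstract's emphasis on the whole delivery timeline.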

MoDELS · ONNX · CASE · Analysis · Engineering

April 24, 2024

Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem

Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis

Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of the defects and that 33% of reported failures are related to semantically incorrect models. The cause of semantically incorrect models is elusive, but models with behaviour inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioural tolerances and architectural coverage metrics could be fruitful.
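
One way to catch the semantically incorrect models the study describes is differential testing. The idea is to run the original and the converted model on the same inputs and compare outputs within a numeric tolerance. The sketch below is framework-agnostic: `original` and `converted` are hypothetical stand-ins for, for example, a PyTorch model and its ONNX export, reduced here to callables returning lists of floats. The tolerances are assumed values.

```python
import math
import random

def outputs_agree(original, converted, sample_input, trials=100,
                  rel_tol=1e-5, abs_tol=1e-6, seed=0):
    """Compare two models on random inputs.

    original, converted: callables mapping an input to a list of floats.
    sample_input: callable taking a random.Random and returning one input.
    Returns (True, None) if all trials agree, else (False, first_mismatch).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = sample_input(rng)
        y_ref, y_conv = original(x), converted(x)
        if len(y_ref) != len(y_conv) or not all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(y_ref, y_conv)
        ):
            return False, (x, y_ref, y_conv)
    return True, None

def relu_model(x):
    return [max(0.0, v) for v in x]

def buggy_conversion(x):
    # Hypothetical converter bug: the ReLU node was silently dropped.
    return list(x)

def sample_input(rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(4)]

ok, mismatch = outputs_agree(relu_model, buggy_conversion, sample_input)
print(ok)  # False
```

Choosing `rel_tol` and `abs_tol` is exactly the "behavioural tolerance" question the abstract raises. If they are too tight, benign floating-point reordering is flagged; if they are too loose, real conversion bugs slip through.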

Graph · Same · MoDELS · Paper

April 24, 2024

Maximum Cut on Interval Graphs of Interval Count Two is NP-complete

Alexey Barsukov, Bodhayan Roy

An interval graph has interval count $\ell$ if it has an interval model in which, among every $\ell+1$ intervals, two have the same length. Adhikary et al. recently showed that Maximum Cut is NP-complete on interval graphs, while its complexity on unit interval graphs (graphs with interval count one) remains a longstanding open problem. More recently, de Figueiredo et al. advanced this line of work by showing that the problem remains NP-complete on interval graphs of interval count four. In this paper, we show that Maximum Cut is NP-complete even on interval graphs of interval count two.
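
The definitions can be made concrete. By pigeonhole, the condition "among every $\ell+1$ intervals two share a length" means the model uses at most $\ell$ distinct lengths. Maximum Cut asks for a vertex bipartition crossing as many edges as possible. The brute-force sketch below builds the interval graph of a small model, counts its distinct lengths, and finds a maximum cut. It assumes closed intervals and is exponential, so it is for tiny examples only.

```python
from itertools import product

def interval_graph(intervals):
    """Edges between closed intervals [a, b] that intersect."""
    n = len(intervals)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if max(intervals[i][0], intervals[j][0])
            <= min(intervals[i][1], intervals[j][1])]

def distinct_lengths(intervals):
    """Distinct lengths in this model (an upper bound on the graph's interval count)."""
    return len({b - a for a, b in intervals})

def max_cut(n, edges):
    """Maximum number of edges crossing a bipartition, by exhaustive search."""
    best = 0
    for side in product((0, 1), repeat=n):
        best = max(best, sum(side[u] != side[v] for u, v in edges))
    return best

# Lengths 1, 1, 2, 2: a model with interval count at most two.
model = [(0, 1), (2, 3), (0.5, 2.5), (2.5, 4.5)]
edges = interval_graph(model)
print(distinct_lengths(model), len(edges), max_cut(len(model), edges))  # 2 4 3
```

Here the graph is a triangle on intervals 1, 2, 3 plus a pendant edge from interval 0 to interval 2. A triangle can contribute at most two cut edges, so the maximum cut is 3.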

Estimation/Estimator · UniFormer · Confidence · Sparse · MoDELS

April 23, 2024

Estimation and Uniform Inference in Sparse High-Dimensional Additive Models

Philipp Bach, Sven Klaassen, Jannis Kueck, Martin Spindler

We develop a novel method to construct uniformly valid confidence bands for a nonparametric component $f_1$ in the sparse additive model $Y=f_1(X_1)+\ldots + f_p(X_p) + \varepsilon$ in a high-dimensional setting. Our method integrates sieve estimation into a high-dimensional Z-estimation framework, facilitating the construction of uniformly valid confidence bands for the target component $f_1$. To form these confidence bands, we employ a multiplier bootstrap procedure. Additionally, we provide rates for the uniform lasso estimation in high dimensions, which may be of independent interest. Through simulation studies, we demonstrate that our proposed method delivers reliable results in terms of estimation and coverage, even in small samples.
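
Uniform bands of this kind typically rest on a multiplier bootstrap for the supremum of a standardized process. The sketch below is a generic Gaussian multiplier bootstrap, not the authors' estimator. It assumes one already has estimated, centered influence-function values $\psi_i(x_j)$ for observations $i$ and grid points $x_j$; the sieve and lasso estimation step is not shown, and the function name is hypothetical. It draws Gaussian multipliers, forms the bootstrapped sup-statistic, and returns its $(1-\alpha)$ quantile $c$. The band would then be $\hat f_1(x_j) \pm c\,\hat\sigma_j/\sqrt{n}$.

```python
import numpy as np

def multiplier_bootstrap_critical_value(psi, alpha=0.05, n_boot=2000, seed=0):
    """Gaussian multiplier bootstrap critical value for a uniform band.

    psi: array of shape (n, m); psi[i, j] is the estimated (centered)
    influence value of observation i at grid point j. Each column is
    assumed to have nonzero variance.
    Returns c such that estimate_j +/- c * sigma_j / sqrt(n) has
    approximate simultaneous coverage 1 - alpha over the m grid points.
    """
    psi = np.asarray(psi, dtype=float)
    n, _ = psi.shape
    sigma = np.sqrt(np.mean(psi ** 2, axis=0))       # pointwise scale
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_boot, n))            # Gaussian multipliers
    boot = (xi @ psi) / (np.sqrt(n) * sigma)         # shape (n_boot, m)
    sup_stats = np.max(np.abs(boot), axis=1)         # sup over the grid
    return float(np.quantile(sup_stats, 1 - alpha))
```

With a single grid point the bootstrapped statistic is exactly standard normal given the data, so $c$ lands near the pointwise value 1.96. With many grid points $c$ grows, which is the price of simultaneous rather than pointwise coverage.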

cache · MoDELS · Performer · Decoding · GPU

April 23, 2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen

With large language models (LLMs) recently deployed widely for long-content generation, there is increasing demand for efficient long-sequence inference support. However, the key-value (KV) cache, which is stored to avoid re-computation, has become a critical bottleneck because its size grows linearly with the sequence length. Due to the auto-regressive nature of LLMs, the entire KV cache is loaded for every generated token, resulting in low utilization of computational cores and high latency. While various KV-cache compression methods have been proposed to alleviate this issue, they degrade generation quality. We introduce TriForce, a hierarchical speculative decoding system that scales to long sequence generation. It uses the original model weights with a dynamic sparse KV cache obtained via retrieval as a draft model, which serves as an intermediate layer in the hierarchy and is itself speculated by a smaller model to reduce its drafting latency. TriForce not only achieves impressive speedups for Llama2-7B-128K, up to 2.31$\times$ on an A100 GPU, but also scales to even longer contexts. In the offloading setting on two RTX 4090 GPUs, TriForce achieves 0.108 s/token, only half as slow as the auto-regressive baseline on an A100, and attains 7.78$\times$ on our optimized offloading system. Additionally, TriForce is 4.86$\times$ faster than DeepSpeed-Zero-Inference on a single RTX 4090 GPU. TriForce's robustness is highlighted by its consistently strong performance across various temperatures. The code is available at //github.com/Infini-AI-Lab/TriForce.
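
TriForce's hierarchy is built on speculative decoding, whose losslessness comes from a verification rule. The sketch below shows that generic rule for a single token and is not TriForce's code. A token $x$ proposed from the draft distribution $q$ is accepted with probability $\min(1, p(x)/q(x))$ under the target distribution $p$. On rejection, a replacement is sampled from the normalized residual $\max(0, p - q)$, which makes the output distributed exactly as $p$. Distributions here are plain dicts over a toy vocabulary, an assumption for illustration.

```python
import random

def verify_draft_token(token, p, q, rng=random):
    """Speculative-sampling verification of one drafted token.

    token: token sampled from the draft distribution q (so q[token] > 0).
    p, q: dicts mapping token -> probability (target and draft models).
    Returns (accepted, output_token).
    """
    if rng.random() < min(1.0, p.get(token, 0.0) / q[token]):
        return True, token
    # Rejected: sample from the residual max(0, p - q), renormalized.
    residual = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in p}
    r = rng.random() * sum(residual.values())
    for t, w in residual.items():
        r -= w
        if r < 0:
            return False, t
    return False, max(residual, key=residual.get)  # guard against rounding

# Empirical check: outputs should follow p even though drafts come from q.
p = {"a": 0.7, "b": 0.3}
q = {"a": 0.4, "b": 0.6}
rng = random.Random(0)
outs = []
for _ in range(20000):
    drafted = "a" if rng.random() < q["a"] else "b"
    outs.append(verify_draft_token(drafted, p, q, rng)[1])
print(outs.count("a") / len(outs))  # close to 0.7
```

In a hierarchy like the one described, this rule would be applied at each level. The smaller model drafts for the retrieval-based draft, and that draft in turn is verified by the full model.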

Task-oriented dialogue system · CASES · Automator · Integration · Language modeling

April 19, 2024

Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs

Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke

From arXiv; accepted at the SIGIR 2024 long paper track

In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback. In a conversational setting such signals are usually unavailable due to the nature of the interactions, and, instead, the evaluation often relies on crowdsourced evaluation labels. The role of user feedback in annotators' assessment of turns in a conversation has been little studied. We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated. We explore and compare two methodologies for assessing TDSs: one that includes the user's follow-up utterance and one that does not. We use both crowdworkers and large language models (LLMs) as annotators to assess system responses across four aspects: relevance, usefulness, interestingness, and explanation quality. Our findings indicate a distinct difference between the ratings assigned by both annotator groups in the two setups, showing that user feedback does influence system evaluation. Crowdworkers are more susceptible to user feedback on usefulness and interestingness, whereas LLMs are more susceptible on interestingness and relevance. User feedback leads to a more personalized assessment of usefulness by crowdworkers, aligning closely with the user's explicit feedback. Additionally, in cases of ambiguous or complex user requests, user feedback improves agreement among crowdworkers. These findings emphasize the significance of user feedback in refining system evaluations and suggest the potential for automated feedback integration in future research. We publicly release the annotated data to foster research in this area.
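
The abstract reports that user feedback improves agreement among crowdworkers but does not say which agreement statistic was used. Fleiss' $\kappa$ is one common choice when several raters assign categorical labels. The sketch below computes it from a per-item count matrix; the example ratings are invented purely for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from per-item category counts.

    counts[i][j] = number of raters who put item i in category j;
    every item must have the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")
    # Observed agreement per item, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)

# Invented ratings: 3 crowdworkers label 4 responses as relevant / not relevant.
ratings = [[3, 0], [2, 1], [0, 3], [3, 0]]
print(round(fleiss_kappa(ratings), 3))  # 0.625
```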

INTERACT · MoDELS · 3D · Signed distance · Seven

April 18, 2024

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani

From arXiv; accepted to CVPR 2024. Project page: //judyye.github.io/ghop-www

We propose G-HOP, a denoising-diffusion-based generative prior for hand-object interactions that models both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field, obtaining a representation aligned with the (latent) signed distance field of the object. We show that this hand-object prior can then serve as generic guidance for other tasks such as reconstruction from interaction clips and human grasp synthesis. We believe that our model, trained by aggregating seven diverse real-world interaction datasets spanning 155 categories, is a first approach that allows jointly generating both hand and object. Our empirical evaluations demonstrate the benefit of this joint prior in video-based reconstruction and human grasp synthesis, where it outperforms current task-specific baselines. Project website: //judyye.github.io/ghop-www
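
A skeletal distance field assigns each 3D query point its distance to the hand skeleton, viewed as line segments ("bones") between joints. Sampled on a grid, this places the hand on the same footing as an object's signed distance field. The sketch below computes such a field for a toy two-bone skeleton. The joint layout is invented, and the abstract does not specify G-HOP's actual representation, for instance whether it keeps one channel per bone.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from 3D point p to the segment from a to b."""
    ab = [b[k] - a[k] for k in range(3)]
    ap = [p[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0
    if denom > 0:
        t = max(0.0, min(1.0, sum(ap[k] * ab[k] for k in range(3)) / denom))
    closest = [a[k] + t * ab[k] for k in range(3)]
    return math.dist(p, closest)

def skeletal_distance(p, joints, bones):
    """Distance from p to the nearest bone; bones are (joint_i, joint_j) pairs."""
    return min(point_segment_distance(p, joints[i], joints[j]) for i, j in bones)

# Toy 'finger': three joints along the x-axis joined by two bones.
joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bones = [(0, 1), (1, 2)]
print(skeletal_distance((0.5, 1.0, 0.0), joints, bones))  # 1.0
print(skeletal_distance((3.0, 0.0, 0.0), joints, bones))  # 1.0
```

Evaluating `skeletal_distance` at every voxel center of a grid would yield a volume that can be stacked with an object SDF on the same grid.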

Regularization term · Taylor · Nonconvex · MoDELS · Tractable

April 18, 2024

Global Convergence of High-Order Regularization Methods with Sums-of-Squares Taylor Models

Wenqi Zhu, Coralia Cartis

High-order tensor methods that employ Taylor-based local models (of degree $p\ge 3$) within adaptive regularization frameworks have been recently proposed for both convex and nonconvex optimization problems. They have been shown to have superior, and even optimal, worst-case global convergence rates and local rates compared to Newton's method. Finding rigorous and efficient techniques for minimizing the Taylor polynomial sub-problems remains a challenging aspect of these algorithms. Ahmadi et al. recently introduced a tensor method based on sum-of-squares (SoS) reformulations, so that each Taylor polynomial sub-problem in their approach can be tractably minimized using semidefinite programming (SDP); however, the global convergence and complexity of their method have not been addressed for general nonconvex problems. This paper introduces an algorithmic framework that combines the SoS Taylor model with adaptive regularization techniques for nonconvex smooth optimization problems. Each iteration minimizes an SoS Taylor model, offering a polynomial cost per iteration. For general nonconvex functions, the worst-case evaluation complexity bound is $\mathcal{O}(\epsilon^{-2})$, while for strongly convex functions, an improved evaluation complexity bound of $\mathcal{O}(\epsilon^{-\frac{1}{p}})$ is established. To the best of our knowledge, this is the first global rate analysis for an adaptive regularization algorithm with a tractable high-order sub-problem in nonconvex smooth optimization, opening the way for further improvements.
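
The adaptive-regularization loop that the abstract builds on can be illustrated with a one-dimensional sketch for $p = 3$. Each step minimizes the regularized Taylor model $m(s) = f + f's + f''s^2/2 + f'''s^3/6 + \sigma s^4/4$. The step is accepted or rejected by the ratio $\rho$ of actual to predicted decrease, and $\sigma$ is raised or lowered accordingly. In one variable the quartic model can be minimized exactly by checking the real roots of $m'$, so no SDP is needed. This is a swapped-in simplification, not the authors' SoS-based method, and the thresholds $\eta_1$, $\eta_2$ and factor $\gamma$ are assumed values.

```python
import numpy as np

def ar3_minimize(f, df, d2f, d3f, x0, sigma=1.0, tol=1e-9, max_iter=200,
                 eta1=0.1, eta2=0.9, gamma=2.0, sigma_min=1e-8):
    """1-D adaptive regularization with a third-order Taylor model (sketch).

    Stops when |f'(x)| <= tol or after max_iter iterations; returns x.
    """
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) <= tol:
            break
        h, t = d2f(x), d3f(x)

        def model(s):
            # m(s) - f(x): Taylor terms up to order 3 plus the quartic regularizer.
            return g * s + h * s**2 / 2 + t * s**3 / 6 + sigma * s**4 / 4

        # m'(s) = sigma*s^3 + (t/2)*s^2 + h*s + g. The global minimizer of the
        # quartic model is a real root of m', so the best real part among all
        # roots attains the global minimum.
        roots = np.roots([sigma, t / 2, h, g])
        s = min(roots.real, key=model)
        predicted = -model(s)  # positive whenever g != 0
        if predicted <= 0:
            break
        rho = (f(x) - f(x + s)) / predicted
        if rho >= eta1:        # successful: accept the step
            x = x + s
        if rho >= eta2:        # very successful: relax the regularization
            sigma = max(sigma / gamma, sigma_min)
        elif rho < eta1:       # unsuccessful: regularize more strongly
            sigma *= gamma
    return x


def f(x):
    return (x**2 - 1) ** 2     # nonconvex double well with minimizers x = +1, -1

x_star = ar3_minimize(f, lambda x: 4 * x * (x**2 - 1),
                      lambda x: 12 * x**2 - 4, lambda x: 24 * x, x0=0.5)
print(x_star)  # approximately 1.0 or -1.0
```

Because accepted steps only decrease $f$ and $f(0.5) < f(0)$, the iterates cannot settle at the local maximum $x = 0$. They end at one of the two wells. Rejected steps, where the model's global minimizer jumps far away, simply increase $\sigma$.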