99欧美日韩精品一区二区红桃-欧美成人性色XXⅩXXA片在线

In settings as diverse as autonomous vehicles, cloud computing, and pandemic quarantines, requests for service can arrive in near or true simultaneity with one another. This creates batches of arrivals to the underlying queueing system. In this paper, we study the staffing problem for the batch arrival queue. We show that batches place a significant stress on services, and thus require a high amount of resources and preparation. In fact, we find that there is no economy of scale as the number of customers in each batch increases, creating a stark contrast with the square root safety staffing rules enjoyed by systems with solitary arrivals of customers. Furthermore, when customers arrive both quickly and in batches, an economy of scale can exist, but it is weaker than what is typically expected. Methodologically, these staffing results follow from novel large batch and hybrid large-batch-and-large-rate limits of the general multi-server queueing model. In the pure large batch limit, we establish the first formal connection between multi-server queues and storage processes, another family of stochastic processes. By consequence, we show that the limit of the batch scaled queue length process is not asymptotically normal, and that, in fact, the fluid and diffusion-type limits coincide. This is what drives our staffing analysis of the batch arrival queue, and what implies that the (safety) staffing of this system must be directly proportional to the batch size just to achieve a non-degenerate probability of customers waiting.

相關內容

縮放

關注 0

估計/估計量 · INFORMS · 有偏 · 可辨認的 · Performer ·

2022 年 2 月 4 日

To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation

Jeroen Berrevoets,Fergus Imrie,Trent Kyono,James Jordon,Mihaela van der Schaar

Missing data is a systemic problem in practical scenarios that causes noise and bias when estimating treatment effects. This makes treatment effect estimation from data with missingness a particularly tricky endeavour. A key reason for this is that standard assumptions on missingness are rendered insufficient due to the presence of an additional variable, treatment, besides the individual and the outcome. Having a treatment variable introduces additional complexity with respect to why some variables are missing that is not fully explored by previous work. In our work we identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection. Given MCM, we show that naively imputing all data leads to poor performing treatment effects models, as the act of imputation effectively removes information necessary to provide unbiased estimates. However, no imputation at all also leads to biased estimates, as missingness determined by treatment divides the population in distinct subpopulations, where estimates across these populations will be biased. Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not. We empirically demonstrate how various learners benefit from selective imputation compared to other solutions for missing data.

Automator · 可約的 · Networking · 資源管理 · Continuity ·

2022 年 2 月 4 日

Predictive Closed-Loop Service Automation in O-RAN based Network Slicing

Joseph Thaliath,Solmaz Niknam,Sukhdeep Singh,Rahul Banerji,Navrati Saxena,Harpreet S. Dhillon,Jeffrey H. Reed,Ali Kashif Bashir,Avinash Bhat,Abhishek Roy

from arxiv, 7 pages, 3 figures, 1 table

Network slicing provides introduces customized and agile network deployment for managing different service types for various verticals under the same infrastructure. To cater to the dynamic service requirements of these verticals and meet the required quality-of-service (QoS) mentioned in the service-level agreement (SLA), network slices need to be isolated through dedicated elements and resources. Additionally, allocated resources to these slices need to be continuously monitored and intelligently managed. This enables immediate detection and correction of any SLA violation to support automated service assurance in a closed-loop fashion. By reducing human intervention, intelligent and closed-loop resource management reduces the cost of offering flexible services. Resource management in a network shared among verticals (potentially administered by different providers), would be further facilitated through open and standardized interfaces. Open radio access network (O-RAN) is perhaps the most promising RAN architecture that inherits all the aforementioned features, namely intelligence, open and standard interfaces, and closed control loop. Inspired by this, in this article we provide a closed-loop and intelligent resource provisioning scheme for O-RAN slicing to prevent SLA violations. In order to maintain realism, a real-world dataset of a large operator is used to train a learning solution for optimizing resource utilization in the proposed closed-loop service automation process. Moreover, the deployment architecture and the corresponding flow that are cognizant of the O-RAN requirements are also discussed.

Softmax · 正則化項 · CASES · CASE · 平穩的 ·

2022 年 2 月 2 日

On the Effect of Log-Barrier Regularization in Decentralized Softmax Gradient Play in Multiagent Systems

Runyu Zhang,Jincheng Mei,Bo Dai,Dale Schuurmans,Na Li

Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly since projection is not needed for each gradient update. However, in multi-agent systems, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even for a stochastic game with identical interest, there can be multiple Nash Equilibria (NEs), which disables proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making NE-seeking difficult for gradient-based algorithms. In this paper, we study the finite time convergence of decentralized softmax gradient play in a special form of game, Markov Potential Games (MPGs), which includes the identical interest game as a special case. We investigate both gradient play and natural gradient play, with and without $\log$-barrier regularization. Establishing convergence for the unregularized cases relies on an assumption that the stationary policies are isolated, and yields convergence bounds that contain a trajectory dependent constant that can be arbitrarily large. We introduce the $\log$-barrier regularization to overcome these drawbacks, with the cost of slightly worse dependence on other factors such as the action set size. An empirical study on an identical interest matrix game confirms the theoretical findings.

Hacking · 多代理人模型 ·

2022 年 1 月 25 日

Hacking the Colony: On the Disruptive Effect of Misleading Pheromone and How to Defend Against It

Ashay Aswale,Antonio López,Aukkawut Ammartayakun,Carlo Pinciroli

from arxiv, 8 pages, accepted at AAMAS2022

Ants have evolved to seek and retrieve food by leaving trails of pheromones. This mechanism has inspired several approaches to decentralized multi-robot coordination. However, in this paper, we show that pheromone trails are a fragile mechanism for coordination, and can be sabotaged to starve the colony. We introduce detractors: malicious agents that leave a misleading, but indistinguishable, trail of food pheromone to distract and trap cooperator ants in the nest. We analyze the effectiveness of detractors with respect to parameters such as evaporation rate of misleading pheromone and fraction of detractors in the colony. In addition, we propose a countermeasure to this attack by introducing a new type of pheromone: the cautionary pheromone. Cooperator ants secrete this type of pheromone atop existing food trails as a warning. When the cautionary pheromone intensity exceeds the food pheromone intensity, cooperator ants ignore overlapping food pheromone. We show that, despite its simplicity, this defense mechanism can limit, but not nullify, the effect of detractors. Ultimately, our work shows that pheromone-based coordination, while effective, is also fragile.

強化學習 · MoDELS · 學成 · contrastive · 最優化 ·

2021 年 12 月 8 日

Recent Advances in Reinforcement Learning in Finance

Ben Hambly,Renyuan Xu,Huining Yang

from arxiv, 60 pages, 1 figure

The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily reply on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value and policy based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.

MoDELS · 評論員 · 縮放 · 可理解性 · GPT-3 ·

2021 年 8 月 18 日

On the Opportunities and Risks of Foundation Models

Rishi Bommasani,Drew A. Hudson,Ehsan Adeli,Russ Altman,Simran Arora,Sydney von Arx,Michael S. Bernstein,Jeannette Bohg,Antoine Bosselut,Emma Brunskill,Erik Brynjolfsson,Shyamal Buch,Dallas Card,Rodrigo Castellon,Niladri Chatterji,Annie Chen,Kathleen Creel,Jared Quincy Davis,Dora Demszky,Chris Donahue,Moussa Doumbouya,Esin Durmus,Stefano Ermon,John Etchemendy,Kawin Ethayarajh,Li Fei-Fei,Chelsea Finn,Trevor Gale,Lauren Gillespie,Karan Goel,Noah Goodman,Shelby Grossman,Neel Guha,Tatsunori Hashimoto,Peter Henderson,John Hewitt,Daniel E. Ho,Jenny Hong,Kyle Hsu,Jing Huang,Thomas Icard,Saahil Jain,Dan Jurafsky,Pratyusha Kalluri,Siddharth Karamcheti,Geoff Keeling,Fereshte Khani,Omar Khattab,Pang Wei Kohd,Mark Krass,Ranjay Krishna,Rohith Kuditipudi,Ananya Kumar,Faisal Ladhak,Mina Lee,Tony Lee,Jure Leskovec,Isabelle Levent,Xiang Lisa Li,Xuechen Li,Tengyu Ma,Ali Malik,Christopher D. Manning,Suvir Mirchandani,Eric Mitchell,Zanele Munyikwa,Suraj Nair,Avanika Narayan,Deepak Narayanan,Ben Newman,Allen Nie,Juan Carlos Niebles,Hamed Nilforoshan,Julian Nyarko,Giray Ogut,Laurel Orr,Isabel Papadimitriou,Joon Sung Park,Chris Piech,Eva Portelance,Christopher Potts,Aditi Raghunathan,Rob Reich,Hongyu Ren,Frieda Rong,Yusuf Roohani,Camilo Ruiz,Jack Ryan,Christopher Ré,Dorsa Sadigh,Shiori Sagawa,Keshav Santhanam,Andy Shih,Krishnan Srinivasan,Alex Tamkin,Rohan Taori,Armin W. Thomas,Florian Tramèr,Rose E. Wang,William Wang,Bohan Wu,Jiajun Wu,Yuhuai Wu,Sang Michael Xie,Michihiro Yasunaga,Jiaxuan You,Matei Zaharia,Michael Zhang,Tianyi Zhang,Xikun Zhang,Yuhui Zhang,Lucia Zheng,Kaitlyn Zhou,Percy Liang

from arxiv, Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

隨機性策略 · 強化學習 · CC · Extensibility · 學成 ·

2019 年 6 月 20 日

Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning

Liang Tong,Aron Laszka,Chao Yan,Ning Zhang,Yevgeniy Vorobeychik

from arxiv, v1.0

Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activity. While numerous methods for reducing the scope of this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and attacker in a game theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies.

可約的 · 深度強化學習 · 學成 · 優化器 · 強化學習 ·

2018 年 12 月 15 日

On Improving Decentralized Hysteretic Deep Reinforcement Learning

Xueguang Lu,Christopher Amato

Recent successes of value-based multi-agent deep reinforcement learning employ optimism in value function by carefully controlling learning rate(Omidshafiei et al., 2017) or reducing update prob-ability (Palmer et al., 2018). We introduce a de-centralized quantile estimator: Responsible Implicit Quantile Network (RIQN), while robust to teammate-environment interactions, able to reduce the amount of imposed optimism. Upon benchmarking against related Hysteretic-DQN(HDQN) and Lenient-DQN (LDQN), we findRIQN agents more stable, sample efficient and more likely to converge to the optimal policy.

重要性采樣 · 樣本空間 · 方差減小 · 樣本 · 蒙特卡羅 ·

2018 年 8 月 23 日

Learning to Importance Sample in Primary Sample Space

Quan Zheng,Matthias Zwicker

from arxiv, Submitted to SIGGRAPH ASIA'18

Importance sampling is one of the most widely used variance reduction strategies in Monte Carlo rendering. In this paper, we propose a novel importance sampling technique that uses a neural network to learn how to sample from a desired density represented by a set of samples. Our approach considers an existing Monte Carlo rendering algorithm as a black box. During a scene-dependent training phase, we learn to generate samples with a desired density in the primary sample space of the rendering algorithm using maximum likelihood estimation. We leverage a recent neural network architecture that was designed to represent real-valued non-volume preserving ('Real NVP') transformations in high dimensional spaces. We use Real NVP to non-linearly warp primary sample space and obtain desired densities. In addition, Real NVP efficiently computes the determinant of the Jacobian of the warp, which is required to implement the change of integration variables implied by the warp. A main advantage of our approach is that it is agnostic of underlying light transport effects, and can be combined with many existing rendering techniques by treating them as a black box. We show that our approach leads to effective variance reduction in several practical scenarios.

過擬合 · 泛化理論 · 深度強化學習 · 學成 · PARCO ·

2018 年 4 月 20 日

A Study on Overfitting in Deep Reinforcement Learning

Chiyuan Zhang,Oriol Vinyals,Remi Munos,Samy Bengio

Recent years have witnessed significant progresses in deep Reinforcement Learning (RL). Empowered with large scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen "robustly": commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. The observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.