
This paper marries two state-of-the-art controller synthesis methods for partially observable Markov decision processes (POMDPs), a prominent model in sequential decision making under uncertainty. A central issue is to find a POMDP controller that decides based solely on the observations seen so far and achieves a total expected reward objective. As finding optimal controllers is undecidable, we concentrate on synthesizing good finite-state controllers (FSCs). We do so by tightly integrating two modern, orthogonal methods for POMDP controller synthesis: a belief-based and an inductive approach. The former obtains an FSC from a finite fragment of the so-called belief MDP, an MDP that keeps track of the probabilities of equally observable POMDP states. The latter is an inductive search technique over a set of FSCs, e.g., controllers with a fixed memory size. The key result of this paper is a symbiotic anytime algorithm that tightly integrates both approaches so that each profits from the controllers constructed by the other. Experimental results indicate a substantial improvement in the value of the controllers while significantly reducing the synthesis time and memory footprint.
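
As a point of reference for the belief-based method, the belief MDP tracks a posterior over equally observable states via the standard Bayesian update. The following is a minimal NumPy sketch of that update, with tensor shapes chosen for illustration rather than taken from the paper's implementation.

    import numpy as np

    def belief_update(belief, action, observation, T, O):
        """One Bayesian belief-tracking step in a POMDP.

        belief: (S,) probability vector over states
        T: (A, S, S) transition probabilities T[a, s, s']
        O: (A, S, Obs) observation probabilities O[a, s', o]
        """
        predicted = belief @ T[action]               # sum_s b(s) * T(s' | s, a)
        unnormalized = predicted * O[action][:, observation]
        norm = unnormalized.sum()
        if norm == 0.0:
            raise ValueError("observation has zero probability under this belief")
        return unnormalized / norm

A finite fragment of the MDP over such belief vectors is what the belief-based method explores to extract an FSC.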

Related Content

This paper provides an introductory overview of how one may employ importance sampling effectively as a tool for solving stochastic optimization formulations incorporating tail risk measures such as Conditional Value-at-Risk (CVaR). Approximating the tail risk measure by its sample average approximation, while appealing due to its simplicity and universal applicability, requires a large number of samples to arrive at risk-minimizing decisions with high confidence. This is primarily due to the rarity with which the relevant tail events are observed in the samples. In simulation, importance sampling is among the most prominent methods for substantially reducing the sample requirement when estimating probabilities of rare events. Can importance sampling be used for optimization as well? If so, what are the ingredients required for making importance sampling an effective tool for optimization formulations involving rare events? This tutorial focuses on the two key ingredients in this regard, namely, (i) how one may arrive at an importance sampling change of measure prescription at every decision, and (ii) the prominent techniques available for integrating such a prescription within a solution paradigm for stochastic optimization formulations.
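
To make ingredient (i) concrete, here is a minimal sketch of the Rockafellar-Uryasev sample-average estimator of CVaR with importance-sampling weights. The Gaussian proposal, the identity loss, and the grid search over the auxiliary variable t are illustrative assumptions, not the tutorial's prescription.

    import numpy as np

    def cvar_is_estimate(losses, weights, t, alpha):
        """Estimate t + E_q[w * max(loss - t, 0)] / (1 - alpha),
        where weights w_i = p(X_i) / q(X_i) are likelihood ratios."""
        excess = np.maximum(losses - t, 0.0)
        return t + np.mean(weights * excess) / (1.0 - alpha)

    rng = np.random.default_rng(0)
    alpha, shift = 0.99, 3.0
    x = rng.normal(loc=shift, size=100_000)      # proposal q = N(shift, 1)
    w = np.exp(0.5 * shift**2 - shift * x)       # density ratio N(0,1) / N(shift,1)
    losses = x                                   # identity loss, for illustration
    ts = np.linspace(0.0, 5.0, 201)
    t_star = min(ts, key=lambda t: cvar_is_estimate(losses, w, t, alpha))
    print("CVaR_0.99 estimate:", cvar_is_estimate(losses, w, t_star, alpha))
    # Minimizing over t recovers CVaR (Rockafellar-Uryasev); tail samples are
    # plentiful under q, so the estimate is far less noisy than naive sampling.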

In some applications of reinforcement learning, a dataset of pre-collected experience is already available, but it is also possible to acquire additional online data to improve the quality of the policy. However, it may be preferable to gather the additional data with a single, non-reactive exploration policy and avoid the engineering costs associated with switching policies. In this paper, we propose an algorithm with provable guarantees that can leverage an offline dataset to design a single non-reactive policy for exploration. We theoretically analyze the algorithm and measure the quality of the final policy as a function of the local coverage of the original dataset and the amount of additional data collected.

In e-commerce search, personalized retrieval is a crucial technique for improving the user shopping experience. Recent works in this domain have achieved significant improvements via the representation learning paradigm, e.g., embedding-based retrieval (EBR) and collaborative filtering (CF). EBR methods do not sufficiently exploit the useful collaborative signal and struggle to learn good representations of long-tail items. Graph-based CF methods improve personalization by modeling the collaborative signal within the user click graph. However, existing graph-based methods ignore users' multiple behaviours, such as click and purchase, and the relevance constraint between user behaviours and items. In this paper, we propose a Graph Contrastive Learning with Multi-Objective (GCL-MO) collaborative filtering model, which addresses the problems of weak relevance and incomplete personalization in e-commerce search. Specifically, GCL-MO builds a homogeneous graph of items and then optimizes a multi-objective function of personalization and relevance. Moreover, we propose a modified contrastive loss for multi-objective graph learning, which avoids mutual suppression among positive samples and thus improves the generalization and robustness of long-tail item representations. The learned item embeddings are then used for personalized retrieval by constructing an efficient offline-to-online inverted table. GCL-MO outperforms the online collaborative filtering baseline on both offline and online experimental metrics and shows a significant improvement in online A/B testing of Taobao search.
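
The abstract does not spell out the modified loss, but the "mutual suppression" issue it targets can be illustrated with a multi-positive InfoNCE variant in which positives are excluded from each other's denominators. The PyTorch sketch below is a speculative illustration of that idea, not GCL-MO's exact objective.

    import torch
    import torch.nn.functional as F

    def multi_positive_loss(anchor, positives, negatives, tau=0.1):
        """anchor: (d,), positives: (P, d), negatives: (N, d)."""
        anchor = F.normalize(anchor, dim=0)
        positives = F.normalize(positives, dim=1)
        negatives = F.normalize(negatives, dim=1)
        pos_sim = positives @ anchor / tau           # (P,) similarities
        neg_sim = negatives @ anchor / tau           # (N,) similarities
        # Each positive competes only against the negatives, never against
        # the other positives, so positives do not suppress one another.
        denom = torch.logsumexp(
            torch.cat([pos_sim.unsqueeze(1),
                       neg_sim.unsqueeze(0).expand(pos_sim.size(0), -1)], dim=1),
            dim=1)                                   # (P,) per-positive denominators
        return (denom - pos_sim).mean()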

Community detection is a classic problem in network science with extensive applications in various fields. Among numerous approaches, the most common method is modularity maximization. Despite being designed for this objective, and despite their wide adoption, heuristic modularity maximization algorithms rarely return an optimal partition or anything close to one. We propose a specialized algorithm, Bayan, which returns partitions with a guarantee of either optimality or proximity to an optimal partition. At the core of the Bayan algorithm is a branch-and-cut scheme that solves an integer programming formulation of the modularity maximization problem to optimality or approximates it within a specified factor. We compare Bayan against 30 alternative community detection methods using structurally diverse synthetic and real networks. Our results demonstrate Bayan's distinctive accuracy and stability in retrieving ground-truth communities of standard benchmark graphs. Bayan is several times faster than open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Overall, our assessments point to Bayan as a suitable choice for exact maximization of modularity in real networks with up to 3000 edges (in their largest connected component) and for approximating maximum modularity in larger instances on ordinary computers. A Python implementation of the Bayan algorithm (the bayanpy library) is publicly available through the package installer for Python (pip).
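
For reference, the objective that Bayan optimizes exactly is the standard modularity of a partition, stated here in its usual generalized form with resolution parameter $\gamma$ (the paper's precise formulation may fix $\gamma = 1$):

    $$ Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) $$

where $A$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, and $\delta(c_i, c_j) = 1$ exactly when nodes $i$ and $j$ are assigned to the same community.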

We consider the problem of maximizing the gains from trade (GFT) in two-sided markets. The seminal impossibility result by Myerson and Satterthwaite shows that even for bilateral trade, there is no individually rational (IR), Bayesian incentive compatible (BIC), and budget balanced (BB) mechanism that achieves the full GFT. Moreover, the optimal BIC, IR, and BB mechanism that maximizes the GFT is known to be complex and to depend heavily on the prior. In this paper, we pursue a Bulow-Klemperer-style question: does augmenting the market allow prior-independent mechanisms to beat the optimal mechanism? Our main result shows that in the double auction setting with $m$ i.i.d. buyers and $n$ i.i.d. sellers, by adding $O(1)$ buyers and sellers to the market, the GFT of a simple, dominant strategy incentive compatible (DSIC), prior-independent mechanism in the augmented market is at least the optimal GFT in the original market, when the buyers' distribution first-order stochastically dominates the sellers' distribution. Furthermore, we consider general distributions without the stochastic dominance assumption. An existing hardness result by Babaioff et al. shows that no fixed finite number of additional agents suffices for all distributions. We instead provide a parameterized result, showing that $O(\log(m/rn)/r)$ additional agents suffice, where $r$ is the probability that the buyer's value for the item exceeds the seller's value.
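
For concreteness, in the bilateral-trade special case with buyer value $b \sim F_B$ and seller value $s \sim F_S$, a mechanism that trades on event $T$ obtains GFT $\mathbb{E}[(b - s) \cdot \mathbf{1}[T]]$, and the first-best benchmark that the impossibility result concerns is

    $$ \mathrm{GFT}^* = \mathbb{E}_{b \sim F_B,\, s \sim F_S}\left[ \max(b - s, 0) \right], $$

i.e., trade occurs whenever the buyer values the item more than the seller. These are the standard definitions, stated here only for orientation.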

The development of new manufacturing techniques such as 3D printing has enabled the creation of previously infeasible chemical reactor designs. Systematically optimizing the highly parameterized geometries involved in these new classes of reactors is vital to ensure enhanced mixing characteristics and feasible manufacturability. Here we present a framework to rapidly solve this nonlinear, computationally expensive, and derivative-free problem, enabling fast prototyping of novel reactor parameterizations. We take advantage of Gaussian processes to adaptively learn a multi-fidelity model of reactor simulations across a number of different continuous mesh fidelities. The search space of reactor geometries is explored through an amalgam of simulations at different, potentially lower, fidelities, which are chosen for evaluation based on a weighted acquisition function that trades off information gain against simulation cost. Within our framework, we derive a novel criterion for monitoring the progress and dictating the termination of multi-fidelity Bayesian optimization, ensuring that a high-fidelity solution is returned before the experimental budget is exhausted. The class of reactors we investigate is helical-tube reactors under pulsed-flow conditions, which have demonstrated outstanding mixing characteristics, have the potential to be highly parameterized, and are easily manufactured using 3D printing. We 3D print the optimal reactor geometry and experimentally confirm its mixing performance. In doing so, we demonstrate that our design framework extends to a broad variety of expensive simulation-based optimization problems, supporting the design of the next generation of highly parameterized chemical reactors.
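
As an illustration of the acquisition step, a common way to trade off information gain against simulation cost is to scale expected improvement by an information-to-cost ratio. The sketch below is in that spirit and is an assumption-laden stand-in, not the paper's exact acquisition or termination criterion.

    import numpy as np
    from scipy.stats import norm

    def cost_weighted_ei(mu, sigma, best, cost, info_weight=1.0):
        """Expected improvement per unit simulation cost.

        mu, sigma: GP posterior mean/std at a (geometry, mesh-fidelity) pair
        best: incumbent objective value at the highest fidelity
        cost: cost of running a simulation at this mesh fidelity
        info_weight: how informative this fidelity is about the highest
                     fidelity (e.g. a GP correlation; assumed given here)
        """
        sigma = np.maximum(sigma, 1e-12)          # guard against zero variance
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
        return info_weight * ei / cost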

The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., to "learn" to perform a task from a few demonstrations in the context without any parameter updates. However, their in-context learning capabilities are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges with a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces full attention with a structured attention mechanism designed for in-context learning; it removes unnecessary dependencies between individual demonstrations and makes the model invariant to their permutation. We evaluate SAICL in a meta-training framework and show that SAICL achieves comparable or better performance than full attention while obtaining up to a 3.4x inference speed-up. SAICL also consistently outperforms a strong Fusion-in-Decoder (FiD) baseline that processes each demonstration independently. Finally, thanks to its linear nature, we demonstrate that SAICL can easily scale to hundreds of demonstrations, with continued performance gains as more are added.
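
To make the structural idea concrete: tokens within a demonstration attend only to that demonstration, while query tokens attend everywhere, which removes cross-demonstration dependencies and order sensitivity. The mask below is a speculative PyTorch sketch of such a pattern, not SAICL's actual implementation.

    import torch

    def structured_attention_mask(demo_lengths, query_length):
        """Boolean mask (True = may attend). Demos form diagonal blocks;
        query tokens are global."""
        total = sum(demo_lengths) + query_length
        mask = torch.zeros(total, total, dtype=torch.bool)
        offset = 0
        for n in demo_lengths:
            mask[offset:offset + n, offset:offset + n] = True  # within-demo only
            offset += n
        mask[offset:, :] = True   # query attends to all demos and itself
        return mask

    print(structured_attention_mask([2, 3], 2).int())

Because each demonstration block is processed independently of the others, cost grows linearly with the number of demonstrations, which is consistent with the linear scaling noted above.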

Many food products involve mixtures of ingredients, where the mixtures can be expressed as combinations of ingredient proportions. In many cases, the quality of the product and the consumer's preference for it may also depend on the way in which the mixture is processed. The processing is generally defined by the settings of one or more process variables. Experimental designs studying the joint impact of the mixture ingredient proportions and the settings of the process variables are called mixture-process variable experiments. In this article, we show how to combine mixture-process variable experiments with discrete choice experiments to quantify and model consumer preferences for food products that can be viewed as processed mixtures. First, we describe the modeling of data from such combined experiments. Next, we describe how to generate D- and I-optimal designs for choice experiments involving mixtures and process variables, and we compare the two kinds of designs using two examples.
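
For orientation, data from such combined experiments are typically modeled with a multinomial logit: the probability that a respondent chooses alternative $j$ from choice set $S$ is

    $$ P(j \mid S) = \frac{\exp(\mathbf{x}_j^\top \boldsymbol{\beta})}{\sum_{k \in S} \exp(\mathbf{x}_k^\top \boldsymbol{\beta})}, $$

where $\mathbf{x}_j$ collects the mixture-ingredient proportions (e.g., Scheffé model terms), the process-variable settings, and their interactions. This standard form is stated as background and may differ in detail from the article's model.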

Promoting behavioural diversity is critical for solving games with non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPPs). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and a \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the \emph{gamescape} -- the convex polytope spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test them on tens of games that exhibit strong non-transitivity. The results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
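
A minimal sketch of the DPP flavour of such a metric: measure a population's diversity by the determinant of a Gram (kernel) matrix built from the strategies' payoff vectors, which is large when behaviours span genuinely different directions and vanishes when one strategy duplicates another. The kernel choice below is an illustrative assumption rather than the paper's exact construction.

    import numpy as np

    def dpp_diversity(payoff_rows):
        """payoff_rows: (k, n); row i holds strategy i's payoffs vs n opponents."""
        L = payoff_rows @ payoff_rows.T      # Gram (kernel) matrix
        return np.linalg.det(L)              # squared volume spanned by the rows

    rps = np.array([[0., -1., 1.],           # Rock
                    [1., 0., -1.]])          # Paper
    print(dpp_diversity(rps))                        # > 0: distinct behaviours
    print(dpp_diversity(np.vstack([rps, rps[:1]])))  # ~0: a duplicate adds nothing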

This work addresses the novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface that contains richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach that leverages the depth map as weak supervision during training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our method produces accurate and reasonable 3D hand meshes and achieves superior 3D hand pose estimation accuracy compared with state-of-the-art methods.
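
As background for the architecture, one spectral-style graph convolution over mesh vertices propagates features through the degree-normalized adjacency, X' = D^{-1/2} A D^{-1/2} X W. The PyTorch sketch below shows this generic layer; the shapes and normalization are common choices, not necessarily the paper's exact design.

    import torch

    def graph_conv(x, adj, weight):
        """x: (V, F_in) vertex features; adj: (V, V) mesh adjacency with
        self-loops; weight: (F_in, F_out) learnable matrix."""
        deg = adj.sum(dim=1).clamp(min=1.0)                    # node degrees
        d_inv_sqrt = deg.pow(-0.5)
        norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
        return torch.relu(norm_adj @ x @ weight)               # propagate + transform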
