
We introduce a method called MASCOT (Multi-Agent Shape Control with Optimal Transport) to compute optimal control solutions for agents with shape/formation/density constraints. For example, we may want to apply shape constraints on the agents -- perhaps we want the agents to hold a particular shape along the path, or to spread out in order to minimize collisions. We might also want a proportion of agents to move to one destination while the other agents move to another, and to do this in the optimal way, i.e. the source-destination assignments should be optimal. To achieve this, we utilize the Earth Mover's Distance from Optimal Transport to distribute the agents into their proper positions so that the desired shapes can be satisfied. This cost is introduced both in the terminal cost and in the running cost of the optimal control problem.
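As a minimal illustration of the Earth Mover's Distance ingredient (a sketch under our own assumptions, not the authors' implementation): for equal numbers of agents and target-shape samples with uniform weights, the EMD reduces to an optimal assignment problem, which `scipy` can solve exactly. The function name and circle shape below are hypothetical.

```python
# Minimal sketch (not the MASCOT code): an EMD-style terminal cost between
# N agents and N samples of a target shape. With uniform weights, the EMD
# reduces to an optimal assignment, solved exactly by the Hungarian method.
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_terminal_cost(agents, targets):
    """agents, targets: (N, 2) position arrays; returns cost and assignment."""
    diff = agents[:, None, :] - targets[None, :, :]
    cost = (diff ** 2).sum(axis=-1)              # pairwise squared distances
    rows, cols = linear_sum_assignment(cost)     # optimal source-destination map
    return cost[rows, cols].mean(), cols

agents = np.random.rand(64, 2)                   # current agent positions
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
targets = 0.5 + 0.25 * np.c_[np.cos(theta), np.sin(theta)]  # target: a circle
value, assignment = emd_terminal_cost(agents, targets)
```

The same quantity, evaluated along the trajectory rather than only at the final time, plays the role of the running-cost term described above.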

Related Content

Wildlife camera trap images are being used extensively to investigate animal abundance, habitat associations, and behavior, a task complicated by the fact that experts must first classify the images manually. Artificial intelligence systems can take over this task but usually need a large number of already-labeled training images to achieve sufficient performance. This requirement necessitates human expert labor and poses a particular challenge for projects with few cameras or short durations. We propose a label-efficient learning strategy that enables researchers with small or medium-sized image databases to leverage the potential of modern machine learning, thus freeing crucial resources for subsequent analyses. Our methodological proposal is two-fold: (1) We improve current strategies of combining object detection and image classification by tuning the hyperparameters of both models. (2) We provide an active learning (AL) system that allows training deep learning models very efficiently in terms of required human-labeled training images. We supply a software package that enables researchers to use these methods directly and thereby ensure the broad applicability of the proposed framework in ecological practice. We show that our tuning strategy improves predictive performance. We demonstrate how the AL pipeline reduces the amount of pre-labeled data needed to achieve a specific predictive performance and that it is especially valuable for improving out-of-sample predictive performance. We conclude that the combination of tuning and AL increases predictive performance substantially. Furthermore, we argue that our work can broadly impact the community through the ready-to-use software package provided. Finally, the publication of our models, tailored to European wildlife data, enriches existing model bases mostly trained on data from Africa and North America.
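To make the active-learning idea concrete, here is a minimal uncertainty-sampling loop on synthetic data, with a scikit-learn classifier standing in for the deep model; it sketches the general AL recipe, not the released software package.

```python
# Sketch of uncertainty-sampling active learning (the general recipe, not the
# released package): repeatedly train, score the unlabeled pool, and send the
# most uncertain images to the human annotator.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))                 # stand-in image features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # stand-in labels
labeled, pool = list(range(50)), list(range(50, 2000))

for _ in range(5):                              # five annotation rounds
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    uncertainty = 1.0 - clf.predict_proba(X[pool]).max(axis=1)
    query = np.argsort(uncertainty)[-100:]      # least-confident examples
    labeled += [pool[i] for i in query]         # "annotator" supplies labels
    pool = [p for i, p in enumerate(pool) if i not in set(query)]
```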

We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic probit model with the consumers' preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as a function of time, the temporal variability in the model parameters, and the strength of the auto-correlation network structure spanning the customer segments. Our regret analysis not only demonstrates asymptotic optimality of the proposed policy but also shows that for policy planning it is essential to incorporate available structural information, as policies based on unshrunken models are highly sub-optimal in this set-up.
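A minimal sketch of what one PSGD update could look like for a probit demand model with SAR-style shrinkage follows; the likelihood, the penalty form, and the weight matrix `W` are our own assumptions, not the paper's specification.

```python
# Hedged sketch of one penalized-SGD step for a probit purchase model with
# SAR-style shrinkage across segments; the penalty form and the weight
# matrix W are illustrative assumptions, not the paper's specification.
import numpy as np
from scipy.stats import norm

def psgd_step(theta, W, price, buy, lr=0.05, lam=0.1, rho=0.5):
    """theta: (J, 2) per-segment (intercept, price sensitivity);
    price, buy: (J,) posted prices and 0/1 purchase outcomes."""
    z = theta[:, 0] - theta[:, 1] * price            # probit index per segment
    p = norm.cdf(z)
    w = norm.pdf(z) * (buy - p) / np.clip(p * (1 - p), 1e-8, None)
    grad = np.stack([w, -w * price], axis=1)         # probit log-likelihood gradient
    shrink = theta - rho * (W @ theta)               # SAR residual (I - rho W) theta
    return theta + lr * (grad - lam * shrink)        # ascent step with shrinkage
```

Posting the revenue-maximizing price under the current estimate and then updating with a step of this kind as outcomes stream in gives the flavor of the policy whose regret is analyzed above.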

We consider identification and inference for the average treatment effect and the heterogeneous treatment effect conditional on observable covariates in the presence of unmeasured confounding. Since point identification of these treatment effects is not achievable without strong assumptions, we obtain bounds on them by leveraging differential effects, a tool that allows a second treatment to be used to learn the effect of the first. The differential effect is the effect of using one treatment in lieu of the other. We provide conditions under which differential treatment effects can be used to point identify or partially identify treatment effects. Under these conditions, we develop a flexible and easy-to-implement semi-parametric framework to estimate the bounds and establish their asymptotic properties over the covariate support for conducting statistical inference. The proposed method is examined through a simulation study and two case studies that investigate the effect of smoking on blood levels of lead and cadmium using the National Health and Nutrition Examination Survey, and the effect of soft drink consumption on the occurrence of physical fights among teenagers using the Youth Risk Behavior Surveillance System.
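For concreteness, one plausible formalization of the differential effect (our reading of the definition above, with two binary treatments and potential outcomes $Y(z_1, z_2)$; the paper's exact notation may differ) is
\[
\delta = \mathbb{E}\left[\, Y(1,0) - Y(0,1) \,\right],
\]
the mean effect of receiving the first treatment in lieu of the second. Bounds on the ordinary effect $\mathbb{E}[Y(1,z_2) - Y(0,z_2)]$ then follow from conditions that tie it to $\delta$.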

We derive a formula for optimal hard thresholding of the singular value decomposition in the presence of correlated additive noise; although it nominally involves unobservables, we show how to apply it even when the noise covariance structure is not known a priori or is not independently estimable. The proposed method, which we call ScreeNOT, is a mathematically solid alternative to Cattell's ever-popular but vague Scree Plot heuristic from 1966. ScreeNOT has a surprising oracle property: it typically achieves exactly, in large finite samples, the lowest possible MSE for matrix recovery on each given problem instance, i.e., the specific threshold it selects gives exactly the smallest achievable MSE loss among all possible threshold choices for that noisy dataset and that unknown underlying true low-rank model. The method is computationally efficient and robust against perturbations of the underlying covariance structure. Our results depend on the assumption that the singular values of the noise have a limiting empirical distribution of compact support; this model, which is standard in random matrix theory, is satisfied by many models exhibiting either cross-row or cross-column correlation structure, as well as by many situations with inter-element correlation structure. Simulations demonstrate the effectiveness of the method even at moderate matrix sizes. The paper is supplemented by ready-to-use software packages implementing the proposed algorithm: package ScreeNOT in Python (via PyPI) and R (via CRAN).
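The operation being optimized is easy to state in code. Below is a generic hard-thresholding sketch with a placeholder cutoff `tau`; ScreeNOT's contribution is the data-driven optimal choice of that cutoff, implemented in the packages cited above.

```python
# Generic hard singular-value thresholding; `tau` is a placeholder here, and
# ScreeNOT's optimal, data-driven choice of tau is what the packages provide.
import numpy as np

def hard_threshold_svd(Y, tau):
    """Reconstruct Y keeping only singular values above the cutoff tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s > tau                                   # hard thresholding rule
    return (U[:, keep] * s[keep]) @ Vt[keep], int(keep.sum())

rng = np.random.default_rng(1)
signal = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 100))  # rank-3 truth
Y = signal + 0.5 * rng.normal(size=(200, 100))       # plus additive noise
X_hat, rank = hard_threshold_svd(Y, tau=15.0)        # tau from ScreeNOT in practice
```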

Intelligent manufacturing is becoming increasingly important due to the growing demand for maximizing productivity and flexibility while minimizing waste and lead times. This work investigates automated secondary robotic food packaging solutions that transfer food products from the conveyor belt into containers. A major problem in these solutions is varying product supply, which can cause drastic productivity drops. Conventional rule-based approaches, used to address this issue, are often inadequate, leading to violation of the industry's requirements. Reinforcement learning, on the other hand, has the potential to solve this problem by learning a responsive and predictive policy from experience. However, it is challenging to utilize it in highly complex control schemes. In this paper, we propose a reinforcement learning framework designed to optimize the conveyor belt speed while minimizing interference with the rest of the control system. When tested on real-world data, the framework exceeds the performance requirements (99.8% packed products) and maintains quality (100% filled boxes). Compared to the existing solution, our proposed framework improves productivity, has smoother control, and reduces computation time.
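As a toy illustration of the control interface (the dynamics and constants are ours, not the paper's system): the agent's action nudges the belt speed, and the reward penalizes products that pass the pick area unpacked, plus a smoothness term.

```python
# Toy sketch of the control interface (dynamics and constants are
# illustrative, not the paper's system): the action nudges belt speed, and
# the reward penalizes unpacked products and jerky control.
import numpy as np

class BeltEnv:
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.speed = 0.5                         # normalized belt speed

    def step(self, action):
        self.speed = float(np.clip(self.speed + 0.1 * action, 0.1, 1.0))
        supply = self.rng.poisson(4.0)           # varying product inflow
        capacity = 6.0 * (1.0 - self.speed)      # slower belt -> more pick time
        missed = max(0.0, supply - capacity)     # products left unpacked
        reward = -missed - 0.05 * abs(action)    # pack everything, move smoothly
        return np.array([self.speed, float(supply)]), reward

env = BeltEnv()
obs, r = env.step(-1.0)                          # slow the belt for a burst
```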

In this paper we introduce a variant of optimal transport adapted to the causal structure given by an underlying directed graph. Different graph structures lead to different specifications of the optimal transport problem. For instance, a fully connected graph yields standard optimal transport, a linear graph structure corresponds to adapted optimal transport, and an empty graph leads to a notion of optimal transport related to CO-OT, Gromov-Wasserstein distances and factored OT. We derive different characterizations of causal transport plans and introduce Wasserstein distances between causal models that respect the underlying graph structure. We show that average treatment effects are continuous with respect to causal Wasserstein distances and small perturbations of structural causal models lead to small deviations in causal Wasserstein distance. We also introduce an interpolation between causal models based on causal Wasserstein distance and compare it to standard Wasserstein interpolation.
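For reference, the Kantorovich formulation that these graph-adapted variants specialize (standard background, not this paper's new object): for marginals $\mu, \nu$ and cost $c$,
\[
W_c(\mu, \nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int c(x, y) \, \mathrm{d}\pi(x, y),
\]
where $\Pi(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$. The causal variants above restrict $\Pi(\mu,\nu)$ to transport plans compatible with the directed graph, recovering this unrestricted problem when the graph is fully connected.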

Motivated by crowd-sourcing applications, we consider a model where we have partial observations from a bivariate isotonic $n \times d$ matrix with an unknown permutation $\pi^*$ acting on its rows. Focusing on the twin problems of recovering the permutation $\pi^*$ and estimating the unknown matrix, we introduce a polynomial-time procedure achieving the minimax risk for these two problems, for all possible values of $n$, $d$, and all possible sampling efforts. Along the way, we establish that, in some regimes, recovering the unknown permutation $\pi^*$ is considerably simpler than estimating the matrix.
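In symbols (a sketch with assumed notation, not quoted from the paper): the observations are
\[
y_{ij} \;=\; M_{\pi^*(i),\, j} + \varepsilon_{ij}, \qquad (i, j) \in \Omega,
\]
where $M \in \mathbb{R}^{n \times d}$ is nondecreasing along both rows and columns, $\Omega$ is the set of sampled entries, and the twin goals are recovering $\pi^*$ and estimating $M$.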

With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories using only a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration, yielding a similarity measure that can be interpreted as a reward; this reward can then be used by an offline RL algorithm to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
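A minimal sketch of the labeling step follows (with our own small Sinkhorn loop and cost normalization; the paper's exact costs and scaling are not reproduced here).

```python
# Sketch of OT-based reward labeling: align an unlabeled trajectory with an
# expert demonstration via entropic OT and read off per-step rewards. The
# cost normalization and reward scaling here are illustrative choices.
import numpy as np

def otr_rewards(traj, demo, reg=0.05, iters=200):
    """traj: (T, d) unlabeled states; demo: (S, d) expert states."""
    C = ((traj[:, None, :] - demo[None, :, :]) ** 2).sum(-1)
    C = C / max(C.max(), 1e-8)                   # normalize for stability
    K = np.exp(-C / reg)
    u = np.full(len(traj), 1.0 / len(traj))
    v = np.full(len(demo), 1.0 / len(demo))
    a, b = u.copy(), v.copy()
    for _ in range(iters):                       # Sinkhorn scaling iterations
        a = u / (K @ b)
        b = v / (K.T @ a)
    P = a[:, None] * K * b[None, :]              # entropic transport plan
    return -(P * C).sum(axis=1) * len(traj)      # per-step alignment reward
```

These rewards then annotate the dataset, which any standard offline RL algorithm can consume unchanged.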

Algorithms are used to aid human decision makers by making predictions and recommending decisions. Currently, these algorithms are trained to optimize prediction accuracy. What if they were optimized to control final decisions? In this paper, we study a decision-aid algorithm that learns about the human decision maker and provides "personalized recommendations" to influence final decisions. We first consider fixed human decision functions which map observable features and the algorithm's recommendations to final decisions. We characterize the conditions under which perfect control over final decisions is attainable. Under fairly general assumptions, the parameters of the human decision function can be identified from past interactions between the algorithm and the human decision maker, even when the algorithm was constrained to make truthful recommendations. We then consider a decision maker who is aware of the algorithm's manipulation and responds strategically. By posing the setting as a variation of the cheap talk game [Crawford and Sobel, 1982], we show that all equilibria are partition equilibria where only coarse information is shared: the algorithm recommends an interval containing the ideal decision. We discuss the potential applications of such algorithms and their social implications.
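A one-line illustration of the perfect-control condition under an assumed linear human decision function (our simplification, not the paper's model): if the final decision is
\[
d \;=\; \theta_0 + \theta_1 r + \theta_2^{\top} x,
\]
with recommendation $r$ and features $x$, then whenever $\theta_1 \neq 0$ and $(\theta_0, \theta_1, \theta_2)$ are identified from past interactions, the algorithm attains any target $d^*$ by recommending $r = (d^* - \theta_0 - \theta_2^{\top} x)/\theta_1$.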

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
