
Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose DOMiNO, a method for Diversity Optimization Maintaining Near Optimality. We formalize the problem as a Constrained Markov Decision Process where the objective is to find diverse policies, measured by the distance between the state occupancies of the policies in the set, while remaining near-optimal with respect to the extrinsic reward. We demonstrate that the method can discover diverse and meaningful behaviors in various domains, such as different locomotion patterns in the DeepMind Control Suite. We perform extensive analysis of our approach, compare it with other multi-objective baselines, demonstrate that we can control both the quality and the diversity of the set via interpretable hyperparameters, and show that the discovered set is robust to perturbations.
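The constrained objective described above can be illustrated with a generic Lagrangian relaxation. This is a minimal sketch under stated assumptions, not the DOMiNO implementation:

- Each policy's state occupancy is summarized by a hypothetical expected-feature vector `psi[i]`.
- Diversity is the Euclidean distance to the nearest other policy in the set.
- Near-optimality is enforced with one non-negative multiplier per policy for the constraint `v_i >= alpha * v_star`. Here `alpha` and the step size `lr` are hypothetical hyperparameters.

```python
import numpy as np

def diversity_rewards(psi):
    """Distance from each policy's occupancy summary to the nearest other policy's summary."""
    dists = np.linalg.norm(psi[:, None, :] - psi[None, :, :], axis=-1)  # (n, n) pairwise distances
    np.fill_diagonal(dists, np.inf)  # a policy is not compared with itself
    return dists.min(axis=1)

def update_multipliers(lam, values, alpha, lr):
    """Projected gradient step on multipliers for the constraints v_i >= alpha * v_star."""
    v_star = values.max()                # assumption: best value in the set stands in for v_star
    violation = alpha * v_star - values  # positive when policy i is not near-optimal
    return np.maximum(0.0, lam + lr * violation)

def lagrangian_objective(div, values, lam):
    """Per-policy quantity to maximize: diversity plus lambda-weighted extrinsic value."""
    return div + lam * values

# Toy numbers for three policies (made up for illustration).
psi = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
values = np.array([10.0, 7.0, 9.5])
div = diversity_rewards(psi)                                      # [1.0, 4.243, 1.0]
lam = update_multipliers(np.zeros(3), values, alpha=0.9, lr=0.1)  # only policy 1 violates its constraint
objective = lagrangian_objective(div, values, lam)
```

A multiplier grows only while its policy falls short of `alpha` times the best value. This is one common way to trade off diversity against the extrinsic reward through a single interpretable hyperparameter.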

Related content

Diversity


Service systems · Products · Computer vision · CVPR 2022 · Feasible

March 27, 2023

Artificial Intelligence for Sustainability: Facilitating Sustainable Smart Product-Service Systems with Computer Vision

Jannis Walk, Niklas Kühl, Michael Saidani, Jürgen Schatte

The usage and impact of deep learning for cleaner production and sustainability purposes remain little explored. This work shows how deep learning can be harnessed to increase sustainability in production and product usage. Specifically, we utilize deep learning-based computer vision to determine the wear states of products. The resulting insights serve as a basis for novel product-service systems with improved integration and result orientation. Moreover, these insights are expected to facilitate product usage improvements and R&D innovations. We demonstrate our approach on two products: machining tools and rotating X-ray anodes. From a technical standpoint, we show that it is possible to recognize the wear state of these products using deep learning-based computer vision. In particular, we detect wear in microscopic images of the two products. We utilize a U-Net for semantic segmentation to detect wear at pixel granularity. The resulting mean Dice coefficients of 0.631 and 0.603 demonstrate the feasibility of the proposed approach. Consequently, experts can now make better decisions, for example, to improve the machining process parameters. To assess the impact of the proposed approach on environmental sustainability, we perform life cycle assessments that show gains for both products. The results indicate that CO2-equivalent emissions are reduced by 12% for machining tools and by 44% for rotating anodes. This work can serve as a guideline and inspire researchers and practitioners to utilize computer vision in similar scenarios to develop sustainable smart product-service systems and enable cleaner production.
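The Dice coefficient reported above measures the overlap between a predicted segmentation mask and the ground-truth mask: 2|A∩B| / (|A| + |B|). The following is a minimal NumPy sketch for binary wear masks. The toy masks are made up. The abstract does not say how per-image scores are averaged, so `mean_dice` simply assumes a per-image mean.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary masks; eps makes two empty masks score 1."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return (2.0 * intersection + eps) / (total + eps)

def mean_dice(preds, targets):
    """Assumed aggregation: average of per-image Dice scores."""
    return float(np.mean([dice(p, t) for p, t in zip(preds, targets)]))

# Toy 2x3 masks: 1 = pixel predicted/labelled as worn.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice(pred, target), 3))  # 0.667 (2 shared pixels, 3 + 3 marked pixels)
```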

Constraints · Variable topology · Deformation · Topology optimization · Structures

March 27, 2023

Finite Strain Topology Optimization with Nonlinear Stability Constraints

Guodong Zhang, Kapil Khandelwal, Tong Guo

From arXiv. 77 pages, 44 figures.

This paper proposes a computational framework for the design optimization of stable structures under large deformations by incorporating nonlinear buckling constraints. A novel strategy for suppressing spurious buckling modes related to low-density elements is proposed. The strategy depends on constructing a pseudo-mass matrix that assigns small pseudo masses to DOFs surrounded only by low-density elements and degenerates to an identity matrix in the solid region. A novel optimization procedure is developed that can handle both simple and multiple eigenvalues: consistent sensitivities of simple eigenvalues and directional derivatives of multiple eigenvalues are derived and used in a gradient-based optimization algorithm, the method of moving asymptotes. An adaptive linear energy interpolation method is also incorporated in the nonlinear analyses to handle the distortion of low-density elements under large deformations. The numerical results demonstrate that, for systems with either low or high symmetry, the nonlinear stability constraints can ensure structural stability at the target load under large deformations. Post-analysis of the B-spline-fitted designs shows that the safety margin of the optimized structures, i.e., the gap between the target load and the first critical load, can be well controlled by selecting different stability constraint values. Interesting structural behaviors such as mode switching and multiple bifurcations are also demonstrated.
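The pseudo-mass idea can be pictured as a diagonal matrix whose entry for each DOF depends on the densities of the surrounding elements. The rule below is an assumption for illustration, not the paper's formula:

- Each node takes the largest density among its adjacent elements.
- That density is raised to a hypothetical penalty power `p` and floored at a hypothetical `m_min`.

Under this rule, DOFs touching any solid element get mass 1, so the matrix reduces to the identity in the solid region. DOFs surrounded only by low-density elements get a small mass.

```python
import numpy as np

def pseudo_mass_diagonal(elem_nodes, rho, n_nodes, dofs_per_node=2, p=3.0, m_min=1e-6):
    """Diagonal of a hypothetical pseudo-mass matrix, one entry per DOF.

    elem_nodes: (n_elems, nodes_per_elem) node indices of each element.
    rho: (n_elems,) element densities in [0, 1].
    """
    node_rho = np.zeros(n_nodes)
    for e, nodes in enumerate(elem_nodes):
        # Each node keeps the largest density among the elements that share it.
        node_rho[nodes] = np.maximum(node_rho[nodes], rho[e])
    # Solid neighborhood -> mass 1; only low-density neighbors -> small mass (floored at m_min).
    node_mass = np.maximum(node_rho ** p, m_min)
    return np.repeat(node_mass, dofs_per_node)

# 1D chain of three elements on four nodes: one solid element followed by two near-void ones.
elem_nodes = np.array([[0, 1], [1, 2], [2, 3]])
rho = np.array([1.0, 1e-3, 1e-3])
m = pseudo_mass_diagonal(elem_nodes, rho, n_nodes=4, dofs_per_node=1)
M = np.diag(m)  # identity on the DOFs attached to the solid element
```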

Machine learning optimization · Robotics · Computation time · Reinforcement learning · Flexibility

March 26, 2023

Robotic Packaging Optimization with Reinforcement Learning

Eveline Drijver, Rodrigo Pérez-Dattari, Jens Kober, Cosimo Della Santina, Zlatan Ajanović

From arXiv. 7 pages, 5 figures, 1 table; submitted to a conference.

Intelligent manufacturing is becoming increasingly important due to the growing demand for maximizing productivity and flexibility while minimizing waste and lead times. This work investigates automated secondary robotic food packaging solutions that transfer food products from the conveyor belt into containers. A major problem in these solutions is varying product supply, which can cause drastic productivity drops. Conventional rule-based approaches used to address this issue are often inadequate, leading to violations of the industry's requirements. Reinforcement learning, on the other hand, has the potential to solve this problem by learning a responsive and predictive policy from experience. However, it is challenging to apply it within highly complex control schemes. In this paper, we propose a reinforcement learning framework designed to optimize the conveyor belt speed while minimizing interference with the rest of the control system. When tested on real-world data, the framework exceeds the performance requirements (99.8% of products packed) and maintains quality (100% of boxes filled). Compared to the existing solution, our proposed framework improves productivity, provides smoother control, and reduces computation time.
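The abstract does not specify the framework's state, action, or reward design, so the following is only a generic sketch of how belt-speed control could be cast as tabular Q-learning. Everything in it is hypothetical:

- States are discretized backlog levels.
- Actions are belt-speed levels.
- `packing_reward` rewards the fraction of arriving products that get packed and penalizes speed changes, to encourage smoother control.

```python
import numpy as np

def packing_reward(packed, arrived, speed_change, smooth_weight=0.1):
    """Hypothetical reward: fraction of arrived products packed, minus a penalty on belt-speed changes."""
    packed_frac = packed / arrived if arrived else 1.0
    return packed_frac - smooth_weight * abs(speed_change)

def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.95):
    """One tabular Q-learning step toward the target r + gamma * max_a' Q[s_next, a']."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += lr * (td_target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, eps, rng):
    """Explore a random speed level with probability eps, otherwise pick the best known one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(Q[s].argmax())

# 5 discretized backlog levels x 3 hypothetical belt-speed levels.
Q = np.zeros((5, 3))
rng = np.random.default_rng(0)
s = 2
a = epsilon_greedy(Q, s, eps=0.0, rng=rng)                    # greedy on an all-zero table -> action 0
r = packing_reward(packed=49, arrived=50, speed_change=0.5)   # 0.98 - 0.05 = 0.93
Q = q_update(Q, s, a, r, s_next=3)                            # Q[2, 0] becomes 0.1 * 0.93
```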

In-context learning · Context · Bias · Examples · Language models

March 25, 2023

Fairness-guided Few-shot Prompting for Large Language Models

Huan Ma, Changqing Zhang, Yatao Bian, Lemao Liu, Zhirui Zhang, Peilin Zhao, Shu Zhang, Huazhu Fu, Qinghua Hu, Bingzhe Wu

Large language models have demonstrated a surprising ability to perform in-context learning, i.e., they can be applied directly to numerous downstream tasks by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can suffer from high instability due to variations in the training examples, example order, and prompt format. Therefore, constructing an appropriate prompt is essential for improving the performance of in-context learning. In this paper, we revisit this problem from the perspective of predictive bias. Specifically, we introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attribute. We then show empirically that prompts with higher bias always lead to unsatisfactory predictive quality. Based on this observation, we propose a novel search strategy based on greedy search to identify a near-optimal prompt for improving in-context learning performance. We perform comprehensive experiments with state-of-the-art mainstream models such as GPT-3 on various downstream tasks. Our results indicate that our method can enhance the model's in-context learning performance in an effective and interpretable manner.
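The greedy search can be sketched as follows. This rests on two assumptions made only for illustration:

- Predictive bias is read off the model's label distribution for a content-free input.
- That distribution is scored with entropy, so higher entropy means a fairer, less biased prompt.

`toy_label_probs` is a hypothetical stand-in for querying an LLM. It has a built-in skew toward `pos` that the demonstrations can shift.

```python
import math

def fairness(probs):
    """Entropy of the predicted label distribution: higher means closer to uniform, i.e. less biased."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def greedy_prompt_search(demos, label_probs, max_len):
    """Greedily append the demonstration that most increases fairness; stop when nothing improves it."""
    prompt, remaining = [], list(demos)
    best = fairness(label_probs(prompt))
    while remaining and len(prompt) < max_len:
        score, choice = max(((fairness(label_probs(prompt + [d])), d) for d in remaining),
                            key=lambda pair: pair[0])
        if score <= best:
            break
        prompt.append(choice)
        remaining.remove(choice)
        best = score
    return prompt, best

def toy_label_probs(prompt, labels=("pos", "neg"), prior=(3.0, 1.0)):
    """Hypothetical LLM stand-in: a prior skewed toward 'pos', shifted by the demonstrations' labels."""
    weights = [w + sum(1 for _, y in prompt if y == lab) for lab, w in zip(labels, prior)]
    total = sum(weights)
    return [w / total for w in weights]

demos = [("great", "pos"), ("awful", "neg"), ("bad", "neg"), ("fine", "pos")]
prompt, score = greedy_prompt_search(demos, toy_label_probs, max_len=4)
```

On this toy, the search picks the two negative examples. They counterbalance the 3:1 prior and yield a uniform distribution with entropy ln 2. A third demonstration would only reduce fairness, so the search stops.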

Large language models · Analogical reasoning · Language models · GPT-3 · Zero-shot

March 24, 2023

Emergent Analogical Reasoning in Large Language Models

Taylor Webb, Keith J. Holyoak, Hongjing Lu

The recent advent of large language models has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (the text-davinci-003 variant of GPT-3) on a range of analogical tasks, including a novel text-based matrix reasoning task closely modeled on Raven's Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.

Learning · Agent · Transformation · Lecture notes · Learner

June 14, 2022

Transformers are Meta-Reinforcement Learners

Luckeciano C. Melo

From arXiv. Published at the International Conference on Machine Learning (ICML) 2022.

The transformer architecture and its variants have shown remarkable success across many machine learning tasks in recent years. This success is intrinsically related to their capability of handling long sequences and to the context-dependent weights produced by the attention mechanism. We argue that these capabilities suit the central role of a meta-reinforcement learning algorithm. Indeed, a meta-RL agent needs to infer the task from a sequence of trajectories. Furthermore, it requires a fast adaptation strategy to adapt its policy to a new task, which can be achieved using the self-attention mechanism. In this work, we present TrMRL (Transformers for Meta-Reinforcement Learning), a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture. It associates the recent past of working memories to build an episodic memory recursively through the transformer layers. We show that self-attention computes a consensus representation that minimizes the Bayes risk at each layer and provides meaningful features for computing the best actions. We conducted experiments in high-dimensional continuous control environments for locomotion and dexterous manipulation. Results show that TrMRL achieves comparable or superior asymptotic performance, sample efficiency, and out-of-distribution generalization compared to the baselines in these environments.
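The "consensus representation" mentioned above comes from the fact that each self-attention output is a convex combination of the value vectors of all working memories. This is a minimal single-head sketch of scaled dot-product self-attention, not TrMRL itself, which stacks transformer layers over an episodic memory. The sequence length, embedding size, and random projections are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence of working memories.

    X: (T, d), one embedding per recent transition.
    Returns the outputs (T, d_v) and the attention weights (T, T).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    # Each output row is a convex combination (weights sum to 1) of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 4, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, w = self_attention(X, Wq, Wk, Wv)
```

If all memories are identical, every output row collapses to that shared value vector. This is the simplest case of the weighted-consensus behavior.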

Episodes · Learned · Reinforcement learning · INTERACT · General intelligence

May 13, 2022

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Michael Bradley Johanson, Edward Hughes, Finbarr Timbers, Joel Z. Leibo

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices, a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.

Prompt · MoDELS · Learned · Extensibility · Vectorization

March 10, 2022

Conditional Prompt Learning for Vision-Language Models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

From arXiv. CVPR 2022. TL;DR: We propose a conditional prompt learning approach to solve the generalizability issue of static prompts.

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning (a recent trend in NLP) to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively tuned manual prompts. In our study, we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits the base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate an input-conditional token (vector) for each image. Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset, and yields stronger domain generalization performance as well. Code is available at //github.com/KaiyangZhou/CoOp.
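The conditional-prompt idea can be sketched in a few lines of NumPy. A lightweight network maps an image feature to one token π(x), and π(x) is added to every learnable context vector before the class-name embedding is appended. This is not the official code (see the repository above). The sizes, the two-layer bottleneck shape of the meta-net, the temperature, and especially the mean-pooling "text encoder" are all hypothetical stand-ins. For simplicity, image features and token embeddings share one dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, C = 16, 4, 3  # token/feature dim, context length, number of classes (all hypothetical)

ctx = rng.normal(scale=0.02, size=(M, D))     # learnable context vectors v_1..v_M (shared, as in CoOp)
class_emb = rng.normal(size=(C, 1, D))        # stand-in word embeddings of the class names
W1 = rng.normal(scale=0.1, size=(D, D // 4))  # meta-net weights: a small Linear-ReLU-Linear bottleneck
W2 = rng.normal(scale=0.1, size=(D // 4, D))

def meta_net(img_feat):
    """Lightweight network producing one input-conditional token pi(x) per image."""
    return np.maximum(img_feat @ W1, 0.0) @ W2

def build_prompts(img_feat):
    """Prompt for class c: [v_1 + pi(x), ..., v_M + pi(x), class_c]."""
    cond_ctx = ctx + meta_net(img_feat)  # (M, D): every context vector is shifted by pi(x)
    return np.concatenate([np.broadcast_to(cond_ctx, (C, M, D)), class_emb], axis=1)

def text_encoder(tokens):
    """Hypothetical stand-in for CLIP's text transformer: mean-pool tokens, then L2-normalize."""
    z = tokens.mean(axis=-2)
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def cocoop_logits(img_feat, temperature=100.0):
    """Scaled cosine similarity between the image feature and each class's prompt feature."""
    text_feat = text_encoder(build_prompts(img_feat))  # (C, D)
    img = img_feat / np.linalg.norm(img_feat)
    return temperature * text_feat @ img

img_feat = rng.normal(size=D)  # stand-in for a CLIP image feature
logits = cocoop_logits(img_feat)
prediction = int(np.argmax(logits))
```

Because the context now depends on `img_feat`, each image gets its own prompt. In CoOp, by contrast, `ctx` alone is shared across all inputs.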

Episodes · AI · CASE · System architecture · Engineering

August 30, 2021

Multi-Agent Simulation for AI Behaviour Discovery in Operations Research

Michael Papasimeon, Lyndon Benke

From arXiv. 14 pages, 7 figures. To be published in the proceedings of the 22nd International Workshop on Multi-Agent-Based Simulation (MABS 2021) at AAMAS 2021. //mabsworkshop.github.io/accepted/

We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multi-agent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity and complex, and require significant domain knowledge; as a result, they have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially cost-effective manner. In this paper, we describe the motivation for the development of ACE0. We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners exploring different AI behaviour discovery methods.

Reinforcement learning · Learned · Tuning · Episodes · Directed

January 19, 2020

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

Amit Kumar Mondal, Nadeem Jamali

Reinforcement learning is one of the core components in designing an artificially intelligent system that emphasizes real-time response. Reinforcement learning guides the system to take actions within an arbitrary environment, with or without prior knowledge of the environment model. In this paper, we present a comprehensive study of reinforcement learning covering various dimensions, including challenges, recent developments in state-of-the-art techniques, and future directions. The fundamental objective of this paper is to provide a framework for presenting the available reinforcement learning methods that is informative enough, and simple enough to follow, for new researchers and academics in this domain, considering the latest concerns. First, we illustrate the core techniques of reinforcement learning in an easily understandable and comparable way. Finally, we analyze and depict the recent developments in reinforcement learning approaches. Our analysis indicates that most of the models focus on tuning policy values rather than tuning other things in a particular state of reasoning.
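Value iteration is one concrete instance of the core techniques such a survey covers. It applies the Bellman optimality backup until the state values stop changing, then reads off a greedy policy. The two-state MDP below is made up for illustration and is not taken from the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Optimal state values and a greedy policy for a finite MDP.

    P[a, s, s2]: probability of moving from s to s2 under action a.
    R[s, a]: expected immediate reward.
    """
    V = np.zeros(P.shape[1])
    while True:
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Two states; action 0 = stay, action 1 = switch. Only staying in state 1 pays a reward of 1.
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 0.0]])
V, policy = value_iteration(P, R)  # V ~ [9, 10], policy = [1, 0]: switch to state 1, then stay
```

With γ = 0.9, staying in state 1 forever is worth 1 / (1 − 0.9) = 10. Switching there from state 0 is worth 0.9 × 10 = 9.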