
Active learning is a subfield of machine learning devised for the design and modeling of systems with very expensive sampling costs. Industrial and engineering systems are generally subject to physics constraints that can induce fatal failures when violated, yet such constraints are frequently underestimated in active learning. In this paper, we develop a novel active learning method that avoids failures by accounting for the implicit physics constraints that govern the system. The proposed approach is driven by two tasks: safe variance reduction explores the safe region to reduce the variance of the target model, and safe region expansion aims to extend the explorable region by exploiting a probabilistic model of the constraints. A global acquisition function is devised to judiciously balance the acquisition functions of the two tasks, and its theoretical properties are provided. The proposed method is applied to the composite fuselage assembly process with consideration of material failure using the Tsai-Wu criterion, and it achieves zero failures without knowledge of the explicit failure regions.
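As a rough illustration only (not the paper's actual formulation), the sketch below shows how a two-task safe acquisition step could look with Gaussian-process surrogates: one GP for the target response and one for the latent safety constraint, with `safety_threshold` and `beta` as hypothetical tuning parameters.

```python
# Minimal sketch of a two-task safe active-learning step, assuming fitted
# GaussianProcessRegressor surrogates gp_f (target) and gp_g (latent constraint,
# safe iff g >= 0). Names and thresholds are illustrative, not from the paper.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor  # assumed surrogate type

def select_next_point(X_cand, gp_f, gp_g, safety_threshold=0.95, beta=0.5):
    """Pick the next design point from a candidate pool.

    - Safe variance reduction: maximize predictive variance of f over points
      that are safe with probability >= safety_threshold.
    - Safe region expansion: maximize the constraint's predictive variance over
      the same safe set, which favors points near the current safety boundary.
    A global acquisition trades the two off with weight beta.
    """
    _, std_f = gp_f.predict(X_cand, return_std=True)
    mu_g, std_g = gp_g.predict(X_cand, return_std=True)
    p_safe = norm.cdf(mu_g / np.maximum(std_g, 1e-9))   # P(g(x) >= 0)

    safe = p_safe >= safety_threshold
    if not np.any(safe):                                  # fall back to the safest candidate
        return X_cand[np.argmax(p_safe)]

    acq = (1 - beta) * std_f + beta * std_g               # global acquisition
    acq[~safe] = -np.inf
    return X_cand[np.argmax(acq)]
```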

Related content

Active learning is a subfield of machine learning (and, more broadly, of artificial intelligence); in statistics it is also known as query learning or optimal experimental design. The "learning module" and the "selection strategy" are the two basic and essential components of an active learning algorithm. In education, active learning is "a method of learning in which students are actively or experientially involved in the learning process, with different levels of active learning depending on student involvement" (Bonwell & Eison 1991). Bonwell & Eison (1991) note that students "do things besides passively listening to lectures." In a report for the Association for the Study of Higher Education (ASHE), the authors discuss a variety of methods for promoting active learning. They cite literature showing that, in order to learn, students must do more than just listen: they must read, write, discuss, and engage in problem solving. This process involves three learning domains, namely knowledge, skills, and attitudes (KSA). This taxonomy of learning behaviors can be thought of as "the goals of the learning process." In particular, students must engage in higher-order thinking tasks such as analysis, synthesis, and evaluation.
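The machine-learning sense of the term is easiest to see in code. Below is a minimal, hedged sketch of a pool-based active-learning loop with the two modules named above: a logistic-regression classifier as the learning module and uncertainty (margin) sampling as the selection strategy. The binary-classification setup and all names are illustrative.

```python
# Pool-based active learning: repeatedly fit the learning module on the labeled set,
# then let the selection strategy query the most uncertain unlabeled point.
# Assumes a binary problem and that the initial labeled set contains both classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(X_pool, y_oracle, X_init, y_init, n_queries=20):
    X_lab, y_lab = list(X_init), list(y_init)
    unlabeled = list(range(len(X_pool)))
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_queries):
        model.fit(np.array(X_lab), np.array(y_lab))           # learning module
        proba = model.predict_proba(X_pool[unlabeled])
        margin = np.abs(proba[:, 1] - 0.5)                     # selection strategy:
        idx = unlabeled[int(np.argmin(margin))]                # query the least certain point
        X_lab.append(X_pool[idx]); y_lab.append(y_oracle[idx]) # oracle provides the label
        unlabeled.remove(idx)
    return model
```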

This paper investigates the learning, or system identification, of a class of piecewise-affine dynamical systems known as linear complementarity systems (LCSs). We propose a violation-based loss which enables efficient learning of the LCS parameterization, without prior knowledge of the hybrid mode boundaries, using gradient-based methods. The proposed violation-based loss incorporates both a dynamics prediction loss and a novel complementarity-violation loss. We show several properties attained by this loss formulation, including its differentiability, the efficient computation of first- and second-order derivatives, and its relationship to the traditional prediction loss, which strictly enforces complementarity. We apply this violation-based loss formulation to learn LCSs with tens of thousands of (potentially stiff) hybrid modes. The results demonstrate a state-of-the-art ability to identify piecewise-affine dynamics, outperforming methods which must differentiate through non-smooth linear complementarity problems.
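A hedged PyTorch sketch of the idea, under the assumption of an LCS of the form x' = A x + B u + D λ + c with complementarity 0 ≤ λ ⊥ E x + F u + G λ + h ≥ 0: the complementarity variable λ is treated as a free per-sample quantity, and the loss penalizes violations of non-negativity and orthogonality instead of differentiating through an LCP solve. Matrix names and weights are illustrative, not the paper's exact formulation.

```python
# Violation-based loss for a batch of LCS transitions: prediction error plus penalties
# for negative lam, negative slack, and nonzero lam*slack (complementarity violation).
import torch

def violation_loss(x, u, x_next, lam, A, B, D, c, E, F, G, h, w_comp=10.0):
    pred = x @ A.T + u @ B.T + lam @ D.T + c          # predicted next state
    s = x @ E.T + u @ F.T + lam @ G.T + h             # complementarity slack
    pred_loss = ((pred - x_next) ** 2).sum(dim=-1)
    comp_loss = (torch.relu(-lam) ** 2).sum(dim=-1) \
              + (torch.relu(-s) ** 2).sum(dim=-1) \
              + ((torch.relu(lam) * torch.relu(s)).sum(dim=-1)) ** 2
    return (pred_loss + w_comp * comp_loss).mean()
```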

We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than on the value functions considered by previous works. State density has a clear physical and mathematical interpretation and is able to express a wide variety of constraints, such as resource limits and safety requirements. Density constraints can also avoid the time-consuming process of designing and tuning the cost functions required by value-function-based constraints to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm that solves the density-constrained RL problem optimally while guaranteeing that the constraints are satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, on a wide range of density-constrained tasks as well as standard CRL benchmarks such as Safety-Gym.
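The sketch below illustrates one plausible primal-dual reading of this approach, assuming the constraint is an upper bound on the stationary-density mass inside a region: the multiplier enters the reward as a per-state penalty (this is where the density/Q-function duality is used), and is updated from the measured violation. `solve_rl` and `estimate_density` are placeholders for any RL solver and density estimator, not functions from the paper.

```python
# Primal-dual loop for a density-constrained RL problem with a single upper-bound
# constraint: density mass inside constraint_region must stay below d_max.
def density_constrained_rl(solve_rl, estimate_density, reward, constraint_region,
                           d_max, n_iters=50, step=0.1):
    lam = 0.0
    policy = None
    for _ in range(n_iters):
        # Primal step: solve an unconstrained problem whose reward is penalized inside
        # the constrained region; the multiplier acts as a per-state reward correction.
        policy = solve_rl(lambda s, a: reward(s, a) - lam * constraint_region(s))
        # Dual step: raise lam if the density constraint is violated, relax it otherwise.
        mass = estimate_density(policy, constraint_region)   # density mass inside the region
        lam = max(0.0, lam + step * (mass - d_max))
    return policy, lam
```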

In real-world settings, numerous constraints are present which are hard to specify mathematically. However, for the real-world deployment of reinforcement learning (RL), it is critical that RL agents are aware of these constraints so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. We experimentally validate our approach and show that our framework can successfully learn the most likely constraints that the agent respects. We further show that these learned constraints are \textit{transferable} to new agents that may have different morphologies and/or reward functions. Previous works in this regard have mainly been restricted to tabular (discrete) settings, to specific types of constraints, or assume knowledge of the environment's transition dynamics. In contrast, our framework is able to learn arbitrary \textit{Markovian} constraints in high dimensions in a completely model-free setting. The code can be found at: \url{//github.com/shehryar-malik/icrl}.
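One hedged way to picture the learned-constraint idea (not necessarily the paper's exact objective) is an alternating scheme: a constraint network is trained to mark expert-visited state-action pairs as feasible and nominal-policy visitations as infeasible, and the nominal policy is then retrained under the updated constraint. Names below are illustrative.

```python
# Constraint network zeta(s, a) in [0, 1]; its update pushes feasibility toward 1 on
# expert pairs and toward 0 on pairs visited by the current nominal (reward-maximizing)
# policy, which the expert apparently avoids.
import torch
import torch.nn as nn

class ConstraintNet(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, sa):                        # sa: concatenated (state, action) batch
        return self.net(sa).squeeze(-1)           # feasibility score in [0, 1]

def constraint_update(zeta, opt, expert_sa, nominal_sa):
    loss = -torch.log(zeta(expert_sa) + 1e-8).mean() \
           - torch.log(1 - zeta(nominal_sa) + 1e-8).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```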

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
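As a small, hedged illustration of the "careful initialization and normalization" remedies mentioned above (not code from the article), the helper below builds an MLP with He initialization for its ReLU layers and batch normalization, both of which help keep activation and gradient scales roughly constant with depth.

```python
# Build an MLP whose hidden layers use He (Kaiming) initialization and BatchNorm;
# the final linear layer is left without a norm/activation.
import torch.nn as nn

def make_mlp(dims):
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        linear = nn.Linear(d_in, d_out)
        nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")  # He initialization
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.BatchNorm1d(d_out), nn.ReLU()]
    return nn.Sequential(*layers[:-2])   # drop the trailing norm/activation on the output
```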

Recent studies have shown the vulnerability of reinforcement learning (RL) models in noisy settings. The sources of noise differ across scenarios. For instance, in practice, the observed reward channel is often subject to noise (e.g., when observed rewards are collected through sensors), and the observed rewards may therefore not be credible. Also, in applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors. In this paper, we consider noisy RL problems in which the rewards observed by RL agents are generated through a reward confusion matrix; we call such observed rewards perturbed rewards. We develop a robust RL framework, aided by an unbiased reward estimator, that enables RL agents to learn in noisy environments while observing only perturbed rewards. Our framework draws upon approaches for supervised learning with noisy data. The core ideas of our solution are estimating the reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that policies based on our estimated surrogate rewards achieve higher expected rewards and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm obtains 67.5% and 46.7% improvements on average over five Atari games when the error rates are 10% and 30%, respectively.
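For the binary-reward case the unbiased surrogate has a simple closed form; the sketch below follows the standard noisy-label construction and is intended only as an illustration of the idea, with `e_plus` and `e_minus` denoting the assumed flip probabilities read off the reward confusion matrix.

```python
# Unbiased surrogate reward for binary true rewards {r_minus, r_plus}.
# e_plus  = P(observe r_minus | true r_plus), e_minus = P(observe r_plus | true r_minus).
# The construction guarantees E[surrogate | true reward] = true reward, so an agent
# trained on surrogates of the perturbed rewards is unbiased in expectation.
def surrogate_reward(r_obs, r_minus, r_plus, e_plus, e_minus):
    denom = 1.0 - e_plus - e_minus            # assumed > 0 (noise rates sum below 1)
    if r_obs == r_plus:
        return ((1.0 - e_minus) * r_plus - e_plus * r_minus) / denom
    return ((1.0 - e_plus) * r_minus - e_minus * r_plus) / denom
```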

The latest deep learning methods for object detection provide remarkable performance but have limits when used in robotic applications. One of the most relevant issues is the long training time, which is due to the large size and imbalance of the associated training sets, characterized by few positive and a large number of negative examples (i.e., background). Proposed approaches are based on end-to-end learning by back-propagation [22] or on kernel methods trained with Hard Negatives Mining on top of deep features [8]. These solutions are effective but prohibitively slow for on-line applications. In this paper we propose a novel pipeline for object detection that overcomes this problem and provides comparable performance with a 60x training speedup. Our pipeline combines (i) the Region Proposal Network and the deep feature extractor from [22], to efficiently select candidate RoIs and encode them into powerful representations, with (ii) the FALKON [23] algorithm, a novel kernel-based method that allows fast training on large-scale problems (millions of points). We address the size and imbalance of the training data by exploiting the stochastic subsampling intrinsic to the method and a novel, fast bootstrapping approach. We assess the effectiveness of the approach on a standard Computer Vision dataset (PASCAL VOC 2007 [5]) and demonstrate its applicability to a real robotic scenario with the iCubWorld Transformations [18] dataset.
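The sketch below is a heavily simplified stand-in for the actual pipeline: a Nyström-style kernel ridge classifier (much simpler than FALKON) fit on precomputed RoI features, with one round of hard-negative bootstrapping to handle the positive/negative imbalance. All sizes and names are illustrative.

```python
# Nystrom-approximated kernel ridge classifier on RoI features, with one
# hard-negative bootstrapping round: fit on a random negative subsample, score all
# negatives, keep the hardest ones, and refit.
import numpy as np

def rbf(A, B, gamma=0.1):
    d = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d)

def fit_nystrom(X, y, centers, lam=1e-3):
    Knm, Kmm = rbf(X, centers), rbf(centers, centers)
    return np.linalg.solve(Knm.T @ Knm + lam * len(X) * Kmm, Knm.T @ y)

def train_with_hard_negatives(X_pos, X_neg, n_centers=500, n_neg=2000):
    rng = np.random.default_rng(0)
    neg0 = X_neg[rng.choice(len(X_neg), size=min(n_neg, len(X_neg)), replace=False)]
    X = np.vstack([X_pos, neg0]); y = np.r_[np.ones(len(X_pos)), -np.ones(len(neg0))]
    centers = X[rng.choice(len(X), size=min(n_centers, len(X)), replace=False)]
    alpha = fit_nystrom(X, y, centers)
    scores = rbf(X_neg, centers) @ alpha                       # score every negative
    hard = X_neg[np.argsort(scores)[-min(n_neg, len(X_neg)):]] # keep the hardest ones
    X2 = np.vstack([X_pos, hard]); y2 = np.r_[np.ones(len(X_pos)), -np.ones(len(hard))]
    return fit_nystrom(X2, y2, centers), centers
```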

Deep learning (DL) is a high-dimensional data reduction technique for constructing high-dimensional predictors in input-output models. DL is a form of machine learning that uses hierarchical layers of latent features. In this article, we review the state of the art of deep learning from a modeling and algorithmic perspective. We provide a list of successful areas of application in Artificial Intelligence (AI), Image Processing, Robotics and Automation. Deep learning is predictive in nature rather than inferential and can be viewed as a black-box methodology for high-dimensional function estimation.

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
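A minimal, hedged sketch of the latent-goal mechanics described above, assuming a pretrained VAE-style encoder: imagined goals are sampled from the latent prior during the self-supervised practice phase, and the goal-reaching reward is the negative distance in latent space. `vae_encode` is a placeholder for whatever encoder is used.

```python
# Sample imagined goals from the latent prior and compute a dense goal-reaching reward
# as the negative distance between the encoded observation and the goal latent.
import torch

def sample_practice_goal(latent_dim):
    return torch.randn(latent_dim)                 # imagined goal drawn from the prior

def latent_reward(vae_encode, obs_image, goal_latent):
    z = vae_encode(obs_image)                      # encode the raw image observation
    return -torch.norm(z - goal_latent, dim=-1)    # reward: negative latent distance
```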

Recommender systems can mitigate the information overload problem by suggesting personalized items to users. In real-world recommendations such as e-commerce, a typical interaction between the system and its users is as follows: users are recommended a page of items and provide feedback, and the system then recommends a new page of items. To effectively capture such interactions for recommendation, we need to solve two key problems: (1) how to update the recommendation strategy according to the user's \textit{real-time feedback}, and (2) how to generate a page of items with a proper display, both of which pose tremendous challenges to traditional recommender systems. In this paper, we study the problem of page-wise recommendation, aiming to address the aforementioned two challenges simultaneously. In particular, we propose a principled approach to jointly generate a set of complementary items and the corresponding strategy to display them in a 2-D page, and we propose a novel page-wise recommendation framework based on deep reinforcement learning, DeepPage, which can optimize a page of items with proper display based on real-time feedback from users. Experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.
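One hedged way to sketch page-wise generation (not the paper's exact architecture) is an actor that maps the user state to a grid of proto-item embeddings, one per page slot, each of which is then matched to the nearest real item in embedding space; real-time feedback would update the state between pages. Dimensions and names below are illustrative.

```python
# Actor producing a rows x cols page: emit one proto-item embedding per slot, then fill
# each slot with the catalog item whose embedding is closest to that proto-item.
import torch
import torch.nn as nn

class PageActor(nn.Module):
    def __init__(self, state_dim, item_dim, rows, cols):
        super().__init__()
        self.rows, self.cols, self.item_dim = rows, cols, item_dim
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, rows * cols * item_dim))

    def forward(self, state, item_embeddings):          # item_embeddings: (n_items, item_dim)
        proto = self.net(state).view(-1, self.rows * self.cols, self.item_dim)
        cand = item_embeddings.unsqueeze(0).expand(proto.size(0), -1, -1).contiguous()
        dists = torch.cdist(proto, cand)                 # distance of each slot to each item
        return dists.argmin(dim=-1).view(-1, self.rows, self.cols)   # item ids per slot
```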

We consider the multi-agent reinforcement learning setting with imperfect information, in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so each agent must infer the other agent's hidden goal from its observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and to update its belief over the other agent's hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimates of the other agent's hidden state, in both cooperative and adversarial settings.
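The online inference step can be sketched as follows (a hedged illustration, with `policy(obs, goal)` assumed to return action logits): the agent reuses its own policy network to ask which goal would make the other agent's observed action likely, and takes a few gradient steps on a belief over that hidden goal.

```python
# Update the belief over the other agent's discrete hidden goal by maximizing the
# likelihood of its observed action under the agent's OWN policy network.
import torch
import torch.nn.functional as F

def update_other_goal_belief(policy, other_obs, other_action, goal_logits, steps=5, lr=0.1):
    goal_logits = goal_logits.clone().requires_grad_(True)   # belief parameters
    opt = torch.optim.SGD([goal_logits], lr=lr)
    for _ in range(steps):
        goal = F.softmax(goal_logits, dim=-1)            # soft belief over discrete goals
        logits = policy(other_obs, goal)                 # "what would I do with this goal?"
        loss = F.cross_entropy(logits, other_action)     # explain the observed action index
        opt.zero_grad(); loss.backward(); opt.step()
    return goal_logits.detach()
```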
