
We present algorithms for uniformly covering an unknown indoor region with a swarm of simple, anonymous, and autonomous mobile agents. Exploring such regions is made difficult by the lack of a common global reference frame, severe degradation of radio-frequency communication, and numerous ground obstacles. We propose addressing these challenges by using airborne agents, such as Micro Air Vehicles, in a dual capacity: as mobile explorers and, once they land, as beacons that help other agents navigate the region. The algorithms we propose are designed for a swarm of simple, identical, ant-like agents with local sensing capabilities. The agents enter the region, which is discretized as a graph, over time from one or more entry points and are tasked with occupying all of its vertices. Unlike many works in this area, we require that an outside operator with limited information be notified when the coverage mission is complete. Even with this additional requirement, we show, through both simulations and mathematical proofs, that the dual-role concept yields linear-time termination while also besting many well-known algorithms from the literature in terms of energy use.
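To make the dual-role idea concrete, here is a minimal toy sketch (not the paper's algorithm): agents enter a graph one at a time and settle on uncovered vertices, and the breadth-first search below stands in for the local, beacon-to-beacon guidance that steers each newcomer toward the frontier. The function names and the greedy settle rule are assumptions for illustration.

```python
import networkx as nx
from collections import deque

def dual_role_coverage(G, entry):
    settled = set()                    # vertices holding landed beacon agents
    total_moves = 0
    while len(settled) < G.number_of_nodes():
        # A new agent enters at `entry`; BFS here stands in for the
        # local guidance provided by the already-settled beacons.
        parent, queue = {entry: None}, deque([entry])
        target = None
        while queue:
            v = queue.popleft()
            if v not in settled:
                target = v             # nearest uncovered vertex
                break
            for u in G.neighbors(v):
                if u not in parent:
                    parent[u] = v
                    queue.append(u)
        hops, v = 0, target
        while parent[v] is not None:   # walk back to count this agent's moves
            v = parent[v]
            hops += 1
        total_moves += hops
        settled.add(target)            # the agent lands and becomes a beacon
    return total_moves

G = nx.grid_2d_graph(5, 5)             # the indoor region, discretized as a graph
print("total agent moves:", dual_role_coverage(G, (0, 0)))
```

Each entering agent settles exactly one vertex, so the swarm size needed equals the number of vertices, and the move count grows with the region's size and diameter.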

Related content

In this work, we present a positivity-preserving, high-order flux reconstruction method for the Boltzmann--BGK equation, augmented with a discrete velocity model that ensures the scheme is discretely conservative. By modeling the internal degrees of freedom, the approach is extended to polyatomic molecules and can encompass arbitrary constitutive laws. The approach is validated on a series of large-scale, complex numerical experiments, ranging from shock-dominated flows computed on unstructured grids to direct numerical simulation of three-dimensional compressible turbulent flows, the latter being the first instance of such a flow computed by directly solving the Boltzmann equation. The results demonstrate the scheme's ability to directly resolve shock structures without any ad hoc numerical shock-capturing method and to approximate turbulent flow phenomena in a manner consistent with the hydrodynamic equations.
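For readers unfamiliar with the BGK structure, the toy sketch below shows a single-species, monatomic 1D discrete-velocity relaxation step f_t = (f_eq - f)/tau. The paper's scheme additionally corrects the discrete Maxwellian so conservation holds exactly on the velocity grid, models internal degrees of freedom for polyatomic molecules, and couples this to a high-order flux reconstruction transport step; all of that is omitted here.

```python
import numpy as np

v = np.linspace(-6.0, 6.0, 64)               # discrete velocity grid
dv = v[1] - v[0]

def moments(f):
    rho = np.sum(f) * dv                     # density
    u = np.sum(f * v) * dv / rho             # bulk velocity
    T = np.sum(f * (v - u) ** 2) * dv / rho  # temperature (R = 1)
    return rho, u, T

def maxwellian(rho, u, T):
    return rho / np.sqrt(2 * np.pi * T) * np.exp(-(v - u) ** 2 / (2 * T))

def bgk_step(f, tau, dt):
    rho, u, T = moments(f)
    return f + dt / tau * (maxwellian(rho, u, T) - f)  # relax toward f_eq

# a non-equilibrium two-beam state relaxing toward equilibrium
f = maxwellian(0.5, -1.5, 0.4) + maxwellian(0.5, 1.5, 0.4)
for _ in range(200):
    f = bgk_step(f, tau=0.1, dt=0.01)
print("moments after relaxation (approx. conserved):", np.round(moments(f), 4))
```

Note that with a naive quadrature the moments are conserved only up to discretization error, which is exactly why the paper's discretely conservative velocity model matters.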

Tag-based visual-inertial localization is a lightweight method for enabling autonomous data-collection missions by low-cost unmanned aerial vehicles (UAVs) in indoor construction environments. However, finding the optimal tag configuration (i.e., number, size, and location) on dynamic construction sites remains challenging. This paper proposes a perception-aware genetic-algorithm-based tag placement planner (PGA-TaPP) that determines the optimal tag configuration using 4D-BIM, taking into account project progress, safety requirements, and the UAV's localizability. The proposed method produces a 4D tag placement plan that maximizes localizability in user-specified regions of interest (ROIs) while limiting installation costs. Localizability is quantified using the Fisher information matrix (FIM) and encapsulated in navigable grids. Experimental results show the effectiveness of our method in finding an optimal 4D tag placement plan for robust UAV localization on indoor construction sites.
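A hypothetical sketch of the optimization loop's shape: a bit vector marks which candidate wall locations receive a tag, and fitness rewards localizability in ROI cells while penalizing tag count. The visibility-count proxy below is a stand-in for the FIM-based localizability metric, and all names, ranges, and weights are assumptions; the real planner also uses 4D-BIM and navigable grids.

```python
import random

CANDIDATES = [(x, 0) for x in range(10)]                  # candidate tag spots
ROI = [(x, y) for x in range(10) for y in range(1, 4)]    # cells needing coverage
RANGE2 = 3.0 ** 2                                         # tag visibility radius^2

def fitness(genome):
    tags = [c for c, g in zip(CANDIDATES, genome) if g]
    # proxy for FIM-based localizability: count ROI cells seeing >= 2 tags
    covered = sum(
        sum((cx - tx) ** 2 + (cy - ty) ** 2 <= RANGE2 for tx, ty in tags) >= 2
        for cx, cy in ROI)
    return covered - 0.5 * len(tags)                      # installation-cost penalty

def evolve(pop_size=30, gens=60):
    pop = [[random.randint(0, 1) for _ in CANDIDATES] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, len(CANDIDATES))
            child = a[:cut] + b[cut:]                     # one-point crossover
            i = random.randrange(len(child))
            child[i] ^= 1                                 # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print("tags at:", [c for c, g in zip(CANDIDATES, best) if g])
```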

We present Visual Navigation and Locomotion over obstacles (ViNL), which enables a quadrupedal robot to navigate unseen apartments while stepping over small obstacles that lie in its path (e.g., shoes, toys, cables), much as humans and pets lift their feet over objects as they walk. ViNL consists of: (1) a visual navigation policy that outputs linear and angular velocity commands that guide the robot to a goal coordinate in unfamiliar indoor environments; and (2) a visual locomotion policy that controls the robot's joints to avoid stepping on obstacles while following the provided velocity commands. Both policies are entirely "model-free", i.e., sensors-to-actions neural networks trained end-to-end. The two are trained independently in two entirely different simulators and then seamlessly co-deployed by feeding the velocity commands from the navigator to the locomotor, entirely "zero-shot" (without any co-training). While prior works have developed learning methods for visual navigation or visual locomotion, to the best of our knowledge, this is the first fully learned approach that leverages vision to accomplish both (1) intelligent navigation in new environments and (2) intelligent visual locomotion that aims to traverse cluttered environments without disrupting obstacles. On the task of navigating to distant goals in unknown environments, ViNL using just egocentric vision significantly outperforms prior work on robust locomotion using privileged terrain maps (+32.8% success and -4.42 collisions per meter). Additionally, we ablate our locomotion policy to show that each aspect of our approach helps reduce obstacle collisions. Videos and code at //www.joannetruong.com/projects/vinl.html
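The following schematic shows the two-policy composition: a navigator maps egocentric observations to (linear, angular) velocity commands, which are fed zero-shot into a separately trained locomotor that outputs joint targets. Network shapes and observation contents are placeholders, not the paper's architectures.

```python
import torch
import torch.nn as nn

class NavPolicy(nn.Module):
    """Egocentric observations -> (v, omega) velocity command."""
    def __init__(self, obs_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2))
    def forward(self, obs):
        return self.net(obs)

class LocoPolicy(nn.Module):
    """Proprioception + velocity command -> joint targets."""
    def __init__(self, prop_dim=48, n_joints=12):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(prop_dim + 2, 256), nn.ReLU(),
                                 nn.Linear(256, n_joints))
    def forward(self, proprio, cmd):
        return self.net(torch.cat([proprio, cmd], dim=-1))

nav, loco = NavPolicy(), LocoPolicy()
obs, proprio = torch.randn(1, 64), torch.randn(1, 48)
cmd = nav(obs)                       # navigator decides where to go
joint_targets = loco(proprio, cmd)   # locomotor decides how to step, given cmd
print(joint_targets.shape)           # torch.Size([1, 12])
```

The key design point is the narrow two-number interface between the policies, which is what lets them be trained in different simulators and composed without co-training.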

The Butterfly Optimization Algorithm (BOA) is a recent metaheuristic that has been applied to several optimization problems. In this paper, we propose a new version of the algorithm (xBOA) based on the crossover operator and compare its results to the original BOA and three other variants recently introduced in the literature. We also propose a framework for solving the unknown-area exploration problem with energy constraints using metaheuristics, in both single- and multi-robot scenarios. This framework allows us to benchmark the performance of different metaheuristics on the robotic exploration problem. We conducted several experiments to validate the framework and used it to compare the effectiveness of xBOA with well-known metaheuristics from the literature using five evaluation criteria. Although BOA and xBOA are not optimal across all criteria, we found that BOA can be a good alternative to many metaheuristics in terms of exploration time, while xBOA is more robust to local optima, has better fitness convergence, and achieves better exploration rates than the original BOA and its other variants.
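For orientation, here is a compact BOA loop on a toy objective using the standard fragrance form f = c * I^a with global and local search phases; the final uniform-mask crossover with the best butterfly is only an illustrative stand-in for xBOA's operator, which is the paper's contribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):                               # toy objective to minimize
    return np.sum(x ** 2, axis=-1)

def xboa_sketch(n=20, dim=5, iters=200, c=0.01, a=0.1, p=0.8):
    X = rng.uniform(-5, 5, (n, dim))
    for _ in range(iters):
        I = 1.0 / (1.0 + sphere(X))          # stimulus intensity from fitness
        frag = c * I ** a                    # fragrance f = c * I^a
        best = X[np.argmin(sphere(X))]
        for i in range(n):
            r = rng.random()
            if rng.random() < p:             # global search phase
                X[i] += (r * r * best - X[i]) * frag[i]
            else:                            # local search phase
                j, k = rng.integers(0, n, 2)
                X[i] += (r * r * X[j] - X[k]) * frag[i]
        # assumed crossover step: uniformly mix each butterfly with the best
        mask = rng.random((n, dim)) < 0.5
        X = np.where(mask, X, best)
    return X[np.argmin(sphere(X))]

print("best found:", np.round(xboa_sketch(), 3))
```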

Educational robots allow experimenting with a variety of principles from mechanics, electronics, and informatics. Here we propose ClipBot, a low-cost, do-it-yourself robot whose skeleton is made of two paper clips. An Arduino Nano microcontroller actuates two servo motors that move the paper clips. This mechanical configuration, however, imposes physical impairments on movement, which creates both the need and the opportunity to experiment with artificial intelligence methods that overcome hardware limitations. We report our experience using this robot during the study week 'Fascinating Informatics', organized by the Swiss foundation Schweizer Jugend Forscht (www.sjf.ch). High-school students were asked to implement a genetic algorithm to optimize the movements of the robot until it learned to walk. This methodology allowed the robot to learn, in fewer than 20 iterations, the motor actuation scheme yielding straight forward movement.
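A sketch of the students' exercise, in the spirit described above: the genome is a short sequence of servo angle pairs, and a genetic algorithm evolves it. The fitness function here is a made-up surrogate (rewarding large, asymmetric strokes); on the real ClipBot it would be the measured forward displacement per gait cycle.

```python
import random

GENOME_LEN = 8                        # servo angle pairs per gait cycle
ANGLES = range(0, 181, 15)            # allowed servo angles in degrees

def random_genome():
    return [(random.choice(ANGLES), random.choice(ANGLES))
            for _ in range(GENOME_LEN)]

def fitness(genome):
    # surrogate: reward big strokes on servo A, penalize flailing on servo B;
    # on hardware, replace with measured forward displacement
    return sum(abs(a1 - a0) - 0.5 * abs(b1 - b0)
               for (a0, b0), (a1, b1) in zip(genome, genome[1:]))

def evolve(pop_size=20, gens=20):     # on the order of 20 iterations, as in the text
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, GENOME_LEN)
            child = a[:cut] + b[cut:]                     # crossover
            i = random.randrange(GENOME_LEN)
            child[i] = (random.choice(ANGLES), random.choice(ANGLES))  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print("first gait steps:", evolve()[:3])
```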

The utilization of renewable energy technologies, particularly hydrogen, has seen a surge of interest worldwide. Ethanol steam reforming is one of the primary methods capable of producing hydrogen efficiently and reliably. This paper provides an in-depth study of the reforming system, both theoretically and numerically, as well as a plan for exploring the possibility of converting the system into its conservation form. Lastly, we give an overview of several numerical approaches for solving the general first-order quasi-linear hyperbolic equation, applied to the particular model for ethanol steam reforming (ESR). We conclude by presenting results that would enable these ODE/PDE solvers to be used within nonlinear model predictive control (NMPC) algorithms, and we discuss the limitations of our approach and directions for future work.
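As a minimal instance of the equation class involved, the sketch below solves a scalar first-order quasi-linear hyperbolic equation in conservation form, u_t + (u^2/2)_x = 0 (inviscid Burgers), with a first-order upwind flux. The actual ESR model couples several species and energy balances; this only shows the solver pattern an NMPC loop would call repeatedly, and it also illustrates why the conservation form discussed above is attractive.

```python
import numpy as np

def step(u, dx, dt):
    f = 0.5 * u * u                            # flux of u_t + (u^2/2)_x = 0
    return u - dt / dx * (f - np.roll(f, 1))   # upwind, valid since u > 0 here

x = np.linspace(0.0, 1.0, 200, endpoint=False)
u = 1.0 + 0.3 * np.sin(2 * np.pi * x)          # smooth, strictly positive profile
dx = x[1] - x[0]
mass0 = u.mean()                               # conserved quantity (periodic BC)
for _ in range(300):
    dt = 0.4 * dx / np.abs(u).max()            # CFL-limited time step
    u = step(u, dx, dt)
print("mass drift:", abs(u.mean() - mass0))    # ~0: the conservation form pays off
```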

Compared to on-policy policy gradient techniques, off-policy model-free deep reinforcement learning (RL), which reuses previously gathered data, can improve sampling efficiency. However, off-policy learning becomes challenging as the discrepancy grows between the distribution of the policy of interest and those of the policies that collected the data. Although the well-studied importance sampling and off-policy policy gradient techniques compensate for this discrepancy, they usually require collections of long trajectories, which increases computational complexity and induces additional problems such as vanishing/exploding gradients or the discarding of many useful experiences. Moreover, their generalization to continuous action domains is strictly limited, as they require action probabilities and are therefore unsuitable for deterministic policies. To overcome these limitations, we introduce a novel policy similarity measure that mitigates the effects of this discrepancy. Our method offers an adequate single-step off-policy correction without any probability estimates, and our theoretical results show that it can achieve a contraction mapping with a unique fixed point, which allows "safe" off-policy learning. An extensive set of empirical results indicates that our algorithm substantially improves on the state of the art, attaining higher returns in fewer steps than competing methods by efficiently scheduling the learning rate in Q-learning and policy optimization.
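The abstract does not specify the similarity measure, so the following is only our guess at the flavor of a probability-free, single-step correction for deterministic policies: each replayed transition's TD loss is down-weighted by a kernel similarity between the stored behavior action and the current policy's action at the same state. All names and the kernel choice are assumptions, not the paper's method.

```python
import numpy as np

def similarity_weight(stored_action, current_action, bandwidth=0.5):
    # kernel similarity in (0, 1]; needs no action probabilities, so it
    # works for deterministic policies
    d2 = np.sum((stored_action - current_action) ** 2)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def weighted_td_loss(q, r, q_next, done, stored_a, pi_s, gamma=0.99):
    w = similarity_weight(stored_a, pi_s)
    target = r + gamma * (1.0 - done) * q_next
    return w * (q - target) ** 2      # dissimilar (off-policy) transitions count less

# a transition whose behavior action is close to the current policy's output
print(weighted_td_loss(q=4.0, r=1.0, q_next=5.0, done=0.0,
                       stored_a=np.array([0.1]), pi_s=np.array([0.2])))
```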

Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the vector that parameterizes the policy, even when the costs are regularized (which enabled strong convergence guarantees in the policy space in the literature). This non-convergence of parameters leads to stability issues in learning, which becomes especially relevant in the function approximation setting, where we can only operate on low-dimensional parameters, instead of the high-dimensional policy. We then propose variants of the NPG algorithm, for several standard multi-agent learning scenarios: two-player zero-sum matrix and Markov games, and multi-player monotone games, with global last-iterate parameter convergence guarantees. We also generalize the results to certain function approximation settings. Note that in our algorithms, the agents take symmetric roles. Our results might also be of independent interest for solving nonconvex-nonconcave minimax optimization problems with certain structures. Simulations are also provided to corroborate our theoretical findings.
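To illustrate the setting, here is an entropy-regularized NPG (equivalently, multiplicative-weights) simulation for a two-player zero-sum matrix game min_x max_y x^T A y, with both players running the same symmetric update. With regularization tau > 0, the last iterate of the policies converges to the quantal response equilibrium (uniform play, for rock-paper-scissors); note this toy tracks the policies themselves, whereas the paper's contribution concerns convergence of the underlying parameters, which this simulation does not establish.

```python
import numpy as np

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])              # rock-paper-scissors payoffs

def npg_zero_sum(A, tau=0.1, eta=0.1, iters=2000):
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n     # start from uniform policies
    for _ in range(iters):
        gx = A @ y + tau * np.log(x)          # regularized gradient for min-player
        gy = A.T @ x - tau * np.log(y)        # regularized gradient for max-player
        x = x * np.exp(-eta * gx); x /= x.sum()   # symmetric multiplicative updates
        y = y * np.exp(eta * gy);  y /= y.sum()
    return x, y

x, y = npg_zero_sum(A)
print(np.round(x, 3), np.round(y, 3))         # ~uniform: the regularized equilibrium
```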

Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons, and standard RL methods offer too few tools for gaining insight into the exact cause. In this paper, we show how to integrate value decomposition into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process. Value decomposition separates a reward function into distinct components and learns value estimates for each. These value estimates provide insight into an agent's learning and decision-making process and enable new training methods that mitigate common problems. As a demonstration, we introduce SAC-D, a variant of soft actor-critic (SAC) adapted for value decomposition. SAC-D maintains performance similar to SAC while learning a larger set of value predictions. We also introduce decomposition-based tools that exploit this information, including a new reward-influence metric, which measures each reward component's effect on agent decision-making. Using these tools, we provide several demonstrations of decomposition's use in identifying and addressing problems in the design of both environments and agents. Value decomposition is broadly applicable and easy to incorporate into existing algorithms and workflows, making it a powerful tool in an RL practitioner's toolbox.
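A minimal sketch of the decomposition idea: the critic predicts one value per reward component and the policy is trained against their sum, so the component estimates remain individually inspectable. Layer sizes, component names, and the crude variance-share "influence" proxy below are illustrative assumptions, not the SAC-D implementation or its reward influence metric.

```python
import torch
import torch.nn as nn

REWARD_COMPONENTS = ["progress", "energy", "collision"]

class DecomposedCritic(nn.Module):
    def __init__(self, obs_dim=8, act_dim=2, k=len(REWARD_COMPONENTS)):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU())
        self.heads = nn.Linear(64, k)          # one Q-value head per reward component
    def forward(self, obs, act):
        q_parts = self.heads(self.trunk(torch.cat([obs, act], dim=-1)))
        return q_parts, q_parts.sum(dim=-1)    # per-component values and total Q

critic = DecomposedCritic()
obs, act = torch.randn(4, 8), torch.randn(4, 2)
q_parts, q_total = critic(obs, act)            # q_total drives the policy update
# crude influence proxy: each component's share of variation across a batch
influence = q_parts.std(dim=0) / q_parts.std(dim=0).sum()
for name, w in zip(REWARD_COMPONENTS, influence):
    print(f"{name}: {w.item():.2f}")
```

Because only the summation layer is new, the same decomposition drops into most actor-critic implementations without changing the policy's interface.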

This manuscript portrays optimization as a process. In many practical applications, the environment is so complex that laying out a comprehensive theoretical model and applying classical algorithmic theory and mathematical optimization is infeasible. It is both necessary and beneficial to take a robust approach, applying an optimization method that learns as it goes, drawing on experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to spectacular successes in modeling and in systems that are now part of our daily lives.
