In many board games and other abstract games, patterns have been used as features that can guide automated game-playing agents. Such patterns or features often represent particular configurations of pieces, empty positions, etc., which may be relevant for a game's strategies. Their use has been particularly prevalent in the game of Go, but also in many other games used as benchmarks for AI research. In this paper, we formulate a design and efficient implementation of spatial state-action features for general games. These are patterns that can be trained to incentivise or disincentivise actions based on whether or not they match variables of the state in a local area around action variables. We provide extensive details on several design and implementation choices, with a primary focus on achieving a high degree of generality to support a wide variety of different games using different board geometries or other graphs. Secondly, we propose an efficient approach for evaluating active features for any given set of features. In this approach, we take inspiration from heuristics used in problems such as SAT to optimise the order in which parts of patterns are matched and prune unnecessary evaluations. This approach is defined for a highly general and abstract description of the problem -- phrased as optimising the order in which propositions of formulas in disjunctive normal form are evaluated -- and may therefore also be of interest for problems other than board games. An empirical evaluation on 33 distinct games in the Ludii general game system demonstrates the efficiency of this approach in comparison to a naive baseline, as well as a baseline based on prefix trees, and demonstrates that the additional efficiency significantly improves the playing strength of agents using the features to guide search.
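The idea of ordering proposition evaluations and pruning can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each feature is a plain conjunction of named propositions, and it uses a simple greedy heuristic (always test the proposition shared by the most undecided features). The function name `active_features` and the example propositions are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, Set, Tuple


def active_features(
    features: Dict[str, FrozenSet[str]],
    evaluate: Callable[[str], bool],
) -> Tuple[Set[str], int]:
    """Return the features whose propositions all hold, plus the number of
    proposition evaluations performed.

    Assumption: each feature is a conjunction of propositions, so the whole
    feature set behaves like the disjuncts of a DNF formula.
    """
    remaining: Dict[str, Set[str]] = {}
    active: Set[str] = set()
    for name, props in features.items():
        if props:
            remaining[name] = set(props)
        else:
            active.add(name)  # an empty conjunction is trivially true

    evaluations = 0
    while remaining:
        # Greedy heuristic: test the proposition that occurs in the most
        # undecided features (ties broken by name for determinism).
        counts = Counter(p for props in remaining.values() for p in props)
        prop = min(counts, key=lambda p: (-counts[p], repr(p)))
        value = evaluate(prop)
        evaluations += 1

        survivors: Dict[str, Set[str]] = {}
        for name, props in remaining.items():
            if prop not in props:
                survivors[name] = props
            elif value:
                props.discard(prop)
                if props:
                    survivors[name] = props
                else:
                    active.add(name)  # every proposition was satisfied
            # If the proposition is false, the feature is pruned outright.
        remaining = survivors
    return active, evaluations


if __name__ == "__main__":
    truth = {"a": False, "b": True, "c": True, "d": True}
    feats = {
        "f1": frozenset({"a", "b"}),
        "f2": frozenset({"a", "c"}),
        "f3": frozenset({"d"}),
    }
    print(active_features(feats, truth.__getitem__))
```

In this toy example, testing the shared proposition `a` first prunes two features at once, so `b` and `c` are never evaluated.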

Related content

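Automator is software developed by Apple for its Mac OS X system. By clicking and dragging with the mouse, you can combine a series of actions into a workflow that automatically (and repeatably) completes complex tasks. Automator also works across many different kinds of applications, including Finder, the Safari web browser, iCal, Address Book, and others. It can also work with some third-party applications, such as Microsoft Office, Adobe Photoshop, or Pixelmator.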

The successful reconstruction of perceptual experiences from human brain activity has provided insights into the neural representations of sensory experiences. However, reconstructing arbitrary sounds has been avoided due to the complexity of temporal sequences in sounds and the limited resolution of neuroimaging modalities. To overcome these challenges, leveraging the hierarchical nature of brain auditory processing could provide a path toward reconstructing arbitrary sounds. Previous studies have indicated a hierarchical homology between the human auditory system and deep neural network (DNN) models. Furthermore, advancements in audio-generative models make it possible to transform compressed representations back into high-resolution sounds. In this study, we introduce a novel sound reconstruction method that combines brain decoding of auditory features with an audio-generative model. Using fMRI responses to natural sounds, we found that the hierarchical sound features of a DNN model could be better decoded than spectrotemporal features. We then reconstructed the sound using an audio transformer that disentangled compressed temporal information in the decoded DNN features. Our method achieves unconstrained sound reconstruction that captures perceptual content and quality, and it generalizes by reconstructing sound categories not included in the training dataset. Reconstructions from different auditory regions remain similar to actual sounds, highlighting the distributed nature of auditory representations. To see whether the reconstructions mirrored actual subjective perceptual experiences, we performed an experiment involving selective auditory attention to one of the overlapping sounds. The reconstructions tended to resemble the attended sound more than the unattended one. These findings demonstrate that our proposed model provides a means to externalize experienced auditory contents from human brain activity.

Comprehending how the brain interacts with the external world through generated neural signals is crucial for determining its working mechanism, treating brain diseases, and understanding intelligence. Although many theoretical models have been proposed, they have thus far been difficult to integrate and develop. In this study, we were inspired in part by grid cells in creating a more general and robust grid module and constructing an interactive and self-reinforcing cognitive system together with Bayesian reasoning, an approach called space-division and exploration-exploitation with grid-feedback (Grid-SD2E). Here, a grid module can be used as an interaction medium between the outside world and a system, as well as a self-reinforcement medium within the system. The space-division and exploration-exploitation (SD2E) receives the 0/1 signals of a grid through its space-division (SD) module. The system described in this paper is also a theoretical model derived from experiments conducted by other researchers and our experience on neural decoding. Herein, we analyse the rationality of the system based on the existing theories in both neuroscience and cognitive science, and attempt to propose special and general rules to explain the different interactions between people and between people and the external world. What's more, based on this model, the smallest computing unit is extracted, which is analogous to a single neuron in the brain.

We explore sim-to-real transfer of deep reinforcement learning controllers for a heavy vehicle with active suspensions designed for traversing rough terrain. While related research primarily focuses on lightweight robots with electric motors and fast actuation, this study uses a forestry vehicle with a complex hydraulic driveline and slow actuation. We simulate the vehicle using multibody dynamics and apply system identification to find an appropriate set of simulation parameters. We then train policies in simulation using various techniques to mitigate the sim-to-real gap, including domain randomization, action delays, and a reward penalty to encourage smooth control. In reality, the policies trained with action delays and a penalty for erratic actions perform at nearly the same level as in simulation. In experiments on level ground, the motion trajectories closely overlap when turning to either side, as well as in a route tracking scenario. When faced with a ramp that requires active use of the suspensions, the simulated and real motions are in close alignment. This shows that the actuator model together with system identification yields a sufficiently accurate model of the actuators. We observe that policies trained without the additional action penalty exhibit fast switching or bang-bang control. These present smooth motions and high performance in simulation but transfer poorly to reality. We find that policies make marginal use of the local height map for perception, showing no indications of look-ahead planning. However, the strong transfer capabilities entail that further development concerning perception and performance can be largely confined to simulation.
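A reward penalty for erratic actions can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the penalty's form, so the squared change between consecutive actions and the coefficient `penalty_coeff` are hypothetical choices, as is the function name `smoothed_reward`.

```python
from typing import Sequence


def smoothed_reward(
    task_reward: float,
    action: Sequence[float],
    previous_action: Sequence[float],
    penalty_coeff: float = 0.1,
) -> float:
    """Subtract a penalty proportional to the squared change in action.

    Assumption: the penalty is the squared Euclidean distance between the
    current and previous action, which discourages bang-bang switching.
    """
    change = sum((a - b) ** 2 for a, b in zip(action, previous_action))
    return task_reward - penalty_coeff * change


if __name__ == "__main__":
    # A large jump in both actuators reduces the reward.
    print(smoothed_reward(1.0, [0.5, -0.5], [0.0, 0.0]))
    # Repeating the previous action leaves the task reward unchanged.
    print(smoothed_reward(1.0, [0.5, -0.5], [0.5, -0.5]))
```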

In this paper, we study the statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general function approximation. We introduce a new concept called Mean-Field Model-Based Eluder Dimension (MBED), which subsumes a rich family of Mean-Field RL problems. Additionally, we propose algorithms based on Optimistic Maximal Likelihood Estimation, which can return an $\epsilon$-optimal policy for MFC or an $\epsilon$-Nash Equilibrium policy for MFG, with sample complexity polynomial w.r.t. relevant parameters and independent of the number of states, actions and the number of agents. Notably, our results only require a mild assumption of Lipschitz continuity on transition dynamics and avoid strong structural assumptions in previous work. Finally, in the tabular setting, given the access to a generative model, we establish an exponential lower bound for MFC setting, while providing a novel sample-efficient model elimination algorithm to approximate equilibrium in MFG setting. Our results reveal a fundamental separation between RL for single-agent, MFC, and MFG from the sample efficiency perspective.

New technologies for sensing and communication act as enablers for cooperative driving applications. Sensors are able to detect objects in the surrounding environment, and information such as their current location is exchanged among vehicles. In order to cope with the vehicles' mobility, such information is required to be as fresh as possible for proper operation of cooperative driving applications. The age of information (AoI) has been proposed as a metric for evaluating freshness of information, recently also within the context of intelligent transportation systems (ITS). We investigate mechanisms to reduce the AoI of data transported in the form of beacon messages while controlling their emission rate. We aim to balance packet collision probability and beacon frequency using the average peak age of information (PAoI) as a metric. This metric, however, only accounts for the generation time of the data but not for application-specific aspects, such as the location of the transmitting vehicle. We thus propose a new way of interpreting the AoI by considering information context, thereby incorporating vehicles' locations. As an example, we characterize the importance of a neighbor using the orientation and the distance of the involved vehicles. In particular, we introduce a weighting coefficient used in combination with the PAoI to evaluate the information freshness, thus emphasizing information from more important neighbors. We further design the beaconing approach to meet a given AoI requirement, thus saving resources on the wireless channel while keeping the AoI minimal. We illustrate the effectiveness of our approach in Manhattan-like urban scenarios, reaching pre-specified targets for the AoI of beacon messages.
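The average peak age of information can be computed from beacon timestamps as in the sketch below. It assumes beacons arrive in order and without loss, so the age peaks just before each reception. The abstract does not state how the weighting coefficient is combined with the PAoI; the multiplication in `weighted_paoi` is a hypothetical choice made only for illustration.

```python
from typing import List, Sequence


def peak_ages(generated: Sequence[float], received: Sequence[float]) -> List[float]:
    """Age reached just before each reception (from the second beacon on).

    Just before beacon i arrives, the freshest data held by the receiver
    was generated at generated[i - 1], so the peak is
    received[i] - generated[i - 1].
    """
    return [received[i] - generated[i - 1] for i in range(1, len(generated))]


def average_paoi(generated: Sequence[float], received: Sequence[float]) -> float:
    """Mean of the peak ages; requires at least two beacons."""
    peaks = peak_ages(generated, received)
    return sum(peaks) / len(peaks)


def weighted_paoi(
    generated: Sequence[float], received: Sequence[float], weight: float
) -> float:
    """Hypothetical context weighting: scale the average PAoI by a
    per-neighbor importance coefficient (an assumption, not the paper's
    stated formula)."""
    return weight * average_paoi(generated, received)


if __name__ == "__main__":
    gen = [0.0, 1.0, 2.0]   # beacon generation times in seconds
    rec = [0.5, 1.5, 2.5]   # matching reception times in seconds
    print(average_paoi(gen, rec))
    print(weighted_paoi(gen, rec, weight=2.0))
```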

Existing traffic signal control systems rely on oversimplified rule-based methods, and even RL-based methods are often suboptimal and unstable. To address this, we propose a cooperative multi-objective architecture called Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MOMA-DDPG), which estimates multiple reward terms for traffic signal control optimization using age-decaying weights. Our approach involves two types of agents: one focuses on optimizing local traffic at each intersection, while the other aims to optimize global traffic throughput. We evaluate our method using real-world traffic data collected from an Asian country's traffic cameras. Despite the inclusion of a global agent, our solution remains decentralized as this agent is no longer necessary during the inference stage. Our results demonstrate the effectiveness of MOMA-DDPG, outperforming state-of-the-art methods across all performance metrics. Additionally, our proposed system minimizes both waiting time and carbon emissions. Notably, this paper is the first to link carbon emissions and global agents in traffic signal control.

Graph mining tasks arise in many application domains, including social networks, transportation, and e-commerce. They have received great attention from the theoretical and algorithm design communities in recent years, and some pioneering work has used reinforcement learning (RL) techniques to address graph data mining tasks. However, these graph mining algorithms and RL models are dispersed across different research areas, which makes it hard to compare different algorithms with each other. In this survey, we provide a comprehensive overview of RL models and graph mining and generalize these algorithms to Graph Reinforcement Learning (GRL) as a unified formulation. We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source code, and benchmark datasets of GRL methods. Finally, we propose possible important directions and challenges to be solved in the future. This work is an up-to-date comprehensive survey of the GRL literature, providing a global view for researchers as well as a learning resource for researchers outside the domain. In addition, we create an online open-source resource both for interested researchers who want to enter this rapidly developing domain and for experts who would like to compare GRL methods.

A community reveals the features and connections of its members that are different from those in other communities in a network. Detecting communities is of great significance in network analysis. Despite the classical spectral clustering and statistical inference methods, we notice a significant development of deep learning techniques for community detection in recent years with their advantages in handling high dimensional network data. Hence, a comprehensive overview of community detection's latest progress through deep learning is timely to both academics and practitioners. This survey devises and proposes a new taxonomy covering different categories of the state-of-the-art methods, including deep learning-based models upon deep neural networks, deep nonnegative matrix factorization and deep sparse filtering. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders. The survey also summarizes the popular benchmark data sets, model evaluation metrics, and open-source implementations to address experimentation settings. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing deep learning field.

Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.

This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose the use of linear and non-linear methods to develop the MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on two benchmark problems, the two-objective deep sea treasure environment and the three-objective mountain car problem, indicate that the proposed framework is able to converge to the optimal Pareto solutions effectively. The proposed framework is generic, which allows implementation of different deep reinforcement learning algorithms in different complex environments. It therefore overcomes many difficulties involved with standard multi-objective reinforcement learning (MORL) methods existing in the current literature. The framework provides a testbed environment for developing methods to solve various problems associated with current MORL. Details of the framework implementation can be found at //www.deakin.edu.au/~thanhthi/drl.htm.
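A linear single-policy strategy can be illustrated with weighted-sum scalarization of per-objective Q-values. The sketch below is a generic version of that technique, not the framework's implementation; the function name `scalarized_greedy_action` and the example values are hypothetical.

```python
from typing import Sequence


def scalarized_greedy_action(
    q_values: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> int:
    """Pick the action with the highest weighted sum of objective Q-values.

    q_values[a][k] is the estimated return of action a for objective k;
    weights[k] is the preference given to objective k.
    """
    scores = [sum(w * q for w, q in zip(weights, qs)) for qs in q_values]
    # On ties, max() returns the lowest-indexed best action.
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    q = [
        [1.0, 0.0],  # action 0: best for objective 0 only
        [0.0, 1.0],  # action 1: best for objective 1 only
        [0.6, 0.6],  # action 2: a compromise between both
    ]
    print(scalarized_greedy_action(q, [0.5, 0.5]))  # balanced preference
    print(scalarized_greedy_action(q, [1.0, 0.0]))  # only objective 0 matters
```

Changing the weight vector changes which action is chosen, which is why a single set of weights yields a single policy, while sweeping over weights is one way to obtain multiple policies.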
