亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this work we consider a generalization of the well-known multivehicle routing problem: given a network, a set of agents occupying a subset of its nodes, and a set of tasks, we seek a minimum cost sequence of movements subject to the constraint that each task is visited by some agent at least once. The classical version of this problem assumes a central computational server that observes the entire state of the system perfectly and directs individual agents according to a centralized control scheme. In contrast, we assume that there is no centralized server and that each agent is an individual processor with no a priori knowledge of the underlying network (including task and agent locations). Moreover, our agents possess strictly local communication and sensing capabilities (restricted to a fixed radius around their respective locations), aligning more closely with several real-world multiagent applications. These restrictions introduce many challenges that are overcome through local information sharing and direct coordination between agents. We present a fully distributed, online, and scalable reinforcement learning algorithm for this problem whereby agents self-organize into local clusters and independently apply a multiagent rollout scheme locally to each cluster. We demonstrate empirically via extensive simulations that there exists a critical sensing radius beyond which the distributed rollout algorithm begins to improve over a greedy base policy. This critical sensing radius grows proportionally to the $\log^*$ function of the size of the network, and is, therefore, a small constant for any relevant network. Our decentralized reinforcement learning algorithm achieves approximately a factor of two cost improvement over the base policy for a range of radii bounded from below and above by two and three times the critical sensing radius, respectively.

相關內容

In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting a set of agents need to cooperatively train a learning model from streaming data. Differently from federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. In order to overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. In our paper, we analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.

We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents each samples from different variables of a multivariate Gaussian distribution and has different estimation objectives, and we formulate a sensor/agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramer-Rao bound minimization) problem. When the knowledge of correlation between variables is available, we analytically identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection policy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence on the estimate of the parameter of interest. When the knowledge of certain correlation is unavailable but collaboration may still be worthwhile, we propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our distributed parameter estimation problem and demonstrate that the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, are effective through simulations.

The role of cryptocurrencies within the financial systems has been expanding rapidly in recent years among investors and institutions. It is therefore crucial to investigate the phenomena and develop statistical methods able to capture their interrelationships, the links with other global systems, and, at the same time, the serial heterogeneity. For these reasons, this paper introduces hidden Markov regression models for jointly estimating quantiles and expectiles of cryptocurrency returns using regime-switching copulas. The proposed approach allows us to focus on extreme returns and describe their temporal evolution by introducing time-dependent coefficients evolving according to a latent Markov chain. Moreover to model their time-varying dependence structure, we consider elliptical copula functions defined by state-specific parameters. Maximum likelihood estimates are obtained via an Expectation-Maximization algorithm. The empirical analysis investigates the relationship between daily returns of five cryptocurrencies and major world market indices.

Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge of their identities. We approach the problem by formulating it as linearly parameterized contextual Markov decision processes (MDPs), where each task is represented by a context that specifies the transition dynamics and rewards. To tackle this problem, we propose an algorithm called DistMT-LSVI. First, the agents identify the tasks, and then they exchange information through a central server to derive $\epsilon$-optimal policies for the tasks. Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}({d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})}\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards. Notably, DistMT-LSVI improves the sample complexity of non-distributed settings by a factor of $1/N$, as each agent independently learns $\epsilon$-optimal policies for all $M$ tasks using $\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$ episodes. Additionally, we provide numerical experiments conducted on OpenAI Gym Atari environments that validate our theoretical findings.

The heterogeneous, geographically distributed infrastructure of fog computing poses challenges in data replication, data distribution, and data mobility for fog applications. Fog computing is still missing the necessary abstractions to manage application data, and fog application developers need to re-implement data management for every new piece of software. Proposed solutions are limited to certain application domains, such as the IoT, are not flexible in regard to network topology, or do not provide the means for applications to control the movement of their data. In this paper, we present FReD, a data replication middleware for the fog. FReD serves as a building block for configurable fog data distribution and enables low-latency, high-bandwidth, and privacy-sensitive applications. FReD is a common data access interface across heterogeneous infrastructure and network topologies, provides transparent and controllable data distribution, and can be integrated with applications from different domains. To evaluate our approach, we present a prototype implementation of FReD and show the benefits of developing with FReD using three case studies of fog computing applications.

Artificial intelligence has achieved significant success in handling complex tasks in recent years. This success is due to advances in machine learning algorithms and hardware acceleration. In order to obtain more accurate results and solve more complex problems, algorithms must be trained with more data. This huge amount of data could be time-consuming to process and require a great deal of computation. This solution could be achieved by distributing the data and algorithm across several machines, which is known as distributed machine learning. There has been considerable effort put into distributed machine learning algorithms, and different methods have been proposed so far. In this article, we present a comprehensive summary of the current state-of-the-art in the field through the review of these algorithms. We divide this algorithms in classification and clustering (traditional machine learning), deep learning and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years and most of studies worked on this algorithms. As a result, most of the articles we discussed here belong to this category. Based on our investigation of algorithms, we highlight limitations that should be addressed in future research.

The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems based on choice data may only improve short-term engagement, but not the long-term sustainability of the platform and the long-term benefits to its users, content providers, and other stakeholders. In this paper, we thus develop a new framework in which decision makers (e.g., platform operators, regulators, users) can express long-term goals for the behavior of the platform (e.g., fairness, revenue distribution, legal requirements). These goals take the form of exposure or impact targets that go well beyond individual sessions, and we provide new control-based algorithms to achieve these goals. In particular, the controllers are designed to achieve the stated long-term goals with minimum impact on short-term engagement. Beyond the principled theoretical derivation of the controllers, we evaluate the algorithms on both synthetic and real-world data. While all controllers perform well, we find that they provide interesting trade-offs in efficiency, robustness, and the ability to plan ahead.

This paper presents a decentralized algorithm for solving distributed convex optimization problems in dynamic networks with time-varying objectives. The unique feature of the algorithm lies in its ability to accommodate a wide range of communication systems, including previously unsupported ones, by abstractly modeling the information exchange in the network. Specifically, it supports a novel communication protocol based on the "over-the-air" function computation (OTA-C) technology, that is designed for an efficient and truly decentralized implementation of the consensus step of the algorithm. Unlike existing OTA-C protocols, the proposed protocol does not require the knowledge of network graph structure or channel state information, making it particularly suitable for decentralized implementation over ultra-dense wireless networks with time-varying topologies and fading channels. Furthermore, the proposed algorithm synergizes with the "superiorization" methodology, allowing the development of new distributed algorithms with enhanced performance for the intended applications. The theoretical analysis establishes sufficient conditions for almost sure convergence of the algorithm to a common time-invariant solution for all agents, assuming such a solution exists. Our algorithm is applied to a real-world distributed random field estimation problem, showcasing its efficacy in terms of convergence speed, scalability, and spectral efficiency. Furthermore, we present a superiorized version of our algorithm that achieves faster convergence with significantly reduced energy consumption compared to the unsuperiorized algorithm.

Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members being capable of acting without explicit communication between neighbors. In this paper, we propose a novel game theoretic model that enables decentralized and communication-free navigation to a goal position. Robots each play their own distributed game by estimating the behavior of their local teammates in order to identify behaviors that move them in the direction of the goal, while also avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy identified for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.

The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (eg. sentiment classification, span-prediction based question answering or machine translation). However, it builds upon the assumption that the data distribution is stationary, ie. that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then proceed to take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models as demonstrated on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviate catastrophic forgetting issues during adaptation.

北京阿比特科技有限公司