
During the past decades, evolutionary computation (EC) has demonstrated promising potential in solving various complex optimization problems of relatively small scales. Nowadays, however, ongoing developments in modern science and engineering are posing increasingly severe scalability challenges to the conventional EC paradigm. As problem scales increase, on the one hand, the encoding spaces (i.e., the dimensions of the decision vectors) become intrinsically larger; on the other hand, EC algorithms often require growing numbers of function evaluations (and probably larger population sizes as well) to work properly. Meeting such emerging challenges requires not only delicate algorithm designs but, more importantly, a high-performance computing framework. Hence, we develop a distributed GPU-accelerated algorithm library -- EvoX. First, we propose a generalized workflow for implementing general EC algorithms. Second, we design a scalable computing framework for running EC algorithms on distributed GPU devices. Third, we provide user-friendly interfaces to both researchers and practitioners for benchmark studies as well as extended real-world applications. To comprehensively assess the performance of EvoX, we conduct a series of experiments, including: (i) a scalability test via numerical optimization benchmarks with problem dimensions/population sizes up to millions; (ii) an acceleration test via a neuroevolution task with multiple GPU nodes; (iii) an extensibility demonstration via the application to reinforcement learning tasks on the OpenAI Gym. The code of EvoX is available at //github.com/EMI-Group/EvoX.
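
To make the generalized workflow concrete, below is a minimal sketch of the ask-evaluate-tell loop that such a library generalizes, written as a toy NumPy evolution strategy with truncation selection. The class and method names are illustrative assumptions, not the actual EvoX API; the point is that separating `ask` (sampling) from `tell` (update) leaves the fitness evaluation as a pure batch computation that can be vectorized or distributed across GPUs.

```python
import numpy as np

class SimpleES:
    """Toy evolution strategy illustrating the generic ask-evaluate-tell
    workflow; the names here are illustrative, not the EvoX API."""

    def __init__(self, dim, pop_size, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.mean = self.rng.normal(size=dim)
        self.sigma = sigma
        self.pop_size = pop_size

    def ask(self):
        # Sample a population around the current search mean.
        self.pop = self.mean + self.sigma * self.rng.normal(
            size=(self.pop_size, self.mean.size))
        return self.pop

    def tell(self, fitness):
        # Truncation selection: move the mean toward the best half.
        elite = self.pop[np.argsort(fitness)[: self.pop_size // 2]]
        self.mean = elite.mean(axis=0)

def sphere(x):
    return (x ** 2).sum(axis=1)  # vectorized fitness over the whole population

es = SimpleES(dim=1000, pop_size=256)
for _ in range(100):
    pop = es.ask()
    es.tell(sphere(pop))  # the evaluation step is trivially parallelizable
print("best value:", sphere(es.mean[None, :])[0])
```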

Related Content

Classic no-regret online prediction algorithms, including variants of the Upper Confidence Bound ($\texttt{UCB}$) algorithm, $\texttt{Hedge}$, and $\texttt{EXP3}$, are inherently unfair by design. The unfairness stems from their very objective of playing the most rewarding arm as many times as possible while ignoring the less rewarding ones among $N$ arms. In this paper, we consider a fair prediction problem in the stochastic setting with hard lower bounds on the rate of accrual of rewards for a set of arms. We study the problem in both full and bandit feedback settings. Using queueing-theoretic techniques in conjunction with adversarial learning, we propose a new online prediction policy called $\texttt{BanditQ}$ that achieves the target reward rates while incurring a regret and target rate violation penalty of $O(T^{\frac{3}{4}})$. In the full-information setting, the regret bound can be further improved to $O(\sqrt{T})$ when considering the average regret over the entire horizon of length $T$. The proposed policy is efficient and admits a black-box reduction from the fair prediction problem to the standard MAB problem with a carefully defined sequence of rewards. The design and analysis of the $\texttt{BanditQ}$ policy involve a novel use of the potential function method in conjunction with scale-free second-order regret bounds and a new self-bounding inequality for the reward gradients, which are of independent interest.
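
As a rough illustration of the queueing idea, the sketch below maintains a virtual queue per arm that accumulates the deficit between the arm's target rate and the reward it has actually accrued, and inflates a standard UCB index by the queue length so that arms falling behind their targets get played. The constants and update rules are simplified assumptions, not the exact BanditQ policy.

```python
import numpy as np

def queue_weighted_ucb(mu, targets, T, seed=0):
    """Illustrative queue-weighted bandit: virtual queues track how far
    each arm lags its target reward rate, and the policy biases play
    toward arms with large deficits. Not the exact BanditQ algorithm."""
    rng = np.random.default_rng(seed)
    n = len(mu)
    Q = np.zeros(n)          # virtual queues: accumulated rate deficits
    counts = np.ones(n)      # play counts (start at 1 to avoid div-by-zero)
    means = np.zeros(n)      # empirical reward means
    for t in range(1, T + 1):
        # UCB index, inflated by each arm's queue length.
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax((1 + Q) * ucb))
        r = float(rng.random() < mu[arm])   # Bernoulli reward
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        # Queue update: targets accrue each round, service when arm pays off.
        served = np.zeros(n)
        served[arm] = r
        Q = np.maximum(Q + np.asarray(targets) - served, 0.0)
    return counts / T   # empirical play rates

rates = queue_weighted_ucb(mu=[0.9, 0.5, 0.3], targets=[0.0, 0.2, 0.1], T=50_000)
print("play rates:", rates.round(3))
```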

Multi-Access Point Coordination (MAPC) will be a key feature in next generation Wi-Fi 8 networks. MAPC aims to improve the overall network performance by allowing Access Points (APs) to share time, frequency and/or spatial resources in a coordinated way, thus alleviating inter-AP contention and enabling new multi-AP channel access strategies. This paper introduces a framework to support periodic MAPC transmissions on top of current Wi-Fi operation. We first focus on the problem of creating multi-AP groups that can transmit simultaneously to leverage Spatial Reuse opportunities. Then, once these groups are created, we study different scheduling algorithms to determine which groups will transmit at every MAPC transmission. Two different types of algorithms are tested: per-AP and per-Group. While per-AP algorithms base their scheduling decision on the buffer state of individual APs, per-Group algorithms do so based on the aggregate buffer state of all APs in a group. The obtained results -- targeting worst-case delay -- show that per-AP algorithms outperform per-Group ones due to their ability to guarantee that the AP with (a) the most packets, or (b) the oldest waiting packet in its buffer is selected.
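
A minimal sketch of the per-AP rule in variant (b) above, selecting the group that contains the AP whose head-of-line packet has waited the longest, might look as follows. The data structures and timing model are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AP:
    name: str
    packet_arrivals: list = field(default_factory=list)  # arrival timestamps

def pick_group_per_ap(groups, now):
    """Per-AP scheduling: the group containing the AP with the oldest
    head-of-line packet wins the next MAPC transmission opportunity."""
    def oldest_wait(ap):
        return now - ap.packet_arrivals[0] if ap.packet_arrivals else -1.0
    urgent_ap = max((ap for g in groups for ap in g), key=oldest_wait)
    return next(g for g in groups if urgent_ap in g)

ap1, ap2, ap3 = AP("AP1", [2.0]), AP("AP2", [0.5, 1.0]), AP("AP3", [1.5])
groups = [[ap1, ap3], [ap2]]   # spatial-reuse groups transmit simultaneously
chosen = pick_group_per_ap(groups, now=3.0)
print("transmitting group:", [ap.name for ap in chosen])  # AP2 has waited longest
```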

Allocation and planning with a collection of tasks and a group of agents is an important problem in multiagent systems. One commonly faced bottleneck is scalability, as in general the multiagent model grows exponentially in size with the number of agents. We consider the combination of random task assignment and multiagent planning under multiple-objective constraints, and show that this problem can be decentralised into individual agent-task models. We present a point-oriented Pareto computation algorithm, which checks whether a point corresponding to given cost and probability thresholds for our formal problem is feasible or not. If the given point is infeasible, our algorithm finds a Pareto-optimal point which is closest to it. We provide the first multi-objective model checking framework that simultaneously uses GPU and multi-core acceleration. Our framework manages CPU and GPU devices as a load balancing problem for parallel computation. Our experiments demonstrate that parallelisation achieves significant run-time speed-up over sequential computation.
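
The point-oriented check can be sketched as follows for a precomputed set of achievable points: a query of (cost, probability) thresholds is feasible if some achievable point dominates it, and otherwise the closest Pareto point is returned. The dominance convention (both objectives minimized) and the Euclidean closeness measure are simplifying assumptions for illustration.

```python
import numpy as np

def check_point(pareto_points, query):
    """A query (cost, risk) point is feasible iff some achievable point
    dominates it (componentwise <=, both objectives minimized); if not,
    return the closest Pareto-optimal point. Illustrative only."""
    P = np.asarray(pareto_points, dtype=float)
    q = np.asarray(query, dtype=float)
    if np.any(np.all(P <= q, axis=1)):
        return True, q                          # thresholds are achievable
    closest = P[np.argmin(np.linalg.norm(P - q, axis=1))]
    return False, closest                       # nearest achievable trade-off

pareto = [(1.0, 0.30), (2.0, 0.20), (4.0, 0.10)]   # (cost, failure probability)
feasible, point = check_point(pareto, query=(1.5, 0.15))
print(feasible, point)   # infeasible; the closest Pareto point is returned
```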

We propose two market designs for the optimal day-ahead scheduling of energy exchanges within renewable energy communities. The first implements a cooperative demand-side management scheme inside a community whose members' objectives are coupled through grid tariffs, whereas the second additionally allows the valuation of excess generation within the community and on the retail market. Both designs are formulated first as centralized optimization problems and then as non-cooperative games. In the latter case, the existence and efficiency of the corresponding (Generalized) Nash Equilibria are rigorously studied and proven, and distributed implementations of iterative solution algorithms for finding these equilibria are proposed, with proofs of convergence. The models are tested on a use case comprising 55 members with PV generation, storage and flexible appliances, and compared with a benchmark situation where members act individually (i.e., without a community). We compute the global REC costs and individual bills, the inefficiencies of the decentralized models compared to the centralized optima, as well as technical indices such as the self-consumption ratio, self-sufficiency ratio, and peak-to-average ratio.
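
As a toy stand-in for the first (centralized) design, the linear program below schedules a single community battery against day-ahead prices using scipy.optimize.linprog. All numbers and the single-battery simplification are invented for illustration and bear no relation to the paper's 55-member use case.

```python
import numpy as np
from scipy.optimize import linprog

# Toy day-ahead scheduling: one battery shifts a fixed load toward cheap
# hours. Decision variable c[t] is the battery charge (negative = discharge);
# grid import is load + c and must stay nonnegative, so c[t] >= -load[t].
T = 6
price = np.array([0.10, 0.08, 0.25, 0.30, 0.12, 0.28])  # EUR/kWh (made up)
load  = np.array([2.0, 2.0, 3.0, 4.0, 2.0, 3.0])        # kWh per hour
cap, power, s0 = 5.0, 2.0, 2.5   # capacity, power limit, initial state of charge

L = np.tril(np.ones((T, T)))     # cumulative sum of c gives the state of charge
A_ub = np.vstack([L, -L])        # s0 + cumsum(c) <= cap  and  >= 0
b_ub = np.concatenate([np.full(T, cap - s0), np.full(T, s0)])
bounds = [(-min(power, load[t]), power) for t in range(T)]

res = linprog(price, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("charge schedule:", res.x.round(2))
print("cost:", round(float((price * (load + res.x)).sum()), 3), "EUR")
```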

It is commonly assumed that the end-to-end networking performance of edge offloading is dictated purely by the network connectivity between end devices and edge computing facilities, where ongoing innovation in 5G/6G networking can help. However, with the growing complexity of edge-offloaded computation and dynamic load balancing requirements, an offloaded task often goes through a multi-stage pipeline that spans multiple compute nodes and proxies interconnected via a dedicated network fabric within a given edge computing facility. As the latest hardware-accelerated transport technologies such as RDMA and GPUDirect RDMA are adopted to build such network fabrics, a good understanding is needed of the full potential of these technologies in the context of computation offload, and of the effect of factors such as GPU scheduling and the characteristics of the computation on the net performance gain they can achieve. This paper unveils detailed insights into the latency overhead in typical machine learning (ML)-based computation pipelines and analyzes the potential benefits of adopting hardware-accelerated communication. To this end, we build a model-serving framework that supports various communication mechanisms. Using the framework, we identify performance bottlenecks in state-of-the-art model-serving pipelines and show how hardware-accelerated communication can alleviate them. For example, we show that GPUDirect RDMA can save 15--50\% of model-serving latency, which amounts to 70--160 ms.
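
A back-of-the-envelope model shows where such savings come from: end-to-end latency is the sum of per-stage compute and inter-stage transfer times, and GPUDirect RDMA shrinks only the transfer terms by bypassing staged host copies. The stage timings below are invented for illustration, not measurements from the paper.

```python
# Toy pipeline latency model for a multi-stage model-serving pipeline;
# only the structure (compute + inter-stage transfer) follows the paper's
# setting, all numbers are assumptions.
stages_ms = [12.0, 35.0, 20.0]      # per-stage GPU compute time
transfer_tcp_ms = [18.0, 18.0]      # staged host copies between stages
transfer_gdr_ms = [3.0, 3.0]        # GPUDirect RDMA: NIC-to-GPU direct path

def total(compute, transfers):
    return sum(compute) + sum(transfers)

base = total(stages_ms, transfer_tcp_ms)
gdr = total(stages_ms, transfer_gdr_ms)
print(f"baseline: {base:.0f} ms, GPUDirect RDMA: {gdr:.0f} ms, "
      f"saving: {base - gdr:.0f} ms ({100 * (base - gdr) / base:.0f}%)")
```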

Graph neural networks (GNNs) are a class of deep learning models that learn over graphs, and have been successfully applied in many domains. Despite their effectiveness, it is still challenging for GNNs to scale efficiently to large graphs. As a remedy, distributed computing has become a promising solution for training large-scale GNNs, since it can provide abundant computing resources. However, the dependencies induced by the graph structure make high-efficiency distributed GNN training difficult to achieve, as training suffers from massive communication overhead and workload imbalance. In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed. Yet, there is a lack of a systematic review of the optimization techniques spanning graph processing to distributed execution. In this survey, we analyze three major challenges in distributed GNN training: massive feature communication, loss of model accuracy, and workload imbalance. We then introduce a new taxonomy of the optimization techniques that address these challenges, classifying existing techniques into four categories: GNN data partition, GNN batch generation, GNN execution model, and GNN communication protocol. We carefully discuss the techniques in each category. Finally, we summarize existing distributed GNN systems for multi-GPU, GPU-cluster and CPU-cluster settings, respectively, and discuss future directions for scalable GNNs.
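
To see why the GNN data partition category matters, the sketch below compares the edge cut, i.e., the fraction of edges whose endpoints land on different workers and therefore force feature communication, of a random partition versus a locality-preserving block partition. Both the synthetic graph and the two partitioners are toy assumptions, not techniques from the survey.

```python
import numpy as np

def edge_cut(edges, assign):
    """Fraction of edges crossing workers: each cut edge forces feature
    communication during distributed GNN training."""
    u, v = edges[:, 0], edges[:, 1]
    return float(np.mean(assign[u] != assign[v]))

rng = np.random.default_rng(0)
n, workers = 10_000, 4
# Synthetic graph with locality: edges connect nearby node ids.
u = rng.integers(0, n - 50, size=50_000)
edges = np.stack([u, u + rng.integers(1, 50, size=50_000)], axis=1)

random_assign = rng.integers(0, workers, size=n)   # hash-style partition
block_assign = np.arange(n) * workers // n         # contiguous blocks
print("random partition cut:", edge_cut(edges, random_assign))  # ~0.75
print("block partition cut: ", edge_cut(edges, block_assign))   # ~0.01
```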

Graph machine learning has been extensively studied in both academia and industry. However, as the literature on graph learning booms with a vast number of emerging methods and techniques, it becomes increasingly difficult to manually design the optimal machine learning algorithm for different graph-related tasks. To tackle this challenge, automated graph machine learning, which aims at discovering the best hyper-parameter and neural architecture configurations for different graph tasks/data without manual design, is gaining increasing attention from the research community. In this paper, we extensively discuss automated graph machine learning approaches, covering hyper-parameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. We briefly overview existing libraries designed for either graph machine learning or automated machine learning, and then introduce in depth AutoGL, our dedicated library and the world's first open-source library for automated graph machine learning. Last but not least, we share our insights on future research directions for automated graph machine learning. This paper is the first systematic and comprehensive discussion of approaches, libraries, and directions for automated graph machine learning.
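
A bare-bones HPO loop of the kind such libraries automate could look like the following random search over a GNN hyper-parameter space. Here `train_and_eval` is a hypothetical stand-in for real model training, and nothing in this sketch reflects the actual AutoGL API.

```python
import random

# Hypothetical GNN hyper-parameter search space.
search_space = {
    "hidden_dim": [64, 128, 256],
    "num_layers": [2, 3, 4],
    "lr": [1e-2, 5e-3, 1e-3],
    "dropout": [0.0, 0.3, 0.5],
}

def train_and_eval(config):
    # Stand-in: deterministically fake a validation accuracy per config;
    # a real HPO loop would train a GNN here.
    return random.Random(str(sorted(config.items()))).uniform(0.6, 0.9)

best_cfg, best_acc = None, -1.0
for trial in range(20):
    cfg = {k: random.choice(v) for k, v in search_space.items()}
    acc = train_and_eval(cfg)
    if acc > best_acc:
        best_cfg, best_acc = cfg, acc
print("best config:", best_cfg, "val acc:", round(best_acc, 3))
```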

Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. In real scenarios, however, this assumption can hardly be satisfied, causing a sharp drop in the performance of classic machine learning algorithms under distributional shifts and underscoring the importance of investigating the Out-of-Distribution (OOD) generalization problem, which addresses the challenging setting where the testing distribution is unknown and differs from the training distribution. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from its definition and methodology to evaluation, implications, and future directions. Firstly, we provide a formal definition of the OOD generalization problem. Secondly, existing methods are categorized into three parts based on their positions in the whole learning pipeline, namely unsupervised representation learning, supervised model learning, and optimization, and typical methods for each category are discussed in detail. We then demonstrate the theoretical connections between the different categories, and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the literature and raise some future directions for the OOD generalization problem. A summary of the OOD generalization methods reviewed in this survey can be found at //out-of-distribution-generalization.com.

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.
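
The core mechanism, sequencing tasks so that each one warm-starts the next, can be caricatured in a few lines. The `train` function and the difficulty model below are deliberately toy assumptions, meant only to show why an easy-to-hard ordering can succeed where attacking the hardest task from scratch fails.

```python
def curriculum_learning(tasks, train, policy):
    """Order tasks easy-to-hard; the policy learned on each task
    warm-starts the next (transfer via the shared policy)."""
    for task in sorted(tasks, key=lambda t: t["difficulty"]):
        policy = train(policy, task)
    return policy

# Toy stand-ins: the 'policy' is a skill level, and 'training' raises it
# only when the previous task brought the skill close enough.
def train(skill, task):
    if task["difficulty"] <= skill + 1:   # reachable from current skill
        return max(skill, task["difficulty"])
    return skill                          # too hard: no learning progress

tasks = [{"difficulty": d} for d in (3, 1, 2, 4)]
print("with curriculum:", curriculum_learning(tasks, train, policy=0))  # 4
print("direct to hardest:", train(0, {"difficulty": 4}))                # 0
```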

The demand for artificial intelligence has grown significantly over the last decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the data required for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the growth in the computational power of computing machinery, there is a need to distribute the machine learning workload across multiple machines, turning centralized systems into distributed ones. These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state of the art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
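
The most common parallelization strategy covered in such overviews, data parallelism, can be sketched as follows: each worker computes a gradient on its own data shard and the gradients are averaged (conceptually an all-reduce) so that every replica applies the identical update, which is what keeps the model coherent. The linear-regression objective is a toy choice to keep the sketch self-contained.

```python
import numpy as np

# Synthetic regression problem split across workers.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.01 * rng.normal(size=1024)

def grad(w, Xs, ys):
    # Local gradient of the mean squared error on one worker's shard.
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

workers = 4
shards = list(zip(np.array_split(X, workers), np.array_split(y, workers)))
w = np.zeros(8)
for step in range(200):
    local_grads = [grad(w, Xs, ys) for Xs, ys in shards]  # parallel in practice
    g = np.mean(local_grads, axis=0)                      # all-reduce average
    w -= 0.05 * g                                         # identical update everywhere
print("parameter error:", np.linalg.norm(w - true_w))
```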
