亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study a patrolling game played on a network $Q$, considered as a metric space. The Attacker chooses a point of $Q$ (not necessarily a node) to attack during a chosen time interval of fixed duration. The Patroller chooses a unit speed path on $Q$ and intercepts the attack (and wins) if she visits the attacked point during the attack time interval. This zero-sum game models the problem of protecting roads or pipelines from an adversarial attack. The payoff to the maximizing Patroller is the probability that the attack is intercepted. Our results include the following: (i) a solution to the game for any network $Q$, as long as the time required to carry out the attack is sufficiently short, (ii) a solution to the game for all tree networks that satisfy a certain condition on their extremities, and (iii) a solution to the game for any attack duration for stars with one long arc and the remaining arcs equal in length. We present a conjecture on the solution of the game for arbitrary trees and establish it in certain cases.

相關內容

讓 iOS 8 和 OS X Yosemite 無縫切換的一個新特性。 > Apple products have always been designed to work together beautifully. But now they may really surprise you. With iOS 8 and OS X Yosemite, you’ll be able to do more wonderful things than ever before.

Source:

Due to its high parallelism, belief propagation (BP) decoding can be implemented with high throughput and is a promising solution to meet the ultra-high peak date rate requirement of future communication systems. However, for polar codes, the error-correcting performance of BP decoding is far inferior to that of widely used CRC-aided successive cancellation list (SCL) decoding algorithm. To close the performance gap to SCL, BP list (BPL) decoding expands the exploration of candidate codewords through multiple permuted factor graphs (PFGs). From an implementation perspective, designing a unified and flexible hardware architecture of BPL decoding that supports different PFGs and various code configurations is challenging. In this paper, we propose the first hardware implementation of a BPL decoder for polar codes and overcome the implementation challenge by applying a hardware-friendly algorithm that generates flexible permutations on the fly. First, we derive the permutation selection gain and provide a sequential generation (SG) algorithm to obtain a near-optimal PFG set. We further prove that any permutation can be decomposed into a combination of multiple fixed routings, and we design a low-complexity permutation network to satisfy the decoding schedule. Our BPL decoder not only has a low decoding latency by executing the decoding and permutation generation in parallel, but also supports an arbitrary list size without any area overhead. Experimental results show that, for length-${1024}$ polar codes with a code of one-half, our BPL decoder with a list size ${\mathbb{L}=32}$ has a similar error-correcting performance to SCL with ${\mathbb{L}=4}$ and achieves a throughput of ${25.63}$ Gbps and an area efficiency of ${29.46}$ Gbps/mm${^2}$ at SNR ${=4.0}$ dB, which is $1.99\times$ and $7.08\times$ faster than the state-of-the-art BP flip and SCL decoders, respectively.

Compared to on-policy policy gradient techniques, off-policy model-free deep reinforcement learning (RL) that uses previously gathered data can improve sampling efficiency. However, off-policy learning becomes challenging when the discrepancy between the distributions of the policy of interest and the policies that collected the data increases. Although the well-studied importance sampling and off-policy policy gradient techniques were proposed to compensate for this discrepancy, they usually require a collection of long trajectories that increases the computational complexity and induce additional problems such as vanishing/exploding gradients or discarding many useful experiences. Moreover, their generalization to continuous action domains is strictly limited as they require action probabilities, which is unsuitable for deterministic policies. To overcome these limitations, we introduce a novel policy similarity measure to mitigate the effects of such discrepancy. Our method offers an adequate single-step off-policy correction without any probability estimates, and theoretical results show that it can achieve a contraction mapping with a fixed unique point, which allows "safe" off-policy learning. An extensive set of empirical results indicate that our algorithm substantially improves the state-of-the-art and attains higher returns in fewer steps than the competing methods by efficiently scheduling the learning rate in Q-learning and policy optimization.

Deep Reinforcement Learning (RL) is mainly studied in a setting where the training and the testing environments are similar. But in many practical applications, these environments may differ. For instance, in control systems, the robot(s) on which a policy is learned might differ from the robot(s) on which a policy will run. It can be caused by different internal factors (e.g., calibration issues, system attrition, defective modules) or also by external changes (e.g., weather conditions). There is a need to develop RL methods that generalize well to variations of the training conditions. In this article, we consider the simplest yet hard to tackle generalization setting where the test environment is unknown at train time, forcing the agent to adapt to the system's new dynamics. This online adaptation process can be computationally expensive (e.g., fine-tuning) and cannot rely on meta-RL techniques since there is just a single train environment. To do so, we propose an approach where we learn a subspace of policies within the parameter space. This subspace contains an infinite number of policies that are trained to solve the training environment while having different parameter values. As a consequence, two policies in that subspace process information differently and exhibit different behaviors when facing variations of the train environment. Our experiments carried out over a large variety of benchmarks compare our approach with baselines, including diversity-based methods. In comparison, our approach is simple to tune, does not need any extra component (e.g., discriminator) and learns policies able to gather a high reward on unseen environments.

While there has been substantial success in applying actor-critic methods to continuous control, simpler critic-only methods such as Q-learning often remain intractable in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilization, compute requirements as well as wider hyperparameter search spaces. We show that these issues can be largely alleviated via Q-learning by combining action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL). With bang-bang actions, performance of this critic-only approach matches state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a wide variety of continuous control tasks.

A recent work of Larsen [Lar23] gave a faster combinatorial alternative to Bansal's SDP algorithm for finding a coloring $x\in\{-1,1\}^n$ that approximately minimizes the discrepancy $\mathrm{disc}(A,x) : = \| A x \|_{\infty}$ of a general real-valued $m\times n$ matrix $A$. Larsen's algorithm runs in $\widetilde{O}(mn^2)$ time compared to Bansal's $\widetilde{O}(mn^{4.5})$-time algorithm, at the price of a slightly weaker logarithmic approximation ratio in terms of the hereditary discrepancy of $A$ [Ban10]. In this work we present a combinatorial $\widetilde{O}(\mathrm{nnz}(A) + n^3)$ time algorithm with the same approximation guarantee as Larsen, which is optimal for tall matrices $m=\mathrm{poly}(n)$. Using a more intricate analysis and fast matrix-multiplication, we achieve $\widetilde{O}(\mathrm{nnz}(A) + n^{2.53})$ time, which breaks cubic runtime for square matrices, and bypasses the barrier of linear-programming approaches [ES14] for which input-sparsity time is currently out of reach. Our algorithm relies on two main ideas: (i) A new sketching technique for finding a projection matrix with short $\ell_2$-basis using implicit leverage-score sampling; (ii) A data structure for faster implementation of the iterative Edge-Walk partial-coloring algorithm of Lovett-Meka, using an alternative analysis that enables ``lazy" batch-updates with low-rank corrections. Our result nearly closes the computational gap between real-valued and binary matrices (set-systems), for which input-sparsity time coloring was very recently obtained [JSS23].

In this paper, we propose new geometrically unfitted space-time Finite Element methods for partial differential equations posed on moving domains of higher order accuracy in space and time. As a model problem, the convection-diffusion problem on a moving domain is studied. For geometrically higher order accuracy, we apply a parametric mapping on a background space-time tensor-product mesh. Concerning discretisation in time, we consider discontinuous Galerkin, as well as related continuous (Petrov-)Galerkin and Galerkin collocation methods. For stabilisation with respect to bad cut configurations and as an extension mechanism that is required for the latter two schemes, a ghost penalty stabilisation is employed. The article puts an emphasis on the techniques that allow to achieve a robust but higher order geometry handling for smooth domains. We investigate the computational properties of the respective methods in a series of numerical experiments. These include studies in different dimensions for different polynomial degrees in space and time, validating the higher order accuracy in both variables.

This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.

The acoustic inverse obstacle scattering problem consists of determining the shape of a domain from measurements of the scattered far field due to some set of incident fields (probes). For a penetrable object with known sound speed, this can be accomplished by treating the boundary alone as an unknown curve. Alternatively, one can treat the entire object as unknown and use a more general volumetric representation, without making use of the known sound speed. Both lead to strongly nonlinear and nonconvex optimization problems for which recursive linearization provides a useful framework for numerical analysis. After extending our shape optimization approach developed earlier for impenetrable bodies, we carry out a systematic study of both methods and compare their performance on a variety of examples. Our findings indicate that the volumetric approach is more robust, even though the number of degrees of freedom is significantly larger. We conclude with a discussion of this phenomenon and potential directions for further research.

Neural networks have shown tremendous growth in recent years to solve numerous problems. Various types of neural networks have been introduced to deal with different types of problems. However, the main goal of any neural network is to transform the non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and nonlinear functions. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish and Mish. In this paper, a comprehensive overview and survey is presented for AFs in neural networks for deep learning. Different classes of AFs such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and Learning based are covered. Several characteristics of AFs such as output range, monotonicity, and smoothness are also pointed out. A performance comparison is also performed among 18 state-of-the-art AFs with different networks on different types of data. The insights of AFs are presented to benefit the researchers for doing further research and practitioners to select among different choices. The code used for experimental comparison is released at: \url{//github.com/shivram1987/ActivationFunctions}.

Bid optimization for online advertising from single advertiser's perspective has been thoroughly investigated in both academic research and industrial practice. However, existing work typically assume competitors do not change their bids, i.e., the wining price is fixed, leading to poor performance of the derived solution. Although a few studies use multi-agent reinforcement learning to set up a cooperative game, they still suffer the following drawbacks: (1) They fail to avoid collusion solutions where all the advertisers involved in an auction collude to bid an extremely low price on purpose. (2) Previous works cannot well handle the underlying complex bidding environment, leading to poor model convergence. This problem could be amplified when handling multiple objectives of advertisers which are practical demands but not considered by previous work. In this paper, we propose a novel multi-objective cooperative bid optimization formulation called Multi-Agent Cooperative bidding Games (MACG). MACG sets up a carefully designed multi-objective optimization framework where different objectives of advertisers are incorporated. A global objective to maximize the overall profit of all advertisements is added in order to encourage better cooperation and also to protect self-bidding advertisers. To avoid collusion, we also introduce an extra platform revenue constraint. We analyze the optimal functional form of the bidding formula theoretically and design a policy network accordingly to generate auction-level bids. Then we design an efficient multi-agent evolutionary strategy for model optimization. Offline experiments and online A/B tests conducted on the Taobao platform indicate both single advertiser's objective and global profit have been significantly improved compared to state-of-art methods.

北京阿比特科技有限公司