
The exponential increase of wireless devices with highly demanding services such as streaming video, gaming and others has imposed several challenges on Wireless Local Area Networks (WLANs). In the context of Wi-Fi, IEEE 802.11ax brings high data rates in dense user deployments. Additionally, it introduces new flexible physical-layer features such as a dynamic Clear Channel Assessment (CCA) threshold, with the goal of improving spatial reuse (SR) in response to radio spectrum scarcity in dense scenarios. In this paper, we formulate the Transmission Power (TP) and CCA configuration problem with the objective of maximizing fairness and minimizing station starvation. We present four main contributions to distributed SR optimization using Multi-Agent Multi-Armed Bandits (MAMABs). First, we propose to reduce the action space, given the large cardinality of the combinations of TP and CCA threshold values per Access Point (AP). Second, we present two deep Multi-Agent Contextual MABs (MA-CMABs), named Sample Average Uncertainty (SAU)-Coop and SAU-NonCoop, as cooperative and non-cooperative versions to improve SR. In addition, we analyze whether cooperation is beneficial using MAMAB solutions based on the ε-greedy, Upper Confidence Bound (UCB) and Thompson sampling techniques. Finally, we propose a deep reinforcement transfer learning technique to improve adaptability in dynamic environments. Simulation results show that cooperation via the SAU-Coop algorithm improves cumulative throughput by 14.7% and packet loss ratio (PLR) by 32.5% when compared with non-cooperative approaches. Finally, under dynamic scenarios, transfer learning mitigates service drops for at least 60% of users.
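
A minimal sketch of how one ε-greedy bandit agent per AP could select a joint (TP, CCA) action from a reduced, discretized action space. The discretization levels, the reward signal, and the agent interface below are illustrative assumptions, not the SAU formulation from the paper.

```python
import random
from collections import defaultdict

# Hypothetical reduced action space: a few TP (dBm) and CCA threshold (dBm) levels per AP.
TP_LEVELS = [5, 10, 15, 20]
CCA_LEVELS = [-82, -72, -62]
ACTIONS = [(tp, cca) for tp in TP_LEVELS for cca in CCA_LEVELS]

class EpsilonGreedyAgent:
    """One independent bandit agent per AP over the joint (TP, CCA) action space."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # times each action was played
        self.values = defaultdict(float)  # running mean reward per action

    def select_action(self):
        if random.random() < self.epsilon:                     # explore
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.values[a])      # exploit

    def update(self, action, reward):
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Usage sketch: the reward could be a fairness/throughput metric reported by a simulator.
agent = EpsilonGreedyAgent()
action = agent.select_action()
agent.update(action, reward=0.8)  # placeholder reward value
```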

Related Content

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments. The approach, continuing PSRL, maintains a statistically plausible model of the environment and follows a policy that maximizes expected $\gamma$-discounted return in that model. At each time, with probability $1-\gamma$, the model is replaced by a sample from the posterior distribution over environments. For a choice of discount factor that suitably depends on the horizon $T$, we establish an $\tilde{O}(\tau S \sqrt{A T})$ bound on the Bayesian regret, where $S$ is the number of environment states, $A$ is the number of actions, and $\tau$ denotes the reward averaging time, which is a bound on the duration required to accurately estimate the average reward of any policy. Our work is the first to formalize and rigorously analyze the resampling approach with randomized exploration.
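
A rough sketch of the resampling scheme described above; `env`, `posterior` and `solve_discounted` are hypothetical stand-ins for the environment interface, the agent's posterior model and a discounted planner, so this only conveys the structure of continuing PSRL rather than a full implementation.

```python
import numpy as np

def continuing_psrl(env, posterior, gamma, total_steps, rng=None):
    """Structural sketch of continuing PSRL: act greedily with respect to a sampled
    model, and with probability (1 - gamma) per step replace the model by a fresh
    sample from the posterior. `posterior.sample()`, `posterior.update(...)` and
    `solve_discounted` are hypothetical interfaces, not part of any library."""
    rng = rng or np.random.default_rng()
    model = posterior.sample()               # statistically plausible environment
    policy = solve_discounted(model, gamma)  # gamma-discounted optimal policy in that model
    state = env.reset()
    for _ in range(total_steps):
        action = policy(state)
        state_next, reward = env.step(action)
        posterior.update(state, action, reward, state_next)
        if rng.random() < 1.0 - gamma:       # geometric "episodes" in a continuing task
            model = posterior.sample()
            policy = solve_discounted(model, gamma)
        state = state_next
```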

Modern mobile applications such as navigation services and ride-sharing platforms rely heavily on geospatial technologies, most critically predictions of the time required for a vehicle to traverse a particular route, or the so-called estimated time of arrival (ETA). There are various methods used in practice, which differ in terms of the geographic granularity at which the predictive model is trained -- e.g., segment-based methods predict travel time at the level of road segments (or a combination of several adjacent road segments) and then aggregate across the route, whereas route-based methods use generic information about the trip, such as origin and destination, to predict travel time. Though various forms of these methods have been developed, there has been no rigorous theoretical comparison regarding their accuracies, and empirical studies have, in many cases, drawn opposite conclusions. We provide the first theoretical analysis of the predictive accuracy of various ETA prediction methods. In a finite-sample setting, we give mild conditions under which a segment-based method is more accurate than a wide variety of route-based methods. Then we analyze an asymptotic setting in which the number of trip observations grows with the size of the road network. Under a broad range of trip-generating processes on a grid network, we show that a class of very simple segment-based methods is at least as good, up to a logarithmic factor, as any possible predictor. In other words, segment-based methods are asymptotically optimal up to a logarithmic factor. Our work highlights that the accuracy of ETA prediction is driven not just by the sophistication of the model but also by the spatial granularity at which those methods are applied.
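
A toy sketch contrasting the two families of predictors; the data layout, the equal-split segment estimate and the origin-destination averaging are illustrative assumptions, not the estimators analyzed in the paper.

```python
from collections import defaultdict
from statistics import mean

# Toy trips: (list of segment ids, origin-destination pair, observed travel time in minutes).
trips = [
    (["s1", "s2"], ("A", "B"), 10.0),
    (["s2", "s3"], ("B", "C"), 12.0),
    (["s1", "s2", "s3"], ("A", "C"), 21.0),
]

# Segment-based: estimate each segment's time (here by splitting trip time equally,
# a crude stand-in for a per-segment model), then sum the estimates along the route.
seg_obs = defaultdict(list)
for segments, _, t in trips:
    for s in segments:
        seg_obs[s].append(t / len(segments))
seg_time = {s: mean(v) for s, v in seg_obs.items()}

def eta_segment_based(route):
    return sum(seg_time[s] for s in route)

# Route-based: average observed times of trips that share the same origin-destination pair.
od_obs = defaultdict(list)
for _, od, t in trips:
    od_obs[od].append(t)

def eta_route_based(od):
    return mean(od_obs[od])

print(eta_segment_based(["s1", "s2", "s3"]), eta_route_based(("A", "C")))
```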

In this paper, we study dimension reduction techniques for large-scale controlled stochastic differential equations (SDEs). The drift of the considered SDEs contains a polynomial term satisfying a one-sided growth condition. Such nonlinearities in high-dimensional settings occur, e.g., when stochastic reaction-diffusion equations are discretized in space. We provide a brief discussion of existence, uniqueness and stability of solutions. (Almost) stability is then the basis for the new concepts of Gramians that we introduce and study in this work. With the help of these Gramians, dominant subspaces are identified, leading to a balancing-related, highly accurate reduced-order SDE. We provide an algebraic error criterion and an error analysis of the proposed model reduction schemes. The paper is concluded by applying our method to spatially discretized reaction-diffusion equations.
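
For orientation, a minimal sketch of classical balancing-based reduction in the linear deterministic case: Gramians from Lyapunov equations, square-root balancing, then projection onto the dominant subspace. The paper's stochastic Gramians and error criterion are more involved, so this only conveys the structural idea.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, svd, cholesky

def balanced_truncation(A, B, C, r):
    """Reduce dx = Ax dt + Bu dt, y = Cx to order r via square-root balanced truncation."""
    # Gramians: A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0.
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # Square-root balancing: SVD of the product of Cholesky factors.
    L_P, L_Q = cholesky(P, lower=True), cholesky(Q, lower=True)
    U, s, Vt = svd(L_Q.T @ L_P)
    S_r = np.diag(s[:r] ** -0.5)
    T = L_P @ Vt[:r].T @ S_r   # right projection
    W = L_Q @ U[:, :r] @ S_r   # left projection, with W^T T = I
    return W.T @ A @ T, W.T @ B, C @ T

# Toy usage on a stable random system; full-rank B and C keep the Gramians well conditioned.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = -(M @ M.T) - np.eye(20)
B, C = rng.standard_normal((20, 20)), rng.standard_normal((20, 20))
Ar, Br, Cr = balanced_truncation(A, B, C, r=4)
```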

In a temporal graph, each edge is available at specific points in time. Such an availability point is often represented by a ''temporal edge'' that can be traversed from its tail only at a specific departure time, arriving at its head after a specific travel time. In such a graph, the connectivity from one node to another is naturally captured by the existence of a temporal path where temporal edges can be traversed one after the other. When imposing constraints on how long it is possible to wait at a node between two temporal edges, it becomes interesting to consider temporal walks, where the same node may be visited several times, possibly at different times. We study the complexity of computing minimum-cost temporal walks from a single source under waiting-time constraints in a temporal graph, and ask under which conditions this problem can be solved in linear time. Our main result is a linear-time algorithm when the input temporal graph is given by its (classical) space-time representation. We use an algebraic framework for manipulating abstract costs, enabling the optimization of a large variety of criteria, or even combinations of them. This allows us to improve previous results for several criteria, such as the number of edges or the overall waiting time, even without waiting constraints, and it saves a logarithmic factor for all criteria under waiting constraints. Interestingly, we show that a logarithmic factor in the time complexity appears to be necessary with a more basic input consisting of a single ordered list of temporal edges (sorted either by arrival times or by departure times). We indeed show equivalence between the space-time representation and a representation with two ordered lists.
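
As a simple point of reference (not the paper's algebraic, waiting-constrained algorithm), single-source earliest arrival times can be computed in one pass over temporal edges sorted by departure time; the 4-tuple edge format below is an assumption.

```python
import math

def earliest_arrival(temporal_edges, source, start_time=0):
    """Single-source earliest-arrival times over temporal edges
    (tail, head, departure_time, travel_time), assumed sorted by departure_time
    and with positive travel times. Waiting at nodes is unrestricted here, so this
    ignores the waiting-time constraints and cost algebra treated in the paper."""
    arrival = {source: start_time}
    for tail, head, dep, travel in temporal_edges:
        # The edge is usable if we can already be at its tail by its departure time.
        if arrival.get(tail, math.inf) <= dep:
            arrival[head] = min(arrival.get(head, math.inf), dep + travel)
    return arrival

edges = [("a", "b", 1, 2), ("a", "c", 2, 5), ("b", "c", 3, 1)]  # sorted by departure time
print(earliest_arrival(edges, "a"))  # {'a': 0, 'b': 3, 'c': 4}
```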

Prophet inequalities consist of many beautiful statements that establish tight performance ratios between online and offline allocation algorithms. Typically, tightness is established by constructing an algorithmic guarantee and a worst-case instance separately, whose bounds match as a result of some "ingenuity". In this paper, we instead formulate the construction of the worst-case instance as an optimization problem, which directly finds the tight ratio without needing to construct the two bounds separately. Our analysis of this complex optimization problem involves identifying structure in a new "Type Coverage" dual problem. It can be seen as akin to the celebrated Magician and OCRS problems, except more general in that it can also provide tight ratios relative to the optimal offline allocation, whereas the earlier problems only concern the ex-ante relaxation of the offline problem. Through this analysis, our paper provides a unified framework that derives new prophet inequalities and recovers existing ones, including two important new results. First, we show that the "oblivious" method of setting a static threshold due to Chawla et al. (2020) is, surprisingly, best possible among all static threshold algorithms, for any number $k$ of units. We emphasize that this result is derived without needing to explicitly find any counterexample instances. Our result implies that the asymptotic convergence rate of $1-O(\sqrt{\log k/k})$ for static threshold algorithms, first established in Hajiaghayi et al. (2007), is tight; this confirms for the first time a separation from the convergence rate of adaptive algorithms, which is $1-\Theta(\sqrt{1/k})$ due to Alaei (2014). Second, turning to the IID setting, our framework allows us to numerically illustrate the tight guarantee (of adaptive algorithms) under any number $k$ of starting units. Our guarantees for $k>1$ exceed the state of the art.
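
To illustrate what a static threshold algorithm is in this setting, here is a small simulation sketch for $k$ units with IID values; the exponential value distribution and the quantile-based threshold are illustrative assumptions, not the tight construction from the paper.

```python
import numpy as np

def static_threshold_run(values, threshold, k):
    """Accept arriving values above a fixed threshold until the k units run out;
    return the online value collected and the offline optimum (sum of the top k)."""
    online, units = 0.0, k
    for v in values:
        if units > 0 and v >= threshold:
            online += v
            units -= 1
    return online, np.sort(values)[-k:].sum()

rng = np.random.default_rng(1)
n, k = 200, 10
# Illustrative static threshold: the (1 - k/n) quantile of the value distribution.
threshold = np.quantile(rng.exponential(size=100_000), 1 - k / n)
runs = np.array([static_threshold_run(rng.exponential(size=n), threshold, k)
                 for _ in range(2000)])
print("empirical online/offline ratio:", runs[:, 0].sum() / runs[:, 1].sum())
```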

Cooperative sensing and heterogeneous information fusion are critical to realizing vehicular cyber-physical systems (VCPSs). This paper makes the first attempt to quantitatively measure the quality of VCPS by designing a new metric called Age of View (AoV). Specifically, we first present the system architecture, where heterogeneous information can be cooperatively sensed and uploaded via vehicle-to-infrastructure (V2I) communications in vehicular edge computing (VEC). Logical views are constructed by fusing the heterogeneous information at edge nodes. Further, we formulate the problem by deriving a cooperative sensing model based on the multi-class M/G/1 priority queue and defining the AoV by modeling the timeliness, completeness and consistency of the logical views. On this basis, a multi-agent deep reinforcement learning solution is proposed. In particular, the system state includes the vehicle-sensed information, the edge-cached information and the view requirements. The vehicle action space consists of the sensing frequencies and uploading priorities of information. A difference-reward-based credit assignment is designed to divide the system reward, defined as the VCPS quality, into difference rewards for the vehicles. The edge node allocates V2I bandwidth to vehicles based on predicted vehicle trajectories and view requirements. Finally, we build the simulation model and give a comprehensive performance evaluation, which conclusively demonstrates the superiority of the proposed solution.
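
For background, the mean waiting times in a non-preemptive multi-class M/G/1 priority queue have a classical closed form; the sketch below evaluates it for hypothetical per-class arrival rates and service moments (the paper builds its sensing model and AoV on top of such a queue with its own parameters).

```python
def mg1_priority_waits(lam, es, es2):
    """Mean waiting times per class in a non-preemptive M/G/1 priority queue.
    lam[i]: arrival rate, es[i]: mean service time, es2[i]: second moment of service
    time. Class 0 has the highest priority; total utilization must be below 1."""
    rho = [l * s for l, s in zip(lam, es)]
    assert sum(rho) < 1, "queue must be stable"
    R = 0.5 * sum(l * s2 for l, s2 in zip(lam, es2))  # mean residual service time
    waits, sigma_prev = [], 0.0
    for k in range(len(lam)):
        sigma_k = sigma_prev + rho[k]
        waits.append(R / ((1 - sigma_prev) * (1 - sigma_k)))
        sigma_prev = sigma_k
    return waits

# Hypothetical two-class example: high-priority view updates vs. background uploads.
print(mg1_priority_waits(lam=[0.2, 0.3], es=[1.0, 1.5], es2=[2.0, 4.5]))
```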

Recent advances in sensing technologies, wireless communications, and computing paradigms are driving the evolution of vehicles into intelligent electronic consumer products. This paper investigates enabling digital twins in vehicular edge computing (DT-VEC) via cooperative sensing and uploading, and makes the first attempt to achieve the quality-cost tradeoff in DT-VEC. First, a DT-VEC architecture is presented, where heterogeneous information can be sensed by vehicles and uploaded to the edge node via vehicle-to-infrastructure (V2I) communications. The digital twins are modeled based on the sensed information and are utilized to form the logical view that reflects the real-time status of the physical vehicular environment. Second, we derive the cooperative sensing model and the V2I uploading model by considering the timeliness and consistency of the digital twins, as well as the redundancy, sensing cost, and transmission cost. On this basis, a bi-objective problem is formulated to maximize the system quality and minimize the system cost. Third, we propose a solution based on multi-agent multi-objective (MAMO) deep reinforcement learning, where a dueling critic network is proposed to evaluate the agent action based on the value of the state and the advantage of the action. Finally, we give a comprehensive performance evaluation, demonstrating the superiority of MAMO.
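
To illustrate the value/advantage decomposition that a dueling critic relies on, a minimal PyTorch head is sketched below; the network sizes and the mean-subtraction convention are generic assumptions, not the MAMO architecture from the paper.

```python
import torch
import torch.nn as nn

class DuelingCritic(nn.Module):
    """Generic dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # state value V(s)
        self.advantage = nn.Linear(hidden, num_actions)   # action advantage A(s, a)

    def forward(self, state):
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

# Usage sketch: score all joint sensing/uploading actions for a batch of agent states.
critic = DuelingCritic(state_dim=16, num_actions=8)
q_values = critic(torch.randn(4, 16))  # shape (4, 8)
```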

Modern video streaming services require quality assurance of the presented audiovisual material. Quality assurance mechanisms allow streaming platforms to provide quality levels that are considered sufficient to yield user satisfaction, with the least possible amount of data transferred. A variety of measures and approaches have been developed to control video quality, e.g., by adapting it to network conditions. These include objective metrics of quality and thresholds identified by means of subjective perceptual judgments. The former group of metrics has recently gained the attention of (multi)media researchers, who call this area of study ``Quality of Experience'' (QoE). In this paper, we present a review of QoE's theoretical models, together with a discussion of their properties and implications for the field. We argue that most of them represent a bottom-up approach to modeling. Such models focus on describing as many variables as possible, but have a limited ability to investigate the causal relationships between them; therefore, the applicability of their findings in practice is limited. To advance the field, we therefore propose a structural, top-down model of video QoE that describes causal relationships among variables. We hope that our framework will facilitate designing comparable experiments in the domain.

Bid optimization for online advertising from a single advertiser's perspective has been thoroughly investigated in both academic research and industrial practice. However, existing work typically assumes that competitors do not change their bids, i.e., that the winning price is fixed, leading to poor performance of the derived solutions. Although a few studies use multi-agent reinforcement learning to set up a cooperative game, they still suffer from the following drawbacks: (1) they fail to avoid collusion solutions, where all the advertisers involved in an auction collude to bid an extremely low price on purpose; (2) previous works cannot handle the underlying complex bidding environment well, leading to poor model convergence. This problem can be amplified when handling multiple objectives of advertisers, which are practical demands but were not considered by previous work. In this paper, we propose a novel multi-objective cooperative bid optimization formulation called Multi-Agent Cooperative bidding Games (MACG). MACG sets up a carefully designed multi-objective optimization framework in which the different objectives of advertisers are incorporated. A global objective to maximize the overall profit of all advertisements is added in order to encourage better cooperation and also to protect self-bidding advertisers. To avoid collusion, we also introduce an extra platform revenue constraint. We analyze the optimal functional form of the bidding formula theoretically and design a policy network accordingly to generate auction-level bids. Then we design an efficient multi-agent evolutionary strategy for model optimization. Offline experiments and online A/B tests conducted on the Taobao platform indicate that both the single advertiser's objective and the global profit are significantly improved compared to state-of-the-art methods.
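
As a generic illustration of optimizing policy parameters with an evolutionary strategy, a basic Gaussian-perturbation ES is sketched below on a toy objective; the paper's multi-agent ES and auction-level bidding formula are not reproduced here.

```python
import numpy as np

def evolution_strategy(reward_fn, theta, sigma=0.1, lr=0.02, pop=50, iters=200, seed=0):
    """Basic Gaussian-perturbation ES: estimate the gradient of the expected reward
    from random perturbations of the policy parameters and ascend it."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        eps = rng.standard_normal((pop, theta.size))
        rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize
        theta = theta + lr / (pop * sigma) * eps.T @ rewards
    return theta

# Toy stand-in for an auction-level objective: reward peaked at a target parameter vector.
target = np.array([1.0, 2.0, 3.0])
reward_fn = lambda th: -np.sum((th - target) ** 2)
print(evolution_strategy(reward_fn, theta=np.zeros(3)))
```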

Object tracking is challenging as target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms relying on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, as these algorithms do not maintain long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance in the camera view. In this paper, we propose to learn multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate for locating target objects precisely. We take into account the appropriate size of surrounding context and the feature representations. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain long-term memory of target appearance. We use the output responses of this long-term filter to determine if tracking failure occurs. In the case of tracking failures, we apply an incrementally learned detector to recover the target position in a sliding window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods in terms of efficiency, accuracy, and robustness.
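
For intuition, a minimal single-channel correlation filter in the Fourier domain (MOSSE-style), where the running-average learning rate plays the role of short-term versus long-term memory; the kernelized, multi-filter and scale-pyramid design of the paper is not reproduced.

```python
import numpy as np

def train_filter(patch, target_response, lam=1e-2):
    """Closed-form filter for one grayscale patch: H* = (G * conj(F)) / (F * conj(F) + lam)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target_response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def update_filter(H, patch, target_response, lr=0.02, lam=1e-2):
    """Running-average update: a small lr gives conservative (long-term) memory,
    a large lr gives aggressive (short-term) adaptation."""
    return (1 - lr) * H + lr * train_filter(patch, target_response, lam)

def respond(H, patch):
    """Correlation response map; its peak gives the estimated target location."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))

# Toy usage with a Gaussian-shaped desired response centered on the target.
size = 64
y, x = np.mgrid[:size, :size]
target_response = np.exp(-((x - size // 2) ** 2 + (y - size // 2) ** 2) / (2 * 4.0 ** 2))
patch = np.random.default_rng(0).standard_normal((size, size))
H = train_filter(patch, target_response)
H = update_filter(H, patch, target_response)
peak = np.unravel_index(np.argmax(respond(H, patch)), (size, size))
print(peak)  # near the center for the training patch
```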
