亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The distributed computation of a Nash equilibrium in aggregative games is gaining increased traction in recent years. Of particular interest is the mediator-free scenario where individual players only access or observe the decisions of their neighbors due to practical constraints. Given the competitive rivalry among participating players, protecting the privacy of individual players becomes imperative when sensitive information is involved. We propose a fully distributed equilibrium-computation approach for aggregative games that can achieve both rigorous differential privacy and guaranteed computation accuracy of the Nash equilibrium. This is in sharp contrast to existing differential-privacy solutions for aggregative games that have to either sacrifice the accuracy of equilibrium computation to gain rigorous privacy guarantees, or allow the cumulative privacy budget to grow unbounded, hence losing privacy guarantees, as iteration proceeds. Our approach uses independent noises across players, thus making it effective even when adversaries have access to all shared messages as well as the underlying algorithm structure. The encryption-free nature of the proposed approach, also ensures efficiency in computation and communication. The approach is also applicable in stochastic aggregative games, able to ensure both rigorous differential privacy and guaranteed computation accuracy of the Nash equilibrium when individual players only have stochastic estimates of their pseudo-gradient mappings. Numerical comparisons with existing counterparts confirm the effectiveness of the proposed approach.

相關內容

Many players in the automotive field support scenario-based assessment of automated vehicles (AVs), where individual traffic situations can be tested and, thus, facilitate concluding on the performance of AVs in different situations. Since a large number of different scenarios can occur in real-world traffic, the question is how to find a finite set of relevant scenarios. Scenarios extracted from large real-world datasets represent real-world traffic since real driving data is used. Extracting scenarios, however, is challenging because (1) the scenarios to be tested should assess the AVs behave safely, which conflicts with the fact that the majority of the data contains scenarios that are not interesting from a safety perspective, and (2) extensive data processing is required, which hinders the utilization of large real-world datasets. In this work, we propose an approach for extracting scenarios from real-world driving data. The first step is data preprocessing to tackle the errors and noise in real-world data by reconstructing the data. The second step performs data tagging to label actors' activities, their interactions with each other and the environment. Finally, the scenarios are extracted by searching for combinations of tags. The proposed approach is evaluated using data simulated with CARLA and applied to a part of a large real-world driving dataset, i.e., the Waymo Open Motion Dataset (WOMD). The code and scenarios extracted from WOMD are open to the research community to facilitate the assessment of the automated driving functions in different scenarios.

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current state-of-the-art policies for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, $\omega$-UCB, that uses asymmetric confidence intervals. These intervals scale with the distance between the sample mean and the bounds of a random variable, yielding a more accurate and tight estimation of the reward-cost ratio compared to our competitors. We show that our approach has logarithmic regret and consistently outperforms existing policies in synthetic and real settings.

Reinforcement learning has been widely successful in producing agents capable of playing games at a human level. However, this requires complex reward engineering, and the agent's resulting policy is often unpredictable. Going beyond reinforcement learning is necessary to model a wide range of human playstyles, which can be difficult to represent with a reward function. This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting. Multimodal Generative Adversarial Imitation Learning (MultiGAIL) uses an auxiliary input parameter to learn distinct personas using a single-agent model. MultiGAIL is based on generative adversarial imitation learning and uses multiple discriminators as reward models, inferring the environment reward by comparing the agent and distinct expert policies. The reward from each discriminator is weighted according to the auxiliary input. Our experimental analysis demonstrates the effectiveness of our technique in two environments with continuous and discrete action spaces.

Object detection via inaccurate bounding boxes supervision has boosted a broad interest due to the expensive high-quality annotation data or the occasional inevitability of low annotation quality (\eg tiny objects). The previous works usually utilize multiple instance learning (MIL), which highly depends on category information, to select and refine a low-quality box. Those methods suffer from object drift, group prediction and part domination problems without exploring spatial information. In this paper, we heuristically propose a \textbf{Spatial Self-Distillation based Object Detector (SSD-Det)} to mine spatial information to refine the inaccurate box in a self-distillation fashion. SSD-Det utilizes a Spatial Position Self-Distillation \textbf{(SPSD)} module to exploit spatial information and an interactive structure to combine spatial information and category information, thus constructing a high-quality proposal bag. To further improve the selection procedure, a Spatial Identity Self-Distillation \textbf{(SISD)} module is introduced in SSD-Det to obtain spatial confidence to help select the best proposals. Experiments on MS-COCO and VOC datasets with noisy box annotation verify our method's effectiveness and achieve state-of-the-art performance. The code is available at //github.com/ucas-vg/PointTinyBenchmark/tree/SSD-Det.

Balancing is, especially among players, a highly debated topic of video games. Whether a game is sufficiently balanced greatly influences its reception, player satisfaction, churn rates and success. Yet, conceptions about the definition of balance diverge across industry, academia and players, and different understandings of designing balance can lead to worse player experiences than actual imbalances. This work accumulates concepts of balancing video games from industry and academia and introduces a player-driven approach to optimize player experience and satisfaction. Using survey data from 680 participants and empirically recorded data of over 4 million in-game fights of Guild Wars 2, we aggregate player opinions and requirements, contrast them to the status quo and approach a democratized quantitative technique to approximate closer configurations of balance. We contribute a strategy of refining balancing notions, a methodology of tailoring balance to the actual player base and point to an exemplary artifact that realizes this process.

Completing low-rank matrices from subsampled measurements has received much attention in the past decade. Existing works indicate that $\mathcal{O}(nr\log^2(n))$ datums are required to theoretically secure the completion of an $n \times n$ noisy matrix of rank $r$ with high probability, under some quite restrictive assumptions: (1) the underlying matrix must be incoherent; (2) observations follow the uniform distribution. The restrictiveness is partially due to ignoring the roles of the leverage score and the oracle information of each element. In this paper, we employ the leverage scores to characterize the importance of each element and significantly relax assumptions to: (1) not any other structure assumptions are imposed on the underlying low-rank matrix; (2) elements being observed are appropriately dependent on their importance via the leverage score. Under these assumptions, instead of uniform sampling, we devise an ununiform/biased sampling procedure that can reveal the ``importance'' of each observed element. Our proofs are supported by a novel approach that phrases sufficient optimality conditions based on the Golfing Scheme, which would be of independent interest to the wider areas. Theoretical findings show that we can provably recover an unknown $n\times n$ matrix of rank $r$ from just about $\mathcal{O}(nr\log^2 (n))$ entries, even when the observed entries are corrupted with a small amount of noisy information. The empirical results align precisely with our theories.

This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition. We present three different approaches to predicting turn-level qualities of chatbot responses based on large language models (LLMs). We report improvement over the baseline using dynamic few-shot examples from a vector store for the prompts for ChatGPT. We also analyze the performance of the other two approaches and report needed improvements for future work. We developed the three systems over just two weeks, showing the potential of LLMs for this task. An ablation study conducted after the challenge deadline shows that the new Llama 2 models are closing the performance gap between ChatGPT and open-source LLMs. However, we find that the Llama 2 models do not benefit from few-shot examples in the same way as ChatGPT.

We investigate time-optimal Multi-Robot Coverage Path Planning (MCPP) for both unweighted and weighted terrains, which aims to minimize the coverage time, defined as the maximum travel time of all robots. Specifically, we focus on a reduction from MCPP to Min-Max Rooted Tree Cover (MMRTC). For the first time, we propose a Mixed Integer Programming (MIP) model to optimally solve MMRTC, resulting in an MCPP solution with a coverage time that is provably at most four times the optimal. Moreover, we propose two suboptimal yet effective heuristics that reduce the number of variables in the MIP model, thus improving its efficiency for large-scale MCPP instances. We show that both heuristics result in reduced-size MIP models that remain complete (i.e., guaranteed to find a solution if one exists) for all MMRTC instances. Additionally, we explore the use of model optimization warm-startup to further improve the efficiency of both the original MIP model and the reduced-size MIP models. We validate the effectiveness of our MIP-based MCPP planner through experiments that compare it with two state-of-the-art MCPP planners on various instances, demonstrating a reduction in the coverage time by an average of 27.65% and 23.24% over them, respectively.

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.

Object detectors usually achieve promising results with the supervision of complete instance annotations. However, their performance is far from satisfactory with sparse instance annotations. Most existing methods for sparsely annotated object detection either re-weight the loss of hard negative samples or convert the unlabeled instances into ignored regions to reduce the interference of false negatives. We argue that these strategies are insufficient since they can at most alleviate the negative effect caused by missing annotations. In this paper, we propose a simple but effective mechanism, called Co-mining, for sparsely annotated object detection. In our Co-mining, two branches of a Siamese network predict the pseudo-label sets for each other. To enhance multi-view learning and better mine unlabeled instances, the original image and corresponding augmented image are used as the inputs of two branches of the Siamese network, respectively. Co-mining can serve as a general training mechanism applied to most of modern object detectors. Experiments are performed on MS COCO dataset with three different sparsely annotated settings using two typical frameworks: anchor-based detector RetinaNet and anchor-free detector FCOS. Experimental results show that our Co-mining with RetinaNet achieves 1.4%~2.1% improvements compared with different baselines and surpasses existing methods under the same sparsely annotated setting.

北京阿比特科技有限公司