
We study the following repeated non-atomic routing game. In every round, nature draws a state i.i.d. from a publicly known distribution, and this state influences the link latency functions. The system planner makes private route recommendations to the participating agents, who constitute a fixed fraction of the population, according to a publicly known signaling strategy. The participating agents choose whether to obey the recommendation according to the cumulative regret of the participating agent population in the previous round. The non-participating agents choose routes according to a myopic best response to a calibrated forecast of the routing decisions of the participating agents. We show that, for parallel networks, if the planner's signaling strategy satisfies the obedience condition, then, almost surely, the link flows are asymptotically consistent with the Bayes correlated equilibrium induced by the signaling strategy.

Related content

Click behavior is the most widely used form of positive user feedback in recommendation. However, treating every click equally in training leaves a model vulnerable to clickbait and title-content mismatch, so it may fail to capture users' real satisfaction with items. Dwell time can be viewed as a high-quality quantitative indicator of user preference for each click, yet existing recommendation models do not fully explore the modeling of dwell time. In this work, we focus on reweighting clicks with dwell time in recommendation. Specifically, we first define a new behavior named valid read, which helps to select high-quality click instances for different users and items via dwell time. Next, we propose a normalized dwell time function to reweight click signals in training, which can better guide our model to provide high-quality and efficient reading. The click reweighting model achieves significant improvements in both offline and online evaluations in a real-world system.

Consider words of length $n$. The set of all periods of a word of length $n$ is a subset of $\{0,1,2,\ldots,n-1\}$. However, not every subset of $\{0,1,2,\ldots,n-1\}$ is a valid set of periods. In a seminal paper in 1981, Guibas and Odlyzko proposed encoding the set of periods of a word as a binary string of length $n$, called an autocorrelation, where a one at position $i$ denotes a period of $i$. They considered the question of recognizing a valid period set, and also studied the number of valid period sets for length $n$, denoted $\kappa_n$. They conjectured that $\ln(\kappa_n)$ asymptotically converges to a constant times $\ln^2(n)$. Although improved lower bounds for $\ln(\kappa_n)/\ln^2(n)$ were proposed in 2001, the question of a tight upper bound has remained open since Guibas and Odlyzko's paper. Here, we exhibit an upper bound for this fraction, which implies its convergence and closes this long-standing conjecture. Moreover, we extend our result to find similar bounds for the number of correlations: a generalization of autocorrelations which encodes the overlaps between two strings.
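To make the definition of an autocorrelation concrete, here is a minimal Python sketch. It assumes the usual convention that $p$ is a period of a word $w$ of length $n$ when $w_i = w_{i+p}$ for every $0 \le i < n-p$, so that $0$ is always a period. The function names `autocorrelation` and `count_binary_autocorrelations` are hypothetical and chosen for illustration only. The brute-force count enumerates words over a two-letter alphabet, so it is practical only for small $n$ and is not the method used in the paper.

```python
from itertools import product


def autocorrelation(word: str) -> str:
    """Return the autocorrelation of `word` as a binary string.

    Position p holds '1' exactly when p is a period of the word,
    i.e. word[i] == word[i + p] for every i with i + p < len(word).
    """
    n = len(word)
    bits = []
    for p in range(n):
        is_period = all(word[i] == word[i + p] for i in range(n - p))
        bits.append("1" if is_period else "0")
    return "".join(bits)


def count_binary_autocorrelations(n: int) -> int:
    """Count distinct autocorrelations over all binary words of length n.

    Brute force over 2**n words (an illustrative assumption, not an
    efficient algorithm), so keep n small.
    """
    seen = {autocorrelation("".join(w)) for w in product("ab", repeat=n)}
    return len(seen)


if __name__ == "__main__":
    print(autocorrelation("abab"))           # periods 0 and 2 -> 1010
    print(count_binary_autocorrelations(4))  # 1111, 1000, 1001, 1010 -> 4
```

For example, the string `1011` never appears for $n = 4$: periods 2 and 3 together force all four letters to be equal, which in turn makes 1 a period. This is the kind of constraint that keeps the number of valid period sets far below $2^{n-1}$.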

Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations with a minimal number of weights. In most of the current literature, these weights are fully or partially hand-crafted, showing the capabilities of neural networks but not necessarily their practical performance. In contrast, optimization theory for neural networks relies heavily on an abundance of weights in over-parametrized regimes. This paper balances these two demands and provides an approximation result for shallow networks in $1d$ with non-convex weight optimization by gradient descent. We consider finite width networks and infinite sample limits, which is the typical setup in approximation theory. Technically, this problem is not over-parametrized; however, some form of redundancy reappears as a loss in approximation rate compared to the best possible rates.

In this work, we present a new dataset and a computational strategy for a digital coach that aims to guide users in practicing the protocols of self-attachment therapy. Our framework augments a rule-based conversational agent with a deep-learning classifier for identifying the underlying emotion in a user's text response, as well as a deep-learning assisted retrieval method for producing novel, fluent and empathetic utterances. We also craft a set of human-like personas that users can choose to interact with. Our goal is to achieve a high level of engagement during virtual therapy sessions. We evaluate the effectiveness of our framework in a non-clinical trial with N=16 participants, all of whom have had at least four interactions with the agent over the course of five days. We find that our platform is consistently rated higher for empathy, user engagement and usefulness than the simple rule-based framework. Finally, we provide guidelines to further improve the design and performance of the application, in accordance with the feedback received.

A team might lose a powerful incentive to win in a round-robin contest if its final rank does not depend on the outcome of the matches still to be played. The current paper introduces a classification scheme to identify these weakly (where one team is indifferent) or strongly (where both teams are indifferent) stakeless games. The probability of such matches can serve as a novel fairness criterion to compare and evaluate timetabling alternatives. An optimal sequence of games with respect to the proposed metric increases the utility of all stakeholders at almost no price if the scheduling constraints are appropriately defined. Its application is illustrated through the 2021/22 season of the UEFA Champions League. According to our simulation model, the same schedule is optimal across all groups and the option followed in four of the eight groups is the best under a wide set of parameters. Avoiding strongly stakeless matches is verified to be a likely goal in the computer draw of the fixture that remains hidden from the public.

Decentralized cooperative resource allocation schemes for robotic swarms are essential to enable high reliability in high-throughput data exchanges. These cooperative schemes require control signaling to avoid half-duplex problems at the receiver and to mitigate interference. We propose two cooperative resource allocation schemes, device sequential and group scheduling, and introduce a control signaling design. We observe that failure in the reception of these control signals leads to non-cooperative behavior and to significant performance degradation. The causes of these failures are identified, and specific countermeasures are proposed and evaluated. We compare the proposed resource allocation schemes against the NR sidelink mode 2 resource allocation and show that, even though signaling has an important impact on the resource allocation performance, our proposed device sequential and group scheduling resource allocation schemes improve reliability by an order of magnitude compared to sidelink mode 2.

For any pattern $p$ of length at most two, we provide generating functions and asymptotic approximations for the number of $p$-equivalence classes of Dyck paths with catastrophes, where two paths of the same length are $p$-equivalent whenever the positions of the occurrences of the pattern $p$ are the same.

Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.

Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs, and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measuring the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in low-dimensional MAL systems and link our findings with catastrophe (bifurcation) theory. In particular, as the exploration hyperparameter evolves over time, the system undergoes phase transitions where the number and stability of equilibria can change radically given an infinitesimal change to the exploration parameter. Based on this, we provide a formal theoretical treatment of how tuning the exploration parameter can provably lead to equilibrium selection with both positive and negative (and potentially unbounded) effects on system performance.

The accurate and interpretable prediction of future events in time-series data often requires capturing the representative patterns (referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present the evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) over time. We analyze the dynamic graphs constructed from the time-series data and show that changes in the graph structures (e.g., edges connecting certain state nodes) can inform the occurrences of events (i.e., time-series fluctuation). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, Evolutionary State Graph Network models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results based on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.
