We study the problem of fairly allocating $m$ indivisible items among $n$ agents. Envy-free allocations, in which each agent prefers her bundle to the bundle of every other agent, need not exist in the worst case. However, when agents have additive preferences and the value $v_{i,j}$ of agent $i$ for item $j$ is drawn independently from a distribution $D_i$, envy-free allocations exist with high probability when $m \in \Omega( n \log n / \log \log n )$. In this paper, we study the existence of envy-free allocations under stochastic valuations far beyond the additive setting. We introduce a new stochastic model in which each agent's valuation is sampled by first fixing a worst-case function, and then drawing a uniformly random renaming of the items, independently for each agent. This strictly generalizes known settings; for example, $v_{i,j} \sim D_i$ may be seen as picking a random (instead of a worst-case) additive function before renaming. We prove that random renaming is sufficient to ensure that envy-free allocations exist with high probability in very general settings. When valuations are non-negative and ``order-consistent,'' a valuation class that generalizes additive, budget-additive, unit-demand, and single-minded agents, SD-envy-free allocations (a stronger notion of fairness than envy-freeness) exist for $m \in \omega(n^2)$ when $n$ divides $m$, and SD-EFX allocations exist for all $m \in \omega(n^2)$. The dependence on $n$ is tight: for $m \in O(n^2)$, envy-free allocations do not exist with constant probability. For the case of arbitrary valuations (allowing non-monotone, negative, or mixed-manna valuations) and $n=2$ agents, we prove that envy-free allocations exist with probability $1 - \Theta(1/m)$, and this bound is tight.
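
To make the model concrete, here is a minimal sketch of sampling one agent's valuation under random renaming. The helper names are ours, and the additive base function is only one illustrative choice; the model allows an arbitrary (even worst-case) base function.

```python
import random

def renamed_valuation(base_valuation, num_items, rng=random):
    """Compose a fixed (possibly worst-case) valuation over item 'slots'
    with a uniformly random renaming of the actual items."""
    renaming = list(range(num_items))
    rng.shuffle(renaming)  # uniformly random permutation of item names
    # The agent's value for a bundle S is the base valuation applied to
    # the renamed bundle {renaming[j] for j in S}.
    return lambda bundle: base_valuation(frozenset(renaming[j] for j in bundle))

# Example: an additive base function; with random weights this recovers
# the classic i.i.d. additive model (v_{i,j} ~ D_i) as a special case.
m = 10
weights = [random.random() for _ in range(m)]
additive = lambda bundle: sum(weights[j] for j in bundle)
v = renamed_valuation(additive, m)
print(v({0, 1, 2}))
```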

Related content

We propose a local model-checking proof system for a fragment of CTL. The rules of the proof system are motivated by the well-known fixed-point characterisation of CTL based on unfolding of the temporal operators. To guarantee termination of proofs, we tag the sequents of our proof system with the set of states that have already been explored for the respective temporal formula. We define the semantics of tagged sequents, and then state and prove soundness and completeness of the proof system, as well as termination of proof search for finite-state models.
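
The tagging idea can be made concrete for a single operator. Below is a minimal sketch for EF, using its fixed-point unfolding EF p = p \/ EX (EF p), on a finite Kripke structure given as a successor map; the encoding is illustrative and much simpler than the paper's proof system.

```python
def holds_EF(s, p, succ, tag=frozenset()):
    """Local check of the sequent  s |- EF p  [tag].

    Follows the unfolding EF p = p \/ EX (EF p), tagging the sequent
    with the set of states already explored for this formula so that
    proof search terminates on finite models."""
    if s in tag:   # state already explored for EF p: fail this branch
        return False
    if p(s):       # base case of the unfolding
        return True
    # Unfold: some successor must satisfy EF p, remembering we saw s.
    return any(holds_EF(t, p, succ, tag | {s}) for t in succ[s])

# Tiny model: 0 -> {0, 1}, 1 -> {2}, 2 -> {2}.
succ = {0: [0, 1], 1: [2], 2: [2]}
print(holds_EF(0, lambda s: s == 2, succ))  # True: path 0, 1, 2
print(holds_EF(0, lambda s: s == 3, succ))  # False, and terminates
```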

Machine translation models increasingly need to handle code-switched content, especially with the rise of social media and user-generated content. In this paper, we propose a way of training a single machine translation model that can translate monolingual sentences from one language to another, as well as code-switched sentences into either language; such a model can be considered bilingual in the human sense. To make better use of parallel data, we generate synthetic code-switched (CSW) data and add an alignment loss on the encoder to align representations across languages. On the WMT14 English-French (En-Fr) dataset, the trained model strongly outperforms bidirectional baselines on code-switched translation while maintaining quality for non-code-switched (monolingual) data.
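
A minimal sketch of the two ingredients, under our own assumptions: mean-pooled cosine alignment is one illustrative choice of encoder alignment loss, and token substitution along word alignments is one simple way to synthesize CSW data; the paper's exact formulations may differ.

```python
import random

import torch
import torch.nn.functional as F

def alignment_loss(src_states, tgt_states, src_mask, tgt_mask):
    """Pull mean-pooled encoder states of a sentence and its translation
    together (illustrative choice). Masks are float tensors of 0s/1s."""
    src = (src_states * src_mask.unsqueeze(-1)).sum(1) / src_mask.sum(1, keepdim=True)
    tgt = (tgt_states * tgt_mask.unsqueeze(-1)).sum(1) / tgt_mask.sum(1, keepdim=True)
    return 1.0 - F.cosine_similarity(src, tgt, dim=-1).mean()

def make_csw(src_tokens, tgt_tokens, alignment, p=0.3):
    """Synthesize a code-switched source: replace aligned source tokens
    with their target-side counterparts with probability p."""
    out = list(src_tokens)
    for i, j in alignment:  # (src_pos, tgt_pos) word-alignment pairs
        if random.random() < p:
            out[i] = tgt_tokens[j]
    return out

print(make_csw(["the", "cat", "sleeps"], ["le", "chat", "dort"],
               [(0, 0), (1, 1), (2, 2)], p=0.5))
```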

We study the complexity of verifying differential privacy for while-like programs working over Boolean values and making probabilistic choices. Programs in this class can be interpreted as finite-state discrete-time Markov chains (DTMCs). We show that the problem of deciding whether a program is differentially private for specific values of the privacy parameters is PSPACE-complete. To show membership in PSPACE, we adapt classical results about computing hitting probabilities for DTMCs. To show PSPACE-hardness, we use a reduction from the problem of checking whether a program almost surely terminates. We also show that the problem of approximating the privacy parameters that a program provides is PSPACE-hard. Moreover, we investigate the complexity of similar problems for several relaxations of differential privacy: R\'enyi differential privacy, concentrated differential privacy, and truncated concentrated differential privacy. For these notions, we consider gap versions of the problem of deciding whether a program is private, and we show that all of them are PSPACE-complete.
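
As background for the PSPACE upper bound, hitting probabilities of a finite DTMC have a standard linear-system characterization. The sketch below (numpy-based, with an assumed matrix encoding) shows the textbook recipe, not the paper's space-efficient adaptation.

```python
import numpy as np

def hitting_probabilities(P, target):
    """Probability of eventually reaching `target` from each state of a
    finite DTMC with transition matrix P (rows sum to 1).

    Standard recipe: states that cannot reach the target get 0, target
    states get 1, and the remaining states solve the linear system
        h(s) = sum_t P[s, t] * h(t)."""
    n = len(P)
    # Backward reachability: which states can reach the target at all?
    can_reach = set(target)
    changed = True
    while changed:
        changed = False
        for s in range(n):
            if s not in can_reach and any(P[s, t] > 0 for t in can_reach):
                can_reach.add(s)
                changed = True
    h = np.zeros(n)
    unknown = [s for s in can_reach if s not in target]
    if unknown:
        idx = {s: k for k, s in enumerate(unknown)}
        A = np.eye(len(unknown))
        b = np.zeros(len(unknown))
        for s in unknown:
            for t in range(n):
                if t in target:
                    b[idx[s]] += P[s, t]
                elif t in idx:
                    A[idx[s], idx[t]] -= P[s, t]
        h[unknown] = np.linalg.solve(A, b)
    h[list(target)] = 1.0
    return h

# Fair-coin walk on {0, 1, 2} with absorbing endpoints.
P = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])
print(hitting_probabilities(P, {2}))  # [0.0, 0.5, 1.0]
```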

Agents often exert influence when interacting with humans and non-human agents. However, the ethical status of such influence is often unclear. In this paper, we present the SHAPE framework, which lists reasons why influence may be unethical. We draw on literature from descriptive and moral philosophy and connect it to machine learning to help guide ethical considerations when developing algorithms with potential influence. Lastly, we explore mechanisms for governing algorithmic systems that influence people, inspired by mechanisms used in journalism, human subject research, and advertising.

Bootstrapping is behind much of the success of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still poorly understood. In this work, we show that they act as an implicit regularizer, which can be beneficial in some cases but also has disadvantages: the regularization is inflexible and can result in instabilities even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and convex in function space, and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and degrees of off-policyness in data collection to investigate the effectiveness of the regularization induced by Target Networks and by Functional Regularization in terms of performance, accuracy, and stability. Our findings show that Functional Regularization can be used as a drop-in replacement for Target Networks, yielding performance improvements. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can yield further gains compared to solely adjusting the network update period, as is typically done with Target Networks. Our approach also enhances the ability of networks to recover accurate $Q$-values.
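
To illustrate the distinction, here is a minimal PyTorch sketch of both objectives for Q-learning-style TD(0). The function names, the mean-squared penalty, and the max-bootstrap are our assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def td_loss_target_network(q_net, q_target, batch, gamma=0.99):
    """Standard TD(0) loss with a lagging target network (q_target is a
    periodically-copied snapshot of q_net)."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        boot = r + gamma * (1 - done) * q_target(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q_sa, boot)

def td_loss_functional_reg(q_net, q_prev, batch, gamma=0.99, kappa=1.0):
    """TD(0) bootstrapping from the *online* network, plus an explicit
    penalty keeping q_net close to a lagging copy q_prev in function
    space; the penalty is convex in Q even though not in the weights."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        boot = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    reg = F.mse_loss(q_net(s), q_prev(s).detach())
    return F.mse_loss(q_sa, boot) + kappa * reg

# Every K steps (the update period), refresh the lagging copy, e.g.
# q_prev = copy.deepcopy(q_net); kappa is the regularization weight.
```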

Understanding the interactions of agents trained with deep reinforcement learning is crucial for deploying agents in games or the real world. In the former, unreasonable actions confuse players. In the latter, the effect is even more significant, as unexpected behavior can cause accidents with potentially grave and long-lasting consequences for the individuals involved. In this work, we propose using program synthesis to imitate reinforcement learning policies after observing a trajectory of the action sequence. Programs have the advantage of being inherently interpretable and verifiable for correctness. We adapt the state-of-the-art program synthesis system DreamCoder to learn concepts in grid-based environments, specifically a navigation task and two miniature versions of the Atari games Space Invaders and Asterix. By inspecting the generated libraries, we can make inferences about the concepts the black-box agent has learned and better understand its behavior. We achieve the same by visualizing the agent's decision-making process for the imitated sequences. We evaluate our approach with different types of program synthesizers based on a search-only method, a neural-guided search, and a language model fine-tuned on code.
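
An illustrative toy of the search-only flavor: enumerate programs in a tiny DSL until one reproduces an observed action trajectory. The DSL and action names are our own assumptions, far simpler than DreamCoder's learned libraries.

```python
from itertools import count, product

# Toy DSL: a program is a sequence of (action, repetitions) pairs, e.g.
# (("LEFT", 3), ("FIRE", 1)) unrolls to LEFT LEFT LEFT FIRE.
ACTIONS = ["LEFT", "RIGHT", "FIRE"]

def unroll(program):
    return [a for a, k in program for _ in range(k)]

def synthesize(trajectory, max_reps=4):
    """Smallest program (fewest pairs) whose unrolling is the trajectory."""
    for size in count(1):
        for program in product(product(ACTIONS, range(1, max_reps + 1)),
                               repeat=size):
            if unroll(program) == trajectory:
                return list(program)

traj = ["LEFT", "LEFT", "LEFT", "FIRE", "RIGHT", "RIGHT"]
print(synthesize(traj))  # [('LEFT', 3), ('FIRE', 1), ('RIGHT', 2)]
```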

Learning-based techniques, especially advanced Large Language Models (LLMs) for code, have gained considerable popularity in various software engineering (SE) tasks. However, most existing works focus on designing better learning-based models and pay less attention to the properties of the datasets. Learning-based models, including popular LLMs for code, heavily rely on data, and the data's properties (e.g., its distribution) can significantly affect their behavior. We conduct an exploratory study on the distribution of SE data and find that such data usually follows a skewed, long-tailed distribution: a small number of classes have an extensive collection of samples, while a large number of classes have very few. We investigate three distinct SE tasks and analyze the impact of long-tailed distributions on the performance of LLMs for code. Our experimental results reveal that long-tailed distributions substantially affect the effectiveness of LLMs for code. Specifically, LLMs for code perform between 30.0\% and 254.0\% worse on data samples associated with infrequent labels compared to data samples with frequent labels. Our study provides a better understanding of the effects of long-tailed distributions on popular LLMs for code and offers insights for the future development of SE automation.
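
A small sketch of the kind of measurement involved: split label classes into a frequent head and an infrequent tail, then compare per-bucket accuracy. The 50% head-mass cutoff and the example label names are illustrative assumptions, not the paper's protocol.

```python
from collections import Counter

def head_tail_split(labels, head_mass=0.5):
    """Split label classes into 'head' and 'tail': the most frequent
    classes covering `head_mass` of all samples form the head."""
    counts = Counter(labels)
    head, covered = set(), 0
    for label, c in counts.most_common():
        if covered >= head_mass * len(labels):
            break
        head.add(label)
        covered += c
    return head, set(counts) - head

def accuracy_by_bucket(y_true, y_pred, head):
    """Accuracy on samples with frequent (head) vs infrequent (tail) labels."""
    buckets = {"head": [], "tail": []}
    for t, p in zip(y_true, y_pred):
        buckets["head" if t in head else "tail"].append(t == p)
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}

labels = ["fix"] * 80 + ["feat"] * 15 + ["docs"] * 3 + ["perf"] * 2
head, tail = head_tail_split(labels)
print(head, tail)  # {'fix'} vs {'feat', 'docs', 'perf'}
```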

Food consumption is one of the biggest contributors to climate change. However, online grocery shoppers often lack the time, motivation, or knowledge to contemplate a food's environmental impact. At the same time, they are concerned with their own well-being. To empower grocery shoppers to make nutritionally and environmentally informed decisions, we investigate the efficacy of the Scale-Score, a label combining nutritional and environmental information to highlight a product's benefit to both the consumer's and the planet's health, without obscuring either piece of information. We conducted an online survey to understand user needs and requirements regarding a joint food label, developed an open-source mock online grocery environment, and assessed label efficacy. We find that the Scale-Score supports nutritious purchases but needs improvement in its support for sustainable choices. Our research offers first insights into the design considerations and performance of a combined yet disjoint food label, potentially altering the label design space.

Self-supervised learning (SSL), dubbed the dark matter of intelligence, is a promising path to advancing machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices, from the pretext tasks to the training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying out the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect-information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, often with hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
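
R-NaD's core idea, penalizing payoffs with a KL-style term toward an anchor policy that is periodically refreshed, can be illustrated on a toy zero-sum matrix game. The sketch below is a heavy simplification under our own assumptions (multiplicative-weights inner loop, fixed step sizes), not the published algorithm.

```python
import numpy as np

def kl_anchored_dynamics(A, eta=0.2, lr=0.1, steps=1000, anchor_every=100):
    """Toy reward regularization in a zero-sum matrix game A: both
    players run multiplicative-weights updates on payoffs penalized by
    eta * log(pi / pi_anchor), and the anchors are refreshed
    periodically (a simplification of R-NaD's outer loop)."""
    rng = np.random.default_rng(0)
    n, m = A.shape
    x = rng.dirichlet(np.ones(n))  # row player, off-equilibrium start
    y = rng.dirichlet(np.ones(m))  # column player
    x_ref, y_ref = x.copy(), y.copy()
    for t in range(steps):
        gx = A @ y - eta * np.log(x / x_ref)
        gy = -A.T @ x - eta * np.log(y / y_ref)
        x = x * np.exp(lr * gx); x /= x.sum()
        y = y * np.exp(lr * gy); y /= y.sum()
        if (t + 1) % anchor_every == 0:  # refresh the anchor policies
            x_ref, y_ref = x.copy(), y.copy()
    return x, y

# Matching pennies: plain multiplicative weights cycles around the mixed
# equilibrium; the KL-anchored version should settle near (1/2, 1/2).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(kl_anchored_dynamics(A))
```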
