
We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards. Every infinite run induces the following sequences of payoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2. Mean payoff (the sequence of the sums of all rewards so far, divided by the number of steps), and 3. Total payoff (the sequence of the sums of all rewards so far). For each payoff type, the objective is to maximize the probability that the $\liminf$ is non-negative. We establish the complete picture of the strategy complexity of these objectives, i.e., how much memory is necessary and sufficient for $\varepsilon$-optimal (resp. optimal) strategies. Some cases can be won with memoryless deterministic strategies, while others require a step counter, a reward counter, or both.
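
To make the three payoff types concrete, the following minimal sketch (not taken from the paper) computes the three sequences for a finite prefix of transition rewards; the objectives themselves concern the $\liminf$ of these sequences along infinite runs, which a finite prefix can only illustrate.

```python
# Illustrative only: the three payoff sequences induced by a reward prefix.
from itertools import accumulate

def payoff_sequences(rewards):
    """Return the (point, mean, total) payoff sequences for a reward prefix."""
    point = list(rewards)                               # r_1, r_2, ..., r_k
    total = list(accumulate(point))                     # running sums
    mean = [s / (i + 1) for i, s in enumerate(total)]   # running sums / steps
    return point, mean, total

point, mean, total = payoff_sequences([1, -2, 1, 1, -1])
print(point)   # [1, -2, 1, 1, -1]
print(mean)    # [1.0, -0.5, 0.0, 0.25, 0.0]
print(total)   # [1, -1, 0, 1, 0]
```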

Related Content

In recent work, Lemire (2021) presented a fast algorithm to convert number strings into binary floating-point numbers. The algorithm has been adopted by several important systems: e.g., it is part of the runtime libraries of GCC 12, Rust 1.55, and Go 1.16. The algorithm parses any number string with a significand containing no more than 19 digits into an IEEE floating-point number. However, to ensure correctness, the algorithm includes a check that can divert to a fallback function. This fallback function is never called in practice. We prove that the fallback is unnecessary. Thus we can slightly simplify the algorithm and its implementation.
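
For orientation only, the toy sketch below (not Lemire's algorithm; the helper name and regular expression are ours) decomposes a number string into an integer significand and a power-of-ten exponent and checks the 19-digit significand condition mentioned above; the actual fast path then converts such a pair to a binary float using precomputed tables and wide integer arithmetic.

```python
# Toy decomposition of a decimal string; sign handling omitted for brevity.
import re

def decompose(s):
    m = re.fullmatch(r'[-+]?(\d*)\.?(\d*)(?:[eE]([-+]?\d+))?', s)
    if not m or not (m.group(1) or m.group(2)):
        raise ValueError("not a number string")
    int_part, frac_part = m.group(1), m.group(2)
    significand = int(int_part + frac_part)            # digits read as one integer
    exponent = int(m.group(3) or 0) - len(frac_part)   # value = significand * 10**exponent
    digits = (int_part + frac_part).lstrip('0') or '0'
    fits_fast_path = len(digits) <= 19                 # the significand-length condition
    return significand, exponent, fits_fast_path

print(decompose("3.1415e2"))   # (31415, -2, True): value = 31415 * 10**-2
```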

Semi-supervised learning (SSL) has achieved great success in leveraging a large amount of unlabeled data to learn a promising classifier. A popular approach is pseudo-labeling, which generates pseudo labels only for those unlabeled data with high-confidence predictions. As for the low-confidence ones, existing methods often simply discard them because these unreliable pseudo labels may mislead the model. Nevertheless, we highlight that data with low-confidence pseudo labels can still be beneficial to the training process. Specifically, although the class with the highest probability in the prediction is unreliable, we can assume that the sample is very unlikely to belong to the classes with the lowest probabilities. In this way, such data can also be very informative if we can effectively exploit these complementary labels, i.e., the classes that a sample does not belong to. Inspired by this, we propose a novel Contrastive Complementary Labeling (CCL) method that constructs a large number of reliable negative pairs based on the complementary labels and adopts contrastive learning to make use of all the unlabeled data. Extensive experiments demonstrate that CCL significantly improves performance on top of existing methods. More critically, our CCL is particularly effective under label-scarce settings. For example, we yield an improvement of 2.43% over FixMatch on CIFAR-10 with only 40 labeled samples.
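
As a rough illustration of how complementary labels can yield negative pairs (the function name, confidence threshold, and $k$ below are hypothetical choices, not the paper's settings), one can pair each confidently pseudo-labeled sample with every low-confidence sample for which that class is among the least likely:

```python
import numpy as np

def reliable_negative_pairs(probs, conf_threshold=0.95, k=3):
    """probs: (N, C) softmax outputs on unlabeled data.
    High-confidence samples keep an ordinary pseudo label; low-confidence
    samples receive complementary labels (their k least likely classes).
    A pair (i, j) is treated as negative when i's pseudo label is one of
    j's complementary labels, i.e. j is very unlikely to share i's class."""
    pseudo = {i: int(p.argmax()) for i, p in enumerate(probs)
              if p.max() >= conf_threshold}
    comp = {j: set(np.argsort(p)[:k]) for j, p in enumerate(probs)
            if p.max() < conf_threshold}
    return [(i, j) for i, y in pseudo.items()
            for j, c in comp.items() if y in c]

probs = np.array([[0.97, 0.01, 0.01, 0.01],    # confident: pseudo label 0
                  [0.05, 0.40, 0.35, 0.20]])   # unsure, but almost surely not class 0
print(reliable_negative_pairs(probs, k=2))     # [(0, 1)]
```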

We present a novel mathematical framework for computing the number of maintenance cycles in a system with critical and non-critical components, where "critical" (CR) means that the component's failure is fatal for the system's operation and renders any further repairs inapplicable, whereas "non-critical" (NC) means that the component can undergo corrective maintenance (replacement or minimal repair) whenever it fails, provided that the CR component is still in operation. Whenever the NC component fails, the CR component can optionally be preventively replaced. We extend traditional renewal theory (whether classical or generalized) for various maintenance scenarios for a system composed of one CR and one NC component in order to compute the average number of renewals of NC under the restriction ("bound") imposed by CR. We also develop closed-form approximations for the proposed "bounded" renewal functions. We validate our formulas by simulations on a variety of component lifetime distributions, including actual lifetime distributions of wind turbine components.
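
The following Monte Carlo sketch conveys the quantity being computed (the lifetime distributions, parameters, and the always-replace NC policy are assumptions for illustration, not the paper's scenarios): it counts how many NC renewals occur, on average, before the CR component fails.

```python
# Average number of NC renewals "bounded" by the CR lifetime (illustrative).
import random

def bounded_renewals(cr_life, nc_life, runs=100_000):
    """cr_life, nc_life: callables returning one random lifetime draw."""
    total = 0
    for _ in range(runs):
        horizon = cr_life()          # CR failure ends the system
        t, count = 0.0, 0
        while True:
            t += nc_life()           # next NC failure; replacement renews it
            if t >= horizon:
                break
            count += 1
        total += count
    return total / runs

# Example: exponential CR lifetime (mean 10) and Weibull NC lifetime.
avg = bounded_renewals(lambda: random.expovariate(1 / 10),
                       lambda: random.weibullvariate(2.0, 1.5))
print(f"average bounded renewals ≈ {avg:.2f}")
```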

We study the classic facility location setting, where we are given $n$ clients and $m$ possible facility locations in some arbitrary metric space, and want to choose a location to build a facility. The exact same setting also arises in spatial social choice, where voters are the clients and the goal is to choose a candidate or outcome, with the distance from a voter to an outcome representing the cost of this outcome for the voter (e.g., based on their ideological differences). Unlike most previous work, we do not focus on a single objective to optimize (e.g., the total distance from clients to the facility, or the maximum distance, etc.), but instead attempt to optimize several different objectives simultaneously. More specifically, we consider the $l$-centrum family of objectives, which includes the total distance, max distance, and many others. We present tight bounds on how well any pair of such objectives (e.g., max and sum) can be simultaneously approximated compared to their optimum outcomes. In particular, we show that for any such pair of objectives, it is always possible to choose an outcome which simultaneously approximates both objectives within a factor of $1+\sqrt{2}$, and give a precise characterization of how this factor improves as the two objectives being optimized become more similar. For $q>2$ different centrum objectives, we show that it is always possible to approximate all $q$ of these objectives within a small constant, and that this constant approaches 3 as $q\rightarrow \infty$. Our results show that when optimizing only a few simultaneous objectives, it is always possible to choose an outcome that approximates all of them within a factor significantly better than 3.
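
To illustrate the objectives involved (helper names and the tiny instance are ours, not the paper's), the $l$-centrum cost of an outcome can be taken as the sum of its $l$ largest client distances, so $l=1$ gives the max objective and $l=n$ the total distance; the sketch below picks the candidate minimizing the worse of two approximation ratios.

```python
# Simultaneous approximation of two l-centrum objectives on a toy instance.
def l_centrum(dists, l):
    return sum(sorted(dists, reverse=True)[:l])   # sum of the l largest distances

def best_simultaneous(dist_matrix, l1, l2):
    """dist_matrix[j][i]: distance from client i to candidate j."""
    opt1 = min(l_centrum(d, l1) for d in dist_matrix)
    opt2 = min(l_centrum(d, l2) for d in dist_matrix)
    worst_ratio = lambda d: max(l_centrum(d, l1) / opt1, l_centrum(d, l2) / opt2)
    return min(range(len(dist_matrix)), key=lambda j: worst_ratio(dist_matrix[j]))

# Three clients on a line at 0, 1, 10; candidate locations at the same points.
points = [0, 1, 10]
D = [[abs(c - p) for c in points] for p in points]
j = best_simultaneous(D, l1=1, l2=3)              # max distance vs. total distance
print("chosen candidate:", points[j])             # 1: ratio 1 for both objectives
```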

In this paper we present a general, axiomatic framework for the rigorous approximation of invariant densities and other important statistical features of dynamics. We approximate the system through a finite element reduction, by composing the associated transfer operator with a suitable finite-dimensional projection (a discretization scheme), as in the well-known Ulam method. We introduce a general framework based on a list of properties (of the system and of the projection) that need to be verified so that we can take advantage of a so-called "coarse-fine" strategy. This strategy is a novel method in which we exploit information coming from a coarser approximation of the system to get useful information on a finer approximation, speeding up the computation. This coarse-fine strategy allows a precise estimation of invariant densities and also allows us to rigorously bound the speed of mixing of the system by the speed of mixing of a coarse approximation of it, which can easily be estimated by the computer. The estimates obtained here are rigorous, i.e., they come with exact error bounds that are guaranteed to hold and take into account both the discretization error and the errors induced by finite-precision arithmetic. We apply this framework to several discretization schemes and examples of invariant density computation from previous works, obtaining a remarkable reduction in computation time. We have implemented the numerical methods described here in the Julia programming language, and released our implementation publicly as a Julia package.
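
For readers unfamiliar with the Ulam method, the sketch below shows the basic (non-rigorous) idea of composing the transfer operator with a finite projection; it estimates the Ulam matrix by sampling and carries no certified error bounds, which is exactly the gap the framework above addresses (and the released package is in Julia, while this toy is Python).

```python
# Non-rigorous Ulam discretization of a 1D map (illustration only).
import numpy as np

def ulam_matrix(T, n, samples_per_cell=1000, rng=np.random.default_rng(0)):
    """Row-stochastic P[i, j] ~ fraction of cell i mapped by T into cell j."""
    P = np.zeros((n, n))
    for i in range(n):
        x = rng.uniform(i / n, (i + 1) / n, samples_per_cell)
        j = np.minimum((T(x) * n).astype(int), n - 1)
        np.add.at(P[i], j, 1.0 / samples_per_cell)
    return P

def invariant_density(P, iters=2000):
    v = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        v = v @ P                        # push the density forward
    return v * P.shape[0]                # rescale to a density on [0, 1]

T = lambda x: 4.0 * x * (1.0 - x)        # logistic map; density 1/(pi*sqrt(x(1-x)))
rho = invariant_density(ulam_matrix(T, n=64))
print(rho[:4])                           # large near 0, as expected
```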

According to the public goods game (PGG) protocol, participants decide freely whether or not to contribute to a common pool, but the resulting benefit is distributed equally. A conceptually similar dilemma may emerge when participants decide whether to claim a common resource whose related cost is covered equally by all group members. The latter establishes a reversed form of the original public goods game (R-PGG). In this work, we show that R-PGG is equivalent to PGG in several circumstances, starting from the traditional analysis, via the evolutionary approach in unstructured populations, to Monte Carlo simulations in structured populations. However, there are also cases when the behavior of R-PGG can be surprisingly different from the outcome of PGG. When the key parameters are heterogeneous, for instance, the outcomes of PGG and R-PGG can differ even if we apply the same amplitude of heterogeneity. We find that heterogeneity in R-PGG generally impedes cooperation, while the opposite is observed for PGG. These diverse system reactions can be understood if we follow how the payoff functions change when heterogeneity is introduced in the parameter space. This analysis also reveals the distinct roles of cooperator and defector strategies in the mentioned games. Our observations will hopefully stimulate further research to check the potential differences between PGG and R-PGG under alternative, more complex conditions.
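
A minimal payoff sketch may help fix ideas (the parameterization of the reversed game below is our own plausible reading, not necessarily the one used in the paper): in PGG each cooperator pays a cost into a pool that is multiplied and shared by all $N$ members, while in R-PGG each claimer privately takes a benefit whose multiplied cost is split equally among all $N$ members.

```python
# Toy group payoffs for PGG and a reversed variant (illustrative parameterization).
def pgg_payoffs(n_coop, N, c=1.0, r=3.0):
    share = r * c * n_coop / N           # everyone receives an equal share of the pool
    return share - c, share              # (cooperator, defector) payoffs

def rpgg_payoffs(n_claim, N, b=1.0, r=3.0):
    shared_cost = r * b * n_claim / N    # total cost of all claims, split equally
    return b - shared_cost, -shared_cost # (claimer, abstainer) payoffs

print(pgg_payoffs(n_coop=3, N=5))        # (0.8, 1.8): defection pays individually
print(rpgg_payoffs(n_claim=2, N=5))      # (-0.2, -1.2): claiming pays individually
```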

Even though query evaluation is a fundamental task in databases, known classifications of conjunctive queries by their fine-grained complexity only apply to queries without self-joins. We study how self-joins affect enumeration complexity, with the aim of building upon the known results to achieve general classifications. We do this by examining the extension of two known dichotomies: one with respect to linear delay, and one with respect to constant delay after linear preprocessing. As this turns out to be an intricate investigation, this paper is structured as an example-driven discussion that initiates this analysis. We show enumeration algorithms that rely on self-joins to efficiently evaluate queries that otherwise cannot be answered with the same guarantees. Due to these additional tractable cases, the hardness proofs are more complex than in the self-join-free case. We show how to harness a known tagging technique to prove the hardness of queries with self-joins. Our study offers sufficient conditions and necessary conditions for tractability and settles the cases of queries of low arity and queries with cyclic cores. Nevertheless, many cases remain open.
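
As a concrete toy example of the queries in question (not an algorithm from the paper), consider the conjunctive query $Q(x,y) \leftarrow R(x,z), R(z,y)$, in which the same relation $R$ appears in both atoms; the generator below enumerates its answers naively, whereas the dichotomies above ask when such enumeration is possible with linear delay, or constant delay after linear preprocessing.

```python
# Naive enumeration of Q(x, y) :- R(x, z), R(z, y), a query with a self-join.
def answers(R):
    by_first = {}
    for a, b in R:
        by_first.setdefault(a, []).append(b)
    seen = set()
    for x, z in R:                           # first atom R(x, z)
        for y in by_first.get(z, []):        # second atom R(z, y): same relation R
            if (x, y) not in seen:
                seen.add((x, y))
                yield (x, y)

R = {(1, 2), (2, 3), (2, 4), (3, 1)}
print(sorted(answers(R)))                    # [(1, 3), (1, 4), (2, 1), (3, 2)]
```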

We propose a monitoring strategy for efficient and robust estimation of disease prevalence and case numbers within closed and enumerated populations such as schools, workplaces, or retirement communities. The proposed design relies largely on voluntary testing, which is notoriously biased (e.g., in the case of COVID-19) due to non-representative sampling. The approach yields unbiased and comparatively precise estimates with no assumptions about the factors underlying selection of individuals for voluntary testing, building on the strength of what can be a small random sampling component. This component unlocks a previously proposed "anchor stream" estimator, a well-calibrated alternative to classical capture-recapture (CRC) estimators based on two data streams. We show here that this estimator is equivalent to a direct standardization based on "capture", i.e., selection (or not) by the voluntary testing program, made possible by means of a key parameter identified by design. This equivalence simultaneously allows for novel two-stream CRC-like estimation of general means (e.g., of continuous variables such as antibody levels or biomarkers). For inference, we propose adaptations of a Bayesian credible interval when estimating case counts and bootstrapping when estimating means of continuous variables. We use simulations to demonstrate significant precision benefits relative to random sampling alone.
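
One plausible way to picture the direct standardization on "capture" status (the code below is a schematic illustration with our own notation, not the published estimator or its inference machinery): prevalence among volunteers is fully observed, prevalence among non-volunteers is estimated from the random sampling component, and the two are weighted by the known capture proportion.

```python
# Schematic standardization on voluntary-testing status (illustrative only).
def standardized_prevalence(N, n_v, x_v, n_r, x_r):
    """N: enumerated population size; n_v volunteers, all tested, x_v positive;
    n_r randomly sampled non-volunteers tested, x_r positive."""
    p_vol = x_v / n_v                        # prevalence among the "captured"
    p_non = x_r / n_r                        # estimated prevalence among the rest
    return (n_v / N) * p_vol + ((N - n_v) / N) * p_non

# Heavy selection bias: volunteers test positive far more often than others.
print(standardized_prevalence(N=2000, n_v=400, x_v=80, n_r=100, x_r=5))  # 0.08
# A voluntary-testing-only estimate would report 80/400 = 0.20.
```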

We study fair multi-objective reinforcement learning, in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions of a vector-valued reward. Motivated by the fair resource allocation literature, we model this as an expected welfare maximization problem for some non-linear fair welfare function of the vector of long-term cumulative rewards. One canonical example of such a function is the Nash Social Welfare, or geometric mean, whose log transform is also known as the Proportional Fairness objective. We show that even approximate optimization of the expected Nash Social Welfare is computationally intractable, even in the tabular case. Nevertheless, we provide a novel adaptation of Q-learning that combines non-linear scalarized learning updates with non-stationary action selection to learn effective policies for optimizing nonlinear welfare functions. We show that our algorithm is provably convergent, and we demonstrate experimentally that our approach outperforms techniques based on linear scalarization, mixtures of optimal linear scalarizations, or stationary action selection for the Nash Social Welfare objective.
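
The welfare functions named above are simple to state; the sketch below (illustrative values only) computes the Nash Social Welfare as a geometric mean, its log transform as the proportional-fairness objective, and shows why a linearly scalarized criterion can be indifferent to a highly unbalanced reward vector that the Nash Social Welfare penalizes.

```python
# Nash Social Welfare vs. linear scalarization on two reward vectors.
import math

def nash_social_welfare(returns):
    return math.prod(returns) ** (1.0 / len(returns))    # geometric mean

def proportional_fairness(returns):
    return sum(math.log(r) for r in returns)             # log transform of NSW

def linear_scalarization(returns, weights=None):
    weights = weights or [1.0] * len(returns)
    return sum(w * r for w, r in zip(weights, returns))

balanced, skewed = [5.0, 5.0], [9.9, 0.1]
print(nash_social_welfare(balanced), nash_social_welfare(skewed))    # 5.0 vs ~0.99
print(linear_scalarization(balanced), linear_scalarization(skewed))  # 10.0 vs 10.0
```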

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
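
A short sketch of the re-weighting described above (the hyperparameter value and the normalization that makes the weights sum to the number of classes are common choices, shown here only for illustration): each class weight is the inverse of its effective number of samples $(1-\beta^{n})/(1-\beta)$.

```python
# Class-balanced weights from the effective number of samples.
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - beta ** n) / (1.0 - beta)   # effective number per class
    weights = 1.0 / effective_num                      # rarer classes weigh more
    return weights * len(n) / weights.sum()            # normalize to sum to #classes

# Long-tailed example: head class with 5000 samples, tail class with 10.
print(class_balanced_weights([5000, 500, 50, 10]))
```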
