亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.

相關內容

Undoing computations of a concurrent system is beneficial in many situations, e.g., in reversible debugging of multi-threaded programs and in recovery from errors due to optimistic execution in parallel discrete event simulation. A number of approaches have been proposed for how to reverse formal models of concurrent computation including process calculi such as CCS, languages like Erlang, prime event structures and occurrence nets. However it has not been settled what properties a reversible system should enjoy, nor how the various properties that have been suggested, such as the parabolic lemma and the causal-consistency property, are related. We contribute to a solution to these issues by using a generic labelled transition system equipped with a relation capturing whether transitions are independent to explore the implications between these properties. In particular, we show how they are derivable from a set of axioms. Our intention is that when establishing properties of some formalism it will be easier to verify the axioms rather than proving properties such as the parabolic lemma directly. We also introduce two new notions related to causal consistent reversibility, namely causal liveness and causal safety, stating, respectively, that an action can be undone if and only if it is independent from all the following ones. We show that both causal liveness and causal safety are derivable from our axioms.

We consider fair division of a set of indivisible goods among $n$ agents with additive valuations using the desirable fairness notion of maximin share (MMS). MMS is the most popular share-based notion, in which an agent finds an allocation fair to her if she receives goods worth at least her MMS value. An allocation is called MMS if all agents receive their MMS values. However, since MMS allocations do not always exist, the focus shifted to investigating its ordinal and multiplicative approximations. In the ordinal approximation, the goal is to show the existence of $1$-out-of-$d$ MMS allocations (for the smallest possible $d>n$). A series of works led to the state-of-the-art factor of $d=\lfloor 3n/2 \rfloor$ [HSSH21]. We show that $1$-out-of-$\lceil 4n/3\rceil$ MMS allocations always exist. In the multiplicative approximation, the goal is to show the existence of $\alpha$-MMS allocations (for the largest possible $\alpha < 1$) which guarantees each agent at least $\alpha$ times her MMS value. A series of works in the last decade led to the state-of-the-art factor of $\alpha = \frac{3}{4} + \frac{3}{3836}$ [AG23]. We introduce a general framework of $(\alpha, \beta, \gamma)$-MMS that guarantees $\alpha$ fraction of agents $\beta$ times their MMS values and the remaining $(1-\alpha)$ fraction of agents $\gamma$ times their MMS values. The $(\alpha, \beta, \gamma)$-MMS captures both ordinal and multiplicative approximations as its special cases. We show that $(2(1 -\beta)/\beta, \beta, 3/4)$-MMS allocations always exist. Furthermore, since we can choose the $2(1-\beta)/\beta$ fraction of agents arbitrarily in our algorithm, this implies (using $\beta=\sqrt{3}/2$) the existence of a randomized allocation that gives each agent at least 3/4 times her MMS value (ex-post) and at least $(17\sqrt{3} - 24)/4\sqrt{3} > 0.785$ times her MMS value in expectation (ex-ante).

The gyro-kinetic model is an approximation of the Vlasov-Maxwell system in a strongly magnetized magnetic field. We propose a new algorithm for solving it combining the Semi-Lagrangian (SL) method and the Arakawa (AKW) scheme with a time-integrator. Both methods are successfully used in practice for different kinds of applications, in our case, we combine them by first decomposing the problem into a fast (parallel) and a slow (perpendicular) dynamical system. The SL approach and the AKW scheme will be used to solve respectively the fast and the slow subsystems. Compared to the scheme in [1], where the entire model is solved using only the SL method, our goal is to replace the method used in the slow subsystem by the AKW scheme, in order to improve the conservation of the physical constants.

We study the query complexity of geodesically convex (g-convex) optimization on a manifold. To isolate the effect of that manifold's curvature, we primarily focus on hyperbolic spaces. In a variety of settings (smooth or not; strongly g-convex or not; high- or low-dimensional), known upper bounds worsen with curvature. It is natural to ask whether this is warranted, or an artifact. For many such settings, we propose a first set of lower bounds which indeed confirm that (negative) curvature is detrimental to complexity. To do so, we build on recent lower bounds (Hamilton and Moitra, 2021; Criscitiello and Boumal, 2022) for the particular case of smooth, strongly g-convex optimization. Using a number of techniques, we also secure lower bounds which capture dependence on condition number and optimality gap, which was not previously the case. We suspect these bounds are not optimal. We conjecture optimal ones, and support them with a matching lower bound for a class of algorithms which includes subgradient descent, and a lower bound for a related game. Lastly, to pinpoint the difficulty of proving lower bounds, we study how negative curvature influences (and sometimes obstructs) interpolation with g-convex functions.

We introduce a new class of numerical schemes which allow for low regularity approximations to the expectation $ \mathbb{E}(|u_{k}(\tau, v^{\eta})|^2)$, where $u_k$ denotes the $k$-th Fourier coefficient of the solution $u$ of the dispersive equation and $ v^{\eta}(x) $ the associated random initial data. This quantity plays an important role in physics, in particular in the study of wave turbulence where one needs to adopt a statistical approach in order to obtain deep insight into the generic long-time behaviour of solutions to dispersive equations. Our new class of schemes is based on Wick's theorem and Feynman diagrams together with a resonance based discretisation (see arXiv:2005.01649) set in a more general context: we introduce a novel combinatorial structure called paired decorated forests which are two decorated trees whose decorations on the leaves come in pair. The character of the scheme draws its inspiration from the treatment of singular stochastic partial differential equations via Regularity Structures. In contrast to classical approaches, we do not discretize the PDE itself, but rather its expectation. This allows us to heavily exploit the optimal resonance structure and underlying gain in regularity on the finite dimensional (discrete) level.

In this short note, we discuss the circumstances that can lead to a failure to observe the design order of discretization error convergence in accuracy verification when solving a time-dependent problem. In particular, we discuss the problem of failing to observe the design order of spatial accuracy with an extremely small time step. The same problem is encountered even if the time step is reduced with grid refinement. These can cause a serious problem because then one would wind up trying to find a coding error that does not exist. This short note clarifies the mechanism causing this failure and provides a guide for avoiding such pitfalls

The ability to envision future states is crucial to informed decision making while interacting with dynamic environments. With cameras providing a prevalent and information rich sensing modality, the problem of predicting future states from image sequences has garnered a lot of attention. Current state of the art methods typically train large parametric models for their predictions. Though often able to predict with accuracy, these models rely on the availability of large training datasets to converge to useful solutions. In this paper we focus on the problem of predicting future images of an image sequence from very little training data. To approach this problem, we use non-parametric models to take a probabilistic approach to image prediction. We generate probability distributions over sequentially predicted images and propagate uncertainty through time to generate a confidence metric for our predictions. Gaussian Processes are used for their data efficiency and ability to readily incorporate new training data online. We showcase our method by successfully predicting future frames of a smooth fluid simulation environment.

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.

The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and a slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-learning. Applying this strategy to Q-learning results in Self-correcting Q-learning. We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate. Empirically, it performs better than Double Q-learning in domains with rewards of high variance, and it even attains faster convergence than Q-learning in domains with rewards of zero or low variance. These advantages transfer to a Deep Q Network implementation that we call Self-correcting DQN and which outperforms regular DQN and Double DQN on several tasks in the Atari 2600 domain.

Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.

北京阿比特科技有限公司