
A dynamic mean field theory is developed for model-based Bayesian reinforcement learning in the large state space limit. In an analogy with the statistical physics of disordered systems, the transition probabilities are interpreted as couplings, and value functions as deterministic spins, and thus the sampled transition probabilities are considered to be quenched random variables. The results reveal that, under standard assumptions, the posterior over Q-values is asymptotically independent and Gaussian across state-action pairs, for infinite horizon problems. The finite horizon case exhibits the same behaviour for all state-action pairs at each time but has an additional correlation across time, for each state-action pair. The results also hold for policy evaluation. The Gaussian statistics can be computed from a set of coupled mean field equations derived from the Bellman equation, which we call dynamic mean field programming (DMFP). For Q-value iteration, approximate equations are obtained by appealing to extreme value theory, and closed form expressions are found in the independent and identically distributed case. The Lyapunov stability of these closed form equations is studied.
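As background for the Q-value iteration mentioned above, the following is a minimal sketch of the standard Bellman backup for one fixed (sampled) transition model. It is not the DMFP equations from the abstract and computes no posterior statistics; it only shows the point-estimate recursion that such statistics would describe. The function name `q_value_iteration`, the nested-list layout of `P` and `R`, and the toy MDP are assumptions made for illustration.

```python
def q_value_iteration(P, R, gamma, iters=200):
    """Iterate Q(s,a) = R[s][a] + gamma * sum_s' P[s][a][s'] * max_a' Q(s',a').

    P[s][a] is a list of next-state probabilities (hypothetical layout),
    R[s][a] is the immediate reward, and gamma is the discount factor.
    """
    n_states = len(P)
    n_actions = len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # Greedy state values under the current Q estimate.
        V = [max(row) for row in Q]
        # Synchronous Bellman backup for every state-action pair.
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q


# Toy two-state MDP (illustrative only): state 1 is absorbing with reward 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: both actions stay in state 1
]
R = [[0.0, 0.0], [1.0, 1.0]]
Q = q_value_iteration(P, R, gamma=0.9)
```

In a Bayesian setting one would presumably repeat this for many transition models drawn from the posterior and study the resulting distribution of `Q`; the abstract's claim concerns the limiting form of that distribution, which this sketch does not attempt to reproduce.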

Related content

This paper revisits soundness and completeness of proof systems for proving that sets of states in infinite-state labeled transition systems satisfy formulas in the modal mu-calculus. Our results rely on novel results in lattice theory, which give constructive characterizations of both greatest and least fixpoints of monotonic functions over complete lattices. We show how these results may be used to reconstruct the sound and complete tableau method for this problem due to Bradfield and Stirling. We also show how the flexibility of our lattice-theoretic basis simplifies reasoning about tableau-based proof strategies for alternative classes of systems. In particular, we extend the modal mu-calculus with timed modalities, and prove that the resulting tableau method is sound and complete for timed transition systems.

The scope of this paper is the analysis and approximation of an optimal control problem related to the Allen-Cahn equation. A tracking functional is minimized subject to the Allen-Cahn equation using distributed controls that satisfy point-wise control constraints. First and second order necessary and sufficient conditions are proved. The lowest order discontinuous Galerkin (in time) scheme is considered for the approximation of the control to state and adjoint state mappings. Under a suitable restriction on the maximum size of the temporal and spatial discretization parameters $k$, $h$ respectively in terms of the parameter $\epsilon$ that describes the thickness of the interface layer, a priori estimates are proved with constants depending polynomially upon $1/ \epsilon$. Unlike previous works for the uncontrolled Allen-Cahn problem, our approach does not rely on a construction of an approximation of the spectral estimate, and as a consequence our estimates are valid under low regularity assumptions imposed by the optimal control setting. These estimates are also valid in cases where the solution and its discrete approximation do not satisfy uniform space-time bounds independent of $\epsilon$. These estimates and a suitable localization technique, via the second order condition (see \cite{Arada-Casas-Troltzsch_2002,Casas-Mateos-Troltzsch_2005,Casas-Raymond_2006,Casas-Mateos-Raymond_2007}), allow us to prove error estimates for the difference between local optimal controls and their discrete approximation as well as between the associated state and adjoint state variables and their discrete approximations.

The purpose of this article is to study the convergence of a low order finite element approximation for a natural convection problem. We prove that the discretization based on P1 polynomials for every variable (velocity, pressure and temperature) is well-posed if used with a penalty term in the divergence equation, to compensate for the loss of an inf-sup condition. With mild assumptions on the pressure regularity, we recover convergence for the Navier-Stokes-Boussinesq system, provided the penalty term is chosen in accordance with the mesh size. We express conditions to obtain optimal order of convergence. We illustrate theoretical convergence results with extensive examples. The computational cost that can be saved by this approach is also assessed.

We present a novel approach for teaching logic and the metatheory of logic to students who have some experience with functional programming. We define concepts in logic as a series of functional programs in the language of the proof assistant Isabelle/HOL. This allows us to make notions which are often unclear in textbooks precise, to experiment with definitions by executing them, and to prove metatheoretical theorems in full detail. We have surveyed student perceptions of our teaching approach to determine its usefulness and found that students felt that our formalizations helped them understand concepts in logic, and that they experimented with them as a learning tool. However, the approach was not enough to make students feel confident in their abilities to design and implement their own formal systems. Further studies are needed to confirm and generalize the results of our survey, but our initial results seem promising.

We survey the field of algorithms and complexity for graph problems parameterized above or below guaranteed values, a research area which was pioneered by Venkatesh Raman. Those problems seek, for a given graph $G$, a solution whose value is at least $g(G)+k$ or at most $g(G)-k$, where $g(G)$ is a guarantee on the value that any solution on $G$ takes. The goal is to design algorithms which find such a solution in time whose complexity in $k$ is decoupled from that in the guarantee, or to rule out the existence of such algorithms by means of intractability results. We discuss a large number of algorithms and intractability results, and complement them by several open problems.

The random batch method provides an efficient algorithm for computing statistical properties of a canonical ensemble of interacting particles. In this work, we study the error estimates of the fully discrete random batch method, especially in terms of approximating the invariant distribution. Using a triangle inequality framework, we show that the long-time error of the method is $O(\sqrt{\tau} + e^{-\gamma t})$, where $\tau$ is the time step and $\gamma$ is the convergence rate which does not depend on the time step $\tau$ or the number of particles $N$. Our results also apply to the McKean-Vlasov process, which is the mean-field limit of the interacting particle system as the number of particles $N\rightarrow\infty$.

We study \textit{rescaled gradient dynamical systems} in a Hilbert space $\mathcal{H}$, where implicit discretization in a finite-dimensional Euclidean space leads to high-order methods for solving monotone equations (MEs). Our framework can be interpreted as a natural generalization of the celebrated dual extrapolation method~\citep{Nesterov-2007-Dual} from first order to high order via appeal to the regularization toolbox of optimization theory~\citep{Nesterov-2021-Implementable, Nesterov-2021-Inexact}. More specifically, we establish the existence and uniqueness of a global solution and analyze the convergence properties of solution trajectories. We also present discrete-time counterparts of our high-order continuous-time methods, and we show that the $p^{th}$-order method achieves an ergodic rate of $O(k^{-(p+1)/2})$ in terms of a restricted merit function and a pointwise rate of $O(k^{-p/2})$ in terms of a residue function. Under regularity conditions, the restarted version of $p^{th}$-order methods achieves local convergence with the order $p \geq 2$. Notably, our methods are \textit{optimal} since they have matched the lower bound established for solving the monotone equation problems under a standard linear span assumption~\citep{Lin-2022-Perseus}.

LU and Cholesky matrix factorization algorithms are core subroutines used to solve systems of linear equations (SLEs) encountered while solving an optimization problem. Standard factorization algorithms are highly efficient but remain susceptible to the accumulation of roundoff errors, which can lead solvers to return feasibility and optimality claims that are actually invalid. This paper introduces a novel approach for solving sequences of closely related SLEs encountered in nonlinear programming efficiently and without roundoff errors. Specifically, it introduces rank-one update algorithms for the roundoff-error-free (REF) factorization framework, a toolset built on integer-preserving arithmetic that has led to the development and implementation of fail-proof SLE solution subroutines for linear programming. The formal guarantees of the proposed algorithms are established through the derivation of theoretical insights. Their advantages are supported with computational experiments, which demonstrate upwards of 75x-improvements over exact factorization run-times on fully dense matrices with over one million entries. A significant advantage of the methodology is that the length of any coefficient calculated via the proposed algorithms is bounded polynomially in the size of the inputs without having to resort to greatest common divisor operations, which are required by and thereby hinder an efficient implementation of exact rational arithmetic approaches.

Subset-Sum is an NP-complete problem where one must decide if a multiset of $n$ integers contains a subset whose elements sum to a target value $m$. The best-known classical and quantum algorithms run in time $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{n/3})$, respectively, based on the well-known meet-in-the-middle technique. Here we introduce a novel classical dynamic-programming-based data structure with applications to Subset-Sum and a number of variants, including Equal-Sums (where one seeks two disjoint subsets with the same sum), 2-Subset-Sum (a relaxed version of Subset-Sum where each item in the input set can be used twice in the summation), and Shifted-Sums, a generalization of both of these variants, where one seeks two disjoint subsets whose sums differ by some specified value. Given any modulus $p$, our data structure can be constructed in time $O(n^2p)$, after which queries can be made in time $O(n^2)$ to the lists of subsets summing to any value modulo $p$. We use this data structure in combination with variable-time amplitude amplification and a new quantum pair finding algorithm, extending the quantum claw finding algorithm to the multiple solutions case, to give an $O(2^{0.504n})$ quantum algorithm for Shifted-Sums, an improvement on the best-known $O(2^{0.773n})$ classical running time. Incidentally, we obtain new $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{n/3})$ classical and quantum algorithms for Subset-Sum, not based on the seminal meet-in-the-middle method. We also study Pigeonhole Equal-Sums and Pigeonhole Modular Equal-Sums, where the existence of a solution is guaranteed by the pigeonhole principle. For the former problem, we give faster classical and quantum algorithms with running time $\tilde{O}(2^{n/2})$ and $\tilde{O}(2^{2n/5})$, respectively. For the more general modular problem, we give a classical algorithm that also runs in time $\tilde{O}(2^{n/2})$.
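For context, the meet-in-the-middle technique that the abstract contrasts its data structure with can be sketched as follows. This is the textbook classical approach, not the dynamic-programming data structure introduced in the paper; the function names `subset_sums` and `subset_sum_mitm` are chosen here for illustration.

```python
from bisect import bisect_left


def subset_sums(items):
    """Return the sums of all 2^len(items) subsets of items (including the empty one)."""
    sums = [0]
    for x in items:
        # Each existing sum either excludes x (kept) or includes x (appended).
        sums += [s + x for s in sums]
    return sums


def subset_sum_mitm(items, target):
    """Decide Subset-Sum by splitting the input into two halves.

    Enumerates about 2^(n/2) sums per half, sorts one side, and
    binary-searches it for the complement of each sum on the other side.
    """
    half = len(items) // 2
    left = subset_sums(items[:half])
    right = sorted(subset_sums(items[half:]))
    for s in left:
        need = target - s
        i = bisect_left(right, need)
        if i < len(right) and right[i] == need:
            return True
    return False


found = subset_sum_mitm([3, 34, 4, 12, 5, 2], 9)       # 4 + 5 = 9
not_found = subset_sum_mitm([3, 34, 4, 12, 5, 2], 30)  # no subset sums to 30
```

Sorting and searching lists of size $2^{n/2}$ gives the $\tilde{O}(2^{n/2})$ classical running time quoted above, at the cost of storing exponentially many partial sums.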

In this paper, we study a sequential decision making problem faced by e-commerce carriers related to when to send out a vehicle from the central depot to serve customer requests, and in which order to provide the service, under the assumption that the time at which parcels arrive at the depot is stochastic and dynamic. The objective is to maximize the number of parcels that can be delivered during the service hours. We propose two reinforcement learning approaches for solving this problem, one based on a policy function approximation (PFA) and the second on a value function approximation (VFA). Both methods are combined with a look-ahead strategy, in which future release dates are sampled in a Monte-Carlo fashion and a tailored batch approach is used to approximate the value of future states. Our PFA and VFA make good use of branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into PFA/VFA. In an empirical study based on 720 benchmark instances, we conduct a competitive analysis using upper bounds with perfect information and we show that PFA and VFA greatly outperform two alternative myopic approaches. Overall, PFA provides the best solutions, while VFA (which benefits from a two-stage stochastic optimization model) achieves a better tradeoff between solution quality and computing time.
