
Two-player (antagonistic) games on (possibly stochastic) graphs are a prevalent model in theoretical computer science, notably as a framework for reactive synthesis. Optimal strategies may require randomisation when dealing with inherently probabilistic goals, balancing multiple objectives, or in contexts of partial information. There is no unique way to define randomised strategies. For instance, one can use so-called mixed strategies or behavioural ones. In the most general settings, these two classes do not share the same expressiveness. A seminal result in game theory - Kuhn's theorem - asserts their equivalence in games of perfect recall. This result crucially relies on the possibility for strategies to use infinite memory, i.e., unlimited knowledge of all the past of a play. However, computer systems are finite in practice. Hence it is pertinent to restrict our attention to finite-memory strategies, defined as automata with outputs. Randomisation can be implemented in these in different ways: the initialisation, the outputs and the transitions can each, independently, be randomised or deterministic. Depending on which aspects are randomised, the expressiveness of the corresponding class of finite-memory strategies differs. In this work, we study two-player turn-based stochastic games and provide a complete taxonomy of the classes of finite-memory strategies obtained by varying which of the three aforementioned components are randomised. Our taxonomy holds both in settings of perfect and imperfect information.
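As a concrete reading of finite-memory strategies as automata with outputs, the following Python sketch lets each of the three components (initialisation, output, memory update) be given either as a point distribution (deterministic) or as a genuine probability distribution (randomised). The class and dictionary layout are illustrative assumptions, not the paper's formalisation.

```python
import random

class FiniteMemoryStrategy:
    """Sketch of a finite-memory strategy as an automaton with outputs.

    Each of the three components can independently be deterministic (a point
    distribution) or randomised, which is the axis along which the taxonomy
    in the abstract is organised. Names are illustrative only.
    """

    def __init__(self, init_dist, output_dist, update_dist):
        # init_dist: {memory_state: probability}
        # output_dist: {(memory_state, game_state): {action: probability}}
        # update_dist: {(memory_state, game_state, action): {memory_state: probability}}
        self.output_dist = output_dist
        self.update_dist = update_dist
        self.memory = self._sample(init_dist)

    @staticmethod
    def _sample(dist):
        outcomes, probs = zip(*dist.items())
        return random.choices(outcomes, weights=probs)[0]

    def act(self, game_state):
        action = self._sample(self.output_dist[(self.memory, game_state)])
        self.memory = self._sample(self.update_dist[(self.memory, game_state, action)])
        return action


# Example: deterministic initialisation and updates, randomised outputs,
# with a single memory state "m0" and a single game state "v".
strategy = FiniteMemoryStrategy(
    init_dist={"m0": 1.0},
    output_dist={("m0", "v"): {"a": 0.5, "b": 0.5}},
    update_dist={("m0", "v", "a"): {"m0": 1.0}, ("m0", "v", "b"): {"m0": 1.0}},
)
print(strategy.act("v"))
```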

Related content

Taxonomy is the practice and science of classification. Wikipedia categories illustrate a taxonomy, and a complete taxonomy of Wikipedia categories can be extracted by automatic means. As of 2009, it has been shown that manually constructed taxonomies, such as those of computational lexicons like WordNet, can be used to improve and restructure the Wikipedia category taxonomy. In a broader sense, taxonomy also applies to relationship schemes other than parent-child hierarchies, such as network structures. A taxonomy may then include a single child with multiple parents: for example, "car" might appear with both parents "vehicle" and "steel structure"; for some, however, this merely means that "car" is part of several different taxonomies. A taxonomy may also simply organise things into groups, or be an alphabetical list; in that case, however, the term vocabulary is more appropriate. In current usage within knowledge management, taxonomies are considered narrower than ontologies, since ontologies apply a larger variety of relation types. Mathematically, a hierarchical taxonomy is a tree structure of classifications for a given set of objects. At the top of this structure is a single classification, the root node, that applies to all objects. Nodes below this root are more specific classifications that apply to subsets of the total set of classified objects. Reasoning proceeds from the general to the more specific.
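As a small illustration of the mathematical picture above (a tree of classifications with a single root that applies to all objects, and reasoning proceeding from the general to the specific), here is a minimal, hypothetical Python sketch; the labels and method names are ours.

```python
class TaxonomyNode:
    """A node in a hierarchical taxonomy: a classification that applies to a
    subset of the objects covered by its parent."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def path_to(self, target, prefix=()):
        """Return the chain of classifications from this node down to `target`,
        i.e. the general-to-specific reasoning path, or None if absent."""
        path = prefix + (self.label,)
        if self.label == target:
            return path
        for child in self.children:
            found = child.path_to(target, path)
            if found:
                return found
        return None


# The root applies to every object; children are more specific classifications.
root = TaxonomyNode("vehicle", [
    TaxonomyNode("car", [TaxonomyNode("hatchback")]),
    TaxonomyNode("bicycle"),
])
print(root.path_to("hatchback"))  # ('vehicle', 'car', 'hatchback')
```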


Physical neural networks are promising candidates for next generation artificial intelligence hardware. In such architectures, neurons and connections are physically realized and do not leverage digital concepts, i.e., practically infinite signal-to-noise ratios. They are therefore prone to noise, and based on analytical derivations we here introduce connectivity topologies, ghost neurons, as well as pooling as noise-mitigation strategies. Finally, we demonstrate the effectiveness of the combined methods based on a fully trained neural network classifying the MNIST handwritten digits.
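The NumPy sketch below illustrates only the pooling idea mentioned above: averaging several redundant realisations of a noisy, physically implemented neuron shrinks the output noise. The tanh nonlinearity, noise level and number of copies are illustrative assumptions, not the paper's model.

```python
import numpy as np

def noisy_neuron(x, weights, noise_std, rng):
    """A physically realised neuron: weighted sum plus nonlinearity,
    corrupted by additive readout noise (finite signal-to-noise ratio)."""
    return np.tanh(weights @ x) + rng.normal(0.0, noise_std)

def pooled_output(x, weights, noise_std, copies, rng):
    """Pooling as a noise-mitigation strategy: average `copies` redundant
    realisations of the same neuron, shrinking the noise variance by 1/copies."""
    return np.mean([noisy_neuron(x, weights, noise_std, rng) for _ in range(copies)])

rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=8)
single = np.std([noisy_neuron(x, w, 0.1, rng) for _ in range(1000)])
pooled = np.std([pooled_output(x, w, 0.1, 16, rng) for _ in range(1000)])
print(single, pooled)  # the pooled spread is roughly single / 4
```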

Applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, they act under partial observability of the states, are ubiquitous. Partially observable RL can be notoriously difficult -- well-known information-theoretic results show that learning partially observable Markov decision processes (POMDPs) requires an exponential number of samples in the worst case. Yet, this does not rule out the existence of large subclasses of POMDPs over which learning is tractable. In this paper we identify such a subclass, which we call weakly revealing POMDPs. This family rules out the pathological instances of POMDPs where observations are uninformative to a degree that makes learning hard. We prove that for weakly revealing POMDPs, a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to guarantee polynomial sample complexity. To the best of our knowledge, this is the first provably sample-efficient result for learning from interactions in overcomplete POMDPs, where the number of latent states can be larger than the number of observations.
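One standard way to make "optimism combined with Maximum Likelihood Estimation" precise is to maintain a confidence set of models whose log-likelihood on the episodes collected so far is within a radius $\beta$ of the maximum, and then follow the policy that is optimal for the most optimistic model in that set. The notation below, including the radius $\beta$, is our own shorthand and not necessarily the paper's exact algorithm:
\[
\Theta_t \;=\; \Big\{\theta \;:\; \sum_{\tau<t}\log \mathbb{P}_\theta\big(o^\tau_{1:H}\mid \pi^\tau\big) \;\ge\; \max_{\theta'}\sum_{\tau<t}\log \mathbb{P}_{\theta'}\big(o^\tau_{1:H}\mid \pi^\tau\big)-\beta\Big\},
\qquad
\pi^t \;\in\; \arg\max_{\pi}\ \max_{\theta\in\Theta_t} V_\theta(\pi).
\]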

We consider statistical models arising from the common set of solutions to a sparse polynomial system with general coefficients. The maximum likelihood degree counts the number of critical points of the likelihood function restricted to the model. We prove the maximum likelihood degree of a sparse polynomial system is determined by its Newton polytopes and equals the mixed volume of a related Lagrange system of equations.
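To make the statement concrete, assume the usual algebraic-statistics log-likelihood $\ell_u(x)=\sum_i u_i\log x_i$ for data $u$ (this specific form is our assumption; the paper's setup may differ in details). The critical points of $\ell_u$ restricted to the model $\{f_1(x)=\dots=f_k(x)=0\}$ are the solutions of a Lagrange system
\[
f_1(x)=\dots=f_k(x)=0, \qquad \nabla\ell_u(x)=\sum_{j=1}^{k}\lambda_j\,\nabla f_j(x),
\]
and the maximum likelihood degree counts these solutions for generic data $u$; the theorem equates this count with the mixed volume of the Newton polytopes of such a Lagrange system.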

Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal that contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the state transition sample as the signal. Hence, we propose a scalable observation model based on fitting state transition functions of source tasks from only a small number of samples, which can generalize to any signals observed in the target task. Moreover, we extend the offline-mode BPR to the continual learning setting by expanding the scalable observation model in a plug-and-play fashion, which can avoid negative transfer when faced with new unknown tasks. Experimental results show that our method can consistently facilitate faster and more efficient policy transfer.
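The following is a minimal sketch of the per-transition belief update described above, assuming, purely for illustration, that each source task's fitted transition model predicts a Gaussian over the next state; the helper names and the two toy source tasks are hypothetical.

```python
import numpy as np

def gaussian_likelihood(s_next, mean, std):
    """Likelihood of the observed next state under one source task's fitted
    (here: Gaussian) state-transition model."""
    return np.exp(-0.5 * ((s_next - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def update_belief(belief, s, a, s_next, transition_models):
    """Belief update over source tasks from a single transition (s, a, s_next)
    instead of a full episodic return."""
    likelihoods = np.array([
        gaussian_likelihood(s_next, *model(s, a)) for model in transition_models
    ])
    posterior = belief * likelihoods
    return posterior / posterior.sum()

# Two hypothetical source tasks with different fitted dynamics s' = s +/- a.
models = [lambda s, a: (s + 1.0 * a, 0.1), lambda s, a: (s - 1.0 * a, 0.1)]
belief = np.array([0.5, 0.5])
belief = update_belief(belief, s=0.0, a=1.0, s_next=0.95, transition_models=models)
print(belief)  # mass shifts towards the first source task
```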

We study the problem of testing whether a function $f: \mathbb{R}^n \to \mathbb{R}$ is a polynomial of degree at most $d$ in the \emph{distribution-free} testing model. Here, the distance between functions is measured with respect to an unknown distribution $\mathcal{D}$ over $\mathbb{R}^n$ from which we can draw samples. In contrast to previous work, we do not assume that $\mathcal{D}$ has finite support. We design a tester that given query access to $f$, and sample access to $\mathcal{D}$, makes $(d/\varepsilon)^{O(1)}$ many queries to $f$, accepts with probability $1$ if $f$ is a polynomial of degree $d$, and rejects with probability at least $2/3$ if every degree-$d$ polynomial $P$ disagrees with $f$ on a set of mass at least $\varepsilon$ with respect to $\mathcal{D}$. Our result also holds under mild assumptions when we receive only a polynomial number of bits of precision for each query to $f$, or when $f$ can only be queried on rational points representable using a logarithmic number of bits. Along the way, we prove a new stability theorem for multivariate polynomials that may be of independent interest.
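For reference, the distance underlying this guarantee is measured with respect to the unknown sample distribution: writing
\[
\mathrm{dist}_{\mathcal{D}}(f,P) \;=\; \Pr_{x\sim\mathcal{D}}\big[f(x)\neq P(x)\big],
\]
the tester accepts with probability $1$ whenever $f$ has degree at most $d$, and rejects with probability at least $2/3$ whenever $\mathrm{dist}_{\mathcal{D}}(f,P)\ge\varepsilon$ for every degree-$d$ polynomial $P$, using $(d/\varepsilon)^{O(1)}$ queries to $f$ and samples from $\mathcal{D}$.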

The Mixture-of-Experts (MoE) technique can scale up the model size of Transformers with an affordable computational overhead. We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i.e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference. The routing fluctuation tends to harm sample efficiency because the same input updates different experts but only one is finally used. In this paper, we propose StableMoE with two training stages to address the routing fluctuation problem. In the first training stage, we learn a balanced and cohesive routing strategy and distill it into a lightweight router decoupled from the backbone model. In the second training stage, we utilize the distilled router to determine the token-to-expert assignment and freeze it for a stable routing strategy. We validate our method on language modeling and multilingual machine translation. The results show that StableMoE outperforms existing MoE methods in terms of both convergence speed and performance.
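The toy PyTorch sketch below mirrors the two stages described above: distil token-to-expert assignments into a lightweight router decoupled from the backbone, then freeze it so routing stays stable while experts are applied. The teacher assignments here are random placeholders standing in for the balanced routing strategy learned in stage one; dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_experts = 16, 4

# Stage 1 (sketch): distil a routing strategy into a lightweight standalone router.
router = nn.Linear(d_model, n_experts, bias=False)
optimizer = torch.optim.Adam(router.parameters(), lr=1e-2)
tokens = torch.randn(256, d_model)
teacher_assignment = torch.randint(0, n_experts, (256,))  # placeholder targets
for _ in range(100):
    loss = F.cross_entropy(router(tokens), teacher_assignment)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Stage 2 (sketch): freeze the distilled router so every token keeps a stable
# expert assignment; only the chosen expert is activated per token.
for p in router.parameters():
    p.requires_grad = False
experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
assignment = router(tokens).argmax(dim=-1)
output = torch.stack([experts[int(assignment[i])](tokens[i]) for i in range(len(tokens))])
print(output.shape)
```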

The success of large-scale models in recent years has increased the importance of statistical models with numerous parameters. Several studies have analyzed over-parameterized linear models with high-dimensional data that may not be sparse; however, existing results depend on the independent setting of samples. In this study, we analyze a linear regression model with dependent time series data under over-parameterization settings. We consider an estimator via interpolation and develop a theory for the excess risk of the estimator under multiple dependence types. This theory can treat infinite-dimensional data without sparsity and handle long-memory processes in a unified manner. Moreover, we bound the risk in our theory via the integrated covariance and nondegeneracy of autocorrelation matrices. The results show that the convergence rate of risks with short-memory processes is identical to that of cases with independent data, while long-memory processes slow the convergence rate. We also present several examples of specific dependent processes that can be applied to our setting.
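As a concrete instance of "an estimator via interpolation" (here the minimum-norm interpolator, one natural choice and our assumption), the NumPy sketch below fits more parameters than samples on AR(1) time-series covariates, a simple short-memory example of the dependence considered above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200  # over-parameterized: more parameters than samples

# Dependent time-series covariates: each coordinate follows an AR(1) process.
phi = 0.6
X = np.zeros((n, p))
X[0] = rng.normal(size=p)
for t in range(1, n):
    X[t] = phi * X[t - 1] + rng.normal(size=p)

theta_star = rng.normal(size=p) / np.sqrt(p)
y = X @ theta_star + 0.1 * rng.normal(size=n)

# Interpolating estimator: the minimum-norm solution of X theta = y.
theta_hat = np.linalg.pinv(X) @ y
print(np.allclose(X @ theta_hat, y))          # interpolates the training data
print(np.linalg.norm(theta_hat - theta_star))  # distance to the true parameter
```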

The minimum energy path (MEP) describes the mechanism of reaction, and the energy barrier along the path can be used to calculate the reaction rate in thermal systems. The nudged elastic band (NEB) method is one of the most commonly used schemes to compute MEPs numerically. It approximates an MEP by a discrete set of configuration images, where the discretization size determines both computational cost and accuracy of the simulations. In this paper, we consider a discrete MEP to be a stationary state of the NEB method and prove an optimal convergence rate of the discrete MEP with respect to the number of images. Numerical simulations for the transitions of several prototypical model systems are performed to support the theory.
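A self-contained NumPy sketch of the nudged elastic band iteration on a toy double-well potential: a discrete set of images between the two minima is relaxed under the true force perpendicular to the path tangent plus a spring force along it. The potential, spring constant and step size are illustrative choices, not those of the paper.

```python
import numpy as np

def V(p):
    """Toy double-well potential with minima at (-1, 0) and (1, 0)."""
    x, y = p
    return (x**2 - 1.0)**2 + 2.0 * y**2

def grad_V(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 4.0 * y])

def neb_step(images, k=1.0, step=0.01):
    """One NEB update: each interior image feels the true force perpendicular
    to the local path tangent plus a spring force along the tangent."""
    new = images.copy()
    for i in range(1, len(images) - 1):
        tau = images[i + 1] - images[i - 1]
        tau = tau / np.linalg.norm(tau)
        f_true = -grad_V(images[i])
        f_perp = f_true - np.dot(f_true, tau) * tau
        f_spring = k * (np.linalg.norm(images[i + 1] - images[i])
                        - np.linalg.norm(images[i] - images[i - 1])) * tau
        new[i] = images[i] + step * (f_perp + f_spring)
    return new

# Discretise the path by a fixed number of images between the two minima;
# the endpoints stay fixed while the interior images relax towards the MEP.
rng = np.random.default_rng(0)
n_images = 11
images = np.linspace([-1.0, 0.0], [1.0, 0.0], n_images)
images[1:-1, 1] += 0.3 * rng.standard_normal(n_images - 2)  # perturb off the MEP
for _ in range(3000):
    images = neb_step(images)
print(max(V(p) for p in images))  # estimated barrier; the exact saddle value is 1.0
```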

There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.
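A toy illustration of the tester half of a tester-learner pair: before trusting an agnostic learner that assumes Gaussian examples, check that a few low-degree empirical moments of the sample match those of a standard Gaussian. The moment choices and thresholds below are illustrative assumptions and not the tester analysed in the paper.

```python
import numpy as np

def gaussian_moment_tester(X, tol=0.1):
    """Toy tester: accept the sample only if its low-degree empirical moments
    are close to those of a standard Gaussian (mean 0, covariance I,
    fourth moment 3 per coordinate). Thresholds are illustrative only."""
    n, d = X.shape
    mean_ok = np.all(np.abs(X.mean(axis=0)) < tol)
    cov_ok = np.all(np.abs(np.cov(X.T, bias=True) - np.eye(d)) < tol)
    fourth_ok = np.all(np.abs((X**4).mean(axis=0) - 3.0) < 5 * tol)
    return mean_ok and cov_ok and fourth_ok

rng = np.random.default_rng(0)
print(gaussian_moment_tester(rng.standard_normal((20000, 3))))          # True
print(gaussian_moment_tester(rng.uniform(-1.0, 1.0, size=(20000, 3))))  # False
```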

We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'
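In symbols, with $\hat\mu_n=\tfrac1n\sum_{i=1}^n\delta_{X_i}$ the empirical measure of the exchangeable sequence and $M_n$ its law, the quantity bounded is the relative entropy
\[
D\!\left(P_{X_1,\dots,X_k}\ \middle\|\ \int \mu^{\otimes k}\,\mathrm{d}M_n(\mu)\right),
\]
which vanishes as $n\to\infty$ for fixed $k$ and a finite alphabet, recovering de Finetti's theorem in the limit. (The notation is ours; the explicit bound is given in the paper.)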
