
We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.
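
For concreteness, the realizability assumption referenced above is commonly stated as follows; the feature map $\varphi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ and the misspecification level $\varepsilon$ are the standard objects of this setting, while the exact norm bounds vary between papers:
\[
\text{for every policy } \pi \text{ and stage } h \in [H] \text{ there exists } \theta_h^{\pi} \in \mathbb{R}^d \text{ such that } \bigl|\, q_h^{\pi}(s,a) - \langle \varphi(s,a), \theta_h^{\pi} \rangle \bigr| \le \varepsilon \ \text{ for all } (s,a),
\]
with $\varepsilon = 0$ in the exactly realizable case and $\varepsilon > 0$ quantifying the misspecification error mentioned at the end of the abstract.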

Related content

We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q($\lambda$) from other existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q($\lambda$) and validate theoretical insights with tabular experiments. We show how distributional Q($\lambda$)-C51, a combination of distributional Q($\lambda$) with the C51 agent, exhibits promising results on deep RL benchmarks.
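
For reference, one common way to write the (non-distributional) off-policy Q($\lambda$) evaluation operator, which uses only the trace parameter $\lambda$ and no importance-sampling ratios, is, for a behavior policy $\mu$ generating the data and a target policy $\pi$,
\[
(\mathcal{T}^{Q(\lambda)} Q)(x,a) = Q(x,a) + \mathbb{E}_{\mu}\!\left[\sum_{t \ge 0} (\gamma\lambda)^{t}\Bigl(r_t + \gamma\,\mathbb{E}_{\pi}Q(x_{t+1},\cdot) - Q(x_t,a_t)\Bigr) \,\Big|\, x_0 = x,\, a_0 = a\right], \qquad \mathbb{E}_{\pi}Q(x,\cdot) := \sum_{b}\pi(b\mid x)\,Q(x,b).
\]
In the distributional variant, the correction terms inside the sum become differences of return distributions rather than scalars, which is where the signed measures mentioned above enter.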

We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity. Notably, P-MBED measures the complexity of the single-agent model class converted from the given mean-field model class, and potentially, can be exponentially lower than the MBED proposed by \citet{huang2023statistical}. We contribute a model elimination algorithm featuring a novel exploration strategy and establish sample complexity results polynomial w.r.t.~P-MBED. Crucially, our results reveal that, under the basic realizability and Lipschitz continuity assumptions, \emph{learning Nash Equilibrium in MFGs is no more statistically challenging than solving a logarithmic number of single-agent RL problems}. We further extend our results to Multi-Type MFGs, generalizing from conventional MFGs and involving multiple types of agents. This extension implies statistical tractability of a broader class of Markov Games through the efficacy of mean-field approximation. Finally, inspired by our theoretical algorithm, we present a heuristic approach with improved computational efficiency and empirically demonstrate its effectiveness.

An essential requirement of spanners in many applications is to be fault-tolerant: a $(1+\epsilon)$-spanner of a metric space is called (vertex) $f$-fault-tolerant ($f$-FT) if it remains a $(1+\epsilon)$-spanner (for the non-faulty points) when up to $f$ faulty points are removed from the spanner. Fault-tolerant (FT) spanners for Euclidean and doubling metrics have been extensively studied since the 90s. For low-dimensional Euclidean metrics, Czumaj and Zhao in SoCG'03 [CZ03] showed that the optimal guarantees $O(f n)$, $O(f)$ and $O(f^2)$ on the size, degree and lightness of $f$-FT spanners can be achieved via a greedy algorithm, which na\"{\i}vely runs in $O(n^3) \cdot 2^{O(f)}$ time. The question of whether the optimal bounds of [CZ03] can be achieved via a fast construction has remained elusive, with the lightness parameter being the bottleneck. Moreover, in the wider family of doubling metrics, it is not even clear whether there exists an $f$-FT spanner with lightness that depends solely on $f$ (even exponentially): all existing constructions have lightness $\Omega(\log n)$ since they are built on the net-tree spanner, which is induced by a hierarchical net-tree of lightness $\Omega(\log n)$. In this paper we settle in the affirmative these longstanding open questions. Specifically, we design a construction of $f$-FT spanners that is optimal with respect to all the involved parameters (size, degree, lightness and running time): For any $n$-point doubling metric, any $\epsilon > 0$, and any integer $1 \le f \le n-2$, our construction provides, within time $O(n \log n + f n)$, an $f$-FT $(1+\epsilon)$-spanner with size $O(f n)$, degree $O(f)$ and lightness $O(f^2)$.
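
Spelling out the definition used above: for an $n$-point metric $(P, \delta)$, a graph $H$ on $P$ is an $f$-FT $(1+\epsilon)$-spanner if for every set $F \subseteq P$ of at most $f$ faulty points,
\[
d_{H \setminus F}(u,v) \le (1+\epsilon)\,\delta(u,v) \quad \text{for all } u, v \in P \setminus F,
\]
where $d_{H \setminus F}$ denotes the shortest-path distance (with edge weights $\delta$) in $H$ after removing the points of $F$ together with their incident edges.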

In this paper, we utilize hyperspheres and regular $n$-simplexes and propose an approach to learning deep features equivariant under the transformations of $n$D reflections and rotations, encompassed by the powerful group O$(n)$. Namely, we propose O$(n)$-equivariant neurons with spherical decision surfaces that generalize to any dimension $n$, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points and propose an invariant operator based on the relation between two points and a sphere, which, as we show, turns out to be a Gram matrix. Using synthetic and real-world data in $n$D, we experimentally verify our theoretical contributions and find that our approach is superior to the competing methods for O$(n)$-equivariant benchmark datasets (classification and regression), demonstrating a favorable speed/performance trade-off.
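
As a quick sanity check on why a Gram matrix gives an O$(n)$-invariant quantity (a generic linear-algebra identity, not the paper's specific construction): stacking a point configuration as the columns of $X \in \mathbb{R}^{n \times N}$, for any $R \in \mathrm{O}(n)$
\[
(RX)^{\top}(RX) = X^{\top} R^{\top} R\, X = X^{\top} X,
\]
so all pairwise inner products are preserved under rotations and reflections. Equivariance of a feature map $\Phi$, in contrast, means $\Phi(RX) = \rho(R)\,\Phi(X)$ for a suitable action $\rho$ of O$(n)$ on the features; here $\Phi$ and $\rho$ are generic placeholders rather than the paper's notation.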

A tree search algorithm called successive cancellation ordered search (SCOS) is proposed for $\boldsymbol{G}_N$-coset codes that implements maximum-likelihood (ML) decoding with adaptive complexity for transmission over binary-input AWGN channels. Unlike bit-flip decoders, no outer code is needed to terminate decoding; therefore, SCOS also applies to $\boldsymbol{G}_N$-coset codes modified with dynamic frozen bits. The average complexity is close to that of successive cancellation (SC) decoding at practical frame error rates (FERs) for codes with a wide range of rates and lengths up to $512$ bits, which perform within $0.25$ dB of the random coding union bound and outperform Reed--Muller codes under ML decoding by up to $0.5$ dB. Simulations illustrate simultaneous gains for SCOS over SC-Fano, SC stack (SCS) and SC list (SCL) decoding in both FER and average complexity across various SNR regimes. SCOS is further extended by forcing it to look for candidates satisfying a threshold, thereby outperforming basic SCOS under complexity constraints. The modified SCOS enables strong error-detection capability without the need for an outer code. In particular, the $(128, 64)$ polarization-adjusted convolutional code under modified SCOS provides gains in overall and undetected FER compared to CRC-aided polar codes under SCL/dynamic SC flip decoding at high SNR.

Erd\H{o}s and West (Discrete Mathematics'85) considered the class of $n$ vertex intersection graphs which have a {\em $d$-dimensional} {\em $t$-representation}, that is, each vertex of a graph in the class has an associated set consisting of at most $t$ $d$-dimensional axis-parallel boxes. In particular, for a graph $G$ and for each $d \geq 1$, they consider $i_d(G)$ to be the minimum $t$ for which $G$ has such a representation. For fixed $t$ and $d$, they consider the class of $n$ vertex labeled graphs for which $i_d(G) \leq t$, and prove an upper bound of $(2nt+\frac{1}{2})d \log n - (n - \frac{1}{2})d \log(4\pi t)$ on the logarithm of the size of the class. In this work, for fixed $t$ and $d$, we consider the class of $n$ vertex unlabeled graphs which have a {\em $d$-dimensional $t$-representation}, denoted by $\mathcal{G}_{t,d}$. We address the problem of designing a succinct data structure for the class $\mathcal{G}_{t,d}$ in an attempt to generalize the relatively recent results on succinct data structures for interval graphs (Algorithmica'21). To this end, for each $n$ such that $td^2$ is in $o(n / \log n)$, we first prove a lower bound of $(2dt-1)n \log n - O(ndt \log \log n)$ bits on the size of any data structure for encoding an arbitrary graph that belongs to $\mathcal{G}_{t,d}$. We then present a $((2dt-1)n \log n + dt\log t + o(ndt \log n))$-bit data structure for $\mathcal{G}_{t,d}$ that supports navigational queries efficiently. Contrasting this data structure with our lower bound argument, we show that for each fixed $t$ and $d$, and for all $n \geq 0$ such that $td^2$ is in $o(n/\log n)$, our data structure for $\mathcal{G}_{t,d}$ is succinct. As a byproduct, we also obtain succinct data structures for graphs of bounded boxicity (denoted by $d$ and $t = 1$) and graphs of bounded interval number (denoted by $t$ and $d=1$) when $td^2$ is in $o(n/\log n)$.

The Maximum s-Bundle Problem (MBP) addresses the task of identifying a maximum s-bundle in a given graph. A graph G=(V, E) is called an s-bundle if its vertex connectivity is at least |V|-s, where the vertex connectivity equals the minimum number of vertices whose deletion yields a disconnected or trivial graph. MBP is NP-hard and holds relevance in numerous real-world scenarios that emphasize vertex connectivity. Exact algorithms for MBP mainly follow the branch-and-bound (BnB) framework, whose performance heavily depends on the quality of the upper bound on the cardinality of a maximum s-bundle and on the initial lower bound obtained with graph reduction. In this work, we introduce a novel Partition-based Upper Bound (PUB) that leverages the graph partitioning technique to achieve a tighter upper bound compared to existing ones. To increase the lower bound, we propose to perform short random walks on a clique to generate larger initial solutions. Then, we propose a new BnB algorithm that uses the initial lower bound and PUB in preprocessing for graph reduction, and uses PUB in the BnB search process for branch pruning. Extensive experiments with diverse s values demonstrate the significant progress of our algorithm over state-of-the-art BnB MBP algorithms. Moreover, our initial lower bound can also be generalized to other clique relaxation problems.
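
To make the role of the upper bound concrete, below is a minimal branch-and-bound sketch in Python. It is not the paper's algorithm: the pruning bound is just the trivial |C| + |P| (current solution plus remaining candidates), whereas PUB replaces it with a tighter partition-based bound, and the feasibility test simply calls networkx's node_connectivity.

```python
import networkx as nx

def is_s_bundle(G, nodes, s):
    """The induced subgraph H is an s-bundle if vertex-connectivity(H) >= |H| - s."""
    H = G.subgraph(nodes)
    n = H.number_of_nodes()
    return n <= 1 or nx.node_connectivity(H) >= n - s

def max_s_bundle(G, s):
    """Generic BnB: feasibility-checked extension plus upper-bound pruning."""
    best = []

    def bnb(current, candidates):
        nonlocal best
        if len(current) > len(best):
            best = list(current)
        # Prune: even adding every remaining candidate cannot beat the incumbent.
        # The paper's PUB would give a (much) smaller value here than len(candidates).
        if len(current) + len(candidates) <= len(best):
            return
        for i, v in enumerate(candidates):
            extended = current + [v]
            if is_s_bundle(G, extended, s):   # s-bundles are hereditary, so this is safe
                bnb(extended, candidates[i + 1:])

    bnb([], list(G.nodes))
    return best

# Example: in a 5-cycle the largest 2-bundle has 3 vertices (an induced path P3 has
# connectivity 1 >= 3 - 2, while any 4 vertices induce P4 with connectivity 1 < 4 - 2).
print(max_s_bundle(nx.cycle_graph(5), s=2))   # e.g. [0, 1, 2]
```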

This paper presents the first sub-10$\mu$W, sub-0.1% total harmonic distortion (THD) sinusoidal current generator (CG) integrated circuit (IC) that is capable of 20kHz output for bio-impedance (Bio-Z) sensing applications. To benefit from the ultra-low-power nature of near-threshold operation, a 9b pseudo-sine lookup table (LUT) is 3b $\Delta\Sigma$ modulated in the digital domain; thus, the linearity burden of the digital-to-analog converter (DAC) is avoided and only 1.29$\mu$W of logic power is consumed from a 0.5V supply at a 2.56MHz clock frequency. A half-period (HP) reset is introduced in the capacitive DAC, leading to an approximately 30dB reduction in in-band noise by avoiding the sampling of data-dependent glitches and attenuating the kT/C noise and the non-idealities of the reset switches (SW).
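
The digital part of this scheme (re-quantizing a fine sine LUT to a few bits while pushing the quantization noise out of band) can be illustrated with a short first-order error-feedback $\Delta\Sigma$ sketch in Python. The sample count, modulator order, and rounding details below are illustrative assumptions and not the paper's exact digital design; only the 9b/3b word lengths and the 2.56MHz/20kHz frequencies are taken from the abstract.

```python
import numpy as np

fs, f_out = 2.56e6, 20e3               # clock and output tone from the abstract
lut_bits, out_bits = 9, 3              # 9b pseudo-sine LUT requantized to 3b
step = 2 ** (lut_bits - out_bits)      # coarse quantizer step in fine-LUT units

n = np.arange(8192)                    # illustrative sample count
sine = np.round((2 ** lut_bits - 1) * (0.5 + 0.5 * np.sin(2 * np.pi * f_out / fs * n)))

err = 0.0
codes = np.empty(n.size, dtype=int)
for i, x in enumerate(sine):
    v = x + err                                            # add fed-back quantization error
    c = int(np.clip(np.round(v / step), 0, 2 ** out_bits - 1))
    err = v - c * step                                     # first-order (1 - z^-1) noise shaping
    codes[i] = c                                           # 3-bit code driving the DAC

# "codes" takes only 8 levels, yet its spectrum near 20kHz follows the 9b sine;
# the shaped quantization noise sits at high frequencies where it can be filtered out.
```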

For the numerical solution of Dirichlet-type boundary value problems associated with nonlinear fractional differential equations of order $\alpha \in (1,2)$ that use Caputo derivatives, we suggest employing shooting methods. In particular, we demonstrate that the so-called proportional secting technique for selecting the required initial values leads to numerical schemes that converge to high accuracy in a very small number of shooting iterations, and we provide an explanation of the analytical background for this favourable numerical behaviour.
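
For readers unfamiliar with shooting, the sketch below shows the classical (integer-order) version of the idea for a second-order two-point problem, with a plain secant update for the unknown initial slope. It is only an analogue: the paper's setting requires a Caputo fractional solver of order $\alpha \in (1,2)$ and the specific "proportional secting" update for the initial values, neither of which is reproduced here.

```python
import numpy as np
from scipy.integrate import solve_ivp

def shoot(f, a, b, T, s0=0.0, s1=1.0, tol=1e-10, max_iter=50):
    """Solve y'' = f(t, y, y'), y(0) = a, y(T) = b by shooting on the slope y'(0)."""
    def end_value(s):
        sol = solve_ivp(lambda t, y: [y[1], f(t, y[0], y[1])],
                        (0.0, T), [a, s], rtol=1e-10, atol=1e-12)
        return sol.y[0, -1]

    r0, r1 = end_value(s0) - b, end_value(s1) - b
    for _ in range(max_iter):
        s0, s1 = s1, s1 - r1 * (s1 - s0) / (r1 - r0)   # secant step on the boundary residual
        r0, r1 = r1, end_value(s1) - b
        if abs(r1) < tol:
            break
    return s1

# Example: y'' = -y on [0, pi/2] with y(0) = 0, y(pi/2) = 1; the exact missing slope is 1.
print(shoot(lambda t, y, yp: -y, a=0.0, b=1.0, T=np.pi / 2))
```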

Meta-reinforcement learning algorithms can enable robots to acquire new skills much more quickly, by leveraging prior experience to learn how to learn. However, much of the current research on meta-reinforcement learning focuses on task distributions that are very narrow. For example, a commonly used meta-reinforcement learning benchmark uses different running velocities for a simulated robot as different tasks. When policies are meta-trained on such narrow task distributions, they cannot possibly generalize to more quickly acquire entirely new tasks. Therefore, if the aim of these methods is to enable faster acquisition of entirely new behaviors, we must evaluate them on task distributions that are sufficiently broad to enable generalization to new behaviors. In this paper, we propose an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. Our aim is to make it possible to develop algorithms that generalize to accelerate the acquisition of entirely new, held-out tasks. We evaluate 6 state-of-the-art meta-reinforcement learning and multi-task learning algorithms on these tasks. Surprisingly, while each task and its variations (e.g., with different object positions) can be learned with reasonable success, these algorithms struggle to learn multiple tasks at the same time, even with as few as ten distinct training tasks. Our analysis and open-source environments pave the way for future research in multi-task learning and meta-learning that can enable meaningful generalization, thereby unlocking the full potential of these methods.
