
For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as $K \rightarrow \infty$. (P3) It does not require $\varepsilon_{\text{mis}}$ as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of $\varepsilon_{\text{tol}}$, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB algorithm, which Takemura et al. [2021] recently showed satisfies (P3) for contextual bandits. We also provide an intuitive interpretation of their result, which informs the design of our algorithm.

Related content

We establish estimates on the error made by the Deep Ritz Method for elliptic problems on the space $H^1(\Omega)$ with different boundary conditions. For Dirichlet boundary conditions, we estimate the error when the boundary values are approximately enforced through the boundary penalty method. Our results apply to arbitrary and in general nonlinear classes $V\subseteq H^1(\Omega)$ of ansatz functions and estimate the error as a function of the optimization accuracy, the approximation capabilities of the ansatz class, and, in the case of Dirichlet boundary values, the penalisation strength $\lambda$. For non-essential boundary conditions the error of the Ritz method decays with the same rate as the approximation rate of the ansatz classes. For essential boundary conditions, given an approximation rate of $r$ in $H^1(\Omega)$ and an approximation rate of $s$ in $L^2(\partial\Omega)$ of the ansatz classes, the optimal decay rate of the estimated error is $\min(s/2, r)$ and is achieved by choosing $\lambda_n\sim n^{s}$. We discuss the implications for ansatz classes which are given through ReLU networks and the relation to existing estimates for finite element functions.
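
As a rough illustration of the boundary penalty method mentioned above, consider the model problem $-\Delta u = f$ in $\Omega$ with $u = g$ on $\partial\Omega$. This model problem, the data $f$ and $g$, and the exact scaling of the penalty term are assumptions made here for illustration and are not taken from the abstract. A penalised Ritz energy of the usual form, minimised over the ansatz class $V$, would read:

```latex
% Sketch only: model Poisson problem with hypothetical data f (source) and g (boundary values).
% The Dirichlet condition u = g is not built into V; it is enforced approximately
% through the penalty term weighted by \lambda.
E_\lambda(u)
  = \frac{1}{2} \int_\Omega |\nabla u|^2 \, dx
  - \int_\Omega f \, u \, dx
  + \lambda \int_{\partial\Omega} (u - g)^2 \, ds ,
\qquad u \in V \subseteq H^1(\Omega).
```

In this form a larger $\lambda$ enforces the boundary values more strictly, which is why the abstract couples the choice $\lambda_n \sim n^{s}$ to the approximation rate on $\partial\Omega$.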

In this paper, a novel uplink communication for the transmissive reconfigurable metasurface (RMS) multi-antenna system with orthogonal frequency division multiple access (OFDMA) is investigated. Specifically, a transmissive RMS-based receiver equipped with a single receiving antenna is first proposed, and a far-near field channel model based on planar waves and spherical waves is given. Then, in order to maximize the system sum-rate of uplink communications, we formulate a joint optimization problem over subcarrier allocation, power allocation and RMS transmissive coefficient design. Due to the coupling of optimization variables, the optimization problem is non-convex, so it is challenging to solve it directly. In order to tackle this problem, the alternating optimization (AO) algorithm is used to decouple the optimization variables and divide the problem into two sub-problems to solve. First, the problem of joint subcarrier allocation and power allocation is solved via the Lagrangian dual decomposition method. Then, the RMS transmissive coefficient design can be obtained by applying difference-of-convex (DC) programming, successive convex approximation (SCA) and penalty function methods. Finally, the two sub-problems are iterated alternately until convergence is achieved. Numerical simulation results verify that the proposed algorithm has good convergence performance and can improve system sum-rate compared with other benchmark algorithms.

This paper studies the online correlated selection (OCS) problem. It was introduced by Fahrbach, Huang, Tao, and Zadimoghaddam (2020) to obtain the first edge-weighted online bipartite matching algorithm that breaks the $0.5$ barrier. Suppose that we receive a pair of elements in each round and immediately select one of them. Can we select with negative correlation to be more effective than independent random selections? Our contributions are threefold. For semi-OCS, which considers the probability that an element remains unselected after appearing in $k$ rounds, we give an optimal algorithm that minimizes this probability for all $k$. It leads to $0.536$-competitive unweighted and vertex-weighted online bipartite matching algorithms that randomize over only two options in each round, improving the $0.508$-competitive ratio by Fahrbach et al. (2020). Further, we develop the first multi-way semi-OCS that allows an arbitrary number of elements with arbitrary masses in each round. As an application, it rounds the Balance algorithm in unweighted and vertex-weighted online bipartite matching and is $0.593$-competitive. Finally, we study OCS, which further considers the probability that an element is unselected in an arbitrary subset of rounds. We prove that the optimal "level of negative correlation" is between $0.167$ and $0.25$, improving the previous bounds of $0.109$ and $1$ by Fahrbach et al. (2020). Our OCS gives a $0.519$-competitive edge-weighted online bipartite matching algorithm, improving the previous $0.508$-competitive ratio by Fahrbach et al. (2020).
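
To make the comparison with independent random selections concrete, the following minimal sketch computes the baseline only: the exact probability that a fixed element remains unselected after appearing in $k$ rounds when each round picks one of its two elements uniformly and independently. It is not the paper's algorithm, and the function name `unselected_probability_independent` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def unselected_probability_independent(k: int) -> Fraction:
    """Exact probability that a tracked element is never selected in k rounds,
    assuming each round independently picks one of its two elements uniformly."""
    # Encode each round as 1 if the tracked element is picked, 0 otherwise.
    outcomes = list(product((0, 1), repeat=k))
    # Count the outcomes in which the tracked element is never picked.
    never_picked = sum(1 for outcome in outcomes if not any(outcome))
    return Fraction(never_picked, len(outcomes))


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, unselected_probability_independent(k))
```

The baseline equals $2^{-k}$; selecting with negative correlation aims to make this probability smaller.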

Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than the $\widetilde{\mathcal{O}}(\epsilon^{-2})$ of the SGD-based algorithm. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations, and the second algorithm adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve a complexity of $\widetilde{\mathcal{O}}(\epsilon^{-1.5})$, which improves on the $\widetilde{\mathcal{O}}(\epsilon^{-2})$ complexity of existing algorithms. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications.

Decision trees are widely used classification and regression models because of their interpretability and good accuracy. Classical methods such as CART are based on greedy approaches, but growing attention has recently been devoted to optimal decision trees. We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al. (EJOR, vol. 284, 2020; COR, vol. 132, 2021) for (sparse) optimal randomized classification trees. Sparsity is important not only for feature selection but also to improve interpretability. We first consider alternative methods to sparsify such trees based on concave approximations of the $l_{0}$ ``norm''. Promising results are obtained on 24 datasets in comparison with $l_1$ and $l_{\infty}$ regularizations. Then, we derive bounds on the VC dimension of multivariate randomized classification trees. Finally, since training is computationally challenging for large datasets, we propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.

Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) feedback is observed with delay. This paper studies online learning in episodic Markov decision processes (MDPs) with unknown transitions, adversarially changing costs and unrestricted delayed feedback. That is, the costs and trajectory of episode $k$ are revealed to the learner only at the end of episode $k + d^k$, where the delays $d^k$ are neither identical nor bounded, and are chosen by an oblivious adversary. We present novel algorithms based on policy optimization that achieve near-optimal high-probability regret of $\sqrt{K + D}$ under full-information feedback, where $K$ is the number of episodes and $D = \sum_{k} d^k$ is the total delay. Under bandit feedback, we prove similar $\sqrt{K + D}$ regret assuming the costs are stochastic, and $(K + D)^{2/3}$ regret in the general case. We are the first to consider regret minimization in the important setting of MDPs with delayed feedback.

We study a variant of the classical $k$-median problem known as diversity-aware $k$-median (introduced by Thejaswi et al. 2021), where we are given a collection of facility subsets, and a solution must contain at least a specified number of facilities from each subset. We investigate the fixed-parameter tractability of this problem and show several hardness and inapproximability results, even when we allow exponential running time with respect to some parameters of the problem. Motivated by these results, we present a fixed-parameter approximation algorithm with approximation ratio $(1 + \frac{2}{e} +\epsilon)$, and argue that this ratio is essentially tight assuming the gap-exponential time hypothesis. We also present a simple, practical local-search algorithm that gives a bicriteria $(2k, 3+\epsilon)$ approximation with better running time bounds.

We prove that the number of unit distances among $n$ planar points is at most $1.94\cdot n^{4/3}$, improving on the previous best bound of $8n^{4/3}$. We also give better upper and lower bounds for several small values of $n$. Finally, we prove some variants of the crossing lemma and improve some constant factors.

The random order graph streaming model has received significant attention recently, with problems such as matching size estimation, component counting, and the evaluation of bounded degree constant query testable properties shown to admit surprisingly space-efficient algorithms. The main result of this paper is a space-efficient single-pass random order streaming algorithm for simulating nearly independent random walks that start at uniformly random vertices. We show that the distribution of $k$-step walks from $b$ vertices chosen uniformly at random can be approximated up to error $\varepsilon$ per walk using $(1/\varepsilon)^{O(k)} 2^{O(k^2)}\cdot b$ words of space with a single pass over a randomly ordered stream of edges, solving an open problem of Peng and Sohler [SODA '18]. Applications of our result include the estimation of the average return probability of the $k$-step walk (the trace of the $k^\text{th}$ power of the random walk matrix) as well as the estimation of PageRank. We complement our algorithm with a strong impossibility result for directed graphs.
