亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $\sigma^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $\Omega\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge1}} \right)$ on the minimax regret of any scheme, where $ \mathtt{SNR} := \frac{P}{\sigma^2}$, and $K$ and $T$ are the number of arms and time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound to a minor additive factor. $\mathtt{UE\text{-}UCB++}$ performs uniform exploration in its initial phases and then utilizes the {\em upper confidence bound }(UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE\text{-}UCB++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate mean estimates of the rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.

相關內容

We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points. The agents face the identical piecewise-stationary MAB problem. The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step. Our proposed solution, Restarted Bayesian Online Change Point Detection in Cooperative Upper Confidence Bound Algorithm (RBO-Coop-UCB), involves an efficient multi-agent UCB algorithm as its core enhanced with a Bayesian change point detector. We also develop a simple restart decision cooperation that improves decision-making. Theoretically, we establish that the expected group regret of RBO-Coop-UCB is upper bounded by $\mathcal{O}(KNM\log T + K\sqrt{MT\log T})$, where K is the number of agents, M is the number of arms, and T is the number of time steps. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed method outperforms the state-of-the-art algorithms.

Let $\Phi$ be a random $k$-CNF formula on $n$ variables and $m$ clauses, where each clause is a disjunction of $k$ literals chosen independently and uniformly. Our goal is to sample an approximately uniform solution of $\Phi$ (or equivalently, approximate the partition function of $\Phi$). Let $\alpha=m/n$ be the density. The previous best algorithm runs in time $n^{\mathsf{poly}(k,\alpha)}$ for any $\alpha\lesssim2^{k/300}$ [Galanis, Goldberg, Guo, and Yang, SIAM J. Comput.'21]. Our result significantly improves both bounds by providing an almost-linear time sampler for any $\alpha\lesssim2^{k/3}$. The density $\alpha$ captures the \emph{average degree} in the random formula. In the worst-case model with bounded \emph{maximum degree}, current best efficient sampler works up to degree bound $2^{k/5}$ [He, Wang, and Yin, FOCS'22 and SODA'23], which is, for the first time, superseded by its average-case counterpart due to our $2^{k/3}$ bound. Our result is the first progress towards establishing the intuition that the solvability of the average-case model (random $k$-CNF formula with bounded average degree) is better than the worst-case model (standard $k$-CNF formula with bounded maximal degree) in terms of sampling solutions.

We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional doubly stochasticity assumption, and only require the knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ in both sub-gaussian and sub-exponential environments, and a nearly optimal mean-gap independent regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.

We propose a new algorithm for the problem of recovering data that adheres to multiple, heterogeneous low-dimensional structures from linear observations. Focusing on data matrices that are simultaneously row-sparse and low-rank, we propose and analyze an iteratively reweighted least squares (IRLS) algorithm that is able to leverage both structures. In particular, it optimizes a combination of non-convex surrogates for row-sparsity and rank, a balancing of which is built into the algorithm. We prove locally quadratic convergence of the iterates to a simultaneously structured data matrix in a regime of minimal sample complexity (up to constants and a logarithmic factor), which is known to be impossible for a combination of convex surrogates. In experiments, we show that the IRLS method exhibits favorable empirical convergence, identifying simultaneously row-sparse and low-rank matrices from fewer measurements than state-of-the-art methods.

The design of codes for feedback-enabled communications has been a long-standing open problem. Recent research on non-linear, deep learning-based coding schemes have demonstrated significant improvements in communication reliability over linear codes, but are still vulnerable to the presence of forward and feedback noise over the channel. In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise. Our autoencoder-based architecture is designed to learn codes based on consecutive blocks of bits, which obtains de-noising advantages over bit-by-bit processing to help overcome the physical separation between the encoder and decoder over a noisy channel. Moreover, we develop a power control layer at the encoder to explicitly incorporate hardware constraints into the learning optimization, and prove that the resulting average power constraint is satisfied asymptotically. Numerical experiments demonstrate that our scheme outperforms state-of-the-art feedback codes by wide margins over practical forward and feedback noise regimes, and provide information-theoretic insights on the behavior of our non-linear codes. Moreover, we observe that, in a long blocklength regime, canonical error correction codes are still preferable to feedback codes when the feedback noise becomes high.

Estimation of signal-to-noise ratios and residual variances in high-dimensional linear models has various important applications including, e.g. heritability estimation in bioinformatics. One commonly used estimator, usually referred to as REML, is based on the likelihood of the random effects model, in which both the regression coefficients and the noise variables are respectively assumed to be i.i.d Gaussian random variables. In this paper, we aim to establish the consistency and asymptotic distribution of the REML estimator for the SNR, when the actual coefficient vector is fixed, and the actual noise is heteroscedastic and correlated, at the cost of assuming the entries of the design matrix are independent and skew-free. The asymptotic variance can be also consistently estimated when the noise is heteroscedastic but uncorrelated. Extensive numerical simulations illustrate our theoretical findings and also suggest some assumptions imposed in our theoretical results are likely relaxable.

In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde \Theta(T^{2/3})$ regret. However, in practice environments are often changing {\bf smoothly}, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-armed bandits problem where we assume that an arm's mean reward is a $\beta$-H\"older function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with $\tilde O(T^{3/5})$ regret for $\beta=2$. We complement this result by an $\Omg(T^{(\beta+1)/(2\beta+1)})$ lower bound for any integer $\beta\ge 1$, which matches our upper bound for $\beta=2$.

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit gradual drifting patterns, the weighted strategy is commonly adopted in real-world applications. However, previous theoretical studies show that its analysis is more involved and the algorithms are either computationally less efficient or statistically suboptimal. This paper revisits the weighted strategy for non-stationary parametric bandits. In linear bandits (LB), we discover that this undesirable feature is due to an inadequate regret analysis, which results in an overly complex algorithm design. We propose a refined analysis framework, which simplifies the derivation and importantly produces a simpler weight-based algorithm that is as efficient as window/restart-based algorithms while retaining the same regret as previous studies. Furthermore, our new framework can be used to improve regret bounds of other parametric bandits, including Generalized Linear Bandits (GLB) and Self-Concordant Bandits (SCB). For example, we develop a simple weighted GLB algorithm with an $\widetilde{O}(k_\mu^{\frac{5}{4}} c_\mu^{-\frac{3}{4}} d^{\frac{3}{4}} P_T^{\frac{1}{4}}T^{\frac{3}{4}})$ regret, improving the $\widetilde{O}(k_\mu^{2} c_\mu^{-1}d^{\frac{9}{10}} P_T^{\frac{1}{5}}T^{\frac{4}{5}})$ bound in prior work, where $k_\mu$ and $c_\mu$ characterize the reward model's nonlinearity, $P_T$ measures the non-stationarity, $d$ and $T$ denote the dimension and time horizon.

ISAC is recognized as a promising technology for the next-generation wireless networks, which provides significant performance gains over individual S&C systems via the shared use of wireless resources. The characterization of the S&C performance tradeoff is at the core of the theoretical foundation of ISAC. In this paper, we consider a point-to-point ISAC model under vector Gaussian channels, and propose to use the CRB-rate region as a basic tool for depicting the fundamental S&C tradeoff. In particular, we consider the scenario where a unified ISAC waveform is emitted from a dual-functional ISAC Tx, which simultaneously performs S&C tasks with a communication Rx and a sensing Rx. In order to perform both S&C tasks, the ISAC waveform is required to be random to convey communication information, with realizations being perfectly known at both the ISAC Tx and the sensing Rx as a reference sensing signal as in typical radar systems. As the main contribution of this paper, we characterize the S&C performance at the two corner points of the CRB-rate region, namely, $P_{SC}$ indicating the max. achievable rate constrained by the min. CRB, and $P_{CS}$ indicating the min. achievable CRB constrained by the max. rate. In particular, we derive the high-SNR capacity at $P_{SC}$, and provide lower and upper bounds for the sensing CRB at $P_{CS}$. We show that these two points can be achieved by the conventional Gaussian signaling and a novel strategy relying on the uniform distribution over the Stiefel manifold, respectively. Based on the above-mentioned analysis, we provide an outer bound and various inner bounds for the achievable CRB-rate regions. Our main results reveal a two-fold tradeoff in ISAC systems, consisting of the subspace tradeoff (ST) and the deterministic-random tradeoff (DRT) that depend on the resource allocation and data modulation schemes employed for S&C, respectively.

A parametric class of trust-region algorithms for unconstrained nonconvex optimization is considered where the value of the objective function is never computed. The class contains a deterministic version of the first-order Adagrad method typically used for minimization of noisy function, but also allows the use of (possibly approximate) second-order information when available. The rate of convergence of methods in the class is analyzed and is shown to be identical to that known for first-order optimization methods using both function and gradients values, recovering existing results for purely-first order variants and improving the explicit dependence on problem dimension. This rate is shown to be essentially sharp. A new class of methods is also presented, for which a slightly worse and essentially sharp complexity result holds. Limited numerical experiments show that the new methods' performance may be comparable to that of standard steepest descent, despite using significantly less information, and that this performance is relatively insensitive to noise.

北京阿比特科技有限公司