
We consider a wireless uplink network consisting of multiple end devices and an access point (AP). Each device monitors a physical process with stochastic arrival of status updates and sends these updates to the AP over a shared channel. The AP aims to schedule the transmissions of these devices to optimize the network-wide information freshness, quantified by the Age of Information (AoI) metric. Due to the stochastic arrival of the status updates at the devices, the AP only has partial observations of system times of the latest status updates at the devices when making scheduling decisions. We formulate such a decision-making problem as a belief Markov Decision Process (belief-MDP). The belief-MDP in its original form is difficult to solve as the dimension of its states can go to infinity and its belief space is uncountable. By leveraging the properties of the status update arrival (i.e., Bernoulli) processes, we manage to simplify the feasible states of the belief-MDP to two-dimensional vectors. Built on that, we devise a low-complexity scheduling policy. We derive upper bounds for the AoI performance of the low-complexity policy and analyze the performance guarantee by comparing its performance with a universal lower bound. Numerical results validate our analyses.
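To make the AoI metric concrete, the following is a minimal single-device sketch, not the paper's belief-MDP or its scheduling policy. It assumes slotted time, that an update arriving in a slot replaces any older one at the device, that a scheduled transmission always succeeds, and that the device buffer is empty after delivery. The function name `aoi_trace` and these modelling choices are illustrative assumptions.

```python
def aoi_trace(arrivals, scheduled, initial_age=1):
    """Return the AoI at the AP at the end of each slot (toy model).

    arrivals[t]  -- truthy if a new status update arrives at the device in slot t
    scheduled[t] -- truthy if the AP schedules the device in slot t
    Assumptions (hypothetical): reliable channel, one-slot transmission,
    and an empty device buffer after a delivery.
    """
    age = initial_age
    system_time = None  # slots since the held update was generated; None if no update
    trace = []
    for arrived, sched in zip(arrivals, scheduled):
        if arrived:
            system_time = 0  # the fresh update replaces any older one
        if sched and system_time is not None:
            # Delivery: the AoI drops to the age of the delivered update.
            age = system_time + 1
            system_time = None
        else:
            # No delivery: both the AoI and the held update grow older.
            age += 1
            if system_time is not None:
                system_time += 1
        trace.append(age)
    return trace


arrivals = [1, 0, 0, 1, 0]
scheduled = [0, 1, 0, 0, 1]
trace = aoi_trace(arrivals, scheduled)
average_aoi = sum(trace) / len(trace)
```

In this toy model the AP cannot tell from `scheduled` alone whether the device holds an update, which is the partial-observability issue the abstract describes.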

Related content

April 20, 2022

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.
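The two-time-scale idea can be illustrated on a toy problem. This is a simplified sketch, not the paper's algorithm: it uses i.i.d. Gaussian noise instead of Markovian samples, a scalar quadratic objective f(x) = 0.5 (x - 3)^2, and step-size exponents (0.6 and 0.9) chosen only for illustration. The names `two_time_scale_sgd` and `noisy_grad` are hypothetical.

```python
import random


def two_time_scale_sgd(grad_sample, x0, steps=20000, seed=0):
    """Toy two-time-scale stochastic gradient iteration.

    y tracks the gradient with a larger ("fast") step size, and x follows
    the tracked gradient with a smaller ("slow") step size.
    """
    rng = random.Random(seed)
    x, y = x0, 0.0
    for k in range(1, steps + 1):
        beta = 1.0 / k ** 0.6   # fast step size (assumed exponent)
        alpha = 1.0 / k ** 0.9  # slow step size (assumed exponent)
        g = grad_sample(x, rng)  # noisy gradient observed at the current x
        y = y + beta * (g - y)   # fast scale: running estimate of the gradient
        x = x - alpha * y        # slow scale: descent step along the estimate
    return x, y


def noisy_grad(x, rng):
    # Gradient of f(x) = 0.5 * (x - 3)^2 plus unit-variance Gaussian noise.
    return (x - 3.0) + rng.gauss(0.0, 1.0)


x_final, grad_estimate = two_time_scale_sgd(noisy_grad, x0=0.0)
```

Here `x_final` should end up close to the minimizer 3, and `grad_estimate` close to 0, because the slow iterate sees an averaged gradient rather than a single noisy sample.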

This paper revisits the problem of sampling and transmitting status updates through a channel with random delay under a sampling frequency constraint \cite{sun_17_tit}. We use the Age of Information (AoI) to characterize the freshness of the status information at the receiver. The goal is to design a sampling policy that minimizes the average AoI when the delay statistics are unknown. We reformulate the problem as the optimization of a renewal-reward process and propose an online sampling strategy based on the Robbins-Monro algorithm. We prove that the proposed algorithm satisfies the sampling frequency constraint. Moreover, when the transmission delay is bounded and its distribution is absolutely continuous, the average AoI obtained by the proposed algorithm converges with probability 1 to the minimum AoI as the number of samples $K$ goes to infinity. We show that the optimality gap decays at rate $\mathcal{O}\left(\ln K/K\right)$ and that the proposed algorithm is minimax rate optimal. Simulation results validate the performance of the proposed algorithm.
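For readers unfamiliar with the Robbins-Monro scheme mentioned above, here is a generic root-finding sketch. It is not the paper's sampling policy: the target equation (theta - E[X] = 0 with X uniform on [0, 2]), the step size 1/k, and the names `robbins_monro` and `noisy_h` are assumptions chosen for illustration.

```python
import random


def robbins_monro(noisy_h, theta0, num_steps, seed=0):
    """Generic Robbins-Monro iteration for solving E[h(theta, X)] = 0.

    Uses the classical step size 1/k and one noisy observation per step.
    """
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, num_steps + 1):
        theta = theta - (1.0 / k) * noisy_h(theta, rng)
    return theta


def noisy_h(theta, rng):
    # One noisy observation of h(theta) = theta - E[X], with X ~ Uniform(0, 2).
    return theta - rng.uniform(0.0, 2.0)


estimate = robbins_monro(noisy_h, theta0=5.0, num_steps=10000)
```

With this particular `noisy_h` and step size, the iteration reduces to a running sample mean, so `estimate` should be close to the root 1 without the distribution of X ever being supplied to the algorithm.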

Advancements in peer-to-peer wireless power transfer (P2P-WPT) have enabled portable and mobile devices to replenish their batteries wirelessly by interacting directly with other nearby devices. Existing works unrealistically assume that users exchange energy with any other user and at every such opportunity. However, due to the users' mobility, inter-node meetings in such opportunistic mobile networks vary, and P2P energy exchange in such scenarios remains uncertain. Additionally, the social interests and interactions of the users influence their mobility as well as the energy exchange between them. Existing P2P-WPT methods do not consider the joint problem of energy exchange under the users' inevitable mobility and the influence of sociality on that mobility. Because they compute with imprecise information, these works achieve energy balance at a slower rate, and the balance is impaired by energy loss for the crowd. Motivated by this problem, we present a wireless crowd charging method, MoSaBa, which leverages mobility prediction and social information for improved energy balancing. MoSaBa incorporates two dimensions of social information, namely social context and social relationships, as additional features for predicting contact opportunities. In this method, we explore different pairs of peers such that energy balancing is achieved at a faster rate and the energy balance quality improves in terms of maintaining low energy loss for the crowd. We justify the peer selection method in MoSaBa through a detailed performance evaluation. Compared to the existing state of the art, the proposed method achieves better performance trade-offs between energy efficiency, energy balance quality, and convergence time.

Applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, they act under partial observability of the states, are ubiquitous. Partially observable RL can be notoriously difficult -- well-known information-theoretic results show that learning partially observable Markov decision processes (POMDPs) requires an exponential number of samples in the worst case. Yet, this does not rule out the existence of large subclasses of POMDPs over which learning is tractable. In this paper we identify such a subclass, which we call weakly revealing POMDPs. This family rules out the pathological instances of POMDPs where observations are uninformative to a degree that makes learning hard. We prove that for weakly revealing POMDPs, a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to guarantee polynomial sample complexity. To the best of our knowledge, this is the first provably sample-efficient result for learning from interactions in overcomplete POMDPs, where the number of latent states can be larger than the number of observations.

We extend the Deep Galerkin Method (DGM) introduced in Sirignano and Spiliopoulos (2018) to solve a number of partial differential equations (PDEs) that arise in the context of optimal stochastic control and mean field games. First, we consider PDEs where the function is constrained to be positive and integrate to unity, as is the case with Fokker-Planck equations. Our approach involves reparameterizing the solution as the exponential of a neural network, appropriately normalized to ensure both requirements are satisfied. This then gives rise to a nonlinear partial integro-differential equation (PIDE) where the integral appearing in the equation is handled by a novel application of importance sampling. Secondly, we tackle a number of Hamilton-Jacobi-Bellman (HJB) equations that appear in stochastic optimal control problems. The key contribution is that these equations are approached in their unsimplified primal form, which includes an optimization problem as part of the equation. We extend the DGM algorithm to solve for the value function and the optimal control simultaneously by characterizing both as deep neural networks. Training the networks is performed by taking alternating stochastic gradient descent steps for the two functions, a technique inspired by policy improvement algorithms (PIA).

We study the problem of testing whether a function $f: \mathbb{R}^n \to \mathbb{R}$ is a polynomial of degree at most $d$ in the \emph{distribution-free} testing model. Here, the distance between functions is measured with respect to an unknown distribution $\mathcal{D}$ over $\mathbb{R}^n$ from which we can draw samples. In contrast to previous work, we do not assume that $\mathcal{D}$ has finite support. We design a tester that given query access to $f$, and sample access to $\mathcal{D}$, makes $(d/\varepsilon)^{O(1)}$ many queries to $f$, accepts with probability $1$ if $f$ is a polynomial of degree $d$, and rejects with probability at least $2/3$ if every degree-$d$ polynomial $P$ disagrees with $f$ on a set of mass at least $\varepsilon$ with respect to $\mathcal{D}$. Our result also holds under mild assumptions when we receive only a polynomial number of bits of precision for each query to $f$, or when $f$ can only be queried on rational points representable using a logarithmic number of bits. Along the way, we prove a new stability theorem for multivariate polynomials that may be of independent interest.

The success of large-scale models in recent years has increased the importance of statistical models with numerous parameters. Several studies have analyzed over-parameterized linear models with high-dimensional data that may not be sparse; however, existing results depend on the independent setting of samples. In this study, we analyze a linear regression model with dependent time series data under over-parameterization settings. We consider an estimator via interpolation and develop a theory for the excess risk of the estimator under multiple dependence types. This theory can treat infinite-dimensional data without sparsity and handle long-memory processes in a unified manner. Moreover, we bound the risk in our theory via the integrated covariance and nondegeneracy of autocorrelation matrices. The results show that the convergence rate of risks with short-memory processes is identical to that of cases with independent data, while long-memory processes slow the convergence rate. We also present several examples of specific dependent processes that can be applied to our setting.

Split learning (SL) is a collaborative learning framework that trains an artificial intelligence (AI) model between a device and an edge server by splitting the AI model into a device-side model and a server-side model at a cut layer. The existing SL approach conducts the training process sequentially across devices, which incurs significant training latency, especially when the number of devices is large. In this paper, we design a novel SL scheme to reduce the training latency, named Cluster-based Parallel SL (CPSL), which conducts model training in a "first-parallel-then-sequential" manner. Specifically, CPSL partitions devices into several clusters, trains the device-side models within each cluster in parallel and aggregates them, and then sequentially trains the whole AI model across clusters, thereby parallelizing the training process and reducing training latency. Furthermore, we propose a resource management algorithm to minimize the training latency of CPSL considering device heterogeneity and network dynamics in wireless networks. This is achieved by stochastically optimizing the cut layer selection, real-time device clustering, and radio spectrum allocation. The proposed two-timescale algorithm jointly makes the cut layer selection decision on a large timescale and the device clustering and radio spectrum allocation decisions on a small timescale. Extensive simulation results on non-independent and identically distributed data demonstrate that the proposed solutions can greatly reduce the training latency compared with existing SL benchmarks, while adapting to network dynamics.

We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'

We present a pipelined multiplier with reduced activity and minimized interconnect based on online digit-serial arithmetic. The working precision is truncated such that $p<n$ bits are used to compute an $n$-bit product, resulting in significant savings in area and power. The digit slices follow a variable precision according to the input, increasing up to $p$ and then decreasing according to the error profile. Pipelining is used to achieve high throughput and low latency, which is desirable for compute-intensive inner products. Synthesis results of the proposed designs are presented and compared with the non-pipelined online multiplier, the pipelined online multiplier with full working precision, and conventional serial-parallel and array multipliers. For $8$-, $16$-, $24$-, and $32$-bit precision, the proposed low-power pipelined design shows up to $38\%$ and $44\%$ reduction in power and area, respectively, compared to the pipelined online multiplier without working precision truncation.
