亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

This paper considers a two-user non-orthogonal multiple access (NOMA) based infrastructure-to-vehicle (I2V) network, where one user requires reliable safety-critical data transmission and the other pursues high-capacity services. Leveraging only slow fading of channel state information, we aim to maximize the expected sum throughput of the capacity hungry user subject to a constraint on the payload delivery success probability of the reliability sensitive user, by jointly optimizing the transmit powers, target rates, and decoding order. We introduce a dual variable and formulate the optimization as an unconstrained single-objective sequential decision problem. Then, we design a dynamic programming based algorithm to derive the optimal policy that maximizes the Lagrangian. Afterwards, a bisection search based method is proposed to find the optimal dual variable. The proposed strategy is shown by numerical results to be superior to the baseline approaches from the perspectives of expected return, performance region, and objective value.

相關內容

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying Markov random processes parameterized by the underlying optimization variable. These time-varying samples make gradient directions in our update biased and dependent, which can potentially lead to the divergence of the iterates. In our two-time-scale approach, one scale is to estimate the true gradient from these samples, which is then used to update the estimate of the optimal solution. While these two iterates are implemented simultaneously, the former is updated "faster" (using bigger step sizes) than the latter (using smaller step sizes). Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale stochastic gradient method. In particular, we provide explicit formulas for the convergence rates of this method under different structural assumptions, namely, strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. We apply our framework to two problems in control and reinforcement learning. First, we look at the standard online actor-critic algorithm over finite state and action spaces and derive a convergence rate of O(k^(-2/5)), which recovers the best known rate derived specifically for this problem. Second, we study an online actor-critic algorithm for the linear-quadratic regulator and show that a convergence rate of O(k^(-2/3)) is achieved. This is the first time such a result is known in the literature. Finally, we support our theoretical analysis with numerical simulations where the convergence rates are visualized.

This paper investigates a new downlink nonorthogonal multiple access (NOMA) system, where a multiantenna unmanned aerial vehicle (UAV) is powered by wireless power transfer (WPT) and serves as the base station for multiple pairs of ground users (GUs) running NOMA in each pair. An energy efficiency (EE) maximization problem is formulated to jointly optimize the WPT time and the placement for the UAV, and the allocation of the UAV's transmit power between different NOMA user pairs and within each pair. To efficiently solve this nonconvex problem, we decompose the problem into three subproblems using block coordinate descent. For the subproblem of intra-pair power allocation within each NOMA user pair, we construct a supermodular game with confirmed convergence to a Nash equilibrium. Given the intra-pair power allocation, successive convex approximation is applied to convexify and solve the subproblem of WPT time allocation and inter-pair power allocation between the user pairs. Finally, we solve the subproblem of UAV placement by using the Lagrange multiplier method. Simulations show that our approach can substantially outperform its alternatives that do not use NOMA and WPT techniques or that do not optimize the UAV location.

We introduce a new constrained optimization method for policy gradient reinforcement learning, which uses two trust regions to regulate each policy update. In addition to using the proximity of one single old policy as the first trust region as done by prior works, we propose to form a second trust region through the construction of another virtual policy that represents a wide range of past policies. We then enforce the new policy to stay closer to the virtual policy, which is beneficial in case the old policy performs badly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during the optimization process. Our proposed method, dubbed as Memory-Constrained Policy Optimization (MCPO), is examined on a diverse suite of environments including robotic locomotion control, navigation with sparse rewards and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.

Given a set $P$ of $n$ points in the plane, we consider the problem of computing the number of points of $P$ in a query unit disk (i.e., all query disks have the same radius). We show that the main techniques for simplex range searching in the plane can be adapted to this problem. For example, by adapting Matou\v{s}ek's results, we can build a data structure of $O(n)$ space so that each query can be answered in $O(\sqrt{n})$ time. Our techniques lead to improvements for several other classical problems, such as batched range searching, counting/reporting intersecting pairs of unit circles, distance selection, discrete 2-center, etc. For example, given a set of $n$ unit disks and a set of $n$ points in the plane, the batched range searching problem is to compute for each disk the number of points in it. Previous work [Katz and Sharir, 1997] solved the problem in $O(n^{4/3}\log n)$ time while our new algorithm runs in $O(n^{4/3})$ time.

Molecular communication has a key role to play in future medical applications, including detecting, analyzing, and addressing infectious disease outbreaks. Overcoming inter-symbol interference (ISI) is one of the key challenges in the design of molecular communication systems. In this paper, we propose to optimize the detection interval to minimize the impact of ISI while ensuring the accurate detection of the transmitted information symbol, which is suitable for the absorbing and passive receivers. For tractability, based on the signal-to-interference difference (SID) and signal-to-interference-and-noise amplitude ratio (SINAR), we propose a modified-SINAR (mSINAR) to measure the bit error rate (BER) performance for the molecular communication system with a variable detection interval. Besides, we derive the optimal detection interval in closed form. Using simulation results, we show that the BER performance of our proposed mSINAR scheme is superior to the competing schemes, and achieves similar performance to optimal intervals found by the exhaustive search.

This paper studies the application of reconfigurable intelligent surface (RIS) to cooperative non-orthogonal multiple access (C-NOMA) networks with simultaneous wireless information and power transfer (SWIPT). We aim for maximizing the rate of the strong user with guaranteed weak user's quality of service (QoS) by jointly optimizing power splitting factors, beamforming coefficients, and RIS reflection coefficients in two transmission phases. The formulated problem is difficult to solve due to its complex and non-convex constraints. To tackle this challenging problem, we first use alternating optimization (AO) framework to transform it into three subproblems, and then use the penalty-based arithmetic-geometric mean approximation (PBAGM) algorithm and the successive convex approximation (SCA)-based method to solve them. Numerical results verify the superiority of the proposed algorithm over the baseline schemes.

The numerical solution of singular eigenvalue problems is complicated by the fact that small perturbations of the coefficients may have an arbitrarily bad effect on eigenvalue accuracy. However, it has been known for a long time that such perturbations are exceptional and standard eigenvalue solvers, such as the QZ algorithm, tend to yield good accuracy despite the inevitable presence of roundoff error. Recently, Lotz and Noferini quantified this phenomenon by introducing the concept of $\delta$-weak eigenvalue condition numbers. In this work, we consider singular quadratic eigenvalue problems and two popular linearizations. Our results show that a correctly chosen linearization increases $\delta$-weak eigenvalue condition numbers only marginally, justifying the use of these linearizations in numerical solvers also in the singular case. We propose a very simple but often effective algorithm for computing well-conditioned eigenvalues of a singular quadratic eigenvalue problems by adding small random perturbations to the coefficients. We prove that the eigenvalue condition number is, with high probability, a reliable criterion for detecting and excluding spurious eigenvalues created from the singular part.

With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness to a single point-of-failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance is highly dependent on the selection of penalty parameters, which are usually chosen heuristically. In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM with the goal of minimizing the number of iterations until convergence. We train our RL policy using deep Q-learning, and show that this policy can result in significantly accelerated convergence (up to a 59% reduction in the number of iterations compared to existing, curvature-informed penalty parameter selection methods). Furthermore, we show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50% reduction in iterations). This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.

Multi-object tracking (MOT) is a crucial component of situational awareness in military defense applications. With the growing use of unmanned aerial systems (UASs), MOT methods for aerial surveillance is in high demand. Application of MOT in UAS presents specific challenges such as moving sensor, changing zoom levels, dynamic background, illumination changes, obscurations and small objects. In this work, we present a robust object tracking architecture aimed to accommodate for the noise in real-time situations. We propose a kinematic prediction model, called Deep Extended Kalman Filter (DeepEKF), in which a sequence-to-sequence architecture is used to predict entity trajectories in latent space. DeepEKF utilizes a learned image embedding along with an attention mechanism trained to weight the importance of areas in an image to predict future states. For the visual scoring, we experiment with different similarity measures to calculate distance based on entity appearances, including a convolutional neural network (CNN) encoder, pre-trained using Siamese networks. In initial evaluation experiments, we show that our method, combining scoring structure of the kinematic and visual models within a MHT framework, has improved performance especially in edge cases where entity motion is unpredictable, or the data presents frames with significant gaps.

Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.

北京阿比特科技有限公司