The semi-grant-free (SGF) transmission scheme enables grant-free (GF) users to utilize resource blocks allocated to grant-based (GB) users while maintaining the quality of service of the GB users. This work investigates the secrecy performance of non-orthogonal multiple access (NOMA)-aided SGF systems. First, analytical expressions for the exact and asymptotic secrecy outage probability (SOP) of NOMA-aided SGF systems with a single GF user are derived. Then, SGF systems with multiple GF users and a best-user scheduling scheme are considered. By utilizing order statistics theory, closed-form expressions for the exact and asymptotic SOP are derived. Monte Carlo simulation results demonstrate the effects of the system parameters on the SOP of the considered system and verify the accuracy of the developed analytical results. The results indicate that both the outage target rate of the GB user and the secrecy target rate of the GF user are the main factors affecting the secrecy performance of SGF systems.
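To make the secrecy outage probability concrete, the sketch below estimates an SOP of the standard form $P\big([\log_2(1+\gamma_B)-\log_2(1+\gamma_E)]^+ < R_s\big)$ by Monte Carlo under an assumed independent Rayleigh-fading model; the transmit SNR, the unit-mean channel gains, and the rate $R_s$ are illustrative assumptions and do not reproduce the NOMA-aided SGF system model analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not the paper's exact system model):
# independent Rayleigh fading on the legitimate and wiretap links.
N = 1_000_000            # Monte Carlo trials
snr_db = 20.0            # transmit SNR in dB (assumed)
rho = 10 ** (snr_db / 10)
R_s = 0.5                # secrecy target rate in bits/s/Hz (assumed)

# Exponentially distributed channel power gains (unit mean).
g_main = rng.exponential(1.0, N)
g_eve = rng.exponential(1.0, N)

# Instantaneous secrecy capacity: [log2(1 + rho*g_main) - log2(1 + rho*g_eve)]^+
c_s = np.maximum(np.log2(1 + rho * g_main) - np.log2(1 + rho * g_eve), 0.0)

# Secrecy outage probability: P(C_s < R_s)
sop = np.mean(c_s < R_s)
print(f"Estimated SOP at {snr_db} dB: {sop:.4f}")
```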
Many deep reinforcement learning algorithms rely on simple forms of exploration, such as the additive action noise often used in continuous control domains. Typically, the scaling factor of this action noise is chosen as a hyperparameter and kept constant during training. In this paper, we analyze how the learned policy is affected by the noise type, the noise scale, and the reduction of the scaling factor over time. We consider the two most prominent types of action noise, Gaussian and Ornstein-Uhlenbeck noise, and perform a large experimental campaign by systematically varying the noise type and scale parameter, and by measuring variables of interest such as the expected return of the policy and the state-space coverage during exploration. For the latter, we propose a novel state-space coverage measure $\operatorname{X}_{\mathcal{U}\text{rel}}$ that is more robust to boundary artifacts than previously proposed measures. Larger noise scales generally increase state-space coverage. However, we find that increasing the coverage with a larger noise scale is often not beneficial. On the contrary, reducing the noise scale over the course of training reduces the variance and generally improves learning performance. We conclude that the best noise type and scale are environment dependent and, based on our observations, derive heuristic rules for guiding the choice of action noise as a starting point for further optimization.
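As a point of reference for the two noise types discussed above, the following minimal sketch generates Gaussian and Ornstein-Uhlenbeck action noise and a linearly decayed noise scale; the OU parameters `theta` and `dt` are common default values and are assumptions, not the exact settings used in the experimental campaign.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(scale, action_dim, steps):
    """Uncorrelated Gaussian action noise, one sample per time step."""
    return rng.normal(0.0, scale, size=(steps, action_dim))

def ou_noise(scale, action_dim, steps, theta=0.15, dt=1e-2):
    """Ornstein-Uhlenbeck action noise: temporally correlated, mean-reverting."""
    x = np.zeros(action_dim)
    out = np.empty((steps, action_dim))
    for t in range(steps):
        x = x + theta * (0.0 - x) * dt + scale * np.sqrt(dt) * rng.normal(size=action_dim)
        out[t] = x
    return out

def decayed_scale(initial_scale, step, total_steps, final_scale=0.0):
    """Linearly reduce the noise scale over the course of training."""
    frac = min(step / total_steps, 1.0)
    return initial_scale + frac * (final_scale - initial_scale)
```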
In this paper, we are concerned with the numerical solution of the two-dimensional time-fractional Fokker-Planck equation with a tempered fractional derivative of order $\alpha$. Although some of its variants have been considered in many recent numerical analysis papers, there are still significant differences. Here we first provide regularity estimates of the solution. Then a modified $L1$ scheme inspired by the midpoint rectangle quadrature formula on graded meshes is employed to compensate for the singularity of the solution as $t\rightarrow 0^{+}$, while the five-point difference scheme is used in space. Stability and convergence are proved in the sense of the $L^{\infty}$ norm, and a sharp error estimate $\mathscr{O}(\tau^{\min\{2-\alpha, r\alpha\}})$ is derived on graded meshes. Furthermore, unlike the bounds proved in previous works, the constant multipliers in our analysis do not blow up as the order $\alpha$ of the Caputo fractional derivative approaches the classical value of 1. Finally, we perform numerical experiments to verify the effectiveness and convergence order of the presented algorithms.
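As a reminder of the mesh behind the stated rate, the snippet below records the standard graded mesh commonly paired with $L1$-type schemes; the abstract does not specify the mesh, so the formula is an assumption based on common practice for solutions that are singular as $t\rightarrow 0^{+}$.

```latex
% Standard graded mesh on [0, T] with N steps and grading parameter r >= 1
% (assumed; the abstract does not spell out the mesh):
\[
  t_n = T\left(\frac{n}{N}\right)^{r}, \qquad n = 0, 1, \dots, N .
\]
% Under the stated estimate \mathscr{O}(\tau^{\min\{2-\alpha,\, r\alpha\}}),
% the optimal temporal order 2 - \alpha is attained once
% r\alpha \ge 2 - \alpha, i.e. r \ge (2-\alpha)/\alpha.
```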
In the prophet secretary problem, $n$ values are drawn independently from known distributions and presented in a uniformly random order. A decision-maker must accept or reject each value when it is presented, and may accept at most $k$ values in total. The objective is to maximize the expected sum of accepted values. We analyze the performance of static threshold policies, which accept the first $k$ values exceeding a fixed threshold (or all such values, if fewer than $k$ exist). We show that an appropriate threshold guarantees $\gamma_k = 1 - e^{-k}k^k/k!$ times the value of the offline optimal solution. Note that $\gamma_1 = 1-1/e$, and by Stirling's approximation $\gamma_k \approx 1-1/\sqrt{2 \pi k}$. This represents the best-known guarantee for the prophet secretary problem for all $k>1$, and is tight for all $k$ within the class of static threshold policies. We provide two simple methods for setting the threshold. Our first method sets a threshold such that $k \cdot \gamma_k$ values are accepted in expectation, and offers an optimal guarantee for all $k$. Our second method sets a threshold such that the expected number of values exceeding the threshold is equal to $k$. This approach gives an optimal guarantee if $k > 4$, but sub-optimal guarantees for $k \le 4$. Our proofs use a new result for optimizing sums of independent Bernoulli random variables, which extends a classical result of Hoeffding (1956) and is likely to be of independent interest. Finally, we note that our methods for setting thresholds can be implemented under limited information about agents' values.
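The guarantee $\gamma_k$ is easy to evaluate directly; the short script below computes it together with the Stirling-based approximation, confirming $\gamma_1 = 1-1/e$ and $\gamma_k \approx 1-1/\sqrt{2\pi k}$ for larger $k$. This is a plain numerical check, not part of the paper's proofs.

```python
import math

def gamma_k(k: int) -> float:
    """Static-threshold guarantee gamma_k = 1 - e^{-k} k^k / k!."""
    return 1.0 - math.exp(-k) * k**k / math.factorial(k)

def stirling_approx(k: int) -> float:
    """Stirling-based approximation gamma_k ~ 1 - 1/sqrt(2 pi k)."""
    return 1.0 - 1.0 / math.sqrt(2.0 * math.pi * k)

for k in (1, 2, 4, 10, 100):
    print(k, round(gamma_k(k), 4), round(stirling_approx(k), 4))

# k = 1 recovers the classical prophet-inequality bound 1 - 1/e ~ 0.6321.
assert abs(gamma_k(1) - (1 - 1 / math.e)) < 1e-12
```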
This paper focuses on improving resource allocation in a long-range wide-area network (LoRaWAN) in terms of the packet delivery ratio (PDR), i.e., the fraction of packets sent by end devices (EDs) that are successfully received. The setting of the transmission parameters significantly affects the PDR. Employing reinforcement learning (RL), we propose a resource allocation algorithm that enables the EDs to configure their transmission parameters in a distributed manner. We model the resource allocation problem as a multi-armed bandit (MAB) problem and then address it by proposing a two-phase algorithm named MIX-MAB, which combines the exponential weights for exploration and exploitation (EXP3) and successive elimination (SE) algorithms. We evaluate the performance of MIX-MAB through simulations and compare it with existing approaches. Numerical results show that the proposed solution outperforms the existing schemes in terms of convergence time and PDR.
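For context on the exploration component, here is a minimal EXP3 sketch of the kind MIX-MAB builds on; the class name, the reward convention (1 for an acknowledged uplink), and the parameter $\gamma$ are illustrative assumptions, and the successive-elimination phase and the LoRaWAN-specific action set (spreading factor, transmit power, etc.) are omitted.

```python
import numpy as np

class EXP3:
    """Minimal EXP3 bandit sketch (one building block of the MIX-MAB idea)."""

    def __init__(self, n_arms: int, gamma: float = 0.1, seed: int = 0):
        self.n_arms = n_arms
        self.gamma = gamma
        self.weights = np.ones(n_arms)
        self.rng = np.random.default_rng(seed)

    def probabilities(self) -> np.ndarray:
        w = self.weights / self.weights.sum()
        return (1.0 - self.gamma) * w + self.gamma / self.n_arms

    def select(self) -> int:
        return int(self.rng.choice(self.n_arms, p=self.probabilities()))

    def update(self, arm: int, reward: float) -> None:
        """reward in [0, 1], e.g. 1 if the uplink packet was acknowledged."""
        p = self.probabilities()[arm]
        estimated = reward / p                       # importance-weighted reward
        self.weights[arm] *= np.exp(self.gamma * estimated / self.n_arms)
```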
Bilevel optimization has arisen as a powerful tool in modern machine learning. However, due to the nested structure of bilevel optimization, even gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations, which can be costly and scale poorly in practice. Recently, Hessian-free bilevel schemes have been proposed to resolve this issue, where the general idea is to use zeroth- or first-order methods to approximate the full hypergradient of the bilevel problem. However, we empirically observe that such approximation can lead to large variance and unstable training, whereas estimating only the response Jacobian matrix as a partial component of the hypergradient turns out to be extremely effective. To this end, we propose a new Hessian-free method, which adopts a zeroth-order-like method to approximate the response Jacobian matrix by taking the difference between two optimization paths. Theoretically, we provide a convergence rate analysis for the proposed algorithms, where our key challenge is to characterize the approximation and smoothness properties of the trajectory-dependent estimator, which can be of independent interest. This is the first known convergence rate result for this type of Hessian-free bilevel algorithm. Experimentally, we demonstrate that the proposed algorithms outperform baseline bilevel optimizers on various bilevel problems. In particular, in our experiment on few-shot meta-learning with a ResNet-12 network on the miniImageNet dataset, we show that our algorithm outperforms baseline meta-learning algorithms, while other baseline bilevel optimizers do not solve such meta-learning problems within a comparable time frame.
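To illustrate the idea of differencing two optimization paths, the sketch below estimates a response-Jacobian-vector product $(\mathrm{d}y^*/\mathrm{d}x)\,u$ by running the same inner optimizer from $x$ and from $x+\delta u$; the function names, the plain gradient-descent inner loop, and the single fixed direction are assumptions for illustration and do not reproduce the paper's exact estimator or how it is assembled into a hypergradient.

```python
import numpy as np

def inner_opt(x, y0, grad_g_y, lr=0.1, steps=200):
    """A few gradient steps on the lower-level objective g(x, .) starting from y0."""
    y = y0.copy()
    for _ in range(steps):
        y = y - lr * grad_g_y(x, y)
    return y

def response_jvp(x, y0, grad_g_y, u, delta=1e-3, **kw):
    """Zeroth-order-like estimate of (dy*/dx) u from the difference of two
    optimization paths started at x and x + delta*u (illustrative only)."""
    y_base = inner_opt(x, y0, grad_g_y, **kw)
    y_pert = inner_opt(x + delta * u, y0, grad_g_y, **kw)
    return (y_pert - y_base) / delta

# Toy check: g(x, y) = 0.5 * ||y - W x||^2 has response y*(x) = W x,
# so (dy*/dx) u should be approximately W u.
W = np.array([[2.0, 0.0], [1.0, 3.0]])
grad_g_y = lambda x, y: y - W @ x
x, u = np.array([1.0, -1.0]), np.array([1.0, 0.0])
print(response_jvp(x, np.zeros(2), grad_g_y, u), "vs", W @ u)
```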
This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems. Due to the simultaneous presence of uncertainty and non-stationarity, designing online control algorithms for unknown LTV systems remains a challenging task. Prior works have introduced online convex optimization algorithms at the cost of NP-hard offline planning, and they suffer from a nonparametric regret rate. In this paper, we propose the first computationally tractable online algorithm with regret guarantees that avoids offline planning over state linear feedback policies. Our algorithm is based on the optimism in the face of uncertainty (OFU) principle, in which we optimistically select the best model from a high-confidence region; it is therefore more explorative than previous approaches. To overcome non-stationarity, we propose either a restarting strategy (R-OFU) or a sliding-window strategy (SW-OFU). With proper configuration, our algorithm attains sublinear regret $O(T^{2/3})$. Both strategies utilize data from the current phase to track variations in the system dynamics. We corroborate our theoretical findings with numerical experiments, which highlight the effectiveness of our methods. To the best of our knowledge, our study establishes the first model-based online algorithm with regret guarantees for LTV dynamical systems.
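As a small illustration of the sliding-window ingredient, the sketch below fits time-varying dynamics $x_{t+1}\approx A_t x_t + B_t u_t$ by least squares over only the most recent transitions, so that stale data from earlier dynamics are discarded; the confidence-region construction and the optimistic model selection that define SW-OFU are deliberately left out, and the function signature is an assumption.

```python
import numpy as np

def sliding_window_ls(states, inputs, window: int):
    """Least-squares estimate of (A_t, B_t) from the most recent `window`
    transitions. states: (T+1, n) array, inputs: (T, m) array."""
    T = len(inputs)
    t0 = max(0, T - window)
    Z = np.hstack([states[t0:T], inputs[t0:T]])      # regressors [x_t, u_t]
    Y = states[t0 + 1:T + 1]                         # targets x_{t+1}
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)    # theta stacks [A^T; B^T]
    n = states.shape[1]
    A_hat = theta[:n].T
    B_hat = theta[n:].T
    return A_hat, B_hat
```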
We revisit the classical problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions. The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained, for example, by the optimal SGD methods $\small\sf\color{green}{SPIDER}$ (arXiv:1807.01695) and $\small\sf\color{green}{PAGE}$ (arXiv:2008.10898), where $\varepsilon$ is the error tolerance. However, i) the big-$\mathcal{O}$ notation hides crucial dependencies on the smoothness constants associated with the functions, and ii) the rates and theory in these methods assume simplistic sampling mechanisms that do not offer any flexibility. In this work we remedy the situation. First, we generalize the $\small\sf\color{green}{PAGE}$ algorithm so that it can provably work with virtually any (unbiased) sampling mechanism. This is particularly useful in federated learning, as it allows us to construct and better understand the impact of various combinations of client and data sampling strategies. Second, our analysis is sharper, as we make explicit use of certain novel inequalities that capture the intricate interplay between the smoothness constants and the sampling procedure. Indeed, our analysis is better even for the simple sampling procedure analyzed in the $\small\sf\color{green}{PAGE}$ paper, and this already improved bound can be further sharpened by a different sampling scheme that we propose. In summary, we provide the most general and most accurate analysis of optimal SGD in the smooth nonconvex regime. Finally, our theoretical findings are supported by carefully designed experiments.
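For readers unfamiliar with $\small\sf\color{green}{PAGE}$, the sketch below shows its recursive gradient estimator with plain uniform minibatch sampling, which is exactly the sampling step the present work generalizes to arbitrary unbiased mechanisms; the function names, the switching probability `p`, and the batch size are illustrative assumptions.

```python
import numpy as np

def page(grad_full, grad_batch, x0, n, lr=0.1, p=0.1, batch=16, iters=1000, seed=0):
    """Minimal PAGE sketch: with probability p refresh the full gradient,
    otherwise apply a cheap variance-reduced recursive update on a minibatch."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    g = grad_full(x)                        # initial full gradient
    for _ in range(iters):
        x_new = x - lr * g
        if rng.random() < p:
            g = grad_full(x_new)            # occasional full-gradient refresh
        else:
            idx = rng.choice(n, size=batch, replace=False)
            # same minibatch evaluated at x_new and x
            g = g + grad_batch(x_new, idx) - grad_batch(x, idx)
        x = x_new
    return x

# Toy usage: minimize the average of n quadratics f_i(x) = 0.5*(x - a_i)^2.
a = np.linspace(-1.0, 1.0, 50)
grad_full = lambda x: np.mean(x - a)
grad_batch = lambda x, idx: np.mean(x - a[idx])
print(page(grad_full, grad_batch, np.array(0.5), n=50))   # ~ mean(a) = 0.0
```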
The ability to accurately predict human behavior is central to the safety and efficiency of robot autonomy in interactive settings. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as people's goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning, mainly due to the fundamental coupling between robot trajectory optimization and human intent inference. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive human models with both continuous and categorical uncertainty. The efficacy of our approach is demonstrated with simulated driving examples.
We consider the problem of secure distributed matrix multiplication (SDMM), where a user has two matrices, $A$ and $B$, and wishes to compute their product with the help of $N$ honest-but-curious servers under the security constraint that no information about either $A$ or $B$ is leaked to any server. This paper presents a \emph{new scheme} based on a grid product partition of $A$ and $B$, which achieves an upload cost significantly lower than the existing results in the literature. Since the grid partition is a general partition that subsumes the inner and outer product partitions, the communication load of the proposed scheme matches the best-known protocols in those extreme cases.
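The grid product partition itself is easy to state: $A$ is split into an $a\times b$ grid of blocks and $B$ into a $b\times c$ grid, and each block of $AB$ is an inner sum of block products. The sketch below checks this decomposition numerically; the secret-sharing/encoding layer that makes the actual scheme secure is not shown, and the block counts are arbitrary illustrative choices.

```python
import numpy as np

def grid_blocks(M, rows, cols):
    """Split M into a rows x cols grid of equally sized blocks."""
    return [np.hsplit(r, cols) for r in np.vsplit(M, rows)]

def grid_product(A, B, a, b, c):
    """Blockwise product under a grid partition: A in an a x b grid,
    B in a b x c grid, so (AB)_{ij} = sum_k A_{ik} B_{kj}."""
    Ab, Bb = grid_blocks(A, a, b), grid_blocks(B, b, c)
    rows = []
    for i in range(a):
        row = [sum(Ab[i][k] @ Bb[k][j] for k in range(b)) for j in range(c)]
        rows.append(np.hstack(row))
    return np.vstack(rows)

rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 6)), rng.normal(size=(6, 8))
assert np.allclose(grid_product(A, B, a=2, b=3, c=4), A @ B)
```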
The automated localisation of damage in structures is a challenging but critical ingredient on the path towards predictive or condition-based maintenance of high-value structures. The use of acoustic emission time-of-arrival mapping is a promising approach to this challenge, but it is severely hindered by the need to collect a dense set of artificial acoustic emission measurements across the structure, resulting in a lengthy and often impractical data acquisition process. In this paper, we consider the use of physics-informed Gaussian processes for learning these maps to alleviate this problem. In the approach, the Gaussian process is constrained to the physical domain such that information relating to the geometry and boundary conditions of the structure is embedded directly into the learning process, returning a model that guarantees that any predictions made satisfy physically consistent behaviour at the boundary. We consider a number of scenarios that arise when training measurement acquisition is limited, including cases where training data are sparse and where they provide only limited coverage of the structure of interest. Using a complex plate-like structure as an experimental case study, we show that our approach significantly reduces the burden of data collection; incorporating boundary condition knowledge significantly improves predictive accuracy as training observations are reduced, particularly when training measurements are not available across all parts of the structure.
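As a simplified one-dimensional analogue of constraining the Gaussian process to the physical domain, the sketch below builds a covariance from sine eigenfunctions on $[0, L]$ so that every prior sample, and hence every posterior prediction, vanishes at the boundary by construction; the spectral weights, lengthscale, and function names are assumptions, and the paper's plate-like geometry is not modelled.

```python
import numpy as np

def dirichlet_kernel(x, z, L=1.0, n_terms=30, lengthscale=0.2, variance=1.0):
    """Covariance built from sine eigenfunctions on [0, L]; every sample of the
    resulting GP vanishes at x = 0 and x = L (a 1-D boundary constraint)."""
    j = np.arange(1, n_terms + 1)
    lam = variance * np.exp(-0.5 * (lengthscale * j * np.pi / L) ** 2)  # spectral weights
    Phi_x = np.sin(np.outer(x, j) * np.pi / L)
    Phi_z = np.sin(np.outer(z, j) * np.pi / L)
    return Phi_x @ np.diag(lam) @ Phi_z.T

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-4, **kw):
    """Standard GP regression mean with the boundary-constrained kernel."""
    K = dirichlet_kernel(x_train, x_train, **kw) + noise * np.eye(len(x_train))
    K_star = dirichlet_kernel(x_test, x_train, **kw)
    return K_star @ np.linalg.solve(K, y_train)

# Tiny usage: noisy observations of a function that vanishes at x = 0 and x = 1.
x_tr = np.array([0.2, 0.5, 0.8]); y_tr = np.sin(np.pi * x_tr)
x_te = np.linspace(0.0, 1.0, 5)
print(gp_posterior_mean(x_tr, y_tr, x_te))   # predictions are ~0 at the ends
```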