Rate-splitting multiple access (RSMA) is a general multiple access scheme for downlink multi-antenna systems embracing both classical spatial division multiple access and more recent non-orthogonal multiple access. Finding a linear precoding strategy that maximizes the sum spectral efficiency of RSMA is a challenging yet significant problem. In this paper, we put forth a novel precoder design framework that jointly finds the linear precoders for the common and private messages for RSMA. Our approach is first to approximate the non-smooth minimum function part in the sum spectral efficiency of RSMA using a LogSumExp technique. Then, we reformulate the sum spectral efficiency maximization problem as a form of the log-sum of Rayleigh quotients to convert it into a tractable non-convex optimization problem. By interpreting the first-order optimality condition of the reformulated problem as an eigenvector-dependent nonlinear eigenvalue problem, we reveal that a leading eigenvector is a local optimal solution. To find the leading eigenvector, we propose a computationally efficient algorithm inspired by a power iteration method. Simulation results show that the proposed RSMA transmission strategy provides significant improvement in the sum spectral efficiency compared to the state-of-the-art RSMA transmission methods, while requiring considerably less computational complexity.
The discretization of robust quadratic optimal control problems under uncertainty using the finite element method and the stochastic collocation method leads to large saddle-point systems, which are fully coupled across the random realizations. Despite its relevance for numerous engineering problems, the solution of such systems is notoriusly challenging. In this manuscript, we study efficient preconditioners for all-at-once approaches using both an algebraic and an operator preconditioning framework. We show in particular that for values of the regularization parameter not too small, the saddle-point system can be efficiently solved by preconditioning in parallel all the state and adjoint equations. For small values of the regularization parameter, robustness can be recovered by the additional solution of a small linear system, which however couples all realizations. A mean approximation and a Chebyshev semi-iterative method are investigated to solve this reduced system. Our analysis considers a random elliptic partial differential equation whose diffusion coefficient $\kappa(x,\omega)$ is modeled as an almost surely continuous and positive random field, though not necessarily uniformly bounded and coercive. We further provide estimates on the dependence of the preconditioned system on the variance of the random field. Such estimates involve either the first or second moment of the random variables $1/\min_{x\in \overline{D}} \kappa(x,\omega)$ and $\max_{x\in \overline{D}}\kappa(x,\omega)$, where $D$ is the spatial domain. The theoretical results are confirmed by numerical experiments, and implementation details are further addressed.
Much recent interest has focused on the design of optimization algorithms from the discretization of an associated optimization flow, i.e., a system of differential equations (ODEs) whose trajectories solve an associated optimization problem. Such a design approach poses an important problem: how to find a principled methodology to design and discretize appropriate ODEs. This paper aims to provide a solution to this problem through the use of contraction theory. We first introduce general mathematical results that explain how contraction theory guarantees the stability of the implicit and explicit Euler integration methods. Then, we propose a novel system of ODEs, namely the Accelerated-Contracting-Nesterov flow, and use contraction theory to establish it is an optimization flow with exponential convergence rate, from which the linear convergence rate of its associated optimization algorithm is immediately established. Remarkably, a simple explicit Euler discretization of this flow corresponds to the Nesterov acceleration method. Finally, we present how our approach leads to performance guarantees in the design of optimization algorithms for time-varying optimization problems.
The proliferation of Internet-of-Things (IoT) devices and cloud-computing applications over siloed data centers is motivating renewed interest in the collaborative training of a shared model by multiple individual clients via federated learning (FL). To improve the communication efficiency of FL implementations in wireless systems, recent works have proposed compression and dimension reduction mechanisms, along with digital and analog transmission schemes that account for channel noise, fading, and interference. The prior art has mainly focused on star topologies consisting of distributed clients and a central server. In contrast, this paper studies FL over wireless device-to-device (D2D) networks by providing theoretical insights into the performance of digital and analog implementations of decentralized stochastic gradient descent (DSGD). First, we introduce generic digital and analog wireless implementations of communication-efficient DSGD algorithms, leveraging random linear coding (RLC) for compression and over-the-air computation (AirComp) for simultaneous analog transmissions. Next, under the assumptions of convexity and connectivity, we provide convergence bounds for both implementations. The results demonstrate the dependence of the optimality gap on the connectivity and on the signal-to-noise ratio (SNR) levels in the network. The analysis is corroborated by experiments on an image-classification task.
Simultaneous statistical inference problems are at the basis of almost any scientific discovery process. We consider a class of simultaneous inference problems that are invariant under permutations, meaning that all components of the problem are oblivious to the labelling of the multiple instances under consideration. For any such problem we identify the optimal solution which is itself permutation invariant, the most natural condition one could impose on the set of candidate solutions. Interpreted differently, for any possible value of the parameter we find a tight (non-asymptotic) lower bound on the statistical performance of any procedure that obeys the aforementioned condition. By generalizing the standard decision theoretic notions of permutation invariance, we show that the results apply to a myriad of popular problems in simultaneous inference, so that the ultimate benchmark for each of these problems is identified. The connection to the nonparametric empirical Bayes approach of Robbins is discussed in the context of asymptotic attainability of the bound uniformly in the parameter value.
Generalized approximate message passing (GAMP) is a promising technique for unknown signal reconstruction of generalized linear models (GLM). However, it requires that the transformation matrix has independent and identically distributed (IID) entries. In this context, generalized vector AMP (GVAMP) is proposed for general unitarily-invariant transformation matrices but it has a high-complexity matrix inverse. To this end, we propose a universal generalized memory AMP (GMAMP) framework including the existing orthogonal AMP/VAMP, GVAMP, and MAMP as special instances. Due to the characteristics that local processors are all memory, GMAMP requires stricter orthogonality to guarantee the asymptotic IID Gaussianity and state evolution. To satisfy such orthogonality, local orthogonal memory estimators are established. The GMAMP framework provides a new principle toward building new advanced AMP-type algorithms. As an example, we construct a Bayes-optimal GMAMP (BO-GMAMP), which uses a low-complexity memory linear estimator to suppress the linear interference, and thus its complexity is comparable to GAMP. Furthermore, we prove that for unitarily-invariant transformation matrices, BO-GMAMP achieves the replica minimum (i.e., Bayes-optimal) MSE if it has a unique fixed point.
This paper studies the problem of federated learning (FL) in the absence of a trustworthy server/clients. In this setting, each client needs to ensure the privacy of its own data without relying on the server or other clients. We study local differential privacy (LDP) and provide tight upper and lower bounds that establish the minimax optimal rates (up to logarithms) for LDP convex/strongly convex federated stochastic optimization. Our rates match the optimal statistical rates in certain practical parameter regimes ("privacy for free"). Second, we develop a novel time-varying noisy SGD algorithm, leading to the first non-trivial LDP risk bounds for FL with non-i.i.d. clients. Third, we consider the special case where each client's loss function is empirical and develop an accelerated LDP FL algorithm to improve communication complexity compared to existing works. We also provide matching lower bounds, establishing the optimality of our algorithm for convex/strongly convex settings. Fourth, with a secure shuffler to anonymize client reports (but without a trusted server), our algorithm attains the optimal central DP rates for stochastic convex/strongly convex optimization, thereby achieving optimality in the local and central models simultaneously. Our upper bounds quantify the role of network communication reliability in performance.
Interpretation of Deep Neural Networks (DNNs) training as an optimal control problem with nonlinear dynamical systems has received considerable attention recently, yet the algorithmic development remains relatively limited. In this work, we make an attempt along this line by reformulating the training procedure from the trajectory optimization perspective. We first show that most widely-used algorithms for training DNNs can be linked to the Differential Dynamic Programming (DDP), a celebrated second-order trajectory optimization algorithm rooted in the Approximate Dynamic Programming. In this vein, we propose a new variant of DDP that can accept batch optimization for training feedforward networks, while integrating naturally with the recent progress in curvature approximation. The resulting algorithm features layer-wise feedback policies which improve convergence rate and reduce sensitivity to hyper-parameter over existing methods. We show that the algorithm is competitive against state-ofthe-art first and second order methods. Our work opens up new avenues for principled algorithmic design built upon the optimal control theory.
Learning to classify unseen class samples at test time is popularly referred to as zero-shot learning (ZSL). If test samples can be from training (seen) as well as unseen classes, it is a more challenging problem due to the existence of strong bias towards seen classes. This problem is generally known as \emph{generalized} zero-shot learning (GZSL). Thanks to the recent advances in generative models such as VAEs and GANs, sample synthesis based approaches have gained considerable attention for solving this problem. These approaches are able to handle the problem of class bias by synthesizing unseen class samples. However, these ZSL/GZSL models suffer due to the following key limitations: $(i)$ Their training stage learns a class-conditioned generator using only \emph{seen} class data and the training stage does not \emph{explicitly} learn to generate the unseen class samples; $(ii)$ They do not learn a generic optimal parameter which can easily generalize for both seen and unseen class generation; and $(iii)$ If we only have access to a very few samples per seen class, these models tend to perform poorly. In this paper, we propose a meta-learning based generative model that naturally handles these limitations. The proposed model is based on integrating model-agnostic meta learning with a Wasserstein GAN (WGAN) to handle $(i)$ and $(iii)$, and uses a novel task distribution to handle $(ii)$. Our proposed model yields significant improvements on standard ZSL as well as more challenging GZSL setting. In ZSL setting, our model yields 4.5\%, 6.0\%, 9.8\%, and 27.9\% relative improvements over the current state-of-the-art on CUB, AWA1, AWA2, and aPY datasets, respectively.
Network embedding has attracted considerable research attention recently. However, the existing methods are incapable of handling billion-scale networks, because they are computationally expensive and, at the same time, difficult to be accelerated by distributed computing schemes. To address these problems, we propose RandNE, a novel and simple billion-scale network embedding method. Specifically, we propose a Gaussian random projection approach to map the network into a low-dimensional embedding space while preserving the high-order proximities between nodes. To reduce the time complexity, we design an iterative projection procedure to avoid the explicit calculation of the high-order proximities. Theoretical analysis shows that our method is extremely efficient, and friendly to distributed computing schemes without any communication cost in the calculation. We demonstrate the efficacy of RandNE over state-of-the-art methods in network reconstruction and link prediction tasks on multiple datasets with different scales, ranging from thousands to billions of nodes and edges.
Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They do gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems. This has been recently extended for non-convex and also stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show the convergence using ODE method. We tested this algorithm on a scheduling problem. Here an incoming job is scheduled into one of the four queues based on the queue lengths. We see from experimental results that algorithm using Nesterov's acceleration has significantly better performance compared to algorithm which do not use acceleration. To the best of our knowledge this is the first time Nesterov's acceleration has been used with actor-critic algorithm.