Complex-variable matrix optimization problems (CMOPs) in Frobenius norm emerge in many areas of applied mathematics and engineering applications. In this letter, we focus on solving CMOPs by iterative methods. For unconstrained CMOPs, we prove that the gradient descent (GD) method is feasible in the complex domain. Further, in view of reducing the computation complexity, constrained CMOPs are solved by a projection gradient descent (PGD) method. The theoretical analysis shows that the PGD method maintains a good convergence in the complex domain. Experiment results well support the theoretical analysis.
Reinforcement learning on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of MDP homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both stochastic and deterministic policies. Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization. Based on these theorems, we propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. Finally, we introduce a series of environments with continuous symmetries to further demonstrate the ability of our algorithm for action abstraction in the presence of such symmetries. We demonstrate the effectiveness of our method on our environments, as well as on challenging visual control tasks from the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance, and the visualizations of the latent space clearly demonstrate the structure of the learned abstraction.
Stochastic dual dynamic programming is a cutting plane type algorithm for multi-stage stochastic optimization originated about 30 years ago. In spite of its popularity in practice, there does not exist any analysis on the convergence rates of this method. In this paper, we first establish the number of iterations, i.e., iteration complexity, required by a basic dynamic cutting plane method for solving relatively simple multi-stage optimization problems, by introducing novel mathematical tools including the saturation of search points. We then refine these basic tools and establish the iteration complexity for both deterministic and stochastic dual dynamic programming methods for solving more general multi-stage stochastic optimization problems under the standard stage-wise independence assumption. Our results indicate that the complexity of some deterministic variants of these methods mildly increases with the number of stages $T$, in fact linearly dependent on $T$ for discounted problems. Therefore, they are efficient for strategic decision making which involves a large number of stages, but with a relatively small number of decision variables in each stage. Without explicitly discretizing the state and action spaces, these methods might also be pertinent to the related reinforcement learning and stochastic control areas.
We show that there exist infinitely many $n \in \mathbb{Z}^+$ such that for any constant $\epsilon > 0$, any deterministic algorithm to solve $k$-\textsf{SAT} for $k \geq 3$ must perform at least $(2^{k-\frac{3}{2}-\epsilon})^{\frac{n}{k+1}}$ operations, where $n$ is the number of variables in the $k$\textsf{-SAT} instance.
Wasserstein distributionally robust optimization (\textsf{WDRO}) is a popular model to enhance the robustness of machine learning with ambiguous data. However, the complexity of \textsf{WDRO} can be prohibitive in practice since solving its ``minimax'' formulation requires a great amount of computation. Recently, several fast \textsf{WDRO} training algorithms for some specific machine learning tasks (e.g., logistic regression) have been developed. However, the research on designing efficient algorithms for general large-scale \textsf{WDRO}s is still quite limited, to the best of our knowledge. \textit{Coreset} is an important tool for compressing large dataset, and thus it has been widely applied to reduce the computational complexities for many optimization problems. In this paper, we introduce a unified framework to construct the $\epsilon$-coreset for the general \textsf{WDRO} problems. Though it is challenging to obtain a conventional coreset for \textsf{WDRO} due to the uncertainty issue of ambiguous data, we show that we can compute a ``dual coreset'' by using the strong duality property of \textsf{WDRO}. Also, the error introduced by the dual coreset can be theoretically guaranteed for the original \textsf{WDRO} objective. To construct the dual coreset, we propose a novel grid sampling approach that is particularly suitable for the dual formulation of \textsf{WDRO}. Finally, we implement our coreset approach and illustrate its effectiveness for several \textsf{WDRO} problems in the experiments.
We introduce a priori Sobolev-space error estimates for the solution of nonlinear, and possibly parametric, PDEs using Gaussian process and kernel based methods. The primary assumptions are: (1) a continuous embedding of the reproducing kernel Hilbert space of the kernel into a Sobolev space of sufficient regularity; and (2) the stability of the differential operator and the solution map of the PDE between corresponding Sobolev spaces. The proof is articulated around Sobolev norm error estimates for kernel interpolants and relies on the minimizing norm property of the solution. The error estimates demonstrate dimension-benign convergence rates if the solution space of the PDE is smooth enough. We illustrate these points with applications to high-dimensional nonlinear elliptic PDEs and parametric PDEs. Although some recent machine learning methods have been presented as breaking the curse of dimensionality in solving high-dimensional PDEs, our analysis suggests a more nuanced picture: there is a trade-off between the regularity of the solution and the presence of the curse of dimensionality. Therefore, our results are in line with the understanding that the curse is absent when the solution is regular enough.
We consider a finite-dimensional vector space $W\subset K^E$ over an arbitrary field $K$ and an arbitrary set $E$. We show that the set $C(W)\subset 2^E$ consisting of the minimal supports of $W$ are the circuits of a matroid on $E$. In particular, we show that this matroid is cofinitary (hence, tame). When the cardinality of $K$ is large enough (with respect to the cardinality of $E$), then the set $trop(W)\subset 2^E$ consisting of all the supports of $W$ is a matroid itself. Afterwards we apply these results to tropical differential algebraic geometry and study the set of supports $trop(Sol(\Sigma))\subset (2^{\mathbb{N}^{m}})^n$ of spaces of formal power series solutions $\text{Sol}(\Sigma)$ of systems of linear differential equations $\Sigma$ in differential variables $x_1,\ldots,x_n$ having coefficients in the ring ${K}[\![t_1,\ldots,t_m]\!]$. If $\Sigma$ is of differential type zero, then the set $C(Sol(\Sigma))\subset (2^{\mathbb{N}^{m}})^n$ of minimal supports defines a matroid on $E=\mathbb{N}^{mn}$, and if the cardinality of $K$ is large enough, then the set of supports $trop(Sol(\Sigma))$ itself is a matroid on $E$ as well. By applying the fundamental theorem of tropical differential algebraic geometry (fttdag), we give a necessary condition under which the set of solutions $Sol(U)$ of a system $U$ of tropical linear differential equations to be a matroid. We also give a counterexample to the fttdag for systems $\Sigma$ of linear differential equations over countable fields. In this case, the set $trop(Sol(\Sigma))$ may not form a matroid.
Conventionally, federated learning aims to optimize a single objective, typically the utility. However, for a federated learning system to be trustworthy, it needs to simultaneously satisfy multiple/many objectives, such as maximizing model performance, minimizing privacy leakage and training cost, and being robust to malicious attacks. Multi-Objective Optimization (MOO) aiming to optimize multiple conflicting objectives at the same time is quite suitable for solving the optimization problem of Trustworthy Federated Learning (TFL). In this paper, we unify MOO and TFL by formulating the problem of constrained multi-objective federated learning (CMOFL). Under this formulation, existing MOO algorithms can be adapted to TFL straightforwardly. Different from existing CMOFL works focusing on utility, efficiency, fairness, and robustness, we consider optimizing privacy leakage along with utility loss and training cost, the three primary objectives of a TFL system. We develop two improved CMOFL algorithms based on NSGA-II and PSL, respectively, for effectively and efficiently finding Pareto optimal solutions, and we provide theoretical analysis on their convergence. We design specific measurements of privacy leakage, utility loss, and training cost for three privacy protection mechanisms: Randomization, BatchCrypt (An efficient version of homomorphic encryption), and Sparsification. Empirical experiments conducted under each of the three protection mechanisms demonstrate the effectiveness of our proposed algorithms.
The ParaOpt algorithm was recently introduced as a time-parallel solver for optimal-control problems with a terminal-cost objective, and convergence results have been presented for the linear diffusive case with implicit-Euler time integrators. We reformulate ParaOpt for tracking problems and provide generalized convergence analyses for both objectives. We focus on linear diffusive equations and prove convergence bounds that are generic in the time integrators used. For large problem dimensions, ParaOpt's performance depends crucially on having a good preconditioner to solve the arising linear systems. For the case where ParaOpt's cheap, coarse-grained propagator is linear, we introduce diagonalization-based preconditioners inspired by recent advances in the ParaDiag family of methods. These preconditioners not only lead to a weakly-scalable ParaOpt version, but are themselves invertible in parallel, making maximal use of available concurrency. They have proven convergence properties in the linear diffusive case that are generic in the time discretization used, similarly to our ParaOpt results. Numerical results confirm that the iteration count of the iterative solvers used for ParaOpt's linear systems becomes constant in the limit of an increasing processor count. The paper is accompanied by a sequential MATLAB implementation.
It is known that standard stochastic Galerkin methods encounter challenges when solving partial differential equations with high dimensional random inputs, which are typically caused by the large number of stochastic basis functions required. It becomes crucial to properly choose effective basis functions, such that the dimension of the stochastic approximation space can be reduced. In this work, we focus on the stochastic Galerkin approximation associated with generalized polynomial chaos (gPC), and explore the gPC expansion based on the analysis of variance (ANOVA) decomposition. A concise form of the gPC expansion is presented for each component function of the ANOVA expansion, and an adaptive ANOVA procedure is proposed to construct the overall stochastic Galerkin system. Numerical results demonstrate the efficiency of our proposed adaptive ANOVA stochastic Galerkin method.
Linear inverse problems arise in diverse engineering fields especially in signal and image reconstruction. The development of computational methods for linear inverse problems with sparsity tool is one of the recent trends in this area. The so-called optimal $k$-thresholding is a newly introduced method for sparse optimization and linear inverse problems. Compared to other sparsity-aware algorithms, the advantage of optimal $k$-thresholding method lies in that it performs thresholding and error metric reduction simultaneously and thus works stably and robustly for solving medium-sized linear inverse problems. However, the runtime of this method remains high when the problem size is relatively large. The purpose of this paper is to propose an acceleration strategy for this method. Specifically, we propose a heavy-ball-based optimal $k$-thresholding (HBOT) algorithm and its relaxed variants for sparse linear inverse problems. The convergence of these algorithms is shown under the restricted isometry property. In addition, the numerical performance of the heavy-ball-based relaxed optimal $k$-thresholding pursuit (HBROTP) has been studied, and simulations indicate that HBROTP admits robust capability for signal and image reconstruction even in noisy environments.