The evolution strategy (ES) is a promising class of algorithms for black-box continuous optimization. Despite its broad success in applications, theoretical analysis of its convergence speed has been limited to convex quadratic functions and their monotonic transformations. In this study, upper and lower bounds on the rate of linear convergence of the (1+1)-ES on locally $L$-strongly convex functions with $U$-Lipschitz continuous gradient are derived as $\exp\left(-\Omega_{d\to\infty}\left(\frac{L}{d\cdot U}\right)\right)$ and $\exp\left(-\frac1d\right)$, respectively. Notably, no prior knowledge of the mathematical properties of the objective function, such as the Lipschitz constant, is given to the algorithm, whereas existing analyses of derivative-free optimization algorithms require it.
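For readers unfamiliar with the algorithm family, here is a minimal, self-contained sketch of a (1+1)-ES with a 1/5-success-rule style step-size update. It is a generic textbook variant, not the exact algorithm or constants analyzed in the paper, and the quadratic test function (Hessian eigenvalues in $[L, U] = [1, 10]$) is an arbitrary illustrative choice.

```python
import numpy as np

def one_plus_one_es(f, x0, sigma0=1.0, budget=2000, seed=0):
    """Minimal (1+1)-ES with a 1/5-success-rule flavored step-size update (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    sigma = sigma0
    fx = f(x)
    d = x.size
    for _ in range(budget):
        y = x + sigma * rng.standard_normal(d)   # sample one Gaussian offspring
        fy = f(y)
        if fy <= fx:                             # elitist selection: accept if not worse
            x, fx = y, fy
            sigma *= np.exp(0.8 / d)             # expand the step size on success
        else:
            sigma *= np.exp(-0.2 / d)            # shrink it on failure
    return x, fx

# Example: a strongly convex quadratic with gradient-Lipschitz constant U = 10, strong convexity L = 1
d = 10
H = np.diag(np.linspace(1.0, 10.0, d))
f = lambda z: 0.5 * z @ H @ z
x_best, f_best = one_plus_one_es(f, x0=np.ones(d))
print(f_best)
```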
Cyclic block coordinate methods are a fundamental class of optimization methods widely used in practice and implemented as part of standard software packages for statistical learning. Nevertheless, their convergence is generally not well understood, and so far their good practical performance has not been explained by existing convergence analyses. In this work, we introduce a new block coordinate method that applies to the general class of variational inequality (VI) problems with monotone operators. This class includes composite convex optimization problems and convex-concave min-max optimization problems as special cases and has not been addressed by existing work. The resulting convergence bounds match the optimal convergence bounds of full gradient methods, but are provided in terms of a novel gradient Lipschitz condition w.r.t.~a Mahalanobis norm. For $m$ coordinate blocks, the gradient Lipschitz constant in our bounds is never larger than $\sqrt{m}$ times the traditional Euclidean Lipschitz constant, and it can be much smaller. Further, when the operator in the VI has finite-sum structure, we propose a variance-reduced variant of our method that further decreases the per-iteration cost and has better convergence rates in certain regimes. To obtain these results, we use a gradient extrapolation strategy that allows us to view a cyclic collection of block coordinate-wise gradients as one implicit gradient.
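As background, the sketch below shows a plain cyclic block coordinate gradient method on a least-squares problem, illustrating the per-block update pattern and block-wise step sizes. It does not include the gradient extrapolation or the VI setting of the proposed method, and the problem sizes are arbitrary.

```python
import numpy as np

def cyclic_bcd_least_squares(A, b, m=4, iters=200):
    """Plain cyclic block coordinate gradient descent for min_x 0.5*||Ax - b||^2 (illustrative)."""
    n = A.shape[1]
    x = np.zeros(n)
    blocks = np.array_split(np.arange(n), m)
    # per-block step sizes from block-wise Lipschitz constants ||A_j||_2^2
    steps = [1.0 / np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    r = A @ x - b                                   # running residual
    for _ in range(iters):
        for blk, eta in zip(blocks, steps):         # one cycle over the m blocks
            g = A[:, blk].T @ r                     # block coordinate-wise gradient
            x_new = x[blk] - eta * g
            r += A[:, blk] @ (x_new - x[blk])       # incremental residual update
            x[blk] = x_new
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 20)), rng.standard_normal(100)
x_hat = cyclic_bcd_least_squares(A, b)
print(np.linalg.norm(A.T @ (A @ x_hat - b)))        # gradient norm should be small
```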
One of the most critical problems in machine learning is HyperParameter Optimization (HPO), since the choice of hyperparameters has a significant impact on final model performance. Although there are many HPO algorithms, they either have no theoretical guarantees or require strong assumptions. To address this, we introduce BLiE -- a Lipschitz-bandit-based algorithm for HPO that only assumes Lipschitz continuity of the objective function. BLiE exploits the landscape of the objective function to adaptively search over the hyperparameter space. Theoretically, we show that $(i)$ BLiE finds an $\epsilon$-optimal hyperparameter with a total budget of $\mathcal{O} \left( \epsilon^{-(d_z + \beta)}\right)$, where $d_z$ and $\beta$ are problem-intrinsic quantities; $(ii)$ BLiE is highly parallelizable. Empirically, we demonstrate that BLiE outperforms state-of-the-art HPO algorithms on benchmark tasks. We also apply BLiE to search for noise schedules of diffusion models. Compared with the default schedule, the schedule found by BLiE greatly improves the sampling speed.
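To make the Lipschitz-bandit idea concrete, here is a toy sketch of Lipschitz-guided adaptive partitioning over a single hyperparameter (a generic DOO-style refinement loop). It is not the BLiE algorithm itself, and the "validation loss" function, the search range, and the Lipschitz constant are hypothetical.

```python
import heapq
import math

def lipschitz_adaptive_search(f, lo, hi, lip, budget=50):
    """Toy adaptive search: repeatedly refine the cell with the most optimistic
    Lipschitz-based lower bound of the objective (minimization)."""
    def cell(a, b):
        mid = 0.5 * (a + b)
        val = f(mid)                           # evaluate at the cell midpoint
        lower = val - lip * 0.5 * (b - a)      # Lipschitz lower bound over the cell
        return (lower, val, a, b, mid)
    heap = [cell(lo, hi)]
    best_val, best_x = heap[0][1], heap[0][4]
    for _ in range(budget):
        lower, val, a, b, mid = heapq.heappop(heap)      # most promising cell
        for child in (cell(a, mid), cell(mid, b)):       # split it and evaluate children
            heapq.heappush(heap, child)
            if child[1] < best_val:
                best_val, best_x = child[1], child[4]
    return best_x, best_val

# Hypothetical 1-D "validation loss" as a function of log10(learning rate)
loss = lambda t: (t + 2.5) ** 2 + 0.3 * math.sin(5 * t)
x, v = lipschitz_adaptive_search(loss, lo=-5.0, hi=0.0, lip=10.0)
print(x, v)
```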
We consider point-to-point lossy coding for computing and channel coding problems with two-sided information. We first unify these problems by considering a new generalized problem. We then develop graph-based characterizations and derive useful reductions through explicit graph operations, which reduce the number of decision variables. After that, we design alternating optimization algorithms for the unified problems, so that numerical computations for both the source and channel problems are covered. With the help of additional root-finding techniques, proper multiplier update strategies are developed; thus our algorithms can solve the problems under a given distortion or cost constraint, and their convergence can be proved. In addition, heuristic deflation techniques are introduced, which substantially reduce the computation time. Numerical results show the accuracy and efficiency of our algorithms.
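As a familiar reference point for the alternating-optimization flavor of these algorithms, the sketch below implements the classical Blahut-Arimoto iteration for the rate-distortion function at a fixed Lagrange multiplier. The paper's graph-based algorithms and root-finding multiplier updates are not reproduced here, and the binary Hamming example is an arbitrary choice.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, beta, iters=500):
    """Classical Blahut-Arimoto alternating minimization for the rate-distortion
    function at Lagrange multiplier beta (discrete alphabets, illustrative only)."""
    ny = dist.shape[1]
    q_y = np.full(ny, 1.0 / ny)                         # output marginal
    for _ in range(iters):
        w = q_y[None, :] * np.exp(-beta * dist)         # unnormalized conditional
        q_y_given_x = w / w.sum(axis=1, keepdims=True)  # update q(y|x) given q(y)
        q_y = p_x @ q_y_given_x                         # update q(y) given q(y|x)
    rate = np.sum(p_x[:, None] * q_y_given_x *
                  np.log(q_y_given_x / q_y[None, :])) / np.log(2)
    distortion = np.sum(p_x[:, None] * q_y_given_x * dist)
    return rate, distortion

# Binary source with Hamming distortion: sweeping beta traces out the R(D) curve
p_x = np.array([0.5, 0.5])
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
print(blahut_arimoto_rd(p_x, dist, beta=3.0))
```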
SeDuMi and SDPT3 are two solvers for semidefinite programming (SDP) and linear matrix inequality (LMI) problems. In this paper, we compare their computational performance on stability analysis of continuous-time linear systems. The comparison mainly focuses on computation times and memory requirements for problems of different scales. To implement and compare the two solvers on a set of well-posed problems, we employ YALMIP, a widely used toolbox for modeling and optimization in MATLAB. The primary goal of this study is to provide an empirical assessment of the relative computational efficiency of SeDuMi and SDPT3 under varying problem conditions. Our evaluation indicates that SDPT3 performs much better in large-scale, high-precision computations.
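The experiments described here use YALMIP in MATLAB with SeDuMi and SDPT3. As a language-agnostic illustration of the kind of LMI being benchmarked, the sketch below sets up a Lyapunov stability feasibility problem in Python with cvxpy; the solver, problem size, and random system matrix are arbitrary stand-ins, not the paper's test set.

```python
import numpy as np
import cvxpy as cp

# Lyapunov LMI for continuous-time stability: find P > 0 with A^T P + P A < 0
n = 20
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(n)   # shift A to be Hurwitz

P = cp.Variable((n, n), symmetric=True)
eps = 1e-3
constraints = [P >> eps * np.eye(n),
               A.T @ P + P @ A << -eps * np.eye(n)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve(solver=cp.SCS)            # swap in other installed SDP solvers to compare timings
print(prob.status, prob.solver_stats.solve_time)
```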
Two combined numerical methods for solving time-varying semilinear differential-algebraic equations (DAEs) are obtained, and their convergence and correctness are proved. The methods are constructed using time-varying spectral projectors that can be found numerically, which makes it possible to solve the DAE numerically in its original form without additional analytical transformations. To improve the accuracy of the second method, recalculation is used. The developed methods are applicable to DAEs whose continuous nonlinear part may be non-differentiable in time, and restrictions of global Lipschitz type are not used in the presented theorems on the global solvability of the DAE and the convergence of the methods; this extends the scope of the methods. The fulfillment of the conditions of the global solvability theorem ensures the existence of a unique exact solution on any given time interval, which makes it possible to seek an approximate solution on any time interval as well. Numerical examples illustrating the capabilities of the methods and their effectiveness in various situations are provided. To demonstrate this, mathematical models of the dynamics of electrical circuits are considered, and it is shown that the results of the theoretical and numerical analyses of these models are consistent.
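For orientation, the toy sketch below applies a backward-Euler step to a simple semi-explicit semilinear DAE arising from a series RC circuit. It only illustrates the problem class, not the projector-based combined methods of the paper, which work with the DAE in its original (not necessarily semi-explicit) form; the circuit parameters and source are arbitrary.

```python
import numpy as np
from scipy.optimize import fsolve

# Semi-explicit semilinear DAE:  C*x' = y,   0 = u(t) - R*y - x
# (series RC circuit: x = capacitor voltage, y = loop current, u = source voltage)
R, C = 1.0, 1.0
u = lambda t: np.sin(t)

def backward_euler_dae(x0, y0, h=1e-2, T=5.0):
    ts = np.arange(0.0, T + h, h)
    xs, ys = [x0], [y0]
    for t in ts[1:]:
        def residual(z):
            x, y = z
            return [C * (x - xs[-1]) / h - y,   # discretized differential equation
                    u(t) - R * y - x]           # algebraic constraint at the new time
        x, y = fsolve(residual, [xs[-1], ys[-1]])
        xs.append(x); ys.append(y)
    return ts, np.array(xs), np.array(ys)

ts, xs, ys = backward_euler_dae(x0=0.0, y0=0.0)
print(xs[-1], ys[-1])
```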
Motivated by various computational applications, we investigate the problem of estimating nested expectations. Building upon recent work by the authors, we propose a novel Monte Carlo estimator for nested expectations, inspired by sparse grid quadrature, that does not require sampling from inner conditional distributions. Theoretical analysis establishes an upper bound on the mean squared error of our estimator under mild assumptions on the problem, demonstrating its efficiency for cases with low-dimensional outer variables. We illustrate the effectiveness of our estimator through its application to problems related to value of information analysis, with moderate dimensionality. Overall, our method presents a promising approach to efficiently estimate nested expectations in practical computational settings.
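For contrast with the proposed estimator, which avoids sampling from inner conditional distributions, the sketch below is the standard nested Monte Carlo baseline for $\mathbb{E}_X\left[ g\left( \mathbb{E}_{Y|X}[ h(X,Y) ] \right) \right]$, which does require inner sampling; the test distributions and functions are arbitrary illustrative choices.

```python
import numpy as np

def nested_mc(g, h, sample_x, sample_y_given_x, n_outer=2000, n_inner=100, seed=0):
    """Standard nested Monte Carlo estimator of E_X[ g( E_{Y|X}[ h(X, Y) ] ) ]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_outer):
        x = sample_x(rng)
        ys = sample_y_given_x(x, rng, n_inner)
        inner = np.mean(h(x, ys))      # inner conditional expectation, estimated by MC
        total += g(inner)              # outer function applied to the inner estimate
    return total / n_outer

# Toy example: X ~ N(0,1), Y | X ~ N(X,1), h(x,y) = y, g(m) = max(m, 0)
sample_x = lambda rng: rng.standard_normal()
sample_y = lambda x, rng, n: x + rng.standard_normal(n)
est = nested_mc(lambda m: max(m, 0.0), lambda x, ys: ys, sample_x, sample_y)
print(est)   # tends to E[max(X, 0)] = 1/sqrt(2*pi) ≈ 0.399 as the budgets grow
```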
Integrated sensing and communication (ISAC) is recognized as a promising technology for next-generation wireless networks, as it provides significant performance gains over individual sensing and communication (S&C) systems via the shared use of wireless resources. The characterization of the S&C performance tradeoff is at the core of the theoretical foundation of ISAC. In this paper, we consider a point-to-point ISAC model under vector Gaussian channels and propose to use the Cram\'er-Rao bound (CRB)-rate region as a basic tool for depicting the fundamental S&C tradeoff. In particular, we consider the scenario where a unified ISAC waveform is emitted from a dual-functional ISAC transmitter (Tx), which simultaneously performs S&C tasks with a communication receiver (Rx) and a sensing Rx. In order to perform both S&C tasks, the ISAC waveform is required to be random so as to convey communication information, with its realizations perfectly known at both the ISAC Tx and the sensing Rx as a reference sensing signal, as in typical radar systems. As the main contribution of this paper, we characterize the S&C performance at the two corner points of the CRB-rate region, namely, $P_{SC}$, indicating the maximum achievable rate constrained by the minimum CRB, and $P_{CS}$, indicating the minimum achievable CRB constrained by the maximum rate. In particular, we derive the high-SNR capacity at $P_{SC}$, and provide lower and upper bounds for the sensing CRB at $P_{CS}$. We show that these two points can be achieved by conventional Gaussian signaling and by a novel strategy relying on the uniform distribution over the Stiefel manifold, respectively. Based on the above analysis, we provide an outer bound and various inner bounds for the achievable CRB-rate region. Our main results reveal a two-fold tradeoff in ISAC systems, consisting of the subspace tradeoff (ST) and the deterministic-random tradeoff (DRT), which depend on the resource allocation and data modulation schemes employed for S&C, respectively.
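A toy scalar example (not the vector Gaussian model of the paper) can make the deterministic-random tradeoff concrete: when a channel gain is estimated from known transmitted symbols in additive Gaussian noise, Gaussian (capacity-achieving) symbols make the received sensing energy random and, by Jensen's inequality, the average CRB is worse than with constant-modulus symbols of the same power. All numbers below are arbitrary.

```python
import numpy as np

# Estimate a real channel gain from N known transmitted symbols in AWGN.
# CRB = sigma^2 / sum(|x_n|^2): random Gaussian symbols randomize this quantity,
# while constant-modulus symbols fix it at sigma^2 / (N * P).
rng = np.random.default_rng(0)
N, P, sigma2 = 32, 1.0, 0.5
trials = 100_000

rate = 0.5 * np.log2(1 + P / sigma2)                      # real AWGN rate with Gaussian signaling

x_gauss = rng.normal(0.0, np.sqrt(P), size=(trials, N))   # random Gaussian waveforms
crb_gauss = np.mean(sigma2 / np.sum(x_gauss**2, axis=1))  # average CRB over realizations
crb_det = sigma2 / (N * P)                                # constant-modulus (deterministic energy)

print(f"rate = {rate:.3f} bits/use")
print(f"CRB (Gaussian signaling)   = {crb_gauss:.5f}")
print(f"CRB (deterministic energy) = {crb_det:.5f}")      # smaller on average, by Jensen
```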
U-statistics play central roles in many statistical learning tools but face a daunting scalability issue. Significant efforts have been devoted to accelerating computation by U-statistic reduction. However, existing results almost exclusively focus on power analysis, while little work addresses risk control accuracy -- comparatively, the latter requires distinct and much more challenging techniques. In this paper, we establish the first statistical inference procedure with provably higher-order accurate risk control for incomplete U-statistics. The sharpness of our new result enables us to reveal, for the first time in the literature, how risk control accuracy also trades off with speed, which complements the well-known variance-speed trade-off. Our proposed general framework converts the long-standing challenge of formulating accurate statistical inference procedures for many different designs into a surprisingly routine task. This paper covers non-degenerate and degenerate U-statistics, as well as network moments. We conduct comprehensive numerical studies whose results validate the sharpness of our theory, and our method also demonstrates its effectiveness on real-world data applications.
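To fix ideas, the sketch below forms a degree-2 incomplete U-statistic by randomly subsampling index pairs and compares it to the complete U-statistic, illustrating the reduction that trades accuracy for speed. The variance kernel and the with-replacement sampling design are arbitrary illustrative choices, and the paper's inference procedure is not reproduced.

```python
import numpy as np
from itertools import combinations

def incomplete_u_statistic(x, kernel, n_pairs=2000, seed=0):
    """Degree-2 incomplete U-statistic: average the kernel over a random
    subsample of index pairs instead of all n*(n-1)/2 of them."""
    rng = np.random.default_rng(seed)
    n = len(x)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j                                    # discard degenerate pairs
    return np.mean(kernel(x[i[keep]], x[j[keep]]))

def complete_u_statistic(x, kernel):
    return np.mean([kernel(x[i], x[j]) for i, j in combinations(range(len(x)), 2)])

# Variance kernel: h(a, b) = (a - b)^2 / 2 gives an unbiased estimate of Var(X)
h = lambda a, b: 0.5 * (a - b) ** 2
x = np.random.default_rng(1).standard_normal(500)
print(incomplete_u_statistic(x, h))   # fast, slightly noisier
print(complete_u_statistic(x, h))     # exact U-statistic, O(n^2) kernel evaluations
```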
We investigate a generalized framework for estimating latent low-rank tensors in an online setting, encompassing both linear and generalized linear models. This framework offers a flexible approach for handling continuous or categorical variables. Additionally, we investigate two specific applications: online tensor completion and online binary tensor learning. To address these challenges, we propose the online Riemannian gradient descent algorithm, which demonstrates linear convergence and the ability to recover the low-rank component under appropriate conditions in all applications. Furthermore, we establish a precise entry-wise error bound for online tensor completion. Notably, our work represents the first attempt to incorporate noise in the online low-rank tensor recovery task. In the presence of noise, we observe a trade-off between the computational and statistical aspects: increasing the step size accelerates convergence but leads to higher statistical error, whereas a smaller step size yields a statistically optimal estimator at the expense of slower convergence. Moreover, we conduct a regret analysis for online tensor regression. In the fixed step size regime, a trilemma among the convergence rate, the statistical error rate, and the regret is observed. With an optimal choice of step size, we achieve an optimal regret of $O(\sqrt{T})$. Furthermore, we extend our analysis to the adaptive setting where the horizon $T$ is unknown; in this case, we show that by employing different step sizes, we can attain a statistically optimal error rate along with a regret of $O(\log T)$. To validate our theoretical claims, we provide numerical results that corroborate our findings.
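As a simplified illustration of the online setting and the step-size tradeoff described above, the sketch below runs plain online gradient descent on a factored low-rank matrix (rather than tensor) model with streaming noisy entries. It is not the Riemannian algorithm of the paper, and all dimensions, step sizes, and noise levels are arbitrary.

```python
import numpy as np

def online_lowrank(stream, shape, rank=2, step=0.05, seed=0):
    """Online factored gradient descent for low-rank recovery from a stream of
    noisy entries (i, j, y) -- a simplified matrix analogue of the online setting."""
    rng = np.random.default_rng(seed)
    n1, n2 = shape
    U = 0.1 * rng.standard_normal((n1, rank))
    V = 0.1 * rng.standard_normal((n2, rank))
    for i, j, y in stream:
        r = U[i] @ V[j] - y                 # residual on the observed entry
        gU, gV = r * V[j], r * U[i]
        U[i] -= step * gU                   # one stochastic gradient step per entry
        V[j] -= step * gV
    return U, V

# Rank-2 ground truth observed through a stream of noisy entries
rng = np.random.default_rng(1)
n1, n2, r = 50, 40, 2
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))

def stream(T=200_000, noise=0.1):
    for _ in range(T):
        i, j = rng.integers(n1), rng.integers(n2)
        yield i, j, M[i, j] + noise * rng.standard_normal()

U, V = online_lowrank(stream(), (n1, n2))
print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))   # relative recovery error
```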
In many practical applications including remote sensing, multi-task learning, and multi-spectrum imaging, data are described as a set of matrices sharing a common column space. We consider the joint estimation of such matrices from their noisy linear measurements. We study a convex estimator regularized by a pair of matrix norms. The measurement model corresponds to block-wise sensing, and reconstruction is possible only when the total energy is well distributed over blocks. The first norm, the maximum-block-Frobenius norm, favors such a solution; this condition is analogous to the notion of low-spikiness in matrix completion or column-wise sensing. The second norm, a tensor norm on a pair of suitable Banach spaces, induces low-rankness in the solution together with the first norm. We demonstrate that the joint estimation provides a significant gain over the individual recovery of each matrix when the number of matrices sharing a column space and the ambient dimension of the shared column space are large relative to the number of columns in each matrix. The convex estimator is cast as a semidefinite program, and an efficient ADMM algorithm is derived. The empirical behavior of the convex estimator is illustrated using Monte Carlo simulations, and its recovery performance is compared with that of existing methods in the literature.
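The joint estimator above is cast as a semidefinite program and solved by ADMM. As a much simpler illustration of the ADMM machinery for low-rank-inducing regularizers, the sketch below solves nuclear-norm-regularized least squares for a single matrix; it is not the paper's pair-of-norms joint estimator, and the problem sizes and regularization level are arbitrary.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def admm_nuclear_ls(A, b, shape, lam=0.05, rho=1.0, iters=200):
    """ADMM for  min_X 0.5*||A vec(X) - b||^2 + lam*||X||_*  (simplified illustration)."""
    n = A.shape[1]
    Z = np.zeros(shape)
    D = np.zeros(shape)                             # scaled dual variable
    G = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        rhs = Atb + rho * (Z - D).ravel()
        X = np.linalg.solve(G, rhs).reshape(shape)  # X-update: ridge-type linear solve
        Z = svt(X + D, lam / rho)                   # Z-update: singular value thresholding
        D += X - Z                                  # dual update
    return Z

# Low-rank ground truth observed through random Gaussian measurements
rng = np.random.default_rng(0)
d1, d2, r, m = 20, 15, 2, 200
X_true = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
A = rng.standard_normal((m, d1 * d2)) / np.sqrt(m)
b = A @ X_true.ravel() + 0.01 * rng.standard_normal(m)
X_hat = admm_nuclear_ls(A, b, (d1, d2))
print(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))   # relative estimation error
```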