Optimization of parameterized quantum circuits is indispensable for applying near-term quantum devices to computational tasks with variational quantum algorithms (VQAs). However, existing optimization algorithms for VQAs require an excessive number of quantum-measurement shots to estimate expectation values of observables or to iterate updates of circuit parameters, and this cost has been a crucial obstacle to practical use. To address this problem, we develop an efficient framework, \textit{stochastic gradient line Bayesian optimization} (SGLBO), for circuit optimization with fewer measurement shots. SGLBO reduces the measurement-shot cost by estimating an appropriate direction for updating the parameters based on stochastic gradient descent (SGD) and, further, by utilizing Bayesian optimization (BO) to estimate the optimal step size in each iteration of the SGD. We formulate an adaptive measurement-shot strategy that makes the optimization feasible without relying on precise expectation-value estimation or many iterations; moreover, we show that a suffix-averaging technique can significantly reduce the effect of statistical and hardware noise in the optimization for VQAs. Our numerical simulations demonstrate that SGLBO augmented with these techniques can drastically reduce the required number of measurement shots, improve the optimization accuracy, and enhance the robustness against noise compared to other state-of-the-art optimizers on representative tasks for VQAs. These results establish a framework of quantum-circuit optimizers integrating two different optimization approaches, SGD and BO, to significantly reduce the cost of measurement shots.
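To make the idea concrete, the following is a minimal, self-contained sketch of an SGLBO-style loop on a toy cost function (not the VQA setting or the authors' implementation): the cost noisy_loss, the parameter-shift-style gradient, and the Gaussian-process line search over the step size are illustrative stand-ins, and the adaptive shot strategy and suffix averaging are omitted.
\begin{verbatim}
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def noisy_loss(theta, shots=100):
    """Toy stand-in for a shot-noise-limited cost estimate."""
    exact = np.sum(1.0 - np.cos(theta))
    return exact + rng.normal(scale=1.0 / np.sqrt(shots))

def stochastic_gradient(theta, shots=100):
    """Parameter-shift-style gradient built from noisy evaluations."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = np.pi / 2
        grad[i] = 0.5 * (noisy_loss(theta + e, shots) - noisy_loss(theta - e, shots))
    return grad

def bo_step_size(theta, direction, n_init=3, n_iter=5, alpha_max=1.0, shots=100):
    """1D Bayesian optimization of the step size along `direction`."""
    alphas = list(np.linspace(0.0, alpha_max, n_init))
    ys = [noisy_loss(theta + a * direction, shots) for a in alphas]
    kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.1)
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(np.array(alphas).reshape(-1, 1), np.array(ys))
        cand = np.linspace(0.0, alpha_max, 200).reshape(-1, 1)
        mu, sd = gp.predict(cand, return_std=True)
        best = min(ys)
        z = (best - mu) / np.maximum(sd, 1e-9)
        ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
        a_next = float(cand[np.argmax(ei), 0])
        alphas.append(a_next)
        ys.append(noisy_loss(theta + a_next * direction, shots))
    return alphas[int(np.argmin(ys))]

theta = rng.uniform(-np.pi, np.pi, size=4)
for it in range(20):
    g = stochastic_gradient(theta)
    d = -g / (np.linalg.norm(g) + 1e-12)   # SGD descent direction
    alpha = bo_step_size(theta, d)          # BO-estimated step size
    theta = theta + alpha * d
print("final loss estimate:", noisy_loss(theta, shots=10000))
\end{verbatim}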
We consider stochastic differential equations (SDEs) driven by small L\'evy noise with some unknown parameters, and propose a new type of least squares estimator based on discrete samples from the SDEs. To approximate the increments of a process from the SDEs, we use not the usual Euler method but the Adams method, a well-known numerical approximation of the solution to the ordinary differential equation appearing in the small-noise limit of the SDE. We show the consistency of the proposed estimators as well as their asymptotic distribution in a suitable observation scheme. We also show that our estimators can outperform the usual LSE based on the Euler method in finite-sample performance.
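As a toy illustration (not the paper's estimator as such), the following compares a least-squares fit of a drift parameter based on Euler-type increments with one based on two-step Adams-Bashforth-type increments; Gaussian noise is used purely as a convenient stand-in for small L\'evy noise, and the linear drift is an illustrative choice.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

def drift(x, theta):
    return -theta * x

# simulate X_{t+h} = X_t + b(X_t, theta) h + eps * dW  (small-noise regime)
theta_true, eps, h, n = 2.0, 0.05, 0.01, 500
X = np.empty(n + 1); X[0] = 1.0
for i in range(n):
    X[i + 1] = X[i] + drift(X[i], theta_true) * h + eps * rng.normal(scale=np.sqrt(h))

def euler_loss(theta):
    inc = X[1:] - X[:-1]
    pred = h * drift(X[:-1], theta)
    return np.sum((inc - pred) ** 2)

def adams_loss(theta):
    inc = X[2:] - X[1:-1]
    pred = h * (1.5 * drift(X[1:-1], theta) - 0.5 * drift(X[:-2], theta))
    return np.sum((inc - pred) ** 2)

theta_euler = minimize_scalar(euler_loss, bounds=(0.0, 10.0), method="bounded").x
theta_adams = minimize_scalar(adams_loss, bounds=(0.0, 10.0), method="bounded").x
print(f"Euler-based LSE: {theta_euler:.4f}, Adams-based LSE: {theta_adams:.4f}")
\end{verbatim}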
As the problems optimized with deep learning become more practical, their datasets inevitably contain various kinds of noise, such as mislabeling and substitution by estimated inputs/outputs, which can negatively affect the optimization results. As a safety net, it is natural to make the stochastic gradient descent (SGD) optimizer, which updates the network parameters as the final step of learning, more robust to noise. Prior work showed that the first momentum used in Adam-like SGD optimizers can be modified based on the noise-robust Student's t-distribution, thereby inheriting its robustness to noise. In this paper, we propose AdaTerm, which derives not only the first momentum but all the involved statistics from the Student's t-distribution. If the computed gradients appear to be aberrant, AdaTerm is expected to exclude them from the update and to reinforce the robustness for subsequent updates; otherwise, it updates the network parameters normally and can relax the robustness for subsequent updates. With this noise-adaptive behavior, the excellent learning performance of AdaTerm was confirmed on typical optimization problems under several different noise ratios.
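The following is a schematic illustration of the general Student's-t-based downweighting idea, not AdaTerm's actual derivation or update rule: gradients that look aberrant under the current t-model receive a small weight and barely move the running statistics, while ordinary gradients are weighted close to one. All names and constants here are illustrative assumptions.
\begin{verbatim}
import numpy as np

class TRobustMomentum:
    def __init__(self, dim, nu=5.0, beta=0.9, eps=1e-8):
        self.nu, self.beta, self.eps = nu, beta, eps
        self.m = np.zeros(dim)   # robust first moment
        self.v = np.ones(dim)    # robust scale (second-moment proxy)

    def update(self, grad):
        d = grad.size
        # Mahalanobis-like squared distance of the new gradient from the model
        dist2 = np.sum((grad - self.m) ** 2 / (self.v + self.eps))
        # t-distribution posterior weight: near 1 for typical gradients,
        # small for outliers (heavy-tail downweighting)
        w = (self.nu + d) / (self.nu + dist2)
        beta_eff = 1.0 - (1.0 - self.beta) * w   # effective decay this step
        self.m = beta_eff * self.m + (1.0 - beta_eff) * grad
        self.v = beta_eff * self.v + (1.0 - beta_eff) * (grad - self.m) ** 2
        return self.m

rng = np.random.default_rng(0)
mom = TRobustMomentum(dim=3)
for t in range(100):
    g = rng.normal(size=3)
    if t == 50:
        g += 100.0               # an aberrant, noise-corrupted gradient
    m = mom.update(g)
print("first moment after an outlier:", m)
\end{verbatim}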
A systematic theory of structural limits for finite models has been developed by Ne\v{s}et\v{r}il and Ossona de Mendez. It is based on the insight that the collection of finite structures can be embedded, via a map they call the Stone pairing, in a space of measures, where the desired limits can be computed. We show that a closely related but finer grained space of (finitely additive) measures arises -- via Stone-Priestley duality and the notion of types from model theory -- by enriching the expressive power of first-order logic with certain "probabilistic operators". We provide a sound and complete calculus for this extended logic and expose the functorial nature of this construction. The consequences are two-fold. On the one hand, we identify the logical gist of the theory of structural limits. On the other hand, our construction shows that the duality theoretic variant of the Stone pairing captures the adding of a layer of quantifiers, thus making a strong link to recent work on semiring quantifiers in logic on words. In the process, we identify the model theoretic notion of types as the unifying concept behind this link. These results contribute to bridging the strands of logic in computer science which focus on semantics and on more algorithmic and complexity related areas, respectively.
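For context, the Stone pairing referred to above is commonly defined as follows (a schematic statement, not the paper's exact formulation): for a first-order formula $\varphi$ with $n$ free variables and a finite structure $A$,
\[
\langle \varphi, A\rangle \;=\; \frac{\bigl|\{\,\bar a \in A^{n} : A \models \varphi(\bar a)\,\}\bigr|}{|A|^{n}},
\]
i.e., the probability that a uniformly random assignment of elements of $A$ to the free variables satisfies $\varphi$; structural limits are then limits of the functionals $\varphi \mapsto \langle \varphi, A_i\rangle$ in a space of measures.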
Stochastic PDE eigenvalue problems often arise in the field of uncertainty quantification, whereby one seeks to quantify the uncertainty in an eigenvalue, or its eigenfunction. In this paper we present an efficient multilevel quasi-Monte Carlo (MLQMC) algorithm for computing the expectation of the smallest eigenvalue of an elliptic eigenvalue problem with stochastic coefficients. Each sample evaluation requires the solution of a PDE eigenvalue problem, and so tackling this problem in practice is notoriously computationally difficult. We speed up the approximation of this expectation in four ways: we use a multilevel variance reduction scheme to spread the work over a hierarchy of FE meshes and truncation dimensions; we use QMC methods to efficiently compute the expectations on each level; we exploit the smoothness in parameter space and reuse the eigenvector from a nearby QMC point to reduce the number of iterations of the eigensolver; and we utilise a two-grid discretisation scheme to obtain the eigenvalue on the fine mesh with a single linear solve. The full error analysis of a basic MLQMC algorithm is given in the companion paper [Gilbert and Scheichl, 2022], and so in this paper we focus on how to further improve the efficiency and provide theoretical justification for using nearby QMC points and two-grid methods. Numerical results are presented that show the efficiency of our algorithm, and also show that the four strategies we employ are complementary.
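The following is a toy sketch of the multilevel QMC idea for $\mathbb{E}[\lambda_{\min}]$: a telescoping sum over mesh levels, with each term estimated by randomly shifted rank-1 lattice points. The 1D model problem and the ad-hoc generating vector are stand-ins; a real MLQMC implementation would use a properly constructed lattice rule and a PDE eigensolver with eigenvector reuse and two-grid corrections.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
s = 4                                        # truncated stochastic dimension
z = np.array([1, 182667, 469891, 498753])    # toy generating vector (not optimized)

def lattice_points(N, shift):
    k = np.arange(N)[:, None]
    return np.mod(k * z[None, :s] / N + shift[None, :], 1.0)

def lambda_min(y, level):
    """Smallest eigenvalue of a 1D model eigenproblem -(a u')' = lam u on a
    mesh with 2**(level+3) interior points; a(x, y) is a toy random field."""
    n = 2 ** (level + 3)
    h = 1.0 / (n + 1)
    xm = (np.arange(n + 1) + 0.5) * h        # cell midpoints
    a = 1.0 + 0.5 * sum(y[j] * np.sin((j + 1) * np.pi * xm) / (j + 1) ** 2
                        for j in range(s))
    main = (a[:-1] + a[1:]) / h ** 2         # finite-difference stiffness matrix
    off = -a[1:-1] / h ** 2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(A)[0]

def mlqmc_estimate(L=3, N=64, n_shifts=8):
    est = 0.0
    for level in range(L + 1):
        shift_means = []
        for _ in range(n_shifts):
            pts = lattice_points(N, rng.random(s))
            y = 2.0 * pts - 1.0              # map lattice points to [-1, 1]^s
            if level == 0:
                vals = [lambda_min(yk, 0) for yk in y]
            else:
                vals = [lambda_min(yk, level) - lambda_min(yk, level - 1) for yk in y]
            shift_means.append(np.mean(vals))
        est += np.mean(shift_means)          # telescoping sum over levels
    return est

print("MLQMC estimate of E[lambda_min]:", mlqmc_estimate())
\end{verbatim}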
Stochastic PDE eigenvalue problems are useful models for quantifying the uncertainty in several applications from the physical sciences and engineering, e.g., structural vibration analysis, the criticality of a nuclear reactor or photonic crystal structures. In this paper we present a simple multilevel quasi-Monte Carlo (MLQMC) method for approximating the expectation of the minimal eigenvalue of an elliptic eigenvalue problem with coefficients that are given as a series expansion of countably many stochastic parameters. The MLQMC algorithm is based on a hierarchy of discretisations of the spatial domain and truncations of the dimension of the stochastic parameter domain. To approximate the expectations, randomly shifted lattice rules are employed. This paper is primarily dedicated to giving a rigorous analysis of the error of this algorithm. A key step in the error analysis requires bounds on the mixed derivatives of the eigenfunction with respect to both the stochastic and spatial variables simultaneously. Under stronger smoothness assumptions on the parametric dependence, our analysis also extends to multilevel higher-order quasi-Monte Carlo rules. An accompanying paper [Gilbert and Scheichl, 2022] focuses on practical extensions of the MLQMC algorithm to improve efficiency, and presents numerical results.
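For reference, the estimator analyzed here can be written schematically as follows (the notation is generic and may differ from the paper's): with $\lambda_\ell$ the eigenvalue approximation on level $\ell$ and $\lambda_{-1}\equiv 0$, the expectation on the finest level is telescoped over levels,
\[
\mathbb{E}[\lambda_L] \;=\; \sum_{\ell=0}^{L} \mathbb{E}[\lambda_\ell - \lambda_{\ell-1}],
\]
and each term is approximated by an $R$-fold randomly shifted rank-1 lattice rule,
\[
\mathbb{E}[\lambda_\ell - \lambda_{\ell-1}] \;\approx\; \frac{1}{R}\sum_{r=1}^{R} \frac{1}{N_\ell}\sum_{k=0}^{N_\ell-1} \bigl(\lambda_\ell - \lambda_{\ell-1}\bigr)\!\left(\left\{\tfrac{k\,\boldsymbol{z}_\ell}{N_\ell} + \boldsymbol{\Delta}_r\right\}\right),
\]
where $\boldsymbol{z}_\ell$ is a generating vector, $\boldsymbol{\Delta}_r$ are i.i.d. uniform random shifts on $[0,1)^{s_\ell}$, and $\{\cdot\}$ denotes the componentwise fractional part.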
Improving software performance through configuration parameter tuning is a common activity during software maintenance. Beyond traditional performance metrics like latency, mobile app developers are interested in reducing app energy usage. Some mobile apps have centralized locations for parameter tuning, similar to databases and operating systems, but it is common for mobile apps to have hundreds of parameters scattered around the source code. The correlation between these "deep" parameters and app energy usage is unclear. Researchers have studied the energy effects of deep parameters in specific modules, but we lack a systematic understanding of the energy impact of mobile deep parameters. In this paper we empirically investigate this topic, combining a developer survey with systematic energy measurements. Our motivational survey of 25 Android developers suggests that developers do not understand, and largely ignore, the energy impact of deep parameters. To assess the potential implications of this practice, we propose a deep parameter energy profiling framework that can analyze the energy impact of deep parameters in an app. Our framework identifies deep parameters, mutates them based on our parameter value selection scheme, and performs reliable energy impact analysis. Applying the framework to 16 popular Android apps, we discovered that deep parameter-induced energy inefficiency is rare. We found only 2 out of 1644 deep parameters for which a different value would significantly improve the app's energy efficiency. A detailed analysis found that most deep parameters have no energy impact, limited energy impact, or an energy impact only under extreme values. Our study suggests that it is generally safe for developers to ignore the energy impact when choosing deep parameter values in mobile apps.
In this paper we consider a linearized variable-time-step two-step backward differentiation formula (BDF2) scheme for solving nonlinear parabolic equations. The scheme is constructed by using the variable-time-step BDF2 for the linear term and a Newton linearized method for the nonlinear term in time, combined with a Galerkin finite element method (FEM) in space. We prove the unconditionally optimal error estimate of the proposed scheme under mild restrictions on the ratio of adjacent time-steps, i.e., $0<r_k < r_{\max} \approx 4.8645$, and on the maximum time step. The proof involves the discrete orthogonal convolution (DOC) and discrete complementary convolution (DCC) kernels, and the error splitting approach. In addition, our analysis shows that the first-level solution $u^1$ obtained by BDF1 (i.e., the backward Euler scheme) does not cause a loss of the global second-order accuracy. Numerical examples are provided to demonstrate our theoretical results.
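For reference, the variable-step BDF2 approximation of the time derivative is commonly written as follows (a standard form, stated here schematically; the paper's notation may differ): with step sizes $\tau_k$ and adjacent step ratios $r_k=\tau_k/\tau_{k-1}$,
\[
D_2 u^{k} \;=\; \frac{1+2r_k}{\tau_k(1+r_k)}\,u^{k} \;-\; \frac{1+r_k}{\tau_k}\,u^{k-1} \;+\; \frac{r_k^{2}}{\tau_k(1+r_k)}\,u^{k-2}.
\]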
This paper explores models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimization are considered simultaneously during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and the Bayesian tree-structured Parzen estimator (TPE), are applied to XGBoost. The effects of different FS and hyper-parameter optimization methods on model performance are investigated using the Wilcoxon signed-rank test. The performance of XGBoost is compared to the traditionally utilized logistic regression (LR) model in terms of classification accuracy, area under the curve (AUC), recall, and F1 score obtained from 10-fold cross-validation. Results show that hierarchical clustering is the optimal FS method for LR, while weight by Chi-square achieves the best performance for XGBoost. Both TPE and RS optimization in XGBoost significantly outperform LR. TPE optimization shows superiority over RS, as it results in a significantly higher accuracy and marginally higher AUC, recall, and F1 score. Furthermore, XGBoost with TPE tuning shows lower variability than with RS. Finally, the ranking of feature importance based on XGBoost enhances model interpretation. Therefore, XGBoost with Bayesian TPE hyper-parameter optimization serves as an effective and powerful approach for business risk modeling.
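The following is a minimal sketch (on synthetic data, for illustration only) of tuning XGBoost hyper-parameters with the tree-structured Parzen estimator from the hyperopt library, scored by 10-fold cross-validated AUC; the search space and dataset are illustrative stand-ins, and the feature-selection step applied in the study is omitted here for brevity.
\begin{verbatim}
import numpy as np
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# synthetic, mildly imbalanced stand-in for a business-risk dataset
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           weights=[0.85, 0.15], random_state=0)

space = {
    "max_depth": hp.quniform("max_depth", 3, 10, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "n_estimators": hp.quniform("n_estimators", 100, 600, 50),
    "subsample": hp.uniform("subsample", 0.6, 1.0),
    "colsample_bytree": hp.uniform("colsample_bytree", 0.6, 1.0),
}

def objective(params):
    model = XGBClassifier(
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        n_estimators=int(params["n_estimators"]),
        subsample=params["subsample"],
        colsample_bytree=params["colsample_bytree"],
        eval_metric="logloss",
    )
    auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
    return {"loss": -auc, "status": STATUS_OK}   # TPE minimizes the loss

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=30, trials=trials)
print("best hyper-parameters found by TPE:", best)
\end{verbatim}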
We study the problem of training deep neural networks with the Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that, for a broad family of loss functions and with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data. The key idea of our proof is that Gaussian random initialization followed by (stochastic) gradient descent produces a sequence of iterates that stay inside a small perturbation region centered around the initial weights, in which the empirical loss function of deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of (stochastic) gradient descent. Our theoretical results shed light on the optimization of deep learning, and pave the way to studying the optimization dynamics of training modern deep neural networks.
We propose a new method of estimation in topic models that is not a variation of the existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite-sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios in which our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p), and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than current ones, even though we start with the computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with its correct value in our simulations.
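For context, a common formulation of the topic model behind this setting is as follows (stated schematically; the paper's notation and assumptions may differ): the observed word-frequency vector $X_i \in \mathbb{R}^p$ of document $i$ satisfies
\[
N_i X_i \sim \mathrm{Multinomial}_p(N_i, \Pi_i), \qquad \Pi = A W,
\]
where $A \in \mathbb{R}^{p\times K}$ is the word-topic matrix whose columns lie on the probability simplex and $W \in \mathbb{R}^{K\times n}$ collects the per-document topic weights; the goal is to estimate $A$, and here also $K$, from the $X_i$ alone.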