In this paper, we propose a modified nonlinear conjugate gradient (NCG) method for functions with a non-Lipschitz continuous gradient. First, we present a new formula for the conjugate coefficient $\beta_k$ in NCG, yielding a search direction that provides sufficient function decrease. We prove that the resulting NCG algorithm is strongly convergent for continuously differentiable functions whose gradient is not Lipschitz continuous. Second, we present a simple interpolation approach that automatically achieves shrinkage, generating a step length satisfying the standard Wolfe conditions at each step. Our framework considerably broadens the applicability of NCG while preserving the superior numerical performance of PRP-type methods.
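Since the abstract does not state the new formula for $\beta_k$, the following minimal Python sketch uses the classical PRP+ coefficient as a placeholder, with a backtracking Armijo search standing in for the paper's Wolfe-condition interpolation; all names are illustrative.

```python
import numpy as np

def prp_ncg(f, grad, x0, tol=1e-6, max_iter=1000):
    """Nonlinear CG with the classical PRP+ coefficient and a simple
    backtracking Armijo line search (a stand-in for a Wolfe search)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:            # safeguard: fall back to steepest descent
            d = -g
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5              # backtrack until the Armijo condition holds
        x_new = x + t * d
        g_new = grad(x_new)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PRP+ coefficient
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Example: minimize the Rosenbrock function.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(prp_ncg(f, grad, [-1.2, 1.0]))   # approaches [1, 1]
```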
Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs are common examples. In these settings, acting at every time step quickly accumulates costs, leading to vastly suboptimal outcomes; repeated action also produces wear and tear and, ultimately, damage. Determining when to act is crucial for achieving successful outcomes, and yet the challenge of efficiently learning to behave optimally when actions incur minimally bounded costs remains unresolved. In this paper, we introduce a reinforcement learning (RL) framework, the Learnable Impulse Control Reinforcement Algorithm (LICRA), for learning to optimally select both when to act and which actions to take when actions incur costs. At the core of LICRA is a nested structure that combines RL with a form of policy known as impulse control, which learns to maximise objectives when actions incur costs. We prove that LICRA, which seamlessly adopts any RL method, converges to policies that optimally select when to perform actions and their optimal magnitudes. We then augment LICRA to handle problems in which the agent can perform at most $k<\infty$ actions and, more generally, faces a budget constraint. We show that LICRA learns the optimal value function and ensures the budget constraint is satisfied almost surely. We demonstrate empirically LICRA's superior performance against benchmark RL methods in OpenAI Gym's Lunar Lander, in the Highway environment, and on a variant of the Merton portfolio problem from finance.
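The nested structure is not detailed in the abstract, but the impulse-control layer can be pictured as a gate on top of a value-based RL method: intervene only when the estimated benefit of the best action exceeds its cost. A hypothetical sketch (the Q-value inputs and names are assumptions, not LICRA's actual interface):

```python
import numpy as np

def impulse_gate(q_act, q_wait, cost):
    """Decide whether to intervene this step. `q_act[a]` is the estimated
    value of taking action a (before paying the cost), `q_wait` the value
    of doing nothing. Intervene only if acting is worth its cost."""
    best = int(np.argmax(q_act))
    if q_act[best] - cost > q_wait:
        return best        # act: pay `cost`, execute action `best`
    return None            # no intervention this step

print(impulse_gate(np.array([1.0, 2.5]), q_wait=2.2, cost=0.5))  # None
print(impulse_gate(np.array([1.0, 3.0]), q_wait=2.2, cost=0.5))  # 1
```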
In this work, we propose an efficient, minimax-optimal global optimization algorithm for multivariate Lipschitz continuous functions. To evaluate the performance of our approach, we use the average regret instead of the traditional simple regret, which, as we show, is not suitable for multivariate non-convex optimization because of the inherent hardness of the problem. Since we study the average regret of the algorithm, our results directly imply a bound on the simple regret as well. Instead of constructing lower-bounding proxy functions, our method uses a predetermined query creation rule, which makes it computationally superior to the Piyavskii-Shubert variants. We show that our algorithm achieves an average regret bound of $O(L\sqrt{n}T^{-\frac{1}{n}})$ for the optimization of an $n$-dimensional $L$-Lipschitz continuous objective over a time horizon $T$, which we show to be minimax optimal.
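The query creation rule itself is not given in the abstract; the sketch below substitutes the simplest predetermined schedule with the right flavor, a uniform grid on $[0,1]^n$ sized from the budget $T$, and reports the stated rate alongside the best value found (maximization assumed; all names are illustrative):

```python
import numpy as np
from itertools import product

def grid_optimize(f, L, n, T):
    """Query an L-Lipschitz f on a fixed uniform grid in [0,1]^n.
    The grid is predetermined (query-independent), so no lower-bounding
    proxy functions are ever constructed."""
    k = max(2, int(T ** (1.0 / n)))            # points per axis, k**n <= T
    grid = (np.array(p) for p in product(np.linspace(0, 1, k), repeat=n))
    best = max(f(p) for p in grid)
    bound = L * np.sqrt(n) * T ** (-1.0 / n)   # the paper's average-regret rate
    return best, bound

f = lambda p: -np.abs(p - 0.3).sum()           # 1-Lipschitz w.r.t. the l1 norm
print(grid_optimize(f, L=1.0, n=2, T=400))
```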
Photo retouching aims to improve the aesthetic visual quality of images that suffer from photographic defects such as poor contrast, over/under-exposure, and inharmonious saturation. In practice, photo retouching can be accomplished by a series of image processing operations. As the most commonly used retouching operations are pixel-independent, i.e., the manipulation of one pixel is uncorrelated with its neighboring pixels, we can exploit this property and design a specialized algorithm for efficient global photo retouching. We analyze these global operations and find that they can be mathematically formulated as a Multi-Layer Perceptron (MLP). Based on this observation, we propose an extremely lightweight framework -- the Conditional Sequential Retouching Network (CSRNet). Benefiting from the use of $1\times1$ convolutions, CSRNet contains fewer than 37K trainable parameters, orders of magnitude fewer than existing learning-based methods. Experiments show that our method achieves state-of-the-art performance on the benchmark MIT-Adobe FiveK dataset, both quantitatively and qualitatively. Beyond global photo retouching, the proposed framework can be easily extended to learn local enhancement effects. The extended model, CSRNet-L, also achieves competitive results on various local enhancement tasks. Code is available at //github.com/lyh-18/CSRNet.
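To see why pixel-independent operations amount to an MLP, note that a stack of $1\times1$ convolutions applies the same per-pixel multilayer map at every location. A minimal PyTorch sketch of this building block, not the full CSRNet, which additionally conditions the per-pixel map on global image information:

```python
import torch
import torch.nn as nn

class TinyRetouchMLP(nn.Module):
    """Pixel-independent retouching as an MLP, implemented with 1x1
    convolutions so the identical mapping is applied at every pixel."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)

img = torch.rand(1, 3, 256, 256)      # a dummy RGB image in [0, 1]
print(TinyRetouchMLP()(img).shape)    # torch.Size([1, 3, 256, 256])
```

Because every layer has a $1\times1$ receptive field, the parameter count is independent of image resolution, which is what keeps the model so small.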
Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein operators for discrete distributions. We then use this technique to build flexible control variates for the REINFORCE leave-one-out estimator. Our control variates can be adapted online to minimize variance and do not require extra evaluations of the target function. In benchmark generative modeling tasks such as training binary variational autoencoders, our gradient estimator achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
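For context on where the control variates enter, here is a minimal NumPy sketch of the REINFORCE leave-one-out estimator for a factorized Bernoulli distribution; the paper's Stein-operator control variates would be subtracted from the function values on top of this built-in baseline (names and the toy target are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_loo_grad(f, logits, num_samples=8):
    """REINFORCE leave-one-out gradient of E_{z~Bernoulli(p)}[f(z)] w.r.t.
    the logits. Each sample's baseline is the mean of the other samples'
    function values, a built-in control variate requiring no extra
    evaluations of f."""
    p = 1.0 / (1.0 + np.exp(-logits))
    z = (rng.random((num_samples, logits.size)) < p).astype(float)
    fz = np.array([f(zi) for zi in z])
    baseline = (fz.sum() - fz) / (num_samples - 1)   # leave-one-out mean
    score = z - p                # d/dlogits of log Bernoulli(z; p)
    return ((fz - baseline)[:, None] * score).mean(axis=0)

f = lambda z: ((z - 0.499) ** 2).sum()   # a toy target function
print(reinforce_loo_grad(f, logits=np.zeros(4)))
```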
Solving the time-dependent Schr\"odinger equation is an important application area for quantum algorithms. We consider Schr\"odinger's equation in the semi-classical regime. Here the solutions exhibit strong multiple-scale behavior due to a small parameter $\hbar$, in the sense that the dynamics of the quantum states and the induced observables can occur on different spatial and temporal scales. Such a Schr\"odinger equation finds many applications, including in Born-Oppenheimer molecular dynamics and Ehrenfest dynamics. This paper considers quantum analogues of pseudo-spectral (PS) methods on classical computers. Estimates on the gate counts in terms of $\hbar$ and the precision $\varepsilon$ are obtained. It is found that the number of required qubits, $m$, scales only logarithmically with respect to $\hbar$. When the solution has bounded derivatives up to order $\ell$, the symmetric Trotting method has gate complexity $\mathcal{O}\Big({ (\varepsilon \hbar)^{-\frac12} \mathrm{polylog}(\varepsilon^{-\frac{3}{2\ell}} \hbar^{-1-\frac{1}{2\ell}})}\Big),$ provided that the diagonal unitary operators in the pseudo-spectral methods can be implemented with $\mathrm{poly}(m)$ operations. When physical observables are the desired outcomes, however, the step size in the time integration can be chosen independently of $\hbar$. The gate complexity in this case is reduced to $\mathcal{O}\Big({\varepsilon^{-\frac12} \mathrm{polylog}( \varepsilon^{-\frac3{2\ell}} \hbar^{-1} )}\Big),$ with $\ell$ again indicating the smoothness of the solution.
Bayesian optimization (BO) is a sample-efficient approach for tuning design parameters to optimize expensive-to-evaluate, black-box performance metrics. In many manufacturing processes, the design parameters are subject to random input noise, resulting in a product that is often less performant than expected. Although BO methods have been proposed for optimizing a single objective under input noise, no existing method addresses the practical scenario in which multiple objectives are sensitive to input perturbations. In this work, we propose the first multi-objective BO method that is robust to input noise. We formalize our goal as optimizing the multivariate value-at-risk (MVaR), a risk measure of the uncertain objectives. Since directly optimizing MVaR is computationally infeasible in many settings, we propose a scalable, theoretically grounded approach for optimizing MVaR using random scalarizations. Empirically, we find that our approach significantly outperforms alternative methods and efficiently identifies optimal robust designs that will satisfy specifications across multiple metrics with high probability.
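To make the MVaR objective concrete: a point $z$ belongs to the MVaR set at level $\alpha$ if all objectives jointly exceed $z$ with probability at least $\alpha$ under the input noise. A Monte Carlo sketch of this membership check (toy numbers; maximization assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def mvar_holds(samples, z, alpha=0.9):
    """Check whether z is in the MVaR_alpha set: do all objectives exceed
    z simultaneously with probability at least alpha? `samples` holds
    Monte Carlo draws of the noisy objective vector at one design,
    with shape (num_draws, num_objectives)."""
    return np.mean(np.all(samples >= z, axis=1)) >= alpha

# Toy example: two noisy objectives evaluated at a fixed design.
samples = rng.normal(loc=[1.0, 2.0], scale=0.1, size=(10_000, 2))
print(mvar_holds(samples, z=np.array([0.8, 1.8]), alpha=0.9))  # True
```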
In this paper we propose a solution strategy for the Cahn-Larch\'e equations, a model for linearized elasticity in a medium with two elastic phases that evolve subject to a Ginzburg-Landau type energy functional. The system can be seen as a combination of the Cahn-Hilliard regularized interface equation and linearized elasticity; it is nonlinearly coupled, contains a fourth-order term originating from the Cahn-Hilliard subsystem, and is non-convex and nonlinear in both the phase-field and displacement variables. We propose a novel semi-implicit time discretization that uses a standard convex-concave splitting of the nonlinear double-well potential together with a special treatment of the elastic energy. We show that the resulting discrete system is equivalent to a convex minimization problem, and we propose alternating minimization for it and prove its convergence. Finally, we present numerical experiments that show the robustness and effectiveness of both alternating minimization and the monolithic Newton method applied to the newly proposed discrete system of equations. We compare it to a system discretized with a standard convex-concave splitting of the double-well potential and implicit evaluation of the elasticity contributions, and we show that the newly proposed discrete system is better conditioned for linearization techniques.
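The alternating scheme has the following generic shape, with the two partial minimizers, the phase-field subproblem and the linear-elasticity solve, supplied by the discretization; a Python skeleton under these assumptions (the solver callbacks are placeholders):

```python
def alternating_minimization(E, argmin_phi, argmin_u, phi0, u0,
                             tol=1e-8, max_iter=200):
    """Alternating minimization for a convex energy E(phi, u): minimize
    over the phase field phi with the displacement u fixed, then over u
    with phi fixed. Each sweep decreases E, which is the property the
    convergence proof rests on. `argmin_phi` and `argmin_u` are
    user-supplied partial minimizers for the two subproblems."""
    phi, u = phi0, u0
    e_old = E(phi, u)
    for _ in range(max_iter):
        phi = argmin_phi(u)          # phase-field subproblem
        u = argmin_u(phi)            # linear elasticity subproblem
        e_new = E(phi, u)
        if e_old - e_new < tol:      # energy decrease as stopping criterion
            break
        e_old = e_new
    return phi, u
```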
Providing non-trivial certificates of safety for non-linear stochastic systems is an important open problem that limits the wider adoption of autonomous systems in safety-critical applications. One promising solution to this problem is barrier functions. The composition of a barrier function with a stochastic system forms a supermartingale, enabling the computation of the probability that the system stays in a safe set over a finite time horizon via martingale inequalities. However, existing approaches to finding barrier functions for stochastic systems generally rely on convex optimization programs that restrict the search for a barrier to a small class of functions, such as low-degree sum-of-squares (SoS) polynomials, and can be computationally expensive. In this paper, we parameterize a barrier function as a neural network and show that techniques for robust training of neural networks can be successfully employed to find neural barrier functions. Specifically, we leverage bound propagation techniques to certify, via linear programming, that a neural network satisfies the conditions required of a barrier function, and we employ the resulting bounds at training time to enforce the satisfaction of these conditions. We also present a branch-and-bound scheme that makes the certification framework scalable. We show that our approach outperforms existing methods in several case studies and often returns certificates of safety that are orders of magnitude larger.
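To fix ideas, one common set of sufficient conditions is $B \ge 0$ everywhere, $B \le \eta$ on the initial set, $B \ge 1$ on the unsafe set, and the supermartingale property elsewhere; the paper's exact conditions may differ. A PyTorch sketch of a sampled training penalty for these conditions (the certified bound-propagation step is not shown):

```python
import torch
import torch.nn as nn

# Candidate barrier: the Softplus output keeps B(x) >= 0 by construction.
B = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1), nn.Softplus())

def barrier_penalty(x_init, x_unsafe, x_safe, step, eta=0.1):
    """Sampled penalty for the barrier conditions: B <= eta on the initial
    set, B >= 1 on the unsafe set, and B(next state) <= B(x) on the safe
    set (one noise sample as a surrogate for the expectation). This is a
    training-time sketch; certified bounds would replace the samples."""
    loss_init = torch.relu(B(x_init) - eta).mean()
    loss_unsafe = torch.relu(1.0 - B(x_unsafe)).mean()
    loss_mart = torch.relu(B(step(x_safe)) - B(x_safe)).mean()
    return loss_init + loss_unsafe + loss_mart

# Toy usage: stable linear dynamics with additive Gaussian noise.
step = lambda x: 0.9 * x + 0.05 * torch.randn_like(x)
x = torch.randn(128, 2)
print(barrier_penalty(0.1 * x, 3.0 * x, x, step))
```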
High-dimensional Partial Differential Equations (PDEs) are a popular mathematical modelling tool, with applications ranging from finance to computational chemistry. However, standard numerical techniques for solving these PDEs are typically affected by the curse of dimensionality. In this work, we tackle this challenge while focusing on stationary diffusion equations defined over a high-dimensional domain with periodic boundary conditions. Inspired by recent progress in sparse function approximation in high dimensions, we propose a new method called compressive Fourier collocation. Combining ideas from compressive sensing and spectral collocation, our method replaces the use of structured collocation grids with Monte Carlo sampling and employs sparse recovery techniques, such as orthogonal matching pursuit and $\ell^1$ minimization, to approximate the Fourier coefficients of the PDE solution. We conduct a rigorous theoretical analysis showing that the approximation error of the proposed method is comparable with the best $s$-term approximation (with respect to the Fourier basis) to the solution. Using the recently introduced framework of random sampling in bounded Riesz systems, our analysis shows that the compressive Fourier collocation method mitigates the curse of dimensionality with respect to the number of collocation points under sufficient conditions on the regularity of the diffusion coefficient. We also present numerical experiments that illustrate the accuracy and stability of the method for the approximation of sparse and compressible solutions.
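On a 1D periodic toy problem with a constant diffusion coefficient, the method reduces to: draw Monte Carlo collocation points, assemble the matrix that applies $-u''$ to each Fourier mode at those points, and recover the sparse coefficient vector greedily. A NumPy sketch with orthogonal matching pursuit (sizes and the sparse solution are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def omp(A, b, s):
    """Orthogonal matching pursuit: greedily recover an s-sparse x with
    A x ~ b, using norm-rescaled correlations for column selection."""
    norms = np.linalg.norm(A, axis=0)
    support, r = [], b.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(A.conj().T @ r) / norms))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        r = b - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = coef
    return x

# Toy problem: -u'' = f on [0, 2*pi), periodic, with a sparse solution.
K = 16
freqs = np.array([k for k in range(-K, K + 1) if k != 0])  # modes -K..K, no 0
u_true = np.zeros(freqs.size, dtype=complex)
u_true[[3, 30]] = [1.0, 0.5j]             # a 2-sparse coefficient vector
m = 30                                    # Monte Carlo collocation points
pts = rng.uniform(0, 2 * np.pi, m)
A = np.exp(1j * np.outer(pts, freqs)) * freqs**2   # -d^2/dx^2 acts as k^2
b = A @ u_true                            # f sampled at collocation points
print(np.linalg.norm(omp(A, b, s=2) - u_true))     # small if recovery succeeds
```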
Multimodal learning helps to comprehensively understand the world by integrating different senses. Accordingly, multiple input modalities are expected to boost model performance, but we find that they are not fully exploited even when the multimodal model outperforms its uni-modal counterpart. Specifically, in this paper we point out that existing multimodal discriminative models, in which a uniform objective is designed for all modalities, can leave some uni-modal representations under-optimized because another modality dominates in certain scenarios, e.g., sound in a blowing-wind event, vision in a picture-drawing event, etc. To alleviate this optimization imbalance, we propose on-the-fly gradient modulation to adaptively control the optimization of each modality by monitoring the discrepancy of their contributions towards the learning objective. Further, dynamically changing extra Gaussian noise is introduced to avoid a possible generalization drop caused by gradient modulation. As a result, we achieve considerable improvement over common fusion methods on different multimodal tasks, and this simple strategy can also boost existing multimodal methods, which illustrates its efficacy and versatility. The source code is available at \url{//github.com/GeWu-Lab/OGM-GE_CVPR2022}.
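The modulation can be sketched as follows: estimate each modality's contribution per batch, form their ratio, shrink the gradients of the dominant branch, and add dynamic Gaussian noise to the modulated gradients. The constants, the tanh shape, and the score definition below are placeholders in the spirit of the method, not the released implementation:

```python
import torch

def modulation_coeffs(score_a, score_v, alpha=1.0):
    """Return per-modality gradient scaling factors. The modality whose
    contribution score dominates has its gradients shrunk; the weaker
    modality is left untouched. Sketch only; `alpha` and the tanh shape
    are illustrative placeholders."""
    rho = score_a / (score_v + 1e-8)      # audio/vision discrepancy ratio
    k_a = 1.0 - torch.tanh(alpha * torch.relu(rho - 1.0))
    k_v = 1.0 - torch.tanh(alpha * torch.relu(1.0 / rho - 1.0))
    return k_a, k_v

def modulate_(params, k, noise_std=1e-3):
    """Scale one branch's gradients by k and add dynamic Gaussian noise."""
    for p in params:
        if p.grad is not None:
            p.grad.mul_(k).add_(noise_std * torch.randn_like(p.grad))

# Usage inside a training step, after loss.backward():
#   k_a, k_v = modulation_coeffs(score_a, score_v)
#   modulate_(audio_branch.parameters(), k_a)
#   modulate_(visual_branch.parameters(), k_v)
```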