We consider stochastic optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector, a dependent random vector, and the decision variables. We call such problems conditional stochastic optimization problems. They arise in many applications, such as uplift modeling, reinforcement learning, and contextual optimization. We propose a specialized single time-scale stochastic method for nonconvex constrained conditional stochastic optimization problems with a Lipschitz smooth outer function and a generalized differentiable inner function. In the method, we approximate the inner conditional expectation with a rich parametric model whose mean squared error satisfies a stochastic version of a {\L}ojasiewicz condition. The model is used by an inner learning algorithm. The main feature of our approach is that unbiased stochastic estimates of the directions used by the method can be generated with one observation from the joint distribution per iteration, which makes it applicable to real-time learning. The directions, however, are not gradients or subgradients of any overall objective function. We prove the convergence of the method with probability one, using the method of differential inclusions and a specially designed Lyapunov function, involving a stochastic generalization of the Bregman distance. Finally, a numerical illustration demonstrates the viability of our approach.
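In symbols (our notation, not necessarily the paper's), this problem class has the schematic form
\[
\min_{x \in X} \; \mathbb{E}_{\xi}\!\left[ f\big(\xi,\; \mathbb{E}_{\eta}\big[\, g(x,\xi,\eta) \,\big|\, \xi \,\big]\big) \right],
\]
where $\xi$ is the base random vector, $\eta$ the dependent random vector, $f$ the Lipschitz smooth outer function, and $g$ the generalized differentiable inner function; the inner conditional expectation is the quantity approximated by the parametric model maintained by the inner learning algorithm.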
Machine Learning-based heuristics have recently shown impressive performance in solving a variety of hard combinatorial optimization problems (COPs). However, they generally rely on a separate neural model, specialized and trained for each individual problem. Any variation of a problem requires adjusting its model and re-training from scratch. In this paper, we propose GOAL (for Generalist combinatorial Optimization Agent Learning), a generalist model capable of efficiently solving multiple COPs and which can be fine-tuned to solve new COPs. GOAL consists of a single backbone plus lightweight problem-specific adapters, mostly for input and output processing. The backbone is based on a new form of mixed-attention blocks which makes it possible to handle problems defined on graphs with arbitrary combinations of node, edge, and instance-level features. Additionally, problems that involve heterogeneous nodes or edges, such as those defined on multi-partite graphs, are handled through a novel multi-type transformer architecture, in which the attention blocks are duplicated to attend only to the relevant combinations of types while relying on the same shared parameters. We train GOAL on a set of routing, scheduling, and classic graph problems and show that it is only slightly inferior to the specialized baselines while being the first multi-task model that solves a variety of COPs. Finally, we showcase the strong transfer-learning capacity of GOAL by fine-tuning or learning the adapters for new problems, with only a few shots and little data.
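A minimal sketch of the shared-backbone-plus-adapters pattern described above, assuming a PyTorch-style implementation; the module names, dimensions, and the plain transformer standing in for GOAL's mixed-attention blocks are illustrative placeholders, not the actual GOAL architecture:
\begin{verbatim}
import torch
import torch.nn as nn

class ProblemAdapter(nn.Module):
    """Lightweight problem-specific input/output projections (illustrative)."""
    def __init__(self, node_feat_dim, d_model, n_outputs):
        super().__init__()
        self.node_in = nn.Linear(node_feat_dim, d_model)  # encode node features
        self.out = nn.Linear(d_model, n_outputs)          # decode to problem outputs

class SharedBackbone(nn.Module):
    """Single backbone shared across problems; a plain transformer stands in
    for the mixed-attention blocks, which would also consume edge features."""
    def __init__(self, d_model=128, n_layers=4, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
    def forward(self, node_emb):
        return self.encoder(node_emb)

# One backbone, one adapter per problem; for a new problem only the (small)
# adapter needs to be learned or fine-tuned.
backbone = SharedBackbone()
tsp_adapter = ProblemAdapter(node_feat_dim=2, d_model=128, n_outputs=1)

coords = torch.rand(8, 50, 2)            # batch of 50-node instances with (x, y) features
h = tsp_adapter.node_in(coords)          # problem-specific input encoding
h = backbone(h)                          # shared processing
scores = tsp_adapter.out(h).squeeze(-1)  # problem-specific output head
\end{verbatim}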
We consider sequential maximization of performance metrics that are general functions of the confusion matrix of a classifier (such as precision, F-measure, or G-mean). Such metrics are, in general, non-decomposable over individual instances, making their optimization very challenging. While they have been extensively studied under different frameworks in the batch setting, their analysis in the online learning regime is very limited, with only a few distinguished exceptions. In this paper, we introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems. The algorithm's update and prediction rules are appealingly simple and computationally efficient, without the need to store any past data. We show that the algorithm attains $\mathcal{O}(\frac{\ln n}{n})$ regret for concave and smooth metrics and verify the efficiency of the proposed algorithm in empirical studies.
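As a standard example (our illustration, not necessarily the paper's running example) of such a non-decomposable metric, the $F_1$-measure is a nonlinear function of the confusion-matrix counts,
\[
F_1 \;=\; \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},
\]
and therefore cannot be written as an average of losses over individual instances, which is precisely what makes its online optimization challenging.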
Symmetric submodular maximization is an important class of combinatorial optimization problems, including MAX-CUT on graphs and hypergraphs. The state-of-the-art algorithm for the problem over general constraints has an approximation ratio of $0.432$. The algorithm applies the canonical continuous greedy technique that involves a sampling process; it therefore suffers from high query complexity and is inherently randomized. In this paper, we present several efficient deterministic algorithms for maximizing a symmetric submodular function under various constraints. Specifically, for the cardinality constraint, we design a deterministic algorithm that attains a $0.432$ ratio and uses $O(kn)$ queries. Previously, the best deterministic algorithm attained a $0.385-\epsilon$ ratio and used $O\left(kn (\frac{10}{9\epsilon})^{\frac{20}{9\epsilon}-1}\right)$ queries. For the matroid constraint, we design a deterministic algorithm that attains a $1/3-\epsilon$ ratio and uses $O(kn\log \epsilon^{-1})$ queries. Previously, the best deterministic algorithm attained the same $1/3-\epsilon$ ratio but used a much larger number of queries, $O(\epsilon^{-1}n^4)$. For packing constraints with a large width, we design a deterministic algorithm that attains a $0.432-\epsilon$ ratio and uses $O(n^2)$ queries; to the best of our knowledge, no deterministic algorithm was previously known for this constraint. The last algorithm can be adapted to attain a $0.432$ ratio for a single knapsack constraint using $O(n^4)$ queries. Previously, the best deterministic algorithm attained a $0.316-\epsilon$ ratio and used $\widetilde{O}(n^3)$ queries.
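For reference (standard definitions, not specific to this paper), a set function $f:2^V \to \mathbb{R}_{\ge 0}$ is submodular if $f(A\cup\{v\}) - f(A) \ge f(B\cup\{v\}) - f(B)$ for all $A \subseteq B \subseteq V$ and $v \notin B$, and symmetric if $f(S) = f(V \setminus S)$ for every $S \subseteq V$. The cut function of a graph, $f(S) = |\{\{u,v\} \in E : u \in S,\ v \notin S\}|$, satisfies both properties, which is why MAX-CUT is the canonical example.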
Designing efficient algorithms for solving high-dimensional partial differential equations (PDEs) has long been exceedingly difficult because of the curse of dimensionality. We extend forward-backward stochastic neural networks (FBSNNs), which rely on forward-backward stochastic differential equations (FBSDEs), to solve the incompressible Navier-Stokes equations. For the Cahn-Hilliard equation, we derive a modified Cahn-Hilliard equation from a widely used stabilized scheme for the original equation. This modified equation can be written as a continuous parabolic system to which the FBSDE framework applies, with the unknown solution approximated by a neural network. Our method also extends successfully to the Cahn-Hilliard-Navier-Stokes (CHNS) equations. The accuracy and stability of our methods are demonstrated in numerous numerical experiments, especially in high dimensions.
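For context, FBSNNs are built on forward-backward stochastic systems of the standard form (generic notation; the problem-specific drifts for the equations above are omitted)
\[
X_t = X_0 + \int_0^t b(s,X_s)\,\mathrm{d}s + \int_0^t \sigma(s,X_s)\,\mathrm{d}W_s, \qquad
Y_t = g(X_T) + \int_t^T h(s,X_s,Y_s,Z_s)\,\mathrm{d}s - \int_t^T Z_s\,\mathrm{d}W_s,
\]
where the unknown pair $(Y_t,Z_t)$, corresponding to the PDE solution and its diffusion-scaled gradient along the forward paths, is parameterized by a neural network.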
We develop statistical models for samples of distribution-valued stochastic processes featuring time-indexed univariate distributions, with emphasis on functional principal component analysis. The proposed model presents an intrinsic rather than transformation-based approach. The starting point is a transport process representation for distribution-valued processes under the Wasserstein metric. Substituting transports for distributions addresses the challenge of centering distribution-valued processes and leads to a useful and interpretable decomposition of each realized process into a process-specific single transport and a real-valued trajectory. This representation makes it possible to utilize a scalar multiplication operation for transports and facilitates not only functional principal component analysis but also the introduction of a latent Gaussian process. This Gaussian process proves especially useful for the case where the distribution-valued processes are observed only on a sparse grid of time points, establishing an approach for longitudinal distribution-valued data. We study the convergence of the key components of this novel representation to their population targets and demonstrate the practical utility of the proposed approach through simulations and several data illustrations.
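For background (standard facts for univariate distributions, not specific to this paper), the Wasserstein metric and the associated transports admit explicit forms in terms of quantile functions: for atomless $\mu$,
\[
d_W^2(\mu,\nu) \;=\; \int_0^1 \big(F_\mu^{-1}(u) - F_\nu^{-1}(u)\big)^2\,\mathrm{d}u,
\qquad
T_{\mu \to \nu} \;=\; F_\nu^{-1} \circ F_\mu,
\]
where $F_\mu^{-1}$ and $F_\nu^{-1}$ are the quantile functions; it is this explicit transport structure that representations of the kind described above exploit.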
We propose a new Monte Carlo-based estimator for digital options with assets modelled by a stochastic differential equation (SDE). The new estimator is based on repeated path splitting and relies on the correlation of approximate paths of the underlying SDE that share parts of a Brownian path. Combining this new estimator with Multilevel Monte Carlo (MLMC) leads to an estimator whose computational complexity is similar to that of an MLMC estimator applied to options with Lipschitz payoffs. This preprint includes detailed calculations and proofs (in grey colour) which are not peer-reviewed and not included in the published article.
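For orientation (standard background, not the paper's notation), a digital (cash-or-nothing) call pays one unit if the asset finishes above the strike, so its price is
\[
P \;=\; e^{-rT}\,\mathbb{E}\big[\mathbf{1}_{\{S_T > K\}}\big];
\]
the discontinuity of the indicator payoff at $S_T = K$ is what degrades the variance decay of standard MLMC level differences and motivates constructions such as the path splitting used here.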
Inductive conformal predictors (ICPs) are algorithms that are able to generate prediction sets, instead of point predictions, which are valid at a user-defined confidence level, assuming only exchangeability. These algorithms are useful for reliable machine learning and are increasing in popularity. The ICP development process involves dividing development data into three parts: training, calibration and test. When development data are limited or expensive, the most efficient way to divide them remains an open question. This study provides several experiments to explore this question and considers the case for allowing overlap of examples between training and calibration sets. Conclusions are drawn that will be of value to academics and practitioners planning to use ICPs.
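A minimal sketch of the standard inductive (split) conformal procedure discussed above, assuming a scikit-learn-style probabilistic classifier; the model, nonconformity score, and dataset are illustrative and do not reproduce the study's experiments:
\begin{verbatim}
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic development data, split into proper training and calibration parts.
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Nonconformity score: one minus the predicted probability of the true class.
cal_probs = model.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

alpha = 0.1  # target miscoverage level
# Finite-sample-corrected quantile of the calibration scores.
q = np.quantile(cal_scores, np.ceil((len(cal_scores) + 1) * (1 - alpha)) / len(cal_scores))

def prediction_set(x):
    """Return all labels whose nonconformity score is below the threshold."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    return np.where(1.0 - probs <= q)[0]

print(prediction_set(X_cal[0]))
\end{verbatim}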
We give a new coalgebraic semantics for intuitionistic modal logic with $\Box$. In particular, we provide a coalgebraic representation of intuitionistic descriptive modal frames and of intuitionistic modal Kripke frames based on image-finite posets. This gives a solution to a problem in the area of coalgebraic logic for these classes of frames, raised explicitly by Litak (2014) and de Groot and Pattinson (2020). Our key technical tool is a recent generalization of a construction by Ghilardi, in the form of a right adjoint to the inclusion of the category of Esakia spaces in the category of Priestley spaces. As an application of these results, we study bisimulations of intuitionistic modal frames, describe dual spaces of free modal Heyting algebras, and provide a path towards a theory of coalgebraic intuitionistic logics.
It was recently conjectured that every component of a discrete-time rational dynamical system is a solution to an algebraic difference equation that is linear in its highest-shift term (a quasi-linear equation). We prove that the conjecture holds in the special case of holonomic sequences, which can straightforwardly be represented by rational dynamical systems. We propose two algorithms for converting holonomic recurrence equations into such quasi-linear equations. The two algorithms differ in their efficiency and in the minimality of the orders of their output equations.
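As a small worked illustration (our example, not from the paper), the holonomic recurrence $a(n+1) = (n+1)\,a(n)$ defining the factorials corresponds to the rational dynamical system $(a, n) \mapsto ((n+1)a,\ n+1)$. Eliminating $n$ from the two shifted copies $a(n+1) = (n+1)\,a(n)$ and $a(n+2) = (n+2)\,a(n+1)$ yields
\[
a(n)\,a(n+2) \;=\; a(n+1)^2 + a(n)\,a(n+1),
\]
an algebraic difference equation that is linear in the highest-shift term $a(n+2)$, i.e., quasi-linear.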
Surrogate neural network-based partial differential equation (PDE) solvers have the potential to solve PDEs in an accelerated manner, but they are largely limited to systems featuring fixed domain sizes, geometric layouts, and boundary conditions. We propose Specialized Neural Accelerator-Powered Domain Decomposition Methods (SNAP-DDM), a DDM-based approach to PDE solving in which subdomain problems containing arbitrary boundary conditions and geometric parameters are accurately solved using an ensemble of specialized neural operators. We tailor SNAP-DDM to 2D electromagnetics and fluidic flow problems and show how innovations in network architecture and loss-function engineering can produce specialized surrogate subdomain solvers with near-unity accuracy. We utilize these solvers with standard DDM algorithms to accurately solve freeform electromagnetics and fluids problems featuring a wide range of domain sizes.
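Schematically (our summary, not the paper's exact algorithm), the resulting DDM iteration replaces each conventional subdomain solve with a trained surrogate: at step $k$, each subdomain $\Omega_i$ is updated as
\[
u_i^{(k+1)} \;=\; \mathcal{N}_{\theta_i}\big(\text{geometry and sources in } \Omega_i,\ u^{(k)}\big|_{\partial \Omega_i}\big), \qquad i = 1, \dots, M,
\]
and the updates are iterated until the interface values exchanged between neighboring subdomains stop changing.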