We consider a Cauchy problem for the inhomogeneous differential equation given in terms of an unbounded linear operator $A$ and the Caputo fractional derivative of order $\alpha \in (0, 2)$ in time. The previously known representation of the mild solution to such a problem does not have a conventional variation-of-constants-like form, with the propagator derived from the associated homogeneous problem. Instead, it relies on the existence of two propagators with different analytical properties. This fact limits the theoretical and especially the numerical applicability of the existing solution representation. Here, we propose an alternative representation of the mild solution to the given problem that consolidates the solution formulas for sub-parabolic, parabolic and sub-hyperbolic equations with a positive sectorial operator $A$ and non-zero initial data. The new representation is based solely on the propagator of the homogeneous problem and, therefore, can be considered a more natural fractional extension of the solution to the classical parabolic Cauchy problem. By exploiting a trade-off between the regularity assumptions on the initial data in terms of the fractional powers of $A$ and the regularity assumptions on the right-hand side in time, we show that the proposed solution formula is strongly convergent for $t \geq 0$ under considerably weaker assumptions compared to the standard results from the literature. Crucially, the achieved relaxation of space regularity assumptions ensures that the new solution representation is practically feasible for any $\alpha \in (0, 2)$ and is amenable to numerical evaluation using uniformly accurate quadrature-based algorithms.
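For orientation, the classical parabolic case ($\alpha = 1$) that the abstract above takes as its reference point has the familiar variation-of-constants form shown below. The symbols $u$, $u_0$ and $f$ and the sign convention $u' + Au = f$ are assumptions made for illustration, since the abstract does not fix notation; the fractional formula proposed in the paper is not reproduced here.

```latex
\begin{aligned}
u'(t) + A u(t) &= f(t), \quad t > 0, \qquad u(0) = u_0, \\
u(t) &= e^{-tA} u_0 + \int_0^t e^{-(t-s)A} f(s)\, \mathrm{d}s .
\end{aligned}
```

In this classical case, the single propagator $e^{-tA}$ of the homogeneous problem appears in both the initial-data term and the convolution term.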
We present quantum algorithms for sampling from non-logconcave probability distributions of the form $\pi(x) \propto \exp(-\beta f(x))$. Here, $f$ can be written as a finite sum $f(x):= \frac{1}{N}\sum_{k=1}^N f_k(x)$. Our approach is based on quantum simulated annealing on slowly varying Markov chains derived from unadjusted Langevin algorithms, removing the necessity for function evaluations, which can be computationally expensive for large data sets in mixture modeling and multi-stable systems. We also incorporate a stochastic gradient oracle that implements the quantum walk operators inexactly by using only mini-batch gradients. As a result, our stochastic-gradient-based algorithm accesses only small subsets of data points when implementing the quantum walk. One challenge of quantizing the resulting Markov chains is that they do not satisfy the detailed balance condition in general. Consequently, the mixing time of the algorithm cannot be expressed in terms of the spectral gap of the transition density, making the quantum algorithms nontrivial to analyze. To overcome these challenges, we first build a hypothetical Markov chain that is reversible and also converges to the target distribution. Then, we quantify the distance between our algorithm's output and the target distribution by using this hypothetical chain as a bridge to establish the total complexity. Our quantum algorithms exhibit polynomial speedups in terms of both dimension and precision dependencies when compared to the best-known classical algorithms.
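The classical building block named in the abstract above, an unadjusted Langevin step driven by a mini-batch gradient, can be sketched as follows. This is only a classical illustration under stated assumptions, not the quantum algorithm: the function names, the step size `eta`, and the quadratic components $f_k$ in the usage note are hypothetical choices.

```python
import numpy as np

def minibatch_grad(grads, x, batch_size, rng):
    """Average the gradients of a random subset of the components f_k."""
    idx = rng.choice(len(grads), size=batch_size, replace=False)
    return sum(grads[k](x) for k in idx) / batch_size

def ula_step(x, grad, eta, beta, rng):
    """One unadjusted Langevin step for a target proportional to exp(-beta f).

    x_next = x - eta * grad_f(x) + sqrt(2 * eta / beta) * standard normal noise
    There is no accept/reject step, so f itself is never evaluated.
    """
    noise = rng.standard_normal(x.shape)
    return x - eta * grad + np.sqrt(2.0 * eta / beta) * noise

def sample_chain(grads, x0, n_steps, eta, beta, batch_size, seed=0):
    """Run the mini-batch Langevin chain and return all visited states."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        g = minibatch_grad(grads, x, batch_size, rng)
        x = ula_step(x, g, eta, beta, rng)
        samples[t] = x
    return samples
```

As a usage example, with the hypothetical components $f_k(x) = \frac{1}{2}(x - c_k)^2$, one can pass `grads = [lambda x, c=c: x - c for c in centers]`; the chain then fluctuates around the mean of the centers, up to discretization and mini-batch bias.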
Recently, machine learning of the branch and bound algorithm has shown promise in approximating competent solutions to NP-hard problems. In this paper, we utilize and comprehensively compare the outcomes of three neural networks--graph convolutional neural network (GCNN), GraphSAGE, and graph attention network (GAT)--to solve the capacitated vehicle routing problem (CVRP). We train these neural networks to emulate the decision-making process of the computationally expensive Strong Branching strategy. The neural networks are trained on six instances with distinct topologies from the CVRPLIB and evaluated on eight additional instances. Moreover, we reduce the problem of finding the minimum number of vehicles required to solve a CVRP instance to a bin-packing problem, which is addressed in a similar manner. Through rigorous experimentation, we find that this approach can match or improve upon the performance of the branch and bound algorithm with the Strong Branching strategy while requiring significantly less computational time. The source code for this work is available at //isotlaboratory.github.io/ml4vrp
We study the sample complexity of obtaining an $\epsilon$-optimal policy in \emph{Robust} discounted Markov Decision Processes (RMDPs), given only access to a generative model of the nominal kernel. This problem is widely studied in the non-robust case, and it is known that any planning approach applied to an empirical MDP estimated with $\tilde{\mathcal{O}}(\frac{H^3 \mid S \mid\mid A \mid}{\epsilon^2})$ samples provides an $\epsilon$-optimal policy, which is minimax optimal. Results in the robust case are much scarcer. For $sa$- (resp. $s$-)rectangular uncertainty sets, the best known sample complexity is $\tilde{\mathcal{O}}(\frac{H^4 \mid S \mid^2\mid A \mid}{\epsilon^2})$ (resp. $\tilde{\mathcal{O}}(\frac{H^4 \mid S \mid^2\mid A \mid^2}{\epsilon^2})$), for specific algorithms and when the uncertainty set is based on the total variation (TV), the KL or the Chi-square divergences. In this paper, we consider uncertainty sets defined with an $L_p$-ball (recovering the TV case), and study the sample complexity of \emph{any} planning algorithm (with high accuracy guarantee on the solution) applied to an empirical RMDP estimated using the generative model. In the general case, we prove a sample complexity of $\tilde{\mathcal{O}}(\frac{H^4 \mid S \mid\mid A \mid}{\epsilon^2})$ for both the $sa$- and $s$-rectangular cases (improvements of $\mid S \mid$ and $\mid S \mid\mid A \mid$ respectively). When the size of the uncertainty is small enough, we improve the sample complexity to $\tilde{\mathcal{O}}(\frac{H^3 \mid S \mid\mid A \mid }{\epsilon^2})$, recovering the lower bound for the non-robust case for the first time and a robust lower bound when the size of the uncertainty is small enough.
Higher-order singular value decomposition (HOSVD) is one of the most celebrated tensor decompositions that generalizes matrix SVD to higher-order tensors. It was recently extended to the quaternion domain \cite{miao2023quat} (we refer to it as L-QHOSVD in this work). However, due to the non-commutativity of quaternion multiplications, L-QHOSVD is not consistent with matrix SVD when the order of the quaternion tensor reduces to 2; moreover, theoretically guaranteed truncated L-QHOSVD was not investigated. To derive a more natural higher-order generalization of the quaternion matrix SVD, we first utilize the feature that left and right multiplications of quaternions are inconsistent to define left and right quaternion tensor unfoldings and left and right mode-k products. Then, by using these basic tools, we propose a two-sided quaternion higher-order singular value decomposition (TS-QHOSVD). TS-QHOSVD has the following two main features: 1) it computes two factor matrices at a time from SVDs of left and right unfoldings, inheriting certain parallel properties of the original HOSVD; 2) it is consistent with matrix SVD when the order of the tensor is 2. In addition, we study truncated TS-QHOSVD and establish its error bound measured by the tail energy; correspondingly, we also present truncated L-QHOSVD and its error bound. Deriving the error bounds is nontrivial, as the proofs are more complicated than their real counterparts, again due to the non-commutativity of quaternion multiplications. Finally, we illustrate the derived properties of TS-QHOSVD and its efficacy via some numerical examples.
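To make the notions of unfolding, mode-k product and tail-energy error bound in the abstract above concrete, here is a minimal sketch of the classical real-valued truncated HOSVD in NumPy. It does not implement the quaternion L-QHOSVD or TS-QHOSVD; all function names are hypothetical, and the bound checked is the standard real-case one (squared error at most the sum of discarded squared singular values over all modes).

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: axis `mode` indexes the rows of the matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-k product of T with M, where M has shape (new_dim, T.shape[mode])."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis is replaced, new axis is first
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(T, ranks):
    """Return core, factor matrices, and the tail energy of the truncation."""
    factors, tail = [], 0.0
    for mode, r in enumerate(ranks):
        U, s, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])          # leading left singular vectors
        tail += np.sum(s[r:] ** 2)        # discarded energy in this mode
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors, tail

def reconstruct(core, factors):
    """Multiply the core by each factor matrix along its mode."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T
```

For a real tensor `T`, `np.sum((T - reconstruct(core, factors)) ** 2)` should not exceed the returned `tail`, which is the real-valued analogue of the error bounds the abstract describes.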
We consider a simple setting in neuroevolution where an evolutionary algorithm optimizes the weights and activation functions of a simple artificial neural network. We then define simple example functions to be learned by the network and conduct rigorous runtime analyses for networks with a single neuron and for a more advanced structure with several neurons and two layers. Our results show that the proposed algorithm is generally efficient on two example problems designed for one neuron and efficient with at least constant probability on the example problem for a two-layer network. In particular, the so-called harmonic mutation operator, which chooses steps of size $j$ with probability proportional to $1/j$, turns out to be a good choice for the underlying search space. However, for the case of one neuron, we also identify situations with hard-to-overcome local optima. Experimental investigations of our neuroevolutionary algorithm and a state-of-the-art CMA-ES support the theoretical findings.
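The harmonic mutation operator mentioned in the abstract above can be sketched in a few lines. The abstract only states that a step of size $j$ is chosen with probability proportional to $1/j$; the maximum step size `r`, the random sign, and the cyclic integer domain used below are assumptions made for illustration.

```python
import random

def harmonic_step(r, rng=random):
    """Sample a step size j in {1, ..., r} with probability (1/j) / H_r."""
    sizes = range(1, r + 1)
    weights = [1.0 / j for j in sizes]  # unnormalized harmonic weights
    return rng.choices(sizes, weights=weights, k=1)[0]

def harmonic_mutation(x, r, rng=random):
    """Move integer x by a harmonic step in a random direction.

    Assumes a hypothetical cyclic domain {0, ..., r - 1} with r >= 2;
    steps are drawn from {1, ..., r - 1} so the value always changes.
    """
    j = harmonic_step(r - 1, rng)
    sign = rng.choice((-1, 1))
    return (x + sign * j) % r
```

Small steps are the most likely, but large jumps keep a non-negligible probability, which is the usual motivation for heavy-tailed step-size distributions of this kind.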
We propose a data structure in $d$-dimensional hyperbolic space that can be considered a natural counterpart to quadtrees in Euclidean spaces. Based on this data structure we propose a so-called L-order for hyperbolic point sets, which is an extension of the Z-order defined in Euclidean spaces. Using these quadtrees and the L-order we build geometric spanners. Near-linear size $(1+\epsilon)$-spanners do not exist in hyperbolic spaces, but we are able to create a Steiner spanner that achieves a spanning ratio of $1+\epsilon$ with $\mathcal O_{d,\epsilon}(n)$ edges, using a simple construction that can be maintained dynamically. As a corollary we also get a $(2+\epsilon)$-spanner (in the classical sense) of the same size, where the spanning ratio $2+\epsilon$ is almost optimal among spanners of subquadratic size. Finally, we show that our Steiner spanner directly provides a solution to the approximate nearest neighbour problem: given a point set $P$ in $d$-dimensional hyperbolic space we build the data structure in $\mathcal O_{d,\epsilon}(n\log n)$ time, using $\mathcal O_{d,\epsilon}(n)$ space. Then for any query point $q$ we can find a point $p\in P$ that is at most $1+\epsilon$ times farther from $q$ than its nearest neighbour in $P$ in $\mathcal O_{d,\epsilon}(\log n)$ time. Moreover, the data structure is dynamic and can handle point insertions and deletions with update time $\mathcal O_{d,\epsilon}(\log n)$.
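As background for the abstract above, the Euclidean Z-order that the L-order extends can be sketched via bit interleaving (Morton codes). This covers only the Euclidean construction in two dimensions with non-negative integer coordinates; the hyperbolic L-order itself is not reproduced, and the function name and the `bits` parameter are illustrative choices.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions, y in odd ones)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def z_order(points, bits=16):
    """Sort 2D integer points along the Z-order curve."""
    return sorted(points, key=lambda p: morton_key(p[0], p[1], bits))
```

Sorting by this key visits the cells of a quadtree in depth-first order, which is why points that are close in the ordering tend to share a small quadtree cell.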
Accurate reorientation and segmentation of the left ventricle (LV) are essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing. Small-scale LV myocardium (LV-MY) region detection and the diverse cardiac structures of individual patients pose challenges to the LV segmentation operation. To mitigate these issues, we propose an end-to-end model, named multi-scale spatial transformer UNet (MS-ST-UNet), that combines the multi-scale spatial transformer network (MSSTN) and multi-scale UNet (MSUNet) modules to perform simultaneous reorientation and segmentation of the LV region from nuclear cardiac images. The proposed method is trained and tested using two different nuclear cardiac image modalities: 13N-ammonia PET and 99mTc-sestamibi SPECT. We use a multi-scale strategy to generate and extract image features at different scales. Our experimental results demonstrate that the proposed method significantly improves the reorientation and segmentation performance. This joint learning framework promotes mutual enhancement between the reorientation and segmentation tasks, leading to cutting-edge performance and an efficient image processing workflow. The proposed end-to-end deep network has the potential to reduce the burden of manual delineation for cardiac images, thereby providing multimodal quantitative analysis assistance for physicists.
We present a framework and algorithms to learn controlled dynamics models using neural stochastic differential equations (SDEs) -- SDEs whose drift and diffusion terms are both parametrized by neural networks. We construct the drift term to leverage a priori physics knowledge as inductive bias, and we design the diffusion term to represent a distance-aware estimate of the uncertainty in the learned model's predictions -- it matches the system's underlying stochasticity when evaluated on states near those from the training dataset, and it predicts highly stochastic dynamics when evaluated on states beyond the training regime. The proposed neural SDEs can be evaluated quickly enough for use in model predictive control algorithms, or they can be used as simulators for model-based reinforcement learning. Furthermore, they make accurate predictions over long time horizons, even when trained on small datasets that cover limited regions of the state space. We demonstrate these capabilities through experiments on simulated robotic systems, as well as by using them to model and control a hexacopter's flight dynamics: A neural SDE trained using only three minutes of manually collected flight data results in a model-based control policy that accurately tracks aggressive trajectories that push the hexacopter's velocity and Euler angles to nearly double the maximum values observed in the training dataset.
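A rough sketch of how a controlled SDE of the kind described in the abstract above can be rolled out is given below, using the Euler-Maruyama scheme with diagonal noise. The drift and diffusion here are plain Python callables standing in for the neural networks, and `distance_aware_diffusion` is a hypothetical toy that grows with distance from the training states; none of this is the authors' model or training procedure.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, controls, dt, rng):
    """Simulate dx = drift(x, u) dt + diffusion(x) dW with diagonal noise."""
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for u in controls:
        dw = rng.standard_normal(x.shape) * np.sqrt(dt)  # Brownian increment
        x = x + drift(x, u) * dt + diffusion(x) * dw
        traj.append(x)
    return np.stack(traj)

def distance_aware_diffusion(train_states, sigma_near=0.05, growth=1.0):
    """Toy diffusion: small near the training states, larger far from them."""
    train_states = np.asarray(train_states, dtype=float)

    def diffusion(x):
        # Distance from x to the closest training state.
        d = np.min(np.linalg.norm(train_states - x, axis=1))
        return sigma_near + growth * d

    return diffusion
```

Sampling several rollouts with different random generators gives a spread of predicted trajectories, and that spread widens once the state leaves the region covered by `train_states`.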
We propose a volumetric formulation for computing the Optimal Transport problem defined on surfaces in $\mathbb{R}^3$, found in disciplines like optics, computer graphics, and computational methodologies. Instead of directly tackling the original problem on the surface, we define a new Optimal Transport problem on a thin tubular region, $T_{\epsilon}$, adjacent to the surface. This extension offers enhanced flexibility and simplicity for numerical discretization on Cartesian grids. The Optimal Transport mapping and potential function computed on $T_{\epsilon}$ are consistent with the original problem on surfaces. We demonstrate that, with the proposed volumetric approach, it is possible to use simple and straightforward numerical methods to solve Optimal Transport for $\Gamma = \mathbb{S}^2$.
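To illustrate why a tubular region is convenient on Cartesian grids, the sketch below marks the grid nodes of a uniform grid that lie within distance $\epsilon$ of the unit sphere $\mathbb{S}^2$. The abstract above says only that $T_{\epsilon}$ is a thin tubular region adjacent to the surface, so the two-sided definition via the signed distance, the grid extent `half_width`, and the function name are assumptions; the Optimal Transport solver itself is not shown.

```python
import numpy as np

def tubular_mask(n, eps, half_width=1.5):
    """Boolean mask of grid nodes within distance eps of the unit sphere."""
    axis = np.linspace(-half_width, half_width, n)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    signed_dist = np.sqrt(X**2 + Y**2 + Z**2) - 1.0  # signed distance to S^2
    return np.abs(signed_dist) < eps, axis
```

Only the nodes selected by the mask need to carry unknowns, so standard finite-difference stencils on the ambient grid can be used without meshing the surface.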
We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic model in which the consumers' preferences as well as their price sensitivity vary over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as a function of time, the temporal variability in the model parameters, and the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis not only demonstrates asymptotic optimality of the proposed policy but also shows that for policy planning it is essential to incorporate available structural information, as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up. We conduct simulation experiments across a wide range of regimes as well as real-world network-based studies and report encouraging performance for our proposed method.