The Gaussian process state-space model (GPSSM) has garnered considerable attention over the past decade. However, the standard GP commonly used in GPSSM studies, equipped with an elementary kernel such as the squared exponential or Mat\'{e}rn kernel, limits the model's representation power and substantially restricts its applicability to complex scenarios. To address this issue, we propose a new class of probabilistic state-space models called TGPSSMs, which leverage a parametric normalizing flow to enrich the GP priors in the standard GPSSM, enabling greater flexibility and expressivity. Additionally, we present a scalable variational inference algorithm that offers a flexible and optimal structure for the variational distribution of the latent states. The proposed algorithm is interpretable and computationally efficient owing to the sparse GP representation and the bijective nature of the normalizing flow. Moreover, we incorporate a constrained optimization framework into the algorithm to enhance the state-space representation capabilities and optimize the hyperparameters, leading to superior learning and inference performance. Experimental results on synthetic and real datasets corroborate that the proposed TGPSSM outperforms several state-of-the-art methods. The accompanying source code is available at \url{https://github.com/zhidilin/TGPSSM}.
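To make the flow-enrichment idea concrete, the following is a minimal sketch (not the authors' implementation) of warping a standard GP prior draw through an elementwise monotone, hence bijective, transformation; the warp parameters a and b are hypothetical.
\begin{verbatim}
import numpy as np

def se_kernel(x1, x2, ls=1.0, var=1.0):
    # Squared exponential (RBF) kernel
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
K = se_kernel(x, x) + 1e-8 * np.eye(x.size)
f = rng.multivariate_normal(np.zeros(x.size), K)  # standard GP prior draw

# Elementwise bijective warp standing in for a parametric normalizing flow;
# a, b > 0 are hypothetical parameters (derivative 1 + a*b*sech^2(b*f) > 0,
# so the map is invertible).
a, b = 1.5, 0.8
g = f + a * np.tanh(b * f)  # transformed (non-Gaussian) prior draw
\end{verbatim}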
The analysis of human movements has been extensively studied due to its wide variety of practical applications, such as human-robot interaction, human learning applications, or clinical diagnosis. Nevertheless, the state of the art still faces scientific challenges in modeling human movements. First, new models must account for the stochasticity of human movement and the physical structure of the human body in order to accurately predict the evolution of full-body motion descriptors over time. Second, although deep learning algorithms are widely used, their explainability in terms of body-posture predictions needs to be improved, as they lack comprehensible representations of human movement. This paper addresses these challenges by introducing three novel methods for creating explainable representations of human movement. In this study, human body movement is formulated as a state-space model adhering to the structure of the Gesture Operational Model (GOM), whose parameters are estimated through the application of deep learning and statistical algorithms. The trained models are used for the full-body dexterity analysis of expert professionals, in which dynamic associations between body joints are identified, and for the artificial generation of professional movements.
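For reference, a state-space formulation of this kind typically takes the generic form below; the GOM prescribes a particular structure for the transition function over body-joint descriptors, which is not reproduced here.
\[
\mathbf{x}_{t+1} = f(\mathbf{x}_t) + \mathbf{w}_t, \qquad
\mathbf{y}_t = g(\mathbf{x}_t) + \mathbf{v}_t,
\]
where $\mathbf{x}_t$ collects the latent motion descriptors, $\mathbf{y}_t$ the observed body-joint measurements, and $\mathbf{w}_t$, $\mathbf{v}_t$ are process and observation noise.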
In this work, we present a phase-field model for tumour growth, where a diffuse interface separates a tumour from the surrounding host tissue. In our model, we consider transport processes driven by an internal, non-solenoidal velocity field. We include viscoelastic effects with the help of a general Oldroyd-B type description with relaxation and possible stress generation by growth. The elastic energy density is coupled to the phase-field variable, which allows us to model invasive growth towards areas with less mechanical resistance. The main analytical result is the existence of weak solutions in two and three space dimensions in the case of additional stress diffusion. The idea behind the proof is to use a numerical approximation with a fully practical, stable and (subsequence) converging finite element scheme. The physical properties of the model are preserved with the help of a regularization technique, uniform estimates and a limit passage on the fully discrete level. Finally, we illustrate the practicability of the discrete scheme with the help of numerical simulations in two and three dimensions.
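For orientation, a generic Oldroyd-B relaxation equation with stress diffusion reads as follows; the model in the paper additionally couples the elastic energy to the phase-field variable and includes growth-induced stress generation, so this constant-coefficient form is only indicative.
\[
\lambda\bigl(\partial_t\boldsymbol{\sigma} + (\mathbf{v}\cdot\nabla)\boldsymbol{\sigma} - \nabla\mathbf{v}\,\boldsymbol{\sigma} - \boldsymbol{\sigma}(\nabla\mathbf{v})^{\top}\bigr) + \boldsymbol{\sigma} = 2\eta\,\mathbf{D}(\mathbf{v}) + \alpha\,\Delta\boldsymbol{\sigma},
\]
where $\mathbf{D}(\mathbf{v}) = \tfrac{1}{2}(\nabla\mathbf{v} + \nabla\mathbf{v}^{\top})$ is the symmetric velocity gradient, $\lambda$ the relaxation time, $\eta$ a viscosity, and $\alpha > 0$ the stress-diffusion parameter.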
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, while more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple and scalable second-order optimizer that uses a lightweight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid changes of the Hessian along the trajectory. Sophia estimates the diagonal Hessian only every handful of iterations, which incurs negligible average per-step time and memory overhead. On language modeling with GPT-2 models of sizes ranging from 125M to 770M, Sophia achieves a 2x speed-up compared with Adam in the number of steps, total compute, and wall-clock time. Theoretically, we show that Sophia adapts to the curvature in different components of the parameters, which can be highly heterogeneous for language modeling tasks. Our run-time bound does not depend on the condition number of the loss.
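A minimal sketch of the update described above, assuming a precomputed stochastic gradient and a diagonal Hessian estimate (refreshed only every handful of steps, hence the None check); the constants and exact clipping convention here are illustrative rather than the paper's.
\begin{verbatim}
import numpy as np

def sophia_style_step(theta, grad, hess_diag, state,
                      lr=1e-4, betas=(0.965, 0.99), rho=0.04, eps=1e-12):
    # state = {"m": np.zeros_like(theta), "h": np.zeros_like(theta)}
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad       # EMA of gradients
    if hess_diag is not None:                            # Hessian estimate is
        state["h"] = b2 * state["h"] + (1 - b2) * hess_diag  # refreshed rarely
    # Pre-conditioned direction with element-wise clipping
    update = np.clip(state["m"] / np.maximum(state["h"], eps), -rho, rho)
    return theta - lr * update
\end{verbatim}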
Stein Variational Gradient Descent (SVGD) is a nonparametric, particle-based, deterministic sampling algorithm. Despite its wide usage, understanding the theoretical properties of SVGD has remained a challenging problem. For sampling from a Gaussian target, the SVGD dynamics with a bilinear kernel will remain Gaussian as long as the initialization is Gaussian. Inspired by this fact, we undertake a detailed theoretical study of Gaussian-SVGD, i.e., SVGD projected onto the family of Gaussian distributions via the bilinear kernel, or equivalently Gaussian variational inference (GVI) with SVGD. We present a complete picture by considering both the mean-field PDE and discrete particle systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD dynamics is proven to converge linearly to the Gaussian distribution closest to the target in KL divergence. In the finite-particle setting, there is both uniform-in-time convergence to the mean-field limit and linear convergence in time to the equilibrium if the target is Gaussian. In the general case, we propose a density-based and a particle-based implementation of Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from different perspectives, emerge as special cases of our unified framework. Interestingly, one of the new particle-based instances of this framework empirically outperforms existing approaches. Our results make concrete contributions towards a deeper understanding of both SVGD and GVI.
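A minimal particle-based sketch of SVGD with the bilinear kernel $k(x,y) = x^{\top}y + 1$ on a Gaussian target (a toy instance, not the paper's full framework); the step size and iteration count are ad hoc.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])                    # Gaussian target N(mu, Sigma)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sinv = np.linalg.inv(Sigma)

X = rng.normal(size=(200, 2))                 # Gaussian initialization
eps = 0.05
for _ in range(500):
    S = -(X - mu) @ Sinv                      # score: grad log p at particles
    K = X @ X.T + 1.0                         # bilinear kernel matrix
    # SVGD velocity: kernel-averaged score + repulsion (here grad_y k(y,x) = x)
    phi = (K @ S) / len(X) + X
    X = X + eps * phi

print(X.mean(0), np.cov(X.T))                 # approximately mu and Sigma
\end{verbatim}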
Autonomous parking (AP) is an emerging technique for navigating an intelligent vehicle to a parking space without any human intervention. Existing AP methods based on mathematical optimization or machine learning may lead to potential failures due to either excessive execution time or a lack of generalization. To fill this gap, this paper proposes an integrated constrained optimization and imitation learning (iCOIL) approach to achieve efficient and reliable AP. The iCOIL method has two candidate working modes, i.e., CO and IL, and adopts a hybrid scenario analysis (HSA) model to determine the better mode under various scenarios. We implement and verify iCOIL on the Macao Car Racing Metaverse (MoCAM) platform. Results show that iCOIL properly adapts to different scenarios during the entire AP procedure and achieves significantly higher success rates than other benchmarks.
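Purely to illustrate the two-mode architecture, here is a hypothetical dispatcher in the style of the HSA switch; the scenario score, threshold, and interface are all invented for illustration, since the abstract does not specify them.
\begin{verbatim}
import numpy as np

def hsa_select_mode(scenario_features, threshold=0.5):
    # Hypothetical scenario-complexity score in [0, 1]; the actual HSA
    # criterion used by iCOIL is not specified in the abstract.
    complexity = float(np.clip(np.mean(scenario_features), 0.0, 1.0))
    return "IL" if complexity > threshold else "CO"

print(hsa_select_mode(np.array([0.2, 0.7, 0.4])))  # -> "CO"
\end{verbatim}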
We develop a new method, PROTES, for black-box optimization, based on probabilistic sampling from a probability density function given in a low-parametric tensor train format. We test it on complex multidimensional arrays and discretized multivariable functions taken, among others, from real-world applications, including unconstrained binary optimization and optimal control problems, for which the possible number of elements is up to $2^{100}$. In numerical experiments, both on analytic model functions and on complex problems, PROTES outperforms popular existing discrete optimization methods (particle swarm optimization, covariance matrix adaptation, differential evolution, and others).
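To convey only the loop structure of probabilistic-sampling optimization, here is a toy sketch in which the tensor-train density is replaced by an independent Bernoulli product (so this is not PROTES itself); the objective, population sizes, and learning rate are illustrative.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def f(X):                        # toy black-box objective on binary vectors
    return X.sum(axis=1)         # minimized by the all-zeros string

d, pop, elite, lr = 50, 200, 20, 0.3
p = np.full(d, 0.5)              # stand-in density (PROTES instead uses a
                                 # low-parametric tensor-train density)
for _ in range(100):
    X = (rng.random((pop, d)) < p).astype(int)   # sample candidates
    best = X[np.argsort(f(X))[:elite]]           # keep the elite samples
    p = (1 - lr) * p + lr * best.mean(axis=0)    # shift density toward them

print(p.round(2))                # concentrates near the optimum
\end{verbatim}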
Due to the curse of dimensionality, it is often prohibitively expensive to generate deterministic space-filling designs. On the other hand, when na{\"i}ve uniform random sampling is used to generate designs cheaply, design points tend to concentrate in a small region of the design space. Although quasi-random techniques such as Sobol sequences and Latin hypercube designs are preferable to uniform random sampling in many settings, these methods have their own caveats, especially in high-dimensional spaces. In this paper, we propose a technique that addresses the fundamental issue of measure concentration by updating high-dimensional distribution functions to produce better space-filling designs. We then show that our technique can outperform Latin hypercube sampling and Sobol sequences in terms of the discrepancy metric while generating moderately sized space-filling samples for high-dimensional problems.
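As a baseline comparison in the spirit of the discrepancy evaluation mentioned above (the proposed method itself is not shown), scipy's quasi-Monte Carlo module can be used to contrast the standard generators:
\begin{verbatim}
import numpy as np
from scipy.stats import qmc

d, n = 20, 256                         # dimension, sample size (power of 2)
rng = np.random.default_rng(0)

designs = {
    "uniform": rng.random((n, d)),
    "Sobol":   qmc.Sobol(d, seed=0).random(n),
    "LHS":     qmc.LatinHypercube(d, seed=0).random(n),
}
for name, pts in designs.items():
    # Centered L2 discrepancy: lower means better space-filling
    print(name, qmc.discrepancy(pts))
\end{verbatim}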
Markov chain Monte Carlo (MCMC) allows one to generate dependent replicates from the posterior distribution of effectively any Bayesian hierarchical model. However, MCMC can impose a significant computational burden. This motivates us to find expressions of the posterior distribution from which independent replicates can be obtained directly and cheaply. We focus on a broad class of Bayesian latent Gaussian process (LGP) models that allow for spatially dependent data. First, we derive a new class of distributions that we refer to as the generalized conjugate multivariate (GCM) distribution. The theoretical development of the GCM distribution is similar to that of the CM distribution, with two main differences: namely, (1) the GCM allows for latent Gaussian process assumptions, and (2) the GCM explicitly accounts for hyperparameters through marginalization. The development of the GCM is needed to obtain independent replicates directly from the exact posterior distribution, which has an efficient projection/regression form. Hence, we refer to our method as Exact Posterior Regression (EPR). Illustrative examples are provided, including simulation studies for weakly stationary spatial processes and spatial basis function expansions. An additional analysis of poverty incidence data from the U.S. Census Bureau's American Community Survey (ACS) using a conditional autoregressive model is presented.
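The contrast with MCMC can be illustrated on a toy conjugate model where the exact posterior is available in closed form, so independent replicates are drawn directly (EPR extends this idea to LGP models via the GCM distribution; the toy below is not EPR itself):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: y_i ~ N(theta, s2), theta ~ N(m0, v0).
y = rng.normal(2.0, 1.0, size=100)
s2, m0, v0 = 1.0, 0.0, 10.0

# Closed-form posterior: theta | y ~ N(mn, vn)
vn = 1.0 / (1.0 / v0 + y.size / s2)
mn = vn * (m0 / v0 + y.sum() / s2)

# Independent replicates, drawn directly -- no Markov chain required
theta_reps = rng.normal(mn, np.sqrt(vn), size=5000)
\end{verbatim}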
This paper focuses on the expected difference in borrowers' repayment when there is a change in the lender's credit decisions. Classical estimators overlook confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly large when the causal effects are accounted for correctly.
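To illustrate why ignoring confounding biases such estimates (the abstract does not specify the authors' construction, so the standard inverse-propensity-weighting adjustment below is shown purely for illustration):
\begin{verbatim}
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated data: X confounds both the credit decision T and repayment Y;
# the true effect of T on Y is 2.0.
n = 5000
X = rng.normal(size=(n, 3))
propensity = 1 / (1 + np.exp(-X @ np.array([1.0, 0.5, -0.5])))
T = rng.binomial(1, propensity)
Y = 2.0 * T + X @ np.array([1.0, 1.0, -1.0]) + rng.normal(size=n)

naive = Y[T == 1].mean() - Y[T == 0].mean()        # confounded estimate
e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
ipw = np.mean(T * Y / e) - np.mean((1 - T) * Y / (1 - e))
print(naive, ipw)   # naive is biased upward; ipw is close to 2.0
\end{verbatim}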
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion using a strongly convex regularizer. This allows us to relax both the optimal value and the optimal solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators and relate them to inference in graphical models. We derive two particular instantiations of our framework: a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
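With the negative-entropy regularizer, the smoothed max becomes the familiar log-sum-exp, whose gradient is a softmax; a minimal sketch of this building block (one particular choice of regularizer in the framework) is:
\begin{verbatim}
import numpy as np
from scipy.special import logsumexp, softmax

def smoothed_max(x, gamma=1.0):
    # Entropy-regularized max: gamma * log sum exp(x / gamma).
    # As gamma -> 0 this recovers max(x).
    return gamma * logsumexp(x / gamma)

def smoothed_argmax(x, gamma=1.0):
    # Gradient of smoothed_max: a differentiable relaxation of argmax.
    return softmax(x / gamma)

x = np.array([1.0, 3.0, 2.5])
print(smoothed_max(x, 0.1))      # close to max(x) = 3.0
print(smoothed_argmax(x, 0.1))   # close to the one-hot argmax indicator
\end{verbatim}
Substituting this operator for the hard max inside the Viterbi or DTW recursions yields the smoothed variants mentioned above.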