Multivariate Hawkes processes are temporal point processes extensively applied to model event data with dependence on past occurrences and interaction phenomena. In the generalised nonlinear model, positive and negative interactions between the components of the process are allowed, therefore accounting for so-called excitation and inhibition effects. In the nonparametric setting, learning the temporal dependence structure of Hawkes processes is often a computationally expensive task, all the more with Bayesian estimation methods. In general, the posterior distribution in the nonlinear Hawkes model is non-conjugate and doubly intractable. Moreover, existing Monte-Carlo Markov Chain methods are often slow and not scalable to high-dimensional processes in practice. Recently, efficient algorithms targeting a mean-field variational approximation of the posterior distribution have been proposed. In this work, we unify existing variational Bayes inference approaches under a general framework, that we theoretically analyse under easily verifiable conditions on the prior, the variational class, and the model. We notably apply our theory to a novel spike-and-slab variational class, that can induce sparsity through the connectivity graph parameter of the multivariate Hawkes model. Then, in the context of the popular sigmoid Hawkes model, we leverage existing data augmentation technique and design adaptive and sparsity-inducing mean-field variational methods. In particular, we propose a two-step algorithm based on a thresholding heuristic to select the graph parameter. Through an extensive set of numerical simulations, we demonstrate that our approach enjoys several benefits: it is computationally efficient, can reduce the dimensionality of the problem by selecting the graph parameter, and is able to adapt to the smoothness of the underlying parameter.
Recently, a class of machine learning methods called physics-informed neural networks (PINNs) has been proposed and gained prevalence in solving various scientific computing problems. This approach enables the solution of partial differential equations (PDEs) via embedding physical laws into the loss function. Many inverse problems can be tackled by simply combining the data from real life scenarios with existing PINN algorithms. In this paper, we present a multi-task learning method using uncertainty weighting to improve the training efficiency and accuracy of PINNs for inverse problems in linear elasticity and hyperelasticity. Furthermore, we demonstrate an application of PINNs to a practical inverse problem in structural analysis: prediction of external loads of diverse engineering structures based on limited displacement monitoring points. To this end, we first determine a simplified loading scenario at the offline stage. By setting unknown boundary conditions as learnable parameters, PINNs can predict the external loads with the support of measured data. When it comes to the online stage in real engineering projects, transfer learning is employed to fine-tune the pre-trained model from offline stage. Our results show that, even with noisy gappy data, satisfactory results can still be obtained from the PINN model due to the dual regularization of physics laws and prior knowledge, which exhibits better robustness compared to traditional analysis methods. Our approach is capable of bridging the gap between various structures with geometric scaling and under different loading scenarios, and the convergence of training is also greatly accelerated through not only the layer freezing but also the multi-task weight inheritance from pre-trained models, thus making it possible to be applied as surrogate models in actual engineering projects.
It is crucial to build multiscale modeling for the coupling effects between microstructure and the physical mechanisms in multiphysics problems. In the paper, we develop a coupling formulation of the generalized multiscale finite element method (GMsFEM) to solve coupled thermomechanical problems, and it is referred as the coupling generalized multiscale finite element method (CGMsFEM). The approach consists in defining the coupling multiscale basis functions through local coupling spectral problems in each coarse-grid block, which can be solved by a novel design of two relaxation parameters. Compared to the standard GMsFEM, the proposed strategy can not only accurately capture the multiscale coupling correlation effects of multiphysics problems, but also greatly improve the computational efficiency with fewer multiscale basis functions. In addition, the convergence analysis is also established, and the optimal error estimates are derived, where the upper bound of errors is independent of the magnitude of the relaxation coefficient. Several numerical examples for periodic, random microstructure, and random material coefficients are presented to validate the theoretical analysis. The numerical results show that the CGMsFEM approach shows better robustness and efficiency than uncoupled GMsFEM.
Ensemble models in E-commerce combine predictions from multiple sub-models for ranking and revenue improvement. Industrial ensemble models are typically deep neural networks, following the supervised learning paradigm to infer conversion rate given inputs from sub-models. However, this process has the following two problems. Firstly, the point-wise scoring approach disregards the relationships between items and leads to homogeneous displayed results, while diversified display benefits user experience and revenue. Secondly, the learning paradigm focuses on the ranking metrics and does not directly optimize the revenue. In our work, we propose a new Learning-To-Ensemble (LTE) framework RAEGO, which replaces the ensemble model with a contextual Rank Aggregator (RA) and explores the best weights of sub-models by the Evaluator-Generator Optimization (EGO). To achieve the best online performance, we propose a new rank aggregation algorithm TournamentGreedy as a refinement of classic rank aggregators, which also produces the best average weighted Kendall Tau Distance (KTD) amongst all the considered algorithms with quadratic time complexity. Under the assumption that the best output list should be Pareto Optimal on the KTD metric for sub-models, we show that our RA algorithm has higher efficiency and coverage in exploring the optimal weights. Combined with the idea of Bayesian Optimization and gradient descent, we solve the online contextual Black-Box Optimization task that finds the optimal weights for sub-models given a chosen RA model. RA-EGO has been deployed in our online system and has improved the revenue significantly.
In many areas of science and engineering, computer simulations are widely used as proxies for physical experiments, which can be infeasible or unethical. Such simulations can often be computationally expensive, and an emulator can be trained to efficiently predict the desired response surface. A widely-used emulator is the Gaussian process (GP), which provides a flexible framework for efficient prediction and uncertainty quantification. Standard GPs, however, do not capture structured sparsity on the underlying response surface, which is present in many applications, particularly in the physical sciences. We thus propose a new hierarchical shrinkage GP (HierGP), which incorporates such structure via cumulative shrinkage priors within a GP framework. We show that the HierGP implicitly embeds the well-known principles of effect sparsity, heredity and hierarchy for analysis of experiments, which allows our model to identify structured sparse features from the response surface with limited data. We propose efficient posterior sampling algorithms for model training and prediction, and prove desirable consistency properties for the HierGP. Finally, we demonstrate the improved performance of HierGP over existing models, in a suite of numerical experiments and an application to dynamical system recovery.
Generalized linear mixed models (GLMMs) are commonly used to analyze correlated discrete or continuous response data. In Bayesian GLMMs, the often-used improper priors may yield undesirable improper posterior distributions. Thus, verifying posterior propriety is crucial for valid applications of Bayesian GLMMs with improper priors. Here, we consider the popular improper uniform prior on the regression coefficients and several proper or improper priors, including the widely used gamma and power priors on the variance components of the random effects. We also construct an approximate Jeffreys' prior for objective Bayesian analysis of GLMMs. We derive necessary and sufficient conditions for posterior propriety for Bayesian GLMMs where the response variables have distributions from the exponential family. For the two most widely used GLMMs, namely, the binomial and Poisson GLMMs, we further refine our results by providing easily verifiable conditions compared to the currently available results. Finally, we use examples involving one-way and two-way random effects models to demonstrate the theoretical results derived here.
Many networks have event-driven dynamics (such as communication, social media and criminal networks), where the mean rate of the events occurring at a node in the network changes according to the occurrence of other events in the network. In particular, events associated with a node of the network could increase the rate of events at other nodes, depending on their influence relationship. Thus, it is of interest to use temporal data to uncover the directional, time-dependent, influence structure of a given network while also quantifying uncertainty even when knowledge of a physical network is lacking. Typically, methods for inferring the influence structure in networks require knowledge of a physical network or are only able to infer small network structures. In this paper, we model event-driven dynamics on a network by a multidimensional Hawkes process. We then develop a novel ensemble-based filtering approach for a time-series of count data (i.e., data that provides the number of events per unit time for each node in the network) that not only tracks the influence network structure over time but also approximates the uncertainty via ensemble spread. The method overcomes several deficiencies in existing methods such as existing methods for inferring multidimensional Hawkes processes are too slow to be practical for any network over ~50 nodes, can only deal with timestamp data (i.e. data on just when events occur not the number of events at each node), and that we do not need a physical network to start with. Our method is massively parallelizable, allowing for its use to infer the influence structure of large networks (~10,000 nodes). We demonstrate our method for large networks using both synthetic and real-world email communication data.
Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended and thus logged more frequently than others. This is further perpetuated when recommending a list of items, as the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient and we analyze it. We study its Bayesian and frequentist variants, and overcome the limitation of unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines, is robust, and is also general.
In consumer theory, ranking available objects by means of preference relations yields the most common description of individual choices. However, preference-based models assume that individuals: (1) give their preferences only between pairs of objects; (2) are always able to pick the best preferred object. In many situations, they may be instead choosing out of a set with more than two elements and, because of lack of information and/or incomparability (objects with contradictory characteristics), they may not able to select a single most preferred object. To address these situations, we need a choice-model which allows an individual to express a set-valued choice. Choice functions provide such a mathematical framework. We propose a Gaussian Process model to learn choice functions from choice-data. The proposed model assumes a multiple utility representation of a choice function based on the concept of Pareto rationalization, and derives a strategy to learn both the number and the values of these latent multiple utilities. Simulation experiments demonstrate that the proposed model outperforms the state-of-the-art methods.
This manuscript portrays optimization as a process. In many practical applications the environment is so complex that it is infeasible to lay out a comprehensive theoretical model and use classical algorithmic theory and mathematical optimization. It is necessary as well as beneficial to take a robust approach, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed. This view of optimization as a process has become prominent in varied fields and has led to some spectacular success in modeling and systems that are now part of our daily lives.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.