Steam turbine-based power plants contribute approximately 90% of the total electricity produced in the United States. A steam turbine system typically comprises multiple turbine stages together with a boiler, attemperator, reheater, and other components. Power is produced by passing high-pressure, high-temperature steam through the turbines. When all of these elements enter the model, the total power generation of the plant is highly nonlinear. We adopt a predictive modeling approach that estimates the power generation of these turbines with a Gaussian process (GP) model. Because the turbines are interconnected, we use multivariate Gaussian process (MGP) modeling to predict their power generation, which captures the cross-correlations between turbines. We also perform a sensitivity analysis of the input parameters for each turbine to identify the most influential parameters.
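To make the modeling concrete, here is a minimal sketch of a multivariate GP in the intrinsic coregionalization form, where a shared input kernel is combined with a cross-correlation matrix over outputs. The toy data, length-scale, and correlation matrix B are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Toy data: two interconnected turbines, shared 1-D input (e.g., steam pressure).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=20)
f = np.sin(X)
Y1 = f + 0.05 * rng.standard_normal(20)          # turbine 1 power (scaled)
Y2 = 0.8 * f + 0.05 * rng.standard_normal(20)    # turbine 2, correlated with 1

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

# Intrinsic coregionalization model: K((x,i),(x',j)) = B[i,j] * k(x,x').
B = np.array([[1.0, 0.8], [0.8, 1.0]])           # assumed cross-correlation matrix
K = np.kron(B, rbf(X, X)) + 1e-4 * np.eye(2 * len(X))  # joint train covariance

y = np.concatenate([Y1, Y2])
Xs = np.linspace(0, 10, 100)
Ks = np.kron(B, rbf(Xs, X))                      # test/train cross-covariance
mean = Ks @ np.linalg.solve(K, y)                # joint posterior mean
mu1, mu2 = mean[:100], mean[100:]                # predictions for each turbine
```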
Sequences of events, including infectious disease outbreaks, social network activities, and crimes, are ubiquitous, and the data on such events carry essential information about the underlying diffusion processes between communities (e.g., regions, online user groups). Modeling diffusion processes and predicting future events are crucial in many applications, including epidemic control, viral marketing, and predictive policing. Hawkes processes offer a central tool for modeling diffusion processes, in which the influence of past events is described by the triggering kernel. However, the triggering kernel parameters, which govern how each community is influenced by past events, are assumed to be static over time. In the real world, diffusion processes depend not only on influences from the past but also on the current (time-evolving) states of the communities, e.g., people's awareness of a disease and people's current interests. In this paper, we propose a novel Hawkes process model that captures the underlying dynamics of community states behind the diffusion processes and predicts the occurrences of events based on these dynamics. Specifically, we model the latent dynamic function that encodes these hidden dynamics by a mixture of neural networks. We then design the triggering kernel using the latent dynamic function and its integral. The proposed method, termed DHP (Dynamic Hawkes Processes), offers a flexible way to learn complex representations of the time-evolving states of communities, while at the same time permitting exact likelihood computation, which makes parameter learning tractable. Extensive experiments on four real-world event datasets show that DHP outperforms five widely adopted methods for event prediction.
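For intuition about why a tractable likelihood matters, below is a minimal sketch of the classical Hawkes log-likelihood with an exponential triggering kernel, whose integral is available in closed form; DHP generalizes this by building the kernel from a learned latent dynamic function and its integral. The toy event times and parameters are assumptions:

```python
import numpy as np

def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process on [0, T] with
    exponential triggering kernel phi(t) = alpha * beta * exp(-beta * t).
    Tractability hinges on the kernel's integral being available in
    closed form -- the property DHP preserves by construction."""
    ll = 0.0
    for i, t in enumerate(times):
        # Intensity at event i: baseline plus sum of past triggers.
        lam = mu + np.sum(alpha * beta * np.exp(-beta * (t - times[:i])))
        ll += np.log(lam)
    # Compensator: integral of the intensity over [0, T].
    ll -= mu * T + np.sum(alpha * (1.0 - np.exp(-beta * (T - times))))
    return ll

events = np.array([0.5, 1.2, 1.9, 3.4])
print(hawkes_loglik(events, T=5.0, mu=0.2, alpha=0.5, beta=1.0))
```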
Power laws have been found to describe a wide variety of natural (physical, biological, astronomic, meteorological, geological) and man-made (social, financial, computational) phenomena over a wide range of magnitudes, although their underlying mechanisms are not always clear. In statistics, a power law distribution often fits data exceptionally well where the normal (Gaussian) distribution fails. Nevertheless, predicting power law phenomena is notoriously difficult because of some of their idiosyncratic properties, such as the lack of a well-defined average value and potentially unbounded variance. Taylor's power law (TPL) was first discovered to characterize the spatial and/or temporal distribution of biological populations and has recently been extended to describe the spatiotemporal heterogeneities (distributions) of human microbiomes and of other natural and artificial systems, such as fitness distributions in computational (artificial) intelligence. The power law with exponential cutoff (PLEC) is a variant of the power-law function that ultimately tapers off its growth, and it can be particularly useful for certain predictive problems such as biodiversity estimation and turning-point prediction for COVID-19 infections/fatalities. Here, we propose a coupling (integration) of TPL and PLEC to improve the prediction quality for certain power-law phenomena. The coupling takes advantage of variance prediction using TPL and asymptote estimation using PLEC, and it delivers a confidence interval for the asymptote. We demonstrate the integrated approach by estimating potential (dark) biodiversity and the turning point of COVID-19 fatality. We expect this integrative approach to have wide applications, given the dual relationship between power-law and normal statistical distributions.
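A minimal sketch of the two ingredients, assuming NumPy/SciPy and toy data: Taylor's power law V = a M^b fitted on the log-log scale, and a PLEC curve f(x) = c x^w exp(dx) fitted by nonlinear least squares, with its turning point at x* = -w/d. The coupling itself (TPL-based confidence intervals for the PLEC asymptote) is not reproduced here:

```python
import numpy as np
from scipy.optimize import curve_fit

# Taylor's power law: variance V = a * M^b, fit on the log-log scale.
M = np.array([1.0, 2.0, 5.0, 10.0, 20.0])          # mean abundances (toy data)
V = np.array([1.1, 3.9, 21.0, 76.0, 310.0])        # matching variances
b, log_a = np.polyfit(np.log(M), np.log(V), 1)
a = np.exp(log_a)

# PLEC: power law with exponential cutoff, f(x) = c * x**w * exp(d * x), d < 0.
def plec(x, c, w, d):
    return c * x ** w * np.exp(d * x)

x = np.arange(1, 30, dtype=float)
y = plec(x, 2.0, 0.9, -0.08) * (1 + 0.05 * np.random.default_rng(1).standard_normal(x.size))
(c, w, d), cov = curve_fit(plec, x, y, p0=(1.0, 1.0, -0.1))

# Turning point of a PLEC curve: f'(x) = 0  =>  x* = -w / d.
print("TPL exponent b =", b, " PLEC turning point x* =", -w / d)
```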
Under-representation of certain populations, based on gender, race/ethnicity, and age, in data collection for predictive modeling may yield less-accurate predictions for the under-represented groups. Recently, this issue of fairness in predictions has attracted significant attention, as data-driven models are increasingly utilized to perform crucial decision-making tasks. Methods to achieve fairness in the machine learning literature typically build a single prediction model subject to some fairness criteria, in a manner that encourages fair prediction performance for all groups. These approaches have two major limitations: i) fairness is often achieved by compromising accuracy for some groups; ii) the underlying relationship between dependent and independent variables may not be the same across groups. We propose a Joint Fairness Model (JFM) approach for binary outcomes that estimates group-specific classifiers using a joint modeling objective function that incorporates fairness criteria for prediction. We introduce an Accelerated Smoothing Proximal Gradient Algorithm to solve the convex objective function and demonstrate the properties of the proposed JFM estimates. Next, we present the key asymptotic properties of the JFM parameter estimates. We examine the efficacy of the JFM approach in achieving both strong prediction performance and parity across groups, in comparison with the Single Fairness Model, the group-separate model, and the group-ignorant model, through extensive simulations. Finally, we demonstrate the utility of the JFM method in the motivating example: obtaining fair risk predictions for under-represented older patients diagnosed with coronavirus disease 2019 (COVID-19).
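As a rough illustration of the joint modeling idea, the sketch below writes a per-group logistic loss plus a fusion penalty that couples the group-specific coefficients. The paper's actual fairness criteria and its Accelerated Smoothing Proximal Gradient solver are not reproduced, and all data and penalty weights are illustrative:

```python
import numpy as np

def joint_fairness_loss(betas, Xs, ys, lam_fuse=1.0, lam_l2=0.1):
    """Joint objective over G groups: per-group logistic loss plus a
    fusion penalty pulling the group-specific coefficients together
    (a stand-in for the JFM criterion, not its exact form)."""
    loss = 0.0
    for beta, X, y in zip(betas, Xs, ys):
        z = X @ beta
        loss += np.mean(np.log1p(np.exp(-y * z)))       # labels y in {-1, +1}
        loss += lam_l2 * np.sum(beta ** 2)
    for g in range(len(betas)):
        for h in range(g + 1, len(betas)):
            # Coupling term: group models may differ, but not arbitrarily.
            loss += lam_fuse * np.sum((betas[g] - betas[h]) ** 2)
    return loss

rng = np.random.default_rng(0)
Xs = [rng.standard_normal((50, 3)), rng.standard_normal((20, 3))]  # majority / minority
ys = [np.sign(X @ np.array([1.0, -1.0, 0.5]) + 0.3 * rng.standard_normal(len(X)))
      for X in Xs]
betas = [np.zeros(3), np.zeros(3)]
print(joint_fairness_loss(betas, Xs, ys))
```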
A multivariate score-driven model is developed to extract signals from noisy vector processes. By assuming that the conditional location vector of a multivariate Student's \emph{t} distribution changes over time, we construct a robust filter that is able to overcome several issues that naturally arise when modeling heavy-tailed phenomena and, more generally, vectors of dependent non-Gaussian time series. We derive conditions for stationarity and invertibility and estimate the unknown parameters by maximum likelihood (ML). Strong consistency and asymptotic normality of the estimator are proved, and the finite-sample properties are illustrated by a Monte Carlo study. From a computational point of view, analytical formulae are derived that allow the development of estimation procedures based on the Fisher scoring method. The theory is supported by a novel empirical illustration that shows how the model can be effectively applied to estimate consumer prices from home scanner data.
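A univariate sketch of the score-driven location recursion under a Student's t density, with illustrative parameter values; the paper's model is multivariate, but the univariate case already shows the robustness mechanism, since the scaled score is bounded in the prediction error:

```python
import numpy as np

def t_score_filter(y, omega=0.0, beta=0.95, alpha=0.2, nu=3.0, sigma2=1.0):
    """Score-driven filter for the location of a Student-t series.
    The scaled score is bounded in the prediction error, so outliers
    are automatically downweighted."""
    mu = np.empty(len(y) + 1)
    mu[0] = y[0]
    for t in range(len(y)):
        e = y[t] - mu[t]
        # Scaled score of the Student-t log-density w.r.t. the location.
        s = (nu + 1.0) * e / (nu + e ** 2 / sigma2)
        mu[t + 1] = omega + beta * mu[t] + alpha * s
    return mu[:-1]   # one-step-ahead filtered locations

rng = np.random.default_rng(0)
y = np.cumsum(0.1 * rng.standard_normal(200)) + rng.standard_t(3, size=200)
filtered = t_score_filter(y)
```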
In this paper we propose a method for the approximation of high-dimensional functions over finite intervals with respect to complete orthonormal systems of polynomials. An important tool for this is the classical multivariate analysis of variance (ANOVA) decomposition. For functions with a low-dimensional structure, i.e., a low superposition dimension, we achieve a reconstruction from scattered data while simultaneously uncovering relationships between different variables.
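A minimal sketch of the idea under illustrative assumptions: fit scattered data in a truncated tensor-product Legendre basis that retains only ANOVA terms involving at most two variables (superposition dimension two), so that large coefficients reveal which variable couplings matter:

```python
import numpy as np
from numpy.polynomial import legendre
from itertools import product

# Scattered data from a function on [-1,1]^5 with low superposition
# dimension: only 1- and 2-variable ANOVA terms are present.
rng = np.random.default_rng(0)
d, n, deg = 5, 400, 3
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

# Multi-indices with at most 2 active variables (superposition dimension 2).
idx = [k for k in product(range(deg + 1), repeat=d)
       if sum(ki > 0 for ki in k) <= 2]

def basis(X, k):
    # Tensor product of Legendre polynomials (unnormalized for simplicity).
    cols = [legendre.legval(X[:, j], np.eye(deg + 1)[kj])
            for j, kj in enumerate(k)]
    return np.prod(cols, axis=0)

A = np.column_stack([basis(X, k) for k in idx])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# Large coefficients identify the active ANOVA terms (variable couplings).
```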
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to efficiently capture precise long-range dependency coupling between output and input. Recent studies have shown the potential of the Transformer to increase prediction capacity. However, several severe issues prevent the Transformer from being directly applicable to LSTF, such as quadratic time complexity, high memory usage, and an inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient Transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ time complexity and memory usage and has comparable performance on sequence dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving the cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder that, while conceptually simple, predicts long time-series sequences in one forward operation rather than step by step, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
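To illustrate item (i), here is a sketch of the query-selection step behind ProbSparse self-attention: queries are ranked by the sparsity measurement $M(q, K) = \max_j(q k_j^\top/\sqrt{d}) - \text{mean}_j(q k_j^\top/\sqrt{d})$, and only the top-$u$ receive full attention. In Informer this measurement is itself estimated from sampled keys; the version below computes it exactly on toy data:

```python
import numpy as np

def probsparse_select(Q, K, u):
    """Rank queries by the sparsity measurement M(q, K) and return the
    indices of the top-u 'active' queries; a query with a peaked score
    distribution (large max-minus-mean gap) dominates attention."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (L_Q, L_K) dot-product scores
    M = scores.max(axis=1) - scores.mean(axis=1)  # large M => peaked attention
    return np.argsort(M)[-u:]

L, d = 96, 64
rng = np.random.default_rng(0)
Q, K = rng.standard_normal((L, d)), rng.standard_normal((L, d))
u = int(5 * np.log(L))                            # u = c * ln(L) active queries
active = probsparse_select(Q, K, u)
# The remaining queries take the mean of the values instead of full
# attention, giving O(L log L) time and memory instead of O(L^2).
```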
Over the past decade, knowledge graphs have become popular for capturing structured domain knowledge. Relational learning models enable the prediction of missing links inside knowledge graphs. More specifically, latent distance approaches model the relationships among entities via a distance between latent representations. Translating embedding models (e.g., TransE) are among the most popular latent distance approaches, using one distance function to learn multiple relation patterns. However, they are not capable of capturing symmetric relations, and they force relations with reflexive patterns to become symmetric and transitive. To improve distance-based embeddings, we propose multi-distance embeddings (MDE). Our solution is based on the idea that, by learning independent embedding vectors for each entity and relation, one can aggregate contrasting distance functions. Building on MDE, we also develop supplementary distances that resolve the above-mentioned limitations of TransE. We further propose an extended loss function for distance-based embeddings and show that MDE and TransE are fully expressive using this loss function. Furthermore, we obtain a bound on the size of their embeddings for full expressivity. Our empirical results show that MDE significantly improves on translating embeddings and outperforms several state-of-the-art embedding models on benchmark datasets.
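A small NumPy sketch of the aggregation idea: the TransE score alongside an MDE-style combination of contrasting distance terms, each term with its own independent embedding vectors. The specific terms and weights below are illustrative stand-ins, not the paper's exact formulation:

```python
import numpy as np

def transe_score(h, r, t, p=2):
    # TransE: plausibility is the (negative) distance || h + r - t ||_p.
    return np.linalg.norm(h + r - t, ord=p)

def mde_score(h, r, t, w=(1.0, 1.0, 1.0), p=2):
    """MDE-style aggregation: each entity/relation keeps an independent
    embedding vector per distance term, so contrasting distances can be
    combined.  Terms and weights here are illustrative."""
    s1 = np.linalg.norm(h[0] + r[0] - t[0], ord=p)   # translational term
    s2 = np.linalg.norm(t[1] + r[1] - h[1], ord=p)   # inverse direction
    s3 = np.linalg.norm(h[2] + t[2] - r[2], ord=p)   # symmetric in h and t
    return w[0] * s1 + w[1] * s2 + w[2] * s3

k = 50
rng = np.random.default_rng(0)
h, r, t = (rng.standard_normal((3, k)) for _ in range(3))
print(mde_score(h, r, t))
```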
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
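A minimal sketch using the torchdiffeq package released alongside this work: the hidden-state derivative is an ordinary neural network, the forward pass is a black-box ODE solve, and gradients flow back through the solver (the adjoint variant, odeint_adjoint, gives the constant-memory backward pass):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # ODE solvers released with this paper

class ODEFunc(nn.Module):
    """Parameterizes the derivative of the hidden state: dh/dt = f(h, t)."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, h):
        return self.net(h)

func = ODEFunc()
h0 = torch.randn(16, 4)          # batch of initial hidden states
t = torch.tensor([0.0, 1.0])     # integrate from "depth" 0 to 1
h1 = odeint(func, h0, t)[-1]     # black-box ODE solve
loss = h1.pow(2).mean()          # any downstream loss
loss.backward()                  # gradients flow through the solver;
                                 # odeint_adjoint gives O(1) memory
```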
Multivariate time series forecasting has been extensively studied over the years, with ubiquitous applications in areas such as finance, traffic, and the environment. Still, concerns have been raised that traditional methods are incapable of modeling the complex patterns or dependencies present in real-world data. To address such concerns, various deep learning models, mainly Recurrent Neural Network (RNN) based methods, have been proposed. Nevertheless, capturing extremely long-term patterns while effectively incorporating information from other variables remains a challenge for time-series forecasting. Furthermore, a lack of explainability remains a serious drawback of deep neural network models. Inspired by the Memory Network proposed for solving the question-answering task, we propose a deep learning based model named Memory Time-series network (MTNet) for time series forecasting. MTNet consists of a large memory component, three separate encoders, and an autoregressive component that are trained jointly. Additionally, the designed attention mechanism makes MTNet highly interpretable: we can easily tell which part of the historical data is referenced the most.
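A schematic of the interpretability claim, under illustrative assumptions rather than MTNet's exact architecture: attention weights over encoded memory blocks directly indicate which part of the historical data the forecast referenced most:

```python
import numpy as np

def memory_attention(query, memory_blocks):
    """Attend over encoded historical blocks.  The weight vector itself
    is the interpretability handle: it tells which segment of history
    the forecast referenced most."""
    scores = memory_blocks @ query                 # (m,) similarity per block
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over blocks
    context = weights @ memory_blocks              # weighted memory readout
    return context, weights

rng = np.random.default_rng(0)
memory = rng.standard_normal((6, 8))   # 6 encoded long-term history blocks
q = rng.standard_normal(8)             # encoding of the recent window
context, w = memory_attention(q, memory)
print("most-referenced history block:", w.argmax())
```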
Person re-identification is widely used in forensics and in security and surveillance systems, but it remains a challenging task in real-life scenarios. Hence, in this work, a new feature descriptor model is proposed using a multilayer framework of Gaussian distributions over pixel features, which include color moments, color space values, and Schmid filter responses. An image of a person usually consists of distinct body regions, typically with distinguishable clothing as well as local colors and texture patterns. Thus, the image is evaluated locally by dividing it into overlapping regions. Each region is further fragmented into a set of local Gaussians on small patches. A global Gaussian then encodes these local Gaussians for each region, creating a multilevel structure. Hence, the global appearance of a person is described by the local information within it, which is often ignored. We also analyze the effectiveness of earlier metric learning methods with this descriptor. The performance of the descriptor is evaluated on four challenging publicly available datasets, and the highest accuracies achieved on these datasets are compared with similar state-of-the-art methods, demonstrating its superior performance.
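A simplified sketch of the local-to-global construction, with random arrays standing in for pixel features: each patch is summarized by a Gaussian, and a region is then summarized by a Gaussian over the flattened patch-level Gaussians. The flattening below is a plain (mean, covariance) embedding, a stand-in for the SPD-matrix embedding such descriptors typically use:

```python
import numpy as np

def gaussian_embed(feats):
    """Summarize a set of feature vectors by a Gaussian and flatten
    (mean, upper-triangular covariance) into a single vector."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return np.concatenate([mu, cov[np.triu_indices(feats.shape[1])]])

rng = np.random.default_rng(0)
region = rng.standard_normal((16, 10, 5))   # 16 patches x 10 pixels x 5 features
local = np.stack([gaussian_embed(p) for p in region])   # patch-level Gaussians
global_desc = gaussian_embed(local)                     # region-level Gaussian
```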