The Maximum Entropy Spectral Analysis (MESA) method, developed by Burg, provides a powerful tool for the spectral estimation of a time series. The method relies on Jaynes' maximum entropy principle and provides the means of inferring the spectrum of a stochastic process in terms of the coefficients of some autoregressive process AR($p$) of order $p$. A closed-form recursive solution provides an estimate of the autoregressive coefficients as well as of the order $p$ of the process. We provide a ready-to-use implementation of the algorithm in the form of a Python package, \texttt{memspectrum}. We characterize our implementation by performing a power spectral density (PSD) analysis on synthetic data (with known PSD) and we compare different criteria for stopping the recursion. Furthermore, we compare the performance of our code with the ubiquitous Welch algorithm, using synthetic data generated from the spectrum released by the LIGO-Virgo collaboration. We find that, compared to Welch's method, Burg's method provides a PSD estimate with systematically lower variance and bias. This is particularly manifest in the case of a small number of data points, making Burg's method best suited to this regime.
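As a concrete illustration, the following is a minimal NumPy sketch of Burg's recursion for a fixed AR order $p$; the order-selection criteria discussed above, and \texttt{memspectrum}'s actual interface, are not reproduced here.

```python
import numpy as np

def burg_psd(x, p, dt=1.0, n_freq=4096):
    """Burg's recursion for an AR(p) spectral estimate (minimal sketch;
    the full method additionally selects the order p automatically)."""
    f = np.asarray(x, dtype=float).copy()   # forward prediction errors
    b = f.copy()                            # backward prediction errors
    a = np.array([1.0])                     # AR polynomial, a[0] = 1
    P = np.mean(f**2)                       # prediction-error power
    for _ in range(p):
        fk, bk = f[1:], b[:-1]
        # reflection coefficient minimizing forward + backward error power
        rc = -2.0 * np.dot(fk, bk) / (np.dot(fk, fk) + np.dot(bk, bk))
        a = np.append(a, 0.0) + rc * np.append(a, 0.0)[::-1]  # Levinson update
        P *= 1.0 - rc**2
        f, b = fk + rc * bk, bk + rc * fk
    freqs = np.fft.rfftfreq(n_freq, d=dt)
    psd = P * dt / np.abs(np.fft.rfft(a, n_freq))**2
    return freqs, psd

# toy check on an AR(2) process with a known spectral peak
rng = np.random.default_rng(0)
x = np.zeros(4096)
for t in range(2, x.size):
    x[t] = 1.3 * x[t - 1] - 0.75 * x[t - 2] + rng.standard_normal()
freqs, psd = burg_psd(x, p=2)
```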
Non-volatile random access memory (NVRAM) offers byte-addressable persistence at speeds comparable to DRAM. However, with caches remaining volatile, automatic cache evictions can reorder updates to memory, potentially leaving persistent memory in an inconsistent state upon a system crash. Flush and fence instructions can be used to force ordering among updates, but are expensive. This has motivated significant work studying how to write correct and efficient persistent programs for NVRAM. In this paper, we present FliT, a C++ library that facilitates writing efficient persistent code. Using the library's default mode makes any linearizable data structure durable with minimal changes to the code. FliT avoids many redundant flush instructions by using a novel algorithm to track dirty cache lines. The FliT library also allows for extra optimizations, but achieves good performance even in its default setting. To describe the FliT library's capabilities and guarantees, we define a persistent programming interface, called the P-V Interface, which FliT implements. The P-V Interface captures the expected behavior of code in which some instructions' effects are persisted and some are not. We show that the interface captures the desired semantics of many practical algorithms in the literature. We apply the FliT library to four different persistent data structures, and show that across several workloads, persistence implementations, and data structure sizes, the FliT library always improves operation throughput, by at least $2.1\times$ over a naive implementation in all but one workload.
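FliT itself is a C++ library; to keep all examples in one language, here is a hypothetical Python sketch of one plausible reading of the dirty-tracking idea described above. The names `FlitWord`, `p_store`, and `p_load` are illustrative, not the library's API, and the hardware flush is only simulated.

```python
import threading

class FlitWord:
    """Illustrative sketch of counter-based flush tracking for one word.
    Not FliT's actual C++ implementation: the real library operates on
    cache lines with hardware flush/fence instructions."""
    def __init__(self, value=0):
        self.value = value
        self.counter = 0          # > 0  =>  a store may not be persisted yet
        self.lock = threading.Lock()

    def _flush(self):
        pass  # stands in for clwb/clflushopt on the enclosing cache line

    def p_store(self, v):
        with self.lock:
            self.counter += 1     # announce a pending (unflushed) store
            self.value = v
        self._flush()
        with self.lock:
            self.counter -= 1     # flush done: loads may now skip flushing

    def p_load(self):
        with self.lock:
            v, dirty = self.value, self.counter > 0
        if dirty:
            self._flush()         # flush only when a store might be pending
        return v
```

The point of the counter is that the common case, a load of an already-persisted word, pays no flush at all, which is where the redundant-flush savings come from.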
We continue the investigation of the spectrum of operators arising from the discretization of partial differential equations. In this paper we consider a three-field formulation recently introduced for the finite element least-squares approximation of linear elasticity. We discuss in particular the distribution of the discrete eigenvalues in the complex plane and how they approximate the positive real eigenvalues of the continuous problem. We also consider the dependence of the spectrum on the Lam\'e parameters and its behavior when approaching the incompressible limit.
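The three-field elasticity operator is not reproduced here; as a generic illustration of the kind of computation involved, the sketch below compares the discrete generalized eigenvalues of a P1 discretization of the 1D Laplace eigenproblem with the continuous eigenvalues $(k\pi)^2$. For a nonsymmetric least-squares discretization the analogous call is \texttt{scipy.linalg.eig}, whose output is then examined in the complex plane.

```python
import numpy as np
from scipy.linalg import eigh

# P1 finite elements for -u'' = lambda * u on (0, 1), u(0) = u(1) = 0;
# the continuous eigenvalues are (k*pi)^2.
n = 64                                   # number of elements
h = 1.0 / n
N = n - 1                                # interior nodes
# tridiagonal stiffness K and mass M on the interior nodes
K = (np.diag(2.0 / h * np.ones(N))
     + np.diag(-1.0 / h * np.ones(N - 1), 1)
     + np.diag(-1.0 / h * np.ones(N - 1), -1))
M = (np.diag(4.0 * h / 6.0 * np.ones(N))
     + np.diag(h / 6.0 * np.ones(N - 1), 1)
     + np.diag(h / 6.0 * np.ones(N - 1), -1))

lam = eigh(K, M, eigvals_only=True)      # discrete spectrum, K x = lam M x
exact = (np.pi * np.arange(1, n)) ** 2   # continuous spectrum
print(lam[:5])                           # lowest discrete eigenvalues
print(exact[:5])                         # (k*pi)^2 for comparison
```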
Bootstrap aggregation, known as bagging, is one of the most popular ensemble methods used in machine learning (ML). An ensemble method is an ML method that combines multiple hypotheses to form a single hypothesis used for prediction. A bagging algorithm combines multiple classifiers trained on different sub-samples of the same data set to build one large classifier. Banks, in their retail banking activities, nowadays use the power of ML algorithms, including decision trees and random forests, to optimize their processes. However, banks have to comply with regulators and governance rules and, hence, delivering effective ML solutions is a challenging task. The process starts with the bank's validation and governance department, continues with the deployment of the solution in a production environment, and ends with the external validation by the national financial regulator. Each proposed ML model has to be validated, and every algorithm-based decision must be justified by clear rules. In this context, we propose XtracTree, an algorithm capable of efficiently converting an ML bagging classifier, such as a random forest, into simple "if-then" rules satisfying the requirements of model validation. We use a public loan data set from Kaggle to illustrate the usefulness of our approach. Our experiments demonstrate that, using XtracTree, one can convert an ML model into a rule-based algorithm, leading to easier model validation by national financial regulators and by the bank's validation department. The proposed approach allowed our banking institution to reduce the delivery time of our AI solutions to the end-user by up to 50%.
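The following is a minimal sketch of the rule-extraction idea, not the authors' XtracTree code: it walks a fitted scikit-learn decision tree and emits one "if-then" rule per leaf. For a random forest, the same traversal would be applied to each estimator.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def tree_to_rules(tree, feature_names):
    """Walk a fitted sklearn tree and emit one 'if-then' rule per leaf."""
    t = tree.tree_
    rules = []

    def recurse(node, conds):
        if t.children_left[node] == -1:          # leaf node
            pred = t.value[node][0].argmax()     # majority class at the leaf
            rules.append("IF " + " AND ".join(conds or ["TRUE"])
                         + f" THEN class={pred}")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node], conds + [f"{name} <= {thr:.3f}"])
        recurse(t.children_right[node], conds + [f"{name} > {thr:.3f}"])

    recurse(0, [])
    return rules

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for rule in tree_to_rules(clf, [f"x{i}" for i in range(4)]):
    print(rule)
```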
The extent to which a matching engine can cloud the modelling of underlying order submission and management processes in a financial market remains an open concern for market models. Here we consider a 10-variate Hawkes process with simple rules to simulate common order types which are submitted to a matching engine. Hawkes processes can be used to model the time and order of events, and how these events relate to each other; however, they leave freedom in the implementation mechanics relating to the prices and volumes of injected orders. This allows us to consider a reference Hawkes model and two additional models whose rules change the behaviour of limit orders. The resulting trade and quote data from the simulations are then calibrated and compared with the original order-generating process to determine the extent to which implementation rules can distort model parameters. Evidence from validation and hypothesis tests suggests that the true model specification can be significantly distorted by market mechanics, and that practical considerations not directly due to model specification can be important for model identification within an inherently asynchronous trading environment.
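A minimal sketch of the order-generating side, assuming exponential kernels and Ogata's thinning algorithm; the paper's full 10-variate specification and matching-engine rules are not reproduced.

```python
import numpy as np

def simulate_hawkes_exp(mu, alpha, beta, T, seed=0):
    """Multivariate Hawkes process via Ogata thinning, exponential kernels:
    lambda_i(t) = mu[i] + sum_{t_k < t} alpha[i, m_k] * exp(-beta*(t - t_k))."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    t, excite = 0.0, np.zeros(d)      # excite[i]: current excitation of lambda_i
    times, marks = [], []
    while True:
        # with exponential kernels the intensity only decays between events,
        # so the current total intensity bounds it until the next jump
        lam_bar = mu.sum() + excite.sum()
        w = rng.exponential(1.0 / lam_bar)
        t += w
        if t > T:
            return np.array(times), np.array(marks)
        excite *= np.exp(-beta * w)                # kernels decay over the gap
        lam = mu + excite
        if rng.uniform() * lam_bar <= lam.sum():   # accept candidate event
            m = rng.choice(d, p=lam / lam.sum())   # which component fired
            excite += alpha[:, m]                  # event of type m excites all
            times.append(t); marks.append(m)

# e.g. a 2-variate toy version of the 10-variate order-flow model
mu = np.array([0.5, 0.5])
alpha = np.array([[0.3, 0.1], [0.1, 0.3]])
times, marks = simulate_hawkes_exp(mu, alpha, beta=2.0, T=100.0)
```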
In this paper we consider the linear regression model $Y =S X+\varepsilon $ with functional regressors and responses. We develop new inference tools to quantify deviations of the true slope $S$ from a hypothesized operator $S_0$ with respect to the Hilbert--Schmidt norm $\| S- S_0\|^2$, as well as the prediction error $\mathbb{E} \| S X - S_0 X \|^2$. Our analysis is applicable to functional time series and based on asymptotically pivotal statistics. This makes it particularly user-friendly, because it avoids the choice of tuning parameters inherent in long-run variance estimation or the bootstrap for dependent data. We also discuss two-sample problems as well as change point detection. Finite sample properties are investigated by means of a simulation study.\\ Mathematically our approach is based on a sequential version of the popular spectral cut-off estimator $\hat S_N$ of $S$. It is well known that the $L^2$-minimax rates in the functional regression model, both in estimation and prediction, are substantially slower than $1/\sqrt{N}$ (where $N$ denotes the sample size) and that standard estimators for $S$ do not converge weakly to non-degenerate limits. However, we demonstrate that simple plug-in estimators - such as $\| \hat S_N - S_0 \|^2$ for $\| S - S_0 \|^2$ - are $\sqrt{N}$-consistent and that their sequential versions satisfy weak invariance principles. These results are based on the smoothing effect of $L^2$-norms and are established by a new proof technique, the {\it smoothness shift}, which has potential applications in other statistical inverse problems.
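A minimal sketch of the basic spectral cut-off estimator on which the sequential statistics are built, for functions discretized on a common grid; the toy operator and data below are illustrative.

```python
import numpy as np

def spectral_cutoff(X, Y, k):
    """Spectral cut-off estimate of S in Y = S X + eps, with functions
    discretized on a common grid (rows = samples, columns = grid points).
    Sketch only: the sequential version and pivotal statistics of the
    paper are built on top of this basic estimator."""
    N = X.shape[0]
    Cxx = X.T @ X / N               # empirical covariance operator of X
    Cyx = Y.T @ X / N               # empirical cross-covariance
    lam, V = np.linalg.eigh(Cxx)
    lam, V = lam[::-1], V[:, ::-1]  # sort eigenpairs, largest first
    Vk = V[:, :k]
    # invert Cxx only on its top-k eigenspace (regularization by truncation)
    return Cyx @ Vk @ np.diag(1.0 / lam[:k]) @ Vk.T

# toy check: recover a smoothing operator S from noisy functional data
rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 50)
S_true = np.exp(-(grid[:, None] - grid[None, :])**2 / 0.02) / 10
X = rng.standard_normal((400, 50)) @ np.diag(1 / (1 + np.arange(50)))
Y = X @ S_true.T + 0.01 * rng.standard_normal((400, 50))
S_hat = spectral_cutoff(X, Y, k=10)
err = np.linalg.norm(S_hat - S_true)   # plug-in Hilbert-Schmidt distance
```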
Random graph models are used to describe the complex structure of real-world networks in diverse fields of knowledge. Studying their behavior and fitting properties is still a critical challenge that, in general, requires model-specific techniques. An important line of research is to develop generic methods able to fit and select the best model among a collection. Approaches based on the spectral density (i.e., the distribution of the eigenvalues of the graph adjacency matrix) are appealing for that purpose: they apply to different random graph models and can benefit from the theoretical background of random matrix theory. This work investigates the convergence properties of model fitting procedures based on the graph spectral density and the corresponding cumulative distribution function. We also review results on the convergence of the spectral density for the most widely used random graph models. Moreover, we explore through simulations the limits of these graph spectral density convergence results, particularly in the case of the block model, where only partial results have been established.
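A minimal sketch of the central object, the empirical spectral distribution of a scaled adjacency matrix, here for an Erd\H{o}s--R\'enyi graph whose limit is Wigner's semicircle law; the comparison step at the end is schematic.

```python
import numpy as np

def empirical_spectral_cdf(A, grid):
    """Empirical cumulative spectral distribution of a scaled adjacency
    matrix: the object whose convergence underpins the fitting procedures."""
    n = A.shape[0]
    eig = np.sort(np.linalg.eigvalsh(A / np.sqrt(n)))
    return np.searchsorted(eig, grid, side="right") / n

rng = np.random.default_rng(2)
n, p = 500, 0.1
# Erdos-Renyi G(n, p): symmetric 0/1 adjacency, no self-loops
U = rng.uniform(size=(n, n))
A = np.triu(U < p, 1).astype(float)
A = A + A.T
# center and scale so the limit is Wigner's semicircle on [-2s, 2s]
s = np.sqrt(p * (1 - p))
grid = np.linspace(-3 * s, 3 * s, 200)
F_hat = empirical_spectral_cdf(A - p * (1 - np.eye(n)), grid)
# a fitting procedure then compares F_hat against each candidate model's
# (theoretical or simulated) spectral CDF, e.g. via an L1/L2 distance
```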
The purpose of this study is to introduce software technologies, models, and artificial intelligence algorithms to improve the weaknesses of the Cognitive Behavior Therapy (CBT) method in psychotherapy. To this end, we implement psychometric experiments in which hidden human variables are inferred from test answers. In this report, we describe the various models of Item Response Theory and measure the hidden ability components and the complementary parameters describing the reality of the individual's situation. Psychometric modeling, the selection of an appropriate model, and the estimation of its parameters are introduced and implemented using libraries developed in the R language. Due to its high flexibility, machine learning has been applied to the multivariate Rasch mixture model for data modeling. BIC and CML were used to determine the number of hidden classes of the model and to estimate its parameters, respectively, in order to obtain measurement invariance. The sensitivity of items to hidden attributes varies between groups (differential item functioning, DIF), so methods for detecting it are introduced. The simulation is based on the Verbal Aggression dataset. We also analyze and compile a reference model based on the discovered software engineering patterns. Another achievement of this study is a solution for explaining the reengineering problems of the mind by preparing an identity card for clients through an ontology. Finally, we point out how the developed knowledge can be applied during the treatment process, in the form of systems thinking and recommended software engineering patterns.
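The study's implementation is in R; to keep the examples in one language, here is a minimal Python sketch of the dichotomous Rasch model underlying the Rasch mixture models above, with a simplified likelihood (abilities fixed at their simulated values rather than conditioned out by CML) and the BIC formula used for class selection.

```python
import numpy as np
from scipy.optimize import minimize

def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(X=1) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

rng = np.random.default_rng(3)
n_persons, n_items = 200, 10
theta = rng.normal(0.0, 1.0, n_persons)          # latent abilities
b_true = np.linspace(-2, 2, n_items)             # item difficulties
X = (rng.uniform(size=(n_persons, n_items))
     < rasch_prob(theta, b_true)).astype(int)    # simulated 0/1 responses

def neg_loglik(b):
    # simplification: abilities are fixed at their simulated values;
    # the study uses CML, which conditions the abilities out instead
    p = rasch_prob(theta, b)
    return -np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))

b_hat = minimize(neg_loglik, np.zeros(n_items)).x
# selecting the number of latent classes then compares, per fitted model,
# BIC = k * ln(number of responses) + 2 * negative log-likelihood
bic = n_items * np.log(X.size) + 2 * neg_loglik(b_hat)
```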
A simple third-order compact finite element method is proposed for one-dimensional Sturm-Liouville boundary value problems. The key idea is based on the interpolation error estimate, which can be related to the source term. Thus, a simple posterior error analysis or modified basis functions based on the original piecewise linear basis functions lead to a third-order accurate solution in the $L^2$ norm and a second-order accurate solution in the $H^1$ or the energy norm. Numerical examples have confirmed our analysis.
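The third-order modification itself is not reproduced here; the following sketch implements the underlying piecewise linear finite element baseline for a manufactured Sturm-Liouville problem $-u'' + q(x)u = f$, the starting point the method improves upon.

```python
import numpy as np

# Baseline P1 FEM for -u'' + q*u = f on (0, 1) with u(0) = u(1) = 0
# (sketch; the paper's third-order modification is not reproduced).
n = 32                                   # number of elements
h = 1.0 / n
x = np.linspace(0, 1, n + 1)
q = lambda t: 1.0 + t
u_exact = lambda t: np.sin(np.pi * t)
f = lambda t: (np.pi**2 + q(t)) * np.sin(np.pi * t)   # manufactured source

N = n - 1                                # interior nodes
A = np.zeros((N, N))
F = np.zeros(N)
for e in range(n):                       # element [x_e, x_{e+1}]
    xm = 0.5 * (x[e] + x[e + 1])                  # midpoint quadrature
    Ke = (1.0 / h) * np.array([[1, -1], [-1, 1]]) # element stiffness
    Me = q(xm) * (h / 4) * np.ones((2, 2))        # phi_i(xm) = 1/2
    Fe = f(xm) * (h / 2) * np.ones(2)
    for a_loc, i in enumerate((e - 1, e)):        # interior global indices
        if 0 <= i < N:
            F[i] += Fe[a_loc]
            for b_loc, j in enumerate((e - 1, e)):
                if 0 <= j < N:
                    A[i, j] += Ke[a_loc, b_loc] + Me[a_loc, b_loc]

u_h = np.linalg.solve(A, F)
# discrete L2 error; O(h^2) for this unmodified baseline
err_L2 = np.sqrt(h * np.sum((u_h - u_exact(x[1:-1]))**2))
```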
In this work, we study the class of stochastic processes that generalizes the Ornstein-Uhlenbeck process, hereafter called \emph{Generalized Ornstein-Uhlenbeck Type Processes} and denoted GOU type processes. We consider them driven by noise processes such as Brownian motion, a symmetric $\alpha$-stable L\'evy process, a general L\'evy process, and even a Poisson process. We give necessary and sufficient conditions on the memory kernel function for the time-stationarity and the Markov properties of these processes. When the GOU type process is driven by a L\'evy noise we prove that it is infinitely divisible, and we exhibit its generating triplet. Several examples derived from the GOU type process are illustrated, showing some of their basic properties as well as some time series realizations. These examples also present their theoretical and empirical autocorrelation or normalized codifference functions, depending on whether the process has a finite or infinite second moment. We also present the maximum likelihood and Bayesian estimation procedures for the so-called \emph{Cosine process}, a particular process in the class of GOU type processes. For the Bayesian estimation method, we consider the power series representation of Fox's H-function to better approximate the density function of an $\alpha$-stable distributed random variable. We consider four goodness-of-fit tests to help decide which \emph{Cosine process} (driven by a Gaussian or an $\alpha$-stable noise) best fits real data sets. Two applications of the GOU type model are presented: one based on the Apple company stock market price data and the other based on cardiovascular mortality data for Los Angeles County.
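A minimal sketch of the classical Ornstein-Uhlenbeck special case, driven either by Brownian motion or by symmetric $\alpha$-stable increments; the general memory-kernel construction and the \emph{Cosine process} estimation procedures are not reproduced.

```python
import numpy as np

def ou_path(theta, sigma, x0, T, n, rng, alpha=2.0):
    """Euler-Maruyama path of dX = -theta*X dt + sigma dL_t.
    alpha = 2 gives Brownian driving noise; alpha < 2 uses symmetric
    alpha-stable increments (two of the driving noises considered;
    the general memory-kernel GOU construction is not reproduced)."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    if alpha == 2.0:
        dL = rng.normal(0.0, np.sqrt(dt), n)
    else:
        # Chambers-Mallows-Stuck generator for symmetric alpha-stable steps
        U = rng.uniform(-np.pi / 2, np.pi / 2, n)
        W = rng.exponential(1.0, n)
        dL = (np.sin(alpha * U) / np.cos(U)**(1 / alpha)
              * (np.cos((1 - alpha) * U) / W)**((1 - alpha) / alpha)
              * dt**(1 / alpha))
    for k in range(n):
        x[k + 1] = x[k] - theta * x[k] * dt + sigma * dL[k]
    return x

rng = np.random.default_rng(4)
path_bm = ou_path(theta=1.0, sigma=0.5, x0=0.0, T=10.0, n=2000, rng=rng)
path_stable = ou_path(1.0, 0.5, 0.0, 10.0, 2000, rng, alpha=1.7)
```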
The area of Data Analytics on graphs promises a paradigm shift, as we approach information processing of classes of data which are typically acquired on irregular but structured domains (social networks, various ad-hoc sensor networks). Yet, despite its long history, current approaches mostly focus on the optimization of graphs themselves, rather than on directly inferring learning strategies, such as detection, estimation, statistical and probabilistic inference, clustering and separation, from signals and data acquired on graphs. To fill this void, we first revisit graph topologies from a Data Analytics point of view, and establish a taxonomy of graph networks through a linear algebraic formalism of graph topology (vertices, connections, directivity). This serves as a basis for the spectral analysis of graphs, whereby the eigenvalues and eigenvectors of the graph Laplacian and adjacency matrices are shown to convey physical meaning related to both graph topology and higher-order graph properties, such as cuts, walks, paths, and neighborhoods. Next, to illustrate estimation strategies performed on graph signals, spectral analysis of graphs is introduced through the eigenanalysis of mathematical descriptors of graphs, in a generic way. Finally, a framework for vertex clustering and graph segmentation is established based on graph spectral representation (eigenanalysis), which illustrates the power of graphs in various data association tasks. The supporting examples demonstrate the promise of Graph Data Analytics in modeling structural and functional/semantic inferences. At the same time, Part I serves as a basis for Part II and Part III, which deal with theory, methods and applications of processing Data on Graphs and Graph Topology Learning from data.
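As a small illustration of the vertex clustering framework mentioned above, the following sketch bisects a two-community graph by the sign pattern of the Fiedler vector of the graph Laplacian.

```python
import numpy as np

def fiedler_partition(A):
    """Bisect a graph by the sign of the Fiedler vector, the eigenvector
    of the second-smallest Laplacian eigenvalue: the basic spectral
    clustering step described in the text."""
    d = A.sum(axis=1)
    L = np.diag(d) - A                  # combinatorial graph Laplacian
    w, V = np.linalg.eigh(L)
    return V[:, 1] >= 0                 # sign pattern splits the vertices

# two noisy 10-vertex communities joined by a few edges
rng = np.random.default_rng(5)
B = np.zeros((20, 20))
B[:10, :10] = rng.uniform(size=(10, 10)) < 0.8   # dense community 1
B[10:, 10:] = rng.uniform(size=(10, 10)) < 0.8   # dense community 2
B[:10, 10:] = rng.uniform(size=(10, 10)) < 0.05  # sparse cross edges
A = np.triu(B, 1)
A = A + A.T                                      # symmetric simple graph
labels = fiedler_partition(A)                    # True/False per vertex
```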