Hierarchical time series are common in several applied fields. The forecasts for these time series are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation present several limitations. In this paper, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used for any base forecast distribution: discrete, continuous, or in the form of samples, providing a major speedup compared to the current methods. Experiments on several temporal hierarchies show a significant improvement over base probabilistic forecasts.
Helmholtz decompositions of elastic fields is a common approach for the solution of Navier scattering problems. Used in the context of Boundary Integral Equations (BIE), this approach affords solutions of Navier problems via the simpler Helmholtz boundary integral operators (BIOs). Approximations of Helmholtz Dirichlet-to-Neumann (DtN) can be employed within a regularizing combined field strategy to deliver BIE formulations of the second kind for the solution of Navier scattering problems in two dimensions with Dirichlet boundary conditions, at least in the case of smooth boundaries. Unlike the case of scattering and transmission Helmholtz problems, the approximations of the DtN maps we use in the Helmholtz decomposition BIE in the Navier case require incorporation of lower order terms in their pseudodifferential asymptotic expansions. The presence of these lower order terms in the Navier regularized BIE formulations complicates the stability analysis of their Nystr\"om discretizations in the framework of global trigonometric interpolation and the Kussmaul-Martensen kernel singularity splitting strategy. The main difficulty stems from compositions of pseudodifferential operators of opposite orders, whose Nystr\"om discretization must be performed with care via pseudodifferential expansions beyond the principal symbol. The error analysis is significantly simpler in the case of arclength boundary parametrizations and considerably more involved in the case of general smooth parametrizations which are typically encountered in the description of one dimensional closed curves.
Threshold selection is a fundamental problem in any threshold-based extreme value analysis. While models are asymptotically motivated, selecting an appropriate threshold for finite samples can be difficult through standard methods. Inference can also be highly sensitive to the choice of threshold. Too low a threshold choice leads to bias in the fit of the extreme value model, while too high a choice leads to unnecessary additional uncertainty in the estimation of model parameters. In this paper, we develop a novel methodology for automated threshold selection that directly tackles this bias-variance trade-off. We also develop a method to account for the uncertainty in this threshold choice and propagate this uncertainty through to high quantile inference. Through a simulation study, we demonstrate the effectiveness of our method for threshold selection and subsequent extreme quantile estimation. We apply our method to the well-known, troublesome example of the River Nidd dataset.
Statistical analysis of extremes can be used to predict the probability of future extreme events, such as large rainfalls or devastating windstorms. The quality of these forecasts can be measured through scoring rules. Locally scale invariant scoring rules give equal importance to the forecasts at different locations regardless of differences in the prediction uncertainty. This is a useful feature when computing average scores but can be an unnecessarily strict requirement when mostly concerned with extremes. We propose the concept of local weight-scale invariance, describing scoring rules fulfilling local scale invariance in a certain region of interest, and as a special case local tail-scale invariance, for large events. Moreover, a new version of the weighted Continuous Ranked Probability score (wCRPS) called the scaled wCRPS (swCRPS) that possesses this property is developed and studied. The score is a suitable alternative for scoring extreme value models over areas with varying scale of extreme events, and we derive explicit formulas of the score for the Generalised Extreme Value distribution. The scoring rules are compared through simulation, and their usage is illustrated in modelling of extreme water levels, annual maximum rainfalls, and in an application to non-extreme forecast for the prediction of air pollution.
Linear regression and classification methods with repeated functional data are considered. For each statistical unit in the sample, a real-valued parameter is observed over time under different conditions. Two regression methods based on fusion penalties are presented. The first one is a generalization of the variable fusion methodology based on the 1-nearest neighbor. The second one, called group fusion lasso, assumes some grouping structure of conditions and allows for homogeneity among the regression coefficient functions within groups. A finite sample numerical simulation and an application on EEG data are presented.
A novel method for detecting faults in power grids using a graph neural network (GNN) has been developed, aimed at enhancing intelligent fault diagnosis in network operation and maintenance. This GNN-based approach identifies faulty nodes within the power grid through a specialized electrical feature extraction model coupled with a knowledge graph. Incorporating temporal data, the method leverages the status of nodes from preceding and subsequent time periods to aid in current fault detection. To validate the effectiveness of this GNN in extracting node features, a correlation analysis of the output features from each node within the neural network layer was conducted. The results from experiments show that this method can accurately locate fault nodes in simulated scenarios with a remarkable 99.53% accuracy. Additionally, the graph neural network's feature modeling allows for a qualitative examination of how faults spread across nodes, providing valuable insights for analyzing fault nodes.
Many modern datasets exhibit dependencies among observations as well as variables. This gives rise to the challenging problem of analyzing high-dimensional matrix-variate data with unknown dependence structures. To address this challenge, Kalaitzis et. al. (2013) proposed the Bigraphical Lasso (BiGLasso), an estimator for precision matrices of matrix-normals based on the Cartesian product of graphs. Subsequently, Greenewald, Zhou and Hero (GZH 2019) introduced a multiway tensor generalization of the BiGLasso estimator, known as the TeraLasso estimator. In this paper, we provide sharper rates of convergence in the Frobenius and operator norm for both BiGLasso and TeraLasso estimators for estimating inverse covariance matrices. This improves upon the rates presented in GZH 2019. In particular, (a) we strengthen the bounds for the relative errors in the operator and Frobenius norm by a factor of approximately $\log p$; (b) Crucially, this improvement allows for finite-sample estimation errors in both norms to be derived for the two-way Kronecker sum model. The two-way regime is important because it is the setting that is the most theoretically challenging, and simultaneously the most common in applications. Normality is not needed in our proofs; instead, we consider sub-gaussian ensembles and derive tight concentration of measure bounds, using tensor unfolding techniques. The proof techniques may be of independent interest.
Aberrant respondents are common but yet extremely detrimental to the quality of social surveys or questionnaires. Recently, factor mixture models have been employed to identify individuals providing deceptive or careless responses. We propose a comprehensive factor mixture model that combines confirmatory and exploratory factor models to represent both the non-aberrant and aberrant components of the responses. The flexibility of the proposed solution allows for the identification of two of the most common aberant response styles, namely faking and careless responding. We validated our approach by means of two simulations and two case studies. The results indicate the effectiveness of the proposed model in handling with aberrant responses in social and behavioral surveys.
A simple way of obtaining robust estimates of the "center" (or the "location") and of the "scatter" of a dataset is to use the maximum likelihood estimate with a class of heavy-tailed distributions, regardless of the "true" distribution generating the data. We observe that the maximum likelihood problem for the Cauchy distributions, which have particularly heavy tails, is geodesically convex and therefore efficiently solvable (Cauchy distributions are parametrized by the upper half plane, i.e. by the hyperbolic plane). Moreover, it has an appealing geometrical meaning: the datapoints, living on the boundary of the hyperbolic plane, are attracting the parameter by unit forces, and we search the point where these forces are in equilibrium. This picture generalizes to several classes of multivariate distributions with heavy tails, including, in particular, the multivariate Cauchy distributions. The hyperbolic plane gets replaced by symmetric spaces of noncompact type. Geodesic convexity gives us an efficient numerical solution of the maximum likelihood problem for these distribution classes. This can then be used for robust estimates of location and spread, thanks to the heavy tails of these distributions.
Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. Previous decision support systems for appendicitis have focused on clinical, laboratory, scoring, and computed tomography data and have ignored abdominal ultrasound, despite its noninvasive nature and widespread availability. In this work, we present interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. Our approach utilizes concept bottleneck models (CBM) that facilitate interpretation and interaction with high-level concepts understandable to clinicians. Furthermore, we extend CBMs to prediction problems with multiple views and incomplete concept sets. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Results show that our proposed method enables clinicians to utilize a human-understandable and intervenable predictive model without compromising performance or requiring time-consuming image annotation when deployed. For predicting the diagnosis, the extended multiview CBM attained an AUROC of 0.80 and an AUPR of 0.92, performing comparably to similar black-box neural networks trained and tested on the same dataset.
Hashing has been widely used in approximate nearest search for large-scale database retrieval for its computation and storage efficiency. Deep hashing, which devises convolutional neural network architecture to exploit and extract the semantic information or feature of images, has received increasing attention recently. In this survey, several deep supervised hashing methods for image retrieval are evaluated and I conclude three main different directions for deep supervised hashing methods. Several comments are made at the end. Moreover, to break through the bottleneck of the existing hashing methods, I propose a Shadow Recurrent Hashing(SRH) method as a try. Specifically, I devise a CNN architecture to extract the semantic features of images and design a loss function to encourage similar images projected close. To this end, I propose a concept: shadow of the CNN output. During optimization process, the CNN output and its shadow are guiding each other so as to achieve the optimal solution as much as possible. Several experiments on dataset CIFAR-10 show the satisfying performance of SRH.