Point processes are widely used statistical models for uncovering the temporal patterns in dependent event data. In many applications, the event time cannot be observed exactly, calling for the incorporation of time uncertainty into the modeling of point process data. In this work, we introduce a framework to model time-uncertain point processes possibly on a network. We start by deriving the formulation in the continuous-time setting under a few assumptions motivated by application scenarios. After imposing a time grid, we obtain a discrete-time model that facilitates inference and can be computed by first-order optimization methods such as Gradient Descent or Variation inequality (VI) using batch-based Stochastic Gradient Descent (SGD). The parameter recovery guarantee is proved for VI inference at an $O(1/k)$ convergence rate using $k$ SGD steps. Our framework handles non-stationary processes by modeling the inference kernel as a matrix (or tensor on a network) and it covers the stationary process, such as the classical Hawkes process, as a special case. We experimentally show that the proposed approach outperforms previous General Linear model (GLM) baselines on simulated and real data and reveals meaningful causal relations on a Sepsis-associated Derangements dataset.
Delay Tolerant Networking (DTN) aims to address a myriad of significant networking challenges that appear in time-varying settings, such as mobile and satellite networks, wherein changes in network topology are frequent and often subject to environmental constraints. Within this paradigm, routing problems are often solved by extending classical graph-theoretic path finding algorithms, such as the Bellman-Ford or Floyd-Warshall algorithms, to the time-varying setting; such extensions are simple to understand, but they have strict optimality criteria and can exhibit non-polynomial scaling. Acknowledging this, we study time-varying shortest path problems on metric graphs whose vertices are traced by semi-algebraic curves. As an exemplary application, we establish a polynomial upper bound on the number of topological critical events encountered by a set of $n$ satellites moving along elliptic curves in low Earth orbit (per orbital period). Experimental evaluations on networks derived from STARLINK satellite TLE's demonstrate that not only does this geometric framework allow for routing schemes between satellites requiring recomputation an order of magnitude less than graph-based methods, but it also demonstrates metric spanner properties exist in metric graphs derived from real-world data, opening the door for broader applications of geometric DTN routing.
The objective of clusterability evaluation is to check whether a clustering structure exists within the data set. As a crucial yet often-overlooked issue in cluster analysis, it is essential to conduct such a test before applying any clustering algorithm. If a data set is unclusterable, any subsequent clustering analysis would not yield valid results. Despite its importance, the majority of existing studies focus on numerical data, leaving the clusterability evaluation issue for categorical data as an open problem. Here we present TestCat, a testing-based approach to assess the clusterability of categorical data in terms of an analytical $p$-value. The key idea underlying TestCat is that clusterable categorical data possess many strongly associated attribute pairs and hence the sum of chi-squared statistics of all attribute pairs is employed as the test statistic for $p$-value calculation. We apply our method to a set of benchmark categorical data sets, showing that TestCat outperforms those solutions based on existing clusterability evaluation methods for numeric data. To the best of our knowledge, our work provides the first way to effectively recognize the clusterability of categorical data in a statistically sound manner.
Event reconstruction is a fundamental part of the digital forensic process, helping to answer key questions like who, what, when, and how. A common way of accomplishing that is to use tools to create timelines, which are then analyzed. However, various challenges exist, such as large volumes of data or contamination. While prior research has focused on simplifying timelines, less attention has been given to tampering, i.e., the deliberate manipulation of evidence, which can lead to errors in interpretation. This article addresses the issue by proposing a framework to assess the tamper resistance of data sources used in event reconstruction. We discuss factors affecting data resilience, introduce a scoring system for evaluation, and illustrate its application with case studies. This work aims to improve the reliability of forensic event reconstruction by considering tamper resistance.
A new integer--valued autoregressive process (INAR) with Generalised Lagrangian Katz (GLK) innovations is defined. This process family provides a flexible modelling framework for count data, allowing for under and over--dispersion, asymmetry, and excess of kurtosis and includes standard INAR models such as Generalized Poisson and Negative Binomial as special cases. We show that the GLK--INAR process is discrete semi--self--decomposable, infinite divisible, stable by aggregation and provides stationarity conditions. Some extensions are discussed, such as the Markov--Switching and the zero--inflated GLK--INARs. A Bayesian inference framework and an efficient posterior approximation procedure are introduced. The proposed models are applied to 130 time series from Google Trend, which proxy the worldwide public concern about climate change. New evidence is found of heterogeneity across time, countries and keywords in the persistence, uncertainty, and long--run public awareness level.
For several types of information relations, the induced rough sets system RS does not form a lattice but only a partially ordered set. However, by studying its Dedekind-MacNeille completion DM(RS), one may reveal new important properties of rough set structures. Building upon D. Umadevi's work on describing joins and meets in DM(RS), we previously investigated pseudo-Kleene algebras defined on DM(RS) for reflexive relations. This paper delves deeper into the order-theoretic properties of DM(RS) in the context of reflexive relations. We describe the completely join-irreducible elements of DM(RS) and characterize when DM(RS) is a spatial completely distributive lattice. We show that even in the case of a non-transitive reflexive relation, DM(RS) can form a Nelson algebra, a property generally associated with quasiorders. We introduce a novel concept, the core of a relational neighborhood, and use it to provide a necessary and sufficient condition for DM(RS) to determine a Nelson algebra.
Most existing tests in the literature for model checking do not work in high dimension settings due to challenges arising from the "curse of dimensionality", or dependencies on the normality of parameter estimators. To address these challenges, we proposed a new goodness of fit test based on random projections for generalized linear models, when the dimension of covariates may substantially exceed the sample size. The tests only require the convergence rate of parameter estimators to derive the limiting distribution. The growing rate of the dimension is allowed to be of exponential order in relation to the sample size. As random projection converts covariates to one-dimensional space, our tests can detect the local alternative departing from the null at the rate of $n^{-1/2}h^{-1/4}$ where $h$ is the bandwidth, and $n$ is the sample size. This sensitive rate is not related to the dimension of covariates, and thus the "curse of dimensionality" for our tests would be largely alleviated. An interesting and unexpected result is that for randomly chosen projections, the resulting test statistics can be asymptotic independent. We then proposed combination methods to enhance the power performance of the tests. Detailed simulation studies and a real data analysis are conducted to illustrate the effectiveness of our methodology.
In current multimodal tasks, models typically freeze the encoder and decoder while adapting intermediate layers to task-specific goals, such as region captioning. Region-level visual understanding presents significant challenges for large-scale vision-language models. While limited spatial awareness is a known issue, coarse-grained pretraining, in particular, exacerbates the difficulty of optimizing latent representations for effective encoder-decoder alignment. We propose AlignCap, a framework designed to enhance region-level understanding through fine-grained alignment of latent spaces. Our approach introduces a novel latent feature refinement module that enhances conditioned latent space representations to improve region-level captioning performance. We also propose an innovative alignment strategy, the semantic space alignment module, which boosts the quality of multimodal representations. Additionally, we incorporate contrastive learning in a novel manner within both modules to further enhance region-level captioning performance. To address spatial limitations, we employ a General Object Detection (GOD) method as a data preprocessing pipeline that enhances spatial reasoning at the regional level. Extensive experiments demonstrate that our approach significantly improves region-level captioning performance across various tasks
In reinsurance, Poisson and Negative binomial distributions are employed for modeling frequency. However, the incomplete data regarding reported incurred claims above a priority level presents challenges in estimation. This paper focuses on frequency estimation using Schnieper's framework for claim numbering. We demonstrate that Schnieper's model is consistent with a Poisson distribution for the total number of claims above a priority at each year of development, providing a robust basis for parameter estimation. Additionally, we explain how to build an alternative assumption based on a Negative binomial distribution, which yields similar results. The study includes a bootstrap procedure to manage uncertainty in parameter estimation and a case study comparing assumptions and evaluating the impact of the bootstrap approach.
We consider the bidimensional Stokes problem for incompressible fluids in stream function-vorticity. For this problem, the classical finite elements method of degree one converges only to order one-half for the L2 norm of the vorticity. We propose to use harmonic functions to approach the vorticity along the boundary. Discrete harmonics are functions that are used in practice to derive a new numerical method. We prove that we obtain with this numerical scheme an error of order one for the L2 norm of the vorticity.
Approximating field variables and data vectors from sparse samples is a key challenge in computational science. Widely used methods such as gappy proper orthogonal decomposition and empirical interpolation rely on linear approximation spaces, limiting their effectiveness for data representing transport-dominated and wave-like dynamics. To address this limitation, we introduce quadratic manifold sparse regression, which trains quadratic manifolds with a sparse greedy method and computes approximations on the manifold through novel nonlinear projections of sparse samples. The nonlinear approximations obtained with quadratic manifold sparse regression achieve orders of magnitude higher accuracies than linear methods on data describing transport-dominated dynamics in numerical experiments.