Link streams offer a good model for representing interactions over time. They consist of links $(b,e,u,v)$, where $u$ and $v$ are vertices interacting during the whole time interval $[b,e]$. In this paper, we deal with the problem of enumerating maximal cliques in link streams. A clique is a pair $(C,[t_0,t_1])$, where $C$ is a set of vertices that all interact pairwise during the full interval $[t_0,t_1]$. It is maximal when neither its set of vertices nor its time interval can be increased. Some of the main works solving this problem are based on the famous Bron-Kerbosch algorithm for enumerating maximal cliques in graphs. We take this idea as a starting point and propose a new algorithm that matches the cliques of the instantaneous graphs formed by the links existing at a given time $t$ to the maximal cliques of the link stream. We prove its validity and compute its complexity, which is better than that of state-of-the-art algorithms in many cases of interest. We also study the output-sensitive complexity, which is close to the output size, thereby showing that our algorithm is efficient. To confirm this, we perform experiments on link streams used in the state of the art, and on massive link streams with up to 100 million links. In all cases our algorithm is faster, mostly by a factor of at least 10 and up to a factor of $10^4$. Moreover, it scales to massive link streams for which the existing algorithms are unable to provide the solution.
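As context for the starting point mentioned above, here is a minimal sketch of the classical Bron-Kerbosch procedure with pivoting on a static graph (for instance, an instantaneous graph of the links alive at some time $t$). The paper's link-stream algorithm additionally tracks the time interval of each clique and is not reproduced here; the toy graph is purely illustrative.

```python
# Classical Bron-Kerbosch with pivoting on a static graph, given only as the
# starting point the paper builds on; the link-stream algorithm itself also
# tracks the time interval [t0, t1] of each clique.
def bron_kerbosch(R, P, X, adj, out):
    """Append all maximal cliques of the graph given by adjacency dict `adj` to `out`."""
    if not P and not X:
        out.append(set(R))
        return
    pivot = max(P | X, key=lambda u: len(adj[u] & P))  # pivot with most neighbours in P
    for v in list(P - adj[pivot]):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

# Toy instantaneous graph: edges 1-2, 1-3, 2-3, 3-4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(cliques)  # [{1, 2, 3}, {3, 4}]
```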
Linear structural vector autoregressive models can be identified statistically, without imposing restrictions on the model, if the shocks are mutually independent and at most one of them is Gaussian. We show that this result extends to structural threshold and smooth transition vector autoregressive models incorporating a time-varying impact matrix defined as a weighted sum of the impact matrices of the regimes. We also discuss labelling of the shocks, maximum likelihood estimation of the parameters, and stationarity of the model. The introduced methods are implemented in the accompanying R package sstvars. Our empirical application studies the effects of the climate policy uncertainty shock on the U.S. macroeconomy. In a structural logistic smooth transition vector autoregressive model consisting of two regimes, we find that a positive climate policy uncertainty shock decreases production in times of low economic policy uncertainty but slightly increases it in times of high economic policy uncertainty.
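To make the time-varying impact matrix concrete, the following is a minimal numpy sketch (not the sstvars implementation): with two regimes, the impact matrix at time $t$ is the weighted sum of the regime impact matrices, with weights given here by a logistic transition function of an illustrative switching variable. All matrices and parameters below are made up for illustration.

```python
import numpy as np

def logistic_weight(z, c=0.0, gamma=2.0):
    """Weight of regime 2 given switching variable z, location c and scale gamma (illustrative)."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c)))

B1 = np.array([[1.0, 0.0], [0.5, 1.0]])   # impact matrix of regime 1 (illustrative)
B2 = np.array([[0.8, 0.2], [0.1, 1.2]])   # impact matrix of regime 2 (illustrative)

def impact_matrix(z):
    a2 = logistic_weight(z)
    return (1.0 - a2) * B1 + a2 * B2      # time-varying impact matrix: weighted sum over regimes

print(impact_matrix(-5.0))  # close to B1 when the switching variable is low
print(impact_matrix(5.0))   # close to B2 when it is high
```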
A tiling of a vector space $S$ is a pair $(U,V)$ of its subsets such that every vector in $S$ is uniquely represented as the sum of a vector from $U$ and a vector from $V$. A tiling is connected to perfect codes if one of the sets, say $U$, is projective, i.e., a union of one-dimensional subspaces of $S$. A tiling $(U,V)$ is full-rank if the affine span of each of $U$, $V$ is $S$. For finite non-binary vector spaces of dimension at least $6$ (at least $10$), we construct full-rank tilings $(U,V)$ with projective $U$ (with both $U$ and $V$ projective, respectively). In particular, this construction gives a full-rank ternary $1$-perfect code of length $13$, solving a known problem. We also discuss the treatment of tilings with projective components as factorizations of projective spaces. Keywords: perfect codes, tilings, group factorization, full-rank tilings, projective geometry
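As a small illustration of the tiling definition (not of the full-rank constructions of the paper), the brute-force check below verifies that $U = \langle(1,0)\rangle$, which is projective, and $V = \langle(0,1)\rangle$ tile $\mathrm{GF}(3)^2$: every vector is $u+v$ for exactly one pair $(u,v)$.

```python
import itertools

q, d = 3, 2
S = list(itertools.product(range(q), repeat=d))  # all vectors of GF(3)^2
U = [(a, 0) for a in range(q)]                   # one-dimensional subspace, hence projective
V = [(0, b) for b in range(q)]                   # a complementary subspace

def add(x, y):
    return tuple((xi + yi) % q for xi, yi in zip(x, y))

# Count how many ways each vector of S is written as u + v
counts = {s: 0 for s in S}
for u in U:
    for v in V:
        counts[add(u, v)] += 1

print(all(c == 1 for c in counts.values()))  # True: (U, V) is a tiling of S
```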
We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde \Theta(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality: the sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights in empirical Bayes theory as well as a new convergence rate of the smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, which is shown to be optimal by a matching minimax lower bound. We also discuss extensions to estimating $\beta$-H\"older continuous scores with $\beta \leq 1$, as well as the implication of our theory on the sample complexity of score-based generative models.
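To make the kernel estimator concrete, here is a hedged sketch of a Gaussian-kernel plug-in score estimator, i.e. the score of the empirical distribution smoothed by $N(0, h^2 I)$; the paper's specific regularization and bandwidth choice are not reproduced here, and the bandwidth below is arbitrary.

```python
import numpy as np

# s_hat(x) = (1/h^2) * sum_i w_i(x) (X_i - x),  w_i(x) ∝ exp(-||x - X_i||^2 / (2 h^2)),
# which is the gradient of the log of the Gaussian-smoothed empirical density.
def kernel_score(x, X, h):
    diffs = X - x                                   # (n, d) differences X_i - x
    logw = -np.sum(diffs**2, axis=1) / (2 * h**2)   # log Gaussian kernel weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return (w[:, None] * diffs).sum(axis=0) / h**2

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))          # samples from N(0, I), whose true score is s*(x) = -x
x = np.array([0.5, -1.0])
print(kernel_score(x, X, h=0.5), -x)    # estimate (of the smoothed score) vs. true score
```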
Numerical simulations are a highly valuable tool to evaluate the impact of the uncertainties of various model parameters, and to optimize e.g. injection-production scenarios in the context of underground storage (of CO2 typically). Finite volume approximations of Darcy's parabolic model for flows in porous media are typically run many times, for many values of parameters like permeability and porosity, at costly computational effort. We study the relevance of reduced basis methods as a way to lower the overall simulation cost of finite volume approximations to Darcy's parabolic model for different values of the parameters, such as the permeability. In the context of underground gas storage (of CO2 typically) in saline aquifers, our aim is to evaluate quickly, for many parameter values, the flux along some interior boundaries near the well injection area, regarded as a quantity of interest. To this end, we construct reduced bases by a standard POD-Greedy algorithm. Our POD-Greedy algorithm uses a new goal-oriented error estimator designed from a discrete space-time energy norm independent of the parameter. We provide numerical experiments that validate the efficiency of the proposed estimator.
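As an illustration of the compression step inside a POD-Greedy loop, the sketch below extracts a reduced basis from a snapshot matrix by singular value decomposition; the goal-oriented error estimator that drives the greedy parameter selection is specific to the paper and is not reproduced. The snapshot field below is synthetic.

```python
import numpy as np

def pod_basis(snapshots, tol=1e-6):
    """Return POD modes capturing all but a `tol` fraction of the snapshot energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)          # cumulative energy fractions
    r = int(np.searchsorted(energy, 1.0 - tol)) + 1  # smallest r reaching the target energy
    return U[:, :r]

# Illustrative snapshot matrix: 500 cells, 40 time steps of a smooth two-mode field
x = np.linspace(0.0, 1.0, 500)[:, None]
t = np.linspace(0.0, 1.0, 40)[None, :]
snapshots = np.exp(-t) * np.sin(np.pi * x) + 0.1 * np.exp(-3 * t) * np.sin(3 * np.pi * x)
print(pod_basis(snapshots, tol=1e-8).shape)  # (500, r) with small r (here r = 2)
```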
This work develops, for the first time, a face-centred finite volume (FCFV) solver for the simulation of laminar and turbulent viscous incompressible flows. The formulation relies on the Reynolds-averaged Navier-Stokes (RANS) equations coupled with the negative Spalart-Allmaras (SA) model, and three novel convective stabilisations, inspired by Riemann solvers, are derived and compared numerically. The resulting method achieves first-order convergence of the velocity, the velocity-gradient tensor and the pressure. FCFV accurately predicts engineering quantities of interest, such as drag and lift, on unstructured meshes and, by avoiding gradient reconstruction, the method is less sensitive to mesh quality than other FV methods, even in the presence of highly distorted and stretched cells. Monolithic and staggered solution strategies for the RANS-SA system are derived and compared numerically. Numerical benchmarks, involving laminar and turbulent, steady and transient cases, are used to assess the performance, accuracy and robustness of the proposed FCFV method.
Deriving exact density functions for Gibbs point processes has been challenging because their normalising constants (partition functions) are generally intractable. This paper offers a solution to this open problem by exploiting a recent alternative representation of point process densities. Here, for a finite point process, the density is expressed as the void probability multiplied by a higher-order Papangelou conditional intensity function. By leveraging recent results on dependent thinnings, exact expressions for generating functionals and void probabilities of locally stable point processes are derived. Consequently, exact expressions for density/likelihood functions, partition functions and posterior densities are also obtained. The paper finally extends the results to locally stable Gibbsian random fields on lattices by representing them as point processes.
We consider a recently proposed approach to graph signal processing (GSP) based on graphons. We show how the graphon-based approach to GSP applies to graphs sampled from a stochastic block model. We obtain a basis for the graphon Fourier transform on such samples directly from the link probability matrix and the block sizes of the model. This formulation allows us to bound the sensitivity of the Fourier transform to small changes in block sizes. We then focus on the case where the probability matrix corresponds to a (weighted) Cayley graph. If the block sizes are equal, a well-structured Fourier basis can be derived from the underlying group. We explore how, in the case where the block sizes are not equal, some or all of the desirable properties of the group basis can be maintained. We complement the theoretical results with simulations.
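One standard way to build such a basis from the SBM parameters, which may differ in details from the paper's construction, is sketched below: for the step graphon defined by a link-probability matrix $P$ and block proportions $c$, the block-constant eigenfunctions of the graphon operator are obtained from the symmetric matrix $C^{1/2} P C^{1/2}$ with $C=\mathrm{diag}(c)$. The matrices below are illustrative.

```python
import numpy as np

P = np.array([[0.8, 0.2, 0.1],
              [0.2, 0.7, 0.3],
              [0.1, 0.3, 0.6]])      # illustrative link-probability matrix
c = np.array([0.5, 0.3, 0.2])        # illustrative block proportions (sum to 1)

Chalf = np.diag(np.sqrt(c))
evals, evecs = np.linalg.eigh(Chalf @ P @ Chalf)   # eigenvalues of the graphon operator

# Block-wise values of the step eigenfunctions (orthonormal in L^2[0,1]):
eigenfunctions = np.linalg.solve(Chalf, evecs)     # divide row i by sqrt(c_i)
print(evals)            # operator eigenvalues
print(eigenfunctions)   # column k gives the value of eigenfunction k on each block
```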
We consider a convex constrained Gaussian sequence model and characterize necessary and sufficient conditions for the least squares estimator (LSE) to be optimal in a minimax sense. For a closed convex set $K\subset \mathbb{R}^n$ we observe $Y=\mu+\xi$ for $\xi\sim N(0,\sigma^2\mathbb{I}_n)$ and $\mu\in K$ and aim to estimate $\mu$. We characterize the worst-case risk of the LSE in multiple ways by analyzing the behavior of the local Gaussian width on $K$. We demonstrate that optimality is equivalent to a Lipschitz property of the local Gaussian width mapping. We also provide theoretical algorithms that search for the worst-case risk. We then provide examples showing optimality or suboptimality of the LSE on various sets, including $\ell_p$ balls for $p\in[1,2]$, pyramids, solids of revolution, and multivariate isotonic regression, among others.
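As a small illustration of the setting (not of the paper's theoretical algorithms), the Monte Carlo sketch below estimates the risk of the LSE, i.e. the Euclidean projection of $Y$ onto $K$, when $K$ is an $\ell_2$ ball, whose projection has a closed form; the dimension, noise level and radius are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, radius, reps = 200, 1.0, 5.0, 2000

def project_l2_ball(y, r):
    """Euclidean projection onto the centered l2 ball of radius r."""
    nrm = np.linalg.norm(y)
    return y if nrm <= r else (r / nrm) * y

mu = np.zeros(n)
mu[0] = radius                       # a point on the boundary of K
risks = []
for _ in range(reps):
    Y = mu + sigma * rng.normal(size=n)
    mu_hat = project_l2_ball(Y, radius)        # the LSE over K
    risks.append(np.sum((mu_hat - mu) ** 2))
print(np.mean(risks), sigma**2 * n)  # estimated LSE risk vs. risk of the unconstrained estimator
```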
Depth first search (DFS) is a fundamental graph problem having a wide range of applications. For a graph $G=(V,E)$ having $n$ vertices and $m$ edges, the DFS tree can be computed in $O(m+n)$ time using $O(m)$ space, where $m=O(n^2)$. In the streaming environment, most graph problems are studied in the semi-streaming model, where several passes (preferably one) are allowed over the input and $O(nk)$ local space is available for some $k=o(n)$. Trivially, using $O(m)$ space, DFS can be computed in one pass, and using $O(n)$ space, it can be computed in $O(n)$ passes. Khan and Mehta [STACS19] presented several algorithms allowing trade-offs between space and passes, where $O(nk)$ space results in $O(n/k)$ passes. They also showed empirically that their algorithm requires only a few passes in practice even with $O(n)$ space. Chang et al. [STACS20] presented an alternate proof of this fact and an $O(\sqrt{n})$-pass algorithm requiring $O(n\,\mathrm{poly}\log n)$ space, with a finer trade-off between space and passes. However, their algorithm uses complex black-box algorithms, making it impractical. We perform an experimental analysis of the practical semi-streaming DFS algorithms. Our analysis ranges from real graphs to random graphs (uniform and power-law). We also present several heuristics to improve the state-of-the-art algorithms and study their impact. Our heuristics improve the state of the art by $40$-$90\%$, achieving the optimal single pass in almost $40$-$50\%$ of cases (improved from zero). On random graphs, they give improvements of $30$-$90\%$, again requiring only the optimal single pass even for very small values of $k$. Overall, our heuristics improve the relatively complex state-of-the-art algorithm significantly, requiring merely two passes in the worst case for random graphs. Additionally, they make the relatively simpler algorithm practically usable even for very small space bounds, which was impractical earlier.
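For concreteness, the sketch below implements the trivial one-pass, $O(m)$-space baseline mentioned above: the whole edge stream is stored and a DFS tree is computed in memory. The semi-streaming algorithms and heuristics studied in the paper instead bound the number of edges kept per pass; the toy stream is illustrative.

```python
from collections import defaultdict

def one_pass_dfs_tree(edge_stream, root):
    adj = defaultdict(list)
    for u, v in edge_stream:           # single pass over the stream, O(m) space
        adj[u].append(v)
        adj[v].append(u)
    parent = {root: None}
    stack = [(root, iter(adj[root]))]
    while stack:                       # iterative DFS (avoids recursion limits)
        u, it = stack[-1]
        v = next(it, None)
        if v is None:
            stack.pop()
        elif v not in parent:
            parent[v] = u
            stack.append((v, iter(adj[v])))
    return parent                      # DFS-tree parent pointers

stream = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(one_pass_dfs_tree(stream, root=1))  # {1: None, 2: 1, 3: 2, 4: 3}
```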
The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression in particular, is supported by the results of Kosmidis and Firth (2021, Biometrika), who show that the resulting estimates are always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters, and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically as a proportion of the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares (IWLS) for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing $O(n)$ quantities in memory and can operate with data sets that exceed computer memory or even hard-drive capacity. We achieve that through incremental QR decompositions, which enable the IWLS iterations to have access only to data chunks of predetermined size. Both procedures can also be readily adapted to fit generalized linear models when distinct parts of the data are stored across different sites and, due to privacy concerns, cannot be fully transferred across sites. We assess the procedures through a real-data application with millions of observations.
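The incremental-QR idea can be sketched as follows for a single weighted least-squares solve of the kind performed at each IWLS iteration. This is an illustration under simplified assumptions, not the implementation used in the paper: only the $p \times p$ factor $R$ and a $p$-vector are kept in memory, never the full $n \times p$ model matrix, and the chunked solve agrees with the in-memory solve.

```python
import numpy as np

def wls_by_chunks(chunks, p):
    """Weighted least squares from chunks (X, z, w): model matrix, working response, working weights."""
    R = np.zeros((0, p))
    Qtz = np.zeros(0)
    for X, z, w in chunks:
        sw = np.sqrt(w)
        stacked_X = np.vstack([R, sw[:, None] * X])      # previous R stacked on the new weighted chunk
        stacked_z = np.concatenate([Qtz, sw * z])
        Q, R = np.linalg.qr(stacked_X)                   # R is again p x p
        Qtz = Q.T @ stacked_z
    return np.linalg.solve(R, Qtz)                       # WLS coefficients

# Illustration on synthetic data: the chunked solve matches the in-memory solve
rng = np.random.default_rng(2)
X = rng.normal(size=(10_000, 4))
z = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=10_000)
w = rng.uniform(0.5, 1.5, size=10_000)
chunks = [(X[i:i + 1000], z[i:i + 1000], w[i:i + 1000]) for i in range(0, 10_000, 1000)]
beta_chunked = wls_by_chunks(chunks, p=4)
beta_full = np.linalg.lstsq(np.sqrt(w)[:, None] * X, np.sqrt(w) * z, rcond=None)[0]
print(np.allclose(beta_chunked, beta_full))  # True
```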