We revisit the Heaviest Induced Ancestors (HIA) problem that was introduced by Gagie, Gawrychowski, and Nekrich [CCCG 2013] and has a number of applications in string algorithms. Let $T_1$ and $T_2$ be two rooted trees whose nodes have weights that are increasing in all root-to-leaf paths, and labels on the leaves, such that no two leaves of a tree have the same label. A pair of nodes $(u, v)\in T_1 \times T_2$ is \emph{induced} if and only if there is a label shared by leaf-descendants of $u$ and $v$. In an HIA query, given nodes $x \in T_1$ and $y \in T_2$, the goal is to find an induced pair of nodes $(u, v)$ of the maximum total weight such that $u$ is an ancestor of~$x$ and $v$ is an ancestor of $y$. Let $n$ be the upper bound on the sizes of the two trees. It is known that no data structure of size $\tilde{\mathcal{O}}(n)$ can answer HIA queries in $o(\log n / \log \log n)$ time [Charalampopoulos, Gawrychowski, Pokorski; ICALP 2020]. This (unconditional) lower bound is a $\operatorname{polyloglog} n$ factor away from the query time of the fastest $\tilde{\mathcal{O}}(n)$-size data structure known to date for the HIA problem [Abedin, Hooshmand, Ganguly, Thankachan; Algorithmica 2022]. In this work, we resolve the query-time complexity of the HIA problem for the near-linear space regime by presenting a data structure that can be built in $\tilde{\mathcal{O}}(n)$ time and answers HIA queries in $\mathcal{O}(\log n/\log\log n)$ time. As a direct corollary, we obtain an $\tilde{\mathcal{O}}(n)$-size data structure that maintains the LCS of a static string and a dynamic string, both of length at most $n$, in time optimal for this space regime. The main ingredients of our approach are fractional cascading and the utilization of an $\mathcal{O}(\log n/ \log\log n)$-depth tree decomposition.
This work introduces a method to select linear functional measurements of a vector-valued time series optimized for forecasting distant time-horizons. By formulating and solving the problem of sequential linear measurement design as an infinite-horizon problem with the time-averaged trace of the Cram\'{e}r-Rao lower bound (CRLB) for forecasting as the cost, the most informative data can be collected irrespective of the eventual forecasting algorithm. By introducing theoretical results regarding measurements under additive noise from natural exponential families, we construct an equivalent problem from which a local dimensionality reduction can be derived. This alternative formulation is based on the future collapse of dimensionality inherent in the limiting behavior of many differential equations and can be directly observed in the low-rank structure of the CRLB for forecasting. Implementations of both an approximate dynamic programming formulation and the proposed alternative are illustrated using an extended Kalman filter for state estimation, with results on simulated systems with limit cycles and chaotic behavior demonstrating a linear improvement in the CRLB as a function of the number of collapsing dimensions of the system.
Beeping models are models for networks of weak devices, such as sensor networks or biological networks. In these networks, nodes are allowed to communicate only via emitting beeps: unary pulses of energy. Listening nodes only the capability of {\it carrier sensing}: they can only distinguish between the presence or absence of a beep, but receive no other information. The noisy beeping model further assumes listening nodes may be disrupted by random noise. Despite this extremely restrictive communication model, it transpires that complex distributed tasks can still be performed by such networks. In this paper we provide an optimal procedure for simulating general message passing in the beeping and noisy beeping models. We show that a round of \textsf{Broadcast CONGEST} can be simulated in $O(\Delta\log n)$ round of the noisy (or noiseless) beeping model, and a round of \textsf{CONGEST} can be simulated in $O(\Delta^2\log n)$ rounds (where $\Delta$ is the maximum degree of the network). We also prove lower bounds demonstrating that no simulation can use asymptotically fewer rounds. This allows a host of graph algorithms to be efficiently implemented in beeping models. As an example, we present an $O(\log n)$-round \textsf{Broadcast CONGEST} algorithm for maximal matching, which, when simulated using our method, immediately implies a near-optimal $O(\Delta \log^2 n)$-round maximal matching algorithm in the noisy beeping model.
Deep learning algorithms have recently shown to be a successful tool in estimating parameters of statistical models for which simulation is easy, but likelihood computation is challenging. But the success of these approaches depends on simulating parameters that sufficiently reproduce the observed data, and, at present, there is a lack of efficient methods to produce these simulations. We develop new black-box procedures to estimate parameters of statistical models based only on weak parameter structure assumptions. For well-structured likelihoods with frequent occurrences, such as in time series, this is achieved by pre-training a deep neural network on an extensive simulated database that covers a wide range of data sizes. For other types of complex dependencies, an iterative algorithm guides simulations to the correct parameter region in multiple rounds. These approaches can successfully estimate and quantify the uncertainty of parameters from non-Gaussian models with complex spatial and temporal dependencies. The success of our methods is a first step towards a fully flexible automatic black-box estimation framework.
In this paper we introduce a variant of optimal transport adapted to the causal structure given by an underlying directed graph. Different graph structures lead to different specifications of the optimal transport problem. For instance, a fully connected graph yields standard optimal transport, a linear graph structure corresponds to adapted optimal transport, and an empty graph leads to a notion of optimal transport related to CO-OT, Gromov-Wasserstein distances and factored OT. We derive different characterizations of causal transport plans and introduce Wasserstein distances between causal models that respect the underlying graph structure. We show that average treatment effects are continuous with respect to causal Wasserstein distances and small perturbations of structural causal models lead to small deviations in causal Wasserstein distance. We also introduce an interpolation between causal models based on causal Wasserstein distance and compare it to standard Wasserstein interpolation.
We consider a multi-process remote estimation system observing $K$ independent Ornstein-Uhlenbeck processes. In this system, a shared sensor samples the $K$ processes in such a way that the long-term average sum mean square error (MSE) is minimized. The sensor operates under a total sampling frequency constraint $f_{\max}$. The samples from all processes consume random processing delays in a shared queue and then are transmitted over an erasure channel with probability $\epsilon$. We study two variants of the problem: first, when the samples are scheduled according to a Maximum-Age-First (MAF) policy, and the receiver provides an erasure status feedback; and second, when samples are scheduled according to a Round-Robin (RR) policy, when there is no erasure status feedback from the receiver. Aided by optimal structural results, we show that the optimal sampling policy for both settings, under some conditions, is a \emph{threshold policy}. We characterize the optimal threshold and the corresponding optimal long-term average sum MSE as a function of $K$, $f_{\max}$, $\epsilon$, and the statistical properties of the observed processes. Our results show that, with an exponentially distributed service rate, the optimal threshold $\tau^*$ increases as the number of processes $K$ increases, for both settings. Additionally, we show that the optimal threshold is an \emph{increasing} function of $\epsilon$ in the case of \emph{available} erasure status feedback, while it exhibits the \emph{opposite behavior}, i.e., $\tau^*$ is a \emph{decreasing} function of $\epsilon$, in the case of \emph{absent} erasure status feedback.
We investigate the convergence and convergence rate of stochastic training algorithms for Neural Networks (NNs) that have been inspired by Dropout (Hinton et al., 2012). With the goal of avoiding overfitting during training of NNs, dropout algorithms consist in practice of multiplying the weight matrices of a NN componentwise by independently drawn random matrices with $\{0, 1 \}$-valued entries during each iteration of Stochastic Gradient Descent (SGD). This paper presents a probability theoretical proof that for fully-connected NNs with differentiable, polynomially bounded activation functions, if we project the weights onto a compact set when using a dropout algorithm, then the weights of the NN converge to a unique stationary point of a projected system of Ordinary Differential Equations (ODEs). After this general convergence guarantee, we go on to investigate the convergence rate of dropout. Firstly, we obtain generic sample complexity bounds for finding $\epsilon$-stationary points of smooth nonconvex functions using SGD with dropout that explicitly depend on the dropout probability. Secondly, we obtain an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for NNs with the shape of arborescences of arbitrary depth and with linear activation functions. The latter bound shows that for an algorithm such as Dropout or Dropconnect (Wan et al., 2013), the convergence rate can be impaired exponentially by the depth of the arborescence. In contrast, we experimentally observe no such dependence for wide NNs with just a few dropout layers. We also provide a heuristic argument for this observation. Our results suggest that there is a change of scale of the effect of the dropout probability in the convergence rate that depends on the relative size of the width of the NN compared to its depth.
In this paper we deal with global approximation of solutions of stochastic differential equations (SDEs) driven by countably dimensional Wiener process. Under certain regularity conditions imposed on the coefficients, we show lower bounds for exact asymptotic error behaviour. For that reason, we analyse separately two classes of admissible algorithms: based on equidistant, and possibly not equidistant meshes. Our results indicate that in both cases, decrease of any method error requires significant increase of the cost term, which is illustrated by the product of cost and error diverging to infinity. This is, however, not visible in the finite dimensional case. In addition, we propose an implementable, path-independent Euler algorithm with adaptive step-size control, which is asymptotically optimal among algorithms using specified truncation levels of the underlying Wiener process. Our theoretical findings are supported by numerical simulation in Python language.
The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. A standard approach reduces the problem to a finite set of marginal moment conditions and applies the optimally weighted generalized method of moments (OWGMM), but this requires we know a finite set of identifying moments, can still be inefficient even if identifying, or can be theoretically efficient but practically unwieldy if we use a growing sieve of moment conditions. Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem, which we term the variational method of moments (VMM) and which naturally enables controlling infinitely-many moments. We provide a detailed theoretical analysis of multiple VMM estimators, including ones based on kernel methods and neural nets, and provide conditions under which these are consistent, asymptotically normal, and semiparametrically efficient in the full conditional moment model. We additionally provide algorithms for valid statistical inference based on the same kind of variational reformulations, both for kernel- and neural-net-based varieties. Finally, we demonstrate the strong performance of our proposed estimation and inference algorithms in a detailed series of synthetic experiments.
We present a solution to consensus on a torus with Byzantine faults. Any solution to classic consensus that is tolerant to $f$ Byzantine faults requires $2f+1$ node-disjoint paths. Due to limited torus connectivity, this bound necessitates spatial separation between faults. Our solution does not require this many disjoint paths and tolerates dense faults. Specifically, we consider the case where all faults are in the one column. We address the version of consensus where only processes in fault-free columns must agree. We prove that even this weaker version is not solvable if the column may be completely faulty. We then present a solution for the case where at least one row is fault-free. The correct processes share orientation but do not know the identities of other processes or the torus dimensions. The communication is synchronous. To achieve our solution, we build and prove correct an all-to-all broadcast algorithm \PROG{BAT} that guarantees delivery to all processes in fault-free columns. We use this algorithm to solve our weak consensus problem. Our solution, \PROG{CBAT}, runs in $O(H+W)$ rounds, where $H$ and $W$ are torus height and width respectively. We extend our consensus solution to the fixed message size model where it runs in $O(H^3W^2)$ rounds. Our results are immediately applicable if the faults are located in a single row, rather than a column.
We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.