A minimum chain cover (MCC) of a $k$-width directed acyclic graph (DAG) $G = (V, E)$ is a set of $k$ chains (paths in the transitive closure) of $G$ such that every vertex appears in at least one chain in the cover. The state-of-the-art solutions for MCC run in time $\tilde{O}(k(|V|+|E|))$ [M\"akinen et at., TALG], $O(T_{MF}(|E|) + k|V|)$, $O(k^2|V| + |E|)$ [C\'aceres et al., SODA 2022], $\tilde{O}(|V|^{3/2} + |E|)$ [Kogan and Parter, ICALP 2022] and $\tilde{O}(T_{MCF}(|E|) + \sqrt{k}|V|)$ [Kogan and Parter, SODA 2023], where $T_{MF}(|E|)$ and $T_{MCF}(|E|)$ are the running times for solving maximum flow (MF) and minimum-cost flow (MCF), respectively. In this work we present an algorithm running in time $O(T_{MF}(|E|) + (|V|+|E|)\log{k})$. By considering the recent result for solving MF [Chen et al., FOCS 2022] our algorithm is the first running in almost linear time. Moreover, our techniques are deterministic and derive a deterministic near-linear time algorithm for MCC if the same is provided for MF. At the core of our solution we use a modified version of the mergeable dictionaries [Farach and Thorup, Algorithmica], [Iacono and \"Ozkan, ICALP 2010] data structure boosted with the SIZE-SPLIT operation and answering queries in amortized logarithmic time, which can be of independent interest.
Gaussian processes (GPs) are widely-used tools in spatial statistics and machine learning and the formulae for the mean function and covariance kernel of a GP $v$ that is the image of another GP $u$ under a linear transformation $T$ acting on the sample paths of $u$ are well known, almost to the point of being folklore. However, these formulae are often used without rigorous attention to technical details, particularly when $T$ is an unbounded operator such as a differential operator, which is common in several modern applications. This note provides a self-contained proof of the claimed formulae for the case of a closed, densely-defined operator $T$ acting on the sample paths of a square-integrable stochastic process. Our proof technique relies upon Hille's theorem for the Bochner integral of a Banach-valued random variable.
Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory. A commonly applied approach is to first explore the environment (exploration), learn an accurate model of it (system identification), and then compute an optimal controller with the minimum cost on this estimated system (policy optimization). While existing work has shown that it is possible to learn a uniformly good model of the system~\citep{mania2020active}, in practice, if we aim to learn a good controller with a low cost on the actual system, certain system parameters may be significantly more critical than others, and we therefore ought to focus our exploration on learning such parameters. In this work, we consider the setting of nonlinear dynamical systems and seek to formally quantify, in such settings, (a) which parameters are most relevant to learning a good controller, and (b) how we can best explore so as to minimize uncertainty in such parameters. Inspired by recent work in linear systems~\citep{wagenmaker2021task}, we show that minimizing the controller loss in nonlinear systems translates to estimating the system parameters in a particular, task-dependent metric. Motivated by this, we develop an algorithm able to efficiently explore the system to reduce uncertainty in this metric, and prove a lower bound showing that our approach learns a controller at a near-instance-optimal rate. Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest. We conclude with experiments demonstrating the effectiveness of our method in realistic nonlinear robotic systems.
In this paper we consider an orthonormal basis, generated by a tensor product of Fourier basis functions, half period cosine basis functions, and the Chebyshev basis functions. We deal with the approximation problem in high dimensions related to this basis and design a fast algorithm to multiply with the underlying matrix, consisting of rows of the non-equidistant Fourier matrix, the non-equidistant cosine matrix and the non-equidistant Chebyshev matrix, and its transposed. This leads us to an ANOVA (analysis of variance) decomposition for functions with partially periodic boundary conditions through using the Fourier basis in some dimensions and the half period cosine basis or the Chebyshev basis in others. We consider sensitivity analysis in this setting, in order to find an adapted basis for the underlying approximation problem. More precisely, we find the underlying index set of the multidimensional series expansion. Additionally, we test this ANOVA approximation with mixed basis at numerical experiments, and refer to the advantage of interpretable results.
A treatment policy defines when and what treatments are applied to affect some outcome of interest. Data-driven decision-making requires the ability to predict what happens if a policy is changed. Existing methods that predict how the outcome evolves under different scenarios assume that the tentative sequences of future treatments are fixed in advance, while in practice the treatments are determined stochastically by a policy and may depend, for example, on the efficiency of previous treatments. Therefore, the current methods are not applicable if the treatment policy is unknown or a counterfactual analysis is needed. To handle these limitations, we model the treatments and outcomes jointly in continuous time, by combining Gaussian processes and point processes. Our model enables the estimation of a treatment policy from observational sequences of treatments and outcomes, and it can predict the interventional and counterfactual progression of the outcome after an intervention on the treatment policy (in contrast with the causal effect of a single treatment). We show with real-world and semi-synthetic data on blood glucose progression that our method can answer causal queries more accurately than existing alternatives.
Bayesian inference and the use of posterior or posterior predictive probabilities for decision making have become increasingly popular in clinical trials. The current approach toward Bayesian clinical trials is, however, a hybrid Bayesian-frequentist approach where the design and decision criteria are assessed with respect to frequentist operating characteristics such as power and type I error rate. These operating characteristics are commonly obtained via simulation studies. In this article we propose methodology to utilize large sample theory of the posterior distribution to define simple parametric models for the sampling distribution of the Bayesian test statistics, i.e., posterior tail probabilities. The parameters of these models are then estimated using a small number of simulation scenarios, thereby refining these models to capture the sampling distribution for small to moderate sample size. The proposed approach toward assessment of operating characteristics and sample size determination can be considered as simulation-assisted rather than simulation-based and significantly reduces the computational burden for design of Bayesian trials.
Representation learning based on multi-task pretraining has become a powerful approach in many domains. In particular, task-aware representation learning aims to learn an optimal representation for a specific target task by sampling data from a set of source tasks, while task-agnostic representation learning seeks to learn a universal representation for a class of tasks. In this paper, we propose a general and versatile algorithmic and theoretic framework for \textit{active representation learning}, where the learner optimally chooses which source tasks to sample from. This framework, along with a tractable meta algorithm, allows most arbitrary target and source task spaces (from discrete to continuous), covers both task-aware and task-agnostic settings, and is compatible with deep representation learning practices. We provide several instantiations under this framework, from bilinear and feature-based nonlinear to general nonlinear cases. In the bilinear case, by leveraging the non-uniform spectrum of the task representation and the calibrated source-target relevance, we prove that the sample complexity to achieve $\varepsilon$-excess risk on target scales with $ (k^*)^2 \|v^*\|_2^2 \varepsilon^{-2}$ where $k^*$ is the effective dimension of the target and $\|v^*\|_2^2 \in (0,1]$ represents the connection between source and target space. Compared to the passive one, this can save up to $\frac{1}{d_W}$ of sample complexity, where $d_W$ is the task space dimension. Finally, we demonstrate different instantiations of our meta algorithm in synthetic datasets and robotics problems, from pendulum simulations to real-world drone flight datasets. On average, our algorithms outperform baselines by $20\%-70\%$.
Motivated by the virtual machine scheduling problem in today's computing systems, we propose a new setting of stochastic bin-packing in service systems that allows the item sizes (job resource requirements) to vary over time. In this setting, items (jobs) arrive to the system, vary their sizes, and depart from the system following certain Markovian assumptions. We focus on minimizing the expected number of non-empty bins (active servers) in steady state, where the expectation in steady state is equal to the long-run time-average with probability $1$ under the Markovian assumptions. Our main result is a policy that achieves an optimality gap of $O(\sqrt{r})$ in the objective, where the optimal objective value is $\Theta(r)$ and $r$ is a scaling factor such that the item arrival intensity scales linearly with it. When specialized to the setting where the item sizes do not vary over time, our result improves upon the state-of-the-art $o(r)$ optimality gap. Our technical approach highlights a novel policy conversion framework that reduces the policy design problem to that in a single-bin (single-server) system.
Causal effect estimation from observational data is a fundamental task in empirical sciences. It becomes particularly challenging when unobserved confounders are involved in a system. This paper focuses on front-door adjustment -- a classic technique which, using observed mediators allows to identify causal effects even in the presence of unobserved confounding. While the statistical properties of the front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Barenboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given directed acyclic graph (DAG), with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the causal graph. In our work, we give the first linear-time, i.e., $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity. This result implies an $O(n(n+m))$ delay enumeration algorithm of all front-door adjustment sets, again improving previous work by Jeong et al. by a factor of $n^3$. Moreover, we provide the first linear-time algorithm for finding a minimal front-door adjustment set. We offer implementations of our algorithms in multiple programming languages to facilitate practical usage and empirically validate their feasibility, even for large graphs.
Seeking the equivalent entities among multi-source Knowledge Graphs (KGs) is the pivotal step to KGs integration, also known as \emph{entity alignment} (EA). However, most existing EA methods are inefficient and poor in scalability. A recent summary points out that some of them even require several days to deal with a dataset containing 200,000 nodes (DWY100K). We believe over-complex graph encoder and inefficient negative sampling strategy are the two main reasons. In this paper, we propose a novel KG encoder -- Dual Attention Matching Network (Dual-AMN), which not only models both intra-graph and cross-graph information smartly, but also greatly reduces computational complexity. Furthermore, we propose the Normalized Hard Sample Mining Loss to smoothly select hard negative samples with reduced loss shift. The experimental results on widely used public datasets indicate that our method achieves both high accuracy and high efficiency. On DWY100K, the whole running process of our method could be finished in 1,100 seconds, at least 10* faster than previous work. The performances of our method also outperform previous works across all datasets, where Hits@1 and MRR have been improved from 6% to 13%.
Deep learning techniques have received much attention in the area of image denoising. However, there are substantial differences in the various types of deep learning methods dealing with image denoising. Specifically, discriminative learning based on deep learning can ably address the issue of Gaussian noise. Optimization models based on deep learning are effective in estimating the real noise. However, there has thus far been little related research to summarize the different deep learning techniques for image denoising. In this paper, we offer a comparative study of deep techniques in image denoising. We first classify the deep convolutional neural networks (CNNs) for additive white noisy images; the deep CNNs for real noisy images; the deep CNNs for blind denoising and the deep CNNs for hybrid noisy images, which represents the combination of noisy, blurred and low-resolution images. Then, we analyze the motivations and principles of the different types of deep learning methods. Next, we compare the state-of-the-art methods on public denoising datasets in terms of quantitative and qualitative analysis. Finally, we point out some potential challenges and directions of future research.