We study the fine-grained complexity of graph connectivity problems in unweighted undirected graphs. Recent development shows that all variants of edge connectivity problems, including single-source-single-sink, global, Steiner, single-source, and all-pairs connectivity, are solvable in $m^{1+o(1)}$ time, collapsing the complexity of these problems into the almost-linear-time regime. While, historically, vertex connectivity has been much harder, the recent results showed that both single-source-single-sink and global vertex connectivity can be solved in $m^{1+o(1)}$ time, raising the hope of putting all variants of vertex connectivity problems into the almost-linear-time regime too. We show that this hope is impossible, assuming conjectures on finding 4-cliques. Moreover, we essentially settle the complexity landscape by giving tight bounds for combinatorial algorithms in dense graphs. There are three separate regimes: (1) all-pairs and Steiner vertex connectivity have complexity $\hat{\Theta}(n^{4})$, (2) single-source vertex connectivity has complexity $\hat{\Theta}(n^{3})$, and (3) single-source-single-sink and global vertex connectivity have complexity $\hat{\Theta}(n^{2})$. For graphs with general density, we obtain tight bounds of $\hat{\Theta}(m^{2})$, $\hat{\Theta}(m^{1.5})$, $\hat{\Theta}(m)$, respectively, assuming Gomory-Hu trees for element connectivity can be computed in almost-linear time.
In this note, we prove that the following function space with absolutely convergent Fourier series \[ F_d:=\left\{ f\in L^2([0,1)^d)\:\middle| \: \|f\|:=\sum_{\boldsymbol{k}\in \mathbb{Z}^d}|\hat{f}(\boldsymbol{k})| \max\left(1,\min_{j\in \mathrm{supp}(\boldsymbol{k})}\log |k_j|\right) <\infty \right\}\] with $\hat{f}(\boldsymbol{k})$ being the $\boldsymbol{k}$-th Fourier coefficient of $f$ and $\mathrm{supp}(\boldsymbol{k}):=\{j\in \{1,\ldots,d\}\mid k_j\neq 0\}$ is polynomially tractable for multivariate integration in the worst-case setting. Here polynomial tractability means that the minimum number of function evaluations required to make the worst-case error less than or equal to a tolerance $\varepsilon$ grows only polynomially with respect to $\varepsilon^{-1}$ and $d$. It is important to remark that the function space $F_d$ is unweighted, that is, all variables contribute equally to the norm of functions. Our tractability result is in contrast to those for most of the unweighted integration problems studied in the literature, in which polynomial tractability does not hold and the problem suffers from the curse of dimensionality. Our proof is constructive in the sense that we provide an explicit quasi-Monte Carlo rule that attains a desired worst-case error bound.
With continuous outcomes, the average causal effect is typically defined using a contrast of expected potential outcomes. However, in the presence of skewed outcome data, the expectation may no longer be meaningful. In practice the typical approach is to either "ignore or transform" - ignore the skewness altogether or transform the outcome to obtain a more symmetric distribution, although neither approach is entirely satisfactory. Alternatively the causal effect can be redefined as a contrast of median potential outcomes, yet discussion of confounding-adjustment methods to estimate this parameter is limited. In this study we described and compared confounding-adjustment methods to address this gap. The methods considered were multivariable quantile regression, an inverse probability weighted (IPW) estimator, weighted quantile regression and two little-known implementations of g-computation for this problem. Motivated by a cohort investigation in the Longitudinal Study of Australian Children, we conducted a simulation study that found the IPW estimator, weighted quantile regression and g-computation implementations minimised bias when the relevant models were correctly specified, with g-computation additionally minimising the variance. These methods provide appealing alternatives to the common "ignore or transform" approach and multivariable quantile regression, enhancing our capability to obtain meaningful causal effect estimates with skewed outcome data.
In recent years, recommender systems have advanced rapidly, where embedding learning for users and items plays a critical role. A standard method learns a unique embedding vector for each user and item. However, such a method has two important limitations in real-world applications: 1) it is hard to learn embeddings that generalize well for users and items with rare interactions on their own; and 2) it may incur unbearably high memory costs when the number of users and items scales up. Existing approaches either can only address one of the limitations or have flawed overall performances. In this paper, we propose Clustered Embedding Learning (CEL) as an integrated solution to these two problems. CEL is a plug-and-play embedding learning framework that can be combined with any differentiable feature interaction model. It is capable of achieving improved performance, especially for cold users and items, with reduced memory cost. CEL enables automatic and dynamic clustering of users and items in a top-down fashion, where clustered entities jointly learn a shared embedding. The accelerated version of CEL has an optimal time complexity, which supports efficient online updates. Theoretically, we prove the identifiability and the existence of a unique optimal number of clusters for CEL in the context of nonnegative matrix factorization. Empirically, we validate the effectiveness of CEL on three public datasets and one business dataset, showing its consistently superior performance against current state-of-the-art methods. In particular, when incorporating CEL into the business model, it brings an improvement of $+0.6\%$ in AUC, which translates into a significant revenue gain; meanwhile, the size of the embedding table gets $2650$ times smaller.
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging \emph{partially observable} setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the \emph{revealing condition} -- A natural condition that requires the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds. We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a multiplicative fashion, and achieve significantly smaller gaps against the current best upper bounds, providing a solid starting point for future studies. In particular, for \emph{multi-step} revealing POMDPs, we show that (1) the latent state-space dependence is at least $\Omega(S^{1.5})$ in the PAC sample complexity, which is notably harder than the $\widetilde{\Theta}(S)$ scaling for fully-observable MDPs; (2) Any polynomial sublinear regret is at least $\Omega(T^{2/3})$, suggesting its fundamental difference from the \emph{single-step} case where $\widetilde{O}(\sqrt{T})$ regret is achievable. Technically, our hard instance construction adapts techniques in \emph{distribution testing}, which is new to the RL literature and may be of independent interest.
In this paper, we introduce a new simple approach to developing and establishing the convergence of splitting methods for a large class of stochastic differential equations (SDEs), including additive, diagonal and scalar noise types. The central idea is to view the splitting method as a replacement of the driving signal of an SDE, namely Brownian motion and time, with a piecewise linear path that yields a sequence of ODEs $-$ which can be discretised to produce a numerical scheme. This new way of understanding splitting methods is inspired by, but does not use, rough path theory. We show that when the driving piecewise linear path matches certain iterated stochastic integrals of Brownian motion, then a high order splitting method can be obtained. We propose a general proof methodology for establishing the strong convergence of these approximations that is akin to the general framework of Milstein and Tretyakov. That is, once local error estimates are obtained for the splitting method, then a global rate of convergence follows. This approach can then be readily applied in future research on SDE splitting methods. By incorporating recently developed approximations for iterated integrals of Brownian motion into these piecewise linear paths, we propose several high order splitting methods for SDEs satisfying a certain commutativity condition. In our experiments, which include the Cox-Ingersoll-Ross model and additive noise SDEs (noisy anharmonic oscillator, stochastic FitzHugh-Nagumo model, underdamped Langevin dynamics), the new splitting methods exhibit convergence rates of $O(h^{3/2})$ and outperform schemes previously proposed in the literature.
During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Lojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes studied in stochastic optimization.
We give an improved theoretical analysis of score-based generative modeling. Under a score estimate with small $L^2$ error (averaged across timesteps), we provide efficient convergence guarantees for any data distribution with second-order moment, by either employing early stopping or assuming smoothness condition on the score function of the data distribution. Our result does not rely on any log-concavity or functional inequality assumption and has a logarithmic dependence on the smoothness. In particular, we show that under only a finite second moment condition, approximating the following in reverse KL divergence in $\epsilon$-accuracy can be done in $\tilde O\left(\frac{d \log (1/\delta)}{\epsilon}\right)$ steps: 1) the variance-$\delta$ Gaussian perturbation of any data distribution; 2) data distributions with $1/\delta$-smooth score functions. Our analysis also provides a quantitative comparison between different discrete approximations and may guide the choice of discretization points in practice.
The Lov\'{a}sz Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection $\mathcal B$ of "bad" events which are mostly independent and have low probability. In its simplest "symmetric" form, it asserts that whenever a bad-event has probability $p$ and affects at most $d$ bad-events, and $e p d < 1$, then a configuration avoiding all $\mathcal B$ exists. A seminal algorithm of Moser & Tardos (2010) gives nearly-automatic randomized algorithms for most constructions based on the LLL. However, deterministic algorithms have lagged behind. We address three specific shortcomings of the prior deterministic algorithms. First, our algorithm applies to the LLL criterion of Shearer (1985); this is more powerful than alternate LLL criteria and also removes a number of nuisance parameters and leads to cleaner and more legible bounds. Second, we provide parallel algorithms with much greater flexibility in the functional form of of the bad-events. Third, we provide a derandomized version of the MT-distribution, that is, the distribution of the variables at the termination of the MT algorithm. We show applications to non-repetitive vertex coloring, independent transversals, strong coloring, and other problems. These give deterministic algorithms which essentially match the best previous randomized sequential and parallel algorithms.
Safety in the automotive domain is a well-known topic, which has been in constant development in the past years. The complexity of new systems that add more advanced components in each function has opened new trends that have to be covered from the safety perspective. In this case, not only specifications and requirements have to be covered but also scenarios, which cover all relevant information of the vehicle environment. Many of them are not yet still sufficient defined or considered. In this context, Safety of the Intended Functionality (SOTIF) appears to ensure the system when it might fail because of technological shortcomings or misuses by users. An identification of the plausibly insufficiencies of ADAS/ADS functions has to be done to discover the potential triggering conditions that can lead to these unknown scenarios, which might effect a hazardous behaviour. The main goal of this publication is the definition of an use case to identify these triggering conditions that have been applied to the collision avoidance function implemented in our self-developed mobile Hardware-in-Loop (HiL) platform.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.