Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
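The easy direction of the characterization above (a Gaussian prior gives a linear conditional median) can be checked by hand. The following is a sketch under assumptions not stated in the abstract: a zero-mean prior with variance $\sigma^2$, and $Z$ independent of $X$.

```latex
% Assumptions (illustrative): X ~ N(0, \sigma^2), Z ~ N(0, 1), X independent of Z.
\begin{align*}
  Y = X + Z
  \quad &\Longrightarrow \quad
  X \mid Y = y \;\sim\; \mathcal{N}\!\left( \frac{\sigma^2}{1+\sigma^2}\, y,\; \frac{\sigma^2}{1+\sigma^2} \right), \\
  \operatorname{med}(X \mid Y = y) &= \mathbb{E}[X \mid Y = y] = \frac{\sigma^2}{1+\sigma^2}\, y .
\end{align*}
```

The posterior is Gaussian and hence symmetric about its mean, so the conditional median coincides with the conditional mean and is linear in $y$. The converse, that no other prior yields a linear conditional median, is the content of the result stated above and does not follow from this computation.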
Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when proving differential privacy bounds because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy bounds for shuffling models and one-iteration differentially private gradient descent (DP-GD) with random initializations using $f$-DP. We derive a closed-form expression of the trade-off function for shuffling models that outperforms the most up-to-date results based on $(\epsilon,\delta)$-DP. Moreover, we investigate the effects of random initialization on the privacy of one-iteration DP-GD. Our numerical computations of the trade-off function indicate that random initialization can enhance the privacy of DP-GD. Our analysis of $f$-DP guarantees for these mixture mechanisms relies on an inequality for trade-off functions introduced in this paper. This inequality implies the joint convexity of $F$-divergences. Finally, we study an $f$-DP analog of the advanced joint convexity of the hockey-stick divergence related to $(\epsilon,\delta)$-DP and apply it to analyze the privacy of mixture mechanisms.
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as data distribution. In these algorithms, workers compute possibly stale and stochastic gradients associated with their local data at some iteration back in history and then return those gradients to the server without synchronizing with other workers. We present a unified convergence theory for non-convex smooth functions in the heterogeneous regime. The proposed analysis provides convergence for pure asynchronous SGD and its various modifications. Moreover, our theory explains what affects the convergence rate and what can be done to improve the performance of asynchronous algorithms. In particular, we introduce a novel asynchronous method based on worker shuffling. As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling and shuffle-once mini-batch SGD. The derived rates match the best-known results for those algorithms, highlighting the tightness of our approach. Finally, our numerical evaluations support theoretical findings and show the good practical performance of our method.
A \emph{geometric graph} is a graph whose vertex set is a set of points in general position in the plane, and whose edges are straight line segments joining these points. We show that for every integer $k \ge 2$, there exists a constant $c>0$ such that the following holds. The edges of every dense geometric graph can be colored with $k$ colors such that the number of pairs of edges of the same color that cross is at most $(1/k-c)$ times the total number of pairs of edges that cross. The case where $k=2$ and $G$ is a complete geometric graph was proved by Aichholzer et al. [\emph{GD} 2019].
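To make the counted quantity concrete, here is a minimal Python sketch that counts crossing pairs and monochromatic crossing pairs for a given edge coloring. The function names and the small example are illustrative assumptions; this is not the coloring procedure from the result above. For reference, a uniformly random $k$-coloring makes each crossing pair monochromatic with probability $1/k$, which is the baseline the $(1/k-c)$ bound improves on.

```python
from itertools import combinations

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def edges_cross(e, f):
    """True if segments e and f cross at an interior point.

    Assumes points in general position (no three collinear), so strict
    inequalities suffice. Edges sharing an endpoint do not count.
    """
    a, b = e
    c, d = f
    if len({a, b, c, d}) < 4:
        return False
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)

def count_crossings(edges):
    """Total number of crossing pairs of edges."""
    return sum(edges_cross(e, f) for e, f in combinations(edges, 2))

def count_monochromatic_crossings(edges, coloring):
    """Number of crossing pairs whose two edges received the same color."""
    return sum(
        edges_cross(e, f) and coloring[e] == coloring[f]
        for e, f in combinations(edges, 2)
    )

# Complete geometric graph on the corners of a square: only the diagonals cross.
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
edges = list(combinations(points, 2))
diagonal = ((0, 0), (1, 1))
coloring = {e: (1 if e == diagonal else 0) for e in edges}

total = count_crossings(edges)
mono = count_monochromatic_crossings(edges, coloring)
print(total, mono)
```

The check is quadratic in the number of edges, which is fine for illustration but not for large dense graphs.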
We introduce a new class of algorithms, Stochastic Generalized Method of Moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish the almost sure convergence and the (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that as the sample size increases, the SGMM matches the standard (offline) GMM in terms of estimation accuracy while gaining in computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach by a proof of concept using two well-known empirical examples with large sample sizes.
Kuiper's $V_n$ statistic, a measure of the discrepancy between an ideal distribution and an empirical distribution, is of great significance in goodness-of-fit testing. However, Kuiper's formulae for computing the cumulative distribution function, false positive probability, and upper tail quantile of $V_n$ cannot be applied when the sample size $n$ is small, since the approximation error is $\mathcal{O}(n^{-1})$. Our contributions are threefold: first, the approximation error is reduced to $\mathcal{O}(n^{-(k+1)/2})$, where $k$ is the expansion order, by means of the \textit{high order expansion} (HOE) for the exponent of the differential operator; second, a novel high order formula with approximation error $\mathcal{O}(n^{-3})$ is obtained through extensive calculations; third, fixed-point algorithms are designed for solving the Kuiper pair of critical values and upper tail quantiles based on the novel formula. The high order expansion method for Kuiper's $V_n$ statistic is applicable wherever there are more than $5$ data samples. The principles, algorithms, and code for the high order expansion method are of interest for goodness-of-fit testing.
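For orientation, the statistic itself is simple to compute. The sketch below uses the textbook definition $V_n = D^+ + D^-$, where $D^+$ and $D^-$ are the largest deviations of the empirical distribution function above and below the reference CDF. The function name is an illustrative assumption, the normalization used in the work above may differ, and the high order expansion for the distribution of $V_n$ is not reproduced here.

```python
def kuiper_statistic(sample, cdf):
    """Kuiper's V_n = D+ + D- for a sample against a reference CDF.

    D+ is the largest amount by which the empirical CDF exceeds the
    reference CDF; D- is the largest amount by which it falls below.
    """
    xs = sorted(sample)
    n = len(xs)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus + d_minus

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

v = kuiper_statistic([0.25, 0.75], uniform_cdf)
print(v)  # 0.5
```

Unlike the Kolmogorov-Smirnov statistic, which takes the maximum of $D^+$ and $D^-$, the sum is invariant under cyclic shifts of the data, which is why $V_n$ is often used for circular data.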
The Weisfeiler-Leman (WL) dimension of a graph parameter $f$ is the minimum $k$ such that, if $G_1$ and $G_2$ are indistinguishable by the $k$-dimensional WL-algorithm then $f(G_1)=f(G_2)$. The WL-dimension of $f$ is $\infty$ if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers from a fixed conjunctive query to the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$. The works of Dvor\'ak (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (ArXiv 2023) have answered this question for full conjunctive queries, which are conjunctive queries without existentially quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph of $\varphi$. In this work, we give a characterisation that applies to all conjunctive queries. Given any conjunctive query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathsf{sew}(\varphi)$, a novel width measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are connected with the free variables. Using the recently established equivalence between the WL-algorithm and higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller than $\mathsf{sew}(\varphi)$.
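A small illustration of the definition, assuming the common convention in which the 1-dimensional WL-algorithm is color refinement (indexing conventions for higher dimensions vary in the literature). The Python sketch below shows that a 6-cycle and two disjoint triangles are indistinguishable by color refinement, yet the full triangle query has different numbers of answers on them. That matches the treewidth statement above, since the triangle has treewidth 2. All function names are illustrative.

```python
from collections import Counter

def color_refinement(adj):
    """Color refinement on a graph given as {vertex: set_of_neighbours}."""
    colors = {v: 0 for v in adj}
    for _ in range(len(adj)):
        # A vertex's signature is its color plus the multiset of neighbour colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # Classes only ever split, so an unchanged class count means stability.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
    return colors

def indistinguishable_by_color_refinement(adj_a, adj_b):
    """Refine the disjoint union and compare the color histograms of both sides."""
    union = {("a", v): {("a", u) for u in nbrs} for v, nbrs in adj_a.items()}
    union.update({("b", v): {("b", u) for u in nbrs} for v, nbrs in adj_b.items()})
    colors = color_refinement(union)
    hist_a = Counter(c for (side, _), c in colors.items() if side == "a")
    hist_b = Counter(c for (side, _), c in colors.items() if side == "b")
    return hist_a == hist_b

def count_triangle_answers(adj):
    """Answers (x, y, z) to the full query E(x,y) and E(y,z) and E(z,x)."""
    return sum(1 for x in adj for y in adj[x] for z in adj[y] if x in adj[z])

cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}

print(indistinguishable_by_color_refinement(cycle6, two_triangles))
print(count_triangle_answers(cycle6), count_triangle_answers(two_triangles))
```

Both graphs are 2-regular on six vertices, so refinement never splits the single initial color class, while the answer counts are 0 and 12.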
We present a structure-preserving Eulerian algorithm for solving $L^2$-gradient flows and a structure-preserving Lagrangian algorithm for solving generalized diffusions. Both algorithms employ neural networks as tools for spatial discretization. Unlike most existing methods that construct numerical discretizations based on the strong or weak form of the underlying PDE, the proposed schemes are constructed based on the energy-dissipation law directly. This guarantees the monotonic decay of the system's energy, which avoids unphysical states of solutions and is crucial for the long-term stability of numerical computations. To address challenges arising from nonlinear neural-network discretization, we first perform temporal discretization on these variational systems. This approach is computationally memory-efficient when implementing neural network-based algorithms. The proposed neural-network-based schemes are mesh-free, allowing us to solve gradient flows in high dimensions. Various numerical experiments are presented to demonstrate the accuracy and energy stability of the proposed numerical schemes.
We present an approach to the verification of systems whose description contains underspecified elements (constants or functions) that can be regarded as parameters. In particular, we describe a method for automatically generating constraints on such parameters under which certain safety conditions are guaranteed to hold. We present an implementation and illustrate its use on several examples.
Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion to avoid a one-sided tendency toward risk. We provide a perturbed distributional Bellman optimality operator by distorting the risk measure and prove the convergence and optimality of the proposed method under a weaker contraction property. Our theoretical results support that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments, including 55 Atari games.
The minimum linear ordering problem (MLOP) generalizes well-known combinatorial optimization problems such as minimum linear arrangement and minimum sum set cover. MLOP seeks to minimize an aggregated cost $f(\cdot)$ due to an ordering $\sigma$ of the items (say $[n]$), i.e., $\min_{\sigma} \sum_{i\in [n]} f(E_{i,\sigma})$, where $E_{i,\sigma}$ is the set of items mapped by $\sigma$ to indices $[i]$. Despite an extensive literature on MLOP variants and approximations for these, it was unclear whether the graphic matroid MLOP was NP-hard. We settle this question through non-trivial reductions from minimum latency vertex cover and minimum sum vertex cover problems. We further propose a new combinatorial algorithm for approximating monotone submodular MLOP, using the theory of principal partitions. This is in contrast to the rounding algorithm by Iwata, Tetali, and Tripathi [ITT2012], using the Lov\'asz extension of submodular functions. We show a $(2-\frac{1+\ell_{f}}{1+|E|})$-approximation for monotone submodular MLOP where $\ell_{f}=\frac{f(E)}{\max_{x\in E}f(\{x\})}$ satisfies $1 \leq \ell_f \leq |E|$. Our theory provides new approximation bounds for special cases of the problem, in particular a $(2-\frac{1+r(E)}{1+|E|})$-approximation for the matroid MLOP, where $f = r$ is the rank function of a matroid. We further show that minimum latency vertex cover (MLVC) is $\frac{4}{3}$-approximable, by which we also lower bound the integrality gap of its natural LP relaxation, which might be of independent interest.
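The objective $\sum_{i\in[n]} f(E_{i,\sigma})$ can be evaluated directly from its definition. The following Python sketch does so and solves a tiny instance by exhaustive search. The coverage function and the instance are illustrative assumptions, and the brute-force search is not the principal-partition algorithm described above.

```python
from itertools import permutations

def mlop_cost(order, f):
    """Sum of f over the prefixes E_1, E_2, ..., E_n of the given ordering."""
    return sum(f(frozenset(order[:i])) for i in range(1, len(order) + 1))

def brute_force_mlop(items, f):
    """Exhaustive search over all n! orderings; only viable for tiny n."""
    return min(permutations(items), key=lambda order: mlop_cost(order, f))

# Hypothetical instance: a coverage function, which is monotone and submodular.
covers = {"a": {1, 2, 3}, "b": {1}, "c": {1, 2}}

def coverage(subset):
    """Number of ground elements covered by the items in `subset`."""
    covered = set()
    for item in subset:
        covered |= covers[item]
    return len(covered)

best = brute_force_mlop(sorted(covers), coverage)
print(best, mlop_cost(best, coverage))  # ('b', 'c', 'a') 6
```

On this instance the cheapest ordering places the items that cover the least first, because every prefix cost is paid again at each later position.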