We consider a simple one-way averaging protocol on graphs. Initially, every node of the graph has a value. A node $u$ is chosen uniformly at random, and $u$ samples $k$ neighbours $v_1, v_2, \dots, v_k \in N(u)$ uniformly at random. Then, $u$ averages its value with those of the sampled neighbours as follows: $\xi_u(t+1) = \alpha \xi_u(t) + \frac{1-\alpha}{k} \sum_{i=1}^k \xi_{v_i}(t)$ for some $\alpha \in (0,1)$, where $\xi_u(t)$ is the value of node $u$ at time $t$. Note that, in contrast to neighbourhood value balancing, only $u$ changes its value. Hence, the sum (and the average) of the values of all nodes changes over time. Our results are two-fold. First, we show a bound on the convergence time (the time it takes until all values are roughly the same) that is asymptotically tight for some initial assignments of values to the nodes. Our second set of results concerns the ability of this protocol to approximate the initial average of all values well: we bound the probability that the final outcome deviates significantly from the initial average. Interestingly, the variance of the outcome does not depend on the graph structure. The proof introduces an interesting generalisation of the duality between coalescing random walks and the voter model.
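As a concrete illustration of the update rule above, here is a minimal simulation sketch; the cycle graph, the parameter values, and sampling neighbours with replacement are illustrative assumptions, not choices taken from the paper.

```python
import random

def one_way_averaging(values, neighbours, alpha=0.5, k=2, steps=20000, seed=0):
    """Simulate the protocol: at each step a uniformly random node u samples
    k neighbours (here: with replacement) and moves its value towards their
    average; only u's value changes, so the global sum is not conserved."""
    rng = random.Random(seed)
    xi = list(values)
    nodes = range(len(xi))
    for _ in range(steps):
        u = rng.choice(nodes)
        sampled = [rng.choice(neighbours[u]) for _ in range(k)]
        xi[u] = alpha * xi[u] + (1 - alpha) * sum(xi[v] for v in sampled) / k
    return xi

# Toy instance: a cycle on 10 nodes with initial values 0..9.
n = 10
neigh = {u: [(u - 1) % n, (u + 1) % n] for u in range(n)}
final = one_way_averaging(range(n), neigh)
print(max(final) - min(final))  # the spread shrinks towards zero
```

In line with the one-way nature of the update, the final common value generally differs from the initial average 4.5, which is the deviation the second set of results quantifies.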
We give an improved theoretical analysis of score-based generative modeling. Under a score estimate with small $L^2$ error (averaged across timesteps), we provide efficient convergence guarantees for any data distribution with a finite second moment, by either employing early stopping or assuming a smoothness condition on the score function of the data distribution. Our result does not rely on any log-concavity or functional inequality assumption and has a logarithmic dependence on the smoothness. In particular, we show that under only a finite second moment condition, approximating the following distributions to $\epsilon$-accuracy in reverse KL divergence can be done in $\tilde O\left(\frac{d \log (1/\delta)}{\epsilon}\right)$ steps: 1) the variance-$\delta$ Gaussian perturbation of any data distribution; 2) data distributions with $1/\delta$-smooth score functions. Our analysis also provides a quantitative comparison between different discrete approximations and may guide the choice of discretization points in practice.
This paper presents a novel mechanism design for multi-item auction settings in which bidders' type distributions are uncertain. Our approach uses nonparametric density estimation to accurately estimate bidders' types from historical bids, and is built upon the Vickrey-Clarke-Groves (VCG) mechanism, ensuring satisfaction of Bayesian incentive compatibility (BIC) and $\delta$-individual rationality (IR). To further enhance the efficiency of our mechanism, we introduce two novel strategies for query reduction: a filtering method that screens potential winners' value regions within the confidence intervals generated by our estimated distribution, and a classification strategy that designates the lower bound of an interval as the estimated type when the interval length is below a threshold. Simulation experiments on both small-scale and large-scale data demonstrate that our mechanism consistently outperforms existing methods in terms of revenue maximization and query reduction, particularly in large-scale scenarios. This makes the proposed mechanism a practical and effective option for sellers in multi-item auctions.
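For background, the VCG mechanism that the proposed design builds on can be sketched in the special case of additive (per-item) valuations, where the welfare-maximizing allocation assigns each item to its highest bidder and the Clarke pivot payment reduces to the second-highest bid on each item; this toy sketch omits the paper's type estimation and query-reduction machinery entirely.

```python
def vcg_additive_items(bids):
    """VCG for a multi-item auction with additive valuations.
    bids[i][j] is bidder i's bid for item j. Each item goes to its highest
    bidder; the winner's Clarke pivot payment for that item is the
    second-highest bid (the externality imposed on the others)."""
    n_bidders, n_items = len(bids), len(bids[0])
    allocation, payments = {}, [0.0] * n_bidders
    for j in range(n_items):
        order = sorted(range(n_bidders), key=lambda i: bids[i][j], reverse=True)
        winner, runner_up = order[0], order[1]
        allocation[j] = winner
        payments[winner] += bids[runner_up][j]
    return allocation, payments

alloc, pay = vcg_additive_items([[10, 3], [7, 8]])
print(alloc, pay)  # item 0 -> bidder 0 pays 7; item 1 -> bidder 1 pays 3
```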
Generalized approximate message passing (GAMP) is a computationally efficient algorithm for estimating an unknown signal \(w_0\in\mathbb{R}^N\) from a random linear measurement \(y= Xw_0 + \epsilon\in\mathbb{R}^M\), where \(X\in\mathbb{R}^{M\times N}\) is a known measurement matrix and \(\epsilon\) is the noise vector. The salient feature of GAMP is that it provides an unbiased estimator \(\hat{r}^{\rm G}\sim\mathcal{N}(w_0, \hat{s}^2I_N)\), which can be used for various hypothesis-testing methods. In this study, we consider the bootstrap average of the unbiased estimator of GAMP for the elastic net. By numerically analyzing the state evolution of \emph{approximate message passing with resampling}, which was proposed for computing bootstrap statistics of the elastic net estimator, we investigate when bootstrap averaging reduces the variance of the unbiased estimator, as well as the effect of optimizing the size of each bootstrap sample and the hyperparameters of the elastic net regularization, in the asymptotic setting \(M, N\to\infty, M/N\to\alpha\in(0,\infty)\). The results indicate that bootstrap averaging effectively reduces the variance of the unbiased estimator when the actual data-generation process is inconsistent with the sparsity assumption of the regularization and the sample size is small. Furthermore, we find that when \(w_0\) is less sparse and the data size is small, the system undergoes a phase transition. The phase transition indicates the existence of a region where the ensemble average of the unbiased estimators of GAMP for the elastic net yields the unbiased estimator with minimum variance.
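As a toy stand-in for the idea of bootstrap-averaging a sparsity-inducing estimator (far simpler than the paper's AMP-with-resampling state evolution), consider scalar soft thresholding, the shrinkage operation underlying lasso/elastic-net estimators, averaged over bootstrap resamples; all names, the signal level, and the resample-size parameter are illustrative assumptions.

```python
import random, statistics

def soft_threshold(x, lam):
    """Scalar soft thresholding -- the sparsity-inducing shrinkage at the
    heart of lasso/elastic-net estimators in the simplest setting."""
    return max(abs(x) - lam, 0.0) * (1.0 if x > 0 else -1.0)

def bootstrap_average(y, lam, n_boot=200, frac=1.0, seed=0):
    """Average the thresholded sample mean over bootstrap resamples of y;
    frac sets each bootstrap sample's size relative to len(y), mirroring
    the tunable bootstrap sample size studied in the paper."""
    rng = random.Random(seed)
    m = max(1, int(frac * len(y)))
    ests = [soft_threshold(statistics.fmean(rng.choices(y, k=m)), lam)
            for _ in range(n_boot)]
    return statistics.fmean(ests)

rng = random.Random(1)
y = [0.3 + rng.gauss(0, 1) for _ in range(20)]  # weak non-sparse signal, small n
plain = soft_threshold(statistics.fmean(y), 0.2)
bagged = bootstrap_average(y, 0.2)
print(plain, bagged)
```

Averaging smooths the kink of the threshold, which is the intuition for why bagging can reduce variance precisely when the signal is not actually sparse and the sample is small.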
We expect the generalization error to improve with more samples from a similar task and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task improves before deteriorating beyond a threshold. In other words, there is value in training on small amounts of OOD data. We use Fisher's Linear Discriminant on synthetic datasets and deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet to demonstrate and analyze this phenomenon. In the idealized setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risks. While its practical utility is limited, this does suggest that if we can detect OOD samples, then there may be ways to benefit from them. When we do not know which samples are OOD, we show that a number of go-to strategies, such as data augmentation, hyper-parameter optimization, and pre-training, are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset.
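The weighted objective mentioned above can be sketched in the simplest possible setting: a scalar linear model fit by gradient descent on a convex combination of the target and OOD empirical risks. The model, the synthetic slopes, and the weight `beta` are illustrative assumptions, not the paper's experimental setup.

```python
import random

def weighted_fit(target, ood, beta=0.1, lr=0.05, epochs=500):
    """Minimize the weighted empirical risk
        (1 - beta) * mean_{target} (y - w x)^2 + beta * mean_{ood} (y - w x)^2
    over a scalar weight w by gradient descent; beta controls how much the
    OOD samples influence the fit."""
    w = 0.0
    for _ in range(epochs):
        g_t = sum(-2 * x * (y - w * x) for x, y in target) / len(target)
        g_o = sum(-2 * x * (y - w * x) for x, y in ood) / len(ood)
        w -= lr * ((1 - beta) * g_t + beta * g_o)
    return w

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(50)]
target = [(x, 2.0 * x + rng.gauss(0, 0.1)) for x in xs]  # target slope 2
ood = [(x, 3.0 * x + rng.gauss(0, 0.1)) for x in xs]     # shifted OOD slope 3
print(weighted_fit(target, ood, beta=0.0), weighted_fit(target, ood, beta=0.5))
```

Sweeping `beta` (and the number of OOD samples) is how one would probe the non-monotonic trade-off in this toy setting.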
Gradient Balancing (GraB) is a recently proposed technique that finds provably better data permutations when training models with multiple epochs over a finite dataset. It converges at a faster rate than the widely adopted Random Reshuffling by minimizing the discrepancy of the gradients on adjacently selected examples. However, GraB only operates under restrictive assumptions such as small batch sizes and centralized data, leaving open the question of how to order examples at large scale -- i.e., in distributed learning with decentralized data. To alleviate these limitations, in this paper we propose D-GraB, which involves two novel designs: (1) $\textsf{PairBalance}$, which eliminates GraB's requirement to use a stale gradient mean, a dependence that critically relies on small learning rates; (2) an ordering protocol that runs $\textsf{PairBalance}$ in a distributed environment with negligible overhead, benefiting from both data ordering and parallelism. We prove that D-GraB enjoys a linear speedup at rate $\tilde{O}((mnT)^{-2/3})$ on smooth non-convex objectives and $\tilde{O}((mnT)^{-2})$ under the PL condition, where $n$ denotes the number of parallel workers, $m$ the number of examples per worker, and $T$ the number of epochs. Empirically, we show on various applications, including GLUE, CIFAR10 and WikiText-2, that D-GraB outperforms naive parallel GraB and Distributed Random Reshuffling in terms of both training and validation performance.
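The discrepancy minimization underlying GraB-style ordering can be illustrated with a generic greedy sign-balancing step: assign each gradient-like vector a sign so that the running signed sum stays small. This is a sketch of the general balancing flavor only, not the paper's $\textsf{PairBalance}$ or its ordering protocol.

```python
def greedy_balance(vectors):
    """Greedy signed balancing: for each vector pick the sign that keeps the
    norm of the running signed sum small. Balancing steps of this flavor are
    what GraB-type methods use to turn gradients into low-discrepancy
    example orderings."""
    signs, run = [], [0.0] * len(vectors[0])
    for v in vectors:
        plus = sum((r + x) ** 2 for r, x in zip(run, v))
        minus = sum((r - x) ** 2 for r, x in zip(run, v))
        s = 1 if plus <= minus else -1
        signs.append(s)
        run = [r + s * x for r, x in zip(run, v)]
    return signs, run

signs, run = greedy_balance([[1.0], [1.0], [1.0], [1.0]])
print(signs, run)  # alternating signs keep the running sum at zero
```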
The Shapley value is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, and it has recently been used intensively in various areas of machine learning, most notably in explainable artificial intelligence. This meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, come at the expense of an exact computation that grows exponentially with the number of players. Accordingly, a number of works are devoted to the efficient approximation of Shapley values, all of which revolve around the notion of a player's marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value that is detached from the notion of marginal contributions. We prove unmatched theoretical guarantees regarding their approximation quality and provide strong empirical results.
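For contrast with the marginal-contribution-free representation used by SVARM, the classical definition averages marginal contributions over all player orderings. The exact enumeration below (with a standard glove game as a toy example) illustrates that baseline; it is not the paper's algorithm, and its cost grows factorially with the number of players.

```python
from itertools import permutations

def shapley_exact(n, value):
    """Exact Shapley values via the marginal-contribution representation:
    average, over all player orderings, each player's marginal contribution
    when added to the players preceding it."""
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for perm in perms:
        coalition = set()
        for player in perm:
            before = value(frozenset(coalition))
            coalition.add(player)
            phi[player] += value(frozenset(coalition)) - before
    return [p / len(perms) for p in phi]

# Toy glove game: players 0 and 1 hold left gloves, player 2 a right glove;
# a coalition's worth is the number of left-right pairs it can form.
def glove(S):
    return min(len(S & {0, 1}), len(S & {2}))

print(shapley_exact(3, glove))  # -> [1/6, 1/6, 2/3] up to float precision
```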
In the Colored Clustering problem, one is asked to cluster edge-colored (hyper-)graphs whose colors represent interaction types. More specifically, the goal is to select as many edges as possible without choosing two edges that share an endpoint and are colored differently. Equivalently, the goal can be described as assigning colors to the vertices in a way that fits the edge-coloring as well as possible. As this problem is NP-hard, we build on previous work by studying its parameterized complexity. We give a $2^{\mathcal O(k)} \cdot n^{\mathcal O(1)}$-time algorithm, where $k$ is the number of edges to be selected and $n$ the number of vertices. We also prove the existence of a problem kernel of size $\mathcal O(k^{5/2})$, resolving an open problem posed in the literature. We further consider parameters that are smaller than $k$ and smaller than $r$, the number of edges that can be deleted. Such smaller parameters are obtained by considering the difference between $k$ or $r$ and some lower bound on these values. We give both algorithms and lower bounds for Colored Clustering with such parameterizations. Finally, we settle the parameterized complexity of Colored Clustering with respect to structural graph parameters by showing that it is $W[1]$-hard with respect to both vertex cover number and tree-cut width, but fixed-parameter tractable with respect to slim tree-cut width.
Probabilistic sensitivity analysis identifies the influential uncertain inputs to guide decision-making. We propose a general sensitivity framework with respect to the input distribution parameters that unifies a wide range of sensitivity measures, including information-theoretical metrics such as the Fisher information. The framework is derived analytically via a constrained maximization, and the sensitivity analysis is reformulated into an eigenvalue problem. Implementing the framework via the likelihood-ratio/score-function method requires only two main steps: Monte Carlo sampling, followed by solving an eigenvalue equation. The resulting eigenvectors then provide the directions for simultaneous variations of the input parameters and point to the perturbations that affect the uncertainty the most. Not only is the framework conceptually simple, but numerical examples demonstrate that it also provides new sensitivity insights, such as the combined sensitivity of multiple correlated uncertainty metrics, robust sensitivity analysis with an entropic constraint, and the approximation of deterministic sensitivities. Three different examples, ranging from a simple cantilever beam to an offshore marine riser, are used to demonstrate the potential applications of the proposed sensitivity framework to applied mechanics problems.
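The two-step recipe (score-function Monte Carlo, then an eigenvalue problem) can be sketched for the simplest possible input distribution, a univariate Gaussian with parameters $(\mu, \sigma)$, where the score-outer-product matrix is the Fisher information; the example, function names, and sample size are illustrative assumptions.

```python
import math, random

def fisher_top_direction(mu, sigma, n=200000, seed=0):
    """Step 1: Monte Carlo estimate of E[score score^T] (the Fisher
    information of N(mu, sigma^2) w.r.t. (mu, sigma)) via the score
    function. Step 2: closed-form eigendecomposition of the 2x2 symmetric
    matrix; the top eigenvector is the parameter direction that perturbs
    the input distribution the most."""
    rng = random.Random(seed)
    a = b = c = 0.0  # entries of [[a, b], [b, c]]
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        s_mu = (x - mu) / sigma**2                  # d/d mu of log-density
        s_sg = ((x - mu)**2 - sigma**2) / sigma**3  # d/d sigma of log-density
        a += s_mu * s_mu; b += s_mu * s_sg; c += s_sg * s_sg
    a, b, c = a / n, b / n, c / n
    tr, det = a + c, a * c - b * b
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))  # top eigenvalue
    vx, vy = (b, lam - a) if abs(lam - a) >= abs(b) else (lam - c, b)
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)

lam, v = fisher_top_direction(0.0, 1.0)
print(lam, v)  # analytically: top eigenvalue 2/sigma^2 = 2, direction along sigma
```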
The linear combination of Student's $t$ random variables (RVs) appears in many statistical applications. Unfortunately, the Student's $t$ distribution is not closed under convolution; thus, deriving an exact and general distribution for the linear combination of $K$ Student's $t$ RVs is infeasible, which motivates a fitting/approximation approach. Here, we focus on the scenario where the only constraint is that the number of degrees of freedom of each $t$-RV is greater than two. Notice that since the odd moments/cumulants of the Student's $t$ distribution are zero and the even moments/cumulants do not exist when their order is greater than the number of degrees of freedom, it becomes impossible to use conventional approaches based on moments/cumulants of order one or higher than two. To circumvent this issue, we propose fitting such a distribution to that of a scaled Student's $t$ RV by exploiting the second moment together with either the first absolute moment or the characteristic function (CF). For the fitting based on the absolute moment, we start from the case of the linear combination of $K = 2$ Student's $t$ RVs and then generalize to $K \ge 2$ through a simple iterative procedure. Meanwhile, the CF-based fitting is direct, but its accuracy (measured in terms of the Bhattacharyya distance) depends on the CF parameter configuration, for which we propose a simple but accurate approach. We numerically show that the CF-based fitting usually outperforms the absolute-moment-based fitting and that both the scale and the number of degrees of freedom of the fitting distribution increase almost linearly with $K$.
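The moment-matching idea can be sketched as follows: match the exact second moment of the combination together with its first absolute moment, and solve for the degrees of freedom and scale of the fitting $t$. The paper works with the absolute moment analytically; the Monte Carlo estimate of $E|X|$ below is a stand-in, and the assumption that the moment ratio is monotone in $\nu$ (so bisection applies) is ours.

```python
import math, random

def abs_moment_ratio(nu):
    """r(nu) = (E|T|)^2 / E[T^2] for a standard Student's t with nu > 2,
    using E|T| = 2 sqrt(nu) Gamma((nu+1)/2) / (sqrt(pi) (nu-1) Gamma(nu/2))
    and E[T^2] = nu/(nu-2); log-gamma avoids overflow for large nu."""
    log_g = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    a = 2 * math.sqrt(nu) * math.exp(log_g) / (math.sqrt(math.pi) * (nu - 1))
    return a * a * (nu - 2) / nu

def draw_t(rng, nu):
    """Student's t sample: Gaussian over the square root of a scaled chi-square."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0) / nu)

def fit_scaled_t(coeffs, dofs, n=100000, seed=0):
    """Fit (nu, s) of a scaled Student's t to X = sum_i c_i T_{nu_i}:
    the second moment of X is exact, E|X| is estimated by Monte Carlo,
    and nu is found by bisection on the scale-free moment ratio."""
    rng = random.Random(seed)
    m2 = sum(c * c * nu / (nu - 2) for c, nu in zip(coeffs, dofs))
    m1 = sum(abs(sum(c * draw_t(rng, nu) for c, nu in zip(coeffs, dofs)))
             for _ in range(n)) / n
    target = m1 * m1 / m2
    lo, hi = 2.001, 500.0
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if abs_moment_ratio(mid) < target else (lo, mid)
    nu = (lo + hi) / 2
    s = math.sqrt(m2 * (nu - 2) / nu)  # so that s^2 nu/(nu-2) = m2
    return nu, s

nu, s = fit_scaled_t([1.0, 1.0], [5.0, 7.0])
print(nu, s)
```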
Gaussian elimination with partial pivoting (GEPP) remains the most common method for solving dense linear systems. Each GEPP step uses a row-transposition pivot movement, if needed, to ensure the leading pivot entry is maximal in magnitude in the leading column of the remaining untriangularized subsystem. We use theoretical and numerical approaches to study how often this pivot movement is needed. We provide full distributional descriptions of the number of pivot movements needed by GEPP on particular Haar random ensembles, and we compare these models to other common transformations from randomized numerical linear algebra. Additionally, we introduce new random ensembles with fixed pivot movement counts and fixed sparsity, $\alpha$. Experiments estimating the empirical spectral density (ESD) of these random ensembles lead to a new conjecture on a universality class of random matrices with fixed sparsity whose scaled ESD converges to a measure on the complex unit disk that depends on $\alpha$ and interpolates between the uniform measure on the unit disk and the Dirac measure at the origin.
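The pivot-movement count under study can be made concrete with a short GEPP sketch that counts nontrivial row transpositions; the distributional results would draw the input from Haar random ensembles, whereas the fixed small matrix here merely illustrates the count.

```python
def gepp_pivot_movements(A):
    """Run Gaussian elimination with partial pivoting on a copy of A and
    count the steps where a nontrivial row transposition is needed, i.e.
    where the leading entry is not already maximal in magnitude in the
    leading column of the remaining subsystem."""
    A = [row[:] for row in A]
    n = len(A)
    swaps = 0
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))  # partial pivot row
        if p != k:
            A[k], A[p] = A[p], A[k]
            swaps += 1
        for i in range(k + 1, n):  # eliminate below the pivot
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
    return swaps

print(gepp_pivot_movements([[1.0, 2.0], [3.0, 4.0]]))  # -> 1
print(gepp_pivot_movements([[2.0, 0.0], [0.0, 1.0]]))  # -> 0
```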