In this paper, a general framework for linear secure distributed matrix multiplication (SDMM) is introduced. The model allows for a clean treatment of straggling and Byzantine servers via a star product interpretation, as well as simplified security proofs. Known properties of star products also immediately yield a lower bound on the recovery threshold as well as an upper bound on the number of colluding workers the system can tolerate. Another bound on the recovery threshold is given by the decodability condition, which generalizes a bound for GASP codes. The framework recovers many of the known SDMM schemes as special cases, thereby unifying much of the previous literature on the topic. Furthermore, error behavior specific to SDMM is discussed, and interleaved codes are proposed as a suitable means for efficient error correction in the proposed model. An analysis of the error-correction capability under natural assumptions on the error distribution is also provided, largely based on well-known results on interleaved codes. Error detection and other error distributions are also discussed.
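For readers unfamiliar with linear SDMM, the following is a minimal sketch of a basic one-sided scheme with a single random mask, secure against any single worker. It only illustrates the general flavor of linear encodings and polynomial decoding, not the framework introduced in the paper; all names, the prime, and the evaluation points are illustrative.

```python
# Minimal one-sided linear SDMM sketch (illustrative only): A is secret, B is public.
# Worker i receives f(x_i) = A + R*x_i and returns f(x_i) @ B = A@B + (R@B)*x_i.
# Interpolating the degree-1 polynomial from two responses recovers the constant term A@B.
import numpy as np

p = 65521  # small prime field, chosen so products fit comfortably in int64
rng = np.random.default_rng(0)

def encode(A, evals):
    """Share A as f(x) = A + R*x at the given (nonzero, distinct) evaluation points."""
    R = rng.integers(0, p, size=A.shape)          # uniform random mask hides A from each worker
    return [(A + R * x) % p for x in evals]

def decode(responses, evals):
    """Return the constant term of the degree-1 polynomial through the two responses."""
    x1, x2 = evals
    Y1, Y2 = responses
    inv = pow(int(x2 - x1) % p, p - 2, p)         # modular inverse of (x2 - x1)
    return (Y1 * x2 - Y2 * x1) % p * inv % p      # equals A @ B mod p

A = rng.integers(0, p, size=(2, 3))
B = rng.integers(0, p, size=(3, 2))
evals = [1, 2]
responses = [share @ B % p for share in encode(A, evals)]  # work done by the two servers
assert np.array_equal(decode(responses, evals), A @ B % p)
```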
Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions $V$ allow faster convergence rates. Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime $O(n^{3.5})$, where the exponent $3.5$ is independent of $m$ or $d$. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of $m$ and $d$. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.
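For concreteness, the log-partition functional and its connection to optimization can be written as follows; this is the standard Laplace-principle identity on a bounded domain $\Omega \subseteq \mathbb{R}^d$ under mild regularity, and the precise setting of the paper may differ:
$$
F_\varepsilon(V) := \varepsilon \log \int_{\Omega} \exp\big(-V(x)/\varepsilon\big)\,\mathrm{d}x,
\qquad
\lim_{\varepsilon \searrow 0} F_\varepsilon(V) = -\min_{x \in \Omega} V(x),
$$
so computing the log-partition function at low temperature subsumes minimizing $V$.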
Fast matrix multiplication is one of the most fundamental problems in algorithm research. The exponent of the optimal time complexity of matrix multiplication is usually denoted by $\omega$. This paper discusses new ideas for improving the laser method for fast matrix multiplication. We observe that the analysis of higher powers of the Coppersmith-Winograd tensor [Coppersmith & Winograd 1990] incurs a "combination loss", and we partially compensate for it using an asymmetric version of CW's hashing method. By analyzing the eighth power of the CW tensor, we give a new bound of $\omega<2.371866$, which improves the previous best bound of $\omega<2.372860$ [Alman & Vassilevska Williams 2020]. Our result breaks through the barrier of $2.3725$ shown in [Ambainis, Filmus & Le Gall 2015] for the standard laser method, thanks to the new method for analyzing component (constituent) tensors.
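As a reminder of the standard definition, the exponent referenced above is
$$
\omega := \inf\big\{\, c \ge 2 \;:\; \text{two } n \times n \text{ matrices can be multiplied using } O(n^{c}) \text{ arithmetic operations} \,\big\},
$$
so the result above states $\omega < 2.371866$, while trivially $\omega \ge 2$.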
We consider transporting a heavy payload that is attached to multiple quadrotors. Current state-of-the-art controllers either do not avoid inter-robot collisions at all, leading to crashes when the payload is small compared to the cable lengths, or rely on computationally demanding nonlinear optimization. We propose an extension of an existing efficient geometric payload transport controller that effectively avoids such collisions by means of an optimized cable force allocation method, while retaining the original stability properties. Our approach introduces a cascade of carefully designed quadratic programs that can be solved efficiently on highly constrained embedded flight controllers. We demonstrate our method in challenging scenarios with up to three small quadrotors carrying various payloads on cables of different lengths, with our controller running in real time directly on the robots.
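As a rough illustration of a single force-allocation quadratic program (not the paper's cascade, and without its collision-avoidance constraints), the sketch below allocates per-cable forces that realize a desired payload wrench while staying close to the previous allocation; the wrench map `P`, the masses, and the bounds are hypothetical placeholders.

```python
# Sketch of one cable-force allocation QP (illustrative only).
import cvxpy as cp
import numpy as np

n_cables, dim = 3, 3
P = np.hstack([np.eye(dim)] * n_cables)             # toy wrench map: cable forces simply add up
w_des = np.array([0.0, 0.0, 9.81 * 1.5])            # desired net force to hover a 1.5 kg payload
f_prev = np.tile([0.0, 0.0, 9.81 * 0.5], n_cables)  # previous allocation, used for smoothness

f = cp.Variable(n_cables * dim)
objective = cp.Minimize(cp.sum_squares(f - f_prev))               # stay close to previous forces
constraints = [P @ f == w_des]                                    # realize the desired wrench
constraints += [f[i * dim + 2] >= 0.1 for i in range(n_cables)]   # keep cables taut (min z-force)
cp.Problem(objective, constraints).solve()
print(f.value.reshape(n_cables, dim))
```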
We consider the classic 1-center problem: Given a set $P$ of $n$ points in a metric space, find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in the edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows. $\bullet$ Small $d$: Assuming the hitting set conjecture (HSC), we show that when $d=\omega(\log n)$, no subquadratic algorithm can solve the 1-center problem in any of the $\ell_p$-metrics, or in the edit or Ulam metrics. $\bullet$ Large $d$: When $d=\Omega(n)$, we extend our conditional lower bound to rule out subquartic algorithms for the 1-center problem in the edit metric (assuming Quantified SETH). On the other hand, we give a $(1+\epsilon)$-approximation for 1-center in the Ulam metric with running time $\tilde{O}_{\varepsilon}(nd+n^2\sqrt{d})$. We also strengthen some of the above lower bounds by allowing approximations or by reducing the dimension $d$, but only against a weaker class of algorithms that list all requisite solutions. Moreover, we extend one of our hardness results to rule out subquartic algorithms for the well-studied 1-median problem in the edit metric, where, given a set of $n$ strings each of length $n$, the goal is to find a string in the set that minimizes the sum of the edit distances to the rest of the strings in the set.
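For reference, the naive exact baseline that these lower bounds should be compared against is the brute-force search over all candidate centers, shown below as a minimal sketch; the distance function is a stand-in for any of the metrics above.

```python
# Naive quadratic-time 1-center baseline: try every point as the center.
def one_center(points, dist):
    """Return the point of `points` minimizing its maximum distance to the other points."""
    return min(points, key=lambda c: max(dist(c, p) for p in points if p is not c))

# Example in the l_infinity metric on small integer vectors.
pts = [(0, 0), (4, 1), (2, 2), (5, 5)]
linf = lambda a, b: max(abs(x - y) for x, y in zip(a, b))
print(one_center(pts, linf))  # brute force: Theta(n^2) distance evaluations
```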
We analyze the bit complexity of efficient algorithms for fundamental optimization problems, such as linear regression, $p$-norm regression, and linear programming (LP). State-of-the-art algorithms are iterative, and in terms of the number of arithmetic operations, they match the current time complexity of multiplying two $n$-by-$n$ matrices (up to polylogarithmic factors). However, previous work has typically assumed infinite-precision arithmetic, and due to complicated inverse maintenance techniques, the actual running times of these algorithms are unknown. To settle the running time and bit complexity of these algorithms, we demonstrate that a core common subroutine, known as \emph{inverse maintenance}, is backward-stable. Additionally, we show that iterative approaches for solving constrained weighted regression problems can be accomplished with bounded-error preconditioners. Specifically, we prove that linear programs can be solved approximately in matrix multiplication time multiplied by polylog factors that depend on the condition number $\kappa$ of the matrix and the inner and outer radii of the LP problem. $p$-norm regression can be solved approximately in matrix multiplication time multiplied by polylog factors in $\kappa$. Lastly, linear regression can be solved approximately in input-sparsity time multiplied by polylog factors in $\kappa$. Furthermore, we present results for achieving lower than matrix multiplication time for $p$-norm regression by utilizing faster solvers for sparse linear systems.
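Here $\kappa$ refers to the usual condition number; assuming the standard definition via the extreme (nonzero) singular values, which the paper may refine,
$$
\kappa(A) \;=\; \|A\|_2 \,\|A^{+}\|_2 \;=\; \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)},
$$
so the stated running times degrade only polylogarithmically as the problem becomes ill-conditioned.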
Probabilities of causation play a fundamental role in decision making in law, health care, and public policy. Nevertheless, their point identification is challenging, requiring strong assumptions such as monotonicity. In the absence of such assumptions, existing work requires multiple observations of datasets that contain the same treatment and outcome variables in order to establish bounds on these probabilities. However, in many clinical trials and public policy evaluation cases, there exist independent datasets, each examining the effect of a different treatment on the same outcome variable. Here, we outline how to significantly tighten existing bounds on the probabilities of causation by imposing counterfactual consistency between structural causal models (SCMs) constructed from such independent datasets (the 'causal marginal problem'). Next, we describe a new information-theoretic approach to the falsification of counterfactual probabilities, using conditional mutual information to quantify counterfactual influence. The latter generalises to arbitrary discrete variables and numbers of treatments, and renders the causal marginal problem more interpretable. Since the question of what counts as 'tight enough' is left to the user, we provide an additional method of inference for when the bounds are unsatisfactory: a maximum-entropy-based method that defines a metric on the space of plausible SCMs and proposes the entropy-maximising SCM for inferring counterfactuals in the absence of more information.
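As one classical example of such bounds, stated here only for orientation (binary treatment $X$ and outcome $Y$, interventional distributions only), the probability of necessity and sufficiency $\mathrm{PNS} = P(y_x, y'_{x'})$ satisfies
$$
\max\{0,\; P(y_x) - P(y_{x'})\} \;\le\; \mathrm{PNS} \;\le\; \min\{P(y_x),\; P(y'_{x'})\},
$$
which is in general an interval rather than a point; the methods above aim to narrow such intervals by combining independent datasets.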
We introduce a next of kin of the generalized additive model for location, scale, and shape (GAMLSS), aiming at distribution-free and parsimonious regression modelling for arbitrary outcomes. We replace the strict parametric distribution underlying such a model with a transformation function, which in turn is estimated from data. Doing so not only makes the model distribution-free but also makes it possible to limit the number of linear or smooth model terms to a pair of location-scale predictor functions. We derive the likelihood for continuous, discrete, and randomly censored observations, along with the corresponding score functions. A plethora of existing algorithms is leveraged for model estimation, including constrained maximum likelihood, the original GAMLSS algorithm, and transformation trees. Parameter interpretability in the resulting models is closely connected to model selection. We propose the application of a novel best-subset selection procedure to achieve especially simple interpretations. All techniques are motivated and illustrated by a collection of applications from different domains, including crossing and partial proportional hazards, complex count regression, non-linear ordinal regression, and growth curves. All analyses are reproducible with the help of the "tram" add-on package to the R system for statistical computing and graphics.
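Schematically, and only as an illustration of the model class (the paper's exact parameterization may differ), a location-scale transformation model takes the form
$$
P(Y \le y \mid \boldsymbol{x}) \;=\; F\!\left(\frac{h(y) - \mu(\boldsymbol{x})}{\sigma(\boldsymbol{x})}\right),
$$
with a fixed link distribution function $F$, a monotone increasing transformation $h$ estimated from the data, and location and scale predictor functions $\mu(\boldsymbol{x})$ and $\sigma(\boldsymbol{x}) > 0$.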
Distribution testing is a fundamental statistical task with many applications, but we are interested in a variety of problems where systematic mislabelings of the sample prevent us from applying the existing theory. To apply distribution testing to these problems, we introduce distribution testing under the parity trace, where the algorithm receives an ordered sample $S$ that reveals only the least significant bit of each element. This abstraction reveals connections between the following three problems of interest, allowing new upper and lower bounds: 1. In distribution testing with a confused collector, the collector of the sample may be incapable of distinguishing between nearby elements of a domain (e.g. a machine learning classifier). We prove bounds for distribution testing with a confused collector on domains structured as a cycle or a path. 2. Recent work on the fundamental testing vs. learning question established tight lower bounds on distribution-free sample-based property testing by reduction from distribution testing, but the tightness is limited to symmetric properties. The parity trace allows a broader family of equivalences to non-symmetric properties, while recovering and strengthening many of the previous results with a different technique. 3. We give the first results for property testing in the well-studied trace reconstruction model, where the goal is to test whether an unknown string $x$ satisfies some property or is far from satisfying that property, given only independent random traces of $x$. Our main technical result is a tight bound of $\widetilde \Theta\left((n/\epsilon)^{4/5} + \sqrt n/\epsilon^2\right)$ for testing uniformity of distributions over $[n]$ under the parity trace, leading also to results for the problems above.
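A tiny example of the parity trace abstraction, assuming "ordered" means the sample is sorted in increasing order (illustrative only):

```python
# Toy illustration of the parity trace: sort the sample drawn from [n] and
# reveal only each element's least significant bit.
sample = [7, 2, 2, 5, 10, 3]
parity_trace = [x & 1 for x in sorted(sample)]  # sorted sample: [2, 2, 3, 5, 7, 10]
print(parity_trace)                             # -> [0, 0, 1, 1, 1, 0]
```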
This study addresses a class of linear mixed-integer programming (MIP) problems that involve uncertainty in the objective function coefficients. The coefficients are assumed to form a random vector whose probability distribution can only be observed through a finite training data set. Unlike most of the related studies in the literature, we also consider uncertainty in the underlying data set. The data uncertainty is described by a set of linear constraints for each random sample, and the uncertainty in the distribution (for a fixed realization of the data) is defined using a type-1 Wasserstein ball centered at the empirical distribution of the data. The overall problem is formulated as a three-level distributionally robust optimization (DRO) problem. We prove that for a class of bi-affine loss functions the three-level problem admits a linear MIP reformulation. Furthermore, it turns out that in several important special cases the three-level problem can be solved reasonably fast by leveraging the nominal MIP problem. Finally, we conduct a computational study in which the out-of-sample performance of our model and the computational complexity of the proposed MIP reformulation are explored numerically for several application domains.
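For orientation, the two innermost levels resemble the standard type-1 Wasserstein DRO problem, shown here schematically (the model above adds a further level describing the data uncertainty):
$$
\min_{\boldsymbol{x} \in X}\; \sup_{Q \in \mathcal{B}_{\varepsilon}(\widehat{P}_N)} \mathbb{E}_{Q}\big[\ell(\boldsymbol{x}, \boldsymbol{\xi})\big],
\qquad
\mathcal{B}_{\varepsilon}(\widehat{P}_N) := \big\{ Q : W_1\big(Q, \widehat{P}_N\big) \le \varepsilon \big\},
$$
where $\widehat{P}_N$ is the empirical distribution of the $N$ data points and $W_1$ is the type-1 Wasserstein distance.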
As an intrinsic and fundamental property of big data, data heterogeneity arises in a variety of real-world applications, such as precision medicine, autonomous driving, and finance. For machine learning algorithms, ignoring data heterogeneity can greatly hurt generalization performance and algorithmic fairness, since the prediction mechanisms of different sub-populations are likely to differ from each other. In this work, we focus on the data heterogeneity that affects the predictions of machine learning models, and first propose the notion of \emph{usable predictive heterogeneity}, which takes into account model capacity and computational constraints. We prove that it can be reliably estimated from finite data with probably approximately correct (PAC) bounds. Additionally, we design a bi-level optimization algorithm to explore the usable predictive heterogeneity from data. Empirically, the explored heterogeneity provides insights into sub-population divisions in income prediction, crop yield prediction, and image classification tasks, and leveraging such heterogeneity benefits out-of-distribution generalization performance.