In the single winner determination problem, we have n voters and m candidates and each voter j incurs a cost c(i, j) if candidate i is chosen. Our objective is to choose a candidate that minimizes the expected total cost incurred by the voters; however as we only have access to the agents' preference rankings over the outcomes, a loss of efficiency is inevitable. This loss of efficiency is quantified by distortion. We give an instance of the metric single winner determination problem for which any randomized social choice function has distortion at least 2.063164. This disproves the long-standing conjecture that there exists a randomized social choice function that has a worst-case distortion of at most 2.
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of ill-posedness with respect to the data generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature for obtaining the $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posed property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for the classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on the well-posedness and the minimax lower bounds are of independent interest to study not only other nonparametric estimators for $Q$-function but also efficient estimation on the value of any target policy in off-policy settings.
In metric learning, the goal is to learn an embedding so that data points with the same class are close to each other and data points with different classes are far apart. We propose a distance-ratio-based (DR) formulation for metric learning. Like softmax-based formulation for metric learning, it models $p(y=c|x')$, which is a probability that a query point $x'$ belongs to a class $c$. The DR formulation has two useful properties. First, the corresponding loss is not affected by scale changes of an embedding. Second, it outputs the optimal (maximum or minimum) classification confidence scores on representing points for classes. To demonstrate the effectiveness of our formulation, we conduct few-shot classification experiments using softmax-based and DR formulations on CUB and mini-ImageNet datasets. The results show that DR formulation generally enables faster and more stable metric learning than the softmax-based formulation. As a result, using DR formulation achieves improved or comparable generalization performances.
Given input-output pairs of an elliptic partial differential equation (PDE) in three dimensions, we derive the first theoretically-rigorous scheme for learning the associated Green's function $G$. By exploiting the hierarchical low-rank structure of $G$, we show that one can construct an approximant to $G$ that converges almost surely and achieves a relative error of $\mathcal{O}(\Gamma_\epsilon^{-1/2}\log^3(1/\epsilon)\epsilon)$ using at most $\mathcal{O}(\epsilon^{-6}\log^4(1/\epsilon))$ input-output training pairs with high probability, for any $0<\epsilon<1$. The quantity $0<\Gamma_\epsilon\leq 1$ characterizes the quality of the training dataset. Along the way, we extend the randomized singular value decomposition algorithm for learning matrices to Hilbert--Schmidt operators and characterize the quality of covariance kernels for PDE learning.
The Extended Randomized Kaczmarz method is a well known iterative scheme which can find the Moore-Penrose inverse solution of a possibly inconsistent linear system and requires only one additional column of the system matrix in each iteration in comparison with the standard randomized Kaczmarz method. Also, the Sparse Randomized Kaczmarz method has been shown to converge linearly to a sparse solution of a consistent linear system. Here, we combine both ideas and propose an Extended Sparse Randomized Kaczmarz method. We show linear expected convergence to a sparse least squares solution in the sense that an extended variant of the regularized basis pursuit problem is solved. Moreover, we generalize the additional step in the method and prove convergence to a more abstract optimization problem. We demonstrate numerically that our method can find sparse least squares solutions of real and complex systems if the noise is concentrated in the complement of the range of the system matrix and that our generalization can handle impulsive noise.
In this paper, we consider a distributed lossy compression network with $L$ encoders and a decoder. Each encoder observes a source and compresses it, which is sent to the decoder. Moreover, each observed source can be written as the sum of a target signal and a noise which are independently generated from two symmetric multivariate Gaussian distributions. The decoder jointly constructs the target signals given a threshold on the mean squared error distortion. We are interested in the minimum compression rate of this network versus the distortion threshold which is known as the \emph{rate-distortion function}. We derive a lower bound on the rate-distortion function by solving a convex program, explicitly. The proposed lower bound matches the well-known Berger-Tung's upper bound for some values of the distortion threshold. The asymptotic expressions of the upper and lower bounds are derived in the large $L$ limit. Under specific constraints, the bounds match in the asymptotic regime yielding the characterization of the rate-distortion function.
We prove upper and lower bounds on the minimal spherical dispersion, improving upon previous estimates obtained by Rote and Tichy [Spherical dispersion with an application to polygonal approximation of curves, Anz. \"Osterreich. Akad. Wiss. Math.-Natur. Kl. 132 (1995), 3--10]. In particular, we see that the inverse $N(\varepsilon,d)$ of the minimal spherical dispersion is, for fixed $\varepsilon>0$, linear in the dimension $d$ of the ambient space. We also derive upper and lower bounds on the expected dispersion for points chosen independently and uniformly at random from the Euclidean unit sphere. In terms of the corresponding inverse $\widetilde{N}(\varepsilon,d)$, our bounds are optimal with respect to the dependence on $\varepsilon$.
Person re-identification (re-ID) has attracted much attention recently due to its great importance in video surveillance. In general, distance metrics used to identify two person images are expected to be robust under various appearance changes. However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images. Hence, the security danger is dramatically increased when deploying commercial re-ID systems in video surveillance, especially considering the highly strict requirement of public safety. Although adversarial examples have been extensively applied for classification analysis, it is rarely studied in metric analysis like person re-identification. The most likely reason is the natural gap between the training and testing of re-ID networks, that is, the predictions of a re-ID network cannot be directly used during testing without an effective metric. In this work, we bridge the gap by proposing Adversarial Metric Attack, a parallel methodology to adversarial classification attacks, which can effectively generate adversarial examples for re-ID. Comprehensive experiments clearly reveal the adversarial effects in re-ID systems. Moreover, by benchmarking various adversarial settings, we expect that our work can facilitate the development of robust feature learning with the experimental conclusions we have drawn.
Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. We present mixed-integer optimization approaches to find optimal distance metrics that generalize the Mahalanobis metric extensively studied in the literature. Additionally, we generalize and improve upon leading methods by removing reliance on pre-designated "target neighbors," "triplets," and "similarity pairs." Another salient feature of our method is its ability to enable active learning by recommending precise regions to sample after an optimal metric is computed to improve classification performance. This targeted acquisition can significantly reduce computational burden by ensuring training data completeness, representativeness, and economy. We demonstrate classification and computational performance of the algorithms through several simple and intuitive examples, followed by results on real image and medical datasets.
Methods that align distributions by minimizing an adversarial distance between them have recently achieved impressive results. However, these approaches are difficult to optimize with gradient descent and they often do not converge well without careful hyperparameter tuning and proper initialization. We investigate whether turning the adversarial min-max problem into an optimization problem by replacing the maximization part with its dual improves the quality of the resulting alignment and explore its connections to Maximum Mean Discrepancy. Our empirical results suggest that using the dual formulation for the restricted family of linear discriminators results in a more stable convergence to a desirable solution when compared with the performance of a primal min-max GAN-like objective and an MMD objective under the same restrictions. We test our hypothesis on the problem of aligning two synthetic point clouds on a plane and on a real-image domain adaptation problem on digits. In both cases, the dual formulation yields an iterative procedure that gives more stable and monotonic improvement over time.
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.