We prove tight H\"olderian error bounds for all $p$-cones. Surprisingly, the exponents differ in several ways from those that have been previously conjectured; moreover, they illuminate $p$-cones as a curious example of a class of objects that possess properties in 3 dimensions that they do not in 4 or more. Using our error bounds, we analyse least squares problems with $p$-norm regularization, where our results enable us to compute the corresponding KL exponents for previously inaccessible values of $p$. Another application is a (relatively) simple proof that most $p$-cones are neither self-dual nor homogeneous. Our error bounds are obtained under the framework of facial residual functions and we expand it by establishing for general cones an optimality criterion under which the resulting error bound must be tight.
In the framework of inverse linear problems on infinite-dimensional Hilbert space, we prove the convergence of the conjugate gradient iterates to an exact solution to the inverse problem in the most general case where the self-adjoint, non-negative operator is unbounded and with minimal, technically unavoidable assumptions on the initial guess of the iterative algorithm. The convergence is proved to always hold in the Hilbert space norm (error convergence), as well as at other levels of regularity (energy norm, residual, etc.) depending on the regularity of the iterates. We also discuss, both analytically and through a selection of numerical tests, the main features and differences of our convergence result as compared to the case, already available in the literature, where the operator is bounded.
The Sum-of-Squares (SoS) hierarchy of semidefinite programs is a powerful algorithmic paradigm which captures state-of-the-art algorithmic guarantees for a wide array of problems. In the average case setting, SoS lower bounds provide strong evidence of algorithmic hardness or information-computation gaps. Prior to this work, SoS lower bounds have been obtained for problems in the "dense" input regime, where the input is a collection of independent Rademacher or Gaussian random variables, while the sparse regime has remained out of reach. We make the first progress in this direction by obtaining strong SoS lower bounds for the problem of Independent Set on sparse random graphs. We prove that with high probability over an Erdos-Renyi random graph $G\sim G_{n,\frac{d}{n}}$ with average degree $d>\log^2 n$, degree-$D_{SoS}$ SoS fails to refute the existence of an independent set of size $k = \Omega\left(\frac{n}{\sqrt{d}(\log n)(D_{SoS})^{c_0}} \right)$ in $G$ (where $c_0$ is an absolute constant), whereas the true size of the largest independent set in $G$ is $O\left(\frac{n\log d}{d}\right)$. Our proof involves several significant extensions of the techniques used for proving SoS lower bounds in the dense setting. Previous lower bounds are based on the pseudo-calibration heuristic of Barak et al [FOCS 2016] which produces a candidate SoS solution using a planted distribution indistinguishable from the input distribution via low-degree tests. In the sparse case the natural planted distribution does admit low-degree distinguishers, and we show how to adapt the pseudo-calibration heuristic to overcome this. Another notorious technical challenge for the sparse regime is the quest for matrix norm bounds. In this paper, we obtain new norm bounds for graph matrices in the sparse setting.
Spanners have been shown to be a powerful tool in graph algorithms. Many spanner constructions use a certain type of clustering at their core, where each cluster has small diameter and there are relatively few spanner edges between clusters. In this paper, we provide a clustering algorithm that, given $k\geq 2$, can be used to compute a spanner of stretch $2k-1$ and expected size $O(n^{1+1/k})$ in $k$ rounds in the CONGEST model. This improves upon the state of the art (by Elkin, and Neiman [TALG'19]) by making the bounds on both running time and stretch independent of the random choices of the algorithm, whereas they only hold with high probability in previous results. Spanners are used in certain synchronizers, thus our improvement directly carries over to such synchronizers. Furthermore, for keeping the \emph{total} number of inter-cluster edges small in low diameter decompositions, our clustering algorithm provides the following guarantees. Given $\beta\in (0,1]$, we compute a low diameter decomposition with diameter bound $O\left(\frac{\log n}{\beta}\right)$ such that each edge $e\in E$ is an inter-cluster edge with probability at most $\beta\cdot w(e)$ in $O\left(\frac{\log n}{\beta}\right)$ rounds in the CONGEST model. Again, this improves upon the state of the art (by Miller, Peng, and Xu [SPAA'13]) by making the bounds on both running time and diameter independent of the random choices of the algorithm, whereas they only hold with high probability in previous results.
We analyze the impact of transient and Byzantine faults on the construction of a maximal independent set in a general network. We adapt the self-stabilizing algorithm presented by Turau \cite{turau2007linear} for computing such a vertex set. Our algorithm is self-stabilizing and also works under the more difficult context of arbitrary Byzantine faults. Byzantine nodes can prevent nodes close to them from taking part in the independent set for an arbitrarily long time. We give boundaries to their impact by focusing on the set of all nodes excluding nodes at distance 1 or less of Byzantine nodes, and excluding some of the nodes at distance 2. As far as we know, we present the first algorithm tolerating both transient and Byzantine faults under the fair distributed daemon. We prove that this algorithm converges in $ \mathcal O(\Delta n)$ rounds w.h.p., where $n$ and $\Delta$ are the size and the maximum degree of the network, resp. Additionally, we present a modified version of this algorithm for anonymous systems under the adversarial distributed daemon that converges in $ \mathcal O(n^{2})$ expected number of steps.
In this paper, we study the statistical limits of deep learning techniques for solving elliptic partial differential equations (PDEs) from random samples using the Deep Ritz Method (DRM) and Physics-Informed Neural Networks (PINNs). To simplify the problem, we focus on a prototype elliptic PDE: the Schr\"odinger equation on a hypercube with zero Dirichlet boundary condition, which has wide application in the quantum-mechanical systems. We establish upper and lower bounds for both methods, which improves upon concurrently developed upper bounds for this problem via a fast rate generalization bound. We discover that the current Deep Ritz Methods is sub-optimal and propose a modified version of it. We also prove that PINN and the modified version of DRM can achieve minimax optimal bounds over Sobolev spaces. Empirically, following recent work which has shown that the deep model accuracy will improve with growing training sets according to a power law, we supply computational experiments to show a similar behavior of dimension dependent power law for deep PDE solvers.
We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime. After exhibiting various facets of the problem, we show we can interpret the MAP as running stochastic mirror descent (SMD) on the log-likelihood. However, modern convergence results do not apply for standard examples of the exponential family, highlighting holes in the convergence literature. We believe solving this very fundamental problem may bring progress to both the statistics and optimization communities.
We consider the classical contention resolution problem where nodes arrive over time, each with a message to send. In each synchronous slot, each node can send or remain idle. If in a slot one node sends alone, it succeeds; otherwise, if multiple nodes send simultaneously, messages collide and none succeeds. Nodes can differentiate collision and silence only if collision detection is available. Ideally, a contention resolution algorithm should satisfy three criteria: low time complexity (or high throughput); low energy complexity, meaning each node does not make too many broadcast attempts; strong robustness, meaning the algorithm can maintain good performance even if slots can be jammed. Previous work has shown, with collision detection, there are "perfect" contention resolution algorithms satisfying all three criteria. On the other hand, without collision detection, it was not until 2020 that an algorithm was discovered which can achieve optimal time complexity and low energy cost, assuming there is no jamming. More recently, the trade-off between throughput and robustness was studied. However, an intriguing and important question remains unknown: without collision detection, are there robust algorithms achieving both low total time complexity and low per-node energy cost? In this paper, we answer the above question affirmatively. Specifically, we develop a new randomized algorithm for robust contention resolution without collision detection. Lower bounds show that it has both optimal time and energy complexity. If all nodes start execution simultaneously, we design another algorithm that is even faster, with similar energy complexity as the first algorithm. The separation on time complexity suggests for robust contention resolution without collision detection, ``batch'' instances (nodes start simultaneously) are inherently easier than ``scattered'' ones (nodes arrive over time).
The per-pixel cross-entropy loss (CEL) has been widely used in structured output prediction tasks as a spatial extension of generic image classification. However, its i.i.d. assumption neglects the structural regularity present in natural images. Various attempts have been made to incorporate structural reasoning mostly through structure priors in a cooperative way where co-occuring patterns are encouraged. We, on the other hand, approach this problem from an opposing angle and propose a new framework for training such structured prediction networks via an adversarial process, in which we train a structure analyzer that provides the supervisory signals, the adversarial structure matching loss (ASML). The structure analyzer is trained to maximize ASML, or to exaggerate recurring structural mistakes usually among co-occurring patterns. On the contrary, the structured output prediction network is trained to reduce those mistakes and is thus enabled to distinguish fine-grained structures. As a result, training structured output prediction networks using ASML reduces contextual confusion among objects and improves boundary localization. We demonstrate that ASML outperforms its counterpart CEL especially in context and boundary aspects on figure-ground segmentation and semantic segmentation tasks with various base architectures, such as FCN, U-Net, DeepLab, and PSPNet.
Recent works showed that Generative Adversarial Networks (GANs) can be successfully applied in unsupervised domain adaptation, where, given a labeled source dataset and an unlabeled target dataset, the goal is to train powerful classifiers for the target samples. In particular, it was shown that a GAN objective function can be used to learn target features indistinguishable from the source ones. In this work, we extend this framework by (i) forcing the learned feature extractor to be domain-invariant, and (ii) training it through data augmentation in the feature space, namely performing feature augmentation. While data augmentation in the image space is a well established technique in deep learning, feature augmentation has not yet received the same level of attention. We accomplish it by means of a feature generator trained by playing the GAN minimax game against source features. Results show that both enforcing domain-invariance and performing feature augmentation lead to superior or comparable performance to state-of-the-art results in several unsupervised domain adaptation benchmarks.
The Residual Networks of Residual Networks (RoR) exhibits excellent performance in the image classification task, but sharply increasing the number of feature map channels makes the characteristic information transmission incoherent, which losses a certain of information related to classification prediction, limiting the classification performance. In this paper, a Pyramidal RoR network model is proposed by analysing the performance characteristics of RoR and combining with the PyramidNet. Firstly, based on RoR, the Pyramidal RoR network model with channels gradually increasing is designed. Secondly, we analysed the effect of different residual block structures on performance, and chosen the residual block structure which best favoured the classification performance. Finally, we add an important principle to further optimize Pyramidal RoR networks, drop-path is used to avoid over-fitting and save training time. In this paper, image classification experiments were performed on CIFAR-10/100 and SVHN datasets, and we achieved the current lowest classification error rates were 2.96%, 16.40% and 1.59%, respectively. Experiments show that the Pyramidal RoR network optimization method can improve the network performance for different data sets and effectively suppress the gradient disappearance problem in DCNN training.