We consider the constrained Linear Inverse Problem (LIP), where a certain atomic norm (like the $\ell_1 $ and the Nuclear norm) is minimized subject to a quadratic constraint. Typically, such cost functions are non-differentiable which makes them not amenable to the fast optimization methods existing in practice. We propose two equivalent reformulations of the constrained LIP with improved convex regularity: (i) a smooth convex minimization problem, and (ii) a strongly convex min-max problem. These problems could be solved by applying existing acceleration based convex optimization methods which provide better \mmode{ O \left( \nicefrac{1}{k^2} \right) } theoretical convergence guarantee. However, to fully exploit the utility of these reformulations, we also provide a novel algorithm, to which we refer as the Fast Linear Inverse Problem Solver (FLIPS), that is tailored to solve the reformulation of the LIP. We demonstrate the performance of FLIPS on the sparse coding problem arising in image processing tasks. In this setting, we observe that FLIPS consistently outperforms the Chambolle-Pock and C-SALSA algorithms--two of the current best methods in the literature.
We propose a class of randomized quantum algorithms for the task of sampling from matrix functions, without the use of quantum block encodings or any other coherent oracle access to the matrix elements. As such, our use of qubits is purely algorithmic, and no additional qubits are required for quantum data structures. For $N\times N$ Hermitian matrices, the space cost is $\log(N)+1$ qubits and depending on the structure of the matrices, the gate complexity can be comparable to state-of-the-art methods that use quantum data structures of up to size $O(N^2)$, when considering equivalent end-to-end problems. Within our framework, we present a quantum linear system solver that allows one to sample properties of the solution vector, as well as an algorithm for sampling properties of ground states of Hamiltonians. As a concrete application, we combine these two sub-routines to present a scheme for calculating Green's functions of quantum many-body systems.
Deep learning based methods for single-image super-resolution (SR) have drawn a lot of attention lately. In particular, various papers have shown that the learning stage can be performed on a single image, resulting in the so-called internal approaches. The SinGAN method is one of these contributions, where the distribution of image patches is learnt on the image at hand and propagated at finer scales. Now, there are situations where some statistical a priori can be assumed for the final image. In particular, many natural phenomena yield images having power law Fourier spectrum, such as clouds and other texture images. In this work, we show how such a priori information can be integrated into an internal super-resolution approach, by constraining the learned up-sampling procedure of SinGAN. We consider various types of constraints, related to the Fourier power spectrum, the color histograms and the consistency of the upsampling scheme. We demonstrate on various experiments that these constraints are indeed satisfied, but also that some perceptual quality measures can be improved by the proposed approach.
In machine learning, the selection of a promising model from a potentially large number of competing models and the assessment of its generalization performance are critical tasks that need careful consideration. Typically, model selection and evaluation are strictly separated endeavors, splitting the sample at hand into a training, validation, and evaluation set, and only compute a single confidence interval for the prediction performance of the final selected model. We however propose an algorithm how to compute valid lower confidence bounds for multiple models that have been selected based on their prediction performances in the evaluation set by interpreting the selection problem as a simultaneous inference problem. We use bootstrap tilting and a maxT-type multiplicity correction. The approach is universally applicable for any combination of prediction models, any model selection strategy, and any prediction performance measure that accepts weights. We conducted various simulation experiments which show that our proposed approach yields lower confidence bounds that are at least comparably good as bounds from standard approaches, and that reliably reach the nominal coverage probability. In addition, especially when sample size is small, our proposed approach yields better performing prediction models than the default selection of only one model for evaluation does.
In this note, we prove that the following function space with absolutely convergent Fourier series \[ F_d:=\left\{ f\in L^2([0,1)^d)\:\middle| \: \|f\|:=\sum_{\boldsymbol{k}\in \mathbb{Z}^d}|\hat{f}(\boldsymbol{k})| \max\left(1,\min_{j\in \mathrm{supp}(\boldsymbol{k})}\log |k_j|\right) <\infty \right\}\] with $\hat{f}(\boldsymbol{k})$ being the $\boldsymbol{k}$-th Fourier coefficient of $f$ and $\mathrm{supp}(\boldsymbol{k}):=\{j\in \{1,\ldots,d\}\mid k_j\neq 0\}$ is polynomially tractable for multivariate integration in the worst-case setting. Here polynomial tractability means that the minimum number of function evaluations required to make the worst-case error less than or equal to a tolerance $\varepsilon$ grows only polynomially with respect to $\varepsilon^{-1}$ and $d$. It is important to remark that the function space $F_d$ is unweighted, that is, all variables contribute equally to the norm of functions. Our tractability result is in contrast to those for most of the unweighted integration problems studied in the literature, in which polynomial tractability does not hold and the problem suffers from the curse of dimensionality. Our proof is constructive in the sense that we provide an explicit quasi-Monte Carlo rule that attains a desired worst-case error bound.
The top-k operator returns a k-sparse vector, where the non-zero values correspond to the k largest values of the input. Unfortunately, because it is a discontinuous function, it is difficult to incorporate in neural networks trained end-to-end with backpropagation. Recent works have considered differentiable relaxations, based either on regularization or perturbation techniques. However, to date, no approach is fully differentiable and sparse. In this paper, we propose new differentiable and sparse top-k operators. We view the top-k operator as a linear program over the permutahedron, the convex hull of permutations. We then introduce a p-norm regularization term to smooth out the operator, and show that its computation can be reduced to isotonic optimization. Our framework is significantly more general than the existing one and allows for example to express top-k operators that select values in magnitude. On the algorithmic side, in addition to pool adjacent violator (PAV) algorithms, we propose a new GPU/TPU-friendly Dykstra algorithm to solve isotonic optimization problems. We successfully use our operators to prune weights in neural networks, to fine-tune vision transformers, and as a router in sparse mixture of experts.
In the Colored Clustering problem, one is asked to cluster edge-colored (hyper-)graphs whose colors represent interaction types. More specifically, the goal is to select as many edges as possible without choosing two edges that share an endpoint and are colored differently. Equivalently, the goal can also be described as assigning colors to the vertices in a way that fits the edge-coloring as well as possible. As this problem is NP-hard, we build on previous work by studying its parameterized complexity. We give a $2^{\mathcal O(k)} \cdot n^{\mathcal O(1)}$-time algorithm where $k$ is the number of edges to be selected and $n$ the number of vertices. We also prove the existence of a problem kernel of size $\mathcal O(k^{5/2} )$, resolving an open problem posed in the literature. We consider parameters that are smaller than $k$, the number of edges to be selected, and $r$, the number of edges that can be deleted. Such smaller parameters are obtained by considering the difference between $k$ or $r$ and some lower bound on these values. We give both algorithms and lower bounds for Colored Clustering with such parameterizations. Finally, we settle the parameterized complexity of Colored Clustering with respect to structural graph parameters by showing that it is $W[1]$-hard with respect to both vertex cover number and tree-cut width, but fixed-parameter tractable with respect to slim tree-cut width.
We propose improved exact and heuristic algorithms for solving the maximum weight clique problem, a well-known problem in graph theory with many applications. Our algorithms interleave successful techniques from related work with novel data reduction rules that use local graph structure to identify and remove vertices and edges while retaining the optimal solution. We evaluate our algorithms on a range of synthetic and real-world graphs, and find that they outperform the current state of the art on most inputs. Our data reductions always produce smaller reduced graphs than existing data reductions alone. As a result, our exact algorithm, MWCRedu, finds solutions orders of magnitude faster on naturally weighted, medium-sized map labeling graphs and random hyperbolic graphs. Our heuristic algorithm, MWCPeel, outperforms its competitors on these instances, but is slightly less effective on extremely dense or large instances.
The well-known discrete Fourier transform (DFT) can easily be generalized to arbitrary nodes in the spatial domain. The fast procedure for this generalization is referred to as nonequispaced fast Fourier transform (NFFT). Various applications such as MRI, solution of PDEs, etc., are interested in the inverse problem, i.,e., computing Fourier coefficients from given nonequispaced data. In this paper we survey different kinds of approaches to tackle this problem. In contrast to iterative procedures, where multiple iteration steps are needed for computing a solution, we focus especially on so-called direct inversion methods. We review density compensation techniques and introduce a new scheme that leads to an exact reconstruction for trigonometric polynomials. In addition, we consider a matrix optimization approach using Frobenius norm minimization to obtain an inverse NFFT.
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of $\tilde{O}(\sqrt{T \log d} + \log d/\varepsilon)$ for the stochastic setting and $\tilde O(\sqrt{T \log d} + T^{1/3} \log d/\varepsilon)$ for oblivious adversaries (where $d$ is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the high-dimensional regime $d \ge T$. Moreover, we prove new lower bounds for adaptive adversaries. Our results imply that unlike the non-private setting, there is a strong separation between the optimal regret for adaptive and non-adaptive adversaries for this problem. Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.