We present a randomized approximation scheme for the permanent of a matrix with nonnegative entries. Our scheme extends a recursive rejection sampling method of Huber and Law (SODA 2008) by replacing the upper bound for the permanent with a linear combination of the subproblem bounds at a moderately large depth of the recursion tree. This method, which we call deep rejection sampling, is empirically shown to outperform the basic, depth-zero variant, as well as a related method by Kuck et al. (NeurIPS 2019). We analyze the expected running time of the scheme on random $(0, 1)$-matrices where each entry is independently $1$ with probability $p$. Our bound is superior to a previous one for $p$ less than $1/5$, matching another bound that was known to hold when every row and column has density exactly $p$.
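As a minimal illustration of the rejection-sampling idea underlying the scheme above (the basic, depth-zero variant only, and with the crude row-sum bound in place of the sharper Huber-Law bound), the estimator can be sketched as follows; all names and parameter choices are illustrative assumptions.

```python
import numpy as np

def estimate_permanent(A, trials=10000, seed=None):
    """Depth-zero rejection-sampling estimate of per(A) for nonnegative A.

    Illustrative sketch only: uses the crude row-sum upper bound
    U(A) = prod_i sum_j A[i, j], which satisfies the self-reducibility
    inequality needed for rejection sampling (the Huber-Law bound is sharper).
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    n = A.shape[0]

    def ubound(rows, cols):
        # product of row sums over the remaining rows/columns (empty product = 1)
        return float(np.prod([A[i, cols].sum() for i in rows]))

    U0 = ubound(range(n), list(range(n)))
    if U0 == 0.0:
        return 0.0
    accepted = 0
    for _ in range(trials):
        cols, ok = list(range(n)), True
        for i in range(n):
            U = ubound(range(i, n), cols)
            # weight of column j: A[i, j] times the bound of the remaining subproblem
            w = np.array([A[i, j] * ubound(range(i + 1, n), [c for c in cols if c != j])
                          for j in cols])
            total = w.sum()
            if U == 0.0 or total == 0.0 or rng.random() * U > total:
                ok = False          # reject this run
                break
            j = cols[rng.choice(len(cols), p=w / total)]
            cols.remove(j)
        accepted += ok
    return U0 * accepted / trials   # a run is accepted with probability per(A)/U0
```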
In this work we consider a class of non-linear eigenvalue problems that admit a spectrum similar to that of a Hamiltonian matrix, in the sense that the spectrum is symmetric with respect to both the real and the imaginary axes. More precisely, we present a method to iteratively approximate the eigenvalues of such non-linear eigenvalue problems closest to a given purely real or imaginary shift, while preserving the symmetries of the spectrum. To this end, the presented method exploits the equivalence between the considered non-linear eigenvalue problem and the eigenvalue problem associated with a linear but infinite-dimensional operator. To compute the eigenvalues closest to the given shift, we apply a specifically chosen shift-invert transformation to this linear operator and compute the eigenvalues of largest modulus of the shifted and inverted operator using an (infinite) Arnoldi procedure. The advantage of the chosen shift-invert transformation is that the spectrum of the transformed operator has a "real skew-Hamiltonian"-like structure. Furthermore, it is proven that the Krylov space constructed by applying this operator satisfies an orthogonality property in terms of a specifically chosen bilinear form. By taking this property into account in the orthogonalization process, it is ensured that even in the presence of rounding errors, the obtained approximation for, e.g., a simple, purely imaginary eigenvalue is simple and purely imaginary. The presented work can thus be seen as an extension of [V. Mehrmann and D. Watkins, "Structure-Preserving Methods for Computing Eigenpairs of Large Sparse Skew-Hamiltonian/Hamiltonian Pencils", SIAM J. Sci. Comput., 22(6), 2001] to the considered class of non-linear eigenvalue problems. Although the presented method is initially defined on function spaces, it can be implemented using finite-dimensional linear algebra operations.
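To illustrate the shift-invert idea in the simplest possible setting, the sketch below applies a plain shift-invert Arnoldi iteration to an ordinary finite-dimensional matrix: the dominant Ritz values of $(A-\sigma I)^{-1}$ give the eigenvalues of $A$ closest to the shift $\sigma$. It is only a structure-ignoring caricature: it captures neither the infinite-dimensional operator nor the bilinear-form orthogonalization that preserves the spectral symmetries in the method above.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shift_invert_arnoldi(A, sigma, k, seed=0):
    """Plain shift-invert Arnoldi: approximate the eigenvalues of A closest
    to the shift sigma via the dominant Ritz values of (A - sigma*I)^{-1}.

    Illustrative finite-dimensional sketch; no structure preservation.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    lu = lu_factor((A - sigma * np.eye(n)).astype(complex))
    Q = np.zeros((n, k + 1), dtype=complex)
    H = np.zeros((k + 1, k), dtype=complex)
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        w = lu_solve(lu, Q[:, j])             # apply (A - sigma*I)^{-1}
        for i in range(j + 1):                # modified Gram-Schmidt
            H[i, j] = np.vdot(Q[:, i], w)
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0:                  # breakdown: invariant subspace found
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    theta = np.linalg.eigvals(H[:k, :k])      # Ritz values of the inverted operator
    return sigma + 1.0 / theta                # approximate eigenvalues of A near sigma
```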
We consider the problem of maximizing the minimum (weighted) value of all components of a vector over a polymatroid. This is a special case of the lexicographically optimal base problem introduced and solved by Fujishige. We give an alternative formulation of the problem as a zero-sum game between a maximizing player whose mixed strategy set is the base of the polymatroid and a minimizing player whose mixed strategy set is a simplex. We show that this game and three variations of it unify several problems in search, sequential testing and queuing. We give a new, short derivation of optimal strategies for both players and an expression for the value of the game. Furthermore, we give a characterization of the set of optimal strategies for the minimizing player and we consider special cases for which optimal strategies can be found particularly easily.
We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). In this setting, the agent works in two phases. In the exploration phase, the agent interacts with the environment and collects samples without observing any reward. In the planning phase, the agent is given a specific reward function and uses the samples collected in the exploration phase to learn a good policy. We propose a new provably efficient algorithm, called UCRL-RFE, under the linear mixture MDP assumption, where the transition probability kernel of the MDP can be parameterized by a linear function over certain feature mappings defined on the triplet of state, action, and next state. We show that to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\tilde O(H^5d^2\epsilon^{-2})$ episodes during the exploration phase, where $H$ is the length of the episode and $d$ is the dimension of the feature mapping. We also propose a variant of UCRL-RFE using a Bernstein-type bonus and show that it needs to sample at most $\tilde O(H^4d(H + d)\epsilon^{-2})$ episodes to achieve an $\epsilon$-optimal policy. By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy. Our upper bound matches the lower bound in terms of the dependence on $\epsilon$, and also in terms of the dependence on $d$ when $H \ge d$.
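To make the linear mixture assumption concrete, the following toy construction (purely illustrative, not from the paper) stacks $d$ basis transition kernels into the feature map, so that $P(s' \mid s, a) = \langle \theta^*, \phi(s, a, s') \rangle$ is a valid transition kernel for any convex weight vector $\theta^*$.

```python
import numpy as np

# Toy linear mixture MDP: P(s' | s, a) = <theta_star, phi(s, a, s')>.
# Here phi mixes d "basis" transition kernels P_1, ..., P_d, so any convex
# combination theta_star yields a valid kernel. Illustrative construction;
# the feature maps in the paper need not have this special form.
S, A, d = 4, 2, 3
rng = np.random.default_rng(0)

basis = rng.random((d, S, A, S))
basis /= basis.sum(axis=-1, keepdims=True)     # each P_k(. | s, a) is a distribution

theta_star = rng.random(d)
theta_star /= theta_star.sum()                 # convex weights => valid kernel

def phi(s, a, s_next):
    """Feature map: phi(s, a, s')_k = P_k(s' | s, a)."""
    return basis[:, s, a, s_next]

def transition_prob(s, a, s_next):
    return theta_star @ phi(s, a, s_next)

# sanity check: probabilities sum to one for every (s, a)
assert np.allclose(
    [[sum(transition_prob(s, a, sn) for sn in range(S)) for a in range(A)]
     for s in range(S)], 1.0)
```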
Generalized approximate message passing (GAMP) is a promising technique for unknown signal reconstruction in generalized linear models (GLMs). However, it requires the transformation matrix to have independent and identically distributed (IID) entries. In this context, generalized vector AMP (GVAMP) was proposed for general unitarily-invariant transformation matrices, but it involves a high-complexity matrix inverse. To address these issues, we propose a universal generalized memory AMP (GMAMP) framework that includes the existing orthogonal AMP/VAMP, GVAMP, and MAMP as special instances. Because the local processors all have memory, GMAMP requires stricter orthogonality to guarantee asymptotic IID Gaussianity and state evolution. To satisfy such orthogonality, local orthogonal memory estimators are established. The GMAMP framework provides a new principle for building advanced AMP-type algorithms. As an example, we construct a Bayes-optimal GMAMP (BO-GMAMP), which uses a low-complexity memory linear estimator to suppress the linear interference, so that its complexity is comparable to that of GAMP. Furthermore, we prove that for unitarily-invariant transformation matrices, BO-GMAMP achieves the replica minimum (i.e., Bayes-optimal) MSE if it has a unique fixed point.
In this paper, we propose a monotone approximation scheme for a class of fully nonlinear partial integro-differential equations (PIDEs) which characterize nonlinear $\alpha$-stable L\'{e}vy processes under a sublinear expectation space with $\alpha \in(1,2)$. Two main results are obtained: (i) error bounds for the monotone approximation scheme of the nonlinear PIDEs, and (ii) convergence rates for a generalized central limit theorem of Bayraktar-Munk for $\alpha$-stable random variables under sublinear expectation. Our proofs use and extend techniques introduced by Krylov and Barles-Jakobsen.
We consider a new functional inequality controlling the rate of relative entropy decay for random walks, the interchange process, and more general block-type dynamics for permutations. The inequality lies between the classical logarithmic Sobolev inequality and the modified logarithmic Sobolev inequality, roughly interpolating between the two as the size of the blocks grows. Our results suggest that the new inequality may have some advantages with respect to the latter two well-known inequalities when multi-particle processes are considered. We prove a strong form of tensorization for independent particles interacting through synchronous updates. Moreover, for block dynamics on permutations we compute the optimal constants in all mean field settings, namely whenever the rate of update of a block depends only on the size of the block. Along the way, we establish that the spectral gap is independent of the number of particles for these mean field processes. As an application of our entropy inequalities, we prove a new subadditivity estimate for permutations, which implies a sharp upper bound on the permanent of arbitrary matrices with nonnegative entries, thus resolving a well-known conjecture.
Motivated by the serious problem that hospitals in rural areas suffer from a shortage of residents, we study the Hospitals/Residents model in which hospitals are associated with lower quotas and the objective is to satisfy them as much as possible. When preference lists are strict, the number of residents assigned to each hospital is the same in any stable matching due to the well-known rural hospitals theorem; thus there is no room for algorithmic interventions. However, when ties are introduced in preference lists, this is not the case because the number of assigned residents may vary over stable matchings. In this paper, we formulate an optimization problem that asks to find a stable matching with the maximum total satisfaction ratio for lower quotas. We first investigate how the total satisfaction ratio varies over choices of stable matchings in four natural scenarios, and we provide exact values of the maximum gaps in all four scenarios. Subsequently, we propose a strategy-proof approximation algorithm for our problem; in one scenario it solves the problem optimally, and in the other three scenarios, which are NP-hard, it yields a better approximation factor than a naive tie-breaking method. Finally, we show inapproximability results for the three NP-hard scenarios.
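For concreteness, the sketch below implements only the naive tie-breaking baseline mentioned above: break all ties arbitrarily, run resident-proposing deferred acceptance on the resulting strict lists, and evaluate the total satisfaction ratio $\sum_h \min(|M(h)|, \ell(h))/\ell(h)$. The input dictionaries are hypothetical, and this is not the strategy-proof approximation algorithm of the paper.

```python
from collections import deque

def deferred_acceptance(res_pref, hosp_pref, upper_quota):
    """Resident-proposing deferred acceptance for strict preference lists.

    res_pref[r]  : list of hospitals, most preferred first
    hosp_pref[h] : list of residents, most preferred first
    Naive baseline for the tied setting: break all ties arbitrarily first,
    then run this procedure (sketch; not the paper's approximation algorithm).
    """
    rank = {h: {r: i for i, r in enumerate(prefs)} for h, prefs in hosp_pref.items()}
    assigned = {h: [] for h in hosp_pref}
    next_choice = {r: 0 for r in res_pref}
    free = deque(res_pref)                   # residents currently unassigned
    while free:
        r = free.popleft()
        if next_choice[r] >= len(res_pref[r]):
            continue                         # r has exhausted their list
        h = res_pref[r][next_choice[r]]
        next_choice[r] += 1
        assigned[h].append(r)
        assigned[h].sort(key=lambda x: rank[h][x])
        if len(assigned[h]) > upper_quota[h]:
            free.append(assigned[h].pop())   # reject the worst resident over quota
    return assigned

def satisfaction_ratio(assigned, lower_quota):
    """Total satisfaction ratio: sum_h min(|M(h)|, l(h)) / l(h)."""
    return sum(min(len(assigned[h]), lower_quota[h]) / lower_quota[h]
               for h in lower_quota if lower_quota[h] > 0)
```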
Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, and that both types of variance must be mitigated to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and better generalization than existing methods.
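The two variance sources can already be seen in a toy one-layer linear "GCN" with squared loss, sketched below with hypothetical dimensions: a mini-batch gradient with exact neighbor aggregation incurs only stochastic gradient variance, while sampling neighbors additionally incurs embedding approximation variance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, f, batch, k = 200, 8, 32, 5                   # nodes, features, batch size, sampled neighbors

A = (rng.random((N, N)) < 0.05).astype(float)
A /= np.maximum(A.sum(1, keepdims=True), 1.0)    # row-normalized aggregation matrix
X = rng.standard_normal((N, f))
Y = rng.standard_normal((N, 1))
W = rng.standard_normal((f, 1))

def grad(nodes, sample_neighbors=False):
    """Gradient of (1/|nodes|) * sum_v ||h_v W - y_v||^2 w.r.t. W,
    where h_v = (A X)_v, optionally replaced by a neighbor-sampled estimate."""
    H = A[nodes] @ X                             # exact aggregated embeddings
    if sample_neighbors:
        H = np.zeros((len(nodes), f))
        for t, v in enumerate(nodes):
            nbrs = np.nonzero(A[v])[0]
            if len(nbrs) == 0:
                continue
            pick = rng.choice(nbrs, size=min(k, len(nbrs)), replace=False)
            # unbiased estimate of the aggregated embedding (A X)_v
            H[t] = (len(nbrs) / len(pick)) * (A[v, pick] @ X[pick])
    r = H @ W - Y[nodes]
    return (2.0 / len(nodes)) * H.T @ r

full = grad(np.arange(N))                               # exact full-batch gradient
mb = grad(rng.choice(N, batch, replace=False))          # stochastic gradient variance only
both = grad(rng.choice(N, batch, replace=False), True)  # + embedding approximation variance
print(np.linalg.norm(mb - full), np.linalg.norm(both - full))
```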
In order to avoid the curse of dimensionality frequently encountered in Big Data analysis, there has been extensive development of linear and nonlinear dimension reduction techniques in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high-dimensionality problem can be overcome by learning this lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework to perform operations directly on the high-dimensional data in a computationally efficient manner. In this way, the preparatory step of dimension reduction, which induces distortions to the data, can be avoided altogether.
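As a rough, hypothetical illustration of the projection step (a one-step, locally weighted degree-one fit rather than the paper's iterated degree-$m$ moving least-squares), a noisy query point can be projected onto a local $d$-dimensional affine approximation of the data cloud as follows.

```python
import numpy as np

def mls_project(x, data, d, h):
    """Project a noisy point x in R^n toward a d-dimensional manifold
    approximated from the data cloud `data` (N x n).

    Simplified one-step sketch: a locally weighted PCA (degree-one local fit)
    around x; the paper's method iterates a moving local coordinate system and
    fits degree-m polynomials to obtain O(h^{m+1}) accuracy.
    """
    diffs = data - x
    w = np.exp(-np.sum(diffs**2, axis=1) / h**2)     # smooth, rapidly decaying weights
    mu = (w[:, None] * data).sum(axis=0) / w.sum()   # weighted local mean
    C = (w[:, None] * (data - mu)).T @ (data - mu) / w.sum()
    eigvals, eigvecs = np.linalg.eigh(C)
    T = eigvecs[:, -d:]                              # local d-dimensional tangent basis
    return mu + T @ (T.T @ (x - mu))                 # projection onto the local affine fit
```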
We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds when the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model, in which query access comes in the form of $\langle X_i, A\rangle := \mathrm{tr}(X_i^\top A)$; perhaps surprisingly, these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
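For orientation only, the sketch below implements the simpler submatrix-reading tester in the spirit of the earlier $O(d^2/\epsilon^2)$-query approach, not the structured non-adaptive row/column pattern of this paper; the constant and the use of real arithmetic (rather than a general field $F$) are illustrative assumptions.

```python
import numpy as np

def low_rank_tester(query, n, d, eps, c=4, seed=None):
    """One-sided tester: accept if A has rank <= d; reject (with good
    probability) if A is eps-far from having rank <= d.

    Sketch of the simpler submatrix-based tester: read a random t x t
    submatrix with t = c * d / eps and check its rank. The paper's improved
    algorithm instead reads a structured non-adaptive pattern of entries in
    rows and columns. `query(i, j)` returns entry A[i, j].
    """
    rng = np.random.default_rng(seed)
    t = min(n, int(np.ceil(c * d / eps)))
    rows = rng.choice(n, size=t, replace=False)
    cols = rng.choice(n, size=t, replace=False)
    sub = np.array([[query(i, j) for j in cols] for i in rows], dtype=float)
    return np.linalg.matrix_rank(sub) <= d        # accept iff the witness has low rank
```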