We present simple randomized and exchangeable improvements of Markov's inequality, as well as Chebyshev's inequality and Chernoff bounds. Our variants are never worse and typically strictly more powerful than the original inequalities. The proofs are short and elementary, and can easily yield similarly randomized or exchangeable versions of a host of other inequalities that employ Markov's inequality as an intermediate step. We point out some simple statistical applications involving tests that combine dependent e-values. In particular, we uniformly improve the power of universal inference, and obtain tighter betting-based nonparametric confidence intervals. Simulations reveal nontrivial gains in power (and no losses) in a variety of settings.
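As a quick illustration of the flavor of such randomized improvements (our own Monte Carlo sketch, not code from the paper): for a nonnegative random variable $X$ and an independent $U \sim \mathrm{Unif}(0,1)$, the event $\{X \ge Ua\}$ contains $\{X \ge a\}$, yet its probability is still at most $\mathbb{E}[X]/a$, since $P(X \ge Ua) = \mathbb{E}[\min(X/a, 1)] \le \mathbb{E}[X]/a$.

```python
import numpy as np

# Monte Carlo sanity check of a randomized Markov-type bound (illustration only):
# for nonnegative X and independent U ~ Uniform(0, 1),
#   P(X >= U * a) <= E[X] / a,
# and {X >= U * a} contains {X >= a}, so the randomized event occurs at least as often.
rng = np.random.default_rng(0)
n, a = 10**6, 5.0
X = rng.exponential(scale=1.0, size=n)   # E[X] = 1
U = rng.uniform(size=n)                  # independent randomization

plain      = np.mean(X >= a)             # classical Markov event
randomized = np.mean(X >= U * a)         # randomized event
bound      = X.mean() / a                # Markov bound E[X]/a

print(f"P(X >= a)   ~ {plain:.4f}")
print(f"P(X >= U a) ~ {randomized:.4f}   (still <= {bound:.4f})")
```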
Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low-dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.
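For context, a minimal sketch of a generic Gaussianization layer (a random rotation followed by marginal Gaussianization; the empirical-CDF estimator below is an illustrative assumption, not the paper's implementation):

```python
import numpy as np
from scipy import stats

def gaussianization_layer(X, rng):
    """One generic Gaussianization layer: random rotation + marginal Gaussianization.

    Illustrative sketch; the paper's exact parameterization may differ.
    """
    d = X.shape[1]
    # Random orthogonal rotation to mix dimensions.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    Z = X @ Q
    # Push each marginal to N(0, 1) via the empirical CDF.
    n = Z.shape[0]
    ranks = np.argsort(np.argsort(Z, axis=0), axis=0) + 1
    U = ranks / (n + 1)              # empirical CDF values in (0, 1)
    return stats.norm.ppf(U)         # elementwise inverse normal CDF

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2)) @ np.array([[1.0, 0.9], [0.0, 0.5]])  # correlated input
for _ in range(10):                  # stack layers; convergence slows as dimension grows
    X = gaussianization_layer(X, rng)
print(np.round(np.cov(X, rowvar=False), 2))   # should approach the identity
```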
This paper presents a novel generic asymptotic expansion formula for expectations of multidimensional Wiener functionals through a Malliavin calculus technique. A uniform estimate of the asymptotic expansion is shown under a weaker condition on the Malliavin covariance matrix of the target Wiener functional. In particular, the method provides a tractable expansion for the expectation of an irregular functional of the solution to a multidimensional rough differential equation driven by fractional Brownian motion with Hurst index $H<1/2$, without using complicated fractional integral calculus for the singular kernel. In a numerical experiment, our expansion gives a much better approximation of a probability distribution function than its normal approximation, which demonstrates the validity of the proposed method.
We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments -- adapted to this scaling regime and with arm selection probabilities that vary continuously with state -- converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive refined, instance-specific characterizations of the stochastic dynamics, and to obtain several insights into the regret and belief evolution of a number of sequential experiments, including Thompson sampling (but not UCB, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from suboptimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.
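To make the scaling regime concrete, here is a small illustrative simulation (a two-armed Gaussian Thompson sampling setup of our own choosing, not the paper's experiments) in which the reward gap shrinks as $1/\sqrt{n}$ and the arm-selection probabilities vary continuously with the posterior state:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
delta = 2.0 / np.sqrt(n)            # weak-signal scaling: gap of order 1/sqrt(n)
mu = np.array([0.0, delta])         # two arms, unit-variance Gaussian rewards

# Gaussian Thompson sampling with a near-uninformative N(0, 1/tau0) prior
# (illustrative choice of prior variance).
post_mean = np.zeros(2)
post_prec = np.full(2, 1e-6)        # prior precision tau0 per arm
regret = 0.0
for t in range(n):
    theta = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
    a = int(np.argmax(theta))       # selection probability is continuous in the posterior state
    r = rng.normal(mu[a], 1.0)
    post_prec[a] += 1.0             # exact conjugate update for unit-variance rewards
    post_mean[a] += (r - post_mean[a]) / post_prec[a]
    regret += mu.max() - mu[a]

print(f"gap = {delta:.4f}, cumulative regret ~ {regret:.1f} (at most n*gap = {n*delta:.0f})")
```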
In this work, we present a generic step-size choice for ADMM-type proximal algorithms. It admits a closed-form expression and is theoretically optimal with respect to a worst-case convergence rate bound. It is simply given by the ratio of the Euclidean norms of the dual and primal solutions, i.e., $ ||{\lambda}^\star|| / ||{x}^\star||$. Numerical tests show that its practical performance is near-optimal in general. The only challenge is that such a ratio is not known a priori, and we provide two strategies to address this. The derivation of our step-size choice is based on studying the fixed-point structure of ADMM via the proximal operator. However, we demonstrate that the classical definition of the proximal operator contains an input-scaling issue, which leads to a scaled step-size optimization problem that would yield a false solution. This issue is naturally avoided by our proposed new definition of the proximal operator, and a series of its properties is established.
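A toy sketch of the step-size choice (illustrative only; the problem, the splitting, and the use of a closed-form solution to compute the ratio are our own simplifications, since in practice the ratio must be estimated, as the paper discusses):

```python
import numpy as np

# Toy problem: min_x 0.5*||x - b||^2 + gamma*||x||_1, split as f(x) + g(z) with x = z.
# The closed-form solution is used here only to form the step-size ratio
# ||lambda*|| / ||x*||; this is not how the ratio would be obtained in practice.
rng = np.random.default_rng(0)
b, gamma = rng.standard_normal(50), 0.3
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x_star = soft(b, gamma)                  # exact primal solution of the toy problem
lam_star = b - x_star                    # dual solution from stationarity of f
rho = np.linalg.norm(lam_star) / np.linalg.norm(x_star)   # proposed step size

x, z, u = np.zeros_like(b), np.zeros_like(b), np.zeros_like(b)   # scaled-dual ADMM
for k in range(200):
    x = (b + rho * (z - u)) / (1.0 + rho)    # x-update (prox of the quadratic)
    z = soft(x + u, gamma / rho)             # z-update (soft-thresholding prox)
    u = u + x - z                            # scaled dual update
    if np.linalg.norm(x - x_star) < 1e-10:
        break
print(f"rho = {rho:.3f}, iterations = {k+1}, final error = {np.linalg.norm(x - x_star):.2e}")
```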
Deep Metric Learning (DML) models rely on strong representations and similarity-based measures with specific loss functions. Proxy-based losses have shown great performance compared to pair-based losses in terms of convergence speed. However, proxies assigned to different classes may end up being closely located in the embedding space, making it hard to distinguish between positive and negative items. Alternatively, they may become highly correlated and hence provide redundant information to the model. To address these issues, we propose a novel approach that introduces a Soft Orthogonality (SO) constraint on the proxies. The constraint encourages the proxies to be as orthogonal as possible and hence controls their positions in the embedding space. Our approach leverages the Data-Efficient Image Transformer (DeiT) as an encoder to extract contextual features from images, together with a DML objective. The objective combines the Proxy Anchor loss with the SO regularization. We evaluate our method on four public benchmarks for category-level image retrieval and demonstrate its effectiveness with comprehensive experimental results and ablation studies. Our evaluations demonstrate the superiority of the proposed approach over state-of-the-art methods by a significant margin.
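A minimal sketch of what a soft-orthogonality penalty on proxies could look like (our own illustrative PyTorch formulation; the paper's exact SO term and its weighting may differ):

```python
import torch
import torch.nn.functional as F

def soft_orthogonality_penalty(proxies: torch.Tensor) -> torch.Tensor:
    """Illustrative SO-style regularizer: push class proxies toward orthogonality.

    proxies: (num_classes, embed_dim) learnable proxy matrix.
    Penalizes off-diagonal entries of the Gram matrix of L2-normalized proxies;
    the paper's exact formulation may differ.
    """
    P = F.normalize(proxies, dim=1)                      # unit-norm proxies
    gram = P @ P.t()                                     # pairwise cosine similarities
    off_diag = gram - torch.eye(gram.size(0), device=gram.device)
    return off_diag.pow(2).sum() / (gram.size(0) * (gram.size(0) - 1))

# Hypothetical usage inside a training step:
#   loss = proxy_anchor_loss(embeddings, labels, proxies) \
#        + so_weight * soft_orthogonality_penalty(proxies)
proxies = torch.randn(100, 384, requires_grad=True)     # e.g. 100 classes, embedding dim 384
print(soft_orthogonality_penalty(proxies).item())
```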
This survey explores modern approaches for computing low-rank approximations of high-dimensional matrices by means of the randomized SVD, randomized subspace iteration, and randomized block Krylov iteration. The paper compares the procedures via theoretical analyses and numerical studies to highlight how the best choice of algorithm depends on spectral properties of the matrix and the computational resources available. Despite superior performance for many problems, randomized block Krylov iteration has not been widely adopted in computational science. This paper strengthens the case for this method in three ways. First, it presents new pseudocode that can significantly reduce computational costs. Second, it provides a new analysis that yields simple, precise, and informative error bounds. Last, it showcases applications to challenging scientific problems, including principal component analysis for genetic data and spectral clustering for molecular dynamics data.
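For reference, a bare-bones randomized block Krylov low-rank approximation in the style surveyed here (a textbook-level sketch; the paper's new pseudocode differs in details aimed at reducing computational cost):

```python
import numpy as np

def randomized_block_krylov_svd(A, k, q=3, rng=None):
    """Basic randomized block Krylov low-rank approximation (illustrative sketch;
    practical implementations orthogonalize between blocks for stability)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    Omega = rng.standard_normal((n, k))     # random test matrix
    blocks, Y = [], A @ Omega
    for _ in range(q + 1):
        blocks.append(Y)
        Y = A @ (A.T @ Y)                   # apply (A A^T) to extend the Krylov space
    Q, _ = np.linalg.qr(np.hstack(blocks))  # orthonormal basis of the block Krylov space
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Example: rank-10 approximation of a matrix with decaying spectrum.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500)) * (np.arange(1, 501) ** -1.0)
U, s, Vt = randomized_block_krylov_svd(A, k=10, rng=rng)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))   # relative approximation error
```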
Smooth Csisz\'ar $f$-divergences can be expressed as integrals over so-called hockey stick divergences. This motivates a natural quantum generalization in terms of quantum hockey stick divergences, which we explore here. Using this recipe, the Kullback-Leibler divergence generalises to the Umegaki relative entropy, in the integral form recently found by Frenkel. We find that the R\'enyi divergences defined via our new quantum $f$-divergences are not additive in general, but that their regularisations surprisingly yield the Petz R\'enyi divergence for $\alpha < 1$ and the sandwiched R\'enyi divergence for $\alpha > 1$, unifying these two important families of quantum R\'enyi divergences. Moreover, we find that the contraction coefficients for the new quantum $f$-divergences collapse for all $f$ that are operator convex, mimicking the classical behaviour and resolving some long-standing conjectures by Lesniewski and Ruskai. We derive various inequalities, including new reverse Pinsker inequalities with applications in differential privacy, and also explore various other applications of the new divergences.
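For reference, the classical and quantum hockey stick divergences underlying this construction are given by the standard definitions (not new material from the paper):
$$ E_\gamma(P\|Q) = \sum_x \big(P(x) - \gamma Q(x)\big)_+, \qquad E_\gamma(\rho\|\sigma) = \mathrm{Tr}\,\big(\rho - \gamma\sigma\big)_+, $$
where $(\cdot)_+$ denotes the positive part.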
In this paper, we generalize log-Sobolev inequalities to R\'enyi--Sobolev inequalities by replacing the entropy with the two-parameter entropy, a generalized version of entropy closely related to R\'enyi divergences. We derive the sharp nonlinear dimension-free version of this kind of inequality. Interestingly, the resulting inequalities exhibit a phase transition depending on the parameters. We then connect R\'enyi--Sobolev inequalities to spectral graph theory. Our proofs in this paper are based on the information-theoretic characterization of the R\'enyi--Sobolev inequalities, as well as the method of types.
Matrix diagonalization is a cornerstone of numerous fields of scientific computing. Diagonalizing a matrix to solve an eigenvalue problem requires a sequential path of iterations that eventually reaches a sufficiently converged and accurate solution for all the eigenvalues and eigenvectors. This typically translates into a high computational cost. Here we demonstrate how reinforcement learning, using the AlphaZero framework, can accelerate Jacobi matrix diagonalizations by viewing the selection of the fastest path to solution as a board game. To demonstrate the viability of our approach, we apply the Jacobi diagonalization algorithm to symmetric Hamiltonian matrices that appear in quantum chemistry calculations. We find that a significant acceleration can often be achieved. Our findings highlight the opportunity to use machine learning as a promising tool to improve the performance of numerical linear algebra.
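For reference, the classical pivot-selection loop that a learned policy could replace (a textbook Jacobi iteration with a greedy pivot rule; illustrative only, not the paper's AlphaZero implementation):

```python
import numpy as np

def jacobi_diagonalize(A, tol=1e-10, max_rotations=10_000):
    """Classical Jacobi eigenvalue iteration with greedy pivot selection.

    A learned policy would replace the pivot-selection rule; this is only
    the textbook baseline for symmetric matrices."""
    A = A.astype(float).copy()
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(max_rotations):
        # Greedy pivot: largest off-diagonal entry in magnitude.
        off = np.abs(A - np.diag(np.diag(A)))
        p, q = np.unravel_index(np.argmax(off), A.shape)
        if off[p, q] < tol:
            break
        # Rotation angle that annihilates A[p, q].
        theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(n)
        J[p, p], J[q, q], J[p, q], J[q, p] = c, c, s, -s
        A = J.T @ A @ J
        V = V @ J
    return np.diag(A), V   # eigenvalues and eigenvectors

H = np.array([[2.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 3.0]])  # toy symmetric "Hamiltonian"
evals, evecs = jacobi_diagonalize(H)
print(np.sort(evals), np.sort(np.linalg.eigvalsh(H)))   # should agree
```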
This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects, and hence the estimation error can be substantial. We therefore propose an alternative approach to constructing the estimators so that the error is greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of the classical and proposed estimators in estimating the causal quantities. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction in estimation error is strikingly substantial once the causal effects are correctly accounted for.