We study the equivalence testing problem where the goal is to determine if the given two unknown distributions on $[n]$ are equal or $\epsilon$-far in the total variation distance in the conditional sampling model (CFGM, SICOMP16; CRS, SICOMP15) wherein a tester can get a sample from the distribution conditioned on any subset. Equivalence testing is a central problem in distribution testing, and there has been a plethora of work on this topic in various sampling models. Despite significant efforts over the years, there remains a gap in the current best-known upper bound of $\tilde{O}(\log \log n)$ [FJOPS, COLT 2015] and lower bound of $\Omega(\sqrt{\log \log n})$[ACK, RANDOM 2015, Theory of Computing 2018]. Closing this gap has been repeatedly posed as an open problem (listed as problems 66 and 87 at sublinear.info). In this paper, we completely resolve the query complexity of this problem by showing a lower bound of $\tilde{\Omega}(\log \log n)$. For that purpose, we develop a novel and generic proof technique that enables us to break the $\sqrt{\log \log n}$ barrier, not only for the equivalence testing problem but also for other distribution testing problems, such as uniblock property.
Efficiently solving sparse linear algebraic equations is an important research topic of numerical simulation. Commonly used approaches include direct methods and iterative methods. Compared with the direct methods, the iterative methods have lower computational complexity and memory consumption, and are thus often used to solve large-scale sparse linear equations. However, there are numerous iterative methods, parameters and components needed to be carefully chosen, and an inappropriate combination may eventually lead to an inefficient solution process in practice. With the development of deep learning, intelligent iterative methods become popular in these years, which can intelligently make a sufficiently good combination, optimize the parameters and components in accordance with the properties of the input matrix. This survey then reviews these intelligent iterative methods. To be clearer, we shall divide our discussion into three aspects: a method aspect, a component aspect and a parameter aspect. Moreover, we summarize the existing work and propose potential research directions that may deserve a deep investigation.
We consider a voting model, where a number of candidates need to be selected subject to certain feasibility constraints. The model generalises committee elections (where there is a single constraint on the number of candidates that need to be selected), various elections with diversity constraints, the model of public decisions (where decisions needs to be taken on a number of independent issues), and the model of collective scheduling. A critical property of voting is that it should be fair -- not only to individuals but also to groups of voters with similar opinions on the subject of the vote; in other words, the outcome of an election should proportionally reflect the voters' preferences. We formulate axioms of proportionality in this general model. Our axioms do not require predefining groups of voters; to the contrary, we ensure that the opinion of every subset of voters whose preferences are cohesive-enough are taken into account to the extent that is proportional to the size of the subset. Our axioms generalise the strongest known satisfiable axioms for the more specific models. We explain how to adapt two prominent committee election rules, Proportional Approval Voting (PAV) and Phragm\'{e}n Sequential Rule, as well as the concept of stable-priceability to our general model. The two rules satisfy our proportionality axioms if and only if the feasibility constraints are matroids.
We study probability density functions that are log-concave. Despite the space of all such densities being infinite-dimensional, the maximum likelihood estimate is the exponential of a piecewise linear function determined by finitely many quantities, namely the function values, or heights, at the data points. We explore in what sense exact solutions to this problem are possible. First, we show that the heights given by the maximum likelihood estimate are generically transcendental. For a cell in one dimension, the maximum likelihood estimator is expressed in closed form using the generalized W-Lambert function. Even more, we show that finding the log-concave maximum likelihood estimate is equivalent to solving a collection of polynomial-exponential systems of a special form. Even in the case of two equations, very little is known about solutions to these systems. As an alternative, we use Smale's alpha-theory to refine approximate numerical solutions and to certify solutions to log-concave density estimation.
Graph diffusion equations are intimately related to graph neural networks (GNNs) and have recently attracted attention as a principled framework for analyzing GNN dynamics, formalizing their expressive power, and justifying architectural choices. One key open questions in graph learning is the generalization capabilities of GNNs. A major limitation of current approaches hinges on the assumption that the graph topologies in the training and test sets come from the same distribution. In this paper, we make steps towards understanding the generalization of GNNs by exploring how graph diffusion equations extrapolate and generalize in the presence of varying graph topologies. We first show deficiencies in the generalization capability of existing models built upon local diffusion on graphs, stemming from the exponential sensitivity to topology variation. Our subsequent analysis reveals the promise of non-local diffusion, which advocates for feature propagation over fully-connected latent graphs, under the assumption of a specific data-generating condition. In addition to these findings, we propose a novel graph encoder backbone, Advective Diffusion Transformer (ADiT), inspired by advective graph diffusion equations that have a closed-form solution backed up with theoretical guarantees of desired generalization under topological distribution shifts. The new model, functioning as a versatile graph Transformer, demonstrates superior performance across a wide range of graph learning tasks.
We describe a three precision variant of Newton's method for nonlinear equations. We evaluate the nonlinear residual in double precision, store the Jacobian matrix in single precision, and solve the equation for the Newton step with iterative refinement with a factorization in half precision. We analyze the method as an inexact Newton method. This analysis shows that, except for very poorly conditioned Jacobians, the number of nonlinear iterations needed is the same that one would get if one stored and factored the Jacobian in double precision. In many ill-conditioned cases one can use the low precision factorization as a preconditioner for a GMRES iteration. That approach can recover fast convergence of the nonlinear iteration. We present an example to illustrate the results.
This work considers charged systems described by the modified Poisson--Nernst--Planck (PNP) equations, which incorporate ionic steric effects and the Born solvation energy for dielectric inhomogeneity. Solving the steady-state modified PNP equations poses numerical challenges due to the emergence of sharp boundary layers caused by small Debye lengths, particularly when local ionic concentrations reach saturation. To address this, we first reformulate the steady-state problem as a constraint optimization, where the ionic concentrations on unstructured Delaunay nodes are treated as fractional particles moving along edges between nodes. The electric fields are then updated to minimize the objective free energy while satisfying the discrete Gauss's law. We develop a local relaxation method on unstructured meshes that inherently respects the discrete Gauss's law, ensuring curl-free electric fields. Numerical analysis demonstrates that the optimal mass of the moving fractional particles guarantees the positivity of both ionic and solvent concentrations. Additionally, the free energy of the charged system consistently decreases during successive updates of ionic concentrations and electric fields. We conduct numerical tests to validate the expected numerical accuracy, positivity, free-energy dissipation, and robustness of our method in simulating charged systems with sharp boundary layers.
Quantum multiplication is a fundamental operation in quantum computing. Most existing quantum multipliers require $O(n)$ qubits to multiply two $n$-bit integer numbers, limiting their applicability to multiply large integer numbers using near-term quantum computers. In this paper, we propose the Quantum Multiplier Based on Exponent Adder (QMbead), a new approach that addresses this limitation by requiring just $\log_2(n)$ qubits to multiply two $n$-bit integer numbers. QMbead uses a so-called exponent encoding to represent two multiplicands as two superposition states, respectively, and then employs a quantum adder to obtain the sum of these two superposition states, and subsequently measures the outputs of the quantum adder to calculate the product of the multiplicands. This paper presents two types of quantum adders based on the quantum Fourier transform (QFT) for use in QMbead. The circuit depth of QMbead is determined by the chosen quantum adder, being $O(\log^2 n)$ when using the two QFT-based adders. If leveraging a logarithmic-depth quantum adder, the time complexity of QMbead is $O(n \log n)$, identical to that of the fastest classical multiplication algorithm, Harvey-Hoeven algorithm. Interestingly, QMbead maintains an advantage over the Harvey-Hoeven algorithm, given that the latter is only suitable for excessively large numbers, whereas QMbead is valid for both small and large numbers. The multiplicand can be either an integer or a decimal number. QMbead has been successfully implemented on quantum simulators to compute products with a bit length of up to 273 bits using only 17 qubits. This establishes QMbead as an efficient solution for multiplying large integer or decimal numbers with many bits.
Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, become increasingly popular for sequential decision making in reinforcement learning, games, and control. Guaranteeing the global optimality of policy gradient methods, however, is highly nontrivial due to nonconcavity of the value functions. In this exposition, we highlight recent progresses in understanding and developing policy gradient methods with global convergence guarantees, putting an emphasis on their finite-time convergence rates with regard to salient problem parameters.
We make an experimental comparison of methods for computing the numerical radius of an $n\times n$ complex matrix, based on two well-known characterizations, the first a nonconvex optimization problem in one real variable and the second a convex optimization problem in $n^{2}+1$ real variables. We make comparisons with respect to both accuracy and computation time using publicly available software.
We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here we correct this issue and develop a novel procedure with provably small regret over all local time intervals of a given width. We achieve this by modifying the adaptive conformal inference (ACI) algorithm of Gibbs and Cand\`{e}s (2021) to contain an additional step in which the step-size parameter of ACI's gradient descent update is tuned over time. Crucially, this means that unlike ACI, which requires knowledge of the rate of change of the data-generating mechanism, our new procedure is adaptive to both the size and type of the distribution shift. Our methods are highly flexible and can be used in combination with any baseline predictive algorithm that produces point estimates or estimated quantiles of the target without the need for distributional assumptions. We test our techniques on two real-world datasets aimed at predicting stock market volatility and COVID-19 case counts and find that they are robust and adaptive to real-world distribution shifts.