Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it challenging to distinguish sentences with significant overlaps of similar words, even if they are semantically very different. Here, we attempt to improve WMD by incorporating the sentence structure represented by BERT's self-attention matrix (SAM). The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embedding and the SAM for calculating the optimal transport between two sentences. Experiments demonstrate the proposed method enhances WMD and its variants in paraphrase identification with near-equivalent performance in semantic textual similarity. Our code is available at \url{//github.com/ymgw55/WSMD}.
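The word-order limitation of WMD can be illustrated with a minimal sketch. It assumes equal-length sentences with uniform word weights, in which case an optimal transport plan can be taken to be a one-to-one matching, so a brute-force search over permutations is exact for tiny inputs. The 2-D embeddings and the function name `wmd_uniform` are hypothetical toy choices, not taken from the paper or its repository, and the sketch does not implement the proposed Fused Gromov-Wasserstein method.

```python
import itertools
import math

# Hypothetical 2-D toy embeddings (illustrative only, not real word vectors).
EMB = {
    "dog": (1.0, 0.0),
    "bites": (0.0, 1.0),
    "man": (-1.0, 0.0),
    "puppy": (0.9, 0.1),
}


def wmd_uniform(sent_a, sent_b):
    """Exact WMD for equal-length sentences with uniform word weights.

    With uniform weights 1/n on both sides, some optimal transport plan is a
    permutation, so we can minimise over all one-to-one matchings.
    """
    assert len(sent_a) == len(sent_b)
    n = len(sent_a)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Total Euclidean cost of matching word i of sent_a to word perm[i] of sent_b.
        cost = sum(
            math.dist(EMB[sent_a[i]], EMB[sent_b[j]]) for i, j in enumerate(perm)
        )
        best = min(best, cost / n)
    return best


a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]    # same words, opposite meaning
c = ["puppy", "bites", "man"]  # near paraphrase of a

d_ab = wmd_uniform(a, b)  # 0.0: the reordered sentence is indistinguishable from a
d_ac = wmd_uniform(a, c)  # strictly positive, although c is closer in meaning
```

In this toy setting the reordered sentence is at distance zero while the near paraphrase is not, which is the failure mode that a structure-aware term is meant to address.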
Functions with singularities are notoriously difficult to approximate with conventional approximation schemes. In computational applications they are often resolved with low-order piecewise polynomials, multilevel schemes or other types of grading strategies. Rational functions are an exception to this rule: for univariate functions with point singularities, such as branch points, rational approximations exist with root-exponential convergence in the rational degree. This is typically enabled by the clustering of poles near the singularity. Both the theory and computational practice of rational functions for function approximation have focused on the univariate case, with extensions to two dimensions via identification with the complex plane. Multivariate rational functions, i.e., quotients of polynomials of several variables, are relatively unexplored in comparison. Yet, apart from a steep increase in theoretical complexity, they also offer a wealth of opportunities. A first observation is that singularities of multivariate rational functions may be continuous curves of poles, rather than isolated ones. By generalizing the clustering of poles from points to curves, we explore constructions of multivariate rational approximations to functions with curves of singularities.
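For reference, root-exponential convergence in the rational degree can be written as below. The notation is an assumption made here for illustration: $r_n$ denotes a rational approximant of degree $n$ to $f$, and $C, c > 0$ are constants depending on $f$ and the domain. The classical univariate example is $f(x) = |x|$ on $[-1, 1]$.

```latex
\[
  \| f - r_n \|_{\infty} \;\le\; C \exp\!\left(-c \sqrt{n}\right),
  \qquad C, c > 0 .
\]
```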
This study investigates the misclassification excess risk bound in the context of 1-bit matrix completion, a significant problem in machine learning involving the recovery of an unknown matrix from a limited subset of its entries. Matrix completion has garnered considerable attention in the last two decades due to its diverse applications across various fields. Unlike conventional approaches that deal with real-valued samples, 1-bit matrix completion is concerned with binary observations. While prior research has predominantly focused on the estimation error of proposed estimators, our study shifts attention to the prediction error. This paper offers theoretical analysis regarding the prediction errors of two previous works utilizing the logistic regression model: one employing a max-norm constrained minimization and the other employing nuclear-norm penalization. Significantly, our findings demonstrate that the latter achieves the minimax-optimal rate without the need for an additional logarithmic term. These novel results contribute to a deeper understanding of 1-bit matrix completion by shedding light on the predictive performance of specific methodologies.
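The observation model can be sketched as follows, assuming the logistic link $P(Y_{ij} = 1) = 1 / (1 + e^{-M_{ij}})$ and entries sampled uniformly at random with replacement. The matrix `M`, the sample size, and the helper names are hypothetical and chosen only for illustration. The excess risk studied in the paper compares a predictor's misclassification risk with that of the best possible classifier, which under this model predicts the sign of the true entry.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def sample_one_bit(M, n_obs, rng):
    """Draw n_obs entries uniformly at random; return (i, j, y) with y in {-1, +1}."""
    rows, cols = len(M), len(M[0])
    obs = []
    for _ in range(n_obs):
        i, j = rng.randrange(rows), rng.randrange(cols)
        # Logistic model: label +1 with probability sigmoid(M[i][j]).
        y = 1 if rng.random() < sigmoid(M[i][j]) else -1
        obs.append((i, j, y))
    return obs


def misclassification_rate(pred_matrix, obs):
    """Fraction of observed labels that disagree with sign(pred_matrix[i][j])."""
    errors = sum(
        1 for i, j, y in obs if (1 if pred_matrix[i][j] >= 0 else -1) != y
    )
    return errors / len(obs)


rng = random.Random(0)
# Hypothetical rank-1 ground truth M = u v^T.
u = [2.0, -1.0, 0.5]
v = [1.0, -2.0, 1.5, -0.5]
M = [[ui * vj for vj in v] for ui in u]

obs = sample_one_bit(M, 200, rng)
# Empirical error of the sign of the true matrix on the sampled labels.
rate = misclassification_rate(M, obs)
```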
Every sufficiently big matrix with small spectral norm has a nearby low-rank matrix if the distance is measured in the maximum norm (Udell \& Townsend, SIAM J Math Data Sci, 2019). We use the Hanson--Wright inequality to improve the estimate of the distance for matrices with incoherent column and row spaces. In numerical experiments with several classes of matrices we study how well the theoretical upper bound describes the approximation errors achieved with the method of alternating projections.
We derive and analyze a symmetric interior penalty discontinuous Galerkin scheme for the approximation of the second-order form of the radiative transfer equation in slab geometry. Using appropriate trace lemmas, the analysis can be carried out as for more standard elliptic problems. Numerical examples confirm the accuracy and stability of the method for different polynomial degrees. For discretization, we employ quad-tree grids, which allow for local refinement in phase-space, and we show by example that adaptive methods can efficiently approximate discontinuous solutions. We investigate the behavior of hierarchical error estimators and error estimators based on local averaging.
This work puts forth low-complexity Riemannian subspace descent algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold. Unlike existing Riemannian gradient descent variants, the proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix. The resulting updates avoid costly matrix operations such as matrix exponentiation and dense matrix multiplication, which are generally required in almost all other Riemannian optimization algorithms on the SPD manifold. We further identify a broad class of functions, arising in diverse applications such as kernel matrix learning, covariance estimation of Gaussian distributions, maximum likelihood parameter estimation of elliptically contoured distributions, and parameter estimation in Gaussian mixture model problems, over which the Riemannian gradients can be calculated efficiently. The proposed uni-directional and multi-directional Riemannian subspace descent variants incur per-iteration complexities of $O(n)$ and $O(n^2)$, respectively, as compared to the $O(n^3)$ or higher complexity incurred by all existing Riemannian gradient descent variants. The superior runtime and low per-iteration complexity of the proposed algorithms are also demonstrated via numerical tests on large-scale covariance estimation and matrix square root problems. A MATLAB implementation is publicly available on GitHub: //github.com/yogeshd-iitk/subspace_descent_over_SPD_manifold
Quantum computing becomes more of a reality as time passes, bringing several cybersecurity challenges. Modern cryptography is based on the computational complexity of specific mathematical problems, but as new quantum-based computers appear, classical methods might not be enough to secure communications. In this paper, we analyse the readiness of the Galileo Open Service Navigation Message Authentication (OSNMA) to face these new threats. The analysis and assessment were performed using OSNMA documentation, reviewing the available Post Quantum Cryptography (PQC) algorithms competing in the National Institute of Standards and Technology (NIST) standardization process, and studying the possibility of their implementation in the Galileo service. The main barrier to adopting the PQC approach is the size of both the signature and the key. The analysis shows that OSNMA is not yet prepared to face the quantum threat, and a significant change would be required. This work concludes by assessing different temporary countermeasures that can be implemented to sustain the system's integrity in the short term.
By incorporating a new matrix splitting and momentum acceleration into the relaxed-based matrix splitting (RMS) method \cite{soso2023}, a generalization of the RMS (GRMS) iterative method for solving the generalized absolute value equations (GAVEs) is proposed. Unlike some existing methods, by using Cauchy's convergence principle, we give some sufficient conditions for the existence and uniqueness of the solution to the GAVEs and prove that our method converges to the unique solution of the GAVEs. Moreover, we obtain a few new and weaker convergence conditions for some existing methods. Preliminary numerical experiments show that the proposed method is efficient.
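As a point of comparison, the following minimal sketch solves a small GAVE with a classical Picard-type fixed-point iteration. It is not the GRMS method. It assumes the common form $Ax - B|x| = b$, which the abstract does not spell out, and takes $A$ diagonal so that no linear solver is needed. The function names and the test data are hypothetical.

```python
def picard_gave(a_diag, B, b, iters=100, tol=1e-12):
    """Fixed-point iteration x <- A^{-1}(B|x| + b) for A x - B|x| = b, A diagonal.

    The map is a contraction when the infinity norm of A^{-1} B is below 1.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        abs_x = [abs(v) for v in x]
        x_new = [
            (b[i] + sum(B[i][j] * abs_x[j] for j in range(n))) / a_diag[i]
            for i in range(n)
        ]
        # Stop once successive iterates agree to within tol.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


def residual(a_diag, B, x, b):
    """Infinity norm of A x - B|x| - b."""
    n = len(b)
    return max(
        abs(a_diag[i] * x[i] - sum(B[i][j] * abs(x[j]) for j in range(n)) - b[i])
        for i in range(n)
    )


# Hypothetical data; here ||A^{-1} B||_inf = 0.375 < 1, so the iteration converges.
a_diag = [4.0, 5.0]
B = [[1.0, 0.5], [0.5, 1.0]]
b = [1.0, -2.0]

x = picard_gave(a_diag, B, b)
res = residual(a_diag, B, x, b)
```

The contraction condition used here is only a simple sufficient one. The abstract reports weaker convergence conditions for the methods it analyzes.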
We describe and analyze a quasi-Trefftz DG method for solving boundary value problems for the homogeneous diffusion-advection-reaction equation with piecewise-smooth coefficients. Trefftz schemes are high-order Galerkin methods whose discrete functions are elementwise exact solutions of the underlying PDE. Trefftz basis functions can be computed for many PDEs that are linear, homogeneous and with piecewise-constant coefficients. However, if the equation has varying coefficients, exact solutions are in general unavailable, hence the construction of discrete Trefftz spaces is impossible. Quasi-Trefftz methods have been introduced to overcome this limitation, relying on discrete spaces of functions that are elementwise "approximate solutions" of the PDE. A space-time quasi-Trefftz DG method for the acoustic wave equation with smoothly varying coefficients has recently been studied; since it has shown excellent results, we propose a related method that can be applied to second-order elliptic equations. The DG weak formulation is derived using an interior penalty parameter and the upwind numerical fluxes. We choose polynomial quasi-Trefftz basis functions, whose coefficients can be computed with a simple algorithm based on the Taylor expansion of the PDE's coefficients. The main advantage of Trefftz and quasi-Trefftz schemes over more classical ones is the higher accuracy for comparable numbers of degrees of freedom. We prove that the dimension of the quasi-Trefftz space is smaller than the dimension of the full polynomial space of the same degree and that it yields the same optimal convergence rates. The quasi-Trefftz DG method is well-posed, consistent and stable, and we prove its high-order convergence. We present some numerical experiments in two dimensions that show excellent properties in terms of approximation and convergence rate.
This paper presents an approach for efficiently approximating the inverse of Fisher information, a key component in variational Bayes inference. A notable aspect of our approach is the avoidance of analytically computing the Fisher information matrix and its explicit inversion. Instead, we introduce an iterative procedure for generating a sequence of matrices that converge to the inverse of Fisher information. The natural gradient variational Bayes algorithm without matrix inversion is provably convergent and achieves a convergence rate of order $O(\log s / s)$, with $s$ the number of iterations. We also obtain a central limit theorem for the iterates. Our algorithm is versatile and applies across a diverse array of variational Bayes domains, including Gaussian approximation and normalizing flow variational Bayes. We offer a range of numerical examples to demonstrate the efficiency and reliability of the proposed variational Bayes method.
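To make the idea of an inversion-free recursion concrete, here is a minimal sketch of one generic technique, the Sherman-Morrison rank-one update, applied to a running sum of outer products of sampled vectors. The abstract does not state which recursion the paper uses, so this should be read as an illustration under that assumption and not as the authors' procedure. The sampling distribution, the identity starting matrix, and all names are hypothetical.

```python
import random


def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def sherman_morrison_update(S_inv, g):
    """Return (S + g g^T)^{-1} given S_inv = S^{-1}, for symmetric S."""
    u = mat_vec(S_inv, g)  # u = S^{-1} g
    denom = 1.0 + sum(gi * ui for gi, ui in zip(g, u))
    n = len(g)
    # For symmetric S, S^{-1} g g^T S^{-1} equals u u^T.
    return [[S_inv[i][j] - u[i] * u[j] / denom for j in range(n)] for i in range(n)]


rng = random.Random(0)
n = 2
S = [[1.0, 0.0], [0.0, 1.0]]      # regularising start S_0 = I (an assumption)
S_inv = [[1.0, 0.0], [0.0, 1.0]]  # its inverse, maintained without any solve
num_samples = 500

for _ in range(num_samples):
    # Hypothetical stand-in for a sampled score vector.
    g = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 2.0)]
    for i in range(n):
        for j in range(n):
            S[i][j] += g[i] * g[j]
    S_inv = sherman_morrison_update(S_inv, g)

# S / num_samples estimates E[g g^T]; its inverse is num_samples * S_inv.
fisher_inv_estimate = [
    [num_samples * S_inv[i][j] for j in range(n)] for i in range(n)
]
# Sanity check: S times the recursively maintained inverse should be the identity.
product = [
    [sum(S[i][k] * S_inv[k][j] for k in range(n)) for j in range(n)]
    for i in range(n)
]
```

Each update costs $O(n^2)$ operations and never forms or inverts the full matrix explicitly, which is the property the abstract emphasizes.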
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
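Shepard's universal law of generalization states that perceived similarity decays exponentially with distance in a psychological similarity space. The sketch below shows only that functional form. The 2-D coordinates, the Euclidean metric, the decay constant `c`, and the variable names are hypothetical and are not the representation or parameters used in the study.

```python
import math


def shepard_similarity(x, y, c=1.0):
    """Shepard's law: similarity decays exponentially with distance, exp(-c * d)."""
    return math.exp(-c * math.dist(x, y))


# Hypothetical points in a 2-D similarity space.
own_explanation = (0.2, 0.8)       # explanation the participant would give
ai_explanation_near = (0.25, 0.75)
ai_explanation_far = (0.9, 0.1)

s_self = shepard_similarity(own_explanation, own_explanation)      # 1.0 at distance 0
s_near = shepard_similarity(own_explanation, ai_explanation_near)
s_far = shepard_similarity(own_explanation, ai_explanation_far)
```

Under the theory, an AI explanation close to the participant's own would leave the expectation of agreement largely intact, while a distant one would lower it.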