In this work, we propose a simple yet generic preconditioned Krylov subspace method for a large class of nonsymmetric block Toeplitz all-at-once systems arising from discretizing evolutionary partial differential equations. Our main contribution is two novel symmetric positive definite preconditioners, which can be efficiently diagonalized by the discrete sine transform matrix. More specifically, we first permute the original linear system to obtain a symmetric one, and then develop the desired preconditioners based on the spectral symbol of the modified matrix. We show that the eigenvalues of the preconditioned matrix sequences are clustered around $\pm 1$, which entails rapid convergence when the minimal residual (MINRES) method is employed. Alternatively, when the conjugate gradient method on the normal equations is used, we show that our preconditioner is effective in the sense that the eigenvalues of the preconditioned matrix sequence are clustered around unity. We also extend our preconditioning approach to high-order backward difference time discretization schemes, which apply to a wide range of time-dependent equations. Numerical examples, including the variable-coefficient setting, demonstrate the effectiveness of our proposed preconditioners, which consistently outperform an existing block circulant preconditioner discussed in the relevant literature.
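To make the permutation idea concrete, here is a minimal numerical sketch (not the authors' full method: the toy bidiagonal Toeplitz matrix is our own choice, and the DST-diagonalizable preconditioner is omitted, so MINRES runs unpreconditioned). It only demonstrates that multiplying a Toeplitz matrix by the anti-identity permutation yields a symmetric (Hankel) matrix to which MINRES applies:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import minres

n = 64
# toy nonsymmetric Toeplitz matrix (e.g., backward Euler in time)
T = toeplitz(np.r_[1.0, -1.0, np.zeros(n - 2)], np.zeros(n))
Y = np.flipud(np.eye(n))          # anti-identity permutation matrix
A = Y @ T                         # Hankel: A[i, j] depends only on i + j
assert np.allclose(A, A.T)        # hence symmetric, so MINRES applies

b = np.ones(n)
x, info = minres(A, Y @ b)        # solves T x = b via the permuted system
print(info, np.linalg.norm(T @ x - b))
```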
With the emergence of Machine Learning, there has been a surge in leveraging its capabilities for problem-solving across various domains. In the code clone realm, the identification of type-4 or semantic clones has emerged as a crucial yet challenging task. Researchers aim to utilize Machine Learning to tackle this challenge, often relying on the BigCloneBench dataset. However, BigCloneBench was not originally designed for semantic clone detection and has several limitations that make it unsuitable as a comprehensive training dataset for this purpose. Furthermore, the CLCDSA dataset lacks reusable examples that align with real-world software systems, rendering it inadequate for cross-language clone detection approaches. In this work, we present GPTCloneBench, a comprehensive semantic clone and cross-language clone benchmark built by exploiting SemanticCloneBench and OpenAI's GPT-3 model. In particular, using code fragments from SemanticCloneBench as sample inputs, together with appropriate prompt engineering for the GPT-3 model, we generate semantic and cross-language clones for these fragments and then apply a combination of extensive manual analysis, tool-assisted filtering, functionality testing, and automated validation to build the benchmark. From 79,928 clone pairs of GPT-3 output, we created a benchmark with 37,149 true semantic clone pairs, 19,288 false semantic pairs (Type-1/Type-2), and 20,770 cross-language clones across four languages (Java, C, C#, and Python). Our benchmark is 15-fold larger than SemanticCloneBench, offers more functional code examples and programming-language support than CLCDSA, and overcomes BigCloneBench's limitations in quality, quantity, and language variety.
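For illustration, a hypothetical sketch of the clone-generation step (the prompt wording, model name, and parameters are our guesses, and it assumes the legacy openai<1.0 Python SDK; the engineered prompts and the filtering/validation pipeline of GPTCloneBench are not reproduced here):

```python
import openai  # assumes the legacy SDK: pip install "openai<1.0"

def generate_clone(fragment: str, target_language: str = "Java") -> str:
    # Illustrative prompt only; GPTCloneBench's actual prompts differ.
    prompt = (
        f"Write a {target_language} program that is functionally equivalent "
        f"to the following code but implemented differently:\n\n{fragment}"
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3-family completion model
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,
    )
    return response.choices[0].text.strip()
```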
We develop new tools to study landscapes in nonconvex optimization. Given one optimization problem, we pair it with another by smoothly parametrizing the domain. This is either for practical purposes (e.g., to use smooth optimization algorithms with good guarantees) or for theoretical purposes (e.g., to reveal that the landscape satisfies a strict saddle property). In both cases, the central question is: how do the landscapes of the two problems relate? More precisely: how do desirable points such as local minima and critical points in one problem relate to those in the other problem? A key finding in this paper is that these relations are often determined by the parametrization itself, and are almost entirely independent of the cost function. Accordingly, we introduce a general framework to study parametrizations by their effect on landscapes. The framework enables us to obtain new guarantees for an array of problems, some of which were previously treated on a case-by-case basis in the literature. Applications include: optimizing low-rank matrices and tensors through factorizations; solving semidefinite programs via the Burer-Monteiro approach; training neural networks by optimizing their weights and biases; and quotienting out symmetries.
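As a generic illustration of how the two landscapes relate (a standard chain-rule fact, not a result specific to the paper): if the domain is parametrized by $x = \varphi(y)$ and one minimizes $f \circ \varphi$ instead of $f$, then
\[
\nabla (f \circ \varphi)(y) = \mathrm{D}\varphi(y)^{*}\, \nabla f(\varphi(y)),
\]
so every $y$ with $\varphi(y)$ critical for $f$ is critical for $f \circ \varphi$, whereas a critical point $y$ of $f \circ \varphi$ only certifies $\nabla f(\varphi(y)) \in \ker \mathrm{D}\varphi(y)^{*}$. When $\mathrm{D}\varphi(y)$ is surjective this kernel is trivial and the two notions coincide; rank-deficient points of a parametrization such as $(L, R) \mapsto L R^{\top}$ are where spurious critical points can appear.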
We provide a new sequent calculus that enjoys syntactic cut-elimination and strongly terminating backward proof search for the intuitionistic Strong L\"ob logic $\sf{iSL}$, an intuitionistic modal logic with a provability interpretation. A novel measure on sequents is used to prove both the termination of the naive backward proof search strategy, and the admissibility of cut in a syntactic and direct way, leading to a straightforward cut-elimination procedure. All proofs have been formalised in the interactive theorem prover Coq.
Deep neural networks have shown remarkable performance when trained on independent and identically distributed data from a fixed set of classes. However, in real-world scenarios, it can be desirable to train models on a continuous stream of data where multiple classification tasks are presented sequentially. This scenario, known as Continual Learning (CL), poses challenges to standard learning algorithms, which struggle to maintain knowledge of old tasks while learning new ones. This stability-plasticity dilemma remains central to CL, and multiple metrics have been proposed to adequately measure stability and plasticity separately. However, none considers the increasing difficulty of the classification task, which inherently results in performance loss for any model. Motivated by this, we analyze some limitations of current metrics and identify the presence of setup-induced forgetting. We then propose new metrics that account for the task's increasing difficulty. Through experiments on benchmark datasets, we demonstrate that our proposed metrics can provide new insights into the stability-plasticity trade-off achieved by models in the continual learning environment.
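For context, a minimal sketch of the classical accuracy and forgetting metrics that such analyses start from (the matrix convention is ours, and the difficulty-aware metrics proposed in the paper are not reproduced here):

```python
import numpy as np

# Convention (ours): acc[i, j] = test accuracy on task j after training
# on tasks 0..i, for a protocol with T sequential tasks.

def average_accuracy(acc: np.ndarray) -> float:
    T = acc.shape[0]
    return float(acc[T - 1, :T].mean())   # final accuracy over all tasks

def average_forgetting(acc: np.ndarray) -> float:
    T = acc.shape[0]
    # drop on each old task relative to the best accuracy it ever reached
    drops = [acc[: T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]
    return float(np.mean(drops))
```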
The proximal Galerkin finite element method is a high-order, low-iteration-complexity, nonlinear numerical method that preserves the geometric and algebraic structure of bound constraints in infinite-dimensional function spaces. This paper introduces the proximal Galerkin method and applies it to solve free boundary problems, enforce discrete maximum principles, and develop scalable, mesh-independent algorithms for optimal design. The paper leads to a derivation of the latent variable proximal point (LVPP) algorithm: an unconditionally stable alternative to the interior point method. LVPP is an infinite-dimensional optimization algorithm that may be viewed as having an adaptive barrier function that is updated with a new informative prior at each (outer loop) optimization iteration. One of the main benefits of this algorithm is witnessed when analyzing the classical obstacle problem. Therein, we find that the original variational inequality can be replaced by a sequence of semilinear partial differential equations (PDEs) that are readily discretized and solved with, e.g., high-order finite elements. Throughout this work, we arrive at several unexpected contributions that may be of independent interest. These include (1) a semilinear PDE we refer to as the entropic Poisson equation; (2) an algebraic/geometric connection between high-order positivity-preserving discretizations and certain infinite-dimensional Lie groups; and (3) a gradient-based, bound-preserving algorithm for two-field density-based topology optimization. The complete latent variable proximal Galerkin methodology combines ideas from nonlinear programming, functional analysis, tropical algebra, and differential geometry and can potentially lead to new synergies among these areas as well as within variational and numerical analysis.
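Schematically, for the obstacle problem $\min_{u \ge \varphi} \frac{1}{2}\int_\Omega |\nabla u|^2 \, dx - \int_\Omega f u \, dx$, one LVPP step (as we read the formulation; constants and boundary conditions are suppressed) couples the primal iterate $u_k$ to a latent variable $\psi_k$:
\[
-\alpha_k \Delta u_k + \psi_k = \alpha_k f + \psi_{k-1}, \qquad u_k = \varphi + e^{\psi_k}.
\]
Eliminating $\psi_k = \ln(u_k - \varphi)$ gives the semilinear entropic Poisson equation
\[
-\alpha_k \Delta u_k + \ln(u_k - \varphi) = \alpha_k f + \psi_{k-1},
\]
whose solutions satisfy the bound $u_k > \varphi$ by construction; this is how the variational inequality is replaced by a sequence of PDEs.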
In this paper, we propose a human trajectory prediction model that combines a Long Short-Term Memory (LSTM) network with an attention mechanism. To do so, we use attention scores to determine which parts of the input data the model should focus on when making predictions. An attention score is calculated for each input feature, with a higher score indicating that the feature is more significant for predicting the output. Initially, these scores are computed for the target human's position and velocity and for the positions and velocities of neighboring individuals. By using attention scores, our model can prioritize the most relevant information in the input data and make more accurate predictions. We extract the attention scores from our attention mechanism and integrate them into the trajectory prediction module to predict future human trajectories. To achieve this, we introduce a new neural layer that processes the attention scores and concatenates them with positional information. We evaluate our approach on the publicly available ETH and UCY datasets and measure its performance using the average displacement error (ADE) and final displacement error (FDE) metrics. We show that our modified algorithm outperforms the Social LSTM in predicting the future trajectories of pedestrians in crowded spaces. Specifically, our model achieves an improvement of 6.2% in ADE and 6.3% in FDE over the Social LSTM results reported in the literature.
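Both evaluation metrics have standard definitions; a minimal sketch for a single trajectory (the array convention is ours):

```python
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """pred, gt: (T, 2) predicted and ground-truth (x, y) positions."""
    errors = np.linalg.norm(pred - gt, axis=-1)  # per-step Euclidean error
    return float(errors.mean()), float(errors[-1])  # ADE, FDE
```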
We introduce new control-volume finite-element discretization schemes suitable for solving the Stokes problem. Within a common framework, we present different approaches for constructing such schemes. The first and most established strategy employs a non-overlapping partitioning into control volumes. The second is a new idea based on splitting the control volumes into two sets: the first yields a partition of the domain, and the second contains the remaining, overlapping control volumes required for stability. The third is a hybrid approach in which finite volumes are combined with finite elements based on a hierarchical splitting of the ansatz space. All approaches are based on typical finite element function spaces but yield locally mass- and momentum-conservative discretization schemes that can be interpreted as finite volume schemes. We apply all strategies to the inf-sup stable MINI finite-element pair. Various test cases, including convergence tests and the numerical observation of the boundedness of the number of preconditioned Krylov solver iterations, as well as more complex scenarios of flow around obstacles or through a three-dimensional vessel bifurcation, demonstrate the stability and robustness of the schemes.
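For reference, the schemes target the stationary Stokes problem
\[
-\mu \Delta \mathbf{u} + \nabla p = \mathbf{f}, \qquad \nabla \cdot \mathbf{u} = 0 \quad \text{in } \Omega,
\]
and the discrete inf-sup (LBB) condition satisfied by the MINI pair reads
\[
\inf_{q_h \neq 0} \, \sup_{\mathbf{v}_h \neq \mathbf{0}} \frac{(q_h, \nabla \cdot \mathbf{v}_h)}{\|\mathbf{v}_h\|_{1} \, \|q_h\|_{0}} \geq \beta > 0,
\]
with $\beta$ independent of the mesh size; this is the stability property that the overlapping control volumes are designed to retain.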
Hawkes processes are often applied to model dependence and interaction phenomena in multivariate event data sets, such as neuronal spike trains, social interactions, and financial transactions. In the nonparametric setting, learning the temporal dependence structure of Hawkes processes is generally a computationally expensive task, especially with Bayesian estimation methods. In particular, for generalised nonlinear Hawkes processes, Markov chain Monte Carlo methods used to sample the doubly intractable posterior distribution are not scalable to high-dimensional processes in practice. Recently, efficient algorithms targeting a mean-field variational approximation of the posterior distribution have been proposed. In this work, we first unify existing variational Bayes approaches under a general nonparametric inference framework, and analyse the asymptotic properties of these methods under easily verifiable conditions on the prior, the variational class, and the nonlinear model. Secondly, we propose a novel sparsity-inducing procedure, and derive an adaptive mean-field variational algorithm for the popular sigmoid Hawkes processes. Our algorithm is parallelisable and therefore computationally efficient in high-dimensional settings. Through an extensive set of numerical simulations, we also demonstrate that our procedure is able to adapt to the dimensionality of the parameter of the Hawkes process, and is partially robust to certain types of model mis-specification.
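For concreteness, a multivariate nonlinear Hawkes process with a sigmoid link has conditional intensity of the form
\[
\lambda_k(t) = \Lambda_k \, \sigma\!\Big( \nu_k + \sum_{l=1}^{K} \int_0^{t^-} h_{kl}(t - s) \, \mathrm{d}N_l(s) \Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}},
\]
where $\nu_k$ is a background term, the $h_{kl}$ are the interaction functions to be estimated nonparametrically, and $\Lambda_k$ is a scale bounding the intensity (our generic notation; the paper's exact parametrization may differ).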
The aim of this paper is to combine several Ivlev-like modal systems characterized by 4-valued non-deterministic matrices (Nmatrices) with IDM4, a 4-valued expansion of Belnap-Dunn's logic FDE with an implication introduced by Pynko in 1999. To do this, we introduce a new methodology for combining logics that are characterized by means of swap structures, based on what we call superposition of snapshots. In particular, the combination of IDM4 with Tm, Ivlev's 4-valued version of KT, is analyzed in more detail. From the semantical perspective, the idea is to combine the 4-valued swap structures (Nmatrices) for Tm (and several of its extensions) with the 4-valued twist structure (logical matrix) for IDM4. This superposition produces a universe of 6 snapshots, 3 of which are designated. The multioperators over the new universe are defined by combining the specifications of the given swap and twist structures. This gives rise to 6 different paradefinite Ivlev-like modal logics, each characterized by a 6-valued Nmatrix and conservatively extending the original modal logic and IDM4. This important feature allows us to consider the proposed construction as a genuine technique for combining logics. In addition, it is possible to define in the combined logics a classicality operator in the sense of logics of evidence and truth (LETs). A sound and complete Hilbert-style axiomatization is also presented for the 6 combined systems, as well as a very simple Prolog program that implements the swap structures semantics for the 6 systems, yielding a decision procedure for satisfiability, refutability, and validity of formulas in these logics.
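To illustrate the kind of decision procedure such a program implements, here is a toy validity checker for Nmatrix semantics, written in Python rather than Prolog, with the classical two-valued matrix standing in for the paper's 6-valued Nmatrices (whose tables we do not reproduce):

```python
from itertools import product

# Toy tables: classical two-valued logic viewed as a degenerate Nmatrix in
# which every cell is a singleton; genuine Nmatrices have multi-valued cells.
VALUES, DESIGNATED = (0, 1), {1}
TABLES = {
    "neg": {(0,): {1}, (1,): {0}},
    "or":  {(a, b): {max(a, b)} for a in VALUES for b in VALUES},
}

def subformulas(f):
    """Distinct subformulas of f in bottom-up order (arguments before f)."""
    op, *args = f
    out = []
    for a in (args if op != "atom" else []):
        for s in subformulas(a):
            if s not in out:
                out.append(s)
    out.append(f)
    return out

def valid(formula):
    """True iff every legal valuation assigns formula a designated value."""
    subs = subformulas(formula)
    def extend(v, rest):
        if not rest:
            return v[formula] in DESIGNATED
        f, *tail = rest
        op, *args = f
        choices = VALUES if op == "atom" else TABLES[op][tuple(v[a] for a in args)]
        return all(extend({**v, f: c}, tail) for c in choices)
    return extend({}, subs)

p = ("atom", "p")
print(valid(("or", p, ("neg", p))))  # True under the classical tables
```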
In this paper, to the best of our knowledge, we make the first attempt at studying parametric semilinear elliptic eigenvalue problems with a parametric coefficient and power-type nonlinearities. The parametric coefficient is assumed to depend affinely on countably many parameters through an appropriate class of sequences of functions. We obtain an upper bound on the mixed derivatives of the ground eigenpairs that has the same form as the bound obtained recently for the linear eigenvalue problem. The three essential ingredients for this estimate are the parametric analyticity of the ground eigenpairs, their uniform boundedness, and the uniformly positive differences between ground eigenvalues of the associated linear operators. All three ingredients require new techniques and a careful investigation of the nonlinear eigenvalue problem, which we present in this paper. As an application, treating each parameter as a uniformly distributed random variable, we estimate the expectation of the eigenpairs using a randomly shifted quasi-Monte Carlo lattice rule and establish a dimension-independent error bound.
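A generic sketch of a randomly shifted rank-1 lattice rule (the generating vector, the shift count, and the toy integrand are placeholders; the paper constructs the rule for its specific weighted-space setting, with the ground eigenpair as the quantity of interest):

```python
import numpy as np

def shifted_lattice_mean(g, z, n, n_shifts=16, seed=0):
    """Estimate E[g(y)] for y uniform on [-1/2, 1/2]^s by a randomly
    shifted rank-1 lattice rule with n points and generating vector z."""
    rng = np.random.default_rng(seed)
    i = np.arange(n)[:, None]
    estimates = []
    for _ in range(n_shifts):
        shift = rng.random(len(z))                        # one random shift
        pts = np.mod(i * z[None, :] / n + shift, 1.0) - 0.5
        estimates.append(np.mean([g(p) for p in pts]))
    return float(np.mean(estimates))                      # average over shifts

# Toy usage with s = 4 and a smooth placeholder integrand.
z = np.array([1, 19, 27, 35])   # placeholder generating vector (not optimized)
print(shifted_lattice_mean(lambda y: np.exp(y.sum() / 4), z, n=127))
```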