We prove mean-square convergence of a novel numerical method, the tamed-splitting method, for a generalized Aït-Sahalia interest rate model. The method is based on a Lamperti transform, a splitting approach, and a tamed numerical method applied to the nonlinearity. The main difficulty in the analysis is caused by the non-globally Lipschitz drift coefficients of the model. We examine the existence and uniqueness of the solution and the boundedness of moments for the transformed SDE. We then prove bounded moments and inverse moments for the numerical approximation. The tamed-splitting method is a hybrid method in the sense that a backstop method is invoked to prevent solutions from overshooting zero and becoming negative. We recover the mean-square convergence rate of order one for the tamed-splitting method. In addition, we prove that the probability of ever needing the backstop method to prevent a negative value can be made arbitrarily small. In our numerical experiments we compare the method to other numerical methods in the literature for realistic parameter values.
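To make the taming idea concrete, the following minimal sketch shows a tamed Euler-Maruyama step for an Aït-Sahalia-type drift with a crude positivity backstop. It is not the paper's tamed-splitting scheme (no Lamperti transform or splitting), and the parameter values, drift form, and backstop rule are illustrative assumptions only.

```python
import numpy as np

# Illustrative sketch only: a tamed Euler-Maruyama step with a crude backstop,
# NOT the paper's Lamperti-transform tamed-splitting scheme. The drift follows
# the generalized Ait-Sahalia form
#   dX_t = (a_{-1} X^{-1} - a_0 + a_1 X - a_2 X^r) dt + sigma X^rho dW_t,
# with made-up parameter values.
def tamed_em_step(x, h, rng, a_m1=1.0, a0=0.1, a1=0.5, a2=1.0,
                  r=2.0, sigma=0.2, rho=1.0):
    drift = a_m1 / x - a0 + a1 * x - a2 * x**r
    tamed = drift / (1.0 + h * abs(drift))  # taming bounds the superlinear drift
    x_new = x + h * tamed + sigma * x**rho * rng.normal(0.0, np.sqrt(h))
    if x_new <= 0.0:                          # backstop: the noise pushed X past
        x_new = max(x + h * tamed, x / 2.0)   # zero, so fall back to a crude
    return x_new                              # positivity-preserving step

rng = np.random.default_rng(0)
x, h = 1.0, 1e-3
for _ in range(1000):
    x = tamed_em_step(x, h, rng)
```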
The Multiple-try Metropolis (MTM) method is an interesting extension of the classical Metropolis-Hastings algorithm. However, a theoretical understanding of its convergence behavior, as well as of whether and how it helps, is still lacking. This paper derives the exact convergence rate of the Multiple-try Metropolis independent sampler (MTM-IS) via an explicit eigenanalysis. As a by-product, we prove that MTM-IS is less efficient than the simpler approach of repeating the independent Metropolis-Hastings step at the same computational cost. We further explore variants and find that more efficient MTM algorithms can be designed by creating correlated multiple trials.
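As a concrete reference point, here is a minimal sketch of one MTM-IS step with importance weights w(y) = pi(y)/q(y); for independent proposals the unselected trials can serve as reference points, so the acceptance ratio simplifies as in the last lines. The Gaussian target and proposal in the usage are illustrative assumptions, and unnormalised densities suffice since constants cancel.

```python
import numpy as np

# Minimal sketch of one Multiple-try Metropolis independent sampler (MTM-IS)
# step. Trials are i.i.d. from q, one is selected with probability
# proportional to w(y) = pi(y)/q(y), and the acceptance ratio reuses the
# unselected trials as reference points.
def mtm_is_step(x, log_pi, log_q, sample_q, k, rng):
    ys = [sample_q(rng) for _ in range(k)]                  # k independent trials
    w = np.array([np.exp(log_pi(y) - log_q(y)) for y in ys])
    j = rng.choice(k, p=w / w.sum())                        # select one trial
    w_x = np.exp(log_pi(x) - log_q(x))
    accept = min(1.0, w.sum() / (w.sum() - w[j] + w_x))
    return ys[j] if rng.random() < accept else x

# Illustrative usage: N(0, 1) target, N(0, 2^2) independence proposal.
log_pi = lambda y: -0.5 * y**2              # unnormalised log target
log_q = lambda y: -0.5 * (y / 2.0) ** 2     # unnormalised log proposal
sample_q = lambda rng: rng.normal(0.0, 2.0)
rng = np.random.default_rng(1)
x, chain = 0.0, []
for _ in range(5000):
    x = mtm_is_step(x, log_pi, log_q, sample_q, k=5, rng=rng)
    chain.append(x)
```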
Partition of unity methods (PUM) are of domain decomposition type and provide the opportunity for multiscale and multiphysics numerical modeling. Different physical models can exist within a PUM scheme for handling problems with zones of linear elasticity and zones where fractures occur. Here, the peridynamic (PD) model is used in regions of fracture and smooth PUM is used in the surrounding linear elastic media. The method is a so-called global-local enrichment strategy. The elastic fields of the undamaged media provide appropriate boundary data for the localized PD simulations. The first steps for a combined PD/PUM simulator are presented. In part I of this series, we show that the local PD approximation can be utilized to enrich the global PUM approximation to capture the true material response with high accuracy and efficiency. Test problems are provided that demonstrate the validity and potential of this numerical approach.
In this paper, we introduce a new simple approach to developing and establishing the convergence of splitting methods for a large class of stochastic differential equations (SDEs), including additive, diagonal and scalar noise types. The central idea is to view the splitting method as a replacement of the driving signal of an SDE, namely Brownian motion and time, with a piecewise linear path that yields a sequence of ODEs, which can be discretised to produce a numerical scheme. This new way of understanding splitting methods is inspired by, but does not use, rough path theory. We show that when the driving piecewise linear path matches certain iterated stochastic integrals of Brownian motion, then a high order splitting method can be obtained. We propose a general proof methodology for establishing the strong convergence of these approximations that is akin to the general framework of Milstein and Tretyakov. That is, once local error estimates are obtained for the splitting method, a global rate of convergence follows. This approach can then be readily applied in future research on SDE splitting methods. By incorporating recently developed approximations for iterated integrals of Brownian motion into these piecewise linear paths, we propose several high order splitting methods for SDEs satisfying a certain commutativity condition. In our experiments, which include the Cox-Ingersoll-Ross model and additive noise SDEs (noisy anharmonic oscillator, stochastic FitzHugh-Nagumo model, underdamped Langevin dynamics), the new splitting methods exhibit convergence rates of $O(h^{3/2})$ and outperform schemes previously proposed in the literature.
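A minimal example of the splitting viewpoint for an additive-noise SDE is sketched below: each step solves the drift ODE for a half step, injects the Brownian increment, then solves another half step (Strang splitting). This illustrates the driving-path idea only; the paper's higher-order schemes additionally match iterated integrals of Brownian motion, which this sketch omits, and the drift f(y) = y - y^3 is an illustrative stand-in for the anharmonic oscillator.

```python
import numpy as np

# Strang-type splitting sketch for dY_t = f(Y_t) dt + sigma dW_t. The drift
# ODE flow is approximated with a few explicit Euler substeps; higher-order
# versions would use iterated-integral information that this sketch omits.
def drift_flow(y, f, t, substeps=10):
    for _ in range(substeps):
        y = y + (t / substeps) * f(y)
    return y

def strang_step(y, f, sigma, h, rng):
    y = drift_flow(y, f, h / 2)                    # half step of the drift ODE
    y = y + sigma * rng.normal(0.0, np.sqrt(h))    # full Brownian increment
    return drift_flow(y, f, h / 2)                 # half step of the drift ODE

rng = np.random.default_rng(2)
f = lambda y: y - y**3      # illustrative double-well drift
y, h = 0.5, 0.01
for _ in range(1000):
    y = strang_step(y, f, sigma=0.1, h=h, rng=rng)
```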
In a recurrent events setting, we introduce a new score designed to evaluate, for a given model, the ability to predict the expected cumulative number of recurrent events. This score takes into account the individual history of a patient through external covariates and can be seen as an extension of the Brier score for single time-to-event data, but it works for recurrent events with or without a terminal event. Theoretical results show that, under standard assumptions in a recurrent event context, our score can be asymptotically decomposed as the sum of the theoretical mean squared error between the model and the true expected cumulative number of recurrent events and an inseparability term that does not depend on the model. This decomposition is further illustrated in simulation studies. It is also shown that this score should be used in comparison with a null model, such as a nonparametric estimator that does not include the covariates. Finally, the score is applied to the prediction of hospitalisations in a dataset of patients suffering from atrial fibrillation, and the prediction performance of different models, such as the Cox model or the Aalen model, is compared.
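For intuition, the sketch below computes a Brier-type squared-error score between the observed cumulative event count N_i(t) and a model's predicted expectation, averaged over patients. It deliberately ignores censoring and terminal events, which the paper's score handles, and the toy data and rate models are hypothetical.

```python
import numpy as np

# Simplified, uncensored illustration of a Brier-type score for recurrent
# events: mean over patients of (N_i(t) - mu_hat(t | X_i))^2 on a time grid.
# The paper's score additionally accounts for censoring and terminal events.
def recurrent_event_score(event_times, mu_hat, covariates, t_grid):
    return np.array([
        np.mean([(np.sum(times <= t) - mu_hat(t, x)) ** 2
                 for times, x in zip(event_times, covariates)])
        for t in t_grid
    ])

# Hypothetical usage: compare a covariate-free "null" model with one that
# uses the covariate, as the abstract recommends.
event_times = [np.array([0.4, 1.1, 2.5]), np.array([0.9]), np.array([])]
covariates = [1.2, 0.3, 0.8]
null_model = lambda t, x: 1.0 * t        # ignores covariates
cov_model = lambda t, x: x * t           # scales the rate by the covariate
t_grid = np.linspace(0.5, 3.0, 6)
print(recurrent_event_score(event_times, null_model, covariates, t_grid))
print(recurrent_event_score(event_times, cov_model, covariates, t_grid))
```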
A Deep Neural Network (DNN) is a composite function of vector-valued functions, and in order to train a DNN, it is necessary to calculate the gradient of the loss function with respect to all parameters. This calculation can be a non-trivial task because the loss function of a DNN is a composition of several nonlinear functions, each with numerous parameters. The Backpropagation (BP) algorithm leverages the composite structure of the DNN to efficiently compute the gradient. As a result, the number of layers in the network does not significantly impact the complexity of the calculation. The objective of this paper is to express the gradient of the loss function in terms of a matrix multiplication using the Jacobian operator. This can be achieved by considering the total derivative of each layer with respect to its parameters and expressing it as a Jacobian matrix. The gradient can then be represented as the matrix product of these Jacobian matrices. This approach is valid because the chain rule can be applied to a composition of vector-valued functions, and the use of Jacobian matrices allows for the incorporation of multiple inputs and outputs. By providing concise mathematical justifications, the results can be made understandable and useful to a broad audience from various disciplines.
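The following sketch makes the Jacobian viewpoint concrete for a two-layer network with a squared-error loss: the gradient with respect to the first layer's pre-activation is literally a product of the layer Jacobians. The shapes, activation, and loss are illustrative choices.

```python
import numpy as np

# Gradient of a two-layer network's loss written as a product of Jacobian
# matrices: L = 0.5 * ||W2 tanh(W1 x) - y||^2.
rng = np.random.default_rng(3)
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

# Forward pass through the composition.
z1 = W1 @ x
h1 = np.tanh(z1)
yhat = W2 @ h1

# Layer-wise Jacobians of L(yhat(h1(z1))).
dL_dyhat = yhat - y                        # Jacobian of the loss w.r.t. yhat
dyhat_dh1 = W2                             # Jacobian of the linear layer
dh1_dz1 = np.diag(1.0 - np.tanh(z1) ** 2)  # Jacobian of the elementwise tanh

# Chain rule as matrix multiplication, then total derivatives w.r.t. weights.
dL_dz1 = dL_dyhat @ dyhat_dh1 @ dh1_dz1
grad_W2 = np.outer(dL_dyhat, h1)
grad_W1 = np.outer(dL_dz1, x)
```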
In the Colored Clustering problem, one is asked to cluster edge-colored (hyper-)graphs whose colors represent interaction types. More specifically, the goal is to select as many edges as possible without choosing two edges that share an endpoint and are colored differently. Equivalently, the goal can be described as assigning colors to the vertices in a way that fits the edge-coloring as well as possible. As this problem is NP-hard, we build on previous work by studying its parameterized complexity. We give a $2^{\mathcal O(k)} \cdot n^{\mathcal O(1)}$-time algorithm, where $k$ is the number of edges to be selected and $n$ the number of vertices. We also prove the existence of a problem kernel of size $\mathcal O(k^{5/2})$, resolving an open problem posed in the literature. We then consider parameters smaller than $k$ and smaller than $r$, the number of edges that can be deleted; such parameters are obtained by considering the difference between $k$ or $r$ and some lower bound on these values. We give both algorithms and lower bounds for Colored Clustering with these parameterizations. Finally, we settle the parameterized complexity of Colored Clustering with respect to structural graph parameters by showing that it is $W[1]$-hard with respect to both vertex cover number and tree-cut width, but fixed-parameter tractable with respect to slim tree-cut width.
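As a point of reference for the vertex-coloring formulation, the exponential-time sketch below enumerates all vertex colorings and counts the edges whose color matches both endpoints; it illustrates the problem itself, not the paper's parameterized algorithms or kernel. The triangle instance is a hypothetical example.

```python
from itertools import product

# Brute-force Colored Clustering: assign each vertex a color and count edges
# (u, v, c) with both endpoints colored c. Exponential in n; for illustration
# only, not the paper's 2^{O(k)} * n^{O(1)} algorithm.
def colored_clustering_bruteforce(n, colored_edges, colors):
    best, best_assign = -1, None
    for assign in product(colors, repeat=n):
        fit = sum(1 for (u, v, c) in colored_edges
                  if assign[u] == c and assign[v] == c)
        if fit > best:
            best, best_assign = fit, assign
    return best, best_assign

# Triangle with a color conflict at vertex 1: at most two edges can be won.
edges = [(0, 1, "red"), (1, 2, "blue"), (0, 2, "red")]
print(colored_clustering_bruteforce(3, edges, ["red", "blue"]))
```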
Achieving gender equality is an important pillar of humankind's sustainable future. Pioneering data-driven gender bias research is based on large-scale public records such as scientific papers, patents, and company registrations, covering female researchers, inventors, entrepreneurs, and so on. Since gender information is often missing in the relevant datasets, studies rely on tools to infer gender from names. However, the available open-source Chinese gender-guessing tools are not yet suitable for scientific purposes; this may be partially responsible for Chinese women being underrepresented in mainstream gender bias research, limiting its universality. Specifically, these tools focus on character-level information while overlooking the fact that the combinations of Chinese characters in multi-character names, as well as the components and pronunciations of characters, convey important information. As a first effort, we design a Chinese Heterogeneous Graph Attention (CHGAT) model to capture the heterogeneity in component relationships and incorporate the pronunciations of characters. Our model largely surpasses current tools and also outperforms the state-of-the-art algorithm. Last but not least, the most popular Chinese name-gender dataset is single-character based, comes from an unreliable source, and has far lower female coverage, which naturally hinders relevant studies. We open-source a more balanced multi-character dataset from an official source, together with our code, hoping to help future research promoting gender equality.
Regression models that ignore measurement error in predictors may produce highly biased estimates, leading to erroneous inferences. It is well known that it is extremely difficult to take measurement error into account in Gaussian nonparametric regression, and the problem becomes even more difficult for other families such as logistic, Poisson, and negative-binomial regression. For the first time, we present a method that corrects for measurement error when flexibly estimating regression functions, covering virtually all distributions and link functions regularly considered in generalized linear models. The approach relies on approximating the first and second moments of the response after integrating out the true, unobserved predictors in a semiparametric generalized linear model. Unlike previous methods, this method is not restricted to truncated splines and can utilize various basis functions. Through extensive simulation studies, we examine the performance of our method under many scenarios.
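The toy simulation below illustrates the problem being addressed rather than the proposed correction: classical measurement error in a predictor attenuates the logistic regression slope toward zero. The sample size, error variance, and coefficients are arbitrary choices.

```python
import numpy as np
from scipy.special import expit
import statsmodels.api as sm

# Classical measurement error in logistic regression: the slope estimated
# from the error-prone predictor is biased toward zero (true slope = 1.0).
rng = np.random.default_rng(4)
n = 20000
x_true = rng.normal(size=n)
x_obs = x_true + rng.normal(scale=1.0, size=n)       # error-prone measurement
y = rng.binomial(1, expit(-0.5 + 1.0 * x_true))

for name, x in [("true predictor", x_true), ("error-prone  ", x_obs)]:
    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    print(name, "slope estimate:", round(fit.params[1], 3))
```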
In recent years, Graph Neural Networks have reported outstanding performance in tasks such as community detection, molecule classification, and link prediction. However, the black-box nature of these models prevents their application in domains like health and finance, where understanding the models' decisions is essential. Counterfactual Explanations (CE) provide this understanding through examples, and the literature on CE is flourishing with novel explanation methods tailored to graph learning. In this survey, we analyse the existing Graph Counterfactual Explanation methods, providing the reader with an organisation of the literature according to a uniform formal notation for definitions, datasets, and metrics, thus simplifying comparisons of the methods' advantages and disadvantages. We discuss seven methods and sixteen synthetic and real datasets, providing details on the possible generation strategies. We highlight the most common evaluation strategies and formalise nine of the metrics used in the literature. We also introduce the evaluation framework GRETEL and show how it can be extended and used, providing a further dimension of comparison that encompasses reproducibility aspects. Finally, we discuss how counterfactual explanations interplay with privacy and fairness, before delving into open challenges and future work.
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual prompt x' that has some unfilled slots; the language model is then used to probabilistically fill the unfilled information, yielding a final string from which the output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only give a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website (//pretrain.nlpedia.ai/) with a constantly updated survey and paperlist.
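A minimal sketch of the prompting workflow described above: wrap the input x in a cloze template, let a masked language model fill the slot, and map the filled word to an output label. The template, label words, and model choice are illustrative, not prescriptions from the survey.

```python
from transformers import pipeline

# Prompt-based sentiment prediction with a masked language model: the input
# x is turned into a prompt x' with an unfilled [MASK] slot, the model fills
# the slot, and a verbaliser maps the filled word to the label y.
fill = pipeline("fill-mask", model="bert-base-uncased")
label_words = {"great": "positive", "terrible": "negative"}  # verbaliser

def prompt_sentiment(x):
    prompt = f"{x} Overall, it was a [MASK] movie."          # template x -> x'
    scores = {out["token_str"]: out["score"]
              for out in fill(prompt, targets=list(label_words))}
    return label_words[max(scores, key=scores.get)]          # derive y

print(prompt_sentiment("I couldn't stop watching."))
```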