In this paper, we propose a systematic approach for accelerating finite element-type methods by machine learning for the numerical solution of partial differential equations (PDEs). The main idea is to use a neural network to learn the solution map of the PDEs and to do so in an element-wise fashion. This map takes as input the element geometry and the PDEs' parameters on that element, and outputs two operators -- (1) the in2out operator for inter-element communication, and (2) the in2sol operator (Green's function) for element-wise solution recovery. A significant advantage of this approach is that, once trained, this network can be used for the numerical solution of the PDE for any domain geometry and any parameter distribution without retraining. Also, the training is significantly simpler since it is done on the element level instead of on the entire domain. We call this approach element learning. This method is closely related to hybridizable discontinuous Galerkin (HDG) methods in the sense that the local solvers of HDG are replaced by machine learning approaches. Numerical tests are presented for an example PDE, the radiative transfer equation, in a variety of scenarios with idealized or realistic cloud fields, with smooth or sharp gradients in the cloud boundary transition. Under a fixed accuracy level of $10^{-3}$ in the relative $L^2$ error, and polynomial degree $p=6$ in each element, we observe an approximately 5 to 10 times speed-up by element learning compared to a classical finite element-type method.
In this article we discuss how abstraction boundaries can help tame complexity in mathematical research, with the help of an interactive theorem prover. While many of the ideas we present here have been used implicitly by mathematicians for some time, we argue that the use of an interactive theorem prover introduces additional qualitative benefits in the implementation of these ideas.
Reductionism is a viable strategy for designing and implementing practical programming languages, leading to solutions which are easier to extend, experiment with and formally analyze. We formally specify and implement an extensible programming language, based on a minimalistic first-order imperative core language plus strong abstraction mechanisms, reflection and self-modification features. The language can be extended to very high levels: by using Lisp-style macros and code-to-code transforms which automatically rewrite high-level expressions into core forms, we define closures and first-class continuations on top of the core. Non-self-modifying programs can be analyzed and formally reasoned about, thanks to the language's simple semantics. We formally develop a static analysis and prove a soundness property with respect to the dynamic semantics. We develop a parallel garbage collector suitable for multi-core machines to permit efficient execution of parallel programs.
Reinforcement learning (RL) algorithms face the challenge of limited data efficiency, particularly when dealing with high-dimensional state spaces and large-scale problems. Most RL methods rely solely on state transition information within the same episode when updating the agent's Critic, which can lead to low data efficiency and sub-optimal training time. Inspired by human-like analogical reasoning abilities, we introduce a novel mesh information propagation mechanism, termed the 'Imagination Mechanism (IM)', designed to significantly enhance the data efficiency of RL algorithms. Specifically, IM enables information generated by a single sample to be effectively broadcast to different states, rather than being transmitted only within the same episode. This allows the model to better understand the interdependencies between states and to learn from scarce sample information more efficiently. To promote versatility, we extend the Imagination Mechanism to function as a plug-and-play module that can be seamlessly integrated into other widely adopted RL models. Our experiments demonstrate that the Imagination Mechanism consistently boosts four mainstream SOTA RL algorithms, namely SAC, PPO, DDPG, and DQN, by a considerable margin, ultimately leading to better performance across various tasks. For access to our code and data, please visit //github.com/Zero-coder/FECAM.
For randomized trials that use text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by trained human raters. This process, the current standard, is both time-consuming and limiting: even the largest human coding efforts are typically constrained to measure only a small set of dimensions across a subsample of available texts. In this work, we present an inferential framework that can be used to increase the power of an impact assessment, given a fixed human-coding budget, by taking advantage of any ``untapped'' observations -- those documents not manually scored due to time or resource constraints -- as a supplementary resource. Our approach, a methodological combination of causal inference, survey sampling methods, and machine learning, has four steps: (1) select and code a sample of documents; (2) build a machine learning model to predict the human-coded outcomes from a set of automatically extracted text features; (3) generate machine-predicted scores for all documents and use these scores to estimate treatment impacts; and (4) adjust the final impact estimates using the residual differences between human-coded and machine-predicted outcomes. As an extension to this approach, we also develop a strategy for identifying an optimal subset of documents to code in Step 1 in order to further enhance precision. Through an extensive simulation study based on data from a recent field trial in education, we show that our proposed approach can be used to reduce the scope of a human-coding effort while maintaining nominal power to detect a significant treatment impact.
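The four steps listed in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `model_assisted_impact`, the use of ordinary least squares as a stand-in for the machine learning model, simple random sampling in Step 1, and the simulated data. It is not the authors' estimator, which may differ in sampling design, model choice and variance estimation.

```python
import numpy as np


def model_assisted_impact(features, treated, coded_idx, human_scores):
    """Difference in mean predictions, corrected by coded-sample residuals.

    features     : (n, d) array of automatically extracted text features
    treated      : (n,) array of 0/1 treatment indicators
    coded_idx    : indices of the documents that were human-coded (Step 1)
    human_scores : human-coded outcomes for those documents
    """
    X = np.column_stack([np.ones(len(features)), features])

    # Step 2: fit a predictor on the coded documents.
    # OLS is a hypothetical stand-in for any machine learning model.
    beta, *_ = np.linalg.lstsq(X[coded_idx], human_scores, rcond=None)

    # Step 3: predict scores for all documents and take a difference in means.
    pred = X @ beta
    impact_pred = pred[treated == 1].mean() - pred[treated == 0].mean()

    # Step 4: adjust with the residuals (human minus machine) in the coded sample.
    resid = human_scores - pred[coded_idx]
    t_coded = treated[coded_idx]
    adjustment = resid[t_coded == 1].mean() - resid[t_coded == 0].mean()
    return impact_pred + adjustment


# Simulated example (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n = 2000
treated = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, 3)) + 0.3 * treated[:, None]
true_scores = (features @ np.array([1.0, 0.5, -0.2])
               + 0.5 * treated + rng.normal(scale=0.5, size=n))
coded_idx = rng.choice(n, size=300, replace=False)  # Step 1: coding budget of 300
estimate = model_assisted_impact(features, treated, coded_idx, true_scores[coded_idx])
print("estimated impact:", estimate)
```

The residual adjustment in Step 4 is what protects the estimate when the predictor is imperfect: any systematic gap between human and machine scores that differs by treatment arm is added back.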
In this paper, we consider objective Bayesian inference of the generalized exponential distribution using the independence Jeffreys prior and validate the propriety of the posterior distribution under a family of structured priors. We propose an efficient sampling algorithm via the generalized ratio-of-uniforms method to draw samples for making posterior inference. We carry out simulation studies to assess the finite-sample performance of the proposed Bayesian approach. Finally, a real-data application is provided for illustrative purposes.
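As background for the sampling step mentioned in the abstract above, the following is a minimal sketch of the generalized ratio-of-uniforms idea on a toy target. It draws $(u, v)$ uniformly on a bounding rectangle, accepts when $u \le f(v/u^r)^{1/(r+1)}$, and returns $x = v/u^r$. The target (an unnormalized standard normal), the function name and the hand-derived bounds are assumptions for illustration; this is not the posterior sampler developed in the paper.

```python
import numpy as np


def ratio_of_uniforms(f, u_max, v_min, v_max, size, r=1.0, seed=None):
    """Draw `size` samples with density proportional to f.

    The caller supplies the bounding rectangle:
      u_max        >= sup_x f(x)**(1 / (r + 1))
      v_min, v_max bound x * f(x)**(r / (r + 1))
    """
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < size:
        u = rng.uniform(0.0, u_max, size=size)
        v = rng.uniform(v_min, v_max, size=size)
        keep = u > 0.0                      # guard against division by zero
        x = v[keep] / u[keep] ** r
        accept = u[keep] <= f(x) ** (1.0 / (r + 1.0))
        samples.extend(x[accept].tolist())
    return np.array(samples[:size])


def unnormalized_normal(x):
    """Toy target: standard normal density without its normalizing constant."""
    return np.exp(-0.5 * x ** 2)


# For r = 1 and this target, sup sqrt(f) = 1 and |x| * sqrt(f(x)) <= sqrt(2 / e).
v_bound = np.sqrt(2.0 / np.e)
draws = ratio_of_uniforms(unnormalized_normal, 1.0, -v_bound, v_bound,
                          size=20000, seed=1)
print("sample mean:", draws.mean(), "sample std:", draws.std())
```

Only an unnormalized density is needed, which is why this family of methods is convenient for posterior sampling.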
In this paper, we present a polynomial-complexity algorithm to construct a special orthogonal matrix for the deterministic remote state preparation (DRSP) of an arbitrary n-qubit state, and prove that if n>3, such matrices do not exist. Firstly, the construction problem is split into two sub-problems, i.e., finding a solution of a semi-orthogonal matrix and generating all semi-orthogonal matrices. Through giving the definitions and properties of the matching operators, it is proved that the orthogonality of a special matrix is equivalent to the cooperation of multiple matching operators, and then the construction problem is reduced to the problem of solving an XOR linear equation system, which reduces the construction complexity from exponential to polynomial level. Having proved that each semi-orthogonal matrix can be simplified into a unique form, we use the proposed algorithm to confirm that the unique form does not have any solution when n>3, which means it is infeasible to construct such a special orthogonal matrix for the DRSP of an arbitrary n-qubit state.
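The reduction described in the abstract above ends in an XOR linear equation system, that is, a linear system over GF(2). Such systems are solvable in polynomial time by Gaussian elimination, as the minimal sketch below shows. The function name and the small example system are hypothetical; the matrices arising from the matching operators in the paper are not reproduced here.

```python
from typing import List, Optional


def solve_xor_system(rows: List[List[int]], rhs: List[int]) -> Optional[List[int]]:
    """Solve A x = b over GF(2) by Gaussian elimination.

    Returns one solution (free variables set to 0), or None if the
    system is inconsistent.
    """
    n_vars = len(rows[0]) if rows else 0
    # Augmented matrix [A | b]; copy so the inputs are not modified.
    aug = [row[:] + [b] for row, b in zip(rows, rhs)]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_vars):
        if r == len(aug):
            break
        pivot = next((i for i in range(r, len(aug)) if aug[i][c] == 1), None)
        if pivot is None:
            continue  # no pivot in this column: x[c] is a free variable
        aug[r], aug[pivot] = aug[pivot], aug[r]
        # Clear column c in every other row (addition over GF(2) is XOR).
        for i in range(len(aug)):
            if i != r and aug[i][c] == 1:
                aug[i] = [a ^ b for a, b in zip(aug[i], aug[r])]
        pivot_cols.append(c)
        r += 1
    # A row of the form 0 = 1 means there is no solution.
    if any(row[-1] == 1 for row in aug[r:]):
        return None
    x = [0] * n_vars
    for i, c in enumerate(pivot_cols):
        x[c] = aug[i][-1]
    return x


# Example: x0^x1 = 1, x1^x2 = 1, x0^x2 = 0
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
solution = solve_xor_system(A, [1, 1, 0])
print("solution:", solution)
print("inconsistent case:", solve_xor_system(A, [1, 1, 1]))
```

For an $m \times k$ system the elimination costs $O(m^2 k)$ bit operations in this naive form, in contrast to an exhaustive search over all $2^k$ assignments.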
In this paper, we develop a novel, efficient and robust nonparametric regression estimator within a feedforward neural network framework. The proposed estimator has several interesting characteristics. First, the loss function is built upon an estimated maximum likelihood function, which integrates the information from the observed data as well as the information from the data structure. Consequently, the resulting estimator has desirable optimality properties, such as efficiency. Second, unlike traditional maximum likelihood estimation (MLE), the proposed method avoids specifying the distribution and is hence flexible enough to handle any kind of distribution, such as heavy-tailed, multimodal or heterogeneous distributions. Third, the proposed loss function relies on probabilities rather than direct observations as in least squares, which contributes to the robustness of the proposed estimator. Finally, the proposed loss function involves the nonparametric regression function only. This enables a direct application of existing packages, simplifying the computation and programming. We establish the large sample property of the proposed estimator in terms of its excess risk and minimax near-optimal rate. The theoretical results demonstrate that the proposed estimator is equivalent to the true MLE in which the density function is known. Our simulation studies show that the proposed estimator outperforms the existing methods in terms of prediction accuracy, efficiency and robustness. In particular, it is comparable to the true MLE, and even gets better as the sample size increases. This implies that the adaptive and data-driven loss function from the estimated density may offer an additional avenue for capturing valuable information. We further apply the proposed method to four real data examples, resulting in significantly reduced out-of-sample prediction errors compared to existing methods.
Incorporating prior knowledge into pre-trained language models has proven to be effective for knowledge-driven NLP tasks, such as entity typing and relation extraction. Current pre-training procedures usually inject external knowledge into models by using knowledge masking, knowledge fusion and knowledge replacement. However, the factual information contained in the input sentences has not been fully mined, and the external knowledge to be injected has not been strictly checked. As a result, the context information cannot be fully exploited, and either extra noise is introduced or the amount of injected knowledge is limited. To address these issues, we propose MLRIP, which modifies the knowledge masking strategies proposed by ERNIE-Baidu and introduces a two-stage entity replacement strategy. Extensive experiments with comprehensive analyses illustrate the superiority of MLRIP over BERT-based models in military knowledge-driven NLP tasks.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, lottery ticket hypothesis and infinite-width analysis.
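The first topic in the abstract above, exploding or vanishing signals and the role of careful initialization, can be illustrated with a small numerical experiment. The sketch pushes a random vector through a deep ReLU network with Gaussian weights of standard deviation `gain * sqrt(2 / width)`; `gain = 1` corresponds to He initialization. It tracks forward activation norms as a proxy for the analogous behaviour of gradients in the backward pass. The function name, depth and width are assumptions chosen for illustration and are not taken from the article.

```python
import numpy as np


def norm_ratio(depth, width, gain, seed=0):
    """Return ||h_depth|| / ||h_0|| for a random deep ReLU network.

    Weights are i.i.d. Gaussian with std = gain * sqrt(2 / width);
    gain = 1.0 is He initialization.
    """
    rng = np.random.default_rng(seed)
    h = rng.normal(size=width)
    start = np.linalg.norm(h)
    std = gain * np.sqrt(2.0 / width)
    for _ in range(depth):
        W = rng.normal(scale=std, size=(width, width))
        h = np.maximum(W @ h, 0.0)  # linear layer followed by ReLU
    return np.linalg.norm(h) / start


for gain in (0.5, 1.0, 2.0):
    print(f"gain={gain}: norm ratio after 50 layers = {norm_ratio(50, 256, gain):.3e}")
```

With He scaling the expected squared norm is preserved from layer to layer, so the ratio stays of order one, whereas halving or doubling the scale compounds by a factor of roughly $2^{\mp 50}$ over 50 layers.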
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.