Recently, symbolic regression (SR) has demonstrated its effectiveness in discovering basic governing relations in physical systems. A major impact can potentially be achieved by coupling symbolic regression with asymptotic methodology. The main advantage of the asymptotic approach is that it provides a robust approximation to the sought-for solution, giving a clear picture of the effect of the problem parameters. However, the analytic derivation of the asymptotic series is often highly nontrivial, especially when the exact solution is not available. In this paper, we adapt the SR methodology to discover asymptotic series. As an illustration, we consider three problems in mechanics: two-mass collision, viscoelastic behavior of a Kelvin-Voigt solid, and propagation of Rayleigh-Lamb waves. The training data is generated from the explicit exact solutions of these problems. The obtained SR results are compared to the benchmark asymptotic expansions of the above-mentioned exact solutions. Both convergent and divergent asymptotic series are considered. Good agreement between the SR expansions and the analytical results is observed. It is demonstrated that the proposed approach can be used to identify material parameters, e.g., Poisson's ratio, and holds strong promise for utilizing experimental and numerical data.
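To make the idea concrete, here is a minimal sketch (not the paper's SR pipeline) of how asymptotic coefficients can be recovered from data generated by an exact solution: a low-order expansion in the small parameter is fitted by least squares and compared against the known series for $\sqrt{1+\varepsilon}$. The basis choice and sampling range are illustrative assumptions.

```python
# Minimal sketch (not the paper's pipeline): recover the first terms of an
# asymptotic expansion from samples of an exact solution by least squares
# on a monomial basis -- a simplified stand-in for full symbolic regression.
import numpy as np

eps = np.linspace(1e-4, 0.1, 200)   # small-parameter samples
f = np.sqrt(1.0 + eps)              # "exact solution" serving as training data

# Fit f(eps) ~ c0 + c1*eps + c2*eps**2 on the small-parameter range.
coeffs = np.polyfit(eps, f, deg=2)[::-1]  # reorder to ascending powers
print(coeffs)  # roughly [1.0, 0.5, -0.125]: sqrt(1+eps) = 1 + eps/2 - eps^2/8 + ...
```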
We study the econometric properties of so-called donut regression discontinuity (RD) designs, a robustness exercise that involves repeating estimation and inference without the data points in some area around the treatment threshold. This approach is often motivated by concerns that possible systematic sorting of units, or similar data issues, in some neighborhood of the treatment threshold might distort estimation and inference of RD treatment effects. We show that donut RD estimators can have substantially larger bias and variance than conventional RD estimators, and that the corresponding confidence intervals can be substantially longer. We also provide a formal testing framework for comparing donut and conventional RD estimation results.
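For intuition, a minimal sketch of a donut RD estimator follows: observations within a radius of the cutoff are dropped, and the treatment effect is estimated as the difference of local linear intercepts at the cutoff. All function and variable names are illustrative, not from the paper.

```python
# Minimal sketch of a donut RD estimator: drop observations within a radius
# `donut` of the cutoff, fit separate linear trends on each side within a
# bandwidth `h`, and take the difference of intercepts at the cutoff.
import numpy as np

def donut_rd(x, y, cutoff=0.0, h=1.0, donut=0.1):
    keep = np.abs(x - cutoff) >= donut            # the "donut": exclude points near the cutoff
    x, y = x[keep], y[keep]
    left  = (x >= cutoff - h) & (x < cutoff)
    right = (x <= cutoff + h) & (x >= cutoff)
    # Local linear fits; the intercepts approximate the outcome level at the cutoff.
    bl = np.polyfit(x[left]  - cutoff, y[left],  1)
    br = np.polyfit(x[right] - cutoff, y[right], 1)
    return br[1] - bl[1]                          # treatment-effect estimate at the cutoff

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)
y = 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 0.3, x.size)  # true effect = 2
print(donut_rd(x, y, donut=0.1))
```

The sketch also makes the bias-variance point plain: removing data near the cutoff forces both local fits to extrapolate back to it from farther away.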
Railway infrastructure requires effective maintenance to ensure safe and comfortable transportation. Among the various degradation modes, track geometry deformation caused by repeated loading is a critical mechanism impacting operational safety. Detecting and maintaining acceptable track geometry relies on track recording vehicles (TRVs) that inspect and record geometric parameters. This study aims to develop a novel track geometry degradation model that considers multiple indicators and their correlation, while accounting for both imperfect manual and mechanized tamping. A multi-variate Wiener model is formulated to capture the characteristics of track geometry degradation. To overcome data limitations, a hierarchical Bayesian approach with Markov Chain Monte Carlo (MCMC) simulation is utilized. This study contributes an analysis of a multi-variate predictive model that accounts for the correlation between the degradation rates of multiple indicators, providing insights for rail operators and new track-monitoring systems. The performance of the models is rigorously validated through a real-world case study on a commuter track in Queensland, Australia, using actual data and independent test datasets. This experimental calibration and validation procedure is a novel contribution to the existing literature, offering valuable guidance for rail asset management and decision-making.
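As a rough illustration of the modelling ingredient (with invented parameters, not the calibrated ones from the case study), two correlated degradation indicators can be simulated as drifted Wiener processes:

```python
# Illustrative simulation (parameters invented) of two track-geometry
# indicators degrading as correlated drifted Wiener processes:
# X_i(t) = mu_i * t + sigma_i * B_i(t), with correlated increments.
import numpy as np

rng = np.random.default_rng(1)
dt, n_steps = 0.1, 300
mu = np.array([0.05, 0.03])                        # degradation rates per unit time
cov = np.array([[1.0, 0.8], [0.8, 1.0]]) * 0.02    # correlated diffusion
increments = mu * dt + rng.multivariate_normal([0, 0], cov * dt, size=n_steps)
paths = np.cumsum(increments, axis=0)              # degradation paths; tamping would reset them
print(paths[-1])                                   # indicator levels at the end of the horizon
```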
We consider the basic statistical problem of detecting truncation of the uniform distribution on the Boolean hypercube by juntas. More concretely, we give upper and lower bounds on the problem of distinguishing between i.i.d. sample access to either (a) the uniform distribution over $\{0,1\}^n$, or (b) the uniform distribution over $\{0,1\}^n$ conditioned on the satisfying assignments of a $k$-junta $f: \{0,1\}^n\to\{0,1\}$. We show that (up to constant factors) $\min\{2^k + \log{n\choose k}, {2^{k/2}\log^{1/2}{n\choose k}}\}$ samples suffice for this task and also show that a $\log{n\choose k}$ dependence on sample complexity is unavoidable. Our results suggest that testing junta truncation requires learning the set of relevant variables of the junta.
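A small sketch of the sampling problem may help: below we draw from the uniform distribution truncated by a toy $k$-junta (an AND of $k$ coordinates, chosen purely for illustration) via rejection sampling, and observe that coordinate means already expose the relevant set, in line with the learning interpretation above.

```python
# Sketch of the distinguishing problem: sample from the uniform distribution on
# {0,1}^n truncated by a k-junta (here f = AND of k fixed coordinates, chosen
# only for illustration), and note that coordinate means expose the relevant set.
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 20, 3, 2000
relevant = np.arange(k)                       # f depends only on these coordinates

samples = []
while len(samples) < m:                       # rejection sampling from the truncation
    x = rng.integers(0, 2, n)
    if x[relevant].all():                     # f(x) = AND of the relevant bits
        samples.append(x)
samples = np.array(samples)

# Relevant coordinates have mean 1 under the truncation; irrelevant ones stay near 1/2.
print(samples.mean(axis=0).round(2))
```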
The future of automated driving (AD) is rooted in the development of robust, fair, and explainable artificial intelligence methods. Upon request, automated vehicles must be able to explain their decisions to the driver and the car passengers, to pedestrians and other vulnerable road users, and potentially to external auditors in case of accidents. However, most explainable methods today still rely on quantitative analysis of the AD scene representations captured by multiple sensors. This paper proposes a novel representation of AD scenes, called the Qualitative eXplainable Graph (QXG), dedicated to qualitative spatiotemporal reasoning over long-term scenes. The construction of this graph exploits the recent Qualitative Constraint Acquisition paradigm. Our experimental results on NuScenes, an open real-world multi-modal dataset, show that the qualitative eXplainable graph of an AD scene composed of 40 frames can be computed in real time and is light in storage, which makes it a potentially interesting tool for improved and more trustworthy perception and control processes in AD.
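To illustrate the flavour of such a graph (this is a hypothetical sketch, not the paper's QXG construction or its Qualitative Constraint Acquisition step), one can label each ordered pair of tracked objects in each frame with a coarse qualitative spatial relation:

```python
# Hypothetical sketch of a qualitative scene graph: for each frame, label each
# ordered pair of tracked objects with a coarse qualitative relation derived
# from their x-coordinates. An illustration of the idea only.
from itertools import permutations

# frame -> {object_id: x_position}; toy trajectories for three objects
frames = [{"ego": 0.0, "car1": 5.0, "ped1": -2.0},
          {"ego": 1.0, "car1": 4.0, "ped1": -1.5}]

def qualitative_relation(xa, xb, tol=0.5):
    if xa < xb - tol:
        return "left-of"
    if xa > xb + tol:
        return "right-of"
    return "aligned"

graph = []  # list of (frame, a, b, relation) edges
for t, frame in enumerate(frames):
    for a, b in permutations(frame, 2):
        graph.append((t, a, b, qualitative_relation(frame[a], frame[b])))
print(graph[:4])
```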
We present an efficient alternative to the convolutional layer using cheap spatial transformations. This construction exploits an inherent spatial redundancy of the learned convolutional filters to enable much greater parameter efficiency, while maintaining the top-end accuracy of their dense counterparts. Training these networks is modelled as a generalised pruning problem, whereby the pruned filters are replaced with cheap transformations from the set of non-pruned filters. We provide an efficient implementation of the proposed layer, followed by two natural extensions to avoid excessive feature compression and to improve the expressivity of the transformed features. We show that these networks can achieve performance comparable or superior to state-of-the-art pruning models on both the CIFAR-10 and ImageNet-1K datasets.
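A minimal sketch of the parameter-saving idea follows (the shapes and the set of transformations are illustrative assumptions, not the paper's exact layer):

```python
# Sketch of the parameter-saving idea: keep a few "base" filters and generate
# the rest of the bank with parameter-free spatial transforms (flips/rotations)
# instead of storing every filter densely.
import numpy as np

rng = np.random.default_rng(3)
base = rng.normal(size=(4, 3, 3))                     # 4 learned 3x3 filters

transforms = [lambda w: w,
              lambda w: np.flip(w, axis=-1),          # horizontal flip
              lambda w: np.rot90(w, k=1, axes=(1, 2)),
              lambda w: np.rot90(w, k=2, axes=(1, 2))]

bank = np.concatenate([t(base) for t in transforms])  # 16 filters from 4 filters' worth of parameters
print(bank.shape)  # (16, 3, 3): 4x more filters with no extra learned weights
```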
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in the observable data in representation form. The process of separating the underlying factors of variation into variables with semantic meaning aids in learning explainable representations of data, mirroring the meaningful understanding process humans employ when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving model explainability, controllability, robustness, and generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively review DRL from various aspects, including motivations, definitions, methodologies, evaluations, applications, and model designs. We discuss works on DRL based on two well-recognized definitions, i.e., the Intuitive Definition and the Group Theory Definition. We further categorize the methodologies for DRL into four groups, i.e., Traditional Statistical Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial Networks Based Approaches, and Hierarchical Approaches, along with Other Approaches. We also analyze principles for designing different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigation. We believe this work may provide insights for promoting DRL research in the community.
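For orientation, a standard instance of the Variational Auto-encoder Based family is the $\beta$-VAE objective, stated here from general knowledge rather than from this survey's text; $\beta > 1$ strengthens the pressure toward factorized, disentangled latents:

```latex
% The beta-VAE objective, a standard example of the VAE-based family:
\mathcal{L}(\theta,\phi;x) =
  \mathbb{E}_{q_\phi(z\mid x)}\!\big[\log p_\theta(x\mid z)\big]
  - \beta\, D_{\mathrm{KL}}\!\big(q_\phi(z\mid x)\,\big\|\,p(z)\big)
```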
Recently, a considerable literature has grown up around the theme of Graph Convolutional Networks (GCNs). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focus on representation learning for nodes only instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, namely the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines the power of GCNs in graph-based belief propagation with the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods, and goes beyond both. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as special cases, along with a new perspective on graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification.
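A generic sketch of the joint node-and-edge update idea follows (an illustrative message-passing step, not KE-GCN's actual equations; dimensions and the update rule are assumptions):

```python
# Generic sketch of jointly updating node and edge embeddings in one
# message-passing step on a knowledge graph of (head, relation, tail) triples.
import numpy as np

rng = np.random.default_rng(4)
d, n_nodes, n_rels = 8, 5, 2
H = rng.normal(size=(n_nodes, d))       # node embeddings
R = rng.normal(size=(n_rels, d))        # relation (edge-type) embeddings
triples = [(0, 0, 1), (1, 1, 2), (3, 0, 4)]
W = rng.normal(size=(d, d)) / np.sqrt(d)

H_new, R_new = H.copy(), R.copy()
for h, r, t in triples:
    msg = np.tanh((H[h] + R[r]) @ W)        # message composed from neighbor and relation
    H_new[t] += msg                         # update the node embedding...
    R_new[r] += np.tanh((H[h] * H[t]) @ W)  # ...and the relation embedding as well
print(H_new.shape, R_new.shape)
```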
Graphs, which describe pairwise relations between objects, are essential representations of many real-world data such as social networks. In recent years, graph neural networks, which extend neural network models to graph data, have attracted increasing attention. Graph neural networks have been applied to advance many graph-related tasks, such as reasoning about the dynamics of physical systems, graph classification, and node classification. Most existing graph neural network models have been designed for static graphs, while many real-world graphs are inherently dynamic. For example, social networks naturally evolve as new users join and new relations are created. Current graph neural network models cannot utilize the dynamic information in dynamic graphs. However, dynamic information has been proven to enhance the performance of many graph analytical tasks, such as community detection and link prediction. Hence, it is necessary to design dedicated graph neural networks for dynamic graphs. In this paper, we propose DGNN, a new {\bf D}ynamic {\bf G}raph {\bf N}eural {\bf N}etwork model, which can model the dynamic information as the graph evolves. In particular, the proposed framework keeps node information up to date by coherently capturing the sequential information of edges, the time intervals between edges, and information propagation. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.
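The following sketch illustrates the event-driven flavour of such models (an illustrative update rule, not DGNN's actual architecture): each arriving edge updates both endpoint states, with older information down-weighted by the elapsed time interval.

```python
# Illustrative sketch of event-driven node updates on a dynamic graph: each
# arriving edge (u, v, t) updates both endpoint states, discounting stale
# information via the elapsed time since each node was last seen.
import numpy as np

d, n_nodes = 8, 4
state = np.zeros((n_nodes, d))
last_seen = np.zeros(n_nodes)
rng = np.random.default_rng(5)
W = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)

def on_edge(u, v, t):
    su, sv = state[u].copy(), state[v].copy()         # snapshot before updating
    for a, neighbor_state in ((u, sv), (v, su)):
        decay = np.exp(-(t - last_seen[a]))           # older state counts for less
        state[a] = np.tanh(np.concatenate([decay * state[a], neighbor_state]) @ W)
        last_seen[a] = t

for u, v, t in [(0, 1, 1.0), (1, 2, 2.5), (0, 3, 4.0)]:  # time-ordered edge stream
    on_edge(u, v, t)
print(state.round(2))
```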
This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don't worry if you get stuck at some point along the way---just go back and reread the previous section, and try writing down and working through some examples. And if you're still stuck, we're happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here. See related articles at //explained.ai
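As a taste of the kind of rule the paper builds toward (a standard identity, stated here for illustration with our own notation): the Jacobian of a dense layer with an element-wise activation combines the affine-map Jacobian with the chain rule.

```latex
% Jacobian of a dense layer y = f(Wx + b), with f applied element-wise:
y = f(Wx + b) \quad\Longrightarrow\quad
\frac{\partial y}{\partial x} = \operatorname{diag}\!\big(f'(Wx + b)\big)\, W
```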
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
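The core operation of the Transformer is scaled dot-product attention, $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$; a minimal NumPy rendering follows (shapes are illustrative):

```python
# Scaled dot-product attention, the Transformer's core operation:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # numerically stable row-wise softmax
    return weights @ V                                # weighted sum of values

rng = np.random.default_rng(6)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```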