In this paper, we investigate the relationship between two elementary operations on derivations in adhesive high-level replacement systems that are well known in the context of graph transformation: moving a derivation along a derivation based on parallel and sequential independence on the one hand, and restricting a derivation with respect to a monomorphism into the start object on the other. Intuitively, a restriction clips off parts of the start object that are never matched by a rule application throughout the derivation. As the main result, we show that moving a derivation preserves its spine, i.e., its minimal restriction.
In this work, we present a nonlinear dynamics perspective on generating and connecting gaits for energetically conservative models of legged systems. In particular, we show that the set of conservative gaits constitutes a connected space of locally defined 1D submanifolds in the gait space. These manifolds are parameterized in a coordinate-free manner by the energy level. We present algorithms for identifying such families of gaits through the use of numerical continuation methods, generating sets, and bifurcation points, and we discuss several details of the numerical implementation. Most importantly, we establish the necessary condition on the Delassus matrix for energy to be preserved across impacts. An important application of our work is to simple models of legged locomotion, which are often able to capture the essential complexity of locomotion with just a few degrees of freedom and a small number of physical parameters. We demonstrate the efficacy of our framework on a one-legged hopper with four degrees of freedom.
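As an illustration of the continuation idea only (not the authors' implementation), the sketch below traces a one-parameter family of solutions of a generic root-finding problem F(x, E) = 0 by sweeping the energy level E and warm-starting each solve from the previous solution; the residual function `F` and the energy grid are hypothetical placeholders.

```python
# Minimal natural-parameter continuation sketch (illustrative only).
# F(x, E) = 0 stands in for the periodicity/gait conditions at energy level E;
# the concrete F below is a hypothetical placeholder.
import numpy as np
from scipy.optimize import fsolve

def F(x, E):
    # Placeholder residual: in a real gait search this would encode the
    # return-map / periodicity conditions of the hopper at energy level E.
    return np.array([x[0]**2 + x[1]**2 - E, x[1] - 0.5 * x[0]])

def continue_family(x0, energies):
    """Trace a solution branch by sweeping the energy level."""
    branch = []
    x = np.asarray(x0, dtype=float)
    for E in energies:
        x, info, ok, _ = fsolve(F, x, args=(E,), full_output=True)
        if ok != 1:
            break  # branch lost (e.g., near a fold/bifurcation point)
        branch.append((E, x.copy()))
    return branch

family = continue_family(x0=[1.0, 0.5], energies=np.linspace(1.0, 4.0, 31))
print(f"traced {len(family)} points along the branch")
```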
In the present paper, we prove a new theorem that yields an update formula for linear regression residuals, giving the exact k-fold cross-validation residuals for any choice of cross-validation strategy without model refitting. The required matrix inversions are limited by the cross-validation segment sizes and can be executed efficiently in parallel. The well-known formula for leave-one-out cross-validation follows as a special case of the theorem. In situations where the cross-validation segments consist of small groups of repeated measurements, we suggest a heuristic strategy for fast serial approximation of the cross-validated residuals and the associated Predicted Residual Sum of Squares (PRESS) statistic. We also suggest strategies for efficient estimation of both the minimum PRESS value and the full PRESS function over a selected interval of regularisation values. The computational effectiveness of the resulting parameter selection for Ridge and Tikhonov regression modelling is demonstrated in several applications with real and highly multivariate datasets.
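For intuition, the classical leave-one-out special case mentioned above can be computed without refitting via the hat matrix: for ridge/Tikhonov regression with $H = X(X^\top X + \lambda I)^{-1}X^\top$, the LOO residual is $e_i/(1 - H_{ii})$. The sketch below illustrates only this well-known special case on synthetic data; it is not the paper's general k-fold theorem.

```python
# Sketch of the classical leave-one-out (LOO) shortcut for ridge regression:
# loo_residual_i = residual_i / (1 - H_ii), with H = X (X'X + lambda*I)^{-1} X'.
# Synthetic data; this is the well-known special case only.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 10, 1.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Fit ridge once on the full data.
A = X.T @ X + lam * np.eye(p)
beta = np.linalg.solve(A, X.T @ y)
residuals = y - X @ beta

# Hat-matrix diagonal and LOO residuals without any refitting.
H_diag = np.einsum("ij,ij->i", X @ np.linalg.inv(A), X)
loo_residuals = residuals / (1.0 - H_diag)
press = np.sum(loo_residuals**2)
print(f"PRESS (ridge, lambda={lam}): {press:.4f}")
```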
In this paper, we explore the idea of analysing the historical bias of contextual language models based on BERT by measuring their adequacy with respect to Early Modern (EME) and Modern (ME) English. In our preliminary experiments, we perform fill-in-the-blank tests with 60 masked sentences (20 EME-specific, 20 ME-specific and 20 generic) and three different models (i.e., BERT Base, MacBERTh, English HLM). We then rate the model predictions according to a 5-point bipolar scale between the two language varieties and derive a weighted score to measure the adequacy of each model to EME and ME varieties of English.
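The fill-in-the-blank probes can be reproduced in spirit with a standard masked-language-model pipeline. The sketch below assumes the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint, with made-up example sentences; it is not the paper's 60-sentence test set or its rating procedure.

```python
# Illustrative fill-in-the-blank probe with a masked language model.
# The checkpoint and the two example sentences are stand-ins, not the
# paper's actual test set or 5-point rating scheme.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

probes = [
    "Thou art the [MASK] of my heart.",            # EME-flavoured example (hypothetical)
    "She parked the [MASK] outside the office.",   # ME-flavoured example (hypothetical)
]

for sentence in probes:
    print(sentence)
    for pred in fill_mask(sentence, top_k=5):
        print(f"  {pred['token_str']:>12}  p={pred['score']:.3f}")
```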
In this paper, we present a distribution-dependent PAC-Chernoff bound that is perfectly tight for interpolators even under overparametrized model classes. This bound relies on basic principles of Large Deviation Theory and naturally provides a characterization of the smoothness of a model described as a simple real-valued function. Based on this distribution-dependent bound and the novel definition of smoothness, we propose a unifying theoretical explanation of why some interpolators generalize remarkably well while others do not, and why a wide range of modern learning techniques (i.e., $\ell_2$-norm, distance-from-initialization, input-gradient and variance regularization, together with data augmentation, invariant architectures, and overparameterization) are able to find them. The emergent conclusion is that all these methods provide complementary procedures that bias the optimizer toward smoother interpolators, which, according to this theoretical analysis, are the ones with better generalization error. One of the main insights of this study is that distribution-dependent bounds serve as a powerful tool to better understand the complex dynamics behind the generalization capabilities of highly overparameterized interpolators.
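Two of the listed techniques can be made concrete in a few lines: an $\ell_2$-norm penalty on the weights and an input-gradient penalty, both added to the training loss so that the optimizer is biased toward smoother interpolators. The PyTorch sketch below is a generic illustration under assumed model and data shapes, not the paper's experimental setup.

```python
# Generic sketch of two smoothness-inducing regularizers from the abstract:
# an l2-norm penalty on the parameters and an input-gradient penalty.
# Model, data, and coefficients are illustrative placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
criterion = nn.MSELoss()

def regularized_loss(x, y, l2_coef=1e-4, grad_coef=1e-3):
    x = x.clone().requires_grad_(True)
    fit = criterion(model(x), y)

    # l2-norm regularization on the weights.
    l2 = sum(p.pow(2).sum() for p in model.parameters())

    # Input-gradient regularization: penalize the sensitivity of the
    # fitting loss with respect to the inputs.
    (grad_x,) = torch.autograd.grad(fit, x, create_graph=True)
    grad_pen = grad_x.pow(2).sum(dim=1).mean()

    return fit + l2_coef * l2 + grad_coef * grad_pen

x, y = torch.randn(32, 20), torch.randn(32, 1)
loss = regularized_loss(x, y)
loss.backward()  # gradients now include both regularization terms
print(f"total loss: {loss.item():.4f}")
```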
In this paper, we introduce two novel methods to design outer polar codes for two previously proposed concatenated polar code architectures: augmented polar codes and local-global polar codes. These methods include a stopping set (SS) construction and a nonstationary density evolution (NDE) construction. Simulation results demonstrate the advantage of these methods over previously proposed constructions based on density evolution (DE) and LLR evolution.
In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time-inhomogeneous reinforcement learning problem, where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments. We then find concrete bounds on $d_{l_1}$ in a variety of settings, such as tabular, linear, and finite mixtures, and discuss how our results are either the first of their kind or improve the state-of-the-art.
In this study, we focus on two main tasks: the first is detecting legal violations within unstructured textual data, and the second is associating these violations with potentially affected individuals. We constructed two datasets using Large Language Models (LLMs), which were subsequently validated by domain expert annotators. Both tasks were designed specifically for the context of class-action cases. The experimental design incorporated fine-tuning models from the BERT family and open-source LLMs, as well as few-shot experiments with closed-source LLMs. Our results, with an F1-score of 62.69\% (violation identification) and 81.02\% (associating victims), show that our datasets and setups can be used for both tasks. Finally, we publicly release the datasets and the code used for the experiments in order to advance further research in the area of legal natural language processing (NLP).
In this paper, we proposed to apply a meta-learning approach to low-resource automatic speech recognition (ASR). We formulated ASR for different languages as different tasks, and meta-learned the initialization parameters from many pretraining languages to achieve fast adaptation on an unseen target language, via the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluated the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results showed that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta learning to more speech-related applications.
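The MAML update referenced above can be summarized in a few lines: an inner gradient step on a task's support data, followed by an outer update of the meta-initialization that backpropagates through the adapted parameters. The PyTorch sketch below shows a single second-order MAML step on a toy regression model; it stands in for, and is not, the MetaASR training pipeline.

```python
# Minimal MAML meta-update sketch (toy regression model, one task, one inner step).
# This illustrates the generic algorithm only; it is not the authors' MetaASR code.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(8, 1)                 # stand-in for an ASR model
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.01

def maml_step(support, query):
    (xs, ys), (xq, yq) = support, query
    params = list(model.parameters())

    # Inner loop: one gradient step on the task's support set.
    support_loss = F.mse_loss(F.linear(xs, params[0], params[1]), ys)
    grads = torch.autograd.grad(support_loss, params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(params, grads)]

    # Outer loop: evaluate the adapted parameters on the query set and
    # backpropagate through the inner update into the meta-initialization.
    query_loss = F.mse_loss(F.linear(xq, adapted[0], adapted[1]), yq)
    meta_opt.zero_grad()
    query_loss.backward()
    meta_opt.step()
    return query_loss.item()

support = (torch.randn(16, 8), torch.randn(16, 1))
query = (torch.randn(16, 8), torch.randn(16, 1))
print(f"query loss after meta-update: {maml_step(support, query):.4f}")
```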
In this paper, we propose a novel multi-task learning architecture, which incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow for learning of task-specific features from the global pool, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how our method scales well with task complexity compared to baselines.
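The core mechanism, a task-specific soft-attention mask applied to a shared feature pool, can be sketched in a few lines of PyTorch. The module below is a simplified single-scale illustration of that idea under assumed channel sizes, not the full published MTAN architecture.

```python
# Simplified sketch of a task-specific soft-attention module over shared features:
# each task learns a sigmoid mask that gates a globally shared feature map.
# Channel sizes and the shared backbone are illustrative placeholders.
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),              # per-element soft attention in [0, 1]
        )

    def forward(self, shared_features):
        return self.mask(shared_features) * shared_features

shared = torch.randn(2, 64, 32, 32)     # features from the shared backbone
seg_attention = TaskAttention(64)       # e.g., semantic segmentation task
depth_attention = TaskAttention(64)     # e.g., depth estimation task
seg_feat, depth_feat = seg_attention(shared), depth_attention(shared)
print(seg_feat.shape, depth_feat.shape)
```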
In this paper, we propose to jointly learn attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and thus degrade the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
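The beam-search step for predicting a label set can be illustrated generically: at each step, the k best partial label sequences are extended with candidate labels under a scoring model. The sketch below uses a hypothetical `score_next` function in place of the attention-LSTM decoder and is not the paper's model.

```python
# Generic beam-search sketch for multi-label prediction.
# `score_next` is a hypothetical stand-in for a decoder's per-step
# label log-probabilities; labels and scores are illustrative.
import math

LABELS = ["person", "dog", "car", "tree", "<stop>"]

def score_next(prefix):
    # Placeholder scorer: prefer labels not already predicted, then stop.
    scores = {}
    for lab in LABELS:
        if lab in prefix:
            scores[lab] = -1e9
        elif lab == "<stop>":
            scores[lab] = math.log(0.1 + 0.2 * len(prefix))
        else:
            scores[lab] = math.log(0.5 / (1 + len(prefix)))
    return scores

def beam_search(beam_width=3, max_len=4):
    beams = [([], 0.0)]  # (label sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, logp in beams:
            if prefix and prefix[-1] == "<stop>":
                candidates.append((prefix, logp))   # finished hypothesis
                continue
            for lab, s in score_next(prefix).items():
                candidates.append((prefix + [lab], logp + s))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    best, _ = beams[0]
    return [lab for lab in best if lab != "<stop>"]

print(beam_search())
```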