Data assimilation algorithms combine observations with prior model information to obtain the most likely state of a dynamical system. The linearised weak-constraint four-dimensional variational assimilation problem can be reformulated as a saddle point problem, which admits more scope for preconditioners than the primal form. In this paper we design new terms which can be used within existing preconditioners, such as block diagonal and constraint-type preconditioners. Our novel preconditioning approaches: (i) incorporate model information whilst guaranteeing parallelism, and (ii) are designed to target correlated observation error covariance matrices. To our knowledge (i) has not previously been considered for data assimilation problems. We develop new theory demonstrating the effectiveness of the new preconditioners within Krylov subspace methods. Linear and non-linear numerical experiments reveal that our new approach leads to faster convergence than existing state-of-the-art preconditioners for a broader range of problems than indicated by the theory alone. We present a range of numerical experiments performed in serial, with further improvements expected if the highly parallelisable nature of the preconditioners is exploited.
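To illustrate the kind of preconditioning at stake, the sketch below applies MINRES with an "ideal" block-diagonal (Schur-complement) preconditioner to a generic symmetric saddle point system. The blocks A and B are random stand-ins chosen for illustration only, not the weak-constraint 4D-Var operators from the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(0)
n, m = 40, 20

# Generic saddle-point system K = [[A, B^T], [B, 0]] with a symmetric
# positive definite (1,1) block and a full-rank constraint block.
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, n))
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
b = rng.standard_normal(n + m)

# "Ideal" block-diagonal preconditioner P^{-1} = diag(A, S)^{-1},
# where S = B A^{-1} B^T is the Schur complement; with this choice the
# preconditioned matrix has three distinct eigenvalues, so MINRES
# converges in a handful of iterations.
S = B @ np.linalg.solve(A, B.T)
A_inv, S_inv = np.linalg.inv(A), np.linalg.inv(S)
P = LinearOperator(
    (n + m, n + m),
    matvec=lambda v: np.concatenate([A_inv @ v[:n], S_inv @ v[n:]]),
)

x, info = minres(K, b, M=P)
```

In practice the Schur complement is never formed exactly; the point of designing new preconditioner terms is precisely to replace these blocks with cheap, parallelisable approximations.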
Two main challenges in Reinforcement Learning (RL) are designing appropriate reward functions and ensuring the safety of the learned policy. To address these challenges, we present a theoretical framework for Inverse Reinforcement Learning (IRL) in constrained Markov decision processes. From a convex-analytic perspective, we extend prior results on reward identifiability and generalizability to both the constrained setting and a more general class of regularizations. In particular, we show that identifiability up to potential shaping (Cao et al., 2021) is a consequence of entropy regularization and may generally no longer hold for other regularizations or in the presence of safety constraints. We also show that to ensure generalizability to new transition laws and constraints, the true reward must be identified up to a constant. Additionally, we derive a finite sample guarantee for the suboptimality of the learned rewards, and validate our results in a gridworld environment.
Predicting the future motion of observed vehicles is a crucial enabler for safe autonomous driving. The field of motion prediction has seen large progress recently, with State-of-the-Art (SotA) models achieving impressive results on large-scale public benchmarks. However, recent work revealed that learning-based methods are prone to predict off-road trajectories in challenging scenarios. Such scenarios can be created by perturbing existing ones with additional turns in front of the target vehicle while the motion history is left unchanged. We argue that this indicates that SotA models do not consider the map information sufficiently, and demonstrate how this can be addressed by representing model inputs and outputs in a Frenet frame defined by lane centreline sequences. To this end, we present a general wrapper that leverages a Frenet representation of the scene and that can be applied to SotA models without changing their architecture. We demonstrate the effectiveness of this approach in a comprehensive benchmark using two SotA motion prediction models. Our experiments show that this reduces the off-road rate on challenging scenarios by more than 90\%, without sacrificing average performance.
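As a minimal illustration of the representation involved, the sketch below projects a Cartesian point onto a polyline centreline to obtain Frenet coordinates: arc length $s$ along the lane and signed lateral offset $d$. This is a toy projection under assumed conventions (offset positive to the left), not the wrapper proposed in the paper.

```python
import numpy as np

def cartesian_to_frenet(point, centreline):
    # Project a 2-D point onto a polyline centreline and return
    # (s, d): arc length to the projection and signed lateral offset,
    # positive to the left of the direction of travel.
    best = (np.inf, 0.0, 0.0)              # (squared distance, s, d)
    s0 = 0.0
    for p0, p1 in zip(centreline[:-1], centreline[1:]):
        seg = p1 - p0
        length = np.linalg.norm(seg)
        t = np.clip((point - p0) @ seg / length ** 2, 0.0, 1.0)
        diff = point - (p0 + t * seg)
        d2 = diff @ diff
        if d2 < best[0]:
            # sign of the 2-D cross product gives left/right of the segment
            side = np.sign(seg[0] * (point[1] - p0[1]) - seg[1] * (point[0] - p0[0]))
            best = (d2, s0 + t * length, side * np.sqrt(d2))
        s0 += length
    return best[1], best[2]

centreline = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
s, d = cartesian_to_frenet(np.array([4.0, 1.0]), centreline)
# 4 m along the lane, 1 m to the left of the centreline
```

Representing inputs and outputs in $(s, d)$ makes "stay in lane" the trivial prediction $d \approx \text{const}$, which is why the representation discourages off-road outputs.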
The smart grid introduces several new data-gathering, communication, and information-sharing capabilities into the electrical system, as well as additional privacy threats, vulnerabilities, and cyber-attacks. Modbus is one of the most prevalent communication protocols for control systems in power plants, and modern control interfaces are vulnerable to cyber-attacks, posing a risk to the entire energy infrastructure. In order to strengthen resistance to cyber-attacks, this study introduces a real-time test bed for cyber-physical systems. To investigate the network vulnerabilities of smart power grids, the Modbus protocol is examined by combining a real-time power system simulator with a communication system simulator, and the effects on the system are presented and analyzed. The goal is to identify vulnerabilities in the Modbus protocol, perform a man-in-the-middle (MiTM) attack, and assess its impact on the system. The proposed testbed can serve both as a research model for vulnerability assessment and as a tool for evaluating cyber-attacks and investigating detection mechanisms for safeguarding and defending smart grid systems from a variety of cyber-attacks. We present preliminary findings on using the testbed to identify a particular MiTM attack and its effects on system performance. Finally, we suggest a cyber-security strategy to address such network vulnerabilities and deploy appropriate countermeasures.
We contribute to the efficient approximation of the Pareto-set for the classical $\mathcal{NP}$-hard multi-objective minimum spanning tree problem (moMST) adopting evolutionary computation. More precisely, by building upon preliminary work, we analyse the neighborhood structure of Pareto-optimal spanning trees and design several highly biased sub-graph-based mutation operators founded on the gained insights. In a nutshell, these operators replace (un)connected sub-trees of candidate solutions with locally optimal sub-trees. The latter (biased) step is realized by applying Kruskal's single-objective MST algorithm to a weighted-sum scalarization of a sub-graph. We prove runtime complexity results for the introduced operators and investigate the desirable Pareto-beneficial property. This property states that mutants cannot be dominated by their parent. Moreover, we perform an extensive experimental benchmark study to showcase the operators' practical suitability. Our results confirm that the sub-graph-based operators beat baseline algorithms from the literature, even under a severely restricted computational budget in terms of function evaluations, on four different classes of complete graphs with different shapes of the Pareto-front.
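The biased step described above, Kruskal's algorithm applied to a weighted-sum scalarization, can be sketched as follows. The tiny bi-objective graph and the scalarization weight are illustrative choices, not the paper's benchmark instances.

```python
def kruskal(nodes, edges, weight):
    # Kruskal's MST with union-find; edges are (u, v, (w1, w2)) and
    # `weight` maps the bi-objective weight vector to a scalar.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for u, v, w in sorted(edges, key=lambda e: weight(e[2])):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def weighted_sum_subtree(nodes, edges, lam):
    # The biased step: the spanning tree that is optimal for the
    # scalarized objective lam * w1 + (1 - lam) * w2 on the (sub)graph.
    return kruskal(nodes, edges, lambda w: lam * w[0] + (1 - lam) * w[1])

nodes = [0, 1, 2, 3]
edges = [(0, 1, (1, 9)), (1, 2, (2, 8)), (0, 2, (9, 1)),
         (2, 3, (1, 1)), (0, 3, (5, 5))]
tree = weighted_sum_subtree(nodes, edges, lam=1.0)   # pure first objective
```

Varying `lam` over $[0, 1]$ steers the mutation toward different regions of the Pareto-front, which is what makes the operator "biased".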
The energy dissipation law and maximum bound principle are significant characteristics of the Allen-Cahn equation. To preserve discrete counterparts of these properties, the linear part of the target system is usually discretized implicitly, resulting in a large linear or nonlinear system of equations. The Fast Fourier Transform (FFT) algorithm is commonly used to solve the resulting linear or nonlinear systems with computational costs of $\mathcal{O}(M^d \log M)$ at each time step, where $M$ is the number of spatial grid points in each direction, and $d$ is the dimension of the problem. Combining the Saul'yev methods and the stabilized technique, we propose and analyze novel first- and second-order numerical schemes for the Allen-Cahn equation in this paper. In contrast to the traditional methods, the proposed methods can be solved by components, requiring only $\mathcal{O}(M^d)$ computational costs per time step. Additionally, they preserve the maximum bound principle and original energy dissipation law at the discrete level. We also provide a rigorous analysis of their consistency and convergence. Numerical experiments are conducted to confirm the theoretical analysis and demonstrate the efficiency of the proposed methods.
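A minimal sketch of a Saul'yev-type sweep for the one-dimensional Allen-Cahn equation is given below. Each step costs $\mathcal{O}(M)$ because the implicit neighbour $u^{n+1}_{i-1}$ is taken from the sweep already in progress, so no linear system is solved. This is a simplified one-directional first-order variant with assumed parameters, not the stabilized schemes analyzed in the paper.

```python
import numpy as np

def saulyev_allen_cahn(u, eps, tau, h, steps):
    # Left-to-right Saul'yev sweep for u_t = eps^2 u_xx + u - u^3 on a
    # periodic grid: each interior update uses the already-computed
    # value u^{n+1}_{i-1}, making the scheme solvable by components.
    r = eps ** 2 * tau / h ** 2
    M = u.size
    for _ in range(steps):
        left = u[-1]                      # lagged periodic neighbour for i = 0
        for i in range(M):
            right = u[(i + 1) % M]
            new = (u[i] + r * (right - u[i] + left)
                   + tau * (u[i] - u[i] ** 3)) / (1 + r)
            left = new
            u[i] = new
    return u

M, h, tau = 128, 1.0 / 128, 1e-3
x = np.arange(M) * h
u = saulyev_allen_cahn(0.9 * np.sin(2 * np.pi * x), eps=0.05, tau=tau, h=h, steps=200)
```

For this step size the update is a convex-like combination of values in $[-1, 1]$, so the discrete maximum bound $|u| \le 1$ is maintained without any matrix solve.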
Second-order stochastic optimizers allow the parameter update step size and direction to adapt to loss curvature, but have traditionally required too much memory and compute for deep learning. Recently, Shampoo [Gupta et al., 2018] introduced a Kronecker-factored preconditioner to reduce these requirements: it is used for large deep models [Anil et al., 2020] and in production [Anil et al., 2022]. However, it takes inverse matrix roots of ill-conditioned matrices. This requires 64-bit precision, imposing strong hardware constraints. In this paper, we propose a novel factorization, Kronecker Approximation-Domination (KrAD). Using KrAD, we update a matrix that directly approximates the inverse empirical Fisher matrix (like full-matrix AdaGrad), avoiding inversion and hence 64-bit precision. We then propose KrADagrad$^\star$, with similar computational costs to Shampoo and the same regret. Experiments on synthetic ill-conditioned problems show improved performance over Shampoo at 32-bit precision, while on several real datasets we obtain comparable or better generalization.
Adversarial evaluations of language models typically focus on English alone. In this paper, we performed a multilingual evaluation of Named Entity Recognition (NER) in terms of its robustness to small perturbations in the input. Our results showed the NER models we explored across three languages (English, German and Hindi) are not very robust to such changes, as indicated by the fluctuations in the overall F1 score as well as in a more fine-grained evaluation. With that knowledge, we further explored whether it is possible to improve the existing NER models using a part of the generated adversarial data sets as augmented training data to train a new NER model or as fine-tuning data to adapt an existing NER model. Our results showed that both these approaches improve performance on the original as well as adversarial test sets. While there is no significant difference between the two approaches for English, re-training is significantly better than fine-tuning for German and Hindi.
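A label-preserving character perturbation of the kind used in such robustness evaluations can be sketched as follows. The swap rule, the restriction to non-entity tokens, and the perturbation rate are illustrative assumptions, not the exact generators used in the paper.

```python
import random

def perturb_tokens(tokens, labels, rate=0.3, seed=0):
    # Swap two adjacent characters inside non-entity ("O") tokens.
    # The token count is preserved, so the BIO labels still align,
    # and the adversarial copy can be reused for (re-)training.
    rng = random.Random(seed)
    out = []
    for tok, lab in zip(tokens, labels):
        if lab == "O" and len(tok) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(tok) - 2)
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        out.append(tok)
    return out

tokens = ["Angela", "Merkel", "visited", "the", "parliament", "today"]
labels = ["B-PER", "I-PER", "O", "O", "O", "O"]
adv = perturb_tokens(tokens, labels, rate=1.0)
```

Pairs of original and perturbed sentences like this can then serve either as augmented training data or as fine-tuning data for an existing NER model.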
In this paper, we evaluate the performance of novel numerical methods for solving one-dimensional nonlinear fractional dispersive and dissipative evolution equations. The methods are based on affine combinations of time-splitting integrators and pseudo-spectral discretizations using Hermite and Fourier expansions. We show the effectiveness of the proposed methods by numerically computing the dynamics of soliton solutions of the standard and fractional variants of the nonlinear Schr\"odinger equation (NLSE) and the complex Ginzburg-Landau equation (CGLE), and by comparing the results with those obtained by standard splitting integrators. An exhaustive numerical investigation shows that the new technique is competitive with traditional composition-splitting schemes for the case of Hamiltonian problems, in terms of both accuracy and computational cost. Moreover, it is applicable straightforwardly to irreversible models, outperforming high-order symplectic integrators, which can become unstable due to their need for negative time steps. Finally, we discuss potential improvements of the numerical methods aimed at increasing their efficiency, and possible applications to the investigation of dissipative solitons that arise in nonlinear optical systems of contemporary interest. Overall, our method offers a promising alternative for solving a wide range of evolutionary partial differential equations.
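For reference, the standard splitting baseline mentioned above, a second-order Strang split-step Fourier integrator for the classical cubic NLSE, can be sketched as follows; the affine-combination methods of the paper build on integrators of this kind.

```python
import numpy as np

def strang_step(u, k, dt):
    # One Strang splitting step for i u_t + 0.5 u_xx + |u|^2 u = 0:
    # half a nonlinear step (exact phase rotation), a full linear step
    # in Fourier space, then another half nonlinear step.
    u = u * np.exp(0.5j * dt * np.abs(u) ** 2)
    u = np.fft.ifft(np.exp(-0.5j * dt * k ** 2) * np.fft.fft(u))
    u = u * np.exp(0.5j * dt * np.abs(u) ** 2)
    return u

L, M, dt = 40.0, 256, 1e-2
x = (np.arange(M) - M // 2) * (L / M)
k = 2 * np.pi * np.fft.fftfreq(M, d=L / M)
u = 1 / np.cosh(x)                     # bright soliton of the standard NLSE
mass0 = np.sum(np.abs(u) ** 2)
for _ in range(1000):
    u = strang_step(u, k, dt)
```

Both substeps are exact flows of their respective subproblems, so the scheme conserves the discrete mass to machine precision and propagates the soliton with only a small phase error.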
The Lipschitz bandit is a variant of stochastic bandits that deals with a continuous arm set defined on a metric space, where the reward function is subject to a Lipschitz constraint. In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions, where an adaptive adversary corrupts the stochastic rewards up to a total budget $C$. The budget is measured by the sum of corruption levels across the time horizon $T$. We consider both weak and strong adversaries, where the weak adversary is unaware of the current action before the attack, while the strong one can observe it. Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary, even when the total corruption budget $C$ is not revealed to the agent. We provide a lower bound under each type of adversary, and show that our algorithm is optimal in the strong-adversary case. Finally, we conduct experiments to illustrate the effectiveness of our algorithms against two classic kinds of attacks.
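As a point of reference, the classical uniform-discretization approach to the uncorrupted Lipschitz bandit on $[0,1]$, covering the arm set with roughly $T^{1/3}$ points and running UCB1 on the cover, looks as follows. This is a textbook baseline under an assumed 1-Lipschitz reward and Gaussian noise, not the corruption-robust algorithms of the paper.

```python
import numpy as np

def discretized_ucb(reward, T, seed=0):
    # Cover [0, 1] with K ~ T^{1/3} equally spaced arms and run UCB1
    # on the cover; Lipschitz continuity bounds the discretization error.
    rng = np.random.default_rng(seed)
    K = int(np.ceil(T ** (1 / 3)))
    arms = (np.arange(K) + 0.5) / K
    counts, sums, total = np.zeros(K), np.zeros(K), 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1                                   # pull each arm once
        else:
            a = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
        r = reward(arms[a]) + rng.normal(0.0, 0.1)      # stochastic feedback
        counts[a] += 1
        sums[a] += r
        total += r
    return total / T

# 1-Lipschitz mean reward with a peak at x = 0.7 (illustrative choice)
avg = discretized_ucb(lambda x: 1.0 - abs(x - 0.7), T=5000)
```

Confidence-bound schemes of this kind are exactly what an adversary with budget $C$ can mislead, which is why robust variants need corruption-aware confidence widths or budget-agnostic elimination.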
We establish precise structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators. Specifically, we prove that linear and quadratic functionals of subsample ridge estimators, when fitted with different ridge regularization levels $\lambda$ and subsample aspect ratios $\psi$, are asymptotically equivalent along specific paths in the $(\lambda, \psi)$-plane (where $\psi$ is the ratio of the feature dimension to the subsample size). Our results only require bounded moment assumptions on feature and response distributions and allow for arbitrary joint distributions. Furthermore, we provide a data-dependent method to determine the equivalent paths of $(\lambda, \psi)$. An indirect implication of our equivalences is that optimally-tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio. This resolves a recent open problem raised by Nakkiran et al. under general data distributions and a mild regularity condition that maintains regression hardness through linearized signal-to-noise ratios.
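The qualitative phenomenon can be checked numerically: an ensemble of barely-regularized ridge fits on subsamples matches, in estimation risk, a single more strongly regularized fit on the full data at some $\lambda$. The sketch below uses illustrative dimensions and a brute-force scan over $\lambda$; it is not the paper's data-dependent path construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge(X, y, lam):
    # Ridge estimator with the n-normalized Gram matrix.
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

n, p, k = 1000, 100, 200                 # full size, dimension, subsample size
beta = rng.standard_normal(p) / np.sqrt(p)
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# ensemble of nearly-ridgeless fits on random subsamples (psi = p/k = 0.5)
fits = []
for _ in range(50):
    idx = rng.choice(n, size=k, replace=False)
    fits.append(ridge(X[idx], y[idx], lam=1e-6))
risk_ens = np.sum((np.mean(fits, axis=0) - beta) ** 2)

# scan full-data ridge levels for one whose estimation risk matches
lams = np.logspace(-4, 1, 100)
risks = [np.sum((ridge(X, y, lam) - beta) ** 2) for lam in lams]
gap = min(abs(r - risk_ens) for r in risks) / risk_ens
```

Because the full-data risk varies continuously in $\lambda$ from the near-OLS regime to heavy shrinkage, some level on the grid reproduces the ensemble's risk, which is the finite-sample shadow of the asymptotic $(\lambda, \psi)$-path equivalence.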