Cybersecurity attacks against industrial control systems and cyber-physical systems can cause catastrophic real-world damage by infecting device binaries with malware. Mitigating such attacks can benefit from reverse engineering tools that recover sufficient semantic knowledge in terms of mathematical operations in the code. Conventional reverse engineering tools can decompile binaries to low-level code, but offer little semantic insight. This paper proposes REMaQE, an automated framework for reverse engineering of math equations from binary executables. REMaQE uses symbolic execution for dynamic analysis of the binary to extract the relevant semantic knowledge of the implemented algorithms. REMaQE provides an automatic parameter analysis pass which also leverages symbolic execution to identify input, output, and constant parameters of the implemented math equations. REMaQE automatically handles parameters accessed via registers, the stack, global memory, or pointers, and supports reverse engineering of object-oriented implementations such as C++ classes. REMaQE uses an algebraic simplification method which allows it to scale to complex conditional equations with ease. These features make REMaQE stand out over existing reverse engineering approaches for math equations. On a dataset of randomly generated math equations compiled to binaries from C and Simulink implementations, REMaQE accurately recovers a semantically matching equation for 97.53% of the models. For complex equations with more operations, accuracy stays consistently over 94%. REMaQE executes in 0.25 seconds on average and in 1.3 seconds for more complex equations. This real-time execution speed enables a smooth integration in an interactive mathematics-oriented reverse engineering workflow.
Utilizing GPUs is critical for high performance on heterogeneous systems. However, leveraging the full potential of GPUs for accelerating legacy CPU applications can be a challenging task for developers. The porting process requires identifying code regions amenable to acceleration, managing distinct memories, synchronizing host and device execution, and handling library functions that may not be directly executable on the device. This complexity makes it challenging for non-experts to leverage GPUs effectively, or even to start offloading parts of a large legacy application. In this paper, we propose a novel compilation scheme called "GPU First" that automatically compiles legacy CPU applications directly for GPUs without any modification of the application source. Library calls inside the application are either resolved through our partial libc GPU implementation or via automatically generated remote procedure calls to the host. Our approach simplifies the task of identifying code regions amenable to acceleration and enables rapid testing of code modifications on actual GPU hardware in order to guide porting efforts. Our evaluation on two HPC proxy applications with OpenMP CPU and GPU parallelism, four micro benchmarks with originally GPU only parallelism, as well as three benchmarks from the SPEC OMP 2012 suite featuring hand-optimized OpenMP CPU parallelism showcases the simplicity of porting host applications to the GPU. For existing parallel loops, we often match the performance of corresponding manually offloaded kernels, with up to 14.36x speedup on the GPU, validating that our GPU First methodology can effectively guide porting efforts of large legacy applications.
Adversarial Machine Learning (AML) represents the ability to disrupt Machine Learning (ML) algorithms through a range of methods that broadly exploit the architecture of deep learning optimisation. This paper presents Distributed Adversarial Regions (DAR), a novel method that implements distributed instantiations of computer vision-based AML attack methods that may be used to disguise objects from image recognition in both white and black box settings. We consider the context of object detection models used in urban environments, and benchmark the MobileNetV2, NasNetMobile and DenseNet169 models against a subset of relevant images from the ImageNet dataset. We evaluate optimal parameters (size, number and perturbation method), and compare to state-of-the-art AML techniques that perturb the entire image. We find that DARs can cause a reduction in confidence of 40.4% on average, but with the benefit of not requiring the entire image, or the focal object, to be perturbed. The DAR method is a deliberately simple approach where the intention is to highlight how an adversary with very little skill could attack models that may already be productionised, and to emphasise the fragility of foundational object detection models. We present this as a contribution to the field of ML security as well as AML. This paper contributes a novel adversarial method, an original comparison between DARs and other AML methods, and frames it in a new context - that of urban camouflage and the necessity for ML security and model robustness.
We consider a problem of approximation of $d$-variate functions defined on $\mathbb{R}^d$ which belong to the Hilbert space with tensor product-type reproducing Gaussian kernel with constant shape parameter. Within worst case setting, we investigate the growth of the information complexity as $d\to\infty$. The asymptotics are obtained for the case of fixed error threshold and for the case when it goes to zero as $d\to\infty$.
Ductile damage models and cohesive laws incorporate the material plasticity entailing the growth of irrecoverable deformations even after complete failure. This unrealistic growth remains concealed until the unilateral effects arising from the crack closure emerge. We address this issue by proposing a new strategy to cope with the entire process of failure, from the very inception in the form of diffuse damage to the final stage, i.e. the emergence of sharp cracks. To this end, we introduce a new strain field, termed discontinuity strain, to the conventional additive strain decomposition to account for discontinuities in a continuous sense so that the standard principle of virtual work applies. We treat this strain field similar to a strong discontinuity, yet without introducing new kinematic variables and nonlinear boundary conditions. In this paper, we demonstrate the effectiveness of this new strategy at a simple ductile damage constitutive model. The model uses a scalar damage index to control the degradation process. The discontinuity strain field is injected into the strain decomposition if this damage index exceeds a certain threshold. The threshold corresponds to the limit at which the induced imperfections merge and form a discrete crack. With three-point bending tests under pure mode I and mixed-mode conditions, we demonstrate that this augmentation does not show the early crack closure artifact which is wrongly predicted by plastic damage formulations at load reversal. We also use the concrete damaged plasticity model provided in Abaqus commercial finite element program for our comparison. Lastly, a high-intensity low-cycle fatigue test demonstrates the unilateral effects resulting from the complete closure of the induced crack.
The small size, high dexterity, and intrinsic compliance of continuum robots (CRs) make them well suited for constrained environments. Solving the inverse kinematics (IK), that is finding robot joint configurations that satisfy desired position or pose queries, is a fundamental challenge in motion planning, control, and calibration for any robot structure. For CRs, the need to avoid obstacles in tightly confined workspaces greatly complicates the search for feasible IK solutions. Without an accurate initialization or multiple re-starts, existing algorithms often fail to find a solution. We present CIDGIKc (Convex Iteration for Distance-Geometric Inverse Kinematics for Continuum Robots), an algorithm that solves these nonconvex feasibility problems with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. CIDGIKc is enabled by a novel distance-geometric parameterization of constant curvature segment geometry for CRs with extensible segments. The resulting IK formulation involves only quadratic expressions and can efficiently incorporate a large number of collision avoidance constraints. Our experimental results demonstrate >98% solve success rates within complex, highly cluttered environments which existing algorithms cannot account for.
Currently, over half of the computing power at CERN GRID is used to run High Energy Physics simulations. The recent updates at the Large Hadron Collider (LHC) create the need for developing more efficient simulation methods. In particular, there exists a demand for a fast simulation of the neutron Zero Degree Calorimeter, where existing Monte Carlo-based methods impose a significant computational burden. We propose an alternative approach to the problem that leverages machine learning. Our solution utilises neural network classifiers and generative models to directly simulate the response of the calorimeter. In particular, we examine the performance of variational autoencoders and generative adversarial networks, expanding the GAN architecture by an additional regularisation network and a simple, yet effective postprocessing step. Our approach increases the simulation speed by 2 orders of magnitude while maintaining the high fidelity of the simulation.
We present a compatible finite element discretisation for the vertical slice compressible Euler equations, at next-to-lowest order (i.e., the pressure space is bilinear discontinuous functions). The equations are numerically integrated in time using a fully implicit timestepping scheme which is solved using monolithic GMRES preconditioned by a linesmoother. The linesmoother only involves local operations and is thus suitable for domain decomposition in parallel. It allows for arbitrarily large timesteps but with iteration counts scaling linearly with Courant number in the limit of large Courant number. This solver approach is implemented using Firedrake, and the additive Schwarz preconditioner framework of PETSc. We demonstrate the robustness of the scheme using a standard set of testcases that may be compared with other approaches.
Beam search is a go-to strategy for decoding neural sequence models. The algorithm can naturally be viewed as a subset optimization problem, albeit one where the corresponding set function does not reflect interactions between candidates. Empirically, this leads to sets often exhibiting high overlap, e.g., strings may differ by only a single word. Yet in use-cases that call for multiple solutions, a diverse or representative set is often desired. To address this issue, we propose a reformulation of beam search, which we call determinantal beam search. Determinantal beam search has a natural relationship to determinantal point processes (DPPs), models over sets that inherently encode intra-set interactions. By posing iterations in beam search as a series of subdeterminant maximization problems, we can turn the algorithm into a diverse subset selection process. In a case study, we use the string subsequence kernel to explicitly encourage n-gram coverage in text generated from a sequence model. We observe that our algorithm offers competitive performance against other diverse set generation strategies in the context of language generation, while providing a more general approach to optimizing for diversity.
The existing model compression methods via structured pruning typically require complicated multi-stage procedures. Each individual stage necessitates numerous engineering efforts and domain-knowledge from the end-users which prevent their wider applications onto broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), which first automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance without fine-tuning. OTOv2 is automatic and pluggable into various deep learning applications, and requires almost minimal engineering efforts from the users. Methodologically, OTOv2 proposes two major improvements: (i) Autonomy: automatically exploits the dependency of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer to more reliably solve structured-sparsity problems. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting efforts. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVNH and ImageNet, its effectiveness is validated by performing competitively or even better than the state-of-the-arts. The source code is available at //github.com/tianyic/only_train_once.
Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.