This is part II of a two-part paper. Part I presented a universal Birkhoff theory for fast and accurate trajectory optimization. The theory rested on two main hypotheses. In this paper, it is shown that if the computational grid is selected from any one of the Legendre and Chebyshev family of node points, be it Lobatto, Radau, or Gauss, then the resulting collection of trajectory optimization methods satisfies the hypotheses required for the universal Birkhoff theory to hold. All of these grid points can be generated at an $\mathcal{O}(1)$ computational speed. Furthermore, all Birkhoff-generated solutions can be tested for optimality by a joint application of Pontryagin's Principle and the Covector Mapping Principle, the latter of which was developed in Part~I. More importantly, the optimality checks can be performed without resorting to an indirect method or even explicitly producing the full differential-algebraic boundary value problem that results from an application of Pontryagin's Principle. Numerical problems are solved to illustrate all of these ideas. The examples are chosen to highlight three practically useful features of Birkhoff methods: (1) bang-bang optimal controls can be produced without suffering any Gibbs phenomenon, (2) discontinuous and even Dirac delta covector trajectories can be well approximated, and (3) extremal solutions over dense grids can be computed in a stable and efficient manner.
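As a concrete illustration of one grid in the family referenced above (this closed form is standard textbook material, not taken from the paper), the Chebyshev-Gauss-Lobatto points on $[-1, 1]$ can be generated as:

```python
import math

def chebyshev_gauss_lobatto(N: int) -> list:
    """Chebyshev-Gauss-Lobatto grid x_k = cos(k*pi/N), k = 0, ..., N.

    One member of the Legendre/Chebyshev family of node points. Chebyshev
    grids admit a closed form, which is one reason they are cheap to
    generate; Legendre (Lobatto/Radau/Gauss) nodes are roots of orthogonal
    polynomials, typically found by a fast eigenvalue method or Newton
    iteration instead.
    """
    return [math.cos(k * math.pi / N) for k in range(N + 1)]

nodes = chebyshev_gauss_lobatto(4)  # 5 points spanning [1, ..., -1]
```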
The increasing demand for computational power in big data and machine learning has driven the development of distributed training methodologies. Among these, peer-to-peer (P2P) networks provide advantages such as enhanced scalability and fault tolerance. However, they also face challenges in resource consumption, cost, and communication overhead as the number of participating peers grows. In this paper, we introduce a novel architecture that combines serverless computing with P2P networks for distributed training and present a method for efficient parallel gradient computation under resource constraints. Our findings show a significant improvement in gradient computation time, of up to 97.34\% compared to conventional P2P distributed training methods. Regarding costs, our analysis showed that the serverless architecture can incur higher expenses, up to 5.4 times more than instance-based architectures. These higher costs, however, are accompanied by marked improvements in computation time, particularly under resource-constrained scenarios. Despite the cost-time trade-off, the serverless approach still holds promise due to its pay-as-you-go model. By utilizing dynamic resource allocation, it enables faster training times and optimized resource utilization, making it a promising candidate for a wide range of machine learning applications.
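A minimal sketch of the core step in synchronous data-parallel training, where a coordinator (here one could imagine a serverless function) averages per-worker gradients before applying an update. The function name and flat-list representation are illustrative assumptions, not the paper's actual implementation:

```python
from typing import List

def average_gradients(worker_grads: List[List[float]]) -> List[float]:
    """Average per-parameter gradients computed by independent workers.

    Each worker returns a flat list of partial derivatives over the same
    parameters; the coordinator averages them elementwise, as in
    synchronous data-parallel SGD.
    """
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    return [sum(g[p] for g in worker_grads) / n_workers for p in range(n_params)]

# Two workers, each holding gradients for three parameters.
avg = average_gradients([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])  # [2.0, 3.0, 4.0]
```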
This study explores the applicability of Deep Reinforcement Learning (DRL) for thermal control based on Computational Fluid Dynamics. To that end, forced convection on a hot plate exposed to a pulsating cooling jet with variable velocity is investigated. We begin by evaluating the efficiency and viability of a vanilla Deep Q-Network (DQN) method for thermal control. Subsequently, a comprehensive comparison between different variants of DRL is conducted. Soft Double and Dueling DQN achieved the best thermal control performance among all the variants, owing to their efficient learning and action prioritization capabilities. Results demonstrate that the soft Double DQN outperforms the hard Double DQN. Moreover, soft Double and Dueling DQN can maintain the temperature within the desired threshold for more than 98% of the control cycle. These findings demonstrate the promising potential of DRL in effectively addressing thermal control systems.
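The soft/hard distinction mentioned above refers to how the target network tracks the online network in Double DQN; a toy sketch with weights as flat lists (illustrative only, not the study's code):

```python
def soft_update(target_w: list, online_w: list, tau: float) -> list:
    """Polyak-average online weights into the target network:

        theta_target <- tau * theta_online + (1 - tau) * theta_target

    tau = 1.0 recovers the 'hard' update (a periodic full copy), while a
    small tau (e.g. 0.005) gives the 'soft' variant, whose slowly moving
    target tends to stabilize Q-learning.
    """
    return [tau * w + (1.0 - tau) * t for w, t in zip(online_w, target_w)]

hard = soft_update([0.0, 0.0], [1.0, 1.0], tau=1.0)  # full copy: [1.0, 1.0]
soft = soft_update([0.0, 0.0], [1.0, 1.0], tau=0.5)  # blend:     [0.5, 0.5]
```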
This paper delves into high-order curl and div elements within finite element methods, providing valuable insights into their geometric properties, indexing management, and practical implementation considerations. It first explores the decomposition of Lagrange finite elements. The discussion then extends to H(div)- and H(curl)-conforming finite element spaces, constructed by choosing different frames on different sub-simplices. The required tangential or normal continuity is imposed by appropriate choices of the tangential and normal bases. The paper concludes with a focus on efficient indexing management strategies for degrees of freedom, offering practical guidance to researchers and engineers. It serves as a comprehensive resource that bridges the gap between theory and practice in the realm of high-order curl and div elements in finite element methods, which are vital for solving vector field problems and understanding electromagnetic phenomena.
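For reference, the conforming spaces in question have the standard definitions below (the domain $\Omega$ and dimension $d$ are generic here, not fixed by the paper):

```latex
H(\mathrm{div};\Omega)  = \{\, v \in [L^2(\Omega)]^d \;:\; \nabla\cdot v \in L^2(\Omega) \,\},
\qquad
H(\mathrm{curl};\Omega) = \{\, v \in [L^2(\Omega)]^3 \;:\; \nabla\times v \in [L^2(\Omega)]^3 \,\}.
```

Normal continuity across element faces characterizes H(div)-conforming discretizations, and tangential continuity characterizes H(curl)-conforming ones, which is precisely what the frame choices on sub-simplices enforce.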
We provide the first convergence guarantee for full black-box variational inference (BBVI), also known as Monte Carlo variational inference. While preliminary investigations worked on simplified versions of BBVI (e.g., bounded domain, bounded support, only optimizing for the scale, and such), our setup does not need any such algorithmic modifications. Our results hold for log-smooth posterior densities with and without strong log-concavity and the location-scale variational family. Also, our analysis reveals that certain algorithm design choices commonly employed in practice, particularly, nonlinear parameterizations of the scale of the variational approximation, can result in suboptimal convergence rates. Fortunately, running BBVI with proximal stochastic gradient descent fixes these limitations, and thus achieves the strongest known convergence rate guarantees. We evaluate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems.
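The location-scale variational family mentioned above admits the standard reparameterization $z = \mu + \sigma\,\varepsilon$; a minimal scalar sketch of that textbook construction (not the paper's implementation):

```python
import random

def sample_location_scale(mu: float, sigma: float, rng: random.Random) -> float:
    """Reparameterized draw from a location-scale family:
    z = mu + sigma * eps, with eps ~ N(0, 1).

    Because the map is deterministic in (mu, sigma), gradients of a
    Monte Carlo ELBO estimate can flow through it, which is the basis
    of stochastic-gradient BBVI.
    """
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps

z = sample_location_scale(2.0, 0.0, random.Random(0))  # sigma = 0 collapses to mu
```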
Current state-of-the-art crowd navigation approaches are mainly based on deep reinforcement learning (DRL). However, DRL-based methods suffer from issues of generalization and scalability. To overcome these challenges, we propose a method that includes a Collision Probability (CP) in the observation space to give the robot a sense of the level of danger posed by the moving crowd, helping it navigate safely through crowds with unseen behaviors. We also studied the effect of changing the number of moving obstacles the robot pays attention to during navigation. During training, we generated local waypoints to increase the reward density and improve the learning efficiency of the system. Our approach was developed using DRL and trained using the Gazebo simulator in a non-cooperative crowd environment with obstacles moving at randomized speeds and directions. We then evaluated our model on four different crowd-behavior scenarios. The results show that our method achieved a 100% success rate in all test settings. We compared our approach with a current state-of-the-art DRL-based approach and found that ours performed significantly better, especially in terms of social safety. Importantly, our method can navigate under different crowd behaviors and requires no fine-tuning after being trained once. We further demonstrated the crowd navigation capability of our model in real-world tests.
This paper presents a detailed Genetic Algorithm (GA)-based combinatorial optimization method for the optimal design of the water distribution network (WDN) of the Gurudeniya Service Zone, Sri Lanka. The GA mimics nature's survival-of-the-fittest principle to drive a search process. The methodology employs fuzzy combinations of pipe diameters and checks their suitability as cost-effective optimal design solutions. Furthermore, the hydraulic constraints are evaluated implicitly within the GA itself in its search for the global optimum. Upon analysis, this approach delivered agreeable design outputs. In addition, a comparison between the results of a previous study based on the Honey Bee Mating Optimization (HBMO) Algorithm and those obtained by the GA-based approach demonstrates the competency of GA for the optimal design of the water distribution network in the Gurudeniya Service Zone, Sri Lanka.
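The GA loop described above, specialized to discrete pipe-diameter selection, can be sketched as follows. The diameter catalogue, cost model, and penalty constraint here are hypothetical stand-ins, not the study's actual hydraulic model:

```python
import random

# Toy GA sketch: candidates are lists of pipe diameters drawn from a
# discrete catalogue; fitness is a proxy cost plus a penalty for
# violating a stand-in hydraulic (minimum-diameter) constraint.
DIAMETERS = [100, 150, 200, 250, 300]  # mm, hypothetical catalogue
N_PIPES = 4

def fitness(candidate: list) -> float:
    cost = sum(0.02 * d for d in candidate)           # proxy pipe cost
    penalty = 100.0 if min(candidate) < 150 else 0.0  # proxy hydraulic constraint
    return cost + penalty

def evolve(generations: int = 50, pop_size: int = 20, seed: int = 1) -> list:
    rng = random.Random(seed)
    pop = [[rng.choice(DIAMETERS) for _ in range(N_PIPES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]              # survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_PIPES)
            child = a[:cut] + b[cut:]                 # one-point crossover
            if rng.random() < 0.1:                    # mutation
                child[rng.randrange(N_PIPES)] = rng.choice(DIAMETERS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
```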
Memory interference may heavily inflate task execution times in Heterogeneous Systems-on-Chips (HeSoCs). Knowing the worst-case interference is consequently fundamental for supporting the correct execution of time-sensitive applications. In most of the literature, worst-case interference is assumed to be generated by, and is therefore estimated through, read-intensive synthetic workloads with no caching. Yet these workloads do not always generate worst-case interference; this is the central finding of this work. By testing on multiple architectures, we determined that the traffic pattern generating the highest interference is actually hardware-dependent, and that making such assumptions can lead to a severe underestimation of the worst case (in our case, by more than 9x).
As a generalization of the optimal mass transport (OMT) approach of Benamou and Brenier, regularized optimal mass transport (rOMT) formulates a transport problem from an initial mass configuration to another, with optimality defined by the total kinetic energy, subject to an advection-diffusion constraint equation. Both rOMT and the Benamou-Brenier formulation require the total initial and final masses to be equal; mass is preserved during the entire transport process. However, in many applications, e.g., dynamic image tracking, this constraint is rarely if ever satisfied. We therefore propose to employ an unbalanced version of rOMT that removes this constraint, together with a detailed numerical solution procedure and applications to analyzing fluid flows in the brain.
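In one standard notation (density $\rho$, velocity field $v$, diffusion coefficient $\sigma$; symbols assumed here rather than taken from the paper), the rOMT problem described above reads:

```latex
\min_{\rho,\,v}\; \int_0^1 \!\!\int_\Omega \rho(t,x)\,\|v(t,x)\|^2 \, dx\, dt
\quad \text{subject to} \quad
\partial_t \rho + \nabla\cdot(\rho v) = \sigma \Delta \rho,
\qquad \rho(0,\cdot)=\rho_0,\;\; \rho(1,\cdot)=\rho_1 .
```

The unbalanced variant relaxes the terminal matching condition $\rho(1,\cdot)=\rho_1$ (for instance via a mismatch penalty), so the initial and final total masses need not agree.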
The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models, as shown on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule that alleviates catastrophic forgetting issues during adaptation.
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
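The float-to-low-bit mapping discussed above is most commonly an affine ("asymmetric") uniform quantizer; a minimal sketch of that generic textbook scheme (illustrative, not any one specific method from the survey):

```python
def quantize_uniform(x: list, num_bits: int = 4):
    """Affine uniform quantization of a list of floats.

    The real range [min(x), max(x)] is mapped onto the integer grid
    [0, 2^b - 1]; dequantizing back reveals the rounding error that
    quantization-aware training methods try to keep small.
    """
    qmax = (1 << num_bits) - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in x]   # integers in [0, qmax]
    dequant = [qi * scale + lo for qi in q]    # reconstructed floats
    return q, dequant

q, d = quantize_uniform([0.0, 0.3, 1.0], num_bits=4)
```

With 4 bits, only 16 distinct levels are available, which is why the range estimate (`lo`, `hi`) and the rounding scheme dominate the accuracy of the attendant computations.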