In the context of the Robot Air Hockey Challenge 2023, we investigate the applicability of model-based deep reinforcement learning for acquiring a policy capable of autonomously playing air hockey. Our agents learn solely from sparse rewards while incorporating self-play to iteratively refine their behaviour over time. The robotic manipulator is interfaced through continuous high-level actions for position-based control in the Cartesian plane, while the agent has only partial observability of an environment with stochastic transitions. We demonstrate that agents are prone to overfitting when trained solely against a single playstyle, highlighting the importance of self-play for generalization to the novel strategies of unseen opponents. Furthermore, we explore the impact of the imagination horizon in the competitive setting of the highly dynamic game of air hockey, with longer horizons resulting in more stable learning and better overall performance.
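A minimal sketch of the self-play scheme described above, assuming an opponent pool built from periodic snapshots of the learner's own policy; the Policy and AirHockeyEnv classes are illustrative placeholders, not the authors' model-based agent.

```python
import copy
import random

# Hedged sketch: the learner periodically freezes a copy of its own policy,
# adds it to an opponent pool, and trains against opponents sampled from that
# pool so it does not overfit to a single playstyle.

class Policy:
    def act(self, observation):
        # Placeholder: return a continuous Cartesian target position.
        return (random.uniform(-1, 1), random.uniform(-1, 1))

    def update(self, trajectory):
        # Placeholder for the model-based RL update (e.g. learning a world
        # model and improving the policy on imagined rollouts).
        pass

class AirHockeyEnv:
    def play_episode(self, agent, opponent, horizon=200):
        # Placeholder: roll out one sparse-reward episode and return it.
        return [((0.0, 0.0), agent.act(None), 0.0) for _ in range(horizon)]

def train_with_self_play(num_iterations=100, snapshot_every=10):
    env, learner = AirHockeyEnv(), Policy()
    opponent_pool = [copy.deepcopy(learner)]          # start against itself
    for it in range(num_iterations):
        opponent = random.choice(opponent_pool)       # vary the playstyle
        trajectory = env.play_episode(learner, opponent)
        learner.update(trajectory)
        if (it + 1) % snapshot_every == 0:            # refresh the pool
            opponent_pool.append(copy.deepcopy(learner))
    return learner

if __name__ == "__main__":
    train_with_self_play(num_iterations=20)
```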
We advance a recently flourishing line of work at the intersection of learning theory and computational economics by studying the learnability of two classes of mechanisms prominent in economics, namely menus of lotteries and two-part tariffs. The former is a family of randomized mechanisms designed for selling multiple items, known to achieve revenue beyond that of deterministic mechanisms, while the latter is designed for selling multiple units (copies) of a single item, with applications in real-world scenarios such as car- or bike-sharing services. We focus on learning high-revenue mechanisms of this form from buyer valuation data in both the distributional settings, where we have access to buyers' valuation samples up front, and the more challenging and less-studied online settings, where buyers arrive one at a time and no distributional assumption is made about their values. We provide a suite of results for these two families of mechanisms. We give the first online learning algorithms for menus of lotteries and two-part tariffs with strong regret-bound guarantees. Since the space of parameters is infinite and the revenue functions have discontinuities, known techniques do not readily apply. However, we are able to provide a reduction to online learning over a finite number of experts, in our case, a finite number of parameters. Furthermore, in the case of a limited number of buyer types, we show a reduction to online linear optimization, which allows us to obtain no-regret guarantees by presenting buyers with menus that correspond to a barycentric spanner. In addition, we provide algorithms with improved running times over prior work for the distributional settings. Finally, we demonstrate how techniques from the recent literature on data-driven algorithm design are insufficient for the problems we study.
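As a concrete illustration of the experts reduction mentioned above, the following sketch discretizes the two-part tariff parameters onto a grid and runs a standard multiplicative-weights (Hedge) update over the grid points with full-information feedback. The revenue model (up-front fee p1 plus per-unit price p2 faced by a utility-maximizing buyer), the grid resolution, and the learning rate are illustrative assumptions, not the paper's construction, which handles the continuous parameter space and its discontinuities directly.

```python
import math
import random

# Hedged sketch: a finite grid of tariff parameters plays the role of the
# "experts", and Hedge is run over them round by round.

def tariff_revenue(p1, p2, marginal_values):
    """Revenue from a utility-maximising buyer facing tariff (p1, p2)."""
    best_utility, best_q, total = 0.0, 0, 0.0
    for q, v in enumerate(sorted(marginal_values, reverse=True), start=1):
        total += v
        utility = total - p1 - p2 * q
        if utility > best_utility:
            best_utility, best_q = utility, q
    return p1 + p2 * best_q if best_q > 0 else 0.0

def hedge_over_tariffs(buyers, grid_step=0.1, h=1.0, eta=0.5):
    grid = [(i * grid_step, j * grid_step)
            for i in range(int(h / grid_step) + 1)
            for j in range(int(h / grid_step) + 1)]
    weights = [1.0] * len(grid)
    total_revenue = 0.0
    for marginal_values in buyers:
        probs = [w / sum(weights) for w in weights]
        i = random.choices(range(len(grid)), probs)[0]   # sample an expert
        total_revenue += tariff_revenue(*grid[i], marginal_values)
        # Full-information update: every expert observes its own revenue.
        weights = [w * math.exp(eta * tariff_revenue(*g, marginal_values))
                   for w, g in zip(weights, grid)]
        m = max(weights)
        weights = [w / m for w in weights]               # avoid overflow
    return total_revenue

if __name__ == "__main__":
    buyers = [[random.random() for _ in range(3)] for _ in range(50)]
    print(hedge_over_tariffs(buyers))
```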
In this paper, we revisit \textsf{ROOT-SGD}, an innovative stochastic optimization method designed to bridge the gap between stochastic optimization and statistical efficiency. The proposed method enhances the performance and reliability of \textsf{ROOT-SGD} by integrating a carefully designed \emph{diminishing stepsize strategy}. This approach addresses key challenges in optimization, providing robust theoretical guarantees and practical benefits. Our analysis demonstrates that \textsf{ROOT-SGD} with diminishing stepsizes achieves optimal convergence rates while maintaining computational efficiency. By dynamically adjusting the learning rate, \textsf{ROOT-SGD} ensures improved stability and precision throughout the optimization process. The findings of this study offer valuable insights for developing advanced optimization algorithms that are both efficient and statistically robust.
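To make the stepsize strategy concrete, here is a minimal sketch that combines a recursive variance-reduced gradient estimator in the spirit of \textsf{ROOT-SGD} with a diminishing stepsize $\eta_t = \eta_0 / t^{\alpha}$ on a toy least-squares problem. The exact recursion, schedule, and constants used in the paper may differ; everything below is an illustrative assumption.

```python
import numpy as np

# Hedged sketch: a ROOT-SGD-style recursive gradient estimator driven by a
# diminishing stepsize eta_t = eta0 / t**alpha on a toy least-squares problem.

rng = np.random.default_rng(0)
d = 5
x_star = rng.normal(size=d)

def stoch_grad(x, a, noise):
    """Stochastic gradient of 0.5*(a @ x - a @ x_star - noise)**2 for one sample."""
    return a * (a @ x - a @ x_star - noise)

def root_sgd_diminishing(eta0=0.1, alpha=0.5, T=20_000):
    x_prev = np.zeros(d)
    a, noise = rng.normal(size=d), 0.1 * rng.normal()
    v = stoch_grad(x_prev, a, noise)                   # initial estimator v_1
    x = x_prev - eta0 * v
    for t in range(2, T + 1):
        a, noise = rng.normal(size=d), 0.1 * rng.normal()
        g_new = stoch_grad(x, a, noise)                # same fresh sample is
        g_old = stoch_grad(x_prev, a, noise)           # evaluated at both iterates
        v = g_new + (1.0 - 1.0 / t) * (v - g_old)      # recursive averaging
        x_prev, x = x, x - (eta0 / t ** alpha) * v     # diminishing stepsize
    return x

print("distance to optimum:", np.linalg.norm(root_sgd_diminishing() - x_star))
```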
For a multidimensional It\^o semimartingale, we consider the problem of estimating integrated volatility functionals. Jacod and Rosenbaum (2013) studied a plug-in type of estimator based on a Riemann sum approximation of the integrated functional and a spot volatility estimator with a forward uniform kernel. Motivated by recent results showing that spot volatility estimators with general two-sided kernels of unbounded support are more accurate, in this paper we consider an estimator that uses a general kernel spot volatility estimator as the plug-in. A biased central limit theorem for estimating the integrated functional is established with an optimal convergence rate. Unbiased central limit theorems for estimators with proper de-biasing terms are also obtained, both at the optimal convergence regime for the bandwidth and when applying undersmoothing. Our results show that one can significantly reduce the estimator's bias by adopting a general kernel instead of the standard uniform kernel. Our proposed bias-corrected estimators are found to maintain remarkable robustness to bandwidth selection across a variety of sampling frequencies and functions.
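A minimal one-dimensional numerical sketch of the plug-in idea, with a two-sided Gaussian kernel standing in for the general kernel and no de-biasing term; the bandwidth, the kernel, and the functional g(c) = c^2 are illustrative choices, not those analyzed in the paper.

```python
import numpy as np

# Hedged sketch: estimate the integrated functional  int g(c_s) ds  by plugging
# a kernel spot variance estimator into a Riemann sum over the sampling grid.

rng = np.random.default_rng(1)
n, T = 4_680, 1.0                                    # e.g. 5-second returns over one "day"
dt = T / n
t = np.linspace(0.0, T, n + 1)
sigma = 0.2 + 0.1 * np.sin(2 * np.pi * t[:-1])       # time-varying volatility path
dX = sigma * np.sqrt(dt) * rng.normal(size=n)        # Ito increments, no drift

def spot_variance(t0, h):
    """Kernel-weighted squared increments around time t0 (two-sided Gaussian kernel)."""
    w = np.exp(-0.5 * ((t[:-1] - t0) / h) ** 2) / (np.sqrt(2 * np.pi) * h)
    return float(np.sum(w * dX ** 2))                # approximates c_{t0}

def g(c):
    return c ** 2                                    # example functional (quarticity)

h = 0.05                                             # illustrative bandwidth
estimate = dt * sum(g(spot_variance(ti, h)) for ti in t[:-1])
truth = dt * np.sum(g(sigma ** 2))
print("plug-in estimate:", estimate, " true integrated functional:", truth)
```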
In this work, we propose a novel method for Bayesian Network (BN) structure elicitation that is based on initializing several LLMs with different experiences, querying them independently to each propose a BN structure, and then obtaining the final structure by majority voting. We compare the method with an alternative method on various BNs of different sizes, both widely known and lesser known, and study the scalability of both methods on them. We also propose an approach to check whether an LLM is contaminated with knowledge of a BN, which shows that some widely known BNs are inapplicable for testing LLM-based BN structure elicitation. We also show that some BNs may be inapplicable for such experiments because their node names are indistinguishable. The experiments on the other BNs show that our method performs better than the existing method with one of the three studied LLMs; however, the performance of both methods significantly decreases as BN size increases.
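The aggregation step lends itself to a short sketch: each differently primed LLM proposes a set of directed edges over the same node names, and the final structure keeps exactly those edges contained in a majority of proposals. The query_llm_for_edges function is a hypothetical placeholder for the actual prompting-and-parsing pipeline.

```python
from collections import Counter

# Hedged sketch of majority voting over LLM-proposed BN edges.

def query_llm_for_edges(nodes, persona):
    # Placeholder: in the real pipeline this prompts an LLM initialized with a
    # given experience (persona) and parses its answer into directed edges.
    return {("Smoking", "Cancer"), ("Pollution", "Cancer")}

def majority_vote_structure(nodes, personas):
    votes = Counter()
    for persona in personas:
        votes.update(query_llm_for_edges(nodes, persona))
    threshold = len(personas) / 2
    return {edge for edge, count in votes.items() if count > threshold}

if __name__ == "__main__":
    nodes = ["Smoking", "Pollution", "Cancer", "Xray"]
    print(majority_vote_structure(nodes, personas=["doctor", "statistician", "layperson"]))
```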
In this study, we attempt to model intuition and incorporate this formalism to improve the performance of Convolutional Neural Networks (CNNs). Despite decades of research, ambiguities persist about the principles of intuition. Experimental psychology reveals many types of intuition, which depend on the state of the human mind. We focus on visual intuition, which is useful for completing missing information during visual cognitive tasks. First, we set up a scenario that gradually decreases the amount of visual information in the images of a dataset to examine its impact on CNN accuracy. Then, we present a model of visual intuition based on Gestalt theory. The theory claims that humans derive a set of templates from their subconscious experiences. When the brain decides that there is missing information in a scene, for example due to occlusion, it instantaneously completes the information by replacing the missing parts with the most similar ones. Based on Gestalt theory, we model visual intuition in two layers; details of these layers are provided throughout the paper. We use the MNIST dataset to test the suggested intuition model for completing the missing information. Experiments show that the augmented CNN architecture provides higher performance than classic models when using incomplete images.
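A small sketch of the degradation scenario, assuming occlusion by zeroed-out patches as the way visual information is removed (the paper's exact degradation protocol may differ); the image array and the omitted CNN call are placeholders.

```python
import numpy as np

# Hedged sketch: progressively remove visual information by zeroing random
# patches, then measure how a trained CNN's accuracy drops as the ratio grows.

def occlude(images, drop_ratio, patch=4, seed=0):
    """Zero out random patch x patch blocks covering ~drop_ratio of each image."""
    rng = np.random.default_rng(seed)
    out = images.copy()
    _, h, w = out.shape
    patches_per_image = int(drop_ratio * h * w / (patch * patch))
    for img in out:
        for _ in range(patches_per_image):
            r = rng.integers(0, h - patch + 1)
            c = rng.integers(0, w - patch + 1)
            img[r:r + patch, c:c + patch] = 0.0
    return out

if __name__ == "__main__":
    images = np.random.default_rng(1).random((16, 28, 28))  # stand-in for MNIST
    for ratio in (0.0, 0.2, 0.4, 0.6):
        degraded = occlude(images, ratio)
        # In the real experiment: accuracy = (cnn(degraded).argmax(1) == labels).mean()
        print(f"drop ratio {ratio:.1f}: remaining mean intensity {degraded.mean():.3f}")
```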
In this paper, we propose a new hybrid temporal computing (HTC) framework that leverages both pulse-rate and temporal data encoding to design ultra-low-energy hardware accelerators. Our approach is inspired by the recently proposed temporal computing, or race logic, which encodes data values as single delays, leading to significantly lower energy consumption due to minimized signal switching. However, race logic is limited in its applications due to inherent restrictions. The new HTC framework overcomes these limitations by encoding signals in both temporal and pulse-rate formats for multiplication and in temporal format for propagation. This approach maintains reduced switching energy while being general enough to implement a wide range of arithmetic operations. We demonstrate how HTC multiplication is performed for both unipolar and bipolar data encodings and present basic designs for multipliers, adders, and multiply-accumulate (MAC) units. Additionally, we implement two hardware accelerators: a Finite Impulse Response (FIR) filter and a Discrete Cosine Transform (DCT)/iDCT engine for image compression and DSP applications. Experimental results show that the HTC MAC has a significantly smaller power and area footprint than the Unary MAC design and is orders of magnitude faster. Compared to the CBSC MAC, the HTC MAC reduces power consumption by $45.2\%$ and area footprint by $50.13\%$. For the FIR design, the HTC design significantly outperforms the Unary design on all metrics. Compared to the CBSC design, the HTC-based FIR filter reduces power consumption by $36.61\%$ and area cost by $45.85\%$. The HTC-based DCT filter retains the quality of the original image with a decent PSNR, while consuming $23.34\%$ less power and occupying $18.20\%$ less area than the CBSC MAC-based DCT filter.
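As a rough software illustration of mixing the two encodings (not the paper's HTC circuit), the sketch below encodes one unipolar operand as a pulse train whose firing rate equals its value and the other as a temporal gate whose high time is proportional to its value; ANDing the two and counting pulses approximates the product.

```python
import numpy as np

# Hedged bit-level simulation of combining rate and temporal encodings for
# unipolar multiplication; real HTC hardware operates on delays and pulses,
# not arrays, so this only illustrates the encoding concept.

def rate_encode(value, n_cycles, rng):
    """Unipolar pulse-rate encoding: each cycle fires with probability `value`."""
    return rng.random(n_cycles) < value

def temporal_window(value, n_cycles):
    """Temporal encoding: a gate that stays high for value * n_cycles cycles."""
    gate = np.zeros(n_cycles, dtype=bool)
    gate[: int(round(value * n_cycles))] = True
    return gate

def hybrid_multiply(a, b, n_cycles=1024, seed=0):
    rng = np.random.default_rng(seed)
    pulses = rate_encode(a, n_cycles, rng)
    gate = temporal_window(b, n_cycles)
    return np.count_nonzero(pulses & gate) / n_cycles   # AND gate plus counter

if __name__ == "__main__":
    print(hybrid_multiply(0.6, 0.7), 0.6 * 0.7)          # estimate vs. exact product
```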
For the purpose of causal inference, we employ a stochastic model of the data-generating process that uses individual propensity probabilities for the treatment as well as individual and counterfactual prognosis probabilities for the outcome. We assume a generalized version of the stable unit treatment value assumption, but we do not assume any version of strongly ignorable treatment assignment. Instead of conducting a sensitivity analysis, we use the principle of maximum entropy to estimate the distribution of causal effects. We develop a principled middle way between extreme explanations of the observed data: we do not conclude that an observed association is wholly spurious, nor that it is wholly causal. Rather, our conclusions are tempered, and we conclude that the association is part spurious and part causal. In an example application, we apply our methodology to analyze an observed association between marijuana use and hard drug use.
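The maximum-entropy principle can be illustrated with a deliberately simplified toy model (no covariates, no individual propensities): over a joint distribution of latent potential-outcome types (y0, y1) and treatment T, pick the highest-entropy distribution consistent with the observed joint of (T, Y), then read off the implied average causal effect. The cell probabilities below are made up, and this is not the paper's full formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Hedged toy sketch: among all joint distributions over latent potential-outcome
# types and treatment that reproduce the observed joint of (T, Y), pick the one
# with maximum entropy and report its average causal effect.

types = [(0, 0), (0, 1), (1, 0), (1, 1)]                  # (y0, y1)
cells = [(typ, trt) for typ in types for trt in (0, 1)]   # flat index order
observed = {(1, 1): 0.30, (1, 0): 0.20, (0, 1): 0.15, (0, 0): 0.35}  # P(T=t, Y=y)

def neg_entropy(p):
    return float(np.sum(p * np.log(p + 1e-12)))

def cell_constraint(t, y):
    # Observed P(T=t, Y=y) equals the mass of types whose outcome under t is y.
    idx = [k for k, (typ, trt) in enumerate(cells) if trt == t and typ[t] == y]
    return {"type": "eq", "fun": lambda p, idx=idx: float(np.sum(p[idx])) - observed[(t, y)]}

res = minimize(neg_entropy, x0=np.full(8, 1 / 8), method="SLSQP",
               bounds=[(1e-9, 1.0)] * 8,
               constraints=[cell_constraint(t, y) for (t, y) in observed])

p = res.x.reshape(4, 2)                                   # rows: types, cols: T
ace = sum((y1 - y0) * p[i].sum() for i, (y0, y1) in enumerate(types))
print("maximum-entropy estimate of the average causal effect:", round(ace, 3))
```

With these made-up numbers the observed risk difference is 0.3, while the maximum-entropy causal effect should come out around 0.15, attributing part of the association to confounding and part to a genuine effect, in the spirit of the tempered conclusion above.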
We describe ACE0, a lightweight platform for evaluating the suitability and viability of AI methods for behaviour discovery in multi-agent simulations. Specifically, ACE0 was designed to explore AI methods for multi-agent simulations used in operations research studies related to new technologies such as autonomous aircraft. Simulation environments used in production are often high-fidelity and complex, require significant domain knowledge, and as a result have high R&D costs. Minimal and lightweight simulation environments can help researchers and engineers evaluate the viability of new AI technologies for behaviour discovery in a more agile and potentially more cost-effective manner. In this paper we describe the motivation for the development of ACE0. We provide a technical overview of the system architecture, describe a case study of behaviour discovery in the aerospace domain, and provide a qualitative evaluation of the system. The evaluation includes a brief description of collaborative research projects with academic partners, exploring different AI behaviour discovery methods.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy-, computation-, and memory-intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing memory requirements, energy consumption, and the number of operations without significantly decreasing accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. For the techniques in each category, we analyze their accuracy, advantages, and disadvantages, and discuss potential solutions to their open problems. We also discuss new evaluation metrics as a guideline for future research.
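Two of the surveyed techniques, parameter pruning and quantization, are simple enough to sketch on a single weight matrix; the sparsity level and bit-width below are illustrative, and real pipelines prune or quantize entire networks and usually fine-tune afterwards.

```python
import numpy as np

# Hedged sketch of magnitude pruning and symmetric uniform quantization
# applied to one weight matrix.

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights, bits=8):
    """Symmetric uniform quantization to signed `bits`-bit integers plus a scale."""
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int8)
    return q, scale                      # store int8 weights and one fp scale

if __name__ == "__main__":
    w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
    w_pruned = magnitude_prune(w, sparsity=0.7)
    q, scale = uniform_quantize(w_pruned, bits=8)
    dequantized = q.astype(np.float32) * scale
    print("nonzero fraction:", np.count_nonzero(w_pruned) / w.size)
    print("max abs quantization error:", np.max(np.abs(dequantized - w_pruned)))
```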
In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest of varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
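The inference step can be sketched as beam search over label sequences produced by a recurrent decoder, terminated by an end token; step_probabilities is a hypothetical placeholder for the attention-LSTM decoder, and the label vocabulary and beam width are made up.

```python
import math
import random

# Hedged sketch: beam search over label sequences for multi-label prediction.

END = "<end>"
LABELS = ["person", "dog", "car", "tree", END]

def step_probabilities(image_features, emitted):
    # Placeholder for the attention-LSTM step: given image features and the
    # labels emitted so far, return a distribution over the next label.
    scores = [random.random() + (0.0 if lab in emitted else 1.0) for lab in LABELS]
    total = sum(scores)
    return {lab: s / total for lab, s in zip(LABELS, scores)}

def beam_search(image_features, beam_width=3, max_len=5):
    beams = [([], 0.0)]                                   # (labels so far, log-prob)
    for _ in range(max_len):
        candidates = []
        for labels, logp in beams:
            if labels and labels[-1] == END:
                candidates.append((labels, logp))         # finished beam
                continue
            for lab, prob in step_probabilities(image_features, labels).items():
                candidates.append((labels + [lab], logp + math.log(prob + 1e-12)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    best_labels, _ = beams[0]
    return {lab for lab in best_labels if lab != END}     # predicted label set

if __name__ == "__main__":
    print(beam_search(image_features=None))
```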