This paper critically analyzes conventional and biomimetic robotic arms, underscoring the trade-offs between size, motion range, and load capacity in current biomimetic models. By delving into the human shoulder's mechanical intelligence, particularly the glenohumeral joint's intricate features such as its unique ball-and-socket structure and self-locking mechanism, we pinpoint innovations that bolster both stability and mobility while maintaining compactness. To substantiate these insights, we present a groundbreaking biomimetic robotic glenohumeral joint that authentically mirrors human musculoskeletal elements, from ligaments to tendons, integrating the biological joint's mechanical intelligence. Our extensive simulations and tests reveal enhanced flexibility and load capacity for the robotic joint. The resulting robotic arm demonstrates notable capabilities, including a wide range of motion, a 4 kg payload capacity, and the ability to exert over 1.5 Nm of torque. This study not only confirms the human shoulder joint's mechanical innovations but also introduces a pioneering design for a next-generation biomimetic robotic arm, setting a new benchmark in robotic technology.
In research on manufacturing systems and autonomous robots, the term capability is used for a machine-interpretable specification of a system function. Approaches in this research area develop information models that capture all information relevant to interpreting the requirements, effects, and behavior of functions. These approaches are intended to overcome the heterogeneity resulting from the various types of processes and from the large number of different vendors. However, these models and associated methods do not offer solutions for automated process planning, i.e., finding a sequence of individual capabilities required to manufacture a certain product or to accomplish a mission using autonomous robots. Instead, this is a typical task for AI planning approaches, which unfortunately require considerable effort to create the respective planning problem descriptions. In this paper, we present an approach that combines these two topics: starting from a semantic capability model, an AI planning problem is automatically generated. The planning problem is encoded using Satisfiability Modulo Theories (SMT), and an existing solver is used to find valid capability sequences, including required parameter values. The approach also offers possibilities to integrate existing human expertise and to provide explanations for human operators in order to help them understand planning decisions.
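To make the encoding concrete, the following minimal sketch shows how a bounded capability-sequencing problem could be posed to an SMT solver via the z3 Python bindings. The capabilities ("drill", "coat"), the predicates, and the horizon are hypothetical placeholders for illustration, not the paper's capability model or its actual encoding.

```python
# Minimal sketch: bounded capability sequencing as an SMT problem with z3.
# Capabilities, predicates, and horizon are hypothetical placeholders.
from z3 import Bool, Or, And, Implies, Not, Solver, sat, is_true

HORIZON = 3                      # bounded number of plan steps
caps = ["drill", "coat"]         # hypothetical capability names

# State predicates per step and a capability choice per step.
drilled = [Bool(f"drilled_{t}") for t in range(HORIZON + 1)]
coated  = [Bool(f"coated_{t}")  for t in range(HORIZON + 1)]
use     = {c: [Bool(f"{c}_{t}") for t in range(HORIZON)] for c in caps}

s = Solver()
s.add(Not(drilled[0]), Not(coated[0]))                    # initial state
for t in range(HORIZON):
    s.add(Or(*(use[c][t] for c in caps)))                 # some capability runs
    s.add(Not(And(use["drill"][t], use["coat"][t])))      # exactly one per step
    # "coat" requires a drilled part (precondition) and coats it (effect).
    s.add(Implies(use["coat"][t], And(drilled[t], coated[t + 1])))
    s.add(Implies(use["drill"][t], drilled[t + 1]))
    # Frame axioms: predicates persist unless an effect changes them.
    s.add(Implies(Not(use["drill"][t]), drilled[t + 1] == drilled[t]))
    s.add(Implies(Not(use["coat"][t]),  coated[t + 1]  == coated[t]))
s.add(drilled[HORIZON], coated[HORIZON])                  # goal state

if s.check() == sat:
    m = s.model()
    plan = [c for t in range(HORIZON) for c in caps if is_true(m[use[c][t]])]
    print("capability sequence:", plan)
```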
Financial sentiment analysis plays a crucial role in uncovering latent patterns and detecting emerging trends, enabling individuals to make well-informed decisions that may yield substantial advantages in the constantly changing realm of finance. Recently, Large Language Models (LLMs) have demonstrated their effectiveness in diverse domains, showcasing remarkable capabilities even in zero-shot and few-shot in-context learning for various Natural Language Processing (NLP) tasks. Nevertheless, their potential and applicability in the context of financial sentiment analysis have not been thoroughly explored yet. To bridge this gap, we employ two approaches: in-context learning (with a focus on the gpt-3.5-turbo model) and fine-tuning LLMs on a finance-domain dataset. Given the computational costs associated with fine-tuning LLMs with large parameter sizes, we focus on smaller LLMs, spanning from 250M to 3B parameters, for fine-tuning. We then compare their performance with state-of-the-art results to evaluate their effectiveness in the finance domain. Our results demonstrate that fine-tuned smaller LLMs can achieve performance comparable to state-of-the-art fine-tuned LLMs, even with fewer parameters and a smaller training dataset. Additionally, the zero-shot and one-shot performance of LLMs is comparable to that of fine-tuned smaller LLMs and state-of-the-art outcomes. Furthermore, our analysis demonstrates that increasing the number of shots for in-context learning does not improve performance on finance-domain sentiment analysis.
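As an illustration of the in-context learning setup, the sketch below constructs zero-shot or few-shot chat messages for a financial sentiment query. The system prompt, label set, and example headlines are assumptions made for this sketch; the resulting messages would then be sent to a chat-completions endpoint such as gpt-3.5-turbo using whichever client library is available.

```python
# Sketch of zero-shot / few-shot prompt construction for financial sentiment
# classification. Labels, examples, and the system prompt are illustrative
# assumptions, not the paper's exact setup.
from typing import Dict, List

LABELS = ["positive", "negative", "neutral"]

FEW_SHOT_EXAMPLES = [  # hypothetical labeled headlines for in-context learning
    ("The company raised its full-year guidance after record quarterly sales.", "positive"),
    ("Shares fell sharply after the profit warning.", "negative"),
]

def build_messages(headline: str, n_shots: int = 0) -> List[Dict[str, str]]:
    """Build chat messages for an instruction-following LLM such as gpt-3.5-turbo."""
    messages = [{
        "role": "system",
        "content": "Classify the sentiment of the financial text as one of: "
                   + ", ".join(LABELS) + ". Answer with the label only.",
    }]
    for text, label in FEW_SHOT_EXAMPLES[:n_shots]:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": headline})
    return messages

if __name__ == "__main__":
    msgs = build_messages("The bank reported a larger-than-expected loan loss provision.",
                          n_shots=1)
    # Send `msgs` to a chat-completions endpoint (e.g., model="gpt-3.5-turbo").
    for m in msgs:
        print(m["role"], ":", m["content"])
```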
Proper scoring rules evaluate the quality of probabilistic predictions, playing an essential role in the pursuit of accurate and well-calibrated models. Every proper score decomposes, via a Bregman divergence, into two fundamental components: proper calibration error and refinement. While uncertainty calibration has gained significant attention, the current literature lacks a general estimator for these quantities with known statistical properties. To address this gap, we propose a method that allows consistent and asymptotically unbiased estimation of all proper calibration errors and refinement terms. In particular, we introduce the Kullback--Leibler calibration error, induced by the commonly used cross-entropy loss. As part of our results, we prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks, regardless of which proper scoring rule is optimized. Our experiments empirically validate the claimed properties of the proposed estimator and suggest that the choice of a post-hoc calibration method should be determined by the particular calibration error of interest.
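For intuition about what proper calibration errors measure, the following naive binned sketch computes the squared (Brier-induced) and KL (cross-entropy-induced) calibration errors for binary predictions. This is a simplified illustration only; it is not the consistent, asymptotically unbiased estimator proposed above, and the bin count and synthetic data are assumptions.

```python
# Naive binned sketch of two proper calibration errors for binary classification.
# For intuition only; NOT the estimator proposed in the paper.
import numpy as np

def binned_calibration_errors(p, y, n_bins=15, eps=1e-12):
    """p: predicted P(y=1); y: binary labels. Returns (squared, KL) binned estimates."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    sq, kl = 0.0, 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        w = mask.mean()                     # bin weight
        p_bar = p[mask].mean()              # average prediction in the bin
        y_bar = y[mask].mean()              # empirical frequency in the bin
        sq += w * (y_bar - p_bar) ** 2      # squared-loss (Brier) Bregman term
        kl += w * (y_bar * np.log((y_bar + eps) / (p_bar + eps))
                   + (1 - y_bar) * np.log((1 - y_bar + eps) / (1 - p_bar + eps)))
    return sq, kl

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.uniform(size=10_000)
    y = rng.binomial(1, p ** 1.3)           # deliberately miscalibrated labels
    print(binned_calibration_errors(p, y))
```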
Current compilers implement security features and optimizations that require nontrivial semantic reasoning about pointers and memory allocation: the program after the insertion of the security feature, or after applying the optimization, must simulate the original program despite a different memory layout. In this article, we illustrate such reasoning on pointer allocations through memory extensions and injections, as well as fine points on undefined values, by explaining how we implemented and proved correct two security features (stack canaries and pointer authentication) and one optimization (tail recursion elimination) in the CompCert formally verified compiler.
Pilot studies are an essential cornerstone of the design of crowdsourcing campaigns, yet they are often only mentioned in passing in the scholarly literature. A lack of details surrounding pilot studies in crowdsourcing research hinders the replication of studies and the reproduction of findings, stalling potential scientific advances. We conducted a systematic literature review on the current state of pilot study reporting at the intersection of crowdsourcing and HCI research. Our review of ten years of literature included 171 articles published in the proceedings of the Conference on Human Computation and Crowdsourcing (AAAI HCOMP) and the ACM Digital Library. We found that pilot studies in crowdsourcing research (i.e., crowd pilot studies) are often under-reported in the literature. Important details, such as the number of workers and rewards to workers, are often not reported. On the basis of our findings, we reflect on the current state of practice and formulate a set of best practice guidelines for reporting crowd pilot studies in crowdsourcing research. We also provide implications for the design of crowdsourcing platforms and make practical suggestions for supporting crowd pilot study reporting.
Oscillatory combustion in aero engines and modern gas turbines often has significant adverse effects on their operation, and accurately recognizing the various oscillation modes is a prerequisite for understanding and controlling combustion instability. However, the high-dimensional spatial-temporal data of a complex combustion system typically poses considerable challenges to dynamical mode recognition. Based on a two-layer bidirectional long short-term memory variational autoencoder (Bi-LSTM-VAE) dimensionality reduction model and a two-dimensional Wasserstein distance-based classifier (WDC), this study proposes a promising method (Bi-LSTM-VAE-WDC) for recognizing dynamical modes in oscillatory combustion systems. Specifically, the Bi-LSTM-VAE model reduces the high-dimensional spatial-temporal data of the combustion system to a low-dimensional phase space; Gaussian kernel density estimates (GKDE) are then computed from the distribution of phase points on a grid; and two-dimensional WD values between the GKDE maps are used to recognize the oscillation modes. The time-series data used in this study were obtained from numerical simulations of circular arrays of laminar flame oscillators. The results show that the Bi-LSTM-VAE method produces a non-overlapping distribution of phase points, indicating effective unsupervised mode recognition and classification. Furthermore, the present method outperforms VAE and PCA (principal component analysis) in distinguishing dynamical modes in complex flame systems, implying its potential for studying turbulent combustion.
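The sketch below illustrates only the GKDE and two-dimensional Wasserstein comparison step on already-reduced 2D phase points, using scipy's Gaussian KDE and the POT (Python Optimal Transport) library. The Bi-LSTM-VAE encoder is omitted, and the grid size and synthetic phase-point clouds are assumptions standing in for two oscillation modes.

```python
# Sketch of the GKDE + 2D Wasserstein comparison step on reduced phase points.
# The Bi-LSTM-VAE dimensionality reduction is omitted; data and grid are assumed.
import numpy as np
from scipy.stats import gaussian_kde
import ot  # POT: Python Optimal Transport

GRID = 32
xs = np.linspace(-3, 3, GRID)
XX, YY = np.meshgrid(xs, xs)
grid_pts = np.column_stack([XX.ravel(), YY.ravel()])

def gkde_map(points_2d):
    """Normalized Gaussian KDE of 2D phase points evaluated on a fixed grid."""
    dens = gaussian_kde(points_2d.T)(grid_pts.T)
    return dens / dens.sum()

def wasserstein_2d(map_a, map_b):
    """2D Wasserstein distance between two GKDE maps via exact optimal transport."""
    M = ot.dist(grid_pts, grid_pts, metric="euclidean")
    return ot.emd2(map_a, map_b, M)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mode_a = rng.normal([0.0, 0.0], 0.5, size=(500, 2))   # stand-ins for two
    mode_b = rng.normal([1.5, 0.0], 0.5, size=(500, 2))   # oscillation modes
    d = wasserstein_2d(gkde_map(mode_a), gkde_map(mode_b))
    print("2D Wasserstein distance between modes:", d)
```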
This paper outlines a roadmap to effectively leverage shared mental models in multi-robot, multi-stakeholder scenarios, drawing on experiences from the BugWright2 project. The discussion centers on an autonomous multi-robot system designed for ship inspection and maintenance. A significant challenge in the development and implementation of this system is the calibration of trust. To address this, the paper proposes that trust calibration can be managed and optimized through the creation and continual updating of shared and accurate mental models of the robots. Strategies to promote these mental models, including cross-training, briefings, debriefings, and task-specific elaboration and visualization, are examined. Additionally, the crucial role of an adaptable, distributed, and well-structured user interface (UI) is discussed.
This paper studies the cardinality of codes correcting insertions and deletions. We give improved upper and lower bounds on code size. Our upper bound is obtained by utilizing the asymmetric property of list decoding for insertions and deletions and can be seen as analogous to the Elias bound in the Hamming metric. Our non-asymptotic bound is better than the existing bounds when the minimum Levenshtein distance is relatively large. The asymptotic bound exceeds the Elias and MRRW bounds adapted from the Hamming-metric bounds in the binary and quaternary cases. Our lower bound improves on the bound by Levenshtein, but the improvement is limited and vanishes asymptotically.
We consider a variant of the stochastic multi-armed bandit problem. Specifically, the arms are strategic agents who can improve their rewards or absorb them. The utility of an agent increases if she is pulled more or absorbs more of her rewards, but decreases if she spends more effort improving her rewards. Agents have heterogeneous properties: specifically, they have different means and can improve their rewards up to different levels. Further, a non-empty subset of agents are "honest" and, even in the worst case, always give their rewards without absorbing any part. The principal wishes to obtain a high revenue (cumulative reward) by designing a mechanism that incentivizes top-level performance at equilibrium. At the same time, the principal wishes to be robust and obtain revenue at least at the level of the honest agent with the highest mean in the case of non-equilibrium behaviour. We identify a class of MAB algorithms, which we call performance-incentivizing, that satisfy a collection of properties, and we show that they lead to mechanisms that incentivize top-level performance at equilibrium and are robust under any strategy profile. Interestingly, we show that UCB is an example of such a MAB algorithm. Further, in the case where the top performance level is unknown, we show that combining second-price auction ideas with performance-incentivizing algorithms achieves performance at least at the second-highest level while also remaining robust.
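Since UCB is cited above as an example of a performance-incentivizing algorithm, a standard UCB1 sketch is shown below for reference. The strategic-agent mechanism itself (reward absorption, equilibrium analysis, second-price auction ideas) is not modeled here; the reward means and horizon are hypothetical.

```python
# Standard UCB1 sketch, shown because UCB is named as an example of a
# performance-incentivizing MAB algorithm; the strategic mechanism is omitted.
import math
import random

def ucb1(pull, n_arms, horizon):
    """pull(arm) -> observed reward in [0, 1]; returns per-arm pull counts."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                          # play each arm once first
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
    return counts

if __name__ == "__main__":
    true_means = [0.3, 0.5, 0.7]                 # hypothetical reported-reward means
    counts = ucb1(lambda a: float(random.random() < true_means[a]),
                  n_arms=len(true_means), horizon=5_000)
    print("pull counts:", counts)                # the best arm should dominate
```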
Ensembles over neural network weights trained from different random initializations, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights but also over hyperparameters to improve the state of the art in both settings. For the best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter-efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than those of typical ensembles. On image classification tasks, with MLP, LeNet, and Wide ResNet 28-10 architectures, our methodology improves upon both deep and batch ensembles.
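A minimal sketch of the hyper-deep ensemble recipe is given below: a random search over hyperparameters, each configuration trained from several random initializations, with the members' predicted probabilities averaged. It uses scikit-learn's MLPClassifier as a small stand-in model; the search space, member counts, and dataset are assumptions, and no member-selection step is included.

```python
# Sketch of a hyper-deep ensemble: random hyperparameter search, stratified over
# random initializations, with averaged predictions. MLPClassifier is a stand-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

members = []
for _ in range(4):                                   # random hyperparameter search
    hidden = int(rng.integers(16, 128))              # hidden layer width
    alpha = 10.0 ** rng.uniform(-5, -2)              # L2 regularization strength
    for seed in range(3):                            # stratified over random inits
        clf = MLPClassifier(hidden_layer_sizes=(hidden,), alpha=alpha,
                            max_iter=300, random_state=seed)
        members.append(clf.fit(X_tr, y_tr))

probs = np.mean([m.predict_proba(X_te) for m in members], axis=0)
acc = (probs.argmax(axis=1) == y_te).mean()
print(f"hyper-deep ensemble of {len(members)} members, test accuracy: {acc:.3f}")
```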