The present work aims to describe the hysteresis behaviour arising in cyclic bending experiments on cables by means of the Preisach operator. Pure bending experiments conducted in previous work show that slender structures such as electric cables behave inelastically and that open hysteresis loops arise, with a noticeable difference between the first load cycle and the following ones. The Preisach operator plays an important role in describing the input-output relation of hysteresis behaviours, and it can be expressed as a superposition of relay operators. Here, we use data collected from pure bending experiments as a first approach. We introduce a mathematical formulation of the problem and, starting from the curvature of the cable specimen, recursively define the Preisach plane for this specific case. We then derive a suitable kernel function such that integrating it over the Preisach plane yields the bending moment of the specimen.
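As background for the construction sketched above: the classical Preisach operator expresses the output as a weighted superposition of elementary relay operators over the Preisach half-plane. In generic notation (assumed here for illustration, not taken verbatim from the paper),

\[
  M(t) \;=\; \iint_{\alpha \ge \beta} \mu(\alpha,\beta)\, R_{\alpha\beta}[\kappa](t)\;\mathrm{d}\alpha\,\mathrm{d}\beta,
\]

where $\kappa(t)$ is the input curvature history, $R_{\alpha\beta}$ is the relay switching between $\mp 1$ at the thresholds $\beta \le \alpha$, $\mu(\alpha,\beta)$ is the kernel (Preisach density) to be identified from the bending data, and $M(t)$ is the resulting bending moment.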
Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions. To facilitate the development of LLM agents, we present a novel paradigm for training LLM agents without modifying the LLM weights, which is particularly useful when the weights are inaccessible or difficult to modify. Inspired by how humans continuously forge tools to adapt to real-world tasks, rather than changing our biological structure to fit a static set of tools, we propose to progressively forge an agent's functions to better solve downstream tasks instead of modifying the LLM weights. By treating the functions as learnable `agent parameters' and leveraging the fundamental idea of model training in artificial intelligence, we develop AgentOptimizer, which employs the LLM to update the agent's functions, and devise an agent training algorithm with two strategies, roll-back and early-stop, to streamline the training process. Through extensive experiments, we show that this agent training paradigm significantly improves the performance of representative LLM agents on various downstream tasks. We also study the behavior of agent training with respect to aspects such as the learning curve and domain transferability.
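A minimal sketch of such a function-space training loop, with roll-back and early-stop, might look as follows; the helper callables and their signatures are assumptions for illustration, not the actual AgentOptimizer API.

from typing import Callable, List

def train_agent_functions(
    functions: List[str],                              # current tool definitions, e.g. code strings
    propose_update: Callable[[List[str]], List[str]],  # LLM-backed optimizer step
    evaluate: Callable[[List[str]], float],            # e.g. success rate on training tasks
    max_epochs: int = 10,
    patience: int = 2,
) -> List[str]:
    """Hill-climb in function space: the LLM weights stay frozen;
    only the agent's functions (its 'parameters') are updated."""
    best, best_score, stall = functions, evaluate(functions), 0
    for _ in range(max_epochs):
        candidate = propose_update(best)               # LLM edits the functions
        score = evaluate(candidate)
        if score > best_score:
            best, best_score, stall = candidate, score, 0
        else:
            stall += 1                                 # roll-back: keep previous best
        if stall >= patience:                          # early-stop on stagnation
            break
    return best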
This paper examines the complex nature of cyber attacks through an analysis of the LastPass breach. It argues for the integration of human-centric considerations into cybersecurity measures, focusing on mitigating human factors such as goal-directed behavior, cognitive overload, human biases (e.g., optimism, anchoring), and risky behaviors. Findings from the analysis of this breach support the perspective that addressing both the human and technical dimensions of cyber defense can significantly enhance the resilience of cyber systems against complex threats. In practice, this means that maintaining a balanced approach, simplifying user interactions, making users aware of biases, and discouraging risky practices are all essential for preventing cyber incidents.
Clustering methods must be tailored to the dataset they operate on, as there is no objective or universal definition of ``cluster,'' but the arbitrariness in the clustering method must nevertheless be minimized. This paper develops a quantitative ``stability'' method of determining clusters, in which stable or persistent clustering signals are used to indicate that real structures have been identified in the underlying dataset. The method is based on modulating clustering methods by controlling a parameter: through a thermodynamic analogy, the modulation parameter is regarded as ``time'' and the evolving clustering methodologies as a ``heat flow.'' When the information entropy of the heat flow is stable over a wide range of times, either globally or in a local sense that we define, we interpret this stability as an indication that the essential features of the data have been found, and we create clusters on this basis.
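As a rough illustration of the stability idea (using a hierarchical merge threshold as the ``time'' parameter; the paper's own heat-flow modulation may differ), one can track the Shannon entropy of the cluster-size distribution across a parameter sweep and look for plateaus:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def label_entropy(labels: np.ndarray) -> float:
    """Shannon entropy of the cluster-size distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def entropy_curve(X: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Entropy of the clustering as the merge threshold ('time') varies."""
    return np.array([
        label_entropy(
            AgglomerativeClustering(n_clusters=None,
                                    distance_threshold=t).fit_predict(X))
        for t in thresholds
    ])

# A flat stretch of the curve (entropy stable over a wide range of t) is
# read as persistent structure, and clusters are extracted at that scale.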
This paper delves into a rendezvous scenario involving a chaser and a target spacecraft, focusing on the application of Model Predictive Control (MPC) to design a controller capable of guiding the chaser toward the target. The operating principle of spacecraft thrusters, which require a minimum activation time and therefore exhibit a control deadband, introduces mixed-integer constraints into the optimization, posing a considerable computational challenge because the complexity grows exponentially with the number of integer variables. We address this complexity by presenting two solver algorithms that efficiently approximate the optimal solution in significantly less time than standard solvers, making them well suited for real-time applications.
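Concretely, a minimum activation (or minimum impulse-bit) requirement makes each thrust command semi-continuous; a standard mixed-integer encoding of such a deadband (illustrative, not necessarily the exact formulation used in the paper; non-negative thrust assumed) is

\[
  u_k \in \{0\} \cup [u_{\min}, u_{\max}]
  \quad\Longleftrightarrow\quad
  \delta_k\, u_{\min} \;\le\; u_k \;\le\; \delta_k\, u_{\max},
  \qquad \delta_k \in \{0,1\},
\]

so each control instant contributes one binary variable $\delta_k$, and the worst-case solve time grows exponentially with the horizon length.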
We study random walks on the giant component of Hyperbolic Random Graphs (HRGs) in the regime where the degree distribution obeys a power law with exponent in the range $(2,3)$. In particular, we first focus on the expected time for a random walk to hit a given vertex, or to visit, i.e. cover, all vertices. We show that, asymptotically almost surely (a.a.s., with respect to the HRG) and up to multiplicative constants, the cover time is $n(\log n)^2$, the maximum hitting time is $n\log n$, and the average hitting time is $n$. We then determine, a.a.s. and up to a factor polylogarithmic in $n$, the expected time to commute between two given vertices, under mild hypotheses on the pair of vertices involved. Our results are proved by controlling effective resistances using the energy dissipated by carefully designed network flows associated with a tiling of the hyperbolic plane, on which we overlay a forest-like structure.
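Two standard facts underpin this strategy: the commute-time identity of Chandra et al., and Thomson's principle, which bounds effective resistance by the energy of any unit flow. In the usual notation (stated here as background, with $m$ the number of edges and $r(e)$ the edge resistances),

\[
  \mathbb{E}[\tau_{uv}] + \mathbb{E}[\tau_{vu}] \;=\; 2m\, R_{\mathrm{eff}}(u,v),
  \qquad
  R_{\mathrm{eff}}(u,v) \;\le\; \sum_{e} \theta(e)^2\, r(e)
  \quad \text{for any unit flow } \theta \text{ from } u \text{ to } v.
\]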
This work is motivated by the question of whether a chaotic sequence can be computed efficiently: for example, can the $n$-th bit of a bit sequence generated by a chaotic map, such as the $\beta$-expansion, the tent map, or the logistic map, be obtained in $\mathrm{o}(n)$ time/space? This paper gives an affirmative answer to the question about the space complexity of the tent map: we show that the decision problem of whether a given bit sequence is a valid tent code can be solved in $\mathrm{O}(\log^{2} n)$ space, in the sense of smoothed complexity.
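For reference, the tent code referred to above is the symbolic itinerary of the tent map; under the common convention (the paper's exact parametrisation may differ), it can be generated as follows:

def tent_code(x0: float, n: int) -> str:
    """First n bits of the tent code of x0 in (0, 1): bit k records
    which half of the unit interval the k-th iterate falls in."""
    x, bits = x0, []
    for _ in range(n):
        bits.append('0' if x < 0.5 else '1')
        x = 2 * x if x < 0.5 else 2 * (1 - x)   # tent map T(x)
    return ''.join(bits)

# Naive generation needs the full-precision orbit (roughly one bit of
# precision is lost per step), which is why deciding validity in
# O(log^2 n) space is a non-trivial result.
print(tent_code(0.3, 16))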
Deep representation learning methods struggle with continual learning, suffering both from catastrophic forgetting of useful units and from loss of plasticity, the latter often due to rigid and unuseful units. While many methods address these two issues separately, only a few currently handle both simultaneously. In this paper, we introduce Utility-based Perturbed Gradient Descent (UPGD), a novel approach for the continual learning of representations. UPGD combines gradient updates with perturbations: it applies smaller modifications to more useful units, protecting them from forgetting, and larger modifications to less useful units, rejuvenating their plasticity. We use a challenging streaming learning setup in which continual learning problems have hundreds of non-stationarities and unknown task boundaries. We show that many existing methods suffer from at least one of the two issues, manifested predominantly in their decreasing accuracy over tasks. UPGD, by contrast, continues to improve and surpasses or is competitive with all methods on all problems. Finally, in extended reinforcement learning experiments with PPO, we show that while Adam exhibits a performance drop after initial learning, UPGD avoids this drop by addressing both continual learning issues.
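The update rule can be sketched as follows (a minimal sketch: the exact utility measure and its scaling are as described in the paper, and the sigmoid gating below is an assumed form for illustration):

import torch

def upgd_step(params, grads, utilities, lr=0.01, noise_std=0.001):
    """One Utility-based Perturbed Gradient Descent step (sketch).

    Gradient plus perturbation, gated by (1 - utility): high-utility
    weights change little (protection from forgetting), low-utility
    weights are perturbed strongly (plasticity is restored)."""
    for w, g, u in zip(params, grads, utilities):
        noise = noise_std * torch.randn_like(w)
        gate = 1.0 - torch.sigmoid(u)   # assumed squashing of raw utility
        w.data -= lr * (g + noise) * gate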
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation, and serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models such as Large Language Models (LLMs), there is growing interest in exploring their abilities on reasoning tasks. In this paper, we introduce seminal foundation models proposed for, or adaptable to, reasoning, highlighting the latest advancements in reasoning tasks, methods, and benchmarks. We then delve into potential future directions concerning the emergence of reasoning abilities in foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and superalignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers to explore this field, stimulate further advances in reasoning with foundation models, and contribute to the development of AGI.
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from, or the same as, the traditional approach? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods for learning causality and causal relations, along with the connections between causality and machine learning. It points out, on a case-by-case basis, how big data facilitates, complicates, or motivates each approach.
Deep Convolutional Neural Networks (CNNs) are a special type of Neural Network that has shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through the use of multiple non-linear feature-extraction stages that can automatically learn hierarchical representations from data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and the restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven categories, based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.
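To make the block-versus-layer distinction concrete, here is a minimal residual block in the style popularized by ResNet (an illustrative PyTorch sketch, not tied to any single surveyed architecture):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The structural unit is a block, not a single layer: two stacked
    convolutions with an identity skip connection around them."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.body(x) + x)   # skip connection: y = F(x) + x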