The coordination of actions and the allocation of profit in supply chains under decentralized control play an important role in improving the profits of the retailers and suppliers in the chain. We focus on supply chains under decentralized control in which noncompeting retailers can order from multiple suppliers to replenish their stocks, and in which the suppliers' production capacity is bounded. The goal of the firms in the chain is to maximize their individual profits. As the outcome under decentralized control is inefficient, coordination of actions between cooperating agents can improve individual profits. We use cooperative game theory to analyze cooperation between agents. We define multi-retailer-supplier games and show that the agents can always jointly achieve an optimal profit, so they have an incentive to cooperate and form the grand coalition. Moreover, we show that there always exist stable allocations of the total profit among the firms upon which no coalition can improve. We then propose and characterize a stable allocation of the total surplus induced by cooperation.
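As a concrete illustration of the stability notion used here, the minimal sketch below checks whether a proposed profit allocation lies in the core of a small cooperative game. The two retailers, one supplier, and the coalition values are hypothetical and not taken from the paper's model.

```python
# A minimal core-membership check for a toy retailer-supplier game.
# Players and the characteristic function v are hypothetical examples.
from itertools import chain, combinations

players = ["r1", "r2", "s"]

# Hypothetical coalition profits: retailers earn nothing without the supplier.
v = {
    frozenset(): 0,
    frozenset({"r1"}): 0, frozenset({"r2"}): 0, frozenset({"s"}): 0,
    frozenset({"r1", "s"}): 6, frozenset({"r2", "s"}): 8,
    frozenset({"r1", "r2"}): 0,
    frozenset({"r1", "r2", "s"}): 12,   # grand-coalition profit
}

def in_core(allocation):
    """An allocation is stable (in the core) if it distributes the grand-coalition
    profit exactly and no coalition can secure more on its own."""
    if abs(sum(allocation.values()) - v[frozenset(players)]) > 1e-9:
        return False  # not efficient
    coalitions = chain.from_iterable(
        combinations(players, k) for k in range(1, len(players) + 1))
    return all(sum(allocation[p] for p in S) >= v[frozenset(S)] - 1e-9
               for S in coalitions)

print(in_core({"r1": 3, "r2": 5, "s": 4}))  # True: no coalition can improve
print(in_core({"r1": 5, "r2": 1, "s": 6}))  # False: {r2, s} could secure 8 > 7
```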
Software testing is an essential knowledge area that industry requires of software engineers. However, software engineering students often consider testing less appealing than designing or coding. Consequently, it is difficult to engage students in creating effective tests. To encourage students, we explored the use of gamification and investigated whether this technique can help improve the engagement and performance of software testing students. We conducted a controlled experiment to compare the engagement and performance of two groups of students who took an undergraduate software testing course in different academic years. The experimental group comprises 135 students from the gamified course, whereas the control group comprises 100 students from the non-gamified course. The collected data were statistically analyzed to answer the research questions of this study. The results show that the students who participated in the gamification experience were more engaged and achieved better performance. As an additional finding, the analysis of the results reveals that a key success factor is the design of the gamification experience: the motivating stimuli provided by the gamification must be distributed throughout the whole experience to keep students engaged until the end. Given these results, we plan to readjust the design of the gamification experience to increase student engagement in its last stage, as well as to conduct a longitudinal study to evaluate the effects of gamification.
Noiseless compressive sensing is a protocol that enables undersampling and later recovery of a signal without loss of information. This compression is possible because the signal is usually sufficiently sparse in a given basis. Currently, the algorithm offering the best tradeoff between compression rate, robustness, and speed for compressive sensing is the LASSO (l1-norm bias) algorithm. However, many studies have pointed out that implementing lp-norm biases with p smaller than one could give better performance at the cost of convexity. In this work, we focus specifically on the extreme case of l0-based reconstruction, a task that is complicated by the discontinuity of the loss. In the first part of the paper, we describe via statistical physics methods, and in particular the replica method, how the solutions to this optimization problem are arranged in a clustered structure. We observe two distinct regimes: one at low compression rate where the signal can be recovered exactly, and one at high compression rate where the signal cannot be recovered accurately. In the second part, we present two message-passing algorithms based on our first results for the l0-norm optimization problem. The proposed algorithms are able to recover the signal at compression rates higher than those achieved by LASSO while being computationally efficient.
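For context, the short sketch below shows the LASSO baseline the abstract refers to (not the paper's l0 message-passing algorithms): a sparse signal is recovered from undersampled random Gaussian measurements by l1-regularized regression. All dimensions and the choice of scikit-learn's Lasso are illustrative.

```python
# Noiseless compressive-sensing baseline: l1 (LASSO) recovery of a sparse signal.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, m, k = 200, 80, 10                      # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)   # random sensing matrix (m < n)
y = A @ x_true                             # undersampled, noiseless measurements

# Small regularization approximates the noiseless limit of the l1 bias.
lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000)
lasso.fit(A, y)
x_hat = lasso.coef_

print("relative reconstruction error:",
      np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```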
Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that may themselves be unbounded. This work identifies a set of assumptions describing a regime, in which the empirical minimizers are stable, that seems to govern the concentration in many estimation problems. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems, and entropic-Wasserstein barycenters.
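To fix ideas, the display below is an illustrative template of the kind of bound described, with notation introduced here rather than taken from the paper: $R$ is the population risk, $R_n$ its empirical counterpart over $n$ samples, $M = \arg\min R$ and $M_n = \arg\min R_n$ the minimizer sets, and $d_H$ the Hausdorff distance on the underlying metric space. The specific constants and exponential form are only a generic parametric-rate sketch.

```latex
% Illustrative template of a parametric concentration rate for minimizer sets.
\[
  \mathbb{P}\!\left( d_H(M_n, M) \ge C\sqrt{\tfrac{t}{n}} \right) \le e^{-t}
  \quad\text{for all } t > 0,
  \qquad
  \mathbb{E}\!\left[ d_H(M_n, M) \right] \lesssim \frac{C}{\sqrt{n}} .
\]
```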
With the stringent requirements introduced by the new sixth-generation (6G) internet-of-things (IoT) use cases, traditional approaches to multiple access control have started to show their limitations. A new wave of grant-free (GF) approaches has therefore been proposed as a viable alternative. However, a definitive solution has yet to be reached. In our work, we propose a new semi-GF coordinated random access (RA) protocol, denoted as partial-information multiple access (PIMA), to reduce packet loss and latency, particularly in the presence of sporadic activations. We consider a machine-type communications (MTC) scenario, wherein devices need to transmit data packets in the uplink to a base station (BS). When using PIMA, the BS can acquire partial information on the instantaneous traffic conditions and, using compute-over-the-air techniques, estimate the number of devices with packets waiting for transmission in their queues. Based on this knowledge, the BS assigns to each device a single slot for transmission. However, since each slot may still be assigned to multiple users, collisions may occur. Both the total number of allocated slots and the user assignments are optimized, based on the estimated number of active users, to reduce collisions and improve the efficiency of the multiple access scheme. To prove the validity of our solution, we compare PIMA to time-division multiple access (TDMA) and slotted ALOHA (SALOHA), the ideal orthogonal multiple access (OMA) solutions in the time domain under low and high traffic conditions, respectively. We show that PIMA not only adapts to different traffic conditions, dropping fewer packets regardless of the packet-generation intensity, but also merges the advantages of both TDMA and SALOHA, thus improving performance in terms of packet loss probability and latency.
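The toy simulation below illustrates the underlying idea only: the BS sizes a frame from its estimate of the number of backlogged devices and gives each device one slot, accepting that some slots may still collide. The load-to-slot ratio and success metric are arbitrary choices for illustration, not the paper's optimized PIMA assignment.

```python
# Toy semi-grant-free slot assignment: frame sized from estimated load,
# one random slot per active device, collisions counted per frame.
import numpy as np

rng = np.random.default_rng(1)

def success_probability(n_active, n_slots, trials=10_000):
    """Average fraction of active devices that end up alone in their slot."""
    delivered = 0
    for _ in range(trials):
        slots = rng.integers(0, n_slots, size=n_active)   # one slot per device
        counts = np.bincount(slots, minlength=n_slots)
        delivered += np.sum(counts == 1)                   # slots without collision
    return delivered / (trials * n_active)

for n_active in (5, 20, 50):
    n_slots = max(1, round(1.5 * n_active))   # frame sized from the estimated load
    print(f"{n_active} active devices, {n_slots} slots: "
          f"success prob ~ {success_probability(n_active, n_slots):.2f}")
```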
We address the theoretical and practical problems related to trajectory generation and tracking control of tail-sitter UAVs. Theoretically, we focus on the differential flatness property with full exploitation of actual UAV aerodynamic models, which lays the foundation for generating dynamically feasible trajectories and achieving high-performance tracking control. We find that a tail-sitter is differentially flat with accurate aerodynamic models within the entire flight envelope, by imposing the coordinated-flight condition and choosing the vehicle position as the flat output. This fundamental property allows us to fully exploit high-fidelity aerodynamic models in trajectory planning and tracking control to achieve accurate tail-sitter flight. In particular, an optimization-based trajectory planner for tail-sitters is proposed to design high-quality, smooth trajectories subject to kinodynamic constraints, singularity-free constraints, and actuator saturation. The planned flat-output trajectory is transformed into a state trajectory in real time, taking environmental wind into account. To track the state trajectory, a global, singularity-free, and minimally parameterized on-manifold MPC is developed, which fully leverages the accurate aerodynamic model to achieve high-accuracy trajectory tracking within the whole flight envelope. The effectiveness of the proposed framework is demonstrated through extensive real-world experiments in both indoor and outdoor field tests, including agile SE(3) flight through consecutive narrow windows requiring specific attitudes at speeds up to 10 m/s, typical tail-sitter maneuvers (transition, level flight, and loiter) at speeds up to 20 m/s, and extremely aggressive aerobatic maneuvers (wingover, loop, vertical eight, and Cuban eight) with accelerations up to 2.5 g.
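As a minimal sketch of what the flatness property buys (not the paper's planner or MPC): once position is chosen as the flat output, a smooth trajectory and its derivatives already supply the velocity and acceleration references needed downstream. The quintic profile, duration, and distance below are hypothetical numbers.

```python
# Differentiating a smooth flat-output (position) trajectory to obtain
# velocity and acceleration references; a 1-D minimum-jerk example.
import numpy as np
from numpy.polynomial import Polynomial

T, d = 4.0, 10.0                            # duration [s], travel distance [m]
s = Polynomial([0, 0, 0, 10, -15, 6])       # minimum-jerk profile on tau = t / T

pos = lambda t: d * s(t / T)                # flat output
vel = lambda t: d * s.deriv(1)(t / T) / T   # first derivative (chain rule)
acc = lambda t: d * s.deriv(2)(t / T) / T**2

t = np.linspace(0.0, T, 5)
print("pos:", np.round(pos(t), 2))
print("vel:", np.round(vel(t), 2))
print("acc:", np.round(acc(t), 2))
```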
This paper attempts to characterize the kinds of physical scenarios in which an online learning-based cognitive radar is expected to reliably outperform a fixed rule-based waveform selection strategy, as well as the converse. We seek general insights through an examination of two decision-making scenarios, namely dynamic spectrum access and multiple-target tracking. The radar scene is characterized by inducing a state-space model and examining the structure of its underlying Markov state transition matrix in terms of its entropy rate and diagonality. We find that entropy rate is a strong predictor of the performance of online learning-based waveform selection, while diagonality is a better predictor of that of fixed rule-based waveform selection. We show that these measures can be used to predict first- and second-order stochastic dominance relationships, allowing system designers to use simple decision rules instead of more cumbersome learning approaches under certain conditions. We validate our findings through numerical results for each application and provide guidelines for future implementations.
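The sketch below computes the two scene descriptors mentioned for a hypothetical transition matrix. The entropy rate follows the standard Markov-chain definition; the diagonality proxy used here (stationary-weighted self-transition probability) is an assumption for illustration and is not necessarily the paper's exact measure.

```python
# Entropy rate and a diagonality proxy for a Markov state transition matrix P.
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    return pi / pi.sum()

def entropy_rate(P):
    pi = stationary_distribution(P)
    row_entropy = -np.sum(np.where(P > 0, P * np.log2(P), 0.0), axis=1)
    return float(pi @ row_entropy)           # bits per transition

def diagonality(P):
    pi = stationary_distribution(P)
    return float(pi @ np.diag(P))            # long-run probability of staying put

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                   # hypothetical spectrum-occupancy chain
print("entropy rate:", entropy_rate(P))
print("diagonality:", diagonality(P))
```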
Reinforcement learning (RL) methods for traffic signal control have achieved better performance than conventional rule-based approaches. Most RL approaches require the agent to observe the environment in order to decide which action is optimal for the long-term reward. However, in real-world urban scenarios, observations of traffic states may frequently be missing due to the lack of sensors, which makes existing RL methods inapplicable to road networks with missing observations. In this work, we aim to control the traffic signals in a real-world setting where some intersections in the road network are not equipped with sensors and thus provide no direct observations of their surroundings. To the best of our knowledge, we are the first to use RL methods to tackle the traffic signal control problem in this real-world setting. Specifically, we propose two solutions: the first imputes the traffic states to enable adaptive control, and the second imputes both states and rewards to enable adaptive control and the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we show that our method outperforms conventional approaches and performs consistently across different missing rates. We also provide further investigations on how missing data influences the performance of our model.
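A minimal sketch of the state-imputation idea is given below; it is an illustration under simple assumptions (neighbor averaging on a hypothetical four-intersection line network), not the paper's model: an unobserved intersection's state is filled in from its sensor-equipped neighbors before being fed to the controller.

```python
# Impute missing intersection states from observed neighbors before control.
import numpy as np

def impute_states(states, observed_mask, adjacency):
    """states: (N, d) array; rows where observed_mask is False are imputed."""
    states = states.copy()
    for i in np.flatnonzero(~observed_mask):
        neighbors = np.flatnonzero(adjacency[i] & observed_mask)
        if neighbors.size:                           # average of observed neighbors
            states[i] = states[neighbors].mean(axis=0)
        else:                                        # fall back to the global mean
            states[i] = states[observed_mask].mean(axis=0)
    return states

# Hypothetical line network of 4 intersections; intersection 2 has no sensors.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=bool)
observed = np.array([True, True, False, True])
raw = np.array([[12.0], [8.0], [0.0], [5.0]])        # queue lengths; 0 is a placeholder
print(impute_states(raw, observed, adjacency))       # row 2 becomes (8 + 5) / 2
```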
We present prompt distribution learning for effectively adapting a pre-trained vision-language model to downstream recognition tasks. Our method not only learns low-bias prompts from a few samples but also captures the distribution of diverse prompts to handle varying visual representations. In this way, we provide high-quality task-related content to facilitate recognition. This prompt distribution learning is realized by an efficient approach that learns the output embeddings of prompts instead of the input embeddings. Thus, we can model them effectively with a Gaussian distribution and derive a surrogate loss for efficient training. Extensive experiments on 12 datasets demonstrate that our method consistently and significantly outperforms existing methods. For example, with one sample per category, it improves the average result by a relative 9.1% over human-crafted prompts.
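The sketch below illustrates the general idea under stated assumptions rather than the paper's exact formulation: class-specific prompt output embeddings are modeled as a learned Gaussian (mean and diagonal std), samples are drawn with the reparameterization trick, and classification uses cosine similarity against an image feature. Dimensions and the stand-in CLIP feature are hypothetical.

```python
# Gaussian over prompt *output* embeddings with reparameterized sampling.
import torch
import torch.nn.functional as F

n_classes, dim, n_samples = 10, 512, 4
mu = torch.randn(n_classes, dim, requires_grad=True)         # learned mean
log_sigma = torch.zeros(n_classes, dim, requires_grad=True)  # learned log-std

def class_logits(image_feat, temperature=0.01):
    eps = torch.randn(n_samples, n_classes, dim)
    prompts = mu + log_sigma.exp() * eps                      # reparameterized samples
    prompts = F.normalize(prompts, dim=-1)
    image_feat = F.normalize(image_feat, dim=-1)
    sims = torch.einsum("d,scd->sc", image_feat, prompts)     # cosine similarities
    return sims.mean(dim=0) / temperature                     # average over samples

image_feat = torch.randn(dim)                                 # stand-in image feature
loss = F.cross_entropy(class_logits(image_feat).unsqueeze(0), torch.tensor([3]))
loss.backward()                                               # gradients reach mu, log_sigma
```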
Effective multi-robot teams require the ability to move to goals in complex environments in order to address real-world applications such as search and rescue. Multi-robot teams should be able to operate in a completely decentralized manner, with individual robot team members capable of acting without explicit communication with their neighbors. In this paper, we propose a novel game-theoretic model that enables decentralized and communication-free navigation to a goal position. Each robot plays its own distributed game by estimating the behavior of its local teammates in order to identify behaviors that move it toward the goal while avoiding obstacles and maintaining team cohesion without collisions. We prove theoretically that the generated actions approach a Nash equilibrium, which also corresponds to an optimal strategy identified for each robot. We show through extensive simulations that our approach enables decentralized and communication-free navigation by a multi-robot system to a goal position, and is able to avoid obstacles and collisions, maintain connectivity, and respond robustly to sensor noise.
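A toy best-response sketch of this kind of decision rule is given below; it is not the paper's game, and the cost weights, candidate-action set, and estimated teammate positions are all hypothetical: each robot scores a small set of candidate moves against estimates of its neighbors, trading off goal progress, obstacle clearance, and team cohesion.

```python
# One robot's best response over a discrete set of candidate moves.
import numpy as np

def best_response(pos, goal, neighbors, obstacles, step=0.5):
    angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
    candidates = pos + step * np.c_[np.cos(angles), np.sin(angles)]

    def cost(p):
        goal_term = np.linalg.norm(p - goal)                     # progress to goal
        obstacle_term = sum(1.0 / max(np.linalg.norm(p - o), 1e-3)
                            for o in obstacles)                  # clearance penalty
        cohesion_term = (np.mean([np.linalg.norm(p - q) for q in neighbors])
                         if neighbors else 0.0)                  # stay near teammates
        return goal_term + 0.5 * obstacle_term + 0.2 * cohesion_term

    return candidates[np.argmin([cost(c) for c in candidates])]

pos, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])
neighbors = [np.array([0.0, 1.0]), np.array([0.0, -1.0])]        # estimated teammates
obstacles = [np.array([2.0, 0.2])]
print(best_response(pos, goal, neighbors, obstacles))
```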
This paper aims to mitigate straggler effects in synchronous distributed learning for multi-agent reinforcement learning (MARL) problems. Stragglers arise frequently in distributed learning systems due to various disturbances such as slowdowns or failures of compute nodes and communication bottlenecks. To resolve this issue, we propose a coded distributed learning framework, which speeds up the training of MARL algorithms in the presence of stragglers while maintaining the same accuracy as the centralized approach. As an illustration, a coded distributed version of the multi-agent deep deterministic policy gradient (MADDPG) algorithm is developed and evaluated. Different coding schemes, including maximum distance separable (MDS) codes, random sparse codes, replication-based codes, and regular low-density parity-check (LDPC) codes, are also investigated. Simulations in several multi-robot problems demonstrate the promising performance of the proposed framework.
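To illustrate the straggler-mitigation principle with the simplest of the listed schemes, the toy sketch below uses replication-based coding on synthetic gradients (not the paper's MADDPG pipeline): each gradient partition is placed on two workers, so the aggregator only needs the faster replica of every partition and can ignore the slower one.

```python
# Replication-coded gradient aggregation tolerating one straggler per partition.
import numpy as np

rng = np.random.default_rng(0)
n_workers = n_partitions = 4
grads = [rng.normal(size=3) for _ in range(n_partitions)]        # per-partition gradients

# Worker j computes partitions j and (j+1) mod n, so each partition has 2 replicas.
assignment = [{j, (j + 1) % n_partitions} for j in range(n_workers)]
finish_time = rng.exponential(1.0, size=n_workers)               # simulated compute times

full_gradient = np.zeros(3)
waited_for = set()
for p in range(n_partitions):
    replicas = [j for j in range(n_workers) if p in assignment[j]]
    fastest = min(replicas, key=lambda j: finish_time[j])        # skip the slower replica
    waited_for.add(fastest)
    full_gradient += grads[p]                                    # result from fastest replica

print("aggregated after hearing from workers", sorted(waited_for))
print("full gradient:", full_gradient)
```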