Time-critical cyber-physical applications demand the timely delivery of information. In this work, we employ a high-speed packet processing testbed to quantitatively analyze a packet forwarding application running on a shared memory multi-processor architecture, where efficient synchronization of concurrent access to a Forwarding Information Base is essential for low-latency and timely delivery of information. While modern packet processing frameworks are optimized for maximum packet throughput, their ability to support timely delivery remains an open question. Here we focus on the age of information performance issues induced by throughput-focused packet processing frameworks. Our results underscore the importance of careful selection of offered load parameters and concurrency constructs in such frameworks.
Many applications utilize sensors in mobile devices and machine learning to provide novel services. However, various factors such as different users, devices, and environments impact the performance of such applications, thus making the domain shift (i.e., distributional shift between the training domain and the target domain) a critical issue in mobile sensing. Despite attempts in domain adaptation to solve this challenging problem, their performance is unreliable due to the complex interplay among diverse factors. In principle, the performance uncertainty can be identified and redeemed by performance validation with ground-truth labels. However, it is infeasible for every user to collect high-quality, sufficient labeled data. To address the issue, we present DAPPER (Domain AdaPtation Performance EstimatoR) that estimates the adaptation performance in a target domain with only unlabeled target data. Our key idea is to approximate the model performance based on the mutual information between the model inputs and corresponding outputs. Our evaluation with four real-world sensing datasets compared against six baselines shows that on average, DAPPER outperforms the state-of-the-art baseline by 39.8% in estimation accuracy. Moreover, our on-device experiment shows that DAPPER achieves up to 396X less computation overhead compared with the baselines.
While conformal predictors reap the benefits of rigorous statistical guarantees for their error frequency, the size of their corresponding prediction sets is critical to their practical utility. Unfortunately, there is currently a lack of finite-sample analysis and guarantees for their prediction set sizes. To address this shortfall, we theoretically quantify the expected size of the prediction set under the split conformal prediction framework. As this precise formulation cannot usually be calculated directly, we further derive point estimates and high probability intervals that can be easily computed, providing a practical method for characterizing the expected prediction set size across different possible realizations of the test and calibration data. Additionally, we corroborate the efficacy of our results with experiments on real-world datasets, for both regression and classification problems.
With the increasing diversity of ML infrastructures nowadays, distributed training over heterogeneous computing systems is desired to facilitate the production of big models. Mixture-of-Experts (MoE) models have been proposed to lower the cost of training subject to the overall size of models/data through gating and parallelism in a divide-and-conquer fashion. While DeepSpeed has made efforts in carrying out large-scale MoE training over heterogeneous infrastructures, the efficiency of training and inference could be further improved from several system aspects, including load balancing, communication/computation efficiency, and memory footprint limits. In this work, we present SE-MoE that proposes Elastic MoE training with 2D prefetch and Fusion communication over Hierarchical storage, so as to enjoy efficient parallelisms in various types. For scalable inference in a single node, especially when the model size is larger than GPU memory, SE-MoE forms the CPU-GPU memory jointly into a ring of sections to load the model, and executes the computation tasks across the memory sections in a round-robin manner for efficient inference. We carried out extensive experiments to evaluate SE-MoE, where SE-MoE successfully trains a Unified Feature Optimization (UFO) model with a Sparsely-Gated Mixture-of-Experts model of 12B parameters in 8 days on 48 A100 GPU cards. The comparison against the state-of-the-art shows that SE-MoE outperformed DeepSpeed with 33% higher throughput (tokens per second) in training and 13% higher throughput in inference in general. Particularly, under unbalanced MoE Tasks, e.g., UFO, SE-MoE achieved 64% higher throughput with 18% lower memory footprints. The code of the framework will be released on: //github.com/PaddlePaddle/Paddle.
Fine-tuning pre-trained language models for multiple tasks tends to be expensive in terms of storage. To mitigate this, parameter-efficient transfer learning (PETL) methods have been proposed to address this issue, but they still require a significant number of parameters and storage when being applied to broader ranges of tasks. To achieve even greater storage reduction, we propose PROPETL, a novel method that enables efficient sharing of a single PETL module which we call prototype network (e.g., adapter, LoRA, and prefix-tuning) across layers and tasks. We then learn binary masks to select different sub-networks from the shared prototype network and apply them as PETL modules into different layers. We find that the binary masks can determine crucial information from the network, which is often ignored in previous studies. Our work can also be seen as a type of pruning method, where we find that overparameterization also exists in the seemingly small PETL modules. We evaluate PROPETL on various downstream tasks and show that it can outperform other PETL methods with approximately 10% of the parameter storage required by the latter.
Network digital twins (NDTs) facilitate the estimation of key performance indicators (KPIs) before physically implementing a network, thereby enabling efficient optimization of the network configuration. In this paper, we propose a learning-based NDT for network simulators. The proposed method offers a holistic representation of information flow in a wireless network by integrating node, edge, and path embeddings. Through this approach, the model is trained to map the network configuration to KPIs in a single forward pass. Hence, it offers a more efficient alternative to traditional simulation-based methods, thus allowing for rapid experimentation and optimization. Our proposed method has been extensively tested through comprehensive experimentation in various scenarios, including wired and wireless networks. Results show that it outperforms baseline learning models in terms of accuracy and robustness. Moreover, our approach achieves comparable performance to simulators but with significantly higher computational efficiency.
This study analyzes the possible relationship between personality traits, in terms of Big Five (extraversion, agreeableness, responsibility, emotional stability and openness to experience), and social interactions mediated by digital platforms in different socioeconomic and cultural contexts. We considered data from a questionnaire and the experience of using a chatbot, as a mean of requesting and offering help, with students from 4 universities: University of Trento (Italy), the National University of Mongolia, the School of Economics of London (United Kingdom) and the Universidad Cat\'olica Nuestra Se\~nora de la Asunci\'on (Paraguay). The main findings confirm that personality traits may influence social interactions and active participation in groups. Therefore, they should be taken into account to enrich the recommendation of matching algorithms between people who ask for help and people who could respond not only on the basis of their knowledge and skills.
With the ever-increasing utilization of natural language processing (NLP), we started to witness over the past few years a significant transformation in our interaction with legal texts. This technology has advanced the analysis and enhanced the understanding of complex legal terminology and contexts. The development of recent large language models (LLMs), particularly ChatGPT, has also introduced a revolutionary contribution to the way that legal texts can be processed and comprehended. In this paper, we present our work on a cooperative-legal question-answering LLM-based chatbot, where we developed a set of legal questions about Palestinian cooperatives, associated with their regulations and compared the auto-generated answers by the chatbot to their correspondences that are designed by a legal expert. To evaluate the proposed chatbot, we have used 50 queries generated by the legal expert and compared the answers produced by the chart to their relevance judgments. Finding demonstrated that an overall accuracy rate of 82% has been achieved when answering the queries, while exhibiting an F1 score equivalent to 79%.
There has been a growing interest in covariate adjustment in the analysis of randomized controlled trials in past years. For instance, the U.S. Food and Drug Administration recently issued guidance that emphasizes the importance of distinguishing between conditional and marginal treatment effects. Although these effects coincide in linear models, this is not typically the case in other settings, and this distinction is often overlooked in clinical trial practice. Considering these developments, this paper provides a review of when and how to utilize covariate adjustment to enhance precision in randomized controlled trials. We describe the differences between conditional and marginal estimands and stress the necessity of aligning statistical analysis methods with the chosen estimand. Additionally, we highlight the potential misalignment of current practices in estimating marginal treatment effects. Instead, we advocate for the utilization of standardization, which can improve efficiency by leveraging the information contained in baseline covariates while remaining robust to model misspecification. Finally, we present practical considerations that have arisen in our respective consultations to further clarify the advantages and limitations of covariate adjustment.
The usability of Reinforcement Learning is restricted by the large computation times it requires. Curriculum Reinforcement Learning speeds up learning by defining a helpful order in which an agent encounters tasks, i.e. from simple to hard. Curricula based on Absolute Learning Progress (ALP) have proven successful in different environments, but waste computation on repeating already learned behaviour in new tasks. We solve this problem by introducing a new regularization method based on Self-Paced (Deep) Learning, called Self-Paced Absolute Learning Progress (SPALP). We evaluate our method in three different environments. Our method achieves performance comparable to original ALP in all cases, and reaches it quicker than ALP in two of them. We illustrate possibilities to further improve the efficiency and performance of SPALP.
Understanding causality should be a core requirement of any attempt to build real impact through AI. Due to the inherent unobservability of counterfactuals, large randomised trials (RCTs) are the standard for causal inference. But large experiments are generically expensive, and randomisation carries its own costs, e.g. when suboptimal decisions are trialed. Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought. In this work, we develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications. Across a range of important tasks, real-world datasets, and sample sizes, our method outperforms other benchmarks, e.g. requiring an order-of-magnitude less data to match RCT performance on targeted marketing tasks.