Federated learning (FL) based on a centralized design faces challenges regarding both trust and a single point of failure. To alleviate these issues, blockchain-aided decentralized FL (BDFL) introduces a decentralized network architecture into the FL training process, which can effectively overcome the defects of the centralized architecture. However, deploying BDFL in wireless networks usually encounters challenges such as limited bandwidth, computing power, and energy. Driven by these considerations, a dynamic stochastic optimization problem is formulated to minimize the average training delay by jointly optimizing resource allocation and client selection under constraints on the energy budget and client participation. We solve this long-term mixed-integer non-linear programming problem by employing Lyapunov optimization and thereby propose the dynamic resource allocation and client scheduling BDFL (DRC-BDFL) algorithm. Furthermore, we analyze the learning performance of DRC-BDFL and derive an upper bound on the convergence of the global loss function. Extensive experiments conducted on SVHN and CIFAR-10 demonstrate that DRC-BDFL achieves accuracy comparable to baseline algorithms while significantly reducing the training delay by 9.24% and 12.47%, respectively.
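To make the scheduling step concrete, below is a minimal Python sketch of the drift-plus-penalty idea that underlies Lyapunov-based schedulers of this kind. All quantities (the delay and energy draws, the fixed selection size, the weight V, the queue update) are illustrative assumptions, not the paper's DRC-BDFL algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, V = 10, 50, 5.0        # clients, rounds, delay/energy trade-off weight
E_BUDGET = 1.0               # per-round average energy budget per client
Q = np.zeros(N)              # virtual energy-deficit queues, one per client

for t in range(T):
    delay = rng.uniform(0.5, 2.0, N)   # hypothetical per-client round delay
    energy = rng.uniform(0.2, 1.5, N)  # hypothetical per-client energy cost
    # Drift-plus-penalty score: trade off delay (penalty) against queue drift.
    score = V * delay + Q * energy
    selected = np.argsort(score)[:3]   # schedule the 3 lowest-score clients
    spend = np.zeros(N)
    spend[selected] = energy[selected]
    # Queues grow with energy actually spent and drain by the per-round budget,
    # which enforces the long-term energy constraint on average.
    Q = np.maximum(Q + spend - E_BUDGET, 0.0)

print("final energy-deficit queues:", np.round(Q, 2))
```

Larger V prioritizes low delay at the cost of letting the deficit queues (and hence energy overruns) grow before draining; this is the standard Lyapunov trade-off.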
This study proposes a unified theory and statistical learning approach for traffic conflict detection, addressing the long-standing call for a consistent and comprehensive methodology to evaluate the collision risk that emerges in road user interactions. The proposed theory assumes a context-dependent probabilistic collision risk and frames conflict detection as estimating that risk by statistical learning from observed proximities and contextual variables. Three primary tasks are integrated: representing the interaction context from selected observables, inferring proximity distributions in different contexts, and applying extreme value theory to relate conflict intensity with conflict probability. As a result, the methodology is adaptable to various road users and interaction scenarios, enhancing its applicability without the need for pre-labelled conflict data. Demonstration experiments are conducted using real-world trajectory data, with the unified metric trained on lane-changing interactions on German highways and applied to near-crash events from the 100-Car Naturalistic Driving Study in the U.S. The experiments demonstrate the methodology's ability to provide effective collision warnings, generalise across different datasets and traffic environments, cover a broad range of conflicts, and deliver a long-tailed distribution of conflict intensity. This study contributes to traffic safety by offering a consistent and explainable methodology for conflict detection applicable across various scenarios. Its societal implications include enhanced safety evaluations of traffic infrastructure, more effective collision warning for autonomous and driving-assistance systems, and a deeper understanding of road user behaviour in different traffic conditions, contributing to a potential reduction in accident rates and improved overall traffic safety.
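To illustrate the extreme-value step, here is a minimal peaks-over-threshold sketch in Python: a generalized Pareto tail is fitted to exceedances of a negated proximity measure, and the collision probability is read off as the probability of the gap reaching zero. The gamma-distributed gaps and the 95% threshold are assumptions for illustration, not the paper's trained, context-dependent model.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
# Hypothetical proximity samples (gaps in metres) for one interaction context;
# a smaller gap means higher risk, so we study the upper tail of the negated gap.
gaps = rng.gamma(shape=5.0, scale=4.0, size=5000)
neg_gap = -gaps

u = np.quantile(neg_gap, 0.95)         # peaks-over-threshold threshold
exceed = neg_gap[neg_gap > u] - u      # exceedances above the threshold
c, _, scale = genpareto.fit(exceed, floc=0.0)

# Collision corresponds to the gap reaching zero, i.e. the negated gap
# exceeding 0: P(X > 0) = P(X > u) * P(X - u > -u | X > u).
p_collision = (exceed.size / neg_gap.size) * genpareto.sf(-u, c, loc=0.0, scale=scale)
print(f"estimated collision probability: {p_collision:.2e}")
```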
The efficacy of interpolating via Variably Scaled Kernels (VSKs) is known to depend on the definition of a proper scaling function, but no numerical recipes for constructing one are available. Previous works suggest that such a function should mimic the target function, but no theoretical evidence is provided. This paper fills both gaps: it proves that a scaling function reflecting the target one may lead to enhanced approximation accuracy, and it provides a user-independent tool for learning the scaling function by means of Discontinuous Neural Networks ($\delta$NNs), i.e., NNs able to deal with possible discontinuities. Numerical evidence supports our claims, showing that the key features of the target function can be clearly recovered in the learned scaling function.
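As a minimal illustration of the VSK mechanism, the sketch below augments each input x with ψ(x) and interpolates with a standard Gaussian kernel on the augmented points. Here ψ is chosen by hand with oracle knowledge of the target's discontinuity, whereas the paper learns it with a $\delta$NN; the kernel, nodes, and regularization are illustrative choices.

```python
import numpy as np

def rbf(X, Y, eps=1.0):
    # Gaussian kernel matrix between two point sets
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-eps * d2)

f = lambda x: np.sign(x - 0.4)    # discontinuous target function
psi = lambda x: np.sign(x - 0.4)  # scaling function mimicking the target

x = np.linspace(0.0, 1.0, 40)[:, None]    # interpolation nodes
xe = np.linspace(0.0, 1.0, 400)[:, None]  # evaluation points

# VSK trick: augment each point x with psi(x) and use a standard kernel there,
# so the two sides of the discontinuity are separated in the augmented space.
Xv = np.hstack([x, psi(x)])
Xev = np.hstack([xe, psi(xe)])
coef = np.linalg.solve(rbf(Xv, Xv) + 1e-10 * np.eye(len(x)), f(x).ravel())
pred = rbf(Xev, Xv) @ coef
print("max interpolation error:", np.abs(pred - f(xe).ravel()).max())
```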
This work presents an evaluation of CNN models and data augmentation for the hierarchical localization of a mobile robot using omnidirectional images. An ablation study of different state-of-the-art CNN models used as backbones is presented, and a variety of data augmentation visual effects are proposed for addressing the visual localization of the robot. The proposed method is based on the adaptation and re-training of a CNN with a dual purpose: (1) to perform a rough localization step, in which the model predicts the room from which an image was captured, and (2) to address the fine localization step, which consists of retrieving the most similar image of the visual map among those contained in the previously predicted room, by means of a pairwise comparison between descriptors obtained from an intermediate layer of the CNN. In particular, we evaluate the impact of different state-of-the-art CNN models such as ConvNeXt on the proposed localization. Finally, a variety of data augmentation visual effects are separately employed for training the model and their impact is assessed. The performance of the resulting CNNs is evaluated under real operating conditions, including changes in lighting. Our code is publicly available on the project website https://github.com/juanjo-cabrera/IndoorLocalizationSingleCNN.git
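A minimal sketch of the coarse-to-fine retrieval logic is given below; the descriptors, room logits, and labels are hypothetical placeholders for the CNN outputs described above, not the repository's code.

```python
import numpy as np

def hierarchical_localize(query_desc, room_logits, map_descs, map_rooms):
    """Coarse step: pick the room; fine step: nearest descriptor in that room."""
    room = int(np.argmax(room_logits))
    idx = np.flatnonzero(map_rooms == room)
    cand = map_descs[idx]
    # Pairwise comparison via cosine similarity against room candidates only
    sims = cand @ query_desc / (
        np.linalg.norm(cand, axis=1) * np.linalg.norm(query_desc) + 1e-12)
    return room, int(idx[np.argmax(sims)])

rng = np.random.default_rng(2)
map_descs = rng.normal(size=(100, 128))    # stand-ins for CNN-layer descriptors
map_rooms = rng.integers(0, 5, size=100)   # room label of each map image
room, best = hierarchical_localize(rng.normal(size=128), rng.normal(size=5),
                                   map_descs, map_rooms)
print(f"predicted room {room}, closest map image {best}")
```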
Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicitly sharing information among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.
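The proposed decision maker is physical, so no faithful code listing is possible here. As a conventional software point of reference for the same CMAB setting, the sketch below runs two independent ε-greedy agents that split the reward when they collide on an arm; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
K, T, EPS = 5, 20000, 0.1
mu = rng.uniform(0.2, 0.9, K)   # true arm reward probabilities
Q = np.zeros((2, K))            # per-agent value estimates
n = np.ones((2, K))             # per-agent pull counts

for _ in range(T):
    # Independent epsilon-greedy choices: no information is shared.
    picks = [int(rng.integers(K)) if rng.random() < EPS else int(np.argmax(Q[a]))
             for a in range(2)]
    for a, arm in enumerate(picks):
        r = float(rng.random() < mu[arm])
        if picks[0] == picks[1]:
            r /= 2.0            # collision: both agents chose the same arm
        n[a, arm] += 1
        Q[a, arm] += (r - Q[a, arm]) / n[a, arm]

print("true best arm:", int(np.argmax(mu)))
print("each agent's greedy pick:", [int(np.argmax(Q[a])) for a in range(2)])
```

Independent learners typically converge on the same best arm and keep splitting its reward; avoiding such conflicts without explicit communication is precisely what the chaotic oscillations and cluster synchronization of the coupled lasers are designed to achieve.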
We propose a high-order unfitted finite element method for solving time-harmonic Maxwell interface problems. The unfitted finite element method is based on a mixed formulation in the discontinuous Galerkin framework on a Cartesian mesh with possible hanging nodes. The $H^2$ regularity of the solution to Maxwell interface problems with $C^2$ interfaces in each subdomain is proved. Practical interface-resolving mesh conditions are introduced under which $hp$ inverse estimates on three-dimensional curved domains are proved. Stability and an $hp$ a priori error estimate of the unfitted finite element method are proved. Numerical results are included to illustrate the performance of the method.
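For orientation, a typical strong form of the time-harmonic Maxwell interface problem reads as follows (generic notation, not necessarily the paper's):
\[
\nabla \times \left( \mu^{-1} \nabla \times \mathbf{E} \right) - \omega^2 \varepsilon \, \mathbf{E} = \mathbf{f} \quad \text{in } \Omega_1 \cup \Omega_2, \qquad [\![ \mathbf{E} \times \mathbf{n} ]\!] = \mathbf{0}, \quad [\![ \mu^{-1} (\nabla \times \mathbf{E}) \times \mathbf{n} ]\!] = \mathbf{0} \quad \text{on } \Gamma,
\]
where the coefficients $\mu$ and $\varepsilon$ are piecewise smooth and may jump across the interface $\Gamma$ separating the subdomains; it is this coefficient jump, cutting through the interface-unaligned Cartesian mesh, that the unfitted discretization must resolve.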
Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, which is typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, which are typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and consider the problem of estimation in low-order outcome models using data from a general experimental design. Our contributions are threefold. First, we present an estimator of the total treatment effect (also called the global average treatment effect) in a low-degree outcome model when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of cluster randomized designs with both Bernoulli and complete randomization. For clustered Bernoulli randomization, we find that our estimator is always unbiased and that its variance scales like the smaller of the variance obtained from a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. For clustered complete randomization, we find a notable bias-variance trade-off mediated by specific features of the clustering. Third, when choosing a clustered experimental design, our bounds can be used to select a clustering from a set of candidate clusterings. Across a range of graphs and clustering algorithms, we show that our method consistently selects clusterings that perform well on a range of response models, suggesting that our bounds are useful to practitioners.
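To convey the low-order-model idea in code, here is a simplified regression-style stand-in: outcomes follow a degree-1 model in own treatment and the treated-neighbour fraction, the model is fitted via a Moore-Penrose pseudoinverse, and the total treatment effect is imputed from the two extreme counterfactuals. The graph, design, and coefficients are toy assumptions; the paper's pseudoinverse estimator is design-aware and comes with the bias and variance bounds described above, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
A = (rng.random((n, n)) < 0.03).astype(float)   # toy interference graph
np.fill_diagonal(A, 0.0)
A = np.maximum(A, A.T)
deg = np.maximum(A.sum(axis=1), 1.0)

z = (rng.random(n) < 0.5).astype(float)         # Bernoulli(1/2) assignment
frac = A @ z / deg                              # treated-neighbour fraction
y = 1.0 + 0.8 * z + 1.2 * frac + rng.normal(0, 0.1, n)  # degree-1 outcomes

# Fit the low-order model via the pseudoinverse, then impute the all-treated
# and all-control counterfactuals to estimate the total treatment effect.
X = np.column_stack([np.ones(n), z, frac])
beta = np.linalg.pinv(X) @ y
tte_hat = beta[1] + beta[2]                     # both features go 0 -> 1
print("estimated TTE:", tte_hat, "(true value: 2.0)")
```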
We present exact non-Gaussian joint likelihoods for auto- and cross-correlation functions on arbitrarily masked spherical Gaussian random fields. Our considerations apply to spin-0 as well as spin-2 fields but are demonstrated here for the spin-2 weak-lensing correlation function. We motivate why this likelihood cannot be Gaussian and show how it can nevertheless be calculated exactly for any mask geometry and on a curved sky, as well as jointly for different angular-separation bins and redshift-bin combinations. Splitting our calculation into a large- and a small-scale part, we apply a computationally efficient approximation for the small scales that does not alter the overall non-Gaussian likelihood shape. To compare our exact likelihoods to correlation-function sampling distributions, we simulate a large number of weak-lensing maps, including shape noise, and find excellent agreement for one-dimensional as well as two-dimensional distributions. Furthermore, we compare the exact likelihood to the widely employed Gaussian likelihood and find significant levels of skewness at angular separations $\gtrsim 1^{\circ}$, such that the mode of the exact distributions is shifted away from the mean towards lower values of the correlation function. We also find that the assumption of a Gaussian random field for the weak-lensing field remains well justified at these angular separations. Considering the skewness of the non-Gaussian likelihood, we evaluate its impact on the posterior constraints on $S_8$. For a simplified weak-lensing-survey setup with an area of $10\,000\ \mathrm{deg}^2$, we find that the posterior mean of $S_8$ is up to $2\%$ higher when using the non-Gaussian likelihood, a shift comparable to the precision of current stage-III surveys.
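The non-Gaussianity has a familiar origin: a correlation-function estimator is a quadratic form $\hat{\xi} = \mathbf{x}^{\mathsf{T}} M \mathbf{x}$ in the Gaussian field vector $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \Sigma)$, and the characteristic function of such a form is known in closed form (a generic statement in our notation, not the paper's specific construction):
\[
\varphi_{\hat{\xi}}(t) = \mathbb{E}\!\left[ e^{\mathrm{i} t \, \mathbf{x}^{\mathsf{T}} M \mathbf{x}} \right] = \det\!\left( \mathbb{1} - 2 \mathrm{i} t \, M \Sigma \right)^{-1/2},
\]
so the exact, generically skewed likelihood follows by (joint) Fourier inversion of products of such factors rather than being Gaussian.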
This article advocates the use of conformal prediction (CP) methods for Gaussian process (GP) interpolation to enhance the calibration of prediction intervals. We begin by illustrating that using a GP model with parameters selected by maximum likelihood often results in predictions that are not optimally calibrated. CP methods can adjust the prediction intervals, leading to better uncertainty quantification while maintaining the accuracy of the underlying GP model. We compare different CP variants and introduce a novel variant based on an asymmetric score. Our numerical experiments demonstrate the effectiveness of CP methods in improving calibration without compromising accuracy. This work aims to facilitate the adoption of CP methods in the GP community.
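A minimal split-conformal sketch on top of a GP, using a normalized residual score, is shown below. The data, kernel, and score are illustrative assumptions, and the paper's asymmetric-score variant is not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(5)
f = lambda x: np.sin(3 * x).ravel()
X = rng.uniform(0, 3, (120, 1))
y = f(X) + rng.normal(0, 0.1, 120)
Xtr, ytr, Xcal, ycal = X[:80], y[:80], X[80:], y[80:]   # train/calibration split

gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2).fit(Xtr, ytr)
mu, sd = gp.predict(Xcal, return_std=True)

# Split-conformal calibration with a normalized score |y - mu| / sd: the
# empirical quantile of calibration scores rescales the GP's own intervals.
ALPHA = 0.1
scores = np.abs(ycal - mu) / sd
q = np.quantile(scores, np.ceil((1 - ALPHA) * (len(scores) + 1)) / len(scores))

m, s = gp.predict(np.linspace(0, 3, 200)[:, None], return_std=True)
lo, hi = m - q * s, m + q * s                  # calibrated ~90% intervals
print("average calibrated half-width:", ((hi - lo) / 2).mean())
```

If the maximum-likelihood GP is overconfident, q exceeds 1 and the intervals widen; if it is underconfident, q shrinks them, in both cases restoring the target coverage without touching the GP's predictive mean.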
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized, and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count. Uniform convergence has in fact been the most widely used tool in the deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound. With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.