Today, digital identity management for individuals is either inconvenient and error-prone or creates undesirable lock-in effects and violates privacy and security expectations. These shortcomings inhibit the digital transformation in general and seem particularly concerning in the context of novel applications such as access control for decentralized autonomous organizations and identification in the Metaverse. Decentralized or self-sovereign identity (SSI) aims to offer a solution to this dilemma by empowering individuals to manage their digital identity through machine-verifiable attestations stored in a "digital wallet" application on their edge devices. However, when presented to a relying party, these attestations typically reveal more attributes than required and allow tracking end users' activities. Several academic works and practical solutions exist to reduce or avoid such excessive information disclosure, from simple selective disclosure to data-minimizing anonymous credentials based on zero-knowledge proofs (ZKPs). We first demonstrate that the SSI solutions that are currently built with anonymous credentials still lack essential features such as scalable revocation, certificate chaining, and integration with secure elements. We then argue that general-purpose ZKPs in the form of zk-SNARKs can appropriately address these pressing challenges. We describe our implementation and conduct performance tests on different edge devices to illustrate that the performance of zk-SNARK-based anonymous credentials is already practical. We also discuss further advantages that general-purpose ZKPs can easily provide for digital wallets, for instance, to create "designated verifier presentations" that facilitate new design options for digital identity infrastructures that previously were not accessible because of the threat of man-in-the-middle attacks.
Neural networks excel at discovering statistical patterns in high-dimensional data sets. In practice, higher-order cumulants, which quantify the non-Gaussian correlations between three or more variables, are particularly important for the performance of neural networks. But how efficient are neural networks at extracting features from higher-order cumulants? We study this question in the spiked cumulant model, where the statistician needs to recover a privileged direction or "spike" from the order-$p\ge 4$ cumulants of~$d$-dimensional inputs. We first characterise the fundamental statistical and computational limits of recovering the spike by analysing the number of samples~$n$ required to strongly distinguish between inputs from the spiked cumulant model and isotropic Gaussian inputs. We find that statistical distinguishability requires $n\gtrsim d$ samples, while distinguishing the two distributions in polynomial time requires $n \gtrsim d^2$ samples for a wide class of algorithms, i.e. those covered by the low-degree conjecture. These results suggest the existence of a wide statistical-to-computational gap in this problem. Numerical experiments show that neural networks learn to distinguish the two distributions with quadratic sample complexity, while "lazy" methods like random features are not better than random guessing in this regime. Our results show that neural networks extract information from higher-order correlations in the spiked cumulant model efficiently, and reveal a large gap in the amount of data required by neural networks and random features to learn from higher-order cumulants.
Dependence is undoubtedly a central concept in statistics. Though, it proves difficult to locate in the literature a formal definition which goes beyond the self-evident 'dependence = non-independence'. This absence has allowed the term 'dependence' and its declination to be used vaguely and indiscriminately for qualifying a variety of disparate notions, leading to numerous incongruities. For example, the classical Pearson's, Spearman's or Kendall's correlations are widely regarded as 'dependence measures' of major interest, in spite of returning 0 in some cases of deterministic relationships between the variables at play, evidently not measuring dependence at all. Arguing that research on such a fundamental topic would benefit from a slightly more rigid framework, this paper suggests a general definition of the dependence between two random variables defined on the same probability space. Natural enough for aligning with intuition, that definition is still sufficiently precise for allowing unequivocal identification of a 'universal' representation of the dependence structure of any bivariate distribution. Links between this representation and familiar concepts are highlighted, and ultimately, the idea of a dependence measure based on that universal representation is explored and shown to satisfy Renyi's postulates.
Microring resonators (MRRs) are promising devices for time-delay photonic reservoir computing, but the impact of the different physical effects taking place in the MRRs on the reservoir computing performance is yet to be fully understood. We numerically analyze the impact of linear losses as well as thermo-optic and free-carrier effects relaxation times on the prediction error of the time-series task NARMA-10. We demonstrate the existence of three regions, defined by the input power and the frequency detuning between the optical source and the microring resonance, that reveal the cavity transition from linear to nonlinear regimes. One of these regions offers very low error in time-series prediction under relatively low input power and number of nodes while the other regions either lack nonlinearity or become unstable. This study provides insight into the design of the MRR and the optimization of its physical properties for improving the prediction performance of time-delay reservoir computing.
Nowadays, machine learning algorithms continue to grow in complexity and require a substantial amount of computational resources and energy. For these reasons, there is a growing awareness of the development of new green algorithms and distributed AI can contribute to this. Federated learning (FL) is one of the most active research lines in machine learning, as it allows the training of collaborative models in a distributed way, an interesting option in many real-world environments, such as the Internet of Things, allowing the use of these models in edge computing devices. In this work, we present a FL method, based on a neural network without hidden layers, capable of generating a global collaborative model in a single training round, unlike traditional FL methods that require multiple rounds for convergence. This allows obtaining an effective and efficient model that simplifies the management of the training process. Moreover, this method preserve data privacy by design, a crucial aspect in current data protection regulations. We conducted experiments with large datasets and a large number of federated clients. Despite being based on a network model without hidden layers, it maintains in all cases competitive accuracy results compared to more complex state-of-the-art machine learning models. Furthermore, we show that the method performs equally well in both identically and non-identically distributed scenarios. Finally, it is an environmentally friendly algorithm as it allows significant energy savings during the training process compared to its centralized counterpart.
Text normalization is a crucial technology for low-resource languages which lack rigid spelling conventions or that have undergone multiple spelling reforms. Low-resource text normalization has so far relied upon hand-crafted rules, which are perceived to be more data efficient than neural methods. In this paper we examine the case of text normalization for Ligurian, an endangered Romance language. We collect 4,394 Ligurian sentences paired with their normalized versions, as well as the first open source monolingual corpus for Ligurian. We show that, in spite of the small amounts of data available, a compact transformer-based model can be trained to achieve very low error rates by the use of backtranslation and appropriate tokenization.
As a crossover frontier of physics and mechanics, quantum computing is showing its great potential in computational mechanics. However, quantum hardware noise remains a critical barrier to achieving accurate simulation results due to the limitation of the current hardware level. In this paper, we integrate error-mitigated quantum computing in data-driven computational mechanics, where the zero-noise extrapolation (ZNE) technique is employed to improve the accuracy of quantum computing. Numerical examples including multiscale simulation of a composite L-shaped beam are conducted with the quantum computer simulator Qpanda, and the results validate the effectiveness of the proposed method. We believe this work presents a promising step towards using the power of quantum computing in computational mechanics.
Deep neural networks (DNNs) often fail silently with over-confident predictions on out-of-distribution (OOD) samples, posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest neighbor gradients derived from the training set, yielding a robust OOD score. Experimental evaluations substantiate that the introduction of targeted input mix-upamplifies the separation between ID and OOD in the gradient space, yielding impressive results across diverse datasets. Notably, when benchmarked against ImageNet-1k, GROOD surpasses the established robustness of state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNN in image classification.
High-order tensor methods for solving both convex and nonconvex optimization problems have generated significant research interest, leading to algorithms with optimal global rates of convergence and local rates that are faster than Newton's method. On each iteration, these methods require the unconstrained local minimization of a (potentially nonconvex) multivariate polynomial of degree higher than two, constructed using third-order (or higher) derivative information, and regularized by an appropriate power of regularization. Developing efficient techniques for solving such subproblems is an ongoing topic of research, and this paper addresses the case of the third-order tensor subproblem. We propose the CQR algorithmic framework, for minimizing a nonconvex Cubic multivariate polynomial with Quartic Regularisation, by minimizing a sequence of local quadratic models that incorporate simple cubic and quartic terms. The role of the cubic term is to crudely approximate local tensor information, while the quartic one controls model regularization and progress. We provide necessary and sufficient optimality conditions that fully characterise the global minimizers of these cubic-quartic models. We then turn these conditions into secular equations that can be solved using nonlinear eigenvalue techniques. We show, using our optimality characterisations, that a CQR algorithmic variant has the optimal-order evaluation complexity of $\mathcal{O}(\epsilon^{-3/2})$ when applied to minimizing our quartically-regularised cubic subproblem, which can be further improved in special cases. We propose practical CQR variants that use local tensor information to construct the local cubic-quartic models. We test these variants numerically and observe them to be competitive with ARC and other subproblem solvers on typical instances and even superior on ill-conditioned subproblems with special structure.
Conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called "computational random-access memory (CRAM)" has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there lacks an experimental demonstration and study of CRAM to evaluate its computation accuracy, which is a realistic and application-critical metrics for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations as well as 2-, 3-, and 5-input logic operations are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of modeling has been developed to characterize the accuracy of CRAM computation. Further analysis of scalar addition, multiplication, and matrix multiplication shows promising results. These results are then applied to a complete application: a neural network based handwritten digit classifier, as an example to show the connection between the application performance and further MTJ development. The classifier achieved almost-perfect classification accuracy, with reasonable projections of future MTJ development. With the confirmation of MTJ-based CRAM's accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.
Deep learning is usually described as an experiment-driven field under continuous criticizes of lacking theoretical foundations. This problem has been partially fixed by a large volume of literature which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized in six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drives the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns in ethics and security and their relationships with generalizability.