As global governments intensify efforts to operationalize retail central bank digital currencies (CBDCs), the imperative for architectures that preserve user privacy has never been more pronounced. This paper advances an existing retail CBDC framework developed at University College London. Utilizing the capabilities of the Comet research framework, our proposed design allows users to retain direct custody of their assets without the need for intermediary service providers, all while preserving transactional anonymity. The study unveils a novel technique to expedite the retrieval of Proof of Provenance, significantly accelerating the verification of transaction legitimacy through the refinement of Merkle Trie structures. In parallel, we introduce a streamlined Digital Ledger designed to offer fast, immutable, and decentralized transaction validation within a permissioned ecosystem. The ultimate objective of this research is to benchmark the performance of the legacy system formulated by the original Comet research team against the newly devised system elucidated in this paper. Our endeavour is to establish a foundational design for a scalable national infrastructure proficient in seamlessly processing thousands of transactions in real-time, without compromising consumer privacy or data integrity.
Power grids are critical infrastructures of paramount importance to modern society and, therefore, engineered to operate under diverse conditions and failures. The ongoing energy transition poses new challenges for the decision-makers and system operators. Therefore, we must develop grid analysis algorithms to ensure reliable operations. These key tools include power flow analysis and system security analysis, both needed for effective operational and strategic planning. The literature review shows a growing trend of machine learning (ML) models that perform these analyses effectively. In particular, Graph Neural Networks (GNNs) stand out in such applications because of the graph-based structure of power grids. However, there is a lack of publicly available graph datasets for training and benchmarking ML models in electrical power grid applications. First, we present PowerGraph, which comprises GNN-tailored datasets for i) power flows, ii) optimal power flows, and iii) cascading failure analyses of power grids. Second, we provide ground-truth explanations for the cascading failure analysis. Finally, we perform a complete benchmarking of GNN methods for node-level and graph-level tasks and explainability. Overall, PowerGraph is a multifaceted GNN dataset for diverse tasks that includes power flow and fault scenarios with real-world explanations, providing a valuable resource for developing improved GNN models for node-level, graph-level tasks and explainability methods in power system modeling. The dataset is available at //figshare.com/articles/dataset/PowerGraph/22820534 and the code at //github.com/PowerGraph-Datasets.
Based on the analysis of the data obtainable from the Web of Science publication and citation database, typical signs of possible papermilling behavior are described, quantified, and illustrated by examples. A MATLAB function is provided for the analysis of the outputs from the Web of Science. A new quantitative indicator -- integrity index, or I-index -- is proposed for using it along with standard bibliographic and scientometric indicators.
A central challenge in the verification of quantum computers is benchmarking their performance as a whole and demonstrating their computational capabilities. In this work, we find a universal model of quantum computation, Bell sampling, that can be used for both of those tasks and thus provides an ideal stepping stone towards fault-tolerance. In Bell sampling, we measure two copies of a state prepared by a quantum circuit in the transversal Bell basis. We show that the Bell samples are classically intractable to produce and at the same time constitute what we call a circuit shadow: from the Bell samples we can efficiently extract information about the quantum circuit preparing the state, as well as diagnose circuit errors. In addition to known properties that can be efficiently extracted from Bell samples, we give several new and efficient protocols: an estimator of state fidelity, a test for the depth of the circuit and an algorithm to estimate a lower bound to the number of T gates in the circuit. With some additional measurements, our algorithm learns a full description of states prepared by circuits with low T-count.
This article aims to study efficient/trace optimal designs for crossover trials with multiple responses recorded from each subject in the time periods. A multivariate fixed effects model is proposed with direct and carryover effects corresponding to the multiple responses. The corresponding error dispersion matrix is chosen to be either of the proportional or the generalized Markov covariance type, permitting the existence of direct and cross-correlations within and between the multiple responses. The corresponding information matrices for direct effects under the two types of dispersions are used to determine efficient designs. The efficiency of orthogonal array designs of Type $I$ and strength $2$ is investigated for a wide choice of covariance functions, namely, Mat($0.5$), Mat($1.5$) and Mat($\infty$). To motivate these multivariate crossover designs, a gene expression dataset in a $3 \times 3$ framework is utilized.
Recurrent neural networks (RNNs) notoriously struggle to learn long-term memories, primarily due to vanishing and exploding gradients. The recent success of state-space models (SSMs), a subclass of RNNs, to overcome such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. Our analysis further reveals the importance of the element-wise recurrence design pattern combined with careful parametrizations in mitigating this effect. This feature is present in SSMs, as well as in other architectures, such as LSTMs. Overall, our insights provide a new explanation for some of the difficulties in gradient-based learning of RNNs and why some architectures perform better than others.
Evaluating observational estimators of causal effects demands information that is rarely available: unconfounded interventions and outcomes from the population of interest, created either by randomization or adjustment. As a result, it is customary to fall back on simulators when creating benchmark tasks. Simulators offer great control but are often too simplistic to make challenging tasks, either because they are hand-designed and lack the nuances of real-world data, or because they are fit to observational data without structural constraints. In this work, we propose a general, repeatable strategy for turning observational data into sequential structural causal models and challenging estimation tasks by following two simple principles: 1) fitting real-world data where possible, and 2) creating complexity by composing simple, hand-designed mechanisms. We implement these ideas in a highly configurable software package and apply it to the well-known Adult income data set to construct the \tt IncomeSCM simulator. From this, we devise multiple estimation tasks and sample data sets to compare established estimators of causal effects. The tasks present a suitable challenge, with effect estimates varying greatly in quality between methods, despite similar performance in the modeling of factual outcomes, highlighting the need for dedicated causal estimators and model selection criteria.
This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs) with the aim of enhancing the understanding of their underlying mechanisms. Despite numerous defense methods proposed in the literature, there is still an incomplete understanding of this phenomenon. Instead of treating the entire model as vulnerable, we propose that specific feature maps learned during training contribute to the overall vulnerability. To investigate how the hidden representations learned by a CNN affect its vulnerability, we introduce the Adversarial Intervention framework. Experiments were conducted on models trained on three well-known computer vision datasets, subjecting them to attacks of different nature. Our focus centers on the effects that adversarial perturbations to a model's initial layer have on the overall behavior of the model. Empirical results revealed compelling insights: a) perturbing selected channel combinations in shallow layers causes significant disruptions; b) the channel combinations most responsible for the disruptions are common among different types of attacks; c) despite shared vulnerable combinations of channels, different attacks affect hidden representations with varying magnitudes; d) there exists a positive correlation between a kernel's magnitude and its vulnerability. In conclusion, this work introduces a novel framework to study the vulnerability of a CNN model to adversarial perturbations, revealing insights that contribute to a deeper understanding of the phenomenon. The identified properties pave the way for the development of efficient ad-hoc defense mechanisms in future applications.
We report an implementation of the McMurchie-Davidson (MD) algorithm for 3-center and 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with low and high angular momenta $l$ and varying degrees of contraction for graphical processing units (GPUs). This work builds upon our recent implementation of a matrix form of the MD algorithm that is efficient for GPU evaluation of 4-center 2-particle integrals over Gaussian AOs of high angular momenta ($l\geq 4$) [$\mathit{J. Phys. Chem. A}\ \mathbf{127}$, 10889 (2023)]. The use of unconventional data layouts and three variants of the MD algorithm allow to evaluate integrals in double precision with sustained performance between 25% and 70% of the theoretical hardware peak. Performance assessment includes integrals over AOs with $l\leq 6$ (higher $l$ is supported). Preliminary implementation of the Hartree-Fock exchange operator is presented and assessed for computations with up to quadruple-zeta basis and more than 20,000 AOs. The corresponding C++ code is a part of the experimental open-source $\mathtt{LibintX}$ library available at $\mathbf{github.com:ValeevGroup/LibintX}$.
In the contemporary data landscape characterized by multi-source data collection and third-party sharing, ensuring individual privacy stands as a critical concern. While various anonymization methods exist, their utility preservation and privacy guarantees remain challenging to quantify. In this work, we address this gap by studying the utility and privacy of the spectral anonymization (SA) algorithm, particularly in an asymptotic framework. Unlike conventional anonymization methods that directly modify the original data, SA operates by perturbing the data in a spectral basis and subsequently reverting them to their original basis. Alongside the original version $\mathcal{P}$-SA, employing random permutation transformation, we introduce two novel SA variants: $\mathcal{J}$-spectral anonymization and $\mathcal{O}$-spectral anonymization, which employ sign-change and orthogonal matrix transformations, respectively. We show how well, under some practical assumptions, these SA algorithms preserve the first and second moments of the original data. Our results reveal, in particular, that the asymptotic efficiency of all three SA algorithms in covariance estimation is exactly 50% when compared to the original data. To assess the applicability of these asymptotic results in practice, we conduct a simulation study with finite data and also evaluate the privacy protection offered by these algorithms using distance-based record linkage. Our research reveals that while no method exhibits clear superiority in finite-sample utility, $\mathcal{O}$-SA distinguishes itself for its exceptional privacy preservation, never producing identical records, albeit with increased computational complexity. Conversely, $\mathcal{P}$-SA emerges as a computationally efficient alternative, demonstrating unmatched efficiency in mean estimation.
Knowledge graphs (KGs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge graphs are typically incomplete, it is useful to perform knowledge graph completion or link prediction, i.e. predict whether a relationship not in the knowledge graph is likely to be true. This paper serves as a comprehensive survey of embedding models of entities and relationships for knowledge graph completion, summarizing up-to-date experimental results on standard benchmark datasets and pointing out potential future research directions.