Sharding distributed ledgers is a promising on-chain solution for scaling blockchains but lacks a formal grounding, nurturing skepticism about whether such complex systems can scale blockchains securely. We fill this gap by introducing the first formal framework, as well as a roadmap to robust sharding. In particular, we first define the properties sharded distributed ledgers should fulfill. We build upon and extend the Bitcoin backbone protocol by defining consistency and scalability. Consistency encompasses the need for atomic execution of cross-shard transactions to preserve safety, whereas scalability captures the speedup a sharded system can gain over a non-sharded one. Using our model, we explore the limitations of sharding. We show that a sharded ledger with $n$ participants cannot scale under a fully adaptive adversary, but it can scale up to $m$ shards, where $n=c'm\log m$, under an epoch-adaptive adversary; the constant $c'$ captures the trade-off between security and scalability. This is possible only if the sharded ledgers create succinct proofs of the valid state updates at every epoch. We leverage our results to identify the components sufficient for robust sharding, which we incorporate in a protocol abstraction termed Divide & Scale. To demonstrate the power of our framework, we analyze the most prominent sharded blockchains (Elastico, Monoxide, OmniLedger, RapidChain) and pinpoint where they fail to meet the desired properties.
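To make the scalability bound concrete, the following minimal sketch computes the largest number of shards $m$ permitted by $n = c'm\log m$ for a given participant count. The value of $c'$ used here is purely illustrative, since the constant depends on the concrete protocol's security analysis.

    import math

    def max_shards(n: int, c: float) -> int:
        """Largest m with c * m * log(m) <= n (natural log assumed)."""
        m = 1
        while c * (m + 1) * math.log(m + 1) <= n:
            m += 1
        return m

    # c' = 32 is an assumed, purely illustrative value; the trade-off
    # constant comes from the specific protocol's security analysis.
    for n in (1_000, 10_000, 100_000):
        print(n, "participants ->", max_shards(n, 32.0), "shards")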
We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and the additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and the runtime of any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with the square root of the number of samples needed in the hard regime, and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.
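A minimal sketch of the one-pass thresholding template follows; the statistic and threshold below are illustrative stand-ins rather than the paper's exact choices, but they share the $O(np)$ structure: one pass computing a per-coordinate score, then a comparison against a threshold.

    import numpy as np

    def detect_mixture(X: np.ndarray, y: np.ndarray, tau: float) -> bool:
        """Toy coordinate-wise thresholding detector, O(n p) time.

        Scores each coordinate by its normalized correlation with the
        response and reports a planted sparse signal if any score clears
        tau. The paper's statistic and threshold differ; this only
        illustrates the one-pass thresholding template.
        """
        scores = np.abs(X.T @ y) / np.linalg.norm(y)
        return bool(np.any(scores > tau))

    rng = np.random.default_rng(0)
    n, p, k = 2000, 2000, 10
    X = rng.standard_normal((n, p))
    beta = np.zeros(p); beta[:k] = 1.0
    labels = np.where(rng.random(n) < 0.7, 1.0, -1.0)  # asymmetric mixture
    y = labels * (X @ beta) + rng.standard_normal(n)
    print(detect_mixture(X, y, tau=4.5))  # typically True for this setup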
Traditional blockchains grant the miner of a block full control not only over which transactions are included but also over their order. This constitutes a major flaw, exposed by the introduction of decentralized finance, that allows miners to perform MEV attacks. In this paper, we address the issue of sandwich attacks by providing a construction that takes as input a blockchain protocol and outputs a new blockchain protocol with the same security but in which sandwich attacks are not profitable. Furthermore, our protocol is fully decentralized, with no trusted third parties or heavy cryptographic primitives, and incurs only a linear increase in latency and minimal computational overhead.
Due to the diffusion of the IoT, modern software systems are often designed to control and coordinate smart devices in order to manage assets and resources, and to guarantee efficient behaviours. For this class of systems, which interact extensively with humans and with their environment, it is crucial to guarantee correct behaviour in order to avoid unexpected and possibly dangerous situations. In this paper we present a framework that allows us to measure the robustness of systems: the ability of a program to tolerate changes in the environmental conditions while preserving its original behaviour. In the proposed framework, the interaction of a program with its environment is represented as a sequence of random variables describing how both evolve in time. For this reason, the considered measures are defined between probability distributions of observed data. The proposed framework is then used to define the notions of adaptability and reliability. The former indicates the ability of a program to absorb perturbations of the environmental conditions after a given amount of time. The latter expresses the ability of a program to maintain its intended behaviour (up to some reasonable tolerance) despite the presence of perturbations in the environment. Moreover, an algorithm based on statistical inference is proposed to evaluate the proposed metrics and the aforementioned properties. We use two case studies to describe and evaluate the proposed approach.
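As a minimal illustration of the idea, assume a one-dimensional observed quantity and take the 1-D Wasserstein distance as a stand-in for the paper's metric (which may differ): the distance between the nominal and perturbed behaviour is estimated directly from samples.

    import numpy as np

    def empirical_wasserstein(a: np.ndarray, b: np.ndarray) -> float:
        """1-D Wasserstein-1 distance between two empirical samples,
        computed from matched quantiles."""
        qs = np.linspace(0.0, 1.0, 201)
        return float(np.mean(np.abs(np.quantile(a, qs) - np.quantile(b, qs))))

    def behaviour_preserved(nominal: np.ndarray, perturbed: np.ndarray,
                            eta: float) -> bool:
        """Illustrative reliability-style check: observed distributions
        must stay within tolerance eta (not the paper's exact notion)."""
        return empirical_wasserstein(nominal, perturbed) <= eta

    rng = np.random.default_rng(1)
    nominal = rng.normal(20.0, 1.0, 5000)     # e.g. an observed temperature
    perturbed = rng.normal(20.5, 1.2, 5000)   # after an environmental change
    print(behaviour_preserved(nominal, perturbed, eta=1.0))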
Motivated by control under communication constraints, in this work we develop a time-invariant data compression architecture for linear-quadratic-Gaussian (LQG) control with minimum-bitrate prefix-free feedback. For any fixed control performance, the approach we propose nearly achieves known directed information (DI) lower bounds on the time-average expected codeword length. We refine the analysis of a classical achievability approach, which required quantized plant measurements to be encoded via a time-varying lossless source code. We prove that the sequence of random variables describing the quantizations has a limiting distribution, and that the quantizations may be encoded with a fixed source code optimized for this distribution without added time-asymptotic redundancy. Our result follows from analyzing the long-term stochastic behavior of the system, and additionally allows us to guarantee that the time-average codeword length (as opposed to its expectation) is almost surely within a few bits of the minimum DI. To our knowledge, this time-invariant achievability result is the first in the literature. The originally published version of the supplementary material included a proof containing an error that turned out to be inconsequential; this updated preprint corrects the error, which appeared in Lemma A.7.
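The time-invariant encoding step can be pictured as follows: a single prefix-free code (e.g., a Huffman code) is built once for the quantizer's limiting cell distribution and reused at every time step, rather than re-optimizing a code per step. A minimal sketch, with an illustrative distribution standing in for the limiting one:

    import heapq
    from itertools import count

    def huffman_lengths(probs):
        """Codeword lengths of a Huffman code for a fixed distribution.

        Sketch of the time-invariant encoding step: one prefix-free code
        optimized for the quantizer's limiting cell distribution, reused
        at every time step (symbols and probabilities are illustrative).
        """
        tiebreak = count()
        heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for i in s1 + s2:
                lengths[i] += 1           # these symbols gain one bit
            heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
        return lengths

    print(huffman_lengths([0.5, 0.25, 0.15, 0.1]))  # -> [1, 2, 3, 3]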
Centralized data silos are not only becoming prohibitively expensive but also raise issues of data ownership and data availability. These developments affect industry, researchers, and ultimately society in general. Decentralized storage solutions present a promising alternative, and such systems can become a crucial layer for new paradigms of edge-centric computing and web3 applications. Decentralized storage solutions based on p2p networks can enable scalable and self-sustaining open-source infrastructures. However, like other p2p systems, they require well-designed incentive mechanisms for participating peers. These mechanisms should be not only effective but also fair with regard to individual participants. Even though several such systems have been studied in deployment, a systematic understanding of these issues is still lacking. We investigate the interplay between incentive mechanisms, network characteristics, and the fairness of peer rewards. In particular, we identify and evaluate three core, up-to-date reward mechanisms for moving data in p2p networks: distance-based payments, reciprocity, and time-limited free service. Distance-based payments are relevant since libp2p Kademlia, which enables distance-based algorithms for content lookup and retrieval, is part of various modern p2p systems. We base our Tit-for-Token model on the Swarm network, which uses a combination of the three mechanisms, and develop a tool to explore the behavior of these payment mechanisms. Our evaluation provides novel insights into the functioning and interplay of these mechanisms. Based on these insights, we propose modifications to the mechanisms that better address fairness concerns, and we outline improvement proposals for the Swarm network.
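For intuition on distance-based payments, the toy below computes the Kademlia-style XOR distance between a peer address and a chunk address and derives a price from it. The price schedule here is a made-up stand-in for illustration only; Swarm's actual schedule differs.

    import hashlib

    def xor_distance(a: bytes, b: bytes) -> int:
        """Kademlia-style XOR distance between two 256-bit addresses."""
        return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

    def proximity(a: bytes, b: bytes) -> int:
        """Number of leading bits the two addresses share."""
        d = xor_distance(a, b)
        return 256 if d == 0 else 256 - d.bit_length()

    def price(peer: bytes, chunk: bytes, base: int = 1024) -> int:
        """Illustrative distance-based payment: forwarding a chunk to a
        closer peer costs less (not Swarm's real price schedule)."""
        return max(base - proximity(peer, chunk) * 8, 8)

    peer = hashlib.sha256(b"peer-1").digest()
    chunk = hashlib.sha256(b"some-chunk").digest()
    print(price(peer, chunk))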
Mobile Edge Computing (MEC) is expected to play a significant role in the development of 6G networks, as new applications such as cooperative driving and eXtended Reality (XR) require both communication and computational resources from the network edge. However, the limited capabilities of edge servers may be strained when performing complex computational tasks within strict latency bounds for multiple clients. In these contexts, both maintaining a low average Age of Information (AoI) and guaranteeing a low Peak AoI (PAoI) even in the worst case may have significant user experience and safety implications. In this work, we investigate a theoretical model of a MEC server, deriving the expected AoI as well as the PAoI and latency distributions under the First In First Out (FIFO) and Generalized Processor Sharing (GPS) resource allocation policies. We consider both synchronized and unsynchronized systems, and draw insights on the robust design of resource allocation policies from the analytical results.
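For intuition, the following Monte-Carlo sketch traces the AoI saw-tooth of a single FIFO M/M/1 status-update queue, a classical toy stand-in (the paper's model, with multiple clients, GPS, and synchronization, is richer). For $\rho = \lambda/\mu = 0.5$ the classical closed form $\frac{1}{\mu}\left(1 + \frac{1}{\rho} + \frac{\rho^2}{1-\rho}\right)$ gives an average AoI of 3.5, which the simulation should approach.

    import random

    def aoi_fifo_mm1(lam: float, mu: float, n: int = 200_000, seed: int = 0):
        """Average AoI and mean peak AoI for an M/M/1 FIFO update queue."""
        rng = random.Random(seed)
        g_prev = d_prev = 0.0   # generation/delivery time of last delivered update
        g = dep = 0.0
        area = peak_sum = 0.0
        for _ in range(n):
            g += rng.expovariate(lam)                 # update generated
            dep = max(dep, g) + rng.expovariate(mu)   # FIFO service completes
            # trapezoid of the AoI saw-tooth between consecutive deliveries
            area += ((dep - g_prev) ** 2 - (d_prev - g_prev) ** 2) / 2.0
            peak_sum += dep - g_prev                  # age just before delivery
            g_prev, d_prev = g, dep
        return area / dep, peak_sum / n

    avg_aoi, mean_paoi = aoi_fifo_mm1(lam=0.5, mu=1.0)
    print(round(avg_aoi, 2), round(mean_paoi, 2))     # ~3.5 and ~4.0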
Mutual coherence is a measure of the similarity of two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters find the candidates closest to their political preferences. Exact computation of mutual coherence is highly time-consuming, as it iterates over all subsets of an opinion and, for every subset, solves an instance of the SAT model counting problem, which is #P-complete. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated by the expected value of this distribution. Some of the presented algorithms are fully polynomial-time; others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035, which is negligible given the efficiency gained. Furthermore, the accuracy is sufficient for use in Wahl-O-Mat-like systems.
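The general shape of the approximation can be sketched as follows: fit a three-component Gaussian mixture to sampled confirmation values and return the mixture's mean as the coherence estimate. Note that the paper estimates the mixture parameters with bespoke heuristics rather than plain EM, and the synthetic data below is an illustrative stand-in.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def approximate_coherence(confirmation_values: np.ndarray) -> float:
        """Fit a 3-component Gaussian mixture to sampled confirmation
        values; the mixture mean approximates the mutual coherence.
        (The paper uses bespoke parameter-estimation heuristics, not EM.)"""
        gmm = GaussianMixture(n_components=3, random_state=0)
        gmm.fit(confirmation_values.reshape(-1, 1))
        return float(gmm.weights_ @ gmm.means_.ravel())

    # Synthetic stand-in for confirmation values of subsets of an opinion:
    rng = np.random.default_rng(0)
    vals = np.concatenate([rng.normal(-0.4, 0.1, 300),
                           rng.normal(0.0, 0.05, 300),
                           rng.normal(0.5, 0.1, 400)])
    print(round(approximate_coherence(vals), 3))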
The Data Science domain has expanded monumentally in both research and industry communities over the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing further complexity to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time in these pipelines is spent on data preprocessing, and hence improving its efficiency directly impacts overall pipeline performance. The community has recently embraced the concept of Dataframes as the de facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations even on moderately large data sets. We believe that there is plenty of room for improvement by approaching this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we expand on the initial concept by introducing a cost model for evaluating these patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.
Given a graph, the shortest-path problem requires finding a sequence of edges of minimum cumulative length that connects a source vertex to a target vertex. We consider a variant of this classical problem in which the position of each vertex in the graph is a continuous decision variable constrained to lie in a convex set, and the length of an edge is a convex function of the positions of its endpoints. Problems of this form arise naturally in many areas, from motion planning of autonomous vehicles to optimal control of hybrid systems. The price for such wide applicability is the complexity of this problem, which is easily seen to be NP-hard. Our main contribution is a strong and lightweight mixed-integer convex formulation, based on perspective operators, that makes it possible to efficiently find globally optimal paths in large graphs and in high-dimensional spaces.
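As a schematic illustration of the perspective machinery (notation ours, and omitting the additional valid constraints that make the paper's relaxation strong): with a binary indicator $y_e$ per edge $e=(u,v)$ and auxiliary variables $z_e^u, z_e^v$ standing for $y_e x_u, y_e x_v$, the objective uses the perspective of each edge length,
$$\min_{y,\,z}\; \sum_{e=(u,v)\in E} y_e\,\ell_e\!\left(\frac{z_e^u}{y_e},\,\frac{z_e^v}{y_e}\right) \quad \text{s.t.} \quad y\in\{0,1\}^{|E|} \text{ encodes a source-target path},\;\; z_e^u\in y_e\,\mathcal{X}_u,\;\; z_e^v\in y_e\,\mathcal{X}_v,$$
which is jointly convex in $(z, y)$ once $y_e \in [0,1]$ is relaxed, so off-the-shelf mixed-integer convex solvers apply.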
Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them once the corresponding knowledge eventually becomes available. This motivates us to propose a novel computer vision problem called `Open World Object Detection', in which a model is tasked to: 1) identify objects that have not been introduced to it as `unknown', without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol, and provide a novel solution, which we call ORE: Open World Object Detector, based on contrastive clustering and energy-based unknown identification. Our experimental evaluation and ablation studies analyze the efficacy of ORE in achieving the Open World objectives. As an interesting by-product, we find that identifying and characterizing unknown instances helps reduce confusion in an incremental object detection setting, where we achieve state-of-the-art performance with no extra methodological effort. We hope that our work will attract further research into this newly identified, yet crucial, research direction.
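To give the flavor of energy-based unknown identification, the sketch below computes the standard free-energy score from classification logits; low energy suggests a confident known class, high energy a candidate unknown. This is a generic energy score for illustration: ORE additionally fits distributions over such energies for known versus unknown instances.

    import torch

    def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
        """Free-energy score E(x) = -T * logsumexp(logits / T).
        Low energy ~ confident known class; high energy ~ candidate
        `unknown' (illustrative; ORE's full formulation goes further)."""
        return -temperature * torch.logsumexp(logits / temperature, dim=-1)

    logits = torch.tensor([[8.0, 1.0, 0.5],    # confident known detection
                           [0.4, 0.3, 0.5]])   # diffuse: likely unknown
    print(energy_score(logits))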