The Swapped Dragonfly with $M$ routers per group and $K$ global ports per router is denoted D3(K,M) [1]. It has $n = KM^2$ routers and is a partially populated Dragonfly. This paper studies the Swapped Dragonfly under restrictions on $K$ and $M$, in four cases. (1) Matrix product: if $K$ is a perfect square, a matrix product of size $n$ can be performed in $\sqrt{n}$ rounds. (2) All-to-all exchange: if $K$ and $M$ have a common factor $s$, an all-to-all exchange can be performed in $n/s$ rounds. (3) Broadcast: if D3(K,M) is equipped with a synchronized source-vector header, it can perform $x$ broadcasts in $3x/M$ rounds. (4) Ascend-descend: if $K$ and $M$ are powers of 2, an ascend-descend algorithm can be performed at twice the cost of the algorithm on a Boolean hypercube of size $n$. In each case the algorithm on the Swapped Dragonfly is free of link conflicts, and it is compared with algorithms on a hypercube as well as on the fully populated Dragonfly. The results on the Swapped Dragonfly are more broadly applicable than these special cases suggest, because D3(K,M) contains emulations of every Swapped Dragonfly D3(J,L) with $J \le K$ and/or $L \le M$.
Keywords: Swapped Interconnection Network, Matrix Product, All-to-all, Universal Exchange, Boolean Hypercube, Ascend-descend algorithm, Broadcast, Edge-disjoint spanning tree.
References
[1] R. Draper. The Swapped Dragonfly. arXiv:2202.01843.
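The round counts quoted above can be tabulated for concrete parameters. The helper below is hypothetical (it is not from [1]): it uses $s = \gcd(K, M)$ as the common factor, since a larger $s$ gives fewer rounds, and it assumes the hypercube ascend-descend baseline costs $\log_2 n$ rounds. Cases whose precondition fails are reported as `None`.

```python
import math


def d3_round_counts(K: int, M: int, x: int = 1) -> dict:
    """Round counts for D3(K,M) as stated in the abstract (hypothetical helper).

    n = K * M^2 routers. An entry is None when that case's precondition fails.
    """
    n = K * M * M
    counts = {"n": n}

    # Matrix product: K must be a perfect square; then sqrt(n) = sqrt(K) * M exactly.
    r = math.isqrt(K)
    counts["matrix_product"] = r * M if r * r == K else None

    # All-to-all: a common factor s > 1 gives n/s rounds; gcd is the largest such s.
    s = math.gcd(K, M)
    counts["all_to_all"] = n // s if s > 1 else None

    # Broadcast: x broadcasts in 3x/M rounds (assumes the synchronized source-vector header).
    counts["broadcast"] = 3 * x / M

    # Ascend-descend: K and M powers of 2; twice an assumed log2(n)-round hypercube cost.
    def is_pow2(v: int) -> bool:
        return v > 0 and v & (v - 1) == 0

    if is_pow2(K) and is_pow2(M):
        counts["ascend_descend"] = 2 * (n.bit_length() - 1)
    else:
        counts["ascend_descend"] = None
    return counts
```

For example, $K = 4$, $M = 2$ gives $n = 16$, a 4-round matrix product and an 8-round all-to-all exchange, while $K = 3$, $M = 2$ satisfies neither of those two preconditions.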
Communication and computation are often viewed as separate tasks. This approach is very effective from an engineering perspective, as each task can be optimized in isolation. On the other hand, there are many cases where the main interest is a function of the local information at the devices rather than the local information itself. For such scenarios, information-theoretic results show that harnessing the interference in a multiple-access channel for computation, i.e., over-the-air computation (OAC), can provide a significantly higher achievable computation rate than separating the communication and computation tasks. Moreover, the gap between OAC and separation in terms of computation rate increases with the number of participating nodes. Given this motivation, in this study, we provide a comprehensive survey of practical OAC methods. After outlining the fundamentals of OAC, we discuss the available OAC schemes with their pros and cons. We then provide an overview of the enabling mechanisms and relevant metrics to achieve reliable computation in the wireless channel. Finally, we summarize the potential applications of OAC and point out some future directions.
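A toy model of the basic OAC idea: each device pre-scales its value by the inverse of its channel coefficient, all devices transmit at once, and the superposition in the multiple-access channel hands the receiver the sum in a single channel use. Perfect channel knowledge, ideal channel inversion, and the Gaussian noise model are illustrative assumptions, not a particular scheme from the survey.

```python
import random


def oac_sum(values, channels, noise_std=0.0, seed=0):
    """Toy over-the-air sum with ideal channel inversion (illustrative assumption).

    values   -- real-valued local data x_k at each device
    channels -- complex channel coefficients h_k, assumed known at the devices
    """
    rng = random.Random(seed)
    received = 0j
    for x, h in zip(values, channels):
        tx = x / h            # device k pre-equalizes by 1/h_k
        received += h * tx    # the channel scales by h_k; signals superpose in the air
    # additive complex Gaussian receiver noise
    received += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
    return received.real      # the receiver reads the sum directly


vals = [1.0, 2.0, 3.0]
hs = [0.5 + 0.5j, 1.0 - 0.2j, 0.3 + 1.0j]
print(oac_sum(vals, hs, noise_std=0.01))   # approximately sum(vals) = 6.0
```

The receiver never decodes the individual values. It obtains only the function it needs, which is the source of the rate advantage over separate communication and computation.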
We study the use of inverse harmonic Rayleigh quotients with target for the stepsize selection in gradient methods for nonlinear unconstrained optimization problems. This not only provides an elegant and flexible framework for parametrizing and reinterpreting existing stepsize schemes, but also inspires new flexible and tunable families of steplengths. In particular, we analyze and extend the adaptive Barzilai-Borwein method to a new family of stepsizes. While this family exploits negative values for the target, we also consider positive targets. We present a convergence analysis for quadratic problems extending results by Dai and Liao (2002), and carry out experiments illustrating the potential of the approaches.
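For a quadratic with Hessian $A$ we have $y = As$. Under one common definition of the harmonic Rayleigh quotient with target $\tau$, $\theta = \tau + \frac{s^\top (A - \tau I)^2 s}{s^\top (A - \tau I) s}$, its inverse simplifies to $\alpha(\tau) = \frac{s^\top y - \tau\, s^\top s}{y^\top y - \tau\, s^\top y}$. Setting $\tau = 0$ recovers the BB2 step $s^\top y / y^\top y$, and $|\tau| \to \infty$ approaches the BB1 step $s^\top s / s^\top y$. This derivation rests on the assumed definition and may not match the paper's parametrization. The sketch plugs $\alpha(\tau)$ into a plain gradient method.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def harmonic_target_step(s, y, tau):
    """Inverse harmonic Rayleigh quotient with target tau (assumed definition).

    tau = 0 gives BB2 = s'y / y'y; |tau| -> infinity tends to BB1 = s's / s'y.
    """
    return (dot(s, y) - tau * dot(s, s)) / (dot(y, y) - tau * dot(s, y))


def gradient_method(A, b, x0, tau=-1.0, alpha0=0.1, tol=1e-10, max_iter=500):
    """Minimize f(x) = x'Ax/2 - b'x for symmetric positive definite A."""
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient Ax - b
    alpha = alpha0
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = [ax - bi for ax, bi in zip(matvec(A, x_new), b)]
        s = [p - q for p, q in zip(x_new, x)]
        y = [p - q for p, q in zip(g_new, g)]
        x, g = x_new, g_new
        if dot(s, s) == 0.0:
            break
        # For tau <= 0 and SPD A, numerator and denominator are both positive.
        alpha = harmonic_target_step(s, y, tau)
    return x
```

For $\tau < 0$ the step is a mediant of the BB2 and BB1 fractions, so it always lies between the two classical Barzilai-Borwein steps.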
We apply PAC-Bayes theory to the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC bounds) and an explicit trade-off between a high probability of convergence and a high convergence speed. Even in the limit case, where convergence is guaranteed, our learned optimization algorithms provably outperform related algorithms based on a (deterministic) worst-case analysis. Our results rely on PAC-Bayes bounds for general, unbounded loss functions based on exponential families. By generalizing existing ideas, we reformulate the learning procedure as a one-dimensional minimization problem and study when a global minimum can be found, which enables the algorithmic realization of the learning procedure. As a proof of concept, we learn hyperparameters of standard optimization algorithms to empirically support our theory.
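A heavily simplified illustration of PAC-Bayes-style hyperparameter learning is a Gibbs posterior over a finite grid of gradient-descent step sizes. Each step size receives weight proportional to its prior weight times $\exp(-\lambda \hat{R})$, where $\hat{R}$ is its empirical loss on sampled training problems. The problem distribution (1-D quadratics), the grid, the uniform prior, and the fixed $\lambda$ are all hypothetical. The paper's bound and its one-dimensional reformulation are not reproduced. As in the paper's setting, the loss is unbounded.

```python
import math
import random


def gd_loss(step, a, x0=1.0, n_steps=10):
    """Loss f(x_N) = a*x_N^2/2 after n_steps of gradient descent on f(x) = a*x^2/2."""
    x = x0
    for _ in range(n_steps):
        x -= step * a * x          # gradient of f is a*x
    return 0.5 * a * x * x         # unbounded: blows up when |1 - step*a| > 1


def gibbs_posterior(steps, problems, lam):
    """Gibbs posterior rho(h) proportional to prior(h) * exp(-lam * risk(h)), uniform prior."""
    risks = [sum(gd_loss(h, a) for a in problems) / len(problems) for h in steps]
    best = min(risks)
    # subtract the minimum risk before exponentiating so the best weight is exp(0) = 1
    weights = [math.exp(-lam * (r - best)) for r in risks]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
train = [rng.uniform(1.0, 10.0) for _ in range(30)]   # curvatures of sampled problems
grid = [0.01, 0.05, 0.1, 0.15, 0.19, 0.3]
post = gibbs_posterior(grid, train, lam=10.0)
```

Step sizes that diverge on some training problems receive essentially zero posterior mass, and overly cautious ones are penalized for slow progress. This mirrors, in miniature, the trade-off between convergence probability and speed.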
High-value payment systems (HVPS) are typically liquidity-intensive as the payment requests are indivisible and settled on a gross basis. Finding the right order in which payments should be processed to maximize the liquidity efficiency of these systems is an $NP$-hard combinatorial optimization problem, which quantum algorithms may be able to tackle at meaningful scales. We developed an algorithm and ran it on a hybrid quantum annealing solver to find an ordering of payments that reduced the amount of system liquidity necessary without substantially increasing payment delays. Despite the limitations in size and speed of today's quantum computers, our algorithm provided quantifiable efficiency improvements when applied to the Canadian HVPS using a 30-day sample of transaction data. By reordering each batch of 70 payments as they entered the queue, we achieved an average of C\$240 million in daily liquidity savings, with a settlement delay of approximately 90 seconds. For a few days in the sample, the liquidity savings exceeded C\$1 billion. This algorithm could be incorporated as a centralized preprocessor into existing HVPS without entailing a fundamental change to their risk management models.
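The following sketch shows what "liquidity needed for an ordering" can mean. Under one simple accounting convention (an assumption here), each participant must pre-fund the deepest deficit its running balance reaches while the batch settles gross and in order, and system liquidity is the sum over participants. The exhaustive search is a classical stand-in for the hybrid quantum annealing solver and is only feasible for tiny batches, since a batch of 70 payments has $70!$ orderings.

```python
from collections import defaultdict
from itertools import permutations


def required_liquidity(order):
    """Total liquidity participants must pre-fund to settle payments gross, in order.

    order -- sequence of (payer, payee, amount) tuples; payments are indivisible.
    """
    balance = defaultdict(int)   # running net position of each participant
    need = defaultdict(int)      # deepest deficit seen so far per participant
    for payer, payee, amount in order:
        balance[payer] -= amount
        balance[payee] += amount
        need[payer] = max(need[payer], -balance[payer])
    return sum(need.values())


def best_order(batch):
    """Exhaustive search over orderings (classical stand-in; tiny batches only)."""
    return min(permutations(batch), key=required_liquidity)


batch = [("B", "C", 100), ("A", "B", 100)]
print(required_liquidity(batch))              # 200: B pays before it receives
print(required_liquidity(best_order(batch)))  # 100: A pays B first, B forwards the funds
```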
We consider the problem of quantum state certification, where we are given the description of a mixed state $\sigma \in \mathbb{C}^{d \times d}$, $n$ copies of a mixed state $\rho \in \mathbb{C}^{d \times d}$, and $\varepsilon > 0$, and we are asked to determine whether $\rho = \sigma$ or whether $\| \rho - \sigma \|_1 > \varepsilon$. When $\sigma$ is the maximally mixed state $\frac{1}{d} I_d$, this is known as mixedness testing. We focus on algorithms which use incoherent measurements, i.e., which measure only one copy of $\rho$ at a time. Unlike those that use entangled, multi-copy measurements, these can be implemented without persistent quantum memory and thus represent a large class of protocols that can be run on current or near-term devices. For mixedness testing, there is a folklore algorithm which uses incoherent measurements and only needs $O(d^{3/2} / \varepsilon^2)$ copies. The algorithm is non-adaptive, that is, its measurements are fixed ahead of time, and is known to be optimal for non-adaptive algorithms. However, when the algorithm can make arbitrary incoherent measurements, the best known lower bound is only $\Omega (d^{4/3} / \varepsilon^2)$ [Bubeck-Chen-Li '20], and it has been an outstanding open problem to close this polynomial gap. In this work, 1) we settle the copy complexity of mixedness testing with incoherent measurements and show that $\Omega (d^{3/2} / \varepsilon^2)$ copies are necessary, and 2) we show that the instance-optimal bounds for state certification to general $\sigma$ first derived by [Chen-Li-O'Donnell '21] for non-adaptive measurements also hold for arbitrary incoherent measurements. Qualitatively, our results say that adaptivity does not help at all for these problems. Our results are based on new techniques that allow us to reduce the problem to understanding certain matrix martingales, which we believe may be of independent interest.
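The non-adaptive folklore approach can be sketched as follows. Every copy is measured in the same Haar-random basis. This maps $\rho$ to a distribution $p_i = \langle u_i|\rho|u_i\rangle$ over $d$ outcomes, which is exactly uniform when $\rho = I_d/d$. A classical uniformity test is then applied to the outcomes. The collision statistic below is an unbiased estimate of $\|p - u\|_2^2$. Using it as the classical test and leaving out the acceptance threshold are choices made for this illustration. The sketch assumes NumPy.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary via QR of a complex Gaussian matrix (with phase fix)."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases              # scales column j of q by phases[j]


def outcome_distribution(rho, U):
    """p_i = <u_i| rho |u_i> for the columns u_i of U (one incoherent measurement per copy)."""
    p = np.real(np.einsum("ji,jk,ki->i", U.conj(), rho, U))
    p = np.clip(p, 0.0, None)      # remove tiny negative rounding errors
    return p / p.sum()


def collision_statistic(samples, d):
    """Unbiased estimate of ||p||_2^2 - 1/d = ||p - uniform||_2^2 from i.i.d. samples."""
    counts = np.bincount(samples, minlength=d)
    n = len(samples)
    return np.sum(counts * (counts - 1)) / (n * (n - 1)) - 1.0 / d


def mixedness_statistic(rho, n_copies, rng):
    d = rho.shape[0]
    U = haar_unitary(d, rng)                     # fixed ahead of time: non-adaptive
    p = outcome_distribution(rho, U)
    samples = rng.choice(d, size=n_copies, p=p)  # simulate measuring each copy
    return collision_statistic(samples, d)
```

A tester would accept "maximally mixed" when the statistic falls below a threshold on the order of $\varepsilon^2/d^2$. The classical uniformity test on $d$ outcomes at that distance is what drives the $d^{3/2}/\varepsilon^2$ copy count.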
In this paper we present ODIN, a front-running protection system that uses a novel algorithm to measure Round-Trip-Time (RTT) to untrusted servers. ODIN is the decentralized equivalent of THOR, an RTT-aware front-running protection system for trading on centralized exchanges. Unlike centralized exchanges, P2P exchanges have potentially malicious peers, which makes reliable direct RTT measurement impossible. In order to prevent tampering by an arbitrarily malicious peer, ODIN performs an indirect RTT measurement that never interacts directly with the target machine. The RTT to the target is estimated by measuring the RTT to a randomized IP address that is known to be close to the target's IP address in the global routing network. We find that ODIN's RTT estimation algorithm provides an accurate, practical, and generic solution for collecting network latency data in a hostile network environment.
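A minimal sketch of the indirect measurement idea. It assumes that "close in the global routing network" can be approximated by "inside the same routed prefix". The /24 default, port 80, and the timeout are hypothetical placeholders, because the abstract does not say how ODIN picks the address. The probe times a TCP connection attempt to a random nearby address instead of the target, and a refused connection still completes one round trip.

```python
import ipaddress
import random
import socket
import time


def nearby_probe_address(target, prefixlen=24, rng=None):
    """Pick a random address in the same prefix as target, never the target itself.

    prefixlen is a hypothetical proxy for 'close in the global routing network'.
    """
    rng = rng or random.Random()
    target_ip = ipaddress.ip_address(target)
    net = ipaddress.ip_network(f"{target}/{prefixlen}", strict=False)
    if net.num_addresses < 4:
        raise ValueError("prefix too small to hold a distinct probe address")
    while True:
        # skip the first and last address of the block (network/broadcast in IPv4)
        candidate = net.network_address + rng.randrange(1, net.num_addresses - 1)
        if candidate != target_ip:
            return candidate


def tcp_rtt(address, port=80, timeout=2.0):
    """Time a TCP connection attempt; returns seconds, or None on timeout/unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((str(address), port), timeout=timeout):
            pass
    except ConnectionRefusedError:
        pass                   # an RST reply still completes one round trip
    except OSError:
        return None            # timeout or unreachable: no usable sample
    return time.perf_counter() - start
```

Because the malicious peer never sees the probe, it cannot delay or accelerate its replies to distort the estimate.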
The goal of this paper is to investigate a family of optimization problems arising from list homomorphisms, and to understand what the best possible algorithms are if we restrict the problem to bounded-treewidth graphs. For a fixed $H$, the input of the optimization problem LHomVD($H$) is a graph $G$ with lists $L(v)$, and the task is to find a set $X$ of vertices having minimum size such that $(G-X,L)$ has a list homomorphism to $H$. We define analogously the edge-deletion variant LHomED($H$). This expressive family of problems includes members that are essentially equivalent to fundamental problems such as Vertex Cover, Max Cut, Odd Cycle Transversal, and Edge/Vertex Multiway Cut. For both variants, we first characterize those graphs $H$ that make the problem polynomial-time solvable and show that the problem is NP-hard for every other fixed $H$. Second, as our main result, we determine, for every graph $H$ for which the problem is NP-hard, the smallest possible constant $c_H$ such that the problem can be solved in time $c^t_H\cdot n^{O(1)}$ if a tree decomposition of $G$ having width $t$ is given in the input, assuming the SETH. Let $i(H)$ be the maximum size of a set of vertices in $H$ that have pairwise incomparable neighborhoods. For the vertex-deletion variant LHomVD($H$), we show that the smallest possible constant is $i(H)+1$ for every $H$. The situation is more complex for the edge-deletion version. For every $H$, one can solve LHomED($H$) in time $i(H)^t\cdot n^{O(1)}$ if a tree decomposition of width $t$ is given. However, the existence of a specific type of decomposition of $H$ shows that there are graphs $H$ where LHomED($H$) can be solved significantly more efficiently and the best possible constant can be arbitrarily smaller than $i(H)$. Nevertheless, we determine this best possible constant and (assuming the SETH) prove tight bounds for every fixed $H$.
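The parameter $i(H)$ can be computed by brute force for small $H$. The sketch represents $H$ as an adjacency dictionary, where a vertex with a loop lists itself as a neighbor. It treats two neighborhoods as comparable when one is a subset of the other, so equal neighborhoods count as comparable. This reading of "incomparable" is our assumption.

```python
from itertools import combinations


def incomparable_number(H):
    """i(H): max size of a vertex set with pairwise incomparable neighborhoods.

    H -- dict mapping each vertex to the set of its neighbors (a loop = vertex in its own set).
    Exponential brute force; fine for the small fixed graphs H of interest.
    """
    def incomparable(u, v):
        Nu, Nv = H[u], H[v]
        return not (Nu <= Nv) and not (Nv <= Nu)

    vertices = list(H)
    for size in range(len(vertices), 0, -1):
        for S in combinations(vertices, size):
            if all(incomparable(u, v) for u, v in combinations(S, 2)):
                return size
    return 0


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```

For the path a-b-c, the end vertices share the neighborhood {b}, so $i(H) = 2$ and the vertex-deletion constant is $i(H) + 1 = 3$. For the loopless triangle, all three neighborhoods are pairwise incomparable, so $i(H) = 3$.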
Radial basis functions are typically used when discretization schemes require inhomogeneous node distributions. Although they originated from the desire to interpolate functions on a random set of nodes, they have been applied successfully to many types of differential equations. However, the weights used in the linear superposition of basis functions to interpolate the solution are entirely different from the actual values of the solution. In fact, these weights mix the value of the solution with the geometric locations of the nodes used to discretize the equation. In this paper, we use nodal radial basis functions, which are interpolants of the impulse function at each node inside the domain. This transformation allows a linear hyperbolic partial differential equation to be solved using a series expansion rather than the explicit computation of a matrix inverse. It effectively yields an implicit solver that only requires matrix-vector multiplications. Because the solver requires neither matrix inverses nor matrix-matrix products, this approach is numerically more stable and reduces the error by at least two orders of magnitude compared to other solvers that use radial basis functions directly. Further, boundary conditions are integrated directly into the solver at no extra cost. The method is naturally conservative, keeping the error virtually constant throughout the computation.
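A sketch of the two ingredients, under stated assumptions. First, nodal (cardinal) Gaussian RBFs $\psi_j$ with $\psi_j(x_k) = \delta_{jk}$ are built, so expansion coefficients equal nodal values. Here they are built once with a direct solve, which is an assumption because the abstract does not say how. Second, an implicit Euler step $(I - \Delta t\, L)^{-1} u$ is approximated by a truncated Neumann series that needs only matrix-vector products. The Gaussian kernel, shape parameter, and advection operator $L = -c\,D$ without boundary conditions are illustrative choices. The series converges only when $\|\Delta t\, L\| < 1$ in some induced norm, a restriction of this simple form.

```python
import numpy as np


def gaussian(r, eps):
    return np.exp(-(eps * r) ** 2)


def cardinal_weights(x, eps):
    """W with psi_j(x) = sum_i phi(|x - x_i|) W[i, j], so that psi_j(x_k) = delta_jk.

    Built once with a direct solve here (an assumption made for this sketch).
    """
    Phi = gaussian(np.abs(x[:, None] - x[None, :]), eps)
    return np.linalg.solve(Phi, np.eye(len(x)))


def nodal_derivative_matrix(x, eps):
    """D[k, j] = psi_j'(x_k): derivative of the nodal basis at the nodes."""
    R = x[:, None] - x[None, :]                     # R[k, i] = x_k - x_i
    dPhi = -2.0 * eps**2 * R * gaussian(np.abs(R), eps)
    return dPhi @ cardinal_weights(x, eps)


def implicit_step_series(u, L, dt, n_terms=60):
    """Approximate (I - dt*L)^{-1} u by sum_{m=0}^{n_terms} (dt*L)^m u (Neumann series).

    Uses matrix-vector products only; converges when ||dt*L|| < 1 in some induced norm.
    """
    term = u.copy()
    total = u.copy()
    for _ in range(n_terms):
        term = dt * (L @ term)
        total = total + term
    return total
```

Because the basis is nodal, the vector `u` holds solution values at the nodes directly rather than abstract weights, which is the point of the transformation described above.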
Machine learning (ML) formalizes the problem of getting computers to learn from experience as optimization of performance according to some metric(s) on a set of data examples. This is in contrast to requiring behaviour specified in advance (e.g. by hard-coded rules). Formalization of this problem has enabled great progress in many applications with large real-world impact, including translation, speech recognition, self-driving cars, and drug discovery. But practical instantiations of this formalism make many assumptions (for example, that data are i.i.d.: independent and identically distributed) whose soundness is seldom investigated. And in making great progress in such a short time, the field has developed many norms and ad hoc standards, focused on a relatively small range of problem settings. As applications of ML, particularly in artificial intelligence (AI) systems, become more pervasive in the real world, we need to critically examine these assumptions, norms, and problem settings, as well as the methods that have become de facto standards. There is much we still do not understand about how and why deep networks trained with stochastic gradient descent are able to generalize as well as they do, why they fail when they do, and how they will perform on out-of-distribution data. In this thesis I cover some of my work towards better understanding deep net generalization, identify several ways assumptions and problem settings fail to generalize to the real world, and propose ways to address those failures in practice.
Age-of-information minimization problems have been extensively studied in the context of real-time monitoring applications. In this paper, we consider the problem of monitoring the state of an unknown remote source that evolves according to a Markov process. A central scheduler decides at each time slot whether or not to schedule the source to receive new status updates, so as to minimize the Mean Age of Incorrect Information (MAoII). When the scheduler knows the source parameters, we formulate the minimization problem as a Markov decision process (MDP) and prove that the optimal solution is a threshold-based policy. When the source parameters are unknown, the difficulty lies in finding a strategy with a good trade-off between exploitation and exploration. Indeed, the scheduler must run an algorithm that jointly estimates the unknown parameters (exploration) and minimizes the MAoII (exploitation). However, in our system model, the source can only be explored when the monitor decides to schedule it. Consequently, under a greedy approach, we risk stopping the exploration process permanently if, at some time, the estimate of the Markov source's parameters is one for which the corresponding optimal solution is never to transmit. In that case, the estimates of the unknown parameters can no longer be improved, which may significantly degrade the performance of the algorithm. We therefore develop a new learning algorithm that balances exploration and exploitation to avoid this problem. We then theoretically analyze the performance of our algorithm against a genie-aided solution, proving that the regret at time T is of order log(T). Finally, we provide numerical results that highlight the performance of our derived policy compared to the greedy approach.
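A small simulation makes the MAoII objective concrete. The following choices are assumptions made only for illustration: a symmetric two-state Markov source with flip probability $p$; a policy that schedules the source whenever a fixed number of slots has passed since the last update (a simple stand-in, not the paper's threshold policy); instantaneous, error-free delivery; and an age of incorrect information that grows by one per slot while the monitor's estimate is wrong and resets to zero when it is correct.

```python
import random


def simulate_maoii(p, threshold, horizon=10_000, seed=0):
    """Return (mean AoII, fraction of slots with a transmission) under the toy model."""
    rng = random.Random(seed)
    state = estimate = 0          # source state and the monitor's current estimate
    aoii = since_update = 0
    total = transmissions = 0
    for _ in range(horizon):
        if rng.random() < p:      # the Markov source flips with probability p
            state ^= 1
        since_update += 1
        if since_update >= threshold:
            estimate = state      # scheduled update delivers the current state
            since_update = 0
            transmissions += 1
        # AoII grows while the estimate is wrong and resets when it is correct
        aoii = 0 if estimate == state else aoii + 1
        total += aoii
    return total / horizon, transmissions / horizon
```

Scheduling every slot gives an MAoII of zero, so in practice the decision trades MAoII against some cost of scheduling. That cost is not modeled in this sketch.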