两个人的视频免费国语版_爆乳护士一区二区三区在线播放_亚洲国产日韩在线制服丝袜_亚洲男人视频网站_日韩AV电影在线播放_我看欧美黄片变态黄片乱伦黄片_亚洲中文字幕一区第三页

Tensor networks are nowadays the backbone of classical simulations of quantum many-body systems and quantum circuits. Most tensor methods rely on the fact that we can eventually contract the tensor network to obtain the final result. While the contraction operation itself is trivial, its execution time is highly dependent on the order in which the contractions are performed. To this end, one tries to find beforehand an optimal order in which the contractions should be performed. However, there is a drawback: the general problem of finding the optimal contraction order is NP-complete. Therefore, one must settle for a mixture of exponential algorithms for small problems, e.g., $n \leq 20$, and otherwise hope for good contraction orders. For this reason, previous research has focused on the latter part, trying to find better heuristics. In this work, we take a more conservative approach and show that tree tensor networks accept optimal linear contraction orders. Beyond the optimality results, we adapt two join ordering techniques that can build on our work to guarantee near-optimal orders for arbitrary tensor networks.

相關內容

收(shou)縮

關注 0

直徑 · 類別 · 切割 · 無線電網絡 · 能量成本 ·

2023 年 4 月 10 日

The Energy Complexity of Diameter and Minimum Cut Computation in Bounded-Genus Networks

Yi-Jun Chang

from arxiv, Removing results that were already moved to arXiv:2007.09816. Polishing the writing. Changing the title. To appear in SIROCCO 2023

This paper investigates the energy complexity of distributed graph problems in multi-hop radio networks, where the energy cost of an algorithm is measured by the maximum number of awake rounds of a vertex. Recent works revealed that some problems, such as broadcast, breadth-first search, and maximal matching, can be solved with energy-efficient algorithms that consume only $\text{poly} \log n$ energy. However, there exist some problems, such as computing the diameter of the graph, that require $\Omega(n)$ energy to solve. To improve energy efficiency for these problems, we focus on a special graph class: bounded-genus graphs. We present algorithms for computing the exact diameter, the exact global minimum cut size, and a $(1 \pm\epsilon)$-approximate $s$-$t$ minimum cut size with $\tilde{O}(\sqrt{n})$ energy for bounded-genus graphs. Our approach is based on a generic framework that divides the vertex set into high-degree and low-degree parts and leverages the structural properties of bounded-genus graphs to control the number of certain connected components in the subgraph induced by the low-degree part.

隨機投影 · 精度 · 投影 · 混合 · 隨機矩陣 ·

2023 年 4 月 10 日

Mixed-Precision Random Projection for RandNLA on Tensor Cores

Hiroyuki Ootomo,Rio Yokota

from arxiv, PASC'23

Random projection can reduce the dimension of data while capturing its structure and is a fundamental tool for machine learning, signal processing, and information retrieval, which deal with a large amount of data today. RandNLA (Randomized Numerical Linear Algebra) leverages random projection to reduce the computational complexity of low-rank decomposition of tensors and solve least-square problems. While the computation of the random projection is a simple matrix multiplication, its asymptotic computational complexity is typically larger than other operations in a RandNLA algorithm. Therefore, various studies propose methods for reducing its computational complexity. We propose a fast mixed-precision random projection method on NVIDIA GPUs using Tensor Cores for single-precision tensors. We exploit the fact that the random matrix requires less precision, and develop a highly optimized matrix multiplication between FP32 and FP16 matrices -- SHGEMM (Single and Half-precision GEMM) -- on Tensor Cores, where the random matrix is stored in FP16. Our method can compute Randomized SVD 1.28 times faster and Random projection high order SVD 1.75 times faster than baseline single-precision implementations while maintaining accuracy.

近似 · 最優 · 分布式算法 · 回合 · 算法 ·

2023 年 4 月 7 日

Submodular Maximization with Nearly Optimal Approximation, Adaptivity and Query Complexity

Matthew Fahrbach,Vahab Mirrokni,Morteza Zadimoghaddam

from arxiv, 30 pages. This version fixes minor bugs with the definition of $I_t$ and the termination condition of Algorithm 5. We also update all theorem statements to explicitly assume monotone submodular functions

Submodular optimization generalizes many classic problems in combinatorial optimization and has recently found a wide range of applications in machine learning (e.g., feature engineering and active learning). For many large-scale optimization problems, we are often concerned with the adaptivity complexity of an algorithm, which quantifies the number of sequential rounds where polynomially-many independent function evaluations can be executed in parallel. While low adaptivity is ideal, it is not sufficient for a distributed algorithm to be efficient, since in many practical applications of submodular optimization the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of adaptive submodular optimization. Our main result is a distributed algorithm for maximizing a monotone submodular function with cardinality constraint $k$ that achieves a $(1-1/e-\varepsilon)$-approximation in expectation. This algorithm runs in $O(\log(n))$ adaptive rounds and makes $O(n)$ calls to the function evaluation oracle in expectation. The approximation guarantee and query complexity are optimal, and the adaptivity is nearly optimal. Moreover, the number of queries is substantially less than in previous works. Last, we extend our results to the submodular cover problem to demonstrate the generality of our algorithm and techniques.

非光滑 · 約束條件 · 塊坐標下降 · 坐標下降 · 正交 ·

2023 年 4 月 7 日

A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints

Ganzhao Yuan

Nonsmooth composite optimization with orthogonality constraints has a broad spectrum of applications in statistical learning and data science. However, this problem is generally challenging to solve due to its non-convex and non-smooth nature. Existing solutions are limited by one or more of the following restrictions: (i) they are full gradient methods that require high computational costs in each iteration; (ii) they are not capable of solving general nonsmooth composite problems; (iii) they are infeasible methods and can only achieve the feasibility of the solution at the limit point; (iv) they lack rigorous convergence guarantees; (v) they only obtain weak optimality of critical points. In this paper, we propose \textit{\textbf{OBCD}}, a new Block Coordinate Descent method for solving general nonsmooth composite problems under Orthogonality constraints. \textit{\textbf{OBCD}} is a feasible method with low computation complexity footprints. In each iteration, our algorithm updates $k$ rows of the solution matrix ($k\geq2$ is a parameter) to preserve the constraints. Then, it solves a small-sized nonsmooth composite optimization problem under orthogonality constraints either exactly or approximately. We demonstrate that any exact block-$k$ stationary point is always an approximate block-$k$ stationary point, which is equivalent to the critical stationary point. We are particularly interested in the case where $k=2$ as the resulting subproblem reduces to a one-dimensional nonconvex problem. We propose a breakpoint searching method and a fifth-order iterative method to solve this problem efficiently and effectively. We also propose two novel greedy strategies to find a good working set to further accelerate the convergence of \textit{\textbf{OBCD}}. Finally, we have conducted extensive experiments on several tasks to demonstrate the superiority of our approach.

收斂速率 · 光滑函數 · 多項式時間 · 優化問題 · 最優 ·

2023 年 4 月 6 日

Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

David Holzmüller,Francis Bach

from arxiv, Changes in v2: Minor corrections and formatting changes. Plots can be reproduced using the code at //github.com/dholzmueller/sampling_experiments

Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions $V$ allow faster convergence rates. Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime $O(n^{3.5})$, where the exponent $3.5$ is independent of $m$ or $d$. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of $m$ and $d$. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.

進化算法 · 排列 · 算法 · 隨機矩陣 · 概率 ·

2023 年 4 月 5 日

Doubly Stochastic Matrix Models for Estimation of Distribution Algorithms

Valentino Santucci,Josu Ceberio

from arxiv, Preprint of the paper accepted at ACM GECCO 2023

Problems with solutions represented by permutations are very prominent in combinatorial optimization. Thus, in recent decades, a number of evolutionary algorithms have been proposed to solve them, and among them, those based on probability models have received much attention. In that sense, most efforts have focused on introducing algorithms that are suited for solving ordering/ranking nature problems. However, when it comes to proposing probability-based evolutionary algorithms for assignment problems, the works have not gone beyond proposing simple and in most cases univariate models. In this paper, we explore the use of Doubly Stochastic Matrices (DSM) for optimizing matching and assignment nature permutation problems. To that end, we explore some learning and sampling methods to efficiently incorporate DSMs within the picture of evolutionary algorithms. Specifically, we adopt the framework of estimation of distribution algorithms and compare DSMs to some existing proposals for permutation problems. Conducted preliminary experiments on instances of the quadratic assignment problem validate this line of research and show that DSMs may obtain very competitive results, while computational cost issues still need to be further investigated.

超圖 · 分割 · 近似 · 分層 · 約束 ·

2023 年 4 月 5 日

Partitioning Hypergraphs is Hard: Models, Inapproximability, and Applications

Pál András Papp,Georg Anegg,A. N. Yzelman

from arxiv, Published in the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2023)

We study the balanced $k$-way hypergraph partitioning problem, with a special focus on its practical applications to manycore scheduling. Given a hypergraph on $n$ nodes, our goal is to partition the node set into $k$ parts of size at most $(1+\epsilon)\cdot \frac{n}{k}$ each, while minimizing the cost of the partitioning, defined as the number of cut hyperedges, possibly also weighted by the number of partitions they intersect. We show that this problem cannot be approximated to within a $n^{1/\text{poly} \log\log n}$ factor of the optimal solution in polynomial time if the Exponential Time Hypothesis holds, even for hypergraphs of maximal degree 2. We also study the hardness of the partitioning problem from a parameterized complexity perspective, and in the more general case when we have multiple balance constraints. Furthermore, we consider two extensions of the partitioning problem that are motivated from practical considerations. Firstly, we introduce the concept of hyperDAGs to model precedence-constrained computations as hypergraphs, and we analyze the adaptation of the balanced partitioning problem to this case. Secondly, we study the hierarchical partitioning problem to model hierarchical NUMA (non-uniform memory access) effects in modern computer architectures, and we show that ignoring this hierarchical aspect of the communication cost can yield significantly weaker solutions.

混合 · 最優 · 算法 · 均勻分布 · 馬爾科夫鏈 ·

2023 年 4 月 4 日

Strong spatial mixing for colorings on trees and its algorithmic applications

Zongchen Chen,Kuikui Liu,Nitya Mani,Ankur Moitra

from arxiv, 51 pages, 6 page appendix

Strong spatial mixing (SSM) is an important quantitative notion of correlation decay for Gibbs distributions arising in statistical physics, probability theory, and theoretical computer science. A longstanding conjecture is that the uniform distribution on proper $q$-colorings on a $\Delta$-regular tree exhibits SSM whenever $q \ge \Delta+1$. Moreover, it is widely believed that as long as SSM holds on bounded-degree trees with $q$ colors, one would obtain an efficient sampler for $q$-colorings on all bounded-degree graphs via simple Markov chain algorithms. It is surprising that such a basic question is still open, even on trees, but then again it also highlights how much we still have to learn about random colorings. In this paper, we show the following: (1) For any $\Delta \ge 3$, SSM holds for random $q$-colorings on trees of maximum degree $\Delta$ whenever $q \ge \Delta + 3$. Thus we almost fully resolve the aforementioned conjecture. Our result substantially improves upon the previously best bound which requires $q \ge 1.59\Delta+\gamma^*$ for an absolute constant $\gamma^* > 0$. (2) For any $\Delta\ge 3$ and girth $g = \Omega_\Delta(1)$, we establish optimal mixing of the Glauber dynamics for $q$-colorings on graphs of maximum degree $\Delta$ and girth $g$ whenever $q \ge \Delta+3$. Our approach is based on a new general reduction from spectral independence on large-girth graphs to SSM on trees that is of independent interest. Using the same techniques, we also prove near-optimal bounds on weak spatial mixing (WSM), a closely-related notion to SSM, for the antiferromagnetic Potts model on trees.

ReLU神經網絡 · 非參數 · ReLU · 神經網絡 · 過度參數化 ·

2023 年 4 月 4 日

Optimal rates of approximation by shallow ReLU$^k$ neural networks and applications to nonparametric regression

Yunfei Yang,Ding-Xuan Zhou

We study the approximation capacity of some variation spaces corresponding to shallow ReLU$^k$ neural networks. It is shown that sufficiently smooth functions are contained in these spaces with finite variation norms. For functions with less smoothness, the approximation rates in terms of the variation norm are established. Using these results, we are able to prove the optimal approximation rates in terms of the number of neurons for shallow ReLU$^k$ neural networks. It is also shown how these results can be used to derive approximation bounds for deep neural networks and convolutional neural networks (CNNs). As applications, we study convergence rates for nonparametric regression using three ReLU neural network models: shallow neural network, over-parameterized neural network, and CNN. In particular, we show that shallow neural networks can achieve the minimax optimal rates for learning H\"older functions, which complements recent results for deep neural networks. It is also proven that over-parameterized (deep or shallow) neural networks can achieve nearly optimal rates for nonparametric regression.

聯合學習 · 均值 · 最優 · 頻率估計 · 差分 ·

2023 年 4 月 4 日

Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation

Wei-Ning Chen,Dan Song,Ayfer Ozgur,Peter Kairouz

Privacy and communication constraints are two major bottlenecks in federated learning (FL) and analytics (FA). We study the optimal accuracy of mean and frequency estimation (canonical models for FL and FA respectively) under joint communication and $(\varepsilon, \delta)$-differential privacy (DP) constraints. We show that in order to achieve the optimal error under $(\varepsilon, \delta)$-DP, it is sufficient for each client to send $\Theta\left( n \min\left(\varepsilon, \varepsilon^2\right)\right)$ bits for FL and $\Theta\left(\log\left( n\min\left(\varepsilon, \varepsilon^2\right) \right)\right)$ bits for FA to the server, where $n$ is the number of participating clients. Without compression, each client needs $O(d)$ bits and $\log d$ bits for the mean and frequency estimation problems respectively (where $d$ corresponds to the number of trainable parameters in FL or the domain size in FA), which means that we can get significant savings in the regime $ n \min\left(\varepsilon, \varepsilon^2\right) = o(d)$, which is often the relevant regime in practice. Our algorithms leverage compression for privacy amplification: when each client communicates only partial information about its sample, we show that privacy can be amplified by randomly selecting the part contributed by each client.