成人不卡顿免费视频在线_甜味弥漫一区二区在线观看_人人澡人人澡人人看欧美_国产综合久久精品东京热中文字幕一区_午夜无码专区免费看片_精品欧美一区二区精品久久_色欲AV无码片一区二区三区

We show that any bounded integral function $f : A \times B \mapsto \{0,1, \dots, \Delta\}$ with rank $r$ has deterministic communication complexity $\Delta^{O(\Delta)} \cdot \sqrt{r} \cdot \log^2 r$, where the rank of $f$ is defined to be the rank of the $A \times B$ matrix whose entries are the function values. As a corollary, we show that any $n$-dimensional polytope that admits a slack matrix with entries from $\{0,1,\dots,\Delta\}$ has extension complexity at most $\exp(\Delta^{O(\Delta)} \cdot \sqrt{n} \cdot \log^2 n)$.

相關內容

Extensibility

關注 5

iOS 8 提供的應用間和應用跟系統的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source:

非凸 · INTERACT · 蒙特卡羅 · 容差 · 估計/估計量 ·

2024 年 2 月 7 日

Extending the Reach of First-Order Algorithms for Nonconvex Min-Max Problems with Cohypomonotonicity

Ahmet Alacaoglu,Donghwan Kim,Stephen J. Wright

We focus on constrained, $L$-smooth, nonconvex-nonconcave min-max problems either satisfying $\rho$-cohypomonotonicity or admitting a solution to the $\rho$-weakly Minty Variational Inequality (MVI), where larger values of the parameter $\rho>0$ correspond to a greater degree of nonconvexity. These problem classes include examples in two player reinforcement learning, interaction dominant min-max problems, and certain synthetic test problems on which classical min-max algorithms fail. It has been conjectured that first-order methods can tolerate value of $\rho$ no larger than $\frac{1}{L}$, but existing results in the literature have stagnated at the tighter requirement $\rho < \frac{1}{2L}$. With a simple argument, we obtain optimal or best-known complexity guarantees with cohypomonotonicity or weak MVI conditions for $\rho < \frac{1}{L}$. The algorithms we analyze are inexact variants of Halpern and Krasnosel'ski\u{\i}-Mann (KM) iterations. We also provide algorithms and complexity guarantees in the stochastic case with the same range on $\rho$. Our main insight for the improvements in the convergence analyses is to harness the recently proposed "conic nonexpansiveness" property of operators. As byproducts, we provide a refined analysis for inexact Halpern iteration and propose a stochastic KM iteration with a multilevel Monte Carlo estimator.

有向 · 泛函 · 相互獨立的 · 相關系數 · GROUP ·

2024 年 2 月 7 日

Constant Degree Direct Product Testers with Small Soundness

Mitali Bafna,Noam Lifshitz,Dor Minzer

Let $X$ be a $d$-dimensional simplicial complex. A function $F\colon X(k)\to \{0,1\}^k$ is said to be a direct product function if there exists a function $f\colon X(1)\to \{0,1\}$ such that $F(\sigma) = (f(\sigma_1), \ldots, f(\sigma_k))$ for each $k$-face $\sigma$. In an effort to simplify components of the PCP theorem, Goldreich and Safra introduced the problem of direct product testing, which asks whether one can test if $F\colon X(k)\to \{0,1\}^k$ is correlated with a direct product function by querying $F$ on only $2$ inputs. Dinur and Kaufman conjectured that there exist bounded degree complexes with a direct product test in the small soundness regime. We resolve their conjecture by showing that for all $\delta>0$, there exists a family of high-dimensional expanders with degree $O_{\delta}(1)$ and a $2$-query direct product tester with soundness $\delta$. We use the characterization given by a subset of the authors and independently by Dikstein and Dinur, who showed that some form of non-Abelian coboundary expansion (which they called "Unique-Games coboundary expansion") is a necessary and sufficient condition for a complex to admit such direct product testers. Our main technical contribution is a general technique for showing coboundary expansion of complexes with coefficients in a non-Abelian group. This allows us to prove that the high dimensional expanders constructed by Chapman and Lubotzky satisfies the necessary conditions, thus admitting a 2-query direct product tester with small soundness.

縮放 · 查準率/準確率 · Analysis · 線性的 · 可約的 ·

2024 年 2 月 7 日

Strongly Polynomial Frame Scaling to High Precision

Daniel Dadush,Akshay Ramachandran

from arxiv, Comments welcome

The frame scaling problem is: given vectors $U := \{u_{1}, ..., u_{n} \} \subseteq \mathbb{R}^{d}$, marginals $c \in \mathbb{R}^{n}_{++}$, and precision $\varepsilon > 0$, find left and right scalings $L \in \mathbb{R}^{d \times d}, r \in \mathbb{R}^n$ such that $(v_1,\dots,v_n) := (Lu_1 r_1,\dots,Lu_nr_n)$ simultaneously satisfies $\sum_{i=1}^n v_i v_i^{\mathsf{T}} = I_d$ and $\|v_{j}\|_{2}^{2} = c_{j}, \forall j \in [n]$, up to error $\varepsilon$. This problem has appeared in a variety of fields throughout linear algebra and computer science. In this work, we give a strongly polynomial algorithm for frame scaling with $\log(1/\varepsilon)$ convergence. This answers a question of Diakonikolas, Tzamos and Kane (STOC 2023), who gave the first strongly polynomial randomized algorithm with poly$(1/\varepsilon)$ convergence for the special case $c = \frac{d}{n} 1_{n}$. Our algorithm is deterministic, applies for general $c \in \mathbb{R}^{n}_{++}$, and requires $O(n^{3} \log(n/\varepsilon))$ iterations as compared to $O(n^{5} d^{11}/\varepsilon^{5})$ iterations of DTK. By lifting the framework of Linial, Samorodnitsky and Wigderson (Combinatorica 2000) for matrix scaling to frames, we are able to simplify both the algorithm and analysis. Our main technical contribution is to generalize the potential analysis of LSW to the frame setting and compute an update step in strongly polynomial time that achieves geometric progress in each iteration. In fact, we can adapt our results to give an improved analysis of strongly polynomial matrix scaling, reducing the $O(n^{5} \log(n/\varepsilon))$ iteration bound of LSW to $O(n^{3} \log(n/\varepsilon))$. Additionally, we prove a novel bound on the size of approximate frame scaling solutions, involving the condition measure $\bar{\chi}$ studied in the linear programming literature, which may be of independent interest.

Integration · ML · 回火 · MoDELS · 黎曼積分 ·

2024 年 2 月 6 日

Tempered Calculus for ML: Application to Hyperbolic Model Embedding

Richard Nock,Ehsan Amid,Frank Nielsen,Alexander Soen,Manfred K. Warmuth

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc. In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements. We start with a generalization of Riemann integration that also encapsulates functions that are not strictly additive but are, more generally, $t$-additive, as in nonextensive statistical mechanics. Notably, this recovers Volterra's product integral as a special case. We then generalize the Fundamental Theorem of calculus using an extension of the (Euclidean) derivative. This, along with a series of more specific Theorems, serves as a basis for results showing how one can specifically design, alter, or change fundamental properties of distortion measures in a simple way, with a special emphasis on geometric- and ML-related properties that are the metricity, hyperbolicity, and encoding. We show how to apply it to a problem that has recently gained traction in ML: hyperbolic embeddings with a "cheap" and accurate encoding along the hyperbolic vs Euclidean scale. We unveil a new application for which the Poincar\'e disk model has very appealing features, and our theory comes in handy: \textit{model} embeddings for boosted combinations of decision trees, trained using the log-loss (trees) and logistic loss (combinations).

奇異的 · MoDELS · 線性的 · Weight · Analysis ·

2024 年 2 月 6 日

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching

Akira Ito,Masanori Yamada,Atsutoshi Kumagai

from arxiv, 20 pages

Recently, Ainsworth et al. showed that using weight matching (WM) to minimize the $L_2$ distance in a permutation search of model parameters effectively identifies permutations that satisfy linear mode connectivity (LMC), in which the loss along a linear path between two independently trained models with different seeds remains nearly constant. This paper provides a theoretical analysis of LMC using WM, which is crucial for understanding stochastic gradient descent's effectiveness and its application in areas like model merging. We first experimentally and theoretically show that permutations found by WM do not significantly reduce the $L_2$ distance between two models and the occurrence of LMC is not merely due to distance reduction by WM in itself. We then provide theoretical insights showing that permutations can change the directions of the singular vectors, but not the singular values, of the weight matrices in each layer. This finding shows that permutations found by WM mainly align the directions of singular vectors associated with large singular values across models. This alignment brings the singular vectors with large singular values, which determine the model functionality, closer between pre-merged and post-merged models, so that the post-merged model retains functionality similar to the pre-merged models, making it easy to satisfy LMC. Finally, we analyze the difference between WM and straight-through estimator (STE), a dataset-dependent permutation search method, and show that WM outperforms STE, especially when merging three or more models.

CASES · 無偏 · 向量化 · 稀疏 · 標準正交 ·

2024 年 2 月 6 日

Almost Perfect Mutually Unbiased Bases that are Sparse

Ajeet Kumar,Subhamoy Maitra,Somjit Roy

In dimension $d$, Mutually Unbiased Bases (MUBs) are a collection of orthonormal bases over $\mathbb{C}^d$ such that for any two vectors $v_1, v_2$ belonging to different bases, the dot or scalar product $|\braket{v_1|v_2}| = \frac{1}{\sqrt{d}}$. The upper bound on the number of such bases is $d+1$. Construction methods to achieve this bound are known for cases when $d$ is some power of prime. The situation is more restrictive in other cases and also when we consider the results over real rather than complex. Thus, certain relaxations of this model are considered in literature and consequently Approximate MUBs (AMUB) are studied. This enables one to construct potentially large number of such objects for $\mathbb{C}^d$ as well as in $\mathbb{R}^d$. In this regard, we propose the concept of Almost Perfect MUBs (APMUB), where we restrict the absolute value of inner product $|\braket{v_1|v_2}|$ to be two-valued, one being 0 and the other $ \leq \frac{1+\mathcal{O}(d^{-\lambda})}{\sqrt{d}}$, such that $\lambda > 0$ and the numerator $1 + \mathcal{O}(d^{-\lambda}) \leq 2$. Each such vector constructed, has an important feature that large number of its components are zero and the non-zero components are of equal magnitude. Our techniques are based on combinatorial structures related to Resolvable Block Designs (RBDs). We show that for several composite dimensions $d$, one can construct $\mathcal{O}(\sqrt{d})$ many APMUBs, in which cases the number of MUBs are significantly small. To be specific, this result works for $d$ of the form $(q-e)(q+f), \ q, e, f \in \mathbb{N}$, with the conditions $0 \leq f \leq e$ for constant $e, f$ and $q$ some power of prime. We also show that such APMUBs provide sets of Bi-angular vectors which are of the order of $\mathcal{O}(d^{3/2})$ in numbers, having high angular distances among them.

FAST · 變換 · 模型評估 · 秩 · 核化 ·

2024 年 2 月 6 日

A Dynamic Low-Rank Fast Gaussian Transform

Baihe Huang,Zhao Song,Omri Weinstein,Junze Yin,Hengjie Zhang,Ruizhe Zhang

The \emph{Fast Gaussian Transform} (FGT) enables subquadratic-time multiplication of an $n\times n$ Gaussian kernel matrix $\mathsf{K}_{i,j}= \exp ( - \| x_i - x_j \|_2^2 ) $ with an arbitrary vector $h \in \mathbb{R}^n$, where $x_1,\dots, x_n \in \mathbb{R}^d$ are a set of \emph{fixed} source points. This kernel plays a central role in machine learning and random feature maps. Nevertheless, in most modern data analysis applications, datasets are dynamically changing (yet often have low rank), and recomputing the FGT from scratch in (kernel-based) algorithms incurs a major computational overhead ($\gtrsim n$ time for a single source update $\in \mathbb{R}^d$). These applications motivate a \emph{dynamic FGT} algorithm, which maintains a dynamic set of sources under \emph{kernel-density estimation} (KDE) queries in \emph{sublinear time} while retaining Mat-Vec multiplication accuracy and speed. Assuming the dynamic data-points $x_i$ lie in a (possibly changing) $k$-dimensional subspace ($k\leq d$), our main result is an efficient dynamic FGT algorithm, supporting the following operations in $\log^{O(k)}(n/\varepsilon)$ time: (1) Adding or deleting a source point, and (2) Estimating the ``kernel-density'' of a query point with respect to sources with $\varepsilon$ additive accuracy. The core of the algorithm is a dynamic data structure for maintaining the \emph{projected} ``interaction rank'' between source and target boxes, decoupled into finite truncation of Taylor and Hermite expansions.

流 · Attention · 近似 · 詞元分析器 · MoDELS ·

2024 年 2 月 5 日

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

Raghav Addanki,Chenyang Li,Zhao Song,Chiwun Yang

Attention computation takes both the time complexity of $O(n^2)$ and the space complexity of $O(n^2)$ simultaneously, which makes deploying Large Language Models (LLMs) in streaming applications that involve long contexts requiring substantial computational resources. In recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model that is able to support a 128K-long document, in our paper, we focus on the memory-efficient issue when context length $n$ is much greater than 128K ($n \gg 2^d$). Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$. It accomplishes this by constructing $U_1, U_2 \in \mathbb{R}^{n \times t}$ to expedite attention ${\sf Attn}(Q, K, V)$ computation within $n^{1+o(1)}$ time executions. Despite this, computing the approximated attention matrix $U_1U_2^\top \in \mathbb{R}^{n \times n}$ still necessitates $O(n^2)$ space, leading to significant memory usage. In response to these challenges, we introduce a new algorithm that only reads one pass of the data in a streaming fashion. This method employs sublinear space $o(n)$ to store three sketch matrices, alleviating the need for exact $K, V$ storage. Notably, our algorithm exhibits exceptional memory-efficient performance with super-long tokens. As the token length $n$ increases, our error guarantee diminishes while the memory usage remains nearly constant. This unique attribute underscores the potential of our technique in efficiently handling LLMs in streaming applications.

塊 · 代碼 · 可理解性 · 相同 · 離散數學 ·

2024 年 2 月 4 日

Improved Upper Bound for the Size of a Trifferent Code

Siddharth Bhandari,Abhishek Khetan

from arxiv, 11 pages, 2 figures

A subset $\mathcal{C}\subseteq\{0,1,2\}^n$ is said to be a $\textit{trifferent}$ code (of block length $n$) if for every three distinct codewords $x,y, z \in \mathcal{C}$, there is a coordinate $i\in \{1,2,\ldots,n\}$ where they all differ, that is, $\{x(i),y(i),z(i)\}$ is same as $\{0,1,2\}$. Let $T(n)$ denote the size of the largest trifferent code of block length $n$. Understanding the asymptotic behavior of $T(n)$ is closely related to determining the zero-error capacity of the $(3/2)$-channel defined by Elias'88, and is a long-standing open problem in the area. Elias had shown that $T(n)\leq 2\times (3/2)^n$ and prior to our work the best upper bound was $T(n)\leq 0.6937 \times (3/2)^n$ due to Kurz'23. We improve this bound to $T(n)\leq c \times n^{-2/5}\times (3/2)^n$ where $c$ is an absolute constant.

近鄰 · 近似 · Microsoft Windows · 標注 · 查全率/召回率 ·

2024 年 2 月 1 日

Approximate Nearest Neighbor Search with Window Filters

Joshua Engels,Benjamin Landrum,Shangdi Yu,Laxman Dhulipala,Julian Shun

from arxiv, Code available: //github.com/JoshEngels/RangeFilteredANN

We define and investigate the problem of $\textit{c-approximate window search}$: approximate nearest neighbor search where each point in the dataset has a numeric label, and the goal is to find nearest neighbors to queries within arbitrary label ranges. Many semantic search problems, such as image and document search with timestamp filters, or product search with cost filters, are natural examples of this problem. We propose and theoretically analyze a modular tree-based framework for transforming an index that solves the traditional c-approximate nearest neighbor problem into a data structure that solves window search. On standard nearest neighbor benchmark datasets equipped with random label values, adversarially constructed embeddings, and image search embeddings with real timestamps, we obtain up to a $75\times$ speedup over existing solutions at the same level of recall.