We study the discrete bin covering problem, where a multiset of items with sizes from a fixed set $S \subseteq (0,1]$ must be split into disjoint subsets so as to maximize the number of subsets whose contents sum to at least $1$. We focus on the online discrete variant, where $S$ is finite and items arrive sequentially. In the purely online setting, we show that the competitive ratios of the best deterministic (and randomized) algorithms converge to $\frac{1}{2}$ as $S$ grows, as in the continuous setting. We therefore consider the problem in a prediction setting, where algorithms may access a vector predicting the frequency of items of each size in the instance. In this setting, we introduce a family of online algorithms that perform near-optimally when the predictions are correct. Further, we introduce a second, more robust family of algorithms that offers a tradeoff between the performance guarantees when the predictions are perfect and when they are adversarial. Finally, we consider a stochastic setting where items are drawn independently from a fixed but unknown distribution over $S$. Using results on the PAC-learnability of probabilities in discrete distributions, we also introduce a purely online algorithm whose average-case performance is near-optimal with high probability for every finite set $S$ and every distribution over $S$.
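As a concrete reference point for the purely online setting, the classical greedy strategy (often called Dual Next Fit), which keeps a single bin open and closes it as soon as its content reaches $1$, is the standard $\frac{1}{2}$-competitive baseline for bin covering. The sketch below illustrates that baseline only; it is not one of the prediction-based algorithms introduced in the paper, and the example item sizes are illustrative.

```python
def greedy_online_bin_covering(items):
    """Classical purely online baseline for bin covering (Dual Next Fit):
    keep adding arriving items to one open bin and close the bin as soon
    as its content sums to at least 1."""
    covered = 0      # number of bins covered (content >= 1) so far
    current = 0.0    # content of the currently open bin
    for x in items:  # items arrive one by one, each x in (0, 1]
        current += x
        if current >= 1:
            covered += 1
            current = 0.0
    return covered

# Example: item sizes drawn from a small discrete set S = {1/3, 1/2}
print(greedy_online_bin_covering([1/3, 1/2, 1/2, 1/3, 1/3, 1/2]))  # -> 2
```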
We consider the differentially private (DP) facility location problem in the so-called super-set output setting proposed by Gupta et al. [SODA 2010]. The current best known expected approximation ratio for an $\epsilon$-DP algorithm is $O\left(\frac{\log n}{\sqrt{\epsilon}}\right)$, due to Cohen-Addad et al. [AISTATS 2022], where $n$ denotes the size of the metric space, while the best known lower bound is $\Omega(1/\sqrt{\epsilon})$ [NeurIPS 2019]. In this short note, we give a lower bound of $\tilde{\Omega}\left(\min\left\{\log n, \sqrt{\frac{\log n}{\epsilon}}\right\}\right)$ on the expected approximation ratio of any $\epsilon$-DP algorithm, which is the first evidence that the approximation ratio must grow with the size of the metric space.
We consider the Distinct Shortest Walks problem. Given two vertices $s$ and $t$ of a graph database $\mathcal{D}$ and a regular path query, enumerate all walks of minimal length from $s$ to $t$ that carry a label that conforms to the query. Usual theoretical solutions turn out to be inefficient when applied to graph models that are closer to real-life systems, in particular because edges may carry multiple labels. Indeed, known algorithms may repeat the same answer exponentially many times. We propose an efficient algorithm for multi-labelled graph databases. The preprocessing runs in $O(|\mathcal{D}|\times|\mathcal{A}|)$ time and the delay between two consecutive outputs is in $O(\lambda\times|\mathcal{A}|)$, where $\mathcal{A}$ is a nondeterministic automaton representing the query and $\lambda$ is the minimal length. The algorithm can handle $\varepsilon$-transitions in $\mathcal{A}$ or queries given as regular expressions at no additional cost.
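The abstract does not spell out the preprocessing itself; the stated $O(|\mathcal{D}|\times|\mathcal{A}|)$ bound is consistent with a breadth-first search over the product of the database and the automaton, which already yields the minimal length $\lambda$. The sketch below shows only that product BFS under assumed encodings (dictionaries of labelled edges and transitions) and omits the enumeration phase, so it should be read as an illustration rather than the paper's algorithm.

```python
from collections import deque

def product_bfs(db_edges, source, auto_trans, auto_init, auto_final, target):
    """Illustrative product-graph BFS: returns the minimal length lambda of a
    walk from source to target whose label conforms to the NFA, or None.

    db_edges:   dict vertex -> list of (label, neighbour) pairs (multi-labelled)
    auto_trans: dict state  -> list of (label, state) pairs
    """
    start = {(source, q) for q in auto_init}
    dist = {p: 0 for p in start}
    queue = deque(start)
    while queue:
        v, q = queue.popleft()
        if v == target and q in auto_final:
            return dist[(v, q)]  # BFS order guarantees minimality
        for label, w in db_edges.get(v, []):
            for a, q2 in auto_trans.get(q, []):
                if a == label and (w, q2) not in dist:
                    dist[(w, q2)] = dist[(v, q)] + 1
                    queue.append((w, q2))
    return None  # no conforming walk from source to target
```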
We give an isomorphism test that runs in time $n^{\operatorname{polylog}(h)}$ on all $n$-vertex graphs excluding some $h$-vertex graph as a topological subgraph. Previous results state that isomorphism for such graphs can be tested in time $n^{\operatorname{polylog}(n)}$ (Babai, STOC 2016) and $n^{f(h)}$ for some function $f$ (Grohe and Marx, SIAM J. Comp., 2015). Our result also unifies and extends previous isomorphism tests for graphs of maximum degree $d$ running in time $n^{\operatorname{polylog}(d)}$ (SIAM J. Comp., 2023) and for graphs of Hadwiger number $h$ running in time $n^{\operatorname{polylog}(h)}$ (SIAM J. Comp., 2023).
The efficacy of machine learning has traditionally relied on the availability of increasingly larger datasets. However, large datasets pose storage challenges and contain non-influential samples that could be ignored during training without impacting the final accuracy of the model. In response to these limitations, the concept of distilling the information in a dataset into a condensed set of (synthetic) samples, namely a distilled dataset, has emerged. One crucial aspect is the architecture (usually a ConvNet) selected to link the original and synthetic datasets. However, the final accuracy drops if the architecture used at evaluation time differs from the one used during distillation. Another challenge is the generation of high-resolution images, e.g., 128x128 and higher. In this paper, we propose Latent Dataset Distillation with Diffusion Models (LD3M), which combines diffusion in latent space with dataset distillation to tackle both challenges. LD3M incorporates a novel diffusion process tailored for dataset distillation, which improves the gradient norms for learning synthetic images. By adjusting the number of diffusion steps, LD3M also offers a straightforward way of controlling the trade-off between speed and accuracy. We evaluate our approach on several ImageNet subsets and for high-resolution images (128x128 and 256x256). LD3M consistently outperforms state-of-the-art distillation techniques by up to 4.8 p.p. and 4.2 p.p. for 1 and 10 images per class, respectively.
We study the problem of $(\epsilon,\delta)$-differentially private learning of linear predictors with convex losses. We provide results for two subclasses of loss functions. The first case is when the loss is smooth and non-negative but not necessarily Lipschitz (such as the squared loss). For this case, we establish an upper bound on the excess population risk of $\tilde{O}\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^* \Vert^2}{(n\epsilon)^{2/3}},\frac{\sqrt{d}\Vert w^*\Vert^2}{n\epsilon}\right\}\right)$, where $n$ is the number of samples, $d$ is the dimension of the problem, and $w^*$ is the minimizer of the population risk. Apart from the dependence on $\Vert w^*\Vert$, our bound is essentially tight in all parameters. In particular, we show a lower bound of $\tilde{\Omega}\left(\frac{1}{\sqrt{n}} + \min\left\{\frac{\Vert w^*\Vert^{4/3}}{(n\epsilon)^{2/3}}, \frac{\sqrt{d}\Vert w^*\Vert}{n\epsilon}\right\}\right)$. We also revisit the previously studied case of Lipschitz losses [SSTT20]. For this case, we close the gap in the existing work and show that the optimal rate is (up to log factors) $\Theta\left(\frac{\Vert w^*\Vert}{\sqrt{n}} + \min\left\{\frac{\Vert w^*\Vert}{\sqrt{n\epsilon}},\frac{\sqrt{\text{rank}}\Vert w^*\Vert}{n\epsilon}\right\}\right)$, where $\text{rank}$ is the rank of the design matrix. This improves over existing work in the high privacy regime. Finally, our algorithms involve a private model selection approach that we develop to enable attaining the stated rates without a priori knowledge of $\Vert w^*\Vert$.
Consider the Telephone Broadcast problem, in which the input is a connected graph $G$ on $n$ vertices, a source vertex $s \in V(G)$, and a positive integer $t$. The objective is to decide whether there is a broadcast protocol from $s$ that ensures that all the vertices of $G$ receive the message in at most $t$ rounds. We consider the broadcast protocol where, in a round, any node aware of the message can forward it to at most one of its neighbors. As the number of nodes aware of the message can at most double in each round, for a non-trivial instance we have $n \le 2^t$. Hence, the brute-force algorithm that checks all permutations of the vertices runs in time $2^{2^{\calO(t)}} \cdot n^{\calO(1)}$. As our first result, we prove that this simple algorithm is essentially the best possible: Telephone Broadcast does not admit an algorithm running in time $2^{2^{o(t)}} \cdot n^{\calO(1)}$, unless the \ETH\ fails. To the best of our knowledge, this is only the fourth example of an \NP-complete problem that admits a double-exponential lower bound when parameterized by the solution size. It also resolves a question posed by Fomin, Fraigniaud, and Golovach [WG 2023]. In the same article, the authors asked whether the problem is \FPT\ when parameterized by the feedback vertex set number of the graph. We answer this question in the negative: Telephone Broadcast, restricted to graphs of feedback vertex number one, and hence treewidth two, remains \NP-complete. We find this a relatively rare example of a problem that admits a polynomial-time algorithm on trees but is \NP-complete on graphs of treewidth two.
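For intuition about the problem statement (rather than the lower-bound constructions of the paper), tiny instances can be decided by exhaustively trying, in each round, every assignment of informed vertices to at most one uninformed neighbor. The sketch below is purely illustrative; the adjacency-dictionary encoding and the example graph are assumptions.

```python
from itertools import product

def can_broadcast(adj, s, t):
    """Brute-force check for tiny Telephone Broadcast instances: can all
    vertices be informed within t rounds if, per round, each informed
    vertex forwards the message to at most one neighbour?

    adj: dict mapping each vertex to the set of its neighbours.
    """
    def search(informed, rounds_left):
        if len(informed) == len(adj):
            return True
        if rounds_left == 0:
            return False
        senders = sorted(informed)
        # each informed vertex stays silent (None) or calls one uninformed neighbour
        options = [[None] + sorted(adj[v] - informed) for v in senders]
        for choice in product(*options):
            newly = {w for w in choice if w is not None}
            if newly and search(informed | newly, rounds_left - 1):
                return True
        return False
    return search({s}, t)

# Path a - b - c - d: broadcasting from a needs 3 rounds
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(can_broadcast(path, "a", 2), can_broadcast(path, "a", 3))  # False True
```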
The $(k, z)$-Clustering problem in Euclidean space $\mathbb{R}^d$ has been extensively studied. Given the scale of the data involved, compression methods for the Euclidean $(k, z)$-Clustering problem, such as coresets and dimension reduction, have received significant attention in the literature. However, the space complexity of the clustering problem, specifically the number of bits required to compress the cost function to within a multiplicative error of $\varepsilon$, remains unclear in the existing literature. This paper initiates the study of the space complexity of Euclidean $(k, z)$-Clustering and offers both upper and lower bounds. Our space bounds are nearly tight when $k$ is constant, indicating that storing a coreset, a well-known data compression approach, serves as the optimal compression scheme. Furthermore, our lower bound for $(k, z)$-Clustering establishes a tight space bound of $\Theta(nd)$ for terminal embedding, where $n$ denotes the dataset size. Our technical approach leverages new geometric insights for principal angles and discrepancy methods, which may be of independent interest.
A subset $S$ of vertices in a graph $G$ is a secure total dominating set of $G$ if $S$ is a total dominating set of $G$ and, for each vertex $u \not\in S$, there is a vertex $v \in S$ such that $uv$ is an edge and $(S \setminus \{v\}) \cup \{u\}$ is also a total dominating set of $G$. We show that if $G$ is a maximal outerplanar graph of order $n$, then $G$ has a secure total dominating set of size at most $\lfloor 2n/3 \rfloor$. Moreover, if $G$ is an outerplanar graph of order $n$, then every secure total dominating set of $G$ has at least $\lceil (n+2)/3 \rceil$ vertices. We show that these bounds are best possible.
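The definition above can be verified directly on small graphs; the following sketch checks whether a given vertex set is a secure total dominating set (the adjacency-dictionary encoding and the example 4-cycle are illustrative assumptions, unrelated to the outerplanar bounds).

```python
def is_total_dominating(adj, S):
    """Every vertex of the graph (including those in S) has a neighbour in S."""
    return all(adj[v] & S for v in adj)

def is_secure_total_dominating(adj, S):
    """Direct check of the definition: S is a total dominating set and every
    u outside S has a neighbour v in S such that (S - {v}) | {u} is again a
    total dominating set.  adj: dict vertex -> set of neighbours."""
    S = set(S)
    if not is_total_dominating(adj, S):
        return False
    for u in adj:
        if u in S:
            continue
        if not any(is_total_dominating(adj, (S - {v}) | {u})
                   for v in adj[u] & S):
            return False
    return True

# Example: in the 4-cycle a-b-c-d-a, {a, b, c} is a secure total dominating set
c4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(is_secure_total_dominating(c4, {"a", "b", "c"}))  # True
```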
We study the problem of maintaining a lightweight bounded-degree $(1+\varepsilon)$-spanner of a dynamic point set in a $d$-dimensional Euclidean space, where $\varepsilon>0$ and $d$ are arbitrary constants. In our fully-dynamic setting, points are allowed to be inserted as well as deleted, and our objective is to maintain a $(1+\varepsilon)$-spanner that has constant bounds on its maximum degree and its lightness (the ratio of its weight to that of the minimum spanning tree), while minimizing the recourse, which is the number of edges added or removed by each point insertion or deletion. We present a fully-dynamic algorithm that handles point insertion with amortized constant recourse and point deletion with amortized $O(\log\Delta)$ recourse, where $\Delta$ is the aspect ratio of the point set.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
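For multi-class classification, a common instantiation of the Outlier Exposure idea adds a term that pushes the model's softmax outputs on auxiliary outliers towards the uniform distribution. The sketch below shows only this loss form under assumptions (a PyTorch classifier producing logits; the weight `lam` is chosen for illustration) and is not a complete training recipe.

```python
import torch
import torch.nn.functional as F

def oe_training_loss(logits_in, labels_in, logits_out, lam=0.5):
    """Sketch of an Outlier Exposure style objective for classification:
    standard cross-entropy on in-distribution data plus a term that pulls
    the softmax on auxiliary outliers towards the uniform distribution."""
    ce_in = F.cross_entropy(logits_in, labels_in)
    log_probs_out = F.log_softmax(logits_out, dim=1)
    # cross-entropy to the uniform distribution over the K classes:
    # -(1/K) * sum_k log p_k, averaged over the outlier batch
    ce_uniform = -log_probs_out.mean(dim=1).mean()
    return ce_in + lam * ce_uniform
```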