
In this paper we investigate the parameterized complexity of counting and detecting occurrences of small patterns in unit disk graphs: given an $n$-vertex unit disk graph $G$ with an embedding of ply $p$ (that is, the graph is represented as an intersection graph of closed disks of unit size, and each point of the plane is contained in at most $p$ disks) and a $k$-vertex unit disk graph $P$, count the number of (induced) copies of $P$ in $G$. For general patterns $P$, we give a $2^{O(pk/\log k)}n^{O(1)}$ time algorithm for counting pattern occurrences. We show this is tight, even for ply $p=2$ and $k=n$: any $2^{o(n/\log n)}n^{O(1)}$ time algorithm would violate the Exponential Time Hypothesis (ETH). For most natural classes of patterns, such as connected graphs and independent sets, we present the following results. First, we give a $(pk)^{O(\sqrt{pk})}n^{O(1)}$ time algorithm, which is nearly tight under the ETH for bounded ply and many patterns. Second, for $p = k^{O(1)}$ we provide a Turing kernelization (i.e., a polynomial-time preprocessing algorithm that reduces the instance size to $k^{O(1)}$). Our approach combines tools previously developed for planar subgraph isomorphism, such as `efficient inclusion-exclusion' from [Nederlof, STOC'20] and `isomorphism checks' from [Bodlaender et al., ICALP'16], with a different separator hierarchy and a new bound on the number of non-isomorphic separations of small order tailored to unit disk graphs.
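To make the objects in this abstract concrete, here is a small, purely illustrative Python sketch, not the paper's algorithm: it builds a unit disk graph from disk centers, assuming radius-1 disks so that two vertices are adjacent iff their centers are at distance at most $2$, and counts induced copies of a pattern by brute force in $n^{O(k)}$ time (the paper's contribution is precisely to beat this naive bound).

```python
# A minimal sketch of the problem setup, not of the paper's algorithm.
# Assumption (not from the paper): disks have radius 1, so two vertices
# are adjacent iff their centers are at Euclidean distance <= 2.
from itertools import combinations, permutations

def unit_disk_graph(centers):
    """Adjacency sets of the unit disk graph on the given centers."""
    n = len(centers)
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        (x1, y1), (x2, y2) = centers[i], centers[j]
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= 4.0:  # distance <= 2
            adj[i].add(j)
            adj[j].add(i)
    return adj

def count_induced_copies(adj_G, adj_P):
    """Brute-force count of vertex subsets of G inducing a copy of P."""
    n, k = len(adj_G), len(adj_P)
    count = 0
    for subset in combinations(range(n), k):
        for perm in permutations(subset):
            if all((perm[b] in adj_G[perm[a]]) == (b in adj_P[a])
                   for a in range(k) for b in range(a + 1, k)):
                count += 1
                break  # count each vertex subset at most once
    return count

G = unit_disk_graph([(0, 0), (1.5, 0), (3.0, 0), (1.5, 1.5)])  # a star K_{1,3}
P = unit_disk_graph([(0, 0), (1.5, 0), (3.0, 0)])              # a 3-vertex path
print(count_induced_copies(G, P))  # 3
```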

Related Content

In this paper, we investigate the effectiveness of utilizing CDF-based learned indexes in indexed-nested loop joins for both sorted and unsorted data in external memory. Our experimental study seeks to determine whether the advantages of learned indexes observed in in-memory joins by Sabek and Kraska (VLDB 2023) extend to the external-memory context. First, we introduce two optimizations for integrating learned indexes into external-memory joins. Subsequently, we conduct an extensive evaluation, employing hash join, sort join, and indexed-nested loop join with real-world and simulated datasets. Furthermore, we independently assess the learned-index-based join across various dimensions, including storage device types, key types, data sorting, parallelism, constrained memory settings, and increasing model error. Our experiments indicate that B-trees and learned indexes exhibit largely similar performance in external-memory joins. Learned indexes offer advantages in terms of smaller index size and faster lookup performance; however, their construction time is approximately $1000\times$ that of B-trees. While learned indexes can be significantly smaller ($2\times$-$4\times$) than the internal nodes of a B-tree index, these internal nodes constitute only 0.4% to 1% of the data size and typically fit in main memory in most practical scenarios. Additionally, unlike in the in-memory setting, learned indexes can prioritize faster construction over accuracy (a larger error window) without significantly affecting query performance.
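As background, the following is a hedged sketch of the kind of CDF-based learned index discussed above, assuming a single linear model with a recorded maximum error window; the paper's external-memory optimizations are not reflected here.

```python
# A hedged sketch of a CDF-based learned index lookup (illustrative only).
import bisect

class LinearLearnedIndex:
    """Fits position ~ slope * key + intercept on sorted keys and records
    the maximum prediction error to bound the local search window."""
    def __init__(self, keys):
        self.keys = keys                      # sorted list of keys
        n = len(keys)
        # Least-squares fit of position against key (one-segment model).
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        var = sum((k - mean_k) ** 2 for k in keys) or 1.0
        self.slope = cov / var
        self.intercept = mean_p - self.slope * mean_k
        # Max error defines the window searched around each prediction.
        self.err = max(abs(round(self._predict(k)) - i)
                       for i, k in enumerate(keys))

    def _predict(self, key):
        return self.slope * key + self.intercept

    def lookup(self, key):
        pos = round(self._predict(key))
        lo = max(0, pos - self.err)
        hi = min(len(self.keys), pos + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

idx = LinearLearnedIndex(list(range(0, 1000, 3)))
print(idx.lookup(999), idx.lookup(5))   # 333 None
```

A larger recorded error (as in the abstract's final observation) only widens the binary-search window, which is why lookup cost degrades gracefully.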

Simplicial sets generalize many categories of graphs. In this paper, we give a complete characterization of the Lawvere-Tierney topologies on (semi-)simplicial sets, on bicolored graphs, and on fuzzy sets. We apply our results to establish that 'partially simple' simplicial sets and 'partially simple' graphs form quasitoposes.

In this paper, we derive distributional convergence rates for the magnetization vector and the maximum pseudolikelihood estimator of the inverse temperature parameter in the tensor Curie-Weiss Potts model. Limit theorems for the magnetization vector have been derived recently in Bhowal and Mukherjee (2023), where several phase transition phenomena in terms of the scaling of the (centered) magnetization and its asymptotic distribution were established, depending upon the position of the true parameters in the parameter space. In the current work, we establish Berry-Esseen type results for the magnetization vector, specifying its rate of convergence at these different phases. At most points in the parameter space, this rate is $N^{-1/2}$ ($N$ being the size of the Curie-Weiss network), while at some "special" points, the rate is either $N^{-1/4}$ or $N^{-1/6}$, depending upon the behavior of the fourth derivative of a certain "negative free energy function" at these special points. These results are then used to derive Berry-Esseen type bounds for the maximum pseudolikelihood estimator of the inverse temperature parameter whenever it lies above a certain criticality threshold.
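For readers less familiar with the terminology, a Berry-Esseen type bound quantifies the rate of a distributional limit. In schematic one-dimensional form (given here as general background, not as the paper's exact statement; $F$ is the limiting distribution, $m_*$ the centering, $\alpha$ the phase-dependent scaling exponent, and $C$ a constant free of $N$):

$$\sup_{x\in\mathbb{R}} \left| \mathbb{P}\!\left( N^{\alpha}\,(\bar{m}_N - m_*) \le x \right) - F(x) \right| \;\le\; C\, N^{-\beta},$$

where, per the abstract, $\beta = 1/2$ at most points of the parameter space and $\beta \in \{1/4, 1/6\}$ at the special points.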

In this paper, we study second-order algorithms for the convex-concave minimax problem, which has attracted much attention in many fields such as machine learning in recent years. We propose a Lipschitz-free cubic regularization (LF-CR) algorithm for solving the convex-concave minimax optimization problem without knowing the Lipschitz constant. It can be shown that the iteration complexity of the LF-CR algorithm to obtain an $\epsilon$-optimal solution with respect to the restricted primal-dual gap is upper bounded by $\mathcal{O}\left(\left(\frac{\rho\|z^0-z^*\|^3}{\epsilon}\right)^{\frac{2}{3}}\right)$, where $z^0=(x^0,y^0)$ is a pair of initial points, $z^*=(x^*,y^*)$ is a pair of optimal solutions, and $\rho$ is the Lipschitz constant. We further propose a fully parameter-free cubic regularization (FF-CR) algorithm that does not require any parameters of the problem, including the Lipschitz constant and the upper bound of the distance from the initial point to the optimal solution. We also prove that the iteration complexity of the FF-CR algorithm to obtain an $\epsilon$-optimal solution with respect to the gradient norm is upper bounded by $\mathcal{O}\left(\left(\frac{\rho\|z^0-z^*\|^2}{\epsilon}\right)^{\frac{2}{3}}\right)$. Numerical experiments show the efficiency of both algorithms. To the best of our knowledge, the proposed FF-CR algorithm is the first completely parameter-free second-order algorithm for solving convex-concave minimax optimization problems, and its iteration complexity is consistent with the optimal iteration complexity lower bound of existing second-order algorithms with parameters for solving convex-concave minimax problems.
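For reference, the restricted primal-dual gap used as the optimality measure above is standardly defined as follows (a textbook definition given as background, with $\mathcal{B}_x$ and $\mathcal{B}_y$ bounded sets containing the iterates, not a formula quoted from the paper):

$$\mathrm{Gap}(\hat{x},\hat{y}) \;=\; \max_{y\in\mathcal{B}_y} f(\hat{x}, y) \;-\; \min_{x\in\mathcal{B}_x} f(x, \hat{y}),$$

and a pair $\hat{z}=(\hat{x},\hat{y})$ is $\epsilon$-optimal in this sense when $\mathrm{Gap}(\hat{x},\hat{y}) \le \epsilon$.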

In this paper, we present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The method is a variant of the Deterministic Information Bottleneck algorithm which optimally compresses the data while retaining relevant information about the underlying structure. We compare the performance of the proposed method to that of three well-established clustering methods (KAMILA, K-Prototypes, and Partitioning Around Medoids with Gower's dissimilarity) on simulated and real-world datasets. The results demonstrate that the proposed approach represents a competitive alternative to conventional clustering techniques under specific conditions.
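For context, in the original Deterministic Information Bottleneck of Strouse and Schwab (of which the method above is a variant), compression is a hard assignment of each data point $x$ to a cluster $t$; one common form of the assignment update is shown below (quoted from the general DIB literature rather than from this paper, whose mixed-type objective may differ):

$$f(x) \;=\; \underset{t}{\arg\max}\;\Big[\, \log q(t) \;-\; \beta\, D_{\mathrm{KL}}\big( p(y \mid x)\,\big\|\, q(y \mid t) \big) \,\Big],$$

where $q(t)$ is the cluster marginal, $q(y\mid t)$ summarizes the relevant variable within cluster $t$, and $\beta$ trades compression against retained relevant information.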

In this paper, the problem of minimum rate maximization for probabilistic semantic communication (PSCom) in the industrial Internet of Things (IIoT) is investigated. In the considered model, users employ semantic information extraction techniques to compress the original data before sending it to the base station (BS). During this semantic compression process, knowledge graphs are employed to represent the semantic information, and the probability graph shared between the users and the BS is utilized to further compress the knowledge graph. The semantic compression process significantly reduces the transmitted data size, but it inevitably introduces additional computation overhead. Considering the limited power budget of the users, we formulate a joint communication and computation optimization problem that aims to maximize the minimum equivalent rate among all users while meeting total power and semantic compression ratio constraints. To address this problem, two algorithms with different computational complexities are proposed to obtain suboptimal solutions. One algorithm is based on a pro-rata distribution of the transmission power, while the other traverses the combinations of semantic compression ratios among all users. In both algorithms, bisection is employed to find the greatest minimum equivalent rate. The simulation results validate the effectiveness of the proposed algorithms.
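The bisection step described above admits a simple sketch. In the following hedged Python illustration, `feasible` is a hypothetical stand-in for the paper's feasibility check, which would verify that some power and semantic-compression-ratio allocation gives every user at least the target rate.

```python
# A hedged sketch of bisection on the minimum equivalent rate (illustrative;
# `feasible` is a hypothetical stand-in for the paper's feasibility check).
def max_min_rate(feasible, lo=0.0, hi=1e9, tol=1e-6):
    """Binary-search the largest rate r such that some allocation
    achieves an equivalent rate >= r for every user."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):   # can all users reach rate mid within budget?
            lo = mid        # mid is achievable; try a larger target
        else:
            hi = mid        # mid is not achievable; shrink the target
    return lo

# Toy stand-in: rates up to 3.7 are "achievable".
print(round(max_min_rate(lambda r: r <= 3.7), 3))  # ~3.7
```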

As the number of large language models (LLMs) released to the public grows, there is a pressing need to understand the safety implications of these models learning from third-party custom finetuning data. We explore the behavior of LLMs finetuned on noisy custom data containing unsafe content, represented by datasets that contain biases, toxicity, and harmfulness. We find that while aligned LLMs can readily learn this unsafe content, they also tend to forget it more significantly than other examples when subsequently finetuned on safer content. Drawing inspiration from these discrepancies in forgetting, we introduce the "ForgetFilter" algorithm, which filters unsafe data based on how strong the model's forgetting signal is for that data. We demonstrate that ForgetFilter ensures safety in customized finetuning without compromising downstream task performance, unlike sequential safety finetuning. ForgetFilter outperforms alternative strategies such as replay and moral self-correction in curbing LLMs' ability to assimilate unsafe content during custom finetuning: for example, the resulting toxicity score is 75% lower than when no safety measures are applied and 62% lower than when using self-correction.
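A hedged sketch of the filtering idea, as we read it from the abstract (our reading, not the authors' code): examples whose loss rises the most after safety finetuning exhibit the strongest forgetting signal and are filtered out.

```python
# Illustrative ForgetFilter-style filtering; the threshold and the way the
# forgetting signal is measured are our assumptions, not the paper's.
def forget_filter(examples, loss_before, loss_after, threshold):
    """Keep examples whose forgetting signal stays below `threshold`.

    loss_before[i]: per-example loss right after custom finetuning.
    loss_after[i]:  per-example loss after subsequent safety finetuning.
    """
    kept = []
    for ex, lb, la in zip(examples, loss_before, loss_after):
        forgetting_signal = la - lb   # how much the model "forgot" ex
        if forgetting_signal < threshold:
            kept.append(ex)           # weak signal: likely safe content
    return kept

data = ["recipe", "slur", "faq"]
print(forget_filter(data, [0.9, 1.0, 0.8], [1.0, 3.2, 0.9], threshold=0.5))
# ['recipe', 'faq']  -- the strongly forgotten example is filtered out
```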

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships of the latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
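To fix ideas, here is a hedged PyTorch sketch of a decompose-weigh-reconstruct pipeline in the spirit of the abstract; the layer sizes, the sigmoid weighting used for the intra-feature module, and the single linear mixing layer used for the inter-feature module are our assumptions, not the authors' architecture.

```python
# A toy FDN/FRN-style pipeline (illustrative, not the paper's model).
import torch
import torch.nn as nn

class FDRLSketch(nn.Module):
    def __init__(self, feat_dim=512, latent_dim=64, num_latents=9, classes=7):
        super().__init__()
        # FDN: decompose backbone features into facial action-aware latents.
        self.decompose = nn.ModuleList(
            [nn.Linear(feat_dim, latent_dim) for _ in range(num_latents)])
        # FRN, intra-feature: a learned importance weight per latent feature.
        self.intra = nn.ModuleList(
            [nn.Sequential(nn.Linear(latent_dim, 1), nn.Sigmoid())
             for _ in range(num_latents)])
        # FRN, inter-feature: mix information across the latent features,
        # then reconstruct a single expression feature.
        self.inter = nn.Linear(num_latents * latent_dim, latent_dim)
        self.classifier = nn.Linear(latent_dim, classes)

    def forward(self, x):
        latents = [fdn(x) for fdn in self.decompose]          # shared info
        weighted = [w(z) * z for w, z in zip(self.intra, latents)]
        expr = self.inter(torch.cat(weighted, dim=-1))        # reconstruction
        return self.classifier(expr)

logits = FDRLSketch()(torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 7])
```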

In this paper, we propose to apply a meta-learning approach to low-resource automatic speech recognition (ASR). We formulate ASR for different languages as different tasks and meta-learn the initialization parameters from many pretraining languages to achieve fast adaptation to an unseen target language, via the recently proposed model-agnostic meta-learning (MAML) algorithm. We evaluate the proposed approach using six languages as pretraining tasks and four languages as target tasks. Preliminary results show that the proposed method, MetaASR, significantly outperforms the state-of-the-art multitask pretraining approach on all target languages with different combinations of pretraining languages. In addition, owing to MAML's model-agnostic property, this paper also opens a new research direction of applying meta-learning to more speech-related applications.
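For readers new to MAML, a hedged, toy sketch of the first-order variant follows; the paper applies MAML to full ASR models, whereas the "tasks" here are synthetic one-dimensional regressions standing in for languages.

```python
# Toy first-order MAML (illustrative only; not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Each 'language' is a toy task: fit y = w_true * x."""
    w_true = rng.uniform(-2, 2)
    x = rng.uniform(-1, 1, size=20)
    return x, w_true * x

def grad(w, x, y):
    return 2 * np.mean((w * x - y) * x)   # d/dw of mean squared error

w_meta, inner_lr, outer_lr = 0.0, 0.1, 0.01
for step in range(2000):
    x, y = sample_task()
    # Inner loop: adapt to the task on its "support" half.
    w_adapted = w_meta - inner_lr * grad(w_meta, x[:10], y[:10])
    # First-order MAML: outer update uses the gradient at the adapted weights,
    # evaluated on the "query" half.
    w_meta -= outer_lr * grad(w_adapted, x[10:], y[10:])

# After meta-training, w_meta is an initialization meant to adapt quickly.
x, y = sample_task()
w_fast = w_meta - inner_lr * grad(w_meta, x, y)
print(round(float(w_fast), 3))
```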
