成人艳情一二三区按摩_亚洲色大成人WWW_2023国产精品啪啪视频_激情五月图片偷拍_国产真实视频网址在线观看_中文字幕日韩欧美性爱网站_灰色丝袜美女一区二区三区

Embedding tables dominate industrial-scale recommendation model sizes, using up to terabytes of memory. A popular and the largest publicly available machine learning MLPerf benchmark on recommendation data is a Deep Learning Recommendation Model (DLRM) trained on a terabyte of click-through data. It contains 100GB of embedding memory (25+Billion parameters). DLRMs, due to their sheer size and the associated volume of data, face difficulty in training, deploying for inference, and memory bottlenecks due to large embedding tables. This paper analyzes and extensively evaluates a generic parameter sharing setup (PSS) for compressing DLRM models. We show theoretical upper bounds on the learnable memory requirements for achieving $(1 \pm \epsilon)$ approximations to the embedding table. Our bounds indicate exponentially fewer parameters suffice for good accuracy. To this end, we demonstrate a PSS DLRM reaching 10000$\times$ compression on criteo-tb without losing quality. Such a compression, however, comes with a caveat. It requires 4.5 $\times$ more iterations to reach the same saturation quality. The paper argues that this tradeoff needs more investigations as it might be significantly favorable. Leveraging the small size of the compressed model, we show a 4.3$\times$ improvement in training latency leading to similar overall training times. Thus, in the tradeoff between system advantage of a small DLRM model vs. slower convergence, we show that scales are tipped towards having a smaller DLRM model, leading to faster inference, easier deployment, and similar training times.

相關內容

MoDELS

關注 43

ACM/IEEE第23屆模型驅動工程語言和系統國際會議，是模型驅動軟件和系統工程的首要會議系列，由ACM-SIGSOFT和IEEE-TCSE支持組織。自1998年以來，模型涵蓋了建模的各個方面，從語言和方法到工具和應用程序。模特的參加者來自不同的背景，包括研究人員、學者、工程師和工業專業人士。MODELS 2019是一個論壇，參與者可以圍繞建模和模型驅動的軟件和系統交流前沿研究成果和創新實踐經驗。今年的版本將為建模社區提供進一步推進建模基礎的機會，并在網絡物理系統、嵌入式系統、社會技術系統、云計算、大數據、機器學習、安全、開源等新興領域提出建模的創新應用以及可持續性。官網鏈接： · 極大 · PODC · MoDELS · 團 ·

2022 年 9 月 19 日

Rounds vs Communication Tradeoffs for Maximal Independent Sets

Sepehr Assadi,Gillat Kol,Zhijun Zhang

from arxiv, Full version of the paper in FOCS 2022, 44 pages

We consider the problem of finding a maximal independent set (MIS) in the shared blackboard communication model with vertex-partitioned inputs. There are $n$ players corresponding to vertices of an undirected graph, and each player sees the edges incident on its vertex -- this way, each edge is known by both its endpoints and is thus shared by two players. The players communicate in simultaneous rounds by posting their messages on a shared blackboard visible to all players, with the goal of computing an MIS of the graph. While the MIS problem is well studied in other distributed models, and while shared blackboard is, perhaps, the simplest broadcast model, lower bounds for our problem were only known against one-round protocols. We present a lower bound on the round-communication tradeoff for computing an MIS in this model. Specifically, we show that when $r$ rounds of interaction are allowed, at least one player needs to communicate $\Omega(n^{1/20^{r+1}})$ bits. In particular, with logarithmic bandwidth, finding an MIS requires $\Omega(\log\log{n})$ rounds. This lower bound can be compared with the algorithm of Ghaffari, Gouleakis, Konrad, Mitrovi\'c, and Rubinfeld [PODC 2018] that solves MIS in $O(\log\log{n})$ rounds but with a logarithmic bandwidth for an average player. Additionally, our lower bound further extends to the closely related problem of maximal bipartite matching. To prove our results, we devise a new round elimination framework, which we call partial-input embedding, that may also be useful in future work for proving round-sensitive lower bounds in the presence of edge-sharing between players. Finally, we discuss several implications of our results to multi-round (adaptive) distributed sketching algorithms, broadcast congested clique, and to the welfare maximization problem in two-sided matching markets.

Attention · state-of-the-art · Performance · Analysis · 基準 ·

2022 年 9 月 16 日

Recursive Attentive Methods with Reused Item Representations for Sequential Recommendation

Bo Peng,Srinivasan Parthasarathy,Xia Ning

Sequential recommendation aims to recommend the next item of users' interest based on their historical interactions. Recently, the self-attention mechanism has been adapted for sequential recommendation, and demonstrated state-of-the-art performance. However, in this manuscript, we show that the self-attention-based sequential recommendation methods could suffer from the localization-deficit issue. As a consequence, in these methods, over the first few blocks, the item representations may quickly diverge from their original representations, and thus, impairs the learning in the following blocks. To mitigate this issue, in this manuscript, we develop a recursive attentive method with reused item representations (RAM) for sequential recommendation. We compare RAM with five state-of-the-art baseline methods on six public benchmark datasets. Our experimental results demonstrate that RAM significantly outperforms the baseline methods on benchmark datasets, with an improvement of as much as 11.3%. Our stability analysis shows that RAM could enable deeper and wider models for better performance. Our run-time performance comparison signifies that RAM could also be more efficient on benchmark datasets.

Learning · 機器學習建模 · 列 · 變換 · MoDELS ·

2022 年 9 月 16 日

TransTab: Learning Transferable Tabular Transformers Across Tables

Zifeng Wang,Jimeng Sun

from arxiv, NeurIPS 2022

Tabular data (or tables) are the most widely used data format in machine learning (ML). However, ML models often assume the table structure keeps fixed in training and testing. Before ML modeling, heavy data cleaning is required to merge disparate tables with different columns. This preprocessing often incurs significant data waste (e.g., removing unmatched columns and samples). How to learn ML models from multiple tables with partially overlapping columns? How to incrementally update ML models as more columns become available over time? Can we leverage model pretraining on multiple distinct tables? How to train an ML model which can predict on an unseen table? To answer all those questions, we propose to relax fixed table structures by introducing a Transferable Tabular Transformer (TransTab) for tables. The goal of TransTab is to convert each sample (a row in the table) to a generalizable embedding vector, and then apply stacked transformers for feature encoding. One methodology insight is combining column description and table cells as the raw input to a gated transformer model. The other insight is to introduce supervised and self-supervised pretraining to improve model performance. We compare TransTab with multiple baseline methods on diverse benchmark datasets and five oncology clinical trial datasets. Overall, TransTab ranks 1.00, 1.00, 1.78 out of 12 methods in supervised learning, feature incremental learning, and transfer learning scenarios, respectively; and the proposed pretraining leads to 2.3% AUC lift on average over the supervised learning.

邊緣化 · Learning · 線性的 · 模型評估 · 類標記 ·

2022 年 9 月 15 日

Private Synthetic Data for Multitask Learning and Marginal Queries

Giuseppe Vietri,Cedric Archambeau,Sergul Aydore,William Brown,Michael Kearns,Aaron Roth,Ankit Siva,Shuai Tang,Zhiwei Steven Wu

from arxiv, The short version of this paper appears in the proceedings of NeurIPS-22

We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, in contrast to a number of related prior approaches which require numerical features to be first converted into {high cardinality} categorical features via {a binning strategy}. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data preserving large numbers of statistical queries such as marginals on numerical features, and class conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property that is needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high quality synthetic data for mixed marginal queries, that combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques, and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.

MoDELS · Learning · entity · 知識 (knowledge) · 泛化理論 ·

2022 年 9 月 15 日

PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling

Guanting Dong,Daichi Guo,Liwen Wang,Xuefeng Li,Zechen Wang,Chen Zeng,Keqing He,Jinzheng Zhao,Hao Lei,Xinyue Cui,Yi Huang,Junlan Feng,Weiran Xu

from arxiv, Accepted by COLING 2022

Most existing slot filling models tend to memorize inherent patterns of entities and corresponding contexts from training data. However, these models can lead to system failure or undesirable outputs when being exposed to spoken language perturbation or variation in practice. We propose a perturbed semantic structure awareness transferring method for training perturbation-robust slot filling models. Specifically, we introduce two MLM-based training strategies to respectively learn contextual semantic structure and word distribution from unsupervised language perturbation corpus. Then, we transfer semantic knowledge learned from upstream training procedure into the original samples and filter generated data by consistency processing. These procedures aim to enhance the robustness of slot filling models. Experimental results show that our method consistently outperforms the previous basic methods and gains strong generalization while preventing the model from memorizing inherent patterns of entities and contexts.

學成 · SSL · Taxonomy · 特化 · 未標記 ·

2022 年 3 月 29 日

Self-Supervised Learning for Recommender Systems: A Survey

Junliang Yu,Hongzhi Yin,Xin Xia,Tong Chen,Jundong Li,Zi Huang

from arxiv, 20pages. Submitted to TKDE

Neural architecture-based recommender systems have achieved tremendous success in recent years. However, when dealing with highly sparse data, they still fall short of expectation. Self-supervised learning (SSL), as an emerging technique to learn with unlabeled data, recently has drawn considerable attention in many fields. There is also a growing body of research proceeding towards applying SSL to recommendation for mitigating the data sparsity issue. In this survey, a timely and systematical review of the research efforts on self-supervised recommendation (SSR) is presented. Specifically, we propose an exclusive definition of SSR, on top of which we build a comprehensive taxonomy to divide existing SSR methods into four categories: contrastive, generative, predictive, and hybrid. For each category, the narrative unfolds along its concept and formulation, the involved methods, and its pros and cons. Meanwhile, to facilitate the development and evaluation of SSR models, we release an open-source library SELFRec, which incorporates multiple benchmark datasets and evaluation metrics, and has implemented a number of state-of-the-art SSR models for empirical comparison. Finally, we shed light on the limitations in the current research and outline the future research directions.

Networking · MoDELS · 卷積神經網絡 · 變換 · DNN ·

2021 年 8 月 30 日

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Yucheng Zhao,Guangting Wang,Chuanxin Tang,Chong Luo,Wenjun Zeng,Zheng-Jun Zha

Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision. Recently, Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, started to lead new trends as they showed promising results in the ImageNet classification task. In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons. To ensure a fair comparison, we first develop a unified framework called SPACH which adopts separate modules for spatial and channel processing. Our experiments under the SPACH framework reveal that all structures can achieve competitive performance at a moderate scale. However, they demonstrate distinctive behaviors when the network size scales up. Based on our findings, we propose two hybrid models using convolution and Transformer modules. The resulting Hybrid-MS-S+ model achieves 83.9% top-1 accuracy with 63M parameters and 12.3G FLOPS. It is already on par with the SOTA models with sophisticated designs. The code and models will be made publicly available.

控制器 · INTERACT · state-of-the-art · 模型評估 · Next ·

2020 年 8 月 3 日

Controllable Multi-Interest Framework for Recommendation

Yukuo Cen,Jianwei Zhang,Xu Zou,Chang Zhou,Hongxia Yang,Jie Tang

from arxiv, Accepted to KDD 2020

Recently, neural networks have been widely used in e-commerce recommender systems, owing to the rapid development of deep learning. We formalize the recommender system as a sequential recommendation problem, intending to predict the next items that the user might be interacted with. Recent works usually give an overall embedding from a user's behavior sequence. However, a unified user embedding cannot reflect the user's multiple interests during a period. In this paper, we propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec. Our multi-interest module captures multiple interests from user behavior sequences, which can be exploited for retrieving candidate items from the large-scale item pool. These items are then fed into an aggregation module to obtain the overall recommendation. The aggregation module leverages a controllable factor to balance the recommendation accuracy and diversity. We conduct experiments for the sequential recommendation on two real-world datasets, Amazon and Taobao. Experimental results demonstrate that our framework achieves significant improvements over state-of-the-art models. Our framework has also been successfully deployed on the offline Alibaba distributed cloud platform.

MoDELS · Transformer模型 · 變換 · 推斷 · 模型評估 ·

2020 年 6 月 23 日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li,Eric Wallace,Sheng Shen,Kevin Lin,Kurt Keutzer,Dan Klein,Joseph E. Gonzalez

from arxiv, ICML 2020

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.

學成 · 替代損失 · 在線 · Bandits · 賭博機/老虎機 ·

2019 年 12 月 31 日

A Modern Introduction to Online Learning

Francesco Orabona

In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiation of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.