The stringent low-latency, high-reliability, availability, and resilience requirements of 6G use cases will present challenges to cloud providers. Currently, cloud providers lack simple, efficient, and secure provisioning solutions that meet these challenges. Multi-cloud federation is a promising approach. In this paper, we evaluate the application of private and public blockchain networks for multi-cloud federation. We compare the performance of blockchain-based federation in private and public blockchain networks and their integration with a production-ready orchestration solution. Our results show that the public blockchain needs approximately 91 seconds to complete the federation procedure, compared to 48 seconds in the private blockchain scenario.
Industrial recommender systems usually consist of a retrieval stage and a ranking stage in order to handle billion-scale numbers of users and items. The retrieval stage retrieves candidate items relevant to user interests for recommendation and has attracted much attention. Frequently, a user shows refined multi-interests organized in a hierarchical structure. For example, a user may like Conan and Kuroba Kaito, characters within the hierarchy "Animation, Japanese Animation, Detective Conan". However, most existing methods ignore this hierarchical nature and simply average the fine-grained interest information. Therefore, we propose a novel two-stage approach that explicitly models refined multi-interests in a hierarchical structure for recommendation. In the first stage, hierarchical multi-interest mining, a hierarchical clustering and transformer-based model adaptively generates circles or sub-circles that users are interested in. In the second stage, partitioning the retrieval space allows the EBR models to deal only with items within each circle and to accurately capture users' refined interests. Experimental results show that the proposed approach achieves state-of-the-art performance. Our framework has also been deployed at Lofter.
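To make the two-stage pattern concrete, the following is a minimal sketch, assuming toy item and user embeddings, an arbitrary circle count, and plain dot-product retrieval; it illustrates the circle-then-retrieve idea rather than the paper's actual mining and EBR models.

# Hedged sketch of the two-stage idea: (1) hierarchically cluster item
# embeddings into interest "circles", (2) retrieve only within a circle.
# Embeddings, circle count, and scoring are illustrative placeholders,
# not the paper's actual model.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(1000, 64)).astype(np.float32)   # toy item embeddings
user_emb = rng.normal(size=(64,)).astype(np.float32)        # toy user interest vector

# Stage 1: hierarchical clustering of items into circles.
circles = AgglomerativeClustering(n_clusters=8).fit_predict(item_emb)

# Stage 2: pick the circle closest to the user's interest, then run
# embedding-based retrieval (dot-product top-k) only inside that circle.
centroids = np.stack([item_emb[circles == c].mean(axis=0) for c in range(8)])
best_circle = int(np.argmax(centroids @ user_emb))
in_circle = np.where(circles == best_circle)[0]
scores = item_emb[in_circle] @ user_emb
top_k = in_circle[np.argsort(-scores)[:20]]                  # candidate items
print(best_circle, top_k[:5])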
In this work, we provide four methods for constructing new maximum sum-rank distance (MSRD) codes. The first method, a variant of Cartesian products, allows faster decoding than known MSRD codes with the same parameters. The other three methods allow us to extend or modify existing MSRD codes in order to obtain new explicit MSRD codes for sets of matrix sizes (numbers of rows and columns in the different blocks) that were not attainable by previous constructions. In this way, we show that MSRD codes exist (by giving explicit constructions) for new ranges of parameters, in particular with different numbers of rows and columns at different positions.
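For reference, here is a minimal LaTeX sketch of the sum-rank weight and distance underlying MSRD codes; the block sizes are kept generic, and the Singleton-type bound that MSRD codes attain is only referenced in a comment rather than stated formally.

% Sum-rank weight of a tuple of matrices X = (X_1, ..., X_ell),
% where X_i has size m_i x n_i over F_q (block sizes may differ).
\[
  \mathrm{wt}_{\mathrm{SR}}(X) \;=\; \sum_{i=1}^{\ell} \operatorname{rank}(X_i),
  \qquad
  d_{\mathrm{SR}}(X, Y) \;=\; \mathrm{wt}_{\mathrm{SR}}(X - Y).
\]
% A code is MSRD when its minimum sum-rank distance attains the
% Singleton-type bound associated with this metric.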
In this work, we propose a novel activation mechanism aimed at establishing layer-level activation (LayerAct) functions for CNNs with BatchNorm. These functions are designed to be more noise-robust than existing element-level activation functions by reducing the layer-level fluctuation of the activation outputs caused by shifts in the inputs. Moreover, the LayerAct functions achieve this noise-robustness independently of the activation's saturation state, which otherwise limits the activation output space and complicates efficient training. We present analyses and experiments demonstrating that LayerAct functions exhibit superior noise-robustness compared to element-level activation functions, and we empirically show that these functions have a zero-like mean activation. Experimental results on three clean and three out-of-distribution benchmark datasets for image classification show that LayerAct functions excel at handling noisy datasets, outperforming element-level activation functions, while their performance on clean datasets is also superior in most cases.
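The contrast between element-level and layer-level activation can be illustrated with the hedged sketch below. It assumes, purely for illustration, that the layer-level scale is derived from a layer-direction normalization of the pre-activations; this is one plausible reading of the idea, not the paper's exact LayerAct definition.

# Hedged illustration of layer-level vs. element-level activation.
# ASSUMPTION: the layer-level scale comes from a layer-direction
# normalization of the pre-activations; this is an illustrative reading,
# not the paper's exact LayerAct formulation.
import torch

def element_silu(x):
    # Element-level: each unit's scale depends only on its own value.
    return x * torch.sigmoid(x)

def layer_level_silu(x, eps=1e-5):
    # Layer-level: every unit's scale is computed from statistics shared
    # across the whole layer, damping per-element fluctuations.
    mu = x.mean(dim=1, keepdim=True)
    var = x.var(dim=1, keepdim=True, unbiased=False)
    normed = (x - mu) / torch.sqrt(var + eps)
    return x * torch.sigmoid(normed)

x = torch.randn(4, 16)           # (batch, features)
print(element_silu(x).shape, layer_level_silu(x).shape)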
This paper introduces three key initiatives in the pursuit of a hybrid decoding framework characterized by superior decoding performance, high throughput, low complexity, and independence from the channel noise variance. Firstly, adopting a graph neural network perspective, we propose a design methodology for a family of neural min-sum variants. We explore the frame error rates of the different decoding variants and the impact of their decoding failures on the subsequent ordered statistics decoding. Notably, these neural min-sum variants exhibit largely indistinguishable performance, so the simplest member is chosen as the constituent of the hybrid decoder. Secondly, to address the computational complexity of exhaustively searching for authentic error patterns after a decoding failure, two alternative implementations of ordered statistics decoding are proposed. The first approach groups test error patterns uniformly, while the second dynamically generates qualified test error patterns of varying sizes for each group. In both methods, group priorities are determined empirically. Thirdly, iteration diversity is highlighted for LDPC codes that require a large maximum number of decoding iterations. This is achieved by segmenting the long iterative decoding trajectory of a decoding failure into shorter segments, which are then independently fed to small models to increase the chance of recovering the authentic error pattern. These ideas are substantiated through extensive simulation results covering codes with block lengths ranging from one hundred to several hundred.
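As background on the neural min-sum family (not necessarily the specific variant selected in the paper), the hedged sketch below shows a single normalized min-sum check-node update with one scaling factor alpha, the quantity that neural min-sum decoders typically make learnable; the input LLRs and alpha value are toy placeholders rather than trained parameters.

# Hedged sketch of one normalized min-sum check-node update, the basic
# building block behind neural min-sum decoders (alpha is what such
# decoders learn, here fixed to a toy value).
import numpy as np

def check_node_update(llr_in, alpha=0.8):
    """llr_in: incoming variable-to-check LLRs at one check node."""
    llr_in = np.asarray(llr_in, dtype=float)
    out = np.empty_like(llr_in)
    for i in range(len(llr_in)):
        others = np.delete(llr_in, i)          # extrinsic: exclude edge i
        sign = np.prod(np.sign(others))
        out[i] = alpha * sign * np.min(np.abs(others))
    return out

print(check_node_update([1.2, -0.4, 2.5, -3.0]))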
Industrial recommender systems usually consist of a retrieval stage and a ranking stage in order to handle billion-scale numbers of users and items. The retrieval stage retrieves candidate items relevant to user interests for recommendation and has attracted much attention. Frequently, users show hierarchical multi-interests: for example, a heavy user of a particular NBA team, the Golden State Warriors, within Sports may also be a light user of almost the whole Animation category, where Sports and Animation sit at the same level of the hierarchy. However, most existing methods learn this hierarchical difference only implicitly, causing fine-grained interest information to be averaged away and limiting a detailed understanding of the user's different needs across heavy and light interests. Therefore, in this work we propose a novel two-stage approach that explicitly models hierarchical multi-interests for recommendation. In the first stage, hierarchical multi-interest mining, a hierarchical clustering and transformer-based model adaptively generates circles or sub-circles that users are interested in. In the second stage, partitioning the retrieval space allows the EBR models to deal only with items within each circle and to accurately capture users' refined interests. Experimental results show that the proposed approach achieves state-of-the-art performance. Our framework has also been successfully deployed at Lofter (one of the largest derivative content communities, with 10 million monthly active users) for over four months.
It is well established that the Lipschitz constant of a neural network plays a prominent role in ensuring or certifying its robustness. However, computing it exactly is NP-hard. In this note, by taking the activation regions at each layer into account as new constraints, we propose new quadratically constrained MIP formulations of the neural network Lipschitz estimation problem. The solutions of these problems give lower and upper bounds on the Lipschitz constant, and we detail conditions under which they coincide with the exact Lipschitz constant.
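For context, here is a hedged LaTeX sketch of the standard big-M mixed-integer encoding of a ReLU unit on which MIP-based Lipschitz formulations commonly build; the pre-activation bounds are assumed to be precomputed, and this is background material, not the authors' exact quadratically constrained formulation.

% Standard big-M MIP encoding of y = max(0, x), with precomputed bounds
% l <= x <= u and a binary indicator z selecting the activation region.
\[
  y \ge x, \qquad y \ge 0, \qquad
  y \le x - \ell\,(1 - z), \qquad
  y \le u\,z, \qquad z \in \{0, 1\}.
\]
% z = 1 forces y = x (active region); z = 0 forces y = 0 (inactive region).
% Enumerating such region indicators across layers is what makes exact
% Lipschitz computation hard.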
The emergence of industrial-scale speech recognition (ASR) models such as Whisper and USM, trained on 1M hours of weakly labelled data and 12M hours of audio-only proprietary data respectively, has led to a stronger need for large-scale public ASR corpora and competitive open-source pipelines. Unlike these models, large language models are typically based on Transformer decoders, and it remains unclear whether decoder-only models trained on public data alone can deliver competitive performance. In this work, we investigate factors such as the choice of training datasets and the modeling components necessary for obtaining the best performance using public English ASR corpora alone. Our Decoder-Only Transformer for ASR (DOTA) model comprehensively outperforms the encoder-decoder open-source replication of Whisper (OWSM) on nearly all English ASR benchmarks and outperforms Whisper large-v3 on 7 out of 15 test sets. We release our codebase and model checkpoints under a permissive license.
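The decoder-only pattern for ASR can be sketched as below: audio frames are projected into the token embedding space and prepended as a prefix to the text tokens of a causal Transformer. The dimensions, layer counts, and the use of torch.nn.TransformerEncoder with a causal mask are illustrative assumptions, not the DOTA architecture itself.

# Hedged sketch of a decoder-only ASR forward pass: audio features become
# a prefix in the same sequence as text tokens; a causal mask makes the
# stack behave as a decoder-only model. Sizes are illustrative only.
import torch
import torch.nn as nn

d_model, vocab = 256, 1000
audio_proj = nn.Linear(80, d_model)            # 80-dim log-mel frames
tok_emb = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab)

audio = torch.randn(2, 50, 80)                 # (batch, frames, mel bins)
tokens = torch.randint(0, vocab, (2, 10))      # (batch, text length)
x = torch.cat([audio_proj(audio), tok_emb(tokens)], dim=1)
mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
logits = lm_head(decoder(x, mask=mask))        # next-token logits over prefix + text
print(logits.shape)                            # (2, 60, vocab)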
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with the higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. By further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP achieves strong performance in both zero- and low-resource IR settings. The code and pre-trained models are available at //github.com/Albert-Ma/PROP.
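To make the ROP pair construction concrete, the following is a minimal sketch, assuming a unigram document language model, a fixed set size, and a toy document; these choices are illustrative and do not reproduce the paper's exact sampling scheme.

# Hedged sketch of the representative words prediction (ROP) pair
# construction: sample two word sets from a unigram document language
# model and label the higher-likelihood set as more representative.
import math
import random
from collections import Counter

def doc_unigram_lm(doc_tokens):
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sample_word_set(lm, size=5):
    words, probs = zip(*lm.items())
    return random.choices(words, weights=probs, k=size)

def log_likelihood(word_set, lm):
    return sum(math.log(lm[w]) for w in word_set)

doc = "deep retrieval models learn representations of queries and documents".split()
lm = doc_unigram_lm(doc)
set_a, set_b = sample_word_set(lm), sample_word_set(lm)
label = log_likelihood(set_a, lm) >= log_likelihood(set_b, lm)
print(set_a, set_b, "set A more representative:", label)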
Recent work on pre-training Transformers with self-supervised objectives on large text corpora has shown great success when the models are fine-tuned on downstream NLP tasks, including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose PEGASUS, which pre-trains large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary. We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate that it achieves state-of-the-art performance on all 12 downstream datasets as measured by ROUGE scores. Our model also shows surprising performance on low-resource summarization, surpassing previous state-of-the-art results on 6 datasets with only 1000 examples. Finally, we validated our results using human evaluation and show that our model's summaries achieve human performance on multiple datasets.
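A hedged sketch of gap-sentence selection in the spirit of this objective is shown below: each sentence is scored by unigram overlap with the rest of the document (a crude ROUGE-1-like proxy) and the top-scoring sentences are masked out as the generation target; the scoring proxy, mask token, and toy document are simplifications rather than the actual selection strategy.

# Hedged sketch of gap-sentence selection: score sentences by unigram
# overlap with the rest of the document, mask the top-k, and use them as
# the target sequence. A simplified proxy, not the PEGASUS implementation.
def select_gap_sentences(sentences, k=1, mask="<MASK>"):
    def overlap(i):
        sent = set(sentences[i].lower().split())
        rest = set(" ".join(s for j, s in enumerate(sentences) if j != i).lower().split())
        return len(sent & rest) / max(len(sent), 1)

    ranked = sorted(range(len(sentences)), key=overlap, reverse=True)
    masked_idx = set(ranked[:k])
    inputs = [mask if i in masked_idx else s for i, s in enumerate(sentences)]
    target = " ".join(sentences[i] for i in sorted(masked_idx))
    return " ".join(inputs), target

doc = ["The rover landed on Mars.", "It will study the Martian soil.", "Engineers cheered at mission control."]
print(select_gap_sentences(doc, k=1))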
Pre-training techniques have proven successful in a variety of NLP tasks in recent years. Despite the widespread use of pre-trained models for NLP applications, they focus almost exclusively on text-level manipulation while neglecting the layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model the interaction between text and layout information across scanned document images, which benefits a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate the visual information of words into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results on several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24), and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at //github.com/microsoft/unilm/tree/master/layoutlm.
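The core idea of combining text with layout can be sketched as follows: token embeddings are summed with embeddings of each word's 2-D bounding-box coordinates before entering a Transformer encoder. The coordinate bucketing, model sizes, and omission of image features are simplifications for illustration, not the released LayoutLM implementation.

# Hedged sketch of joint text + layout embeddings: token embeddings plus
# embeddings of bounding-box coordinates (x0, y0, x1, y1), fed to a
# Transformer encoder. Sizes and bucketing are illustrative only.
import torch
import torch.nn as nn

vocab, d_model, coord_bins = 30522, 128, 1001   # coordinates normalized to 0..1000
tok_emb = nn.Embedding(vocab, d_model)
x_emb = nn.Embedding(coord_bins, d_model)
y_emb = nn.Embedding(coord_bins, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)

tokens = torch.randint(0, vocab, (2, 8))                  # (batch, words)
boxes = torch.randint(0, coord_bins, (2, 8, 4))           # per-word (x0, y0, x1, y1)
emb = (tok_emb(tokens)
       + x_emb(boxes[..., 0]) + y_emb(boxes[..., 1])
       + x_emb(boxes[..., 2]) + y_emb(boxes[..., 3]))
print(encoder(emb).shape)                                  # contextualized text+layout features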