In this paper, we introduce a novel fine-tuning technique for language models that incorporates symmetric noise into the embedding process. This method aims to enhance the model's performance by more stringently regulating its local curvature, and it demonstrates superior performance over the current state-of-the-art method, NEFTune. When fine-tuning the LLaMA-2-7B model on Alpaca, standard techniques yield a 29.79% score on AlpacaEval. Our approach, SymNoise, which uses symmetric noisy embeddings, increases this score significantly to 69.04%, a 6.7% improvement over NEFTune (64.69%). Furthermore, when tested on various models and stronger baseline instruction datasets, such as Evol-Instruct, ShareGPT, and OpenPlatypus, SymNoise consistently outperforms NEFTune. The existing literature, including NEFTune, has underscored the importance of deeper research into noise-based strategies for fine-tuning language models. Our approach, SymNoise, is a further significant step in this direction, showing notable improvement over the existing state-of-the-art method.
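As a rough illustration of the core idea, the following minimal sketch adds symmetric sign noise to the token embeddings during fine-tuning; the NEFTune-style $\alpha/\sqrt{Ld}$ scaling, the default value of alpha, and the exact noise distribution are assumptions made for illustration, not the authors' released implementation.

import torch

def add_symmetric_noise(embeddings, alpha=5.0):
    # embeddings: (batch, seq_len, dim) output of the embedding layer.
    # alpha: noise magnitude, scaled by sqrt(seq_len * dim) as in NEFTune (assumed).
    _, seq_len, dim = embeddings.shape
    scale = alpha / (seq_len * dim) ** 0.5
    # Symmetric noise: each component is +scale or -scale with equal probability.
    signs = torch.randint(0, 2, embeddings.shape, device=embeddings.device) * 2 - 1
    return embeddings + scale * signs.to(embeddings.dtype)

# The noisy embeddings are used only during fine-tuning; at inference time the
# clean embeddings are passed to the model unchanged.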
Semantic communication opens a new direction for future communication. In this paper, we aim to establish a systematic framework of semantic information theory (SIT). First, we propose a semantic communication model and define the synonymous mapping to indicate the critical relationship between semantic information and syntactic information. Based on this core concept, we introduce the measures of semantic information, such as the semantic entropy $H_s(\tilde{U})$, the up/down semantic mutual information $I^s(\tilde{X};\tilde{Y})$ $(I_s(\tilde{X};\tilde{Y}))$, the semantic capacity $C_s=\max_{p(x)}I^s(\tilde{X};\tilde{Y})$, and the semantic rate-distortion function $R_s(D)=\min_{p(\hat{x}|x):\mathbb{E}d_s(\tilde{x},\hat{\tilde{x}})\leq D}I_s(\tilde{X};\hat{\tilde{X}})$. Furthermore, we prove three coding theorems of SIT, namely the semantic source coding theorem, the semantic channel coding theorem, and the semantic rate-distortion coding theorem. We find that the limits of classic information theory are extended by using synonymous mapping, that is, $H_s(\tilde{U})\leq H(U)$, $C_s\geq C$, and $R_s(D)\leq R(D)$. Together, these results constitute the basis of semantic information theory. In summary, the theoretic framework proposed in this paper is a natural extension of classic information theory and may reveal great performance potential for future communication.
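To illustrate how synonymous mapping extends the classical limits, consider grouping the syntactic symbols $u_i$ into synonymous sets $\mathcal{U}_{i_s}$ and summing their probabilities; one natural form of the semantic entropy consistent with the abstract (stated here as an illustration rather than the paper's exact definition) is
$$ H_s(\tilde{U}) = -\sum_{i_s} p(\tilde{u}_{i_s}) \log p(\tilde{u}_{i_s}), \qquad p(\tilde{u}_{i_s}) = \sum_{u_i \in \mathcal{U}_{i_s}} p(u_i), $$
and since merging symbols can only decrease Shannon entropy, this immediately yields $H_s(\tilde{U}) \leq H(U)$.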
In this paper, we extend our prior work, DKIC, and propose a perceptual-oriented learned image compression method, PO-DKIC. Specifically, DKIC adopts a dynamic kernel-based dynamic residual block group to enhance the transform coding and an asymmetric space-channel context entropy model to facilitate the estimation of Gaussian parameters. Based on DKIC, PO-DKIC introduces PatchGAN and LPIPS losses to enhance visual quality. Furthermore, to maximize the overall perceptual quality under a rate constraint, we formulate this challenge as a constrained programming problem and solve it with linear integer programming. The experiments demonstrate that our proposed method can generate realistic images with richer textures and finer details than state-of-the-art image compression techniques.
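As a rough sketch of a perceptual training objective combining rate, distortion, LPIPS, and a PatchGAN adversarial term (the lambda weights and the lpips_model, discriminator, and bpp inputs are illustrative placeholders rather than the paper's actual configuration):

import torch
import torch.nn.functional as F

def perceptual_rd_loss(x, x_hat, bpp, lpips_model, discriminator,
                       lambda_rate=1.0, lambda_lpips=1.0, lambda_adv=0.1):
    # Rate + distortion + LPIPS + PatchGAN generator loss (illustrative weights).
    mse = F.mse_loss(x_hat, x)                       # pixel-level distortion
    perceptual = lpips_model(x_hat, x).mean()        # LPIPS perceptual distance
    fake_logits = discriminator(x_hat)               # PatchGAN logits on the reconstruction
    adv = F.binary_cross_entropy_with_logits(fake_logits,
                                             torch.ones_like(fake_logits))
    return lambda_rate * bpp + mse + lambda_lpips * perceptual + lambda_adv * adv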
In this paper, we present a novel deep image clustering approach termed PICI, which enforces partial information discrimination and cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which masked image modeling with two parallel augmented views is formulated. After deriving the class tokens from the masked images with the Transformer encoder, three partial information learning modules are further incorporated: the PISD module, which trains the auto-encoder via masked image reconstruction; the PICD module, which employs two levels of contrastive learning; and the CLI module, which enables mutual interaction between the instance-level and cluster-level subspaces. Extensive experiments on six real-world image datasets demonstrate the superior clustering performance of the proposed PICI approach over state-of-the-art deep clustering approaches. The source code is available at //github.com/Regan-Zhang/PICI.
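A compact sketch of an instance-level contrastive term between the two augmented views (an NT-Xent-style loss is assumed here; the temperature and the use of the class tokens as projections are illustrative rather than the paper's exact settings); the cluster-level term in PICD is typically formed analogously over the columns of the soft cluster-assignment matrices of the two views.

import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2, temperature=0.5):
    # z1, z2: (batch, dim) projected class tokens from the two masked/augmented views.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                   # (2B, dim)
    sim = z @ z.t() / temperature                    # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                # exclude self-similarity
    batch = z1.size(0)
    targets = torch.cat([torch.arange(batch) + batch,
                         torch.arange(batch)]).to(z.device)
    return F.cross_entropy(sim, targets)             # positives are the cross-view pairs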
In this paper, we propose and study the construction of confidence bands for shape-constrained regression functions when the predictor is multivariate. In particular, we consider the continuous multidimensional white noise model given by $d Y(\mathbf{t}) = n^{1/2} f(\mathbf{t}) \,d\mathbf{t} + d W(\mathbf{t})$, where $Y$ is the observed stochastic process on $[0,1]^d$ ($d\ge 1$), $W$ is the standard Brownian sheet on $[0,1]^d$, and $f$ is the unknown function of interest, assumed to belong to a (shape-constrained) function class, e.g., coordinate-wise monotone functions or convex functions. The constructed confidence bands are based on local kernel averaging with bandwidth chosen automatically via a multivariate multiscale statistic. The confidence bands have guaranteed coverage for every $n$ and for every member of the underlying function class. Under monotonicity/convexity constraints on $f$, the proposed confidence bands automatically adapt (in terms of width) to the global and local (H\"{o}lder) smoothness and intrinsic dimensionality of the unknown $f$; the bands are also shown to be optimal in a certain sense. These bands have (almost) parametric ($n^{-1/2}$) widths when the underlying function has ``low-complexity'' (e.g., piecewise constant/affine).
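To see the role of the local averaging, note that for a box $B \subseteq [0,1]^d$ (e.g., with an indicator kernel) the normalized local average of the observed process satisfies
$$ \frac{1}{n^{1/2}|B|} \int_B dY(\mathbf{t}) \;=\; \frac{1}{|B|} \int_B f(\mathbf{t})\, d\mathbf{t} \;+\; \frac{W(B)}{n^{1/2}|B|}, \qquad \frac{W(B)}{n^{1/2}|B|} \sim N\!\Big(0, \tfrac{1}{n|B|}\Big), $$
so each local average is an unbiased estimate of the average of $f$ over $B$ with a known Gaussian noise level; the multivariate multiscale statistic then calibrates these averages simultaneously over many locations and bandwidths.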
In this paper, we propose the use of self-supervised pretraining on a large unlabelled data set to improve the performance of a personalized voice activity detection (VAD) model in adverse conditions. We pretrain a long short-term memory (LSTM) encoder using the autoregressive predictive coding (APC) framework and fine-tune it for personalized VAD. We also propose a denoising variant of APC, with the goal of improving the robustness of personalized VAD. The trained models are systematically evaluated on both clean speech and speech contaminated by various types of noise at different SNR levels, and compared to a purely supervised model. Our experiments show that self-supervised pretraining not only improves performance in clean conditions, but also yields models that are more robust to adverse conditions than purely supervised learning.
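A minimal sketch of the APC pretraining step (the time shift, feature dimensions, and L1 objective follow the standard APC recipe and are assumptions here; the denoising variant would feed noise-corrupted features as input while keeping the clean future frames as the target):

import torch
import torch.nn as nn

class APCEncoder(nn.Module):
    # LSTM encoder pretrained to predict acoustic features a few frames ahead.
    def __init__(self, feat_dim=40, hidden_dim=512, num_layers=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, feat_dim)

    def forward(self, feats):                 # feats: (batch, time, feat_dim)
        hidden, _ = self.lstm(feats)
        return self.head(hidden)

def apc_loss(model, feats, shift=3):
    # L1 loss between predicted and true features `shift` frames ahead.
    pred = model(feats[:, :-shift])           # predictions from past frames only
    target = feats[:, shift:]                 # future (clean) frames
    return nn.functional.l1_loss(pred, target)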
In this paper, we introduce a new approach for constructing robust well-balanced numerical methods for the one-dimensional Saint-Venant system with and without the Manning friction term. Following the idea presented in [R. Abgrall, Commun. Appl. Math. Comput. 5 (2023), pp. 370-402], we first combine the conservative and non-conservative (primitive) formulations of the studied hyperbolic system in a natural way. The solution is globally continuous and described by a combination of point values and cell averages, which are then evolved by two different forms of the PDEs: a conservative form for the cell averages and a possibly non-conservative form for the point values. We show how to treat both forms in a well-balanced manner. The developed schemes exactly preserve both still-water and moving-water equilibria. Compared with existing well-balanced methods, this new class of schemes requires no nonlinear equation solvers, which makes the schemes less computationally costly and easier to extend to other models. We demonstrate the behavior of the proposed schemes on several challenging examples.
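For reference, the one-dimensional Saint-Venant system with the Manning friction term can be written in a standard form (the notation here may differ slightly from the paper's) as
$$ h_t + q_x = 0, \qquad q_t + \Big(\frac{q^2}{h} + \frac{g h^2}{2}\Big)_x = -g h Z_x - \frac{g n^2 q |q|}{h^{7/3}}, $$
where $h$ is the water depth, $q = hu$ the discharge, $Z$ the bottom topography, $g$ the gravitational acceleration, and $n$ the Manning roughness coefficient; well-balancing means the discretization preserves the steady states of this system exactly, without relying on a nonlinear equation solver.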
In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window with pooling attention, increasing the receptive field while reducing both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.
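A rough sketch of the second-level pooling attention (illustrative only: a single head, mean pooling, and the pooling size are assumptions, and the first-level sliding-window attention is omitted for brevity):

import torch
import torch.nn.functional as F

def pooling_attention(q, k, v, pool_size=4):
    # q, k, v: (batch, seq_len, dim). Pooling the keys/values by pool_size
    # enlarges the effective receptive field while cutting the attention cost
    # roughly by a factor of pool_size.
    k_p = F.avg_pool1d(k.transpose(1, 2), pool_size).transpose(1, 2)
    v_p = F.avg_pool1d(v.transpose(1, 2), pool_size).transpose(1, 2)
    scores = q @ k_p.transpose(1, 2) / q.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1) @ v_p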
In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: //github.com/kemaloksuz/ObjectDetectionImbalance .
BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system achieves state-of-the-art results on the CNN/Dailymail dataset, outperforming the previous best-performing system by 1.65 points on ROUGE-L. The code to reproduce our results is available at //github.com/nlpyang/BertSum
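A minimal sketch of the extractive idea (a plain Hugging Face BertModel with a sentence-level scoring head is assumed here; this is not the authors' released code, which additionally inserts a [CLS]/[SEP] pair before every sentence and uses interval segment embeddings):

import torch
import torch.nn as nn
from transformers import BertModel

class ExtractiveScorer(nn.Module):
    # Scores each sentence (via its [CLS] position) for inclusion in the summary.
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # cls_positions: (batch, n_sents) indices of the per-sentence [CLS] tokens.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        cls_states = hidden[torch.arange(hidden.size(0)).unsqueeze(1), cls_positions]
        return torch.sigmoid(self.classifier(cls_states)).squeeze(-1)  # (batch, n_sents)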
In this paper, we propose a novel multi-task learning architecture that incorporates recent advances in attention mechanisms. Our approach, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with task-specific soft-attention modules, which are trainable in an end-to-end manner. These attention modules allow the network to learn task-specific features from the global pool while still sharing features across different tasks. The architecture can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. Experiments on the CityScapes dataset show that our method outperforms several baselines in both single-task and multi-task learning, and is also more robust to the various weighting schemes in the multi-task loss function. We further explore the effectiveness of our method through experiments over a range of task complexities, and show how it scales well with task complexity compared to baselines.
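A simplified sketch of a single task-specific soft-attention module (illustrative: the exact convolutional block structure in MTAN differs; here two 1x1 convolutions with batch normalization and a sigmoid produce a soft mask applied element-wise to the shared features):

import torch.nn as nn

class TaskAttentionModule(nn.Module):
    # Learns a soft mask that selects task-specific features from the shared pool.
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, shared_features):
        # Element-wise gating of the shared feature pool for this task.
        return self.mask(shared_features) * shared_features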