
We attempt to detect out-of-distribution (OOD) text samples by applying Topological Data Analysis (TDA) to attention maps in transformer-based language models. We evaluate our proposed TDA-based approach for out-of-distribution detection on BERT, a transformer-based language model, and compare it to a more traditional OOD approach based on BERT CLS embeddings. We find that our TDA approach outperforms the CLS embedding approach at distinguishing in-distribution data (politics and entertainment news articles from HuffPost) from far out-of-domain samples (IMDB reviews), but its effectiveness deteriorates with near out-of-domain (CNN/Dailymail) or same-domain (business news articles from HuffPost) datasets.
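
A minimal sketch of the kind of pipeline described above, assuming the `transformers` and `ripser` packages; the conversion of an attention map into a distance matrix and the H1-persistence OOD score below are illustrative choices, not the paper's exact feature set.

```python
import numpy as np
import torch
from transformers import BertTokenizer, BertModel
from ripser import ripser

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def attention_persistence(text, layer=-1, head=0):
    """Return H0/H1 persistence diagrams for one attention head's map over `text`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # out.attentions: tuple of (batch, heads, seq, seq) tensors, one per layer
    attn = out.attentions[layer][0, head].numpy()
    # Symmetrize and turn attention weights into a distance-like matrix
    dist = 1.0 - 0.5 * (attn + attn.T)
    np.fill_diagonal(dist, 0.0)
    return ripser(dist, distance_matrix=True, maxdim=1)["dgms"]

def ood_score(text):
    """Illustrative score: total H1 persistence of the attention map."""
    h1 = attention_persistence(text)[1]
    return float(np.sum(h1[:, 1] - h1[:, 0])) if len(h1) else 0.0
```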

Related content

We consider nonlinear solvers for the incompressible, steady (or at a fixed time step for unsteady) Navier-Stokes equations in the setting where partial measurement data of the solution is available. The measurement data is incorporated/assimilated into the solution through a nudging term added to the Picard iteration that penalizes the difference between the coarse mesh interpolants of the true solution and the solver solution, analogous to how continuous data assimilation (CDA) is implemented for time-dependent PDEs. This was considered in the paper [Li et al. {\it CMAME} 2023], and we extend the methodology by improving the analysis to be in the $L^2$ norm instead of a weighted $H^1$ norm where the weight depends on the coarse mesh width, and to the case of noisy measurement data. For noisy measurement data, we prove that the CDA-Picard method is stable and convergent, up to the size of the noise. Numerical tests illustrate the results, and show that a very good strategy when using noisy data is to use CDA-Picard to generate an initial guess for the classical Newton iteration.
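
As a schematic illustration of the nudging idea (assuming a standard weak formulation; the exact discrete scheme is given in [Li et al. {\it CMAME} 2023]), one CDA-Picard step reads: given $u_k$, find $(u_{k+1}, p_{k+1})$ satisfying
$$
\nu(\nabla u_{k+1}, \nabla v) + b(u_k, u_{k+1}, v) - (p_{k+1}, \nabla\cdot v) + \mu\,\bigl(I_H(u_{k+1} - u),\, I_H v\bigr) = (f, v),
\qquad (\nabla\cdot u_{k+1}, q) = 0,
$$
for all test functions $(v, q)$, where $I_H$ is the coarse-mesh interpolation operator, $I_H u$ the (possibly noisy) measurement data, $b$ the convective trilinear form, and $\mu > 0$ the nudging parameter.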

High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as $n$ and/or $p$ increase due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piecewise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications.
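
As a small illustration of the sub-sampling ingredient (a sketch only; the paper's samplers are piecewise deterministic processes whose event rates use estimators of this kind), an importance-weighted sub-sample gives an unbiased estimate of the full-data log-likelihood gradient:

```python
import numpy as np

def iw_subsample_grad(grad_i, weights, batch_size, rng=np.random.default_rng()):
    """
    Unbiased estimate of sum_i grad_i(i) from a weighted sub-sample.
    grad_i(i): gradient of the i-th log-likelihood term at the current state.
    weights:   nonnegative importance weights, e.g. upweighting the rare class.
    """
    p = np.asarray(weights, dtype=float)
    p /= p.sum()
    idx = rng.choice(len(p), size=batch_size, p=p)
    # Horvitz-Thompson correction: E[ grad_i(i) / p_i ] = sum_i grad_i(i)
    return np.mean([grad_i(i) / p[i] for i in idx], axis=0)
```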

The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune the model. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream models). In this study, we investigate three text diversity incentive methods well established in crowdsourcing: taboo words, hints by previous outlier solutions, and chaining on previous outlier solutions. Using these incentive methods as part of instructions to LLMs augmenting text datasets, we measure their effects on generated texts' lexical diversity and downstream model performance. We compare the effects over 5 different LLMs and 6 datasets. We show that diversity is most increased by taboo words, while downstream model performance is highest when previously created paraphrases are used as hints.
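
A minimal sketch of how the three incentive methods could be phrased as paraphrasing prompts; the template wording and helper names are illustrative assumptions, not the study's exact instructions.

```python
def taboo_prompt(seed_text, taboo_words):
    # Taboo words: forbid salient words to push the model toward new phrasing
    return (f"Paraphrase the following text. Do not use any of these words: "
            f"{', '.join(taboo_words)}.\n\nText: {seed_text}")

def hints_prompt(seed_text, outlier_paraphrases):
    # Hints: show previously created outlier paraphrases and ask for something different
    hints = "\n".join(f"- {p}" for p in outlier_paraphrases)
    return (f"Paraphrase the following text. Earlier, unusual paraphrases are listed "
            f"below; write something different from all of them:\n{hints}\n\n"
            f"Text: {seed_text}")

def chaining_prompt(previous_outlier):
    # Chaining: paraphrase the previous outlier solution instead of the original seed
    return f"Paraphrase the following text: {previous_outlier}"
```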

Advances in survival analysis have facilitated unprecedented flexibility in data modeling, yet there remains a lack of tools for graphically illustrating the influence of continuous covariates on predicted survival outcomes. We propose the utilization of a colored contour plot to depict the predicted survival probabilities over time, and provide a Shiny app and R package as implementations of this tool. Our approach is capable of supporting conventional models, including the Cox and Fine-Gray models. However, its capability shines when coupled with cutting-edge machine learning models such as random survival forests and deep neural networks.
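
A hedged sketch of the idea in Python (the paper's own implementation is a Shiny app and R package): predicted survival probability drawn as a colored contour over time and one continuous covariate, here using lifelines' Cox model on its bundled Rossi dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()
cph = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")

# Vary one covariate (age) over a grid, holding the others at their medians
ages = np.linspace(df["age"].min(), df["age"].max(), 50)
profile = df.drop(columns=["week", "arrest"]).median()
grid = pd.DataFrame([profile] * len(ages))
grid["age"] = ages

times = np.linspace(1, df["week"].max(), 100)
surv = cph.predict_survival_function(grid, times=times)  # rows: times, cols: profiles

T, A = np.meshgrid(times, ages)
plt.contourf(T, A, surv.T.values, levels=20, cmap="viridis")
plt.colorbar(label="Predicted survival probability")
plt.xlabel("Time (weeks)")
plt.ylabel("Age")
plt.show()
```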

Nurmuhammad et al. developed the Sinc-Nystr\"{o}m methods for initial value problems in which the solutions exhibit exponential decay end behavior. In these methods, the Single-Exponential (SE) transformation or the Double-Exponential (DE) transformation is combined with the Sinc approximation. Hara and Okayama improved on these transformations to attain a better convergence rate, which was later supported by theoretical error analyses. However, these methods have a computational drawback owing to the inclusion of a special function in the basis functions. To address this issue, Okayama and Hara proposed Sinc-collocation methods, which do not include any special function in the basis functions. This study conducts error analyses of these methods.
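
For reference, the Sinc approximation underlying these methods has the form shown below; the specific SE/DE transformations improved by Hara and Okayama differ in detail, and the maps given here are only the standard textbook examples for a semi-infinite interval.
$$
f(t) \approx \sum_{j=-N}^{N} f\bigl(\psi(jh)\bigr)\, S(j,h)\bigl(\psi^{-1}(t)\bigr),
\qquad
S(j,h)(x) = \frac{\sin\bigl(\pi(x/h - j)\bigr)}{\pi(x/h - j)},
$$
where $\psi$ maps $\mathbb{R}$ onto the problem interval; typical choices on $(0,\infty)$ are the single-exponential map $\psi_{\mathrm{SE}}(x) = \log(1+e^{x})$ and the double-exponential map $\psi_{\mathrm{DE}}(x) = \log(1+e^{\pi\sinh x})$.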

We are interested in numerical algorithms for computing the electrical field generated by a charge distribution localized on scale $\ell$ in an infinite heterogeneous medium, in a situation where the medium is only known in a box of diameter $L\gg\ell$ around the support of the charge. We propose a boundary condition that with overwhelming probability is (near) optimal with respect to scaling in terms of $\ell$ and $L$, in the setting where the medium is a sample from a stationary ensemble with a finite range of dependence (set to be unity, and with the assumption that $\ell \gg 1$). The boundary condition is motivated by quantitative stochastic homogenization, which allows for a multipole expansion [BGO20]. This work extends [LO21], whose algorithm is optimal in two dimensions, and thus we need to take quadrupoles, next to dipoles, into account. This in turn relies on stochastic estimates of second-order, next to first-order, correctors. These estimates are provided for the finite-range ensembles under consideration, based on an extension of the semi-group approach of [GO15].
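
Schematically (the precise corrector-modified moments are constructed in [BGO20, LO21]), the problem is $-\nabla\cdot(a\nabla u) = q$ in $\mathbb{R}^d$ with $\operatorname{diam}(\operatorname{supp} q) \sim \ell$, solved numerically on a box $Q_L$ of diameter $L$, and the proposed boundary condition imposes Dirichlet data built from a homogenized multipole expansion,
$$
u \;\approx\; \bar G(x)\int q \;+\; \nabla \bar G(x)\cdot b \;+\; \nabla^{2} \bar G(x) : C
\qquad \text{on } \partial Q_L,
$$
where $\bar G$ is the Green's function of the homogenized operator and $b$, $C$ are dipole and quadrupole moments corrected using the first- and second-order correctors.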

Large Language Models (LLMs) can exhibit considerable variation in the quality of their sampled outputs. Reranking and selecting the best generation from the sampled set is a popular way of obtaining strong gains in generation quality. In this paper, we present a novel approach for reranking LLM generations. Unlike other techniques that might involve additional inferences or training a specialized reranker, our approach relies on easy-to-compute pairwise statistics between the generations, incurring minimal compute overhead. We show that our approach can be formalized as an extension of self-consistency and analyze its performance in that framework, theoretically as well as via simulations. We show strong improvements for selecting the best k generations for code generation tasks as well as robust improvements for the best generation for the tasks of autoformalization, summarization, and translation. While our approach only assumes black-box access to LLMs, we show that additional access to token probabilities can improve performance even further.
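
A minimal sketch of selection by cheap pairwise statistics, in the spirit of the extended self-consistency described above; the token-level Jaccard similarity is an illustrative stand-in for the statistics actually used.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def rerank(generations, k=1, sim=jaccard):
    """Return the k generations with the highest total pairwise similarity to the rest."""
    n = len(generations)
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = sim(generations[i], generations[j])
        scores[i] += s
        scores[j] += s
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [generations[i] for i in order[:k]]
```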

Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data, e.g., images, text, and audio. Accordingly, their promising performance has led to GAN-based adversarial attack methods in white-box and black-box attack scenarios. The importance of transferable black-box attacks lies in their ability to be effective across different models and settings, more closely aligning with real-world applications. However, it remains challenging for such methods to retain the performance of their transferable adversarial examples. Meanwhile, we observe that some enhanced gradient-based transferable adversarial attack algorithms require prolonged time for adversarial sample generation. Thus, in this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples whilst improving the algorithm's efficiency. The main approach is to optimise the training process of the generator parameters. With a functional and characteristic similarity analysis, we introduce a novel gradient editing (GE) mechanism and verify its feasibility in generating transferable samples on various models. Moreover, by exploring frequency-domain information to determine the gradient editing direction, GE-AdvGAN can generate highly transferable adversarial samples while minimizing execution time in comparison to state-of-the-art transferable adversarial attack algorithms. The performance of GE-AdvGAN is comprehensively evaluated through large-scale experiments on different datasets, and the results demonstrate the superiority of our algorithm. The code for our algorithm is available at: //github.com/LMBTough/GE-advGAN
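
Purely as an illustration of using frequency-domain information to shape a gradient (one plausible reading of the mechanism above, not the repository's implementation), the sketch below low-pass filters a surrogate model's input gradient with the FFT.

```python
import torch

def low_frequency_gradient(model, x, y, loss_fn, keep_ratio=0.25):
    """Gradient of the loss w.r.t. input x (shape B,C,H,W), low-pass filtered in frequency."""
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # Shift the zero-frequency component to the center of the spectrum
    G = torch.fft.fftshift(torch.fft.fft2(grad), dim=(-2, -1))
    _, _, h, w = G.shape
    kh, kw = int(h * keep_ratio), int(w * keep_ratio)
    mask = torch.zeros_like(G.real)
    mask[..., h//2 - kh//2 : h//2 + kh//2, w//2 - kw//2 : w//2 + kw//2] = 1.0
    # Keep only the central (low-frequency) block, then transform back
    G_low = torch.fft.ifftshift(G * mask, dim=(-2, -1))
    return torch.fft.ifft2(G_low).real
```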

Scattering networks yield powerful and robust hierarchical image descriptors which do not require lengthy training and which work well with very little training data. However, they rely on sampling the scale dimension. Hence, they become sensitive to scale variations and are unable to generalize to unseen scales. In this work, we define an alternative feature representation based on the Riesz transform. We detail and analyze the mathematical foundations behind this representation. In particular, it inherits scale equivariance from the Riesz transform and completely avoids sampling of the scale dimension. Additionally, the number of features in the representation is reduced by a factor of four compared to scattering networks. Nevertheless, our representation performs comparably well for texture classification with an interesting addition: scale equivariance. Our method yields superior performance when dealing with scales outside of those covered by the training dataset. The usefulness of the equivariance property is demonstrated on the digit classification task, where accuracy remains stable even for scales four times larger than the one chosen for training. As a second example, we consider classification of textures.
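
For concreteness, the first-order Riesz transform itself can be computed with the FFT via its Fourier multiplier $-i\,\xi_j/|\xi|$; the sketch below shows only this building block, not the full hierarchical representation described in the paper.

```python
import numpy as np

def riesz_transform(img):
    """Return the two first-order Riesz components of a 2D array."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    norm = np.sqrt(fx**2 + fy**2)
    norm[0, 0] = 1.0            # avoid division by zero at the DC component
    F = np.fft.fft2(img)
    r1 = np.real(np.fft.ifft2(-1j * fx / norm * F))
    r2 = np.real(np.fft.ifft2(-1j * fy / norm * F))
    return r1, r2
```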

Differentially private training algorithms like DP-SGD protect sensitive training data by ensuring that trained models do not reveal private information. An alternative approach, which this paper studies, is to use a sensitive dataset to generate synthetic data that is differentially private with respect to the original data, and then to train a model non-privately on the synthetic data. Doing so has several advantages: synthetic data can be reused for other tasks (including for hyperparameter tuning), retained indefinitely, and shared with third parties without sacrificing privacy. However, generating private synthetic data is much harder than training a private model. To improve performance on text data, recent work has utilized public data by starting with a pre-trained generative language model and privately fine-tuning it on sensitive data. This model can be used to sample a DP synthetic dataset. While this strategy seems straightforward, executing it has proven problematic. Previous approaches either show significant performance loss, or have, as we show, critical design flaws. In this paper we demonstrate that a proper training objective along with tuning fewer parameters results in excellent DP synthetic data quality. Our approach is competitive with direct DP-training of downstream classifiers in terms of performance on downstream tasks. Further, we demonstrate that our DP synthetic data is not only useful for downstream classifier training, but also to tune those same models.
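
A hedged sketch of the private fine-tuning ingredient: one DP-SGD step with per-example gradient clipping and Gaussian noise, applied only to a small trainable subset of a pre-trained model's parameters (the "tune fewer parameters" idea above). Real pipelines should use a vetted library such as Opacus together with a privacy accountant; `clip_norm`, `sigma`, and the choice of parameter subset here are illustrative.

```python
import torch

def dp_sgd_step(params, per_example_losses, lr=1e-4, clip_norm=1.0, sigma=1.0):
    """One DP-SGD update on `params` from a list of per-example losses (shared graph)."""
    grads = [torch.zeros_like(p) for p in params]
    for loss in per_example_losses:
        g = torch.autograd.grad(loss, params, retain_graph=True)
        total = torch.sqrt(sum(gi.pow(2).sum() for gi in g))
        scale = torch.clamp(clip_norm / (total + 1e-12), max=1.0)  # clip per-example norm
        for acc, gi in zip(grads, g):
            acc += gi * scale
    n = len(per_example_losses)
    with torch.no_grad():
        for p, acc in zip(params, grads):
            noise = torch.randn_like(p) * sigma * clip_norm       # Gaussian mechanism
            p -= lr * (acc + noise) / n                           # noisy, averaged update
```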
