
Pairwise sequence comparison is one of the most fundamental problems in string processing. The most common metric to quantify the similarity between sequences S and T is the edit distance, d(S,T), which is the number of characters that must be substituted, deleted from, or inserted into S to generate T. However, for some string pairs, fewer edit operations suffice to transform one string into the other if larger rearrangements are permitted. Block edit distance captures such substring-level (i.e., block) changes, "penalizing" entire block removals, insertions, copies, and reversals with the same cost as single-character edits (Lopresti & Tomkins, 1997). Most studies of block edit distance to date aimed only to characterize the distance itself for applications in sequence nearest-neighbor search, without reporting the full alignment details. The few tools that solve block edit distance for genomic sequences, such as GR-Aligner, have limited functionality and are no longer maintained. Here, we present SABER, an algorithm to solve block edit distance that supports block deletions, block moves, and block reversals in addition to the classical single-character edit operations. Our algorithm runs in O(m^2 · n · l_range) time for |S|=m, |T|=n, and a permitted block size range l_range, and can report all breakpoints for the block operations. We also provide an implementation of SABER currently optimized for genomic sequences (i.e., over the DNA alphabet), although the algorithm can in principle be used with any alphabet. SABER is available at //github.com/BilkentCompGen/saber
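For context, a minimal sketch of the classical single-character edit distance recurrence that block edit distance algorithms such as SABER extend; the block operations (moves, reversals, block deletions) are not modeled here, and this is not the paper's algorithm, only the O(m·n) base case it builds on:

```python
def edit_distance(s: str, t: str) -> int:
    """Classical Levenshtein distance via dynamic programming."""
    m, n = len(s), len(t)
    # dp[i][j] = edit distance between prefixes s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                      # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[m][n]
```

Block edits add further transitions to this table (e.g., removing or moving a whole substring at unit cost), which is where the extra m and l_range factors in SABER's complexity come from.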


Gaussian processes (GPs) are widely used tools in spatial statistics and machine learning. The formulae for the mean function and covariance kernel of a GP $Tu$ that is the image of another GP $u$ under a linear transformation $T$ acting on the sample paths of $u$ are well known, almost to the point of being folklore. However, these formulae are often used without rigorous attention to technical details, particularly when $T$ is an unbounded operator such as a differential operator, as is common in many modern applications. This note provides a self-contained proof of the claimed formulae for the case of a closed, densely defined operator $T$ acting on the sample paths of a square-integrable (not necessarily Gaussian) stochastic process. Our proof technique relies on Hille's theorem for the Bochner integral of a Banach-valued random variable.
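For reference, the folklore formulae in question can be stated as follows (the notation here is generic, not necessarily the note's own): if $u$ has mean function $m$ and covariance kernel $k$, then, under conditions the note makes precise,

```latex
\mathbb{E}\big[(Tu)(s)\big] = (Tm)(s),
\qquad
\operatorname{Cov}\big((Tu)(s),\,(Tu)(t)\big) = \big(T_s T_t\, k\big)(s,t),
```

where $T_s$ and $T_t$ denote $T$ applied to $k$ in its first and second arguments, respectively.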

Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied to generating various types of data, e.g., images, text, and audio. Their promising performance has led to GAN-based adversarial attack methods in both white-box and black-box attack scenarios. Transferable black-box attacks are important because they remain effective across different models and settings, aligning more closely with real-world applications. However, it remains challenging for such methods to retain performance in terms of transferable adversarial examples. Meanwhile, we observe that some enhanced gradient-based transferable adversarial attack algorithms require prolonged time for adversarial sample generation. Thus, in this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples while improving the algorithm's efficiency. The main approach is to optimize the training process of the generator parameters. Based on a functional and characteristic similarity analysis, we introduce a novel gradient editing (GE) mechanism and verify its feasibility in generating transferable samples on various models. Moreover, by exploring frequency-domain information to determine the gradient editing direction, GE-AdvGAN can generate highly transferable adversarial samples while minimizing execution time compared to state-of-the-art transferable adversarial attack algorithms. The performance of GE-AdvGAN is comprehensively evaluated in large-scale experiments on different datasets, whose results demonstrate the superiority of our algorithm. The code for our algorithm is available at: //github.com/LMBTough/GE-advGAN

We are interested in generating surfaces with arbitrary roughness and forming patterns on them. Two methods are applied to construct rough surfaces. In the first method, a superposition of wave functions with random frequencies and propagation angles is used to obtain periodic rough surfaces with analytic parametric equations. The amplitude of such surfaces is also an important variable in the provided eigenvalue analysis for the Laplace-Beltrami operator and in the generation of pattern formation. Numerical experiments show that the patterns become irregular as the amplitude and frequency of the rough surface increase. For easy generalization to closed manifolds, we propose a second construction method for rough surfaces, which uses random nodal values and discretized heat filters. We provide numerical evidence that both surface construction methods yield patterns comparable to those observed in real-life animals.
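A minimal sketch of the first construction: a periodic rough surface built from waves with random frequencies and propagation angles. The number of modes, amplitude, frequency range, and normalization are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def rough_surface(x, y, n_modes=20, amplitude=0.1, max_freq=8):
    """Height field z(x, y) as a superposition of randomly oriented waves."""
    z = np.zeros_like(x)
    for _ in range(n_modes):
        k = rng.integers(1, max_freq + 1)     # integer frequency keeps the surface periodic
        theta = rng.uniform(0, 2 * np.pi)     # random propagation angle
        phase = rng.uniform(0, 2 * np.pi)
        z += np.cos(k * (x * np.cos(theta) + y * np.sin(theta)) + phase)
    # Normalize so that `amplitude` controls the overall roughness scale
    return amplitude * z / np.sqrt(n_modes)

x, y = np.meshgrid(np.linspace(0, 2 * np.pi, 64), np.linspace(0, 2 * np.pi, 64))
z = rough_surface(x, y)
```

Increasing `amplitude` or `max_freq` here is the knob the abstract refers to when noting that patterns become irregular on rougher surfaces.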

Differentially private training algorithms like DP-SGD protect sensitive training data by ensuring that trained models do not reveal private information. An alternative approach, which this paper studies, is to use a sensitive dataset to generate synthetic data that is differentially private with respect to the original data, and then to train a model non-privately on the synthetic data. Doing so has several advantages: synthetic data can be reused for other tasks (including hyperparameter tuning), retained indefinitely, and shared with third parties without sacrificing privacy. However, generating private synthetic data is much harder than training a private model. To improve performance on text data, recent work has utilized public data by starting with a pre-trained generative language model and privately fine-tuning it on sensitive data. This model can then be used to sample a DP synthetic dataset. While this strategy seems straightforward, executing it has proven problematic. Previous approaches either show significant performance loss or have, as we show, critical design flaws. In this paper we demonstrate that a proper training objective, along with tuning fewer parameters, results in excellent DP synthetic data quality. Our approach is competitive with direct DP training of downstream classifiers in terms of performance on downstream tasks. Further, we demonstrate that our DP synthetic data is useful not only for downstream classifier training, but also for tuning those same models.

Monotone missing data is a common problem in data analysis. However, imputation combined with dimensionality reduction can be computationally expensive, especially with the increasing size of datasets. To address this issue, we propose a Blockwise Principal component analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data, and then imputes on the merged principal components using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. In addition, our experiments also show that while applying MICE imputation directly to the missing data may not converge, applying BPI with MICE may lead to convergence.
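A hedged sketch of the BPI idea: run PCA (via SVD) on the fully observed part of a monotone block, merge the reduced scores with the partially observed columns, and impute on the merged matrix. The blocking, and the column-mean imputation standing in for MICE, are simplifications of the framework:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
X[60:, 4:] = np.nan               # monotone pattern: last 40 rows miss the last 2 columns

def pca_observed(block, n_components=2):
    """PCA via SVD on the rows of a block that have no missing entries."""
    obs = block[~np.isnan(block).any(axis=1)]
    centered = obs - obs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return obs.mean(axis=0), vt[:n_components]    # column means and principal axes

mean4, comps4 = pca_observed(X[:, :4])            # this block is observed for all rows
scores4 = (X[:, :4] - mean4) @ comps4.T           # reduced representation, NaN-free
# Merge reduced scores with the partially observed block, then impute on the
# merged matrix (column-mean imputation here as a stand-in for MICE etc.).
merged = np.hstack([scores4, X[:, 4:]])
col_means = np.nanmean(merged, axis=0)
imputed = np.where(np.isnan(merged), col_means, merged)
```

The speedup comes from imputing a 4-column merged matrix instead of the full 6-column data after reduction.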

A high-order multi-time-step (MTS) scheme is proposed for the bond-based peridynamic (PD) model, an extension of classical continuum mechanics widely used for analyzing discontinuous problems such as cracks. The MTS scheme discretizes the spatial domain with a meshfree method and advances in time with a high-order Runge-Kutta method. To effectively handle discontinuities (cracks) that appear in a local subdomain of the solution, the scheme employs Taylor expansion and Lagrange interpolation polynomials with a finer time step size in that subdomain, that is, coarse and fine time step sizes for the smooth and discontinuous subdomains, respectively, to achieve accurate and efficient simulations. By eliminating unnecessary fine-scale resolution imposed on the entire domain, the MTS scheme outperforms the standard PD scheme by significantly reducing computational costs, particularly for problems with discontinuous solutions, as demonstrated by comprehensive theoretical analysis and numerical experiments.
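A toy illustration of the multi-time-step principle: two subsystems advanced with the same Runge-Kutta integrator at a coarse step and at a refined step (ratio r), standing in for the smooth and discontinuous subdomains. The PD operator and the Taylor/Lagrange coupling between subdomains are not modeled here:

```python
def rk4_step(f, y, t, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda t, y: -y                  # simple decay as a stand-in for the PD operator
dt, r, steps = 0.1, 4, 10            # coarse step, refinement ratio, coarse steps
y_coarse, y_fine, t = 1.0, 1.0, 0.0
for _ in range(steps):
    y_coarse = rk4_step(f, y_coarse, t, dt)              # "smooth" subdomain
    for j in range(r):                                   # "discontinuous" subdomain
        y_fine = rk4_step(f, y_fine, t + j * dt / r, dt / r)
    t += dt
```

The saving in the actual scheme comes from confining the r-fold substepping to the small subdomain containing the crack rather than the whole spatial domain.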

A general theory of efficient estimation for ergodic diffusion processes sampled at high frequency with an infinite time horizon is presented. High-frequency sampling is common in many applications, with finance as a prominent example. The theory is formulated in terms of approximate martingale estimating functions and covers a large class of estimators, including most previously proposed estimators for diffusion processes. Easily checked conditions ensuring that an estimating function is an approximate martingale are derived, and general conditions ensuring consistency and asymptotic normality of estimators are given. Most importantly, simple conditions are given that ensure rate optimality and efficiency. Rate-optimal estimators of parameters in the diffusion coefficient converge faster than estimators of drift coefficient parameters because they take advantage of the information in the quadratic variation. The conditions facilitate the choice among the multitude of estimators that have been proposed for diffusion models. Optimal martingale estimating functions in the sense of Godambe and Heyde, and their high-frequency approximations, are shown under weak conditions to satisfy the conditions for rate optimality and efficiency. This provides a natural, feasible method of constructing explicit rate-optimal and efficient estimating functions by solving a linear equation.
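For orientation, an approximate martingale estimating function typically has the following generic form (the symbols follow common usage in this literature, not necessarily the paper's exact notation):

```latex
G_n(\theta) = \sum_{i=1}^{n} g\!\left(\Delta_n,\, X_{t^n_{i-1}},\, X_{t^n_i};\, \theta\right),
\qquad
\mathbb{E}_\theta\!\left[\, g\big(\Delta_n, X_{t^n_{i-1}}, X_{t^n_i}; \theta\big) \,\middle|\, X_{t^n_{i-1}} \right] = O\!\big(\Delta_n^{\kappa}\big),
```

where $\Delta_n$ is the time between observations; the $O(\Delta_n^\kappa)$ bound, for suitable $\kappa$, is what makes the estimating function an approximate rather than exact martingale, and the estimator is obtained by solving $G_n(\theta) = 0$.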

Emotions are integral to human social interactions, with diverse responses elicited by various situational contexts. In particular, the prevalence of negative emotional states has been correlated with negative mental health outcomes, necessitating a comprehensive analysis of their occurrence and impact on individuals. In this paper, we introduce a novel dataset named DepressionEmo, designed to detect 8 emotions associated with depression from 6037 long Reddit user posts. The dataset was created through a majority vote over zero-shot classifications from pre-trained models, with quality validated by annotators and ChatGPT, exhibiting an acceptable level of inter-rater reliability between annotators. We analyze the correlation between emotions, their distribution over time, and linguistic properties of DepressionEmo. In addition, we provide several text classification methods in two groups: machine learning methods such as SVM, XGBoost, and LightGBM; and deep learning methods such as BERT, GAN-BERT, and BART. The pre-trained BART model, bart-base, obtains the highest F1-Macro of 0.76, outperforming the other methods evaluated in our analysis. Across all emotions, the highest F1-Macro value is achieved for suicide intent, indicating the value of our dataset in identifying emotions in individuals with depression symptoms through text analysis. The curated dataset is publicly available at: //github.com/abuBakarSiddiqurRahman/DepressionEmo.
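A minimal sketch of the majority-vote labeling step described above: each post receives candidate emotion labels from several zero-shot classifiers, and a label is kept only if a majority of models assign it. The example label sets and the threshold rule are illustrative, not the paper's exact pipeline:

```python
from collections import Counter

def majority_vote(label_sets, threshold=None):
    """label_sets: one set of predicted labels per model; keep labels
    assigned by at least `threshold` models (default: strict majority)."""
    n = len(label_sets)
    threshold = threshold if threshold is not None else n // 2 + 1
    counts = Counter(label for s in label_sets for label in s)
    return {label for label, c in counts.items() if c >= threshold}

# Hypothetical zero-shot outputs from three models for one post
votes = [{"sadness", "hopelessness"}, {"sadness"}, {"sadness", "anger"}]
labels = majority_vote(votes)
```

Labels surviving the vote would then go to human annotators (and ChatGPT) for validation, per the abstract.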

We propose a method to numerically compute fractional derivatives (or the fractional Laplacian) on the whole real line via Riesz fractional integrals. The compactified real line is divided into a number of intervals, thus amounting to a multi-domain approach; after transformations in accordance with the underlying $Z_{q}$ curve ensuring analyticity of the respective integrands, the integrals over the different domains are computed with a Clenshaw-Curtis algorithm. As an example, we consider solitary waves for fractional Korteweg-de Vries equations and compare these to results obtained with a discrete Fourier transform.
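The quadrature building block mentioned above can be sketched as follows: Clenshaw-Curtis nodes and weights on [-1, 1] (following Trefethen's well-known formulation), applied to a smooth integrand. The multi-domain splitting and the $Z_q$-curve transformations of the actual method are omitted:

```python
import numpy as np

def clencurt(n):
    """Clenshaw-Curtis nodes x and weights w on [-1, 1], n + 1 points."""
    theta = np.pi * np.arange(n + 1) / n
    x = np.cos(theta)                      # Chebyshev extreme points
    w = np.zeros(n + 1)
    ii = slice(1, n)
    v = np.ones(n - 1)
    if n % 2 == 0:
        w[0] = w[n] = 1.0 / (n * n - 1)
        for k in range(1, n // 2):
            v -= 2 * np.cos(2 * k * theta[ii]) / (4 * k * k - 1)
        v -= np.cos(n * theta[ii]) / (n * n - 1)
    else:
        w[0] = w[n] = 1.0 / (n * n)
        for k in range(1, (n - 1) // 2 + 1):
            v -= 2 * np.cos(2 * k * theta[ii]) / (4 * k * k - 1)
    w[ii] = 2 * v / n
    return x, w

x, w = clencurt(16)
approx = np.sum(w * np.exp(x))    # integral of e^x over [-1, 1] = e - 1/e
```

For analytic integrands such as this one, the convergence is spectral, which is why the transformations ensuring analyticity of the integrands matter in the method above.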

In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap in performance between the detection of small and large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors: (1) only a few images contain small objects, and (2) small objects do not appear often enough even within the images that contain them. We thus propose to oversample images with small objects and to augment each of those images by copy-pasting small objects many times. This allows us to trade off the quality of the detector on large objects against that on small objects. We evaluate different pasting augmentation strategies, and ultimately achieve a 9.7% relative improvement on instance segmentation and 7.1% on object detection of small objects, compared to the current state-of-the-art method on MS COCO.
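A bare-bones sketch of the copy-paste augmentation idea: a small object's pixel patch is pasted at random locations within the same image, and the annotation list grows accordingly. Mask handling, blending, overlap checks, and the oversampling of images are all omitted here:

```python
import numpy as np

rng = np.random.default_rng(2)

def paste_small_object(image, box, n_copies=3):
    """box = (y0, x0, y1, x1) of a small object; paste it n_copies times
    at random positions and return the image plus the new boxes."""
    y0, x0, y1, x1 = box
    patch = image[y0:y1, x0:x1].copy()
    h, w = y1 - y0, x1 - x0
    H, W = image.shape[:2]
    new_boxes = []
    for _ in range(n_copies):
        ty = int(rng.integers(0, H - h))      # random top-left corner
        tx = int(rng.integers(0, W - w))
        image[ty:ty + h, tx:tx + w] = patch   # hard paste, no blending
        new_boxes.append((ty, tx, ty + h, tx + w))
    return image, new_boxes

img = np.zeros((64, 64, 3), dtype=np.uint8)
img[10:14, 10:14] = 255                       # a tiny bright "object"
aug, boxes = paste_small_object(img, (10, 10, 14, 14))
```

Each pasted copy gives the anchor-matching stage more chances to overlap a small ground-truth box, which is the mechanism behind the reported gains.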
