The field of adversarial textual attack has significantly grown over the last few years, where the commonly considered objective is to craft adversarial examples (AEs) that can successfully fool the target model. However, the imperceptibility of attacks, which is also essential for practical attackers, is often left out by previous studies. In consequence, the crafted AEs tend to have obvious structural and semantic differences from the original human-written text, making them easily perceptible. In this work, we advocate leveraging multi-objectivization to address such issue. Specifically, we reformulate the problem of crafting AEs as a multi-objective optimization problem, where the attack imperceptibility is considered as an auxiliary objective. Then, we propose a simple yet effective evolutionary algorithm, dubbed HydraText, to solve this problem. To the best of our knowledge, HydraText is currently the only approach that can be effectively applied to both score-based and decision-based attack settings. Exhaustive experiments involving 44237 instances demonstrate that HydraText consistently achieves competitive attack success rates and better attack imperceptibility than the recently proposed attack approaches. A human evaluation study also shows that the AEs crafted by HydraText are more indistinguishable from human-written text. Finally, these AEs exhibit good transferability and can bring notable robustness improvement to the target model by adversarial training.
Methods for anomaly detection of new physics processes are often limited to low-dimensional spaces due to the difficulty of learning high-dimensional probability densities. Particularly at the constituent level, incorporating desirable properties such as permutation invariance and variable-length inputs becomes difficult within popular density estimation methods. In this work, we introduce a permutation-invariant density estimator for particle physics data based on diffusion models, specifically designed to handle variable-length inputs. We demonstrate the efficacy of our methodology by utilizing the learned density as a permutation-invariant anomaly detection score, effectively identifying jets with low likelihood under the background-only hypothesis. To validate our density estimation method, we investigate the ratio of learned densities and compare to those obtained by a supervised classification algorithm.
Constant (naive) imputation is still widely used in practice as this is a first easy-to-use technique to deal with missing data. Yet, this simple method could be expected to induce a large bias for prediction purposes, as the imputed input may strongly differ from the true underlying data. However, recent works suggest that this bias is low in the context of high-dimensional linear predictors when data is supposed to be missing completely at random (MCAR). This paper completes the picture for linear predictors by confirming the intuition that the bias is negligible and that surprisingly naive imputation also remains relevant in very low dimension.To this aim, we consider a unique underlying random features model, which offers a rigorous framework for studying predictive performances, whilst the dimension of the observed features varies.Building on these theoretical results, we establish finite-sample bounds on stochastic gradient (SGD) predictors applied to zero-imputed data, a strategy particularly well suited for large-scale learning.If the MCAR assumption appears to be strong, we show that similar favorable behaviors occur for more complex missing data scenarios.
Vecchia approximation has been widely used to accurately scale Gaussian-process (GP) inference to large datasets, by expressing the joint density as a product of conditional densities with small conditioning sets. We study fixed-domain asymptotic properties of Vecchia-based GP inference for a large class of covariance functions (including Mat\'ern covariances) with boundary conditioning. In this setting, we establish that consistency and asymptotic normality of maximum exact-likelihood estimators imply those of maximum Vecchia-likelihood estimators, and that exact GP prediction can be approximated accurately by Vecchia GP prediction, given that the size of conditioning sets grows polylogarithmically with the data size. Hence, Vecchia-based inference with quasilinear complexity is asymptotically equivalent to exact GP inference with cubic complexity. This also provides a general new result on the screening effect. Our findings are illustrated by numerical experiments, which also show that Vecchia approximation can be more accurate than alternative approaches such as covariance tapering and reduced-rank approximations.
The goal of anomaly detection is to identify observations that are generated by a distribution that differs from the reference distribution that qualifies normal behavior. When examining a time series, the reference distribution may evolve over time. The anomaly detector must therefore be able to adapt to such changes. In the online context, it is particularly difficult to adapt to abrupt and unpredictable changes. Our solution to this problem is based on the detection of breakpoints in order to adapt in real time to the new reference behavior of the series and to increase the accuracy of the anomaly detection. This solution also provides a control of the False Discovery Rate by extending methods developed for stationary series.
We consider finite element approximations to the optimal constant for the Hardy inequality with exponent $p=2$ in bounded domains of dimension $n=1$ or $n \geq 3$. For finite element spaces of piecewise linear and continuous functions on a mesh of size $h$, we prove that the approximate Hardy constant converges to the optimal Hardy constant at a rate proportional to $1/| \log h |^2$. This result holds in dimension $n=1$, in any dimension $n \geq 3$ if the domain is the unit ball and the finite element discretization exploits the rotational symmetry of the problem, and in dimension $n=3$ for general finite element discretizations of the unit ball. In the first two cases, our estimates show excellent quantitative agreement with values of the discrete Hardy constant obtained computationally.
Support Vector Machines (SVMs) are an important tool for performing classification on scattered data, where one usually has to deal with many data points in high-dimensional spaces. We propose solving SVMs in primal form using feature maps based on trigonometric functions or wavelets. In small dimensional settings the Fast Fourier Transform (FFT) and related methods are a powerful tool in order to deal with the considered basis functions. For growing dimensions the classical FFT-based methods become inefficient due to the curse of dimensionality. Therefore, we restrict ourselves to multivariate basis functions, each one of them depends only on a small number of dimensions. This is motivated by the well-known sparsity of effects and recent results regarding the reconstruction of functions from scattered data in terms of truncated analysis of variance (ANOVA) decomposition, which makes the resulting model even interpretable in terms of importance of the features as well as their couplings. The usage of small superposition dimensions has the consequence that the computational effort no longer grows exponentially but only polynomially with respect to the dimension. In order to enforce sparsity regarding the basis coefficients, we use the frequently applied $\ell_2$-norm and, in addition, $\ell_1$-norm regularization. The found classifying function, which is the linear combination of basis functions, and its variance can then be analyzed in terms of the classical ANOVA decomposition of functions. Based on numerical examples we show that we are able to recover the signum of a function that perfectly fits our model assumptions. We obtain better results with $\ell_1$-norm regularization, both in terms of accuracy and clarity of interpretability.
A rectangulation is a decomposition of a rectangle into finitely many rectangles. Via natural equivalence relations, rectangulations can be seen as combinatorial objects with a rich structure, with links to lattice congruences, flip graphs, polytopes, lattice paths, Hopf algebras, etc. In this paper, we first revisit the structure of the respective equivalence classes: weak rectangulations that preserve rectangle-segment adjacencies, and strong rectangulations that preserve rectangle-rectangle adjacencies. We thoroughly investigate posets defined by adjacency in rectangulations of both kinds, and unify and simplify known bijections between rectangulations and permutation classes. This yields a uniform treatment of mappings between permutations and rectangulations that unifies the results from earlier contributions, and emphasizes parallelism and differences between the weak and the strong cases. Then, we consider the special case of guillotine rectangulations, and prove that they can be characterized - under all known mappings between permutations and rectangulations - by avoidance of two mesh patterns that correspond to "windmills" in rectangulations. This yields new permutation classes in bijection with weak guillotine rectangulations, and the first known permutation class in bijection with strong guillotine rectangulations. Finally, we address enumerative issues and prove asymptotic bounds for several families of strong rectangulations.
Despite the predominance of contextualized embeddings in NLP, approaches to detect semantic change relying on these embeddings and clustering methods underperform simpler counterparts based on static word embeddings. This stems from the poor quality of the clustering methods to produce sense clusters -- which struggle to capture word senses, especially those with low frequency. This issue hinders the next step in examining how changes in word senses in one language influence another. To address this issue, we propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages, including the acquisition and loss of these senses over time. Our experimental results show that our approach substantially surpasses previous approaches in the SemEval2020 binary classification task across four languages. Moreover, we showcase the ability of our approach as a versatile visualization tool to detect semantic changes in both intra-language and inter-language setups. We make our code and data publicly available.
Conventional entity typing approaches are based on independent classification paradigms, which make them difficult to recognize inter-dependent, long-tailed and fine-grained entity types. In this paper, we argue that the implicitly entailed extrinsic and intrinsic dependencies between labels can provide critical knowledge to tackle the above challenges. To this end, we propose \emph{Label Reasoning Network(LRN)}, which sequentially reasons fine-grained entity labels by discovering and exploiting label dependencies knowledge entailed in the data. Specifically, LRN utilizes an auto-regressive network to conduct deductive reasoning and a bipartite attribute graph to conduct inductive reasoning between labels, which can effectively model, learn and reason complex label dependencies in a sequence-to-set, end-to-end manner. Experiments show that LRN achieves the state-of-the-art performance on standard ultra fine-grained entity typing benchmarks, and can also resolve the long tail label problem effectively.
To retrieve more relevant, appropriate and useful documents given a query, finding clues about that query through the text is crucial. Recent deep learning models regard the task as a term-level matching problem, which seeks exact or similar query patterns in the document. However, we argue that they are inherently based on local interactions and do not generalise to ubiquitous, non-consecutive contextual relationships.In this work, we propose a novel relevance matching model based on graph neural networks to leverage the document-level word relationships for ad-hoc retrieval. In addition to the local interactions, we explicitly incorporate all contexts of a term through the graph-of-word text format. Matching patterns can be revealed accordingly to provide a more accurate relevance score. Our approach significantly outperforms strong baselines on two ad-hoc benchmarks. We also experimentally compare our model with BERT and show our ad-vantages on long documents.