This paper studies robust nonparametric regression, in which an adversarial attacker can modify the values of up to $q$ samples from a training dataset of size $N$. Our initial solution is an M-estimator based on Huber loss minimization. Compared with simple kernel regression, i.e. the Nadaraya-Watson estimator, this method can significantly weaken the impact of malicious samples on the regression performance. We provide the convergence rate as well as the corresponding minimax lower bound. The result shows that, with proper bandwidth selection, $\ell_\infty$ error is minimax optimal. The $\ell_2$ error is optimal with relatively small $q$, but is suboptimal with larger $q$. The reason is that this estimator is vulnerable if there are many attacked samples concentrating in a small region. To address this issue, we propose a correction method by projecting the initial estimate to the space of Lipschitz functions. The final estimate is nearly minimax optimal for arbitrary $q$, up to a $\ln N$ factor.
This paper considers the filtering problem which consists in reconstructing the state of a dynamical system with partial observations coming from sensor measurements, and the knowledge that the dynamics are governed by a physical PDE model with unknown parameters. We present a filtering algorithm where the reconstruction of the dynamics is done with neural network approximations whose weights are dynamically updated using observational data. In addition to the estimate of the state, we also obtain time-dependent parameter estimations of the PDE parameters governing the observed evolution. We illustrate the behavior of the method in a one-dimensional KdV equation involving the transport of solutions with local support. Our numerical investigation reveals the importance of the location and number of the observations. In particular, it suggests to consider dynamical sensor placement.
In open-set recognition (OSR), a promising strategy is exploiting pseudo-unknown data outside given $K$ known classes as an additional $K$+$1$-th class to explicitly model potential open space. However, treating unknown classes without distinction is unequal for them relative to known classes due to the category-agnostic and scale-agnostic of the unknowns. This inevitably not only disrupts the inherent distributions of unknown classes but also incurs both class-wise and instance-wise imbalances between known and unknown classes. Ideally, the OSR problem should model the whole class space as $K$+$\infty$, but enumerating all unknowns is impractical. Since the core of OSR is to effectively model the boundaries of known classes, this means just focusing on the unknowns nearing the boundaries of targeted known classes seems sufficient. Thus, as a compromise, we convert the open classes from infinite to $K$, with a novel concept Target-Aware Universum (TAU) and propose a simple yet effective framework Dual Contrastive Learning with Target-Aware Universum (DCTAU). In details, guided by the targeted known classes, TAU automatically expands the unknown classes from the previous $1$ to $K$, effectively alleviating the distribution disruption and the imbalance issues mentioned above. Then, a novel Dual Contrastive (DC) loss is designed, where all instances irrespective of known or TAU are considered as positives to contrast with their respective negatives. Experimental results indicate DCTAU sets a new state-of-the-art.
Composed image retrieval, a task involving the search for a target image using a reference image and a complementary text as the query, has witnessed significant advancements owing to the progress made in cross-modal modeling. Unlike the general image-text retrieval problem with only one alignment relation, i.e., image-text, we argue for the existence of two types of relations in composed image retrieval. The explicit relation pertains to the reference image & complementary text-target image, which is commonly exploited by existing methods. Besides this intuitive relation, the observations during our practice have uncovered another implicit yet crucial relation, i.e., reference image & target image-complementary text, since we found that the complementary text can be inferred by studying the relation between the target image and the reference image. Regrettably, existing methods largely focus on leveraging the explicit relation to learn their networks, while overlooking the implicit relation. In response to this weakness, We propose a new framework for composed image retrieval, termed dual relation alignment, which integrates both explicit and implicit relations to fully exploit the correlations among the triplets. Specifically, we design a vision compositor to fuse reference image and target image at first, then the resulted representation will serve two roles: (1) counterpart for semantic alignment with the complementary text and (2) compensation for the complementary text to boost the explicit relation modeling, thereby implant the implicit relation into the alignment learning. Our method is evaluated on two popular datasets, CIRR and FashionIQ, through extensive experiments. The results confirm the effectiveness of our dual-relation learning in substantially enhancing composed image retrieval performance.
Guessing random additive noise decoding (GRAND) is a recently proposed decoding paradigm particularly suitable for codes with short length and high rate. Among its variants, ordered reliability bits GRAND (ORBGRAND) exploits soft information in a simple and effective fashion to schedule its queries, thereby allowing efficient hardware implementation. Compared with maximum likelihood (ML) decoding, however, ORBGRAND still exhibits noticeable performance gap in terms of block error rate (BLER). In order to improve the performance of ORBGRAND while still retaining its amenability to hardware implementation, a new variant of ORBGRAND termed RS-ORBGRAND is proposed, whose basic idea is to reshuffle the queries of ORBGRAND so that the expected number of queries is minimized. Numerical simulations show that RS-ORBGRAND leads to noticeable gains compared with ORBGRAND and its existing variants, and is only 0.1dB away from ML decoding, for BLER as low as $10^{-6}$.
In Bayesian peer-to-peer decentralized data fusion, the underlying distributions held locally by autonomous agents are frequently assumed to be over the same set of variables (homogeneous). This requires each agent to process and communicate the full global joint distribution, and thus leads to high computation and communication costs irrespective of relevancy to specific local objectives. This work formulates and studies heterogeneous decentralized fusion problems, defined as the set of problems in which either the communicated or the processed distributions describe different, but overlapping, random states of interest that are subsets of a larger full global joint state. We exploit the conditional independence structure of such problems and provide a rigorous derivation of novel exact and approximate conditionally factorized heterogeneous fusion rules. We further develop a new version of the homogeneous Channel Filter algorithm to enable conservative heterogeneous fusion for smoothing and filtering scenarios in dynamic problems. Numerical examples show more than $99.5\%$ potential communication reduction for heterogeneous channel filter fusion, and a multi-target tracking simulation shows that these methods provide consistent estimates while remaining computationally scalable.
This study explores a new mathematical operator, symbolized as $\cupplus$, for information aggregation, aimed at enhancing traditional methods by directly amalgamating probability distributions. This operator facilitates the combination of probability densities, contributing a nuanced approach to probabilistic analysis. We apply this operator to a personalized incentive scenario, illustrating its potential in a practical context. The paper's primary contribution lies in introducing this operator and elucidating its elegant mathematical properties. This exploratory work marks a step forward in the field of information fusion and probabilistic reasoning.
We explore a novel methodology for constructing confidence regions for parameters of linear models, using predictions from any arbitrary predictor. Our framework requires minimal assumptions on the noise and can be extended to functions deviating from strict linearity up to some adjustable threshold, thereby accommodating a comprehensive and pragmatically relevant set of functions. The derived confidence regions can be cast as constraints within a Mixed Integer Linear Programming framework, enabling optimisation of linear objectives. This representation enables robust optimization and the extraction of confidence intervals for specific parameter coordinates. Unlike previous methods, the confidence region can be empty, which can be used for hypothesis testing. Finally, we validate the empirical applicability of our method on synthetic data.
Transformer-based models excel in various natural language processing (NLP) tasks, attracting countless efforts to explain their inner workings. Prior methods explain Transformers by focusing on the raw gradient and attention as token attribution scores, where non-relevant information is often considered during explanation computation, resulting in confusing results. In this work, we propose highlighting the important information and eliminating irrelevant information by a refined information flow on top of the layer-wise relevance propagation (LRP) method. Specifically, we consider identifying syntactic and positional heads as important attention heads and focus on the relevance obtained from these important heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and then should be masked during explanation computation. Compared to eight baselines on both classification and question-answering datasets, our method consistently outperforms with over 3\% to 33\% improvement on explanation metrics, providing superior explanation performance. Our anonymous code repository is available at: //github.com/LinxinS97/Mask-LRP
2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.