Artificial Intelligence (AI)-driven defect inspection is pivotal in industrial manufacturing. Yet many existing methods, tailored to specific pipelines, struggle with diverse product portfolios and evolving processes. Addressing this, we present the Incremental Unified Framework (IUF), which reduces feature conflicts when new objects are continually integrated into the pipeline, making it well suited to object-incremental learning scenarios. Employing a state-of-the-art transformer, we introduce Object-Aware Self-Attention (OASA) to delineate distinct semantic boundaries. A Semantic Compression Loss (SCL) is integrated to optimize the non-primary semantic space, enhancing the network's adaptability to novel objects. Additionally, we prioritize retaining the features of established objects during weight updates. Demonstrating prowess in both image- and pixel-level defect inspection, our approach achieves state-of-the-art performance, proving indispensable for dynamic and scalable industrial inspections. Our code will be released at \url{//github.com/jqtangust/IUF}.
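To make the idea of object-aware attention more concrete, the sketch below shows one way such a mechanism could be instantiated: a standard self-attention layer whose attention matrix is masked by per-token object labels so that tokens attend only within their own object's semantic region. The function name `object_aware_attention` and the `object_ids` mask are illustrative assumptions, not the paper's actual OASA formulation.

```python
import numpy as np

def object_aware_attention(x, object_ids, wq, wk, wv):
    """Toy masked self-attention: tokens attend only to tokens sharing
    the same object id (an illustrative stand-in for object-aware
    semantic boundaries, not the paper's OASA)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Block attention across different objects.
    mask = object_ids[:, None] == object_ids[None, :]
    scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Usage: 6 tokens with 8-dimensional features, belonging to two objects.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
object_ids = np.array([0, 0, 0, 1, 1, 1])
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(object_aware_attention(x, object_ids, wq, wk, wv).shape)  # (6, 8)
```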
We introduce Cluster Edge Modification problems with constraints on the size of the clusters and study their complexity. A graph $G$ is a cluster graph if every connected component of $G$ is a clique. In a typical Cluster Edge Modification problem such as the widely studied Cluster Editing, we are given a graph $G$ and a non-negative integer $k$ as input, and we have to decide if we can turn $G$ into a cluster graph by way of at most $k$ edge modifications -- that is, by adding or deleting edges. In this paper, we study the parameterized complexity of such problems, but with an additional constraint: The size difference between any two connected components of the resulting cluster graph should not exceed a given threshold. Depending on which modifications are permissible -- only adding edges, only deleting edges, both adding and deleting edges -- we have three different computational problems. We show that all three problems, when parameterized by $k$, admit single-exponential time FPT algorithms and polynomial kernels. Our problems may be thought of as the size-constrained or balanced counterparts of the typical Cluster Edge Modification problems, similar to the well-studied size-constrained or balanced counterparts of other clustering problems such as $k$-Means Clustering.
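As a concrete illustration of the target structure, the following sketch (using networkx, an assumed dependency) checks whether a graph is a cluster graph and whether its cluster sizes respect a given size-difference threshold. It only verifies a candidate solution; it says nothing about how to find the at most $k$ modifications.

```python
import networkx as nx

def is_cluster_graph(G):
    """Every connected component must induce a clique."""
    for comp in nx.connected_components(G):
        n = len(comp)
        if G.subgraph(comp).number_of_edges() != n * (n - 1) // 2:
            return False
    return True

def is_balanced_cluster_graph(G, threshold):
    """Cluster graph whose component sizes differ by at most `threshold`."""
    if not is_cluster_graph(G):
        return False
    sizes = [len(c) for c in nx.connected_components(G)]
    return max(sizes) - min(sizes) <= threshold

# Example: two cliques of sizes 3 and 2 -> cluster graph with size gap 1.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (4, 5)])
print(is_balanced_cluster_graph(G, threshold=1))  # True
```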
Floating Car Observers (FCOs) are an innovative method for collecting traffic data by deploying sensor-equipped vehicles to detect and locate other vehicles. We demonstrate that even a small penetration rate of FCOs can identify a significant number of vehicles at a given intersection. This is achieved through the emulation of detection within a microscopic traffic simulation. Additionally, leveraging data from previous time steps can enhance the detection of vehicles in the current frame. Our findings indicate that, with a 20-second observation window, it is possible to recover up to 20\% of the vehicles that are not visible to FCOs in the current time step. To exploit this, we developed a data-driven strategy utilizing sequences of Bird's Eye View (BEV) representations of detected vehicles together with deep learning models. This approach aims to bring vehicles that are currently undetected into view at the present moment, complementing those already detected. Results for different spatiotemporal architectures show that up to 41\% of these vehicles can be recovered and placed at their current positions in the current time step. This enhancement enriches the information initially available to the FCO, allowing improved estimation of traffic states and metrics (e.g., density and queue length) for better implementation of traffic management strategies.
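To illustrate the kind of input the spatiotemporal models consume, the sketch below rasterizes detected vehicle positions into a Bird's Eye View occupancy grid. The grid size, extent, and simple point-marking scheme are illustrative assumptions, not the exact preprocessing used in the paper.

```python
import numpy as np

def bev_occupancy(detections, grid_size=64, extent=50.0):
    """Rasterize detected vehicle (x, y) positions, in metres relative to
    the intersection centre, into a grid_size x grid_size BEV occupancy
    map covering [-extent, extent] metres in both directions."""
    bev = np.zeros((grid_size, grid_size), dtype=np.float32)
    cell = 2 * extent / grid_size
    for x, y in detections:
        col = int((x + extent) / cell)
        row = int((y + extent) / cell)
        if 0 <= row < grid_size and 0 <= col < grid_size:
            bev[row, col] = 1.0
    return bev

# A 20-second observation window at 1 Hz becomes a stack of 20 BEV frames.
frames = [bev_occupancy([(5.0 * t, -3.0), (-12.0, 8.0)]) for t in range(20)]
sequence = np.stack(frames)  # shape (20, 64, 64), input to a spatiotemporal model
print(sequence.shape)
```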
Texture editing is a crucial task in 3D modeling that allows users to automatically manipulate the surface materials of 3D models. However, the inherent complexity of 3D models and the ambiguity of text descriptions make this task challenging. To address this challenge, we propose ITEM3D, a \textbf{T}exture \textbf{E}diting \textbf{M}odel designed for automatic \textbf{3D} object editing according to text \textbf{I}nstructions. Leveraging diffusion models and differentiable rendering, ITEM3D uses rendered images as a bridge between text and the 3D representation, and further optimizes the disentangled texture and environment map. Previous methods adopted an absolute editing direction, namely score distillation sampling (SDS), as the optimization objective, which unfortunately results in noisy appearance and text inconsistency. To solve the problem caused by ambiguous text, we introduce a relative editing direction, an optimization objective defined by the noise difference between the source and target texts, which resolves the semantic ambiguity between texts and images. Additionally, we gradually adjust the direction during optimization to further address unexpected deviation in the texture domain. Qualitative and quantitative experiments show that ITEM3D outperforms state-of-the-art methods on various 3D objects. We also perform text-guided relighting to demonstrate explicit control over lighting. Our project page: //shengqiliu1.github.io/ITEM3D.
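The sketch below contrasts an absolute SDS-style direction with a relative direction defined by the noise difference between source and target prompts. The `predict_noise` stand-in, the toy latent, and the update rule are hypothetical placeholders meant only to convey the idea; a real implementation would query a pretrained diffusion model and backpropagate through differentiable rendering.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(latent, prompt):
    """Hypothetical stand-in for a text-conditioned diffusion noise predictor;
    a real implementation would call a pretrained diffusion model."""
    prompt_rng = np.random.default_rng(sum(map(ord, prompt)))
    return 0.1 * latent + prompt_rng.normal(size=latent.shape)

latent = rng.normal(size=(4, 4))                     # stands in for a rendered-image latent
eps_src = predict_noise(latent, "a wooden chair")    # source text
eps_tgt = predict_noise(latent, "a golden chair")    # target text

# Absolute (SDS-style) direction: target prediction minus the injected noise.
sds_direction = eps_tgt - rng.normal(size=latent.shape)
# Relative direction: the noise difference between target and source texts,
# cancelling content shared by both prompts and reducing semantic ambiguity.
relative_direction = eps_tgt - eps_src

step = 0.1
latent = latent - step * relative_direction          # in practice, backpropagated through rendering
print(np.linalg.norm(relative_direction), np.linalg.norm(sds_direction))
```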
The stochastic gradient descent (SGD) method and its variants have become the methods of choice for solving finite-sum optimization problems arising in machine learning and data science, thanks to their ability to handle large-scale applications and big datasets. Over the last decades, researchers have made substantial efforts to study the theoretical performance of SGD and its shuffling variants. However, only limited work has investigated its shuffling momentum variants, including shuffling heavy-ball momentum schemes for non-convex problems and Nesterov's momentum for convex settings. In this work, we extend the analysis of the shuffling momentum gradient method developed in [Tran et al. (2021)] to both finite-sum convex and strongly convex optimization problems. We provide the first analysis of shuffling momentum-based methods for the strongly convex setting, attaining a convergence rate of $O(1/nT^2)$, where $n$ is the number of samples and $T$ is the number of training epochs. Our analysis is state-of-the-art, matching the best rates of existing shuffling stochastic gradient algorithms in the literature.
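For concreteness, here is a minimal sketch of a shuffling heavy-ball momentum method on a toy finite-sum least-squares problem. It follows the generic "reshuffle once per epoch, then take momentum steps over the permutation" template; the exact algorithm, step sizes, and momentum schedule analyzed in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    """Gradient of the i-th component f_i(w) = 0.5 * (a_i^T w - b_i)^2."""
    return (A[i] @ w - b[i]) * A[i]

def shuffling_momentum_sgd(epochs=50, lr=0.01, beta=0.5):
    w = np.zeros(d)
    m = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)      # reshuffle the data once per epoch
        for i in perm:                 # one pass over the permuted samples
            m = beta * m + (1 - beta) * grad_i(w, i)
            w = w - lr * m
    return w

w_hat = shuffling_momentum_sgd()
w_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(w_hat - w_star))  # distance to the least-squares solution
```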
Obtaining accurate and timely channel state information (CSI) is a fundamental challenge for large antenna systems. Mobile systems like 5G use a beam management framework that couples initial access, beamforming, CSI acquisition, and data transmission. Designing codebooks for these stages, however, is challenging due to their interrelationships, varying array sizes, and site-specific channel and user distributions. Furthermore, beam management is often focused on single-sector operation while ignoring the overarching network- and system-level optimization. In this paper, we propose an end-to-end learned codebook design algorithm, network beamspace learning (NBL), which captures and optimizes codebooks to mitigate interference while maximizing the achievable performance with extremely large hybrid arrays. The proposed algorithm requires only limited shared information yet designs codebooks that outperform traditional codebooks by over 10 dB in beam alignment and achieve more than 25% improvement in network spectral efficiency.
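As a point of reference for the traditional codebooks mentioned above, the sketch below builds a DFT beam codebook for a uniform linear array and performs exhaustive beam alignment by selecting the codeword with the largest gain on a given channel. The array size and single-path channel are illustrative assumptions; the sketch does not reflect the learned NBL codebooks.

```python
import numpy as np

def ula_response(n_ant, angle_rad):
    """Array response of an n_ant uniform linear array with half-wavelength spacing."""
    k = np.arange(n_ant)
    return np.exp(1j * np.pi * k * np.sin(angle_rad)) / np.sqrt(n_ant)

def dft_codebook(n_ant, n_beams):
    """Classical DFT codebook: beams pointing at a uniform grid of spatial frequencies."""
    angles = np.arcsin(np.linspace(-1, 1, n_beams, endpoint=False))
    return np.stack([ula_response(n_ant, a) for a in angles])

n_ant, n_beams = 64, 64
codebook = dft_codebook(n_ant, n_beams)

# Toy single-path channel arriving from 23 degrees.
h = ula_response(n_ant, np.deg2rad(23.0))
gains = np.abs(codebook.conj() @ h) ** 2        # beamforming gain of every codeword
best = int(np.argmax(gains))
print(best, 10 * np.log10(gains[best] / gains.mean()))  # selected beam and its gain margin in dB
```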
Spectral Graph Neural Networks (GNNs), also referred to as graph filters, have become increasingly prevalent for graphs with heterophily. Optimal graph filters rely on the Laplacian eigendecomposition for the graph Fourier transform. To avert this prohibitive computation, numerous polynomial filters, each leveraging a distinct polynomial basis, have been proposed to approximate the desired graph filters. However, the polynomials in the majority of these filters are predefined and remain fixed across all graphs, failing to accommodate the diverse degrees of heterophily across different graphs. To tackle this issue, we first investigate the correlation between the polynomial bases of desired graph filters and the degree of graph heterophily via a thorough theoretical analysis. We then develop an adaptive heterophily basis that incorporates the degree of graph heterophily. Subsequently, we integrate this heterophily basis with the homophily basis, creating a universal polynomial basis, UniBasis, from which we devise a general polynomial filter, UniFilter. Comprehensive experiments on both real-world and synthetic datasets with varying degrees of heterophily strongly support the superiority of UniFilter, demonstrating the effectiveness and generality of UniBasis, as well as its promising capability as a new method for graph analysis.
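To ground the discussion, the sketch below implements the kind of predefined, fixed polynomial filter referred to above: a degree-$K$ monomial filter over the symmetrically normalized adjacency matrix with hand-set coefficients. UniBasis/UniFilter replace such a fixed basis with a heterophily-adapted one; the code is only the baseline construction, not our method.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def polynomial_filter(A, X, coeffs):
    """Fixed monomial polynomial filter: sum_k coeffs[k] * P^k X."""
    P = normalized_adjacency(A)
    out = np.zeros_like(X)
    PX = X.copy()
    for k, c in enumerate(coeffs):
        if k > 0:
            PX = P @ PX            # P^k X computed iteratively
        out += c * PX
    return out

# Toy graph: a 4-cycle with 2-dimensional node features.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)
print(polynomial_filter(A, X, coeffs=[0.5, 0.3, 0.2]))
```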
We study the Densest Subgraph (DSG) problem under the additional constraint of differential privacy. DSG is a fundamental theoretical problem that plays a central role in graph analytics, so privacy is a natural requirement. However, all known private algorithms for Densest Subgraph lose constant multiplicative factors as well as relatively large (at least $\log^2 n$) additive factors, despite the existence of non-private exact algorithms. We show that, perhaps surprisingly, these losses are not necessary: in both the classic differential privacy model and the LEDP model (local edge differential privacy, introduced recently by Dhulipala et al. [FOCS 2022]), we give $(\epsilon, \delta)$-differentially private algorithms with no multiplicative loss whatsoever. In other words, the loss is purely additive. Moreover, our additive losses improve the best-known previous additive loss (in any version of differential privacy) when $1/\delta$ is at least polynomial in $n$, and are almost tight: in the centralized setting, our additive loss is $O(\log n / \epsilon)$ while there is a known lower bound of $\Omega(\sqrt{\log n / \epsilon})$. Additionally, we give a different algorithm that is $\epsilon$-differentially private in the LEDP model and achieves a multiplicative ratio arbitrarily close to $2$, along with an additional additive factor. This improves over the previous multiplicative $4$-approximation in the LEDP model. Finally, we conclude with extensions of our techniques to both the node-weighted and the directed versions of the problem.
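For context on the non-private problem, the sketch below implements the classical greedy peeling procedure (Charikar's 2-approximation for densest subgraph), which repeatedly removes a minimum-degree vertex and remembers the densest prefix. The private algorithms in the paper add carefully calibrated noise on top of such ideas, which the sketch does not attempt to model.

```python
import heapq
from collections import defaultdict

def greedy_peeling(edges):
    """Greedy 2-approximation for densest subgraph: repeatedly delete a
    minimum-degree vertex and return the best density |E(S)| / |S| seen."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    heap = [(len(adj[v]), v) for v in alive]
    heapq.heapify(heap)
    best_density, best_size = m / max(len(alive), 1), len(alive)
    while alive:
        deg, v = heapq.heappop(heap)
        if v not in alive or deg != len(adj[v]):
            continue                       # stale heap entry
        alive.remove(v)
        m -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
            heapq.heappush(heap, (len(adj[u]), u))
        adj[v].clear()
        if alive and m / len(alive) > best_density:
            best_density, best_size = m / len(alive), len(alive)
    return best_density, best_size

# A 4-clique plus a pendant vertex: the densest subgraph is the clique (density 1.5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(greedy_peeling(edges))  # (1.5, 4)
```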
We propose Hyper-Dimensional Function Encoding (HDFE). Given samples of a continuous object (e.g., a function), HDFE produces an explicit vector representation of that object which is invariant to the sample distribution and density. This invariance enables HDFE to encode continuous objects consistently regardless of how they are sampled, and therefore allows neural networks to receive continuous objects as inputs for machine learning tasks such as classification and regression. Moreover, HDFE requires no training and provably maps the object into an organized embedding space, which facilitates the training of downstream tasks. In addition, the encoding is decodable, enabling neural networks to regress continuous objects by regressing their encodings. HDFE therefore serves as an interface for processing continuous objects. We apply HDFE to function-to-function mapping, where vanilla HDFE achieves performance competitive with the state-of-the-art algorithm. We also apply HDFE to point cloud surface normal estimation, where simply replacing PointNet with HDFE leads to immediate 12% and 15% error reductions on two benchmarks. Furthermore, by integrating HDFE into the PointNet-based SOTA network, we improve the SOTA baseline by 2.5% and 1.7% on the same benchmarks.
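To give a flavor of fixed-length function encoding, the toy sketch below bundles random-feature hypervectors of sample locations, weighted by function values, into a single vector. This is only a simplified illustration of the general "encode samples, then bundle" idea; it does not reproduce HDFE's actual construction or its sample-distribution invariance guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 2048                                   # hypervector dimensionality
W = rng.normal(size=DIM)                     # random projection for input locations
B = rng.uniform(0, 2 * np.pi, size=DIM)      # random phases

def location_hypervector(x):
    """Random-Fourier-feature style hypervector for a scalar input location."""
    return np.sqrt(2.0 / DIM) * np.cos(W * x + B)

def encode_function(xs, ys):
    """Toy fixed-length encoding of samples (x_i, f(x_i)):
    bundle location hypervectors weighted by function values, then normalize."""
    vec = sum(y * location_hypervector(x) for x, y in zip(xs, ys))
    return vec / np.linalg.norm(vec)

# Two different samplings of f(x) = sin(x) yield similar encodings in this toy setup.
xs1 = np.linspace(0, np.pi, 50)              # dense regular grid
xs2 = rng.uniform(0, np.pi, 300)             # different number and placement of samples
e1 = encode_function(xs1, np.sin(xs1))
e2 = encode_function(xs2, np.sin(xs2))
print(float(e1 @ e2))                        # cosine similarity, close to 1 in this example
```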
2D industrial anomaly detection has been widely studied; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images remains largely unexplored. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to strong interference between features and harms detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with a hybrid fusion scheme: first, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage interaction between features of different modalities; second, we use decision-layer fusion with multiple memory banks to avoid loss of information, together with additional novelty classifiers that make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
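To make the patch-wise contrastive idea more concrete, the sketch below shows a generic InfoNCE-style loss between RGB and point-cloud patch features, where corresponding patches form positive pairs and all other patches act as negatives. The temperature, symmetric form, and feature shapes are illustrative assumptions rather than M3DM's precise formulation.

```python
import torch
import torch.nn.functional as F

def patchwise_contrastive_loss(rgb_feats, pc_feats, temperature=0.07):
    """InfoNCE over patches: the i-th RGB patch and the i-th point-cloud
    patch (same spatial location) are a positive pair; every other patch
    in the batch serves as a negative."""
    rgb = F.normalize(rgb_feats, dim=-1)      # (num_patches, dim)
    pc = F.normalize(pc_feats, dim=-1)        # (num_patches, dim)
    logits = rgb @ pc.t() / temperature       # pairwise cosine similarities
    targets = torch.arange(rgb.size(0), device=rgb.device)
    # Symmetric loss: RGB -> point cloud and point cloud -> RGB.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# 196 patches (a 14x14 grid) with 256-dimensional features per modality.
rgb_feats = torch.randn(196, 256)
pc_feats = torch.randn(196, 256)
print(patchwise_contrastive_loss(rgb_feats, pc_feats).item())
```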
Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous features, which occur regularly in real-world input domains and within the hidden layers of GNNs, and we demonstrate the requirement for multiple aggregation functions in this context. Accordingly, we propose Principal Neighbourhood Aggregation (PNA), a novel architecture combining multiple aggregators with degree-scalers (which generalize the sum aggregator). Finally, we compare the capacity of different models to capture and exploit the graph structure via a novel benchmark containing multiple tasks taken from classical graph theory, alongside existing benchmarks from real-world domains, all of which demonstrate the strength of our model. With this work, we hope to steer some of the GNN research towards new aggregation methods which we believe are essential in the search for powerful and robust models.
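The sketch below illustrates the core PNA aggregation for a single node: several aggregators (mean, max, min, std) combined with degree-based scalers (identity, amplification, attenuation). The logarithmic scaler form follows the common description of PNA, but the shapes and the example `delta` normalizer are simplified for illustration.

```python
import numpy as np

def pna_aggregate(neighbor_feats, degree, delta):
    """Combine multiple aggregators with degree scalers for one node.

    neighbor_feats: (degree, dim) array of messages from the neighbours.
    delta: average of log(d + 1) over the training graphs (assumed precomputed).
    """
    aggregators = [
        neighbor_feats.mean(axis=0),
        neighbor_feats.max(axis=0),
        neighbor_feats.min(axis=0),
        neighbor_feats.std(axis=0),
    ]
    scale = np.log(degree + 1) / delta
    scalers = [1.0, scale, 1.0 / scale]      # identity, amplification, attenuation
    # Tensor product of aggregators and scalers -> (num_aggregators * num_scalers * dim,)
    return np.concatenate([s * a for s in scalers for a in aggregators])

# A node with 5 neighbours carrying 16-dimensional messages.
rng = np.random.default_rng(0)
msgs = rng.normal(size=(5, 16))
out = pna_aggregate(msgs, degree=5, delta=np.log(4.0))
print(out.shape)  # (4 aggregators * 3 scalers * 16,) = (192,)
```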