The problem of roadside monocular 3D detection requires detecting objects of interested classes in a 2D RGB frame and predicting their 3D information such as locations in bird's-eye-view (BEV). It has broad applications in traffic control, vehicle-vehicle communication, and vehicle-infrastructure cooperative perception. To approach this problem, we present a novel and simple method by prompting the 3D detector using 2D detections. Our method builds on a key insight that, compared with 3D detectors, a 2D detector is much easier to train and performs significantly better w.r.t detections on the 2D image plane. That said, one can exploit 2D detections of a well-trained 2D detector as prompts to a 3D detector, being trained in a way of inflating such 2D detections to 3D towards 3D detection. To construct better prompts using the 2D detector, we explore three techniques: (a) concatenating both 2D and 3D detectors' features, (b) attentively fusing 2D and 3D detectors' features, and (c) encoding predicted 2D boxes x, y, width, height, label and attentively fusing such with the 3D detector's features. Surprisingly, the third performs the best. Moreover, we present a yaw tuning tactic and a class-grouping strategy that merges classes based on their functionality; these techniques improve 3D detection performance further. Comprehensive ablation studies and extensive experiments demonstrate that our method resoundingly outperforms prior works, achieving the state-of-the-art on two large-scale roadside 3D detection benchmarks.
Given a graph G, a budget k and a misinformation seed set S, Influence Minimization (IMIN) via node blocking aims to find a set of k nodes to be blocked such that the expected spread of S is minimized. This problem finds important applications in suppressing the spread of misinformation and has been extensively studied in the literature. However, existing solutions for IMIN still incur significant computation overhead, especially when k becomes large. In addition, there is still no approximation solution with non-trivial theoretical guarantee for IMIN via node blocking prior to our work. In this paper, we conduct the first attempt to propose algorithms that yield data-dependent approximation guarantees. Based on the Sandwich framework, we first develop submodular and monotonic lower and upper bounds for our non-submodular objective function and prove the computation of proposed bounds is \#P-hard. In addition, two advanced sampling methods are proposed to estimate the value of bounding functions. Moreover, we develop two novel martingale-based concentration bounds to reduce the sample complexity and design two non-trivial algorithms that provide (1-1/e-\epsilon)-approximate solutions to our bounding functions. Comprehensive experiments on 9 real-world datasets are conducted to validate the efficiency and effectiveness of the proposed techniques. Compared with the state-of-the-art methods, our solutions can achieve up to two orders of magnitude speedup and provide theoretical guarantees for the quality of returned results.
We develop Bayesian predictive stacking for geostatistical models, where the primary inferential objective is to provide inference on the latent spatial random field and conduct spatial predictions at arbitrary locations. We exploit analytically tractable posterior distributions for regression coefficients of predictors and the realizations of the spatial process conditional upon process parameters. We subsequently combine such inference by stacking these models across the range of values of the hyper-parameters. We devise stacking of means and posterior densities in a manner that is computationally efficient without resorting to iterative algorithms such as Markov chain Monte Carlo (MCMC) and can exploit the benefits of parallel computations. We offer novel theoretical insights into the resulting inference within an infill asymptotic paradigm and through empirical results showing that stacked inference is comparable to full sampling-based Bayesian inference at a significantly lower computational cost.
Polycube layouts for 3D models effectively support a wide variety of applications such as hexahedral mesh construction, seamless texture mapping, spline fitting, and multi-block grid generation. However, the automated construction of valid polycube layouts suffers from robustness issues: the state-of-the-art deformation-based methods are not guaranteed to find a valid solution. In this paper we present a novel approach which is guaranteed to return a valid polycube layout for 3D models of genus 0. Our algorithm is based on a dual representation of polycubes; we construct polycube layouts by iteratively adding or removing dual loops. The iterative nature of our algorithm facilitates a seamless trade-off between quality and complexity of the solution. Our method is efficient and can be implemented using comparatively simple algorithmic building blocks. We experimentally compare the results of our algorithm against state-of-the-art methods. Our fully automated method always produces provably valid polycube layouts whose quality - assessed via the quality of derived hexahedral meshes - is on par with state-of-the-art deformation methods.
We experimentally evaluated the accuracy with which material properties can be estimated through object compression by two standard parallel jaw grippers and a force/torque sensor mounted at the robot wrist, with a professional biaxial compression device used as reference. Gripper effort versus position curves were obtained and transformed into stress/strain curves. The modulus of elasticity was estimated at different strain points and the effect of multiple compression cycles (precycling), compression speed, and the gripper surface area on estimation was studied. Viscoelasticity was estimated using the energy absorbed in a compression/decompression cycle, the Kelvin-Voigt, and Hunt-Crossley models. We found that: (1) slower compression speeds improved elasticity estimation, while precycling or surface area did not; (2) the robot grippers, even after calibration, were found to have a limited capability of delivering accurate estimates of absolute values of Young's modulus and viscoelasticity; (3) relative ordering of material characteristics was largely consistent across different grippers; (4) despite the nonlinear characteristics of deformable objects, fitting linear stress/strain approximations led to more stable results than local estimates of Young's modulus; (5) the Hunt-Crossley model worked best to estimate viscoelasticity, from a single object compression. A two-dimensional space formed by elasticity and viscoelasticity estimates obtained from a single grasp is advantageous for the discrimination of the object material properties. We demonstrated the applicability of our findings in a mock single stream recycling scenario, where plastic, paper, and metal objects were correctly separated from a single grasp, even when compressed at different locations on the object. The data and code are publicly available.
This work is on a user-friendly reduced basis method for solving a family of parametric PDEs by preconditioned Krylov subspace methods including the conjugate gradient method, generalized minimum residual method, and bi-conjugate gradient method. The proposed methods use a preconditioned Krylov subspace method for a high-fidelity discretization of one parameter instance to generate orthogonal basis vectors of the reduced basis subspace. Then large-scale discrete parameter-dependent problems are approximately solved in the low-dimensional Krylov subspace. As shown in the theory and experiments, only a small number of Krylov subspace iterations are needed to simultaneously generate approximate solutions of a family of high-fidelity and large-scale systems in the reduced basis subspace. This reduces the computational cost dramatically because (1) to construct the reduced basis vectors, we only solve one large-scale problem in the high-fidelity level; and (2) the family of large-scale problems restricted to the reduced basis subspace have much smaller sizes.
2D-based Industrial Anomaly Detection has been widely discussed, however, multimodal industrial anomaly detection based on 3D point clouds and RGB images still has many untouched fields. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.
This paper presents a new approach for assembling graph neural networks based on framelet transforms. The latter provides a multi-scale representation for graph-structured data. With the framelet system, we can decompose the graph feature into low-pass and high-pass frequencies as extracted features for network training, which then defines a framelet-based graph convolution. The framelet decomposition naturally induces a graph pooling strategy by aggregating the graph feature into low-pass and high-pass spectra, which considers both the feature values and geometry of the graph data and conserves the total information. The graph neural networks with the proposed framelet convolution and pooling achieve state-of-the-art performance in many types of node and graph prediction tasks. Moreover, we propose shrinkage as a new activation for the framelet convolution, which thresholds the high-frequency information at different scales. Compared to ReLU, shrinkage in framelet convolution improves the graph neural network model in terms of denoising and signal compression: noises in both node and structure can be significantly reduced by accurately cutting off the high-pass coefficients from framelet decomposition, and the signal can be compressed to less than half its original size with the prediction performance well preserved.
Knowledge graph (KG) embedding encodes the entities and relations from a KG into low-dimensional vector spaces to support various applications such as KG completion, question answering, and recommender systems. In real world, knowledge graphs (KGs) are dynamic and evolve over time with addition or deletion of triples. However, most existing models focus on embedding static KGs while neglecting dynamics. To adapt to the changes in a KG, these models need to be re-trained on the whole KG with a high time cost. In this paper, to tackle the aforementioned problem, we propose a new context-aware Dynamic Knowledge Graph Embedding (DKGE) method which supports the embedding learning in an online fashion. DKGE introduces two different representations (i.e., knowledge embedding and contextual element embedding) for each entity and each relation, in the joint modeling of entities and relations as well as their contexts, by employing two attentive graph convolutional networks, a gate strategy, and translation operations. This effectively helps limit the impacts of a KG update in certain regions, not in the entire graph, so that DKGE can rapidly acquire the updated KG embedding by a proposed online learning algorithm. Furthermore, DKGE can also learn KG embedding from scratch. Experiments on the tasks of link prediction and question answering in a dynamic environment demonstrate the effectiveness and efficiency of DKGE.
We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
Providing model-generated explanations in recommender systems is important to user experience. State-of-the-art recommendation algorithms -- especially the collaborative filtering (CF) based approaches with shallow or deep models -- usually work with various unstructured information sources for recommendation, such as textual reviews, visual images, and various implicit or explicit feedbacks. Though structured knowledge bases were considered in content-based approaches, they have been largely ignored recently due to the availability of vast amount of data and the learning power of many complex models. However, structured knowledge bases exhibit unique advantages in personalized recommendation systems. When the explicit knowledge about users and items is considered for recommendation, the system could provide highly customized recommendations based on users' historical behaviors and the knowledge is helpful for providing informed explanations regarding the recommended items. In this work, we propose to reason over knowledge base embeddings for explainable recommendation. Specifically, we propose a knowledge base representation learning framework to embed heterogeneous entities for recommendation, and based on the embedded knowledge base, a soft matching algorithm is proposed to generate personalized explanations for the recommended items. Experimental results on real-world e-commerce datasets verified the superior recommendation performance and the explainability power of our approach compared with state-of-the-art baselines.