
In this paper, we introduce PCR-CG: a novel 3D point cloud registration module that explicitly embeds color signals into the geometry representation. Different from previous methods that use geometry alone, our module is specifically designed to effectively correlate color with geometry for the point cloud registration task. Our key contribution is a 2D-3D cross-modality learning algorithm that embeds deep features learned from color signals into the geometry representation. With our 2D-3D projection module, pixel features in a square region centered at correspondences perceived from images are effectively correlated with point clouds. In this way, overlapped regions can be inferred not only from the point cloud but also from texture appearances. Adding color is non-trivial: we compare against a variety of baselines designed for adding color to 3D, such as exhaustively adding per-pixel features or RGB values in an implicit manner. We leverage Predator [25] as the baseline method and incorporate our proposed module into it. To validate the effectiveness of 2D features, we ablate different 2D pre-trained networks and show a positive correlation between the quality of the pre-trained weights and the task performance. Our experimental results indicate a significant improvement of 6.5% registration recall over the baseline method on the 3DLoMatch benchmark. We additionally evaluate our approach on SOTA methods and observe consistent improvements, such as a 2.4% gain in registration recall over GeoTransformer and 3.5% over CoFiNet. Our study reveals a significant advantage of correlating explicit deep color features with the point cloud in the registration task.
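
To make the projection step concrete, here is a minimal sketch, not the authors' implementation, of how pixel features sampled at the image locations of projected points can be concatenated onto point features; the intrinsics `K` and all tensor shapes are illustrative assumptions.

```python
# A minimal sketch of embedding 2D image features into 3D point features
# via camera projection. Not the authors' code; shapes and `K` are assumed.
import torch

def project_points(points, K):
    """Project (N, 3) camera-frame points to pixel coordinates with intrinsics K."""
    uv = points @ K.T                                  # (N, 3) homogeneous
    return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)      # (N, 2) pixel coords

def embed_color_features(points, point_feats, image_feats, K):
    """Concatenate bilinearly sampled 2D features onto 3D point features.

    points:      (N, 3) points in the camera frame
    point_feats: (N, Cp) geometry features
    image_feats: (Ci, H, W) deep features from a 2D backbone
    """
    _, H, W = image_feats.shape
    uv = project_points(points, K)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / W, uv[:, 1] / H], dim=-1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)
    sampled = torch.nn.functional.grid_sample(
        image_feats[None], grid, align_corners=False)  # (1, Ci, 1, N)
    pixel_feats = sampled[0, :, 0].T                   # (N, Ci)
    return torch.cat([point_feats, pixel_feats], dim=-1)
```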

Related Content

This paper introduces APCodec, a novel neural audio codec targeting high waveform sampling rates and low bitrates, which seamlessly integrates the strengths of parametric codecs and waveform codecs. APCodec revolutionizes audio encoding and decoding by concurrently handling the amplitude and phase spectra as audio parametric characteristics, as parametric codecs do. It is composed of an encoder and a decoder with a modified ConvNeXt v2 network as the backbone, connected by a quantizer based on the residual vector quantization (RVQ) mechanism. The encoder compresses the audio amplitude and phase spectra in parallel, amalgamating them into a continuous latent code at a reduced temporal resolution. This code is subsequently quantized by the quantizer. Finally, the decoder reconstructs the audio amplitude and phase spectra in parallel, and the decoded waveform is obtained by the inverse short-time Fourier transform. To ensure the fidelity of the decoded audio, as in waveform codecs, spectral-level loss, quantization loss, and generative adversarial network (GAN) based loss are jointly employed for training APCodec. To support low-latency streamable inference, we employ feed-forward layers and causal convolutional layers in APCodec, incorporating a knowledge distillation training strategy to enhance the quality of decoded audio. Experimental results confirm that our proposed APCodec can encode 48 kHz audio at a bitrate of just 6 kbps with no significant degradation in the quality of the decoded audio. At the same bitrate, APCodec also demonstrates superior decoded audio quality and faster generation speed than well-known codecs such as SoundStream, Encodec, HiFi-Codec, and AudioDec.
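
The following is a minimal sketch of the parametric encode/decode flow described above, with hypothetical `encoder`, `rvq`, and `decoder` modules standing in for the ConvNeXt v2 backbone and the RVQ quantizer; the STFT parameters are illustrative, not the paper's configuration.

```python
# A sketch of APCodec-style parametric coding: STFT -> (amplitude, phase)
# -> encoder -> RVQ -> decoder -> iSTFT. The three modules are assumed.
import torch

def encode_decode(wave, encoder, rvq, decoder, n_fft=1024, hop=320):
    window = torch.hann_window(n_fft)
    spec = torch.stft(wave, n_fft, hop_length=hop, window=window,
                      return_complex=True)             # (F, T) complex spectrum
    amp, phase = spec.abs(), spec.angle()              # parametric features
    latent = encoder(torch.cat([amp, phase], dim=0))   # joint latent, lower rate
    codes, quantized = rvq(latent)                     # RVQ bottleneck
    amp_hat, phase_hat = decoder(quantized).chunk(2, dim=0)
    spec_hat = torch.polar(amp_hat, phase_hat)         # back to complex spectrum
    return torch.istft(spec_hat, n_fft, hop_length=hop, window=window)
```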

This paper proposes a deep learning based solution for multi-modal image alignment of UAV-captured images. Many recently proposed state-of-the-art alignment techniques rely on Lucas-Kanade (LK) based solutions for successful alignment. However, we show that we can achieve state-of-the-art results without using LK-based methods. Our approach carefully utilizes a two-branch convolutional neural network (CNN) built on feature embedding blocks. We propose two variants of our approach: in the first variant (ModelA), we directly predict the new coordinates of only the four corners of the image to be aligned; in the second (ModelB), we predict the homography matrix directly. Applying alignment to the image corners forces the algorithm to match only those four corners, as opposed to computing and matching many (key)points, since the latter may produce many outliers and yield less accurate alignment. We test our proposed approach on four aerial datasets and obtain state-of-the-art results when compared to existing recent deep LK-based architectures.
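
As an illustration of why four corners suffice for ModelA, the exact homography is recoverable from the four predicted corner coordinates; the corner values below are made up, and the network itself is not shown.

```python
# Recovering the warp from four corner correspondences (illustrative values).
import numpy as np
import cv2

h, w = 256, 256
image = np.zeros((h, w), dtype=np.uint8)  # dummy image standing in for real input
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])                    # original corners
dst = np.float32([[12, 5], [w - 8, 3], [w - 2, h - 10], [4, h - 6]])  # predicted corners

H = cv2.getPerspectiveTransform(src, dst)   # exact homography from 4 point pairs
aligned = cv2.warpPerspective(image, H, (w, h))
```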

Many computational factors limit the broader deployment of large language models. In this paper, we focus on a memory bottleneck imposed by the key-value (KV) cache, a computational shortcut that requires storing previous KV pairs during decoding. While existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs to dramatically reduce the memory footprint of the cache, they can have limited success in tasks that require recollecting a majority of previous tokens. To alleviate this issue, we propose LESS, a simple integration of a (nearly free) constant-sized cache with eviction-based cache methods, such that all tokens can be queried at later decoding steps. Its ability to retain information over time shows merit on a variety of tasks: we demonstrate that LESS helps reduce the performance gap relative to caching everything, sometimes even matching it, all while remaining efficient.
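
The sketch below conveys the idea in spirit rather than reproducing the paper's learned kernels: a sliding-window eviction cache whose evicted KV pairs are folded into a constant-size linear-attention summary, so later queries still receive a contribution from them. The `phi` feature map and the unweighted blend of the two terms are assumptions for illustration.

```python
# Conceptual sketch of a LESS-style cache: exact attention over a recent
# window plus a fixed-size summary of everything that was evicted.
import torch

class LessLikeCache:
    def __init__(self, d, window=128):
        self.window = window
        self.keys, self.values = [], []
        self.S = torch.zeros(d, d)   # running sum of phi(k) v^T over evicted pairs
        self.z = torch.zeros(d)      # running sum of phi(k) over evicted pairs

    @staticmethod
    def phi(x):
        # Simple positive feature map (an assumption, not the paper's kernel).
        return torch.nn.functional.elu(x) + 1

    def append(self, k, v):
        self.keys.append(k); self.values.append(v)
        if len(self.keys) > self.window:          # evict oldest into the summary
            k_old, v_old = self.keys.pop(0), self.values.pop(0)
            self.S += torch.outer(self.phi(k_old), v_old)
            self.z += self.phi(k_old)

    def attend(self, q):
        K, V = torch.stack(self.keys), torch.stack(self.values)
        scores = torch.softmax(K @ q / K.shape[-1] ** 0.5, dim=0)
        recent = scores @ V                        # exact attention over the window
        q_phi = self.phi(q)
        summary = (q_phi @ self.S) / (q_phi @ self.z).clamp(min=1e-6)
        return recent + summary                    # blend; relative weighting omitted
```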

In this paper, we present a novel framework for enhancing the performance of Quanvolutional Neural Networks (QuNNs) by introducing trainable quanvolutional layers and addressing the critical challenges associated with them. Traditional quanvolutional layers, although beneficial for feature extraction, have largely been static, offering limited adaptability. Unlike state-of-the-art approaches, our research overcomes this limitation by enabling training within these layers, significantly increasing the flexibility and potential of QuNNs. However, the introduction of multiple trainable quanvolutional layers induces complexities in gradient-based optimization, primarily due to the difficulty of accessing gradients across these layers. To resolve this, we propose a novel architecture, Residual Quanvolutional Neural Networks (ResQuNNs), which leverages the concept of residual learning and facilitates the flow of gradients by adding skip connections between layers. By inserting residual blocks between quanvolutional layers, we ensure enhanced gradient access throughout the network, leading to improved training performance. Moreover, we provide empirical evidence on the strategic placement of these residual blocks within QuNNs. Through extensive experimentation, we identify an efficient configuration of residual blocks that enables gradient flow across all layers of the network and ultimately results in efficient training. Our findings suggest that the precise location of residual blocks plays a crucial role in maximizing the performance gains in QuNNs. Our results mark a substantial step forward in the evolution of quantum deep learning, offering new avenues for both theoretical development and practical quantum computing applications.
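
Structurally, the residual idea reduces to wrapping each quanvolutional layer in a skip connection, as in the sketch below; `quanv_layer` is a placeholder for any trainable quanvolutional module (e.g., one built with a quantum simulator) and is not part of this snippet.

```python
# A structural sketch of residual connections between quanvolutional layers.
import torch.nn as nn

class ResQuanvBlock(nn.Module):
    def __init__(self, quanv_layer):
        super().__init__()
        self.quanv = quanv_layer

    def forward(self, x):
        # Skip connection: gradients can bypass the quantum layer entirely,
        # which restores gradient flow through stacked layers.
        # (Assumes the layer preserves the input shape.)
        return x + self.quanv(x)

def build_resqunn(quanv_layers):
    # Stacking blocks keeps a direct gradient path through the whole network.
    return nn.Sequential(*[ResQuanvBlock(q) for q in quanv_layers])
```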

Few-shot object detection (FSOD) is a challenging problem aimed at detecting novel concepts from few exemplars. Existing approaches to FSOD all assume abundant base labels to adapt to novel objects. This paper studies the new task of semi-supervised FSOD by considering a realistic scenario in which both base and novel labels are simultaneously scarce. We explore the utility of unlabeled data within our proposed label-efficient detection framework and discover its remarkable ability to boost semi-supervised FSOD by way of region proposals. Motivated by this finding, we introduce SoftER Teacher, a robust detector combining pseudo-labeling with consistency learning on region proposals, to harness unlabeled data for improved FSOD without relying on abundant labels. Rigorous experiments show that SoftER Teacher surpasses the novel-class performance of a strong supervised detector using only 10% of the required base labels, without the catastrophic forgetting observed in prior approaches. Our work also sheds light on a potential relationship between semi-supervised and few-shot detection, suggesting that a stronger semi-supervised detector leads to a more effective few-shot detector.
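
As a rough sketch of the consistency-learning ingredient (not the released SoftER Teacher implementation), a teacher's proposal scores on a weakly augmented view can supervise a student's scores on a strongly augmented view via a soft distillation-style loss:

```python
# Soft consistency between teacher and student region-proposal scores.
# The loss form and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def proposal_consistency_loss(teacher_logits, student_logits, T=1.0):
    """KL consistency: teacher scores (weak view) guide student scores (strong view)."""
    with torch.no_grad():
        target = F.softmax(teacher_logits / T, dim=-1)  # teacher is not updated
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    target, reduction="batchmean")
```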

This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly those involving unstructured data analyzed with deep learning models. On one hand, PASS employs a generative model to create synthetic data that closely mirrors raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy. By incorporating knowledge transfer from large pre-trained generative models, PASS enhances estimation accuracy, yielding refined distributional estimates of various statistics via Monte Carlo experiments. On the other hand, PAI offers statistically guaranteed validity. In pivotal inference, it enables precise conclusions even without prior knowledge of the pivotal quantity's distribution. In non-pivotal situations, we enhance the reliability of synthetic data generation by training the generator on an independent holdout sample. We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
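
A toy sketch of the Monte Carlo step that PAI relies on, where `sample_synthetic` is a hypothetical stand-in for a PASS generator (the rank-preserving perturbation itself is not reproduced): repeatedly synthesize data, recompute the statistic, and read off empirical quantiles.

```python
# Monte Carlo approximation of a statistic's distribution from synthetic samples.
import numpy as np

def mc_interval(sample_synthetic, statistic, n_rep=1000, alpha=0.05):
    """Empirical (1 - alpha) interval for `statistic` over synthetic replicates."""
    stats = np.array([statistic(sample_synthetic()) for _ in range(n_rep)])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Example with a stand-in generator (a real PASS generator would go here):
rng = np.random.default_rng(0)
lo, hi = mc_interval(lambda: rng.normal(size=200), np.mean)
```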

This paper introduces the MeshAC package, which generates three-dimensional adaptive meshes tailored for the efficient and robust implementation of multiscale coupling methods. While Delaunay triangulation is commonly used for mesh generation across the entire computational domain, generating meshes for multiscale coupling methods is more challenging due to intrinsic discrete structures such as defects, and the need to match these structures to the continuum domain at the interface. The MeshAC package tackles these challenges by generating meshes that align with fine-level discrete structures. It also incorporates localized modification and reconstruction operations specifically designed for interfaces. These enhancements improve both the implementation efficiency and the quality of the coupled mesh. Furthermore, MeshAC introduces a novel adaptive feature that utilizes gradient-based a posteriori error estimation, which automatically adjusts the atomistic region and the continuum mesh, ensuring an optimal balance between accuracy and efficiency. The package can be directly applied to geometry optimization problems in atomistic/continuum (a/c) coupling for static mechanics, with potential extensions to many other scenarios. Its capabilities are demonstrated on complex material defects, including a straight edge dislocation in BCC W and double voids in FCC Cu. These results suggest that MeshAC can be a valuable tool for researchers and practitioners in computational mechanics.
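
Schematically, the adaptive feature follows the classical estimate-mark-refine loop; the sketch below is generic pseudocode in Python, and none of the function names are MeshAC's actual API.

```python
# Generic adaptive loop: solve, estimate per-element error, mark, refine.
def adapt(mesh, solve, estimate_error, refine, tol=1e-3, max_iter=20):
    for _ in range(max_iter):
        solution = solve(mesh)
        eta = estimate_error(mesh, solution)          # per-element error indicators
        if sum(e ** 2 for e in eta) ** 0.5 < tol:     # global estimate below tolerance
            break
        # Maximum-strategy marking: refine elements near the largest indicator
        # (in an a/c setting this can also enlarge the atomistic region).
        marked = [i for i, e in enumerate(eta) if e > 0.5 * max(eta)]
        mesh = refine(mesh, marked)
    return mesh
```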

Link prediction is a fundamental task on graphs. Inspired by traditional path-based methods, in this paper we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations, with each path representation being the generalized product of the edge representations along the path. Motivated by the Bellman-Ford algorithm for solving the shortest path problem, we show that the proposed path formulation can be efficiently solved by a generalized Bellman-Ford algorithm. To further improve the capacity of the path formulation, we propose the Neural Bellman-Ford Network (NBFNet), a general graph neural network framework that solves the path formulation with learned operators in the generalized Bellman-Ford algorithm. NBFNet parameterizes the generalized Bellman-Ford algorithm with three neural components, namely the INDICATOR, MESSAGE, and AGGREGATE functions, which correspond to the boundary condition, multiplication operator, and summation operator, respectively. NBFNet is very general, covers many traditional path-based methods, and can be applied to both homogeneous graphs and multi-relational graphs (e.g., knowledge graphs) in both transductive and inductive settings. Experiments on both homogeneous graphs and knowledge graphs show that the proposed NBFNet outperforms existing methods by a large margin in both transductive and inductive settings, achieving new state-of-the-art results.
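
For reference, here is the generalized Bellman-Ford iteration that NBFNet parameterizes, shown with the classical shortest-path instantiation (INDICATOR as a 0/inf boundary, MESSAGE as addition, AGGREGATE as min); NBFNet replaces these operators with learned neural counterparts.

```python
# Generalized Bellman-Ford with pluggable INDICATOR / MESSAGE / AGGREGATE.
# Defaults recover classical single-source shortest paths.
import math

def generalized_bellman_ford(edges, n_nodes, source,
                             indicator=lambda u, s: 0.0 if u == s else math.inf,
                             message=lambda h, w: h + w,
                             aggregate=min):
    # Boundary condition via INDICATOR.
    h = [indicator(u, source) for u in range(n_nodes)]
    for _ in range(n_nodes - 1):
        new_h = list(h)
        for u, v, w in edges:                 # relax every edge synchronously
            new_h[v] = aggregate([new_h[v], message(h[u], w)])
        h = new_h
    return h  # h[v] = representation (here: distance) of the pair (source, v)

dist = generalized_bellman_ford([(0, 1, 2.0), (1, 2, 1.5), (0, 2, 5.0)], 3, 0)
# dist == [0.0, 2.0, 3.5]
```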

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view expression information as the combination of shared information (expression similarities) across different expressions and unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships among the latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules, an intra-feature relation modeling module and an inter-feature relation modeling module, are developed in FRN. Experimental results on both in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.
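
A compact sketch of the decompose-then-reconstruct flow follows; the dimensions, the number of latent features, and the single gating step standing in for the intra- and inter-feature relation modules are illustrative assumptions, not the paper's configuration.

```python
# FDRL-style flow: decompose a backbone feature into latent features,
# weight them, and reconstruct an expression feature for classification.
import torch
import torch.nn as nn

class FDRLSketch(nn.Module):
    def __init__(self, d_in=512, d_latent=64, n_latent=9, n_classes=7):
        super().__init__()
        # FDN: project the backbone feature into a set of latent features.
        self.decompose = nn.ModuleList(
            [nn.Linear(d_in, d_latent) for _ in range(n_latent)])
        # FRN (simplified): a single gating step stands in for the
        # intra- and inter-feature relation modeling modules.
        self.gate = nn.Linear(d_latent, 1)
        self.classifier = nn.Linear(d_latent, n_classes)

    def forward(self, x):                        # x: (B, d_in) backbone feature
        latents = torch.stack([f(x) for f in self.decompose], dim=1)  # (B, M, d)
        weights = torch.softmax(self.gate(latents), dim=1)            # (B, M, 1)
        expression = (weights * latents).sum(dim=1)   # reconstructed feature
        return self.classifier(expression)
```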

In this paper we address issues with image retrieval benchmarking on the standard and popular Oxford 5k and Paris 6k datasets. In particular, annotation errors, the size of the datasets, and the level of challenge are addressed: new annotation for both datasets is created with extra attention to the reliability of the ground truth. Three new protocols of varying difficulty are introduced. The protocols allow fair comparison between different methods, including those using a dataset pre-processing stage. For each dataset, 15 new challenging queries are introduced. Finally, a new set of 1M hard, semi-automatically cleaned distractors is selected. An extensive comparison of state-of-the-art methods is performed on the new benchmark. Different types of methods are evaluated, ranging from local-feature-based to modern CNN-based methods. The best results are achieved by combining the best of the two worlds. Most importantly, image retrieval appears far from being solved.
