
We investigate transductive zero-shot point cloud semantic segmentation, where the network is trained on seen objects and is able to segment unseen objects. 3D geometric elements are essential cues for inferring a novel 3D object type. However, previous methods neglect the fine-grained relationship between language and 3D geometric elements. To this end, we propose a novel framework that learns the geometric primitives shared between seen and unseen categories' objects and employs a fine-grained alignment between language and the learned geometric primitives. Guided by language, the network can then recognize novel objects represented with geometric primitives. Specifically, we formulate a novel point visual representation, the similarity vector of a point's feature to learnable prototypes, where the prototypes automatically encode geometric primitives via back-propagation. In addition, we propose a novel Unknown-aware InfoNCE loss to align the visual representations with language at a fine-grained level. Extensive experiments show that our method significantly outperforms other state-of-the-art methods in harmonic mean intersection-over-union (hIoU), with improvements of 17.8%, 30.4%, 9.2%, and 7.9% on the S3DIS, ScanNet, SemanticKITTI, and nuScenes datasets, respectively. Code is available at https://github.com/runnanchen/Zero-Shot-Point-Cloud-Segmentation
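To make the mechanism concrete, here is a minimal PyTorch sketch of the two ideas: representing each point by its similarity to learnable prototypes, and aligning that representation with class text embeddings via an InfoNCE-style loss. All names, shapes, and temperature values are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def prototype_representation(point_feats, prototypes, tau=0.1):
    """Represent each point by its similarity to learnable prototypes.

    point_feats: (N, D) per-point backbone features.
    prototypes:  (K, D) learnable prototype embeddings (an nn.Parameter),
                 intended to capture shared geometric primitives.
    Returns an (N, K) similarity vector per point.
    """
    sim = F.normalize(point_feats, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return sim / tau

def infonce_alignment(sim_repr, text_embs, labels, tau=0.07):
    """Align point representations with class text embeddings via InfoNCE.

    sim_repr:  (N, K) prototype-similarity representations.
    text_embs: (C, K) language embeddings projected into the same space.
    labels:    (N,) seen-class labels; each point's own class is the
               positive, all other classes act as negatives.
    """
    logits = F.normalize(sim_repr, dim=-1) @ F.normalize(text_embs, dim=-1).T
    return F.cross_entropy(logits / tau, labels)
```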

Related content

A point cloud obtained by laser measurement contains 3D coordinates (XYZ) and laser reflectance intensity (Intensity). A point cloud obtained by photogrammetry contains 3D coordinates (XYZ) and color information (RGB). A point cloud obtained by combining laser measurement and photogrammetry contains 3D coordinates (XYZ), laser reflectance intensity (Intensity), and color information (RGB). After acquiring the spatial coordinates of each sampled point on an object's surface, the result is a set of points known as a "point cloud" (Point Cloud).
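As a concrete illustration, the three layouts above map naturally onto fixed-width arrays; the shapes and field order below are illustrative assumptions.

```python
import numpy as np

# Laser-scanned point cloud: XYZ + reflectance intensity (N x 4).
lidar_cloud = np.zeros((1000, 4), dtype=np.float32)  # x, y, z, intensity

# Photogrammetric point cloud: XYZ + color (N x 6).
photo_cloud = np.zeros((1000, 6), dtype=np.float32)  # x, y, z, r, g, b

# Fused point cloud: XYZ + intensity + color (N x 7).
fused_cloud = np.zeros((1000, 7), dtype=np.float32)  # x, y, z, intensity, r, g, b
```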

Semantic segmentation of point clouds in autonomous driving datasets requires techniques that can process large numbers of points efficiently. Sparse 3D convolutions have become the de facto tools for constructing deep neural networks for this task: they exploit point cloud sparsity to reduce memory and computational loads and are at the core of today's best methods. In this paper, we propose an alternative method that reaches the level of state-of-the-art methods without requiring sparse convolutions. We show that this level of performance is achievable by relying on tools a priori unfit for large-scale, high-performing 3D perception. In particular, we propose a novel 3D backbone, WaffleIron, made almost exclusively of MLPs and dense 2D convolutions, and show how to train it to reach high performance on SemanticKITTI and nuScenes. We believe that WaffleIron is a compelling alternative to backbones using sparse 3D convolutions, especially in frameworks and on hardware where those convolutions are not readily available.
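Below is a minimal sketch of the general project-convolve-gather pattern such a backbone can use, built only from an MLP and a dense 2D convolution: points are sum-pooled onto a 2D grid, processed with a standard convolution, then gathered back per point. The shapes, pooling choice, and residual wiring are assumptions, not WaffleIron's exact design.

```python
import torch
import torch.nn as nn

class DenseProjectionBlock(nn.Module):
    """Sketch of one block: flatten points onto a 2D grid, run a dense
    2D convolution, then gather the features back to each point."""

    def __init__(self, channels, grid_size=256):
        super().__init__()
        self.grid_size = grid_size
        self.mlp = nn.Sequential(nn.Linear(channels, channels), nn.ReLU())
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats, cell_idx):
        # feats: (N, C) per-point features; cell_idx: (N,) flat grid index
        # of the 2D cell each point projects into.
        h = self.mlp(feats)
        grid = torch.zeros(self.grid_size ** 2, h.shape[1], device=h.device)
        grid.index_add_(0, cell_idx, h)  # sum-pool points into grid cells
        grid = grid.T.reshape(1, -1, self.grid_size, self.grid_size)
        grid = self.conv(grid).reshape(h.shape[1], -1).T
        return feats + grid[cell_idx]    # residual gather back to points
```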

The expanding model size and computation of deep neural networks (DNNs) have increased the demand for efficient model deployment methods. Quantization-aware training (QAT) is a representative model compression method that leverages redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long training times and high energy costs. Coreset selection, which aims to improve data efficiency by exploiting the redundancy of training data, has also been widely used for efficient training. In this work, we take a new angle, using coreset selection to improve the training efficiency of quantization-aware training. Based on the characteristics of QAT, we propose two metrics, the error vector score and the disagreement score, to quantify the importance of each sample during training. Guided by these two importance metrics, we propose a quantization-aware adaptive coreset selection (ACS) method to select the data for the current training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2), datasets (CIFAR-100, ImageNet-1K), and under different quantization settings. Compared with previous coreset selection methods, our method significantly improves QAT performance across dataset fractions. Our method achieves an accuracy of 68.39% with 4-bit quantized ResNet-18 on the ImageNet-1K dataset using only a 10% subset, an absolute gain of 4.24% over the baseline.
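Here is a sketch of how such per-sample scores could drive coreset selection. The concrete definitions below (error vector as the norm of the quantized model's softmax error, disagreement as the KL divergence between full-precision and quantized predictions) are plausible readings of the two metric names, not necessarily the paper's exact formulas.

```python
import torch
import torch.nn.functional as F

def importance_scores(fp_logits, q_logits, labels):
    """Sketch of per-sample importance for QAT coreset selection.

    Assumed definitions (not necessarily the paper's exact ones):
    - error vector score: norm of (softmax output - one-hot label)
      for the quantized model;
    - disagreement score: KL divergence between full-precision and
      quantized predictions.
    """
    q_probs = F.softmax(q_logits, dim=-1)
    one_hot = F.one_hot(labels, q_probs.shape[-1]).float()
    error_vec = (q_probs - one_hot).norm(dim=-1)
    disagree = F.kl_div(F.log_softmax(q_logits, dim=-1),
                        F.softmax(fp_logits, dim=-1),
                        reduction="none").sum(dim=-1)
    return error_vec + disagree

def select_coreset(scores, fraction=0.1):
    """Keep the indices of the top-`fraction` most important samples."""
    k = max(1, int(fraction * scores.numel()))
    return torch.topk(scores, k).indices
```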

Classification systems are normally trained by minimizing the cross-entropy between system outputs and reference labels, which makes the Kullback-Leibler divergence a natural choice for measuring how closely the system can follow the data. Precision and recall provide another perspective for measuring the performance of a classification system. Non-binary references can arise from various sources, and it is often beneficial to train on the soft labels instead of binarized data. However, the existing definitions of precision and recall require binary reference labels, and binarizing the data can cause erroneous interpretations. We present a novel method to calculate precision, recall, and F-score without quantizing the data. The proposed metrics extend the well-established metrics, as the definitions coincide when used with binary labels. To illustrate the behavior of the metrics, we show simple example cases and an evaluation of different sound event detection models trained on real data with soft labels.
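One natural continuous extension that reduces to the usual counts on binary labels looks like the following; it is a sketch of the idea, not necessarily the paper's exact definition.

```python
import numpy as np

def soft_precision_recall_f1(pred, ref):
    """Sketch of precision/recall/F-score for soft labels in [0, 1].

    Uses one natural continuous extension (an assumption, not
    necessarily the paper's definition):
    TP = sum(min(pred, ref)), FP = sum(max(pred - ref, 0)),
    FN = sum(max(ref - pred, 0)).
    With binary labels these reduce to the usual counts, as required
    of a consistent extension.
    """
    pred, ref = np.asarray(pred), np.asarray(ref)
    tp = np.minimum(pred, ref).sum()
    fp = np.maximum(pred - ref, 0).sum()
    fn = np.maximum(ref - pred, 0).sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```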

The semantic understanding of indoor 3D point cloud data is crucial for a range of downstream applications, including indoor service robots, navigation systems, and digital twin engineering. Global features are crucial for achieving high-quality semantic and instance segmentation of indoor point clouds, as they provide essential long-range context information. To this end, we propose JSMNet, which combines a multi-layer network with a global feature self-attention module to jointly segment 3D point cloud semantics and instances. To better express the characteristics of indoor targets, we design a multi-resolution adaptive feature fusion module that accounts for differences in point cloud density caused by varying scanner distances from the target. Additionally, we propose a framework for joint semantic and instance segmentation that integrates semantic and instance features to achieve superior results. We conduct experiments on S3DIS, a large 3D indoor point cloud dataset. Compared against other methods, ours outperforms existing approaches in semantic and instance segmentation and provides better results in local target-area segmentation. Specifically, our method outperforms PointNet (Qi et al., 2017a) by 16.0% in semantic segmentation mIoU on S3DIS (Area 5) and by 26.3% in instance segmentation mPre. It also surpasses ASIS (Wang et al., 2019) by 6.0% and 4.6%, respectively, and JSPNet (Chen et al., 2022) by a margin of 3.3% in semantic segmentation mIoU and a slight improvement of 0.3% in instance segmentation mPre.
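A minimal sketch of a global self-attention module over point features, of the kind that supplies the long-range context described above; the shapes and wiring are illustrative assumptions, not JSMNet's exact design.

```python
import torch
import torch.nn as nn

class GlobalFeatureSelfAttention(nn.Module):
    """Sketch: every point attends to all other points, so each
    per-point feature is enriched with global context."""

    def __init__(self, channels, heads=4):
        super().__init__()
        # channels must be divisible by heads.
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats):
        # feats: (B, N, C) per-point features for B point clouds.
        ctx, _ = self.attn(feats, feats, feats)  # global attention
        return self.norm(feats + ctx)            # residual + norm
```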

The development of deep neural networks (DNNs) has significantly enhanced the performance of speaker verification (SV) systems in recent years. However, a critical issue that persists when applying DNN-based SV systems in practical applications is domain mismatch. To mitigate the performance degradation caused by the mismatch, domain adaptation becomes necessary. This paper introduces an approach to adapt DNN-based SV models by manipulating learnable model inputs, inspired by the concept of adversarial reprogramming. The pre-trained SV model remains fixed and functions solely in the forward pass, resembling a black-box model. A lightweight network is used to estimate the gradients for the learnable parameters at the input, which bypasses gradient backpropagation through the black-box model. The reprogrammed output is processed by a two-layer backend learning module to produce the final adapted speaker embedding. By design, the number of parameters involved in the gradient calculation is small, so with few additional parameters, the proposed method achieves both memory and parameter efficiency. Experiments are conducted in language mismatch scenarios. At a much lower computational cost, the proposed method achieves performance close or superior to fully finetuned models in our experiments, demonstrating its effectiveness.
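One way to realize "gradients estimated by a lightweight network, bypassing the black box" is a straight-through construction: the true embedding comes from the frozen model's forward pass, while gradients for the learnable input flow through a small trainable surrogate. The sketch below is an illustrative interpretation under that assumption, not the paper's exact method.

```python
import torch
import torch.nn as nn

class ReprogrammedSV(nn.Module):
    """Sketch of input reprogramming around a frozen, black-box SV model.
    Names, shapes, and the straight-through trick are assumptions."""

    def __init__(self, black_box, surrogate, input_dim, emb_dim):
        super().__init__()
        self.black_box = black_box                 # frozen, forward-only
        self.surrogate = surrogate                 # small trainable net
        self.delta = nn.Parameter(torch.zeros(input_dim))  # learnable input
        self.backend = nn.Sequential(              # two-layer backend
            nn.Linear(emb_dim, emb_dim), nn.ReLU(),
            nn.Linear(emb_dim, emb_dim))

    def forward(self, x):
        x_adapted = x + self.delta                 # reprogrammed input
        with torch.no_grad():
            emb = self.black_box(x_adapted)        # true forward pass only
        s = self.surrogate(x_adapted)
        emb = emb + s - s.detach()   # value = emb; grad flows via surrogate
        return self.backend(emb)                   # adapted speaker embedding
```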

In Time-Triggered (TT) or time-sensitive networks for industrial distributed real-time control systems, the transmission of a TT frame must be scheduled at a precise time instant. Other, or best-effort (BE), frames are forwarded in a best-effort manner. Under this scheduling strategy, the transmission of a TT frame must wait until its scheduled instant even if it could have been transmitted sooner. On the other hand, BE frames are transmitted whenever possible but may miss deadlines or may even be dropped due to congestion. As a result, TT transmission and BE delivery are incompatible with each other. To remedy this incompatibility, we propose a synergistic switch architecture (SWA) for TT transmission with BE delivery that dynamically improves the end-to-end (e2e) latency of TT frames by opportunistically exploiting BE delivery. Given a TT frame, the SWA generates and transmits a cloned copy with BE delivery. The first copy arriving at the receiver device is delivered with a configured jitter, and the other copy is discarded. The SWA thus achieves shorter latency and controllable jitter: the best of both worlds. We have implemented the SWA using FPGAs in an industry-strength TT switch and used four test scenarios to demonstrate its improvements in e2e latency and jitter control over the state-of-the-art TT transmission scheme.
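The receiver-side logic is essentially duplicate elimination. A minimal sketch, assuming each TT frame and its BE clone share a sequence id (an illustrative field name):

```python
def deliver_first_arrivals(frames):
    """Sketch of receiver-side duplicate elimination for the cloned-frame
    scheme: the first copy (TT or BE) to arrive is delivered, and the
    later duplicate is dropped."""
    seen, delivered = set(), []
    for frame in frames:                  # frames in arrival order
        if frame["seq_id"] not in seen:   # first arrival wins
            seen.add(frame["seq_id"])
            delivered.append(frame)
    return delivered

# Example: the BE clone of frame 7 beats its TT copy to the receiver,
# so the BE copy is delivered and the TT copy is discarded.
frames = [{"seq_id": 7, "path": "BE"}, {"seq_id": 7, "path": "TT"}]
assert [f["path"] for f in deliver_first_arrivals(frames)] == ["BE"]
```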

Latent space models (LSMs) are frequently used to model network data by embedding a network's nodes into a low-dimensional latent space; however, choosing the dimension of this space remains a challenge. To this end, we begin by formalizing a class of LSMs we call generalized linear network eigenmodels (GLNEMs) that can model the various edge types (binary, ordinal, non-negative continuous) found in scientific applications. This model class subsumes the traditional eigenmodel by embedding it in a generalized linear model with an exponential dispersion family random component, and it fixes identifiability issues that have hindered interpretability. Next, we propose a Bayesian approach to dimension selection for GLNEMs based on an ordered spike-and-slab prior that provides improved dimension estimation and satisfies several appealing theoretical properties. In particular, we show that the model's posterior concentrates on low-dimensional models near the truth. We demonstrate our approach's consistent dimension selection on simulated networks. Lastly, we use GLNEMs to study the effect of covariates on network formation in biology, ecology, and economics, as well as the existence of residual latent structure.
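Under standard eigenmodel notation, a GLNEM-style specification might take the following form; the notation below is an assumption for illustration, not taken from the paper.

```latex
% Sketch of a GLNEM-style specification: each edge A_ij follows an
% exponential dispersion family (EDF) distribution whose mean is linked
% to covariates plus a latent eigen-decomposition term.
\[
  A_{ij} \sim \mathrm{EDF}\bigl(\mu_{ij}, \phi\bigr), \qquad
  g(\mu_{ij}) = \mathbf{x}_{ij}^{\top} \boldsymbol{\beta}
              + \mathbf{u}_i^{\top} \boldsymbol{\Lambda}\, \mathbf{u}_j,
\]
% where g is the link function, u_i in R^d is node i's latent position,
% and Lambda = diag(lambda_1, ..., lambda_d) carries the dimension
% weights that an ordered spike-and-slab prior can shrink toward zero,
% effectively selecting the latent dimension d.
```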

Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power in processing graph-structured data. Typical GCNs and their variants work under a homophily assumption (i.e., nodes with the same class tend to connect to each other), while ignoring the heterophily that exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily mainly by aggregating higher-order neighborhoods or combining the intermediate representations, which introduces noise and irrelevant information into the result. These methods, however, do not change the propagation mechanism itself, which works under the homophily assumption and is a fundamental part of GCNs. This makes it difficult to distinguish the representations of nodes from different classes. To address this problem, we design a novel propagation mechanism that can automatically change the propagation and aggregation process according to the homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of the homophily degree between node pairs, learned from topological and attribute information, respectively. We then incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end scheme, enabling it to go beyond the homophily assumption. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms state-of-the-art methods under heterophily or low homophily, and achieves competitive performance under homophily.
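A minimal sketch of the adaptive idea: weight each edge's message by a learned homophily degree in [-1, 1], so heterophilous neighbors can push representations apart rather than together. The design below is illustrative, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class AdaptivePropagation(nn.Module):
    """Sketch of a propagation step modulated by a learned per-edge
    homophily degree (names and wiring are assumptions)."""

    def __init__(self, channels):
        super().__init__()
        # Scores how homophilous an edge is from its endpoint features;
        # tanh allows negative weights for heterophilous neighbors.
        self.homophily = nn.Sequential(nn.Linear(2 * channels, 1), nn.Tanh())
        self.lin = nn.Linear(channels, channels)

    def forward(self, x, edge_index):
        # x: (N, C) node features; edge_index: (2, E) source/target ids.
        src, dst = edge_index
        w = self.homophily(torch.cat([x[src], x[dst]], dim=-1))  # (E, 1)
        agg = torch.zeros_like(x).index_add_(0, dst, w * x[src])
        return x + self.lin(agg)        # residual keeps ego information
```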

Ensembles over neural network weights trained from different random initializations, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights but also over hyperparameters to improve the state of the art in both settings. For the best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter-efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than those of typical ensembles. On image classification tasks, with MLP, LeNet, and Wide ResNet 28-10 architectures, our methodology improves upon both deep and batch ensembles.
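A simplified sketch of the hyper-deep ensembles recipe: random search over hyperparameters, stratified across several random initializations, then keep the best members. Here `train_fn` and `sample_hparams` are hypothetical user-supplied pieces, and the paper's procedure may differ in how members are finally selected.

```python
def hyper_deep_ensemble(train_fn, sample_hparams,
                        n_trials=10, n_inits=3, k=4):
    """Sketch: `train_fn(hparams, seed)` is assumed to train one model
    and return (model, validation_score); `sample_hparams()` draws one
    random hyperparameter configuration."""
    candidates = []
    for _ in range(n_trials):
        hp = sample_hparams()                 # one random-search draw
        for seed in range(n_inits):           # stratify across inits
            model, score = train_fn(hp, seed)
            candidates.append((score, model))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [m for _, m in candidates[:k]]     # ensemble members
```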

A large number of real-world graphs or networks are inherently heterogeneous, involving a diversity of node types and relation types. Heterogeneous graph embedding aims to embed the rich structural and semantic information of a heterogeneous graph into low-dimensional node representations. Existing models usually define multiple metapaths in a heterogeneous graph to capture composite relations and guide neighbor selection. However, these models either omit node content features, discard intermediate nodes along the metapath, or consider only one metapath. To address these three limitations, we propose a new model named Metapath Aggregated Graph Neural Network (MAGNN) to boost the final performance. Specifically, MAGNN employs three major components: node content transformation to encapsulate input node attributes, intra-metapath aggregation to incorporate intermediate semantic nodes, and inter-metapath aggregation to combine messages from multiple metapaths. Extensive experiments on three real-world heterogeneous graph datasets for node classification, node clustering, and link prediction show that MAGNN achieves more accurate predictions than state-of-the-art baselines.
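A minimal sketch of the three stages named above, wired in sequence; the shapes, the mean-pooling intra-metapath encoder, and the attention form are illustrative assumptions, not MAGNN's exact architecture.

```python
import torch
import torch.nn as nn

class MetapathAggregation(nn.Module):
    """Sketch of MAGNN-style aggregation: content transformation,
    intra-metapath aggregation, inter-metapath aggregation."""

    def __init__(self, in_dims, hidden, n_metapaths):
        super().__init__()
        # 1) Node content transformation: project each node type's raw
        #    attributes into a shared hidden space.
        self.proj = nn.ModuleDict({t: nn.Linear(d, hidden)
                                   for t, d in in_dims.items()})
        # 2) Intra-metapath aggregation: encode all nodes along each
        #    metapath instance (mean encoder here for simplicity).
        self.intra = nn.Linear(hidden, hidden)
        # 3) Inter-metapath aggregation: attention over metapath summaries.
        self.inter = nn.Parameter(torch.randn(n_metapaths))

    def forward(self, instance_feats):
        # instance_feats: one (P, L, hidden) tensor per metapath, already
        # projected by self.proj; P instances of length L each.
        summaries = [self.intra(f.mean(dim=1)).mean(dim=0)   # (hidden,)
                     for f in instance_feats]
        stack = torch.stack(summaries)                       # (M, hidden)
        w = torch.softmax(self.inter, dim=0)                 # attention
        return (w.unsqueeze(-1) * stack).sum(dim=0)          # node embedding
```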
