亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Industrial anomaly detection (IAD) is crucial for automating industrial quality inspection. The diversity of the datasets is the foundation for developing comprehensive IAD algorithms. Existing IAD datasets focus on the diversity of data categories, overlooking the diversity of domains within the same data category. In this paper, to bridge this gap, we propose the Aero-engine Blade Anomaly Detection (AeBAD) dataset, consisting of two sub-datasets: the single-blade dataset and the video anomaly detection dataset of blades. Compared to existing datasets, AeBAD has the following two characteristics: 1.) The target samples are not aligned and at different scales. 2.) There is a domain shift between the distribution of normal samples in the test set and the training set, where the domain shifts are mainly caused by the changes in illumination and view. Based on this dataset, we observe that current state-of-the-art (SOTA) IAD methods exhibit limitations when the domain of normal samples in the test set undergoes a shift. To address this issue, we propose a novel method called masked multi-scale reconstruction (MMR), which enhances the model's capacity to deduce causality among patches in normal samples by a masked reconstruction task. MMR achieves superior performance compared to SOTA methods on the AeBAD dataset. Furthermore, MMR achieves competitive performance with SOTA methods to detect the anomalies of different types on the MVTec AD dataset. Code and dataset are available at //github.com/zhangzilongc/MMR.

相關內容

Current gesture recognition systems primarily focus on identifying gestures within a predefined set, leaving a gap in connecting these gestures to interactive GUI elements or system functions (e.g., linking a 'thumb-up' gesture to a 'like' button). We introduce GestureGPT, a novel zero-shot gesture understanding and grounding framework leveraging large language models (LLMs). Gesture descriptions are formulated based on hand landmark coordinates from gesture videos and fed into our dual-agent dialogue system. A gesture agent deciphers these descriptions and queries about the interaction context (e.g., interface, history, gaze data), which a context agent organizes and provides. Following iterative exchanges, the gesture agent discerns user intent, grounding it to an interactive function. We validated the gesture description module using public first-view and third-view gesture datasets and tested the whole system in two real-world settings: video streaming and smart home IoT control. The highest zero-shot Top-5 grounding accuracies are 80.11% for video streaming and 90.78% for smart home tasks, showing potential of the new gesture understanding paradigm.

Prism-based LiDARs are more compact and cheaper than the conventional mechanical multi-line spinning LiDARs, which have become increasingly popular in robotics, recently. However, there are several challenges for these new LiDAR sensors, including small field of view, severe motion distortions, and irregular patterns, which hinder them from being widely used in LiDAR odometry, practically. To tackle these problems, we present an effective continuous-time LiDAR odometry (ECTLO) method for the Risley-prism-based LiDARs with non-repetitive scanning patterns. A single range image covering historical points in LiDAR's small FoV is adopted for efficient map representation. To account for the noisy data from occlusions after map updating, a filter-based point-to-plane Gaussian Mixture Model is used for robust registration. Moreover, a LiDAR-only continuous-time motion model is employed to relieve the inevitable distortions. Extensive experiments have been conducted on various testbeds using the prism-based LiDARs with different scanning patterns, whose promising results demonstrate the efficacy of our proposed approach.

This study addresses the vital role of data analytics in monitoring fertiliser applications in crop cultivation. Inaccurate fertiliser application decisions can lead to costly consequences, hinder food production, and cause environmental harm. We propose a solution to predict nutrient application by determining required fertiliser quantities for an entire season. The proposed solution recommends adjusting fertiliser amounts based on weather conditions and soil characteristics to promote cost-effective and environmentally friendly agriculture. The collected dataset is high-dimensional and heterogeneous. Our research examines large-scale heterogeneous datasets in the context of the decision-making process, encompassing data collection and analysis. We also study the impact of fertiliser applications combined with weather data on crop yield, using the winter wheat crop as a case study. By understanding local contextual and geographic factors, we aspire to stabilise or even reduce the demand for agricultural nutrients while enhancing crop development. The proposed approach is proven to be efficient and scalable, as it is validated using a real-world and large dataset.

Monocular 3D object detection is an inherently ill-posed problem, as it is challenging to predict accurate 3D localization from a single image. Existing monocular 3D detection knowledge distillation methods usually project the LiDAR onto the image plane and train the teacher network accordingly. Transferring LiDAR-based model knowledge to RGB-based models is more complex, so a general distillation strategy is needed. To alleviate cross-modal prob-lem, we propose MonoSKD, a novel Knowledge Distillation framework for Monocular 3D detection based on Spearman correlation coefficient, to learn the relative correlation between cross-modal features. Considering the large gap between these features, strict alignment of features may mislead the training, so we propose a looser Spearman loss. Furthermore, by selecting appropriate distillation locations and removing redundant modules, our scheme saves more GPU resources and trains faster than existing methods. Extensive experiments are performed to verify the effectiveness of our framework on the challenging KITTI 3D object detection benchmark. Our method achieves state-of-the-art performance until submission with no additional inference computational cost. Our codes are available at //github.com/Senwang98/MonoSKD

Text simplification has emerged as an increasingly useful application of AI for bridging the communication gap in specialized fields such as medicine, where the lexicon is often dominated by technical jargon and complex constructs. Despite notable progress, methods in medical simplification sometimes result in the generated text having lower quality and diversity. In this work, we explore ways to further improve the readability of text simplification in the medical domain. We propose (1) a new unlikelihood loss that encourages generation of simpler terms and (2) a reranked beam search decoding method that optimizes for simplicity, which achieve better performance on readability metrics on three datasets. This study's findings offer promising avenues for improving text simplification in the medical field.

Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSRCV2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, with also less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for a more informed scaling.

Label assignment is a crucial process in object detection, which significantly influences the detection performance by determining positive or negative samples during training process. However, existing label assignment strategies barely consider the characteristics of targets in remote sensing images (RSIs) thoroughly, e.g., large variations in scales and aspect ratios, leading to insufficient and imbalanced sampling and introducing more low-quality samples, thereby limiting detection performance. To solve the above problems, an Elliptical Distribution aided Adaptive Rotation Label Assignment (EARL) is proposed to select high-quality positive samples adaptively in anchor-free detectors. Specifically, an adaptive scale sampling (ADS) strategy is presented to select samples adaptively among multi-level feature maps according to the scales of targets, which achieves sufficient sampling with more balanced scale-level sample distribution. In addition, a dynamic elliptical distribution aided sampling (DED) strategy is proposed to make the sample distribution more flexible to fit the shapes and orientations of targets, and filter out low-quality samples. Furthermore, a spatial distance weighting (SDW) module is introduced to integrate the adaptive distance weighting into loss function, which makes the detector more focused on the high-quality samples. Extensive experiments on several popular datasets demonstrate the effectiveness and superiority of our proposed EARL, where without bells and whistles, it can be easily applied to different detectors and achieve state-of-the-art performance. The source code will be available at: //github.com/Justlovesmile/EARL.

Spectral-domain CNNs have been shown to be more efficient than traditional spatial CNNs in terms of reducing computation complexity. However they come with a `kernel explosion' problem that, even after compression (pruning), imposes a high memory burden and off-chip bandwidth requirement for kernel access. This creates a performance gap between the potential acceleration offered by compression and actual FPGA implementation performance, especially for low-latency CNN inference. In this paper, we develop a principled approach to overcoming this performance gap and designing a low-latency, low-bandwidth, spectral sparse CNN accelerator on FPGAs. First, we analyze the bandwidth-storage tradeoff of sparse convolutional layers and locate communication bottlenecks. We then develop a dataflow for flexibly optimizing data reuse in different layers to minimize off-chip communication. Finally, we propose a novel scheduling algorithm to optimally schedule the on-chip memory access of multiple sparse kernels and minimize read conflicts. On a state-of-the-art FPGA platform, our design reduces data transfers by 42\% with DSP utilization up to 90\% and achieves inference latency of 9 ms for VGG16, compared to the baseline state-of-the-art latency of 68 ms.

The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

Unsupervised person re-identification (Re-ID) attracts increasing attention due to its potential to resolve the scalability problem of supervised Re-ID models. Most existing unsupervised methods adopt an iterative clustering mechanism, where the network was trained based on pseudo labels generated by unsupervised clustering. However, clustering errors are inevitable. To generate high-quality pseudo-labels and mitigate the impact of clustering errors, we propose a novel clustering relationship modeling framework for unsupervised person Re-ID. Specifically, before clustering, the relation between unlabeled images is explored based on a graph correlation learning (GCL) module and the refined features are then used for clustering to generate high-quality pseudo-labels.Thus, GCL adaptively mines the relationship between samples in a mini-batch to reduce the impact of abnormal clustering when training. To train the network more effectively, we further propose a selective contrastive learning (SCL) method with a selective memory bank update policy. Extensive experiments demonstrate that our method shows much better results than most state-of-the-art unsupervised methods on Market1501, DukeMTMC-reID and MSMT17 datasets. We will release the code for model reproduction.

北京阿比特科技有限公司