亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Accurate lesion classification in Wireless Capsule Endoscopy (WCE) images is vital for early diagnosis and treatment of gastrointestinal (GI) cancers. However, this task is confronted with challenges like tiny lesions and background interference. Additionally, WCE images exhibit higher intra-class variance and inter-class similarities, adding complexity. To tackle these challenges, we propose Decoupled Supervised Contrastive Learning for WCE image classification, learning robust representations from zoomed-in WCE images generated by Saliency Augmentor. Specifically, We use uniformly down-sampled WCE images as anchors and WCE images from the same class, especially their zoomed-in images, as positives. This approach empowers the Feature Extractor to capture rich representations from various views of the same image, facilitated by Decoupled Supervised Contrastive Learning. Training a linear Classifier on these representations within 10 epochs yields an impressive 92.01% overall accuracy, surpassing the prior state-of-the-art (SOTA) by 0.72% on a blend of two publicly accessible WCE datasets. Code is available at: //github.com/Qiukunpeng/DSCL.

相關內容

圖像分類(lei),顧名思義,是一個輸(shu)(shu)入(ru)圖像,輸(shu)(shu)出對(dui)該圖像內容分類(lei)的(de)描述的(de)問題。它是計算機視(shi)覺的(de)核心,實際應用廣泛(fan)。

APT (Advanced Persistent Threat) with the characteristics of persistence, stealth, and diversity is one of the greatest threats against cyber-infrastructure. As a countermeasure, existing studies leverage provenance graphs to capture the complex relations between system entities in a host for effective APT detection. In addition to detecting single attack events as most existing work does, understanding the tactics / techniques (e.g., Kill-Chain, ATT&CK) applied to organize and accomplish the APT attack campaign is more important for security operations. Existing studies try to manually design a set of rules to map low-level system events to high-level APT tactics / techniques. However, the rule based methods are coarse-grained and lack generalization ability, thus they can only recognize APT tactics and cannot identify fine-grained APT techniques and mutant APT attacks. In this paper, we propose TREC, the first attempt to recognize APT tactics / techniques from provenance graphs by exploiting deep learning techniques. To address the "needle in a haystack" problem, TREC segments small and compact subgraphs covering individual APT technique instances from a large provenance graph based on a malicious node detection model and a subgraph sampling algorithm. To address the "training sample scarcity" problem, TREC trains the APT tactic / technique recognition model in a few-shot learning manner by adopting a Siamese neural network. We evaluate TREC based on a customized dataset collected and made public by our team. The experiment results show that TREC significantly outperforms state-of-the-art systems in APT tactic recognition and TREC can also effectively identify APT techniques.

The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling with large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always fail to produce enough points in these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR.

This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation. Weight decay can cause the expected magnitude and angular updates of a neuron's weight vector to converge to a steady state we call rotational equilibrium. These states can be highly homogeneous, effectively balancing the average rotation -- a proxy for the effective learning rate -- across different layers and neurons. Our work analyzes these dynamics across optimizers like Adam, Lion, and SGD with momentum, offering a new simple perspective on training that elucidates the efficacy of widely used but poorly understood methods in deep learning. We demonstrate how balanced rotation plays a key role in the effectiveness of normalization like Weight Standardization, as well as that of AdamW over Adam with L2-regularization. Finally, we show that explicitly controlling the rotation provides the benefits of weight decay while substantially reducing the need for learning rate warmup.

Automatic breast cancer classification in histopathology images is crucial for precise diagnosis and treatment planning. Recently, classification approaches based on the ResNet architecture have gained popularity for significantly improving accuracy by using skip connections to mitigate vanishing gradient problems, thereby integrating low-level and high-level feature information. Nevertheless, the conventional ResNet architecture faces challenges such as data imbalance and limited interpretability, necessitating cross-domain knowledge and collaboration among medical experts. This study effectively addresses these challenges by introducing a novel method for breast cancer classification, the Dual-Activated Lightweight Attention ResNet50 (DALAResNet50). It integrates a pre-trained ResNet50 model with a lightweight attention mechanism, embedding an attention module in the fourth layer of ResNet50 and incorporating two fully connected layers with LeakyReLU and ReLU activation functions to enhance feature learning capabilities. The DALAResNet50 method was tested on breast cancer histopathology images from the BreakHis Database across magnification factors of 40X, 100X, 200X, and 400X, achieving accuracies of 98.5%, 98.7%, 97.9%, and 94.3%, respectively. It was also compared with established deep learning models such as SEResNet50, DenseNet121, VGG16, VGG16Inception, ViT, Swin-Transformer, Dinov2_Vitb14, and ResNet50. The results demonstrate that DALAResNet50 surpasses these models in precision, accuracy, recall, F1 score, and GMean, proving its robustness and applicability across various magnifications and handling imbalanced breast cancer datasets.

Network pruning is an effective measure to alleviate the storage and computational burden of deep neural networks arising from its high overparameterization. Thus raises a fundamental question: How sparse can we prune a deep network without sacrifice on the performance? To address this problem, in this work we'll take a first principles approach, i.e. we directly impose the sparsity constraint on the original loss function and then characterize the necessary and sufficient condition of the sparsity (\textit{which turns out to nearly coincide}) by leveraging the notion of \textit{statistical dimension} in convex geometry. Through this fundamental limit, we're able to identify two key factors that determine the pruning ratio limit, i.e., weight magnitude and network flatness. Generally speaking, the flatter the loss landscape or the smaller the weight magnitude, the smaller pruning ratio. In addition, we provide efficient countermeasures to address the challenges in computing the pruning limit, which involves accurate spectrum estimation of a large-scale and non-positive Hessian matrix. Moreover, through the lens of the pruning ratio threshold, we can provide rigorous interpretations on several heuristics in existing pruning algorithms. Extensive experiments are performed that demonstrate that the our theoretical pruning ratio threshold coincides very well with the experiments. All codes are available at: //github.com/QiaozheZhang/Global-One-shot-Pruning

In recent years, Face Image Quality Assessment (FIQA) has become an indispensable part of the face recognition system to guarantee the stability and reliability of recognition performance in an unconstrained scenario. For this purpose, the FIQA method should consider both the intrinsic property and the recognizability of the face image. Most previous works aim to estimate the sample-wise embedding uncertainty or pair-wise similarity as the quality score, which only considers the information from partial intra-class. However, these methods ignore the valuable information from the inter-class, which is for estimating to the recognizability of face image. In this work, we argue that a high-quality face image should be similar to its intra-class samples and dissimilar to its inter-class samples. Thus, we propose a novel unsupervised FIQA method that incorporates Similarity Distribution Distance for Face Image Quality Assessment (SDD-FIQA). Our method generates quality pseudo-labels by calculating the Wasserstein Distance (WD) between the intra-class similarity distributions and inter-class similarity distributions. With these quality pseudo-labels, we are capable of training a regression network for quality prediction. Extensive experiments on benchmark datasets demonstrate that the proposed SDD-FIQA surpasses the state-of-the-arts by an impressive margin. Meanwhile, our method shows good generalization across different recognition systems.

Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study what normalization is effective for Graph Neural Networks (GNNs). First, we adapt and evaluate the existing methods from other domains to GNNs. Faster convergence is achieved with InstanceNorm compared to BatchNorm and LayerNorm. We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets. Second, we show that the shift operation in InstanceNorm results in an expressiveness degradation of GNNs for highly regular graphs. We address this issue by proposing GraphNorm with a learnable shift. Empirically, GNNs with GraphNorm converge faster compared to GNNs using other normalization. GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose occupancy networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.

Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism, but are generally modeled to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation is a challenging problem. In this work, we propose to model object composition in a GAN framework as a self-consistent composition-decomposition network. Our model is conditioned on the object images from their marginal distributions to generate a realistic image from their joint distribution by explicitly learning the possible interactions. We evaluate our model through qualitative experiments and user evaluations in both the scenarios when either paired or unpaired examples for the individual object images and the joint scenes are given during training. Our results reveal that the learned model captures potential interactions between the two object domains given as input to output new instances of composed scene at test time in a reasonable fashion.

Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs to significantly improve their classification and training times. With these improvements, many frameworks have become available for implementing CNNs on both CPUs and GPUs, with no support for FPGA implementations. In this work we present a modified version of the popular CNN framework Caffe, with FPGA support. This allows for classification using CNN models and specialized FPGA implementations with the flexibility of reprogramming the device when necessary, seamless memory transactions between host and device, simple-to-use test benches, and the ability to create pipelined layer implementations. To validate the framework, we use the Xilinx SDAccel environment to implement an FPGA-based Winograd convolution engine and show that the FPGA layer can be used alongside other layers running on a host processor to run several popular CNNs (AlexNet, GoogleNet, VGG A, Overfeat). The results show that our framework achieves 50 GFLOPS across 3x3 convolutions in the benchmarks. This is achieved within a practical framework, which will aid in future development of FPGA-based CNNs.

北京阿比特科技有限公司