
Deep & Cross Network and its derivative models have become an important paradigm for click-through rate (CTR) prediction due to their effective balance between computational cost and performance. However, these models face four major limitations: (1) the performance of existing explicit feature interaction methods is often weaker than that of implicit deep neural networks (DNNs), undermining their necessity; (2) many models fail to adaptively filter noise while increasing the order of feature interactions; (3) the fusion methods of most models cannot provide suitable supervision signals for their different sub-networks; (4) while most models claim to capture high-order feature interactions, they often do so implicitly and non-interpretably through DNNs, which limits the trustworthiness of the model's predictions. To address these limitations, this paper proposes the next-generation deep cross network, Deep Cross Network v3 (DCNv3), along with its two sub-networks, the Linear Cross Network (LCN) and the Exponential Cross Network (ECN), for CTR prediction. DCNv3 ensures interpretability in feature interaction modeling while increasing the order of feature interactions linearly and exponentially, achieving genuine Deep Crossing rather than just Deep & Cross. Additionally, we employ a Self-Mask operation to filter noise and reduce the number of parameters in the Cross Network by half. In the fusion layer, we use a simple yet effective multi-loss trade-off and calculation method, called Tri-BCE, to provide appropriate supervision signals. Comprehensive experiments on six datasets demonstrate the effectiveness, efficiency, and interpretability of DCNv3. The code, running logs, and detailed hyperparameter configurations are available at: //github.com/salmon1802/DCNv3.
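To make the crossing idea concrete, below is a minimal PyTorch sketch of an exponential cross layer with a self-mask, based only on the description above; the exact DCNv3 formulation (normalization, mask design, and the linear LCN variant, which would cross with the original input rather than the current state) may differ.

```python
import torch
import torch.nn as nn

class ExponentialCrossLayer(nn.Module):
    """Crosses the current state with itself, so the interaction order
    doubles at every layer (1 -> 2 -> 4 -> ...)."""
    def __init__(self, dim: int):
        super().__init__()
        # Projecting to half the width halves the linear parameters.
        self.proj = nn.Linear(dim, dim // 2)
        self.norm = nn.LayerNorm(dim // 2)

    def forward(self, x_l: torch.Tensor) -> torch.Tensor:
        h = self.proj(x_l)                     # (B, dim // 2)
        mask = (self.norm(h) > 0).float()      # self-mask: zero out "noisy" units
        h = torch.cat([h, h * mask], dim=-1)   # restore width -> (B, dim)
        return x_l * h + x_l                   # element-wise crossing + residual

# Stacking a few layers gives interaction order 2^L instead of L.
x = torch.randn(32, 64)
ecn = nn.Sequential(*[ExponentialCrossLayer(64) for _ in range(3)])
print(ecn(x).shape)  # torch.Size([32, 64])
```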

Related Content


Large models (LMs) have gained widespread acceptance in code-related tasks, and their strong generative capacity has greatly promoted the application of code LMs. Nevertheless, the security of the generated code has drawn attention because of its potential for harm. Existing secure code generation methods generalize poorly to unseen test cases and are not robust when the underlying model has been attacked, leading to safety failures in code generation. In this paper, we propose SecCoder, a generalizable and robust secure code generation method that uses in-context learning (ICL) and safe demonstrations. A dense retriever is used to select the most helpful demonstration so as to maximize the improvement in the security of the generated code. Experimental results show the superior generalizability of SecCoder compared to the current secure code generation method, achieving a significant average security improvement of 7.20% on unseen test cases. The results also show that SecCoder is more robust than the current attacked code LM, achieving a significant average security improvement of 7.74%. Our analysis indicates that SecCoder enhances the security of code generated by LMs and is more generalizable and robust.
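As a rough illustration of the retrieval-augmented prompting described above, the sketch below embeds a query and a pool of safe demonstrations with an off-the-shelf dense encoder and prepends the most similar demonstration in context; the encoder name, demonstration pool, and prompt format are placeholders, not SecCoder's actual setup.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder dense retriever

safe_demos = [
    "Query: read a filename from the user and open it.\n"
    "Secure answer: validate the path against a whitelist before opening.",
    "Query: build an SQL query from user input.\n"
    "Secure answer: use parameterized queries instead of string formatting.",
]

def build_prompt(task: str) -> str:
    """Retrieve the most relevant safe demonstration and prepend it in context."""
    demo_emb = encoder.encode(safe_demos, convert_to_tensor=True)
    task_emb = encoder.encode(task, convert_to_tensor=True)
    best = util.cos_sim(task_emb, demo_emb).argmax().item()
    return f"{safe_demos[best]}\n\nQuery: {task}\nSecure answer:"

print(build_prompt("store a user password in the database"))
```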

Diffusion models have achieved remarkable success in Text-to-Image generation tasks, leading to the development of many commercial models. However, recent studies have reported that diffusion models often replicate images from their training data when triggered by specific prompts, potentially raising social issues ranging from copyright to privacy concerns. To sidestep such memorization, recent studies have developed memorization mitigation methods for diffusion models. Nevertheless, the lack of benchmarks impedes the assessment of the true effectiveness of these methods. In this work, we present MemBench, the first benchmark for evaluating image memorization mitigation methods. Our benchmark includes a large number of memorized-image trigger prompts across various Text-to-Image diffusion models. Furthermore, in contrast to prior work that evaluates mitigation performance only on trigger prompts, we present metrics that evaluate both trigger prompts and general prompts, so we can see whether mitigation methods address the memorization issue while maintaining performance on general prompts. This is an important consideration for practical applications that previous works have overlooked. Through evaluation on MemBench, we verify that the performance of existing image memorization mitigation methods is still insufficient for application to diffusion models. The code and datasets are available at //github.com/chunsanHong/MemBench_code.

Remote sensing images pose distinct challenges for downstream tasks due to their inherent complexity. While a considerable amount of research has been dedicated to remote sensing classification, object detection, and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded in remote sensing scenarios. Such prior knowledge is useful because remote sensing objects may be mistakenly recognized without referencing a sufficiently long-range context, and the required context can vary for different objects. This paper considers these priors and proposes a lightweight Large Selective Kernel Network (LSKNet) backbone. LSKNet can dynamically adjust its large spatial receptive field to better model the varying context range of different objects in remote sensing scenarios. To our knowledge, large and selective kernel mechanisms have not been previously explored in remote sensing images. Without bells and whistles, our lightweight LSKNet sets new state-of-the-art scores on standard remote sensing classification, object detection, and semantic segmentation benchmarks. Our comprehensive analysis further validates the significance of the identified priors and the effectiveness of LSKNet. The code is available at //github.com/zcablii/LSKNet.
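The sketch below illustrates one way a large-and-selective kernel block could look: two depthwise branches with different (large) receptive fields and a spatial gate that mixes them per location. Kernel sizes, dilations, and the gating design are assumptions for illustration, not the actual LSKNet module.

```python
import torch
import torch.nn as nn

class LargeSelectiveKernel(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.small = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)               # ~5x5 field
        self.large = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)   # ~19x19 field
        self.gate = nn.Conv2d(2, 2, 7, padding=3)   # spatial selection over the two branches
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        a = self.small(x)            # first large-kernel branch
        b = self.large(a)            # sequentially enlarges the receptive field
        pooled = torch.cat([         # channel-pooled descriptors of both branches
            (a + b).mean(dim=1, keepdim=True),
            (a + b).amax(dim=1, keepdim=True),
        ], dim=1)
        w = torch.sigmoid(self.gate(pooled))     # (B, 2, H, W) selection maps
        fused = a * w[:, :1] + b * w[:, 1:]      # per-location branch mixing
        return x * self.proj(fused)              # modulate the input features

x = torch.randn(2, 64, 32, 32)
print(LargeSelectiveKernel(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```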

SLAM is a fundamental capability of unmanned systems, with LiDAR-based SLAM gaining widespread adoption due to its high precision. Current SLAM systems can achieve centimeter-level accuracy within a short period. However, several challenges remain when dealing with large-scale mapping tasks, including significant storage requirements and the difficulty of reusing constructed maps. To address this, we first design an elastic and lightweight map representation called CELLmap, composed of several CELLs, each representing the local map at the corresponding location. We then design a general backend, including a CELL-based bidirectional registration module and a loop-closure detection module, to improve global map consistency. Our experiments demonstrate that CELLmap can represent the precise geometric structure of large-scale maps from the KITTI dataset using only about 60 MB. Additionally, our general backend achieves up to a 26.88% improvement over various LiDAR odometry methods.
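A minimal sketch of the CELL idea follows: points are bucketed into fixed-size cells keyed by grid coordinates so that local submaps can be stored and queried independently; the cell size and storage layout are illustrative choices, not CELLmap's actual representation.

```python
from collections import defaultdict
import numpy as np

CELL_SIZE = 20.0  # meters per cell (assumed value)

class CellMap:
    def __init__(self):
        self.cells = defaultdict(list)  # (i, j) -> list of local point blocks

    def insert(self, points: np.ndarray):
        """Distribute an (N, 3) point cloud into its covering cells."""
        keys = np.floor(points[:, :2] / CELL_SIZE).astype(int)
        for key in np.unique(keys, axis=0):
            mask = np.all(keys == key, axis=1)
            self.cells[tuple(key)].append(points[mask])

    def query(self, position, radius=1):
        """Return the cells around a position, e.g. for local registration."""
        ci, cj = (int(np.floor(c / CELL_SIZE)) for c in position[:2])
        blocks = [blk for di in range(-radius, radius + 1)
                      for dj in range(-radius, radius + 1)
                      for blk in self.cells.get((ci + di, cj + dj), [])]
        return np.vstack(blocks) if blocks else np.empty((0, 3))

m = CellMap()
m.insert(np.random.rand(1000, 3) * 100.0)
print(m.query([50.0, 50.0]).shape)
```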

In the domain of recommendation and collaborative filtering, Graph Contrastive Learning (GCL) has become an influential approach. Nevertheless, the reasons for the effectiveness of contrastive learning are still not well understood. In this paper, we challenge the conventional use of random augmentations on the graph structure or embedding space in GCL, which may disrupt the structural and semantic information inherent in Graph Neural Networks. Moreover, fixed-rate data augmentation proves less effective than augmentation with an adaptive rate: in the initial training phases, significant perturbations are more suitable, while as training approaches convergence, milder perturbations yield better results. We introduce a twin encoder in place of random augmentations, demonstrating the redundancy of traditional augmentation techniques. The twin-encoder updating mechanism ensures the generation of more diverse contrastive views in the early stages, transitioning to views with greater similarity as training progresses. In addition, we investigate the learned representations from the perspective of alignment and uniformity on a hypersphere to optimize more efficiently. Our proposed Twin Graph Contrastive Learning model -- TwinCL -- aligns positive pairs of user and item embeddings and the representations from the twin encoder while maintaining the uniformity of the embeddings on the hypersphere. Our theoretical analysis and experimental results show that optimizing alignment and uniformity with the twin encoder contributes to better recommendation accuracy and training efficiency. In comprehensive experiments on three public datasets, TwinCL achieves an average improvement of 5.6% (NDCG@10) in recommendation accuracy with faster training, while effectively mitigating popularity bias.
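For reference, the alignment and uniformity objectives on the unit hypersphere (in the spirit of Wang and Isola, 2020) can be written as below; how TwinCL weights them and combines them with the twin-encoder views is not shown here, and the 0.5 weight is only a placeholder.

```python
import torch
import torch.nn.functional as F

def alignment(x, y, alpha=2):
    """Positive pairs (user, item) should lie close together on the hypersphere."""
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity(x, t=2):
    """Embeddings should spread out: penalize high pairwise similarity."""
    x = F.normalize(x, dim=-1)
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

users, items = torch.randn(256, 64), torch.randn(256, 64)
loss = alignment(users, items) + 0.5 * (uniformity(users) + uniformity(items))
print(loss.item())
```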

Causal Graph Discovery (CGD) is the process of estimating the underlying probabilistic graphical model that represents the joint distribution of a dataset's features. CGD algorithms are broadly classified into two categories: (i) constraint-based algorithms, whose outcome depends on conditional independence (CI) tests, and (ii) score-based algorithms, whose outcome depends on an optimized score function. Since sensitive features of observational data are prone to privacy leakage, Differential Privacy (DP) has been adopted to ensure user privacy in CGD. Adding the same amount of noise at every step of this sequential estimation process degrades the predictive performance of the algorithms. The initial CI tests in constraint-based algorithms and the later iterations of the optimization process in score-based algorithms are crucial, so they need to be more accurate and less noisy. Based on this key observation, we present CURATE (CaUsal gRaph AdapTivE privacy), a DP-CGD framework with adaptive privacy budgeting. In contrast to existing DP-CGD algorithms with uniform privacy budgeting across all iterations, CURATE adapts the privacy budget by minimizing the error probability (for constraint-based algorithms) and maximizing the number of optimization iterations (for score-based algorithms) while keeping the cumulative privacy leakage bounded. To validate our framework, we present a comprehensive set of experiments on several datasets and show that CURATE achieves higher utility than existing DP-CGD algorithms with less privacy leakage.
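The toy sketch below illustrates adaptive privacy budgeting in this spirit: a fixed total budget is split non-uniformly so that the most critical iterations (early CI tests for constraint-based CGD, late iterations for score-based CGD) receive more of it, while the cumulative spend stays bounded. The geometric weighting is an assumption for illustration, not CURATE's actual allocation rule.

```python
import numpy as np

def adaptive_budget(total_eps: float, n_iters: int, decay: float = 0.8,
                    front_loaded: bool = True) -> np.ndarray:
    """Split total_eps over n_iters with geometrically decaying (or growing) weights."""
    weights = decay ** np.arange(n_iters, dtype=float)
    if not front_loaded:            # score-based: later iterations get more budget
        weights = weights[::-1]
    eps = total_eps * weights / weights.sum()
    assert np.isclose(eps.sum(), total_eps)   # cumulative leakage stays bounded
    return eps

print(adaptive_budget(total_eps=1.0, n_iters=5))                       # early CI tests favored
print(adaptive_budget(total_eps=1.0, n_iters=5, front_loaded=False))   # late iterations favored
```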

With the development of video understanding, there is a proliferation of tasks for clip-level temporal video analysis, including temporal action detection (TAD), temporal action segmentation (TAS), and generic event boundary detection (GEBD). While task-specific video understanding models have exhibited outstanding performance on each task, there remains a dearth of unified frameworks capable of simultaneously addressing multiple tasks, which is a promising direction for the next generation of AI. To this end, in this paper, we propose a single unified framework, coined Temporal2Seq, which formulates the output of these temporal video understanding tasks as a sequence of discrete tokens. With this unified token representation, Temporal2Seq can train a generalist model within a single architecture on different video understanding tasks. In the absence of multi-task learning (MTL) benchmarks, we compile a comprehensive co-training dataset by borrowing datasets from the TAD, TAS, and GEBD tasks. We evaluate our Temporal2Seq generalist model on the corresponding test sets of the three tasks, demonstrating that Temporal2Seq produces reasonable results on various tasks and achieves advantages over single-task training within this framework. We also investigate the generalization of our generalist model on new datasets from different tasks, where it yields performance superior to task-specific models.
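A minimal sketch of the discrete-token output format might look like the following: segment annotations (start, end, label) are quantized into time-bin tokens plus class tokens so a single sequence decoder can emit them. The vocabulary layout and bin count are illustrative, and the actual Temporal2Seq tokenization may differ.

```python
NUM_BINS = 100           # quantized time bins shared across tasks (assumed)
CLASS_OFFSET = NUM_BINS  # class tokens start after the time-bin tokens

def to_tokens(segments, video_len):
    """segments: list of (start_sec, end_sec, class_id) -> flat token sequence."""
    tokens = []
    for start, end, cls in segments:
        tokens += [
            int(start / video_len * (NUM_BINS - 1)),   # start-time token
            int(end / video_len * (NUM_BINS - 1)),     # end-time token
            CLASS_OFFSET + cls,                        # class token
        ]
    return tokens

# e.g. a 60 s video with two annotated actions
print(to_tokens([(3.0, 10.5, 2), (41.0, 55.0, 7)], video_len=60.0))
```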

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both the global image structure and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model that performs a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector-quantized features, followed by coarse-to-fine gating to produce the image output. During this multi-scale representation learning stage, additional input conditions such as text, scene graphs, or image layouts can be further exploited; thus, Frido can also be applied to conditional or cross-modality image synthesis. We conduct extensive experiments on various unconditional and conditional image generation tasks, ranging from text-to-image synthesis and layout-to-image to scene-graph-to-image and label-to-image. More specifically, we achieve state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at //github.com/davidhalladay/Frido.

Conventionally, spatiotemporal modeling and its computational complexity are two of the most intensively studied topics in video action recognition. Existing state-of-the-art methods achieve excellent accuracy regardless of complexity, while efficient spatiotemporal modeling solutions are slightly inferior in performance. In this paper, we attempt to achieve both efficiency and effectiveness simultaneously. First, besides traditionally treating H x W x T video frames as a space-time signal (viewed from the Height-Width spatial plane), we propose to also model video from the other two planes, Height-Time and Width-Time, to capture video dynamics thoroughly. Second, our model is designed on top of 2D CNN backbones, and model complexity is kept in mind by design. Specifically, we introduce a novel multi-view fusion (MVF) module that exploits video dynamics using separable convolution for efficiency. It is a plug-and-play module and can be inserted into off-the-shelf 2D CNNs to form a simple yet effective model called MVFNet. Moreover, MVFNet can be thought of as a generalized video modeling framework that can specialize to existing methods such as C2D, SlowOnly, and TSM under different settings. Extensive experiments are conducted on popular benchmarks (i.e., Something-Something V1 & V2, Kinetics, UCF-101, and HMDB-51) to show its superiority. The proposed MVFNet achieves state-of-the-art performance with the complexity of a 2D CNN.
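The sketch below conveys the multi-view idea: besides the usual H-W spatial view, the video tensor is also convolved over the H-T and W-T planes with depthwise kernels, and the three responses are fused. The actual MVF module's channel splitting and fusion weights may differ from this simplified version.

```python
import torch
import torch.nn as nn

class MultiViewConv(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Depthwise 3D convs, each seeing a different 2D plane of the video.
        self.hw = nn.Conv3d(dim, dim, (1, 3, 3), padding=(0, 1, 1), groups=dim)  # H-W view
        self.ht = nn.Conv3d(dim, dim, (3, 3, 1), padding=(1, 1, 0), groups=dim)  # H-T view
        self.wt = nn.Conv3d(dim, dim, (3, 1, 3), padding=(1, 0, 1), groups=dim)  # W-T view

    def forward(self, x):  # x: (B, C, T, H, W)
        return x + self.hw(x) + self.ht(x) + self.wt(x)  # fuse the three views

clip = torch.randn(2, 64, 8, 56, 56)
print(MultiViewConv(64)(clip).shape)  # torch.Size([2, 64, 8, 56, 56])
```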

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
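A small sketch of the permutation-language-modeling objective described above: sample a random factorization order and build an attention mask so that each position may only see tokens earlier in that order. This illustrates the objective only; XLNet's two-stream attention and Transformer-XL recurrence are omitted.

```python
import torch

def permutation_mask(seq_len: int) -> tuple[torch.Tensor, torch.Tensor]:
    order = torch.randperm(seq_len)           # sampled factorization order
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[order] = torch.arange(seq_len)       # rank[i] = position of token i in the order
    # mask[i, j] = True  ->  token i may attend to token j
    mask = rank.unsqueeze(1) > rank.unsqueeze(0)
    return order, mask

order, mask = permutation_mask(5)
print(order)        # e.g. tensor([3, 0, 4, 1, 2])
print(mask.int())   # tokens predicted later in the order see more context
```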
