又黄又爽又色的视频免费,欧美丰满大乳屁股流白浆,免费看国产成年无码AV的软件,理伦中文字幕去网站上观察,一级毛片免费AAA202AAAA

Dataset distillation (DD) has emerged as a widely adopted technique for crafting a synthetic dataset that captures the essential information of a training dataset, facilitating the training of accurate neural models. Its applications span various domains, including transfer learning, federated learning, and neural architecture search. The most popular methods for constructing the synthetic data rely on matching the convergence properties of training the model with the synthetic dataset and the training dataset. However, targeting the training dataset must be thought of as auxiliary in the same sense that the training set is an approximate substitute for the population distribution, and the latter is the data of interest. Yet despite its popularity, an aspect that remains unexplored is the relationship of DD to its generalization, particularly across uncommon subgroups. That is, how can we ensure that a model trained on the synthetic dataset performs well when faced with samples from regions with low population density? Here, the representativeness and coverage of the dataset become salient over the guaranteed training error at inference. Drawing inspiration from distributionally robust optimization, we introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct DD. We provide a theoretical rationale for our approach and demonstrate its effective generalization and robustness across subgroups through numerical experiments.

相關內容

數據集

關注 88

數據集，又稱為資料集、數據集合或資料集合，是一種由數據所組成的集合。
Data set（或dataset）是一個數據的集合，通常以表格形式出現。每一列代表一個特定變量。每一行都對應于某一成員的數據集的問題。它列出的價值觀為每一個變量，如身高和體重的一個物體或價值的隨機數。每個數值被稱為數據資料。對應于行數，該數據集的數據可能包括一個或多個成員。

線性的 · Processing（編程語言） · 類別 · 樣例 · 情景 ·

2024 年 3 月 20 日

Secure Query Processing with Linear Complexity

Qiyao Luo,Yilei Wang,Wei Dong,Ke Yi

We present LINQ, the first join protocol with linear complexity (in both running time and communication) under the secure multi-party computation model (MPC). It can also be extended to support all free-connex queries, a large class of select-join-aggregate queries, still with linear complexity. This matches the plaintext result for the query processing problem, as free-connex queries are the largest class of queries known to be solvable in linear time in plaintext. We have then built a query processing system based on LINQ, and the experimental results show that LINQ significantly outperforms the state of the art. For example, it can finish a query on three relations with an output size of 1 million tuples in around 100s in the LAN setting, while existing protocols that support the query cannot finish in an hour. Thus LINQ brings MPC query processing closer to practicality.

多峰值 · 優化器 · 代價函數 · 代價 · 查準率/準確率 ·

2024 年 3 月 20 日

A Unified Optimal Transport Framework for Cross-Modal Retrieval with Noisy Labels

Haochen Han,Minnan Luo,Huan Liu,Fang Nan

from arxiv, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Cross-modal retrieval (CMR) aims to establish interaction between different modalities, among which supervised CMR is emerging due to its flexibility in learning semantic category discrimination. Despite the remarkable performance of previous supervised CMR methods, much of their success can be attributed to the well-annotated data. However, even for unimodal data, precise annotation is expensive and time-consuming, and it becomes more challenging with the multimodal scenario. In practice, massive multimodal data are collected from the Internet with coarse annotation, which inevitably introduces noisy labels. Training with such misleading labels would bring two key challenges -- enforcing the multimodal samples to \emph{align incorrect semantics} and \emph{widen the heterogeneous gap}, resulting in poor retrieval performance. To tackle these challenges, this work proposes UOT-RCL, a Unified framework based on Optimal Transport (OT) for Robust Cross-modal Retrieval. First, we propose a semantic alignment based on partial OT to progressively correct the noisy labels, where a novel cross-modal consistent cost function is designed to blend different modalities and provide precise transport cost. Second, to narrow the discrepancy in multi-modal data, an OT-based relation alignment is proposed to infer the semantic-level cross-modal matching. Both of these two components leverage the inherent correlation among multi-modal data to facilitate effective cost function. The experiments on three widely-used cross-modal retrieval datasets demonstrate that our UOT-RCL surpasses the state-of-the-art approaches and significantly improves the robustness against noisy labels.

欠估計 · 可約的 · Bioinformatics · 數據可用性 · state-of-the-art ·

2024 年 3 月 19 日

Predicting Dynamic Memory Requirements for Scientific Workflow Tasks

Jonathan Bader,Nils Diedrich,Lauritz Thamsen,Odej Kao

from arxiv, Paper accepted in 2023 IEEE International Conference on Big Data

With the increasing amount of data available to scientists in disciplines as diverse as bioinformatics, physics, and remote sensing, scientific workflow systems are becoming increasingly important for composing and executing scalable data analysis pipelines. When writing such workflows, users need to specify the resources to be reserved for tasks so that sufficient resources are allocated on the target cluster infrastructure. Crucially, underestimating a task's memory requirements can result in task failures. Therefore, users often resort to overprovisioning, resulting in significant resource wastage and decreased throughput. In this paper, we propose a novel online method that uses monitoring time series data to predict task memory usage in order to reduce the memory wastage of scientific workflow tasks. Our method predicts a task's runtime, divides it into k equally-sized segments, and learns the peak memory value for each segment depending on the total file input size. We evaluate the prototype implementation of our method using workflows from the publicly available nf-core repository, showing an average memory wastage reduction of 29.48% compared to the best state-of-the-art approach.

分解的 · 散度 · MoDELS · 方陣 · 層 ·

2024 年 3 月 18 日

Deep Nonnegative Matrix Factorization with Beta Divergences

Valentin Leplat,Le Thi Khanh Hien,Akwum Onwunta,Nicolas Gillis

from arxiv, 34 pages. We have improved the presentation of the paper, corrected a few typoes, and added the MU for beta=1/2. Accepted in Neural Computation

Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $\beta$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using some $\beta$-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.

圖 · 標注 · Learning · MoDELS · 訓練實例 ·

2024 年 3 月 18 日

Graph Partial Label Learning with Potential Cause Discovering

Hang Gao,Jiaguo Yuan,Jiangmeng Li,Chengyu Yao,Fengge Wu,Junsuo Zhao,Changwen Zheng

Graph Neural Networks (GNNs) have gained considerable attention for their potential in addressing challenges posed by complex graph-structured data in diverse domains. However, accurately annotating graph data for training is difficult due to the inherent complexity and interconnectedness of graphs. To tackle this issue, we propose a novel graph representation learning method that enables GNN models to effectively learn discriminative information even in the presence of noisy labels within the context of Partially Labeled Learning (PLL). PLL is a critical weakly supervised learning problem, where each training instance is associated with a set of candidate labels, including both the true label and additional noisy labels. Our approach leverages potential cause extraction to obtain graph data that exhibit a higher likelihood of possessing a causal relationship with the labels. By incorporating auxiliary training based on the extracted graph data, our model can effectively filter out the noise contained in the labels. We support the rationale behind our approach with a series of theoretical analyses. Moreover, we conduct extensive evaluations and ablation studies on multiple datasets, demonstrating the superiority of our proposed method.

MoDELS · PCA · 噪聲 · 生成模型 · 估計/估計量 ·

2024 年 3 月 15 日

Generative Modelling of Stochastic Rotating Shallow Water Noise

Dan Crisan,Oana Lang,Alexander Lobbe

In recent work, the authors have developed a generic methodology for calibrating the noise in fluid dynamics stochastic partial differential equations where the stochasticity was introduced to parametrize subgrid-scale processes. The stochastic parameterization of sub-grid scale processes is required in the estimation of uncertainty in weather and climate predictions, to represent systematic model errors arising from subgrid-scale fluctuations. The previous methodology used a principal component analysis (PCA) technique based on the ansatz that the increments of the stochastic parametrization are normally distributed. In this paper, the PCA technique is replaced by a generative model technique. This enables us to avoid imposing additional constraints on the increments. The methodology is tested on a stochastic rotating shallow water model with the elevation variable of the model used as input data. The numerical simulations show that the noise is indeed non-Gaussian. The generative modelling technology gives good RMSE, CRPS score and forecast rank histogram results.

INFORMS · 圖 · 結構化學習 · Extensibility · 學成 ·

2021 年 12 月 16 日

Graph Structure Learning with Variational Information Bottleneck

Qingyun Sun,Jianxin Li,Hao Peng,Jia Wu,Xingcheng Fu,Cheng Ji,Philip S. Yu

from arxiv, Accepted by AAAI 2022, Preprint version with Appendix

Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which could even exacerbate the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, in the perspective of information theory. VIB-GSL advances the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate that the superior effectiveness and robustness of VIB-GSL.

樣例 · 黑盒 · Networking · MoDELS · 原點 ·

2018 年 1 月 15 日

Generating Adversarial Examples with Adversarial Networks

Chaowei Xiao,Bo Li,Jun-Yan Zhu,Warren He,Mingyan Liu,Dawn Song

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.

自動問答 · MoDELS · Networking · Processing（編程語言） · state-of-the-art ·

2018 年 1 月 15 日

An Interpretable Reasoning Network for Multi-Relation Question Answering

Mantong Zhou,Minlie Huang,Xiaoyan Zhu

Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis.

INFORMS · FM · 推薦系統 · CF · 分解的 ·

2018 年 1 月 8 日

Learning with Heterogeneous Side Information Fusion for Recommender Systems

Huan Zhao,Quanming Yao,Yangqiu Song,James Kwok,Dik Lun Lee

from arxiv, 35 pages, 12 figures

Recommender System (RS) is a hot area where artificial intelligence (AI) techniques can be effectively applied to improve performance. Since the well-known Netflix Challenge, collaborative filtering (CF) has become the most popular and effective recommendation method. Despite their success in CF, various AI techniques still have to face the data sparsity and cold start problems. Previous works tried to solve these two problems by utilizing auxiliary information, such as social connections among users and meta-data of items. However, they process different types of information separately, leading to information loss. In this work, we propose to utilize Heterogeneous Information Network (HIN), which is a natural and general representation of different types of data, to enhance CF-based recommending methods. HIN-based recommender systems face two problems: how to represent high-level semantics for recommendation and how to fuse the heterogeneous information to recommend. To address these problems, we propose to applying meta-graph to HIN-based RS and solve the information fusion problem with a "matrix factorization (MF) + factorization machine (FM)" framework. For the "MF" part, we obtain user-item similarity matrices from each meta-graph and adopt low-rank matrix approximation to get latent features for both users and items. For the "FM" part, we propose to apply FM with Group lasso (FMG) on the obtained features to simultaneously predict missing ratings and select useful meta-graphs. Experimental results on two large real-world datasets, i.e., Amazon and Yelp, show that our proposed approach is better than that of the state-of-the-art FM and other HIN-based recommending methods.