
This paper proposes a new graph-based RWO-Sampling (Random Walk Over-Sampling) method for imbalanced datasets. In this method, two schemes based on under-sampling and over-sampling are introduced to keep the proximity information robust to noise and outliers. After the first graph is constructed on the minority class, RWO-Sampling is applied to selected samples, and the rest remain unchanged. The second graph is constructed for the majority class, and the samples in low-density areas (outliers) are removed. Finally, in the proposed method, samples of the majority class in high-density areas are selected, and the rest are eliminated. Furthermore, by using RWO-Sampling, the boundary of the minority class is expanded without increasing the number of outliers. The method is tested, and a number of evaluation measures are compared with previous methods on nine continuous-attribute datasets with different over-sampling rates and one dataset for the diagnosis of COVID-19. The experimental results indicate the high efficiency and flexibility of the proposed method for the classification of imbalanced data.

February 7, 2022

An Automated Approach for Privacy Leakage Identification in IoT Apps

Bara' Nazzal,Manar H. Alalfi

This paper presents a fully automated static analysis approach and a tool, Taint-Things, for the identification of tainted flows in SmartThings IoT apps. Taint-Things accurately identifies all tainted flows reported by one of the state-of-the-art tools with at least 4 times improved performance. Our approach reports potentially vulnerable tainted flows in the form of a concise security slice, where the relevant parts of the code are given with the lines affecting the sensitive information, which could provide security auditors with an effective and precise tool to pinpoint security issues in SmartThings apps under test. We also present and test ways to add precision to Taint-Things by adding extra sensitivities; we provide different approaches for flow-, path- and context-sensitive analyses through modules that can be added to Taint-Things. We present experiments to evaluate Taint-Things by running it on a SmartThings app dataset as well as testing for precision and recall on a set generated by a mutation framework to see how much coverage is achieved without adding false positives. This shows an improvement in performance both in speed, by up to a factor of 4, and in precision, avoiding false positives by providing a higher level of flow and path sensitivity analysis in comparison with one of the state-of-the-art tools.

Performer · Sample · MoDELS · Random sampling · CASES

February 4, 2022

Comparison of the performance and reliability between improved sampling strategies for polynomial chaos expansion

Konstantin Weise,Erik Müller,Lucas Poßner,Thomas R. Knösche

from arxiv, 30 pages, 9 figures, 3 tables

With the ever-growing importance of uncertainty and sensitivity analysis of complex model evaluations and the difficulty of their timely realizations comes a need for more efficient numerical operations. Non-intrusive Polynomial Chaos methods are highly efficient and accurate to map input-output relationships to investigate complex models. There is a lot of potential to increase the efficacy of the method regarding the selected sampling scheme. We examined state-of-the-art sampling schemes categorized in space-filling-optimal designs such as Latin Hypercube sampling and L1 optimal sampling and compared their empirical performance against standard random sampling. The analysis was performed in the context of L1 minimization using the least-angle regression algorithm to fit the gPC regression models. The sampling schemes are thoroughly investigated by evaluating the quality of the constructed surrogate models considering distinct test cases representing different problem classes covering low, medium and high dimensional problems. Finally, the sampling schemes are tested on an application example to estimate the sensitivity of the self-impedance of a probe, which is used to measure the impedance of biological tissues at different frequencies. Due to the random nature of the sampling, we compared the sampling schemes using statistical stability measures and evaluated the success rates to construct a surrogate model with an accuracy of <0.1%. We observed strong differences in the convergence properties of the methods between the analyzed test functions.
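To make the difference between the two baseline designs concrete, the following is a minimal sketch of standard random sampling and Latin Hypercube sampling on the unit hypercube. It is an illustration only: the function names are hypothetical, and it covers neither the L1 optimal designs nor the gPC regression fit with least-angle regression described in the abstract.

```python
import random

def random_sampling(n_samples, n_dims, rng):
    """Plain Monte Carlo: every coordinate is drawn independently from U(0, 1)."""
    return [[rng.random() for _ in range(n_dims)] for _ in range(n_samples)]

def latin_hypercube(n_samples, n_dims, rng):
    """Latin Hypercube: each dimension is split into n_samples equal strata,
    and every stratum receives exactly one point."""
    columns = []
    for _ in range(n_dims):
        # One point inside each stratum [k/n, (k+1)/n), then shuffle the order
        # so that strata are paired randomly across dimensions.
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    # Transpose the per-dimension columns into a list of sample points.
    return [list(point) for point in zip(*columns)]

def strata_hit(samples, dim):
    """Return the set of stratum indices covered along one dimension."""
    n = len(samples)
    return {int(p[dim] * n) for p in samples}

if __name__ == "__main__":
    rng = random.Random(0)
    lhs = latin_hypercube(10, 2, rng)
    mc = random_sampling(10, 2, rng)
    print("LHS strata covered in dim 0:", len(strata_hit(lhs, 0)), "of 10")
    print("MC  strata covered in dim 0:", len(strata_hit(mc, 0)), "of 10")
```

Latin Hypercube sampling covers every stratum of every dimension by construction, whereas random sampling usually leaves some strata empty at small sample sizes.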

Data analysis · Dataset · ML · INFORMS · Better

February 4, 2022

A Topological Data Analysis Based Classifier

Rolando Kindelan,José Frías,Mauricio Cerda,Nancy Hitschfeld

from arxiv, The paper is under consideration at Advances in Data Analysis and Classification. arXiv admin note: text overlap with arXiv:2102.03709

Topological Data Analysis (TDA) is an emergent field that aims to discover topological information hidden in a dataset. TDA tools have been commonly used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes an algorithm that applies TDA directly to multi-class classification problems, without any further ML stage, showing advantages for imbalanced datasets. The proposed algorithm builds a filtered simplicial complex on the dataset. Persistent Homology (PH) is applied to guide the selection of a sub-complex where unlabeled points obtain the label with the majority of votes from labeled neighboring points. We select 8 datasets with different dimensions, degrees of class overlap and imbalanced samples per class. On average, the proposed TDABC method was better than KNN and weighted-KNN. It behaves competitively with Local SVM and Random Forest baseline classifiers in balanced datasets, and it outperforms all baseline methods classifying entangled and minority classes.

Node classification · Node · Graph · Annotation · Label propagation

October 8, 2021

Topology-Imbalance Learning for Semi-Supervised Node Classification

Deli Chen,Yankai Lin,Guangxiang Zhao,Xuancheng Ren,Peng Li,Jie Zhou,Xu Sun

from arxiv, Accepted By NeurIPS 2021

The class imbalance problem, as an important issue in learning node representations, has drawn increasing attention from the community. Although the imbalance considered by existing studies stems from the unequal quantity of labeled examples in different classes (quantity imbalance), we argue that graph data expose a unique source of imbalance from the asymmetric topological properties of the labeled nodes, i.e., labeled nodes are not equal in terms of their structural role in the graph (topology imbalance). In this work, we first probe the previously unknown topology-imbalance issue, including its characteristics, causes, and threats to semi-supervised node classification learning. We then provide a unified view to jointly analyze the quantity- and topology-imbalance issues by considering the node influence shift phenomenon with the Label Propagation algorithm. In light of our analysis, we devise an influence-conflict-detection-based metric, Totoro, to measure the degree of graph topology imbalance and propose a model-agnostic method, ReNode, to address the topology-imbalance issue by re-weighting the influence of labeled nodes adaptively based on their relative positions to class boundaries. Systematic experiments demonstrate the effectiveness and generalizability of our method in relieving the topology-imbalance issue and promoting semi-supervised node classification. Further analysis unveils the varied sensitivity of different graph neural networks (GNNs) to topology imbalance, which may serve as a new perspective in evaluating GNN architectures.

Extensibility · Learning · Noise distribution · Networking · Representation learning

July 25, 2021

Image Manipulation Detection by Multi-View Multi-Scale Supervision

Xinru Chen,Chengbo Dong,Jiaqi Ji,Juan Cao,Xirong Li

from arxiv, Accepted by ICCV 2021

The key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data, whilst specific to prevent false alarms on authentic images. Current research emphasizes the sensitivity, with the specificity overlooked. In this paper we address both aspects by multi-view feature learning and multi-scale supervision. By exploiting noise distribution and boundary artifact surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images which are nontrivial to be taken into account by current semantic segmentation network based methods. Our thoughts are realized by a new network which we term MVSS-Net. Extensive experiments on five benchmark sets justify the viability of MVSS-Net for both pixel-level and image-level manipulation detection.

Networking · Graph · Performer · Network embedding · Extensibility

June 5, 2021

ImGAGN: Imbalanced Network Embedding via Generative Adversarial Graph Networks

Liang Qu,Huaisheng Zhu,Ruiqi Zheng,Yuhui Shi,Hongzhi Yin

from arxiv, to be published in KDD'2021

Imbalanced classification on graphs is ubiquitous yet challenging in many real-world applications, such as fraudulent node detection. Recently, graph neural networks (GNNs) have shown promising performance on many network analysis tasks. However, most existing GNNs have almost exclusively focused on balanced networks, and would perform poorly on imbalanced networks. To bridge this gap, in this paper, we present a generative adversarial graph network model, called ImGAGN, to address the imbalanced classification problem on graphs. It introduces a novel generator for graph structure data, named GraphGenerator, which can simulate both the minority class nodes' attribute distribution and network topological structure distribution by generating a set of synthetic minority nodes such that the number of nodes in different classes can be balanced. Then a graph convolutional network (GCN) discriminator is trained to discriminate between real nodes and fake (i.e., generated) nodes, and also between minority nodes and majority nodes on the synthetic balanced network. To validate the effectiveness of the proposed method, extensive experiments are conducted on four real-world imbalanced network datasets. Experimental results demonstrate that the proposed method ImGAGN outperforms state-of-the-art algorithms for the semi-supervised imbalanced node classification task.

contrastive · Graph · Contrastive learning · Performer · Learning

February 15, 2021

Graph Contrastive Learning with Adaptive Augmentation

Yanqiao Zhu,Yichen Xu,Feng Yu,Qiang Liu,Shu Wu,Liang Wang

from arxiv, Accepted to WWW 2021, authors' version. 12 pages, 3 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:2006.04131

Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes -- a crucial component in CL -- remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.

Loss function (machine learning) · Learning to learn · Learning · entity · Functional

September 9, 2019

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Jiawei Wu,Wenhan Xiong,William Yang Wang

from arxiv, 11 pages, 5 figures, accepted to EMNLP 2019

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.
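The contrast between a fixed prediction policy and label-specific prediction policies can be illustrated with a small sketch. The threshold values below are hand-picked and hypothetical; in the paper they would be produced by the meta-learner, which is not reproduced here.

```python
def predict_fixed(probs, threshold=0.5):
    """Fixed policy: the same threshold is applied to every label."""
    return [int(p >= threshold) for p in probs]

def predict_per_label(probs, thresholds):
    """Label-specific policy: each label has its own decision threshold."""
    if len(probs) != len(thresholds):
        raise ValueError("one threshold is required per label")
    return [int(p >= t) for p, t in zip(probs, thresholds)]

if __name__ == "__main__":
    # Hypothetical classifier scores for three labels of one example.
    probs = [0.62, 0.41, 0.08]
    # Hypothetical per-label thresholds (assumed, not learned here).
    thresholds = [0.70, 0.35, 0.05]
    print(predict_fixed(probs))                   # [1, 0, 0]
    print(predict_per_label(probs, thresholds))   # [0, 1, 1]
```

With the same scores, the two policies can produce different label sets, which is why the choice of prediction policy matters for rare or hard labels.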

Cluster assumption · Learning · Dataset · INFORMS · Performer

March 24, 2019

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Maria Perez-Ortiz,Peter Tino,Rafal Mantiuk,Cesar Hervas-Martinez

from arxiv, Published in the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or through the data distribution. In the latter case, the main challenge is to estimate the label associated with new synthetic patterns. This paper studies the effect of generating synthetic data by convex combination of patterns and the use of these as unsupervised information in a semi-supervised learning framework with support vector machines, thus avoiding the need to label synthetic examples. We perform experiments on a total of 53 binary classification datasets. Our results show that this type of data over-sampling supports the well-known cluster assumption in semi-supervised learning, showing outstanding results for small high-dimensional datasets and imbalanced learning problems.
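A convex combination of two patterns can be sketched in a few lines. This is an illustration under assumptions: pairs are drawn uniformly at random and the mixing coefficient is uniform on [0, 1], which may differ from the pairing and weighting used in the paper. The synthetic points are returned without labels, matching their role as unlabeled data; the semi-supervised SVM itself is not shown.

```python
import random

def convex_combination(x_a, x_b, lam):
    """Return lam * x_a + (1 - lam) * x_b for a coefficient lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]

def synthesize_unlabeled(patterns, n_new, rng):
    """Generate n_new unlabeled synthetic patterns from random pairs.

    Assumption: pairs are sampled uniformly without regard to class labels.
    """
    synthetic = []
    for _ in range(n_new):
        x_a, x_b = rng.sample(patterns, 2)
        synthetic.append(convex_combination(x_a, x_b, rng.random()))
    return synthetic

if __name__ == "__main__":
    data = [[0.0, 0.0], [2.0, 4.0], [1.0, -1.0]]
    for point in synthesize_unlabeled(data, 3, random.Random(0)):
        print(point)
```

Each synthetic point lies on the line segment between the two patterns it was built from, so it stays inside the convex hull of the original data.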

Network embedding · Networking · Reducible · DC · Link prediction

May 7, 2018

Billion-scale Network Embedding with Iterative Random Projection

Ziwei Zhang,Peng Cui,Haoyang Li,Xiao Wang,Wenwu Zhu

from arxiv, 7 pages, 5 figures

Network embedding has attracted considerable research attention recently. However, the existing methods are incapable of handling billion-scale networks, because they are computationally expensive and, at the same time, difficult to be accelerated by distributed computing schemes. To address these problems, we propose RandNE, a novel and simple billion-scale network embedding method. Specifically, we propose a Gaussian random projection approach to map the network into a low-dimensional embedding space while preserving the high-order proximities between nodes. To reduce the time complexity, we design an iterative projection procedure to avoid the explicit calculation of the high-order proximities. Theoretical analysis shows that our method is extremely efficient, and friendly to distributed computing schemes without any communication cost in the calculation. We demonstrate the efficacy of RandNE over state-of-the-art methods in network reconstruction and link prediction tasks on multiple datasets with different scales, ranging from thousands to billions of nodes and edges.
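The idea of an iterative projection can be sketched as follows, as one plausible reading of the abstract rather than the authors' implementation. A Gaussian random matrix U_0 is repeatedly multiplied by the adjacency matrix A, and the embedding is a weighted sum of the intermediate products, so no power of A is ever formed explicitly. The function names, the adjacency-list representation, and the weights are assumptions made for illustration; any further steps used in RandNE itself are omitted.

```python
import random

def multiply_adjacency(adj_list, U):
    """Compute A @ U where A is given as adjacency lists (unweighted graph)."""
    dim = len(U[0])
    out = []
    for neighbors in adj_list:
        row = [0.0] * dim
        for j in neighbors:
            for k in range(dim):
                row[k] += U[j][k]
        out.append(row)
    return out

def iterative_random_projection(adj_list, dim, weights, rng):
    """Return sum_i weights[i] * A^i @ U_0 without forming A^i explicitly.

    weights[0] scales U_0 itself; weights are hypothetical hyperparameters.
    """
    n = len(adj_list)
    std = (1.0 / dim) ** 0.5
    # U_0: Gaussian random projection matrix with entries ~ N(0, 1/dim).
    U = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n)]
    emb = [[weights[0] * v for v in row] for row in U]
    for w in weights[1:]:
        # One sparse multiplication per order: U_i = A @ U_{i-1}.
        U = multiply_adjacency(adj_list, U)
        for r in range(n):
            for k in range(dim):
                emb[r][k] += w * U[r][k]
    return emb

if __name__ == "__main__":
    # Small undirected path-like graph: 1 - 0 - 2 - 3.
    graph = [[1, 2], [0], [0, 3], [2]]
    emb = iterative_random_projection(graph, 4, [1.0, 0.5, 0.25], random.Random(0))
    for row in emb:
        print([round(v, 3) for v in row])
```

Each order costs one pass over the edges, so the total work grows with the number of edges times the embedding dimension times the number of orders, rather than with the cost of computing high-order proximity matrices.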