
The decomposition of sounds into sines, transients, and noise is a long-standing research problem in audio processing. The current solutions for this three-way separation detect either horizontal and vertical structures or anisotropy and orientations in the spectrogram to identify the properties of each spectral bin and classify it as sinusoidal, transient, or noise. This paper proposes an enhanced three-way decomposition method based on fuzzy logic, enabling soft masking while preserving the perfect reconstruction property. The proposed method allows each spectral bin to simultaneously belong to two classes, sine and noise or transient and noise. Results of a subjective listening test against three other techniques are reported, showing that the proposed decomposition yields a better or comparable quality. The main improvement appears in transient separation, which enjoys little or no loss of energy or leakage from the other components and performs well for test signals presenting strong transients. The audio quality of the separation is shown to depend on the complexity of the input signal for all tested methods. The proposed method helps improve the quality of various audio processing applications. A successful implementation over a state-of-the-art time-scale modification method is reported as an example.
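The claim that soft masking can coexist with perfect reconstruction can be illustrated with a minimal Python sketch. The membership rule below (a single `tonalness` score per bin, with the noise share peaking when the score is ambiguous) is an assumption made for illustration and is not the membership functions of the proposed method; the names `soft_masks` and `decompose` are hypothetical. The sketch shows only that when the three masks are non-negative and sum to one in every bin, the components add back to the input, and that each bin is shared between at most two classes (sine and noise, or transient and noise).

```python
def soft_masks(tonalness):
    """Return (sine, transient, noise) masks for one spectral bin.

    `tonalness` is an assumed score in [0, 1]: 1 means clearly sinusoidal,
    0 means clearly transient, 0.5 means ambiguous (treated as noise).
    The three masks are non-negative and always sum to 1.
    """
    if not 0.0 <= tonalness <= 1.0:
        raise ValueError("tonalness must lie in [0, 1]")
    noise = 1.0 - abs(2.0 * tonalness - 1.0)  # largest when the bin is ambiguous
    rest = 1.0 - noise                        # share left for sine or transient
    if tonalness >= 0.5:
        return rest, 0.0, noise               # bin shared by sine and noise
    return 0.0, rest, noise                   # bin shared by transient and noise


def decompose(bins, tonalness):
    """Split complex spectral bins into sine, transient, and noise parts."""
    sines, transients, noise = [], [], []
    for x, r in zip(bins, tonalness):
        m_s, m_t, m_n = soft_masks(r)
        sines.append(m_s * x)
        transients.append(m_t * x)
        noise.append(m_n * x)
    return sines, transients, noise


if __name__ == "__main__":
    spectrum = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
    scores = [0.9, 0.5, 0.1]
    s, t, n = decompose(spectrum, scores)
    # Summing the three components recovers each input bin.
    print([a + b + c for a, b, c in zip(s, t, n)])
```

Because the masks are applied bin by bin and sum to one, any invertible time-frequency transform applied to the three outputs gives signals that add up to the original.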

Related content


Approximation · Estimation/Estimator · Invariant · INFORMS · Discretization

December 9, 2022

A general framework for the rigorous computation of invariant densities and the coarse-fine strategy

Stefano Galatolo, Maurizio Monge, Isaia Nisoli, Federico Poloni

In this paper we present a general, axiomatic framework for the rigorous approximation of invariant densities and other important statistical features of dynamics. We approximate the system through a finite element reduction, by composing the associated transfer operator with a suitable finite-dimensional projection (a discretization scheme) as in the well-known Ulam method. We introduce a general framework based on a list of properties (of the system and of the projection) that need to be verified so that we can take advantage of a so-called "coarse-fine" strategy. This strategy is a novel method in which we exploit information coming from a coarser approximation of the system to get useful information on a finer approximation, speeding up the computation. This coarse-fine strategy allows a precise estimation of invariant densities and also allows us to rigorously estimate the speed of mixing of the system from the speed of mixing of a coarse approximation of it, which can easily be estimated by computer. The estimates obtained here are rigorous, i.e., they come with exact error bounds that are guaranteed to hold and take into account both the discretization and the approximations induced by finite-precision arithmetic. We apply this framework to several discretization schemes and examples of invariant density computation from previous works, obtaining a remarkable reduction in computation time. We have implemented the numerical methods described here in the Julia programming language, and released our implementation publicly as a Julia package.
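As a rough illustration of the Ulam-type discretization mentioned above, the following Python sketch approximates the transfer operator of a map on [0, 1] by a transition matrix between equal cells and then iterates it to approximate the invariant density. This is a non-rigorous toy: it uses sample points and ordinary floating-point arithmetic, has no error bounds, and does not implement the coarse-fine strategy or the authors' Julia package. The function names and the choice of the doubling map as a test case are assumptions made for the example.

```python
def ulam_matrix(T, n, samples=100):
    """Estimate the n-by-n Ulam transition matrix of a map T on [0, 1).

    Entry P[i][j] approximates the fraction of cell i that T maps into
    cell j, using `samples` evenly spaced points per cell.
    """
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        for s in range(samples):
            x = (i + (s + 0.5) / samples) / n   # sample point inside cell i
            j = min(int(T(x) * n), n - 1)       # index of the cell containing T(x)
            counts[i][j] += 1
    return [[c / samples for c in row] for row in counts]


def invariant_density(P, iterations=200):
    """Push a mass vector through P repeatedly and return cell densities.

    Plain power iteration; it converges only when the discretized chain
    mixes, which is assumed here rather than checked.
    """
    n = len(P)
    mass = [1.0] + [0.0] * (n - 1)              # all mass starts in cell 0
    for _ in range(iterations):
        mass = [sum(mass[i] * P[i][j] for i in range(n)) for j in range(n)]
    return [m * n for m in mass]                # mass per cell -> density


def doubling_map(x):
    """T(x) = 2x mod 1, whose invariant density is the constant 1."""
    return (2.0 * x) % 1.0


if __name__ == "__main__":
    P = ulam_matrix(doubling_map, 16)
    print(invariant_density(P))
```

For the doubling map the computed densities should be close to 1 in every cell, matching the known uniform invariant density; for other maps the result is only a heuristic approximation.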

Learning · Scene · Error function · Deep learning · Conformer

December 9, 2022

Localizing the conceptual difference of two scenes using deep learning for house keeping usages

Ali Atghaei, Ehsan Rahnama, Kiavash Azimi

Finding the conceptual difference between two images in an industrial environment is especially important for HSE purposes, and there is still no reliable and conformable method for finding the major differences and alerting the relevant controllers. Because of the abundance and variety of objects in different environments, supervised learning methods face a major problem in this field. Because lighting conditions can change sharply, or even slightly, between the two scenes, it is not possible to naively subtract the two images to find these differences. The goal of this paper is to find and localize the conceptual differences between two frames of one scene taken at two different times, and to classify the differences as addition, reduction, or change in the field. In this paper, we demonstrate a comprehensive solution for this application by presenting a deep learning method that uses transfer learning and a structural modification of the error function, as well as a process for adding and synthesizing data. An appropriate data set was provided and labeled, the model results were evaluated on this data set, and the possibility of using it in real and industrial applications was explained.

Outlier · Anomaly detection · Tempering · state-of-the-art · Score

December 8, 2022

AIDA: Analytic Isolation and Distance-based Anomaly Detection Algorithm

Luis Antonio Souto Arias, Cornelis W. Oosterlee, Pasquale Cirillo

We combine the metrics of distance and isolation to develop the Analytic Isolation and Distance-based Anomaly (AIDA) detection algorithm. AIDA is the first distance-based method that does not rely on the concept of nearest-neighbours, making it a parameter-free model. Unlike the prevailing literature, in which the isolation metric is always computed via simulations, we show that AIDA admits an analytical expression for the outlier score, providing new insights into the isolation metric. Additionally, we present an anomaly explanation method based on AIDA, the Tempered Isolation-based eXplanation (TIX) algorithm, which finds the most relevant outlier features even in data sets with hundreds of dimensions. We test both algorithms on synthetic and empirical data: we show that AIDA is competitive with other state-of-the-art methods and is superior at finding outliers hidden in multidimensional feature subspaces. Finally, we illustrate how the TIX algorithm is able to find outliers in multidimensional feature subspaces, and use these explanations to analyze common benchmarks used in anomaly detection.

Cluster · Training data · Shapley value · Paper · MoDELS

December 8, 2022

Shapley values for cluster importance: How clusters of the training data affect a prediction

Andreas Brandsæter, Ingrid K. Glad

This paper proposes a novel approach to explain the predictions made by data-driven methods. Since such predictions rely heavily on the data used for training, explanations that convey information about how the training data affects the predictions are useful. The paper proposes a novel approach to quantify how different data-clusters of the training data affect a prediction. The quantification is based on Shapley values, a concept which originates from coalitional game theory, developed to fairly distribute the payout among a set of cooperating players. A player's Shapley value is a measure of that player's contribution. Shapley values are often used to quantify feature importance, i.e., how features affect a prediction. This paper extends this to cluster importance, letting clusters of the training data act as players in a game where the predictions are the payouts. The novel methodology proposed in this paper lets us explore and investigate how different clusters of the training data affect the predictions made by any black-box model, allowing new aspects of the reasoning and inner workings of a prediction model to be conveyed to the users. The methodology is fundamentally different from existing explanation methods, providing insight which would not be available otherwise, and should complement existing explanation methods, including explanations based on feature importance.
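The game described above, with clusters as players and the prediction as the payout, can be sketched with the standard exact Shapley formula. The Python example below is a toy under stated assumptions: the "model" is simply the mean of the training targets in the chosen clusters, the payout of the empty coalition is set to 0, and the names `shapley_values` and `mean_prediction` are hypothetical. It is not the authors' implementation, and exact enumeration is only practical for a handful of clusters.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a coalitional game.

    `players` is a list of hashable player names and `value` maps a
    frozenset of players to a payout. The cost grows exponentially with
    the number of players.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def mean_prediction(coalition, clusters):
    """Toy black-box model: predict the mean target of the chosen clusters.

    Returning 0.0 for the empty coalition is an assumption of this sketch.
    """
    targets = [y for name in coalition for y in clusters[name]]
    return sum(targets) / len(targets) if targets else 0.0


if __name__ == "__main__":
    # Hypothetical training targets grouped into three clusters.
    clusters = {"A": [1.0, 1.0], "B": [3.0], "C": [3.0]}
    phi = shapley_values(list(clusters), lambda s: mean_prediction(s, clusters))
    print(phi)
```

By the efficiency property, the values sum to the payout of the full coalition minus that of the empty one, so the prediction made with all clusters is fully attributed across them.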

Attention · ForCES · Gated recurrent unit · INFORMS · DeepFakes

December 7, 2022

Face Forgery Detection Based on Facial Region Displacement Trajectory Series

YuYang Sun, ZhiYong Zhang, Isao Echizen, Huy H. Nguyen, ChangZhen Qiu, Lu Sun

Deep-learning-based technologies such as deepfakes have been attracting widespread attention in both society and academia, particularly those used to synthesize forged face images. These automatic and professional-skill-free face manipulation technologies can be used to replace the face in an original image or video with any target object while maintaining the expression and demeanor. Since human faces are closely related to identity characteristics, maliciously disseminated identity-manipulated videos could trigger a crisis of public trust in the media and could even have serious political, social, and legal implications. To effectively detect manipulated videos, we focus on the position offset in the face blending process, resulting from the forced affine transformation of the normalized forged face. We introduce a method for detecting manipulated videos that is based on the trajectory of the facial region displacement. Specifically, we develop a virtual-anchor-based method for extracting the facial trajectory, which can robustly represent displacement information. This information was used to construct a network for exposing multidimensional artifacts in the trajectory sequences of manipulated videos that is based on dual-stream spatial-temporal graph attention and a gated recurrent unit backbone. Testing of our method on various manipulation datasets demonstrated that its accuracy and generalization ability are competitive with those of the leading detection methods.

SOFT · Global optimization · Cluster · Optimizer · Lecture notes

December 7, 2022

On the Global Solution of Soft k-Means

Feiping Nie, Hong Chen, Rong Wang, Xuelong Li

This paper presents an algorithm to solve the Soft k-Means problem globally. Unlike Fuzzy c-Means, Soft k-Means (SkM) has a matrix factorization-type objective and has been shown to have a close relation with the popular probability decomposition-type clustering methods, e.g., Left Stochastic Clustering (LSC). Though some work has been done on solving the Soft k-Means problem, existing approaches usually use an alternating minimization scheme or the projected gradient descent method, which cannot guarantee global optimality because of the non-convexity of SkM. In this paper, we present a sufficient condition for a feasible solution of the Soft k-Means problem to be globally optimal and show that the output of the proposed algorithm satisfies it. Moreover, for the Soft k-Means problem, we provide interesting discussions on stability, solution non-uniqueness, and the connection with LSC. Then, a new model, named Minimal Volume Soft k-Means (MVSkM), is proposed to address the solution non-uniqueness issue. Finally, experimental results support our theoretical results.

Learning · Integration · Analysis · Proportional · Episode

December 7, 2022

Policy Transfer via Enhanced Action Space

Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, Tianjiang Hu

from arxiv, 14 pages

Though transfer learning is promising for increasing learning efficiency, existing methods still face challenges from long-horizon tasks, especially when expert policies are sub-optimal and only partially useful. Hence, a novel algorithm named EASpace (Enhanced Action Space) is proposed in this paper to transfer the knowledge of multiple sub-optimal expert policies. EASpace formulates each expert policy into multiple macro actions with different execution time periods, then integrates all macro actions into the primitive action space directly. Through this formulation, EASpace can learn when to execute which expert policy and for how long. An intra-macro-action learning rule is proposed by adjusting the temporal difference target of macro actions to improve the data efficiency and alleviate the non-stationarity issue in multi-agent settings. Furthermore, an additional reward proportional to the execution time of macro actions is introduced to encourage environment exploration via macro actions, which is significant for learning a long-horizon task. Theoretical analysis is presented to show the convergence of the proposed algorithm. The efficiency of the proposed algorithm is illustrated by a grid-based game and a multi-agent pursuit problem. The proposed algorithm is also implemented on real physical systems to demonstrate its effectiveness.

Distillation · MoDELS · Learning · Student-Teacher · Vision

April 13, 2020

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks

Lin Wang, Kuk-Jin Yoon

from arxiv, 30 pages, paper in submission

Deep neural models in recent years have been successful in almost every field, including extremely complex problem statements. However, these models are huge in size, with millions (and even billions) of parameters, thus demanding heavy computation power and failing to be deployed on edge devices. Besides, the performance boost is highly dependent on redundant labeled data. To achieve faster speeds and to handle the problems caused by the lack of data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called 'Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper is about KD and S-T learning, which have been actively studied in recent years. First, we aim to provide explanations of what KD is and how/why it works. Then, we provide a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically for vision tasks. In general, we consider some fundamental questions that have been driving this research area and thoroughly summarize the research progress and technical details. Additionally, we systematically analyze the research status of KD in vision applications. Finally, we discuss the potential and open challenges of existing methods and outline future directions of KD and S-T learning.
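To make the Student-Teacher idea concrete, here is a minimal Python sketch of one common form of the distillation objective: the student matches the teacher's temperature-softened class probabilities (a KL term scaled by the squared temperature) and is also trained on the true label with cross-entropy. This is the classic soft-target formulation and only one of the many variants such a survey covers; the function names, the default `temperature` and `alpha`, and the example logits are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and hard-label cross-entropy.

    alpha weights the teacher-matching term; (1 - alpha) weights the
    cross-entropy with the true class index `label`.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0.0)
    # Cross-entropy of the student's unsoftened prediction with the label.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # hypothetical teacher logits
    student = [1.0, 0.8, -0.2]   # hypothetical student logits
    print(distillation_loss(student, teacher, label=0))
```

A higher temperature flattens both distributions so that the teacher's relative preferences among wrong classes carry more weight; the squared-temperature factor keeps the gradient scale of the soft term comparable across temperatures.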

Object detection · Dataset · Learning · Data-driven methods · Diversity

September 22, 2019

Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark

Ke Li, Gang Wan, Gong Cheng, Liqiu Meng, Junwei Han

Substantial efforts have recently been devoted to presenting various methods for object detection in optical remote sensing images. However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most of the existing datasets have some shortcomings, for example, the numbers of images and object categories are small, and the image diversity and variations are insufficient. These limitations greatly affect the development of deep learning based object detection methods. In the paper, we provide a comprehensive review of the recent deep learning based object detection progress in both the computer vision and earth observation communities. Then, we propose a large-scale, publicly available benchmark for object DetectIon in Optical Remote sensing images, which we name DIOR. The dataset contains 23463 images and 192472 instances, covering 20 object classes. The proposed DIOR dataset 1) is large-scale in terms of object categories, object instance number, and total image number; 2) has a large range of object size variations, not only in terms of spatial resolutions, but also in terms of inter- and intra-class size variability across objects; 3) holds big variations, as the images are obtained under different imaging conditions, weather, seasons, and image quality; and 4) has high inter-class similarity and intra-class diversity. The proposed benchmark can help researchers develop and validate their data-driven methods. Finally, we evaluate several state-of-the-art approaches on our DIOR dataset to establish a baseline for future research.

VGG · Similarity · Nuance · SimPLe · Reviewer

January 11, 2018

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang

from arxiv, Code and data available at //www.github.com/richzhang/PerceptualSimilarity

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.