March 20, 2023

How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer

Jakob Heiss,Josef Teichmann,Hanna Wutte

from arxiv, 16 pages + appendix

Randomized neural networks (randomized NNs), where only the terminal layer's weights are optimized, constitute a powerful model class to reduce computational time in training the neural network model. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a generalized additive model (GAM)-typed regression in which infinitely many directions are considered: the infinite generalized additive model (IGAM). The IGAM is formalized as the solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work is an extension to multivariate NNs of prior work, where we showed how wide RSNs with ReLU activation behave like spline regression under certain conditions when the input is one-dimensional.
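To make the model class concrete, here is a minimal sketch of a randomized shallow ReLU network in which the first layer is drawn once and only the terminal-layer weights are trained. Everything in it is an illustrative assumption rather than the authors' setup: the function names, the Gaussian initialization, the toy target, the explicit ridge penalty (standing in for the paper's regularization functional), and plain gradient descent as the optimizer.

```python
import random


def make_random_relu_features(n_hidden, dim, seed=0):
    """Draw first-layer weights and biases once; they are never trained."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_hidden)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]

    def features(x):
        # ReLU applied to each random affine projection of the input x.
        return [max(0.0, sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, biases)]

    return features


def objective(out, phi, ys, ridge):
    """Mean squared error plus a ridge penalty on the terminal-layer weights."""
    n = len(ys)
    mse = sum((sum(o * p for o, p in zip(out, row)) - y) ** 2
              for row, y in zip(phi, ys)) / n
    return mse + ridge * sum(o * o for o in out)


def fit_terminal_layer(phi, ys, ridge=1e-3, steps=200):
    """Gradient descent on the terminal-layer weights only."""
    n, n_hidden = len(phi), len(phi[0])
    # Trace bound on the largest Hessian eigenvalue; a step of 1/bound
    # cannot increase this convex quadratic objective.
    bound = 2.0 * sum(p * p for row in phi for p in row) / n + 2.0 * ridge
    lr = 1.0 / bound
    out = [0.0] * n_hidden
    for _ in range(steps):
        residuals = [sum(o * p for o, p in zip(out, row)) - y
                     for row, y in zip(phi, ys)]
        grad = [2.0 * sum(r * row[j] for r, row in zip(residuals, phi)) / n
                + 2.0 * ridge * out[j] for j in range(n_hidden)]
        out = [o - lr * g for o, g in zip(out, grad)]
    return out


def demo(seed=1):
    """Return the objective before and after training on a toy 2-D problem."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(40)]
    ys = [abs(x[0]) + 0.5 * x[1] for x in xs]  # toy target, chosen arbitrarily
    features = make_random_relu_features(n_hidden=30, dim=2)
    phi = [features(x) for x in xs]
    out = fit_terminal_layer(phi, ys)
    return objective([0.0] * 30, phi, ys, 1e-3), objective(out, phi, ys, 1e-3)
```

Because the hidden layer is frozen, the fit reduces to a convex problem in the terminal-layer weights, which is where the reduction in training cost mentioned in the abstract comes from.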

Related content

Functional · Performer · Estimation/Estimator · Marginalization · Serialization

May 12, 2023

Nonparametric data segmentation in multivariate time series via joint characteristic functions

Euan T. McGonigle,Haeran Cho

Modern time series data often exhibit complex dependence and structural changes which are not easily characterised by shifts in the mean or model parameters. We propose a nonparametric data segmentation methodology for multivariate time series termed NP-MOJO. By considering joint characteristic functions between the time series and its lagged values, NP-MOJO is able to detect change points in the marginal distribution, but also those in possibly non-linear serial dependence, all without the need to pre-specify the type of changes. We show the theoretical consistency of NP-MOJO in estimating the total number and the locations of the change points, and demonstrate the good performance of NP-MOJO against a variety of change point scenarios. We further demonstrate its usefulness in applications to seismology and economic time series.

Optimizer · Same · Performer · Estimation/Estimator · Facebook AI Research

May 12, 2023

On the Fair Comparison of Optimization Algorithms in Different Machines

Etor Arza,Josu Ceberio,Ekhiñe Irurozki,Aritz Pérez

An experimental comparison of two or more optimization algorithms requires the same computational resources to be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to be executed on the same machine if they are to use the same resources. Unfortunately, the implementation code of the algorithms is not always available, which means that running the algorithms to be compared on the same machine is not always possible. And even if they are available, some optimization algorithms might be costly to run, such as training large neural networks in the cloud. In this paper, we consider the following problem: how do we compare the performance of a new optimization algorithm B with a known algorithm A in the literature if we only have the results (the objective values) and the runtime in each instance of algorithm A? Particularly, we present a methodology that enables a statistical analysis of the performance of algorithms executed on different machines. The proposed methodology has two parts. Firstly, we propose a model that, given the runtime of an algorithm on a machine, estimates the runtime of the same algorithm on another machine. This model can be adjusted so that the probability of estimating a runtime longer than what it should be is arbitrarily low. Secondly, we introduce an adaptation of the one-sided sign test that uses a modified p-value and takes into account that probability. Such adaptation avoids increasing the probability of type I error associated with executing algorithms A and B on different machines.
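For orientation, the following is a minimal sketch of the standard one-sided sign test that the abstract's second part adapts. It is not the paper's modified p-value: the runtime-estimation model and its correction are omitted, and the function name and the convention that lower objective values are better are assumptions made for illustration.

```python
from math import comb


def sign_test_p_value(scores_a, scores_b):
    """Standard one-sided sign test on paired per-instance objective values.

    The alternative hypothesis is that algorithm B reaches lower (better)
    objective values than algorithm A. Tied instances are discarded.
    """
    wins_b = sum(1 for a, b in zip(scores_a, scores_b) if b < a)
    wins_a = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # only ties: no evidence either way
    # P(X >= wins_b) for X ~ Binomial(n, 1/2), the null distribution.
    return sum(comb(n, k) for k in range(wins_b, n + 1)) / 2 ** n
```

If B wins on all of five instances, the p-value is 1/32, so at least five untied instances are needed before the test can reject at the 0.05 level.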

Regularization term · Smoothing · Learning · CASES · Kernelization

May 12, 2023

Random Smoothing Regularization in Kernel Gradient Descent Learning

Liang Ding,Tianyang Hu,Jiahang Jiang,Donghao Li,Wenjia Wang,Yuan Yao

Random smoothing data augmentation is a unique form of regularization that can prevent overfitting by introducing noise to the input data, encouraging the model to learn more generalized features. Despite its success in various applications, there has been a lack of systematic study on the regularization ability of random smoothing. In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces. Specifically, we investigate two underlying function spaces: the Sobolev space of low intrinsic dimension, which includes the Sobolev space in $D$-dimensional Euclidean space or low-dimensional sub-manifolds as special cases, and the mixed smooth Sobolev space with a tensor structure. By using random smoothing regularization as novel convolution-based smoothing kernels, we can attain optimal convergence rates in these cases using a kernel gradient descent algorithm, either with early stopping or weight decay. It is noteworthy that our estimator can adapt to the structural assumptions of the underlying data and avoid the curse of dimensionality. This is achieved through various choices of injected noise distributions such as Gaussian, Laplace, or general polynomial noises, allowing for broad adaptation to the aforementioned structural assumptions of the underlying data. The convergence rate depends only on the effective dimension, which may be significantly smaller than the actual data dimension. We conduct numerical experiments on simulated data to validate our theoretical results.

Decomposed · Transformation · MoDELS · AIM · Nuance

May 11, 2023

Traceability and Reuse Mechanisms, the most important Properties of Model Transformation Languages

Stefan Höppner,Matthias Tichy

from arxiv, Submitted to EMSE as part of the Registered Reports track from ESEM 2022. arXiv admin note: text overlap with arXiv:2209.06570

Dedicated model transformation languages are claimed to provide many benefits over the use of general purpose languages for developing model transformations. However, the actual advantages associated with the use of MTLs are poorly understood empirically. There is little knowledge and empirical assessment about what advantages and disadvantages hold and where they originate from. In a prior interview study, we elicited expert opinions on what advantages result from what factors and a number of factors that moderate the influence. We aim to quantitatively assess the interview results to confirm or reject the effects posed by different factors. We intend to gain insights into how valuable different factors are so that future studies can draw on these data for designing targeted and relevant studies. We gather data on the factors and quality attributes using an online survey. To analyse the data, we use universal structure modelling based on a structure model. We use significance values and path coefficients produced by USM for each hypothesised interdependence to confirm or reject correlation and to weigh the strength of influence present. We analyzed 113 responses. The results show that the Tracing and Reuse Mechanisms are most important overall, though the observed effects were generally 10 times lower than anticipated. Additionally, we found that a more nuanced view of moderation effects is warranted. Their moderating influence differed significantly between the different influences, with the strongest effects being 1000 times higher than the weakest. The empirical assessment of MTLs is a complex topic that cannot be solved by looking at a single stand-alone factor. Our results provide a clear indication that evaluation should consider transformations of different sizes and use-cases. Language development should focus on providing transformation-specific reuse mechanisms.

APT · Robustness · Scenario · Extensibility · Pivotal (company)

May 10, 2023

Comparison of Check-All-That-Apply and Adapted-Pivot-Test methods for wine descriptive analyses with a panel of untrained students

Sylvain Nougarede,Alice Diot,Elie Maza,Alain Samson,Valérie Olivier-Salvagnac,Soline Caillé,Olivier Geffroy,Christian Chervin

The Check-All-That-Apply (CATA) method was compared to the Adapted-Pivot-Test (APT) method, a recently published method based on pair comparisons between a coded wine and a reference sample, called pivot, and using a set list of attributes as in CATA. Both methods were compared using identical wines, correspondence analyses and Chi-square test of independence, and very similar questionnaires. The list of attributes used for describing the wines was established in a prior analysis by a subset of the panel. The results showed that CATA was more robust and more descriptive than the APT with 50 to 60 panelists. The p-value of the Chi-square test of independence between wines and descriptors dropped below 0.05 around 50 panelists with the CATA method, whereas it never dropped below 0.8 with the APT. The discussion highlights differences in settings and logistics which render the CATA more robust and easier to run. One of the objectives was also to propose an easy setup for university and food industry laboratories. Practical applications: Our results describe a practical way of teaching and performing the CATA method with university students and online tools, as well as in extension courses. It should have applications with consumer studies for the characterization of various food products. Additionally, we provide an improved R script for correspondence analyses used in descriptive analyses and a Chi-square test to estimate the number of panelists leading to robust results. Finally, we give a set of data that could be useful for sensory and statistics teaching.
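The Chi-square test of independence used above can be sketched as follows for a wines-by-descriptors table of citation counts. This is only an illustration in Python and is not the authors' R script; the function name and the example tables are assumptions. The sketch stops at the test statistic and the degrees of freedom, since converting them to a p-value requires the chi-square distribution, which the Python standard library does not provide.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts, for example one row per
    wine and one column per descriptor. All row and column totals are
    assumed to be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

A table whose rows are proportional to each other, such as `[[10, 20], [20, 40]]`, gives a statistic of 0, meaning the descriptors do not discriminate between the wines at all.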

Loss function (machine learning) · Functional · Loss · Taxonomy · Machine Learning

January 13, 2023

A survey and taxonomy of loss functions in machine learning

Lorenzo Ciampiconi,Adam Elwood,Marco Leonardi,Ashraf Mohamed,Alessandro Rozza

Most state-of-the-art machine learning techniques revolve around the optimisation of loss functions. Defining appropriate loss functions is therefore critical to successfully solving problems in this field. We present a survey of the most commonly used loss functions for a wide range of different applications, divided into classification, regression, ranking, sample generation and energy based modelling. Overall, we introduce 33 different loss functions and we organise them into an intuitive taxonomy. Each loss function is given a theoretical backing and we describe where it is best used. This survey aims to provide a reference of the most essential loss functions for both beginner and advanced machine learning practitioners.
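As a small illustration of what such loss functions look like in code, the sketch below implements four textbook losses, two for regression and two for binary classification. The selection and the function names are assumptions made here; they are not claimed to match the survey's list of 33 or its taxonomy.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression loss: average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Regression loss: average absolute residual, less sensitive to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Classification loss for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def hinge_loss(y_true, scores):
    """Classification loss for labels in {-1, +1} and real-valued scores."""
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true, scores)) / len(y_true)
```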

Performer · Logistic Sigmoid · Neural Networks · Networking · Activation function

September 29, 2021

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning

Shiv Ram Dubey,Satish Kumar Singh,Bidyut Baran Chaudhuri

from arxiv, Submitted to Springer

Neural networks have shown tremendous growth in recent years to solve numerous problems. Various types of neural networks have been introduced to deal with different types of problems. However, the main goal of any neural network is to transform the non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and nonlinear functions. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish and Mish. In this paper, a comprehensive overview and survey is presented for AFs in neural networks for deep learning. Different classes of AFs such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and Learning based are covered. Several characteristics of AFs such as output range, monotonicity, and smoothness are also pointed out. A performance comparison is also performed among 18 state-of-the-art AFs with different networks on different types of data. The insights of AFs are presented to benefit the researchers for doing further research and practitioners to select among different choices. The code used for experimental comparison is released at: https://github.com/shivram1987/ActivationFunctions.
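The six activation functions named in the abstract can be written down in a few lines. The sketch below uses their commonly cited scalar definitions (with `alpha = 1.0` for ELU and no trainable parameter in Swish, both assumptions) and is independent of the repository linked above. It is not numerically hardened: `math.exp` overflows for inputs of very large magnitude.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: output in (0, 1), monotonic, smooth."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: output in (-1, 1), monotonic, smooth."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: output in [0, inf), not smooth at 0."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """Exponential linear unit: output in (-alpha, inf)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x):
    """Swish: x * sigmoid(x), non-monotonic for negative inputs."""
    return x * sigmoid(x)


def mish(x):
    """Mish: x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x))."""
    return x * math.tanh(math.log1p(math.exp(x)))
```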

Networking · Residual network · Scaling · Weight · Smoothing

May 25, 2021

Scaling Properties of Deep Residual Networks

Alain-Sam Cohen,Rama Cont,Alain Rossier,Renyuan Xu

from arxiv, Published at ICML 2021

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
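The perceived link can be stated in one line. In the notation assumed here for illustration (not taken from the paper), $h_k$ is the hidden state after block $k$ of an $L$-block ResNet, $f$ is the residual branch, and $\theta_k$ are its weights. If the weights sample a smooth function, $\theta_k \approx \theta(k/L)$, and the residual branch is scaled by $1/L$, the forward pass is an explicit Euler scheme for an ODE:

```latex
h_{k+1} = h_k + \frac{1}{L}\, f(h_k, \theta_k), \quad k = 0, \dots, L-1
\qquad \xrightarrow[\;L \to \infty\;]{} \qquad
\frac{\mathrm{d}h(t)}{\mathrm{d}t} = f\bigl(h(t), \theta(t)\bigr), \quad t \in [0, 1].
```

The smoothness of $\theta$ and the $1/L$ scaling are exactly the assumptions whose validity for trained weights the abstract calls into question.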

Estimation/Estimator · Estimation error · MoDELS · Learned · Unbiased

December 17, 2020

The Causal Learning of Retail Delinquency

Yiyan Huang,Cheuk Hang Leung,Xing Yan,Qi Wu,Nanbo Peng,Dongdong Wang,Zhixiang Huang

from arxiv, This paper was accepted and will be published in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

This paper focuses on the expected difference in a borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be very large. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

Graph · Processing (programming language) · Signal Processing · Fourier transform · Extensibility

September 23, 2019

Graph Signal Processing -- Part II: Processing and Analyzing Signals on Graphs

Ljubisa Stankovic,Danilo Mandic,Milos Dakovic,Milos Brajovic,Bruno Scalzo,Anthony G. Constantinides

from arxiv, 60 pages, 50 figures

The focus of Part I of this monograph has been on both the fundamental properties, graph topologies, and spectral representations of graphs. Part II embarks on these concepts to address the algorithmic and practical issues centered round data/signal processing on graphs, that is, the focus is on the analysis and estimation of both deterministic and random data on graphs. The fundamental ideas related to graph signals are introduced through a simple and intuitive, yet illustrative and general enough case study of multisensor temperature field estimation. The concept of systems on graph is defined using graph signal shift operators, which generalize the corresponding principles from traditional learning systems. At the core of the spectral domain representation of graph signals and systems is the Graph Discrete Fourier Transform (GDFT). The spectral domain representations are then used as the basis to introduce graph signal filtering concepts and address their design, including Chebyshev polynomial approximation series. Ideas related to the sampling of graph signals are presented and further linked with compressive sensing. Localized graph signal analysis in the joint vertex-spectral domain is referred to as the vertex-frequency analysis, since it can be considered as an extension of classical time-frequency analysis to the graph domain of a signal. Important topics related to the local graph Fourier transform (LGFT) are covered, together with its various forms including the graph spectral and vertex domain windows and the inversion conditions and relations. A link between the LGFT with spectral varying window and the spectral graph wavelet transform (SGWT) is also established. Realizations of the LGFT and SGWT using polynomial (Chebyshev) approximations of the spectral functions are further considered. Finally, energy versions of the vertex-frequency representations are introduced.
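For readers unfamiliar with the GDFT, its usual definition is sketched below. The symbols are assumptions chosen here for illustration: $\mathbf{x}$ is the vector of signal samples at the vertices, and the graph Laplacian $\mathbf{L}$ plays the role of the shift operator (the adjacency matrix is another common choice).

```latex
\mathbf{L} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^{-1}, \qquad
\mathbf{X} = \mathbf{U}^{-1} \mathbf{x} \quad \text{(GDFT)}, \qquad
\mathbf{x} = \mathbf{U} \mathbf{X} \quad \text{(inverse GDFT)}.
```

For an undirected graph $\mathbf{L}$ is symmetric, so $\mathbf{U}$ can be chosen orthogonal and $\mathbf{U}^{-1} = \mathbf{U}^{T}$. The eigenvalues on the diagonal of $\boldsymbol{\Lambda}$ then act as graph frequencies, and a spectral filter amounts to multiplying each component of $\mathbf{X}$ by a function of the corresponding eigenvalue.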