
The asymptotic mean squared test error and sensitivity of the Random Features Regression model (RFR) have recently been studied. We build on this work and identify in closed form the family of Activation Functions (AFs) that minimize a combination of the test error and sensitivity of the RFR under different notions of functional parsimony. We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. Finally, we show how using optimal AFs impacts well-established properties of the RFR model, such as its double descent curve and the dependency of its optimal regularization parameter on the observation noise level.
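To make the Hermite case concrete, here is a minimal Python sketch that evaluates an activation expanded in the probabilist's Hermite basis; the coefficients below are hypothetical placeholders, not the paper's closed-form optimum:

import numpy as np
from numpy.polynomial.hermite_e import hermeval  # probabilist's Hermite polynomials He_k

def hermite_activation(x, coeffs):
    # sigma(x) = sum_k coeffs[k] * He_k(x); coeffs are illustrative placeholders,
    # not the closed-form optimum derived in the paper.
    return hermeval(x, coeffs)

x = np.linspace(-3.0, 3.0, 7)
print(hermite_activation(x, coeffs=[0.0, 1.0, 0.0, 0.1]))  # He_1(x) + 0.1 * He_3(x)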

Related Content

In artificial neural networks, the activation function of a node defines the output of that node given an input or a set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that can be on (1) or off (0) depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using a small number of nodes, and such activation functions are called nonlinearities.
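A minimal Python sketch of this definition for a hypothetical two-input node; the step function mirrors the on/off integrated-circuit analogy, while ReLU is one common nonlinearity:

import numpy as np

def step(z):   # on (1) or off (0), as in the integrated-circuit analogy
    return np.where(z >= 0.0, 1.0, 0.0)

def relu(z):   # a common nonlinear activation
    return np.maximum(0.0, z)

# A node's output is the activation applied to the weighted sum of its inputs.
w, b = np.array([0.5, -0.3]), 0.1   # hypothetical weights and bias
x = np.array([1.0, 2.0])
print(step(w @ x + b), relu(w @ x + b))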

Robots can learn to imitate humans by inferring what the human is optimizing for. One common framework for this is Bayesian reward learning, where the robot treats the human's demonstrations and corrections as observations of their underlying reward function. Unfortunately, this inference is doubly intractable: the robot must reason over all the trajectories the person could have provided and all the rewards the person could have in mind. Prior work uses existing robotic tools to approximate this normalizer. In this paper, we group previous approaches into three fundamental classes and analyze the theoretical pros and cons of each. We then leverage recent research from the statistics community to introduce Double MH reward learning, a Monte Carlo method for asymptotically learning the human's reward in continuous spaces. We extend Double MH to conditionally independent settings (where each human correction is viewed as completely separate) and conditionally dependent environments (where the human's current correction may build on previous inputs). Across simulations and user studies, our proposed approach infers the human's reward parameters more accurately than the alternate approximations when learning from either demonstrations or corrections. See videos here: //youtu.be/EkmT3o5K5ko
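For background, a generic double Metropolis-Hastings (exchange-style) update looks roughly as follows; the function names, Gaussian proposal, and acceptance form are a textbook sketch, not the paper's exact algorithm, which further handles conditionally dependent corrections:

import numpy as np

def double_mh_step(theta, x_obs, log_unnorm_lik, sample_from_model, log_prior, prop_std=0.1):
    # One exchange-style double MH step (a generic sketch). log_unnorm_lik(x, theta)
    # is the unnormalized log-likelihood; sample_from_model(theta) draws an
    # auxiliary sample x' ~ p(.|theta'), whose terms cancel the intractable normalizer.
    theta_prop = theta + prop_std * np.random.randn(*np.shape(theta))
    x_aux = sample_from_model(theta_prop)          # auxiliary draw at the proposal
    log_ratio = (log_prior(theta_prop) - log_prior(theta)
                 + log_unnorm_lik(x_obs, theta_prop) - log_unnorm_lik(x_obs, theta)
                 + log_unnorm_lik(x_aux, theta) - log_unnorm_lik(x_aux, theta_prop))
    return theta_prop if np.log(np.random.rand()) < log_ratio else theta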

A framework for the analysis of synchronous grant-free massive multiple access schemes based on the irregular repetition slotted ALOHA (IRSA) protocol and operating over the Gaussian multiple access channel is presented. IRSA-based schemes are considered here as an instance of the class of unsourced slotted random access codes, operating over a frame partitioned in time slots, and are obtained by concatenation of a medium access control layer code over the entire frame and a physical layer code over each slot. In this framework, an asymptotic analysis is carried out in the presence of both collisions and slot decoding errors due to channel noise, which allows the derivation of density-evolution equations, asymptotic limits for the minimum packet loss probability and average load threshold, and a converse bound for threshold values. This analysis is exploited as a tool for the evaluation of performance limits in terms of the minimum signal-to-noise ratio required to achieve a given packet loss probability, and also provides convergence boundary limits that hold for any IRSA scheme with a given physical layer coding scheme. The tradeoff between energy efficiency and spectrum efficiency is numerically evaluated comparing some known coding options, including those achieving random coding bounds at slot level. It is shown that IRSA-based schemes have a convergence boundary limit within a few dB of the random coding bound when the number of active transmitters is sufficiently large.
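As background on the density-evolution part, the textbook recursion for IRSA over the pure collision channel (no slot decoding errors; the paper generalizes this to noisy slots) can be sketched in Python as:

import numpy as np

def de_iteration(lam_coeffs, G, iters=1000):
    # Density-evolution recursion for IRSA on the collision channel (textbook
    # sketch). lam_coeffs[d] = Lambda_d, the fraction of users sending d replicas;
    # G = channel load in users per slot.
    d = np.arange(len(lam_coeffs))
    mean_deg = np.sum(d * lam_coeffs)                       # Lambda'(1)
    def lam_edge(p):                                        # edge-perspective lambda(p)
        return np.sum(d * lam_coeffs * p ** np.clip(d - 1, 0, None)) / mean_deg
    p = 1.0
    for _ in range(iters):
        q = lam_edge(p)                                     # user-node update
        p = 1.0 - np.exp(-G * mean_deg * q)                 # Poisson slot-node update
    return p                                                # -> 0 below the load threshold

# Example: Lambda(x) = x^3 (every user sends 3 replicas) at load G = 0.8
print(de_iteration(np.array([0.0, 0.0, 0.0, 1.0]), G=0.8))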

While Convolutional Neural Networks (CNNs) have long been investigated, applied, and theorized about, we aim to provide a slightly different perspective on their nature -- through the lens of their Hessian maps. The reason is that the loss Hessian captures the pairwise interaction of parameters and therefore forms a natural ground to probe how the architectural aspects of a CNN are manifested in its structure and properties. We develop a framework relying on the Toeplitz representation of CNNs, and then utilize it to reveal the Hessian structure and, in particular, its rank. We prove tight upper bounds (with linear activations), which closely follow the empirical trend of the Hessian rank and hold in practice in more general settings. Overall, our work generalizes and establishes the key insight that, even in CNNs, the Hessian rank grows as the square root of the number of parameters.
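For context, the Toeplitz representation referred to is the standard fact that a 1-D "valid" convolution acts as multiplication by a banded Toeplitz matrix; a small self-contained Python check (sizes chosen arbitrarily):

import numpy as np

def conv_as_toeplitz(w, n):
    # Build the (n-k+1) x n banded Toeplitz matrix T such that T @ x equals the
    # 'valid' 1-D cross-correlation of x with kernel w of length k.
    k = len(w)
    T = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        T[i, i:i + k] = w
    return T

x = np.random.randn(8)
w = np.random.randn(3)
assert np.allclose(conv_as_toeplitz(w, 8) @ x, np.correlate(x, w, mode="valid"))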

Optimal control problems driven by evolutionary partial differential equations arise in many industrial applications, and their numerical solution is known to be a challenging problem. One approach to obtain an optimal feedback control is via the Dynamic Programming principle. Nevertheless, despite many theoretical results, this method has been applied only to very special cases since it suffers from the curse of dimensionality. Our goal is to mitigate this crucial obstruction by developing a new version of dynamic programming algorithms based on a tree structure and exploiting the compact representation of the dynamical systems based on tensor notation via a model reduction approach. Here, we want to show how this algorithm can be constructed for general nonlinear control problems and to illustrate its performance on a number of challenging numerical tests. Our numerical results indicate a large decrease in memory requirements, as well as computational time, for the proposed problems. Moreover, we prove the convergence of the algorithm and give some hints on its implementation.
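To convey only the idea of the tree structure, here is a toy Python recursion over a discrete control set; this sketch omits the pruning and the tensor-based model reduction that make the actual algorithm tractable, and the dynamics, cost, and control set are invented for illustration:

import numpy as np

def tree_dp_value(x0, f, cost, controls, depth, dt=0.1, gamma=0.99):
    # Grow a tree of trajectories by applying every control at every node,
    # then back up the discounted running cost (Bellman recursion on the tree).
    def value(x, d):
        if d == 0:
            return 0.0
        return min(cost(x, u) * dt + gamma * value(x + dt * f(x, u), d - 1)
                   for u in controls)
    return value(np.asarray(x0, dtype=float), depth)

# Example: scalar dynamics x' = x + u with quadratic running cost.
v = tree_dp_value([1.0], f=lambda x, u: x + u,
                  cost=lambda x, u: float(x @ x + u * u),
                  controls=[-2.0, 0.0, 2.0], depth=6)
print(v)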

Hybrid ensembles, an essential branch of ensemble learning, have flourished in the regression field, with studies confirming the importance of diversity. However, previous ensembles consider diversity only in the sub-model training stage, yielding limited improvement over single models. In contrast, this study automatically selects and weights sub-models from a heterogeneous model pool. It solves an optimization problem using an interior-point filter line-search algorithm. The objective function innovatively incorporates negative correlation learning as a penalty term, with which a diverse model subset can be selected. The best sub-models from each model class are selected to build the NCL ensemble, whose performance is better than that of the simple average and other state-of-the-art weighting methods. The NCL ensemble can be further improved with a regularization term in the objective function. In practice, it is difficult to determine the optimal sub-model for a dataset a priori due to model uncertainty. Regardless, our method achieves accuracy comparable to that of the potentially optimal sub-models. In conclusion, the value of this study lies in its ease of use and effectiveness, allowing the hybrid ensemble to embrace both diversity and accuracy.
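As a rough Python sketch of the selection-and-weighting step, the snippet below minimizes ensemble error minus a diversity reward over the weight simplex; the exact penalty form and the SLSQP solver are stand-ins here, since the paper's solver is an interior-point filter line-search method and its penalty may differ:

import numpy as np
from scipy.optimize import minimize

def ncl_ensemble_weights(preds, y, lam=0.5):
    # preds: (n_models, n_samples) validation predictions; y: targets.
    m = preds.shape[0]

    def objective(w):
        F = w @ preds                                   # weighted ensemble prediction
        mse = np.mean((F - y) ** 2)
        diversity = np.mean(w @ (preds - F) ** 2)       # spread around the ensemble
        return mse - lam * diversity                    # reward diverse subsets

    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(objective, np.full(m, 1.0 / m), bounds=[(0, 1)] * m,
                   constraints=cons, method="SLSQP")
    return res.x

# Toy usage with synthetic predictions from 5 hypothetical sub-models.
rng = np.random.default_rng(0)
y = rng.standard_normal(100)
preds = y + 0.3 * rng.standard_normal((5, 100))
print(np.round(ncl_ensemble_weights(preds, y), 3))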

We characterise the unbiasedness of the score function, viewed as an inference function for a class of finite mixture models. The models studied represent the situation where there is a stratification of the observations into a finite number of groups. We show that, under mild regularity conditions, the score function for estimating the parameters identifying each group's distribution is unbiased. We also show that if one introduces a mixture into the scenario described above, so that for some observations it is only known that they belong to some of the groups with a probability not in $\{ 0, 1 \}$, then the score function becomes biased. We argue then that, under further mild regularity, the maximum likelihood estimate is not consistent. The results above are extended to regular models containing arbitrary nuisance parameters, including semiparametric models.
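For reference, the classical identity that the mixture setting breaks is the unbiasedness of the score under standard regularity, obtained by differentiating $\int f(x;\theta)\,dx = 1$ under the integral sign:

\mathbb{E}_\theta\!\left[\frac{\partial}{\partial\theta}\log f(X;\theta)\right]
 = \int \frac{\partial_\theta f(x;\theta)}{f(x;\theta)}\, f(x;\theta)\, dx
 = \partial_\theta \int f(x;\theta)\, dx = \partial_\theta\, 1 = 0 .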

Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effects. We present a novel reformulation of the GLMM using a factor model decomposition of the random effects, enabling scalable computation of GLMMs in high dimensions by reducing the latent space from a large number of random effects to a smaller set of latent factors. We also extend our prior work to estimate model parameters using a modified Monte Carlo Expectation Conditional Minimization algorithm, allowing us to perform variable selection on both the fixed and random effects simultaneously. We show through simulation that, with this factor model decomposition, our method can fit high dimensional penalized GLMMs faster than comparable methods and more easily scale to larger dimensions not previously seen in existing approaches.
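In symbols (the notation $b_i$, $f_i$, $\Lambda$ is assumed here for illustration; the paper's exact parameterization may differ), the factor decomposition replaces an $r$-dimensional random effect with $q \ll r$ latent factors:

b_i = \Lambda f_i, \qquad f_i \sim \mathcal{N}(0, I_q), \qquad
\operatorname{Cov}(b_i) = \Lambda \Lambda^{\top}, \qquad
\Lambda \in \mathbb{R}^{r \times q},\ q \ll r,

so the latent dimension over which one must integrate, and hence the computational cost, scales with $q$ rather than $r$.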

The paper considers the distribution of a general linear combination of central and non-central chi-square random variables by exploring the branch cut regions that appear in the standard Laplace inversion process. Because of the original interest from directional statistics, the focus of this paper is on the density function of such distributions and not on their cumulative distribution function. In fact, our results confirm that the latter is a special case of the former. Our approach provides new insight by generating alternative characterizations of the probability density function in terms of a finite number of feasible univariate integrals. In particular, the central cases seem to allow an interesting representation in terms of the branch cuts, while general degrees of freedom and non-centrality can be easily accommodated using recursive differentiation. Numerical results confirm that the proposed approach works well while offering more transparency and therefore easier control over accuracy.
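Concretely, the transform being inverted is the characteristic function of $Q = \sum_j a_j \chi^2_{k_j}(\delta_j)$; the fractional powers below are the source of the branch cuts the paper explores:

\varphi_Q(t) = \prod_{j} \left(1 - 2 i a_j t\right)^{-k_j/2}
  \exp\!\left( \frac{i\, \delta_j a_j t}{1 - 2 i a_j t} \right),
\qquad
f_Q(q) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i t q}\, \varphi_Q(t)\, dt .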

Recurrent neural networks are a powerful means of coping with time series. We show how autoregressive linear, i.e., linearly activated, recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, and this is probably the main contribution of this article, the size of an LRNN can be reduced significantly in one step after inspecting the spectrum of the network transition matrix, i.e., its eigenvalues, by taking only the most relevant components. Therefore, in contrast to other approaches, we learn not only the network weights but also the network architecture. LRNNs have interesting properties: they settle into elliptic trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this in several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and stock price prediction. LRNNs outperform the previous state of the art for the MSO task with a minimal number of units.
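A minimal Python sketch of both ingredients, the linear least-squares fit and the spectrum-based reduction, for a scalar signal; the delay-embedding dimension and the 50% eigenvalue cutoff below are arbitrary illustrative choices, not the article's recipe:

import numpy as np

def fit_lrnn(series, dim):
    # Delay-embed the scalar series into states s_t = (f(t), ..., f(t+dim-1)),
    # then solve the linear system s_{t+1} ~= W s_t by least squares.
    S = np.stack([series[i:i + dim] for i in range(len(series) - dim)])
    W = np.linalg.lstsq(S[:-1], S[1:], rcond=None)[0].T

    # Network reduction: inspect the spectrum and keep only the dominant modes.
    eigvals = np.linalg.eigvals(W)
    keep = np.abs(eigvals) > 0.5 * np.abs(eigvals).max()   # illustrative cutoff
    print(f"kept {keep.sum()} of {dim} spectral components")
    return W

t = np.arange(200)
W = fit_lrnn(np.sin(0.2 * t) + np.sin(0.311 * t), dim=12)  # an MSO-style signal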

Neural networks have shown tremendous growth in recent years in solving numerous problems. Various types of neural networks have been introduced to deal with different types of problems. However, the main goal of any neural network is to transform non-linearly separable input data into more linearly separable abstract features using a hierarchy of layers. These layers are combinations of linear and nonlinear functions. The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish, and Mish. In this paper, a comprehensive overview and survey of AFs in neural networks for deep learning is presented. Different classes of AFs, such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and Learning based, are covered. Several characteristics of AFs, such as output range, monotonicity, and smoothness, are also pointed out. A performance comparison is performed among 18 state-of-the-art AFs with different networks on different types of data. Insights into AFs are presented to benefit researchers in doing further research and practitioners in selecting among the different choices. The code used for the experimental comparison is released at: \url{//github.com/shivram1987/ActivationFunctions}.
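For concreteness, here are reference NumPy implementations of several of the surveyed activations (standard textbook definitions; they are independent of the released experimental code):

import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return np.tanh(x)
def relu(x):    return np.maximum(0.0, x)
def elu(x, a=1.0):   return np.where(x > 0, x, a * (np.exp(x) - 1.0))
def swish(x, b=1.0): return x * sigmoid(b * x)
def mish(x):    return x * np.tanh(np.log1p(np.exp(x)))  # x * tanh(softplus(x))

x = np.linspace(-3.0, 3.0, 5)
for f in (sigmoid, tanh, relu, elu, swish, mish):
    print(f.__name__, np.round(f(x), 3))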
