亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We study the properties of differentiable neural networks activated by rectified power unit (RePU) functions. We show that the partial derivatives of RePU neural networks can be represented by RePUs mixed-activated networks and derive upper bounds for the complexity of the function class of derivatives of RePUs networks. We establish error bounds for simultaneously approximating $C^s$ smooth functions and their derivatives using RePU-activated deep neural networks. Furthermore, we derive improved approximation error bounds when data has an approximate low-dimensional support, demonstrating the ability of RePU networks to mitigate the curse of dimensionality. To illustrate the usefulness of our results, we consider a deep score matching estimator (DSME) and propose a penalized deep isotonic regression (PDIR) using RePU networks. We establish non-asymptotic excess risk bounds for DSME and PDIR under the assumption that the target functions belong to a class of $C^s$ smooth functions. We also show that PDIR has a robustness property in the sense it is consistent with vanishing penalty parameters even when the monotonicity assumption is not satisfied. Furthermore, if the data distribution is supported on an approximate low-dimensional manifold, we show that DSME and PDIR can mitigate the curse of dimensionality.

相關內容

Networking:IFIP International Conferences on Networking。 Explanation:國際網(wang)絡會議。 Publisher:IFIP。 SIT:

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of approximately solving a sequence of optimization problems where the objective function can change per iteration. The common approach to solving the problem is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first and the second moment of the gradient, and update these parameters over time. Therefore, information obtained in previous iterations is being used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer in situations where the optimization landscape of the previous iterations is quite different from the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.

The Sparse Identification of Nonlinear Dynamics (SINDy) algorithm can be applied to stochastic differential equations to estimate the drift and the diffusion function using data from a realization of the SDE. The SINDy algorithm requires sample data from each of these functions, which is typically estimated numerically from the data of the state. We analyze the performance of the previously proposed estimates for the drift and diffusion function to give bounds on the error for finite data. However, since this algorithm only converges as both the sampling frequency and the length of trajectory go to infinity, obtaining approximations within a certain tolerance may be infeasible. To combat this, we develop estimates with higher orders of accuracy for use in the SINDy framework. For a given sampling frequency, these estimates give more accurate approximations of the drift and diffusion functions, making SINDy a far more feasible system identification method.

The Gaussian graphical model (GGM) incorporates an undirected graph to represent the conditional dependence between variables, with the precision matrix encoding partial correlation between pair of variables given the others. To achieve flexible and accurate estimation and inference of GGM, we propose the novel method FLAG, which utilizes the random effects model for pairwise conditional regression to estimate the precision matrix and applies statistical tests to recover the graph. Compared with existing methods, FLAG has several unique advantages: (i) it provides accurate estimation without sparsity assumptions on the precision matrix, (ii) it allows for element-wise inference of the precision matrix, (iii) it achieves computational efficiency by developing an efficient PX-EM algorithm and a MM algorithm accelerated with low-rank updates, and (iv) it enables joint estimation of multiple graphs using FLAG-Meta or FLAG-CA. The proposed methods are evaluated using various simulation settings and real data applications, including gene expression in the human brain, term association in university websites, and stock prices in the U.S. financial market. The results demonstrate that FLAG and its extensions provide accurate precision estimation and graph recovery.

Models with intractable normalizing functions have numerous applications. Because the normalizing constants are functions of the parameters of interest, standard Markov chain Monte Carlo cannot be used for Bayesian inference for these models. A number of algorithms have been developed for such models. Some have the posterior distribution as their asymptotic distribution. Other ``asymptotically inexact'' algorithms do not possess this property. There is limited guidance for evaluating approximations based on these algorithms. Hence it is very hard to tune them. We propose two new diagnostics that address these problems for intractable normalizing function models. Our first diagnostic, inspired by the second Bartlett identity, is in principle broadly applicable to Monte Carlo approximations beyond the normalizing function problem. We develop an approximate version of this diagnostic that is applicable to intractable normalizing function problems. Our second diagnostic is a Monte Carlo approximation to a kernel Stein discrepancy-based diagnostic introduced by Gorham and Mackey (2017). We provide theoretical justification for our methods and apply them to several algorithms in challenging simulated and real data examples including an Ising model, an exponential random graph model, and a Conway--Maxwell--Poisson regression model, obtaining interesting insights about the algorithms in these contexts.

Singularly perturbed problems present inherent difficulty due to the presence of a thin boundary layer in its solution. To overcome this difficulty, we propose using deep operator networks (DeepONets), a method previously shown to be effective in approximating nonlinear operators between infinite-dimensional Banach spaces. In this paper, we demonstrate for the first time the application of DeepONets to one-dimensional singularly perturbed problems, achieving promising results that suggest their potential as a robust tool for solving this class of problems. We consider the convergence rate of the approximation error incurred by the operator networks in approximating the solution operator, and examine the generalization gap and empirical risk, all of which are shown to converge uniformly with respect to the perturbation parameter. By utilizing Shishkin mesh points as locations of the loss function, we conduct several numerical experiments that provide further support for the effectiveness of operator networks in capturing the singular boundary layer behavior.

The widely used stochastic gradient methods for minimizing nonconvex composite objective functions require the Lipschitz smoothness of the differentiable part. But the requirement does not hold true for problem classes including quadratic inverse problems and training neural networks. To address this issue, we investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part. SBPG replaces the upper quadratic approximation used in SGD with the Bregman proximity measure, resulting in a better approximation model that captures the non-Lipschitz gradients of the nonconvex objective. We formulate the vanilla SBPG and establish its convergence properties under nonconvex setting without finite-sum structure. Experimental results on quadratic inverse problems testify the robustness of SBPG. Moreover, we propose a momentum-based version of SBPG (MSBPG) and prove it has improved convergence properties. We apply MSBPG to the training of deep neural networks with a polynomial kernel function, which ensures the smooth adaptivity of the loss function. Experimental results on representative benchmarks demonstrate the effectiveness and robustness of MSBPG in training neural networks. Since the additional computation cost of MSBPG compared with SGD is negligible in large-scale optimization, MSBPG can potentially be employed as an universal open-source optimizer in the future.

The majority of machine learning methods can be regarded as the minimization of an unavailable risk function. To optimize the latter, given samples provided in a streaming fashion, we define a general stochastic Newton algorithm and its weighted average version. In several use cases, both implementations will be shown not to require the inversion of a Hessian estimate at each iteration, but a direct update of the estimate of the inverse Hessian instead will be favored. This generalizes a trick introduced in [2] for the specific case of logistic regression, by directly updating the estimate of the inverse Hessian. Under mild assumptions such as local strong convexity at the optimum, we establish almost sure convergences and rates of convergence of the algorithms, as well as central limit theorems for the constructed parameter estimates. The unified framework considered in this paper covers the case of linear, logistic or softmax regressions to name a few. Numerical experiments on simulated data give the empirical evidence of the pertinence of the proposed methods, which outperform popular competitors particularly in case of bad initializa-tions.

The analysis of multiple time-to-event outcomes in a randomised controlled clinical trial can be accomplished with exisiting methods. However, depending on the characteristics of the disease under investigation and the circumstances in which the study is planned, it may be of interest to conduct interim analyses and adapt the study design if necessary. Due to the expected dependency of the endpoints, the full available information on the involved endpoints may not be used for this purpose. We suggest a solution to this problem by embedding the endpoints in a multi-state model. If this model is Markovian, it is possible to take the disease history of the patients into account and allow for data-dependent design adaptiations. To this end, we introduce a flexible test procedure for a variety of applications, but are particularly concerned with the simultaneous consideration of progression-free survival (PFS) and overall survival (OS). This setting is of key interest in oncological trials. We conduct simulation studies to determine the properties for small sample sizes and demonstrate an application based on data from the NB2004-HR study.

This paper presents an intuitive application of multivariate kernel density estimation (KDE) for data correction. The method utilizes the expected value of the conditional probability density function (PDF) and a credible interval to quantify correction uncertainty. A selective KDE factor is proposed to adjust both kernel size and shape, determined through least-squares cross-validation (LSCV) or mean conditional squared error (MCSE) criteria. The selective bandwidth method can be used in combination with the adaptive method to potentially improve accuracy. Two examples, involving a hypothetical dataset and a realistic dataset, demonstrate the efficacy of the method. The selective bandwidth methods consistently outperform non-selective methods, while the adaptive bandwidth methods improve results for the hypothetical dataset but not for the realistic dataset. The MCSE criterion minimizes root mean square error but may yield under-smoothed distributions, whereas the LSCV criterion strikes a balance between PDF fitness and low RMSE.

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

北京阿比特科技有限公司