
We investigate the nonlinear regression problem under the L2 (squared) loss. Traditional nonlinear regression models often result in non-convex optimization problems with respect to the parameter set. We show that a convex nonlinear regression model exists for the traditional least squares problem, which can be a promising step towards designing more complex systems with models that are easier to train.
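The abstract does not spell out the construction, but the familiar contrast below may help fix ideas: a regression function that is nonlinear in the inputs yet linear in its parameters keeps the L2 objective convex, so a global minimizer is available in closed form. The feature map, data, and dimensions are illustrative assumptions, not the paper's model.

```python
# Illustrative sketch (not the paper's construction): a model that is nonlinear in the
# inputs but linear in the parameters keeps the L2 objective convex, so ordinary least
# squares machinery still applies.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(200)

# Nonlinear feature map phi(x); the model f(x) = Phi(x) @ w is linear in w.
Phi = np.column_stack([x**k for k in range(6)])

# The convex objective ||Phi w - y||^2 has the closed-form minimizer below.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("training MSE:", np.mean((Phi @ w - y) ** 2))
```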


Linear regression is a widely used statistical analysis method that applies regression analysis from mathematical statistics to determine the quantitative relationship of interdependence between two or more variables. It is expressed as y = w'x + e, where the error e follows a normal distribution with mean 0.
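As a minimal illustration of the model y = w'x + e above, the sketch below fits w by ordinary least squares on synthetic data; the dimensions, noise level, and intercept handling are arbitrary choices for the example.

```python
# Ordinary least squares fit of y = w'x + e with e ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.2 * rng.standard_normal(500)   # error e with zero mean

X1 = np.hstack([X, np.ones((500, 1))])            # append an intercept column
w_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("estimated weights (last entry is the intercept):", w_hat)
```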


Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen, et al., 2021], where the stepsizes are set to be large, resulting in non-monotonic losses induced by the GD iterates. This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime. Despite the presence of local oscillations, we prove that the logistic loss can be minimized by GD with any constant stepsize over a long time scale. Furthermore, we prove that with any constant stepsize, the GD iterates tend to infinity when projected to a max-margin direction (the hard-margin SVM direction) and converge to a fixed vector that minimizes a strongly convex potential when projected to the orthogonal complement of the max-margin direction. In contrast, we also show that in the EoS regime, GD iterates may diverge catastrophically under the exponential loss, highlighting the superiority of the logistic loss. These theoretical findings are in line with numerical simulations and complement existing theories on the convergence and implicit bias of GD, which are only applicable when the stepsizes are sufficiently small.
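A small simulation in the spirit of the setting above can be set up as follows: constant-stepsize GD on the logistic loss over linearly separable synthetic data, with the stepsize deliberately chosen large. The data, dimension, and stepsize are illustrative assumptions; the snippet only reproduces the qualitative regime, not the paper's analysis.

```python
# Constant-stepsize GD on the logistic loss over separable data. With a large stepsize
# the loss can oscillate locally (edge-of-stability behaviour) while still trending
# downward over a long horizon.
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 2
X = rng.standard_normal((n, d))
y = np.sign(X @ np.array([1.0, 0.5]))            # linearly separable labels

def loss_and_grad(w):
    m = y * (X @ w)                               # margins
    loss = np.mean(np.logaddexp(0.0, -m))         # mean log(1 + exp(-m)), stably
    sig = np.exp(-np.logaddexp(0.0, m))           # sigmoid(-m), stably
    grad = -(X * (y * sig)[:, None]).mean(axis=0)
    return loss, grad

w = np.zeros(d)
eta = 20.0                                        # deliberately large stepsize
for t in range(2000):
    loss, g = loss_and_grad(w)
    w -= eta * g
    if t % 500 == 0:
        print(f"step {t:4d}  loss {loss:.4f}  norm of w {np.linalg.norm(w):.2f}")
```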

Robots with the ability to balance time against the thoroughness of search have the potential to provide time-critical assistance in applications such as search and rescue. Current advances in ergodic coverage-based search methods have enabled robots to completely explore and search an area in a fixed amount of time. However, optimizing time against the quality of autonomous ergodic search has yet to be demonstrated. In this paper, we investigate solutions to the time-optimal ergodic search problem for fast and adaptive robotic search and exploration. We pose the problem as a minimum time problem with an ergodic inequality constraint whose upper bound regulates and balances the granularity of search against time. Solutions to the problem are presented analytically using Pontryagin's conditions of optimality and demonstrated numerically through a direct transcription optimization approach. We show the efficacy of the approach in generating time-optimal ergodic search trajectories in simulation and with drone experiments in a cluttered environment. Obstacle avoidance is shown to be readily integrated into our formulation, and we perform ablation studies that investigate parameter dependence on optimized time and trajectory sensitivity for search.
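For readers unfamiliar with the ergodic metric that the inequality constraint above bounds, the sketch below evaluates a simplified 1-D version of it (cosine basis, basis normalization constants omitted) for a hand-picked trajectory and target distribution; the paper itself works with 2-D trajectories, obstacles, and a time-optimal control formulation, none of which appear here.

```python
# Simplified 1-D ergodic metric: weighted mismatch between the spatial-basis
# coefficients of a target coverage distribution and the time-averaged coefficients
# of a candidate trajectory.
import numpy as np

xs = np.linspace(0.0, 1.0, 400)                    # unit workspace
dx = xs[1] - xs[0]
target = np.exp(-0.5 * ((xs - 0.7) / 0.1) ** 2)    # where coverage is desired
target /= target.sum() * dx                        # normalise to a density
traj = 0.5 + 0.3 * np.sin(np.linspace(0, 6 * np.pi, 500))   # candidate trajectory

metric = 0.0
for k in range(10):
    phi_k = (target * np.cos(k * np.pi * xs)).sum() * dx    # target coefficient
    c_k = np.cos(k * np.pi * traj).mean()                   # trajectory time average
    metric += (1.0 + k**2) ** -1.0 * (c_k - phi_k) ** 2     # Sobolev-type weight
print("ergodic metric of the candidate trajectory:", metric)
```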

Conventional Gaussian process regression exclusively assumes the existence of noise in the output data of model observations. In many scientific and engineering applications, however, the input locations of observational data may also be subject to uncertainties owing to modeling assumptions, measurement errors, etc. In this work, we propose a Bayesian method that integrates the variability of input data into Gaussian process regression. Considering two types of observables -- noise-corrupted outputs with fixed inputs and those with prior-distribution-defined uncertain inputs -- a posterior distribution is estimated via a Bayesian framework to infer the uncertain data locations. Thereafter, such quantified uncertainties of inputs are incorporated into Gaussian process predictions by means of marginalization. The effectiveness of this new regression technique is demonstrated through several numerical examples, in which a consistently good performance of generalization is observed, while a substantial reduction in the predictive uncertainties is achieved by the Bayesian inference of uncertain inputs.
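One ingredient of the approach, marginalizing a Gaussian process prediction over input uncertainty, can be illustrated with a crude Monte Carlo version. The paper infers posteriors over uncertain training inputs within a full Bayesian framework; the hand-rolled RBF-kernel GP, synthetic data, and test-input distribution below are assumptions for illustration only.

```python
# Monte Carlo marginalisation of a GP prediction over an uncertain *test* input.
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel on scalar inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

rng = np.random.default_rng(3)
X = np.linspace(0.0, 1.0, 20)                       # training inputs (assumed exact here)
y = np.sin(2 * np.pi * X) + 0.05 * rng.standard_normal(20)
alpha = np.linalg.solve(rbf(X, X) + 1e-4 * np.eye(20), y)

def gp_mean(xs):
    return rbf(xs, X) @ alpha

# Uncertain test input x* ~ N(0.4, 0.05^2): marginalise by averaging over samples of x*.
xs_samples = rng.normal(0.4, 0.05, size=2000)
print("marginalised prediction:", gp_mean(xs_samples).mean())
print("fixed-input prediction: ", gp_mean(np.array([0.4]))[0])
```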

Model reparametrization, which follows the change-of-variable rule of calculus, is a popular way to improve the training of neural nets. But it can also be problematic since it can induce inconsistencies in, e.g., Hessian-based flatness measures, optimization trajectories, and modes of probability densities. This complicates downstream analyses: e.g., one cannot definitively relate flatness to generalization, since an arbitrary reparametrization changes their relationship. In this work, we study the invariance of neural nets under reparametrization from the perspective of Riemannian geometry. From this point of view, invariance is an inherent property of any neural net if one explicitly represents the metric and uses the correct associated transformation rules. This is important since, although the metric is always present, it is often implicitly assumed to be the identity, thus dropped from the notation, and then lost under reparametrization. We discuss implications for measuring the flatness of minima, for optimization, and for probability-density maximization. Finally, we explore some interesting directions where invariance is useful.
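A one-dimensional toy computation makes the invariance argument concrete. For a quadratic loss and the reparametrization w = u^3 (both chosen arbitrarily here), a plain gradient step in u lands at a different point than a step in w, whereas a step preconditioned by the pulled-back metric G = J^2 (with J = dw/du) matches the w-space step to first order.

```python
# Toy check: gradient steps are not reparametrization-invariant, metric-aware steps are
# (to first order in the stepsize).
eta, w0 = 1e-3, 1.0
u0, J = w0 ** (1 / 3), 3 * w0 ** (2 / 3)    # w = u**3, so J = dw/du = 3 u^2

grad_w = 2 * (w0 - 3)                       # d/dw of L(w) = (w - 3)^2
grad_u = grad_w * J                         # chain rule: d L / d u

w_step   = w0 - eta * grad_w                     # GD directly on w
w_plain  = (u0 - eta * grad_u) ** 3              # naive GD on u, mapped back to w
w_metric = (u0 - eta * grad_u / J**2) ** 3       # metric-preconditioned GD on u

print(f"GD on w:         {w_step:.6f}")
print(f"plain GD on u:   {w_plain:.6f}")    # disagrees with the w-space step
print(f"metric GD on u:  {w_metric:.6f}")   # agrees to first order
```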

This paper considers the specification of covariance structures with tail estimates. We focus on two aspects: (i) the estimation of the VaR-CoVaR risk matrix, in the case of a larger number of time-series observations than assets in a portfolio, using quantile predictive regression models without assuming the presence of nonstationary regressors; and (ii) the construction of a novel variable selection algorithm, the so-called Feature Ordering by Centrality Exclusion (FOCE), which is based on an assumption-lean regression framework, has no tuning parameters, and is proved to be consistent under general sparsity assumptions. We illustrate the usefulness of our proposed methodology with numerical studies of real and simulated datasets when modelling systemic risk in a network.
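As a generic illustration of the quantile predictive regression ingredient (not the paper's VaR-CoVaR matrix estimator or the FOCE algorithm), the sketch below fits a conditional 5% quantile, a VaR-style estimate, by minimizing the pinball loss; the data-generating process and quantile level are assumptions for the example.

```python
# Conditional quantile (VaR-style) estimate via the pinball loss.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
x = rng.standard_normal(1000)                         # predictor (e.g. a lagged factor)
r = 0.3 * x + (0.5 + 0.2 * np.abs(x)) * rng.standard_normal(1000)   # asset returns
tau = 0.05                                            # quantile level for a VaR estimate

def pinball(beta):
    # Pinball (check) loss for the conditional tau-quantile model q(x) = b0 + b1 * x.
    u = r - (beta[0] + beta[1] * x)
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

beta_hat = minimize(pinball, x0=np.zeros(2), method="Nelder-Mead").x
print("estimated 5% conditional quantile at x = 1:", beta_hat[0] + beta_hat[1])
```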

Evidence Networks can enable Bayesian model comparison when state-of-the-art methods (e.g. nested sampling) fail and even when likelihoods or priors are intractable or unknown. Bayesian model comparison, i.e. the computation of Bayes factors or evidence ratios, can be cast as an optimization problem. Though the Bayesian interpretation of optimal classification is well known, here we change perspective and present classes of loss functions that result in fast, amortized neural estimators that directly estimate convenient functions of the Bayes factor. This mitigates numerical inaccuracies associated with estimating individual model probabilities. We introduce the leaky parity-odd power (l-POP) transform, leading to the novel ``l-POP-Exponential'' loss function. We explore neural density estimation for the data probability in different models, showing it to be less accurate and less scalable than Evidence Networks. Multiple real-world and synthetic examples illustrate that Evidence Networks are explicitly independent of the dimensionality of the parameter space and scale mildly with the complexity of the posterior probability density function. This simple yet powerful approach has broad implications for model inference tasks. As an application of Evidence Networks to real-world data, we compute the Bayes factor for two models with gravitational lensing data of the Dark Energy Survey. We briefly discuss applications of our methods to other, related problems of model comparison and evaluation in implicit inference settings.
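The classifier-to-Bayes-factor idea that such estimators build on can be sketched with a plain logistic-regression classifier and cross-entropy loss rather than the paper's l-POP-Exponential loss: with equal model priors, a calibrated classifier's output odds on an observed datum estimate the Bayes factor between two simulators. The toy Gaussian "models" below are assumptions for illustration only.

```python
# Classifier-based Bayes factor estimate between two simulators (equal model priors).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
# Two toy "models": unit-variance Gaussians with different means; data are 1-D observations.
d_A = rng.normal(0.0, 1.0, size=(5000, 1))
d_B = rng.normal(0.5, 1.0, size=(5000, 1))
X = np.vstack([d_A, d_B])
labels = np.concatenate([np.zeros(5000), np.ones(5000)])   # 0 = model A, 1 = model B

clf = LogisticRegression().fit(X, labels)
d_obs = np.array([[0.1]])                                  # the "observed" datum
p_B = clf.predict_proba(d_obs)[0, 1]
print("estimated Bayes factor B vs A:", p_B / (1 - p_B))   # analytic value here is about 0.93
```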

One of the key challenges towards the deployment of over-the-air federated learning (AirFL) is the design of mechanisms that can comply with the power and bandwidth constraints of the shared channel, while causing minimum deterioration to the learning performance as compared to baseline noiseless implementations. For additive white Gaussian noise (AWGN) channels with instantaneous per-device power constraints, prior work has demonstrated the optimality of a power control mechanism based on norm clipping. This was done through the minimization of an upper bound on the optimality gap for smooth learning objectives satisfying the Polyak-{\L}ojasiewicz (PL) condition. In this paper, we make two contributions to the development of AirFL based on norm clipping, which we refer to as AirFL-Clip. First, we provide a convergence bound for AirFL-Clip that applies to general smooth and non-convex learning objectives. Unlike existing results, the derived bound is free from run-specific parameters, thus supporting an offline evaluation. Second, we extend AirFL-Clip to include Top-k sparsification and linear compression. For this generalized protocol, referred to as AirFL-Clip-Comp, we derive a convergence bound for general smooth and non-convex learning objectives. We argue, and demonstrate via experiments, that the only time-varying quantities present in the bound can be efficiently estimated offline by leveraging the well-studied properties of sparse recovery algorithms.
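A heavily simplified sketch of the norm-clipping power control behind AirFL-Clip: each device scales its update so that the transmit norm respects a power budget, the channel superimposes the clipped updates, and AWGN is added at the receiver. Real-valued signals, the power budget, and the noise level are assumptions; Top-k sparsification and linear compression (AirFL-Clip-Comp) are omitted.

```python
# Norm-clipped over-the-air aggregation of local updates under a per-device power budget.
import numpy as np

rng = np.random.default_rng(4)
power_budget = 1.0                                   # per-device power constraint (assumed)
noise_std = 0.05                                     # AWGN level at the receiver (assumed)
updates = [rng.standard_normal(10) for _ in range(8)]   # local model updates from 8 devices

def clip(u, budget):
    # Scale the update so that its norm never exceeds the transmit power budget.
    return u * min(1.0, np.sqrt(budget) / (np.linalg.norm(u) + 1e-12))

clipped = [clip(u, power_budget) for u in updates]
# Over-the-air superposition: the channel adds the transmitted signals plus noise.
received = np.sum(clipped, axis=0) + noise_std * rng.standard_normal(10)
server_estimate = received / len(updates)
print("server-side estimate of the average update:", server_estimate[:3], "...")
```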

Most machine learning methods require tuning of hyper-parameters. For kernel ridge regression with the Gaussian kernel, the hyper-parameter is the bandwidth. The bandwidth specifies the length-scale of the kernel and has to be carefully selected in order to obtain a model with good generalization. The default methods for bandwidth selection are cross-validation and marginal likelihood maximization, which often yield good results, albeit at high computational cost. Furthermore, the estimates provided by these methods tend to have very high variance, especially when training data are scarce. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by kernel ridge regression with the Gaussian kernel depend on the kernel bandwidth. We then use this expression to propose a closed-form, computationally feather-light, bandwidth selection heuristic based on controlling the Jacobian. In addition, the Jacobian expression illuminates how bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training-data kernel matrix. We show on real and synthetic data that, compared to cross-validation and marginal likelihood maximization, our method is considerably faster and considerably more stable in terms of bandwidth selection.
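The trade-off mentioned at the end of the abstract can be seen directly by varying the bandwidth of a Gaussian-kernel ridge regressor: small bandwidths fit the training data closely, while large bandwidths smooth the fit and worsen the conditioning of the kernel matrix. The snippet below only exposes these two quantities on synthetic data; it is not the paper's closed-form Jacobian-based heuristic.

```python
# Gaussian-kernel ridge regression: bandwidth vs. kernel-matrix conditioning and train fit.
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, 40)
y = np.sin(4 * X) + 0.1 * rng.standard_normal(40)

def gauss_kernel(a, b, bw):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / bw**2)

for bw in (0.05, 0.2, 1.0):
    K = gauss_kernel(X, X, bw)
    alpha = np.linalg.solve(K + 1e-6 * np.eye(40), y)    # ridge parameter held fixed
    train_mse = np.mean((K @ alpha - y) ** 2)
    print(f"bandwidth {bw:4.2f}: cond(K) = {np.linalg.cond(K):.2e}, train MSE = {train_mse:.4f}")
```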

Pathloss prediction is an essential component of wireless network planning. While ray-tracing-based methods have been successfully used for many years, they require significant computational effort that may become prohibitive with the increased network densification and/or use of higher frequencies in 5G/B5G (beyond 5G) systems. In this paper, we propose and evaluate a data-driven and model-free pathloss prediction method, dubbed PMNet. This method uses a supervised learning approach: training a neural network (NN) with a limited amount of ray tracing (or channel measurement) data and map data, and then predicting the pathloss at locations with no ray tracing data with a high level of accuracy. Our proposed pathloss-map-prediction-oriented NN architecture, which is empowered by state-of-the-art computer vision techniques, outperforms other architectures that have been previously proposed (e.g., UNet, RadioUNet) in terms of accuracy while showing generalization capability. Moreover, PMNet trained on a 4-fold smaller dataset surpasses the other baselines (trained on a 4-fold larger dataset), corroborating the potential of PMNet.
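The supervised setup described above can be sketched generically: a small fully convolutional network maps map/geometry inputs (e.g. a building map plus a transmitter-location channel) to a pathloss map and is trained against ray-tracing targets with an L2 loss. The architecture, channel layout, and random stand-in tensors below are illustrative assumptions, not PMNet's actual design.

```python
# Generic image-to-image regression sketch for pathloss map prediction (not PMNet).
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),   # input channels: building map + Tx location
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),              # output: predicted pathloss map
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

maps = torch.randn(8, 2, 64, 64)                 # stand-in for map + Tx-location inputs
targets = torch.randn(8, 1, 64, 64)              # stand-in for ray-tracing pathloss targets
for _ in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(maps), targets)
    loss.backward()
    opt.step()
print("final training loss:", loss.item())
```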

Convolutional neural networks (CNNs) are the dominant deep neural network (DNN) architecture for computer vision. Recently, Transformer- and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, have started to lead new trends as they have shown promising results in the ImageNet classification task. In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons. To ensure a fair comparison, we first develop a unified framework called SPACH which adopts separate modules for spatial and channel processing. Our experiments under the SPACH framework reveal that all structures can achieve competitive performance at a moderate scale. However, they demonstrate distinctive behaviors when the network size scales up. Based on our findings, we propose two hybrid models using convolution and Transformer modules. The resulting Hybrid-MS-S+ model achieves 83.9% top-1 accuracy with 63M parameters and 12.3G FLOPs. It is already on par with the SOTA models with sophisticated designs. The code and models will be made publicly available.
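A rough sketch of the "separate modules for spatial and channel processing" idea is given below as an MLP-Mixer-style block; the token and channel sizes are arbitrary, and the released SPACH code should be consulted for the actual modules.

```python
# One block that mixes information across spatial locations and across channels separately.
import torch
import torch.nn as nn

class SpatialChannelBlock(nn.Module):
    def __init__(self, tokens=196, channels=128):
        super().__init__()
        self.spatial = nn.Linear(tokens, tokens)      # mixes information across locations
        self.channel = nn.Linear(channels, channels)  # mixes information across channels
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x):                  # x: (batch, tokens, channels)
        x = x + self.spatial(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel(self.norm2(x))
        return x

x = torch.randn(2, 196, 128)
print(SpatialChannelBlock()(x).shape)      # torch.Size([2, 196, 128])
```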
