In this paper, we propose a novel generalization bound for stochastic gradient Langevin dynamics (SGLD) in a non-convex setting that is uniform in the time and the inverse temperature. While previous works derive their generalization bounds via uniform stability, we use Rademacher complexity to make our generalization bound independent of the time and inverse temperature. Using Rademacher complexity, we reduce the problem of deriving a generalization bound on the whole space to that of deriving one on a bounded region, and can therefore remove the effect of the time and inverse temperature from our bound. As an application of our generalization bound, we also evaluate the effectiveness of simulated annealing in a non-convex setting. For the sample size $n$ and time $s$, we derive bounds of order $\sqrt{n^{-1} \log (n+1)}$ and $|(\log)^4(s)|^{-1}$, respectively, where $(\log)^4$ denotes the four-fold composition of the logarithm.
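To make the dynamics concrete, below is a minimal sketch of the SGLD iteration on a toy non-convex objective; the objective, step size `eta`, and inverse temperature `beta` are illustrative assumptions, not quantities from the analysis above.

```python
# Minimal SGLD sketch on a toy non-convex problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def grad_loss(theta, x_batch):
    # Gradient of a toy per-sample loss with a double-well (non-convex) shape.
    return np.mean(theta**3 - theta - x_batch, axis=0)

theta = np.array([2.0])
eta, beta = 1e-2, 10.0                     # step size and inverse temperature
data = rng.normal(size=(1000, 1))

for step in range(5000):
    batch = data[rng.choice(len(data), size=32)]
    noise = rng.normal(size=theta.shape)
    # SGLD update: stochastic gradient step plus noise scaled by sqrt(2*eta/beta).
    theta = theta - eta * grad_loss(theta, batch) + np.sqrt(2 * eta / beta) * noise
```

The bound described above is uniform in the number of iterations and in $\beta$, which is precisely what distinguishes it from stability-based analyses of this recursion.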
We say that two polynomials $f, g \in R[X]$ over a ring $R$ are equivalent under shifts if there exists a vector $a \in R^n$ such that $f(X+a) = g(X)$. Grigoriev and Karpinski (FOCS 1990), Lakshman and Saunders (SICOMP, 1995), and Grigoriev and Lakshman (ISSAC 1995) studied the problem of testing whether a given polynomial is equivalent, under shifts, to some $t$-sparse polynomial over the rational numbers, and gave exponential-time algorithms. In this paper, we provide hardness results for this problem. Formally, for a ring $R$, let $\mathrm{SparseShift}_R$ be the following decision problem: given a polynomial $P(X)$, is there a vector $a$ such that $P(X+a)$ contains fewer monomials than $P(X)$? We show that $\mathrm{SparseShift}_R$ is at least as hard as checking whether a given system of polynomial equations over $R[x_1,\ldots, x_n]$ has a solution (Hilbert's Nullstellensatz). As a consequence of this reduction, we get the following results. 1. $\mathrm{SparseShift}_\mathbb{Z}$ is undecidable. 2. For any ring $R$ (which is not a field) such that $\mathrm{HN}_R$ is $\mathrm{NP}_R$-complete over the Blum-Shub-Smale model of computation, $\mathrm{SparseShift}_{R}$ is also $\mathrm{NP}_{R}$-complete. In particular, $\mathrm{SparseShift}_{\mathbb{Z}}$ is also $\mathrm{NP}_{\mathbb{Z}}$-complete. We also study the gap version of $\mathrm{SparseShift}_R$ and show the following. 1. For every function $\beta: \mathbb{N}\to\mathbb{R}_+$ such that $\beta\in o(1)$, $N^\beta$-gap-$\mathrm{SparseShift}_\mathbb{Z}$ is also undecidable (where $N$ is the input length). 2. For $R=\mathbb{F}_p, \mathbb{Q}, \mathbb{R}$ or $\mathbb{Z}_q$ and for every $\beta>1$, the $\beta$-gap-$\mathrm{SparseShift}_R$ problem is $\mathrm{NP}$-hard.
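As a toy illustration of the decision problem (not an instance drawn from the hardness results above), a single shift can strictly reduce the number of monomials:

```latex
% Toy "yes"-instance of SparseShift over the integers: the shift a = -1
% collapses three monomials into one.
\[
  P(X) \;=\; X^{2} + 2X + 1,
  \qquad
  P(X+a)\big|_{a=-1} \;=\; (X-1)^{2} + 2(X-1) + 1 \;=\; X^{2}.
\]
```

Here $P(X+a)$ has one monomial while $P(X)$ has three, so this instance is accepted.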
Since 2016, sharding has emerged as a promising solution to tackle the scalability issue in legacy blockchain systems. Despite its potential to strongly boost blockchain throughput, sharding comes with its own security issues. To ease the process of deciding which shard a transaction should be placed in, existing sharding protocols use hash-based transaction sharding, in which the hash value of a transaction determines its output shard. Unfortunately, we show that this mechanism opens up a loophole that can be exploited to conduct a single-shard flooding attack, a type of Denial-of-Service (DoS) attack that overwhelms a single shard and thereby degrades the performance of the system as a whole. To counter the single-shard flooding attack, we propose a countermeasure that essentially eliminates the loophole by rejecting the use of hash-based transaction sharding. The countermeasure leverages the Trusted Execution Environment (TEE) to let the blockchain's validators securely execute a transaction sharding algorithm with negligible overhead. We provide a formal specification of the countermeasure and analyze its security properties in the Universal Composability (UC) framework. Finally, a proof-of-concept is developed to demonstrate the feasibility and practicality of our solution.
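To illustrate the loophole, the following is a minimal sketch of hash-based transaction placement and of how an adversary could grind transactions toward one shard; the transaction format, nonce field, and shard count are illustrative assumptions rather than the design of any particular sharding protocol.

```python
# Sketch of hash-based transaction sharding and single-shard flooding (illustrative).
import hashlib

NUM_SHARDS = 16

def output_shard(tx_bytes: bytes) -> int:
    # Hash-based placement: the transaction hash alone decides the output shard.
    digest = hashlib.sha256(tx_bytes).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def craft_tx_for_shard(target: int, payload: bytes) -> bytes:
    # An attacker can grind a nonce until the hash lands in the target shard,
    # concentrating all of its transactions on a single shard.
    nonce = 0
    while True:
        tx = payload + nonce.to_bytes(8, "big")
        if output_shard(tx) == target:
            return tx
        nonce += 1

flood = [craft_tx_for_shard(3, f"tx-{i}".encode()) for i in range(100)]
assert all(output_shard(tx) == 3 for tx in flood)
```

The countermeasure described above removes this degree of freedom by taking shard placement out of the attacker-controlled hash (the hypothetical `output_shard` rule here) and into a TEE-executed sharding algorithm.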
Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called \textquotedblleft post-Boosting\textquotedblright. This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is \textquotedblleft Orthogonal Boosting\textquotedblright, where after each step an orthogonal projection is conducted. We show that both post-$L_2$Boosting and orthogonal boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of the classical $L_2$Boosting depends on the design matrix through a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting, which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.
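As an illustration of the componentwise procedure and the post-selection refit, here is a minimal sketch under an assumed sparse linear model; the shrinkage parameter `nu`, the fixed number of iterations, and the simulated design are illustrative choices and do not implement the early-stopping rules proposed in the paper.

```python
# L2Boosting (componentwise least squares) followed by a post-selection OLS refit.
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 3 * X[:, 4] + rng.normal(size=n)    # sparse linear model

beta = np.zeros(p)
residual = y.copy()
selected = set()
nu, n_steps = 0.1, 200          # shrinkage and number of boosting iterations

for _ in range(n_steps):
    # Pick the covariate whose univariate fit best explains the current residual.
    coefs = X.T @ residual / np.sum(X**2, axis=0)
    sse = np.sum((residual[None, :] - coefs[:, None] * X.T) ** 2, axis=1)
    j = int(np.argmin(sse))
    beta[j] += nu * coefs[j]
    residual -= nu * coefs[j] * X[:, j]
    selected.add(j)

# Post-L2Boosting: ordinary least squares on the selected variables only.
S = sorted(selected)
beta_post = np.zeros(p)
beta_post[S], *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
```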
Given a zero-mean Gaussian random field with a covariance function that belongs to a parametric family of covariance functions, we introduce a new notion of likelihood approximations, termed truncated-likelihood functions. Truncated-likelihood functions are based on direct functional approximations of the presumed family of covariance functions. For compactly supported covariance functions, within an increasing-domain asymptotic framework, we provide sufficient conditions under which consistency and asymptotic normality of estimators based on truncated-likelihood functions are preserved. We apply our result to the family of generalized Wendland covariance functions and discuss several examples of Wendland approximations. For families of covariance functions that are not compactly supported, we combine our results with the covariance tapering approach and show that ML estimators, based on truncated-tapered likelihood functions, asymptotically minimize the Kullback-Leibler divergence, when the taper range is fixed.
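A minimal sketch of the tapered-likelihood idea, assuming an exponential covariance model and a simple Askey-type compactly supported taper (the generalized Wendland family discussed above is not reproduced here); all parameter values are illustrative.

```python
# Gaussian log-likelihood with a tapered covariance matrix (sketch).
import numpy as np

def exp_cov(h, sigma2=1.0, phi=0.3):
    return sigma2 * np.exp(-h / phi)

def taper(h, gamma=0.5):
    # Askey-type taper (1 - h/gamma)_+^2, compactly supported on [0, gamma].
    return np.clip(1.0 - h / gamma, 0.0, None) ** 2

def tapered_loglik(y, locs, sigma2, phi, gamma):
    h = np.abs(locs[:, None] - locs[None, :])        # pairwise distances
    C = exp_cov(h, sigma2, phi) * taper(h, gamma)    # elementwise (Schur) taper
    _, logdet = np.linalg.slogdet(C)
    alpha = np.linalg.solve(C, y)
    return -0.5 * (logdet + y @ alpha + len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(2)
locs = np.linspace(0, 10, 100)
h = np.abs(locs[:, None] - locs[None, :])
y = rng.multivariate_normal(np.zeros(len(locs)), exp_cov(h))
print(tapered_loglik(y, locs, sigma2=1.0, phi=0.3, gamma=0.5))
```

The tapered matrix stays positive definite because it is the Schur product of two valid covariances, and it is sparse whenever the taper range `gamma` is small relative to the domain.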
Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. Therefore, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. As our novel method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood and a Bayesian approach based on Markov chain Monte Carlo. Through three case studies, we demonstrate our method by providing a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of model parameters, based on independent observed data. Our approach has a computational cost comparable to analysis of models that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how the analysis of random parameter models can aid a better understanding of the sources of heterogeneity in biological data.
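A minimal sketch of a moment-matching approximate likelihood, assuming a toy exponential-decay model whose decay rate varies across individuals; the model, the Monte Carlo moment computation, and the Gaussian approximation at each time point are illustrative assumptions rather than one of the paper's case studies.

```python
# Moment-matching approximate likelihood for a random-parameter model (sketch).
import numpy as np

t_obs = np.linspace(0.5, 5, 10)
y0 = 10.0

def output_moments(mu, sigma, n_draws=2000):
    # Monte Carlo mean/variance of y(t) = y0*exp(-lambda*t), lambda ~ N(mu, sigma^2).
    rng = np.random.default_rng(3)
    lam = rng.normal(mu, sigma, size=n_draws)
    y = y0 * np.exp(-np.outer(lam, t_obs))
    return y.mean(axis=0), y.var(axis=0)

def gauss_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def approx_loglik(params, data):
    # data: (n_individuals, n_times); Gaussian approximation at each time point.
    mu, sigma = params
    m, v = output_moments(mu, sigma)
    return np.sum(gauss_logpdf(data, m, v + 1e-8))

# Synthetic replicates generated from the same random-parameter model.
rng = np.random.default_rng(4)
lam_true = rng.normal(0.8, 0.2, size=50)
data = y0 * np.exp(-np.outer(lam_true, t_obs))
print(approx_loglik((0.8, 0.2), data), approx_loglik((1.5, 0.05), data))
```

Profiling or MCMC over the hyperparameters `(mu, sigma)` then proceeds exactly as for a standard likelihood.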
Mark-point dependence plays a critical role in research problems that can be fitted into the general framework of marked point processes. In this work, we focus on adjusting for mark-point dependence when estimating the mean and covariance functions of the mark process, given independent replicates of the marked point process. We assume that the mark process is a Gaussian process and the point process is a log-Gaussian Cox process, where the mark-point dependence is generated through the dependence between two latent Gaussian processes. Under this framework, naive local linear estimators ignoring the mark-point dependence can be severely biased. We show that this bias can be corrected using a local linear estimator of the cross-covariance function, and we establish uniform convergence rates of the bias-corrected estimators. Furthermore, we propose a test statistic based on local linear estimators for mark-point independence, which is shown to be asymptotically normal at the parametric $\sqrt{n}$ rate. Model diagnostic tools are developed for key model assumptions, and a robust functional permutation test is proposed for a more general class of marked point processes. The effectiveness of the proposed methods is demonstrated using extensive simulations and applications to two real data examples.
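For concreteness, the following sketches the naive local linear estimator of the mark mean function that ignores mark-point dependence (the estimator noted above to be severely biased); the Gaussian kernel, bandwidth, and simulated data are illustrative assumptions, and the bias correction itself is not shown.

```python
# Naive local linear estimator of the mark mean function (sketch).
import numpy as np

def local_linear_mean(t0, points, marks, bandwidth=0.1):
    # points: observed event locations; marks: marks attached to those events.
    u = (points - t0) / bandwidth
    w = np.exp(-0.5 * u**2)                       # Gaussian kernel weights
    X = np.column_stack([np.ones_like(points), points - t0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ marks)
    return beta[0]                                # intercept = estimate at t0

rng = np.random.default_rng(5)
points = np.sort(rng.uniform(0, 1, 300))
marks = np.sin(2 * np.pi * points) + rng.normal(scale=0.2, size=300)
print(local_linear_mean(0.25, points, marks))     # roughly sin(pi/2) = 1
```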
In Federated Learning (FL), a number of clients or devices collaborate to train a model without sharing their data. Models are optimized locally at each client and then communicated to a central hub for aggregation. While FL is an appealing decentralized training paradigm, heterogeneity among data from different clients can cause the local optimization to drift away from the global objective. In order to estimate and therefore remove this drift, variance reduction techniques have recently been incorporated into FL optimization. However, these approaches inaccurately estimate the clients' drift and ultimately fail to remove it properly. In this work, we propose an adaptive algorithm that accurately estimates drift across clients. In comparison to previous works, our approach requires less storage and communication bandwidth, as well as lower compute costs. Additionally, our proposed methodology induces stability by constraining the norm of the estimates of client drift, making it more practical for large-scale FL. Experimental findings demonstrate that the proposed algorithm converges significantly faster and achieves higher accuracy than the baselines across various FL benchmarks.
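For context, here is a minimal sketch of federated averaging with a SCAFFOLD-style control-variate correction of client drift; this is a standard baseline shown only to illustrate the drift problem, not the adaptive algorithm proposed here, and the toy quadratic client objectives are likewise illustrative.

```python
# FedAvg with SCAFFOLD-style control variates on toy quadratic clients (sketch).
import numpy as np

def local_update(w, c_global, c_local, grads, lr):
    # Each local step subtracts the estimated drift (c_local - c_global).
    for g in grads:
        w = w - lr * (g(w) - c_local + c_global)
    return w

rng = np.random.default_rng(6)
dim, num_clients = 5, 4
lr, local_steps = 0.1, 10
w_global, c_global = np.zeros(dim), np.zeros(dim)
c_locals = [np.zeros(dim) for _ in range(num_clients)]
targets = [rng.normal(size=dim) for _ in range(num_clients)]   # heterogeneous optima

for rnd in range(50):
    new_ws, new_cs = [], []
    for k in range(num_clients):
        grads = [lambda w, t=targets[k]: w - t] * local_steps  # grad of 0.5*||w - t||^2
        w_k = local_update(w_global.copy(), c_global, c_locals[k], grads, lr)
        # Refresh the client's drift estimate from its net progress this round.
        c_k = c_locals[k] - c_global + (w_global - w_k) / (local_steps * lr)
        new_ws.append(w_k)
        new_cs.append(c_k)
    w_global = np.mean(new_ws, axis=0)
    c_global += np.mean([c_new - c_old for c_new, c_old in zip(new_cs, c_locals)], axis=0)
    c_locals = new_cs

print(np.linalg.norm(w_global - np.mean(targets, axis=0)))     # should be small
```

Note that this baseline stores and communicates one control variate per client, which is exactly the overhead the adaptive approach described above aims to reduce.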
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
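The following is a minimal sketch of the phenomenon on synthetic separable data, assuming plain gradient descent with a fixed step size on the logistic loss; after many iterations the normalized iterate drifts toward the max-margin direction, although, as quantified above, the convergence in direction is only logarithmic.

```python
# Implicit bias of gradient descent on separable logistic regression (sketch).
import numpy as np

rng = np.random.default_rng(7)
n, d = 40, 2
X = np.vstack([rng.normal([2.0, 2.0], 0.3, size=(n // 2, d)),
               rng.normal([-2.0, -2.0], 0.3, size=(n // 2, d))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])     # linearly separable labels

w = np.zeros(d)
lr = 0.5
for t in range(100000):
    margins = np.clip(y * (X @ w), -30, 30)                 # clip to avoid overflow
    # Gradient of the mean logistic loss (1/n) * sum_i log(1 + exp(-y_i x_i.w)).
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

print("normalized GD iterate:", w / np.linalg.norm(w))      # approaches max-margin direction
```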
Dynamic Time Warping is arguably the most popular similarity measure for time series, where we define a time series to be a one-dimensional polygonal curve. The drawback of Dynamic Time Warping is that it is sensitive to the sampling rate of the time series. The Fr\'echet distance is an alternative that has gained popularity; however, its drawback is that it is sensitive to outliers. Continuous Dynamic Time Warping (CDTW) is a recently proposed alternative that does not exhibit the aforementioned drawbacks. CDTW combines the continuous nature of the Fr\'echet distance with the summation of Dynamic Time Warping, resulting in a similarity measure that is robust to sampling rate and to outliers. In recent experimental work, Brankovic et al. demonstrated that clustering under CDTW avoids the unwanted artifacts that appear when clustering under Dynamic Time Warping and under the Fr\'echet distance. Despite its advantages, the major shortcoming of CDTW is that there is no exact algorithm for computing CDTW, in polynomial time or otherwise. In this work, we present the first exact algorithm for computing CDTW of one-dimensional curves. Our algorithm runs in time $O(n^5)$ for a pair of one-dimensional curves, each with complexity at most $n$. In our algorithm, we propagate continuous functions in the dynamic program for CDTW, where the main difficulty lies in bounding the complexity of the functions. We believe that our result is an important first step towards CDTW becoming a practical similarity measure between curves.
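For contrast with the continuous measure, here is a minimal sketch of the classical discrete Dynamic Time Warping dynamic program between two one-dimensional time series; it illustrates the summation-over-a-warping-path structure that CDTW makes continuous, and it is not the exact CDTW algorithm presented in this work.

```python
# Classical (discrete) Dynamic Time Warping between 1D time series (sketch).
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Sum of local costs along the cheapest monotone warping path.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([0.0, 1.0, 2.0, 1.0], [0.0, 2.0, 1.0]))
```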
Structure learning via MCMC sampling is known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. Theoretical results on the mixing behavior are lacking. In this work, we prove the rapid mixing of a random walk Metropolis-Hastings algorithm, which reveals that the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in $n$ and $p$, under some high-dimensional assumptions. A series of high-dimensional consistency results is obtained, including the strong selection consistency of an empirical Bayes model for structure learning. Our proof is based on two new results. First, we derive a general mixing time bound on finite state spaces, which can be applied to various local MCMC schemes for other model selection problems. Second, we construct greedy search paths on the space of equivalence classes with node degree constraints by proving a combinatorial property of the comparison between two DAGs. Simulation studies on the proposed MCMC sampler are conducted to illustrate the main theoretical findings.
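To fix ideas, the following sketches a generic random walk Metropolis-Hastings sampler on a finite state space with single-element add/delete moves; the toy state space of sparse subsets and its pseudo-posterior score are illustrative stand-ins for the space of equivalence classes of sparse DAGs analysed above.

```python
# Random walk Metropolis-Hastings on a finite state space (sketch).
import numpy as np

rng = np.random.default_rng(8)
p = 10
log_post = lambda s: -2.0 * len(s) + sum(np.sin(j) for j in s)   # toy log-score

def neighbors(s):
    # Add or delete a single element, mimicking single-edge moves on graphs.
    return [s | {j} for j in range(p) if j not in s] + [s - {j} for j in s]

state = frozenset()
for _ in range(10000):
    nbrs = neighbors(state)
    proposal = nbrs[rng.integers(len(nbrs))]
    # The MH ratio corrects for unequal neighborhood sizes of the random walk.
    log_alpha = (log_post(proposal) - log_post(state)
                 + np.log(len(nbrs)) - np.log(len(neighbors(proposal))))
    if np.log(rng.random()) < log_alpha:
        state = proposal
```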