
For the intersection of the Stiefel manifold and the set of nonnegative matrices in $\mathbb{R}^{n\times r}$, we present global and local error bounds with easily computable residual functions and explicit coefficients. Moreover, we show that the error bounds cannot be improved except for the coefficients, which explains why, when $1 < r < n$, the bounds require two square-root terms, one for the nonnegativity constraint and one for the orthogonality constraint. The error bounds are applied to penalty methods for minimizing a Lipschitz continuous function with nonnegative orthogonality constraints. Assuming only Lipschitz continuity of the objective function, we prove the exactness of penalty problems that penalize the nonnegativity constraint, the orthogonality constraint, or both. Our results cover both global and local minimizers.
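As a concrete reading of the bound, the sketch below computes the two residuals referred to above and combines them in the stated two-square-root form. The coefficients c1 and c2 are placeholders; the explicit values are derived in the paper.

```python
import numpy as np

def residuals(X):
    """Easily computable residuals for the two constraints:
    nonnegativity (Frobenius norm of the negative part of X) and
    orthogonality (Frobenius norm of X^T X - I_r)."""
    r_neg = np.linalg.norm(np.minimum(X, 0.0))
    r_orth = np.linalg.norm(X.T @ X - np.eye(X.shape[1]))
    return r_neg, r_orth

def error_bound(X, c1=1.0, c2=1.0):
    """Bound of the two-square-root form described above, with one
    term per constraint. c1 and c2 are placeholder coefficients; the
    paper gives explicit values."""
    r_neg, r_orth = residuals(X)
    return c1 * np.sqrt(r_neg) + c2 * np.sqrt(r_orth)
```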

Related content

We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(\delta,\epsilon)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(\delta^{-1}\epsilon^{-3})$ first-order oracle calls, independent of the dimension $d$. It has been an open problem as to whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm. Moreover, we show that unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(\delta^{-1}\epsilon^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(\delta,\epsilon)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.
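For context, a standard black-box randomized smoothing of the kind contrasted with above is sketched below: a one-point Monte Carlo estimator of the gradient of the uniformly smoothed function $f_\delta(x) = \mathbb{E}_v[f(x+\delta v)]$, $v$ uniform in the unit ball. The sample count and sampling scheme are illustrative choices, not the paper's construction.

```python
import numpy as np

def smoothed_grad_estimate(f, x, delta, n_samples=100, seed=None):
    """One-point Monte Carlo estimator of grad f_delta(x), based on the
    identity grad f_delta(x) = (d/delta) * E_u[f(x + delta*u) u] with
    u uniform on the unit sphere, which underlies black-box randomized
    smoothing of Lipschitz functions."""
    rng = np.random.default_rng(seed)
    d = x.size
    grad = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                      # uniform direction on the sphere
        grad += (d / delta) * f(x + delta * u) * u  # single-sample estimate
    return grad / n_samples
```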

We give a simple characterization of which functions can be computed deterministically by anonymous processes in disconnected dynamic networks, depending on the number of leaders in the network. In addition, we provide efficient distributed algorithms for computing all such functions assuming minimal or no knowledge about the network. Each of our algorithms comes in two versions: one that terminates with the correct output and a faster one that stabilizes on the correct output without explicit termination. Notably, these are the first deterministic algorithms whose running times scale linearly with both the number of processes and a parameter of the network which we call "dynamic disconnectivity". We also provide matching lower bounds, showing that all our algorithms are asymptotically optimal for any fixed number of leaders. While most of the existing literature on anonymous dynamic networks relies on classical mass-distribution techniques, our work makes use of a recently introduced combinatorial structure called "history tree", also developing its theory in new directions. Among other contributions, our results make definitive progress on two popular fundamental problems for anonymous dynamic networks: leaderless Average Consensus (i.e., computing the mean value of input numbers distributed among the processes) and multi-leader Counting (i.e., determining the exact number of processes in the network). In fact, our approach unifies and improves upon several independent lines of research on anonymous networks, including Nedic et al., IEEE Trans. Automat. Contr. 2009; Olshevsky, SIAM J. Control Optim. 2017; Kowalski-Mosteiro, ICALP 2019, SPAA 2021; Di Luna-Viglietta, FOCS 2022.
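As a point of comparison for the classical mass-distribution techniques mentioned above, here is a minimal push-sum round for leaderless Average Consensus: each process splits its (mass, weight) pair among itself and its current out-neighbors, and the ratio mass/weight converges to the mean under standard connectivity assumptions. This is the classical baseline, not the history-tree algorithms of the paper.

```python
def push_sum_round(mass, weight, out_neighbors):
    """One synchronous round of classical push-sum mass distribution.
    mass[i] starts at process i's input value and weight[i] at 1.0;
    both totals are conserved, so mass[i]/weight[i] estimates the mean."""
    n = len(mass)
    new_mass, new_weight = [0.0] * n, [0.0] * n
    for i in range(n):
        targets = [i] + list(out_neighbors[i])   # keep one share for yourself
        for j in targets:
            new_mass[j] += mass[i] / len(targets)
            new_weight[j] += weight[i] / len(targets)
    return new_mass, new_weight
```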

Quasar convexity is a condition that allows some first-order methods to efficiently minimize a function even when the optimization landscape is non-convex. Previous works developed near-optimal accelerated algorithms for minimizing this class of functions; however, they require a binary-search subroutine that results in multiple gradient evaluations per iteration, so the total number of gradient evaluations does not match a known lower bound. In this work, we show that a recently proposed continuized Nesterov acceleration can be applied to minimizing quasar convex functions and achieves the optimal bound with high probability. Furthermore, we find that the objective functions of training generalized linear models (GLMs) satisfy quasar convexity, which broadens the applicability of the relevant algorithms; known practical examples of quasar convexity in non-convex learning are otherwise sparse in the literature. We also show that if a smooth and one-point strongly convex, Polyak-Lojasiewicz, or quadratic-growth function satisfies quasar convexity, then an accelerated linear rate for minimizing the function is attainable under certain conditions, whereas acceleration is not known in general for these classes of functions.
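For reference, the standard definition (not restated in the abstract): a differentiable function $f$ is $\gamma$-quasar convex with respect to a minimizer $x^*$, for $\gamma \in (0,1]$, if for all $x$,
$$ f(x^*) \ \ge\ f(x) + \frac{1}{\gamma}\,\langle \nabla f(x),\, x^* - x \rangle, $$
so $\gamma = 1$ recovers star-convexity, and every gradient carries information about the direction toward $x^*$.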

We present recent finite element numerical results for a model convection-diffusion problem in the singularly perturbed case, when the convection term dominates the problem. We compare the standard Galerkin discretization using linear elements with a saddle point least squares discretization that uses quadratic test functions, aiming to control and explain the non-physical oscillations of the discrete solutions. We also relate the upwind Petrov-Galerkin method and the streamline-diffusion discretization method, emphasizing the resulting linear systems and comparing appropriate error norms. Some results extend to the multidimensional case, yielding efficient approximations for more general singularly perturbed problems, including convection-dominated models.
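The oscillation phenomenon being controlled can be reproduced in a few lines: below, the 1D model problem $-\varepsilon u'' + u' = 0$, $u(0)=0$, $u(1)=1$ is discretized with central differences (oscillatory when $\varepsilon \ll h$) and with first-order upwinding (stable but over-diffusive). This is a finite-difference illustration of the trade-off discussed above, not the paper's finite element code.

```python
import numpy as np

def solve_conv_diff(n, eps, scheme="central"):
    """Solve -eps*u'' + u' = 0 on (0,1), u(0)=0, u(1)=1, on n interior
    points. 'central' oscillates for eps << h; 'upwind' is monotone but
    adds numerical diffusion."""
    h = 1.0 / (n + 1)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(n):
        if scheme == "central":
            lo = -eps / h**2 - 1.0 / (2 * h)
            di = 2 * eps / h**2
            hi = -eps / h**2 + 1.0 / (2 * h)
        else:  # first-order upwinding (convection velocity = +1)
            lo = -eps / h**2 - 1.0 / h
            di = 2 * eps / h**2 + 1.0 / h
            hi = -eps / h**2
        if i > 0:
            A[i, i - 1] = lo
        A[i, i] = di
        if i < n - 1:
            A[i, i + 1] = hi
        else:
            b[i] -= hi * 1.0  # move the known boundary value u(1)=1 to the RHS
    return np.linalg.solve(A, b)
```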

We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which hold even when the transition dynamics are non-stationary. This new analysis provides a theoretical understanding of the strong performance of two recent actor-critic methods for MARL, both of which rely on independent ratios, i.e., probability ratios computed separately for each agent's policy. We show that, despite the non-stationarity that independent ratios cause, a monotonic improvement guarantee still arises from enforcing the trust region constraint over all decentralized policies. We also show that this trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training, providing a theoretical foundation for proximal ratio clipping. Finally, our empirical results support the hypothesis that the strong performance of IPPO and MAPPO is a direct result of enforcing such a trust region constraint via clipping in centralized training, and of tuning the hyperparameters with regard to the number of agents, as predicted by our theoretical analysis.
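The clipping mechanism referred to can be stated compactly. The sketch below is the usual PPO-style clipped surrogate applied to an independent per-agent ratio, as in IPPO/MAPPO; the analysis above suggests choosing the clipping range in light of the number of agents, so eps_clip here is only a placeholder hyperparameter.

```python
import torch

def clipped_surrogate(logp_new, logp_old, advantages, eps_clip=0.2):
    """PPO-style clipped surrogate on an independent ratio, computed
    separately for one agent's policy. Ratios outside
    [1 - eps_clip, 1 + eps_clip] receive no gradient, which enforces a
    trust region over the decentralized policies."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip)
    return torch.min(ratio * advantages, clipped * advantages).mean()
```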

We study the parameterized problem of satisfying ``almost all'' constraints of a given formula $F$ over a fixed, finite Boolean constraint language $\Gamma$, with or without weights. More precisely, for each finite Boolean constraint language $\Gamma$, we consider the following two problems. In Min SAT$(\Gamma)$, the input is a formula $F$ over $\Gamma$ and an integer $k$, and the task is to find an assignment $\alpha \colon V(F) \to \{0,1\}$ that satisfies all but at most $k$ constraints of $F$, or determine that no such assignment exists. In Weighted Min SAT$(\Gamma)$, the input additionally contains a weight function $w \colon F \to \mathbb{Z}_+$ and an integer $W$, and the task is to find an assignment $\alpha$ such that (1) $\alpha$ satisfies all but at most $k$ constraints of $F$, and (2) the total weight of the violated constraints is at most $W$. We give a complete dichotomy for the fixed-parameter tractability of these problems: we show that for every Boolean constraint language $\Gamma$, either Weighted Min SAT$(\Gamma)$ is FPT; or Weighted Min SAT$(\Gamma)$ is W[1]-hard but Min SAT$(\Gamma)$ is FPT; or Min SAT$(\Gamma)$ is W[1]-hard. This generalizes recent work of Kim et al. (SODA 2021), which did not consider weighted problems and considered only languages $\Gamma$ that cannot express implications $(u \to v)$ (as used, e.g., to model digraph cut problems). Our result generalizes and subsumes multiple previous results, including the FPT algorithms for Weighted Almost 2-SAT, weighted and unweighted $\ell$-Chain SAT, and Coupled Min-Cut, as well as weighted and directed versions of the latter. The main tool used in our algorithms is the recently developed method of directed flow-augmentation (Kim et al., STOC 2022).
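The problem definitions can be pinned down by a brute-force reference implementation: the sketch below checks assignments directly, is exponential in $|V(F)|$, and serves only as a specification of (Weighted) Min SAT$(\Gamma)$, not as the FPT algorithms of the paper. Constraints are modeled as hypothetical Python predicates over an assignment.

```python
from itertools import product

def min_sat_brute_force(formula, variables, k, weights=None, W=None):
    """Return an assignment violating at most k constraints (and, in the
    weighted variant, with total violated weight at most W), or None.
    Exponential-time specification only."""
    for bits in product((0, 1), repeat=len(variables)):
        alpha = dict(zip(variables, bits))
        violated = [i for i, c in enumerate(formula) if not c(alpha)]
        if len(violated) > k:
            continue
        if weights is not None and sum(weights[i] for i in violated) > W:
            continue
        return alpha
    return None

# example over a language containing implications: (u -> v), (v -> w)
formula = [lambda a: (not a["u"]) or a["v"], lambda a: (not a["v"]) or a["w"]]
print(min_sat_brute_force(formula, ["u", "v", "w"], k=0))
```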

Randomized orthogonal projection methods (ROPMs) can be used to speed up the computation of Krylov subspace methods in various contexts. Through a theoretical and numerical investigation, we establish that these methods produce quasi-optimal approximations over the Krylov subspace. Our numerical experiments illustrate the convergence of ROPMs for all matrices in our test set: occasional spikes occur, but the overall convergence rate is similar to that of standard OPMs.
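A minimal instance of the randomized-projection idea, under our reading of it: build a Krylov basis V, then replace the full orthogonal projection with a sketched least-squares solve using a Gaussian sketching operator S. The crude basis generation and the choice of sketch below are illustrative simplifications, not the ROPMs analyzed in the paper.

```python
import numpy as np

def sketched_krylov_solve(A, b, m, s, seed=None):
    """Approximate A x = b over an m-dimensional Krylov subspace by
    minimizing ||S(b - A V y)|| with an s x n Gaussian sketch S.
    V is a normalized power basis for brevity; in practice one would
    use (randomized) Gram-Schmidt for conditioning."""
    rng = np.random.default_rng(seed)
    n = len(b)
    V = np.zeros((n, m))
    v = b / np.linalg.norm(b)
    for j in range(m):
        V[:, j] = v
        v = A @ v
        v /= np.linalg.norm(v)
    S = rng.standard_normal((s, n)) / np.sqrt(s)   # Gaussian sketching operator
    y, *_ = np.linalg.lstsq(S @ (A @ V), S @ b, rcond=None)
    return V @ y
```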

In this paper we study the problem of multiclass classification with a bounded number of different labels $k$, in the realizable setting. We extend the traditional PAC model to a) distribution-dependent learning rates, and b) learning rates under data-dependent assumptions. First, we consider the universal learning setting (Bousquet, Hanneke, Moran, van Handel and Yehudayoff, STOC '21), for which we provide a complete characterization of the achievable learning rates that holds for every fixed distribution. In particular, we show the following trichotomy: for any concept class, the optimal learning rate is either exponential, linear or arbitrarily slow. Additionally, we provide complexity measures of the underlying hypothesis class that characterize when these rates occur. Second, we consider the problem of multiclass classification with structured data (such as data lying on a low dimensional manifold or satisfying margin conditions), a setting which is captured by partial concept classes (Alon, Hanneke, Holzman and Moran, FOCS '21). Partial concepts are functions that can be undefined in certain parts of the input space. We extend the traditional PAC learnability of total concept classes to partial concept classes in the multiclass setting and investigate differences between partial and total concepts.
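Informally, following the universal learning framework of Bousquet et al., a class is learnable at rate $R(n)$ if some learner $\hat h_n$ satisfies, for every realizable distribution $P$, $\mathbb{E}[\mathrm{er}_P(\hat h_n)] \le C\, R(c\, n)$ for all $n$, with constants $C, c > 0$ that may depend on $P$. The trichotomy above then says that the optimal rate is always one of
$$ e^{-n}, \qquad \frac{1}{n}, \qquad \text{or arbitrarily slow}. $$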

In this article, we study the effect of small cut elements on the critical time-step size in an immersogeometric context. We analyze different formulations for second-order (membrane) and fourth-order (shell-type) equations, and derive scaling relations between the critical time-step size and the cut-element size for various types of cuts. In particular, we focus on different approaches for the weak imposition of Dirichlet conditions: penalty enforcement and Nitsche's method. The stability requirement of Nitsche's method necessitates either a cut-size-dependent penalty parameter or an additional ghost-penalty stabilization term. Our findings show that both techniques suffer from cut-size-dependent critical time-step sizes, but adding a ghost-penalty term to the mass matrix mitigates this issue. We confirm that this form of `mass-scaling' does not adversely affect the error and convergence characteristics for a transient membrane example, and that it can increase the critical time-step size by orders of magnitude. Finally, for a prototypical simulation of a Kirchhoff-Love shell, our stabilized Nitsche formulation reduces the solution error by well over an order of magnitude compared to a penalty formulation at equal time-step size.
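The scaling relations referred to rest on the standard critical time-step estimate for undamped central-difference explicit dynamics, $\Delta t_{\mathrm{crit}} = 2/\omega_{\max}$, where $\omega_{\max}^2$ is the largest eigenvalue of $K v = \omega^2 M v$: small cut elements inflate $\omega_{\max}$ and hence shrink $\Delta t_{\mathrm{crit}}$. A minimal computation, assuming assembled (possibly ghost-penalty-augmented) matrices K and M:

```python
import numpy as np
from scipy.linalg import eigh

def critical_time_step(K, M):
    """dt_crit = 2 / omega_max for the central-difference scheme, with
    omega_max^2 the largest generalized eigenvalue of (K, M). Adding a
    ghost-penalty term to M (mass-scaling) lowers omega_max on small cut
    elements and thereby raises dt_crit."""
    omega_sq = eigh(K, M, eigvals_only=True)   # eigenvalues in ascending order
    return 2.0 / np.sqrt(omega_sq[-1])
```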

In group sequential analysis, data is collected and analyzed in batches until pre-defined stopping criteria are met. Inference in the parametric setup typically relies on the limiting asymptotic multivariate normality of the repeatedly computed maximum likelihood estimators (MLEs), a result first rigorously proved by Jennison and Turnbull (1997) under general regularity conditions. In this work, using Stein's method we provide optimal-order, non-asymptotic bounds on the smooth test function distance between the joint group sequential MLEs and the appropriate normal distribution under the same conditions. Our results assume independent observations but allow heterogeneous (i.e., non-identically distributed) data. We examine how the resulting bounds simplify when the data comes from an exponential family. Finally, we present a general result relating the multivariate Kolmogorov distance to the smooth function distance which, in addition to extending our results to the former metric, may be of independent interest.
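To make the object of study concrete, the sketch below simulates the vector of repeatedly computed MLEs over accumulating batches in the simplest smooth model (a normal mean with unit variance); it is the joint distribution of this vector whose proximity to a multivariate normal the bounds quantify. The normal model and the absence of a stopping rule are illustrative simplifications.

```python
import numpy as np

def group_sequential_mles(theta, batch_sizes, seed=None):
    """Simulate the joint vector of MLEs computed after each batch of an
    accumulating sample: here the MLE of a normal mean is simply the
    running sample mean over all data observed so far."""
    rng = np.random.default_rng(seed)
    data = np.array([])
    mles = []
    for n in batch_sizes:
        data = np.concatenate([data, rng.normal(theta, 1.0, size=n)])
        mles.append(data.mean())   # MLE on the cumulative sample
    return np.array(mles)
```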
