亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We develop the notion of discrete degrees of freedom of a log-concave sequence and use it to prove that geometric distribution minimises R\'enyi entropy of order infinity under fixed variance, among all discrete log-concave random variables in $\mathbb{Z}$. We also show that the quantity $\mathbb{P}(X=\mathbb{E} X)$ is maximised, among all ultra-log-concave random variables with fixed integral mean, for a Poisson distribution.

相關內容

In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called energy allocation, which captures the interaction between the searching algorithm and the geometry of objective functions. A matching upper bound is obtained by solving the optimal energy spectrum. Then, algorithmically, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotic optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions, which are enabled by a truncation method.

The algebraic degree is an important parameter of Boolean functions used in cryptography. When a function in a large number of variables is not given explicitly in algebraic normal form, it might not be feasible to compute its degree. Instead, one can try to estimate the degree using probabilistic tests. We propose a probabilistic test for deciding whether the algebraic degree of a Boolean function $f$ is below a certain value $k$. The test involves picking an affine space of dimension $k$ and testing whether the values on $f$ on that space sum up to zero. If $deg(f)<k$, then $f$ will always pass the test, otherwise it will sometimes pass and sometimes fail the test, depending on which affine space was chosen. The probability of failing the proposed test is closely related to the number of monomials of degree $k$ in a polynomial $g$, averaged over all the polynomials $g$ which are affine equivalent to $f$. We initiate the study of the probability of failing the proposed ``$deg(f)<k$'' test. We show that in the particular case when the degree of $f$ is actually equal to $k$, the probability will be in the interval $(0.288788, 0.5]$, and therefore a small number of runs of the test is sufficient to give, with very high probability, the correct answer. Exact values of this probability for all the polynomials in 8 variables were computed using the representatives listed by Hou and by Langevin and Leander.

We give improved and almost optimal testers for several classes of Boolean functions on $n$ inputs that have concise representation in the uniform and distribution-free model. Classes, such as $k$-junta, $k$-linear functions, $s$-term DNF, $s$-term monotone DNF, $r$-DNF, decision list, $r$-decision list, size-$s$ decision tree, size-$s$ Boolean formula, size-$s$ branching programs, $s$-sparse polynomials over the binary field and function with Fourier degree at most $d$. The method can be extended to several other classes of functions over any domain that can be approximated by functions that have a small number of relevant variables.

Connectivity is a fundamental structural property of matroids, and has been studied algorithmically over 50 years. In 1974, Cunningham proposed a deterministic algorithm consuming $O(n^{2})$ queries to the independence oracle to determine whether a matroid is connected. Since then, no algorithm, not even a random one, has worked better. To the best of our knowledge, the classical query complexity lower bound and the quantum complexity for this problem have not been considered. Thus, in this paper we are devoted to addressing these issues, and our contributions are threefold as follows: (i) First, we prove that the randomized query complexity of determining whether a matroid is connected is $\Omega(n^2)$ and thus the algorithm proposed by Cunningham is optimal in classical computing. (ii) Second, we present a quantum algorithm with $O(n^{3/2})$ queries, which exhibits provable quantum speedups over classical ones. (iii) Third, we prove that any quantum algorithm requires $\Omega(n)$ queries, which indicates that quantum algorithms can achieve at most a quadratic speedup over classical ones. Therefore, we have a relatively comprehensive understanding of the potential of quantum computing in determining the connectedness of matroids.\

We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, bringing out issues such as violation of constraints and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that updates the policy using an entropy-regularized policy gradient, and the dual via a quadratic-regularized gradient ascent, simultaneously. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point with a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update primal/dual variables, simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, with a linear rate. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.

This paper studies the online vector bin packing (OVBP) problem and the related problem of online hypergraph coloring (OHC). Firstly, we use a double counting argument to prove an upper bound of the competitive ratio of $FirstFit$ for OVBP. Our proof is conceptually simple, and strengthens the result in Azar et. al. by removing the dependency on the bin size parameter. Secondly, we introduce a notion of an online incidence matrix that is defined for every instance of OHC. Using this notion, we provide a reduction from OHC to OVBP, which allows us to carry known lower bounds of the competitive ratio of algorithms for OHC to OVBP. Our approach significantly simplifies the previous argument from Azar et. al. that relied on using intricate graph structures. In addition, we slightly improve their lower bounds. Lastly, we establish a tight bound of the competitive ratio of algorithms for OHC, where input is restricted to be a hypertree, thus resolving a conjecture in Nagy-Gyorgy et. al. The crux of this proof lies in solving a certain combinatorial partition problem about multi-family of subsets, which might be of independent interest.

Consider a two-person zero-sum search game between a Hider and a Searcher. The Hider chooses to hide in one of $n$ discrete locations (or "boxes") and the Searcher chooses a search sequence specifying which order to look in these boxes until finding the Hider. A search at box $i$ takes $t_i$ time units and finds the Hider - if hidden there - independently with probability $q_i$, for $i=1,\ldots,n$. The Searcher wants to minimize the expected total time needed to find the Hider, while the Hider wants to maximize it. It is shown in the literature that the Searcher has an optimal search strategy that mixes up to $n$ distinct search sequences with appropriate probabilities. This paper investigates the existence of optimal pure strategies for the Searcher - a single deterministic search sequence that achieves the optimal expected total search time regardless of where the Hider hides. We identify several cases in which the Searcher has an optimal pure strategy, and several cases in which such optimal pure strategy does not exist. An optimal pure search strategy has significant practical value because the Searcher does not need to randomize their actions and will avoid second guessing themselves if the chosen search sequence from an optimal mixed strategy does not turn out well.

We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for expectations and covariance matrices of the iterates are derived. In contrast with the widely cited connection between dropout and $\ell_2$-regularization in expectation, the results indicate a much more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. We also study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.

An algorithm is said to be adaptive to a certain parameter (of the problem) if it does not need a priori knowledge of such a parameter but performs competitively to those that know it. This dissertation presents our work on adaptive algorithms in following scenarios: 1. In the stochastic optimization setting, we only receive stochastic gradients and the level of noise in evaluating them greatly affects the convergence rate. Tuning is typically required when without prior knowledge of the noise scale in order to achieve the optimal rate. Considering this, we designed and analyzed noise-adaptive algorithms that can automatically ensure (near)-optimal rates under different noise scales without knowing it. 2. In training deep neural networks, the scales of gradient magnitudes in each coordinate can scatter across a very wide range unless normalization techniques, like BatchNorm, are employed. In such situations, algorithms not addressing this problem of gradient scales can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms that adapt to the gradient scales and presented its real benefits in empirical experiments. 3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet, this condition does not capture the properties of some deep learning objective functions, including the ones involving Long Short-Term Memory networks and Transformers. Instead, they satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this condition, we show that a generalized SignSGD algorithm can theoretically match the best-known convergence rates obtained by SGD with gradient clipping but does not need explicit clipping at all, and it can empirically match the performance of Adam and beat others. Moreover, it can also be made to automatically adapt to the unknown relaxed smoothness.

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.

北京阿比特科技有限公司