亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Tsallis and R\'{e}nyi entropies, which are monotone transformations of each other, are deformations of the celebrated Shannon entropy. Maximization of these deformed entropies, under suitable constraints, leads to the $q$-exponential family which has applications in non-extensive statistical physics, information theory and statistics. In previous information-geometric studies, the $q$-exponential family was analyzed using classical convex duality and Bregman divergence. In this paper, we show that a generalized $\lambda$-duality, where $\lambda = 1 - q$ is the constant information-geometric curvature, leads to a generalized exponential family which is essentially equivalent to the $q$-exponential family and has deep connections with R\'{e}nyi entropy and optimal transport. Using this generalized convex duality and its associated logarithmic divergence, we show that our $\lambda$-exponential family satisfies properties that parallel and generalize those of the exponential family. Under our framework, the R\'{e}nyi entropy and divergence arise naturally, and we give a new proof of the Tsallis/R\'{e}nyi entropy maximizing property of the $q$-exponential family. We also introduce a $\lambda$-mixture family which may be regarded as the dual of the $\lambda$-exponential family, and connect it with other mixture-type families. Finally, we discuss a duality between the $\lambda$-exponential family and the $\lambda$-logarithmic divergence, and study its statistical consequences.

相關內容

Spectral independence is a recently-developed framework for obtaining sharp bounds on the convergence time of the classical Glauber dynamics. This new framework has yielded optimal $O(n \log n)$ sampling algorithms on bounded-degree graphs for a large class of problems throughout the so-called uniqueness regime, including, for example, the problems of sampling independent sets, matchings, and Ising-model configurations. Our main contribution is to relax the bounded-degree assumption that has so far been important in establishing and applying spectral independence. Previous methods for avoiding degree bounds rely on using $L^p$-norms to analyse contraction on graphs with bounded connective constant (Sinclair, Srivastava, Yin; FOCS'13). The non-linearity of $L^p$-norms is an obstacle to applying these results to bound spectral independence. Our solution is to capture the $L^p$-analysis recursively by amortising over the subtrees of the recurrence used to analyse contraction. Our method generalises previous analyses that applied only to bounded-degree graphs. As a main application of our techniques, we consider the random graph $G(n,d/n)$, where the previously known algorithms run in time $n^{O(\log d)}$ or applied only to large $d$. We refine these algorithmic bounds significantly, and develop fast $n^{1+o(1)}$ algorithms based on Glauber dynamics that apply to all $d$, throughout the uniqueness regime.

Escaping from saddle points and finding local minimum is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach for this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the existing best stochastic gradient complexity for this type of algorithms is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.

The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the `complete' sample size $N$ is the same no matter whether in a complete or incomplete data case. For incomplete data, there are often only $N_i<N$ observations for variable $i$, which means that using the `complete' sample size $N$ implausibly ignores the amounts of missing information inherent in incomplete data. Given this observation, a novel criterion called hierarchical BIC (HBIC) for factor analysis with incomplete data is proposed. The novelty is that it only uses the actual amounts of observed information, namely $N_i$'s, in the penalty term. Theoretically, it is shown that HBIC is a large sample approximation of variational Bayesian (VB) lower bound, and BIC is a further approximation of HBIC, which means that HBIC shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to access the finite sample performance of HBIC, BIC, and related criteria with various missing rates. The results show that HBIC and BIC perform similarly when the missing rate is small, but HBIC is more accurate when the missing rate is not small.

In this paper we generalize Dillon's switching method to characterize the exact $c$-differential uniformity of functions constructed via this method. More precisely, we modify some PcN/APcN and other functions with known $c$-differential uniformity in a controllable number of coordinates to render more such functions. We present several applications of the method in constructing PcN and APcN functions with respect to all $c\neq 1$. As a byproduct, we generalize some result of [Y. Wu, N. Li, X. Zeng, {\em New PcN and APcN functions over finite fields}, Designs Codes Crypt. 89 (2021), 2637--2651]. Computational results rendering functions with low differential uniformity, as well as, other good cryptographic properties are sprinkled throughout the paper.

We propose a novel framework for learning a low-dimensional representation of data based on nonlinear dynamical systems, which we call dynamical dimension reduction (DDR). In the DDR model, each point is evolved via a nonlinear flow towards a lower-dimensional subspace; the projection onto the subspace gives the low-dimensional embedding. Training the model involves identifying the nonlinear flow and the subspace. Following the equation discovery method, we represent the vector field that defines the flow using a linear combination of dictionary elements, where each element is a pre-specified linear/nonlinear candidate function. A regularization term for the average total kinetic energy is also introduced and motivated by optimal transport theory. We prove that the resulting optimization problem is well-posed and establish several properties of the DDR method. We also show how the DDR method can be trained using a gradient-based optimization method, where the gradients are computed using the adjoint method from optimal control theory. The DDR method is implemented and compared on synthetic and example datasets to other dimension reductions methods, including PCA, t-SNE, and Umap.

Computing a dense subgraph is a fundamental problem in graph mining, with a diverse set of applications ranging from electronic commerce to community detection in social networks. In many of these applications, the underlying context is better modelled as a weighted hypergraph that keeps evolving with time. This motivates the problem of maintaining the densest subhypergraph of a weighted hypergraph in a {\em dynamic setting}, where the input keeps changing via a sequence of updates (hyperedge insertions/deletions). Previously, the only known algorithm for this problem was due to Hu et al. [HWC17]. This algorithm worked only on unweighted hypergraphs, and had an approximation ratio of $(1+\epsilon)r^2$ and an update time of $O(\text{poly} (r, \log n))$, where $r$ denotes the maximum rank of the input across all the updates. We obtain a new algorithm for this problem, which works even when the input hypergraph is weighted. Our algorithm has a significantly improved (near-optimal) approximation ratio of $(1+\epsilon)$ that is independent of $r$, and a similar update time of $O(\text{poly} (r, \log n))$. It is the first $(1+\epsilon)$-approximation algorithm even for the special case of weighted simple graphs. To complement our theoretical analysis, we perform experiments with our dynamic algorithm on large-scale, real-world data-sets. Our algorithm significantly outperforms the state of the art [HWC17] both in terms of accuracy and efficiency.

In this short note, we show that for any $\epsilon >0$ and $k<n^{0.5-\epsilon}$ the choice number of the Kneser graph $KG_{n,k}$ is $\Theta (n\log n)$.

The stochastic gradient Langevin Dynamics is one of the most fundamental algorithms to solve sampling problems and non-convex optimization appearing in several machine learning applications. Especially, its variance reduced versions have nowadays gained particular attention. In this paper, we study two variants of this kind, namely, the Stochastic Variance Reduced Gradient Langevin Dynamics and the Stochastic Recursive Gradient Langevin Dynamics. We prove their convergence to the objective distribution in terms of KL-divergence under the sole assumptions of smoothness and Log-Sobolev inequality which are weaker conditions than those used in prior works for these algorithms. With the batch size and the inner loop length set to $\sqrt{n}$, the gradient complexity to achieve an $\epsilon$-precision is $\tilde{O}((n+dn^{1/2}\epsilon^{-1})\gamma^2 L^2\alpha^{-2})$, which is an improvement from any previous analyses. We also show some essential applications of our result to non-convex optimization.

Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.

Convection-diffusion-reaction equations model the conservation of scalar quantities. From the analytic point of view, solution of these equations satisfy under certain conditions maximum principles, which represent physical bounds of the solution. That the same bounds are respected by numerical approximations of the solution is often of utmost importance in practice. The mathematical formulation of this property, which contributes to the physical consistency of a method, is called Discrete Maximum Principle (DMP). In many applications, convection dominates diffusion by several orders of magnitude. It is well known that standard discretizations typically do not satisfy the DMP in this convection-dominated regime. In fact, in this case, it turns out to be a challenging problem to construct discretizations that, on the one hand, respect the DMP and, on the other hand, compute accurate solutions. This paper presents a survey on finite element methods, with a main focus on the convection-dominated regime, that satisfy a local or a global DMP. The concepts of the underlying numerical analysis are discussed. The survey reveals that for the steady-state problem there are only a few discretizations, all of them nonlinear, that at the same time satisfy the DMP and compute reasonably accurate solutions, e.g., algebraically stabilized schemes. Moreover, most of these discretizations have been developed in recent years, showing the enormous progress that has been achieved lately. Methods based on algebraic stabilization, nonlinear and linear ones, are currently as well the only finite element methods that combine the satisfaction of the global DMP and accurate numerical results for the evolutionary equations in the convection-dominated situation.

北京阿比特科技有限公司