We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization, previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a game-theoretic algorithm that has $\widetilde{\mathcal{O}}(H^3S^2A/\varepsilon^2)$ sample complexity, thus improving the $\varepsilon$-dependence upon existing results, where $S$ is the number of states, $A$ is the number of actions, $H$ is the episode length, and $\varepsilon$ is the desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to entropy-regularized MDPs, and we propose a simple algorithm with a sample complexity of order $\widetilde{\mathcal{O}}(\mathrm{poly}(S,A,H)/\varepsilon)$. Interestingly, it is the first theoretical result in the RL literature that establishes the potential statistical advantage of regularized MDPs for exploration. Finally, we apply the developed regularization techniques to reduce the sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.
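
As a concrete illustration of the first objective, the sketch below (not the paper's algorithm; the shapes and the averaging convention are assumptions) computes the entropy of the average state visitation distribution of a fixed policy in a tabular MDP by forward recursion.

```python
# A minimal sketch of the visitation-entropy objective in a tabular MDP:
# given a policy, compute the average state visitation distribution by
# forward recursion and take its Shannon entropy.
import numpy as np

def visitation_entropy(P, pi, mu0, H):
    """P: (S, A, S) transition kernel, pi: (H, S, A) policy,
    mu0: (S,) initial distribution, H: episode length."""
    S, A, _ = P.shape
    d = mu0.copy()
    total = np.zeros(S)
    for h in range(H):
        total += d / H                      # average visitation over steps
        # one-step forward: d'(s') = sum_{s,a} d(s) pi_h(s,a) P(s,a,s')
        d = np.einsum("s,sa,sat->t", d, pi[h], P)
    nz = total > 0
    return -np.sum(total[nz] * np.log(total[nz]))
```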

Related content

We are interested in creating statistical methods to provide informative summaries of random fields through the geometry of their excursion sets. To this end, we introduce an estimator for the length of the perimeter of excursion sets of random fields on $\mathbb{R}^2$ observed over regular square tilings. The proposed estimator acts on the empirically accessible binary digital images of the excursion regions and computes the length of a piecewise linear approximation of the excursion boundary. The estimator is shown to be consistent as the pixel size decreases, without the need for any normalization constant, and without imposing Gaussianity or isotropy assumptions on the underlying random field. In this general framework, even when the domain grows to cover $\mathbb{R}^2$, the estimation error is shown to be of smaller order than the side length of the domain. For affine, strongly mixing random fields, this translates to a multivariate Central Limit Theorem for our estimator when multiple levels are considered simultaneously. Finally, we conduct several numerical studies to investigate the statistical properties of the proposed estimator in the finite-sample setting.
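
To make the construction concrete, here is a hedged Python sketch of a perimeter-type estimator of this flavor: it thresholds the field into a binary digital image, extracts a piecewise linear boundary via marching squares (using `skimage.measure.find_contours`), and sums the segment lengths. The paper's exact estimator may differ in how the approximation is built.

```python
# A hedged sketch of a perimeter estimator for the excursion set {f >= u}
# observed on a regular square tiling; not necessarily the exact estimator
# analyzed in the paper.
import numpy as np
from skimage import measure

def excursion_perimeter(field, level, pixel_size):
    """field: 2D array of observations, level: excursion level u,
    pixel_size: physical side length of one pixel."""
    binary = (field >= level).astype(float)
    length = 0.0
    for contour in measure.find_contours(binary, 0.5):
        seg = np.diff(contour, axis=0)          # consecutive vertex steps
        length += np.sqrt((seg ** 2).sum(axis=1)).sum()
    return length * pixel_size                  # scale to physical units

# Example: perimeter of the excursion set of white noise at level 0.
rng = np.random.default_rng(0)
print(excursion_perimeter(rng.standard_normal((256, 256)), 0.0, 1.0 / 256))
```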

An NP-complete graph decision problem, the "Multi-stage graph Simple Path" (abbr. MSP) problem, is introduced. The main contribution of this paper is a poly-time algorithm named the ZH algorithm for the problem, together with a proof of its correctness, which implies NP=P. (1) A crucial structural property is discovered, whereby all MSP instances are arranged into the sequence $G_{0}$, $G_{1}$, $G_{2}$, ... ($G_{k}$ essentially stands for a group of graphs, for all $k\geq 0$). For each $G_{j}$ ($j>0$) in the sequence, there is a graph $G_{i}$ ($0\leq i<j$) "mathematically homomorphic" to $G_{j}$ that agrees completely with $G_{j}$ on the existence of global solutions. This naturally opens the door to proving the correctness of an algorithm by mathematical induction. In previous attempts, algorithms used for making global decisions were mostly guided by heuristics and intuition. In contrast, the ZH algorithm is specifically designed to comply with the proposed proof framework of mathematical induction. (2) Although the ZH algorithm deals with paths, it always regards a path as a collection of edge sets. This is the key to avoiding exponential complexity. (3) Any poly-time algorithm that seeks global information can hardly avoid the errors caused by localized computation. In the ZH algorithm, the proposed reachable-path edge-set $R(e)$ and the information computed for it carry all the necessary contextual information, which can be utilized to summarize the "history" and to detect the "future" when searching for global solutions. (4) The relation between local strategies and global strategies is discovered and established, wherein preceding decisions can pose constraints on subsequent decisions (and vice versa). This interplay resembles the paradigm of dynamic programming, though it is much more convoluted. Nevertheless, the computation is always straightforward and decreases monotonically.

Recently developed reduced-order modeling techniques aim to approximate nonlinear dynamical systems on low-dimensional manifolds learned from data. This is an effective approach for modeling dynamics in a post-transient regime where the effects of initial conditions and other disturbances have decayed. However, modeling transient dynamics near an underlying manifold, as needed for real-time control and forecasting applications, is complicated by the effects of fast dynamics and nonnormal sensitivity mechanisms. To begin to address these issues, we introduce a parametric class of nonlinear projections described by constrained autoencoder neural networks in which both the manifold and the projection fibers are learned from data. Our architecture uses invertible activation functions and biorthogonal weight matrices to ensure that the encoder is a left inverse of the decoder. We also introduce new dynamics-aware cost functions that promote learning of oblique projection fibers that account for fast dynamics and nonnormality. To demonstrate these methods and the specific challenges they address, we provide a detailed case study of a three-state model of vortex shedding in the wake of a bluff body immersed in a fluid, which has a two-dimensional slow manifold that can be computed analytically. In anticipation of future applications to high-dimensional systems, we also propose several techniques for constructing computationally efficient reduced-order models using our proposed nonlinear projection framework. This includes a novel sparsity-promoting penalty for the encoder that avoids detrimental weight matrix shrinkage via computation on the Grassmann manifold.
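
The left-inverse property can be illustrated with a minimal numpy sketch, assuming linear biorthogonal weight pairs and an elementwise invertible activation ($\sinh$ here); the actual architecture and training procedure are of course richer.

```python
# A minimal sketch of the left-inverse pair idea: each decoder layer uses
# weights W with a biorthogonal counterpart V (V.T @ W = I) and an
# invertible activation, so the encoder composed with the decoder recovers
# z exactly. Dimensions and initializations are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 50, 20, 2                      # ambient, hidden, latent dims

W1 = rng.standard_normal((m, r))         # decoder layer 1 (r -> m)
W2 = rng.standard_normal((n, m))         # decoder layer 2 (m -> n)
V1 = W1 @ np.linalg.inv(W1.T @ W1)       # biorthogonal: V1.T @ W1 = I_r
V2 = W2 @ np.linalg.inv(W2.T @ W2)       # biorthogonal: V2.T @ W2 = I_m

act, act_inv = np.sinh, np.arcsinh       # invertible elementwise activation

def decode(z):                           # z -> x on the learned manifold
    return W2 @ act(W1 @ z)

def encode(x):                           # nonlinear projection back to z
    return V1.T @ act_inv(V2.T @ x)

z = rng.standard_normal(r)
print(np.allclose(encode(decode(z)), z))  # True: encoder is a left inverse
```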

We introduce an information-theoretic quantity with similar properties to mutual information that can be estimated from data without making explicit assumptions on the underlying distribution. This quantity is based on a recently proposed matrix-based entropy that uses the eigenvalues of a normalized Gram matrix to compute an estimate of the eigenvalues of an uncentered covariance operator in a reproducing kernel Hilbert space. We show that a difference of matrix-based entropies (DiME) is well suited for problems involving the maximization of mutual information between random variables. While many methods for such tasks can lead to trivial solutions, DiME naturally penalizes such outcomes. We compare DiME to several baseline estimators of mutual information on a toy Gaussian dataset. We provide examples of use cases for DiME, such as latent factor disentanglement and a multiview representation learning problem where DiME is used to learn a shared representation among views with high mutual information.
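
For intuition, a hedged sketch of the matrix-based entropy and one natural difference-of-entropies combination follows; the kernel, its bandwidth, and the exact combination used by DiME are illustrative assumptions, not the paper's precise construction.

```python
# A hedged sketch: the eigenvalues of a trace-normalized Gram matrix act
# as the spectrum of an uncentered covariance operator in an RKHS, and a
# mutual-information-like quantity arises as a difference of entropies.
import numpy as np

def gram(X, sigma=1.0):
    """Gaussian-kernel Gram matrix, normalized to unit trace."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    return K / np.trace(K)

def matrix_entropy(A):
    """von Neumann entropy -sum(lam * log lam) of a unit-trace PSD matrix."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]
    return -np.sum(lam * np.log(lam))

def entropy_difference(X, Y, sigma=1.0):
    A, B = gram(X, sigma), gram(Y, sigma)
    AB = A * B                           # Hadamard product plays "joint" role
    AB = AB / np.trace(AB)
    return matrix_entropy(A) + matrix_entropy(B) - matrix_entropy(AB)
```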

Coded distributed computing, proposed by Li et al., offers significant potential for reducing the communication load in MapReduce computing systems. In the setting of \emph{cascaded} coded distributed computing with $K$ nodes, $N$ input files, and $Q$ output functions, the objective is to compute each output function through $s\geq 1$ nodes with a computation load $r\geq 1$, enabling the application of coding techniques during the Shuffle phase to achieve the minimum communication load. However, a major limitation of most existing coded distributed computing schemes is that they require splitting the original data into an exponentially large number of input files, with $N/\binom{K}{r} \in\mathbb{N}$, and an exponentially large number of output functions, with $Q/\binom{K}{s} \in\mathbb{N}$, which imposes stringent requirements for implementation and results in significant coding complexity when $K$ is large. In this paper, we focus on the cascaded case of $K/s\in\mathbb{N}$, deliberately designing the input-file placement and output-function assignment strategies based on a grouping method, such that a low-complexity two-round Shuffle phase is available. The main advantages of our proposed scheme include: 1) the communication load is quite close to, or even better than, that of the optimal state-of-the-art scheme proposed by Li et al.; 2) our scheme requires a significantly smaller number of input files and output functions; 3) all operations are implemented over the minimum binary field $\mathbb{F}_2$.
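
For intuition only, the snippet below illustrates one plausible reading of the grouping idea (the concrete placement and assignment rules are assumptions, not the paper's scheme): with $K/s\in\mathbb{N}$, the $K$ nodes are partitioned into $K/s$ groups of size $s$, and each output function is assigned to all $s$ nodes of one group.

```python
# A minimal illustration (assumed details) of a grouping-based assignment.
K, s, Q = 6, 2, 6                       # example parameters with K/s integral
groups = [list(range(g * s, (g + 1) * s)) for g in range(K // s)]
# round-robin assignment: function q is computed by group q mod (K/s)
assignment = {q: groups[q % (K // s)] for q in range(Q)}
print(groups)       # [[0, 1], [2, 3], [4, 5]]
print(assignment)   # each function mapped to the s nodes of one group
```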

This paper elaborates on Conditional Handover (CHO) modelling, aimed at maximizing the use of contention-free random access (CFRA) during mobility. This is a desirable behavior, as CFRA increases the chance of a fast and successful handover. In CHO this may be especially challenging, as the time between the preparation and the actual cell change can be significantly longer than in non-conditional handover. Thus, new means to mitigate this issue need to be defined. We present a scheme in which beam-specific measurement reporting can lead to CFRA resource updating prior to CHO execution. We have run system-level simulations to confirm that the proposed solution increases the ratio of CFRA attempts during CHO. In the best-case scenario, we observe a gain exceeding 13%. We also show how the average delay of completing the handover is reduced. To provide the full picture, we show at what expense these gains are achieved by analyzing the increased signaling overhead for updating the random access resources. The study has been conducted for various network settings, considering the higher frequency ranges at which the user communicates with the network. Finally, we provide an outlook on future extensions of the investigated solution.

According to ICH Q8 guidelines, the biopharmaceutical manufacturer submits a design space (DS) definition as part of the regulatory approval application, in which case process parameter (PP) deviations within this space are not considered a change and do not trigger a regulatory post approval procedure. A DS can be described by non-linear PP ranges, i.e., the range of one PP conditioned on specific values of another. However, independent PP ranges (linear combinations) are often preferred in biopharmaceutical manufacturing due to their operational simplicity. While some statistical software supports the calculation of a DS comprised of linear combinations, such methods are generally based on discretizing the parameter space - an approach that scales poorly as the number of PPs increases. Here, we introduce a novel method for finding linear PP combinations using a numeric optimizer to calculate the largest design space within the parameter space for which the critical quality attribute (CQA) boundaries, predicted by a regression model, remain within acceptance criteria. A precomputed approximation of tolerance intervals is used in the inequality constraints to facilitate fast evaluation of this boundary using a single matrix multiplication. The correctness of the method was validated against different ground truths with known design spaces. Compared to state-of-the-art, grid-based approaches, the optimizer-based procedure is more accurate, generally yields a larger DS, and enables the calculation in higher dimensions. Furthermore, a proposed weighting scheme can be used to favor certain PPs over others, thereby enabling a more dynamic approach to DS definition and exploration. The increased PP ranges of the larger DS provide greater operational flexibility for biopharmaceutical manufacturers.
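
A hedged sketch of the optimizer-based idea follows, with a hypothetical linear CQA model, acceptance limit, and tolerance margin; for a model monotone in each PP, enforcing the constraints at the corners of the box bounds the entire box.

```python
# A hedged sketch (hypothetical model and limits): decision variables are
# independent lower/upper PP bounds; the objective rewards a large box; the
# constraints require the regression-predicted CQA plus a precomputed
# tolerance margin to stay within the acceptance limit at every box corner.
import itertools
import numpy as np
from scipy.optimize import minimize

beta = np.array([1.0, 0.8, -0.5])        # hypothetical model: y = [1, p1, p2] @ beta
margin, upper_limit = 0.3, 2.0           # precomputed margin, CQA acceptance limit

def neg_volume(x):                       # x = [lo1, lo2, hi1, hi2]
    lo, hi = x[:2], x[2:]
    return -np.prod(hi - lo)

def cqa_slack(x):
    lo, hi = x[:2], x[2:]
    corners = np.array(list(itertools.product(*zip(lo, hi))))
    preds = np.c_[np.ones(len(corners)), corners] @ beta
    return upper_limit - (preds + margin)          # must be >= 0 at all corners

res = minimize(neg_volume, x0=[0.1, 0.1, 0.4, 0.4],
               constraints=[{"type": "ineq", "fun": cqa_slack},
                            {"type": "ineq", "fun": lambda x: x[2:] - x[:2] - 1e-3}],
               bounds=[(0, 1)] * 4)
print(res.x)        # largest independent PP ranges found by the optimizer
```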

Contrastive learning on graphs aims at extracting distinguishable high-level representations of nodes. In this paper, we theoretically illustrate that the entropy of a dataset can be approximated by maximizing the lower bound of the mutual information across different views of a graph, i.e., entropy is estimated by a neural network. Based on this finding, we propose a simple yet effective subset sampling strategy to contrast pairwise representations between views of a dataset. In particular, we randomly sample nodes and edges from a given graph to build the input subset for a view. Two views are fed into a parameter-shared Siamese network to extract the high-dimensional embeddings and estimate the information entropy of the entire graph. For the learning process, we propose to optimize the network using two objectives simultaneously. Concretely, the input of the contrastive loss function consists of positive and negative pairs. Our pair selection strategy differs from previous works: we present a novel strategy that enhances the representation ability of the graph encoder by selecting nodes based on cross-view similarities. We enrich the diversity of the positive and negative pairs by selecting highly similar samples and totally different data, respectively, with the guidance of cross-view similarity scores. We also introduce a cross-view consistency constraint on the representations generated from the different views. This objective guarantees that the learned representations are consistent across views from the perspective of the entire graph. We conduct extensive experiments on seven graph benchmarks, and the proposed approach achieves competitive performance compared to the current state-of-the-art methods. The source code will be publicly released once this paper is accepted.
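
The subset-sampling step can be sketched as follows (the keep ratios and the dense-adjacency representation are illustrative assumptions, not the paper's exact recipe): each view keeps a random subset of nodes and of the edges among the kept nodes, and the two views then go through a shared encoder.

```python
# A minimal sketch of building two stochastic views of a graph by
# subsampling nodes and edges from its adjacency matrix.
import numpy as np

def sample_view(adj, node_keep=0.8, edge_keep=0.9, rng=None):
    rng = rng or np.random.default_rng()
    n = adj.shape[0]
    nodes = np.sort(rng.choice(n, size=int(node_keep * n), replace=False))
    sub = adj[np.ix_(nodes, nodes)].copy()
    # drop a random fraction of the surviving edges (keeping symmetry)
    iu = np.triu_indices_from(sub, k=1)
    drop = rng.random(iu[0].size) > edge_keep
    sub[iu[0][drop], iu[1][drop]] = 0
    sub[iu[1][drop], iu[0][drop]] = 0
    return nodes, sub

rng = np.random.default_rng(0)
adj = (rng.random((10, 10)) < 0.3).astype(int)
adj = np.triu(adj, 1); adj = adj + adj.T                 # symmetric toy graph
view1, view2 = sample_view(adj), sample_view(adj)        # two stochastic views
```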

Stochastic gradient descent (SGD) is the simplest deep learning optimizer with which to train deep neural networks. While SGD can use various learning rates, such as constant or diminishing rates, previous numerical results showed that SGD performs better than other deep learning optimizers when it uses learning rates given by line search methods. In this paper, we perform a convergence analysis of SGD with a learning rate given by an Armijo line search for nonconvex optimization. The analysis indicates that the upper bound of the expectation of the squared norm of the full gradient becomes small when the number of steps and the batch size are large. Next, we show that, for SGD with the Armijo-line-search learning rate, the number of steps needed for nonconvex optimization is a monotone decreasing convex function of the batch size; that is, the number of steps needed for nonconvex optimization decreases as the batch size increases. Furthermore, we show that the stochastic first-order oracle (SFO) complexity, which is the stochastic gradient computation cost, is a convex function of the batch size; that is, there exists a critical batch size that minimizes the SFO complexity. Finally, we provide numerical results that support our theoretical results. The numerical results indicate that the number of steps needed for training deep neural networks decreases as the batch size increases and that there exist critical batch sizes that can be estimated from the theoretical results.
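
For concreteness, here is a self-contained sketch of one SGD step with Armijo backtracking on the mini-batch loss; the constant $c$ and the shrink factor are common default choices, not values prescribed by the paper.

```python
# One SGD step with an Armijo backtracking line search on the mini-batch
# loss: shrink the step until sufficient decrease holds.
import numpy as np

def armijo_sgd_step(w, loss, grad, lr0=1.0, c=1e-4, shrink=0.5, max_iter=30):
    """loss/grad are evaluated on the current mini-batch."""
    f0, g = loss(w), grad(w)
    lr, g2 = lr0, np.dot(g, g)
    for _ in range(max_iter):
        if loss(w - lr * g) <= f0 - c * lr * g2:   # Armijo condition
            break
        lr *= shrink                               # backtrack
    return w - lr * g, lr

# toy quadratic mini-batch loss
X = np.random.default_rng(0).standard_normal((32, 5)); y = X @ np.ones(5)
loss = lambda w: 0.5 * np.mean((X @ w - y) ** 2)
grad = lambda w: X.T @ (X @ w - y) / len(y)
w, lr = armijo_sgd_step(np.zeros(5), loss, grad)
```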

A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In this work, we argue that existing pretext tasks inevitably introduce biases into the learned representation, which in turn leads to biased transfer performance on various downstream tasks. To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks. Inspired by the principle of maximum entropy in information theory, we hypothesize that a generalizable representation should be the one that admits the maximum entropy among all plausible representations. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective that allows fast computation. Extensive experiments demonstrate that MEC learns a more generalizable representation than previous methods based on specific pretext tasks. It consistently achieves state-of-the-art performance on various downstream tasks, including not only ImageNet linear probing, but also semi-supervised classification, object detection, instance segmentation, and object tracking. Interestingly, we show that existing batch-wise and feature-wise self-supervised objectives can be seen as low-order approximations of MEC. Code and pre-trained models are available at //github.com/xinliu20/MEC.
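
As a hedged sketch of the coding-length surrogate, the snippet below computes the minimal lossy coding length of Ma et al. (a log-determinant of a regularized feature covariance) on $\ell_2$-normalized features; the precise cross-view formulation and constants used in MEC are assumptions here, not the paper's exact objective.

```python
# A hedged sketch: the lossy coding length L = (m+d)/2 * logdet(I + d/(m eps^2) Z^T Z)
# serves as a tractable surrogate for the entropy of a batch of features;
# maximizing it pushes the representation toward maximum entropy.
import torch

def coding_length(z, eps=0.05):
    """z: (m, d) batch of features; rows are l2-normalized inside."""
    m, d = z.shape
    z = torch.nn.functional.normalize(z, dim=1)
    cov = z.T @ z * (d / (m * eps**2))
    return 0.5 * (m + d) * torch.logdet(torch.eye(d) + cov)

z1, z2 = torch.randn(128, 64), torch.randn(128, 64)   # two views' embeddings
# maximize the coding length of the averaged cross-view representation
loss = -coding_length((z1 + z2) / 2)
```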
