
The universal approximation property of width-bounded networks has been studied as a dual of the classical universal approximation theorem for depth-bounded ones. There have been several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for the universal approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ when the activation function is ReLU-like (e.g., ReLU, GELU, Softplus). Compared to the known result $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\mathbb R^{d_x}$, our result is the first to show that approximation on a compact domain requires a smaller width than approximation on $\mathbb R^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le 2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximations for general activation functions and input/output dimensions.
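For a concrete instance of the formulas above (an illustration, not an additional result), take $d_x = 3$ and $d_y = 2$:
$$ \max\{d_x, d_y, 2\} = 3 \quad \text{on } [0,1]^{d_x}, \qquad \max\{d_x + 1, d_y\} = 4 \quad \text{on } \mathbb R^{d_x}, $$
so $L^p$ approximation on the compact domain gets by with one fewer neuron per hidden layer than on the unbounded domain.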

Related Content

Quantum computing has recently emerged as a transformative technology. Yet, its promised advantages rely on efficiently translating quantum operations into viable physical realizations. In this work, we use generative machine learning models, specifically denoising diffusion models (DMs), to facilitate this transformation. Leveraging text-conditioning, we steer the model to produce desired quantum operations within gate-based quantum circuits. Notably, DMs allow us to sidestep, during training, the exponential overhead inherent in the classical simulation of quantum dynamics, a consistent bottleneck in preceding ML techniques. We demonstrate the model's capabilities across two tasks: entanglement generation and unitary compilation. The model excels at generating new circuits and supports typical DM extensions such as masking and editing to, for instance, align circuit generation with the constraints of the targeted quantum device. Given their flexibility and generalization abilities, we envision DMs as pivotal in quantum circuit synthesis, enhancing both practical applications and insight into theoretical quantum computation.
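For orientation, a generic text-conditioned denoising step (the standard DDPM form; the exact parameterization used in this work may differ) produces a circuit representation $x_0$ by iterating
$$ p_\theta(x_{t-1} \mid x_t, c) = \mathcal N\!\big(x_{t-1};\, \mu_\theta(x_t, t, c),\, \sigma_t^2 I\big), \qquad t = T, T-1, \dots, 1, $$
where $c$ is the embedding of the text prompt (e.g., the target operation) and the final $x_0$ is decoded into a gate sequence.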

In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence on both backpropagation and training. Yet, in the real world, the default 32-bit precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations in backpropagation outputs, which occur around half of the time at 32-bit precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. For our experiments on ImageNet, the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also show that reconditioning approaches such as batch normalization or Adam tend to buffer the influence of the value of ReLU'(0). Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
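A minimal numerical sketch of the underlying effect (a toy example with assumed values, not the paper's experimental setup): when a pre-activation cancels to exactly zero in float32, the convention chosen for ReLU'(0) changes the backpropagated gradient.

    import numpy as np

    # Toy illustration (hypothetical values): the pre-activation w*x + b is
    # exactly 0.0 in float32, so the gradient depends on the chosen ReLU'(0).
    x, w, b = np.float32(0.1), np.float32(1.0), np.float32(-0.1)
    z = w * x + b                        # pre-activation, exactly 0.0 here
    for g0 in (0.0, 1.0):                # the two conventions for ReLU'(0)
        dz = g0 if z == 0 else float(z > 0)
        dw = dz * x                      # d relu(w*x + b) / dw
        print(f"ReLU'(0) = {g0}: gradient w.r.t. w = {dw}")

With constants that do not cancel exactly, the two conventions coincide, which is why the effect is tied to finite precision rather than to the mathematical model.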

We study stability properties of the expected utility function in Bayesian optimal experimental design. We provide a framework for this problem in a non-parametric setting and prove a convergence rate of the expected utility with respect to a likelihood perturbation. This rate is uniform over the design space, and its sharpness in the general setting is demonstrated by proving a lower bound in a special case. To make the problem more concrete, we then consider non-linear Bayesian inverse problems with Gaussian likelihood and prove that the assumptions set out for the general case are satisfied, recovering the stability of the expected utility with respect to perturbations of the observation map. Theoretical convergence rates are demonstrated numerically in three different examples.
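For orientation, a common choice of the expected utility in this setting is the expected information gain (the general framework studied here allows other utilities):
$$ U(d) = \int\!\!\int \log \frac{p(y \mid \theta, d)}{p(y \mid d)}\; p(y \mid \theta, d)\, \pi(\theta)\, \mathrm{d}y\, \mathrm{d}\theta, \qquad p(y \mid d) = \int p(y \mid \theta, d)\, \pi(\theta)\, \mathrm{d}\theta, $$
so a perturbation of the likelihood $p(y \mid \theta, d)$ propagates to $U(d)$ both through the integrand and through the evidence term.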

The synthesis of information deriving from complex networks is a topic of increasing relevance in ecology and environmental sciences. In particular, the aggregation of multilayer networks, i.e. network structures formed by multiple interacting networks (the layers), constitutes a fast-growing field. In several environmental applications, the layers of a multilayer network are modelled as a collection of similarity matrices describing how similar pairs of biological entities are, based on different types of features (e.g. biological traits). The present paper first discusses two main techniques for combining the multilayer information into a single network (the so-called monoplex), namely Similarity Network Fusion (SNF) and Similarity Matrix Average (SMA). Then, the effectiveness of the two methods is tested on a real-world dataset of the relative abundance of microbial species in the ecosystems of nine glaciers (four glaciers in the Alps and five in the Andes). A preliminary clustering analysis on the monoplexes obtained with different methods shows the emergence of a tightly connected community formed by species that are typical of cryoconite holes worldwide. Moreover, the weights assigned to different layers by the SMA algorithm suggest that two large South American glaciers (Exploradores and Perito Moreno) are structurally different from the smaller glaciers in both Europe and South America. Overall, these results highlight the importance of integration methods in the discovery of the underlying organizational structure of biological entities in multilayer ecological networks.
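As a minimal sketch of the aggregation step (equal layer weights are assumed here purely for illustration, whereas SMA assigns data-driven weights to the layers and SNF fuses the layers iteratively):

    import numpy as np

    # Toy example: three similarity matrices (layers) over the same 5 entities.
    rng = np.random.default_rng(0)
    layers = [rng.random((5, 5)) for _ in range(3)]
    layers = [(S + S.T) / 2 for S in layers]                 # enforce symmetry
    weights = np.full(len(layers), 1 / len(layers))          # placeholder equal weights
    monoplex = sum(w * S for w, S in zip(weights, layers))   # aggregated single network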

We establish conditions under which latent causal graphs are nonparametrically identifiable and can be reconstructed from unknown interventions in the latent space. Our primary focus is the identification of the latent structure in measurement models without parametric assumptions such as linearity or Gaussianity. Moreover, we do not assume the number of hidden variables is known, and we show that at most one unknown intervention per hidden variable is needed. This extends a recent line of work on learning causal representations from observations and interventions. The proofs are constructive and introduce two new graphical concepts -- imaginary subsets and isolated edges -- that may be useful in their own right. As a matter of independent interest, the proofs also involve a novel characterization of the limits of edge orientations within the equivalence class of DAGs induced by unknown interventions. These are the first results to characterize the conditions under which causal representations are identifiable without making any parametric assumptions in a general setting with unknown interventions and without faithfulness.

To improve the predictability of complex computational models in experimentally unknown domains, we propose a Bayesian statistical machine learning framework, utilizing the Dirichlet distribution, that combines the results of several imperfect models. This framework can be viewed as an extension of Bayesian stacking. To illustrate the method, we study the ability of Bayesian model averaging and mixing techniques to mine nuclear masses. We show that the global and local mixtures of models reach excellent performance in both prediction accuracy and uncertainty quantification and are preferable to classical Bayesian model averaging. Additionally, our statistical analysis indicates that improving model predictions through mixing, rather than through mixing of corrected models, leads to more robust extrapolations.
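A hypothetical sketch of Dirichlet-weighted model mixing (illustrative only; the hierarchical framework proposed here is richer than this):

    import numpy as np

    # Toy setup: K imperfect models each predict the same 10 quantities.
    rng = np.random.default_rng(0)
    K = 3
    preds = rng.normal(size=(K, 10))             # hypothetical model predictions
    alpha = np.ones(K)                            # symmetric Dirichlet prior
    w_samples = rng.dirichlet(alpha, size=500)   # draws of the mixing weights
    mixed = w_samples @ preds                     # mixture prediction per draw
    mean, sd = mixed.mean(axis=0), mixed.std(axis=0)  # point estimate and spread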

Uniform continuity bounds on entropies are generally expressed in terms of a single distance measure between a pair of probability distributions or quantum states, typically the total variation distance or trace distance. However, if an additional distance measure between the probability distributions or states is known, then the continuity bounds can be significantly strengthened. Here, we prove a tight uniform continuity bound for the Shannon entropy in terms of both the local- and total variation distances, sharpening an inequality proven in [I. Sason, IEEE Trans. Inf. Th., 59, 7118 (2013)]. We also obtain a uniform continuity bound for the von Neumann entropy in terms of both the operator norm- and trace distances. The bound is tight when the quotient of the trace distance by the operator norm distance is an integer. We then apply our results to compute upper bounds on the quantum- and private classical capacities of channels. We begin by refining the concept of approximately degradable channels, namely, $\varepsilon$-degradable channels, which are, by definition, $\varepsilon$-close in diamond norm to their complementary channel when composed with a degrading channel. To this end, we introduce the notion of $(\varepsilon,\nu)$-degradable channels; these are $\varepsilon$-degradable channels that are, in addition, $\nu$-close in completely bounded spectral norm to their complementary channel, when composed with the same degrading channel. This allows us to derive improved upper bounds on the quantum- and private classical capacities of such channels. Moreover, these bounds can be further improved by considering certain unstabilized versions of the above norms. We show that upper bounds on the latter can be efficiently expressed as semidefinite programs. We illustrate our results by obtaining a new upper bound on the quantum capacity of the qubit depolarizing channel.
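For reference, the prototypical single-distance bound that such results strengthen is the Fannes-Audenaert inequality: for states $\rho, \sigma$ on a $d$-dimensional system with trace distance $T = \tfrac12\|\rho - \sigma\|_1 \le 1 - 1/d$,
$$ |S(\rho) - S(\sigma)| \le T \log(d-1) + h_2(T), $$
where $h_2$ is the binary entropy. The bounds described above refine this type of estimate by exploiting a second distance measure alongside $T$.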

We show a deterministic constant-time local algorithm for constructing an approximately maximum flow and minimum fractional cut in multisource-multitarget networks with bounded degrees and bounded edge capacities. Locality means that the decision we make about each edge depends only on its constant-radius neighborhood. We show two applications of the algorithm: one is related to the Aldous-Lyons Conjecture, and the other is about approximating the neighborhood distribution of graphs by bounded-size graphs. The scope of our results can be extended to unimodular random graphs and networks. As a corollary, we generalize the Maximum Flow Minimum Cut Theorem to unimodular random flow networks.
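For context, the classical theorem being generalized states that in a finite network with source $s$, sink $t$, and edge capacities $c$,
$$ \max_{f}\, \mathrm{val}(f) \;=\; \min_{S:\; s \in S,\; t \notin S}\; \sum_{(u,v) \in E,\; u \in S,\; v \notin S} c(u,v), $$
i.e., the maximum flow value equals the minimum cut capacity; the corollary above extends this identity to unimodular random flow networks.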

Cloud computing and the evolution of management methodologies such as Lean Management or Agile entail a profound transformation in both system construction and maintenance approaches. These practices are encompassed within the term "DevOps." This descriptive approach to an information system or application, alongside the configuration of its constituent components, has necessitated the development of descriptive languages paired with specialized engines for automating systems administration tasks. Among these, the tandem of Ansible (engine) and YAML (descriptive language) stands out as the two most prevalent tools on the market, facing notable competition mainly from Terraform. The current document presents an inquiry into a solution for generating and managing Ansible YAML roles and playbooks, utilizing generative LLMs (Large Language Models) to translate human descriptions into code. Our efforts are focused on identifying plausible directions and outlining the potential industrial applications. Note: for the purposes of this experiment, we have opted against the use of Ansible Lightspeed. This is due to its reliance on an IBM Watson model, for which we have not found any publicly available references. Comprehensive information regarding this remarkable technology can be found [1] directly on the website of our partner, Red Hat.
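A minimal sketch of the intended translation interface (illustrative only; llm_generate below is a hypothetical stand-in for whichever text-generation backend is used, not a real library call):

    # Hypothetical sketch: 'llm_generate' is a placeholder for any callable that
    # maps a prompt string to generated text; it is not a real library API.
    def description_to_playbook(description: str, llm_generate) -> str:
        prompt = (
            "Translate the following operations description into an Ansible "
            "playbook written in YAML. Return only the YAML.\n\n" + description
        )
        return llm_generate(prompt)

    # Usage: yaml_text = description_to_playbook("Install nginx on all web servers", llm_generate)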

Deep learning is usually described as an experiment-driven field under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature, which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamical systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamical systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intensive concerns about ethics and security and their relationships with generalizability.
