亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

A recent development in Bayesian optimization is the use of local optimization strategies, which can deliver strong empirical performance on high-dimensional problems compared to traditional global strategies. The "folk wisdom" in the literature is that the focus on local optimization sidesteps the curse of dimensionality; however, little is known concretely about the expected behavior or convergence of Bayesian local optimization routines. We first study the behavior of the local approach, and find that the statistics of individual local solutions of Gaussian process sample paths are surprisingly good compared to what we would expect to recover from global methods. We then present the first rigorous analysis of such a Bayesian local optimization algorithm recently proposed by M\"uller et al. (2021), and derive convergence rates in both the noisy and noiseless settings.

相關內容

Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.

Zero-shot learning (ZSL) aims to identify unseen classes with zero samples during training. Broadly speaking, present ZSL methods usually adopt class-level semantic labels and compare them with instance-level semantic predictions to infer unseen classes. However, we find that such existing models mostly produce imbalanced semantic predictions, i.e. these models could perform precisely for some semantics, but may not for others. To address the drawback, we aim to introduce an imbalanced learning framework into ZSL. However, we find that imbalanced ZSL has two unique challenges: (1) Its imbalanced predictions are highly correlated with the value of semantic labels rather than the number of samples as typically considered in the traditional imbalanced learning; (2) Different semantics follow quite different error distributions between classes. To mitigate these issues, we first formalize ZSL as an imbalanced regression problem which offers empirical evidences to interpret how semantic labels lead to imbalanced semantic predictions. We then propose a re-weighted loss termed Re-balanced Mean-Squared Error (ReMSE), which tracks the mean and variance of error distributions, thus ensuring rebalanced learning across classes. As a major contribution, we conduct a series of analyses showing that ReMSE is theoretically well established. Extensive experiments demonstrate that the proposed method effectively alleviates the imbalance in semantic prediction and outperforms many state-of-the-art ZSL methods. Our code is available at //github.com/FouriYe/ReZSL-TIP23.

In Statistical Relational Artificial Intelligence, a branch of AI and machine learning which combines the logical and statistical schools of AI, one uses the concept {\em para\-metrized probabilistic graphical model (PPGM)} to model (conditional) dependencies between random variables and to make probabilistic inferences about events on a space of "possible worlds". The set of possible worlds with underlying domain $D$ (a set of objects) can be represented by the set $\mathbf{W}_D$ of all first-order structures (for a suitable signature) with domain $D$. Using a formal logic we can describe events on $\mathbf{W}_D$. By combining a logic and a PPGM we can also define a probability distribution $\mathbb{P}_D$ on $\mathbf{W}_D$ and use it to compute the probability of an event. We consider a logic, denoted $PLA$, with truth values in the unit interval, which uses aggregation functions, such as arithmetic mean, geometric mean, maximum and minimum instead of quantifiers. However we face the problem of computational efficiency and this problem is an obstacle to the wider use of methods from Statistical Relational AI in practical applications. We address this problem by proving that the described probability will, under certain assumptions on the PPGM and the sentence $\varphi$, converge as the size of $D$ tends to infinity. The convergence result is obtained by showing that every formula $\varphi(x_1, \ldots, x_k)$ which contains only "admissible" aggregation functions (e.g. arithmetic and geometric mean, max and min) is asymptotically equivalent to a formula $\psi(x_1, \ldots, x_k)$ without aggregation functions.

Guided by the principles of differential privacy protection the Australian Bureau of Statistics modifies the data summaries from the Australian Census provided through TableBuilder to researchers at approved institutions. This modification algorithm includes the injection of a small degree of artificial noise to every nonzero cell count followed by the suppression of very small cell counts to zero. Researchers working with small area TableBuilder outputs with a high suppression fraction have proposed various algorithmic solutions to reconciling these with less suppressed outputs from larger enclosing areas. Here we propose that a Bayesian, likelihood-based statistical approach in which the perturbation algorithm itself is explicitly represented is well suited to analyses with such randomly perturbed data. Using both real (TableBuilder) and mock datasets representing dwelling classifications in the Perth Greater Capital City Area we demonstrate the feasibility and utility of multi-scale Bayesian reconstruction of modified cell counts in a spatial setting.

The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing to incorporate expert judgement in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.

Density power divergence (DPD) [Basu et al. (1998), Biometrika], designed to estimate the underlying distribution of the observations robustly, comprises an integral term of the power of the parametric density models to be estimated. While the explicit form of the integral term can be obtained for some specific densities (such as normal density and exponential density), its computational intractability has prohibited the application of DPD-based estimation to more general parametric densities, over a quarter of a century since the proposal of DPD. This study proposes a stochastic optimization approach to minimize DPD for general parametric density models and explains its adequacy by referring to conventional theories on stochastic optimization. The proposed approach also can be applied to the minimization of another density power-based $\gamma$-divergence with the aid of unnormalized models [Kanamori and Fujisawa (2015), Biometrika].

In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows to derive near matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.

This paper presents a decentralized algorithm for solving distributed convex optimization problems in dynamic networks with time-varying objectives. The unique feature of the algorithm lies in its ability to accommodate a wide range of communication systems, including previously unsupported ones, by abstractly modeling the information exchange in the network. Specifically, it supports a novel communication protocol based on the "over-the-air" function computation (OTA-C) technology, that is designed for an efficient and truly decentralized implementation of the consensus step of the algorithm. Unlike existing OTA-C protocols, the proposed protocol does not require the knowledge of network graph structure or channel state information, making it particularly suitable for decentralized implementation over ultra-dense wireless networks with time-varying topologies and fading channels. Furthermore, the proposed algorithm synergizes with the "superiorization" methodology, allowing the development of new distributed algorithms with enhanced performance for the intended applications. The theoretical analysis establishes sufficient conditions for almost sure convergence of the algorithm to a common time-invariant solution for all agents, assuming such a solution exists. Our algorithm is applied to a real-world distributed random field estimation problem, showcasing its efficacy in terms of convergence speed, scalability, and spectral efficiency. Furthermore, we present a superiorized version of our algorithm that achieves faster convergence with significantly reduced energy consumption compared to the unsuperiorized algorithm.

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.

This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

北京阿比特科技有限公司