
We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a black-box fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$, running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by depth-$d$ decision trees, where $m$ is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from $\mathcal{D}$, and for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art -- results of independent interest in distribution learning.
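
To make the combine step concrete, here is a minimal Python sketch, assuming the decision tree decomposition has already been computed. The names `leaf_of` (mapping a point to the index of the subcube containing it) and `uniform_learner` are hypothetical placeholders for illustration, not the paper's actual interface.

```python
# A hedged sketch of the lifting strategy: train one uniform-distribution
# hypothesis per subcube (tree leaf), then combine them by routing each
# input through the tree. `leaf_of` and `uniform_learner` are assumed.

def lift_uniform_learner(leaf_of, uniform_learner, labeled_samples):
    # Group labeled samples by the subcube (leaf) they fall into.
    buckets = {}
    for x, y in labeled_samples:
        buckets.setdefault(leaf_of(x), []).append((x, y))

    # Within each subcube the distribution is (approximately) uniform,
    # so the uniform-distribution learner applies there.
    hypotheses = {leaf: uniform_learner(data) for leaf, data in buckets.items()}

    # The final hypothesis routes a point to its subcube's learner.
    def combined(x):
        return hypotheses[leaf_of(x)](x)

    return combined
```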

Related Content

A decision tree is a decision-analysis method that, given the probabilities of various outcomes, builds a tree to compute the probability that the expected net present value is greater than or equal to zero, thereby assessing project risk and judging feasibility; it is a graphical method that applies probability analysis intuitively. Because the decision branches drawn this way resemble the branches of a tree, the method is called a decision tree. In machine learning, a decision tree is a predictive model that represents a mapping between object attributes and object values. Entropy measures the disorder of a system; the tree-growing algorithms ID3, C4.5, and C5.0 use entropy, a measure based on the concept of entropy from information theory. A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class. Classification trees (decision trees) are a very common classification method and a form of supervised learning: given a set of samples, each with a set of attributes and a predetermined class label, learning produces a classifier that assigns correct classes to newly encountered objects.
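
As a concrete illustration of the entropy-based splitting used by ID3-style algorithms, here is a short, self-contained Python sketch (the function names are our own):

```python
# Shannon entropy and information gain, the quantities ID3 uses to pick splits.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting `rows` on `attribute`."""
    base = entropy(labels)
    n = len(rows)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attribute], []).append(y)
    return base - sum(len(ys) / n * entropy(ys) for ys in split.values())

# Example: entropy(["yes", "yes", "no"]) is about 0.918 bits.
```

ID3 greedily splits on the attribute with the largest information gain and recurses until the leaves are (nearly) pure.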

Vanilla image completion approaches are sensitive to large missing regions because limited reference information is available for plausible generation. To mitigate this, existing methods incorporate an extra cue as guidance for image completion. Despite improvements, these approaches are often restricted to a single modality (e.g., segmentation or sketch maps), which lacks the scalability to leverage multiple modalities for more plausible completion. In this paper, we propose a novel, simple yet effective method for Multi-modal Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, canny edge, sketch, segmentation, reference image, depth, and pose), but also adapts to arbitrarily customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion. To build MaGIC, we first introduce a modality-specific conditional U-Net (MCU-Net) that injects a single-modal signal into a U-Net denoiser for single-modal guided image completion. Then, we devise a consistent modality blending (CMB) method to leverage the modality signals encoded in multiple learned MCU-Nets through gradient guidance in latent space. CMB is training-free and hence avoids the cumbersome joint re-training of different modalities, which is the key to MaGIC's exceptional flexibility in accommodating new modalities for completion. Experiments show the superiority of MaGIC over the state of the art and its generalization to various completion tasks, including in/out-painting and local editing. Our project with code and models is available at yeates.github.io/MaGIC-Page/.
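
The following is a speculative PyTorch sketch of the general idea of gradient guidance in latent space; the consistency loss and the call signatures `net(latent, cond)` and `base_net(latent)` are our own assumptions for illustration, not MaGIC's actual interfaces or its exact CMB objective.

```python
# One guidance step: nudge a shared denoising latent so the predictions
# of several single-modal MCU-Nets agree with the base denoiser.
import torch

def cmb_step(latent, conds, mcu_nets, base_net, guidance_scale=1.0):
    latent = latent.detach().requires_grad_(True)
    base_pred = base_net(latent)
    # Penalize disagreement between each modality-conditioned prediction
    # and the base prediction, then descend the gradient in latent space.
    loss = sum(((net(latent, c) - base_pred) ** 2).mean()
               for net, c in zip(mcu_nets, conds))
    (grad,) = torch.autograd.grad(loss, latent)
    return (latent - guidance_scale * grad).detach()
```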

Random Search is one of the most widely used methods for hyperparameter optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe its underlying working mechanism. This paper gives a theoretical account of Random Search. We introduce the concept of \emph{scattering dimension}, which describes the landscape of the underlying function and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded i.i.d. noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a Borel measure, and show that BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance. Our results show that in metric spaces with a Borel measure, the classic theory of Lipschitz bandits can be improved. This suggests an intrinsic axiomatic gap between metric spaces and metric measure spaces from an algorithmic perspective, since the upper bound in a metric measure space breaks the known information-theoretic lower bounds for Lipschitz bandits in a metric space with no measure structure.
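
For reference, the algorithm under analysis is just the following loop; the domain, sampler, and objective below are illustrative placeholders.

```python
# Plain random search: T iid draws, keep the best.
import random

def random_search(objective, sample_point, budget_T):
    """Return the best point and value found after budget_T iid draws."""
    best_x, best_val = None, float("inf")
    for _ in range(budget_T):
        x = sample_point()        # one uniform draw from the domain
        val = objective(x)        # possibly noisy evaluation
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Example: minimize a 1-D quadratic over [-1, 1].
best_x, best_val = random_search(lambda x: (x - 0.3) ** 2,
                                 lambda: random.uniform(-1, 1),
                                 budget_T=1000)
```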

For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.
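
As a hedged sketch of the generic set computation, with `dominates(a, b)` standing in for the paper's distributional dominance criterion (which we do not reproduce here):

```python
# Generic Pareto-style filter: keep exactly the policies that no other
# policy dominates under the supplied dominance predicate.
def undominated(policies, dominates):
    return [p for p in policies
            if not any(dominates(q, p) for q in policies if q is not p)]
```

The paper's pruning operators would then further reduce this set to the convex distributional undominated set.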

The study of robustness has received much attention due to its inevitability in data-driven settings where many systems face uncertainty. One such example of concern is Bayesian Optimization (BO), where uncertainty is multi-faceted, yet only a limited number of works are dedicated to this direction. In particular, the work of Kirschner et al. (2020) bridges the existing literature on Distributionally Robust Optimization (DRO) by casting the BO problem through the lens of DRO. While this work is pioneering, it admittedly suffers from various practical shortcomings, such as finite-context assumptions, leaving open the main question: can one devise a computationally tractable algorithm for solving this DRO-BO problem? In this work, we tackle this question with a large degree of generality by considering robustness against data shift measured by $\phi$-divergences, which subsume many popular choices, such as the $\chi^2$-divergence, Total Variation, and the standard Kullback-Leibler (KL) divergence. We show that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous-context setting, can be easily implemented with provable sublinear regret bounds. We then show experimentally that our method surpasses existing methods, attesting to the theoretical results.
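
To illustrate the flavor of such finite-dimensional reductions, consider the KL case only (this is the classical duality result, not the paper's general $\phi$-divergence formulation): $\sup_{KL(Q\|P)\le\varepsilon} \mathbb{E}_Q[\ell] = \inf_{\lambda>0} \lambda\varepsilon + \lambda\log\mathbb{E}_P[e^{\ell/\lambda}]$, a one-dimensional minimization over $\lambda$. A hedged Python sketch:

```python
# Worst-case expected loss over a KL ball around the empirical distribution,
# via the classical one-dimensional dual. Bounds on lambda are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def kl_dro_value(losses, eps):
    losses = np.asarray(losses, dtype=float)

    def dual(lam):
        # log E[exp(loss/lam)] computed stably (log-sum-exp shift)
        m = losses.max()
        log_mgf = m / lam + np.log(np.mean(np.exp((losses - m) / lam)))
        return lam * eps + lam * log_mgf

    res = minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded")
    return res.fun
```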

We study the parametric online changepoint detection problem, where the underlying distribution of the streaming data changes from a known distribution to an alternative that is of a known parametric form but with unknown parameters. We propose a joint detection/estimation scheme, which we call Window-Limited CUSUM, that combines the cumulative sum (CUSUM) test with a sliding-window-based consistent estimate of the post-change parameters. We characterize the optimal choice of window size and show that, under this choice, the Window-Limited CUSUM enjoys first-order asymptotic optimality as the average run length approaches infinity. Compared to existing schemes with similar asymptotic optimality properties, our test can be computed much faster because it updates the CUSUM statistic recursively, plugging in the estimate of the post-change parameters. A parallel variant that facilitates the practical implementation of the test is also proposed. Numerical simulations corroborate our theoretical findings.
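
Here is a hedged sketch of such a recursion for the special case of a Gaussian mean change: the post-change mean is re-estimated from the last `w` samples and plugged into the standard recursive CUSUM update. The Gaussian likelihood and the window handling are our illustrative choices, not the paper's exact scheme.

```python
# Window-limited CUSUM sketch for detecting a mean shift in N(mu, sigma^2) data.
from collections import deque

def window_limited_cusum(stream, w, threshold, sigma=1.0, mu0=0.0):
    """Return the first index at which the statistic exceeds `threshold`, else None."""
    window = deque(maxlen=w)
    stat = 0.0
    for t, x in enumerate(stream):
        window.append(x)
        mu1 = sum(window) / len(window)  # sliding-window estimate of post-change mean
        # log-likelihood ratio of N(mu1, sigma^2) vs N(mu0, sigma^2) at x
        llr = ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        stat = max(0.0, stat + llr)      # recursive CUSUM update
        if stat > threshold:
            return t                     # declare a change at time t
    return None
```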

Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional feature distributions. Fitting SJS for a target dataset without label observations may produce valid predictions of labels and estimates of class prior probabilities. We present new results on the transmission of SJS from sets of features to larger sets of features, a conditional correction formula for the class posterior probabilities under the target distribution, identifiability of SJS, and the relationship between SJS and covariate shift. In addition, we point out inconsistencies in the algorithms which were proposed for estimating the characteristics of SJS, as they could hamper the search for optimal solutions.

Moment restrictions and their conditional counterparts emerge in many areas of machine learning and statistics ranging from causal inference to reinforcement learning. Estimators for these tasks, generally called methods of moments, include the prominent generalized method of moments (GMM) which has recently gained attention in causal inference. GMM is a special case of the broader family of empirical likelihood estimators which are based on approximating a population distribution by means of minimizing a $\varphi$-divergence to an empirical distribution. However, the use of $\varphi$-divergences effectively limits the candidate distributions to reweightings of the data samples. We lift this long-standing limitation and provide a method of moments that goes beyond data reweighting. This is achieved by defining an empirical likelihood estimator based on maximum mean discrepancy which we term the kernel method of moments (KMM). We provide a variant of our estimator for conditional moment restrictions and show that it is asymptotically first-order optimal for such problems. Finally, we show that our method achieves competitive performance on several conditional moment restriction tasks.
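
For reference, the quantity KMM builds its objective on is the maximum mean discrepancy. Below is a standard (biased, V-statistic) sample estimate with an RBF kernel; it is illustrative background, not the paper's estimator itself.

```python
# Squared MMD between samples x ~ P and y ~ Q under an RBF kernel.
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy
```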

Collaborative Filtering (CF) models, despite their great success, suffer from severe performance drops due to popularity distribution shifts, which are ubiquitous and inevitable in real-world scenarios. Unfortunately, most leading popularity debiasing strategies, rather than tackling the vulnerability of CF models to varying popularity distributions, require prior knowledge of the test distribution to identify the degree of bias and then learn popularity-entangled representations to mitigate it. Consequently, these models achieve significant performance gains on the known target test set, yet deviate dramatically from users' true interests when the popularity distribution is not known in advance. In this work, we propose a novel learning framework, Invariant Collaborative Filtering (InvCF), to discover disentangled representations that faithfully reveal the latent preference and popularity semantics without making any assumption about the popularity distribution. At its core is the distillation of unbiased preference representations (i.e., user preference on item properties), which are invariant to changes in popularity semantics, while filtering out popularity features that are unstable or outdated. Extensive experiments on five benchmark datasets and four evaluation settings (i.e., synthetic long-tail, unbiased, temporal split, and out-of-distribution evaluations) demonstrate that InvCF outperforms state-of-the-art baselines in terms of popularity generalization on real recommendations. Visualization studies shed light on the advantages of InvCF for disentangled representation learning. Our code is available at https://github.com/anzhang314/InvCF.

The rank invariant (RI), one of the best-known invariants of persistence modules $M$ over a given poset $P$, is defined as the map sending each comparable pair $p\leq q$ in $P$ to the rank of the linear map $M(p\leq q)$. The recently introduced generalized rank invariant (GRI) acquires more discriminating power than the RI at the expense of enlarging the domain of the RI to the set $\mathrm{Int}(P)$ of intervals of $P$, or to an even larger set. Given that the size of $\mathrm{Int}(P)$ can be much larger than that of the domain of the RI, restricting the domain of the GRI to smaller, more manageable subcollections $\mathcal{I}$ of $\mathrm{Int}(P)$ is desirable for reducing the total cost of computing the GRI. This work studies the tension between computational efficiency and discriminating strength when restricting the domain of the GRI to different choices of $\mathcal{I}$. In particular, we prove that the discriminating power of the GRI over restricted collections $\mathcal{I}$ strictly increases as $\mathcal{I}$ interpolates between the domain of the RI and $\mathrm{Int}(P)$. Along the way, some well-known results regarding the RI or GRI from the literature are contextualized within the framework of the M\"obius inversion formula, and we obtain a notion of generalized persistence diagram that does not require local finiteness of the indexing poset. Lastly, motivated by a recent finding that zigzag persistence can be used to compute the GRI, we pay special attention to comparing the discriminating power of the GRI for persistence modules $M$ over $\mathbb{Z}^2$ with the so-called Zigzag-path-Indexed Barcode (ZIB), a map sending each zigzag path $\Gamma$ in $\mathbb{Z}^2$ to the barcode of the restriction of $M$ to $\Gamma$. Clarifying the connection between the GRI and the ZIB is potentially important for understanding to what extent zigzag persistence algorithms can be exploited to compute the GRI.
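
For readers unfamiliar with the two invariants, the display below restates them under the standard limit-to-colimit definition of the generalized rank (our paraphrase, not a verbatim quote of this paper):

```latex
% RI: ranks of individual structure maps.  GRI: rank of the canonical
% map from the limit to the colimit of M restricted to an interval I.
\[
  \mathrm{RI}_M(p \le q) = \operatorname{rank} M(p \le q), \qquad
  \mathrm{GRI}_M(I) = \operatorname{rank}\!\Bigl( \varprojlim M|_I \longrightarrow \varinjlim M|_I \Bigr),
  \quad I \in \mathrm{Int}(P).
\]
```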

Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision about. This problem first emerged in 2017 and has since received increasing attention from the research community, leading to a plethora of methods ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Then, we conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.
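
As a hedged illustration of one classification-based score from this literature, here is the maximum softmax probability (MSP) baseline: flag inputs whose top softmax confidence falls below a threshold. The threshold value is an illustrative choice.

```python
# MSP-style OOD scoring from classifier logits.
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_ood_flags(logits, threshold=0.5):
    """Return a boolean mask marking inputs treated as out-of-distribution."""
    confidence = softmax(np.asarray(logits)).max(axis=-1)
    return confidence < threshold
```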
