亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

The goal of inductive logic programming (ILP) is to search for a logic program that generalises training examples and background knowledge. We introduce an ILP approach that identifies minimal unsatisfiable subprograms (MUSPs). We show that finding MUSPs allows us to efficiently and soundly prune the search space. Our experiments on multiple domains, including program synthesis and game playing, show that our approach can reduce learning times by 99%.

相關內容

歸納邏輯程序設計(ILP)是機器學習的一個分支,它依賴于邏輯程序作為一種統一的表示語言來表達例子、背景知識和假設。基于一階邏輯的ILP具有很強的表示形式,為多關系學習和數據挖掘提供了一種很好的方法。International Conference on Inductive Logic Programming系列始于1991年,是學習結構化或半結構化關系數據的首要國際論壇。最初專注于邏輯程序的歸納,多年來,它大大擴展了研究范圍,并歡迎在邏輯學習、多關系數據挖掘、統計關系學習、圖形和樹挖掘等各個方面作出貢獻,學習其他(非命題)基于邏輯的知識表示框架,探索統計學習和其他概率方法的交叉點。官網鏈接: · 表示 · 優化器 · 損失函數(機器學習) · 知識 (knowledge) ·
2024 年 3 月 11 日

We propose a game-based formulation for learning dimensionality-reducing representations of feature vectors, when only a prior knowledge on future prediction tasks is available. In this game, the first player chooses a representation, and then the second player adversarially chooses a prediction task from a given class, representing the prior knowledge. The first player aims is to minimize, and the second player to maximize, the regret: The minimal prediction loss using the representation, compared to the same loss using the original features. For the canonical setting in which the representation, the response to predict and the predictors are all linear functions, and under the mean squared error loss function, we derive the theoretically optimal representation in pure strategies, which shows the effectiveness of the prior knowledge, and the optimal regret in mixed strategies, which shows the usefulness of randomizing the representation. For general representations and loss functions, we propose an efficient algorithm to optimize a randomized representation. The algorithm only requires the gradients of the loss function, and is based on incrementally adding a representation rule to a mixture of such rules.

High-performing out-of-distribution (OOD) detection, both anomaly and novel class, is an important prerequisite for the practical use of classification models. In this paper, we focus on the species recognition task in images concerned with large databases, a large number of fine-grained hierarchical classes, severe class imbalance, and varying image quality. We propose a framework for combining individual OOD measures into one combined OOD (COOD) measure using a supervised model. The individual measures are several existing state-of-the-art measures and several novel OOD measures developed with novel class detection and hierarchical class structure in mind. COOD was extensively evaluated on three large-scale (500k+ images) biodiversity datasets in the context of anomaly and novel class detection. We show that COOD outperforms individual, including state-of-the-art, OOD measures by a large margin in terms of TPR@1% FPR in the majority of experiments, e.g., improving detecting ImageNet images (OOD) from 54.3% to 85.4% for the iNaturalist 2018 dataset. SHAP (feature contribution) analysis shows that different individual OOD measures are essential for various tasks, indicating that multiple OOD measures and combinations are needed to generalize. Additionally, we show that explicitly considering ID images that are incorrectly classified for the original (species) recognition task is important for constructing high-performing OOD detection methods and for practical applicability. The framework can easily be extended or adapted to other tasks and media modalities.

Maximum entropy (Maxent) models are a class of statistical models that use the maximum entropy principle to estimate probability distributions from data. Due to the size of modern data sets, Maxent models need efficient optimization algorithms to scale well for big data applications. State-of-the-art algorithms for Maxent models, however, were not originally designed to handle big data sets; these algorithms either rely on technical devices that may yield unreliable numerical results, scale poorly, or require smoothness assumptions that many practical Maxent models lack. In this paper, we present novel optimization algorithms that overcome the shortcomings of state-of-the-art algorithms for training large-scale, non-smooth Maxent models. Our proposed first-order algorithms leverage the Kullback-Leibler divergence to train large-scale and non-smooth Maxent models efficiently. For Maxent models with discrete probability distribution of $n$ elements built from samples, each containing $m$ features, the stepsize parameters estimation and iterations in our algorithms scale on the order of $O(mn)$ operations and can be trivially parallelized. Moreover, the strong $\ell_{1}$ convexity of the Kullback--Leibler divergence allows for larger stepsize parameters, thereby speeding up the convergence rate of our algorithms. To illustrate the efficiency of our novel algorithms, we consider the problem of estimating probabilities of fire occurrences as a function of ecological features in the Western US MTBS-Interagency wildfire data set. Our numerical results show that our algorithms outperform the state of the arts by one order of magnitude and yield results that agree with physical models of wildfire occurrence and previous statistical analyses of wildfire drivers.

In decision-making, maxitive functions are used for worst-case and best-case evaluations. Maxitivity gives rise to a rich structure that is well-studied in the context of the pointwise order. In this article, we investigate maxitivity with respect to general preorders and provide a representation theorem for such functionals. The results are illustrated for different stochastic orders in the literature, including the usual stochastic order, the increasing convex/concave order, and the dispersive order.

Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can impact active learning's effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that biased non-response is likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy--the Upper Confidence Bound of the Expected Utility (UCB-EU)--that can, plausibly, be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for specific sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from an e-commerce platform. We show that UCB-EU yields substantial performance improvements to conversion models that are trained on clicked impressions. Most generally, this research serves to both better conceptualise the interplay between types of non-response and model improvements via active learning, and to provide a practical, easy-to-implement correction that mitigates model degradation.

This article is concerned with the multilevel Monte Carlo (MLMC) methods for approximating expectations of some functions of the solution to the Heston 3/2-model from mathematical finance, which takes values in $(0, \infty)$ and possesses superlinearly growing drift and diffusion coefficients. To discretize the SDE model, a new Milstein-type scheme is proposed to produce independent sample paths. The proposed scheme can be explicitly solved and is positivity-preserving unconditionally, i.e., for any time step-size $h>0$. This positivity-preserving property for large discretization time steps is particularly desirable in the MLMC setting. Furthermore, a mean-square convergence rate of order one is proved in the non-globally Lipschitz regime, which is not trivial, as the diffusion coefficient grows super-linearly. The obtained order-one convergence in turn promises the desired relevant variance of the multilevel estimator and justifies the optimal complexity $\mathcal{O}(\epsilon^{-2})$ for the MLMC approach, where $\epsilon > 0$ is the required target accuracy. Numerical experiments are finally reported to confirm the theoretical findings.

The purpose of this study is to introduce a new approach to feature ranking for classification tasks, called in what follows greedy feature selection. In statistical learning, feature selection is usually realized by means of methods that are independent of the classifier applied to perform the prediction using that reduced number of features. Instead, greedy feature selection identifies the most important feature at each step and according to the selected classifier. In the paper, the benefits of such scheme are investigated theoretically in terms of model capacity indicators, such as the Vapnik-Chervonenkis (VC) dimension or the kernel alignment, and tested numerically by considering its application to the problem of predicting geo-effective manifestations of the active Sun.

Language models (LMs) show promise as tools for communicating science to the general public by simplifying and summarizing complex language. Because models can be prompted to generate text for a specific audience (e.g., college-educated adults), LMs might be used to create multiple versions of plain language summaries for people with different familiarities of scientific topics. However, it is not clear what the benefits and pitfalls of adaptive plain language are. When is simplifying necessary, what are the costs in doing so, and do these costs differ for readers with different background knowledge? Through three within-subjects studies in which we surface summaries for different envisioned audiences to participants of different backgrounds, we found that while simpler text led to the best reading experience for readers with little to no familiarity in a topic, high familiarity readers tended to ignore certain details in overly plain summaries (e.g., study limitations). Our work provides methods and guidance on ways of adapting plain language summaries beyond the single "general" audience.

One of the main challenges for interpreting black-box models is the ability to uniquely decompose square-integrable functions of non-independent random inputs into a sum of functions of every possible subset of variables. However, dealing with dependencies among inputs can be complicated. We propose a novel framework to study this problem, linking three domains of mathematics: probability theory, functional analysis, and combinatorics. We show that, under two reasonable assumptions on the inputs (non-perfect functional dependence and non-degenerate stochastic dependence), it is always possible to decompose such a function uniquely. This generalizes the well-known Hoeffding decomposition. The elements of this decomposition can be expressed using oblique projections and allow for novel interpretability indices for evaluation and variance decomposition purposes. The properties of these novel indices are studied and discussed. This generalization offers a path towards a more precise uncertainty quantification, which can benefit sensitivity analysis and interpretability studies whenever the inputs are dependent. This decomposition is illustrated analytically, and the challenges for adopting these results in practice are discussed.

We present a new methodology for decomposing flows with multiple transports that further extends the shifted proper orthogonal decomposition (sPOD). The sPOD tries to approximate transport-dominated flows by a sum of co-moving data fields. The proposed methods stem from sPOD but optimize the co-moving fields directly and penalize their nuclear norm to promote low rank of the individual data in the decomposition. Furthermore, we add a robustness term to the decomposition that can deal with interpolation error and data noises. Leveraging tools from convex optimization, we derive three proximal algorithms to solve the decomposition problem. We report a numerical comparison with existing methods against synthetic data benchmarks and then show the separation ability of our methods on 1D and 2D incompressible and reactive flows. The resulting methodology is the basis of a new analysis paradigm that results in the same interpretability as the POD for the individual co-moving fields.

北京阿比特科技有限公司