亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Many modern datasets, such as those in ecology and geology, are composed of samples with spatial structure and dependence. With such data violating the usual independent and identically distributed (IID) assumption in machine learning and classical statistics, it is unclear a priori how one should measure the performance and generalization of models. Several authors have empirically investigated cross-validation (CV) methods in this setting, reaching mixed conclusions. We provide a class of unbiased estimation methods for general quadratic errors, correlated Gaussian response, and arbitrary prediction function $g$, for a noise-elevated version of the error. Our approach generalizes the coupled bootstrap (CB) from the normal means problem to general normal data, allowing correlation both within and between the training and test sets. CB relies on creating bootstrap samples that are intelligently decoupled, in the sense of being statistically independent. Specifically, the key to CB lies in generating two independent "views" of our data and using them as stand-ins for the usual independent training and test samples. Beginning with Mallows' $C_p$, we generalize the estimator to develop our generalized $C_p$ estimators (GC). We show at under only a moment condition on $g$, this noise-elevated error estimate converges smoothly to the noiseless error estimate. We show that when Stein's unbiased risk estimator (SURE) applies, GC converges to SURE as in the normal means problem. Further, we use these same tools to analyze CV and provide some theoretical analysis to help understand when CV will provide good estimates of error. Simulations align with our theoretical results, demonstrating the effectiveness of GC and illustrating the behavior of CV methods. Lastly, we apply our estimator to a model selection task on geothermal data in Nevada.

相關內容

Software log analysis can be laborious and time consuming. Time and labeled data are usually lacking in industrial settings. This paper studies unsupervised and time efficient methods for anomaly detection. We study two custom and two established models. The custom models are: an OOV (Out-Of-Vocabulary) detector, which counts the terms in the test data that are not present in the training data, and the Rarity Model (RM), which calculates a rarity score for terms based on their infrequency. The established models are KMeans and Isolation Forest. The models are evaluated on four public datasets (BGL, Thunderbird, Hadoop, HDFS) with three different representation techniques for the log messages (Words, character Trigrams, Parsed events). We used the AUC-ROC metric for the evaluation. The results reveal discrepancies based on the dataset and representation technique. Different configurations are advised based on specific requirements. For speed, the OOV detector with word representation is optimal. For accuracy, the OOV detector combined with trigram representation yields the highest AUC-ROC (0.846). When dealing with unfiltered data where training includes both normal and anomalous instances, the most effective combination is the Isolation Forest with event representation, achieving an AUC-ROC of 0.829.

We assume to be given structural equations over discrete variables inducing a directed acyclic graph, namely, a structural causal model, together with data about its internal nodes. The question we want to answer is how we can compute bounds for partially identifiable counterfactual queries from such an input. We start by giving a map from structural casual models to credal networks. This allows us to compute exact counterfactual bounds via algorithms for credal nets on a subclass of structural causal models. Exact computation is going to be inefficient in general given that, as we show, causal inference is NP-hard even on polytrees. We target then approximate bounds via a causal EM scheme. We evaluate their accuracy by providing credible intervals on the quality of the approximation; we show through a synthetic benchmark that the EM scheme delivers accurate results in a fair number of runs. In the course of the discussion, we also point out what seems to be a neglected limitation to the trending idea that counterfactual bounds can be computed without knowledge of the structural equations. We also present a real case study on palliative care to show how our algorithms can readily be used for practical purposes.

Fractional derivatives are a well-studied generalization of integer order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Convergence analysis of fractional gradient descent is currently limited both in the methods analyzed and the settings analyzed. This paper aims to fill in these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, novel bounds will be established bridging fractional and integer derivatives. Then, these bounds will be applied to the aforementioned settings to prove $O(1/T)$ convergence for smooth and convex functions and linear convergence for smooth and strongly convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness that is more natural for fractional derivatives. Finally, empirical results will be presented on the potential speed up of fractional gradient descent over standard gradient descent as well as the challenges of predicting which will be faster in general.

We study arrangements of geodesic arcs on a sphere, where all arcs are internally disjoint and each arc has its endpoints located within the interior of other arcs. We establish fundamental results concerning the minimum number of arcs in such arrangements, depending on local geometric constraints such as "one-sidedness" and "k-orientation". En route to these results, we generalize and settle an open problem from CCCG 2022, proving that any such arrangement has at least two "clockwise swirls" and at least two "counterclockwise swirls".

The introduction of large public legal datasets has brought about a renaissance in legal NLP. Many of these datasets are comprised of legal judgements - the product of judges deciding cases. This fact, together with the way machine learning works, means that several legal NLP models are models of judges. While some have argued for the automation of judges, in this position piece, we argue that automating the role of the judge raises difficult ethical challenges, in particular for common law legal systems. Our argument follows from the social role of the judge in actively shaping the law, rather than merely applying it. Since current NLP models come nowhere close to having the facilities necessary for this task, they should not be used to automate judges. Furthermore, even in the case the models could achieve human-level capabilities, there would still be remaining ethical concerns inherent in the automation of the legal process.

Transformer, an attention-based encoder-decoder architecture, has revolutionized the field of natural language processing. Inspired by this significant achievement, some pioneering works have recently been done on adapting Transformerliked architectures to Computer Vision (CV) fields, which have demonstrated their effectiveness on various CV tasks. Relying on competitive modeling capability, visual Transformers have achieved impressive performance on multiple benchmarks such as ImageNet, COCO, and ADE20k as compared with modern Convolution Neural Networks (CNN). In this paper, we have provided a comprehensive review of over one hundred different visual Transformers for three fundamental CV tasks (classification, detection, and segmentation), where a taxonomy is proposed to organize these methods according to their motivations, structures, and usage scenarios. Because of the differences in training settings and oriented tasks, we have also evaluated these methods on different configurations for easy and intuitive comparison instead of only various benchmarks. Furthermore, we have revealed a series of essential but unexploited aspects that may empower Transformer to stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between visual and sequential Transformers. Finally, three promising future research directions are suggested for further investment.

Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and directly accomplish some tasks that are hard for computers in the pipeline with the help of machine-based approaches. In this paper, we survey existing works on human-in-the-loop from a data perspective and classify them into three categories with a progressive relationship: (1) the work of improving model performance from data processing, (2) the work of improving model performance through interventional model training, and (3) the design of the system independent human-in-the-loop. Using the above categorization, we summarize major approaches in the field, along with their technical strengths/ weaknesses, we have simple classification and discussion in natural language processing, computer vision, and others. Besides, we provide some open challenges and opportunities. This survey intends to provide a high-level summarization for human-in-the-loop and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.

We consider the problem of explaining the predictions of graph neural networks (GNNs), which otherwise are considered as black boxes. Existing methods invariably focus on explaining the importance of graph nodes or edges but ignore the substructures of graphs, which are more intuitive and human-intelligible. In this work, we propose a novel method, known as SubgraphX, to explain GNNs by identifying important subgraphs. Given a trained GNN model and an input graph, our SubgraphX explains its predictions by efficiently exploring different subgraphs with Monte Carlo tree search. To make the tree search more effective, we propose to use Shapley values as a measure of subgraph importance, which can also capture the interactions among different subgraphs. To expedite computations, we propose efficient approximation schemes to compute Shapley values for graph data. Our work represents the first attempt to explain GNNs via identifying subgraphs explicitly and directly. Experimental results show that our SubgraphX achieves significantly improved explanations, while keeping computations at a reasonable level.

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.

北京阿比特科技有限公司