
Ensemble methods such as bagging and random forests are ubiquitous in fields ranging from finance to genomics. Despite their prevalence, the question of how to efficiently tune ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial risk estimators for small ensemble sizes using out-of-bag errors, and a novel risk extrapolation technique that leverages the structure of the prediction risk decomposition. By establishing uniform consistency of the risk extrapolation over ensemble and subsample sizes, we show that ECV yields $\delta$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, requires only mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests. Compared to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting, and its computational cost is considerably lower owing to the risk extrapolation technique. Additional numerical results validate the finite-sample accuracy of ECV for several common ensemble predictors under a computational constraint on the maximum ensemble size.
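
To make the extrapolation concrete, here is a minimal numerical sketch of the idea, assuming the squared risk of an $M$-ensemble decomposes as $R(M) = R(\infty) + C/M$: out-of-bag errors give risk estimates at $M = 1$ and $M = 2$, from which the risk at any larger $M$ is extrapolated. The subagging setup and all names are illustrative, not the paper's implementation.

```python
# Sketch of ECV-style risk extrapolation, assuming R(M) = R(inf) + C / M.
# The subagging setup and function names are illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.standard_normal((n, d))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)

def fit_subagged_trees(X, y, n_trees, subsample, rng):
    """Fit trees on random subsamples; return trees and their OOB masks."""
    trees, oob_masks = [], []
    for _ in range(n_trees):
        idx = rng.choice(len(y), size=subsample, replace=False)
        oob = np.ones(len(y), dtype=bool)
        oob[idx] = False
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
        oob_masks.append(oob)
    return trees, oob_masks

trees, oob = fit_subagged_trees(X, y, n_trees=50, subsample=n // 2, rng=rng)
preds = np.array([t.predict(X) for t in trees])  # (n_trees, n)

# Out-of-bag risk estimates for ensemble sizes M = 1 and M = 2.
R1 = np.mean([np.mean((y[m] - p[m]) ** 2) for p, m in zip(preds, oob)])
pair = []
for i in range(len(trees) - 1):
    m = oob[i] & oob[i + 1]  # points out-of-bag for both trees
    pair.append(np.mean((y[m] - 0.5 * (preds[i][m] + preds[i + 1][m])) ** 2))
R2 = np.mean(pair)

# Extrapolate R(M) = R_inf + C / M from (R1, R2).
R_inf, C = 2 * R2 - R1, 2 * (R1 - R2)
for M in (1, 2, 5, 10, 100):
    print(f"extrapolated risk at M = {M:3d}: {R_inf + C / M:.3f}")
```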

Related Content

We present a rigorous and precise analysis of the maximum degree and the average degree in a dynamic duplication-divergence graph model introduced by Sol\'e, Pastor-Satorras et al., in which the graph grows according to a duplication-divergence mechanism, i.e., by iteratively creating a copy of some node and then randomly altering the neighborhood of the new node with probability $p$. This model captures the growth of some real-world processes, e.g., biological or social networks. In this paper, we prove that for a fixed $0 < p < 1$ the maximum degree and the average degree of a duplication-divergence graph on $t$ vertices are asymptotically concentrated with high probability around $t^p$ and $\max\{t^{2 p - 1}, 1\}$, respectively, i.e., they are within at most a polylogarithmic factor of these values with probability at least $1 - t^{-A}$ for any constant $A > 0$.
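
The growth rule itself is easy to simulate. Below is a minimal sketch of a duplication-divergence process of this type (duplicate a uniformly random node, keep each inherited edge with probability $p$); the starting graph and parameter values are illustrative, and the predicted $t^p$ and $\max\{t^{2p-1}, 1\}$ scalings hold only up to polylogarithmic factors.

```python
# Minimal duplication-divergence growth sketch; parameters are illustrative.
import random

def duplication_divergence(t, p, seed=0):
    """Grow a graph to t vertices; adjacency stored as a list of sets."""
    random.seed(seed)
    adj = [{1}, {0}]                    # start from a single edge
    for new in range(2, t):
        parent = random.randrange(new)  # node to duplicate
        kept = {u for u in adj[parent] if random.random() < p}
        adj.append(kept)
        for u in kept:
            adj[u].add(new)
    return adj

adj = duplication_divergence(t=20000, p=0.6)
degrees = [len(nb) for nb in adj]
print("max degree:", max(degrees), "avg degree:", sum(degrees) / len(degrees))
# Theory predicts max degree ~ t^p and avg degree ~ max(t^(2p-1), 1),
# up to polylogarithmic factors.
```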

The main computational cost per iteration of adaptive cubic regularization methods for solving large-scale nonconvex problems is the computation of the step $s_k$, which requires an approximate minimizer of the cubic model. We propose a new approach in which this minimizer is sought in a low-dimensional subspace that, in contrast to classical approaches, is reused over a number of iterations. A regularized Newton step to correct $s_k$ is also incorporated whenever needed. We show that our method increases efficiency while preserving the worst-case complexity of classical cubic regularization methods. We also explore the use of rational Krylov subspaces for the subspace minimization, to overcome some of the issues encountered when using polynomial Krylov subspaces. We provide several experimental results illustrating the gains of the new approach compared to classical implementations.
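
For orientation, here is a minimal sketch of subspace minimization of the cubic model $m(s) = g^\top s + \tfrac{1}{2} s^\top H s + \tfrac{\sigma}{3}\|s\|^3$ over a polynomial Krylov subspace, without the subspace reuse, correction step, or rational-Krylov refinements proposed in the paper; all names and sizes are illustrative.

```python
# Cubic-model minimization restricted to a polynomial Krylov subspace.
import numpy as np
from scipy.optimize import minimize

def krylov_basis(H, g, k):
    """Orthonormal basis of span{g, Hg, ..., H^(k-1) g}."""
    V = np.zeros((len(g), k))
    v = g / np.linalg.norm(g)
    for j in range(k):
        V[:, j] = v
        w = H @ v
        w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)  # orthogonalize
        nrm = np.linalg.norm(w)
        if nrm < 1e-12:
            return V[:, : j + 1]
        v = w / nrm
    return V

def cubic_step(H, g, sigma, k=10):
    V = krylov_basis(H, g, k)
    Ht, gt = V.T @ H @ V, V.T @ g  # projected model (V has orthonormal cols)
    model = lambda z: gt @ z + 0.5 * z @ Ht @ z + sigma / 3 * np.linalg.norm(z) ** 3
    z = minimize(model, np.zeros(V.shape[1])).x  # small dense subproblem
    return V @ z                                 # step s_k = V z

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 200))
H = (A + A.T) / 2                                # indefinite (nonconvex) Hessian
g = rng.standard_normal(200)
s = cubic_step(H, g, sigma=1.0)
print("model value at s:", g @ s + 0.5 * s @ H @ s + np.linalg.norm(s) ** 3 / 3)
```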

This article revisits the fundamental problem of parameter selection for Gaussian process interpolation. By choosing the mean and the covariance functions of a Gaussian process within parametric families, the user obtains a family of Bayesian procedures for predicting the unknown function, and must choose a member of the family that will hopefully deliver good predictive performance. We base our study on the general concept of scoring rules, which provides an effective framework for building leave-one-out selection and validation criteria, and on a notion of extended likelihood criteria based on an idea proposed by Fasshauer and co-authors in 2009, which makes it possible to recover standard selection criteria such as, for instance, the generalized cross-validation criterion. In this setting, we empirically show on several test problems from the literature that the choice of an appropriate family of models is often more important than the choice of a particular selection criterion (e.g., the likelihood versus a leave-one-out criterion). Moreover, our numerical results show that the regularity parameter of a Mat{\'e}rn covariance can be selected effectively by most selection criteria.
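
As one concrete leave-one-out criterion, the sketch below scores a zero-mean GP with a Matérn-5/2 covariance using the classical closed-form LOO identities $\mu_{-i} = y_i - (K^{-1}y)_i/(K^{-1})_{ii}$ and $\sigma_{-i}^2 = 1/(K^{-1})_{ii}$, and compares lengthscales by the mean LOO log predictive density. The parametrization and test function are illustrative, not tied to any particular package.

```python
# Leave-one-out log predictive score for a zero-mean GP, closed form.
import numpy as np

def matern52(X, lengthscale, variance):
    d = np.abs(X[:, None] - X[None, :]) / lengthscale
    return variance * (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

def loo_log_score(X, y, lengthscale, variance, noise=1e-3):
    K = matern52(X, lengthscale, variance) + noise * np.eye(len(y))  # small nugget
    Kinv = np.linalg.inv(K)
    alpha, diag = Kinv @ y, np.diag(Kinv)
    mu_loo = y - alpha / diag            # LOO predictive means
    var_loo = 1.0 / diag                 # LOO predictive variances
    return np.mean(-0.5 * np.log(2 * np.pi * var_loo)
                   - 0.5 * (y - mu_loo) ** 2 / var_loo)

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 10, 40))
y = np.sin(X) + 0.05 * rng.standard_normal(40)
for ell in (0.2, 1.0, 5.0):              # select the lengthscale by LOO score
    print(f"lengthscale={ell}: mean LOO log score = {loo_log_score(X, y, ell, 1.0):.3f}")
```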

This article studies the average conditioning of a random underdetermined polynomial system. The expected values of the moments of the condition number are compared to the moments of the condition number of random matrices. An expression for these moments is obtained by studying the kernel-finding problem for random matrices. Furthermore, the second moment of the Frobenius condition number is computed.
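
These quantities are easy to explore by Monte Carlo. The sketch below estimates the first two moments of the Frobenius condition number for wide (underdetermined) Gaussian matrices, using the common convention $\kappa_F(A) = \|A\|_F\,\sigma_{\min}(A)^{-1}$; the matrix sizes and trial count are illustrative.

```python
# Monte Carlo moments of the Frobenius condition number of Gaussian matrices.
import numpy as np

rng = np.random.default_rng(3)
m, n, trials = 5, 8, 2000                # m < n: underdetermined (wide) case
samples = []
for _ in range(trials):
    A = rng.standard_normal((m, n))
    smin = np.linalg.svd(A, compute_uv=False)[-1]    # smallest singular value
    samples.append(np.linalg.norm(A, "fro") / smin)  # Frobenius condition number
samples = np.array(samples)
print("first moment :", samples.mean())
print("second moment:", (samples**2).mean())
```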

Consider minimizing the entropy of a mixture of states by choosing each state subject to constraints. If the spectrum of each state is fixed, we expect that in order to reduce the entropy of the mixture, we should make the states less distinguishable in some sense. Here, we study a class of optimization problems that are inspired by this situation and shed light on the relevant notions of distinguishability. The motivation for our study is the recently introduced spin alignment conjecture. In the original version of the underlying problem, each state in the mixture is constrained to be a freely chosen state on a subset of $n$ qubits tensored with a fixed state $Q$ on each of the qubits in the complement. According to the conjecture, the entropy of the mixture is minimized by choosing the freely chosen state in each term to be a tensor product of projectors onto a fixed maximal eigenvector of $Q$, which maximally "aligns" the terms in the mixture. We generalize this problem in several ways. First, instead of minimizing entropy, we consider maximizing arbitrary unitarily invariant convex functions such as Fan norms and Schatten norms. To formalize and generalize the conjectured required alignment, we define alignment as a preorder on tuples of self-adjoint operators that is induced by majorization. We prove the generalized conjecture for Schatten norms of integer order, for the case where the freely chosen states are constrained to be classical, and for the case where only two states contribute to the mixture and $Q$ is proportional to a projector. The last case fits into a more general situation where we give explicit conditions for maximal alignment. The spin alignment problem has a natural "dual" formulation, versions of which have further generalizations that we introduce.
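
A tiny numerical illustration of the alignment intuition, for $n = 2$ qubits and two equally weighted terms, each tensoring a freely chosen qubit state with a fixed state $Q$ (the diagonal $Q$ below is an illustrative choice): aligning both free states with $Q$'s top eigenvector gives a lower mixture entropy than the anti-aligned choice.

```python
# Entropy of an aligned vs. anti-aligned two-term mixture on 2 qubits.
import numpy as np

def vn_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

Q = np.diag([0.7, 0.3])                  # fixed qubit state; top eigenvector |0>
proj = lambda k: np.outer(np.eye(2)[k], np.eye(2)[k])

def mixture(free):
    """0.5 * (free (x) Q) + 0.5 * (Q (x) free)."""
    return 0.5 * np.kron(free, Q) + 0.5 * np.kron(Q, free)

aligned = vn_entropy(mixture(proj(0)))   # align with Q's top eigenvector
anti = vn_entropy(mixture(proj(1)))      # anti-aligned choice
print(f"entropy, aligned free states : {aligned:.4f} bits")
print(f"entropy, anti-aligned choice : {anti:.4f} bits")  # larger here
```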

We study Lindström quantifiers that satisfy certain closure properties motivated by the study of polymorphisms in the context of constraint satisfaction problems (CSP). When the algebra of polymorphisms of a finite structure $B$ satisfies certain equations, this gives rise to a natural closure condition on the class of structures that map homomorphically to $B$. The collection of quantifiers that satisfy closure conditions arising from a fixed set of equations is rather more general than those arising from CSPs. For any such condition $P$, we define a pebble game that delimits the distinguishing power of the infinitary logic with all quantifiers that are $P$-closed. We use the pebble game to show that the problem of deciding whether a system of linear equations is solvable over $\mathbb{Z}_2$ is not expressible in the infinitary logic with all quantifiers closed under a near-unanimity condition.
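
The decision problem in the last sentence, solvability of a linear system over $\mathbb{Z}_2$, is of course decidable directly by Gaussian elimination; a plain-Python sketch:

```python
# Decide solvability of Ax = b over the two-element field by elimination.
def solvable_mod2(A, b):
    rows = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    m, n = len(rows), len(A[0])
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, m) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(m):
            if i != r and rows[i][c]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        r += 1
    # inconsistent iff some zero row of A has right-hand side 1
    return all(any(row[:-1]) or not row[-1] for row in rows)

print(solvable_mod2([[1, 1], [1, 1]], [0, 1]))  # False: x+y=0 and x+y=1
print(solvable_mod2([[1, 1], [0, 1]], [1, 1]))  # True: y=1, x=0
```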

Aiming for a mixbiotic society that combines freedom and solidarity among people with diverse values, I focused on nonviolent communication (NVC), which enables compassionate giving in various situations of social division and conflict, and tested a generative AI for this purpose. Specifically, ChatGPT was used in place of a traditional certified trainer to test the possibility of mediating (modifying) input sentences in four processes: observation, feelings, needs, and requests. The results indicate that generative AI has potential for this application, although it is not yet at a practical level. Suggested improvement guidelines include adding model responses, relearning revised responses, specifying appropriate terminology for each process, and re-asking for required information. The use of generative AI will initially be useful for assisting certified trainers and for preparing and reviewing events and workshops, and in the future for supporting consensus building and cooperative behavior in digital democracy, platform cooperatives, and cyber-human social co-operating systems. It is hoped that the widespread use of NVC mediation based on generative AI will lead to the early realization of a mixbiotic society.
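
A minimal sketch of how such a four-process mediation step might be wired up, assuming the openai Python client; the prompt wording, model name, and helper function are illustrative and not the paper's actual protocol.

```python
# Illustrative four-process NVC mediation via the openai Python client.
from openai import OpenAI

PROCESSES = ["observation", "feelings", "needs", "requests"]

def mediate(text: str, process: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to restate `text` as an NVC `process` statement."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = (
        f"Rewrite the following message as a nonviolent-communication "
        f"'{process}' statement, free of evaluation, blame, and demands.\n\n"
        f"Message: {text}"
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

for process in PROCESSES:
    print(process, "->", mediate("You never listen to me!", process))
```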

V. Levenshtein first proposed the sequence reconstruction problem in 2001. This problem studies the model in which the same sequence from some set is transmitted over multiple channels and the decoder receives the different outputs. Assume that the transmitted sequence is at distance $d$ from some code and that there are at most $r$ errors in every channel. The sequence reconstruction problem is then to find the minimum number of channels required to exactly recover the transmitted sequence; this number has to be greater than the maximum intersection of two metric balls of radius $r$ whose centers are at distance at least $d$. In this paper, we study the sequence reconstruction problem for permutations under the Hamming distance. In this model we define a Cayley graph over the symmetric group, study its properties, and find the exact value of the largest intersection of two of its metric balls for $d = 2r$. Moreover, we give a lower bound on the largest intersection of two metric balls for $d = 2r - 1$.
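
For very small $n$, the extremal ball intersection can be checked by brute force. The sketch below fixes one center at the identity (without loss of generality, since the Hamming distance on permutations is invariant under composition) and maximizes over the other center at distance exactly $d$; the parameter choices are illustrative.

```python
# Brute-force largest intersection of two Hamming balls in S_n.
from itertools import permutations

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

def max_ball_intersection(n, r, d):
    perms = list(permutations(range(n)))
    identity = tuple(range(n))
    best = 0
    for center2 in perms:                 # one center fixed at the identity
        if hamming(identity, center2) != d:
            continue
        size = sum(1 for s in perms
                   if hamming(identity, s) <= r and hamming(center2, s) <= r)
        best = max(best, size)
    return best

# d = 2r: the regime for which the exact value is derived above
print(max_ball_intersection(n=5, r=2, d=4))
```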

In this paper, we consider the problem of parameter estimation for a family of exponential distributions. We develop an improved estimation method that generalizes the James--Stein approach to a wide class of distributions. The proposed estimator dominates the classical maximum likelihood estimator under quadratic risk. The estimation procedure is applied to special cases of distributions, and results of numerical simulations are given.
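
For reference, the sketch below simulates the textbook Gaussian special case of this dominance phenomenon, James--Stein shrinkage versus the maximum likelihood estimator under quadratic risk; the paper's exponential-family construction is more general than this sketch.

```python
# James-Stein vs. MLE under quadratic risk in the Gaussian location model.
import numpy as np

rng = np.random.default_rng(4)
p, trials = 10, 20000
theta = np.ones(p)                                 # true mean vector
X = theta + rng.standard_normal((trials, p))       # one observation per trial

mle = X                                            # maximum likelihood estimator
norms2 = (X ** 2).sum(axis=1, keepdims=True)
js = (1 - (p - 2) / norms2) * X                    # James-Stein estimator

risk = lambda est: ((est - theta) ** 2).sum(axis=1).mean()
print(f"quadratic risk, MLE        : {risk(mle):.3f}")  # ~ p = 10
print(f"quadratic risk, James-Stein: {risk(js):.3f}")   # strictly smaller
```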

Occupancy models are frequently used by ecologists to quantify spatial variation in species distributions while accounting for observational biases in the collection of detection-nondetection data. However, the common assumption that a single set of regression coefficients can adequately explain species-environment relationships is often unrealistic, especially across large spatial domains. Here we develop single-species (i.e., univariate) and multi-species (i.e., multivariate) spatially varying coefficient (SVC) occupancy models to account for spatially varying species-environment relationships. We employ Nearest Neighbor Gaussian Processes and Polya-Gamma data augmentation in a hierarchical Bayesian framework to yield computationally efficient Gibbs samplers, which we implement in the spOccupancy R package. For multi-species models, we use spatial factor dimension reduction to efficiently model datasets with large numbers of species (e.g., > 10). The hierarchical Bayesian framework readily enables generation of posterior predictive maps of the SVCs with fully propagated uncertainty. We apply our SVC models to quantify spatial variability in the relationship between maximum breeding season temperature and occurrence probability for 21 grassland bird species across the U.S. Jointly modeling species generally outperformed single-species models, and all models revealed substantial spatial variability in the relationship between species occurrence and maximum temperature. Our models are particularly relevant for quantifying species-environment relationships using detection-nondetection data from large-scale monitoring programs, which are becoming increasingly prevalent for answering macroscale ecological questions about wildlife responses to global change.
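
For orientation, here is a maximum-likelihood sketch of the basic constant-coefficient, single-species occupancy model that the SVC models above extend. This is a plain Python illustration, not the spOccupancy package (which is in R and adds NNGPs, Polya-Gamma augmentation, and spatially varying coefficients).

```python
# Basic single-species occupancy model: marginal likelihood and MLE fit.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(5)
n_sites, n_visits = 400, 4
psi_true, p_true = 0.6, 0.4
z = rng.random(n_sites) < psi_true                           # latent occupancy
y = (rng.random((n_sites, n_visits)) < p_true) & z[:, None]  # detections

def neg_loglik(params):
    psi, p = expit(params)                   # keep probabilities in (0, 1)
    det = y.sum(axis=1)                      # detections per site
    lik = psi * p ** det * (1 - p) ** (n_visits - det)
    lik = lik + (det == 0) * (1 - psi)       # never-detected sites may be empty
    return -np.log(lik).sum()

fit = minimize(neg_loglik, x0=np.zeros(2))
print("estimated (psi, p):", expit(fit.x))   # close to (0.6, 0.4)
```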
