
We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an ANOVA-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate depends on parameters that control the tail behaviour of the marginal feature distributions, the smoothness of the regression function and a margin condition. The ambient data dimension does not appear in the minimax rate, which can therefore be faster than in the classical nonparametric setting. We further propose a new method, called the Hard-thresholding Anova Missing data (HAM) classifier, based on a careful combination of a k-nearest neighbour algorithm and a thresholding step. The HAM classifier attains the minimax rate up to polylogarithmic factors, and numerical experiments further illustrate its utility.
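
As a loose illustration of the two-step idea described above, the sketch below combines a k-nearest neighbour estimate of the regression function with a hard-thresholding step. It is a caricature rather than the HAM classifier itself: in the paper the thresholding acts on estimated ANOVA components over subsets of observed features, which is not reproduced here, and all names and parameter values are illustrative.

```python
# Minimal sketch: kNN regression estimate followed by hard thresholding.
# Not the authors' exact procedure; see the lead-in for the caveats.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_threshold_classify(X_train, y_train, X_test, k=15, tau=0.05):
    """Estimate eta(x) = P(Y=1 | X=x) with kNN, then snap estimates that
    are within tau of 1/2 back to 1/2 (treated as uninformative)."""
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    eta_hat = knn.predict(X_test)
    eta_hat = np.where(np.abs(eta_hat - 0.5) < tau, 0.5, eta_hat)
    return (eta_hat >= 0.5).astype(int)
```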

Related content

Diffusion models are typically trained using score matching, yet score matching is agnostic to the particular forward process that defines the model. This paper argues that Markov diffusion models enjoy an advantage over other types of diffusion model, as their associated operators can be exploited to improve the training process. In particular, (i) there exists an explicit formal solution to the forward process as a sequence of time-dependent kernel mean embeddings; and (ii) the derivation of score-matching and related estimators can be streamlined. Building upon (i), we propose Riemannian diffusion kernel smoothing, which obviates the need for neural score approximation, at least in the low-dimensional context; building upon (ii), we propose operator-informed score matching, a variance-reduction technique that is straightforward to implement in both low- and high-dimensional diffusion modeling and is demonstrated to improve score matching in an empirical proof-of-concept.
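
For context, the sketch below shows standard denoising score matching for a variance-preserving Gaussian (Markov) forward process, the baseline that operator-informed score matching aims to improve; the operator-based variance-reduction correction itself is not shown, and the `score_net(x, t)` interface is an assumption.

```python
# Denoising score matching against the closed-form Gaussian kernel of a
# variance-preserving forward process (illustrative baseline only).
import torch

def dsm_loss(score_net, x0, alphas_bar):
    """alphas_bar: 1-D tensor of cumulative noise-schedule values."""
    t = torch.randint(0, len(alphas_bar), (x0.shape[0],))
    a = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps     # sample from q(x_t | x_0)
    target = -eps / (1 - a).sqrt()                # score of q(x_t | x_0)
    return ((score_net(xt, t) - target) ** 2).mean()
```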

A Riemannian geometric framework for Markov chain Monte Carlo (MCMC) is developed, in which the Fisher-Rao metric on the manifold of probability density functions (pdfs) is used to construct informed proposal densities for Metropolis-Hastings (MH) algorithms. We exploit the square-root representation of pdfs, under which the Fisher-Rao metric reduces to the standard $L^2$ metric on the positive orthant of the unit hypersphere. The square-root representation allows us to easily compute the geodesic distance between densities, resulting in a straightforward implementation of the proposed geometric MCMC methodology. Unlike random walk MH, which blindly proposes a candidate state using no information about the target, the geometric MH algorithms effectively move an uninformed base density (e.g., a random walk proposal density) towards different global/local approximations of the target density. We compare the proposed geometric MH algorithm with other MCMC algorithms under various Markov chain orderings, namely the covariance, efficiency, Peskun and spectral gap orderings. The superior performance of the geometric algorithms over other MH algorithms, such as random walk Metropolis, independent MH and variants of the Metropolis-adjusted Langevin algorithm, is demonstrated in the context of various multimodal, nonlinear and high-dimensional examples. In particular, we use extensive simulations and real data applications to compare these algorithms for analysing mixture models, logistic regression models and ultra-high-dimensional Bayesian variable selection models. A publicly available R package accompanies the article.
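
The square-root representation mentioned above is easy to make concrete: mapping a density $p$ to $\sqrt{p}$ places it on the unit sphere in $L^2$, so the geodesic distance between two densities is an arc length. A minimal sketch on a grid follows (illustrative only; not taken from the accompanying R package).

```python
# Geodesic distance between densities under the square-root representation:
# arccos of the L^2 inner product of the square roots.
import numpy as np

def geodesic_distance(p, q, dx):
    inner = np.sum(np.sqrt(p * q)) * dx           # <sqrt(p), sqrt(q)>
    return np.arccos(np.clip(inner, -1.0, 1.0))

# Example: two unit-variance Gaussians with means 0 and 1.
x = np.linspace(-10, 10, 4001); dx = x[1] - x[0]
p = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
q = np.exp(-0.5 * (x - 1)**2) / np.sqrt(2 * np.pi)
print(geodesic_distance(p, q, dx))
```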

Assume that an interferer behaves according to a parametric model, but the values of the model parameters are unknown. Sensing improves knowledge of the model and therefore enables better link adaptation. However, we consider a half-duplex scenario where, at each time slot, the communication system must decide between sensing and communicating. We thus investigate the optimal policy for maximizing the expected sum rate over a finite communication horizon. We first show that this problem can be modelled in the Markov decision process (MDP) framework. We then demonstrate that the optimal open-loop and closed-loop policies can be found significantly faster than with the standard backward-induction algorithm.
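
For reference, the sketch below is the standard finite-horizon backward induction that the proposed policies are compared against; the states, rewards and transition kernels for the sense/communicate trade-off are placeholders to be filled in from the interference model.

```python
# Generic finite-horizon backward induction over a two-action MDP.
# Action 0 = sense (improves model knowledge), 1 = communicate (earns rate).
import numpy as np

def backward_induction(T, reward, transition):
    """reward[a]: (n_states,) expected reward; transition[a]: (n_states,
    n_states) transition matrix. Returns values and a (T, n_states) policy."""
    n_states = reward[0].shape[0]
    V = np.zeros(n_states)
    policy = np.zeros((T, n_states), dtype=int)
    for t in reversed(range(T)):
        Q = np.stack([reward[a] + transition[a] @ V for a in (0, 1)])
        policy[t] = Q.argmax(axis=0)
        V = Q.max(axis=0)
    return V, policy
```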

The Spatial AutoRegressive (SAR) model is commonly used in studies involving spatial and network data to estimate the spatial or network peer influence and the effects of covariates on the response, taking into account the spatial or network dependence. While the model can be efficiently estimated with a quasi-maximum likelihood estimator (QMLE), the detrimental effect of covariate measurement error on the QMLE, and how to remedy it, is currently unknown. If covariates are measured with error, then the QMLE may not converge at the $\sqrt{n}$ rate and may even be inconsistent, even when each node is influenced by only a limited number of other nodes or spatial units. We develop a measurement error-corrected quasi-maximum likelihood estimator (ME-QMLE) for the parameters of the SAR model when covariates are measured with error. The ME-QMLE possesses statistical consistency and asymptotic normality properties. We consider two types of applications. The first is when the true covariate cannot be measured directly and a proxy is observed instead. The second involves including latent homophily factors, estimated with error from the network, when estimating peer influence. Our numerical results verify the bias-correction property of the estimator and the accuracy of the standard error estimates in finite samples. We illustrate the method on a real dataset of county-level death rates from the COVID-19 pandemic.
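
A minimal sketch of the data-generating process described above, with a noisy proxy standing in for the true covariate; all sizes and parameter values are illustrative, and the ME-QMLE correction itself is not reproduced here.

```python
# Simulate a SAR outcome y = (I - rho*W)^{-1} (x*beta + noise) and a
# covariate proxy observed with error (illustrative values throughout).
import numpy as np

rng = np.random.default_rng(0)
n, rho, beta, sigma_u = 200, 0.4, 1.5, 0.5
W = rng.binomial(1, 5 / n, (n, n)).astype(float)    # sparse adjacency
np.fill_diagonal(W, 0.0)
W /= np.maximum(W.sum(axis=1, keepdims=True), 1.0)  # row-normalize
x = rng.normal(size=n)                              # true covariate
y = np.linalg.solve(np.eye(n) - rho * W, x * beta + rng.normal(size=n))
x_proxy = x + sigma_u * rng.normal(size=n)          # observed with error
# A naive QMLE fit to (y, x_proxy, W) is biased; ME-QMLE corrects for sigma_u.
```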

In this work, we delve into the EEG classification task in the domain of visual brain decoding via two frameworks based on different learning paradigms. Considering the spatio-temporal nature of EEG data, the first framework is based on a CNN-BiLSTM model. The second uses a CNN-Transformer architecture, which inherently involves the more versatile attention-based learning paradigm. In both cases, a special 1D-CNN feature extraction module generates the initial embeddings with 1D convolutions over the time and EEG channel domains. Because EEG signals are noisy and non-stationary, and the discriminative features are less clear than in semantically structured data such as text or images, we also adopt window-based classification followed by majority voting during inference to yield labels at the signal level. To illustrate how brain patterns correlate with different image classes, we visualize t-SNE plots of the BiLSTM embeddings alongside brain activation maps for the top 10 classes. These visualizations provide insight into the distinct neural signatures associated with each visual category, showcasing the BiLSTM's ability to capture and represent the discriminative brain activity linked to visual stimuli. We demonstrate the performance of our approach on the updated EEG-Imagenet dataset, comparing favourably with state-of-the-art methods.
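
The window-based inference scheme is straightforward to sketch: split each EEG trial into overlapping windows, classify every window, and take a majority vote for the signal-level label. The per-window classifier `model` (e.g., the CNN-BiLSTM) and the window sizes below are assumptions.

```python
# Window-based classification with majority voting at inference time.
import numpy as np

def predict_signal(model, signal, win=256, stride=128):
    """signal: (channels, time) array. Returns one label for the signal;
    model.predict is assumed to return a single label per window."""
    windows = [signal[:, s:s + win]
               for s in range(0, signal.shape[1] - win + 1, stride)]
    votes = [model.predict(w) for w in windows]
    values, counts = np.unique(votes, return_counts=True)
    return values[counts.argmax()]                  # majority vote
```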

The task of analyzing extreme events with censoring effects is considered under a framework allowing for random covariate information. A wide class of estimators that can be cast as product-limit integrals is considered, for the case where the conditional distributions belong to the Fréchet max-domain of attraction. The main mathematical contribution is establishing uniform conditions on the families of regularly varying tails under which the asymptotic behaviour of the resulting estimators is tractable. In particular, a decomposition of the integral estimators in terms of exchangeable sums is provided, which leads to a law of large numbers and several central limit theorems. Subsequently, the finite-sample behaviour of the estimators is explored through a simulation study and through the analysis of two real-life datasets. In particular, the inclusion of covariates makes the model significantly more versatile and, as a consequence, practically relevant.
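
For orientation, the product-limit construction underlying these integral estimators is that of the Kaplan-Meier estimator; a minimal covariate-free sketch for right-censored data with distinct observation times follows (ties, covariates and the tail extrapolation treated in the paper are all omitted).

```python
# Kaplan-Meier product-limit survival estimate (no ties, no covariates).
import numpy as np

def kaplan_meier(times, observed):
    """times: event/censoring times; observed: 1 if event, 0 if censored."""
    order = np.argsort(times)
    t, d = times[order], observed[order]
    at_risk = np.arange(len(t), 0, -1)              # subjects still at risk
    surv = np.cumprod(1.0 - d / at_risk)            # product-limit estimate
    return t, surv
```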

This paper explores the multiple testing problem for sparse high-dimensional data with binary outcomes. We utilize the empirical Bayes posterior to construct multiple testing procedures and evaluate their performance on false discovery rate (FDR) control. We first show that the $\ell$-value (a.k.a. local FDR) procedure can be overly conservative in estimating the FDR when the conjugate spike and uniform slab prior is used. To address this, we propose two new procedures that calibrate the posterior to achieve correct FDR control. Sharp frequentist theoretical results are established for these procedures, and numerical experiments are conducted to validate our theory in finite samples. To the best of our knowledge, we obtain the first {\it uniform} FDR control result in multiple testing for high-dimensional data with binary outcomes under the sparsity assumption.
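
A minimal sketch of the generic $\ell$-value (local FDR) procedure discussed above: sort the posterior null probabilities and reject the largest initial set whose running average stays below the target level. The calibrated procedures proposed in the paper modify this construction and are not shown.

```python
# Generic l-value procedure: reject the hypotheses whose sorted posterior
# null probabilities have a running mean at most the target level q.
import numpy as np

def l_value_procedure(l_values, q=0.05):
    """l_values[i]: posterior probability that hypothesis i is null."""
    order = np.argsort(l_values)
    avg = np.cumsum(l_values[order]) / np.arange(1, len(l_values) + 1)
    k = int(np.sum(avg <= q))                       # avg is non-decreasing
    rejected = np.zeros(len(l_values), dtype=bool)
    rejected[order[:k]] = True
    return rejected
```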

We give a new formulation of Turing reducibility in terms of higher modalities, inspired by an embedding of the Turing degrees into the lattice of subtoposes of the effective topos discovered by Hyland. In this definition, higher modalities play a similar role to I/O monads or dialogue trees in allowing a function to receive input from an external oracle. However, in homotopy type theory they have better logical properties than monads: they are compatible with higher types, and each modality corresponds to a reflective subuniverse that, under suitable conditions, is itself a model of homotopy type theory. We give synthetic proofs of some basic results about Turing reducibility in cubical type theory, making use of two axioms: Markov induction and computable choice. Both are variants of axioms already studied in the effective topos. We show they hold in certain reflective subuniverses of cubical assemblies, demonstrate their use in some simple proofs in synthetic computability theory using modalities, and show they are downwards absolute for oracle modalities. These results have been formalised using the cubical mode of the Agda proof assistant. We explore some first connections between Turing reducibility and homotopy theory. These include a synthetic proof that two Turing degrees are equal as soon as they induce isomorphic permutation groups on the natural numbers, making essential use of both Markov induction and the formulation of groups in HoTT as pointed, connected, 1-truncated types. We also give some simple non-topological examples of modalities in cubical assemblies based on these ideas, to illustrate what we expect higher-dimensional analogues of the Turing degrees to look like.

Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstruction and sparsity objectives and the presence of dead latents. We propose using k-sparse autoencoders [Makhzani and Frey, 2013] to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. Additionally, we find modifications that result in few dead latents, even at the largest scales we tried. Using these techniques, we find clean scaling laws with respect to autoencoder size and sparsity. We also introduce several new metrics for evaluating feature quality based on the recovery of hypothesized features, the explainability of activation patterns, and the sparsity of downstream effects. These metrics all generally improve with autoencoder size. To demonstrate the scalability of our approach, we train a 16 million latent autoencoder on GPT-4 activations for 40 billion tokens. We release training code and autoencoders for open-source models, as well as a visualizer.
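
A minimal sketch of the TopK activation at the heart of a k-sparse autoencoder, assuming a PyTorch setting; the initialization choices, auxiliary losses and dead-latent mitigations used at scale are omitted.

```python
# k-sparse (TopK) autoencoder: keep the k largest pre-activations per
# example and zero out the rest, giving direct control over sparsity.
import torch
import torch.nn as nn

class TopKAutoencoder(nn.Module):
    def __init__(self, d_model, n_latents, k):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x):
        z = self.encoder(x)
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter(-1, topk.indices, topk.values)
        return self.decoder(z_sparse), z_sparse
```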

We exploit Gaussian copulas to specify a class of multivariate circular distributions and obtain parametric models for the analysis of correlated circular data. This approach provides a straightforward extension of traditional multivariate normal models to the circular setting, without imposing restrictions on the marginal data distributions or requiring cumbersome routines for parameter estimation. The proposal is illustrated on two case studies of animal orientation and sea currents, where we propose an autoregressive model for circular time series and a geostatistical model for circular spatial series.
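
Sampling from such a model is straightforward to sketch: draw a correlated Gaussian vector, push it through the normal CDF to obtain uniforms, and wrap onto the circle. Here the circular marginals are taken to be uniform for simplicity; in practice one would apply the inverse CDF of the chosen circular marginal (e.g., von Mises) instead.

```python
# Gaussian-copula sample with uniform circular marginals (illustrative).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
R = np.array([[1.0, 0.7], [0.7, 1.0]])            # copula correlation
z = rng.multivariate_normal(np.zeros(2), R, size=1000)
u = norm.cdf(z)                                    # uniforms on (0, 1)
theta = 2 * np.pi * u                              # angles on [0, 2*pi)
```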
