亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to \emph{one} of the two classes. Special cases of this problem are well-known: with complete knowledge of class distributions ($n=\infty$) the problem is solved optimally by the likelihood-ratio test; when $m=1$ it corresponds to binary classification; and when $m\approx n$ it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed. In this work we (a) introduce a generalization where unlabeled samples come from a mixture of the two classes -- a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under \textit{maximum mean discrepancy} (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of planted DDPM generated images amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric $m$ vs $n$ trade-off.

相關內容

We propose a new framework to design and analyze accelerated methods that solve general monotone equation (ME) problems $F(x)=0$. Traditional approaches include generalized steepest descent methods and inexact Newton-type methods. If $F$ is uniformly monotone and twice differentiable, these methods achieve local convergence rates while the latter methods are globally convergent thanks to line search and hyperplane projection. However, a global rate is unknown for these methods. The variational inequality methods can be applied to yield a global rate that is expressed in terms of $\|F(x)\|$ but these results are restricted to first-order methods and a Lipschitz continuous operator. It has not been clear how to obtain global acceleration using high-order Lipschitz continuity. This paper takes a continuous-time perspective where accelerated methods are viewed as the discretization of dynamical systems. Our contribution is to propose accelerated rescaled gradient systems and prove that they are equivalent to closed-loop control systems. Based on this connection, we establish the properties of solution trajectories. Moreover, we provide a unified algorithmic framework obtained from discretization of our system, which together with two approximation subroutines yields both existing high-order methods and new first-order methods. We prove that the $p^{th}$-order method achieves a global rate of $O(k^{-p/2})$ in terms of $\|F(x)\|$ if $F$ is $p^{th}$-order Lipschitz continuous and the first-order method achieves the same rate if $F$ is $p^{th}$-order strongly Lipschitz continuous. If $F$ is strongly monotone, the restarted versions achieve local convergence with order $p$ when $p \geq 2$. Our discrete-time analysis is largely motivated by the continuous-time analysis and demonstrates the fundamental role that rescaled gradients play in global acceleration for solving ME problems.

Kernel-based methods are heavily used in machine learning. However, they suffer from $O(N^2)$ complexity in the number $N$ of considered data points. In this paper, we propose an approximation procedure, which reduces this complexity to $O(N)$. Our approach is based on two ideas. First, we prove that any radial kernel with analytic basis function can be represented as sliced version of some one-dimensional kernel and derive an analytic formula for the one-dimensional counterpart. It turns out that the relation between one- and $d$-dimensional kernels is given by a generalized Riemann-Liouville fractional integral. Hence, we can reduce the $d$-dimensional kernel summation to a one-dimensional setting. Second, for solving these one-dimensional problems efficiently, we apply fast Fourier summations on non-equispaced data, a sorting algorithm or a combination of both. Due to its practical importance we pay special attention to the Gaussian kernel, where we show a dimension-independent error bound and represent its one-dimensional counterpart via a closed-form Fourier transform. We provide a run time comparison and error estimate of our fast kernel summations.

The multi-view hash method converts heterogeneous data from multiple views into binary hash codes, which is one of the critical technologies in multimedia retrieval. However, the current methods mainly explore the complementarity among multiple views while lacking confidence learning and fusion. Moreover, in practical application scenarios, the single-view data contain redundant noise. To conduct the confidence learning and eliminate unnecessary noise, we propose a novel Adaptive Confidence Multi-View Hashing (ACMVH) method. First, a confidence network is developed to extract useful information from various single-view features and remove noise information. Furthermore, an adaptive confidence multi-view network is employed to measure the confidence of each view and then fuse multi-view features through a weighted summation. Lastly, a dilation network is designed to further enhance the feature representation of the fused features. To the best of our knowledge, we pioneer the application of confidence learning into the field of multimedia retrieval. Extensive experiments on two public datasets show that the proposed ACMVH performs better than state-of-the-art methods (maximum increase of 3.24%). The source code is available at //github.com/HackerHyper/ACMVH.

$\newcommand{\eps}{\varepsilon}$We present an auction algorithm using multiplicative instead of constant weight updates to compute a $(1-\eps)$-approximate maximum weight matching (MWM) in a bipartite graph with $n$ vertices and $m$ edges in time $O(m\eps^{-1}\log(\eps^{-1}))$, matching the running time of the linear-time approximation algorithm of Duan and Pettie [JACM '14]. Our algorithm is very simple and it can be extended to give a dynamic data structure that maintains a $(1-\eps)$-approximate maximum weight matching under (1) one-sided vertex deletions (with incident edges) and (2) one-sided vertex insertions (with incident edges sorted by weight) to the other side. The total time time used is $O(m\eps^{-1}\log(\eps^{-1}))$, where $m$ is the sum of the number of initially existing and inserted edges.

Cone distribution functions from statistics are turned into Multi-Criteria Decision Making tools. It is demonstrated that this procedure can be considered as an upgrade of the weighted sum scalarization insofar as it absorbs a whole collection of weighted sum scalarizations at once instead of fixing a particular one in advance. As examples show, this type of scalarization--in contrast to a pure weighted sum scalarization-is also able to detect ``non-convex" parts of the Pareto frontier. Situations are characterized in which different types of rank reversal occur, and it is explained why this might even be useful for analyzing the ranking procedure. The ranking functions are then extended to sets providing unary indicators for set preferences which establishes, for the first time, the link between set optimization methods and set-based multi-objective optimization. A potential application in machine learning is outlined.

A Reeb graph is a graphical representation of a scalar function $f: X \to \mathbb{R}$ on a topological space $X$ that encodes the topology of the level sets. A Reeb space is a generalization of the Reeb graph to a multivariate function $f: X \to \mathbb{R}^d$. In this paper, we propose novel constructions of Reeb graphs and Reeb spaces that incorporate the use of a measure. Specifically, we introduce measure theoretic Reeb graphs and Reeb spaces when the domain or the range is modeled as a metric measure space (i.e.,~a metric space equipped with a measure). Our main goal is to enhance the robustness of the Reeb graph and Reeb space in representing the topological features of a scalar field while accounting for the distribution of the measure. We first prove the stability of our measure theoretic constructions with respect to the interleaving distance. We then prove their stability with respect to the measure, defined using the distance to a measure or the kernel distance to a measure, respectively.

Open set recognition (OSR) requires the model to classify samples that belong to closed sets while rejecting unknown samples during test. Currently, generative models often perform better than discriminative models in OSR, but recent studies show that generative models may be computationally infeasible or unstable on complex tasks. In this paper, we provide insights into OSR and find that learning supplementary representations can theoretically reduce the open space risk. Based on the analysis, we propose a new model, namely Multi-Expert Diverse Attention Fusion (MEDAF), that learns diverse representations in a discriminative way. MEDAF consists of multiple experts that are learned with an attention diversity regularization term to ensure the attention maps are mutually different. The logits learned by each expert are adaptively fused and used to identify the unknowns through the score function. We show that the differences in attention maps can lead to diverse representations so that the fused representations can well handle the open space. Extensive experiments are conducted on standard and OSR large-scale benchmarks. Results show that the proposed discriminative method can outperform existing generative models by up to 9.5% on AUROC and achieve new state-of-the-art performance with little computational cost. Our method can also seamlessly integrate existing classification models. Code is available at //github.com/Vanixxz/MEDAF.

The prevalence of the powerful multilingual models, such as Whisper, has significantly advanced the researches on speech recognition. However, these models often struggle with handling the code-switching setting, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages to ensure distinct latent representations for languages. Some other methods considered the switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to conduct parameter-efficient learning for bilingual ASR. This method selects those attention heads in a model which closely express language identities and then guided those heads to be correctly attended with their corresponding languages. The experiments on the Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing state-of-the-art method, where only 5.6% additional parameters over Whisper are trained.

Classic machine learning methods are built on the $i.i.d.$ assumption that training and testing data are independent and identically distributed. However, in real scenarios, the $i.i.d.$ assumption can hardly be satisfied, rendering the sharp drop of classic machine learning algorithms' performances under distributional shifts, which indicates the significance of investigating the Out-of-Distribution generalization problem. Out-of-Distribution (OOD) generalization problem addresses the challenging setting where the testing distribution is unknown and different from the training. This paper serves as the first effort to systematically and comprehensively discuss the OOD generalization problem, from the definition, methodology, evaluation to the implications and future directions. Firstly, we provide the formal definition of the OOD generalization problem. Secondly, existing methods are categorized into three parts based on their positions in the whole learning pipeline, namely unsupervised representation learning, supervised model learning and optimization, and typical methods for each category are discussed in detail. We then demonstrate the theoretical connections of different categories, and introduce the commonly used datasets and evaluation metrics. Finally, we summarize the whole literature and raise some future directions for OOD generalization problem. The summary of OOD generalization methods reviewed in this survey can be found at //out-of-distribution-generalization.com.

While existing work in robust deep learning has focused on small pixel-level $\ell_p$ norm-based perturbations, this may not account for perturbations encountered in several real world settings. In many such cases although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. but deviates from the training domain. While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes. We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attributes-space, without having access to the data from the test domain. Our adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial perturbations, and the outer minimization finding model parameters by optimizing the loss on adversarial perturbations generated from the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations -- object-related shifts, geometric transformations, and common image corruptions. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations. We demonstrate the usefulness of the proposed approach by showing the robustness gains of deep neural networks trained using our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.

北京阿比特科技有限公司