亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

We revisit $M$-ary classification of Gutman (TIT 1989), where one is tasked to determine whether a testing sequence is generated with the same distribution as one of the $M$ training sequences or not. Our main result is a two-phase test, its theoretical analysis and its optimality guarantee. Specifically, our two-phase test is a special case of a sequential test with only two decision time points: the first phase of our test is a fixed-length test with a reject option, the second-phase of our test proceeds only if a reject option is decided in the first phase and the second phase of our test does \emph{not} allow a reject option. To provide theoretical guarantee for our test, we derive achievable error exponents using the method of types and derive a converse result for the optimal sequential test using the techniques recently proposed by Hsu, Li and Wang (ITW, 2022) for binary classification. Analytically and numerically, we show that our two phase test achieves the performance of an optimal sequential test with proper choice of test parameters. In particular, similarly as the optimal sequential test, our test does not need a final reject option to achieve the optimal error exponent region while an optimal fixed-length test needs a reject option to achieve the same region. Finally, we specialize our results to binary classification when $M=2$ and to $M$-ary hypothesis testing when the ratio of the lengths of training sequences and testing sequences tends to infinity so that generating distributions can be estimated perfectly.

相關內容

The mismatch between close-set training and open-set testing usually leads to significant performance degradation for speaker verification task. For existing loss functions, metric learning-based objectives depend strongly on searching effective pairs which might hinder further improvements. And popular multi-classification methods are usually observed with degradation when evaluated on unseen speakers. In this work, we introduce SphereFace2 framework which uses several binary classifiers to train the speaker model in a pair-wise manner instead of performing multi-classification. Benefiting from this learning paradigm, it can efficiently alleviate the gap between training and evaluation. Experiments conducted on Voxceleb show that the SphereFace2 outperforms other existing loss functions, especially on hard trials. Besides, large margin fine-tuning strategy is proven to be compatible with it for further improvements. Finally, SphereFace2 also shows its strong robustness to class-wise noisy labels which has the potential to be applied in the semi-supervised training scenario with inaccurate estimated pseudo labels. Codes are available in //github.com/Hunterhuan/sphereface2_speaker_verification

The main contribution of this paper is a new improved variant of the laser method for designing matrix multiplication algorithms. Building upon the recent techniques of [Duan, Wu, Zhou FOCS'2023], the new method introduces several new ingredients that not only yield an improved bound on the matrix multiplication exponent $\omega$, but also improves the known bounds on rectangular matrix multiplication by [Le Gall and Urrutia SODA'2018]. In particular, the new bound on $\omega$ is $\omega \le 2.371552$ (improved from $\omega \le 2.371866$). For the dual matrix multiplication exponent $\alpha$ defined as the largest $\alpha$ for which $\omega(1, \alpha, 1) = 2$, we obtain the improvement $\alpha \ge 0.321334$ (improved from $\alpha \ge 0.31389$). Similar improvements are obtained for various other exponents for multiplying rectangular matrices.

The partial information decomposition (PID) framework is concerned with decomposing the information that a set of random variables has with respect to a target variable into three types of components: redundant, synergistic, and unique. Classical information theory alone does not provide a unique way to decompose information in this manner and additional assumptions have to be made. Recently, Kolchinsky proposed a new general axiomatic approach to obtain measures of redundant information, based on choosing an order relation between information sources (equivalently, order between communication channels). In this paper, we exploit this approach to introduce three new measures of redundant information (and the resulting decompositions) based on well-known preorders between channels, thus contributing to the enrichment of the PID landscape. We relate the new decompositions to existing ones, study some of their properties, and provide examples illustrating their novelty. As a side result, we prove that any preorder that satisfies Kolchinsky's axioms yields a decomposition that meets the axioms originally introduced by Williams and Beer when they first propose the PID.

Hypothesis transfer learning (HTL) contrasts domain adaptation by allowing for a previous task leverage, named the source, into a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from such source data, relieving the hurdle of expansive data storage and providing great practical benefits. Hence, HTL is highly beneficial for real-world applications relying on big data. The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for machine learning algorithms analysis. In particular, we are interested in the statistical behaviour of the regularized empirical risk minimizers in the case of binary classification. Our stability analysis provides learning guarantees under mild assumptions. Consequently, we derive several complexity-free generalization bounds for essential statistical quantities like the training error, the excess risk and cross-validation estimates. These refined bounds allow understanding the benefits of transfer learning and comparing the behaviour of standard losses in different scenarios, leading to valuable insights for practitioners.

The selection of model's parameters plays an important role in the application of support vector classification (SVC). The commonly used method of selecting model's parameters is the k-fold cross validation with grid search (CV). It is extremely time-consuming because it needs to train a large number of SVC models. In this paper, a new method is proposed to train SVC with the selection of model's parameters. Firstly, training SVC with the selection of model's parameters is modeled as a minimax optimization problem (MaxMin-L2-SVC-NCH), in which the minimization problem is an optimization problem of finding the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem is an optimization problem of finding the optimal model's parameters. A lower time complexity can be expected in MaxMin-L2-SVC-NCH because CV is abandoned. A gradient-based algorithm is then proposed to solve MaxMin-L2-SVC-NCH, in which L2-SVC-NCH is solved by a projected gradient algorithm (PGA) while the maximization problem is solved by a gradient ascent algorithm with dynamic learning rate. To demonstrate the advantages of the PGA in solving L2-SVC-NCH, we carry out a comparison of the PGA and the famous sequential minimal optimization (SMO) algorithm after a SMO algorithm and some KKT conditions for L2-SVC-NCH are provided. It is revealed that the SMO algorithm is a special case of the PGA. Thus, the PGA can provide more flexibility. The comparative experiments between MaxMin-L2-SVC-NCH and the classical parameter selection models on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained and the test accuracy is not lost to the classical models. It indicates that MaxMin-L2-SVC-NCH performs better than the other models. We strongly recommend MaxMin-L2-SVC-NCH as a preferred model for SVC task.

One of the most studied extensions of the famous Traveling Salesperson Problem (TSP) is the {\sc Multiple TSP}: a set of $m\geq 1$ salespersons collectively traverses a set of $n$ cities by $m$ non-trivial tours, to minimize the total length of their tours. This problem can also be considered to be a variant of {\sc Uncapacitated Vehicle Routing} where the objective function is the sum of all tour lengths. When all $m$ tours start from a single common \emph{depot} $v_0$, then the metric {\sc Multiple TSP} can be approximated equally well as the standard metric TSP, as shown by Frieze (1983). The {\sc Multiple TSP} becomes significantly harder to approximate when there is a \emph{set} $D$ of $d \geq 1$ depots that form the starting and end points of the $m$ tours. For this case only a $(2-1/d)$-approximation in polynomial time is known, as well as a $3/2$-approximation for \emph{constant} $d$ which requires a prohibitive run time of $n^{\Theta(d)}$ (Xu and Rodrigues, \emph{INFORMS J. Comput.}, 2015). A recent work of Traub, Vygen and Zenklusen (STOC 2020) gives another approximation algorithm for {\sc Multiple TSP} running in time $n^{\Theta(d)}$ and reducing the problem to approximating TSP. In this paper we overcome the $n^{\Theta(d)}$ time barrier: we give the first efficient approximation algorithm for {\sc Multiple TSP} with a \emph{variable} number $d$ of depots that yields a better-than-2 approximation. Our algorithm runs in time $(1/\varepsilon)^{\mathcal O(d\log d)}\cdot n^{\mathcal O(1)}$, and produces a $(3/2+\varepsilon)$-approximation with constant probability. For the graphic case, we obtain a deterministic $3/2$-approximation in time $2^d\cdot n^{\mathcal O(1)}$.ithm for metric {\sc Multiple TSP} with run time $n^{\Theta(d)}$, which reduces the problem to approximating metric TSP.

The automatic classification of 3D medical data is memory-intensive. Also, variations in the number of slices between samples is common. Naive solutions such as subsampling can solve these problems, but at the cost of potentially eliminating relevant diagnosis information. Transformers have shown promising performance for sequential data analysis. However, their application for long-sequences is data, computationally, and memory demanding. In this paper, we propose an end-to-end Transformer-based framework that allows to classify volumetric data of variable length in an efficient fashion. Particularly, by randomizing the input slice-wise resolution during training, we enhance the capacity of the learnable positional embedding assigned to each volume slice. Consequently, the accumulated positional information in each positional embedding can be generalized to the neighbouring slices, even for high resolution volumes at the test time. By doing so, the model will be more robust to variable volume length and amenable to different computational budgets. We evaluated the proposed approach in retinal OCT volume classification and achieved 21.96% average improvement in balanced accuracy on a 9-class diagnostic task, compared to state-of-the-art video transformers. Our findings show that varying the slice-wise resolution of the input during training results in more informative volume representation as compared to training with fixed number of slices per volume. Our code is available at: //github.com/marziehoghbaie/VLFAT.

Accurate and efficient classification of different types of cancer is critical for early detection and effective treatment. In this paper, we present the results of our experiments using the EfficientNet algorithm for classification of brain tumor, breast cancer mammography, chest cancer, and skin cancer. We used publicly available datasets and preprocessed the images to ensure consistency and comparability. Our experiments show that the EfficientNet algorithm achieved high accuracy, precision, recall, and F1 scores on each of the cancer datasets, outperforming other state-of-the-art algorithms in the literature. We also discuss the strengths and weaknesses of the EfficientNet algorithm and its potential applications in clinical practice. Our results suggest that the EfficientNet algorithm is well-suited for classification of different types of cancer and can be used to improve the accuracy and efficiency of cancer diagnosis.

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.

High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small number of data for training.

北京阿比特科技有限公司