亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Given a collection of $m$ sets from a universe $\mathcal{U}$, the Maximum Set Coverage problem consists of finding $k$ sets whose union has largest cardinality. This problem is NP-Hard, but the solution can be approximated by a polynomial time algorithm up to a factor $1-1/e$. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe $|\mathcal{U}|$. However, one randomized streaming algorithm has been shown to produce a $1-1/e-\varepsilon$ approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to $m$ and $|\mathcal{U}|$. In order to achieve such a low space complexity, the authors used a technique called subsampling, based on independent-wise hash functions. This article focuses on this sublinear-space algorithm and introduces methods to reduce the time cost of subsampling. We first show how to accelerate by several orders of magnitude without altering the space complexity, number of passes and approximation quality of the original algorithm. Secondly, we derive a new lower bound for the probability of producing a $1-1/e-\varepsilon$ approximation using only pairwise independence: $1-\tfrac{4}{c k \log m}$ compared to the original $1-\tfrac{2e}{m^{ck/6}}$. Although the theoretical approximation guarantees are weaker, for large streams, our algorithm performs well in practice and present the best time-space-performance trade-off for maximum coverage in streams.

相關內容

The Age of Incorrect Information (AoII) is a recently proposed metric for real-time remote monitoring systems. In particular, AoII measures the time the information at the monitor is incorrect, weighted by the magnitude of this incorrectness, thereby combining the notions of freshness and distortion. This paper addresses the definition of an AoII-optimal transmission policy in a discrete-time communication scheme with a resource constraint and a hybrid automatic repeat request (HARQ) protocol. Considering an $N$-ary symmetric Markov source, the problem is formulated as an infinite-horizon average-cost constrained Markov decision process (CMDP). The source model is characterized by the cardinality of the state space and the probability of staying at the same state. Interestingly, it is proved that under some conditions, the optimal transmission policy is to never transmit. This reveals that there exists a region of the source dynamics where communication is inadequate in reducing the AoII. Elsewhere, there exists an optimal transmission policy, which is a randomized mixture of two discrete threshold-based policies that randomize at one state. The optimal threshold and the randomization component are derived analytically. Numerical results illustrate the impact of source dynamics, channel conditions, and the resource constraint on the average AoII.

Average-K classification is an alternative to top-K classification in which the number of labels returned varies with the ambiguity of the input image but must average to K over all the samples. A simple method to solve this task is to threshold the softmax output of a model trained with the cross-entropy loss. This approach is theoretically proven to be asymptotically consistent, but it is not guaranteed to be optimal for a finite set of samples. In this paper, we propose a new loss function based on a multi-label classification head in addition to the classical softmax. This second head is trained using pseudo-labels generated by thresholding the softmax head while guaranteeing that K classes are returned on average. We show that this approach allows the model to better capture ambiguities between classes and, as a result, to return more consistent sets of possible classes. Experiments on two datasets from the literature demonstrate that our approach outperforms the softmax baseline, as well as several other loss functions more generally designed for weakly supervised multi-label classification. The gains are larger the higher the uncertainty, especially for classes with few samples.

In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. Finally, a method to determine a stabilizing policy to initialize the algorithm using only measured data is proposed.

In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized with zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with non-zero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal transport theory, we show that, despite the non-convexity of the 2-layer network training, this problem admits an infinite dimensional convex counterpart. We formulate the corresponding functional optimization problem and investigate its main properties. In particular, we show that as the scale of the initialization ranges between $0$ and $+\infty$, the associated path interpolates continuously between the so-called kernel and rich regimes. The numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly even beyond these extreme points.

To improve how neural networks function it is crucial to understand their learning process. The information bottleneck theory of deep learning proposes that neural networks achieve good generalization by compressing their representations to disregard information that is not relevant to the task. However, empirical evidence for this theory is conflicting, as compression was only observed when networks used saturating activation functions. In contrast, networks with non-saturating activation functions achieved comparable levels of task performance but did not show compression. In this paper we developed more robust mutual information estimation techniques, that adapt to hidden activity of neural networks and produce more sensitive measurements of activations from all functions, especially unbounded functions. Using these adaptive estimation techniques, we explored compression in networks with a range of different activation functions. With two improved methods of estimation, firstly, we show that saturation of the activation function is not required for compression, and the amount of compression varies between different activation functions. We also find that there is a large amount of variation in compression between different network initializations. Secondary, we see that L2 regularization leads to significantly increased compression, while preventing overfitting. Finally, we show that only compression of the last layer is positively correlated with generalization.

This article presents a three-step framework for learning and solving partial differential equations (PDEs) using kernel methods. Given a training set consisting of pairs of noisy PDE solutions and source/boundary terms on a mesh, kernel smoothing is utilized to denoise the data and approximate derivatives of the solution. This information is then used in a kernel regression model to learn the algebraic form of the PDE. The learned PDE is then used within a kernel based solver to approximate the solution of the PDE with a new source/boundary term, thereby constituting an operator learning framework. Numerical experiments compare the method to state-of-the-art algorithms and demonstrate its competitive performance.

A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In this work, we argue that existing pretext tasks inevitably introduce biases into the learned representation, which in turn leads to biased transfer performance on various downstream tasks. To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes on the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks. Inspired by the principle of maximum entropy in information theory, we hypothesize that a generalizable representation should be the one that admits the maximum entropy among all plausible representations. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective that allows fast computation. Extensive experiments demonstrate that MEC learns a more generalizable representation than previous methods based on specific pretext tasks. It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking. Interestingly, we show that existing batch-wise and feature-wise self-supervised objectives could be seen equivalent to low-order approximations of MEC. Code and pre-trained models are available at //github.com/xinliu20/MEC.

Federated learning is a new distributed machine learning framework, where a bunch of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such an intermittent client availability model would significantly deteriorate the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even when the clients are not available, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains the convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg with the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed reaches a sublinear speedup and achieves 4.23% higher test accuracy than FedAvg.

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.

Image segmentation is considered to be one of the critical tasks in hyperspectral remote sensing image processing. Recently, convolutional neural network (CNN) has established itself as a powerful model in segmentation and classification by demonstrating excellent performances. The use of a graphical model such as a conditional random field (CRF) contributes further in capturing contextual information and thus improving the segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of CNN and CRF. We use multiple spectral cubes to learn deep features using CNN, and then formulate deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied in order to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and experimented our proposed method on it along with several widely adopted benchmark datasets to evaluate the effectiveness of our method. By comparing our results with those from several state-of-the-art models, we show the promising potential of our method.

北京阿比特科技有限公司