Missing value imputation is crucial for real-world data science workflows. Imputation is harder in the online setting, as it requires the imputation method itself to be able to evolve over time. For practical applications, imputation algorithms should produce imputations that match the true data distribution, handle data of mixed types (including ordinal, boolean, and continuous variables), and scale to large datasets. In this work we develop a new online imputation algorithm for mixed data using the Gaussian copula. The online Gaussian copula model meets all of these desiderata: its imputations match the data distribution even for mixed data; it is more accurate than its offline counterpart when the streaming data has a changing distribution; and it is faster, by up to an order of magnitude, especially on large-scale datasets. By fitting the copula model to online data, we also provide a new method to detect change points in the multivariate dependence structure in the presence of missing values. Experimental results on synthetic and real-world data validate the performance of the proposed methods.
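The abstract describes the method only at a high level. As a rough, purely offline illustration of the Gaussian copula imputation idea (not the authors' online algorithm), the minimal sketch below maps each margin to latent normal scores through its empirical CDF, estimates the latent correlation, imputes a missing coordinate by the conditional Gaussian mean, and maps the result back through the empirical quantile function; all names and numbers are illustrative.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
full = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)

def ecdf_to_normal(col):
    # map a margin to latent normal scores via its empirical CDF
    ranks = (np.argsort(np.argsort(col)) + 1.0) / (len(col) + 1.0)
    return norm.ppf(ranks)

Z = np.column_stack([ecdf_to_normal(full[:, j]) for j in range(2)])
R = np.corrcoef(Z, rowvar=False)              # latent copula correlation

# impute the missing column 1 of a new row given its observed column 0
x0 = 0.9
z0 = norm.ppf((np.sum(full[:, 0] <= x0) + 1.0) / (len(full) + 2.0))
z1_hat = R[1, 0] * z0                         # conditional mean (unit variances)
x1_hat = np.quantile(full[:, 1], norm.cdf(z1_hat))  # back through the margin
print("imputed value:", x1_hat)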

Related content

The recommendation of points of interest (POIs) is essential in location-based social networks, as it helps users and locations share information. Recently, researchers have tended to treat POI recommendation as a large-scale retrieval problem, which requires a large amount of training data representing query-item relevance. However, gathering user feedback in retrieval systems is expensive. Existing POI recommender systems make recommendations based solely on user-item (location) interactions, yet there are numerous other sources of feedback to consider: for example, when a user visits a POI, what the POI is about, and so on. Integrating all these different types of feedback is essential when developing a POI recommender. In this paper, we propose using user and item information together with auxiliary information to improve recommendation modelling in a retrieval system. We develop a deep neural network architecture to model query-item relevance in the presence of both collaborative and content information. We also improve the quality of the learned representations of queries and items by including contextual information from the user feedback data. Applying these learned representations to a large-scale dataset resulted in significant improvements.
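As a hint of what such an architecture can look like, here is a minimal two-tower sketch in PyTorch, assuming a query tower built from user embeddings plus contextual features and an item tower built from POI embeddings plus content features; the layer sizes, feature dimensions, and class name are invented for illustration and are not the authors' model.

import torch
import torch.nn as nn

class TwoTowerPOI(nn.Module):
    # query tower: user id + context features; item tower: POI id + content features
    def __init__(self, n_users, n_pois, n_context, n_content, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.poi_emb = nn.Embedding(n_pois, dim)
        self.query_net = nn.Sequential(nn.Linear(dim + n_context, dim), nn.ReLU(),
                                       nn.Linear(dim, dim))
        self.item_net = nn.Sequential(nn.Linear(dim + n_content, dim), nn.ReLU(),
                                      nn.Linear(dim, dim))

    def forward(self, user_id, context, poi_id, content):
        q = self.query_net(torch.cat([self.user_emb(user_id), context], dim=-1))
        v = self.item_net(torch.cat([self.poi_emb(poi_id), content], dim=-1))
        return (q * v).sum(-1)                 # query-item relevance score

model = TwoTowerPOI(n_users=1000, n_pois=5000, n_context=4, n_content=8)
score = model(torch.tensor([3]), torch.randn(1, 4),
              torch.tensor([42]), torch.randn(1, 8))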

Responding appropriately to the detections of a sequential change detector requires knowledge of the rate at which false positives occur in the absence of change. Setting detection thresholds to achieve a desired false positive rate is challenging. Existing works resort to setting time-invariant thresholds that focus on the expected runtime of the detector in the absence of change, either bounding it loosely from below or targeting it directly but with asymptotic arguments that we show cause significant miscalibration in practice. We present a simulation-based approach to setting time-varying thresholds that allows a desired expected runtime to be accurately targeted whilst additionally keeping the false positive rate constant across time steps. Whilst the approach to threshold setting is metric-agnostic, we show how the cost of using the popular quadratic-time MMD estimator can be reduced from $O(N^2B)$ to $O(N^2+NB)$ during configuration and from $O(N^2)$ to $O(N)$ during operation, where $N$ and $B$ are the numbers of reference and bootstrap samples respectively.
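The following minimal sketch, with an invented toy detector statistic (the absolute running mean) standing in for the MMD, illustrates the simulation-based calibration idea: bootstrap no-change trajectories from the reference data, then set each time step's threshold to the $(1-h)$ quantile among trajectories that survive to that step, so that a constant fraction $h$ of null trajectories is rejected per step and an expected runtime of $1/h$ is targeted.

import numpy as np

rng = np.random.default_rng(1)
B, T = 5000, 100                      # bootstrap trajectories, horizon
target_ert = 200                      # desired expected runtime without change
h = 1.0 / target_ert                  # constant per-step false positive rate

ref = rng.normal(size=10000)          # reference (no-change) sample
# toy stand-in statistic: absolute running mean of the monitored stream
sims = np.abs(np.cumsum(rng.choice(ref, size=(B, T)), axis=1)
              / np.arange(1, T + 1))

alive = np.ones(B, dtype=bool)
thresholds = np.empty(T)
for t in range(T):
    # threshold = (1 - h) quantile among trajectories not yet rejected,
    # so each step rejects a fraction h of the surviving null trajectories
    thresholds[t] = np.quantile(sims[alive, t], 1.0 - h)
    alive &= sims[:, t] <= thresholds[t]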

Airborne diseases, including COVID-19, raise the question of transmission risk in public transportation systems. However, quantitative analysis of the effectiveness of transmission risk mitigation methods in public transportation is lacking. This paper develops a transmission risk modeling framework based on the Wells-Riley model, using as inputs transit operating characteristics, schedule, Origin-Destination (OD) demand, and virus characteristics. The model is sensitive to various factors that operators can control, as well as external factors that may be subject to broader policy decisions (e.g. mask wearing). The model is used to assess transmission risk as a function of OD flows, planned operations, and factors such as mask wearing, ventilation, and infection rates. Using actual data from the Massachusetts Bay Transportation Authority (MBTA) Red Line, the paper explores transmission risk under different infection rate scenarios, in both magnitude and spatial characteristics, and assesses the combined impact of viral-load-related factors and passenger load factors. Increasing frequency can mitigate transmission risk, but cannot fully compensate for increases in infection rates. An imbalanced passenger distribution across the cars of a train is shown to increase the overall system-wide infection probability. Spatial infection rate patterns should also be taken into account during policymaking, as they are shown to impact transmission risk. For lines with branches, the demand distribution among the branches is important, and adjusting headway allocation among branches to balance train loads can help reduce risk.
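For concreteness, the classical Wells-Riley relation on which the framework builds is $P = 1 - \exp(-Iqpt/Q)$, the infection probability for one susceptible given $I$ infectors emitting $q$ quanta per hour, breathing rate $p$, exposure time $t$, and ventilation rate $Q$. Below is a small worked example; the parameter values are illustrative only and not from the paper, and the paper's framework additionally feeds in OD demand and operating characteristics.

import math

def wells_riley(n_infectors, quanta_rate, breathing_rate, exposure_h, vent_m3h):
    # classical Wells-Riley infection probability for one susceptible
    dose = n_infectors * quanta_rate * breathing_rate * exposure_h / vent_m3h
    return 1.0 - math.exp(-dose)

# illustrative subway-car numbers (assumed, not from the paper)
p = wells_riley(n_infectors=1, quanta_rate=25.0,   # quanta/h
                breathing_rate=0.5,                # m^3/h
                exposure_h=0.4, vent_m3h=1200.0)   # a 24-minute ride
print(f"infection probability: {p:.4f}")
# mask wearing can be folded in as source/receiver filtration factors on the dose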

We present a faster interior-point method for optimizing sum-of-squares (SOS) polynomials, which are a central tool in polynomial optimization and capture convex programming in the Lasserre hierarchy. Let $p = \sum_i q^2_i$ be an $n$-variate SOS polynomial of degree $2d$. Denoting by $L := \binom{n+d}{d}$ and $U := \binom{n+2d}{2d}$ the dimensions of the vector spaces in which the $q_i$'s and $p$ live respectively, our algorithm runs in time $\tilde{O}(LU^{1.87})$. This is polynomially faster than state-of-the-art SOS and semidefinite programming solvers, which achieve runtime $\tilde{O}(L^{0.5}\min\{U^{2.37}, L^{4.24}\})$. The centerpiece of our algorithm is a dynamic data structure for maintaining the inverse of the Hessian of the SOS barrier function under the polynomial interpolant basis, which efficiently extends to multivariate SOS optimization, and requires maintaining spectral approximations to low-rank perturbations of elementwise (Hadamard) products. This is the main challenge and departure from recent IPM breakthroughs using inverse maintenance, where low-rank updates to the slack matrix readily imply the same for the Hessian matrix.
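As background for the quantities $L$ and $U$, here is a univariate toy example of the Gram-matrix view of SOS (not the authors' interior-point method): $p$ is SOS exactly when $p(x) = m(x)^\top Q\, m(x)$ for some PSD matrix $Q$ over a basis $m$ of the $L$-dimensional space, and a Cholesky factor of $Q$ recovers the $q_i$. The polynomial and matrices below are chosen for illustration.

import numpy as np

# p(x) = x^4 + 2x^2 + 1 over the monomial basis m(x) = [1, x, x^2]
# p is SOS iff p(x) = m(x)^T Q m(x) for some PSD Gram matrix Q
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
assert np.all(np.linalg.eigvalsh(Q) >= 0)   # Q is PSD, hence p is SOS

C = np.linalg.cholesky(Q)                   # Q = C C^T
# the rows of C^T give the q_i: p(x) = sum_i ((C^T m)(x))_i^2
x = 1.7
m = np.array([1.0, x, x**2])
assert np.isclose(np.sum((C.T @ m) ** 2), x**4 + 2 * x**2 + 1)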

In 5G and beyond systems, latency has gained great momentum in wireless connectivity as a metric for serving real-time communication requirements. However, research has pointed out that in many applications latency alone is insufficient for handling data-freshness requirements. Recently, the notion of Age of Information (AoI), which captures the freshness of the data, has attracted a lot of attention. In this work, we consider mixed traffic with two time-sensitive users: a deadline-constrained user and an AoI-oriented user. To develop an efficient scheduling policy, we formulate a novel optimization problem that minimizes the average AoI while satisfying the timely-throughput constraints. The problem is cast as a Constrained Markov Decision Process (CMDP). Utilizing Lyapunov optimization theory, we relax the constrained problem to an unconstrained Markov Decision Process (MDP) and prove that it can be solved per frame by applying backward dynamic programming with optimality guarantees. Simulation results show that the timely-throughput constraints are satisfied while the average AoI is minimized. They also show the convergence of the algorithm for different values of the weighting factor and the trade-off between the AoI and the timely throughput.
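A minimal sketch of the Lyapunov drift-plus-penalty mechanism behind such policies, assuming an idealized error-free channel and one scheduling decision per slot (both simplifications, not the paper's exact per-frame formulation): a virtual deficit queue Q tracks the timely-throughput constraint of the deadline user, and each slot serves whichever user minimizes a weighted sum of AoI penalty and queue drift.

T, V, q_min = 10000, 10.0, 0.3     # horizon, AoI weight, timely-throughput target
A, Q = 1, 0.0                      # AoI of the AoI user, virtual deficit queue
aoi_sum = served_sum = 0

for t in range(T):
    # drift-plus-penalty: compare the cost of each scheduling decision
    cost_serve_aoi = V * 1 - Q * 0         # AoI resets, no delivery credit
    cost_serve_ddl = V * (A + 1) - Q * 1   # AoI grows, deficit queue drains
    if cost_serve_aoi <= cost_serve_ddl:
        A, delivered = 1, 0                # serve the AoI-oriented user
    else:
        A, delivered = A + 1, 1            # serve the deadline-constrained user
    Q = max(Q + q_min - delivered, 0.0)    # virtual queue update
    aoi_sum += A
    served_sum += delivered

print("avg AoI:", aoi_sum / T, "timely throughput:", served_sum / T)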

Few studies have addressed the simultaneous detection of the smoke and flame accompanying fires, owing to their different physical natures, which lead to uncertain fluid patterns. In this study, we collect a large image dataset and re-label it as a multi-label image classification problem so as to identify smoke and flame simultaneously. To improve the generalization ability of the detection model, which must contend with fluid objects of uncertain shape such as fire and smoke, their non-compact structure, and complex, highly variable backgrounds, we propose a data augmentation method based on random image stitching that applies resizing, deformation, position variation, and background alteration so as to enlarge the view of the learner. Moreover, we propose a self-learning data augmentation method that uses the class activation map to extract highly trustworthy regions as a new source of positive examples, further enhancing the augmentation. Through mutual reinforcement between the data augmentation and the detection model, performed iteratively, both modules make progress in an evolutionary manner. Experiments show that the proposed method effectively improves the generalization performance of the model for concurrent smoke and fire detection.
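A rough numpy sketch of what random image stitching can look like, assuming multi-hot label vectors and a four-cell canvas with a random split point; the function name and layout are illustrative, not the authors' exact procedure.

import numpy as np

def random_stitch(images, labels, size=224, rng=np.random.default_rng()):
    # stitch 4 random images into one canvas; union their label sets
    idx = rng.choice(len(images), size=4, replace=False)
    cx, cy = rng.integers(size // 4, 3 * size // 4, size=2)  # random split point
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    boxes = [(0, 0, cy, cx), (0, cx, cy, size),
             (cy, 0, size, cx), (cy, cx, size, size)]
    label = np.zeros_like(labels[0])
    for k, (top, left, bot, right) in zip(idx, boxes):
        h, w = bot - top, right - left
        img = images[k]
        # nearest-neighbour resize (deforms the aspect ratio on purpose)
        ys = np.arange(h) * img.shape[0] // h
        xs = np.arange(w) * img.shape[1] // w
        canvas[top:bot, left:right] = img[ys][:, xs]
        label = np.maximum(label, labels[k])   # multi-label union
    return canvas, label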

Anomaly detection is a widely studied task for a broad variety of data types; among them, multiple time series appear frequently in applications, including, for example, power grids and traffic networks. Detecting anomalies in multiple time series, however, is challenging, owing to the intricate interdependencies among the constituent series. We hypothesize that anomalies occur in low-density regions of a distribution and explore the use of normalizing flows for unsupervised anomaly detection because of their superior quality in density estimation. Moreover, we propose a novel flow model by imposing a Bayesian network among the constituent series. A Bayesian network is a directed acyclic graph (DAG) that models causal relationships; it factorizes the joint probability of the series into a product of easy-to-evaluate conditional probabilities. We call this graph-augmented normalizing flow approach GANF and propose joint estimation of the DAG with the flow parameters. We conduct extensive experiments on real-world datasets and demonstrate the effectiveness of GANF for density estimation, anomaly detection, and identification of time series distribution drift.
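To make the factorization concrete, here is a small sketch with an assumed three-node DAG: the joint log-density decomposes into per-node conditionals given parents, and low density serves directly as an anomaly score. For brevity, each conditional below is a linear-Gaussian stand-in rather than the conditional normalizing flows the paper actually uses.

import numpy as np

# assumed DAG over three series: 0 -> 1, 0 -> 2, 1 -> 2 (parents per node)
parents = {0: [], 1: [0], 2: [0, 1]}

def gaussian_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def joint_log_density(x, weights, var=1.0):
    # BN factorization: log p(x) = sum_i log p(x_i | parents(x_i))
    total = 0.0
    for i, pa in parents.items():
        mean = sum(weights[(j, i)] * x[j] for j in pa)  # linear-Gaussian stand-in
        total += gaussian_logpdf(x[i], mean, var)
    return total

w = {(0, 1): 0.7, (0, 2): 0.2, (1, 2): 0.5}
score = -joint_log_density(np.array([0.1, 1.9, -2.3]), w)
print("anomaly score (negative log density):", score)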

Change point detection techniques aim to capture changes in trends and sequences in time-series data in order to describe the underlying behaviour of the system. Detecting changes and anomalies in web services and in application-usage trends can provide valuable insight into the system; however, many existing approaches are supervised and require well-labelled data. As the amount of data produced and captured by sensors grows rapidly, annotating it is becoming harder, and often impossible, so self-supervised solutions are a necessity. In this work, we propose TSCP, a novel self-supervised technique for temporal change point detection based on representation learning with a Temporal Convolutional Network (TCN). To the best of our knowledge, the proposed method is the first to employ contrastive learning for change point detection. Through extensive evaluations, we demonstrate that our method outperforms multiple state-of-the-art change point detection and anomaly detection baselines, including those adopting unsupervised or semi-supervised approaches. TSCP is shown to improve on non-deep-learning-based and deep-learning-based methods by 0.28 and 0.12, respectively, in terms of average F1-score across three datasets.
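At detection time, a change point score can be computed as the dissimilarity between representations of adjacent history and future windows. In the sketch below, the trained TCN encoder is replaced by a hand-made summary-statistic stub, so only the windowing and scoring logic is illustrated; all names are invented.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def change_scores(series, encode, win=50):
    # dissimilarity between adjacent history/future window representations
    scores = []
    for t in range(win, len(series) - win):
        h = encode(series[t - win:t])       # history representation
        f = encode(series[t:t + win])       # future representation
        scores.append(1.0 - cosine(h, f))   # high score: likely change point
    return np.array(scores)

# stub encoder (the paper instead trains a TCN with contrastive learning)
encode = lambda w: np.array([w.mean(), w.std(), np.abs(np.diff(w)).mean()])
x = np.concatenate([np.random.default_rng(0).normal(0, 1, 300),
                    np.random.default_rng(1).normal(3, 1, 300)])
peak = change_scores(x, encode).argmax() + 50
print("estimated change point:", peak)   # true change is at t = 300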

In one-class learning tasks, only the normal case (foreground) can be modeled with data, whereas the variation of all possible anomalies is too erratic to be described by samples. Thus, due to the lack of representative data, the widespread discriminative approaches cannot cover such learning tasks, and generative models, which attempt to learn the input density of the foreground, are used instead. However, generative models suffer from large input dimensionality (as in images) and are typically inefficient learners. We propose to learn the data distribution of the foreground more efficiently with a multi-hypotheses autoencoder. Moreover, the model is criticized by a discriminator, which prevents artificial data modes not supported by the data and enforces diversity across hypotheses. Our multiple-hypotheses-based anomaly detection framework allows the reliable identification of out-of-distribution samples. For anomaly detection on CIFAR-10, it yields up to 3.9 percentage points of improvement over previously reported results. On a real anomaly detection task, the approach reduces the error of the baseline models from 6.8% to 1.5%.
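A minimal PyTorch sketch of the multi-hypotheses autoencoder idea, assuming flattened inputs and omitting the discriminator critic: K decoder heads share one encoder, training uses the winner-takes-all (best-hypothesis) reconstruction loss, and the same quantity serves as the anomaly score at test time. Layer sizes and names are invented.

import torch
import torch.nn as nn

class MultiHypothesisAE(nn.Module):
    # one encoder, k decoder heads; trained with a winner-takes-all loss
    def __init__(self, d_in=784, d_z=32, k=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                 nn.Linear(128, d_z))
        self.decs = nn.ModuleList(
            nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(), nn.Linear(128, d_in))
            for _ in range(k))

    def forward(self, x):
        z = self.enc(x)
        recons = torch.stack([d(z) for d in self.decs])    # (k, B, d_in)
        err = ((recons - x.unsqueeze(0)) ** 2).mean(-1)    # (k, B)
        return err.min(0).values        # best hypothesis per sample

model = MultiHypothesisAE()
x = torch.randn(16, 784)
loss = model(x).mean()       # winner-takes-all training loss
score = model(x)             # at test time: anomaly score per sample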

eCommerce transaction fraud keeps changing rapidly, which is the major issue preventing eCommerce merchants from maintaining a robust machine learning model for fraudulent transaction detection. The root cause of this problem is that rapidly changing fraud patterns alter the underlying data-generating process and cause performance deterioration in machine learning models; in statistical modeling, this phenomenon is called "concept drift". To overcome this issue, we propose an approach that adds dynamic risk features as model inputs. Dynamic risk features are a set of features built on entity profiles with fraud feedback; they are introduced to quantify the fluctuation, caused by concept drift, of the probability distribution of risk features derived from a given entity profile. In this paper, we also illustrate why this strategy can successfully handle the effect of concept drift under a statistical learning framework. We validate our approach on multiple businesses in production and verify that the proposed dynamic model achieves a superior ROC curve compared with a static model built on the same data and training parameters.
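A minimal sketch of one plausible form of a dynamic risk feature, assuming per-entity profiles updated online with fraud feedback: an exponentially decayed, smoothed fraud rate whose decay lets the feature track drifting fraud patterns. The class and parameter names are invented, not the authors' exact construction.

from collections import defaultdict

class DynamicRiskProfile:
    # exponentially decayed fraud rate per entity (e.g. card, device, IP)
    def __init__(self, decay=0.99):
        self.decay = decay
        self.fraud = defaultdict(float)   # decayed fraud count
        self.total = defaultdict(float)   # decayed transaction count

    def risk(self, entity, prior=0.01, strength=10.0):
        # smoothed (Bayesian-average) fraud rate, robust for cold entities
        f, n = self.fraud[entity], self.total[entity]
        return (f + prior * strength) / (n + strength)

    def update(self, entity, is_fraud):
        # decay old evidence so the feature follows drifting fraud patterns
        self.fraud[entity] = self.decay * self.fraud[entity] + float(is_fraud)
        self.total[entity] = self.decay * self.total[entity] + 1.0

profile = DynamicRiskProfile()
profile.update("card_123", is_fraud=True)
print(profile.risk("card_123"))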
