
The Expectation-Maximization (EM) algorithm is a widely used iterative algorithm for computing maximum likelihood estimates of the parameters of a Gaussian Mixture Model (GMM). When the sample size is smaller than the data dimension, the estimated covariance matrices can be singular or poorly conditioned, which degrades performance. This paper presents a regularized version of the EM algorithm that efficiently uses prior knowledge to cope with a small sample size. The method maximizes a penalized GMM likelihood, in which regularized estimation may ensure positive definiteness of the covariance matrix updates by shrinking the estimators towards structured target covariance matrices. Finally, experiments on real data highlight the good performance of the proposed algorithm for clustering purposes.
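The shrinkage idea can be illustrated with a minimal sketch. It assumes a simple convex combination of the weighted sample covariance and a target matrix, which is a common form of shrinkage but not necessarily the update derived in the paper; the function name `shrunk_covariance`, the weight `eta`, and the identity target are all hypothetical choices made for illustration.

```python
import numpy as np

def shrunk_covariance(X, resp, mean, target, eta):
    """Weighted sample covariance shrunk toward a target matrix.

    X: (n, d) data, resp: (n,) non-negative responsibilities of one component,
    mean: (d,) component mean, target: (d, d) positive definite matrix,
    eta: shrinkage weight in [0, 1] (hypothetical parameter, not from the paper).
    """
    centered = X - mean
    sample_cov = (centered * resp[:, None]).T @ centered / resp.sum()
    return (1.0 - eta) * sample_cov + eta * target

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))   # n = 5 samples, d = 20 dimensions, so n < d
resp = np.ones(5)              # every sample fully assigned to this component
mean = X.mean(axis=0)
target = np.eye(20)            # assumed structured target: identity matrix

singular = shrunk_covariance(X, resp, mean, target, eta=0.0)  # plain sample covariance
cov = shrunk_covariance(X, resp, mean, target, eta=0.3)
min_eig = np.linalg.eigvalsh(cov).min()
```

With `eta=0.0` the matrix has rank at most 4 and is therefore singular; with any `eta > 0` and a positive definite target, the smallest eigenvalue is bounded below by `eta` times the smallest eigenvalue of the target, so the update stays invertible.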

Related content

We propose local prediction pools as a method for combining the predictive distributions of a set of experts conditional on a set of variables believed to be related to the predictive accuracy of the experts. This is done in a two-step process where we first estimate the conditional predictive accuracy of each expert given a vector of covariates (or pooling variables) and then combine the predictive distributions of the experts conditional on this local predictive accuracy. To estimate the local predictive accuracy of each expert, we introduce the simple, fast, and interpretable caliper method. Expert pooling weights from the local prediction pool approach the equal weight solution whenever there is little data on local predictive performance, making the pools robust and adaptive. We also propose a local version of the widely used optimal prediction pools. Local prediction pools are shown to outperform the widely used optimal linear pools in a macroeconomic forecasting evaluation, and in predicting daily bike usage for a bike rental company.

We present Constrained Stein Variational Trajectory Optimization (CSVTO), an algorithm for performing trajectory optimization with constraints on a set of trajectories in parallel. We frame constrained trajectory optimization as a novel form of constrained functional minimization over trajectory distributions, which avoids treating the constraints as a penalty in the objective and allows us to generate diverse sets of constraint-satisfying trajectories. Our method uses Stein Variational Gradient Descent (SVGD) to find a set of particles that approximates a distribution over low-cost trajectories while obeying constraints. CSVTO is applicable to problems with arbitrary equality and inequality constraints and includes a novel particle resampling step to escape local minima. By explicitly generating diverse sets of trajectories, CSVTO is better able to avoid poor local minima and is more robust to initialization. We demonstrate that CSVTO outperforms baselines in challenging highly-constrained tasks, such as a 7DoF wrench manipulation task, where CSVTO succeeds in 20/20 trials vs 13/20 for the closest baseline. Our results demonstrate that generating diverse constraint-satisfying trajectories improves robustness to disturbances and initialization over baselines.

Given an array A[1: n] of n elements drawn from an ordered set, the sorted range selection problem is to build a data structure that can be used to answer the following type of queries efficiently: Given a pair of indices i, j $(1\le i\le j \le n)$, and a positive integer k, report the k smallest elements from the sub-array A[i: j] in order. Brodal et al. (Brodal, G.S., Fagerberg, R., Greve, M., and López-Ortiz, A., Online sorted range reporting. Algorithms and Computation (2009) pp. 173--182) introduced the problem and gave an optimal solution. After O(n log n) time for preprocessing, the query time is O(k). The space used is O(n). In this paper, we propose the only other possible optimal trade-off for the problem. We present a linear space solution to the problem that takes O(k log k) time to answer a range selection query. The preprocessing time is O(n). Moreover, the proposed algorithm reports the output elements one by one in non-decreasing order. Our solution is simple and practical.
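One way to see how an O(k log k) query with one-by-one reporting can arise is to combine a range-minimum structure with a heap. The sketch below is an illustration under stated assumptions, not the paper's data structure: it uses a sparse table, which needs O(n log n) space and preprocessing rather than the linear bounds claimed in the abstract, it uses 0-based inclusive indices, and all function names are hypothetical.

```python
import heapq

def build_sparse_table(a):
    """table[p][i] is the index of the minimum of a[i : i + 2**p]."""
    n = len(a)
    table = [list(range(n))]
    p = 1
    while (1 << p) <= n:
        prev, half = table[-1], 1 << (p - 1)
        row = []
        for i in range(n - (1 << p) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        p += 1
    return table

def argmin_range(a, table, lo, hi):
    """Index of the minimum of a[lo..hi] (inclusive) in O(1) time."""
    p = (hi - lo + 1).bit_length() - 1
    left, right = table[p][lo], table[p][hi - (1 << p) + 1]
    return left if a[left] <= a[right] else right

def k_smallest_in_range(a, table, i, j, k):
    """Yield up to k smallest elements of a[i..j] in non-decreasing order."""
    m = argmin_range(a, table, i, j)
    heap = [(a[m], m, i, j)]  # (value, position of minimum, range start, range end)
    for _ in range(k):
        if not heap:
            return
        value, m, lo, hi = heapq.heappop(heap)
        yield value
        # Split the range around the reported minimum and queue both halves.
        if lo < m:
            left = argmin_range(a, table, lo, m - 1)
            heapq.heappush(heap, (a[left], left, lo, m - 1))
        if m < hi:
            right = argmin_range(a, table, m + 1, hi)
            heapq.heappush(heap, (a[right], right, m + 1, hi))

a = [5, 2, 8, 1, 9, 3, 7]
table = build_sparse_table(a)
result = list(k_smallest_in_range(a, table, 1, 5, 3))  # sub-array [2, 8, 1, 9, 3]
print(result)  # [1, 2, 3]
```

Each reported element adds at most two entries to the heap, so the heap never holds more than k + 1 entries and a query costs O(k log k) once range minima are available in constant time.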

Deep Neural Networks (DNNs) deployed to the real world are regularly subject to out-of-distribution (OoD) data, various types of noise, and shifting conceptual objectives. This paper proposes a framework for adapting to data distribution drift modeled by benchmark Continual Learning datasets. We develop and evaluate a method of Continual Learning that leverages uncertainty quantification from Bayesian Inference to mitigate catastrophic forgetting. We expand on previous approaches by removing the need for Monte Carlo sampling of the model weights to sample the predictive distribution. We optimize a closed-form Evidence Lower Bound (ELBO) objective approximating the predictive distribution by propagating the first two moments of a distribution, i.e. mean and covariance, through all network layers. Catastrophic forgetting is mitigated by using the closed-form ELBO to approximate the Minimum Description Length (MDL) Principle, inherently penalizing changes in the model likelihood by minimizing the KL Divergence between the variational posterior for the current task and the previous task's variational posterior acting as the prior. Leveraging the approximation of the MDL principle, we aim to initially learn a sparse variational posterior and then minimize additional model complexity learned for subsequent tasks. Our approach is evaluated for the task incremental learning scenario using density propagated versions of fully-connected and convolutional neural networks across multiple sequential benchmark datasets with varying task sequence lengths. Ultimately, this procedure produces a minimally complex network over a series of tasks mitigating catastrophic forgetting.

2D-based industrial anomaly detection has been widely discussed; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images remains largely unexplored. Existing multimodal industrial anomaly detection methods directly concatenate the multimodal features, which leads to a strong disturbance between features and harms the detection performance. In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with a hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. We further propose a point feature alignment operation to better align the point cloud and RGB features. Extensive experiments show that our multimodal industrial anomaly detection model outperforms the state-of-the-art (SOTA) methods on both detection and segmentation precision on the MVTec-3D AD dataset. Code is available at //github.com/nomewang/M3DM.

Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in the observable data in representation form. Separating the underlying factors of variation into variables with semantic meaning helps in learning explainable representations of data, imitating the way humans form a meaningful understanding when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving model explainability, controllability, robustness, and generalization capacity in a wide range of scenarios such as computer vision, natural language processing, and data mining. In this article, we comprehensively review DRL from various aspects including motivations, definitions, methodologies, evaluations, applications and model designs. We discuss works on DRL based on two well-recognized definitions, i.e., Intuitive Definition and Group Theory Definition. We further categorize the methodologies for DRL into four groups, i.e., Traditional Statistical Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial Networks Based Approaches, Hierarchical Approaches and Other Approaches. We also analyze principles to design different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigations. We believe this work may provide insights for promoting the DRL research in the community.

Graph Neural Networks (GNNs) have been successfully used in many problems involving graph-structured data, achieving state-of-the-art performance. GNNs typically employ a message-passing scheme, in which every node aggregates information from its neighbors using a permutation-invariant aggregation function. Standard well-examined choices such as the mean or sum aggregation functions have limited capabilities, as they are not able to capture interactions among neighbors. In this work, we formalize these interactions using an information-theoretic framework that notably includes synergistic information. Driven by this definition, we introduce the Graph Ordering Attention (GOAT) layer, a novel GNN component that captures interactions between nodes in a neighborhood. This is achieved by learning local node orderings via an attention mechanism and processing the ordered representations using a recurrent neural network aggregator. This design allows us to make use of a permutation-sensitive aggregator while maintaining the permutation-equivariance of the proposed GOAT layer. The GOAT model demonstrates its increased performance in modeling graph metrics that capture complex information, such as the betweenness centrality and the effective size of a node. In practical use-cases, its superior modeling capability is confirmed through its success in several real-world node classification benchmarks.

The information bottleneck (IB) method is a technique for extracting information that is relevant for predicting the target random variable from the source random variable, which is typically implemented by optimizing the IB Lagrangian that balances the compression and prediction terms. However, the IB Lagrangian is hard to optimize, and multiple trials are required to tune the value of the Lagrangian multiplier. Moreover, we show that the prediction performance strictly decreases as the compression gets stronger while optimizing the IB Lagrangian. In this paper, we implement the IB method from the perspective of supervised disentangling. Specifically, we introduce Disentangled Information Bottleneck (DisenIB), which is consistent in compressing the source maximally without loss of target prediction performance (maximum compression). Theoretical and experimental results demonstrate that our method is consistent on maximum compression, and performs well in terms of generalization, robustness to adversarial attack, out-of-distribution detection, and supervised disentangling.

Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. We theoretically analyze the resulting linear model and show that it corresponds to a fixed low-pass filter followed by a linear classifier. Notably, our experimental evaluation demonstrates that these simplifications do not negatively impact accuracy in many downstream applications. Moreover, the resulting model scales to larger datasets, is naturally interpretable, and yields up to two orders of magnitude speedup over FastGCN.
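The effect of removing nonlinearities and collapsing weight matrices can be checked numerically with a minimal sketch. It assumes the symmetrically normalized adjacency matrix with self-loops that is standard for GCNs; the abstract does not spell out the propagation matrix, so this choice, the toy graph, and the helper names `normalized_adjacency` and `propagate` are assumptions made for illustration.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops (assumed GCN convention)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(S, X, K):
    """Apply the fixed filter S to the features K times, i.e. compute S^K X."""
    out = X
    for _ in range(K):
        out = S @ out
    return out

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)  # toy path graph on 4 nodes
X = rng.normal(size=(4, 3))                  # node features
W1 = rng.normal(size=(3, 5))                 # first-layer weights
W2 = rng.normal(size=(5, 2))                 # second-layer weights

S = normalized_adjacency(adj)
two_layer_linear = S @ (S @ X @ W1) @ W2       # two GCN layers without nonlinearities
collapsed = propagate(S, X, 2) @ (W1 @ W2)     # fixed filter, then one weight matrix
print(np.allclose(two_layer_linear, collapsed))  # True
```

Because `propagate(S, X, K)` involves no learned parameters, it can be computed once as a preprocessing step, after which training reduces to fitting a linear classifier on the filtered features.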

We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. Compared with character-based methods, our model explicitly leverages word and word sequence information. Compared with word-based methods, lattice LSTM does not suffer from segmentation errors. Gated recurrent cells allow our model to choose the most relevant characters and words from a sentence for better NER results. Experiments on various datasets show that lattice LSTM outperforms both word-based and character-based LSTM baselines, achieving the best results.
