亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Latent variable collaborative filtering methods have been a standard approach to modelling user-click interactions due to their simplicity and effectiveness. However, there is limited work on analyzing the mathematical properties of these methods in particular on preventing the overfitting towards the identity, and such methods typically utilize loss functions that overlook the geometry between items. In this work, we introduce a notion of generalization gap in collaborative filtering and analyze this with respect to latent collaborative filtering models. We present a geometric upper bound that gives rise to loss functions, and a way to meaningfully utilize the geometry of item-metadata to improve recommendations. We show how these losses can be minimized and gives the recipe to a new latent collaborative filtering algorithm, which we refer to as GeoCF, due to the geometric nature of our results. We then show experimentally that our proposed GeoCF algorithm can outperform other all existing methods on the Movielens20M and Netflix datasets, as well as two large-scale internal datasets. In summary, our work proposes a theoretically sound method which paves a way to better understand generalization of collaborative filtering at large.

相關內容

協同過濾(英語:Collaborative Filtering),簡單來說是利用某興趣相投、擁有共同經驗之群體的喜好來推薦用戶感興趣的信息,個人透過合作的機制給予信息相當程度的回應(如評分)并記錄下來以達到過濾的目的進而幫助別人篩選信息,回應不一定局限于特別感興趣的,特別不感興趣信息的紀錄也相當重要。協同過濾又可分為評比(rating)或者群體過濾(social filtering)。其后成為電子商務當中很重要的一環,即根據某顧客以往的購買行為以及從具有相似購買行為的顧客群的購買行為去推薦這個顧客其“可能喜歡的品項”,也就是借由社群的喜好提供個人化的信息、商品等的推薦服務。除了推薦之外,近年來也發展出數學運算讓系統自動計算喜好的強弱進而去蕪存菁使得過濾的內容更有依據,也許不是百分之百完全準確,但由于加入了強弱的評比讓這個概念的應用更為廣泛,除了電子商務之外尚有信息檢索領域、網絡個人影音柜、個人書架等的應用等。

Scoring rules are an established way of comparing predictive performances across model classes. In the context of survival analysis, they require adaptation in order to accommodate censoring. This work investigates using scoring rules for model training rather than evaluation. Doing so, we establish a general framework for training survival models that is model agnostic and can learn event time distributions parametrically or non-parametrically. In addition, our framework is not restricted to any specific scoring rule. While we focus on neural network-based implementations, we also provide proof-of-concept implementations using gradient boosting, generalized additive models, and trees. Empirical comparisons on synthetic and real-world data indicate that scoring rules can be successfully incorporated into model training and yield competitive predictive performance with established time-to-event models.

An important challenge in robust machine learning is when training data is provided by strategic sources who may intentionally report erroneous data for their own benefit. A line of work at the intersection of machine learning and mechanism design aims to deter strategic agents from reporting erroneous training data by designing learning algorithms that are strategyproof. Strategyproofness is a strong and desirable property, but it comes at a cost in the approximation ratio of even simple risk minimization problems. In this paper, we study strategyproof regression and classification problems in a model with advice. This model is part of a recent line on mechanism design with advice where the goal is to achieve both an improved approximation ratio when the advice is correct (consistency) and a bounded approximation ratio when the advice is incorrect (robustness). We provide the first non-trivial consistency-robustness tradeoffs for strategyproof regression and classification, which hold for simple yet interesting classes of functions. For classes of constant functions, we give a deterministic and strategyproof mechanism that is, for any $\gamma \in (0, 2]$, $1+\gamma$ consistent and $1 + 4/\gamma$ robust and provide a lower bound that shows that this tradeoff is optimal. We extend this mechanism and its guarantees to homogeneous linear regression over $\mathbb{R}$. In the binary classification problem of selecting from three or more labelings, we present strong impossibility results for both deterministic and randomized mechanism. Finally, we provide deterministic and randomized mechanisms for selecting from two labelings.

We propose a scaling law hypothesis for multimodal models processing text, audio, images, and video within a shared token and embedding space. Our framework predicts model performance based on modality-specific compression and tokenization efficiency, extending established scaling laws from text-based decoder models to mixed-modality systems. We explore whether leveraging more training data in multiple modalities can reduce the size of the multimodal model, enabling efficient deployment on resource-constrained devices.

We consider multi-user commitment models that capture the problem of enabling multiple bidders to simultaneously submit auctions to verifiers while ensuring that i) verifiers do not obtain information on the auctions until bidders reveal them at a later stage; and, ii) bidders cannot change their auction once committed. Specifically, we assume that bidders and verifiers have access to a noiseless channel as well as a noisy multiple-access channel or broadcast channel, where inputs are controlled by the bidders and outputs are observed by verifiers. In the case of multiple bidders and a single verifier connected by a non-redundant multiple-access channel, we characterize the commitment capacity region when bidders are not colluding. When the bidders are colluding, we derive an achievable region and a tight converse for the sum rate. In both cases our proposed achievable commitment schemes are constructive. In the case of a single bidder and multiple verifiers connected by a non-redundant broadcast channel, in which verifiers could drop out of the network after auctions are committed, we also characterize the commitment capacity. Our results demonstrate how commitment schemes can benefit from multi-user protocols, and develop resilience when some verifiers may become unavailable.

Pipeline parallelism is widely used to scale the training of transformer-based large language models, various works have been done to improve its throughput and memory footprint. In this paper, we address a frequently overlooked issue: the vocabulary layers can cause imbalanced computation and memory usage across pipeline stages, worsening pipeline bubbles and the memory bottleneck. To tackle this, we partition the vocabulary layers evenly across pipeline devices and group the computation into pipeline passes. To reduce the activation memory overhead, we propose several algorithms to reduce communication barriers within vocabulary layers. Additionally, we utilize a generalizable method to integrate Vocabulary Parallelism with existing pipeline schedules. By combining these techniques, our methods effectively balance the computation and parameter memory, with only a small constant activation memory overhead. Notably, when combined with activation memory-balanced schedules like V-Half, our approach achieves perfect balance in both memory and computation. Extensive evaluations demonstrate that our method achieves computation and memory balance regardless of the vocabulary size, resulting in a 5% to 51% improvement in throughput compared to naive approaches, meanwhile significantly reducing peak memory usage especially for large vocabulary scenarios. Our implementation is open-sourced at //github.com/sail-sg/VocabularyParallelism .

The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various nature. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles over which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automatizes the choice of almost all model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.

It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.

It is always well believed that modeling relationships between objects would be helpful for representing and eventually describing an image. Nevertheless, there has not been evidence in support of the idea on image description generation. In this paper, we introduce a new design to explore the connections between objects for image captioning under the umbrella of attention-based encoder-decoder framework. Specifically, we present Graph Convolutional Networks plus Long Short-Term Memory (dubbed as GCN-LSTM) architecture that novelly integrates both semantic and spatial object relationships into image encoder. Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections. The representations of each region proposed on objects are then refined by leveraging graph structure through GCN. With the learnt region-level features, our GCN-LSTM capitalizes on LSTM-based captioning framework with attention mechanism for sentence generation. Extensive experiments are conducted on COCO image captioning dataset, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, GCN-LSTM increases CIDEr-D performance from 120.1% to 128.7% on COCO testing set.

Many current applications use recommendations in order to modify the natural user behavior, such as to increase the number of sales or the time spent on a website. This results in a gap between the final recommendation objective and the classical setup where recommendation candidates are evaluated by their coherence with past user behavior, by predicting either the missing entries in the user-item matrix, or the most likely next event. To bridge this gap, we optimize a recommendation policy for the task of increasing the desired outcome versus the organic user behavior. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy. To this end, we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.

北京阿比特科技有限公司