
Maximum Inner Product Search, or top-k retrieval, on sparse vectors is well understood in information retrieval, with a number of mature algorithms that solve it exactly. However, all existing algorithms are tailored to text and frequency-based similarity measures. To achieve optimal memory footprint and query latency, they rely on the near stationarity of documents and on laws governing natural languages. We consider, instead, a setup in which collections are streaming -- necessitating dynamic indexing -- and where indexing and retrieval must work with arbitrarily distributed real-valued vectors. As we show, existing algorithms are no longer competitive in this setup, even against naive solutions. We investigate this gap and present a novel approximate solution, called Sinnamon, that can efficiently retrieve the top-k results for sparse real-valued vectors drawn from arbitrary distributions. Notably, Sinnamon offers levers to trade off memory consumption, latency, and accuracy, making the algorithm suitable for constrained applications and systems. We give theoretical results on the error introduced by the approximate nature of the algorithm, and present an empirical evaluation of its performance on two hardware platforms and on synthetic and real datasets. We conclude by laying out concrete directions for future research on this general top-k retrieval problem over sparse vectors.
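
For reference, the exact problem Sinnamon approximates can be stated in a few lines. Below is a minimal brute-force top-k MIPS baseline over dictionary-encoded sparse vectors; this is the exact baseline such approximate methods trade accuracy against, not Sinnamon itself, and the data is made up for illustration.

```python
import heapq

def exact_topk(query: dict, collection: list, k: int):
    """Brute-force top-k maximum inner product search over sparse vectors.

    Vectors are dicts mapping dimension index -> value. This is the exact
    baseline that approximate methods like Sinnamon trade accuracy against.
    """
    scores = []
    for doc_id, doc in enumerate(collection):
        # Iterate over the smaller vector of the pair for efficiency.
        a, b = (query, doc) if len(query) <= len(doc) else (doc, query)
        score = sum(v * b.get(i, 0.0) for i, v in a.items())
        scores.append((score, doc_id))
    return heapq.nlargest(k, scores)

docs = [{0: 1.5, 7: -0.2}, {3: 0.9, 7: 2.1}, {0: 0.3, 3: 0.4}]
print(exact_topk({0: 1.0, 3: 2.0}, docs, k=2))  # [(score, doc_id), ...]
```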

Related content

In psychiatric diagnosis, a contemporary data-driven, manual-based method for classifying mental disorders is the most popular technique; however, it has several inevitable flaws. Using the three-way decision as a framework, we propose a unified model of the clinician's subjective approach (CSA) analysis consisting of three parts: qualitative analysis, quantitative analysis, and evaluation-based analysis. A ranking list and a set of numerical weights based on illness magnitude levels, according to the clinician's greatest degree of assumptions, are the findings of the qualitative and quantitative investigation. We further create a comparative classification of illnesses into three groups with varying importance levels; a three-way evaluation-based model is utilized in this study to understand and portray these results more clearly. The proposed method could be integrated with the manual-based process as a complementary tool to improve precision when diagnosing mental disorders.
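
As an illustration of the three-way partitioning step, the hedged sketch below splits disorders into three importance groups from numerical weights using two thresholds; the threshold values, weight scale, and disorder names are hypothetical, not taken from the paper.

```python
def three_way_partition(weights, alpha=0.7, beta=0.3):
    """Split disorders into three groups by numerical weight.

    alpha/beta are hypothetical acceptance/rejection thresholds in the
    three-way decision sense; weights are assumed normalized to [0, 1].
    """
    high = {d for d, w in weights.items() if w >= alpha}   # most important
    low = {d for d, w in weights.items() if w <= beta}     # least important
    boundary = set(weights) - high - low                   # needs more evidence
    return high, boundary, low

weights = {"disorder_A": 0.85, "disorder_B": 0.5, "disorder_C": 0.1}
print(three_way_partition(weights))
```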

The issue of distinguishing between the same-source and different-source hypotheses based on various types of traces is a generic problem in forensic science. This problem is often tackled with Bayesian approaches, which are able to provide a likelihood ratio that quantifies the relative strengths of evidence supporting each of the two competing hypotheses. Here, we focus on distance-based approaches, whose robustness, and specifically whose capacity to deal with high-dimensional evidence, differ widely and need to be evaluated and optimized. We present a unified framework covering direct methods, which estimate the likelihoods of the distance between traces under each of the two competing hypotheses, and indirect methods, which use logistic regression to discriminate between same-source and different-source distance distributions. While direct methods are more flexible, indirect methods are more robust and quite natural in machine learning. Moreover, indirect methods also enable the use of a vectorial distance, thus preventing the severe information loss suffered by scalar distance approaches. Direct and indirect methods are compared in terms of sensitivity, specificity, and robustness, with and without dimensionality reduction and with and without feature selection, on the example of hand odor profiles, a novel and challenging type of evidence in the field of forensics. Empirical evaluations on a large panel of 534 subjects and their 1690 odor traces show the significant superiority of the indirect methods, especially without dimensionality reduction, whether with or without feature selection.
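
A minimal sketch of the indirect method on synthetic data is given below: pairs of traces are turned into vectorial (componentwise) distances and a logistic regression discriminates same-source from different-source pairs. The data generation and distance choice are illustrative assumptions, not the paper's hand-odor pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy stand-in for odor profiles: 20 sources, 3 traces each, 5 features.
X = rng.normal(size=(60, 5)) + np.repeat(rng.normal(size=(20, 5)), 3, axis=0)
sources = np.repeat(np.arange(20), 3)

pairs, labels = [], []
for i in range(len(X)):
    for j in range(i + 1, len(X)):
        # Vectorial distance: componentwise absolute difference, one
        # illustrative choice among many possible distances.
        pairs.append(np.abs(X[i] - X[j]))
        labels.append(int(sources[i] == sources[j]))  # 1 = same source

# Indirect method: logistic regression on the distance vectors.
clf = LogisticRegression(max_iter=1000).fit(np.array(pairs), labels)
p = clf.predict_proba(pairs[:1])[0, 1]
print("P(same source | distance) for first pair:", round(p, 3))
```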

Self-supervised learning (SSL) is a commonly used approach to learning and encoding data representations. By using a pre-trained SSL image encoder and training a downstream classifier on top of it, impressive performance can be achieved on various tasks with very little labeled data. The increasing usage of SSL has led to an uptick in security research related to SSL encoders and the development of various Trojan attacks. The danger posed by Trojan attacks inserted in SSL encoders lies in their ability to operate covertly and spread widely among various users and devices. The backdoor behavior in Trojaned encoders can inadvertently be inherited by downstream classifiers, making the threat even more difficult to detect and mitigate. Although current Trojan detection methods in supervised learning can potentially safeguard SSL downstream classifiers, identifying and addressing triggers in the SSL encoder before its widespread dissemination is a challenging task. This is because downstream tasks are not always known, dataset labels are not available, and even the original training dataset is not accessible during SSL encoder Trojan detection. This paper presents an innovative technique called SSL-Cleanse that is designed to detect and mitigate backdoor attacks in SSL encoders. We evaluated SSL-Cleanse on various datasets using 300 models, achieving an average detection success rate of 83.7% on ImageNet-100. After mitigation, backdoored encoders achieve an average attack success rate of 0.24% without significant accuracy loss, demonstrating the effectiveness of SSL-Cleanse.
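
For context, the attack success rate reported above is conventionally the fraction of trigger-stamped inputs that a backdoored downstream classifier maps to the attacker's target label. A minimal sketch of that metric follows; all names are hypothetical stand-ins, not the SSL-Cleanse API.

```python
import numpy as np

def attack_success_rate(classify, triggered_inputs, target_label):
    """Fraction of trigger-stamped inputs classified as the attacker's target.

    `classify` is any callable returning a predicted label; all names here
    are hypothetical stand-ins, not part of SSL-Cleanse.
    """
    preds = np.array([classify(x) for x in triggered_inputs])
    return float(np.mean(preds == target_label))

# Toy demo: a "classifier" that falls for the trigger 30% of the time.
rng = np.random.default_rng(0)
inputs = list(range(1000))
classify = lambda x: 7 if rng.random() < 0.3 else rng.integers(0, 7)
print(f"ASR: {attack_success_rate(classify, inputs, target_label=7):.1%}")
```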

Gaussian graphical models typically assume a homogeneous structure across all subjects, which is often restrictive in applications. In this article, we propose a weighted pseudo-likelihood approach for graphical modeling which allows different subjects to have different graphical structures depending on extraneous covariates. The pseudo-likelihood approach replaces the joint distribution by a product of the conditional distributions of each variable. We cast the conditional distribution as a heteroscedastic regression problem, with covariate-dependent variance terms, to enable information borrowing directly from the data instead of through a hierarchical framework. This allows independent graphical modeling for each subject, while retaining the benefits of a hierarchical Bayes model and remaining computationally tractable. An efficient embarrassingly parallel variational algorithm is developed to approximate the posterior and obtain estimates of the graphs. Using a fractional variational framework, we derive asymptotic risk bounds for the estimate in terms of a novel variant of the $\alpha$-R\'{e}nyi divergence. We theoretically demonstrate the advantages of information borrowing across covariates over independent modeling. We show the practical advantages of the approach through simulation studies and illustrate the dependence structure in protein expression levels of breast cancer patients, using copy number variation (CNV) information as covariates.
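
The pseudo-likelihood idea itself can be illustrated with the classical unweighted neighborhood-regression baseline: regress each variable on the rest and read edges off the nonzero coefficients. The sketch below uses a Lasso penalty on synthetic data; it is not the paper's covariate-weighted Bayesian model.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 200, 5
Z = rng.normal(size=(n, p))
Z[:, 1] += 0.8 * Z[:, 0]          # plant an edge between variables 0 and 1

# Pseudo-likelihood / neighborhood regression: regress each variable on
# the rest; a nonzero coefficient suggests an edge. This is the classical
# unweighted baseline, not the paper's covariate-dependent variational model.
edges = set()
for j in range(p):
    others = [k for k in range(p) if k != j]
    coef = Lasso(alpha=0.1).fit(Z[:, others], Z[:, j]).coef_
    for k, c in zip(others, coef):
        if abs(c) > 1e-6:
            edges.add(tuple(sorted((j, k))))
print("recovered edges:", sorted(edges))
```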

In cell-free multiple input multiple output (MIMO) networks, multiple base stations (BSs) collaborate to achieve high spectral efficiency. Nevertheless, high penetration loss due to large blockages in harsh propagation environments is often an issue that severely degrades communication performance. Considering that an intelligent reflecting surface (IRS) is capable of constructing digitally controllable reflection links in a low-cost manner, we investigate an IRS-enhanced downlink cell-free MIMO network in this paper. We aim to maximize the sum rate of all users by jointly optimizing the transmit beamforming at the BSs and the reflection coefficients at the IRS. To address the optimization problem, we propose a fully distributed machine learning algorithm. Unlike conventional iterative optimization algorithms, which require central processing at the central processing unit (CPU) and a large amount of channel state information and signaling exchange between the BSs and the CPU, in the proposed algorithm each BS locally designs its own beamforming vectors, while the IRS reflection coefficients are determined by one of the BSs. Simulation results show that the deployment of an IRS can significantly boost the sum user rate and that the proposed algorithm achieves a high sum user rate with low computational complexity.
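
The objective being maximized can be made concrete with a small sketch: for fixed random channels, compute the downlink sum rate of an IRS-assisted cell-free network given beamformers and unit-modulus reflection coefficients. The channel model, dimensions, and normalization below are illustrative assumptions; the distributed learning algorithm itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
B, M, U, N = 2, 4, 3, 16      # BSs, antennas per BS, users, IRS elements
sigma2 = 1e-3                 # noise power

# Random channel realizations (stand-ins for estimated CSI).
h_d = rng.normal(size=(B, U, M)) + 1j * rng.normal(size=(B, U, M))   # BS -> user
G = rng.normal(size=(B, N, M)) + 1j * rng.normal(size=(B, N, M))     # BS -> IRS
h_r = rng.normal(size=(U, N)) + 1j * rng.normal(size=(U, N))         # IRS -> user

theta = np.exp(1j * rng.uniform(0, 2 * np.pi, N))   # unit-modulus coefficients
W = rng.normal(size=(B, U, M)) + 1j * rng.normal(size=(B, U, M))     # beamformers
W /= np.linalg.norm(W, axis=2, keepdims=True)       # per-stream normalization

def sum_rate(theta, W):
    """Sum rate of the IRS-assisted cell-free downlink for fixed channels.

    This is only the objective the paper's distributed learner maximizes,
    under an illustrative coherent joint-transmission model.
    """
    # Effective BS-b -> user-u channel: direct path + IRS-reflected path.
    h_eff = h_d + np.einsum('un,n,bnm->bum', h_r.conj(), theta, G)
    rate = 0.0
    for u in range(U):
        sig = sum(h_eff[b, u] @ W[b, u].conj() for b in range(B))
        interf = sum(abs(sum(h_eff[b, u] @ W[b, v].conj() for b in range(B))) ** 2
                     for v in range(U) if v != u)
        rate += np.log2(1 + abs(sig) ** 2 / (interf + sigma2))
    return rate

print(f"sum rate for a random configuration: {sum_rate(theta, W):.2f} bit/s/Hz")
```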

In the literature, the cost of a partitioned fluid-structure interaction scheme is typically assessed by the number of coupling iterations required per time step, while ignoring the internal iterations within the nonlinear subproblems. In this work, we demonstrate that these internal iterations have a significant influence on the computational cost of the coupled simulation. Particular attention is paid to how limiting the number of iterations within each solver call can shorten the overall run time, since it avoids polishing the subproblem solution using unconverged coupling data. Based on systematic parameter studies, we investigate the optimal number of subproblem iterations per coupling step. Lastly, this work proposes a new convergence criterion for coupled systems that is based on the residuals of the subproblems and therefore does not require an additional convergence tolerance for the coupling loop.
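
The trade-off can be demonstrated on a toy coupled fixed-point problem: two scalar "subproblems" are solved alternately, the number of inner iterations per solver call is capped, and the coupling loop stops when the subproblem residuals themselves are small. The equations are illustrative stand-ins, not a fluid or structure model.

```python
import math

def inner_solve(update, z0, max_inner, tol=1e-12):
    """Fixed-point iterations for one subproblem, capped at max_inner.

    Capping mirrors the paper's point: fully converging a subproblem on
    unconverged coupling data wastes work.
    """
    z = z0
    for _ in range(max_inner):
        z_new = update(z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def coupled_solve(max_inner, tol=1e-10, max_coupling=200):
    """Gauss-Seidel coupling of two toy subproblems."""
    x, y = 0.0, 0.0
    for it in range(1, max_coupling + 1):
        x = inner_solve(lambda z: 0.5 * math.cos(z + y), x, max_inner)
        y = inner_solve(lambda z: 0.3 * math.sin(x - z), y, max_inner)
        # Residual-based coupling criterion: check the subproblem residuals
        # directly instead of a separate coupling tolerance.
        r = max(abs(x - 0.5 * math.cos(x + y)), abs(y - 0.3 * math.sin(x - y)))
        if r < tol:
            return it
    return max_coupling

for m in (1, 2, 5, 50):
    print(f"max {m:2d} inner iteration(s): {coupled_solve(m):3d} coupling iterations")
```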

Quantifying entanglement is an important task by which the resourcefulness of a state can be measured. Here we develop a quantum algorithm that tests for and quantifies the separability of a general bipartite state, by making use of the quantum steering effect. Our first separability test consists of a distributed quantum computation involving two parties: a computationally limited client, who prepares a purification of the state of interest, and a computationally unbounded server, who tries to steer the reduced systems to a probabilistic ensemble of pure product states. To design a practical algorithm, we replace the role of the server by a combination of parameterized unitary circuits and classical optimization techniques to perform the necessary computation. The result is a variational quantum steering algorithm (VQSA), which is our second separability test and is better suited to the capabilities of quantum computers available today. This VQSA has an additional interpretation as a distributed variational quantum algorithm (VQA) that can be executed over a quantum network, in which each node is equipped with classical and quantum computers capable of executing VQAs. We then simulate our VQSA on noisy quantum simulators and find favorable convergence properties on the examples tested. We also develop semidefinite programs, executable on classical computers, that benchmark the results obtained from our VQSA. Our findings here thus provide a meaningful connection between steering, entanglement, quantum algorithms, and quantum computational complexity theory. They also demonstrate the value of a parameterized mid-circuit measurement in a VQSA and represent a first-of-its-kind application for a distributed VQA. Finally, the whole framework generalizes to the case of multipartite states and entanglement.
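
As a classical point of comparison (not the VQSA), two-qubit separability can be decided exactly on a classical computer with the positive-partial-transpose (PPT) criterion, which is necessary and sufficient for 2x2 and 2x3 systems:

```python
import numpy as np

def partial_transpose(rho, dims=(2, 2)):
    """Partial transpose over the second subsystem of a bipartite state."""
    dA, dB = dims
    r = rho.reshape(dA, dB, dA, dB)
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def is_separable_2x2(rho):
    """PPT check: for 2x2 (and 2x3) systems, rho is separable iff its
    partial transpose is positive semidefinite (Horodecki criterion).
    This is a classical benchmark, not the paper's variational algorithm."""
    return bool(np.min(np.linalg.eigvalsh(partial_transpose(rho))) >= -1e-12)

# Maximally entangled Bell state |Phi+> -- should be flagged entangled.
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
print("Bell state separable?", is_separable_2x2(np.outer(phi, phi)))
# Maximally mixed state -- separable.
print("I/4 separable?", is_separable_2x2(np.eye(4) / 4))
```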

We consider the best arm identification (BAI) problem in the stochastic multi-armed bandit framework, where each arm has a tiny probability of realizing large rewards while with overwhelming probability the reward is zero. A key application of this framework is online advertising, where click rates of advertisements can be a fraction of a single percent and final conversion to sales, while highly profitable, may again be a small fraction of the click rates. Lately, algorithms for BAI problems have been developed that minimize sample complexity while providing statistical guarantees on correct arm selection. As we observe, these algorithms can be computationally prohibitive. We exploit the fact that the reward process for each arm is well approximated by a Compound Poisson process to arrive at algorithms that are faster, with a small increase in sample complexity. We analyze the problem in an asymptotic regime as the rarity of reward occurrence reduces to zero and reward amounts increase to infinity. This helps illustrate the benefits of the proposed algorithm. It also sheds light on the underlying structure of optimal BAI algorithms in the rare event setting.
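
The reward model can be sketched in a few lines: each arm pays zero except on rare events, in which case a large reward is drawn, so the total reward over many pulls is approximately Compound Poisson. The snippet below simulates such arms and runs the naive uniform-allocation baseline, not the paper's algorithm; all distributions and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Rare-reward arms: with tiny probability p[i] the reward is a large draw,
# otherwise zero -- e.g. ad clicks followed by occasional conversions.
p = np.array([0.004, 0.005, 0.006])      # rare event probabilities
mu = np.array([100.0, 100.0, 100.0])     # mean reward given an event

def pull(i, n):
    """n pulls of arm i; the total reward is approximately Compound Poisson."""
    events = rng.random(n) < p[i]
    return np.where(events, rng.exponential(mu[i], size=n), 0.0)

# Naive uniform-allocation BAI baseline (not the paper's algorithm):
# sample every arm equally and return the empirically best arm.
n = 200_000
means = np.array([pull(i, n).mean() for i in range(len(p))])
print("estimated arm means:", means.round(4), "-> best arm:", means.argmax())
```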

We provide a unified framework, applicable to a general family of convex losses and across binary and multiclass settings in the overparameterized regime, to approximately characterize the implicit bias of gradient descent in closed form. Specifically, we show that the implicit bias is approximated by (but not exactly equal to) the minimum-norm interpolant in high dimensions, which arises from training on the squared loss. In contrast to prior work, which was tailored to exponentially-tailed losses and used the intermediate support-vector-machine formulation, our framework directly builds on the primal-dual analysis of Ji and Telgarsky (2021), allowing us to provide new approximate equivalences for general convex losses through a novel sensitivity analysis. Our framework also recovers existing exact equivalence results for exponentially-tailed losses across binary and multiclass settings. Finally, we provide evidence for the tightness of our techniques, which we use to demonstrate the effect of certain loss functions designed for out-of-distribution problems on the closed-form solution.
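
The reference object here, the minimum-norm interpolant, has a closed form in the overparameterized regime: $w = X^\top (XX^\top)^{-1} y$. The sketch below computes it and checks that gradient descent on the squared loss from zero initialization converges to it; the data is synthetic and the comparison is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 20, 100                         # overparameterized: d >> n
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Minimum-norm interpolant: w = X^T (X X^T)^{-1} y, the solution gradient
# descent on the squared loss converges to from zero initialization.
w_mn = X.T @ np.linalg.solve(X @ X.T, y)

# Gradient descent on the squared loss from w = 0 for comparison.
w = np.zeros(d)
for _ in range(20_000):
    w -= 0.01 / n * X.T @ (X @ w - y)

print("interpolates:", np.allclose(X @ w_mn, y))
print("GD vs min-norm distance:", np.linalg.norm(w - w_mn))
```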

Unsupervised domain adaptation has recently emerged as an effective paradigm for generalizing deep neural networks to new target domains. However, there is still enormous potential to be tapped to reach fully supervised performance. In this paper, we present a novel active learning strategy to assist knowledge transfer in the target domain, dubbed active domain adaptation. We start from an observation that energy-based models exhibit free energy biases when training (source) and test (target) data come from different distributions. Inspired by this inherent mechanism, we empirically show that a simple yet efficient energy-based sampling strategy selects more valuable target samples than existing approaches that require particular architectures or distance computations. Our algorithm, Energy-based Active Domain Adaptation (EADA), queries groups of target data that incorporate both domain characteristics and instance uncertainty into every selection round. Meanwhile, by aligning the free energy of target data to be compact around that of the source domain via a regularization term, the domain gap can be implicitly diminished. Through extensive experiments, we show that EADA surpasses state-of-the-art methods on well-known challenging benchmarks with substantial improvements, making it a useful option in the open world. Code is available at //github.com/BIT-DA/EADA.
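
The selection signal can be sketched concretely: the free energy of classifier logits is $F(x) = -T \log \sum_c \exp(f_c(x)/T)$, and higher free energy under the source model flags uncertain target samples. The snippet below applies this scoring to stand-in logits; it captures only the free-energy ranking, not EADA's full per-round selection or its regularization term.

```python
import numpy as np
from scipy.special import logsumexp

def free_energy(logits, T=1.0):
    """Free energy of classifier logits: F(x) = -T * logsumexp(logits / T).

    Higher free energy indicates lower confidence under the source model,
    which energy-based selection uses to pick target samples to label.
    """
    return -T * logsumexp(logits / T, axis=1)

rng = np.random.default_rng(5)
target_logits = rng.normal(size=(1000, 10))   # stand-in for model outputs
budget = 16
query_idx = np.argsort(free_energy(target_logits))[-budget:]
print("indices queried for annotation:", query_idx)
```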
