欧美丰满大乳屁股流白浆-久久91超碰色中文字幕总站

The fundamental concept underlying K-Nearest Neighbors (KNN) is the classification of samples based on the majority through their nearest neighbors. Although distance and neighbors' labels are critical in KNN, traditional KNN treats all samples equally. However, some KNN variants weigh neighbors differently based on a specific rule, considering each neighbor's distance and label. Many KNN methodologies introduce complex algorithms that do not significantly outperform the traditional KNN, often leading to less satisfactory outcomes. The gap in reliably extracting information for accurately predicting true weights remains an open research challenge. In our proposed method, information-modified KNN (IMKNN), we bridge the gap by presenting a straightforward algorithm that achieves effective results. To this end, we introduce a classification method to improve the performance of the KNN algorithm. By exploiting mutual information (MI) and incorporating ideas from Shapley's values, we improve the traditional KNN performance in accuracy, precision, and recall, offering a more refined and effective solution. To evaluate the effectiveness of our method, it is compared with eight variants of KNN. We conduct experiments on 12 widely-used datasets, achieving 11.05\%, 12.42\%, and 12.07\% in accuracy, precision, and recall performance, respectively, compared to traditional KNN. Additionally, we compared IMKNN with traditional KNN across four large-scale datasets to highlight the distinct advantages of IMKNN in the impact of monotonicity, noise, density, subclusters, and skewed distributions. Our research indicates that IMKNN consistently surpasses other methods in diverse datasets.

相關內容

KNN

關注 0

線性的 · 工商管理碩士（MBA） · 相似度 · 變換 · 類別 ·

2024 年 6 月 21 日

Deobfuscation of Semi-Linear Mixed Boolean-Arithmetic Expressions

Colton Skees

from arxiv, - Note that Zhou et al's 2007 paper was first to prove the N-bit to 1-bit transform, as opposed to Liu et al. in 2021 - Update email href - Apply constant propagation over multiplication by two constants before computing # of nodes - Change example MBA used in the introduction section

Mixed Boolean-Arithmetic (MBA) obfuscation is a common technique used to transform simple expressions into semantically equivalent but more complex combinations of boolean and arithmetic operators. Its widespread usage in DRM systems, malware, and software protectors is well documented. In 2021, Liu et al. proposed a groundbreaking method of simplifying linear MBAs, utilizing a hidden two-way transformation between 1-bit and n-bit variables. In 2022, Reichenwallner et al. proposed a similar but more effective method of simplifying linear MBAs, SiMBA, relying on a similar but more involved theorem. However, because current linear MBA simplifiers operate in 1-bit space, they cannot handle expressions which utilize constants inside of their bitwise operands, e.g. (x&1), (x&1111) + (y&1111). We propose an extension to SiMBA that enables simplification of this broader class of expressions. It surpasses peer tools, achieving efficient simplification of a class of MBAs that current simplifiers struggle with.

Attention · GROUP · 變換 · 模型性能 · MoDELS ·

2024 年 6 月 21 日

Optimised Grouped-Query Attention Mechanism for Transformers

Yuang Chen,Cheng Zhang,Xitong Gao,Robert D. Mullins,George A. Constantinides,Yiren Zhao

from arxiv, Accepted at ICML2024 ES-FoMo-II Workshop

Grouped-query attention (GQA) has been widely adopted in LLMs to mitigate the complexity of multi-head attention (MHA). To transform an MHA to a GQA, neighbour queries in MHA are evenly split into groups where each group shares the value and key layers. In this work, we propose AsymGQA, an activation-informed approach to asymmetrically grouping an MHA to a GQA for better model performance. Our AsymGQA outperforms the GQA within the same model size budget. For example, AsymGQA LLaMA-2-7B has an accuracy increase of 7.5% on MMLU compared to neighbour grouping. Our approach addresses the GQA's trade-off problem between model performance and hardware efficiency.

估計/估計量 · 設計 · 損失函數（機器學習） · 吉布斯采樣/吉布斯抽樣 · Continuity ·

2024 年 6 月 19 日

Hierarchical Regression Discontinuity Design: Pursuing Subgroup Treatment Effects

Shonosuke Sugasawa,Takuya Ishihara,Daisuke Kurisu

from arxiv, 24 pages

Regression discontinuity design (RDD) is widely adopted for causal inference under intervention determined by a continuous variable. While one is interested in treatment effect heterogeneity by subgroups in many applications, RDD typically suffers from small subgroup-wise sample sizes, which makes the estimation results highly instable. To solve this issue, we introduce hierarchical RDD (HRDD), a hierarchical Bayes approach for pursuing treatment effect heterogeneity in RDD. A key feature of HRDD is to employ a pseudo-model based on a loss function to estimate subgroup-level parameters of treatment effects under RDD, and assign a hierarchical prior distribution to ''borrow strength'' from other subgroups. The posterior computation can be easily done by a simple Gibbs sampling, and the optimal bandwidth can be automatically selected by the Hyv\"{a}rinen scores for unnormalized models. We demonstrate the proposed HRDD through simulation and real data analysis, and show that HRDD provides much more stable point and interval estimation than separately applying the standard RDD method to each subgroup.

Prophet · 樣本 · 在線 · 算法與數據結構 ·

2024 年 6 月 18 日

Sample-Based Matroid Prophet Inequalities

Hu Fu,Pinyan Lu,Zhihao Gavin Tang,Hongxun Wu,Jinzhao Wu,Qianfan Zhang

from arxiv, To appear at EC'24

We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Adding more to the stake, the single-sample version of the question for general matroids has close (two-way) connections with the long-standing matroid secretary conjecture. In this work, we give a $(\frac14 - \varepsilon)$-competitive matroid prophet inequality with only $O_\varepsilon(\mathrm{poly} \log n)$ samples. Our algorithm consists of two parts: (i) a novel quantile-based reduction from matroid prophet inequalities to online contention resolution schemes (OCRSs) with $O_\varepsilon(\log n)$ samples, and (ii) a $(\frac14 - \varepsilon)$-selectable matroid OCRS with $O_\varepsilon(\mathrm{poly} \log n)$ samples which carefully addresses an adaptivity challenge.

代價 · 情景 · Agent · 近似 · 可行 ·

2024 年 6 月 18 日

Agent-Constrained Truthful Facility Location Games

Argyrios Deligkas,Mohammad Lotfi,Alexandros A. Voudouris

We consider a truthful facility location problem in which there is a set of agents with private locations on the line of real numbers, and the goal is to place a number of facilities at different locations chosen from the set of those reported by the agents. Given a feasible solution, each agent suffers an individual cost that is either its total distance to all facilities (sum-variant) or its distance to the farthest facility (max-variant). For both variants, we show tight bounds on the approximation ratio of strategyproof mechanisms in terms of the social cost, the total individual cost of the agents.

可理解性 · 可辨認的 · TOOLS · state-of-the-art · HTTPS ·

2024 年 6 月 18 日

Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

Bowen Jiang,Yangxinyu Xie,Xiaomeng Wang,Weijie J. Su,Camillo J. Taylor,Tanwi Mallick

Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they present biases inherited from the training data, inconsistency across different contexts, and difficulty understanding complex scenarios involving multiple layers of context. Therefore, recent research attempts to leverage the strength of multiple agents working collaboratively with various types of data and tools for enhanced consistency and reliability. To that end, this paper aims to understand whether multi-modal and multi-agent systems are advancing toward rationality by surveying the state-of-the-art works, identifying advancements over single-agent and single-modal systems in terms of rationality, and discussing open problems and future directions. We maintain an open repository at //github.com/bowen-upenn/MMMA_Rationality.

圖 · INFORMS · Storage · 推斷 · 語言模型化 ·

2024 年 6 月 18 日

Privacy-Preserved Neural Graph Databases

Qi Hu,Haoran Li,Jiaxin Bai,Zihao Wang,Yangqiu Song

In the era of large language models (LLMs), efficient and accurate data retrieval has become increasingly crucial for the use of domain-specific or private data in the retrieval augmented generation (RAG). Neural graph databases (NGDBs) have emerged as a powerful paradigm that combines the strengths of graph databases (GDBs) and neural networks to enable efficient storage, retrieval, and analysis of graph-structured data which can be adaptively trained with LLMs. The usage of neural embedding storage and Complex neural logical Query Answering (CQA) provides NGDBs with generalization ability. When the graph is incomplete, by extracting latent patterns and representations, neural graph databases can fill gaps in the graph structure, revealing hidden relationships and enabling accurate query answering. Nevertheless, this capability comes with inherent trade-offs, as it introduces additional privacy risks to the domain-specific or private databases. Malicious attackers can infer more sensitive information in the database using well-designed queries such as from the answer sets of where Turing Award winners born before 1950 and after 1940 lived, the living places of Turing Award winner Hinton are probably exposed, although the living places may have been deleted in the training stage due to the privacy concerns. In this work, we propose a privacy-preserved neural graph database (P-NGDB) framework to alleviate the risks of privacy leakage in NGDBs. We introduce adversarial training techniques in the training stage to enforce the NGDBs to generate indistinguishable answers when queried with private information, enhancing the difficulty of inferring sensitive information through combinations of multiple innocuous queries.

穩健性 · 置換 · 解碼 · 輸出 · MoDELS ·

2024 年 6 月 18 日

Pseudorandom Error-Correcting Codes

Miranda Christ,Sam Gunn

We construct pseudorandom error-correcting codes (or simply pseudorandom codes), which are error-correcting codes with the property that any polynomial number of codewords are pseudorandom to any computationally-bounded adversary. Efficient decoding of corrupted codewords is possible with the help of a decoding key. We build pseudorandom codes that are robust to substitution and deletion errors, where pseudorandomness rests on standard cryptographic assumptions. Specifically, pseudorandomness is based on either $2^{O(\sqrt{n})}$-hardness of LPN, or polynomial hardness of LPN and the planted XOR problem at low density. As our primary application of pseudorandom codes, we present an undetectable watermarking scheme for outputs of language models that is robust to cropping and a constant rate of random substitutions and deletions. The watermark is undetectable in the sense that any number of samples of watermarked text are computationally indistinguishable from text output by the original model. This is the first undetectable watermarking scheme that can tolerate a constant rate of errors. Our second application is to steganography, where a secret message is hidden in innocent-looking content. We present a constant-rate stateless steganography scheme with robustness to a constant rate of substitutions. Ours is the first stateless steganography scheme with provable steganographic security and any robustness to errors.

估計/估計量 · 樣本 · Copulas · Extensibility · 統計量 ·

2024 年 6 月 14 日

Alternative Approaches for Estimating Highest-Density Regions

Nina Deliu,Brunero Liseo

from arxiv, Main paper: 26 pages (7 Figures, 1 Table); Supplementary Material: 36 pages

Among the variety of statistical intervals, highest-density regions (HDRs) stand out for their ability to effectively summarize a distribution or sample, unveiling its distinctive and salient features. An HDR represents the minimum size set that satisfies a certain probability coverage, and current methods for their computation require knowledge or estimation of the underlying probability distribution or density $f$. In this work, we illustrate a broader framework for computing HDRs, which generalizes the classical density quantile method introduced in the seminal paper of Hyndman (1996). The framework is based on neighbourhood measures, i.e., measures that preserve the order induced in the sample by $f$, and include the density $f$ as a special case. We explore a number of suitable distance-based measures, such as the $k$-nearest neighborhood distance, and some probabilistic variants based on copula models. An extensive comparison is provided, showing the advantages of the copula-based strategy, especially in those scenarios that exhibit complex structures (e.g., multimodalities or particular dependencies). Finally, we discuss the practical implications of our findings for estimating HDRs in real-world applications.

自動問答 · MoDELS · Networking · Processing（編程語言） · state-of-the-art ·

2018 年 6 月 1 日

An Interpretable Reasoning Network for Multi-Relation Question Answering

Mantong Zhou,Minlie Huang,Xiaoyan Zhu

from arxiv, COLING 2018, 13pages

Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis, thereby allowing manual manipulation in predicting the final answer.