
In this work, we propose an efficient two-stage algorithm that solves a joint problem of correlation detection and partial alignment recovery between two Gaussian databases. Correlation detection is a hypothesis testing problem; under the null hypothesis the databases are independent, and under the alternative hypothesis they are correlated under an unknown row permutation. We develop bounds on the type-I and type-II error probabilities and show that the analyzed detector outperforms a recently proposed detector, at least for some specific parameter choices. Since the proposed detector relies on a statistic that is a sum of dependent indicator random variables, we develop a novel graph-theoretic technique for bounding the $k$-th order moments of such statistics in order to control the type-I probability of error. When the databases are declared correlated, the algorithm also recovers a partial alignment between them. We also propose two further algorithms: (i) an algorithm for partial alignment recovery whose reliability and computational complexity are both higher than those of the first proposed algorithm, and (ii) an algorithm for full alignment recovery that requires fewer computations than the optimal recovery procedure at the cost of only a slightly higher error probability.
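To illustrate the kind of statistic involved, the following Python sketch counts row pairs whose empirical correlation exceeds a threshold; the count is a sum of dependent indicators, since each row participates in many pairs. The function name, the normalization, and the thresholding rule are illustrative assumptions, not the paper's exact detector.

```python
import numpy as np

def count_detector(X, Y, tau):
    """Toy pair-counting statistic for Gaussian database correlation.

    Counts row pairs whose empirical correlation exceeds tau. The count
    is a sum of *dependent* indicator variables, because each row of X
    appears in n different pairs. Illustrative sketch only.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    corr = Xn @ Yn.T                    # all n^2 pairwise correlations
    return int((corr > tau).sum())      # sum of dependent indicators

rng = np.random.default_rng(0)
n, d, rho = 200, 50, 0.8
X = rng.standard_normal((n, d))
perm = rng.permutation(n)               # unknown row permutation
Y = rho * X[perm] + np.sqrt(1 - rho**2) * rng.standard_normal((n, d))
print(count_detector(X, Y, tau=0.5))    # large under H1, near zero under H0
```

Under the null hypothesis the databases are independent, so high-correlation pairs are rare; declaring correlation when the count exceeds a threshold mirrors the structure, though not the details, of the analyzed detector.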

Related content

The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression in particular, is supported by the results of Kosmidis and Firth (2021, Biometrika), who show that the resulting estimates are always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters, and they appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically linearly and slower than the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares (IWLS) for estimating generalized linear models with adjusted score equations for mean bias reduction and for maximizing the likelihood penalized by a positive power of the Jeffreys'-prior penalty. The variants eliminate the requirement of storing $O(n)$ quantities in memory and can operate with data sets that exceed computer memory or even hard-drive capacity. We achieve this through incremental QR decompositions, which enable the IWLS iterations to access only data chunks of predetermined size. We assess the procedures through a real-data application with millions of observations and in high-dimensional logistic regression, where a large-scale simulation experiment produces concrete evidence for the existence of a simple adjustment to the maximum Jeffreys'-penalized likelihood estimates that delivers high accuracy in terms of signal recovery, even in cases where estimates from ML and other recently proposed corrective methods do not exist.
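To make the incremental-QR idea concrete, here is a minimal sketch of one way a bounded-memory IWLS iteration for logistic regression can be organized. The chunking interface and all names are assumptions; the authors' implementation may differ in its score adjustments and numerical safeguards.

```python
import numpy as np

def iwls_chunked(make_chunks, p, n_iter=10):
    """Hedged sketch: bounded-memory IWLS for logistic regression.

    Each iteration accumulates an incremental QR decomposition over data
    chunks, so only O(p^2) state (R and Q'sqrt(W)z) is held in memory.
    `make_chunks` is any callable returning an iterable of (X, y) blocks.
    Illustrative only; not the authors' implementation.
    """
    beta = np.zeros(p)
    for _ in range(n_iter):
        R = np.zeros((0, p))                  # running triangular factor
        qtz = np.zeros(0)                     # running Q' sqrt(W) z
        for X, y in make_chunks():
            eta = X @ beta
            mu = 1.0 / (1.0 + np.exp(-eta))   # logistic mean function
            w = np.sqrt(mu * (1.0 - mu))      # sqrt of IWLS weights
            z = eta + (y - mu) / (mu * (1.0 - mu))   # working response
            A = np.vstack([R, w[:, None] * X])       # stack old R on chunk
            b = np.concatenate([qtz, w * z])
            Q, R = np.linalg.qr(A)            # compact re-factorization
            qtz = Q.T @ b
        beta = np.linalg.solve(R, qtz)        # solve R beta = Q' sqrt(W) z
    return beta

# Example usage, with in-memory slices standing in for an on-disk stream:
# chunks = lambda: ((X[i:i+500], y[i:i+500]) for i in range(0, len(y), 500))
# beta_hat = iwls_chunked(chunks, p=X.shape[1])
```

The key point is that each chunk only ever touches the $p \times p$ factor $R$ and a length-$p$ vector, so the working-set size is independent of $n$.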

State-of-the-art models can perform well in controlled environments, but they often struggle when presented with out-of-distribution (OOD) examples, making OOD detection a critical component of NLP systems. In this paper, we highlight the limitations of existing approaches to OOD detection in NLP. Specifically, we evaluate eight OOD detection methods that are easily integrable into existing NLP systems and require no additional OOD data or model modifications. One of our contributions is a well-structured research environment that allows for full reproducibility of the results. Our analysis further shows that existing OOD detection methods for NLP tasks are not yet sufficiently sensitive to capture all samples characterized by various types of distributional shifts. Particularly challenging testing scenarios arise in cases of background shift and of randomly shuffled word order within in-domain texts. This highlights the need for future work on more effective OOD detection approaches for NLP problems, and our work provides a well-defined foundation for further research in this area.
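As an example of the kind of plug-in detector this setting covers, a maximum softmax probability (MSP) score requires neither OOD data nor model modifications. The sketch below is illustrative; whether MSP is among the eight evaluated methods is our assumption.

```python
import numpy as np

def msp_ood_score(logits):
    """Maximum softmax probability (MSP) OOD score.

    A classic post-hoc detector that needs no OOD data and no model
    changes: a low maximum class probability suggests an
    out-of-distribution input. Higher returned score = more likely OOD.
    """
    z = logits - logits.max(axis=-1, keepdims=True)   # stabilized softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return 1.0 - p.max(axis=-1)
```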

I consider a class of statistical decision problems in which the policy maker must decide between two alternative policies to maximize social welfare based on a finite sample. The central assumption is that the underlying, possibly infinite-dimensional, parameter lies in a known convex set, potentially leading to partial identification of the welfare effect. An example of such a restriction is smoothness of counterfactual outcome functions. As the main theoretical result, I derive a finite-sample, exact minimax regret decision rule within the class of all decision rules under normal errors with known variance. When the error distribution is unknown, I obtain a feasible decision rule that is asymptotically minimax regret. I apply my results to the problem of whether to change a policy eligibility cutoff in a regression discontinuity setup, and illustrate them in an empirical application to a school construction program in Burkina Faso.
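In standard notation (the symbols here are illustrative, not necessarily the paper's), the regret of a decision rule $\delta$ at a parameter $f$ in the known convex set $\mathcal{F}$ is the welfare gap relative to an oracle that knows $f$, and the exact minimax regret rule minimizes the worst case of this gap:
$$R(\delta, f) = \max_{d \in \{0,1\}} W(d, f) - \mathbb{E}_{f}\big[W(\delta(Y), f)\big], \qquad \delta^{*} \in \arg\min_{\delta} \, \sup_{f \in \mathcal{F}} R(\delta, f),$$
where $W(d, f)$ denotes social welfare under policy $d \in \{0,1\}$ and $Y$ is the finite sample.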

We study the problem of Out-of-Distribution (OOD) detection, that is, detecting whether a learning algorithm's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem has been lacking. We propose a definition of the notion of OOD that includes both the input distribution and the learning algorithm, which provides insights for the construction of powerful tests for OOD detection. We also propose a procedure, inspired by multiple hypothesis testing, that systematically combines any number of different statistics from the learning algorithm using conformal p-values, and we provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings but not uniformly well across different types of OOD instances. In contrast, our proposed method, which combines multiple statistics, performs uniformly well across different datasets and neural networks.
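A minimal sketch of the conformal p-value construction, with one illustrative combining rule, is given below. Fisher's method is our assumption for concreteness (it presumes independent p-values, whereas conformal p-values built from a shared calibration set are dependent); the paper's combining procedure may differ.

```python
import numpy as np
from scipy import stats

def conformal_pvalue(cal_scores, test_score):
    """Conformal p-value: rank of the test score among calibration scores.

    Super-uniform under the in-distribution hypothesis whenever the test
    point is exchangeable with the calibration set.
    """
    cal_scores = np.asarray(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)

def combined_ood_pvalue(cal_score_lists, test_scores):
    """Combine conformal p-values from several statistics via Fisher's
    method -- one illustrative choice, assuming rough independence."""
    pvals = [conformal_pvalue(c, t)
             for c, t in zip(cal_score_lists, test_scores)]
    fisher_stat = -2.0 * np.sum(np.log(pvals))      # chi^2 with 2m dof
    return stats.chi2.sf(fisher_stat, df=2 * len(pvals))
```

Declaring OOD when the combined p-value falls below a level $\alpha$ then controls the probability of flagging an in-distribution sample, up to the dependence caveat noted above.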

We consider a set reconciliation setting in which two parties hold similar sets that they would like to reconcile. In particular, we focus on set reconciliation based on invertible Bloom lookup tables (IBLTs), a probabilistic data structure inspired by Bloom filters but allowing for more complex operations. IBLT-based set reconciliation schemes have the advantage of low computational complexity; however, the schemes available in the literature are known to be far from optimal in terms of communication complexity (overhead). The inefficiency of IBLT-based set reconciliation can be attributed to two facts. First, it requires an estimate of the cardinality of the set difference between the sets, which implies an increase in overhead. Second, in order to cope with errors in this cardinality estimate, IBLT schemes in the literature make a worst-case assumption and oversize the data structures, thus further increasing the overhead. In this work, we present a novel IBLT-based set reconciliation protocol that does not require estimating the cardinality of the set difference. The scheme we propose relies on what we term multi-edge-type (MET) IBLTs. Simulation results show that the novel scheme outperforms previous IBLT-based approaches to set reconciliation.
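To fix ideas, here is a toy IBLT in Python with subtract-and-peel recovery of the set difference. It is a plain single-edge-type IBLT with integer keys, not the MET construction the paper proposes, and the sizing and hash choices are illustrative.

```python
import hashlib

class IBLT:
    """Minimal invertible Bloom lookup table (toy, single edge type).

    Each key is hashed into k of m cells; subtracting two tables built
    with the same (m, k) and peeling cells with count +/-1 recovers the
    symmetric set difference with high probability if m is large enough.
    """
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.count = [0] * m
        self.key_sum = [0] * m
        self.hash_sum = [0] * m

    def _cells(self, key):
        return [int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16)
                % self.m for i in range(self.k)]

    def _check(self, key):          # checksum to certify "pure" cells
        return int(hashlib.sha256(f"chk:{key}".encode()).hexdigest(), 16)

    def insert(self, key, sign=1):
        for c in self._cells(key):
            self.count[c] += sign
            self.key_sum[c] ^= key
            self.hash_sum[c] ^= self._check(key)

    def subtract(self, other):      # cell-wise difference of two tables
        for c in range(self.m):
            self.count[c] -= other.count[c]
            self.key_sum[c] ^= other.key_sum[c]
            self.hash_sum[c] ^= other.hash_sum[c]

    def peel(self):
        """After subtract(): return (keys only in self, keys only in other)."""
        mine, theirs, progress = set(), set(), True
        while progress:
            progress = False
            for c in range(self.m):
                if (abs(self.count[c]) == 1
                        and self.hash_sum[c] == self._check(self.key_sum[c])):
                    key = self.key_sum[c]
                    (mine if self.count[c] == 1 else theirs).add(key)
                    self.insert(key, sign=-self.count[c])  # remove everywhere
                    progress = True
        return mine, theirs

a, b = IBLT(), IBLT()
for x in {1, 2, 3, 4}: a.insert(x)
for x in {3, 4, 5}:    b.insert(x)
a.subtract(b)
print(a.peel())   # ({1, 2}, {5}) with high probability
```

The sizing problem the paper addresses is visible here: $m$ must be chosen relative to the (unknown) size of the set difference, which is why schemes without a MET-style construction need a cardinality estimate and worst-case oversizing.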

In this paper, we investigate uplink signal detection approaches in cell-free massive MIMO systems with unmanned aerial vehicles (UAVs) serving as aerial access points (APs). The ground users are equipped with multiple antennas, and the ground-to-air propagation channels are subject to correlated Rician fading. To overcome the huge signaling overhead of fully centralized detection, we propose a two-layer distributed uplink detection scheme: the uplink signals are first detected at the AP-UAVs using minimum mean-squared error (MMSE) detectors that depend only on local channel state information (CSI), and the local estimates are then collected and combined with weights at the CPU-UAV to obtain the refined detection. Using operator-valued free probability theory, we obtain asymptotic expressions for the combining weights, which depend only on statistical CSI and show excellent accuracy. Based on the proposed distributed scheme, we further investigate the impact of different distributed deployments on the achieved spectral efficiency (SE). Numerical results show that in urban and dense urban environments, it is more beneficial to deploy more AP-UAVs to achieve higher SE. In suburban environments, on the other hand, there exists an optimal ratio between the number of deployed UAVs and the number of antennas per UAV that maximizes the SE.
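The two layers can be pictured with the standard MMSE estimator and a generic weighted combiner, as in the sketch below. This is a generic illustration only: the paper derives the combining weights via operator-valued free probability from statistical CSI, not from the ad hoc weights assumed here.

```python
import numpy as np

def local_mmse(H, y, sigma2):
    """Layer 1: local MMSE detection at one AP-UAV from its own CSI.

    Standard MMSE estimate x_hat = (H^H H + sigma^2 I)^{-1} H^H y for
    received signal y = H x + n with noise variance sigma^2.
    """
    G = H.conj().T @ H + sigma2 * np.eye(H.shape[1])
    return np.linalg.solve(G, H.conj().T @ y)

def cpu_combine(local_estimates, weights):
    """Layer 2: weighted combination of per-AP soft estimates at the
    CPU-UAV (weights here are placeholders for the paper's derivation)."""
    return sum(w * x for w, x in zip(weights, local_estimates))
```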

Programmable data planes offer precise control over the low-level processing steps applied to network packets, serving as a valuable tool for analysing malicious flows in the field of intrusion detection. Albeit limited in physical resources and capabilities, they allow for the efficient extraction of detailed traffic information, which can then be utilised by Machine Learning (ML) algorithms responsible for identifying security threats. To address resource constraints, existing solutions in the literature rely on compressing network data by collecting statistical traffic features in the data plane. While this compression saves memory resources in switches and minimises the burden on the control channel between the data and the control plane, it also results in a loss of information available to the Network Intrusion Detection System (NIDS), limiting access to packet payloads, categorical features, and the semantic understanding of network communications, such as the behaviour of packets within traffic flows. This paper proposes P4DDLe, a framework that exploits the flexibility of P4-based programmable data planes for packet-level feature extraction and pre-processing. P4DDLe leverages the programmable data plane to extract raw packet features from the network traffic, categorical features included, and to organise them in a way that preserves the semantics of traffic flows. To minimise memory and control channel overheads, P4DDLe selectively processes and filters packet-level data, so that all and only the features required by the NIDS are collected. An experimental evaluation with recent Distributed Denial of Service (DDoS) attack data demonstrates that the proposed approach is very efficient in collecting compact and high-quality representations of network flows, ensuring precise detection of DDoS attacks.
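Conceptually, flow-aware packet-level collection amounts to grouping raw per-packet features by 5-tuple while preserving arrival order and bounding per-flow memory. The Python sketch below only mirrors that data layout with hypothetical field names; the actual P4DDLe pipeline implements this in the P4 data plane, not in Python.

```python
from collections import defaultdict

def collect_flow_features(packets, max_pkts_per_flow=10):
    """Conceptual model of flow-aware packet-level feature collection.

    Groups raw per-packet features by 5-tuple and keeps them in arrival
    order, so a NIDS sees flow semantics rather than aggregate statistics.
    Field names ('src_ip', 'len', 'tcp_flags', ...) are hypothetical.
    """
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"],
               pkt["dst_port"], pkt["proto"])
        if len(flows[key]) < max_pkts_per_flow:  # bounded per-flow memory
            flows[key].append((pkt["len"], pkt["tcp_flags"], pkt["ts"]))
    return flows
```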

In Gaussian graphical models, the likelihood equations must typically be solved iteratively. We investigate two algorithms: a version of iterative proportional scaling that avoids inversion of large matrices, resulting in increased speed when graphs are sparse, and an algorithm based on convex duality that operates on the covariance matrix by neighbourhood coordinate descent, essentially corresponding to the graphical lasso with zero penalty. For large, sparse graphs, this version of the iterative proportional scaling algorithm appears feasible and has simple convergence properties. The algorithm based on neighbourhood coordinate descent is extremely fast and less dependent on sparsity, but it needs a positive definite starting value to converge, which may be difficult to achieve when the number of variables exceeds the number of observations.
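For reference, the textbook iterative-proportional-scaling step cycles over cliques and adjusts the concentration matrix so the model marginal on each clique matches the sample covariance. The naive sketch below inverts the full concentration matrix at every step, which is precisely the cost the paper's variant is designed to avoid; names and the stopping rule are illustrative.

```python
import numpy as np

def ips_ggm(S, cliques, p, n_iter=50):
    """Textbook IPS for a Gaussian graphical model (naive version).

    S: p x p sample covariance; cliques: list of index lists covering the
    graph's cliques. Each update sets K[c,c] += inv(S_cc) - inv(Sigma_cc),
    which makes the model marginal on clique c equal to S_cc.
    """
    K = np.eye(p)                       # initial concentration matrix
    for _ in range(n_iter):
        for idx in cliques:
            c = np.ix_(idx, idx)
            Sigma_cc = np.linalg.inv(K)[c]   # full inversion: the bottleneck
            K[c] += np.linalg.inv(S[c]) - np.linalg.inv(Sigma_cc)
    return K
```

The correctness of the step follows from the Schur complement identity $(\Sigma_{cc})^{-1} = K_{cc} - K_{c,\bar c} K_{\bar c,\bar c}^{-1} K_{\bar c,c}$: changing only $K_{cc}$ shifts $(\Sigma_{cc})^{-1}$ by exactly the same amount, so the update drives $\Sigma_{cc}$ to $S_{cc}$.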

Cell type deconvolution is a computational method that estimates the proportions of different cell types within bulk transcriptomics data by leveraging information from reference single-cell RNA sequencing data. Despite its origin as a simple linear regression model, this approach faces challenges due to technical and biological variability and biases between the bulk and single-cell datasets. While several new methods have been developed, most provide only point estimates of cell type proportions, neglecting the uncertainty inherent in these estimates. Consequently, false positives can arise when comparing changes in cell type proportions across multiple individuals. In this paper, we introduce MEAD, a comprehensive statistical framework for efficient cell type deconvolution. Our approach constructs asymptotically valid confidence intervals both for individual cell type proportions and for changes in cell type proportions across multiple individuals. Our analysis accounts for factors such as biological variability in gene expression, gene-gene dependence, cross-platform biases, and sequencing errors, without relying on parametric assumptions about the data distributions. Moreover, we establish necessary and sufficient conditions for the identifiability of cell type proportions in the presence of platform-specific biases across sequencing technologies.
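At its simplest, deconvolution is a constrained linear regression of the bulk expression profile on a cell-type signature matrix. The non-negative least squares sketch below gives only the basic point estimate that the abstract's "simple linear regression model" refers to; MEAD's contribution lies in the confidence intervals and bias corrections on top of such estimates. Names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def deconvolve(bulk, signature):
    """Baseline linear-model deconvolution: bulk ~ signature @ proportions.

    bulk: length-G vector of bulk expression; signature: G x K matrix of
    reference mean expression per cell type (from single-cell data).
    Non-negative least squares, then normalization onto the simplex.
    """
    coef, _ = nnls(signature, bulk)     # enforce non-negative proportions
    return coef / coef.sum()            # proportions sum to one
```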

Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision about. This problem first emerged in 2017 and has since received increasing attention from the research community, leading to a plethora of methods, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result, some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, the five problems can be seen as special cases or sub-tasks and are easier to distinguish. We then conduct a thorough review of each of the five areas by summarizing their recent technical developments, and we conclude the survey with open challenges and potential research directions.
