Given a heterogeneous Gaussian sequence model with unknown mean $\theta \in \mathbb R^d$ and known covariance matrix $\Sigma = \operatorname{diag}(\sigma_1^2,\dots, \sigma_d^2)$, we study the signal detection problem against sparse alternatives, for known sparsity $s$. Namely, we characterize how large $\epsilon^*>0$ should be in order to distinguish with high probability the null hypothesis $\theta=0$ from the alternative composed of $s$-sparse vectors in $\mathbb R^d$, separated from $0$ in $L^t$ norm ($t \in [1,\infty]$) by at least $\epsilon^*$. We find minimax upper and lower bounds on the separation radius $\epsilon^*$ and prove that they always match. We also derive the corresponding minimax tests achieving these bounds. Our results reveal new phase transitions regarding the behavior of $\epsilon^*$ with respect to the level of sparsity, to the $L^t$ metric, and to the heteroscedasticity profile of $\Sigma$. In the case of the Euclidean (i.e. $L^2$) separation, we bridge the remaining gaps in the literature.
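
One way to write the testing problem described above, in standard minimax-testing notation. The risk level $\eta \in (0,1)$, the set $\Theta(\epsilon)$ and the test $\psi$ are notation assumed here for illustration; the abstract does not fix them, and the paper's exact definition may differ.

$$
X \sim \mathcal N(\theta, \Sigma), \qquad
H_0 : \theta = 0 \quad \text{vs.} \quad
H_1 : \theta \in \Theta(\epsilon) = \{\theta \in \mathbb R^d : \|\theta\|_0 \le s,\ \|\theta\|_t \ge \epsilon\},
$$

$$
\epsilon^*(\eta) = \inf\Big\{\epsilon > 0 : \inf_{\psi}\Big[\mathbb P_0(\psi = 1) + \sup_{\theta \in \Theta(\epsilon)} \mathbb P_\theta(\psi = 0)\Big] \le \eta\Big\}.
$$

Here the inner infimum runs over all tests $\psi$ taking values in $\{0,1\}$, so $\epsilon^*(\eta)$ is the smallest separation at which the sum of type I and worst-case type II errors can be pushed below $\eta$.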

Related content

Minimax

Discretization · Nash equilibrium · Continuity · Approximation · Optimizer

May 9, 2023

Computing Bayes Nash Equilibrium Strategies in Auction Games via Simultaneous Online Dual Averaging

Martin Bichler,Maximilian Fichtl,Matthias Oberlechner

Auctions are modeled as Bayesian games with continuous type and action spaces. Determining equilibria in auction games is computationally hard in general and no exact solution theory is known. We introduce an algorithmic framework in which we discretize type and action space and then learn distributional strategies via online optimization algorithms. One advantage of distributional strategies is that we do not have to make any assumptions on the shape of the bid function. Besides, the expected utility of agents is linear in the strategies. It follows that if our optimization algorithms converge to a pure strategy, then they converge to an approximate equilibrium of the discretized game with high precision. Importantly, we show that the equilibrium of the discretized game approximates an equilibrium in the continuous game. In a wide variety of auction games, we provide empirical evidence that the approach approximates the analytical (pure) Bayes Nash equilibrium closely. This speed and precision are remarkable, because in many finite games learning dynamics do not converge or are even chaotic. In standard models where agents are symmetric, we find an equilibrium in seconds. While we focus on dual averaging, we show that the overall approach converges independently of the regularizer and that alternative online convex optimization methods achieve similar results, even though the discretized game satisfies neither monotonicity nor variational stability globally. The method allows for interdependent valuations and different types of utility functions and provides a foundation for broadly applicable equilibrium solvers that can push the boundaries of equilibrium analysis in auction markets and beyond.

Decoding · Estimation/Estimator · Sum-product · Channel · state-of-the-art

May 9, 2023

On the Limits of HARQ Prediction for Short Deterministic Codes with Error Detection in Memoryless Channels (Extended Version with Proofs)

Barış Göktepe,Cornelius Hellge,Tatiana Rykova,Thomas Schierl,Slawomir Stanczak

from arxiv, ISIT23

We provide a mathematical framework to analyze the limits of Hybrid Automatic Repeat reQuest (HARQ) and derive analytical expressions for the most powerful test for estimating the decodability under maximum-likelihood decoding and $t$-error decoding. Furthermore, we numerically approximate the most powerful test for sum-product decoding. We compare the performance of previously studied HARQ prediction schemes and show that none of the state-of-the-art HARQ prediction is most powerful to estimate the decodability of a partially received signal vector under maximum-likelihood decoding and sum-product decoding. Furthermore, we demonstrate that decoding in general is suboptimal for predicting the decodability.

Constraint · Functional · Approximation · Optimizer · Global optimization

May 8, 2023

Computation of Rate-Distortion-Perception Function under f-Divergence Perception Constraints

Giuseppe Serra,Photios A. Stavrou,Marios Kountouris

from arxiv, Accepted paper to ISIT 2023 without proofs

In this paper, we study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources subject to a single-letter average distortion constraint and a perception constraint that belongs to the family of f-divergences. For that, we leverage the fact that RDPF, assuming mild regularity conditions on the perception constraint, forms a convex programming problem. We first develop parametric characterizations of the optimal solution and utilize them in an alternating minimization approach for which we prove convergence guarantees. The resulting structure of the iterations of the alternating minimization approach renders the implementation of a generalized Blahut-Arimoto (BA) type of algorithm infeasible. To overcome this difficulty, we propose a relaxed formulation of the structure of the iterations in the alternating minimization approach, which allows for the implementation of an approximate iterative scheme. This approximation is shown, via the derivation of necessary and sufficient conditions, to guarantee convergence to a globally optimal solution. We also provide sufficient conditions on the distortion and the perception constraints which guarantee that our algorithm converges exponentially fast. We corroborate our theoretical results with numerical simulations, and we draw connections with existing results.
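
For orientation, here is a minimal sketch of the classical Blahut-Arimoto alternating minimization for the plain rate-distortion function, with no perception constraint. It is not the generalized or relaxed scheme proposed in the paper. The function name `blahut_arimoto`, the slope parameter `beta`, and the binary example are assumptions chosen for illustration.

```python
import math


def blahut_arimoto(p_x, dist, beta, iters=200):
    """Classical Blahut-Arimoto iteration at a fixed slope parameter beta > 0.

    p_x  : source distribution over symbols x
    dist : dist[x][y], distortion of reproducing x as y
    Returns (rate in nats, expected distortion) for one point on the R(D) curve.
    """
    n_x, n_y = len(p_x), len(dist[0])
    q_y = [1.0 / n_y] * n_y  # start from a uniform output marginal
    cond = []
    for _ in range(iters):
        # Step 1: optimal test channel Q(y|x) for the current marginal q_y.
        cond = []
        for x in range(n_x):
            w = [q_y[y] * math.exp(-beta * dist[x][y]) for y in range(n_y)]
            z = sum(w)
            cond.append([v / z for v in w])
        # Step 2: optimal output marginal for the current test channel.
        q_y = [sum(p_x[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]

    rate, distortion = 0.0, 0.0
    for x in range(n_x):
        for y in range(n_y):
            joint = p_x[x] * cond[x][y]
            if joint > 0.0:
                rate += joint * math.log(cond[x][y] / q_y[y])
                distortion += joint * dist[x][y]
    return rate, distortion


# Uniform binary source with Hamming distortion (illustrative input only).
rate, distortion = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], beta=2.0)
```

Each value of `beta` traces one point of the curve; sweeping it yields the whole trade-off. For the uniform binary source with Hamming distortion, the result can be checked against the closed form $R(D) = \ln 2 - h(D)$ in nats, where $h$ is the binary entropy.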

Linear · CASE · Inversion · Performer · Integration

May 8, 2023

Diagonalization-based preconditioners and generalized convergence bounds for ParaOpt

Arne Bouillon,Giovanni Samaey,Karl Meerbergen

The ParaOpt algorithm was recently introduced as a time-parallel solver for optimal-control problems with a terminal-cost objective, and convergence results have been presented for the linear diffusive case with implicit-Euler time integrators. We reformulate ParaOpt for tracking problems and provide generalized convergence analyses for both objectives. We focus on linear diffusive equations and prove convergence bounds that are generic in the time integrators used. For large problem dimensions, ParaOpt's performance depends crucially on having a good preconditioner to solve the arising linear systems. For the case where ParaOpt's cheap, coarse-grained propagator is linear, we introduce diagonalization-based preconditioners inspired by recent advances in the ParaDiag family of methods. These preconditioners not only lead to a weakly-scalable ParaOpt version, but are themselves invertible in parallel, making maximal use of available concurrency. They have proven convergence properties in the linear diffusive case that are generic in the time discretization used, similarly to our ParaOpt results. Numerical results confirm that the iteration count of the iterative solvers used for ParaOpt's linear systems becomes constant in the limit of an increasing processor count. The paper is accompanied by a sequential MATLAB implementation.

INFORMS · Information Systems · Functional · MoDELS · Regularization term

May 7, 2023

Rate-Distortion Theory for Mixed States

Zahra Baghali Khanian,Kohdai Kuroiwa,Debbie Leung

from arxiv, Substantial revisions with additional results

In this paper we consider the compression of asymptotically many i.i.d. copies of ensembles of mixed quantum states where the encoder has access to a side information system. The figure of merit is per-copy or local error criterion. Rate-distortion theory studies the trade-off between the compression rate and the per-copy error. The optimal trade-off can be characterized by the rate-distortion function, which is the best rate given a certain distortion. In this paper, we derive the rate-distortion function of mixed-state compression. The rate-distortion functions in the entanglement-assisted and unassisted scenarios are in terms of a single-letter mutual information quantity and the regularized entanglement of purification, respectively. For the general setting where the consumption of both communication and entanglement are considered, we present the full qubit-entanglement rate region. Our compression scheme covers both blind and visible compression models (and other models in between) depending on the structure of the side information system.

Estimation/Estimator · Minimax · SIR · Performer · Subspace

May 7, 2023

Sliced Inverse Regression with Large Structural Dimensions

Dongming Huang,Songtao Tian,Qian Lin

from arxiv, 63 pages,44 figures

The central space of a joint distribution $(\boldsymbol{X},Y)$ is the minimal subspace $\mathcal S$ such that $Y \perp\!\!\!\perp \boldsymbol{X} \mid P_{\mathcal S}\boldsymbol{X}$, where $P_{\mathcal S}$ is the projection onto $\mathcal S$. Sliced inverse regression (SIR), one of the most popular methods for estimating the central space, often performs poorly when the structural dimension $d=\operatorname{dim}\left( \mathcal S \right)$ is large (e.g., $\geq 5$). In this paper, we demonstrate that the generalized signal-noise-ratio (gSNR) tends to be extremely small for a general multiple-index model when $d$ is large. Then we determine the minimax rate for estimating the central space over a large class of high dimensional distributions with a large structural dimension $d$ (i.e., there is no constant upper bound on $d$) in the low gSNR regime. This result not only extends the existing minimax rate results for estimating the central space of distributions with fixed $d$ to that with a large $d$, but also clarifies that the degradation in SIR performance is caused by the decay of signal strength. The technical tools developed here might be of independent interest for studying other central space estimation methods.

Language modeling · Code · Continuity · Vectorization · MoDELS

May 5, 2023

Large Language Models for Code: Security Hardening and Adversarial Testing

Jingxuan He,Martin Vechev

Large language models (LMs) are increasingly pretrained on massive codebases and used to generate code. However, LMs lack awareness of security and are found to frequently produce unsafe code. This work studies the security of LMs along two important axes: (i) security hardening, which aims to enhance LMs' reliability in generating secure code, and (ii) adversarial testing, which seeks to evaluate LMs' security from an adversarial standpoint. We address both of these by formulating a new security task called controlled code generation. The task is parametric and takes as input a binary property to guide the LM to generate secure or unsafe code, while preserving the LM's capability of generating functionally correct code. We propose a novel learning-based approach called SVEN to solve this task. SVEN leverages property-specific continuous vectors to guide program generation towards the given property, without modifying the LM's weights. Our training procedure optimizes these continuous vectors by enforcing specialized loss terms on different regions of code, using a high-quality dataset carefully curated by us. Our extensive evaluation shows that SVEN is highly effective in achieving strong security control. For instance, a state-of-the-art CodeGen LM with 2.7B parameters generates secure code 59.1% of the time. When we employ SVEN to perform security hardening (or adversarial testing) on this LM, the ratio is significantly boosted to 92.3% (or degraded to 36.8%). Importantly, SVEN closely matches the original LMs in functional correctness.

Regularization term · Smoothing · Learning · CASES · Kernelization

May 5, 2023

Random Smoothing Regularization in Kernel Gradient Descent Learning

Liang Ding,Tianyang Hu,Jiahang Jiang,Donghao Li,Wenjia Wang,Yuan Yao

Random smoothing data augmentation is a unique form of regularization that can prevent overfitting by introducing noise to the input data, encouraging the model to learn more generalized features. Despite its success in various applications, there has been a lack of systematic study on the regularization ability of random smoothing. In this paper, we aim to bridge this gap by presenting a framework for random smoothing regularization that can adaptively and effectively learn a wide range of ground truth functions belonging to the classical Sobolev spaces. Specifically, we investigate two underlying function spaces: the Sobolev space of low intrinsic dimension, which includes the Sobolev space in $D$-dimensional Euclidean space or low-dimensional sub-manifolds as special cases, and the mixed smooth Sobolev space with a tensor structure. By using random smoothing regularization as novel convolution-based smoothing kernels, we can attain optimal convergence rates in these cases using a kernel gradient descent algorithm, either with early stopping or weight decay. It is noteworthy that our estimator can adapt to the structural assumptions of the underlying data and avoid the curse of dimensionality. This is achieved through various choices of injected noise distributions such as Gaussian, Laplace, or general polynomial noises, allowing for broad adaptation to the aforementioned structural assumptions of the underlying data. The convergence rate depends only on the effective dimension, which may be significantly smaller than the actual data dimension. We conduct numerical experiments on simulated data to validate our theoretical results.

Linear · Optimizer · Threshold · Sparse · Performer

May 5, 2023

Heavy-ball-based optimal thresholding algorithms for sparse linear inverse problems

Zhong-Feng Sun,Jin-Chuan Zhou,Yun-Bin Zhao

Linear inverse problems arise in diverse engineering fields, especially in signal and image reconstruction. The development of computational methods that exploit sparsity to solve linear inverse problems is one of the recent trends in this area. The so-called optimal $k$-thresholding is a newly introduced method for sparse optimization and linear inverse problems. Compared to other sparsity-aware algorithms, the advantage of the optimal $k$-thresholding method is that it performs thresholding and error metric reduction simultaneously and thus works stably and robustly for solving medium-sized linear inverse problems. However, the runtime of this method remains high when the problem size is relatively large. The purpose of this paper is to propose an acceleration strategy for this method. Specifically, we propose a heavy-ball-based optimal $k$-thresholding (HBOT) algorithm and its relaxed variants for sparse linear inverse problems. The convergence of these algorithms is shown under the restricted isometry property. In addition, the numerical performance of the heavy-ball-based relaxed optimal $k$-thresholding pursuit (HBROTP) has been studied, and simulations indicate that HBROTP is robust for signal and image reconstruction even in noisy environments.
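
To illustrate the general idea of combining heavy-ball momentum with $k$-sparse thresholding, here is a minimal sketch. It uses plain hard thresholding (keep the $k$ largest entries), not the optimal $k$-thresholding step of HBOT or HBROTP, which selects the support by solving an optimization problem. The names `heavy_ball_iht`, `step`, and `momentum`, and the toy matrix, are assumptions made for illustration.

```python
def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def heavy_ball_iht(A, y, k, step=1.0, momentum=0.3, iters=100):
    """Iterative hard thresholding with a heavy-ball momentum term.

    Update: u = x + step * A^T (y - A x) + momentum * (x - x_prev),
            x_next = hard_threshold(u, k).
    """
    m, n = len(A), len(A[0])
    x_prev = [0.0] * n
    x = [0.0] * n
    for _ in range(iters):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad_step = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        u = [x[j] + step * grad_step[j] + momentum * (x[j] - x_prev[j])
             for j in range(n)]
        x_prev, x = x, hard_threshold(u, k)
    return x


# Toy problem: an orthogonal 4x4 matrix (scaled Hadamard) and a 2-sparse signal.
H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
A = [[0.5 * h for h in row] for row in H]
x_true = [0.0, 3.0, 0.0, -2.0]
y = [sum(A[i][j] * x_true[j] for j in range(4)) for i in range(4)]
x_hat = heavy_ball_iht(A, y, k=2)
```

The toy matrix is orthogonal, so recovery is easy here. For general measurement matrices, convergence depends on conditions such as the restricted isometry property mentioned in the abstract, and on the choice of `step` and `momentum`.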

Language modeling · MoDELS · Vocabulary · Optimizer · state-of-the-art

September 25, 2019

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Sanqiang Zhao,Raghav Gupta,Yang Song,Denny Zhou

Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model's memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.