It is well known that bridge regression enjoys superior theoretical properties than traditional LASSO. However, the current latent variable representation of its Bayesian counterpart, based on the exponential power prior, is computationally expensive in higher dimensions. In this paper, we show that the exponential power prior has a closed-form scale mixture of normal decomposition for $\alpha=(\frac{1}{2})^\gamma, \gamma \in \mathbb{N}^+$. We develop a partially collapsed Gibbs sampling scheme, which outperforms existing Markov chain Monte Carlo strategies, we also study theoretical properties under this prior when $p>n$. In addition, we introduce a non-separable bridge penalty function inspired by the fully Bayesian formulation and a novel, efficient, coordinate-descent algorithm. We prove the algorithm's convergence and show that the local minimizer from our optimization algorithm has an oracle property. Finally, simulation studies were carried out to illustrate the performance of the new algorithms.
This paper studies the non-asymptotic merits of the double $\ell_1$-regularized for heterogeneous overdispersed count data via negative binomial regressions. Under the restricted eigenvalue conditions, we prove the oracle inequalities for Lasso estimators of two partial regression coefficients for the first time, using concentration inequalities of empirical processes. Furthermore, derived from the oracle inequalities, the consistency and convergence rate for the estimators are the theoretical guarantees for further statistical inference. Finally, both simulations and a real data analysis demonstrate that the new methods are effective.
Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting process. The activation function is assumed to be $2$-homogeneous or partially $1$-homogeneous; the regularized ReLU satisfies the latter condition. We show that if the ResNet is sufficiently large, with depth and width depending algebraically on the accuracy and confidence levels, first-order optimization methods can find global minimizers that fit the training data.
The efficient estimation of an approximate model order is very important for real applications with multi-dimensional data if the observed low-rank data is corrupted by additive noise. In this paper, we present a novel robust method for model order estimation of noise-corrupted multi-dimensional low-rank data based on the LineAr Regression of Global Eigenvalues (LaRGE). The LaRGE method uses the multi-linear singular values obtained from the HOSVD of the measurement tensor to construct global eigenvalues. In contrast to the Modified Exponential Test (EFT) that also exploits the approximate exponential profile of the noise eigenvalues, LaRGE does not require the calculation of the probability of false alarm. Moreover, LaRGE achieves a significantly improved performance in comparison with popular state-of-the-art methods. It is well suited for the analysis of biomedical data. The excellent performance of the LaRGE method is illustrated via simulations and results obtained from EEG recordings.
This paper considers inference in a linear regression model with random right censoring and outliers. The number of outliers can grow with the sample size while their proportion goes to zero. The model is semiparametric and we make only very mild assumptions on the distribution of the error term, contrary to most other existing approaches in the literature. We propose to penalize the estimator proposed by Stute for censored linear regression by the l1-norm. We derive rates of convergence and establish asymptotic normality of the estimator of the regression coefficients. Our estimator has the same asymptotic variance as Stute's estimator in the censored linear model without outliers. Hence, there is no loss of efficiency as a result of robustness. Tests and confidence sets can therefore rely on the theory developed by Stute. The outlined procedure is also computationally advantageous, since it amounts to solving a convex optimization program. We also propose a second estimator which uses the proposed penalized Stute estimator as a first step to detect outliers. It has similar theoretical properties but better performance in finite samples as assessed by simulations.
This letter introduces a novel framework to optimize the power allocation for users in a Rate Splitting Multiple Access (RSMA) network. In the network, messages intended for users are split into different parts that are a single common part and respective private parts. This mechanism enables RSMA to flexibly manage interference and thus enhance energy and spectral efficiency. Although possessing outstanding advantages, optimizing power allocation in RSMA is very challenging under the uncertainty of the communication channel and the transmitter has limited knowledge of the channel information. To solve the problem, we first develop a Markov Decision Process framework to model the dynamic of the communication channel. The deep reinforcement algorithm is then proposed to find the optimal power allocation policy for the transmitter without requiring any prior information of the channel. The simulation results show that the proposed scheme can outperform baseline schemes in terms of average sum-rate under different power and QoS requirements.
Lifting theorems are theorems that relate the query complexity of a function $f:\{0,1\}^{n}\to\{0,1\}$ to the communication complexity of the composed function $f \circ g^{n}$, for some "gadget" $g:\{0,1\}^{b}\times\{0,1\}^{b}\to\{0,1\}$. Such theorems allow transferring lower bounds from query complexity to the communication complexity, and have seen numerous applications in the recent years. In addition, such theorems can be viewed as a strong generalization of a direct-sum theorem for the gadget $g$. We prove a new lifting theorem that works for all gadgets $g$ that have logarithmic length and exponentially-small discrepancy, for both deterministic and randomized communication complexity. Thus, we significantly increase the range of gadgets for which such lifting theorems hold. Our result has two main motivations: First, allowing a larger variety of gadgets may support more applications. In particular, our work is the first to prove a randomized lifting theorem for logarithmic-size gadgets, thus improving some applications of the theorem. Second, our result can be seen as a strong generalization of a direct-sum theorem for functions with low discrepancy.
In this paper, we derive improved a priori error estimates for families of hybridizable interior penalty discontinuous Galerkin (H-IP) methods using a variable penalty for second-order elliptic problems. The strategy is to use a penalization function of the form $\mathcal{O}(1/h^{1+\delta})$, where $h$ denotes the mesh size and $\delta$ is a user-dependent parameter. We then quantify its direct impact on the convergence analysis, namely, the (strong) consistency, discrete coercivity, and boundedness (with $h^{\delta}$-dependency), and we derive updated error estimates for both discrete energy- and $L^{2}$-norms. The originality of the error analysis relies specifically on the use of conforming interpolants of the exact solution. All theoretical results are supported by numerical evidence.
We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables. Our approach uses gradients of the likelihood function with respect to its discrete inputs to propose updates in a Metropolis-Hastings sampler. We show empirically that this approach outperforms generic samplers in a number of difficult settings including Ising models, Potts models, restricted Boltzmann machines, and factorial hidden Markov models. We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data. This approach outperforms variational auto-encoders and existing energy-based models. Finally, we give bounds showing that our approach is near-optimal in the class of samplers which propose local updates.
We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with a risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized exactly, or annealed according to a schedule. We call the resulting algorithm K-learning and show that the corresponding K-values are optimistic for the expected Q-values at each state-action pair. The K-values induce a natural Boltzmann exploration policy for which the `temperature' parameter is equal to the risk-seeking parameter. This policy achieves an expected regret bound of $\tilde O(L^{3/2} \sqrt{S A T})$, where $L$ is the time horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the total number of elapsed time-steps. This bound is only a factor of $L$ larger than the established lower bound. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient, and is closely related to optimism and count based exploration methods. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates the number of topics K from the observed data. We derive new finite sample minimax lower bounds for the estimation of A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, although we start out with a computational and theoretical disadvantage of not knowing the correct number of topics K, while we provide the competing methods with the correct value in our simulations.