This paper is concerned with a partially linear semiparametric regression model containing an unknown regression coefficient, an unknown nonparametric function, and an unobservable Gaussian distributed random error. We focus on the case of simultaneous variable selection and estimation with a divergent number of covariates under the assumption that the regression coefficient is sparse. We consider the applications of the least squares to semiparametric regression and particularly present an adaptive lasso penalized least squares (PLS) method to select the regression coefficient. We note that there are many algorithms for PLS in various applications, but they seem to be rarely used in semiparametric regression. This paper focuses on using a semismooth Newton augmented Lagrangian (SSNAL) algorithm to solve the dual of PLS which is the sum of a smooth strongly convex function and an indicator function. At each iteration, there must be a strongly semismooth nonlinear system, which can be solved by semismooth Newton by making full use of the penalized term. We show that the algorithm offers a significant computational advantage, and the semismooth Newton method admits fast local convergence rate. Numerical experiments on some simulation data and real data to demonstrate that the PLS is effective and the SSNAL is progressive.
We study the stochastic $p$-Laplace system in a bounded domain. We propose two new space-time discretizations based on the approximation of time-averaged values. We establish linear convergence in space and $1/2$ convergence in time. Additionally, we provide a sampling algorithm to construct the necessary random input in an efficient way. The theoretical error analysis is complemented by numerical experiments.
This paper focuses on stochastic saddle point problems with decision-dependent distributions. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, the problem of finding saddle points is in general computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two solution types is bounded provided that the objective has a strongly-convex-strongly-concave payoff and a Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration. Moreover, we show convergence to a neighborhood almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures that the objective is strongly-convex-strongly-concave. We tailor the convergence results for the primal-dual algorithms to this opposing mixture dominance setup.
We study the decentralized consensus and stochastic optimization problems with compressed communications over static directed graphs. We propose an iterative gradient-based algorithm that compresses messages according to a desired compression ratio. The proposed method provably reduces the communication overhead on the network at every communication round. Contrary to existing literature, we allow for arbitrary compression ratios in the communicated messages. We show a linear convergence rate for the proposed method on the consensus problem. Moreover, we provide explicit convergence rates for decentralized stochastic optimization problems on smooth functions that are either (i) strongly convex, (ii) convex, or (iii) non-convex. Finally, we provide numerical experiments to illustrate convergence under arbitrary compression ratios and the communication efficiency of our algorithm.
In this article we suggest two discretization methods based on isogeometric analysis (IGA) for planar linear elasticity. On the one hand, we apply the well-known ansatz of weakly imposed symmetry for the stress tensor and obtain a well-posed mixed formulation. Such modified mixed problems have been already studied by different authors. But we concentrate on the exploitation of IGA results to handle also curved boundary geometries. On the other hand, we consider the more complicated situation of strong symmetry, i.e. we discretize the mixed weak form determined by the so-called Hellinger-Reissner variational principle. We show the existence of suitable approximate fields leading to an inf-sup stable saddle-point problem. For both discretization approaches we prove convergence statements and in case of weak symmetry we illustrate the approximation behavior by means of several numerical experiments.
The instrumental variable method is widely used in the health and social sciences for identification and estimation of causal effects in the presence of potentially unmeasured confounding. In order to improve efficiency, multiple instruments are routinely used, leading to concerns about bias due to possible violation of the instrumental variable assumptions. To address this concern, we introduce a new class of g-estimators that are guaranteed to remain consistent and asymptotically normal for the causal effect of interest provided that a set of at least $\gamma$ out of $K$ candidate instruments are valid, for $\gamma\leq K$ set by the analyst ex ante, without necessarily knowing the identities of the valid and invalid instruments. We provide formal semiparametric efficiency theory supporting our results. Both simulation studies and applications to the UK Biobank data demonstrate the superior empirical performance of our estimators compared to competing methods.
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. In the case of linear value function approximation, we provide a tight finite-sample upper bound on policy gradient estimation error, that is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramer-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.
Vector Perturbation Precoding (VPP) can speed up downlink data transmissions in Large and Massive Multi-User MIMO systems but is known to be NP-hard. While there are several algorithms in the literature for VPP under total power constraint, they are not applicable for VPP under per-antenna power constraint. This paper proposes a novel, parallel tree search algorithm for VPP under per-antenna power constraint, called \emph{\textbf{TreeStep}}, to find good quality solutions to the VPP problem with practical computational complexity. We show that our method can provide huge performance gain over simple linear precoding like Regularised Zero Forcing. We evaluate TreeStep for several large MIMO~($16\times16$ and $24\times24$) and massive MIMO~($16\times32$ and $24\times 48$) and demonstrate that TreeStep outperforms the popular polynomial-time VPP algorithm, the Fixed Complexity Sphere Encoder, by achieving the extremely low BER of $10^{-6}$ at a much lower SNR.
The single shortest path algorithm is undefined for weighted finite-state automata over non-idempotent semirings because such semirings do not guarantee the existence of a shortest path. However, in non-idempotent semirings admitting an order satisfying a monotonicity condition (such as the plus-times or log semirings), the notion of shortest string is well-defined. We describe an algorithm which finds the shortest string for a weighted non-deterministic automaton over such semirings using the backwards shortest distance of an equivalent deterministic automaton (DFA) as a heuristic for A* search performed over a companion idempotent semiring, which is proven to return the shortest string. While there may be exponentially more states in the DFA, this algorithm needs to visit only a small fraction of them if determinization is performed "on the fly".
CP decomposition (CPD) is prevalent in chemometrics, signal processing, data mining and many more fields. While many algorithms have been proposed to compute the CPD, alternating least squares (ALS) remains one of the most widely used algorithm for computing the decomposition. Recent works have introduced the notion of eigenvalues and singular values of a tensor and explored applications of eigenvectors and singular vectors in areas like signal processing, data analytics and in various other fields. We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from what is used in the previous work. Computing these critical points in an alternating manner motivates an alternating optimization algorithm which corresponds to alternating least squares algorithm in the matrix case. However, for tensors with order greater than equal to $3$, it minimizes an objective function which is different from the commonly used least squares loss. Alternating optimization of this new objective leads to simple updates to the factor matrices with the same asymptotic computational cost as ALS. We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank and verify it experimentally. We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor with ground metric dependent on the other factors. This perspective allows us to generalize our approach to interpolate between updates corresponding to the ALS and the new algorithm to manage the tradeoff between stability and fitness of the decomposition. Our experimental results show that for approximating synthetic and real-world tensors, this algorithm and its variants converge to a better conditioned decomposition with comparable and sometimes better fitness as compared to the ALS algorithm.
This paper takes a different approach for the distributed linear parameter estimation over a multi-agent network. The parameter vector is considered to be stochastic with a Gaussian distribution. The sensor measurements at each agent are linear and corrupted with additive white Gaussian noise. Under such settings, this paper presents a novel distributed estimation algorithm that fuses the the concepts of consensus and innovations by incorporating the consensus terms (of neighboring estimates) into the innovation terms. Under the assumption of distributed parameter observability, introduced in this paper, we design the optimal gain matrices such that the distributed estimates are consistent and achieves fast convergence.