We develop novel theory and algorithms for computing approximate solution to $Ax=b$, or to $A^TAx=A^Tb$, where $A$ is an $m \times n$ real matrix of arbitrary rank. First, we describe the {\it Triangle Algorithm} (TA), where given an ellipsoid $E_{A,\rho}=\{Ax: \Vert x \Vert \leq \rho\}$, in each iteration it either computes successively improving approximation $b_k=Ax_k \in E_{A,\rho}$, or proves $b \not \in E_{A, \rho}$. We then extend TA for computing an approximate solution or minimum-norm solution. Next, we develop a dynamic version of TA, the {\it Centering Triangle Algorithm} (CTA), generating residuals $r_k=b - Ax_k$ via iterations of the simple formula, $F_1(r)=r-(r^THr/r^TH^2r)Hr$, where $H=A$ when $A$ is symmetric PSD, otherwise $H=AA^T$ but need not be computed explicitly. More generally, CTA extends to a family of iteration function, $F_t( r)$, $t=1, \dots, m$ satisfying: On the one hand, given $t \leq m$ and $r_0=b-Ax_0$, where $x_0=A^Tw_0$ with $w_0 \in \mathbb{R}^m$ arbitrary, for all $k \geq 1$, $r_k=F_t(r_{k-1})=b-Ax_k$ and $A^Tr_k$ converges to zero. Algorithmically, if $H$ is invertible with condition number $\kappa$, in $k=O( (\kappa/t) \ln \varepsilon^{-1})$ iterations $\Vert r_k \Vert \leq \varepsilon$. If $H$ is singular with $\kappa^+$ the ratio of its largest to smallest positive eigenvalues, in $k =O(\kappa^+/t\varepsilon)$ iterations either $\Vert r_k \Vert \leq \varepsilon$ or $\Vert A^T r_k\Vert= O(\sqrt{\varepsilon})$. If $N$ is the number of nonzero entries of $A$, each iteration take $O(Nt+t^3)$ operations. On the other hand, given $r_0=b-Ax_0$, suppose its minimal polynomial with respect to $H$ has degree $s$. Then $Ax=b$ is solvable if and only if $F_{s}(r_0)=0$. Moreover, exclusively $A^TAx=A^Tb$ is solvable, if and only if $F_{s}(r_0) \not= 0$ but $A^T F_s(r_0)=0$. Additionally, $\{F_t(r_0)\}_{t=1}^s$ is computable in $O(Ns+s^3)$ operations.
Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing work focuses on variants of gradient descent-ascent (GDA) algorithms, which are only applicable in smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-\L{}ojasiewicz (K\L{}) property with exponent $\theta \in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound properties. Using this framework, we show that smoothed PLDA can find both $\epsilon$-game-stationary points and $\epsilon$-optimization-stationary points of the problems of interest in $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when $\theta \in [0,1/2]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structure problem possesses the K\L{} property with exponent $\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.
In the paper, we investigate the coordination process of sensing and computation offloading in a reconfigurable intelligent surface (RIS)-aided base station (BS)-centric symbiotic radio (SR) systems. Specifically, the Internet-of-Things (IoT) devices first sense data from environment and then tackle the data locally or offload the data to BS for remote computing, while RISs are leveraged to enhance the quality of blocked channels and also act as IoT devices to transmit its sensed data. To explore the mechanism of cooperative sensing and computation offloading in this system, we aim at maximizing the total completed sensed bits of all users and RISs by jointly optimizing the time allocation parameter, the passive beamforming at each RIS, the transmit beamforming at BS, and the energy partition parameters for all users subject to the size of sensed data, energy supply and given time cycle. The formulated nonconvex problem is tightly coupled by the time allocation parameter and involves the mathematical expectations, which cannot be solved straightly. We use Monte Carlo and fractional programming methods to transform the nonconvex objective function and then propose an alternating optimization-based algorithm to find an approximate solution with guaranteed convergence. Numerical results show that the RIS-aided SR system outperforms other benchmarks in sensing. Furthermore, with the aid of RIS, the channel and system performance can be significantly improved.
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.
Given a closed simple polygon $P$, we say two points $p,q$ see each other if the segment $pq$ is fully contained in $P$. The art gallery problem seeks a minimum size set $G\subset P$ of guards that sees $P$ completely. The only currently correct algorithm to solve the art gallery problem exactly uses algebraic methods and is attributed to Sharir. As the art gallery problem is ER-complete, it seems unlikely to avoid algebraic methods, without additional assumptions. In this paper, we introduce the notion of vision stability. In order to describe vision stability consider an enhanced guard that can see "around the corner" by an angle of $\delta$ or a diminished guard whose vision is by an angle of $\delta$ "blocked" by reflex vertices. A polygon $P$ has vision stability $\delta$ if the optimal number of enhanced guards to guard $P$ is the same as the optimal number of diminished guards to guard $P$. We will argue that most relevant polygons are vision stable. We describe a one-shot vision stable algorithm that computes an optimal guard set for visionstable polygons using polynomial time and solving one integer program. It guarantees to find the optimal solution for every vision stable polygon. We implemented an iterative visionstable algorithm and show its practical performance is slower, but comparable with other state of the art algorithms. Our iterative algorithm is inspired and follows closely the one-shot algorithm. It delays several steps and only computes them when deemed necessary. Given a chord $c$ of a polygon, we denote by $n(c)$ the number of vertices visible from $c$. The chord-width of a polygon is the maximum $n(c)$ over all possible chords $c$. The set of vision stable polygons admits an FPT algorithm when parametrized by the chord-width. Furthermore, the one-shot algorithm runs in FPT time, when parameterized by the number of reflex vertices.
We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. A %JQ asks for the answer at a specified relative position (e.g., 50% for the median) under some ordering over the answers to a Join Query (JQ). Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. A recent dichotomy result rules out the existence of such an algorithm for a general family of queries and orders. Specifically, for acyclic JQs without self-joins, the problem becomes intractable for ordering by sum whenever we join more than two relations (and these joins are not trivial intersections). Moreover, even for basic ranking functions beyond sum, such as min or max over different attributes, so far it is not known whether there is any nontrivial tractable %JQ. In this work, we develop a new approach to solving %JQ. Our solution uses two subroutines: The first one needs to select what we call a "pivot answer". The second subroutine partitions the space of query answers according to this pivot, and continues searching in one partition that is represented as new %JQ over a new database. For pivot selection, we develop an algorithm that works for a large class of ranking functions that are appropriately monotone. The second subroutine requires a customized construction for the specific ranking function at hand. We show the benefit and generality of our approach by using it to establish several new complexity results. First, we prove the tractability of min and max for all acyclic JQs, thereby resolving the above question. Second, we extend the previous %JQ dichotomy for sum to all partial sums. Third, we handle the intractable cases of sum by devising a deterministic approximation scheme that applies to every acyclic JQ.
The paper revisits the robust $s$-$t$ path problem, one of the most fundamental problems in robust optimization. In the problem, we are given a directed graph with $n$ vertices and $k$ distinct cost functions (scenarios) defined over edges, and aim to choose an $s$-$t$ path such that the total cost of the path is always provable no matter which scenario is realized. With the view of each cost function being associated with an agent, our goal is to find a common $s$-$t$ path minimizing the maximum objective among all agents, and thus create a fair solution for them. The problem is hard to approximate within $o(\log k)$ by any quasi-polynomial time algorithm unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\mathrm{poly}\log n})$, and the best approximation ratio known to date is $\widetilde{O}(\sqrt{n})$ which is based on the natural flow linear program. A longstanding open question is whether we can achieve a polylogarithmic approximation even when a quasi-polynomial running time is allowed. We give the first polylogarithmic approximation for robust $s$-$t$ path since the problem was proposed more than two decades ago. In particular, we introduce a $O(\log n \log k)$-approximate algorithm running in quasi-polynomial time. The algorithm is built on a novel linear program formulation for a decision-tree-type structure which enables us to get rid of the $\Omega(\max\{k,\sqrt{n}\})$ integrality gap of the natural flow LP. Further, we also consider some well-known graph classes, e.g., graphs with bounded treewidth, and show that the polylogarithmic approximation can be achieved polynomially on these graphs. We hope the new proposed techniques in the paper can offer new insights into the robust $s$-$t$ path problem and related problems in robust optimization.
Since the control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, the estimation of this value is nowadays a real scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power iteration. Called the Gram iteration, our approach exhibits a superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
We study the inverse eigenvalue problem for finding doubly stochastic matrices with specified eigenvalues. By making use of a combination of Dykstra's algorithm and an alternating projection process onto a non-convex set, we derive hybrid algorithms for finding doubly stochastic matrices and symmetric doubly stochastic matrices with prescribed eigenvalues. Furthermore, we prove that the proposed algorithms converge and linear convergence is also proved. Numerical examples are presented to demonstrate the efficiency of our method.
Recurrent neural networks are a powerful means to cope with time series. We show how autoregressive linear, i.e., linearly activated recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, and this is probably the main contribution of this article, the size of an LRNN can be reduced significantly in one step after inspecting the spectrum of the network transition matrix, i.e., its eigenvalues, by taking only the most relevant components. Therefore, in contrast to other approaches, we do not only learn network weights but also the network architecture. LRNNs have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.
The $L_{2}$-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layers has multiple local minima, corresponding to matrices with different ranks. In tasks such as matrix completion, the goal is to converge to the local minimum with the smallest rank that still fits the training data. While rank-underestimating minima can easily be avoided since they do not fit the data, gradient descent might get stuck at rank-overestimating minima. We show that with SGD, there is always a probability to jump from a higher rank minimum to a lower rank one, but the probability of jumping back is zero. More precisely, we define a sequence of sets $B_{1}\subset B_{2}\subset\cdots\subset B_{R}$ so that $B_{r}$ contains all minima of rank $r$ or less (and not more) that are absorbing for small enough ridge parameters $\lambda$ and learning rates $\eta$: SGD has prob. 0 of leaving $B_{r}$, and from any starting point there is a non-zero prob. for SGD to go in $B_{r}$.