It is an open question to determine if the theory of self-concordant barriers can provide an interior point method with strongly polynomial complexity in linear programming. In the special case of the logarithmic barrier, it was shown in [Allamigeon, Benchimol, Gaubert and Joswig, SIAM J. on Applied Algebra and Geometry, 2018] that the answer is negative. In this paper, we show that none of the self-concordant barrier interior point methods is strongly polynomial. This result is obtained by establishing that, on parametric families of convex optimization problems, the log-limit of the central path degenerates to a piecewise linear curve, independently of the choice of the barrier function. We provide an explicit linear program that falls in the same class as the Klee-Minty counterexample, i.e., in dimension $n$ with $2n$ constraints, in which the number of iterations is $\Omega(2^n)$.
The basic goal of survivable network design is to build cheap networks that guarantee the connectivity of certain pairs of nodes despite the failure of a few edges or nodes. A celebrated result by Jain [Combinatorica'01] provides a 2-approximation for a wide class of these problems. However nothing better is known even for very basic special cases, raising the natural question whether any improved approximation factor is possible at all. In this paper we address one of the most basic problems in this family for which 2 is still the best-known approximation factor, the Forest Augmentation Problem (FAP): given an undirected unweighted graph (that w.l.o.g. is a forest) and a collection of extra edges (links), compute a minimum cardinality subset of links whose addition to the graph makes it 2-edge-connected. Several better-than-2 approximation algorithms are known for the special case where the input graph is a tree, a.k.a. the Tree Augmentation Problem (TAP). Recently this was achieved also for the weighted version of TAP, and for the k-edge-connectivity generalization of TAP. These results heavily exploit the fact that the input graph is connected, a condition that does not hold in FAP. In this paper we breach the 2-approximation barrier for FAP. Our result is based on two main ingredients. First, we describe a reduction to the Path Augmentation Problem (PAP), the special case of FAP where the input graph is a collection of disjoint paths. Our reduction is not approximation preserving, however it is sufficiently accurate to improve on a factor 2 approximation. Second, we present a better-than-2 approximation algorithm for PAP, an open problem on its own. Here we exploit a novel notion of implicit credits which might turn out to be helpful in future related work.
The coordinate descent method is an effective iterative method for solving large linear least-squares problems. In this paper, for the highly coherent columns case, we construct an effective coordinate descent method which iteratively projects the estimate onto a solution space formed by two greedily selected hyperplanes via Gram-Schmidt orthogonalization. Our methods may be regarded as a simple block version of coordinate descent method which involves two active columns. The convergence analysis of this method is provided and numerical simulations also confirm the effectiveness for matrices with highly coherent columns.
Given a set $P$ of $n$ points in the plane, the $k$-center problem is to find $k$ congruent disks of minimum possible radius such that their union covers all the points in $P$. The $2$-center problem is a special case of the $k$-center problem that has been extensively studied in the recent past \cite{CAHN,HT,SH}. In this paper, we consider a generalized version of the $2$-center problem called \textit{proximity connected} $2$-center (PCTC) problem. In this problem, we are also given a parameter $\delta\geq 0$ and we have the additional constraint that the distance between the centers of the disks should be at most $\delta$. Note that when $\delta=0$, the PCTC problem is reduced to the $1$-center(minimum enclosing disk) problem and when $\delta$ tends to infinity, it is reduced to the $2$-center problem. The PCTC problem first appeared in the context of wireless networks in 1992 \cite{ACN0}, but obtaining a nontrivial deterministic algorithm for the problem remained open. In this paper, we resolve this open problem by providing a deterministic $O(n^2\log n)$ time algorithm for the problem.
The success of large-scale models in recent years has increased the importance of statistical models with numerous parameters. Several studies have analyzed over-parameterized linear models with high-dimensional data that may not be sparse; however, existing results depend on the independent setting of samples. In this study, we analyze a linear regression model with dependent time series data under over-parameterization settings. We consider an estimator via interpolation and developed a theory for excess risk of the estimator under multiple dependence types. This theory can treat infinite-dimensional data without sparsity and handle long-memory processes in a unified manner. Moreover, we bound the risk in our theory via the integrated covariance and nondegeneracy of autocorrelation matrices. The results show that the convergence rate of risks with short-memory processes is identical to that of cases with independent data, while long-memory processes slow the convergence rate. We also present several examples of specific dependent processes that can be applied to our setting.
Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d\geq\log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and L\"owner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic.
Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for monotone loss functions used for classification, such as the logistic loss. We prove our result on a fixed dataset, both for sampling with or without replacement. Furthermore, for logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max margin vector as $O(1/\log(t))$ for almost all separable datasets, and the loss converges as $O(1/t)$ - similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case, the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, if the support vectors span the data. These results may suggest an explanation to similar behaviors observed in deep networks, when trained with SGD.
There has been an arising trend of adopting deep learning methods to study partial differential equations (PDEs). This article is to propose a Deep Learning Galerkin Method (DGM) for the closed-loop geothermal system, which is a new coupled multi-physics PDEs and mainly consists of a framework of underground heat exchange pipelines to extract the geothermal heat from the geothermal reservoir. This method is a natural combination of Galerkin Method and machine learning with the solution approximated by a neural network instead of a linear combination of basis functions. We train the neural network by randomly sampling the spatiotemporal points and minimize loss function to satisfy the differential operators, initial condition, boundary and interface conditions. Moreover, the approximate ability of the neural network is proved by the convergence of the loss function and the convergence of the neural network to the exact solution in L^2 norm under certain conditions. Finally, some numerical examples are carried out to demonstrate the approximation ability of the neural networks intuitively.
The numerical solution of singular eigenvalue problems is complicated by the fact that small perturbations of the coefficients may have an arbitrarily bad effect on eigenvalue accuracy. However, it has been known for a long time that such perturbations are exceptional and standard eigenvalue solvers, such as the QZ algorithm, tend to yield good accuracy despite the inevitable presence of roundoff error. Recently, Lotz and Noferini quantified this phenomenon by introducing the concept of $\delta$-weak eigenvalue condition numbers. In this work, we consider singular quadratic eigenvalue problems and two popular linearizations. Our results show that a correctly chosen linearization increases $\delta$-weak eigenvalue condition numbers only marginally, justifying the use of these linearizations in numerical solvers also in the singular case. We propose a very simple but often effective algorithm for computing well-conditioned eigenvalues of a singular quadratic eigenvalue problems by adding small random perturbations to the coefficients. We prove that the eigenvalue condition number is, with high probability, a reliable criterion for detecting and excluding spurious eigenvalues created from the singular part.
We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'
We present a pipelined multiplier with reduced activities and minimized interconnect based on online digit-serial arithmetic. The working precision has been truncated such that $p<n$ bits are used to compute $n$ bits product, resulting in significant savings in area and power. The digit slices follow variable precision according to input, increasing upto $p$ and then decreases according to the error profile. Pipelining has been done to achieve high throughput and low latency which is desirable for compute intensive inner products. Synthesis results of the proposed designs have been presented and compared with the non-pipelined online multiplier, pipelined online multiplier with full working precision and conventional serial-parallel and array multipliers. For $8, 16, 24$ and $32$ bit precision, the proposed low power pipelined design show upto $38\%$ and $44\%$ reduction in power and area respectively compared to the pipelined online multiplier without working precision truncation.