Kim et al. (2021) gave a method to embed a given binary $[n,k]$ code $\mathcal{C}$ $(k = 3, 4)$ into a self-orthogonal code of the shortest length having the same dimension $k$ and minimum distance $d' \ge d(\mathcal{C})$. We extend this result to $k=5$ and $6$ by proposing a new method based on a special matrix, called the self-orthogonality matrix $SO_k$, obtained by shortening the Reed-Muller code $\mathcal{R}(2,k)$. Furthermore, we partially disprove the conjecture of Kim et al. (2021) by showing that if $31 \le n \le 256$ and $n\equiv 14, 22, 29 \pmod{31}$, then there exist optimal $[n,5]$ codes which are self-orthogonal. We also construct optimal self-orthogonal $[n,6]$ codes for $41 \le n \le 256$ with $n \ne 46, 54, 61$ and $n \not\equiv 7, 14, 22, 29, 38, 45, 53, 60 \pmod{63}$.
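As a quick illustration of the self-orthogonality condition underlying these constructions, the following minimal Python sketch checks whether the binary code spanned by a generator matrix $G$ is self-orthogonal, i.e., whether $GG^T \equiv 0 \pmod 2$. The example matrix (a generator of the extended $[8,4]$ Hamming code, which is self-dual and hence self-orthogonal) is included only for demonstration and is not taken from the paper.

```python
import numpy as np

def is_self_orthogonal(G: np.ndarray) -> bool:
    """Check whether the binary code generated by G is self-orthogonal:
    every pair of rows (including a row with itself) must have even
    overlap, equivalently G @ G.T == 0 over GF(2)."""
    return not np.any((G @ G.T) % 2)

# Example: a generator matrix of the extended [8,4] Hamming code.
G = np.array([
    [1, 0, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
])
print(is_self_orthogonal(G))  # True
```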
This paper considers the problem of measure estimation under the barycentric coding model (BCM), in which an unknown measure is assumed to belong to the set of Wasserstein-2 barycenters of a finite set of known measures. Estimating a measure under this model is equivalent to estimating the unknown barycentric coordinates. We provide novel geometrical, statistical, and computational insights for measure estimation under the BCM, consisting of three main results. Our first main result leverages the Riemannian geometry of Wasserstein-2 space to provide a procedure for recovering the barycentric coordinates as the solution to a quadratic optimization problem assuming access to the true reference measures. The essential geometric insight is that the parameters of this quadratic problem are determined by inner products between the optimal displacement maps from the given measure to the reference measures defining the BCM. Our second main result then establishes an algorithm for solving for the coordinates in the BCM when all the measures are observed empirically via i.i.d. samples. We prove precise rates of convergence for this algorithm -- determined by the smoothness of the underlying measures and their dimensionality -- thereby guaranteeing its statistical consistency. Finally, we demonstrate the utility of the BCM and associated estimation procedures in three application areas: (i) covariance estimation for Gaussian measures; (ii) image processing; and (iii) natural language processing.
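To make the geometric result concrete, here is a minimal sketch (not the authors' implementation) of the resulting quadratic problem: assuming the Gram matrix $G$ of pairwise inner products between the optimal displacement maps has already been estimated (e.g., from optimal transport plans computed on samples), the barycentric coordinates are recovered by minimizing $\lambda^\top G \lambda$ over the probability simplex.

```python
import numpy as np
from scipy.optimize import minimize

def barycentric_coordinates(G: np.ndarray) -> np.ndarray:
    """Minimize lambda^T G lambda over the probability simplex.

    G is assumed to be the (estimated) Gram matrix of pairwise inner
    products between the optimal displacement maps from the observed
    measure to the reference measures."""
    m = G.shape[0]
    x0 = np.full(m, 1.0 / m)  # start at the center of the simplex
    res = minimize(
        lambda lam: lam @ G @ lam,
        x0,
        jac=lambda lam: 2.0 * G @ lam,
        constraints=[{"type": "eq", "fun": lambda lam: lam.sum() - 1.0}],
        bounds=[(0.0, 1.0)] * m,
        method="SLSQP",
    )
    return res.x
```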
We study the computational complexity of the zigzag sampling algorithm for strongly log-concave distributions. The zigzag process has the advantages that its implementation requires no time discretization and that each proposed bouncing event requires only one evaluation of a partial derivative of the potential, while its convergence rate is dimension-independent. Using these properties, we prove that the zigzag sampling algorithm achieves $\varepsilon$ error in chi-square divergence with a computational cost equivalent to $O\bigl(\kappa^2 d^\frac{1}{2}(\log\frac{1}{\varepsilon})^{\frac{3}{2}}\bigr)$ gradient evaluations in the regime $\kappa \ll \frac{d}{\log d}$ under a warm start assumption, where $\kappa$ is the condition number and $d$ is the dimension.
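For intuition, the following is a minimal sketch of the zigzag process for the special case of a standard Gaussian target, where event times can be sampled exactly by inverting the integrated switching rate; for a general strongly log-concave potential one would instead bound the rates and use Poisson thinning. All parameter choices here are illustrative.

```python
import numpy as np

def zigzag_standard_gaussian(x0, n_events, rng=None):
    """Simulate the zigzag process targeting the standard Gaussian,
    U(x) = ||x||^2 / 2. Along the current line the switching rate of
    coordinate i is (v_i * x_i + t)^+, so event times can be drawn
    exactly (no thinning needed for this special case)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    v = rng.choice([-1.0, 1.0], size=d)  # velocities in {-1, +1}^d
    skeleton = [(0.0, x.copy())]
    t = 0.0
    for _ in range(n_events):
        # First arrival of an inhomogeneous Poisson process with rate
        # (a_i + s)^+, a_i = v_i x_i, via inversion of the integrated rate.
        a = v * x
        e = rng.exponential(size=d)
        tau = np.where(a >= 0.0,
                       -a + np.sqrt(a * a + 2.0 * e),
                       -a + np.sqrt(2.0 * e))
        i = np.argmin(tau)           # first coordinate to bounce
        x = x + v * tau[i]           # move along the current direction
        v[i] = -v[i]                 # flip that coordinate's velocity
        t += tau[i]
        skeleton.append((t, x.copy()))
    return skeleton
```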
We study the complexity of proving that a sparse random regular graph on an odd number of vertices does not have a perfect matching, along with related problems in which each vertex must be matched a pre-specified number of times. We show that this requires proofs of degree $\Omega(n / \log n)$ in the Polynomial Calculus (over fields of characteristic $\ne 2$) and Sum-of-Squares proof systems, and exponential size in the bounded-depth Frege proof system. This resolves a question by Razborov asking whether the Lov\'asz-Schrijver proof system requires $n^\delta$ rounds to refute these formulas for some $\delta > 0$. The results are obtained by a worst-case to average-case reduction of these formulas relying on a topological embedding theorem which may be of independent interest.
We initiate the study of Boolean function analysis on high-dimensional expanders. We give a random-walk based definition of high-dimensional expansion, which coincides with the earlier definition in terms of two-sided link expanders. Using this definition, we describe an analog of the Fourier expansion and the Fourier levels of the Boolean hypercube for simplicial complexes. Our analog is a decomposition into approximate eigenspaces of random walks associated with the simplicial complexes. Our random-walk definition and the decomposition have the additional advantage that they extend to the more general setting of posets, encompassing both high-dimensional expanders and the Grassmann poset, which appears in recent work on the unique games conjecture. We then use this decomposition to extend the Friedgut-Kalai-Naor theorem to high-dimensional expanders. Our results demonstrate that a constant-degree high-dimensional expander can sometimes serve as a sparse model for the Boolean slice or hypercube, and quite possibly additional results from Boolean function analysis can be carried over to this sparse model. Therefore, this model can be viewed as a derandomization of the Boolean slice, containing only $|X(k-1)|=O(n)$ points in contrast to $\binom{n}{k}$ points in the $k$-slice (which consists of all $n$-bit strings with exactly $k$ ones).
We consider the matrix least squares problem of the form $\| \mathbf{A} \mathbf{X}-\mathbf{B} \|_F^2$ where the design matrix $\mathbf{A} \in \mathbb{R}^{N \times r}$ is tall and skinny with $N \gg r$. We propose to create a sketched version $\| \tilde{\mathbf{A}}\mathbf{X}-\tilde{\mathbf{B}} \|_F^2$ where the sketched matrices $\tilde{\mathbf{A}}$ and $\tilde{\mathbf{B}}$ contain weighted subsets of the rows of $\mathbf{A}$ and $\mathbf{B}$, respectively. The subset of rows is determined via random sampling based on leverage score estimates for each row. We say that the sketched problem is $\epsilon$-accurate if its solution $\tilde{\mathbf{X}}_{\text{opt}} = \operatorname{argmin}_{\mathbf{X}} \| \tilde{\mathbf{A}}\mathbf{X}-\tilde{\mathbf{B}} \|_F^2$ satisfies $\|\mathbf{A}\tilde{\mathbf{X}}_{\text{opt}}-\mathbf{B} \|_F^2 \leq (1+\epsilon) \min_{\mathbf{X}} \| \mathbf{A}\mathbf{X}-\mathbf{B} \|_F^2$ with high probability. We prove that the number of samples required for an $\epsilon$-accurate solution is $O(r/(\beta \epsilon))$ where $\beta \in (0,1]$ is a measure of the quality of the leverage score estimates.
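A minimal sketch of the sampling scheme follows, using exact leverage scores computed from a thin QR factorization for illustration, whereas the analysis above only assumes score estimates of quality $\beta$.

```python
import numpy as np

def leverage_score_sketch(A, B, n_samples, rng=None):
    """Sample rows of (A, B) i.i.d. proportionally to the leverage
    scores of A, rescale to keep the estimator unbiased, and solve
    the sketched least squares problem."""
    rng = np.random.default_rng() if rng is None else rng
    Q, _ = np.linalg.qr(A)                 # thin QR, Q is N x r
    scores = np.sum(Q * Q, axis=1)         # leverage of row i: ||Q[i,:]||^2
    p = scores / scores.sum()              # sampling distribution
    idx = rng.choice(A.shape[0], size=n_samples, p=p)
    w = 1.0 / np.sqrt(n_samples * p[idx])  # importance-sampling weights
    A_sk = w[:, None] * A[idx]
    B_sk = w[:, None] * B[idx]
    X_opt, *_ = np.linalg.lstsq(A_sk, B_sk, rcond=None)
    return X_opt
```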
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, favoring the (block) LU decomposition: the factorization of a matrix into the product of a lower and an upper triangular matrix. Today, matrix decomposition has become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis, in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. We realize, however, that we cannot cover all the useful and interesting results concerning matrix decomposition within the limited scope of this discussion; for example, we omit the separate analysis of the Euclidean space, Hermitian space, Hilbert space, and the complex domain. We refer the reader to the linear algebra literature for a more detailed introduction to these related fields.
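For concreteness, the LU decomposition mentioned above factors a matrix as $A = PLU$ with $L$ unit lower triangular, $U$ upper triangular, and $P$ a permutation supplied by partial pivoting; a minimal illustration using SciPy (the example matrix is arbitrary):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4., 3., 2.],
              [6., 3., 1.],
              [2., 1., 5.]])
P, L, U = lu(A)                     # A = P @ L @ U
assert np.allclose(P @ L @ U, A)    # verify the factorization
```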
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to guarantee both optimism and convergence of the associated value iteration scheme. We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_{\star} \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $B_{\star}$ bounds the expected cumulative cost of the optimal policy from any state, thus closing the gap with the lower bound. Interestingly, EB-SSP obtains this result while being parameter-free, i.e., it does not require any prior knowledge of $B_{\star}$, nor of $T_{\star}$ which bounds the expected time-to-goal of the optimal policy from any state. Furthermore, we illustrate various cases (e.g., positive costs, or general costs when an order-accurate estimate of $T_{\star}$ is available) where the regret only contains a logarithmic dependence on $T_{\star}$, thus yielding the first horizon-free regret bound beyond the finite-horizon MDP setting.
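The following is a heavily simplified sketch of the two ingredients named above, bonus-perturbed costs and goal-skewed transitions, inside a plain value iteration loop; the constants and the exact bonus and skew shapes are placeholders, not the quantities used in the EB-SSP analysis.

```python
import numpy as np

def optimistic_ssp_vi(P_hat, c_hat, n_sa, goal, c1=1.0, iters=1000):
    """Value iteration on an optimistic SSP model: empirical costs are
    lowered by a count-based exploration bonus, and each empirical
    transition is slightly skewed toward the goal so the Bellman
    operator converges. Shapes: P_hat (S,A,S), c_hat and n_sa (S,A)."""
    S, A, _ = P_hat.shape
    bonus = c1 * np.sqrt(1.0 / np.maximum(n_sa, 1))
    c_opt = np.maximum(c_hat - bonus, 0.0)       # optimistic costs
    eta = 1.0 / np.maximum(n_sa, 1)              # skew weight per (s, a)
    P_tilde = (1.0 - eta)[..., None] * P_hat
    P_tilde[:, :, goal] += eta                   # extra mass on the goal
    V = np.zeros(S)
    for _ in range(iters):
        Q = c_opt + P_tilde @ V                  # (S, A) Bellman backup
        V_new = Q.min(axis=1)
        V_new[goal] = 0.0                        # goal is absorbing, cost-free
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new
    return V, Q.argmin(axis=1)
```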
We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle := \operatorname{tr}(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.
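For contrast with the non-adaptive pattern described above, here is a minimal sketch of the baseline read-a-submatrix approach (in the spirit of the earlier $O(d^2/\epsilon^2)$-style testers, not the improved algorithm); the sample size is a heuristic placeholder.

```python
import numpy as np

def reject_rank_at_most_d(A, d, eps, rng=None):
    """Baseline submatrix tester: sample a random set of rows and
    columns and reject (return True) if the induced submatrix has
    rank greater than d. This is the approach the improved algorithm
    above deliberately avoids."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    q = min(n, int(np.ceil(4 * d / eps)))   # heuristic sample size
    rows = rng.choice(n, size=q, replace=False)
    cols = rng.choice(n, size=q, replace=False)
    sub = A[np.ix_(rows, cols)]
    return np.linalg.matrix_rank(sub) > d   # True => witness that rank > d
```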
This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
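A minimal sketch of the single-pass idea, assuming a two-sketch (range and co-range) variant with Gaussian test matrices; the sketch sizes are illustrative, not the tuned choices from the paper.

```python
import numpy as np

def sketch_low_rank(A, r, oversample=10, rng=None):
    """Low-rank approximation of A from two random sketches:
    Y = A @ Omega captures the range of A, W = Psi @ A its co-range.
    A is then approximated by Q @ pinv(Psi @ Q) @ W with Q = qr(Y).
    A user-specified rank r can be enforced afterwards by truncating
    the SVD of the small factor X."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    k, l = r + oversample, 2 * r + oversample   # sketch sizes, k < l
    Omega = rng.standard_normal((n, k))
    Psi = rng.standard_normal((l, m))
    Y = A @ Omega                # range sketch (the only access to A)
    W = Psi @ A                  # co-range sketch
    Q, _ = np.linalg.qr(Y)       # orthonormal basis for the captured range
    X = np.linalg.pinv(Psi @ Q) @ W
    return Q, X                  # A ~= Q @ X, rank <= k
```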
In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m}f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex, (iii) smooth, or (iv) merely convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions of the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition number.
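As a concrete, much simpler baseline than the accelerated dual method analyzed above (included only to make the communication model tangible), decentralized gradient descent alternates one round of neighbor averaging through a mixing matrix $W$ with local gradient steps:

```python
import numpy as np

def decentralized_gd(grads, W, x0, step, n_iters):
    """Decentralized gradient descent: each node averages with its
    neighbors through the mixing matrix W (row-stochastic, encoding
    the network topology) and then takes a local gradient step.
    This is a simple baseline, not the accelerated dual method.

    grads: list of m callables, grads[i](x) -> gradient of f_i at x.
    W: (m, m) mixing matrix; x0: (m, d) initial local iterates."""
    X = x0.copy()
    for _ in range(n_iters):
        X = W @ X  # one round of communication (neighbor averaging)
        X -= step * np.stack([g(x) for g, x in zip(grads, X)])
    return X.mean(axis=0)  # consensus estimate
```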