Owing to the wide application of binary sequences with low correlation in communications, various constructions of such sequences have been proposed in the literature. However, most of the known constructions via finite fields make use of the multiplicative cyclic group of $\F_{2^n}$. It is often overlooked in this community that all $2^n+1$ rational places (including the "place at infinity") of the rational function field over $\F_{2^n}$ form a cyclic structure under an automorphism of order $2^n+1$. In this paper, we make use of this cyclic structure to provide an explicit construction of families of binary sequences of length $2^n+1$ via the finite field $\F_{2^n}$. Each family of sequences has size $2^n-1$, and its correlation is upper bounded by $\lfloor 2^{(n+2)/2}\rfloor$. Our sequences can be constructed explicitly and have competitive parameters. In particular, compared with the Gold sequences of length $2^n-1$ for even $n$, our sequences have larger length and smaller correlation, although the family size is slightly smaller.
This paper presents new existence, dual representation, and approximation results for the information projection in the infinite-dimensional setting for moment inequality models. These results are established under a general specification of the moment inequality model, nesting both conditional and unconditional models, and allowing for an infinite number of such inequalities. An important innovation of the paper is the representation of the dual variable as a weak vector-valued integral, which is used to formulate an approximation scheme for the Fenchel dual problem equivalent to the $I$-projection. In particular, it is shown under suitable assumptions that the dual problem's optimum value can be approximated by the values of finite-dimensional programs, and that, in addition, every accumulation point of a sequence of optimal solutions for the approximating programs is an optimal solution for the dual problem. The paper illustrates the verification of the assumptions and the construction of the approximation scheme's parameters for the cases of unconditional and conditional first-order stochastic dominance constraints.
In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods designed to overcome this overestimation bias, a significant underestimation bias arises if the reinforcement signals received by the agent have high variance. To minimize the underestimation, we introduce a novel, parameter-free deep Q-learning variant. Our Q-value update rule combines the ideas behind Clipped Double Q-learning and Maxmin Q-learning by computing the critic objective through a nested combination of maximum and minimum operators that bounds the approximate value estimates; a sketch of such a target appears below. We evaluate our modification on a suite of OpenAI Gym continuous control tasks, improving the state of the art in every environment tested.
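To make the nested operator combination concrete, here is a minimal sketch of how such a critic target could be formed. The abstract does not specify the exact grouping of critics, so the two-level min-then-max arrangement, the array shapes, and the function name below are illustrative assumptions rather than the paper's exact rule.

```python
import numpy as np

def nested_minmax_target(q_next, rewards, dones, gamma=0.99):
    """Bootstrapped target built from a nested min/max over critic estimates.

    q_next: array of shape (n_groups, n_critics_per_group, batch) holding
    each critic's Q-estimate for the next state-action pair. The inner min
    tempers overestimation (as in Clipped Double Q-learning); the outer max
    counteracts the resulting underestimation (in the spirit of Maxmin
    Q-learning). The grouping here is an assumption for illustration.
    """
    inner = q_next.min(axis=1)   # clipped estimate within each critic group
    outer = inner.max(axis=0)    # keep the largest of the clipped estimates
    return rewards + gamma * (1.0 - dones) * outer
```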
We revisit a classical crossword filling puzzle, which already appeared in Garey and Johnson's book. We are given a grid with $n$ vertical and horizontal slots and a dictionary with $m$ words, and are asked to place words from the dictionary in the slots so that shared cells are consistent. We attempt to pinpoint the source of intractability of this problem by taking into account the structure of the grid graph, which contains a vertex for each slot and an edge whenever two slots intersect. Our main approach is to consider the case where this graph has a tree-like structure. Unfortunately, if we impose the common rule that words cannot be reused, we show that the problem remains NP-hard even under very severe structural restrictions. The problem becomes slightly more tractable if word reuse is allowed, as we obtain an $m^{tw}$ algorithm in this case, where $tw$ is the treewidth of the grid graph. However, even in this case, we show that our algorithm cannot be improved: more strongly, under the ETH the problem cannot be solved in time $m^{o(k)}$, where $k$ is the number of horizontal slots of the instance. Motivated by these mostly negative results, we consider the much more restricted case where the problem is parameterized by the total number of slots $n$. Here, we show that the problem becomes FPT, but the parameter dependence is exponential in $n^2$. We show that this dependence is also justified: the existence of an algorithm with running time $2^{o(n^2)}$ would contradict the randomized ETH. Finally, we consider an optimization version of the problem, where we seek to place as many words on the grid as possible. Here it is easy to obtain a $\frac{1}{2}$-approximation, even on weighted instances. We show that this algorithm is also likely to be optimal, as obtaining a better approximation ratio in polynomial time would contradict the Unique Games Conjecture.
There are various cluster validity measures for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal, unknown number of clusters. Some measures work well for clusters with different densities, sizes, and shapes. Yet one weakness that those validity measures share is that they often provide only a single clear optimal number of clusters. That number is actually unknown, and there may be several potential sub-optimal options that a user may wish to choose among depending on the application. We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters to which the two points belong; a sketch of this correlation follows below. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness stated above. Furthermore, the introduced correlation can also be used to evaluate the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted in order to compare the proposed validity indices with several well-known ones.
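As a sketch of the underlying idea, the score below correlates each pair's actual distance with the distance between the centroids of their assigned clusters; sweeping the number of clusters and plotting this score is what can reveal several peaks. The function name and the use of Pearson correlation are assumptions for illustration; the paper's precise indices may differ.

```python
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr

def centroid_distance_correlation(X, labels, centroids):
    """Correlation between actual pairwise distances and the distances
    between the centroids of the clusters the two points belong to.
    Within-cluster pairs contribute a centroid distance of zero."""
    d_pair, d_cent = [], []
    for i, j in combinations(range(len(X)), 2):
        d_pair.append(np.linalg.norm(X[i] - X[j]))
        d_cent.append(np.linalg.norm(centroids[labels[i]] - centroids[labels[j]]))
    r, _ = pearsonr(d_pair, d_cent)
    return r
```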
When using boundary integral equation methods, we represent solutions of a linear partial differential equation as layer potentials. It is well known that the approximation of layer potentials by quadrature rules suffers from poor resolution when evaluated close to (but not on) the boundary. To address this challenge, we provide modified representations of the problem's solution. Similar to the way Gauss's law is used to modify Laplace's double-layer potential, we use modified representations of Laplace's single-layer potential and of the Helmholtz layer potentials that avoid the close-evaluation problem. Previous techniques have been developed in the context of the representation formula or rely on interpolation. We instead provide alternative modified representations of the layer potentials directly (i.e., when only one density is at stake). Several numerical examples illustrate the efficiency of the technique in two and three dimensions.
Low-rank matrix approximation (LRMA) is one of the central concepts in machine learning, with applications in dimension reduction, de-noising, multivariate statistical methodology, and many more. A recent extension of LRMA is low-rank matrix completion (LRMC), which solves the LRMA problem when some observations are missing and is especially useful for recommender systems. In this paper, we consider an element-wise weighted generalization of LRMA. The resulting weighted low-rank matrix approximation (WLRMA) technique covers LRMC as the special case of binary weights. WLRMA has many applications. For example, it is an essential component of GLM optimization algorithms, where an exponential family is used to model the entries of a matrix and the matrix of natural parameters admits a low-rank structure. We propose an algorithm for solving the weighted problem, along with two acceleration techniques. Further, we develop a non-SVD modification of the proposed algorithm that can handle extremely high-dimensional data. We compare the performance of all the methods on a small simulation example as well as a real-data application.
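One standard way to attack the element-wise weighted problem is a majorize-minimize scheme that repeatedly blends the data with the current fit and truncates an SVD; with binary weights it reduces to an impute-style LRMC solver. The sketch below assumes weights in $[0,1]$ and is illustrative only; it is not necessarily the paper's algorithm, nor its accelerated or non-SVD variants.

```python
import numpy as np

def wlrma_mm(X, W, rank, n_iters=200):
    """Majorize-minimize sketch for min_Z || sqrt(W) * (X - Z) ||_F^2
    subject to rank(Z) <= rank, assuming element-wise weights W in [0, 1].

    Each step fills a surrogate matrix (observed entries blended by weight
    with the current fit) and projects it onto rank-`rank` matrices via a
    truncated SVD. With binary W this is the classic impute-SVD iteration
    for matrix completion."""
    Z = np.zeros_like(X)
    for _ in range(n_iters):
        S = W * X + (1.0 - W) * Z                      # surrogate fill-in
        U, d, Vt = np.linalg.svd(S, full_matrices=False)
        Z = (U[:, :rank] * d[:rank]) @ Vt[:rank]       # rank-r projection
    return Z
```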
We consider the power of local algorithms for approximately solving Max $k$XOR, a generalization of two constraint satisfaction problems previously studied with classical and quantum algorithms (MaxCut and Max E3LIN2). On instances with either random signs or no overlapping clauses and $D+1$ clauses per variable, we calculate the average satisfying fraction of the depth-1 QAOA and compare it with a generalization of the local threshold algorithm. Notably, the quantum algorithm outperforms the threshold algorithm for $k > 4$. On the other hand, we highlight potential difficulties for the QAOA to achieve a computational quantum advantage on this problem. We first compute a tight upper bound on the maximum satisfying fraction of nearly all large random regular Max $k$XOR instances by numerically calculating the ground-state energy density $P(k)$ of a mean-field $k$-spin glass. This upper bound grows with $k$ much faster than the performance of both one-local algorithms. We also identify a new obstruction result for low-depth quantum circuits (including the QAOA) when $k=3$, generalizing a result of Bravyi et al. [arXiv:1910.08980] for the case $k=2$. We conjecture that a similar obstruction exists for all $k$.
Recent advances in Transformer models allow for unprecedented sequence lengths, thanks to linear space and time complexity. Meanwhile, relative positional encoding (RPE) has proved beneficial for classical Transformers; it exploits lags instead of absolute positions for inference. However, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding (SPE), a way to generate positional encodings that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is a connection between positional encoding and the cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
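To give intuition for the cross-covariance connection, the sketch below draws random sinusoidal features with shared random phases, so that the expected dot product between query and key features at positions $m$ and $n$ depends only on the lag $m-n$. The frequency sampling, scaling, and function name are illustrative assumptions, not the paper's exact SPE constructions.

```python
import numpy as np

def random_lag_features(length, freqs, rng=None):
    """Random positional features qbar, kbar satisfying
    E_phase[ qbar[m] @ kbar[n] ] = mean_r cos(freqs[r] * (m - n)),
    i.e. a cross-covariance that depends only on the lag m - n,
    which is the hallmark of relative positional encoding."""
    rng = rng or np.random.default_rng()
    freqs = np.asarray(freqs, dtype=float)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.shape)
    pos = np.arange(length)[:, None]                    # (length, 1)
    feats = np.sqrt(2.0 / freqs.size) * np.cos(pos * freqs + phases)
    return feats, feats.copy()       # qbar and kbar share the random draw
```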
We advocate the use of implicit fields for learning generative models of shapes, and introduce an implicit field decoder for shape generation aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. Our implicit field decoder is trained to perform this assignment as a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value indicating whether the point is inside or outside the shape. By replacing conventional decoders with our decoder for representation learning and generative modeling of shapes, we demonstrate superior results on tasks such as shape autoencoding, generation, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality.
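A minimal sketch of such a decoder is given below: a small MLP that concatenates the query point with the shape feature vector and predicts an inside/outside probability. The layer widths, activations, and class name are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Binary classifier mapping (3D point, shape code) -> occupancy in [0, 1].
    A shape is then extracted as an iso-surface of this field (e.g. at 0.5)."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, points, feature):
        # points: (B, N, 3); feature: (B, feat_dim), broadcast to every point
        f = feature.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.net(torch.cat([points, f], dim=-1)).squeeze(-1)
```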
In this work, we consider the distributed optimization of non-smooth convex functions using a network of computing units. We investigate this problem under two regularity assumptions: (1) the Lipschitz continuity of the global objective function, and (2) the Lipschitz continuity of the local individual functions. Under the local regularity assumption, we provide the first optimal first-order decentralized algorithm, called multi-step primal-dual (MSPD), together with its corresponding optimal convergence rate. A notable aspect of this result is that, for non-smooth functions, while the dominant term of the error is in $O(1/\sqrt{t})$, the structure of the communication network only impacts a second-order term in $O(1/t)$, where $t$ is time. In other words, the error due to limited communication resources decreases at a fast rate even for non-strongly-convex objective functions. Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS), based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
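To illustrate the smoothing device behind DRS, the sketch below estimates the gradient of the Gaussian-smoothed objective $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$ with $Z \sim \mathcal{N}(0, I_d)$, which is differentiable even when $f$ is not. The estimator, sample count, and function name are illustrative assumptions, and the distributed machinery of the paper is omitted entirely.

```python
import numpy as np

def smoothed_gradient(f, x, gamma=0.1, n_samples=64, rng=None):
    """Monte-Carlo estimate of grad f_gamma(x), where
    f_gamma(x) = E[f(x + gamma * Z)] and Z ~ N(0, I).
    Uses the Gaussian identity grad f_gamma(x) = E[f(x + gamma*Z) * Z] / gamma;
    subtracting f(x) keeps the estimate unbiased (E[Z] = 0) but lowers variance."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x, dtype=float)
    fx = f(x)
    for _ in range(n_samples):
        z = rng.standard_normal(x.shape)
        g += (f(x + gamma * z) - fx) * z
    return g / (gamma * n_samples)
```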