In this work we advance the understanding of the fundamental limits of computation for Binary Polynomial Optimization (BPO), the problem of maximizing a given polynomial function over all binary points. In our main result we provide a novel class of BPO instances that can be solved efficiently both from a theoretical and a computational perspective. In fact, we give a strongly polynomial-time algorithm for instances whose corresponding hypergraph is beta-acyclic. We note that the beta-acyclicity assumption is natural in several applications, including relational database schemes and the lifted multicut problem on trees. Due to the novelty of our proof technique, we obtain an algorithm that is also interesting from a practical viewpoint: it is very simple to implement, and its running time is a polynomial of very low degree in the number of nodes and edges of the hypergraph. Our result completely settles the computational complexity of BPO over acyclic hypergraphs, since the problem is NP-hard on alpha-acyclic instances. Our algorithm can also be applied to any general BPO problem that contains beta-cycles. For these problems, the algorithm returns a smaller instance together with a rule for extending any optimal solution of the smaller instance to an optimal solution of the original instance.
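For concreteness, a standard way to state such an instance (in notation of our choosing, since the abstract fixes none) is
\[
\max_{x \in \{0,1\}^V} \;\; \sum_{v \in V} c_v\, x_v \;+\; \sum_{e \in E} c_e \prod_{v \in e} x_v,
\]
where the associated hypergraph $H=(V,E)$ has a node for every variable and a hyperedge for the support of every monomial of degree at least two; the acyclicity notions above refer to this hypergraph.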
Quasar convexity is a condition that allows some first-order methods to efficiently minimize a function even when the optimization landscape is non-convex. Previous works develop near-optimal accelerated algorithms for minimizing this class of functions; however, they require a binary-search subroutine that leads to multiple gradient evaluations in each iteration, and consequently the total number of gradient evaluations does not match the known lower bound. In this work, we show that a recently proposed continuized Nesterov acceleration can be applied to minimizing quasar convex functions and achieves the optimal bound with high probability. Furthermore, we find that the objective functions of training generalized linear models (GLMs) satisfy quasar convexity, which broadens the applicability of the relevant algorithms; known practical examples of quasar convexity in non-convex learning are otherwise scarce in the literature. We also show that if a smooth and one-point strongly convex, Polyak-Lojasiewicz, or quadratic-growth function satisfies quasar convexity, then an accelerated linear rate for minimizing the function is attainable under certain conditions, whereas acceleration is not known in general for these classes of functions.
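For reference, one common formulation of the condition (the exact parameterization may differ slightly from the one used here) is: a differentiable function $f$ with minimizer $x^*$ is $\gamma$-quasar convex, for $\gamma \in (0,1]$, if
\[
f(x^*) \;\ge\; f(x) + \frac{1}{\gamma}\,\langle \nabla f(x),\, x^* - x \rangle \qquad \text{for all } x,
\]
which recovers star-convexity at $\gamma = 1$ and contains ordinary convexity as a special case.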
A mixed graph $G$ is a graph that contains both undirected and directed edges. An orientation of $G$ is formed by orienting all the undirected edges of $G$, i.e., converting each undirected edge $\{u,v\}$ into a directed edge that is either $(u,v)$ or $(v,u)$. The problem of finding an orientation of a mixed graph that makes it strongly connected is well understood and can be solved in linear time. Here we introduce the following orientation problem in mixed graphs. Given a mixed graph $G$, we wish to compute its maximal sets of vertices $C_1,C_2,\ldots,C_k$ with the property that, for every edge $e$ of $G$ (directed or undirected), there is an orientation $R_i$ of $G \setminus e$ in which all vertices of $C_i$ are strongly connected. We discuss properties of these sets, and we show how to solve this problem in linear time by reducing it to the computation of the $2$-edge twinless strongly connected components of a directed graph. A directed graph $G=(V,E)$ is twinless strongly connected if it contains a strongly connected spanning subgraph without any pair of antiparallel (or twin) edges. The twinless strongly connected components (TSCCs) of a directed graph $G$ are its maximal twinless strongly connected subgraphs. A $2$-edge twinless strongly connected component (2eTSCC) of $G$ is a maximal subset of vertices $C$ such that any two vertices $u, v \in C$ are in the same twinless strongly connected component of $G \setminus e$, for every edge $e$. These concepts are motivated by several diverse applications, such as the design of road and telecommunication networks and the structural stability of buildings.
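As a minimal, definition-level illustration (not the linear-time algorithm developed in this work), the following sketch brute-forces over orientations of the undirected edges of a small mixed graph and checks strong connectivity; the function names and the use of networkx are our own choices.

```python
from itertools import product
import networkx as nx

def has_strong_orientation(directed_edges, undirected_edges, nodes):
    """Brute-force check: can the undirected edges be oriented so that the
    whole mixed graph becomes strongly connected?  Exponential in the number
    of undirected edges; for illustration only."""
    for choice in product([0, 1], repeat=len(undirected_edges)):
        g = nx.DiGraph()
        g.add_nodes_from(nodes)
        g.add_edges_from(directed_edges)
        for bit, (u, v) in zip(choice, undirected_edges):
            if bit:
                g.add_edge(u, v)
            else:
                g.add_edge(v, u)
        if nx.is_strongly_connected(g):
            return True
    return False

# Tiny example: directed path 1 -> 2 -> 3 plus the undirected edge {1, 3};
# orienting {1, 3} as (3, 1) closes a directed cycle, so the answer is True.
print(has_strong_orientation([(1, 2), (2, 3)], [(1, 3)], [1, 2, 3]))
```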
We develop a linear-time algorithm for computing the diameter of an asteroidal triple-free (AT-free) graph. Furthermore, we update the definition of polar pairs and develop new properties of polar pairs for (weak) dominating pair graphs. We prove that a simplicial vertex of a general graph can be computed in $O(n^2)$ time, based on an existing reduction to the problem of finding the diameter of an AT-free graph. We thereby improve the best-known running times of several graph-theoretic problems.
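For comparison only, a definition-level baseline computes the diameter with a BFS from every vertex in $O(n(n+m))$ time; it is not the linear-time AT-free algorithm of this work, and the names below are ours.

```python
from collections import deque

def diameter(adj):
    """Exact diameter of a connected graph by BFS from every vertex,
    O(n*(n+m)).  This brute-force baseline only illustrates the quantity
    being computed; the paper's algorithm is linear-time on AT-free graphs."""
    def ecc(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        return max(dist.values())
    return max(ecc(v) for v in adj)

# 5-cycle: diameter 2.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(diameter(cycle5))  # 2
```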
The Shapley value, one of the best-known allocation rules in game theory, does not take into account information about the structure of the graph. We therefore introduce a new allocation rule that applies the Shapley value to each hyperedge and considers their first-order combination. We prove that some of the properties that hold for the Shapley and Myerson values also hold for our allocation rule. In addition, we establish a relationship between our allocation rule and the Forman curvature, which plays an important role in discrete geometry.
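For reference, the classical Shapley value of player $i$ in a cooperative game $(N, v)$, which the rule above applies hyperedge-wise, is
\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N|-|S|-1\bigr)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr).
\]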
General factors are a generalization of matchings. Given a graph $G$ with a set $\pi(v)$ of feasible degrees, called a degree constraint, for each vertex $v$ of $G$, the general factor problem is to find a (spanning) subgraph $F$ of $G$ such that $\deg_F(v) \in \pi(v)$ for every vertex $v$ of $G$. When all degree constraints are symmetric $\Delta$-matroids, the problem is solvable in polynomial time. The weighted general factor problem is to find a general factor of maximum total weight in an edge-weighted graph. Strongly polynomial-time algorithms are known only for weighted general factor problems that are reducible to the weighted matching problem by gadget constructions. In this paper, we present the first strongly polynomial-time algorithm for a class of weighted general factor problems with real-valued edge weights that is provably not reducible to the weighted matching problem by gadget constructions.
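As a definition-level illustration (not the algorithm of this paper), the sketch below searches for a general factor by brute force; it is exponential in the number of edges, and all names are our own.

```python
from itertools import combinations

def general_factor(vertices, edges, pi):
    """Brute-force search for a general factor: a subset F of edges with
    deg_F(v) in pi[v] for every vertex v.  Illustrates the definition only."""
    for r in range(len(edges) + 1):
        for F in combinations(edges, r):
            deg = {v: 0 for v in vertices}
            for u, w in F:
                deg[u] += 1
                deg[w] += 1
            if all(deg[v] in pi[v] for v in vertices):
                return list(F)
    return None

# Triangle where every vertex must have degree exactly 2:
# the (unique) general factor is the triangle itself.
print(general_factor([1, 2, 3], [(1, 2), (2, 3), (1, 3)],
                     {1: {2}, 2: {2}, 3: {2}}))
```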
A candidate explanation of the good empirical performance of deep neural networks is the implicit regularization effect of first-order optimization methods. Inspired by this, we prove a convergence theorem for nonconvex composite optimization and apply it to a general learning problem covering many machine learning applications, including supervised learning. We then present a deep multilayer perceptron model and prove that, when sufficiently wide, it $(i)$ leads to the convergence of gradient descent to a global optimum at a linear rate, $(ii)$ benefits from the implicit regularization effect of gradient descent, $(iii)$ is subject to novel bounds on the generalization error, $(iv)$ exhibits the lazy training phenomenon, and $(v)$ enjoys learning-rate transfer across different widths. The corresponding coefficients, such as the convergence rate, improve as the width increases further and depend on the even-order moments of the data-generating distribution up to an order that depends on the number of layers. The only non-mild assumption we make is that the smallest eigenvalue of the neural tangent kernel at initialization concentrates away from zero, which has been shown to hold for a number of less general models in contemporary works. We present empirical evidence supporting this assumption as well as our theoretical claims.
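For reference, the empirical neural tangent kernel at initialization, to which the eigenvalue assumption refers, is commonly written as
\[
\Theta(x, x') \;=\; \bigl\langle \nabla_\theta f(x; \theta_0),\; \nabla_\theta f(x'; \theta_0) \bigr\rangle,
\]
with the assumption being that the smallest eigenvalue of the associated Gram matrix on the training data concentrates away from zero; this is a standard formulation and may differ in details from the one used in the paper.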
Recently, several universal methods have been proposed for online convex optimization that can handle convex, strongly convex, and exponentially concave cost functions simultaneously. However, most of these algorithms have been designed with static regret minimization in mind, a notion of regret that may not be suitable for changing environments. To address this shortcoming, we propose a novel and intuitive framework for universal online optimization in dynamic environments. Unlike existing universal algorithms, our strategy does not rely on the construction of a set of experts and an accompanying meta-algorithm. Instead, we show that the problem of dynamic online optimization can be reduced to a uniclass prediction problem. By leaving the choice of the uniclass loss function in the user's hands, our framework allows dynamic regret bounds to be controlled and optimized, and these bounds in turn carry over to the original problem. To the best of our knowledge, this is the first paper to propose a universal approach with state-of-the-art dynamic regret guarantees even for general convex cost functions.
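For reference, dynamic regret is usually measured against an arbitrary comparator sequence $u_1, \ldots, u_T$ rather than a single fixed point:
\[
\mathrm{D\text{-}Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(u_t),
\]
and static regret is recovered when all $u_t$ are equal; the exact comparator class used in this work may differ.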
Boolean functions are mathematical objects with numerous applications in domains like coding theory, cryptography, and telecommunications. Finding Boolean functions with specific properties is a complex combinatorial optimization problem in which the search space grows super-exponentially with the number of input variables. One common property of interest is the nonlinearity of a Boolean function. Constructing highly nonlinear Boolean functions is difficult, as it is not always known what nonlinearity values can be reached in practice. In this paper, we investigate the effects of the genetic operators for bit-string encoding when optimizing nonlinearity. While several mutation and crossover operators have commonly been used, the link between the genotype they operate on and the resulting phenotype changes remains mostly obscure. Observing the range of changes an operator can produce, together with the relative probabilities of specific transitions in the objective space, makes it possible to design a more effective combination of genetic operators. The analysis reveals interesting insights into operator effectiveness and indicates how algorithm design may improve convergence compared to an operator-agnostic genetic algorithm.
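For context, the nonlinearity objective used in such experiments is typically computed from the Walsh-Hadamard spectrum as $2^{n-1} - \tfrac{1}{2}\max_a |W_f(a)|$; the sketch below (with helper names of our own choosing) computes it from a truth table.

```python
def walsh_hadamard(tt):
    """Fast Walsh-Hadamard transform of a Boolean truth table of length 2^n."""
    w = [1 - 2 * b for b in tt]  # map 0 -> +1, 1 -> -1
    n = len(w)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = w[j], w[j + h]
                w[j], w[j + h] = x + y, x - y
        h *= 2
    return w

def nonlinearity(tt):
    """Nonlinearity = 2^(n-1) - max|W_f| / 2 for an n-variable Boolean function."""
    w = walsh_hadamard(tt)
    return len(tt) // 2 - max(abs(v) for v in w) // 2

# Two-variable AND function: truth table [0, 0, 0, 1], nonlinearity 1.
print(nonlinearity([0, 0, 0, 1]))  # 1
```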
There has been a great deal of recent work on generalizing the spectral theory of graphs and graph partitioning to hypergraphs, following two broad directions. One direction generalizes the notion of graph conductance to hypergraph conductance [LM16, CLTZ18]. In the second, one views a hypergraph as a simplicial complex and studies its various topological properties [LM06, MW09, DKW16, PR17] and spectral properties [KM17, DK17, KO18a, KO18b, Opp20]. In this work, we attempt to bridge these two directions of study by relating the spectrum of {\em up-down walks} and {\em swap-walks} on the simplicial complex to hypergraph expansion. In surprising contrast to random walks on graphs, we show that the spectral gap of swap-walks and up-down walks between levels $m$ and $l$ with $1 < m \leq l$ cannot be used to infer any bounds on hypergraph conductance. Moreover, we show that the spectral gap of swap-walks between $X(1)$ and $X(k-1)$ cannot be used to infer any bounds on hypergraph conductance, whereas we give a Cheeger-like inequality relating the spectral gap of walks between levels $1$ and $l$, for any $l \leq k$, to hypergraph expansion. This is a surprising difference between swap-walks and up-down walks! Finally, we also give a construction showing that the well-studied notion of link expansion in simplicial complexes cannot be used to bound hypergraph expansion in a Cheeger-like manner.
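For reference, one common (unweighted) formulation of hypergraph conductance, in the spirit of [LM16, CLTZ18], is
\[
\phi(S) \;=\; \frac{\bigl|\{\, e \in E : e \cap S \neq \emptyset \text{ and } e \cap (V \setminus S) \neq \emptyset \,\}\bigr|}{\min\bigl(\mathrm{vol}(S), \mathrm{vol}(V \setminus S)\bigr)},
\qquad
\phi(H) \;=\; \min_{\emptyset \neq S \subsetneq V} \phi(S),
\]
where $\mathrm{vol}(S) = \sum_{v \in S} \deg(v)$; the weighted variants used in the cited works are analogous.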
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
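As an illustration of how the formula translates into a re-weighting scheme, the following sketch computes per-class weights inversely proportional to the effective number of samples; the normalization convention and the function name are our own choices.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples
    E_n = (1 - beta**n) / (1 - beta), normalized (one common convention)
    so that the weights sum to the number of classes."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(n) / weights.sum()

# Long-tailed example: the tail class (10 samples) gets a much larger weight
# than the head class (10,000 samples).
print(class_balanced_weights([10000, 1000, 100, 10]))
```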