We study polynomial-time approximation schemes (PTASes) for constraint satisfaction problems (CSPs) such as Maximum Independent Set or Minimum Vertex Cover on sparse graph classes. Baker's approach gives a PTAS on planar graphs, excluded-minor classes, and beyond. For Max-CSPs, and even more generally, maximisation finite-valued CSPs (where constraints are arbitrary non-negative functions), Romero, Wrochna, and \v{Z}ivn\'y [SODA'21] showed that the Sherali-Adams LP relaxation gives a simple PTAS for all fractionally-treewidth-fragile classes, which is the most general "sparsity" condition for which a PTAS is known. We extend these results to general-valued CSPs, which include "crisp" (or "strict") constraints that have to be satisfied by every feasible assignment. The only condition on the crisp constraints is that their domain contains an element which is at least as feasible as all the others (but possibly less valuable). For minimisation general-valued CSPs with crisp constraints, we present a PTAS for all Baker graph classes -- a definition by Dvo\v{r}\'ak [SODA'20] which encompasses all classes where Baker's technique is known to work, except possibly for fractionally-treewidth-fragile classes. While this is standard for problems satisfying a certain monotonicity condition on crisp constraints, we show this can be relaxed to diagonalisability -- a property of relational structures connected to logics, statistical physics, and random CSPs.
Learning Markov decision processes (MDPs) in the presence of the adversary is a challenging problem in reinforcement learning (RL). In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode. We propose an optimistic policy optimization algorithm POWERS and show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode, $T$ is the number of interactions with the MDP, and $d$ is the dimension of the feature mapping. Furthermore, we also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors. Our key technical contributions are two-fold: (1) a new value function estimator based on importance weighting; and (2) a tighter confidence set for the transition kernel. They together lead to the nearly minimax optimal regret.
For relational structures A, B of the same signature, the Promise Constraint Satisfaction Problem PCSP(A,B) asks whether a given input structure maps homomorphically to A or does not even map to B. We are promised that the input satisfies exactly one of these two cases. If there exists a structure C with homomorphisms $A\to C\to B$, then PCSP(A,B) reduces naturally to CSP(C). To the best of our knowledge all known tractable PCSPs reduce to tractable CSPs in this way. However Barto showed that some PCSPs over finite structures A, B require solving CSPs over infinite C. We show that even when such a reduction to finite C is possible, this structure may become arbitrarily large. For every integer $n>1$ and every prime p we give A, B of size n with a single relation of arity $n^p$ such that PCSP(A, B) reduces via a chain of homomorphisms $ A\to C\to B$ to a tractable CSP over some C of size p but not over any smaller structure. In a second family of examples, for every prime $p\geq 7$ we construct A, B of size $p-1$ with a single ternary relation such that PCSP(A, B) reduces via $A\to C\to B$ to a tractable CSP over some C of size p but not over any smaller structure. In contrast we show that if A, B are graphs and PCSP(A,B) reduces to tractable CSP(C) for some finite digraph C, then already A or B has a tractable CSP. This extends results and answers a question of Deng et al.
Covariance estimation for matrix-valued data has received an increasing interest in applications. Unlike previous works that rely heavily on matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate-optimal under certain divergence regimes of matrix size. We further introduce a class of robust covariance estimators and provide theoretical guarantees to deal with heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and a S&P 500 stock data analysis.
Classification methods for binary (yes/no) tasks often produce a continuously valued score. Machine learning practitioners must perform model selection, calibration, discretization, performance assessment, tuning, and fairness assessment. Such tasks involve examining classifier results, typically using summary statistics and manual examination of details. In this paper, we provide an interactive visualization approach to support such continuously-valued classifier examination tasks. Our approach addresses the three phases of these tasks: calibration, operating point selection, and examination. We enhance standard views and introduce task-specific views so that they can be integrated into a multi-view coordination (MVC) system. We build on an existing comparison-based approach, extending it to continuous classifiers by treating the continuous values as trinary (positive, unsure, negative) even if the classifier will not ultimately use the 3-way classification. We provide use cases that demonstrate how our approach enables machine learning practitioners to accomplish key tasks.
In this article we suggest two discretization methods based on isogeometric analysis (IGA) for planar linear elasticity. On the one hand, we apply the well-known ansatz of weakly imposed symmetry for the stress tensor and obtain a well-posed mixed formulation. Such modified mixed problems have been already studied by different authors. But we concentrate on the exploitation of IGA results to handle also curved boundary geometries. On the other hand, we consider the more complicated situation of strong symmetry, i.e. we discretize the mixed weak form determined by the so-called Hellinger-Reissner variational principle. We show the existence of suitable approximate fields leading to an inf-sup stable saddle-point problem. For both discretization approaches we prove convergence statements and in case of weak symmetry we illustrate the approximation behavior by means of several numerical experiments.
In the storied Colonel Blotto game, two colonels allocate $a$ and $b$ troops, respectively, to $k$ distinct battlefields. A colonel wins a battle if they assign more troops to that particular battle, and each colonel seeks to maximize their total number of victories. Despite the problem's formulation in 1921, the first polynomial-time algorithm to compute Nash equilibrium (NE) strategies for this game was discovered only quite recently. In 2016, \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} formulated a breakthrough algorithm to compute NE strategies for the Colonel Blotto game\footnote{To the best of our knowledge, the algorithm from \citep{ahmadinejad_dehghani_hajiaghayi_lucier_mahini_seddighin_2019} has computational complexity $O(k^{14}\max\{a,b\}^{13})$}, receiving substantial media coverage (e.g. \citep{Insider}, \citep{NSF}, \citep{ScienceDaily}). In this work, we present the first known $\epsilon$-approximation algorithm to compute NE strategies in the two-player Colonel Blotto game in runtime $\widetilde{O}(\epsilon^{-4} k^8 \max\{a,b\}^2)$ for arbitrary settings of these parameters. Moreover, this algorithm computes approximate coarse correlated equilibrium strategies in the multiplayer (continuous and discrete) Colonel Blotto game (when there are $\ell > 2$ colonels) with runtime $\widetilde{O}(\ell \epsilon^{-4} k^8 n^2 + \ell^2 \epsilon^{-2} k^3 n (n+k))$, where $n$ is the maximum troop count. Before this work, no polynomial-time algorithm was known to compute exact or approximate equilibrium (in any sense) strategies for multiplayer Colonel Blotto with arbitrary parameters. Our algorithm computes these approximate equilibria by a novel (to the author's knowledge) sampling technique with which we implicitly perform multiplicative weights update over the exponentially many strategies available to each player.
While neural architecture search (NAS) has enabled automated machine learning (AutoML) for well-researched areas, its application to tasks beyond computer vision is still under-explored. As less-studied domains are precisely those where we expect AutoML to have the greatest impact, in this work we study NAS for efficiently solving diverse problems. Seeking an approach that is fast, simple, and broadly applicable, we fix a standard convolutional network (CNN) topology and propose to search for the right kernel sizes and dilations its operations should take on. This dramatically expands the model's capacity to extract features at multiple resolutions for different types of data while only requiring search over the operation space. To overcome the efficiency challenges of naive weight-sharing in this search space, we introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution, achieving both a better asymptotic complexity and an up-to-10x search time speedup in practice. We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for benchmarking NAS in diverse domains. DASH outperforms state-of-the-art methods in aggregate, attaining the best-known automated performance on seven tasks. Meanwhile, on six of the ten tasks, the combined search and retraining time is less than 2x slower than simply training a CNN backbone that is far less accurate.
In this work we study the decidability of a class of global modal logics arising from Kripke frames evaluated over certain residuated lattices, known in the literature as modal many-valued logics. We exhibit a large family of these modal logics which are undecidable, in contrast with classical modal logic and propositional logics defined over the same classes of algebras. This family includes the global modal logics arising from Kripke frames evaluated over the standard Lukasiewicz and Product algebras. We later refine the previous result, and prove that global modal Lukasiewicz and Product logics are not even recursively axiomatizable. We conclude by solving negatively the open question of whether each global modal logic coincides with its local modal logic closed under the unrestricted necessitation rule.
Sparse decision tree optimization has been one of the most fundamental problems in AI since its inception and is a challenge at the core of interpretable machine learning. Sparse decision tree optimization is computationally hard, and despite steady effort since the 1960's, breakthroughs have only been made on the problem within the past few years, primarily on the problem of finding optimal sparse decision trees. However, current state-of-the-art algorithms often require impractical amounts of computation time and memory to find optimal or near-optimal trees for some real-world datasets, particularly those having several continuous-valued features. Given that the search spaces of these decision tree optimization problems are massive, can we practically hope to find a sparse decision tree that competes in accuracy with a black box machine learning model? We address this problem via smart guessing strategies that can be applied to any optimal branch-and-bound-based decision tree algorithm. We show that by using these guesses, we can reduce the run time by multiple orders of magnitude, while providing bounds on how far the resulting trees can deviate from the black box's accuracy and expressive power. Our approach enables guesses about how to bin continuous features, the size of the tree, and lower bounds on the error for the optimal decision tree. Our experiments show that in many cases we can rapidly construct sparse decision trees that match the accuracy of black box models. To summarize: when you are having trouble optimizing, just guess.
In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks. A positive parameter is associated to each node of a network, which models the sociability of that node. Sociabilities are assumed to evolve over time, and are modelled via a dynamic point process model. The model is able to capture long term evolution of the sociabilities. Moreover, it yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying generalised gamma process. We provide some theoretical insights into the model and apply it to three datasets: a simulated network, a network of hyperlinks between communities on Reddit, and a network of co-occurences of words in Reuters news articles after the September 11th attacks.