In this paper we present a novel mathematical optimization-based methodology to construct tree-shaped classification rules for multiclass instances. Our approach consists of building Classification Trees in which, except for the leaf nodes, the labels are temporarily left out and grouped into two classes by means of an SVM separating hyperplane. We provide a Mixed Integer Non-Linear Programming formulation for the problem and report the results of an extensive battery of computational experiments assessing the performance of our proposal against other benchmark classification methods.
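As a rough illustration of the construction (not the paper's MINLP, which optimizes the class groupings and the hyperplanes jointly), a greedy sketch could group the classes at each internal node by 2-means on their centroids and fit a linear SVM to separate the two groups:
\begin{verbatim}
# Greedy sketch only: the paper selects class groupings and hyperplanes
# jointly via a MINLP; here the grouping is a 2-means heuristic on the
# class centroids, purely for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_node(X, y, depth, max_depth=3):
    classes = np.unique(y)
    if depth == max_depth or len(classes) == 1:
        return ("leaf", np.bincount(y).argmax())
    # group the labels present at this node into two temporary classes
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    side = KMeans(n_clusters=2, n_init=10).fit_predict(centroids)
    z = np.isin(y, classes[side == 1]).astype(int)
    svm = LinearSVC().fit(X, z)          # SVM separating hyperplane
    go_right = svm.predict(X) == 1
    if go_right.all() or (~go_right).all():
        return ("leaf", np.bincount(y).argmax())
    return ("split", svm,
            build_node(X[~go_right], y[~go_right], depth + 1, max_depth),
            build_node(X[go_right],  y[go_right],  depth + 1, max_depth))
\end{verbatim}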
A \emph{general branch-and-bound tree} is a branch-and-bound tree which is allowed to use general disjunctions of the form $\pi^{\top} x \leq \pi_0 \,\vee\, \pi^{\top}x \geq \pi_0 + 1$, where $\pi$ is an integer vector and $\pi_0$ is an integer scalar, to create child nodes. We construct a packing instance, a set covering instance, and a Traveling Salesman Problem instance, such that any general branch-and-bound tree that solves these instances must be of exponential size. We also verify that an exponential lower bound on the size of general branch-and-bound trees persists when we add Gaussian noise to the coefficients of the cross polytope, thus showing that a polynomial-size "smoothed analysis" upper bound is not possible. The results in this paper can be viewed as the branch-and-bound analog of the seminal paper by Chv\'atal et al. \cite{chvatal1989cutting}, who proved lower bounds for the Chv\'atal-Gomory rank.
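For orientation: the variable disjunctions used by standard solvers are the special case $\pi = e_i$ of the general form above,
\[
x_i \leq \pi_0 \,\vee\, x_i \geq \pi_0 + 1,
\]
so an exponential lower bound against general disjunctions is strictly stronger than one against variable branching alone.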
We present a survey of the research contributions in the field of Educational Timetabling, with a specific focus on "standard" formulations and the corresponding benchmark instances. We identify six such formulations and discuss their features, pointing out their relevance and usability. Other available formulations and datasets are also reviewed and briefly discussed. Subsequently, we report the main state-of-the-art results on the selected benchmarks in terms of solution quality (upper and lower bounds), search techniques, running times, statistical distributions, and other experimental settings.
Transition systems (TS) and Petri nets (PN) are important models of computation, ubiquitous in formal methods for modeling systems. An important problem is how to extract from a given TS a PN whose reachability graph is equivalent (under a suitable notion of equivalence) to the original TS. This paper addresses the decomposition of transition systems into synchronizing state machines (SMs), a class of Petri nets in which each transition has exactly one incoming and one outgoing arc and every marking has exactly one token. This is an important case of the general problem of extracting a PN from a TS. The decomposition is based on the theory of regions, and it is shown that a property of regions called excitation-closure is a sufficient condition to guarantee the equivalence between the original TS and a decomposition into SMs. An efficient algorithm is provided which solves the problem by reducing its critical steps to the maximal independent set problem (to compute a minimal set of irredundant SMs) or to satisfiability (to merge the SMs). We report experimental results that show a good trade-off between the quality of results and computation time.
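A minimal sketch of the structural side of the SM definition above (the net encoding is an assumed convention for illustration):
\begin{verbatim}
# Structural check: in a state machine every transition has exactly one
# incoming and one outgoing arc. pre[t]/post[t] give the input/output
# place sets of transition t; this encoding is assumed for the sketch.
def is_state_machine(pre, post):
    return all(len(pre[t]) == 1 and len(post[t]) == 1 for t in pre)

# The one-token invariant then follows behaviourally: starting from a
# one-token marking, each firing consumes exactly one token and
# produces exactly one token, so every reachable marking has one token.
\end{verbatim}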
The main result in this paper is an error estimate for interpolating biharmonic polysplines in an annulus $A(r_1, r_N)$, with respect to a partition by concentric annular domains $A(r_1, r_2), \dots, A(r_{N-1}, r_N)$ for radii $0 < r_1 < \cdots < r_N$. The biharmonic polysplines interpolate a smooth function on the spheres $|x| = r_j$ for $j = 1, \dots, N$ and satisfy natural boundary conditions on $|x| = r_1$ and $|x| = r_N$. By analogy with a technique in one-dimensional spline theory established by C. de Boor, we base our proof on error estimates for harmonic interpolation splines with respect to the partition by the annuli $A(r_{j-1}, r_j)$. For these estimates it is important to determine the smallest constant $c(\Omega)$, where $\Omega = A(r_{j-1}, r_j)$, among all constants $c$ satisfying
\[
\sup_{x \in \Omega} |f(x)| \leq c \sup_{x \in \Omega} |\Delta f(x)|
\]
for all $f \in C^2(\Omega) \cap C(\overline{\Omega})$ vanishing on the boundary of the bounded domain $\Omega$. In this paper we describe $c(\Omega)$ for an annulus $\Omega = A(r, R)$ and give the estimate
\[
\min\left\{\frac{1}{2d}, \frac{1}{8}\right\} (R - r)^2 \leq c(A(r, R)) \leq \max\left\{\frac{1}{2d}, \frac{1}{8}\right\} (R - r)^2,
\]
where $d$ is the dimension of the underlying space.
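One immediate consequence of the two-sided estimate: in dimension $d = 4$ we have $\frac{1}{2d} = \frac{1}{8}$, so the lower and upper bounds coincide and
\[
c(A(r, R)) = \frac{(R - r)^2}{8}.
\]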
Supervised classification techniques use training samples to learn a classification rule with small expected 0-1-loss (error probability). Conventional methods enable tractable learning and provide out-of-sample generalization by using surrogate losses instead of the 0-1-loss and considering specific families of rules (hypothesis classes). This paper presents minimax risk classifiers (MRCs) that minimize the worst-case 0-1-loss over general classification rules and provide tight performance guarantees at learning. We show that MRCs are strongly universally consistent using feature mappings given by characteristic kernels. The paper also proposes efficient optimization techniques for MRC learning and shows that the methods presented can provide accurate classification together with tight performance guarantees.
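Schematically, and in our own notation rather than necessarily the paper's, an MRC solves a minimax problem over an uncertainty set of distributions consistent with empirical feature expectations:
\[
\min_{h} \max_{p \in \mathcal{U}} \mathbb{E}_{p}\left[\ell_{0\text{-}1}(h, (x, y))\right], \qquad \mathcal{U} = \left\{ p : \left\Vert \mathbb{E}_{p}[\Phi(x, y)] - \tau \right\Vert_{\infty} \leq \lambda \right\},
\]
where $\Phi$ is the feature mapping, $\tau$ an empirical estimate of its expectation, and $\lambda$ a confidence width; the inner maximum evaluated at the learned rule yields the performance guarantee at learning.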
Variance-based global sensitivity analysis (GSA) can provide a wealth of information when applied to complex models. A well-known Achilles' heel of this approach is its computational cost, which often renders it infeasible in practice. An appealing alternative is to analyze instead the sensitivity of a surrogate model, with the goal of lowering computational costs while maintaining sufficient accuracy. Should a surrogate be "simple" enough to be amenable to the analytical calculation of its Sobol' indices, the cost of GSA is essentially reduced to the construction of the surrogate. We propose a new class of sparse weight Extreme Learning Machines (SW-ELMs) which, when considered as surrogates in the context of GSA, admit analytical formulas for their Sobol' indices and, unlike standard ELMs, yield accurate approximations of these indices. The effectiveness of this approach is illustrated both on traditional benchmarks in the field and on a chemical reaction network.
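For reference, the first-order Sobol' index of input $X_i$ for a model $Y = f(X_1, \dots, X_n)$ with independent inputs is the standard variance ratio
\[
S_i = \frac{\operatorname{Var}\left( \mathbb{E}\left[ Y \mid X_i \right] \right)}{\operatorname{Var}(Y)},
\]
and it is these quantities (and their higher-order analogues) that the SW-ELM surrogate admits in closed form.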
This work is a systematic analysis of the so-called hard class problem in zero-shot learning (ZSL), that is, the phenomenon that some unseen classes hurt ZSL performance disproportionately more than others, as well as of how to remedy the problem by detecting and exploiting hard classes. First, we report the empirical finding that the hard class problem is ubiquitous and persists regardless of the specific ZSL method used. We then find that high semantic affinity among unseen classes is a plausible underlying cause of hardness and design two metrics to detect hard classes. Finally, two frameworks are proposed to remedy the problem by detecting and exploiting hard classes, one under the inductive setting and the other under the transductive setting. The proposed frameworks can accommodate most existing ZSL methods and further boost their performance significantly with little effort. Extensive experiments on three popular benchmarks demonstrate the benefits of identifying and exploiting the hard classes in ZSL.
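To make the affinity intuition concrete, here is one plausible hardness proxy; it is an illustrative assumption, not the paper's two metrics (which the abstract does not specify):
\begin{verbatim}
# Illustrative sketch only: one plausible affinity-based hardness proxy.
import numpy as np

def hardness_scores(attrs):
    """attrs: (C, d) array of unseen-class attribute/semantic vectors.
    Returns a score per class: mean cosine similarity to the other
    unseen classes; higher affinity is taken as a proxy for hardness."""
    A = attrs / np.linalg.norm(attrs, axis=1, keepdims=True)
    sim = A @ A.T                      # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)         # ignore self-similarity
    return sim.sum(axis=1) / (len(A) - 1)

# Classes whose scores exceed, say, the 75th percentile could be
# flagged as "hard" and treated specially downstream.
\end{verbatim}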
Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with pre-specified architectures, while the latter is characterised by learning hierarchies over pre-specified features with data-driven architectures. We unite the two via adaptive neural trees (ANTs), a model that incorporates representation learning into edges, routing functions and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). ANTs allow increased interpretability via hierarchical clustering, e.g., learning meaningful class associations, such as separating natural vs. man-made objects. We demonstrate this on classification and regression tasks, achieving over 99% and 90% accuracy on the MNIST and CIFAR-10 datasets, and outperforming standard neural networks, random forests and gradient boosted trees on the SARCOS dataset. Furthermore, ANT optimisation naturally adapts the architecture to the size and complexity of the training data.
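A minimal sketch of the module types an ANT composes (layer sizes and module choices here are illustrative assumptions, not the paper's exact architecture):
\begin{verbatim}
# Minimal sketch of ANT-style modules in PyTorch; sizes are assumptions.
import torch.nn as nn

class Edge(nn.Module):          # transforms features along a tree edge
    def __init__(self, c_in, c_out):
        super().__init__()
        self.f = nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                               nn.ReLU())
    def forward(self, x):
        return self.f(x)

class Router(nn.Module):        # soft routing probability at an inner node
    def __init__(self, c):
        super().__init__()
        self.g = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                               nn.Linear(c, 1), nn.Sigmoid())
    def forward(self, x):
        return self.g(x)        # in [0, 1]: probability of the left child

class Leaf(nn.Module):          # class distribution at a leaf node
    def __init__(self, c, n_classes):
        super().__init__()
        self.h = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                               nn.Linear(c, n_classes))
    def forward(self, x):
        return self.h(x).softmax(dim=1)

# A tree prediction mixes leaf outputs weighted by the product of routing
# probabilities along each root-to-leaf path; growth proceeds by trying
# candidate splits/extensions and keeping those that reduce validation loss.
\end{verbatim}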
We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive MB pixel decoding (a.k.a. MB texture decoding). This in turn allows for the derivation of spatio-temporal video activity regions at extremely high speed in comparison to conventional full-frame decoding followed by optical flow estimation. In order to evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores to derive the final classification of each test video. Evaluation on two standard datasets shows that the proposed approach is competitive with the best two-stream video classification approaches found in the literature. At the same time: (i) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; (ii) selective decoding is up to 12 times faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods from the literature.
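The final fusion step can be illustrated with a short sketch; the equal weighting is an assumption, since the abstract only states that the two streams' scores are fused:
\begin{verbatim}
# Late-fusion sketch: combine per-video scores from the two streams.
import numpy as np

def fuse_scores(texture_scores, mv_scores, w=0.5):
    """texture_scores, mv_scores: (n_videos, n_classes) softmax outputs
    from the MB-texture CNN and the motion-vector CNN, respectively.
    Returns the fused class prediction for each test video."""
    fused = w * texture_scores + (1.0 - w) * mv_scores
    return fused.argmax(axis=1)
\end{verbatim}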
In multi-task learning, a learner is given a collection of prediction tasks and needs to solve all of them. In contrast to previous work, which required annotated training data to be available for all tasks, we consider a new setting in which for some tasks, potentially most of them, only unlabeled training data is provided. Consequently, to solve all tasks, information must be transferred between tasks with labels and tasks without labels. Focusing on an instance-based transfer method, we analyze two variants of this setting: when the set of labeled tasks is fixed, and when it can be actively selected by the learner. We state and prove a generalization bound that covers both scenarios and derive from it an algorithm for choosing the labeled tasks (in the active case) and for transferring information between the tasks in a principled way. We also illustrate the effectiveness of the algorithm through experiments on synthetic and real data.
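One crude way to instantiate instance-based transfer for an unlabeled task is sketched below; the similarity measure and weighting scheme are stand-in assumptions, whereas the paper derives its choices from the generalization bound:
\begin{verbatim}
# Illustrative sketch of instance-based transfer to an unlabeled task.
import numpy as np
from sklearn.linear_model import LogisticRegression

def transfer_train(labeled_tasks, X_target):
    """labeled_tasks: list of (X, y) pairs; X_target: unlabeled data of
    the target task. Pools labeled instances, weighting each source task
    by a crude marginal-similarity proxy (distance between feature means)."""
    mu_t = X_target.mean(axis=0)
    Xs, ys, ws = [], [], []
    for X, y in labeled_tasks:
        sim = np.exp(-np.linalg.norm(X.mean(axis=0) - mu_t) ** 2)
        Xs.append(X); ys.append(y); ws.append(np.full(len(y), sim))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.vstack(Xs), np.concatenate(ys),
            sample_weight=np.concatenate(ws))
    return clf  # predicts labels for the unlabeled target task
\end{verbatim}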