Per-instance algorithm selection seeks to recommend, for a given problem instance and a given performance criterion, one or several suitable algorithms that are expected to perform well for the particular setting. The selection is classically done offline, using openly available information about the problem instance or features that are extracted from the instance during a dedicated feature extraction step. This ignores valuable information that the algorithms accumulate during the optimization process. In this work, we propose an alternative, online algorithm selection scheme which we coin per-run algorithm selection. In our approach, we start the optimization with a default algorithm, and, after a certain number of iterations, extract instance features from the observed trajectory of this initial optimizer to determine whether to switch to another optimizer. We test this approach using the CMA-ES as the default solver, and a portfolio of six different optimizers as potential algorithms to switch to. In contrast to other recent work on online per-run algorithm selection, we warm-start the second optimizer using information accumulated during the first optimization phase. We show that our approach outperforms static per-instance algorithm selection. We also compare two different feature extraction principles, based on exploratory landscape analysis and time series analysis of the internal state variables of the CMA-ES, respectively. We show that a combination of both feature sets provides the most accurate recommendations for our test cases, taken from the BBOB function suite from the COCO platform and the YABBOB suite from the Nevergrad platform.
Empirical likelihood is a popular nonparametric statistical tool that does not require any distributional assumptions. In this paper, we explore the possibility of conducting variable selection via Bayesian empirical likelihood. We show theoretically that when the prior distribution satisfies certain mild conditions, the corresponding Bayesian empirical likelihood estimators are posteriorly consistent and variable selection consistent. As special cases, we show the prior of Bayesian empirical likelihood LASSO and SCAD satisfies such conditions and thus can identify the non-zero elements of the parameters with probability tending to 1. In addition, it is easy to verify that those conditions are met for other widely used priors such as ridge, elastic net and adaptive LASSO. Empirical likelihood depends on a parameter that needs to be obtained by numerically solving a non-linear equation. Thus, there exists no conjugate prior for the posterior distribution, which causes the slow convergence of the MCMC sampling algorithm in some cases. To solve this problem, we propose a novel approach, which uses an approximation distribution as the proposal. The computational results demonstrate quick convergence for the examples used in the paper. We use both simulation and real data analyses to illustrate the advantages of the proposed methods.
In this paper, we address an issue that the visually impaired commonly face while crossing intersections and propose a solution that takes form as a mobile application. The application utilizes a deep learning convolutional neural network model, LytNetV2, to output necessary information that the visually impaired may lack when without human companions or guide-dogs. A prototype of the application runs on iOS devices of versions 11 or above. It is designed for comprehensiveness, concision, accuracy, and computational efficiency through delivering the two most important pieces of information, pedestrian traffic light color and direction, required to cross the road in real-time. Furthermore, it is specifically aimed to support those facing financial burden as the solution takes the form of a free mobile application. Through the modification and utilization of key principles in MobileNetV3 such as depthwise seperable convolutions and squeeze-excite layers, the deep neural network model achieves a classification accuracy of 96% and average angle error of 6.15 degrees, while running at a frame rate of 16.34 frames per second. Additionally, the model is trained as an image classifier, allowing for a faster and more accurate model. The network is able to outperform other methods such as object detection and non-deep learning algorithms in both accuracy and thoroughness. The information is delivered through both auditory signals and vibrations, and it has been tested on seven visually impaired and has received above satisfactory responses.
Despite being tremendously overparameterized, it is appreciated that deep neural networks trained by stochastic gradient descent (SGD) generalize surprisingly well. Based on the Rademacher complexity of a pre-specified hypothesis set, different norm-based generalization bounds have been developed to explain this phenomenon. However, recent studies suggest these bounds might be problematic as they increase with the training set size, which is contrary to empirical evidence. In this study, we argue that the hypothesis set SGD explores is trajectory-dependent and thus may provide a tighter bound over its Rademacher complexity. To this end, we characterize the SGD recursion via a stochastic differential equation by assuming the incurred stochastic gradient noise follows the fractional Brownian motion. We then identify the Rademacher complexity in terms of the covering numbers and relate it to the Hausdorff dimension of the optimization trajectory. By invoking the hypothesis set stability, we derive a novel generalization bound for deep neural networks. Extensive experiments demonstrate that it predicts well the generalization gap over several common experimental interventions. We further show that the Hurst parameter of the fractional Brownian motion is more informative than existing generalization indicators such as the power-law index and the upper Blumenthal-Getoor index.
In this article, a novel identification test is proposed, which can be applied to parameteric models such as Mixture of Normal (MN) distributions, Markow Switching(MS), or Structural Autoregressive (SVAR) models. In the approach, it is assumed that model parameters are identified under the null whereas under the alternative they are not identified. Thanks to the setting, the Maximum Likelihood (ML) estimator preserves its properties under the null hypothesis. The proposed test is based on a comparison of two consistent estimators based on independent subsamples of the data set. A Wald type statistic is proposed which has a typical $\chi^2$ distribution. Finally, the method is adjusted to test if the heteroscedasticity assumption is sufficient to identify parameters of SVAR model. Its properties are evaluated with a Monte Carlo experiment, which allows non Gaussian distribution of errors and mis-specified VAR order. They indicate that the test has an asymptotically correct size. Moreover, outcomes show that the power of the test makes it suitable for empirical applications.
Weakly supervised multi-label classification (WSML) task, which is to learn a multi-label classification using partially observed labels per image, is becoming increasingly important due to its huge annotation cost. In this work, we first regard unobserved labels as negative labels, casting the WSML task into noisy multi-label classification. From this point of view, we empirically observe that memorization effect, which was first discovered in a noisy multi-class setting, also occurs in a multi-label setting. That is, the model first learns the representation of clean labels, and then starts memorizing noisy labels. Based on this finding, we propose novel methods for WSML which reject or correct the large loss samples to prevent model from memorizing the noisy label. Without heavy and complex components, our proposed methods outperform previous state-of-the-art WSML methods on several partial label settings including Pascal VOC 2012, MS COCO, NUSWIDE, CUB, and OpenImages V3 datasets. Various analysis also show that our methodology actually works well, validating that treating large loss properly matters in a weakly supervised multi-label classification. Our code is available at //github.com/snucml/LargeLossMatters.
Most obstacle avoidance algorithms are only effective in specific environments, and they have low adaptability to some new environments. In this paper, we propose a trajectory learning (TL)-based obstacle avoidance algorithm, which can learn implicit obstacle avoidance mechanism from trajectories generated by general obstacle avoidance algorithms and achieves better adaptability. Specifically, we define a general data structure to describe the obstacle avoidance mechanism. Based on this structure, we transform the learning of the obstacle avoidance algorithm into a multiclass classification problem about the direction selection. Then, we design an artificial neural network (ANN) to fit multiclass classification function through supervised learning and finally obtain the obstacle avoidance mechanism that generates the observed trajectories. Our algorithm can obtain the obstacle avoidance mechanism similar to that demonstrated in the trajectories, and are adaptable to unseen environments. The automatic learning mechanism simplifies modification and debugging of obstacle avoidance algorithms in applications. Simulation results demonstrate that the proposed algorithm can learn obstacle avoidance strategy from trajectories and achieve better adaptability.
Landscape-aware algorithm selection approaches have so far mostly been relying on landscape feature extraction as a preprocessing step, independent of the execution of optimization algorithms in the portfolio. This introduces a significant overhead in computational cost for many practical applications, as features are extracted and computed via sampling and evaluating the problem instance at hand, similarly to what the optimization algorithm would perform anyway within its search trajectory. As suggested in Jankovic et al. (EvoAPPs 2021), trajectory-based algorithm selection circumvents the problem of costly feature extraction by computing landscape features from points that a solver sampled and evaluated during the optimization process. Features computed in this manner are used to train algorithm performance regression models, upon which a per-run algorithm selector is then built. In this work, we apply the trajectory-based approach onto a portfolio of five algorithms. We study the quality and accuracy of performance regression and algorithm selection models in the scenario of predicting different algorithm performances after a fixed budget of function evaluations. We rely on landscape features of the problem instance computed using one portion of the aforementioned budget of the same function evaluations. Moreover, we consider the possibility of switching between the solvers once, which requires them to be warm-started, i.e. when we switch, the second solver continues the optimization process already being initialized appropriately by making use of the information collected by the first solver. In this new context, we show promising performance of the trajectory-based per-run algorithm selection with warm-starting.
Automated machine learning (AutoML) frameworks have become important tools in the data scientists' arsenal, as they dramatically reduce the manual work devoted to the construction of ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines - typically containing feature engineering, model selection and hyper parameters tuning steps - and finally output an optimal pipeline in terms of predictive accuracy. However, when the dataset is large, each individual configuration takes longer to execute, therefore the overall AutoML running times become increasingly high. To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size, rather than configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative data subset which preserves a particular characteristic of the full data. It then employs the AutoML tool on the small subset, and finally, it refines the resulted pipeline by executing a restricted, much shorter, AutoML process on the large dataset. Our experimental results, performed on two popular AutoML frameworks, Auto-Sklearn and TPOT, show that SubStrat reduces their running times by 79% (on average), with less than 2% average loss in the accuracy of the resulted ML pipeline.
Millimeter-wave and terahertz systems rely on beamforming/combining codebooks for finding the best beam directions during the initial access procedure. Existing approaches suffer from large codebook sizes and high beam searching overhead in the presence of mobile devices. To alleviate this problem, we suggest utilizing the similarity of the channel in adjacent locations to divide the UE trajectory into a set of separate regions and maintain a set of candidate paths for each region in a database. In this paper, we show the tradeoff between the number of regions and the signalling overhead, i.e., higher number of regions corresponds to higher signal-to-noise ratio (SNR) but also higher signalling overhead for the database. We then propose an optimization framework to find the minimum number of regions based on the trajectory of a mobile device. Using realistic ray tracing datasets, we demonstrate that the proposed method reduces the beam searching complexity and latency while providing high SNR.
Video anomaly detection under weak labels is formulated as a typical multiple-instance learning problem in previous works. In this paper, we provide a new perspective, i.e., a supervised learning task under noisy labels. In such a viewpoint, as long as cleaning away label noise, we can directly apply fully supervised action classifiers to weakly supervised anomaly detection, and take maximum advantage of these well-developed classifiers. For this purpose, we devise a graph convolutional network to correct noisy labels. Based upon feature similarity and temporal consistency, our network propagates supervisory signals from high-confidence snippets to low-confidence ones. In this manner, the network is capable of providing cleaned supervision for action classifiers. During the test phase, we only need to obtain snippet-wise predictions from the action classifier without any extra post-processing. Extensive experiments on 3 datasets at different scales with 2 types of action classifiers demonstrate the efficacy of our method. Remarkably, we obtain the frame-level AUC score of 82.12% on UCF-Crime.