In this paper we discuss the application of Artificial Intelligence (AI) to the exemplary industrial use case of the two-dimensional commissioning problem in a high-bay storage, which can essentially be phrased as an instance of the Traveling Salesperson Problem (TSP). We investigate the mlrose library, which provides a TSP optimizer based on various heuristic optimization techniques. Our focus is on two methods provided by mlrose, namely the Genetic Algorithm (GA) and Hill Climbing (HC). We present improvements for both methods that yield shorter tour lengths while only moderately exploiting the problem structure of TSP; consequently, the proposed improvements have a generic character and are not limited to TSP.
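As a brief taste of the mlrose workflow under study, the sketch below sets up a small TSP instance and solves it with both GA and HC. The city coordinates are invented for illustration, and the hyper-parameters shown are library defaults rather than the tuned values or improvements proposed in this paper.

```python
# A minimal mlrose TSP sketch (illustrative coordinates, default
# hyper-parameters; not the improved variants proposed in the paper).
import mlrose

coords = [(0, 0), (3, 0), (3, 2), (2, 4), (1, 3), (0, 2)]
fitness = mlrose.TravellingSales(coords=coords)
problem = mlrose.TSPOpt(length=len(coords), fitness_fn=fitness, maximize=False)

# Genetic Algorithm: evolves a population of candidate tours.
ga_tour, ga_len = mlrose.genetic_alg(problem, pop_size=200, mutation_prob=0.1,
                                     max_attempts=100, random_state=42)

# Hill Climbing: repeatedly moves to a better neighboring tour.
hc_tour, hc_len = mlrose.hill_climb(problem, restarts=10, random_state=42)

print("GA tour length:", ga_len)
print("HC tour length:", hc_len)
```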
The $k$-Opt heuristic is a simple improvement heuristic for the Traveling Salesman Problem. It starts with an arbitrary tour and then repeatedly replaces $k$ edges of the tour by $k$ other edges, as long as this yields a shorter tour. We will prove that for 2-dimensional Euclidean Traveling Salesman Problems with $n$ cities the approximation ratio of the $k$-Opt heuristic is $\Theta(\log n / \log \log n)$. This improves the upper bound of $O(\log n)$ given by Chandra, Karloff, and Tovey in 1999 and provides for the first time a non-trivial lower bound for the case $k\ge 3$. Our results not only hold for the Euclidean norm but extend to arbitrary $p$-norms with $1 \le p < \infty$.
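To make the heuristic concrete, the following sketch implements its simplest case, 2-Opt, for points in the plane: exchanging the two edges $(a,b)$ and $(c,d)$ for $(a,c)$ and $(b,d)$ amounts to reversing the tour segment between them, and the exchange is kept whenever it shortens the tour. This is an illustrative implementation, not code from the paper.

```python
# An illustrative 2-Opt local search for the Euclidean TSP.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def two_opt(points, tour):
    """Repeatedly exchange two tour edges while this shortens the tour."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # Skip j values whose edge is adjacent to edge i.
            for j in range(i + 2, n - (1 if i == 0 else 0)):
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Replacing edges (a,b),(c,d) by (a,c),(b,d) reverses
                # the segment tour[i+1..j].
                if dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d) - 1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour
```

For $k \ge 3$, $k$-Opt generalizes this move by exchanging $k$ edges at a time, which is exactly the family of heuristics whose approximation ratio is analyzed above.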
Perception is a process that requires a great deal of mental processing; it provides the means by which one's concept of the environment is created and helps one learn about and interact with that environment. Studies compiled throughout history have led to the conclusion that auditory performance improves when combined with visual stimuli, and vice versa. Taking this into account, in the present work the two sensory pathways (vision and hearing) were used to carry out a series of multisensory training exercises, which were presented in different instances with the purpose of introducing sound as a signal-detection tool. A website was also developed to allow the execution of the designed training; it is still under development due to difficulties that arose and that exceed the scope of this final work. The work described in this report gave rise to a future doctoral thesis, supported by a CONICET scholarship, in which the development of new training exercises and the continued development of the website that will allow their execution are proposed.
Artistic pieces can be studied from several perspectives, one example being their reception among readers over time. In the present work, we approach this topic from the standpoint of literary works, particularly assessing the task of predicting whether a book will become a best seller. Unlike previous approaches, we focused on the full content of books and considered both visualization and classification tasks. We employed visualization for the preliminary exploration of the data structure and properties, involving SemAxis and linear discriminant analyses. Then, to obtain quantitative and more objective results, we employed various classifiers. These approaches were used along with a dataset containing (i) books published from 1895 to 1924 and recognized as best sellers by the Publishers Weekly Bestseller Lists and (ii) literary works published in the same period but not mentioned in those lists. Our comparison of methods revealed that the best result, obtained by combining a bag-of-words representation with a logistic regression classifier, was an average accuracy of 0.75 for both leave-one-out and 10-fold cross-validation. This outcome suggests that it is infeasible to predict the success of books with high accuracy using only the full content of the texts. Nevertheless, our findings provide insights into the factors leading to the relative success of a literary work.
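The best-performing pipeline described above is straightforward to reproduce in spirit; the sketch below combines a bag-of-words representation with logistic regression and evaluates it with 10-fold cross-validation using scikit-learn. The `texts` and `labels` values are toy stand-ins, not the book corpus used in the paper.

```python
# A sketch of the bag-of-words + logistic regression pipeline with 10-fold
# cross-validation; `texts` and `labels` are toy stand-ins for the corpus
# of full book contents described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["a tale of love and fortune", "an obscure treatise on moss"] * 10
labels = [1, 0] * 10  # 1 = best seller, 0 = not

pipeline = make_pipeline(
    CountVectorizer(),                  # bag-of-words term counts
    LogisticRegression(max_iter=1000),  # linear classifier on the counts
)
scores = cross_val_score(pipeline, texts, labels, cv=10, scoring="accuracy")
print("mean accuracy:", scores.mean())
```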
A salient characteristic of large pre-trained language models (PTLMs) is a remarkable improvement in their generalization capability and the emergence of new capabilities with increasing model capacity and pre-training dataset size. Consequently, we are witnessing the development of enormous models pushing the state-of-the-art. It is, however, imperative to realize that this inevitably leads to prohibitively long training times, extortionate computing costs, and a detrimental environmental impact. Significant efforts are underway to make PTLM training more efficient through innovations in model architectures, training pipelines, and loss function design, with scant attention being paid to optimizing the utility of training data. The key question we ask is whether it is possible to train PTLMs by employing only highly informative subsets of the training data while maintaining downstream performance. Building upon recent progress in informative data subset selection, we show how submodular optimization can be employed to select highly representative subsets of the training corpora. Our results demonstrate that the proposed framework can be applied to efficiently train multiple PTLMs (BERT, BioBERT, GPT-2) using only a fraction of the data while retaining up to $\sim99\%$ of the performance of the fully-trained models.
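Submodular subset selection of this kind can be illustrated with the classic greedy algorithm for the facility-location function, which at each step picks the example that most improves coverage of the full corpus in an embedding space. The sketch below is a generic illustration under the assumption of precomputed example embeddings, not the paper's exact framework.

```python
# A generic greedy sketch of submodular (facility-location) subset
# selection over precomputed example embeddings; illustrative only,
# not the paper's exact framework.
import numpy as np

def greedy_facility_location(embeddings, budget):
    """Greedily pick `budget` rows maximizing sum_i max_{j in S} sim(i, j)."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T                      # cosine similarity matrix
    n = sim.shape[0]
    selected, cover = [], np.zeros(n)  # cover[i] = max sim to chosen set
    for _ in range(budget):
        # Marginal gain of each candidate j: improvement in total coverage.
        gains = np.maximum(sim, cover[:, None]).sum(axis=0) - cover.sum()
        gains[selected] = -np.inf      # never re-pick a chosen example
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sim[:, j])
    return selected

rng = np.random.default_rng(0)
subset = greedy_facility_location(rng.normal(size=(1000, 64)), budget=100)
```

The facility-location function is monotone submodular, so this greedy rule enjoys the classical $(1 - 1/e)$ approximation guarantee; the selected subset then stands in for the full corpus during pre-training.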
A well-known problem when learning from user clicks is the presence of inherent biases in the data, such as position or trust bias. Click models are a common method for extracting information from user clicks, such as document relevance in web search, or for estimating click biases for downstream applications such as counterfactual learning-to-rank, ad placement, or fair ranking. Recent work shows that the current evaluation practices in the community fail to guarantee that a well-performing click model generalizes well to downstream tasks in which the ranking distribution differs from the training distribution, i.e., under covariate shift. In this work, we propose an evaluation metric based on conditional independence testing to detect a lack of robustness to covariate shift in click models. We introduce the concept of debiasedness and a metric for measuring it. We prove that debiasedness is a necessary condition for recovering unbiased and consistent relevance scores and for the invariance of click prediction under covariate shift. In extensive semi-synthetic experiments, we show that our proposed metric helps to predict the downstream performance of click models under covariate shift and is useful in an off-policy model selection setting.
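Conditional independence testing of this general flavor can be illustrated with a simple partial-correlation check: given true relevance, one asks whether a model's relevance estimates still carry information about the logging policy's ranking signal. The sketch below uses synthetic data and a linear partial-correlation test; it is a generic illustration only, not the paper's debiasedness metric.

```python
# A generic partial-correlation sketch of conditional independence
# testing on synthetic data; illustrative only, not the paper's
# debiasedness metric.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5000
relevance = rng.normal(size=n)                 # (unobserved) true relevance
position = relevance + rng.normal(size=n)      # logging-policy rank signal
estimate = relevance + 0.3 * position + rng.normal(size=n)  # biased model

def residualize(y, x):
    """Remove the linear effect of x from y."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Partial correlation of estimate and position given relevance:
r, p = stats.pearsonr(residualize(estimate, relevance),
                      residualize(position, relevance))
print(f"partial correlation={r:.3f}, p-value={p:.3g}")
# A significantly nonzero partial correlation flags residual bias.
```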
In view of recent developments on extended formulations (EFs) (e.g., Fiorini, S., S. Massar, S. Pokutta, H. R. Tiwary, and R. de Wolf [2015], "Exponential Lower Bounds for Polytopes in Combinatorial Optimization," Journal of the ACM 62:2), we focus in this paper on the question of whether it is possible to model an NP-complete problem as a polynomial-sized linear program. For simplicity of exposition, the discussion is focused on the TSP. We show that a finding that there exists no polynomial-sized extended formulation of "the TSP polytope" does not necessarily imply that it is "impossible" for a polynomial-sized linear program to solve the TSP optimization problem. We show that, under appropriate conditions, the TSP optimization problem can be solved without recourse to the traditional city-to-city ("travel leg") variables, thereby side-stepping "the TSP polytope" and, hence, the extended-formulation barriers. Some illustrative examples are discussed.
We obtain bounds to quantify the distributional approximation in the delta method for vector statistics (the sample mean of $n$ independent random vectors) for normal and non-normal limits, measured using smooth test functions. For normal limits, we obtain bounds with the optimal order $n^{-1/2}$ convergence rate, but for a wide class of non-normal limits, which includes quadratic forms amongst others, we achieve bounds with a faster order $n^{-1}$ convergence rate. We apply our general bounds to derive explicit bounds quantifying distributional approximations of an estimator of the Bernoulli variance, several statistics based on sample moments, order $n^{-1}$ bounds for the chi-square approximation of a family of rank-based statistics, and we also provide an efficient independent derivation of an order $n^{-1}$ bound for the chi-square approximation of Pearson's statistic. In establishing our general results, we generalise recent results on Stein's method for functions of multivariate normal random vectors to vector-valued functions and sums of independent random vectors whose components may be dependent. These bounds are widely applicable and are of independent interest.
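For orientation, the classical first-order delta method statement whose error is being quantified is the following (standard background, not a result of this paper): let $W_1, \dots, W_n$ be i.i.d. random $d$-vectors with mean $\mu$ and covariance $\Sigma$, and let $g : \mathbb{R}^d \to \mathbb{R}$ be differentiable at $\mu$. Then
\[
\sqrt{n}\,\bigl(g(\bar{W}_n) - g(\mu)\bigr) \xrightarrow{d} \nabla g(\mu)^\top Z, \qquad Z \sim \mathcal{N}_d(0, \Sigma),
\]
and non-normal limits such as quadratic forms arise when the first-order term vanishes (e.g., when $\nabla g(\mu) = 0$), which is the regime in which the faster order $n^{-1}$ rate becomes attainable.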
Code-based Language Models (LMs) have shown very promising results in the field of software engineering, with applications such as code refinement, code completion, and generation. However, the task of classifying time and space complexity from code has not been extensively explored due to a lack of datasets, with prior endeavors being limited to Java. In this project, we aim to address these gaps by creating a labelled dataset of code snippets spanning multiple languages (Python and C++ datasets currently, with C, C#, and JavaScript datasets being released shortly). We find that existing time complexity calculation libraries and tools apply only to a limited number of use-cases. The lack of a well-defined rule-based system motivates the application of several recently proposed code-based LMs. We demonstrate the effectiveness of dead code elimination and of increasing the maximum sequence length of LMs. In addition to time complexity, we propose to use LMs to find space complexities from code, and to the best of our knowledge, this is the first attempt to do so. Furthermore, we introduce a novel code comprehension task, called cross-language transfer, where we fine-tune the LM on one language and run inference on another. Finally, we visualize the activation of the attention-fed classification head of our LMs using Non-negative Matrix Factorization (NMF) to interpret our results.
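A pipeline of the kind described, fine-tuning a pre-trained code LM for complexity classification, can be sketched with the HuggingFace `transformers` API; the model name, label set, and data handling below are illustrative placeholders, not the project's exact configuration.

```python
# A sketch of fine-tuning a code LM for complexity classification with
# HuggingFace transformers; model, labels, and data are placeholders.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["O(1)", "O(log n)", "O(n)", "O(n log n)", "O(n^2)"]  # placeholder
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=len(LABELS))

def encode(snippets):
    # A longer max_length lets the LM see more of each snippet (found
    # helpful above); 512 is this model's usual upper limit.
    return tokenizer(snippets, truncation=True, padding="max_length",
                     max_length=512, return_tensors="pt")

args = TrainingArguments(output_dir="complexity-clf", num_train_epochs=3,
                         per_device_train_batch_size=8)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # labelled data
# trainer.train()
```

Cross-language transfer, as described above, would correspond to fine-tuning on snippets from one language and calling the trained model on snippets from another.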
High-complexity models are notorious in machine learning for overfitting, a phenomenon in which models represent data well but fail to generalize the underlying data-generating process. A typical procedure for circumventing overfitting computes the empirical risk on a holdout set and halts once the risk begins to increase (or flags that, or when, it has). Such practice often helps in outputting a well-generalizing model, but the justification for why it works is primarily heuristic. We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data. We then introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data, and overfitting quantitatively defined and detected. We rely on concentration bounds, which guarantee that empirical means approximate their true mean with high probability, to conclude that two such empirical means should approximate each other. We stipulate conditions under which this test is valid, describe how the test may be used for identifying overfitting, articulate a further nuance according to which distributional shift may be flagged, and highlight an alternative notion of learning which usefully captures generalization in the absence of uniform PAC guarantees.
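The flavor of test described can be illustrated with Hoeffding's inequality: under the stipulated conditions in which both the training and holdout empirical risks concentrate around the same true risk, their gap should, with high probability, stay below the sum of the two deviation bounds, and a larger gap flags overfitting. The sketch below is a generic illustration of that logic, not the paper's exact test.

```python
# A generic Hoeffding-style sketch for flagging overfitting; illustrative,
# not the paper's exact hypothesis test. Losses are assumed in [0, 1].
import math

def overfitting_flagged(train_losses, holdout_losses, delta=0.05):
    """Flag if |train risk - holdout risk| exceeds the sum of Hoeffding
    deviation bounds, each holding with probability at least 1 - delta."""
    def eps(m):
        # Hoeffding: P(|mean - mu| >= eps) <= 2 * exp(-2 * m * eps**2).
        return math.sqrt(math.log(2 / delta) / (2 * m))
    gap = abs(sum(train_losses) / len(train_losses)
              - sum(holdout_losses) / len(holdout_losses))
    return gap > eps(len(train_losses)) + eps(len(holdout_losses))
```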
Since deep neural networks were developed, they have made huge contributions to everyday life. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks remain challenging and unpredictable procedures. To lower the technical threshold for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics in HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods for defining their value ranges. The paper then focuses on the major optimization algorithms and their applicability, covering their efficiency and accuracy, especially for deep learning networks. The study next reviews the major services and toolkits for HPO, comparing their support for state-of-the-art search algorithms, compatibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with the problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.
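As a concrete taste of the kind of toolkit such a review covers, the sketch below tunes two common hyper-parameters with Optuna, one of several widely used HPO libraries; the objective shown is a stand-in for an actual train-and-validate loop.

```python
# A minimal HPO sketch using the Optuna toolkit; the objective below is a
# stand-in for a real train-and-validate loop.
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)  # learning rate
    layers = trial.suggest_int("num_layers", 1, 8)        # model depth
    # Placeholder validation loss; in practice, train a model with these
    # hyper-parameters and return its validation metric.
    return (lr - 1e-3) ** 2 + 0.01 * layers

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("best hyper-parameters:", study.best_params)
```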