In statistics, it is common to encounter multi-modal and non-smooth likelihood (or objective function) maximization problems, where the parameters have known upper and lower bounds. This paper proposes a novel derivative-free global optimization technique that can be used to solve those problems even when the objective function is not known explicitly or its derivatives are difficult or expensive to obtain. The technique is based on the pattern search algorithm, which has been shown to be effective for black-box optimization problems. The proposed algorithm works by iteratively generating new solutions from the current solution. The new solutions are generated by making movements along the coordinate axes of the constrained sample space. Before making a jump from the current solution to a new solution, the objective function is evaluated at several neighborhood points around the current solution. The best solution point is then chosen based on the objective function values at those points. Parallel threading can be used to make the algorithm more scalable. The performance of the proposed method is evaluated by optimizing up to 5000-dimensional multi-modal benchmark functions. The proposed algorithm is shown to be up to 40 and 368 times faster than genetic algorithm (GA) and simulated annealing (SA), respectively. The proposed method is also used to estimate the optimal biomarker combination from Alzheimer's disease data by maximizing the empirical estimates of the area under the receiver operating characteristic curve (AUC), outperforming the contextual popular alternative, known as step-down algorithm.
The rising popularity of deep learning (DL) methods and techniques has invigorated interest in the topic of SE4DL, the application of software engineering (SE) practices on deep learning software. Despite the novel engineering challenges brought on by the data-driven and non-deterministic paradigm of DL software, little work has been invested into developing AI-targeted SE tools. On the other hand, tools tackling more general engineering issues in DL are actively used and referred to under the umbrella term of ``MLOps tools''. Furthermore, the available literature supports the utility of conventional SE tooling in DL software development. Building upon previous MSR research on tool usage in open-source software works, we identify conventional and MLOps tools adopted in popular applied DL projects that use Python as the main programming language. About 70% of the GitHub repositories mined contained at least one conventional SE tool. Software configuration management tools are the most adopted, while the opposite applies to maintenance tools. Substantially fewer MLOps tools were in use, with only 9 tools out of a sample of 80 used in at least one repository. The majority of them were open-source rather than proprietary. One of these tools, TensorBoard, was found to be adopted in about half of the repositories in our study. Consequently, the use of conventional SE tooling demonstrates its relevance to DL software. Further research is recommended on the adoption of MLOps tooling by open-source projects, focusing on the relevance of particular tool types, the development of required tools, as well as ways to promote the use of already available tools.
In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high cost and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research. We developed a benchmark dataset within the field of computer science, consisting of 100 human-authored complex query cases. For each complex query, we assembled a collection of 100 relevant documents and produced annotated relevance scores for ranking them. Recognizing the significant labor of expert annotation, we also introduce Anno-GPT, a scalable framework for validating the performance of Large Language Models (LLMs) on expert-level dataset annotation tasks. LLM annotation of the DORIS-MAE dataset resulted in a 500x reduction in cost, without compromising quality. Furthermore, due to the multi-tiered structure of these complex queries, the DORIS-MAE dataset can be extended to over 4,000 sub-query test cases without requiring additional annotation. We evaluated 17 recent retrieval methods on DORIS-MAE, observing notable performance drops compared to traditional datasets. This highlights the need for better approaches to handle complex, multifaceted queries in scientific research. Our dataset and codebase are available at //github.com/Real-Doris-Mae/Doris-Mae-Dataset.
The Bayesian statistical framework provides a systematic approach to enhance the regularization model by incorporating prior information about the desired solution. For the Bayesian linear inverse problems with Gaussian noise and Gaussian prior, we propose a new iterative regularization algorithm that belongs to subspace projection regularization (SPR) methods. By treating the forward model matrix as a linear operator between the two underlying finite dimensional Hilbert spaces with new introduced inner products, we first introduce an iterative process that can generate a series of valid solution subspaces. The SPR method then projects the original problem onto these solution subspaces to get a series of low dimensional linear least squares problems, where an efficient procedure is developed to update the solutions of them to approximate the desired solution of the original problem. With the new designed early stopping rules, this iterative algorithm can obtain a regularized solution with a satisfied accuracy. Several theoretical results about the algorithm are established to reveal the regularization properties of it. We use both small-scale and large-scale inverse problems to test the proposed algorithm and demonstrate its robustness and efficiency. The most computationally intensive operations in the proposed algorithm only involve matrix-vector products, making it highly efficient for large-scale problems.
This work presents a comparative study to numerically compute impulse approximate controls for parabolic equations with various boundary conditions. Theoretical controllability results have been recently investigated using a logarithmic convexity estimate at a single time based on a Carleman commutator approach. We propose a numerical algorithm for computing the impulse controls with minimal $L^2$-norms by adapting a penalized Hilbert Uniqueness Method (HUM) combined with a Conjugate Gradient (CG) method. We consider static boundary conditions (Dirichlet and Neumann) and dynamic boundary conditions. Some numerical experiments based on our developed algorithm are given to validate and compare the theoretical impulse controllability results.
Deep learning algorithms have been widely used to solve linear Kolmogorov partial differential equations~(PDEs) in high dimensions, where the loss function is defined as a mathematical expectation. We propose to use the randomized quasi-Monte Carlo (RQMC) method instead of the Monte Carlo (MC) method for computing the loss function. In theory, we decompose the error from empirical risk minimization~(ERM) into the generalization error and the approximation error. Notably, the approximation error is independent of the sampling methods. We prove that the convergence order of the mean generalization error for the RQMC method is $O(n^{-1+\epsilon})$ for arbitrarily small $\epsilon>0$, while for the MC method it is $O(n^{-1/2+\epsilon})$ for arbitrarily small $\epsilon>0$. Consequently, we find that the overall error for the RQMC method is asymptotically smaller than that for the MC method as $n$ increases. Our numerical experiments show that the algorithm based on the RQMC method consistently achieves smaller relative $L^{2}$ error than that based on the MC method.
Instruction-based language modeling has received significant attention in pretrained language models. However, the efficiency of instruction engineering remains low and hinders the development of instruction studies. Recent studies have focused on automating instruction generation, but they primarily aim to improve performance without considering other crucial objectives that impact instruction quality, such as instruction length and perplexity. Therefore, we propose a novel approach (i.e., InstOptima) that treats instruction generation as an evolutionary multi-objective optimization problem. In contrast to text edition-based methods, our approach utilizes a large language model (LLM) to simulate instruction operators, including mutation and crossover. Furthermore, we introduce an objective-guided mechanism for these operators, allowing the LLM to comprehend the objectives and enhance the quality of the generated instructions. Experimental results demonstrate improved fine-tuning performance and the generation of a diverse set of high-quality instructions.
Automatic sign language recognition (SLR) is an important topic within the areas of human-computer interaction and machine learning. On the one hand, it poses a complex challenge that requires the intervention of various knowledge areas, such as video processing, image processing, intelligent systems and linguistics. On the other hand, robust recognition of sign language could assist in the translation process and the integration of hearing-impaired people, as well as the teaching of sign language for the hearing population. SLR systems usually employ Hidden Markov Models, Dynamic Time Warping or similar models to recognize signs. Such techniques exploit the sequential ordering of frames to reduce the number of hypothesis. This paper presents a general probabilistic model for sign classification that combines sub-classifiers based on different types of features such as position, movement and handshape. The model employs a bag-of-words approach in all classification steps, to explore the hypothesis that ordering is not essential for recognition. The proposed model achieved an accuracy rate of 97% on an Argentinian Sign Language dataset containing 64 classes of signs and 3200 samples, providing some evidence that indeed recognition without ordering is possible.
In the current study, our purpose is to evaluate the feasibility of applying deep learning (DL) enabled algorithms to quantify bilateral knee biomarkers in healthy controls scanned at 0.55T and compared with 3.0T. The current study assesses the performance of standard in-practice bone, and cartilage segmentation algorithms at 0.55T, both qualitatively and quantitatively, in terms of comparing segmentation performance, areas of improvement, and compartment-wise cartilage thickness values between 0.55T vs. 3.0T. Initial results demonstrate a usable to good technical feasibility of translating existing quantitative deep-learning-based image segmentation techniques, trained on 3.0T, out of 0.55T for knee MRI, in a multi-vendor acquisition environment. Especially in terms of segmenting cartilage compartments, the models perform almost equivalent to 3.0T in terms of Likert ranking. The 0.55T low-field sustainable and easy-to-install MRI, as demonstrated, thus, can be utilized for evaluating knee cartilage thickness and bone segmentations aided by established DL algorithms trained at higher-field strengths out-of-the-box initially. This could be utilized at the far-spread point-of-care locations with a lack of radiologists available to manually segment low-field images, at least till a decent base of low-field data pool is collated. With further fine-tuning with manual labeling of low-field data or utilizing synthesized higher SNR images from low-field images, OA biomarker quantification performance is potentially guaranteed to be further improved.
Due to their inherent capability in semantic alignment of aspects and their context words, attention mechanism and Convolutional Neural Networks (CNNs) are widely applied for aspect-based sentiment classification. However, these models lack a mechanism to account for relevant syntactical constraints and long-range word dependencies, and hence may mistakenly recognize syntactically irrelevant contextual words as clues for judging aspect sentiment. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. Based on it, a novel aspect-specific sentiment classification framework is raised. Experiments on three benchmarking collections illustrate that our proposed model has comparable effectiveness to a range of state-of-the-art models, and further demonstrate that both syntactical information and long-range word dependencies are properly captured by the graph convolution structure.
Recently, ensemble has been applied to deep metric learning to yield state-of-the-art results. Deep metric learning aims to learn deep neural networks for feature embeddings, distances of which satisfy given constraint. In deep metric learning, ensemble takes average of distances learned by multiple learners. As one important aspect of ensemble, the learners should be diverse in their feature embeddings. To this end, we propose an attention-based ensemble, which uses multiple attention masks, so that each learner can attend to different parts of the object. We also propose a divergence loss, which encourages diversity among the learners. The proposed method is applied to the standard benchmarks of deep metric learning and experimental results show that it outperforms the state-of-the-art methods by a significant margin on image retrieval tasks.