Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and light-weight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NASBench-101, NASBench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.
There is a recent boom in the development of AI solutions to facilitate and enhance diagnostic procedures for established clinical tools. To assess the integrity of the developing nervous system, the Prechtl general movement assessment (GMA) is recognized for its clinical value in the diagnosis of neurological impairments in early infancy. GMA has been increasingly augmented through machine learning approaches intending to scale-up its application, circumvent costs in the training of human assessors and further standardize classification of spontaneous motor patterns. Available deep learning tools, all of which are based on single sensor modalities, are however still considerably inferior to that of well-trained human assessors. These approaches are hardly comparable as all models are designed, trained and evaluated on proprietary/ silo-data sets. We propose a sensor fusion approach for assessing fidgety movements (FMs) comparing three different sensor modalities (pressure, inertial, and visual sensors). Various combinations and two sensor fusion approaches (late and early fusion) for infant movement classification were tested to evaluate whether a multi-sensor system outperforms single modality assessments. The performance of the three-sensor fusion (classification accuracy of 94.5\%) was significantly higher than that of any single modality evaluated, suggesting the sensor fusion approach is a promising avenue for automated classification of infant motor patterns. The development of a robust sensor fusion system may significantly enhance AI-based early recognition of neurofunctions, ultimately facilitating early implementation of automated detection of neurodevelopmental conditions.
Linear structural vector autoregressive models can be identified statistically without imposing restrictions on the model if the shocks are mutually independent and at most one of them is Gaussian. We show that this result extends to structural threshold and smooth transition vector autoregressive models incorporating a time-varying impact matrix defined as a weighted sum of the impact matrices of the regimes. We also discuss labelling of the shocks, maximum likelihood estimation of the parameters, and stationarity the model. The introduced methods are implemented to the accompanying R package sstvars. Our empirical application studies the effects of the climate policy uncertainty shock on the U.S. macroeconomy. In a structural logistic smooth transition vector autoregressive model consisting of two regimes, we find that a positive climate policy uncertainty shock decreases production in times of low economic policy uncertainty but slightly increases it in times of high economic policy uncertainty.
This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints dependent on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which utilizes differential evolution for global optimization, and LPCA-Greedy, a method that incrementally and greadily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlight LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.
Mathematical modeling offers the opportunity to test hypothesis concerning Myeloproliferative emergence and development. We tested different mathematical models based on a training cohort (n=264 patients) (Registre de la c\^ote d'Or) to determine the emergence and evolution times before JAK2V617F classical Myeloproliferative disorders (respectively Polycythemia Vera and Essential Thrombocytemia) are diagnosed. We dissected the time before diagnosis as two main periods: the time from embryonic development for the JAK2V617F mutation to occur, not disappear and enter in proliferation, and a second time corresponding to the expansion of the clonal population until diagnosis. We demonstrate using progressively complexified models that the rate of active mutation occurrence is not constant and doesn't just rely on individual variability, but rather increases with age and takes a median time of 63.1+/-13 years. A contrario, the expansion time can be considered as constant: 8.8 years once the mutation has emerged. Results were validated in an external cohort (national FIMBANK Cohort, n=1248 patients). Analyzing JAK2V617F Essential Thrombocytema versus Polycythemia Vera, we noticed that the first period of time (rate of active homozygous mutation occurrence) for PV takes approximatively 1.5 years more than for ET to develop when the expansion time was quasi-similar. In conclusion, our multi-step approach and the ultimate time-dependent model of MPN emergence and development demonstrates that the emergence of a JAK2V617F mutation should be linked to an aging mechanism, and indicates a 8-9 years period of time to develop a full MPN.
The development of existing facial coding systems, such as the Facial Action Coding System (FACS), relied on manual examination of facial expression videos for defining Action Units (AUs). To overcome the labor-intensive nature of this process, we propose the unsupervised learning of an automated facial coding system by leveraging computer-vision-based facial keypoint tracking. In this novel facial coding system called the Data-driven Facial Expression Coding System (DFECS), the AUs are estimated by applying dimensionality reduction to facial keypoint movements from a neutral frame through a proposed Full Face Model (FFM). FFM employs a two-level decomposition using advanced dimensionality reduction techniques such as dictionary learning (DL) and non-negative matrix factorization (NMF). These techniques enhance the interpretability of AUs by introducing constraints such as sparsity and positivity to the encoding matrix. Results show that DFECS AUs estimated from the DISFA dataset can account for an average variance of up to 91.29 percent in test datasets (CK+ and BP4D-Spontaneous) and also surpass the variance explained by keypoint-based equivalents of FACS AUs in these datasets. Additionally, 87.5 percent of DFECS AUs are interpretable, i.e., align with the direction of facial muscle movements. In summary, advancements in automated facial coding systems can accelerate facial expression analysis across diverse fields such as security, healthcare, and entertainment. These advancements offer numerous benefits, including enhanced detection of abnormal behavior, improved pain analysis in healthcare settings, and enriched emotion-driven interactions. To facilitate further research, the code repository of DFECS has been made publicly accessible.
The advancement of industrialization has spurred the development of innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) notable for its robustness, parallelism, simplicity, and efficiency. While LSO excels in single-objective optimization, its multi-objective variants face challenges such as poor initialization, local optima entrapment, and so on. This study proposes Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion (MF-DMOLSO) to address these limitations. MF-DMOLSO comprises three key components: initialization, swarm position update, and external archive update. The initialization unit employs chaotic mapping for uniform population distribution. The position update unit enhances behavior patterns and step size formulas for cub lions, incorporating crowding degree sorting, Pareto non-dominated sorting, and Levy flight to improve convergence speed and global search capabilities. Reference points guide convergence in higher-dimensional spaces, maintaining population diversity. An adaptive cold-hot start strategy generates a population responsive to environmental changes. The external archive update unit re-evaluates solutions based on non-domination and diversity to form the new population. Evaluations on benchmark functions showed MF-DMOLSO surpassed multi-objective particle swarm optimization, non-dominated sorting genetic algorithm II, and multi-objective lion swarm optimization, exceeding 90% accuracy for two-objective and 97% for three-objective problems. Compared to non-dominated sorting genetic algorithm III, MF-DMOLSO showed a 60% improvement. Applied to 6R robot trajectory planning, MF-DMOLSO optimized running time and maximum acceleration to 8.3s and 0.3pi rad/s^2, achieving a set coverage rate of 70.97% compared to 2% by multi-objective particle swarm optimization, thus improving efficiency and reducing mechanical dither.
Adversarial training methods commonly generate independent initial perturbation for adversarial samples from a simple uniform distribution, and obtain the training batch for the classifier without selection. In this work, we propose a simple yet effective training framework called Batch-in-Batch (BB) to enhance models robustness. It involves specifically a joint construction of initial values that could simultaneously generates $m$ sets of perturbations from the original batch set to provide more diversity for adversarial samples; and also includes various sample selection strategies that enable the trained models to have smoother losses and avoid overconfident outputs. Through extensive experiments on three benchmark datasets (CIFAR-10, SVHN, CIFAR-100) with two networks (PreActResNet18 and WideResNet28-10) that are used in both the single-step (Noise-Fast Gradient Sign Method, N-FGSM) and multi-step (Projected Gradient Descent, PGD-10) adversarial training, we show that models trained within the BB framework consistently have higher adversarial accuracy across various adversarial settings, notably achieving over a 13% improvement on the SVHN dataset with an attack radius of 8/255 compared to the N-FGSM baseline model. Furthermore, experimental analysis of the efficiency of both the proposed initial perturbation method and sample selection strategies validates our insights. Finally, we show that our framework is cost-effective in terms of computational resources, even with a relatively large value of $m$.
In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. Substantial works have shown they are beneficial for downstream uni-modal tasks and avoid training a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey on VLP. We hope that this survey can shed light on future research in the VLP field.
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently entered also the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used at each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, in respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.