Most evolutionary algorithms have multiple parameters, whose values drastically affect performance. Due to the often complicated interplay of the parameters, setting these values right for a particular problem (parameter tuning) is a challenging task. This task becomes even more complicated when the optimal parameter values change significantly during the run of the algorithm, since then a dynamic parameter choice (parameter control) is necessary. In this work, we propose a lazy but effective solution: choosing all parameter values (where this makes sense) in each iteration randomly from a suitably scaled power-law distribution. To demonstrate the effectiveness of this approach, we perform runtime analyses of the $(1+(\lambda,\lambda))$ genetic algorithm with all three parameters chosen in this manner. We show that, on the one hand, this algorithm can imitate simple hill-climbers like the $(1+1)$ EA, achieving the same asymptotic runtime on problems like OneMax, LeadingOnes, or Minimum Spanning Tree. On the other hand, it is also very efficient on jump functions, where the best static parameters are very different from those necessary to optimize simple problems. We prove a performance guarantee comparable to the best known for static parameters. For the most interesting case, that of constant jump size $k$, we prove that our performance is asymptotically better than what can be obtained with any static parameter choice. We complement our theoretical results with a rigorous empirical study confirming what the asymptotic runtime results suggest.
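To make the parameter-sampling scheme concrete, here is a minimal sketch of drawing an integer parameter from a power-law distribution with exponent `beta` on $\{1,\dots,n/2\}$, as one might do for $\lambda$ in each iteration of the $(1+(\lambda,\lambda))$ GA. The support, exponent, and scaling here are illustrative assumptions; the exact choices in the paper may differ.

```python
import numpy as np

def power_law_sample(n_max, beta, rng):
    """Draw an integer from {1, ..., n_max} with P(i) proportional to i^(-beta)."""
    values = np.arange(1, n_max + 1)
    probs = values.astype(float) ** (-beta)
    probs /= probs.sum()
    return rng.choice(values, p=probs)

rng = np.random.default_rng(42)
n = 100                                              # hypothetical problem size
lam = power_law_sample(n // 2, beta=2.5, rng=rng)    # fresh lambda each iteration
```

The appeal of a heavy-tailed choice is that most iterations draw small (cheap) values, while large values are still sampled often enough to help on instances that require them.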
Agitation is one of the most prevalent symptoms in people with dementia (PwD) and can put both their own safety and that of their caregivers at risk. Developing objective agitation detection approaches is important to support the health and safety of PwD living in residential settings. In a previous study, we collected multimodal wearable sensor data from 17 participants over 600 days and developed machine learning models for predicting agitation in one-minute windows. However, the dataset has significant limitations, such as class imbalance and potentially imprecise labels, since agitation occurs far more rarely than normal behaviour. In this paper, we first apply different undersampling methods to mitigate the imbalance problem, and conclude that only 20\% of the normal-behaviour data is sufficient to train a competitive agitation detection model. We then design a weighted undersampling method to evaluate the manual labelling mechanism under the ambiguous time interval (ATI) assumption. Finally, we propose a postprocessing method, cumulative class re-decision (CCR), which exploits the historical sequential information and temporal continuity of agitation to improve decision-making performance in a prospective agitation detection system. The results show that combining undersampling with CCR improves the F1-score and other metrics to varying degrees while using less training time and data, and suggests a way to identify a range of optimal decision thresholds for clinical use.
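The two ingredients can be illustrated with a short sketch: random undersampling of the majority (normal-behaviour) class, followed by a temporal re-decision step. The smoothing rule below is a stand-in that exploits the continuity of agitation episodes; it is not the paper's exact CCR update, which is not reproduced here.

```python
import numpy as np

def undersample_majority(X, y, keep_frac, rng, majority_label=0):
    """Keep all minority-class windows, but only `keep_frac` of the majority class."""
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    kept = rng.choice(maj, size=int(keep_frac * maj.size), replace=False)
    idx = rng.permutation(np.concatenate([kept, mino]))
    return X[idx], y[idx]

def cumulative_redecision(scores, window=5, threshold=0.5):
    """Re-decide each one-minute window from the running mean of recent scores,
    a simple proxy for CCR-style use of historical sequential information."""
    smoothed = np.convolve(scores, np.ones(window) / window, mode="same")
    return (smoothed >= threshold).astype(int)
```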
An inner-product Hilbert space formulation of the Kemeny distance is defined over the domain of all permutations with ties on the extended real line, and yields an unbiased minimum-variance (Gauss-Markov) correlation estimator for a homogeneous i.i.d. sample. In this work, we construct and prove the requirements necessary to extend this linear topology to both Spearman's \(\rho\) and Kendall's \(\tau_{b}\), showing both estimators to be biased and inefficient on practical data domains. A probability distribution is defined for the Kemeny \(\tau_{\kappa}\) estimator, and a Studentisation adjustment for finite samples is also provided. This work allows a general-purpose linear model duality to be identified as a unique consistent solution to many biased and unbiased estimation scenarios.
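As a concrete illustration of the pairwise-sign (inner-product) view underlying the Kemeny distance, the following sketch computes one common formulation for rankings with ties. The normalisations below, in particular the fixed denominator \(n(n-1)\) in place of the data-dependent denominator of \(\tau_b\), are assumptions for illustration; the paper's exact definition of \(\tau_{\kappa}\) may differ.

```python
import numpy as np

def sign_matrix(x):
    """Pairwise score matrix of a ranking (ties map to 0 via the sign function)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x[:, None] - x[None, :])

def kemeny_distance(x, y):
    """Distance between pairwise sign matrices (one common formulation)."""
    return np.abs(sign_matrix(x) - sign_matrix(y)).sum() / 2.0

def kemeny_tau(x, y):
    """Inner product of sign matrices with a fixed n(n-1) denominator,
    unlike tau_b, whose denominator depends on the observed tie pattern."""
    n = len(x)
    return (sign_matrix(x) * sign_matrix(y)).sum() / (n * (n - 1))
```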
Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies in settings with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s) and period(s). We show that the standard SC estimator is generally biased under random assignment. We propose a Modified Unbiased Synthetic Control (MUSC) estimator that guarantees unbiasedness under random assignment and derive its exact, randomization-based, finite-sample variance. We also propose an unbiased estimator for this variance. We document in settings with real data that under random assignment, SC-type estimators can have root mean-squared errors that are substantially lower than those of other common estimators. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. While our results only directly apply in settings where treatment is assigned randomly, we believe that they can complement model-based approaches even for observational studies.
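For context, here is a minimal sketch of the standard SC weighting step: the convex combination of control units that best matches the treated unit's pre-treatment outcomes. This is the baseline estimator the paper shows to be biased under random assignment; the MUSC modification itself is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

def sc_weights(Y0_pre, y1_pre):
    """Standard SC weights: minimise the pre-treatment fit over the simplex.
    Y0_pre: (T_pre, J) control outcomes; y1_pre: (T_pre,) treated outcomes."""
    J = Y0_pre.shape[1]
    objective = lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2)
    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(objective, np.full(J, 1.0 / J), method="SLSQP",
                   bounds=[(0.0, 1.0)] * J, constraints=constraints)
    return res.x
```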
Pre-trained Language Models (LMs) have become an integral part of Natural Language Processing (NLP) in recent years, due to their superior performance in downstream applications. In spite of this resounding success, the usability of LMs is constrained by their computational and time complexity and their ever-increasing size, an issue that has been referred to as `overparameterisation'. Different strategies have been proposed in the literature to alleviate these problems, with the aim of creating effective compact models that match the performance of their larger counterparts with negligible performance loss. One of the most popular techniques in this area of research is model distillation. Another potent but underutilised technique is cross-layer parameter sharing. In this work, we combine these two strategies and present MiniALBERT, a technique for converting the knowledge of fully parameterised LMs (such as BERT) into a compact recursive student. In addition, we investigate the application of bottleneck adapters for layer-wise adaptation of our recursive student, and explore the efficacy of adapter tuning for fine-tuning compact models. We test our proposed models on a number of general and biomedical NLP tasks to demonstrate their viability and compare them with the state-of-the-art and other existing compact models. All code used in the experiments is available at //github.com/nlpie-research/MiniALBERT. Our pre-trained compact models can be accessed from //huggingface.co/nlpie.
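A minimal PyTorch sketch of the two ideas being combined: one transformer layer reused recursively (cross-layer parameter sharing) with a distinct bottleneck adapter per recursion step (layer-wise adaptation). The dimensions, adapter placement, and module choices here are illustrative assumptions, not the actual MiniALBERT architecture.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: down-project, non-linearity, up-project."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class RecursiveEncoder(nn.Module):
    """One shared transformer layer applied n_steps times; a small per-step
    adapter lets each recursion specialise despite the shared weights."""
    def __init__(self, dim=384, n_heads=6, n_steps=6):
        super().__init__()
        self.shared = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.adapters = nn.ModuleList(BottleneckAdapter(dim) for _ in range(n_steps))
    def forward(self, x):
        for adapter in self.adapters:
            x = adapter(self.shared(x))
        return x

out = RecursiveEncoder()(torch.randn(2, 16, 384))   # (batch, seq, dim)
```

Sharing the layer keeps the parameter count of the student close to that of a single layer, while the adapters add only a small per-step overhead.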
Developing efficient Bayesian computation algorithms for imaging inverse problems is challenging due to the dimensionality involved and because Bayesian imaging models are often not smooth. Current state-of-the-art methods often address these difficulties by replacing the posterior density with a smooth approximation that is amenable to efficient exploration by using Langevin Markov chain Monte Carlo (MCMC) methods. An alternative approach is based on data augmentation and relaxation, where auxiliary variables are introduced in order to construct an approximate augmented posterior distribution that is amenable to efficient exploration by Gibbs sampling. This paper proposes a new accelerated proximal MCMC method called latent space SK-ROCK (ls SK-ROCK), which tightly combines the benefits of the two aforementioned strategies. Additionally, instead of viewing the augmented posterior distribution as an approximation of the original model, we propose to consider it as a generalisation of this model. Following on from this, we empirically show that there is a range of values for the relaxation parameter for which the accuracy of the model improves, and propose a stochastic optimisation algorithm to automatically identify the optimal amount of relaxation for a given problem. In this regime, ls SK-ROCK converges faster than competing approaches from the state of the art, and also achieves better accuracy since the underlying augmented Bayesian model has a higher Bayesian evidence. The proposed methodology is demonstrated with a range of numerical experiments related to image deblurring and inpainting, as well as with comparisons with alternative approaches from the state of the art. An open-source implementation of the proposed MCMC methods is available from //github.com/luisvargasmieles/ls-MCMC.
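To illustrate the first of the two strategies being combined (smoothing a non-smooth posterior so that Langevin MCMC applies), here is a minimal Moreau-Yosida regularised ULA sketch for an L1 prior. This is emphatically not ls SK-ROCK itself, which additionally uses a stabilised SK-ROCK integrator and the latent-space augmentation the paper introduces; the prior, step size, and smoothing parameter below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def myula(grad_log_lik, x0, lam, gamma, n_iter, rng):
    """Unadjusted Langevin algorithm on a Moreau-Yosida smoothed posterior:
    the gradient of the smoothed L1 prior term is (prox(x) - x) / lam."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter,) + x.shape)
    for k in range(n_iter):
        grad = grad_log_lik(x) + (soft_threshold(x, lam) - x) / lam
        x = x + gamma * grad + np.sqrt(2.0 * gamma) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# toy usage: Gaussian likelihood y ~ N(x, 1) with observation y = 2
rng = np.random.default_rng(0)
chain = myula(lambda x: -(x - 2.0), np.zeros(1), lam=0.1, gamma=0.05,
              n_iter=5000, rng=rng)
```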
We present a registration method for model reduction of parametric partial differential equations with dominating advection effects and moving features. Registration refers to the use of a parameter-dependent mapping to make the set of solutions to these equations more amenable to approximation by classical reduced basis methods. The proposed approach draws on concepts from optimal transport theory: we use Monge embeddings to construct these mappings in a purely data-driven way. The method relies on a single interpretable hyper-parameter. We discuss how our approach relates to existing works that combine model order reduction and optimal transport theory. Numerical results are provided to demonstrate the effect of the registration, including a model problem where the solution is itself a probability density and one where it is not.
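The basic building block can be shown in one dimension, where the Monge (optimal transport) map between two distributions is the monotone rearrangement $T = F_{\mathrm{tgt}}^{-1} \circ F_{\mathrm{src}}$. The empirical sketch below is a 1-D illustration only; the paper constructs higher-dimensional Monge embeddings from solution data.

```python
import numpy as np

def monge_map_1d(src, tgt):
    """Empirical 1-D Monge map T = F_tgt^{-1} o F_src: the monotone
    rearrangement sending samples of `src` onto samples of `tgt`."""
    src_sorted, tgt_sorted = np.sort(src), np.sort(tgt)
    def T(x):
        x = np.atleast_1d(x)
        u = np.searchsorted(src_sorted, x, side="right") / src_sorted.size
        idx = np.clip((u * tgt_sorted.size).astype(int), 0, tgt_sorted.size - 1)
        return tgt_sorted[idx]
    return T

# toy usage: map a standard normal sample onto a shifted one
rng = np.random.default_rng(1)
T = monge_map_1d(rng.standard_normal(1000), 3.0 + rng.standard_normal(1000))
```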
Parallel-in-time integration has been the focus of intensive research efforts over the past two decades due to the advent of massively parallel computer architectures and the scaling limits of purely spatial parallelization. Various iterative parallel-in-time (PinT) algorithms have been proposed, like Parareal, PFASST, MGRIT, and Space-Time Multi-Grid (STMG). These methods have been described using different notations, and the convergence estimates that are available are difficult to compare. We describe Parareal, PFASST, MGRIT and STMG for the Dahlquist model problem using a common notation and give precise convergence estimates using generating functions. This allows us, for the first time, to directly compare their convergence. We prove that all four methods eventually converge super-linearly, and also compare them numerically. The generating function framework provides further opportunities to explore and analyze existing and new methods.
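As a concrete reference point for the common setting, here is a minimal Parareal implementation for the Dahlquist problem $y' = \lambda y$ (the other three methods are considerably more involved). The propagator choices, one implicit Euler step for the coarse propagator and $m$ substeps for the fine one, are illustrative assumptions.

```python
import numpy as np

def parareal_dahlquist(lam=-1.0, T=5.0, N=20, K=5, m=10):
    """Parareal for y' = lam*y, y(0) = 1, on N time windows."""
    dt = T / N
    G = lambda y: y / (1.0 - lam * dt)      # coarse: one implicit Euler step
    def F(y):                               # fine: m implicit Euler substeps
        for _ in range(m):
            y = y / (1.0 - lam * dt / m)
        return y
    U = np.ones(N + 1)
    for n in range(N):                      # initial coarse sweep
        U[n + 1] = G(U[n])
    for _ in range(K):                      # Parareal correction iterations
        Fk = np.array([F(U[n]) for n in range(N)])   # parallel across windows
        Gk = np.array([G(U[n]) for n in range(N)])
        for n in range(N):                  # sequential coarse update
            U[n + 1] = Fk[n] + G(U[n]) - Gk[n]
    return U

U = parareal_dahlquist()
print(abs(U[-1] - np.exp(-5.0)))            # error at final time
```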
We study numerical integration over bounded regions in $\mathbb{R}^s$, $s\ge1$, with respect to some probability measure. We replace random sampling with quasi-Monte Carlo methods, where the underlying point set is derived from deterministic constructions that aim to fill the space more evenly than random points. Such quasi-Monte Carlo point sets are ordinarily designed for the uniform measure, and the theory only works for product measures when a coordinate-wise transformation is applied. Going beyond this setting, we first consider the case where the target density is a mixture distribution in which each term comes from a product distribution. Next, we consider target densities that can be approximated by such mixture distributions. We require the approximation to be a sum of coordinate-wise products that is positive everywhere (so that it can be rescaled to a probability density function). We use tensor-product hat function approximations for this purpose, since a hat function approximation of a positive function is itself positive. We also study more complex algorithms, where we first approximate the target density with a general Gaussian mixture distribution and then approximate the mixture terms with an adaptive hat function approximation on rotated intervals. The Gaussian mixture approximation allows us to locate the essential parts of the target density, whereas the adaptive hat function approximation captures its finer structure. We prove convergence rates for each of the integration techniques based on quasi-Monte Carlo sampling for integrands with bounded partial mixed derivatives. The employed algorithms are based on digital $(t,s)$-sequences over the finite field $\mathbb{F}_2$ and an inversion method. Numerical examples illustrate the performance of the algorithms for some target densities and integrands.
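The starting point, QMC points plus coordinate-wise inversion for a product measure, can be sketched in a few lines. Sobol' sequences are digital $(t,s)$-sequences over $\mathbb{F}_2$, so this matches the class of constructions used in the paper; the Gaussian product target here is an illustrative assumption.

```python
import numpy as np
from scipy.stats import norm, qmc

# Sobol' points target the uniform measure on [0,1)^s; a coordinate-wise
# inverse-CDF (inversion) transform adapts them to a product measure,
# here independent standard normals. Mixture targets extend this term by term.
s = 2
sobol = qmc.Sobol(d=s, scramble=True, seed=7)
u = sobol.random_base2(m=10)              # 2^10 points in [0,1)^s
x = norm.ppf(u)                           # points distributed as N(0, I_s)

f = lambda x: np.cos(x).prod(axis=1)      # example integrand
print(f(x).mean())                        # QMC estimate of E[f(X)]
```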
We consider sequential state and parameter learning in state-space models with intractable state transition and observation processes. By exploiting low-rank tensor-train (TT) decompositions, we propose new sequential learning methods for joint parameter and state estimation under the Bayesian framework. Our key innovation is the introduction of scalable function approximation tools such as TT for recursively learning the sequentially updated posterior distributions. The function approximation perspective of our methods offers tractable error analysis and potentially alleviates the particle degeneracy faced by many particle-based methods. In addition to the new insights into algorithmic design, our methods complement conventional particle-based methods. Our TT-based approximations naturally define conditional Knothe--Rosenblatt (KR) rearrangements that lead to filtering, smoothing and path estimation accompanying our sequential learning algorithms, which open the door to removing potential approximation bias. We also explore several preconditioning techniques based on either linear or nonlinear KR rearrangements to enhance the approximation power of TT for practical problems. We demonstrate the efficacy and efficiency of our proposed methods on several state-space models, in which our methods achieve state-of-the-art estimation accuracy and computational performance.
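To illustrate what a Knothe-Rosenblatt rearrangement does, map uniform variables to the target through marginal-then-conditional inverse CDFs, here is a grid-based two-dimensional sketch. In the paper these maps are induced by the TT surrogate of the posterior rather than a dense grid, which is what makes them scale; the grid and density below are illustrative assumptions.

```python
import numpy as np

def kr_sample_2d(density, xs, ys, u1, u2):
    """Map a uniform point (u1, u2) in (0,1)^2 to a sample of a discretised
    2-D density: inverse CDF of the x-marginal, then of y given x."""
    p = density(xs[:, None], ys[None, :])
    p = p / p.sum()
    i = np.clip(np.searchsorted(np.cumsum(p.sum(axis=1)), u1), 0, xs.size - 1)
    cond = p[i] / p[i].sum()
    j = np.clip(np.searchsorted(np.cumsum(cond), u2), 0, ys.size - 1)
    return xs[i], ys[j]

# toy usage: a correlated Gaussian density on a grid
grid = np.linspace(-4, 4, 200)
dens = lambda x, y: np.exp(-0.5 * (x**2 - 1.2 * x * y + y**2))
print(kr_sample_2d(dens, grid, grid, 0.3, 0.9))
```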
Most current salient object detection (SOD) models focus on designing a series of decoders based on fully convolutional networks (FCNs) or Transformer architectures and integrating them skilfully. These models have achieved remarkably high performance and made significant contributions to the development of SOD. Their primary research objective is to develop novel algorithms that outperform state-of-the-art models, a task that is extremely difficult and time-consuming. In contrast, this paper proposes a positive feedback method based on the F-measure for SOD, aiming to improve the accuracy of saliency prediction using existing methods. Specifically, our method takes an image to be detected, inputs it into several existing models to obtain their respective prediction maps, and then feeds these maps into our positive feedback procedure to generate the final prediction, without the need for careful decoder design or model training. Moreover, our method is adaptive and can be applied on top of existing models without any restrictions. Experimental results on five publicly available datasets show that our positive feedback method outperforms 12 recent methods on five evaluation metrics for saliency map prediction. Additionally, a robustness experiment shows that, as long as at least one of the selected models produces a good prediction, our approach ensures that the final result is no worse. Our approach achieves a prediction speed of 20 frames per second (FPS) on a low-end host, excluding the inference time of the inserted models. These results highlight the effectiveness, efficiency, and robustness of our approach for salient object detection.
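A hypothetical sketch of what F-measure-based positive feedback could look like: fuse several saliency maps, re-weight each map by its F-measure against the fused result, and iterate. The update rule, weighting, and thresholds below are assumptions for illustration, not necessarily the paper's exact procedure; $\beta^2 = 0.3$ is the value commonly used in SOD evaluation.

```python
import numpy as np

def f_measure(pred, ref, beta2=0.3, eps=1e-8):
    """F-measure between a binary prediction and a (pseudo) reference map."""
    tp = (pred * ref).sum()
    prec, rec = tp / (pred.sum() + eps), tp / (ref.sum() + eps)
    return (1 + beta2) * prec * rec / (beta2 * prec + rec + eps)

def positive_feedback_fusion(maps, n_rounds=3, thr=0.5):
    """Fuse saliency maps, then repeatedly re-weight each map by its
    F-measure against the current fused map (the 'positive feedback')."""
    maps = [np.asarray(m, float) for m in maps]
    w = np.ones(len(maps)) / len(maps)
    for _ in range(n_rounds):
        fused = sum(wi * m for wi, m in zip(w, maps))
        ref = (fused >= thr).astype(float)
        w = np.array([f_measure((m >= thr).astype(float), ref) for m in maps])
        w /= w.sum()
    return fused
```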