In this contribution we provide initial findings on the problem of modeling fuzzy rating responses in a traditional probabilistic context. In particular, we study a probabilistic tree model with the aim of representing the stage-wise mechanisms of direct fuzzy rating scales. A Multinomial model coupled with a mixture of Binomial distributions is adopted to model the parameters of LR-type fuzzy responses, whereas a binary decision tree is used for the stage-wise rating mechanism. Parameter estimation is performed via a marginal maximum likelihood approach, and the characteristics of the proposed model are evaluated by means of an application to a real dataset.
The stochastic nature of iterative optimization heuristics leads to inherently noisy performance measurements. Since these measurements are often gathered once and then used repeatedly, the number of collected samples will have a significant impact on the reliability of algorithm comparisons. We show that care should be taken when making decisions based on limited data. Particularly, we show that the number of runs used in many benchmarking studies, e.g., the default value of 15 suggested by the COCO environment, can be insufficient to reliably rank algorithms on well-known numerical optimization benchmarks. Additionally, methods for automated algorithm configuration are sensitive to insufficient sample sizes. This may result in the configurator choosing a 'lucky' but poor-performing configuration despite exploring better ones. We show that relying on mean performance values, as many configurators do, can require a large number of runs to provide accurate comparisons between the considered configurations. Common statistical tests can greatly improve the situation in most cases but not always. We show examples of performance losses of more than 20%, even when using statistical races to dynamically adjust the number of runs, as done by irace. Our results underline the importance of appropriately considering the statistical distribution of performance values.
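The effect of sample size on mean-based ranking can be illustrated with a small simulation. The sketch below is not taken from the study: the two log-normal performance distributions, their parameters, and the function name `misranking_rate` are assumptions chosen only to show how often the algorithm that is worse in expectation ends up with the better sample mean for a given number of runs.

```python
import random


def misranking_rate(n_runs, n_trials=2000, seed=0):
    """Estimate how often the truly worse algorithm shows the better sample mean."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_trials):
        # Hypothetical performance values (lower is better). Algorithm A is
        # better in expectation: its log-scale location is 0.0 versus 0.2 for B.
        runs_a = [rng.lognormvariate(0.0, 1.0) for _ in range(n_runs)]
        runs_b = [rng.lognormvariate(0.2, 1.0) for _ in range(n_runs)]
        # A misranking occurs when B's sample mean beats A's sample mean.
        if sum(runs_b) / n_runs < sum(runs_a) / n_runs:
            wrong += 1
    return wrong / n_trials


if __name__ == "__main__":
    for n in (15, 50, 200):
        print(n, misranking_rate(n))
```

Under these assumed distributions the misranking rate shrinks as the number of runs grows, which is the qualitative behaviour the abstract describes; the actual magnitudes depend entirely on the distributions chosen.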
Removing noise from processed images is very important. Noise should be removed in such a way that the important information in the image is preserved. A decision-based nonlinear algorithm for the elimination of band lines, drop lines, mark, band lost and impulses in images is presented in this paper. The algorithm performs two simultaneous operations, namely, detection of corrupted pixels and evaluation of new pixels for replacing the corrupted pixels. Removal of these artifacts is achieved without damaging edges and details. However, the restricted window size renders the median operation less effective whenever noise is excessive; in that case the proposed algorithm automatically switches to mean filtering. The performance of the algorithm is analyzed in terms of Mean Square Error [MSE], Peak-Signal-to-Noise Ratio [PSNR], Signal-to-Noise Ratio Improved [SNRI], Percentage Of Noise Attenuated [PONA], and Percentage Of Spoiled Pixels [POSP]. The algorithm is compared with standard algorithms already in use, and the improved performance of the proposed algorithm is presented. The advantage of the proposed algorithm is that a single algorithm can replace several independent algorithms which are required for removal of different artifacts.
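Two of the measures named above, MSE and PSNR, have standard definitions that can be sketched in a few lines. The code below assumes 8-bit grayscale images stored as equally sized nested lists and a peak value of 255; the function names are illustrative, and the remaining measures (SNRI, PONA, POSP) are omitted because their exact definitions are not given here.

```python
import math


def mse(original, restored):
    """Mean square error between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for row_o, row_r in zip(original, restored):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count


def psnr(original, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, restored)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)
```

A lower MSE and a higher PSNR indicate a restored image that is closer to the reference image.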
This paper proposes a new technique based on a non-linear Minmax Detector Based (MDB) filter for image restoration. The aim of image enhancement is to reconstruct the true image from the corrupted image. The process of image acquisition frequently leads to degradation, and the quality of the digitized image becomes inferior to that of the original image. Image degradation can be due to the addition of different types of noise to the original image. Image noise can be modelled in many ways, and impulse noise is one such type. Impulse noise generates pixels with gray values not consistent with their local neighbourhood. It appears as a sprinkle of both light and dark spots, or only light spots, in the image. Filtering is a technique for enhancing the image. Linear filtering is filtering in which the value of an output pixel is a linear combination of neighbourhood values, which can produce blur in the image. Thus a variety of non-linear smoothing techniques have been developed. The median filter is one of the most popular non-linear filters. When considering a small neighbourhood it is highly efficient, but for a large window and in the case of high noise it gives rise to more blurring of the image. The Centre Weighted Mean (CWM) filter has a better average performance than the median filter. However, the original pixel is corrupted and noise reduction is substantial under high-noise conditions. Hence this technique also has a blurring effect on the image. To illustrate the superiority of the proposed approach, the proposed new scheme has been simulated along with the standard ones, and various restoration performance measures have been compared.
Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy. The majority of prior studies consider HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to secure the label consistency of the results. Compared with previous works, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
The metriplectic formalism is useful for describing complete dynamical systems which conserve energy and produce entropy. This creates challenges for model reduction, as the elimination of high-frequency information will generally not preserve the metriplectic structure which governs long-term stability of the system. Based on proper orthogonal decomposition, a provably convergent metriplectic reduced-order model is formulated which is guaranteed to maintain the algebraic structure necessary for energy conservation and entropy formation. Numerical results on benchmark problems show that the proposed method is remarkably stable, leading to improved accuracy over long time scales at a moderate increase in cost over naive methods.
The increase and rapid growth of data produced by scientific instruments, the Internet of Things (IoT), and social media is causing data transfer performance and resource consumption to garner much attention in the research community. The network infrastructure and end systems that enable this extensive data movement use a substantial amount of electricity, measured in terawatt-hours per year. Managing energy consumption within the core networking infrastructure is an active research area, but there is a limited amount of work on reducing power consumption at the end systems during active data transfers. This paper presents a novel two-phase dynamic throughput and energy optimization model that utilizes an offline decision-search-tree based clustering technique to encapsulate and categorize historical data transfer log information and an online search optimization algorithm to find the best application and kernel layer parameter combination to maximize the achieved data transfer throughput while minimizing the energy consumption. Our model also incorporates an ensemble method to reduce aleatoric uncertainty in finding optimal application and kernel layer parameters during the offline analysis phase. The experimental evaluation results show that our decision-tree based model outperforms the state-of-the-art solutions in this area by achieving 117% higher throughput on average and also consuming 19% less energy at the end systems during active data transfers.
One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Expectation-Maximization (EM), can be used in case the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach to models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.
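The quantile-plus-linear-map variant of TS for i.i.d. samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the derivation proposed in the paper: the exponential simulator (whose likelihood is in fact tractable and which is used only because it is easy to check), the uniform parameter range, the quantile levels in `PROBS`, and all function names are hypothetical choices.

```python
import numpy as np

PROBS = np.linspace(0.1, 0.9, 9)  # quantile levels used as first-stage statistics


def simulate(thetas, n_obs, rng):
    """Hypothetical simulator: i.i.d. exponential samples with scale theta."""
    thetas = np.atleast_1d(thetas)
    return rng.exponential(scale=thetas[:, None], size=(thetas.size, n_obs))


def fit_second_stage(n_sim, n_obs, rng, low=0.5, high=3.0):
    """Fit a linear map from simulated quantiles to the parameters that produced them."""
    thetas = rng.uniform(low, high, size=n_sim)          # assumed parameter range
    data = simulate(thetas, n_obs, rng)                  # shape (n_sim, n_obs)
    feats = np.quantile(data, PROBS, axis=1).T           # stage 1, shape (n_sim, 9)
    design = np.column_stack([np.ones(n_sim), feats])    # intercept plus quantiles
    coef, *_ = np.linalg.lstsq(design, thetas, rcond=None)
    return coef


def estimate(coef, observed):
    """Apply both stages to one observed data set."""
    feats = np.quantile(observed, PROBS)                 # stage 1 on the observed data
    return float(coef[0] + feats @ coef[1:])             # stage 2: linear function


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coef = fit_second_stage(n_sim=2000, n_obs=500, rng=rng)
    observed = simulate(2.0, 500, rng)[0]
    print(estimate(coef, observed))
```

The second stage is trained purely on simulated pairs of parameters and quantiles, so the likelihood is never evaluated; only the ability to simulate from the model is needed.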
This paper is devoted to a practical method for ferroalloys consumption modeling and optimization. We consider the problem of selecting the optimal process control parameters based on the analysis of historical data from sensors. We developed an approach that predicts the results of chemical reactions and gives ferroalloys consumption recommendations. The main features of our method are easy interpretation and noise resistance. Our approach is based on the k-means clustering algorithm, decision trees, and linear regression. The main idea of the method is to identify situations where processes go similarly. For this, we propose using a k-means based dataset clustering algorithm and a classification algorithm to determine the cluster. This algorithm can also be applied to various technological processes; in this article, we demonstrate its application in metallurgy. To test the application of the proposed method, we used it to optimize ferroalloys consumption in Basic Oxygen Furnace steelmaking when finishing steel in a ladle furnace. The minimum required element content for a given steel grade was selected as the predictive model's target variable, and the required amount of the element to be added to the melt as the optimized variable. Keywords: Clustering, Machine Learning, Linear Regression, Steelmaking, Optimization, Gradient Boosting, Artificial Intelligence, Decision Trees, Recommendation services
We present a novel static analysis technique to derive higher moments for program variables for a large class of probabilistic loops with potentially uncountable state spaces. Our approach is fully automatic, meaning it does not rely on externally provided invariants or templates. We employ algebraic techniques based on linear recurrences and introduce program transformations to simplify probabilistic programs while preserving their statistical properties. We develop power reduction techniques to further simplify the polynomial arithmetic of probabilistic programs and define the theory of moment-computable probabilistic loops for which higher moments can precisely be computed. Our work has applications towards recovering probability distributions of random variables and computing tail probabilities. The empirical evaluation of our results demonstrates the applicability of our work on many challenging examples.
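The idea of obtaining higher moments from linear recurrences can be illustrated on a toy loop. The sketch below is not the analysis technique of the paper: it assumes a loop whose body is the probabilistic update x := x + c, with each constant increment c chosen with a fixed probability, and the function names are hypothetical. Expanding (x + c)^k with the binomial theorem expresses the k-th moment after one iteration as a linear combination of the moments of order at most k before it, and the result is checked against exact enumeration of the distribution.

```python
from fractions import Fraction
from math import comb


def moments_by_recurrence(branches, n_iters, max_order, x0=0):
    """Moments E[x^k], k = 0..max_order, after n_iters iterations.

    branches is a list of (probability, increment) pairs for x := x + increment.
    """
    m = [Fraction(x0) ** k for k in range(max_order + 1)]
    for _ in range(n_iters):
        new = []
        for k in range(max_order + 1):
            total = Fraction(0)
            for p, c in branches:
                # E[(x + c)^k] = sum_j C(k, j) * c^(k - j) * E[x^j]
                for j in range(k + 1):
                    total += p * comb(k, j) * Fraction(c) ** (k - j) * m[j]
            new.append(total)
        m = new
    return m


def moments_by_enumeration(branches, n_iters, max_order, x0=0):
    """Reference values obtained by tracking the full distribution of x."""
    dist = {x0: Fraction(1)}
    for _ in range(n_iters):
        nxt = {}
        for x, px in dist.items():
            for p, c in branches:
                nxt[x + c] = nxt.get(x + c, Fraction(0)) + px * p
        dist = nxt
    return [sum(px * Fraction(x) ** k for x, px in dist.items())
            for k in range(max_order + 1)]
```

For a symmetric random walk, `branches = [(Fraction(1, 2), 1), (Fraction(1, 2), -1)]`, the recurrence reproduces the known closed forms E[x_n^2] = n and E[x_n^4] = 3n^2 - 2n without enumerating any states.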
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedback. In particular, we introduce an online user-agent interacting environment simulator, which can pre-train and evaluate model parameters offline before applying the model online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and the agent, and develop a novel approach to incorporate them into the proposed framework LIRD for list-wise recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.