
The modeling and analysis of degradation data has been an active research area in reliability and system health management. As sensor technology advances, multivariate sensory data are commonly collected for the underlying degradation process. However, most existing research on degradation modeling requires a univariate degradation index to be provided. Thus, constructing a degradation index for multivariate sensory data is a fundamental step in degradation modeling. In this paper, we propose a novel degradation index building method for multivariate sensory data. Based on an additive nonlinear model with variable selection, the proposed method can automatically select the most informative sensor signals to be used in the degradation index. A penalized likelihood method with an adaptive group penalty is developed for parameter estimation. We demonstrate that the proposed method outperforms existing methods via both simulation studies and analyses of the NASA jet engine sensor data.
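
As a rough illustration of the ingredients involved, the sketch below builds a degradation index by expanding each sensor signal in a polynomial basis (a stand-in for the paper's nonlinear components) and fitting the combined model with a group-lasso penalty via proximal gradient descent, so that uninformative sensors drop out blockwise. It assumes, hypothetically, that a degradation response `y` is available to fit against, and uses a plain rather than adaptive group penalty.

```python
import numpy as np

def basis(x, degree=3):
    # Polynomial basis expansion for one sensor signal (stand-in for splines).
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def group_soft_threshold(b, t):
    # Blockwise soft-thresholding, the proximal operator of the group penalty.
    norm = np.linalg.norm(b)
    return np.zeros_like(b) if norm <= t else (1 - t / norm) * b

def group_lasso_index(sensors, y, lam=0.1, degree=3, n_iter=2000):
    """sensors: list of (n,) signal arrays; y: (n,) degradation response."""
    B = [basis(x, degree) for x in sensors]
    Z = np.hstack(B)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)       # standardize basis columns
    yc = y - y.mean()
    step = len(y) / np.linalg.norm(Z, 2) ** 2      # 1 / Lipschitz constant
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):                        # proximal gradient iterations
        beta -= step * (Z.T @ (Z @ beta - yc) / len(y))
        for j in range(len(B)):                    # prox: zero out weak blocks
            s = slice(j * degree, (j + 1) * degree)
            beta[s] = group_soft_threshold(beta[s], step * lam)
    selected = [j for j in range(len(B))
                if np.linalg.norm(beta[j * degree:(j + 1) * degree]) > 0]
    return y.mean() + Z @ beta, selected           # fitted index, chosen sensors

# Toy usage: sensor 0 tracks degradation, sensor 1 is pure noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
s0, s1 = t + 0.05 * rng.normal(size=200), rng.normal(size=200)
y = t ** 2 + 0.05 * rng.normal(size=200)
_, sel = group_lasso_index([s0, s1], y, lam=0.05)
print("selected sensors:", sel)                    # typically [0]
```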

Related Content

The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series on model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. Its participants come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum in which participants can exchange cutting-edge research results and innovative practical experiences around modeling and model-driven software and systems. This year's edition will give the modeling community opportunities to further advance the foundations of modeling, and to propose innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability.
December 16, 2021

We introduce profile matching, a multivariate matching method for randomized experiments and observational studies that finds the largest possible unweighted samples across multiple treatment groups that are balanced relative to a covariate profile. This covariate profile can represent a specific population or a target individual, facilitating the tasks of generalization and personalization of causal inferences. For generalization, because the profile often amounts to summary statistics for a target population, profile matching does not require accessing individual-level data, which may be unavailable for confidentiality reasons. For personalization, the profile characterizes a single individual. Profile matching achieves covariate balance by construction, but unlike existing approaches to matching, it does not require specifying a matching ratio, as this is implicitly optimized for the data. The method can also be used for the selection of units for study follow-up, and it readily applies to multi-valued treatments with many treatment categories. We evaluate the performance of profile matching in a simulation study of generalization of a randomized trial to a target population. We further illustrate this method in an exploratory observational study of the relationship between opioid use and mental health outcomes. We analyze these relationships for three covariate profiles representing: (i) sexual minorities, (ii) the Appalachian United States, and (iii) a hypothetical vulnerable patient. In the Supplementary Materials, we provide R code with step-by-step explanations for implementing the methods described in the paper.
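
One way to picture profile matching is as an optimization that selects as many units as possible subject to balance constraints around the profile. The sketch below is a hypothetical linear-programming relaxation of that idea using scipy; the paper's actual formulation, constraints, and solver may differ.

```python
import numpy as np
from scipy.optimize import linprog

def profile_match(X, profile, tol=0.05):
    """X: (n, p) covariates; profile: (p,) target means; tol: balance tolerance.
    Returns relaxed selection weights in [0, 1], one per unit."""
    n, p = X.shape
    D = X - profile                       # deviations from the profile
    # Balance: |sum_i x_i * D_ij| <= tol * sum_i x_i for every covariate j,
    # rewritten as two linear constraints per covariate.
    A_ub = np.vstack([D.T - tol, -D.T - tol])
    b_ub = np.zeros(2 * p)
    # Maximize the selected sample size: minimize -sum_i x_i.
    res = linprog(c=-np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
    return res.x

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) + np.array([0.5, -0.2])   # off-profile sample
weights = profile_match(X, profile=np.array([0.0, 0.0]))
print("effective sample size:", weights.sum().round(1))
```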

We consider the problem of selecting confounders for adjustment from a potentially large set of covariates, when estimating a causal effect. Recently, the high-dimensional Propensity Score (hdPS) method was developed for this task; hdPS ranks potential confounders by estimating an importance score for each variable and selects the top few variables. However, this ranking procedure is limited: it requires all variables to be binary. We propose an extension of the hdPS to general types of response and confounder variables. We further develop a group importance score, allowing us to rank groups of potential confounders. The main challenge is that our parameter requires either the propensity score or the response model, both of which are vulnerable to model misspecification. We propose a targeted maximum likelihood estimator (TMLE) which allows the use of nonparametric, machine-learning tools for fitting these intermediate models. We establish asymptotic normality of our estimator, which in turn allows the construction of confidence intervals. We complement our work with numerical studies on simulated and real data.
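
Since the abstract's key move is to let nuisance models be fit with machine learning and then debias via TMLE, the following sketch shows a generic TMLE for an average treatment effect with a binary treatment and outcome. The gradient-boosting nuisance models and the clipping constants are illustrative choices, not the paper's.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingClassifier

def logit(p):
    return np.log(p / (1 - p))

def expit(x):
    return 1 / (1 + np.exp(-x))

def tmle_ate(W, A, Y):
    # Initial ML fits: outcome model Q(A, W) and propensity model g(W).
    Q_fit = GradientBoostingClassifier().fit(np.column_stack([A, W]), Y)
    g_fit = GradientBoostingClassifier().fit(W, A)
    g = np.clip(g_fit.predict_proba(W)[:, 1], 0.025, 0.975)
    Q1 = np.clip(Q_fit.predict_proba(
        np.column_stack([np.ones_like(A), W]))[:, 1], 1e-4, 1 - 1e-4)
    Q0 = np.clip(Q_fit.predict_proba(
        np.column_stack([np.zeros_like(A), W]))[:, 1], 1e-4, 1 - 1e-4)
    QA = np.where(A == 1, Q1, Q0)
    # Fluctuation step: regress Y on the clever covariate H with offset logit(QA).
    H = A / g - (1 - A) / (1 - g)
    eps = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(),
                 offset=logit(QA)).fit().params[0]
    Q1s = expit(logit(Q1) + eps / g)             # targeted counterfactual means
    Q0s = expit(logit(Q0) - eps / (1 - g))
    return np.mean(Q1s - Q0s)                    # targeted ATE estimate

rng = np.random.default_rng(2)
W = rng.normal(size=(500, 3))
A = rng.binomial(1, expit(W[:, 0]))
Y = rng.binomial(1, expit(0.5 * A + W[:, 0]))
print("ATE estimate:", round(tmle_ate(W, A, Y), 3))
```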

Decision trees are widely used classification and regression models because of their interpretability and good accuracy. Classical methods such as CART are based on greedy approaches, but growing attention has recently been devoted to optimal decision trees. We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al. (EJOR, vol. 284, 2020; COR, vol. 132, 2021) for (sparse) optimal randomized classification trees. Sparsity is important not only for feature selection but also for interpretability. We first consider alternative methods to sparsify such trees based on concave approximations of the $l_{0}$ ``norm''. Promising results are obtained on 24 datasets in comparison with $l_1$ and $l_{\infty}$ regularizations. Then, we derive bounds on the VC dimension of multivariate randomized classification trees. Finally, since training is computationally challenging for large datasets, we propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce training times without compromising accuracy.
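
To make the concave-approximation idea concrete: with the surrogate $\sum_i (1 - e^{-\alpha |w_i|})$ for the $l_{0}$ ``norm'', linearizing at the current iterate yields an iterative reweighted-$l_1$ scheme. The sketch below applies that scheme to a plain logistic-regression classifier purely for illustration; the paper applies such surrogates inside the randomized-tree formulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighted_l1(X, y, alpha=5.0, C=1.0, n_rounds=5):
    p = X.shape[1]
    r = np.ones(p)                                   # current per-feature l1 weights
    w = np.zeros(p)
    for _ in range(n_rounds):
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X / r, y)                            # column scaling = weighted l1
        w = clf.coef_.ravel() / r                    # weights on the original scale
        # Linearize the concave surrogate: d/d|w| (1 - e^{-a|w|}) = a e^{-a|w|}.
        r = np.maximum(alpha * np.exp(-alpha * np.abs(w)), 1e-3)
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)
w = reweighted_l1(X, y)
print("nonzero features:", np.flatnonzero(np.abs(w) > 1e-6))  # typically [0 1]
```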

Background: Mendelian randomization (MR) is a useful approach to causal inference from observational studies when randomised controlled trials are not feasible. However, the heterogeneity of the two association studies required in MR is often overlooked. When dealing with large studies, recently developed Bayesian MR is limited by its computational cost. Methods: We addressed study heterogeneity by proposing a random-effect Bayesian MR model with multiple exposures and outcomes. For large studies, we adopted a subset posterior aggregation method to reduce the computational burden: we divided the data into subsets and combined the causal effect estimates obtained from them. The performance of our method was evaluated in a number of simulations in which part of the exposure data was missing. Results: Random-effect Bayesian MR outperformed conventional inverse-variance weighted estimation, whether the true causal effects were zero or non-zero. Data partitioning of large studies had little impact on the variability of the estimated causal effects, whereas it notably affected the unbiasedness of the estimates under weak instruments and high rates of missing data. Our simulation results indicate that data partitioning is a good way of improving computational efficiency, at little cost in estimation bias, as long as the sample size of each subset is reasonably large. Conclusions: We have further advanced Bayesian MR by including random effects to explicitly account for study heterogeneity. We also adopted a subset posterior aggregation method to address the computational expense of MCMC, which is important especially when dealing with large studies. Our proposed work is likely to pave the way for more general model settings, as the Bayesian approach itself offers great flexibility in model construction.
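
The subset posterior aggregation step can be illustrated on a toy model. Below, the posterior of a normal mean is "sampled" on each data subset (an analytic stand-in for an MCMC run) and the subset draws are combined by precision weighting, in the spirit of consensus Monte Carlo; the paper's aggregation for the full MR model is more involved.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=2.0, scale=1.0, size=10_000)
K = 10                                          # number of subsets
subsets = np.array_split(data, K)

draws, precisions = [], []
for sub in subsets:
    # Subset posterior for the mean (known unit variance, flat prior):
    # N(mean(sub), 1/len(sub)); the draws stand in for an MCMC run.
    post_var = 1.0 / len(sub)
    draws.append(rng.normal(sub.mean(), np.sqrt(post_var), size=2_000))
    precisions.append(1.0 / post_var)

# Precision-weighted combination of the subset draws; for this toy model
# the combined draws recover the full-data posterior N(mean(data), 1/n).
w = np.array(precisions) / sum(precisions)
combined = np.average(np.vstack(draws), axis=0, weights=w)
print("combined posterior mean:", combined.mean().round(3))   # close to 2.0
```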

Indexing intervals is a fundamental problem with a wide range of applications. Recent work on managing large collections of intervals in main memory has focused on overlap joins and temporal aggregation problems. In this paper, we propose novel and efficient in-memory indexing techniques for intervals, with a focus on interval range queries, which are a basic component of many search and analysis tasks. First, we propose an optimized version of a single-level (flat) domain-partitioning approach, which may have large space requirements due to excessive replication. Then, we propose a hierarchical partitioning approach, which assigns each interval to at most two partitions per level and has controlled space requirements. Novel elements of our techniques include dividing the intervals at each partition into groups based on whether they begin inside or before the partition boundaries, reducing the information stored at each partition to the absolute minimum, and effectively handling data sparsity and skew. Experimental results on real and synthetic interval sets of different characteristics show that our approaches are typically one order of magnitude faster than the state of the art.
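
A minimal sketch of the flat domain-partitioning idea, including the division of each partition's intervals into those that begin inside it versus before it: this grouping lets a range query report every overlapping interval exactly once and skip redundant comparisons in non-first partitions. Partition count and domain bounds below are arbitrary illustrative values.

```python
from collections import defaultdict

class FlatIntervalIndex:
    """Flat domain partitioning with 'begins inside' vs 'begins before'
    groups per partition, so range queries report each result once."""
    def __init__(self, intervals, domain=(0, 1000), num_parts=16):
        self.lo, hi = domain
        self.width = (hi - self.lo) / num_parts
        self.num_parts = num_parts
        self.starts_in = defaultdict(list)   # intervals that begin in partition p
        self.spans_in = defaultdict(list)    # intervals that begin before p
        for s, e in intervals:
            first, last = self._part(s), self._part(e)
            self.starts_in[first].append((s, e))
            for p in range(first + 1, last + 1):
                self.spans_in[p].append((s, e))

    def _part(self, x):
        return min(int((x - self.lo) / self.width), self.num_parts - 1)

    def range_query(self, qs, qe):
        first, last = self._part(qs), self._part(qe)
        # First partition: both groups need full endpoint comparisons.
        out = [(s, e) for s, e in self.starts_in[first] + self.spans_in[first]
               if e >= qs and s <= qe]
        # Later partitions: only intervals *beginning* there are new, and
        # their start points already lie past the query start.
        for p in range(first + 1, last + 1):
            out += [(s, e) for s, e in self.starts_in[p] if s <= qe]
        return out

idx = FlatIntervalIndex([(10, 50), (60, 400), (120, 130)])
print(idx.range_query(100, 200))   # -> [(120, 130), (60, 400)]
```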

Randomized response, as a basic building block for differentially private mechanisms, has attracted great interest and found various potential applications in the scientific community. In this work, we are concerned with the three-element randomized response (RR$_{3}$) mechanism and its applications to the analysis of weighted bipartite graphs under a differential privacy guarantee. We develop a principled framework for estimating statistics produced by RR$_{3}$-based mechanisms, and prove the corresponding estimators to be unbiased. We also study in detail several fundamental and significant members of the RR$_{3}$ family, and derive closed-form solutions for the unbiased estimators. Next, we show potential applications of several RR$_{3}$-based mechanisms to the estimation of the average degree and the average weighted value on weighted bipartite graphs under a local differential privacy guarantee. Furthermore, we determine lower bounds on the choice of the relevant parameters by minimizing the variance of the statistics, in order to design optimal RR$_{3}$-based locally differentially private mechanisms; with these bounds, we optimize previous protocols in the literature and put forward a version that achieves the tight bound. Last but not least, we observe that in the analysis of relational data such as weighted bipartite graphs, a portion of the privacy budget of a locally differentially private mechanism is sometimes accidentally "consumed" by the mechanism itself, resulting in a stronger privacy guarantee than we would obtain from simple sequential composition.
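
A small sketch of one symmetric member of the RR$_{3}$ family and the standard matrix-inversion route to unbiased frequency estimates: each true value in {0, 1, 2} is kept with probability p and otherwise flipped uniformly to one of the other two values, and the observed frequencies are de-biased by inverting the 3x3 channel matrix. The paper's optimal parameter choices are not reproduced here.

```python
import numpy as np

def rr3_perturb(values, p, rng):
    """values: array of labels in {0, 1, 2}; keep with prob p, else flip."""
    out = values.copy()
    flip = rng.random(len(values)) > p
    out[flip] = (values[flip] + rng.integers(1, 3, size=flip.sum())) % 3
    return out

def rr3_unbiased_freq(reports, p):
    """Invert the 3x3 channel matrix to de-bias observed frequencies."""
    q = (1 - p) / 2
    P = np.full((3, 3), q) + (p - q) * np.eye(3)   # P[i, j] = Pr(report i | true j)
    obs = np.bincount(reports, minlength=3) / len(reports)
    return np.linalg.solve(P, obs)                 # unbiased estimate of true freqs

rng = np.random.default_rng(5)
true = rng.choice(3, size=100_000, p=[0.6, 0.3, 0.1])
reports = rr3_perturb(true, p=0.7, rng=rng)
print(rr3_unbiased_freq(reports, p=0.7).round(3))  # close to [0.6, 0.3, 0.1]
```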

Ontology-based query answering with existential rules is well understood and implemented for positive queries, in particular conjunctive queries. The situation changes drastically for queries with negation, where there is no agreed-upon semantics or standard implementation. Stratification, as used for Datalog, is not enough for existential rules, since the latter still admit multiple universal models that can differ on negative queries. We therefore propose universal core models as a basis for a meaningful (non-monotonic) semantics for queries with negation. Since cores are hard to compute, we identify syntactic descriptions of queries that can equivalently be answered over other types of models. This leads to fragments of queries with negation that can safely be evaluated by current chase implementations. We establish new techniques to estimate how the core model differs from other universal models, and we incorporate our findings into a new reasoning approach for existential rules with negation.

Corpus-based set expansion (i.e., finding the "complete" set of entities belonging to the same semantic class, based on a given corpus and a tiny set of seeds) is a critical task in knowledge discovery. It may facilitate numerous downstream applications, such as information extraction, taxonomy induction, question answering, and web search. To discover new entities in an expanded set, previous approaches either make one-time entity ranking based on distributional similarity, or resort to iterative pattern-based bootstrapping. The core challenge for these methods is how to deal with noisy context features derived from free-text corpora, which may lead to entity intrusion and semantic drifting. In this study, we propose a novel framework, SetExpan, which tackles this problem with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding the entity set based on denoised context features. Experiments on three datasets show that SetExpan is robust and outperforms previous state-of-the-art methods in terms of mean average precision.
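
The two techniques can be caricatured in a few lines: score context features by their association with the seed entities and keep the strongest ones, then rank candidates by similarity to the seeds over random subsets of those features and aggregate the rankings by mean reciprocal rank. The co-occurrence matrix, scoring functions, and parameters below are simplified stand-ins for the paper's pipeline.

```python
import numpy as np

def expand(M, seeds, n_features=20, n_ensembles=10, topk=3, rng=None):
    """M: (entities x context features) co-occurrence counts; seeds: entity ids."""
    rng = rng or np.random.default_rng(0)
    # (1) Context feature selection: keep features most associated with seeds.
    feat_scores = M[seeds].sum(axis=0)
    selected = np.argsort(feat_scores)[::-1][:n_features]
    # (2) Ranking ensemble: rank candidates by cosine similarity to the seed
    #     centroid on random feature subsets, then average reciprocal ranks.
    rr = np.zeros(M.shape[0])
    for _ in range(n_ensembles):
        feats = rng.choice(selected, size=max(2, len(selected) // 2),
                           replace=False)
        sub = M[:, feats].astype(float)
        sub /= np.linalg.norm(sub, axis=1, keepdims=True) + 1e-9
        sims = sub @ sub[seeds].mean(axis=0)
        order = np.argsort(sims)[::-1]
        ranks = np.empty_like(order)
        ranks[order] = np.arange(1, len(order) + 1)
        rr += 1.0 / ranks
    rr[seeds] = 0                              # don't re-report the seeds
    return np.argsort(rr)[::-1][:topk]

rng = np.random.default_rng(6)
# 30 entities: the first 10 share a context signature (one semantic class).
M = rng.poisson(0.2, size=(30, 50))
M[:10, :15] += rng.poisson(3, size=(10, 15))
print("expanded:", expand(M, seeds=[0, 1, 2], rng=rng))  # likely ids in 3..9
```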

Networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generate substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labels. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPS: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results showed that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber attacks in these complex real-world systems.
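
The DR-score combines discrimination and reconstruction. A common way to obtain the reconstruction term for a GAN, sketched below with PyTorch, is to invert the generator by gradient descent in latent space; the tiny untrained feed-forward networks here are placeholders for MAD-GAN's trained LSTM-based generator and discriminator, and the exact score combination is an assumption, not taken from the paper.

```python
import torch

latent_dim, window = 8, 32                       # illustrative sizes
G = torch.nn.Sequential(torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, window))
D = torch.nn.Sequential(torch.nn.Linear(window, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 1), torch.nn.Sigmoid())

def dr_score(x, lam=0.5, steps=200, lr=0.05):
    """x: (window,) test sample. Higher score = more anomalous."""
    z = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):                       # invert G: find z reconstructing x
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(G(z), x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        rec = torch.nn.functional.mse_loss(G(z), x)  # reconstruction residual
        dis = 1.0 - D(x).squeeze()                   # D near 0 => looks fake
    return (lam * rec + (1 - lam) * dis).item()

x = torch.randn(window)
print("DR-score:", round(dr_score(x), 3))
```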

Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive vis-à-vis popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing the computational burden when tuning the sparsity bound (a parameter critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity, without resorting to relaxations or heuristics in the optimization, or abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint that leverages characteristics of the Mixed Integer Programming framework and a data pre-processing step known as whitening.
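
The whitening step can be illustrated directly: transform the design matrix so that its features are decorrelated with (approximately) unit variance, which mitigates collinearity before subset selection. Below is a ZCA whitening sketch; whether the paper uses ZCA or another whitening variant is an assumption here, and the MIO model itself is beyond this snippet.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    Xc = X - X.mean(axis=0)                       # center the features
    cov = Xc.T @ Xc / (len(Xc) - 1)
    vals, vecs = np.linalg.eigh(cov)              # eigendecomposition of covariance
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W                                 # whitened design matrix

rng = np.random.default_rng(7)
A = rng.normal(size=(500, 3))
X = np.column_stack([A[:, 0], A[:, 0] + 0.1 * A[:, 1], A[:, 2]])  # collinear pair
Xw = zca_whiten(X)
print(np.cov(Xw, rowvar=False).round(2))          # approximately the identity
```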
