
We introduce a new statistical test based on the observed spacings of ordered data. The statistic is sensitive to non-uniformity in random samples and to short-lived features in event time series. Under some conditions, this new test can outperform existing ones, such as the well-known Kolmogorov-Smirnov or Anderson-Darling tests, in particular when the number of samples is small and differences occur over a small quantile of the null hypothesis distribution. A detailed description of the test statistic is provided, including a discussion of the parameterization of its distribution via asymptotic bootstrapping, as well as a novel per-quantile error estimation of the empirical distribution. Two example applications are provided: using the test to boost the sensitivity in generic "bump hunting", and employing it to detect supernovae. The article concludes with an extended performance comparison to other established goodness-of-fit tests.
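
The abstract does not spell out the statistic itself, but its raw ingredient is simple: the spacings of an ordered sample transformed through the null CDF. The Python sketch below (the `spacings` helper and the clumped-sample example are illustrative, not the paper's construction) shows how a short-lived excess of events appears as unusually small spacings, while a Kolmogorov-Smirnov test may barely react:

```python
import numpy as np
from scipy import stats

def spacings(sample, cdf=stats.uniform.cdf):
    """Order the sample, map it through the null CDF, and return the gaps
    (spacings) between consecutive values, including the boundary gaps."""
    u = np.sort(cdf(np.asarray(sample)))
    return np.diff(np.concatenate(([0.0], u, [1.0])))

rng = np.random.default_rng(0)
x = rng.uniform(size=50)
x[:5] = 0.42 + 0.002 * rng.standard_normal(5)   # a narrow "clump" of events

s = spacings(x)
print("smallest spacings:", np.sort(s)[:5])      # a local excess shows up as tiny gaps
print("KS test p-value:  ", stats.kstest(x, "uniform").pvalue)
```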

Related content

Fitting a local polynomial model to a noisy sequence of uniformly sampled observations or measurements (i.e. regressing) by minimizing the sum of weighted squared errors (i.e. residuals) may be used to design digital filters for a diverse range of signal-analysis problems, such as detection, classification and tracking, in biomedical, financial, and aerospace applications, for instance. Furthermore, the recursive realization of such filters, using a network of so-called leaky integrators, yields simple digital components with a low computational complexity and an infinite impulse response (IIR) that are ideal in embedded online sensing systems with high data rates. Target tracking, pulse-edge detection, peak detection and anomaly/change detection are considered in this tutorial as illustrative examples. Erlang-weighted polynomial regression provides a design framework within which the various design trade-offs of state estimators (e.g. bias errors vs. random errors) and IIR smoothers (e.g. frequency isolation vs. time localization) may be intuitively balanced. Erlang weights are configured using a smoothing parameter, which determines the decay rate of the exponential tail, and a shape parameter, which may be used to discount more recent data so that a greater relative emphasis is placed on a past time interval. In Morrison's 1969 treatise on sequential smoothing and prediction, the exponential weight (i.e. the zero shape-parameter case) and the Laguerre polynomials that are orthogonal with respect to this weight are described in detail; however, more general Erlang weights and the resulting associated Laguerre polynomials are not considered there, nor have they been covered in detail elsewhere since. Thus, one of the purposes of this tutorial is to explain how Erlang weights may be used to shape and improve the response of recursive regression filters.
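
As a concrete illustration of the recursive realization mentioned above, the sketch below cascades identical first-order leaky integrators. A single stage corresponds to the exponential (zero shape-parameter) weight; cascading stages yields a rising-then-decaying impulse response, the discrete analogue of an Erlang weight, so the most recent samples are discounted relative to a past interval. The function name and the parameters `p` (smoothing) and `stages` (shape-like) are illustrative choices, not the tutorial's notation, and the polynomial-regression part of the design is omitted:

```python
import numpy as np

def leaky_integrator_cascade(x, p=0.9, stages=3):
    """Pass x through a cascade of identical leaky integrators
    y[n] = p*y[n-1] + (1-p)*x[n]."""
    y = np.asarray(x, dtype=float)
    for _ in range(stages):
        out = np.empty_like(y)
        acc = 0.0
        for n, v in enumerate(y):
            acc = p * acc + (1.0 - p) * v   # one first-order IIR section
            out[n] = acc
        y = out
    return y

# Smooth a noisy step: the cascade trades lag (bias error) for noise suppression.
t = np.arange(200)
signal = (t >= 100).astype(float)
noisy = signal + 0.2 * np.random.default_rng(1).standard_normal(t.size)
smoothed = leaky_integrator_cascade(noisy, p=0.9, stages=3)
```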

We present a novel mathematical framework for computing the number of maintenance cycles in a system with critical and non-critical components, where "critical" (CR) means that the component's failure is fatal for the system's operation and renders any further repairs impossible, whereas "non-critical" (NC) means that the component can undergo corrective maintenance (replacement or minimal repair) whenever it fails, provided that the CR component is still in operation. Whenever the NC component fails, the CR component can optionally be preventively replaced. We extend traditional renewal theory (whether classical or generalized) for various maintenance scenarios for a system composed of one CR and one NC component in order to compute the average number of renewals of NC under the restriction ("bound") necessitated by CR. We also develop closed-form approximations for the proposed "bounded" renewal functions. We validate our formulas by simulations on a variety of component lifetime distributions, including actual lifetime distributions of wind turbine components.
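
A minimal Monte Carlo sketch of the simplest scenario (NC replacement only, no preventive replacement of CR) illustrates the quantity being computed; the Weibull/exponential lifetimes below are purely illustrative, not the distributions used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def bounded_renewals(cr_sampler, nc_sampler, n_runs=100_000):
    """Monte Carlo estimate of the expected number of NC corrective
    replacements before the CR component's (fatal) failure."""
    counts = np.empty(n_runs)
    for i in range(n_runs):
        cr_life = cr_sampler()
        t, n = 0.0, 0
        while True:
            t += nc_sampler()          # next NC failure
            if t >= cr_life:           # CR already failed: no more repairs
                break
            n += 1                     # NC replaced (one renewal)
        counts[i] = n
    return counts.mean()

# Illustrative lifetimes: Weibull-distributed CR, exponentially distributed NC.
est = bounded_renewals(lambda: rng.weibull(2.0) * 10.0,
                       lambda: rng.exponential(2.0))
print(f"estimated bounded renewal count: {est:.3f}")
```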

Expected Shortfall (ES), also known as superquantile or Conditional Value-at-Risk, has been recognized as an important measure in risk analysis and stochastic optimization, and is also finding applications beyond these areas. In finance, it refers to the conditional expected return of an asset given that the return is below some quantile of its distribution. In this paper, we consider a recently proposed joint regression framework that simultaneously models the quantile and the ES of a response variable given a set of covariates, for which the state-of-the-art approach is based on minimizing a joint loss function that is non-differentiable and non-convex. This inevitably raises numerical challenges and limits its applicability for analyzing large-scale data. Motivated by the idea of using Neyman-orthogonal scores to reduce sensitivity with respect to nuisance parameters, we propose a statistically robust (to highly skewed and heavy-tailed data) and computationally efficient two-step procedure for fitting joint quantile and ES regression models. With increasing covariate dimensions, we establish explicit non-asymptotic bounds on estimation and Gaussian approximation errors, which lay the foundation for statistical inference. Finally, we demonstrate through numerical experiments and two data applications that our approach strikes a good balance among robustness, statistical efficiency, and numerical efficiency for expected shortfall regression.
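
For readers unfamiliar with the risk measure itself, the following sketch computes the empirical lower-tail VaR and ES of a return sample; it illustrates only the unconditional definition, not the joint quantile/ES regression procedure proposed in the paper:

```python
import numpy as np

def expected_shortfall(returns, alpha=0.05):
    """Empirical ES at level alpha: the average return conditional on the
    return falling at or below the alpha-quantile of the sample."""
    r = np.asarray(returns, dtype=float)
    q = np.quantile(r, alpha)          # Value-at-Risk (lower alpha-quantile)
    tail = r[r <= q]
    return q, tail.mean()

rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=10_000) * 0.01   # heavy-tailed daily returns
var_5, es_5 = expected_shortfall(returns, alpha=0.05)
print(f"5% VaR: {var_5:.4f}   5% ES: {es_5:.4f}")     # the ES lies below the VaR
```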

Continuous-time Markov chains are used to model stochastic systems where transitions can occur at irregular times, e.g., birth-death processes, chemical reaction networks, population dynamics, and gene regulatory networks. We develop a method to learn a continuous-time Markov chain's transition rate functions from fully observed time series. In contrast with existing methods, our method allows for transition rates to depend nonlinearly on both state variables and external covariates. The Gillespie algorithm is used to generate trajectories of stochastic systems where propensity functions (reaction rates) are known. Our method can be viewed as the inverse: given trajectories of a stochastic reaction network, we generate estimates of the propensity functions. While previous methods used linear or log-linear methods to link transition rates to covariates, we use neural networks, increasing the capacity and potential accuracy of learned models. In the chemical context, this enables the method to learn propensity functions from non-mass-action kinetics. We test our method with synthetic data generated from a variety of systems with known transition rates. We show that our method learns these transition rates with considerably more accuracy than log-linear methods, in terms of mean absolute error between ground truth and predicted transition rates. We also demonstrate an application of our methods to open-loop control of a continuous-time Markov chain.
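
The forward direction referenced above, simulating trajectories with the Gillespie algorithm when the propensity functions are known, can be sketched in a few lines; the birth-death example and function signature below are illustrative, and the paper's contribution (learning the propensities with neural networks from such trajectories) is not reproduced here:

```python
import numpy as np

def gillespie(x0, stoichiometry, propensity_fn, t_max, rng):
    """Simulate one trajectory of a continuous-time Markov chain with the
    Gillespie (stochastic simulation) algorithm.  propensity_fn maps the
    current state to a vector of nonnegative transition rates."""
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while t < t_max:
        rates = propensity_fn(x)
        total = rates.sum()
        if total <= 0:
            break                                    # absorbing state
        t += rng.exponential(1.0 / total)            # waiting time to next event
        j = rng.choice(len(rates), p=rates / total)  # which transition fires
        x = x + stoichiometry[j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Illustrative birth-death process: birth at rate 2.0, death at rate 0.1*x.
rng = np.random.default_rng(0)
stoich = np.array([[+1], [-1]])
prop = lambda x: np.array([2.0, 0.1 * x[0]])
times, states = gillespie([10], stoich, prop, t_max=100.0, rng=rng)
```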

In the classical setting of self-selection, the goal is to learn $k$ models simultaneously from observations $(x^{(i)}, y^{(i)})$ where $y^{(i)}$ is the output of one of $k$ underlying models on input $x^{(i)}$. In contrast to mixture models, where we observe the output of a randomly selected model, here the observed model depends on the outputs themselves, and is determined by some known selection criterion. For example, we might observe the highest output, the smallest output, or the median output of the $k$ models. In known-index self-selection, the identity of the observed model output is observable; in unknown-index self-selection, it is not. Self-selection has a long history in Econometrics and applications in various theoretical and applied fields, including treatment effect estimation, imitation learning, learning from strategically reported data, and learning from markets at disequilibrium. In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem where the models are linear. In the known-index case, we require poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate quite general selection criteria. In the more challenging unknown-index case, even the identifiability of the linear models (from infinitely many samples) was not known. We show three results in this case for the commonly studied $\max$ self-selection criterion: (1) we show that the linear models are indeed identifiable, (2) for general $k$ we provide an algorithm with poly$(d) \exp(\text{poly}(k))$ sample and time complexity to estimate the regression parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and time complexity.
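
The observation model is easy to state concretely. The sketch below generates data under the $\max$ self-selection criterion for $k$ hypothetical linear models, in both the known-index and unknown-index variants; it only illustrates the setting, not the estimation algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 3, 1000                            # illustrative dimensions

W = rng.standard_normal((k, d))                 # unknown linear models w_1, ..., w_k
X = rng.standard_normal((n, d))
outputs = X @ W.T + 0.1 * rng.standard_normal((n, k))

idx = outputs.argmax(axis=1)                    # max self-selection criterion
y = outputs[np.arange(n), idx]                  # only the largest output is observed

known_index_data = (X, y, idx)                  # known-index: the selected index is observed
unknown_index_data = (X, y)                     # unknown-index: the harder setting
```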

According to the public goods game (PGG) protocol, participants decide freely whether they want to contribute to a common pool or not, but the resulting benefit is distributed equally. A conceptually similar dilemma emerges when participants decide whether to claim a common resource while the related cost is covered equally by all group members. The latter establishes a reversed form of the original public goods game (R-PGG). In this work, we show that R-PGG is equivalent to PGG in several circumstances, from the traditional analysis, via the evolutionary approach in unstructured populations, to Monte Carlo simulations in structured populations. However, there are also cases where the behavior of R-PGG differs, sometimes surprisingly, from the outcome of PGG. When the key parameters are heterogeneous, for instance, the results of PGG and R-PGG can diverge even if we apply the same amplitudes of heterogeneity. We find that heterogeneity in R-PGG generally impedes cooperation, while the opposite is observed for PGG. These diverse system reactions can be understood by following how the payoff functions change when heterogeneity is introduced into the parameter space. This analysis also reveals the distinct roles of the cooperator and defector strategies in the two games. Our observations will hopefully stimulate further research into the potential differences between PGG and R-PGG under more complex conditions.
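
To make the mirror structure concrete, the sketch below computes group payoffs for the standard PGG and for one plausible R-PGG parameterization (each defector claims an amount whose multiplied cost is shared equally); the R-PGG payoff form is an assumption for illustration, since the abstract does not give the exact payoff functions:

```python
def pgg_payoffs(n_coop, group_size, c=1.0, r=3.0):
    """Standard PGG: each of the n_coop cooperators contributes c; the pool
    is multiplied by r and shared equally by all group members."""
    share = r * c * n_coop / group_size
    return share - c, share                       # (cooperator, defector)

def rpgg_payoffs(n_def, group_size, c=1.0, r=3.0):
    """Assumed R-PGG form: each of the n_def defectors claims c from the
    common resource; the resulting cost r*c*n_def is covered equally."""
    cost_share = r * c * n_def / group_size
    return -cost_share, c - cost_share            # (cooperator, defector)

# Same configuration (2 cooperators, 3 defectors in a group of 5): under this
# homogeneous parameterization the defector-cooperator payoff gap is c in both games.
print(pgg_payoffs(n_coop=2, group_size=5))
print(rpgg_payoffs(n_def=3, group_size=5))
```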

Polynomials are common algebraic structures that are often used to approximate functions, including probability distributions. This paper proposes to directly define polynomial distributions in order to describe the stochastic properties of systems, rather than assuming polynomials only as approximations of known or empirically estimated distributions. Polynomial distributions offer great modeling flexibility and, often, mathematical tractability. However, unlike canonical distributions, polynomial functions may take negative values on the interval of support for some parameter values, the number of their parameters is usually much larger than for canonical distributions, and the interval of support must be finite. In particular, polynomial distributions are defined here assuming three forms of polynomial function. The transformation of polynomial distributions and the fitting of a histogram with a polynomial distribution are considered. The key properties of polynomial distributions are derived in closed form. A piecewise polynomial distribution construction is devised to ensure that the density is non-negative over the support interval. Finally, the problems of estimating the parameters of polynomial distributions and generating polynomially distributed samples are also studied.
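
A minimal sketch of the basic workflow, with illustrative coefficients rather than any of the paper's three polynomial forms, checks non-negativity on a finite support, normalizes the polynomial into a density, and draws samples by rejection:

```python
import numpy as np

# A candidate polynomial density p(x) = c0 + c1*x + c2*x^2 on the finite
# support [0, 1] (coefficients are illustrative, not taken from the paper).
coeffs = np.array([0.2, 0.1, 2.0])        # ascending powers
support = (0.0, 1.0)

xs = np.linspace(*support, 1001)
vals = np.polynomial.polynomial.polyval(xs, coeffs)
assert (vals >= 0).all(), "polynomial must be non-negative on the support"

# Normalize so the density integrates to one over the support.
antideriv = np.polynomial.polynomial.Polynomial(coeffs).integ()
norm = antideriv(support[1]) - antideriv(support[0])
pdf = vals / norm

# Rejection sampling of polynomially distributed values.
rng = np.random.default_rng(0)
m = pdf.max()
samples = []
while len(samples) < 10_000:
    x = rng.uniform(*support)
    if rng.uniform(0, m) <= np.polynomial.polynomial.polyval(x, coeffs) / norm:
        samples.append(x)
samples = np.array(samples)
```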

When changes are performed on an automated production system (aPS), new faults can be accidentally introduced into the system; these faults are called regressions. A common method for finding them is regression testing. In most cases, this regression testing process is performed under high time pressure and on-site in a rather uncomfortable environment. Until now, there has been no automated support for finding and prioritizing system test cases for the fully integrated aPS that are suitable for finding regressions. The testing technician therefore has to rely on personal intuition and experience, possibly choosing an inappropriate order of test cases and finding regressions only at a very late stage of the test run. With a suitable prioritization, this iterative process of finding and fixing regressions can be streamlined, and considerable time can be saved by executing test cases that are likely to identify new regressions earlier. This paper therefore presents an approach that uses previously acquired runtime data from past test executions and performs change identification and impact analysis to prioritize test cases that have a high probability of unveiling regressions caused by side effects of a system change. The approach was developed in cooperation with reputable industrial partners active in the field of aPS engineering, ensuring a development in line with industrial requirements. An industrial case study and an expert evaluation were performed, showing promising results.

We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists of splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that any seeding algorithm can be used when clustering the individual subsets. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically lower costs. In particular, our method of enhancing $k$-means++ seeding proves superior in both effectiveness and speed to the popular "greedy" $k$-means++ variant. Our implementation is publicly available at //github.com/carlobaldassi/KMeansPNNSmoothing.jl.
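
A rough sketch of the procedure as described in the abstract, using scikit-learn's k-means++ as the per-subset seeder and the usual size-weighted SSE-increase criterion for the PNN merge (the merge cost and helper names are assumptions, not taken from the reference implementation linked above):

```python
import numpy as np
from sklearn.cluster import KMeans

def pnn_merge(centroids, weights, k):
    """Merge centroids pairwise-nearest-neighbor style until only k remain:
    at each step, merge the pair with the smallest increase in SSE."""
    C, w = list(map(np.array, centroids)), list(map(float, weights))
    while len(C) > k:
        best, bi, bj = np.inf, -1, -1
        for i in range(len(C)):
            for j in range(i + 1, len(C)):
                cost = w[i] * w[j] / (w[i] + w[j]) * np.sum((C[i] - C[j]) ** 2)
                if cost < best:
                    best, bi, bj = cost, i, j
        merged = (w[bi] * C[bi] + w[bj] * C[bj]) / (w[bi] + w[bj])
        wm = w[bi] + w[bj]
        for idx in sorted((bi, bj), reverse=True):
            del C[idx]; del w[idx]
        C.append(merged); w.append(wm)
    return np.array(C)

def pnn_smoothing_seed(X, k, J=5, rng=np.random.default_rng(0)):
    """Cluster J random subsets (here with k-means++), then merge the J*k
    resulting centroids down to k with PNN; the result seeds k-means."""
    idx = rng.permutation(len(X))
    cents, wts = [], []
    for part in np.array_split(idx, J):
        km = KMeans(n_clusters=k, init="k-means++", n_init=1).fit(X[part])
        cents.extend(km.cluster_centers_)
        wts.extend(np.bincount(km.labels_, minlength=k))
    return pnn_merge(cents, wts, k)

X = np.random.default_rng(1).standard_normal((2000, 2))
seeds = pnn_smoothing_seed(X, k=10, J=5)
final = KMeans(n_clusters=10, init=seeds, n_init=1).fit(X)
```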

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
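
The weighting scheme itself is a one-liner. The sketch below turns per-class sample counts into class-balanced weights via the effective number $(1-\beta^{n})/(1-\beta)$; normalizing the weights to sum to the number of classes is a common convention rather than something fixed by the formula:

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Per-class weights proportional to 1 / effective number of samples,
    E_n = (1 - beta**n) / (1 - beta), normalized to sum to the number of
    classes (a common convention; other normalizations are possible)."""
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(n)

# Long-tailed example: the head class has 5000 samples, the tail class 10.
counts = [5000, 2000, 500, 100, 10]
print(class_balanced_weights(counts, beta=0.999))
```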
