斗破苍穹第四季25集免费观看_亚洲国产中文精品在线观看香蕉_YY6080新视觉伦午夜无码_久草性爱中文黄片免播_欧美国产日产网站_夜夜嗨国产一区二区三区网_亚洲AV无码国产精品三区

We here propose a machine learning approach for monitoring particle detectors in real-time. The goal is to assess the compatibility of incoming experimental data with a reference dataset, characterising the data behaviour under normal circumstances, via a likelihood-ratio hypothesis test. The model is based on a modern implementation of kernel methods, nonparametric algorithms that can learn any continuous function given enough data. The resulting approach is efficient and agnostic to the type of anomaly that may be present in the data. Our study demonstrates the effectiveness of this strategy on multivariate data from drift tube chamber muon detectors.

相關內容

核化

關注 1

INFORMS · 得分 · 大數據 · 優化器 · 求逆 ·

2023 年 5 月 2 日

On the selection of optimal subdata for big data regression based on leverage scores

Vasilis Chasiotis,Dimitris Karlis

from arxiv, arXiv admin note: text overlap with arXiv:2305.00218

Regression can be really difficult in case of big datasets, since we have to dealt with huge volumes of data. The demand of computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem relies on the large data size, and so a standard approach is subsampling that aims at obtaining the most informative portion of the big data. In the current paper we consider an approach based on leverages scores, already existing in the current literature. The aforementioned approach proposed in order to select subdata for linear model discrimination. However, we highlight its importance on the selection of data points that are the most informative for estimating unknown parameters. We conclude that the approach based on leverage scores improves existing approaches, providing simulation experiments as well as a real data application.

樣本 · 自適應采樣 · Performance · 跡 · INFORMS ·

2023 年 5 月 1 日

Software Runtime Monitoring with Adaptive Sampling Rate to Collect Representative Samples of Execution Traces

Jhonny Mertz,Ingrid Nunes

from arxiv, in Journal of Systems and Software

Monitoring software systems at runtime is key for understanding workloads, debugging, and self-adaptation. It typically involves collecting and storing observable software data, which can be analyzed online or offline. Despite the usefulness of collecting system data, it may significantly impact the system execution by delaying response times and competing with system resources. The typical approach to cope with this is to filter portions of the system to be monitored and to sample data. Although these approaches are a step towards achieving a desired trade-off between the amount of collected information and the impact on the system performance, they focus on collecting data of a particular type or may capture a sample that does not correspond to the actual system behavior. In response, we propose an adaptive runtime monitoring process to dynamically adapt the sampling rate while monitoring software systems. It includes algorithms with statistical foundations to improve the representativeness of collected samples without compromising the system performance. Our evaluation targets five applications of a widely used benchmark. It shows that the error (RMSE) of the samples collected with our approach is 9-54% lower than the main alternative strategy (sampling rate inversely proportional to the throughput), with 1-6% higher performance impact.

TOOLS · Performer · 標注 · 數據集 · Automator ·

2023 年 5 月 1 日

Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

Chris Hays,Zachary Schutzman,Manish Raghavan,Erin Walk,Philipp Zimmer

from arxiv, 10 pages, 6 figures; updated citation, clarified language

Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near perfect performance for classification on existing datasets, suggesting bot detection is accurate, reliable and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules -- shallow decision trees trained on a small number of features -- achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset's collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications for both transparency in sampling and labeling procedures and potential biases in research using existing bot detection tools for pre-processing.

估計/估計量 · SC · 無偏 · 控制器 · 周期的 ·

2023 年 5 月 1 日

A Design-Based Perspective on Synthetic Control Methods

Lea Bottmer,Guido Imbens,Jann Spiess,Merrill Warnick

Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies in settings with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s) and period(s). We show that the standard SC estimator is generally biased under random assignment. We propose a Modified Unbiased Synthetic Control (MUSC) estimator that guarantees unbiasedness under random assignment and derive its exact, randomization-based, finite-sample variance. We also propose an unbiased estimator for this variance. We document in settings with real data that under random assignment, SC-type estimators can have root mean-squared errors that are substantially lower than that of other common estimators. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. While our results only directly apply in settings where treatment is assigned randomly, we believe that they can complement model-based approaches even for observational studies.

IB · 層 · ForCES · Krylov方法 · 相互獨立的 ·

2023 年 5 月 1 日

Immersed Boundary Double Layer Method

Brittany J. Leathers,Robert D. Guy

The Immersed Boundary (IB) method of Peskin (J. Comput. Phys., 1977) is useful for problems involving fluid-structure interactions or complex geometries. By making use of a regular Cartesian grid that is independent of the geometry, the IB framework yields a robust numerical scheme that can efficiently handle immersed deformable structures. Additionally, the IB method has been adapted to problems with prescribed motion and other PDEs with given boundary data. IB methods for these problems traditionally involve penalty forces which only approximately satisfy boundary conditions, or they are formulated as constraint problems. In the latter approach, one must find the unknown forces by solving an equation that corresponds to a poorly conditioned first-kind integral equation. This operation can require a large number of iterations of a Krylov method, and since a time-dependent problem requires this solve at each time step, this method can be prohibitively inefficient without preconditioning. In this work, we introduce a new, well-conditioned IB formulation for boundary value problems, which we call the Immersed Boundary Double Layer (IBDL) method. We present the method as it applies to Poisson and Helmholtz problems to demonstrate its efficiency over the original constraint method. In this double layer formulation, the equation for the unknown boundary distribution corresponds to a well-conditioned second-kind integral equation that can be solved efficiently with a small number of iterations of a Krylov method. Furthermore, the iteration count is independent of both the mesh size and immersed boundary point spacing. The method converges away from the boundary, and when combined with a local interpolation, it converges in the entire PDE domain. Additionally, while the original constraint method applies only to Dirichlet problems, the IBDL formulation can also be used for Neumann conditions.

極大 · ICALP · Weight · 類別 · FAST ·

2023 年 4 月 28 日

Faster Submodular Maximization for Several Classes of Matroids

Monika Henzinger,Paul Liu,Jan Vondrak,Da Wei Zheng

from arxiv, 38 pages. Abstract shortened for arxiv. To appear in ICALP 2023

The maximization of submodular functions have found widespread application in areas such as machine learning, combinatorial optimization, and economics, where practitioners often wish to enforce various constraints; the matroid constraint has been investigated extensively due to its algorithmic properties and expressive power. Recent progress has focused on fast algorithms for important classes of matroids given in explicit form. Currently, nearly-linear time algorithms only exist for graphic and partition matroids [ICALP '19]. In this work, we develop algorithms for monotone submodular maximization constrained by graphic, transversal matroids, or laminar matroids in time near-linear in the size of their representation. Our algorithms achieve an optimal approximation of $1-1/e-\epsilon$ and both generalize and accelerate the results of Ene and Nguyen [ICALP '19]. In fact, the running time of our algorithm cannot be improved within the fast continuous greedy framework of Badanidiyuru and Vondr\'ak [SODA '14]. To achieve near-linear running time, we make use of dynamic data structures that maintain bases with approximate maximum cardinality and weight under certain element updates. These data structures need to support a weight decrease operation and a novel FREEZE operation that allows the algorithm to freeze elements (i.e. force to be contained) in its basis regardless of future data structure operations. For the laminar matroid, we present a new dynamic data structure using the top tree interface of Alstrup, Holm, de Lichtenberg, and Thorup [TALG '05] that maintains the maximum weight basis under insertions and deletions of elements in $O(\log n)$ time. For the transversal matroid the FREEZE operation corresponds to requiring the data structure to keep a certain set $S$ of vertices matched, a property that we call $S$-stability.

線性的 · DirectShow · 線性組合 · 參數空間 · 設計 ·

2023 年 4 月 28 日

A Method for Finding a Design Space as Linear Combinations of Parameter Ranges for Biopharmaceutical Control Strategies

Thomas Oberleitner,Thomas Zahel,Christoph Herwig

from arxiv, 15 pages, 7 figures, 3 tables, research article

According to ICH Q8 guidelines, the biopharmaceutical manufacturer submits a design space (DS) definition as part of the regulatory approval application, in which case process parameter (PP) deviations within this space are not considered a change and do not trigger a regulatory post approval procedure. A DS can be described by non-linear PP ranges, i.e., the range of one PP conditioned on specific values of another. However, independent PP ranges (linear combinations) are often preferred in biopharmaceutical manufacturing due to their operation simplicity. While some statistical software supports the calculation of a DS comprised of linear combinations, such methods are generally based on discretizing the parameter space - an approach that scales poorly as the number of PPs increases. Here, we introduce a novel method for finding linear PP combinations using a numeric optimizer to calculate the largest design space within the parameter space that results in critical quality attribute (CQA) boundaries within acceptance criteria, predicted by a regression model. A precomputed approximation of tolerance intervals is used in inequality constraints to facilitate fast evaluations of this boundary using a single matrix multiplication. Correctness of the method was validated against different ground truths with known design spaces. Compared to stateof-the-art, grid-based approaches, the optimizer-based procedure is more accurate, generally yields a larger DS and enables the calculation in higher dimensions. Furthermore, a proposed weighting scheme can be used to favor certain PPs over others and therefore enabling a more dynamic approach to DS definition and exploration. The increased PP ranges of the larger DS provide greater operational flexibility for biopharmaceutical manufacturers.

近似 · Learning · 估計/估計量 · MoDELS · 泛函 ·

2023 年 4 月 28 日

Tensor-train methods for sequential state and parameter learning in state-space models

Yiran Zhao,Tiangang Cui

We consider sequential state and parameter learning in state-space models with intractable state transition and observation processes. By exploiting low-rank tensor-train (TT) decompositions, we propose new sequential learning methods for joint parameter and state estimation under the Bayesian framework. Our key innovation is the introduction of scalable function approximation tools such as TT for recursively learning the sequentially updated posterior distributions. The function approximation perspective of our methods offers tractable error analysis and potentially alleviates the particle degeneracy faced by many particle-based methods. In addition to the new insights into algorithmic design, our methods complement conventional particle-based methods. Our TT-based approximations naturally define conditional Knothe--Rosenblatt (KR) rearrangements that lead to filtering, smoothing and path estimation accompanying our sequential learning algorithms, which open the door to removing potential approximation bias. We also explore several preconditioning techniques based on either linear or nonlinear KR rearrangements to enhance the approximation power of TT for practical problems. We demonstrate the efficacy and efficiency of our proposed methods on several state-space models, in which our methods achieve state-of-the-art estimation accuracy and computational performance.

可辨認的 · 估計/估計量 · SimPLe · MoDELS · 包外估計 ·

2023 年 4 月 28 日

Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value

Yongchan Kwon,James Zou

from arxiv, 18 pages, ICML 2023

Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many Shapley-based data valuation methods have shown promising results in various downstream tasks, however, they are well known to be computationally challenging as it requires training a large number of models. As a result, it has been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. The proposed method is computationally efficient and can scale to millions of data by reusing trained weak learners. Specifically, Data-OOB takes less than 2.25 hours on a single CPU processor when there are $10^6$ samples to evaluate and the input dimension is 100. Furthermore, Data-OOB has solid theoretical interpretations in that it identifies the same important data point as the infinitesimal jackknife influence function when two different points are compared. We conduct comprehensive experiments using 12 classification datasets, each with thousands of sample sizes. We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data and finding a set of helpful (or harmful) data points, highlighting the potential for applying data values in real-world applications.

泛化理論 · INFORMS · Performer · 測試樣本 · state-of-the-art ·

2021 年 3 月 29 日

Adaptive Methods for Real-World Domain Generalization

Abhimanyu Dubey,Vignesh Ramanathan,Alex Pentland,Dhruv Mahajan

from arxiv, To appear as an oral presentation in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Invariant approaches have been remarkably successful in tackling the problem of domain generalization, where the objective is to perform inference on data distributions different from those used in training. In our work, we investigate whether it is possible to leverage domain information from the unseen test samples themselves. We propose a domain-adaptive approach consisting of two steps: a) we first learn a discriminative domain embedding from unsupervised training examples, and b) use this domain embedding as supplementary information to build a domain-adaptive model, that takes both the input as well as its domain into account while making predictions. For unseen domains, our method simply uses few unlabelled test examples to construct the domain embedding. This enables adaptive classification on any unseen domain. Our approach achieves state-of-the-art performance on various domain generalization benchmarks. In addition, we introduce the first real-world, large-scale domain generalization benchmark, Geo-YFCC, containing 1.1M samples over 40 training, 7 validation, and 15 test domains, orders of magnitude larger than prior work. We show that the existing approaches either do not scale to this dataset or underperform compared to the simple baseline of training a model on the union of data from all training domains. In contrast, our approach achieves a significant improvement.