Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. At the same time, tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost. The latter can constitute a crucial factor for selecting algorithms for satellite precipitation product correction at the daily and finer time scales, where the size of the datasets is particularly large. Still, information on which tree-based ensemble algorithm to select in such a case for the contiguous United States (US) is missing from the literature. In this work, we conduct an extensive comparison between three tree-based ensemble algorithms, specifically random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost), in the context of interest. We use daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments refer to the entire contiguous US and additionally include the application of the linear regression algorithm for benchmarking purposes. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. They also suggest that IMERG is more useful than PERSIANN in the context investigated.
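
As a rough illustration of the comparison setup described above, the sketch below fits the three tree-based ensembles and the linear benchmark using scikit-learn and xgboost. The synthetic data, feature choices, and hyperparameters are illustrative assumptions, not the paper's actual PERSIANN/IMERG/GHCNd pipeline.

```python
# Minimal sketch: compare tree-based ensemble regressors for merging a
# satellite precipitation estimate with gauge observations. Synthetic
# stand-in data; the paper uses PERSIANN/IMERG grids and GHCNd gauges.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 5000
# Predictors: satellite precipitation estimate plus station coordinates.
sat_precip = rng.gamma(shape=0.8, scale=5.0, size=n)
lon, lat = rng.uniform(-125, -67, n), rng.uniform(25, 49, n)
X = np.column_stack([sat_precip, lon, lat])
# Target: gauge precipitation, modeled as a biased, noisy satellite signal.
y = 0.7 * sat_precip + 0.05 * np.abs(lat - 37) + rng.gamma(0.5, 2.0, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "linear (benchmark)": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gbm": GradientBoostingRegressor(random_state=0),
    "XGBoost": XGBRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name:20s} RMSE: {rmse:.3f}")
```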

Related content

What would the inputs be to a machine whose output is the destabilization of a robust democracy, or whose emanations could disrupt the political power of nations? In the recent essay "The Coming AI Hackers," Schneier (2021) proposed a future application of artificial intelligences to discover, manipulate, and exploit vulnerabilities of social, economic, and political systems at speeds far greater than humans' ability to recognize and respond to such threats. This work advances the concept by applying to it theory from machine learning, hypothesizing some possible "featurization" (input specification and transformation) frameworks for AI hacking. Focusing on the political domain, we develop graph and sequence data representations that would enable the application of a range of deep learning models to predict attributes and outcomes of political, particularly legislative, systems. We explore possible data models, datasets, predictive tasks, and actionable applications associated with each framework. We speculate about the likely practical impact and feasibility of such models, and conclude by discussing their ethical implications.
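
To make the notion of a featurization framework concrete, here is a toy sketch of one possible graph representation of a legislative system. The schema (legislator and bill nodes joined by vote edges) is a hypothetical illustration, not the paper's exact data model.

```python
# Toy graph data model for a legislative system: a bipartite graph of
# legislators and bills connected by vote edges. Hypothetical schema.
from dataclasses import dataclass, field

@dataclass
class LegislativeGraph:
    legislators: dict = field(default_factory=dict)  # id -> attribute dict
    bills: dict = field(default_factory=dict)        # id -> attribute dict
    votes: list = field(default_factory=list)        # (legislator, bill, vote)

    def add_vote(self, legislator_id, bill_id, vote):
        self.votes.append((legislator_id, bill_id, vote))

g = LegislativeGraph()
g.legislators["L1"] = {"party": "A", "seniority": 4}
g.bills["B7"] = {"topic": "energy", "n_cosponsors": 12}
g.add_vote("L1", "B7", +1)

# Such a bipartite graph could be fed to a graph neural network to
# predict edge attributes (votes) or node outcomes (bill passage).
print(len(g.votes), "vote edges")
```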

Attack trees are an important tool in security analysis, and an important part of attack tree analysis is computing metrics. This paper focuses on dynamic attack trees and their min time metric, i.e. the minimal time to attack a system. For general attack trees, calculating min time efficiently is an open problem, with the fastest current method being enumerating all minimal attacks, which is NP-hard. This paper presents three tools for calculating min time. First, we introduce a novel method for general dynamic attack trees based on mixed integer linear programming. Second, we show how the computation can be sped up by identifying the modules of an attack tree, i.e. subtrees connected to the rest of the attack tree via only one node. Finally, we define a general semantics for dynamic attack trees that significantly relaxes the restrictions on attack trees compared to earlier work, allowing us to apply our methods to a wide variety of attack trees. Experiments on both a case study of a server cluster and a synthetic testing set of large attack trees verify that both the integer linear programming approach and modular analysis considerably decrease the computation time of attack time analysis.
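
For intuition, the following sketch computes min time on a tree-shaped dynamic attack tree, where the metric decomposes recursively over OR, AND, and SAND gates. The mixed integer linear programming approach described above is needed for the general case with shared subtrees; the node encoding here is an illustrative assumption.

```python
# Min time on a *tree-shaped* dynamic attack tree (no shared subtrees):
# OR takes the fastest child, AND runs children in parallel (slowest
# child dominates), SAND runs them sequentially (durations add).

def min_time(node):
    kind = node["type"]
    if kind == "BAS":                      # basic attack step
        return node["duration"]
    times = [min_time(c) for c in node["children"]]
    if kind == "OR":
        return min(times)
    if kind == "AND":
        return max(times)
    if kind == "SAND":                     # sequential AND
        return sum(times)
    raise ValueError(f"unknown gate: {kind}")

# Example: break in by either picking a lock (10) or, sequentially,
# stealing a badge (3) and then cloning it (4).
tree = {"type": "OR", "children": [
    {"type": "BAS", "duration": 10},
    {"type": "SAND", "children": [
        {"type": "BAS", "duration": 3},
        {"type": "BAS", "duration": 4},
    ]},
]}
print(min_time(tree))  # 7
```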

Learning energy-based models (EBMs) is known to be difficult especially on discrete data where gradient-based learning strategies cannot be applied directly. Although ratio matching is a sound method to learn discrete EBMs, it suffers from expensive computation and excessive memory requirements, thereby resulting in difficulties in learning EBMs on high-dimensional data. Motivated by these limitations, in this study, we propose ratio matching with gradient-guided importance sampling (RMwGGIS). Particularly, we use the gradient of the energy function w.r.t. the discrete data space to approximately construct the provably optimal proposal distribution, which is subsequently used by importance sampling to efficiently estimate the original ratio matching objective. We perform experiments on density modeling over synthetic discrete data, graph generation, and training Ising models to evaluate our proposed method. The experimental results demonstrate that our method can significantly alleviate the limitations of ratio matching, perform more effectively in practice, and scale to high-dimensional problems. Our implementation is available at //github.com/divelab/RMwGGIS.
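
The following heavily simplified sketch illustrates the gradient-guided proposal idea on binary data, using an assumed Ising-style energy and a first-order Taylor estimate of per-bit energy changes; the exact RMwGGIS objective estimator is omitted.

```python
# Gradient-guided proposal sketch for binary data under an assumed
# Ising-style energy. A first-order Taylor expansion estimates the
# energy change of flipping each bit from one gradient evaluation.
import torch

d = 16
W = torch.randn(d, d); W = (W + W.T) / 2     # symmetric couplings

def energy(x):                                # x in {0,1}^d (float)
    return -(x @ W @ x)

x = torch.bernoulli(torch.full((d,), 0.5)).requires_grad_(True)
(grad,) = torch.autograd.grad(energy(x), x)

# Flipping x_i changes it by (1 - 2*x_i); first-order energy change:
delta_E = (1 - 2 * x.detach()) * grad

# Gradient-guided proposal: prefer dimensions whose flip lowers energy.
proposal = torch.softmax(-delta_E, dim=0)

# Importance-sample a few dimensions instead of summing over all d terms
# of the ratio matching objective; weight each draw by 1/(d * proposal).
idx = torch.multinomial(proposal, num_samples=4, replacement=True)
weights = 1.0 / (d * proposal[idx])
print(idx.tolist(), weights.tolist())
```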

While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, unlike for bit strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another one $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also consider the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings, but also reveals that small implementation details like the rate of void mutations can make an important difference.
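
As a minimal illustration, the sketch below runs a (1+1) EA with a scramble mutation operator on a LeadingOnes-style permutation benchmark; the fitness function and the distribution of scramble sizes are simplifying assumptions.

```python
# (1+1) EA with scramble mutation on a LeadingOnes-style permutation
# benchmark: fitness = length of the sorted prefix perm[i] == i.
import random

def leading_prefix(perm):
    score = 0
    for i, v in enumerate(perm):
        if v != i:
            break
        score += 1
    return score

def scramble(perm, k):
    """Randomly re-permute the values at k randomly chosen positions."""
    child = perm[:]
    pos = random.sample(range(len(perm)), k)
    vals = [child[p] for p in pos]
    random.shuffle(vals)
    for p, v in zip(pos, vals):
        child[p] = v
    return child

n = 30
parent = list(range(n))
random.shuffle(parent)
fitness = leading_prefix(parent)
for _ in range(300_000):
    # Small scramble sizes here; the paper also studies heavy-tailed sizes.
    child = scramble(parent, random.choice([2, 3, 4]))
    f = leading_prefix(child)
    if f >= fitness:                      # elitist (1+1) acceptance
        parent, fitness = child, f
    if fitness == n:
        break
print("final fitness:", fitness, "of", n)
```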

We introduce the Weak-form Estimation of Nonlinear Dynamics (WENDy) method for estimating model parameters for non-linear systems of ODEs. The core mathematical idea involves an efficient conversion of the strong form representation of a model to its weak form, and then solving a regression problem to perform parameter inference. The core statistical idea rests on the Errors-In-Variables framework, which necessitates the use of the iteratively reweighted least squares algorithm. Further improvements are obtained by using orthonormal test functions, created from a set of $C^{\infty}$ bump functions of varying support sizes. We demonstrate that WENDy is a highly robust and efficient method for parameter inference in differential equations. Without relying on any numerical differential equation solvers, WENDy computes accurate estimates and is robust to large (biologically relevant) levels of measurement noise. For low dimensional systems with modest amounts of data, WENDy is competitive with conventional forward solver-based nonlinear least squares methods in terms of speed and accuracy. For both higher dimensional systems and stiff systems, WENDy is typically both faster (often by orders of magnitude) and more accurate than forward solver-based approaches. We illustrate the method and its performance in some common population and neuroscience models, including logistic growth, Lotka-Volterra, FitzHugh-Nagumo, Hindmarsh-Rose, and a Protein Transduction Benchmark model. Software and code for reproducing the examples is available at (//github.com/MathBioCU/WENDy).
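
The following sketch demonstrates the weak-form regression idea for logistic growth, $u' = \theta_1 u - \theta_2 u^2$, which is linear in the parameters. Multiplying by a compactly supported bump test function $\phi$ and integrating by parts gives $-\int \phi' u\,dt = \theta_1 \int \phi u\,dt - \theta_2 \int \phi u^2\,dt$, a linear system for $\theta$ with no numerical ODE solves. The orthonormalization of test functions and the iteratively reweighted least squares step of full WENDy are omitted.

```python
# Weak-form parameter estimation sketch for u' = th1*u - th2*u^2.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 400)
u_true = 10 / (1 + 9 * np.exp(-t))                # exact logistic solution
u = u_true + 0.05 * rng.standard_normal(t.size)   # noisy observations

def bump(t, c, r):
    """C-infinity bump supported on (c-r, c+r), and its derivative."""
    s = (t - c) / r
    inside = np.abs(s) < 1
    phi = np.zeros_like(t)
    phi[inside] = np.exp(1 / (s[inside] ** 2 - 1))
    dphi = np.zeros_like(t)
    dphi[inside] = phi[inside] * (-2 * s[inside] / (s[inside] ** 2 - 1) ** 2) / r
    return phi, dphi

rows, rhs = [], []
for c in np.linspace(1.5, 8.5, 15):               # varying test-function centers
    phi, dphi = bump(t, c, r=1.5)
    rows.append([np.trapz(phi * u, t), -np.trapz(phi * u**2, t)])
    rhs.append(-np.trapz(dphi * u, t))            # integration by parts
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(theta)   # should be close to (1.0, 0.1) for u' = u - 0.1 u^2
```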

Model calibration consists of using experimental or field data to estimate the unknown parameters of a mathematical model. The presence of model discrepancy and measurement bias in the data complicates this task. Satellite interferograms, for instance, are widely used for calibrating geophysical models in geological hazard quantification. In this work, we used satellite interferograms to relate ground deformation observations to the properties of the magma chamber at K\={\i}lauea Volcano in Hawai`i. We derived closed-form marginal likelihoods and implemented posterior sampling procedures that simultaneously estimate the model discrepancy of physical models and the measurement bias from the atmospheric error in satellite interferograms. We found that calibrating by aggregating multiple interferograms and downsampling their pixels can reduce the computational complexity compared to calibration approaches based on multiple data sets. We also study the conditions under which data aggregation and downsampling incur no loss of information. Simulations illustrate that both discrepancy and measurement bias can be estimated, and real applications demonstrate that modeling both effects helps obtain reliable estimates of a physical model's unobserved parameters and enhances its predictive accuracy. We implement the computational tools in the RobustCalibration package available on CRAN.
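
As a generic illustration of calibration with model discrepancy, the sketch below maximizes a closed-form marginal likelihood with a Gaussian-process discrepancy integrated out, in the Kennedy-O'Hagan spirit. It does not use or represent the RobustCalibration API; the data, physical model, and kernel choices are all assumptions.

```python
# Calibration with a GP model discrepancy: maximize the marginal
# likelihood of theta with the discrepancy integrated out analytically.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
theta_true = 2.0
# Field data: physical model + smooth discrepancy + noise.
y = theta_true * x + 0.3 * np.sin(4 * x) + 0.05 * rng.standard_normal(x.size)

def sq_exp(x, ell=0.3, var=0.2):
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

K = sq_exp(x) + 0.05**2 * np.eye(x.size)          # discrepancy GP + noise
Kinv = np.linalg.inv(K)
_, logdet = np.linalg.slogdet(K)

def log_marginal(theta):
    r = y - theta * x                              # residual after physical model
    return -0.5 * (r @ Kinv @ r + logdet + x.size * np.log(2 * np.pi))

grid = np.linspace(0, 4, 401)
print("theta-hat:", grid[np.argmax([log_marginal(th) for th in grid])])
```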

Consider a random vector $\mathbf{y}=\mathbf{\Sigma}^{1/2}\mathbf{x}$, where the $p$ elements of the vector $\mathbf{x}$ are i.i.d. real-valued random variables with zero mean and finite fourth moment, and $\mathbf{\Sigma}^{1/2}$ is a deterministic $p\times p$ matrix such that the spectral norm of the population correlation matrix $\mathbf{R}$ of $\mathbf{y}$ is uniformly bounded. In this paper, we show that the log determinant of the sample correlation matrix $\hat{\mathbf{R}}$, based on a sample of size $n$ from the distribution of $\mathbf{y}$, satisfies a central limit theorem (CLT) for $p/n\to \gamma\in (0, 1]$ and $p\leq n$. Explicit formulas for the asymptotic mean and variance are provided. In case the mean of $\mathbf{y}$ is unknown, we show that after recentering by the empirical mean the CLT holds with a shift in the asymptotic mean. This result is of independent interest both in large-dimensional random matrix theory and in the high-dimensional statistical literature on large sample correlation matrices for non-normal data. Finally, the findings are applied to testing the uncorrelatedness of $p$ random variables. Surprisingly, in the null case $\mathbf{R}=\mathbf{I}$, the test statistic becomes completely pivotal, and extensive simulations show that the CLT appears to hold even when fourth moments do not exist, suggesting a promising and robust test statistic for heavy-tailed high-dimensional data.
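
A quick Monte Carlo sketch can probe the pivotality claim empirically: under the null $\mathbf{R}=\mathbf{I}$, the empirical distribution of $\log\det\hat{\mathbf{R}}$ should look the same for light- and heavy-tailed data. The paper's explicit asymptotic mean and variance formulas are not reproduced here; only empirical summaries are compared.

```python
# Probe pivotality of log det(R-hat) under R = I for light- vs
# heavy-tailed data (t with 3 df has no fourth moment).
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 200, 100, 500

def log_det_corr(sample_fn):
    stats = []
    for _ in range(reps):
        X = sample_fn((n, p))
        stats.append(np.linalg.slogdet(np.corrcoef(X, rowvar=False))[1])
    return np.array(stats)

gauss = log_det_corr(rng.standard_normal)
heavy = log_det_corr(lambda size: rng.standard_t(3, size))
print("gaussian: mean %.3f sd %.3f" % (gauss.mean(), gauss.std()))
print("t(3):     mean %.3f sd %.3f" % (heavy.mean(), heavy.std()))
```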

Variational Inference (VI) is an attractive alternative to Markov Chain Monte Carlo (MCMC) due to its computational efficiency in the case of large datasets and/or complex models with high-dimensional parameters. However, evaluating the accuracy of variational approximations remains a challenge. Existing methods characterize the quality of the whole variational distribution, which is almost always poor in realistic applications, even if specific posterior functionals such as the component-wise means or variances are accurate. Hence, these diagnostics are of practical value only in limited circumstances. To address this issue, we propose the TArgeted Diagnostic for Distribution Approximation Accuracy (TADDAA), which uses many short parallel MCMC chains to obtain lower bounds on the error of each posterior functional of interest. We also develop a reliability check for TADDAA to determine when the lower bounds should not be trusted. Numerical experiments validate the practical utility and computational efficiency of our approach on a range of synthetic distributions and real-data examples, including sparse logistic regression and Bayesian neural network models.
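
The following is a conceptual sketch of the idea on a one-dimensional toy posterior: chains initialized at variational draws drift toward the target, and the observed movement of a functional lower-bounds the approximation error for that functional. The actual TADDAA bounds and reliability check are more refined than this illustration.

```python
# Many short parallel MCMC chains started at VI draws: the drift of the
# posterior mean after a few steps lower-bounds the bias of the VI mean.
import numpy as np

rng = np.random.default_rng(4)
log_post = lambda x: -0.5 * (x - 2.0) ** 2         # target: N(2, 1)

n_chains, n_steps, step = 2000, 10, 1.0
x = rng.standard_normal(n_chains)                  # VI draws: N(0, 1), biased mean
mean_vi = x.mean()

for _ in range(n_steps):                           # parallel random-walk MH
    prop = x + step * rng.standard_normal(n_chains)
    accept = np.log(rng.random(n_chains)) < log_post(prop) - log_post(x)
    x = np.where(accept, prop, x)

# Chains drift toward the true mean (2.0); the observed shift is a
# lower bound on the bias of the variational mean.
print("estimated lower bound on mean error:", abs(x.mean() - mean_vi))
```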

Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each $n$ point sample into a small but provably high-fidelity coreset. For standard kernels and subexponential distributions, CTT inherits the statistical behavior of a quadratic-time test -- recovering the same optimal detection boundary -- while running in near-linear time. We couple these advances with cheaper permutation testing, justified by new power analyses; improved time-vs.-quality guarantees for low-rank approximation; and a fast aggregation procedure for identifying especially discriminating kernels. In our experiments with real and simulated data, CTT and its extensions provide 20--200x speed-ups over state-of-the-art approximate MMD tests with no loss of power.
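
The sketch below illustrates the compress-then-test pattern, with plain random subsampling standing in for the kernel-thinning compression that gives CTT its fidelity guarantees; the kernel bandwidth, sample sizes, and coreset size are arbitrary assumptions.

```python
# Compress each sample to a small subset, then run a permutation MMD
# test on the compressed points. Random subsampling is a crude stand-in
# for CTT's high-fidelity coreset construction.
import numpy as np

rng = np.random.default_rng(5)

def mmd2(X, Y, bw=1.0):
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bw**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

X = rng.standard_normal((4000, 2))                 # sample from P
Y = rng.standard_normal((4000, 2)) + 0.2           # sample from Q (shifted)

m = 128                                            # coreset size
Xc = X[rng.choice(len(X), m, replace=False)]
Yc = Y[rng.choice(len(Y), m, replace=False)]

stat = mmd2(Xc, Yc)
Z = np.vstack([Xc, Yc])
null = []
for _ in range(200):                               # permutation null
    perm = rng.permutation(2 * m)
    null.append(mmd2(Z[perm[:m]], Z[perm[m:]]))
print("p-value:", (np.sum(np.array(null) >= stat) + 1) / 201)
```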

Modeling and simulation approaches that express crowd movement mathematically are widely and actively studied to understand crowd dynamics and prevent crowd accidents. Existing literature on crowd modeling focuses only on the decision-making of walking behavior. However, route choice, a higher-level decision, should also be modeled to construct more practical simulations. Furthermore, evaluations of the reproducibility of crowd simulations that incorporate route choice models against real data are insufficient. Therefore, we propose a generalized crowd simulation framework that includes actual crowd movement measurement, route choice model estimation, and crowd simulator construction. We use a discrete choice model for route choice and the social force model for walking behavior. In experiments, we measure crowd movements during an evacuation drill in a theater and at a fireworks event where tens of thousands of people moved, and show that a crowd simulation incorporating the route choice model reproduces real large-scale crowd movement more accurately than one without it.
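
A minimal sketch of one social force model update step follows: each pedestrian accelerates toward a desired velocity aimed at a goal and is repelled by nearby pedestrians. The force terms and all constants are simplified assumptions; in the framework above, a discrete choice route model would supply the goal waypoint.

```python
# One social force model update: driving force toward a desired velocity
# plus pairwise exponential repulsion between pedestrians.
import numpy as np

rng = np.random.default_rng(6)
n, dt, tau, v0 = 20, 0.1, 0.5, 1.3      # agents, step, relaxation, speed
pos = rng.uniform(0, 10, (n, 2))
vel = np.zeros((n, 2))
goal = np.array([10.0, 5.0])            # chosen route's next waypoint

for _ in range(100):
    to_goal = goal - pos
    desired = v0 * to_goal / np.linalg.norm(to_goal, axis=1, keepdims=True)
    force = (desired - vel) / tau       # relax toward the desired velocity
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1) + np.eye(n)   # avoid self-division
    repulse = (2.0 * np.exp(-dist / 0.3))[..., None] * diff / dist[..., None]
    force += repulse.sum(axis=1)        # push away from nearby agents
    vel += force * dt
    pos += vel * dt
print("mean distance to goal:", np.linalg.norm(goal - pos, axis=1).mean())
```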
