
The objective of this work is to quantify the uncertainty in probability-of-failure estimates resulting from incomplete knowledge of the probability distributions of the input random variables. We propose a framework that couples the widely used Subset simulation (SuS) with Bayesian/information-theoretic multi-model inference. The process starts with data used to infer probability distributions for the model inputs; such data sets are often small. Multi-model inference is used to assess the uncertainty associated with the model form and parameters of these random variables, expressed as model probabilities and the associated joint parameter probability densities. A sampling procedure is used to construct a set of equally probable candidate probability distributions, and an optimal importance sampling distribution is determined analytically from this set. Subset simulation is then performed using this optimal sampling density, and the resulting conditional probabilities are re-weighted using importance sampling. The result of this process is an empirical probability distribution of failure probabilities that directly quantifies the uncertainty in failure probability estimates arising from inference on small data sets. The method is demonstrated to be computationally efficient -- requiring only a single subset simulation plus the nominal cost of sample re-weighting -- and to provide reasonable estimates of the uncertainty in failure probabilities.
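As a rough illustration of the re-weighting step only (not the full SuS coupling described above), the sketch below re-uses failure indicators obtained under a single importance sampling density to estimate the failure probability under each candidate input distribution. All names (`reweighted_failure_probs`, `q_pdf`, `candidate_pdfs`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def reweighted_failure_probs(x, failed, q_pdf, candidate_pdfs):
    """Sketch: x is an (N, d) array of samples drawn from the importance
    density q; failed is an (N,) boolean failure indicator.  Each candidate
    density p_k yields one importance-sampling estimate of P_f."""
    q = q_pdf(x)                                  # density values of the sampling distribution
    estimates = []
    for p_k in candidate_pdfs:                    # one candidate per sampled model/parameter set
        w = p_k(x) / q                            # importance weights p_k / q
        estimates.append(np.mean(failed * w))     # IS estimate of P_f under candidate k
    return np.array(estimates)                    # empirical distribution of failure probabilities
```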

Related content

Given $(a_1, \dots, a_n, t) \in \mathbb{Z}_{\geq 0}^{n + 1}$, the Subset Sum problem ($\mathsf{SSUM}$) is to decide whether there exists $S \subseteq [n]$ such that $\sum_{i \in S} a_i = t$. A close variant of $\mathsf{SSUM}$ is $\mathsf{Subset~Product}$: given positive integers $a_1, \dots, a_n$ and a target integer $t$, decide whether there exists a subset $S \subseteq [n]$ such that $\prod_{i \in S} a_i = t$. A pseudopolynomial-time dynamic programming algorithm, due to Bellman (1957), solves both $\mathsf{SSUM}$ and $\mathsf{Subset~Product}$ in $O(nt)$ time and $O(t)$ space. In the first part, we present {\em search} algorithms for variants of the Subset Sum problem. Our algorithms are parameterized by $k$, a given upper bound on the number of realisable sets (i.e., the number of solutions summing exactly to $t$). We show that $\mathsf{SSUM}$ with a unique solution is already NP-hard under randomized reductions, which makes the regime of algorithms parameterized by $k$ very interesting. Subsequently, we present an $\tilde{O}(k\cdot (n+t))$-time deterministic algorithm that finds the Hamming weights of all the realisable sets of a subset sum instance. We also give a poly$(knt)$-time and $O(\log(knt))$-space deterministic algorithm that finds all the realisable sets of a subset sum instance. In the latter part, we present a simple and elegant randomized $\tilde{O}(n + t)$-time algorithm for $\mathsf{Subset~Product}$, as well as a poly$(nt)$-time and $O(\log^2 (nt))$-space deterministic algorithm for the same problem. We study these problems in the unbounded setting as well. Our algorithms use multivariate FFT, power series, and number-theoretic techniques introduced by Jin and Wu (SOSA'19) and Kane (2010).
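For reference, here is a minimal sketch of the classical Bellman dynamic program cited above, for the decision version of $\mathsf{SSUM}$ in $O(nt)$ time and $O(t)$ space; the paper's parameterized and space-efficient algorithms are not reproduced here.

```python
def subset_sum(a, t):
    """Bellman's dynamic program: O(n*t) time, O(t) space.
    reachable[s] is True iff some subset of the processed items sums to s."""
    reachable = [False] * (t + 1)
    reachable[0] = True                      # the empty subset sums to 0
    for x in a:
        for s in range(t, x - 1, -1):        # iterate downwards so each item is used at most once
            if reachable[s - x]:
                reachable[s] = True
    return reachable[t]
```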

We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a $k$-uniform hypergraph $H$ with $n$ vertices and $m$ hyperedges. A $k$-simplex in $H$ is a subhypergraph on $k+1$ vertices $X$ such that all $k+1$ possible hyperedges among $X$ exist in $H$. The goal is to process a stream of hyperedges of $H$ and compute a good estimate of $T_k(H)$, the number of $k$-simplices in $H$. We design a suite of algorithms for this problem. Under a promise that $T_k(H) \ge T$, our algorithms use at most four passes and together imply a space bound of $O( \epsilon^{-2} \log\delta^{-1} \text{polylog} n \cdot \min\{ m^{1+1/k}/T, m/T^{2/(k+1)} \} )$ for each fixed $k \ge 3$, in order to guarantee an estimate within $(1\pm\epsilon)T_k(H)$ with probability at least $1-\delta$. We also give a simpler $1$-pass algorithm that achieves $O(\epsilon^{-2} \log\delta^{-1} \log n\cdot (m/T) ( \Delta_E + \Delta_V^{1-1/k} ))$ space, where $\Delta_E$ (respectively, $\Delta_V$) denotes the maximum number of $k$-simplices that share a hyperedge (respectively, a vertex). We complement these algorithmic results with space lower bounds of the form $\Omega(\epsilon^{-2})$, $\Omega(m^{1+1/k}/T)$, $\Omega(m/T^{1-1/k})$ and $\Omega(m\Delta_V^{1/k}/T)$ for multi-pass algorithms and $\Omega(m\Delta_E/T)$ for $1$-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.
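The following brute-force sketch simply spells out the definition of $T_k(H)$ used above; it is exponential in the number of vertices and serves only to make the counting target concrete, not to reflect the paper's streaming algorithms.

```python
from itertools import combinations

def count_k_simplices(hyperedges, k):
    """Brute-force T_k(H) for a k-uniform hypergraph: count (k+1)-vertex sets X
    whose k+1 possible k-element subsets all appear as hyperedges."""
    E = {frozenset(e) for e in hyperedges}
    vertices = sorted(set().union(*E)) if E else []
    count = 0
    for X in combinations(vertices, k + 1):
        if all(frozenset(f) in E for f in combinations(X, k)):
            count += 1
    return count
```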

A main difference between the two dominant schools of thought in statistics, namely the Bayesian and the classical/frequentist, is that the former is grounded in the mathematically rigorous theory of probability while the latter is not. In this paper, I show that the latter is grounded in a different but equally mathematically rigorous theory of imprecise probability. Specifically, I show that for every suitable testing or confidence procedure with error rate control guarantees, there exists a consonant plausibility function whose derived testing or confidence procedure is no less efficient. Beyond its foundational implications, this characterization has at least two important practical consequences: first, it simplifies the interpretation of p-values and confidence regions, thus creating opportunities for improved education and scientific communication; second, the constructive proof of the main results leads to a strategy for new and improved methods in challenging inference problems.

We explore the efficient estimation of statistical quantities, particularly rare event probabilities, for stochastic reaction networks. We propose a novel importance sampling (IS) approach to improve the efficiency of Monte Carlo (MC) estimators based on an approximate tau-leap scheme. In the IS framework, it is crucial to choose an appropriate change of probability measure to achieve substantial variance reduction. Based on an original connection between finding the optimal IS parameters within a class of probability measures and a stochastic optimal control (SOC) formulation, we propose an automated approach to obtain a highly efficient path-dependent measure change. The optimal IS parameters are obtained by solving a variance minimization problem, and we derive an associated backward equation solved by these optimal parameters. Given the challenge of solving this backward equation analytically, we propose a numerical dynamic programming algorithm to approximate the optimal control parameters. To mitigate the curse of dimensionality that arises when solving the backward equation in the multi-dimensional case, we propose a learning-based method that approximates the value function using a neural network, the parameters of which are determined via stochastic optimization. Our numerical experiments show that our learning-based IS approach substantially reduces the variance of the MC estimator. Moreover, when applying the numerical dynamic programming approach in the one-dimensional case, we obtain a variance that decays at a rate of $\mathcal{O}(\Delta t)$ for a step size of $\Delta t$, compared to $\mathcal{O}(1)$ for a standard MC estimator. For a given prescribed error tolerance, $\text{TOL}$, this implies an improvement in the computational complexity from $\mathcal{O}(\text{TOL}^{-3})$ for a standard MC estimator to $\mathcal{O}(\text{TOL}^{-2})$.
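For context, here is a minimal sketch of the plain explicit tau-leap scheme underlying the MC estimator; the paper's path-dependent importance-sampling change of measure and the learned control parameters are not shown, and all names are illustrative assumptions.

```python
import numpy as np

def tau_leap(x0, stoich, propensity, dt, n_steps, rng=np.random.default_rng()):
    """Baseline explicit tau-leap for a stochastic reaction network.
    stoich: (n_reactions, n_species) stoichiometry matrix;
    propensity(x) -> (n_reactions,) array of reaction propensities."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        a = np.maximum(propensity(x), 0.0)      # propensities at the current state
        k = rng.poisson(a * dt)                 # Poisson number of firings per reaction in [t, t+dt)
        x = np.maximum(x + k @ stoich, 0.0)     # update species counts; clip to stay non-negative
    return x
```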

We develop a new method to estimate an ARMA model in the presence of big time series data. Using the concept of a rolling average, we develop a new efficient algorithm, called Rollage, to estimate the order of an AR model and subsequently fit the model. When used in conjunction with an existing methodology, specifically Durbin's algorithm, we show that our proposed method can be used as a criterion to optimally fit ARMA models. Empirical results on large-scale synthetic time series data support the theoretical results and reveal the efficacy of this new approach, especially when compared to existing methodology.
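Durbin's approach to ARMA estimation builds on first fitting an autoregression. As a hedged illustration of that AR-fitting ingredient only (not of Rollage itself, whose rolling-average order-selection step is not reproduced here), the sketch below implements the standard Levinson-Durbin recursion from sample autocovariances.

```python
import numpy as np

def levinson_durbin(r, p):
    """Levinson-Durbin recursion: AR(p) coefficients and innovation variance
    from the first p+1 autocovariances r[0..p]."""
    phi = np.zeros(p + 1)
    prev = np.zeros(p + 1)
    sigma2 = r[0]                                 # innovation variance of the order-0 model
    for k in range(1, p + 1):
        acc = r[k] - sum(prev[j] * r[k - j] for j in range(1, k))
        kappa = acc / sigma2                      # reflection (partial autocorrelation) coefficient
        phi[k] = kappa
        for j in range(1, k):
            phi[j] = prev[j] - kappa * prev[k - j]
        sigma2 *= (1.0 - kappa ** 2)              # update innovation variance
        prev = phi.copy()
    return phi[1:], sigma2                        # AR coefficients phi_1..phi_p and sigma^2
```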

We develop a probabilistic framework for analysing model-based reinforcement learning in the episodic setting. We then apply it to study finite-time-horizon stochastic control problems with linear dynamics but unknown coefficients and a convex, but possibly irregular, objective function. Using probabilistic representations, we study the regularity of the associated cost functions and establish precise estimates for the performance gap between applying the optimal feedback control derived from estimated model parameters and that derived from the true ones. We identify conditions under which this performance gap is quadratic, improving the linear performance gap in recent work [X. Guo, A. Hu, and Y. Zhang, arXiv preprint, arXiv:2104.09311, (2021)] and matching the results obtained for stochastic linear-quadratic problems. Next, we propose a phase-based learning algorithm, for which we show how to optimise the exploration-exploitation trade-off and achieve sublinear regret in high probability and in expectation. When the assumptions needed for the quadratic performance gap hold, the algorithm achieves an order $\mathcal{O}(\sqrt{N} \ln N)$ high-probability regret in the general case, and an order $\mathcal{O}((\ln N)^2)$ expected regret in the self-exploration case, over $N$ episodes, matching the best possible results from the literature. The analysis requires novel concentration inequalities for correlated continuous-time observations, which we derive.

The framework of model-X knockoffs provides a flexible tool for exact finite-sample false discovery rate (FDR) control in variable selection. It also completely bypasses the use of conventional p-values, making it especially appealing in high-dimensional nonlinear models. Existing works have focused on the setting of independent and identically distributed observations, yet time series data is prevalent in practical applications. This motivates the study of model-X knockoffs inference for time series data. In this paper, we make an initial attempt to establish the theoretical and methodological foundation for model-X knockoffs inference with time series data. We suggest the method of time series knockoffs inference (TSKI), which exploits the idea of subsampling to alleviate the difficulty caused by serial dependence. We establish sufficient conditions under which the original model-X knockoffs inference combined with subsampling still achieves asymptotic FDR control. Our technical analysis reveals the exact effect of serial dependence on the FDR control. To alleviate the practical concern about power loss due to the reduced sample size caused by subsampling, we exploit the idea of knockoffs with copies and multiple knockoffs. Under fairly general time series model settings, we show that the FDR remains asymptotically controlled. To theoretically justify the power of TSKI, we further suggest a new knockoff statistic, the backward elimination ranking (BE) statistic, and show that it enjoys both the sure screening property and controlled FDR in the linear time series model setting. The theoretical results and appealing finite-sample performance of the suggested TSKI method coupled with the BE statistic are illustrated with several simulation examples and an economic inflation forecasting application.
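For background, the sketch below shows the standard model-X knockoff+ selection step applied to generic knockoff statistics $W_j$ at a target FDR level $q$; TSKI's specific components (subsampling, knockoff copies, and the BE statistic used to construct $W$) are not shown.

```python
import numpy as np

def knockoff_select(W, q):
    """Standard knockoff+ threshold: pick the smallest t such that the
    estimated false discovery proportion (1 + #{W_j <= -t}) / #{W_j >= t}
    is at most q, then select variables with W_j >= t."""
    for t in np.sort(np.abs(W[W != 0])):          # candidate thresholds
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.where(W >= t)[0]            # indices of selected variables
    return np.array([], dtype=int)                # no threshold achieves the target level
```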

We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise, e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nystr\"{o}m method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate the high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.
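For orientation, here is the classical Nyström construction that the paper generalizes: a rank-at-most-$s$ approximation built from $O(ns)$ exact similarity evaluations. This baseline implicitly assumes the matrix is (close to) PSD; the paper's indefinite-aware variant is not reproduced, and the function names are illustrative.

```python
import numpy as np

def nystrom_approx(similarity, data, s, rng=np.random.default_rng()):
    """Classical Nystrom sketch of an n x n similarity matrix using s landmark
    points: only the n x s block against the landmarks is computed exactly."""
    n = len(data)
    idx = rng.choice(n, size=s, replace=False)                            # landmark indices
    C = np.array([[similarity(x, data[j]) for j in idx] for x in data])   # n x s exact block
    W = C[idx]                                                            # s x s landmark-landmark block
    return C @ np.linalg.pinv(W) @ C.T                                    # rank-<= s approximation
```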

Implicit probabilistic models are models defined naturally in terms of a sampling procedure; they often induce a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but that can be shown to be equivalent to maximizing the likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.

Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretisation error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, i.e., when many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretisation error. To get around this, we propose the stochastic CIR process, which removes all discretisation error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
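As background on why a CIR-based proposal can avoid discretisation error: the Cox-Ingersoll-Ross diffusion has a known transition law (a scaled noncentral chi-square), so it can be simulated exactly over any step size. The sketch below shows only that exact transition step; the stochastic-gradient ingredients of the proposed stochastic CIR process are not shown.

```python
import numpy as np

def cir_exact_step(x, kappa, theta, sigma, dt, rng=np.random.default_rng()):
    """Exact one-step transition of the CIR process
    dX = kappa*(theta - X) dt + sigma*sqrt(X) dW, sampled via its
    scaled noncentral chi-square transition distribution."""
    c = sigma**2 * (1.0 - np.exp(-kappa * dt)) / (4.0 * kappa)   # scale factor
    df = 4.0 * kappa * theta / sigma**2                          # degrees of freedom
    nonc = x * np.exp(-kappa * dt) / c                           # noncentrality parameter
    return c * rng.noncentral_chisquare(df, nonc)
```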
