
In this work, we study a random orthogonal projection based least squares estimator for the stable solution of a multivariate nonparametric regression (MNPR) problem. More precisely, given an integer $d\geq 1$ corresponding to the dimension of the MNPR problem, a positive integer $N\geq 1$ and a real parameter $\alpha\geq -\frac{1}{2},$ we show that the functions from a fairly large class of $d$-variate regression functions are well and stably approximated by their random projections over the orthonormal set of tensor product $d$-variate Jacobi polynomials with parameters $(\alpha,\alpha).$ The associated univariate Jacobi polynomials have degree at most $N$ and their tensor products are orthonormal over $\mathcal U=[0,1]^d,$ with respect to the associated multivariate Jacobi weight. In particular, if we consider $n$ random sampling points $\mathbf X_i$ following the $d$-variate Beta distribution with parameters $(\alpha+1,\alpha+1),$ then we give a relation involving $n, N, \alpha$ that ensures that the resulting $(N+1)^d\times (N+1)^d$ random projection matrix is well conditioned. Moreover, we provide bounds on the integrated squared error as well as the $L^2$-risk of this estimator. Precise estimates of these errors are given in the case where the regression function belongs to an isotropic Sobolev space $H^s(I^d)$ with $s> \frac{d}{2}.$ Also, to handle the general and practical case of an unknown distribution of the $\mathbf X_i,$ we use Shepard's scattered interpolation scheme to generate fairly precise approximations of the observed data at $n$ i.i.d. sampling points $\mathbf X_i$ following a $d$-variate Beta distribution. Finally, we illustrate the performance of the proposed multivariate nonparametric estimator by numerical simulations on synthetic as well as real data.
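The construction lends itself to a compact numerical sketch. The following is a minimal illustration of the ingredients named above (Beta-distributed sampling points, tensor-product Jacobi features, least squares fit), not the paper's exact estimator; in particular, the orthonormalization constants of the Jacobi polynomials are omitted, and the toy target function and all parameter values are our own choices.

```python
# Minimal sketch of random-projection least squares on a tensor-product
# Jacobi basis, with sampling points drawn from a Beta distribution.
import itertools
import numpy as np
from scipy.special import eval_jacobi

d, N, alpha, n = 2, 4, 0.0, 2000   # dimension, max degree, Jacobi parameter, sample size
rng = np.random.default_rng(0)

# n i.i.d. points in [0,1]^d, each coordinate ~ Beta(alpha+1, alpha+1)
X = rng.beta(alpha + 1, alpha + 1, size=(n, d))
f = lambda x: np.cos(np.pi * x[:, 0]) * x[:, 1]   # toy regression function
y = f(X) + 0.05 * rng.standard_normal(n)          # noisy observations

# Tensor-product Jacobi features: one column per multi-index in {0,...,N}^d.
# (Exact orthonormalization constants are omitted in this sketch.)
degrees = list(itertools.product(range(N + 1), repeat=d))
Phi = np.ones((n, len(degrees)))
for j, k in enumerate(degrees):
    for axis in range(d):
        # eval_jacobi works on [-1,1]; map [0,1] -> [-1,1]
        Phi[:, j] *= eval_jacobi(k[axis], alpha, alpha, 2 * X[:, axis] - 1)

coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # least squares fit
print("empirical condition number:", np.linalg.cond(Phi.T @ Phi / n))
```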

Related Content

We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness, a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function space and the sample size. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function basis, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
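To make the architecture concrete, here is a hedged sketch of a parallel deep ReLU network trained with standard weight decay; the branch count, widths, depth, learning rate, and toy target below are illustrative choices of ours, not the paper's construction.

```python
# Sketch of a "Parallel NN": M narrow deep ReLU subnetworks whose outputs
# are summed, trained with plain weight decay.
import torch
import torch.nn as nn

class ParallelNN(nn.Module):
    def __init__(self, M=64, depth=4, width=4):
        super().__init__()
        def subnet():
            layers, d_in = [], 1
            for _ in range(depth):
                layers += [nn.Linear(d_in, width), nn.ReLU()]
                d_in = width
            layers.append(nn.Linear(width, 1))
            return nn.Sequential(*layers)
        self.branches = nn.ModuleList(subnet() for _ in range(M))

    def forward(self, x):                  # sum of branch outputs
        return sum(b(x) for b in self.branches)

model = ParallelNN()
# Standard weight decay: per the paper, this implicitly promotes
# l_p (0 < p < 1) sparsity over the learned dictionary of branches.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
x = torch.linspace(0, 1, 256).unsqueeze(1)
y = torch.sign(torch.sin(8 * x))           # piecewise-constant (BV) toy target
for _ in range(200):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
```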

In Statistical Relational Artificial Intelligence, a branch of AI and machine learning which combines the logical and statistical schools of AI, one uses the concept of a {\em parametrized probabilistic graphical model (PPGM)} to model (conditional) dependencies between random variables and to make probabilistic inferences about events on a space of "possible worlds". The set of possible worlds with underlying domain $D$ (a set of objects) can be represented by the set $\mathbf{W}_D$ of all first-order structures (for a suitable signature) with domain $D$. Using a formal logic we can describe events on $\mathbf{W}_D$. By combining a logic and a PPGM we can also define a probability distribution $\mathbb{P}_D$ on $\mathbf{W}_D$ and use it to compute the probability of an event. We consider a logic, denoted $PLA$, with truth values in the unit interval, which uses aggregation functions, such as arithmetic mean, geometric mean, maximum and minimum, instead of quantifiers. However, we face the problem of computational efficiency, which is an obstacle to the wider use of methods from Statistical Relational AI in practical applications. We address this problem by proving that the described probability will, under certain assumptions on the PPGM and the sentence $\varphi$, converge as the size of $D$ tends to infinity. The convergence result is obtained by showing that every formula $\varphi(x_1, \ldots, x_k)$ which contains only "admissible" aggregation functions (e.g. arithmetic and geometric mean, max and min) is asymptotically equivalent to a formula $\psi(x_1, \ldots, x_k)$ without aggregation functions.
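As a toy illustration of the admissible aggregation functions (not the paper's formalism), the snippet below aggregates truth values in the unit interval over domains of growing size; the concentration of the means as $|D|$ grows is the intuition behind asymptotic equivalence to aggregation-free formulas.

```python
# Truth values in [0,1] aggregated over a growing domain D. For i.i.d.
# worlds, the arithmetic and geometric means concentrate around constants,
# which is why aggregation functions can be eliminated in the limit.
import numpy as np

rng = np.random.default_rng(1)
aggregators = {
    "arith_mean": np.mean,
    "geom_mean": lambda v: float(np.exp(np.mean(np.log(np.maximum(v, 1e-12))))),
    "max": np.max,
    "min": np.min,
}

for size in (10, 100, 10_000):                 # |D| growing
    truth_values = rng.uniform(0, 1, size)     # truth value of phi(x, b) per b in D
    row = {name: round(float(agg(truth_values)), 3)
           for name, agg in aggregators.items()}
    print(size, row)
# arith_mean and geom_mean stabilize as |D| grows; max -> 1, min -> 0.
```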

Molecular dynamics (MD) has long been the \emph{de facto} choice for modeling complex atomistic systems from first principles, and deep learning has recently become a popular way to accelerate it. Notwithstanding, preceding approaches depend on intermediate variables such as the potential energy or force fields to update atomic positions, which requires additional computations to perform back-propagation. To waive this requirement, we propose a novel model called ScoreMD that directly estimates the gradient of the log density of molecular conformations. Moreover, we argue that diffusion processes accord well with the principle of enhanced sampling in MD simulations and are therefore a perfect match for our sequential conformation generation task. That is, ScoreMD perturbs the molecular structure with a conditional noise depending on atomic accelerations and employs conformations at previous timeframes as the prior distribution for sampling. Another challenge of modeling such a conformation generation process is that the molecule is kinetic instead of static, which no prior studies strictly consider. To address this challenge, we introduce an equivariant geometric Transformer as the score function in the diffusion process to calculate the corresponding gradient. It incorporates the directions and velocities of atomic motions via 3D spherical Fourier-Bessel representations. With multiple architectural improvements, ScoreMD outperforms state-of-the-art baselines on MD17 and isomers of C7O2H10. This research provides new insights into the acceleration of material and drug discovery.
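For readers unfamiliar with score-based generation, the following sketches the generic mechanism such models build on: moving samples along a learned score (the gradient of the log density) with Langevin-type updates. The score function, target distribution, and step size here are toy stand-ins, not ScoreMD's conditional network.

```python
# Generic Langevin sampling step driven by a score s(x) ~ grad log p(x).
import numpy as np

def langevin_step(x, score_fn, step=1e-3, rng=None):
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(x.shape)
    return x + step * score_fn(x) + np.sqrt(2 * step) * noise

# Toy example: standard Gaussian target, for which score(x) = -x.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 3))          # 32 "atoms" in 3-D
for _ in range(1000):
    x = langevin_step(x, lambda z: -z, rng=rng)
print("sample std should be near 1:", x.std())
```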

This paper focuses on stochastic saddle point problems with decision-dependent distributions. These are problems whose objective is the expected value of a stochastic payoff function, where random variables are drawn from a distribution induced by a distributional map. For general distributional maps, the problem of finding saddle points is in general computationally burdensome, even if the distribution is known. To enable a tractable solution approach, we introduce the notion of equilibrium points -- which are saddle points for the stationary stochastic minimax problem that they induce -- and provide conditions for their existence and uniqueness. We demonstrate that the distance between the two solution types is bounded provided that the objective has a strongly-convex-strongly-concave payoff and a Lipschitz continuous distributional map. We develop deterministic and stochastic primal-dual algorithms and demonstrate their convergence to the equilibrium point. In particular, by modeling errors emerging from a stochastic gradient estimator as sub-Weibull random variables, we provide error bounds in expectation and in high probability that hold for each iteration. Moreover, we show convergence to a neighborhood almost surely. Finally, we investigate a condition on the distributional map -- which we call opposing mixture dominance -- that ensures that the objective is strongly-convex-strongly-concave. We tailor the convergence results for the primal-dual algorithms to this opposing mixture dominance setup.
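A minimal sketch of such a stochastic primal-dual (gradient descent-ascent) loop is given below; the payoff and the decision-dependent distributional map are toy choices of ours, picked so that the expected objective is strongly-convex-strongly-concave, and are not from the paper.

```python
# Stochastic primal-dual iteration for a toy saddle problem whose sample
# distribution depends on the current decision.
import numpy as np

rng = np.random.default_rng(0)
x, y, eta = 1.0, -1.0, 0.05
x_avg, y_avg = 0.0, 0.0

def stochastic_grads(x, y):
    # decision-dependent data: the noise mean shifts with the current decision
    w = rng.normal(loc=0.1 * x, scale=1.0)
    # payoff f(x, y; w) = 0.5*(x - w)**2 + x*y - 0.5*y**2
    return (x - w) + y, x - y              # (grad wrt x, grad wrt y)

T = 5000
for t in range(T):
    gx, gy = stochastic_grads(x, y)
    x -= eta * gx                          # primal descent step
    y += eta * gy                          # dual ascent step
    x_avg += x / T
    y_avg += y / T
# The averaged iterate settles near the equilibrium point (0, 0).
print("averaged iterate:", (round(x_avg, 3), round(y_avg, 3)))
```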

We investigate the feature compression of high-dimensional ridge regression using the optimal subsampling technique. Specifically, based on the basic framework of a random sampling algorithm on the features of ridge regression and the A-optimal design criterion, we first obtain a set of optimal subsampling probabilities. Since these probabilities are expensive to compute, we then propose nearly optimal ones. With these probabilities, a two-step iterative algorithm is established which has lower computational cost and higher accuracy. We provide theoretical analysis and numerical experiments to support the proposed methods. Numerical results demonstrate the decent performance of our methods.
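The following sketch conveys the flavor of importance subsampling of features for ridge regression; the squared-column-norm scores used here are a common cheap proxy and merely stand in for the paper's (nearly) A-optimal probabilities.

```python
# Feature subsampling for ridge regression: keep m of the p columns chosen
# by importance probabilities, then solve ridge on the reweighted sketch.
import numpy as np

rng = np.random.default_rng(0)
n, p, m, lam = 500, 2000, 200, 1.0
X = rng.standard_normal((n, p)) * rng.uniform(0.1, 2.0, p)  # uneven column scales
beta = np.zeros(p); beta[:20] = 1.0
y = X @ beta + 0.1 * rng.standard_normal(n)

# Cheap importance scores: squared column norms (a proxy, not A-optimal).
scores = (X ** 2).sum(axis=0)
probs = scores / scores.sum()
idx = rng.choice(p, size=m, replace=False, p=probs)

Xs = X[:, idx] / np.sqrt(m * probs[idx])   # reweight to reduce sampling bias
coef = np.linalg.solve(Xs.T @ Xs + lam * np.eye(m), Xs.T @ y)
print("ridge fit on", m, "of", p, "features")
```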

Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d\geq\log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and L\"owner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic.
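Leverage scores are central to the connection above; the sketch below computes online leverage scores, which measure how "novel" each arriving row is relative to the rows seen so far, and keeps the high-score rows as a crude coreset. This illustrates the underlying idea only, not the paper's streaming algorithm, and the retention threshold is an arbitrary choice of ours.

```python
# Online leverage scores: tau_i = min(1, a_i^T (sum_{j<i} a_j a_j^T)^{-1} a_i).
import numpy as np

def online_leverage_scores(rows, reg=1e-6):
    d = rows.shape[1]
    M = reg * np.eye(d)                      # running (regularized) Gram matrix
    scores = []
    for a in rows:
        scores.append(min(1.0, float(a @ np.linalg.solve(M, a))))
        M += np.outer(a, a)
    return np.array(scores)

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 10))
tau = online_leverage_scores(A)
coreset = A[tau > 0.05]                      # keep only the "surprising" rows
# The capped scores sum to roughly d log n, which is why few rows survive.
print(f"{len(coreset)} of {len(A)} rows kept; score sum = {tau.sum():.1f}")
```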

We consider the problem of nonparametric estimation of the drift and diffusion coefficients of a Stochastic Differential Equation (SDE), based on $n$ independent replicates $\left\{X_i(t)\::\: t\in [0,1]\right\}_{1 \leq i \leq n}$, observed sparsely and irregularly on the unit interval, and subject to additive noise corruption. By \textit{sparse} we mean that the number of measurements per path can be arbitrary (as small as two), and remain constant with respect to $n$. We focus on time-inhomogeneous SDEs of the form $dX_t = \mu(t)X_t^{\alpha}dt + \sigma(t)X_t^{\beta}dW_t$, where $\alpha \in \{0,1\}$ and $\beta \in \{0,1/2,1\}$, which includes prominent examples such as Brownian motion, the Ornstein-Uhlenbeck process, geometric Brownian motion, and the Brownian bridge. Our estimators are constructed by relating the local (drift/diffusion) parameters of the diffusion to their global parameters (mean/covariance, and their derivatives) by means of an apparently novel PDE. This allows us to use methods inspired by functional data analysis, and pool information across the sparsely measured paths. The methodology we develop is fully nonparametric and avoids any functional form specification on the time-dependency of either the drift function or the diffusion function. We establish almost sure uniform asymptotic convergence rates of the proposed estimators as the number of observed curves $n$ grows to infinity. Our rates are non-asymptotic in the number of measurements per path, explicitly reflecting how different sampling frequencies might affect the speed of convergence. Our framework suggests possible further fruitful interactions between FDA and SDE methods in problems with replication.
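As a worked instance of the local-to-global relation (hedged: shown here only in the simplest case $\alpha=0$, $\beta=0$ with a deterministic initial condition; the paper's PDE is more general), write $m(t)=\mathbb{E}[X_t]$ and $K(s,t)=\operatorname{Cov}(X_s,X_t)$ for the SDE $dX_t = \mu(t)\,dt + \sigma(t)\,dW_t$. Then
$$ m'(t) = \mu(t), \qquad K(s,t)=\int_0^{s\wedge t}\sigma^2(u)\,du \;\Longrightarrow\; \sigma^2(t)=\frac{d}{dt}\,K(t,t). $$
Smoothing-based estimates of $m$ and $K$, pooled across all the sparsely observed paths in the spirit of functional data analysis, thus yield plug-in estimators of the drift $\mu$ and the diffusion $\sigma^2$.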

Let $X^{(n)}$ be an observation sampled from a distribution $P_{\theta}^{(n)}$ with an unknown parameter $\theta,$ $\theta$ being a vector in a Banach space $E$ (most often, a high-dimensional space of dimension $d$). We study the problem of estimation of $f(\theta)$ for a functional $f:E\mapsto {\mathbb R}$ of some smoothness $s>0$ based on an observation $X^{(n)}\sim P_{\theta}^{(n)}.$ Assuming that there exists an estimator $\hat \theta_n=\hat \theta_n(X^{(n)})$ of parameter $\theta$ such that $\sqrt{n}(\hat \theta_n-\theta)$ is sufficiently close in distribution to a mean zero Gaussian random vector in $E,$ we construct a functional $g:E\mapsto {\mathbb R}$ such that $g(\hat \theta_n)$ is an asymptotically normal estimator of $f(\theta)$ with $\sqrt{n}$ rate provided that $s>\frac{1}{1-\alpha}$ and $d\leq n^{\alpha}$ for some $\alpha\in (0,1).$ We also derive general upper bounds on Orlicz norm error rates for the estimator $g(\hat \theta_n)$ depending on the smoothness $s,$ dimension $d,$ sample size $n$ and the accuracy of normal approximation of $\sqrt{n}(\hat \theta_n-\theta).$ In particular, this approach yields asymptotically efficient estimators in some high-dimensional exponential models.
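One standard construction behind results of this type is iterated bias reduction (hedged: the paper's exact functional $g$ may differ). Define the operator $(Tf)(\theta)=\mathbb{E}_{\theta} f(\hat\theta_n)$ and the bias operator $B=T-I,$ and set
$$ g_k(\theta)\;=\;\sum_{j=0}^{k}(-1)^j (B^j f)(\theta). $$
Telescoping gives $\mathbb{E}_{\theta}\, g_k(\hat\theta_n) = f(\theta) + (-1)^k (B^{k+1}f)(\theta),$ so each application of $B$ reduces the bias by one order at the cost of roughly one degree of smoothness of $f$; this is where a trade-off between the smoothness $s$ and the growth $d\leq n^{\alpha}$ naturally enters.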

For a given nonnegative matrix $A=(A_{ij})$, the matrix scaling problem asks whether $A$ can be scaled to a doubly stochastic matrix $XAY$ for some positive diagonal matrices $X,Y$. The Sinkhorn algorithm is a simple iterative algorithm, which alternately repeats row-normalization $A_{ij} \leftarrow A_{ij}/\sum_{j}A_{ij}$ and column-normalization $A_{ij} \leftarrow A_{ij}/\sum_{i}A_{ij}$. Under this algorithm, $A$ converges to a doubly stochastic matrix in the limit if and only if the bipartite graph associated with $A$ has a perfect matching. This property can be used to decide the existence of a perfect matching in a given bipartite graph $G$, which is identified with the $0,1$-matrix $A_G$. Linial, Samorodnitsky, and Wigderson showed that a polynomial number of the Sinkhorn iterations for $A_G$ decides whether $G$ has a perfect matching. In this paper, we show an extension of this result: If $G$ has no perfect matching, then a polynomial number of the Sinkhorn iterations identifies a Hall blocker -- a certificate of the nonexistence of a perfect matching. Our analysis is based on an interpretation of the Sinkhorn algorithm as alternating KL-divergence minimization (Csisz\'{a}r and Tusn\'{a}dy 1984, Gietl and Reffel 2013) and its limiting behavior for a nonscalable matrix (Aas 2014). We also relate the Sinkhorn limit with parametric network flow, principal partition of polymatroids, and the Dulmage-Mendelsohn decomposition of a bipartite graph.
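The iteration itself takes only a few lines of code. The sketch below follows the description above; the two toy graphs (one with a perfect matching, one violating Hall's condition) are our own examples, and reading the blocker off the limiting row sums is a simplification of the paper's certificate.

```python
# Sinkhorn iteration: alternate row and column normalization. If the
# bipartite graph has a perfect matching, the iterates approach a doubly
# stochastic matrix; otherwise persistent row-sum deficiencies appear.
import numpy as np

def sinkhorn(A, iters=500):
    A = A.astype(float).copy()
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)   # row normalization
        A /= A.sum(axis=0, keepdims=True)   # column normalization
    return A

G_good = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])  # has a perfect matching
G_bad  = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]])  # rows {0,1} see only col 0

print(sinkhorn(G_good).sum(axis=1).round(3))  # -> [1. 1. 1.]
print(sinkhorn(G_bad).sum(axis=1).round(3))   # -> [0.5 0.5 2. ]
# Rows {0,1} stay deficient, and N({0,1}) = {0} violates Hall's condition:
# the deficiency pattern points to a Hall blocker.
```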

We present a new sublinear time algorithm for approximating the spectral density (eigenvalue distribution) of an $n\times n$ normalized graph adjacency or Laplacian matrix. The algorithm recovers the spectrum up to $\epsilon$ accuracy in the Wasserstein-1 distance in $O(n\cdot \text{poly}(1/\epsilon))$ time given sample access to the graph. This result complements recent work by David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant (2018), which obtains a solution with runtime independent of $n$, but exponential in $1/\epsilon$. We conjecture that the trade-off between dimension dependence and accuracy is inherent. Our method is simple and works well experimentally. It is based on a Chebyshev polynomial moment matching method that employs randomized estimators for the matrix trace. We prove that, for any Hermitian $A$, this moment matching method returns an $\epsilon$ approximation to the spectral density using just $O({1}/{\epsilon})$ matrix-vector products with $A$. By leveraging stability properties of the Chebyshev polynomial three-term recurrence, we then prove that the method is amenable to the use of coarse approximate matrix-vector products. Our sublinear time algorithm follows from combining this result with a novel sampling algorithm for approximating matrix-vector products with a normalized graph adjacency matrix. Of independent interest, we show a similar result for the widely used \emph{kernel polynomial method} (KPM), proving that this practical algorithm nearly matches the theoretical guarantees of our moment matching method. Our analysis uses tools from Jackson's seminal work on approximation with positive polynomial kernels.
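The moment matching primitive is easy to sketch: estimate the normalized Chebyshev moments $\mathrm{tr}(T_k(A))/n$ with Hutchinson's stochastic trace estimator and the three-term recurrence $T_{k+1}(A)v = 2A\,T_k(A)v - T_{k-1}(A)v$, using only matrix-vector products. The snippet below is a simplified illustration (dense toy matrix, no approximate matrix-vector products, no Jackson damping), not the full sublinear-time algorithm.

```python
# Chebyshev moment estimation via Hutchinson's trace estimator.
import numpy as np

def chebyshev_moments(matvec, n, num_moments, num_probes=20, rng=None):
    rng = rng or np.random.default_rng()
    moments = np.zeros(num_moments)
    for _ in range(num_probes):
        g = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        v_prev, v_curr = g, matvec(g)         # T_0(A)g and T_1(A)g
        moments[0] += g @ v_prev
        if num_moments > 1:
            moments[1] += g @ v_curr
        for k in range(2, num_moments):       # three-term recurrence
            v_prev, v_curr = v_curr, 2 * matvec(v_curr) - v_prev
            moments[k] += g @ v_curr
    return moments / (num_probes * n)         # estimates of tr(T_k(A))/n

# Toy usage: symmetric matrix with eigenvalues in [-1, 1], like a
# normalized adjacency matrix.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
A = Q @ np.diag(rng.uniform(-1, 1, 200)) @ Q.T
print(chebyshev_moments(lambda v: A @ v, 200, 8, rng=rng).round(3))
```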
