亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this paper, we investigate the problem of deciding whether two random databases $\mathsf{X}\in\mathcal{X}^{n\times d}$ and $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ are statistically dependent or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these two databases are statistically independent, while under the alternative, there exists an unknown row permutation $\sigma$, such that $\mathsf{X}$ and $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$, are statistically dependent with some known joint distribution, but have the same marginal distributions as the null. We characterize the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$, $d$, and some spectral properties of the generative distributions of the datasets. For example, we prove that if a certain function of the eigenvalues of the likelihood function and $d$, is below a certain threshold, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, no matter what the value of $n$ is. This mimics the performance of an efficient test that thresholds a centered version of the log-likelihood function of the observed matrices. We also analyze the case where $d$ is fixed, for which we derive strong (vanishing error) and weak detection lower and upper bounds.

相關內容

Say we have a collection of independent random variables $X_0, ... , X_n$, where $X_0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$, but $X_i \sim \mathcal{N}(\mu, \sigma^2)$, for $1 \leq i \leq n$. We characterize the distribution of $R_0 := 1 + \sum_{i=1}^{n} \mathbf{1}\{X_i \leq X_0\}$, the rank of the random variable whose distribution potentially differs from that of the others -- the odd normal out. We show that $R_0 - 1$ is approximately beta-binomial, an approximation that becomes equality as $\sigma/\sigma_0$ or $(\mu-\mu_0)/\sigma_0$ become large or small. The intra-class correlation of the approximating beta-binomial depends on $\Pr(X_1 \leq X_0)$ and $\Pr(X_1 \leq X_0, X_2 \leq X_0)$. Our approach relies on the conjugacy of the beta distribution for the binomial: $\Phi((X_0-\mu)/\sigma)$ is approximately $\mathrm{Beta}(\alpha(\sigma/\sigma_0, (\mu-\mu_0)/\sigma_0), \beta(\sigma/\sigma_0, (\mu-\mu_0)/\sigma_0))$ for functions $\alpha, \beta > 0$. We study the distributions of the in-normal ranks. Throughout, simulations corroborate the formulae we derive.

In this paper, we propose an orthogonal block wise Kaczmarz (POBK) algorithm based on preprocessing techniques to solve large-scale sparse linear systems $Ax=f$. Firstly, the Reverse Cuthill McKee Algorithm (RCM) algorithm is used to preprocess the linear system, and then a new partitioning strategy is proposed to divide orthogonal blocks into one category, in order to accelerate the convergence rate of the Kaczmarz algorithm. The convergence of the POBK algorithm has been theoretically proven, and a theoretical analysis of its faster convergence is also provided. In addition, the experimental results confirm that this algorithm is far superior to GRBK, RBK(k), and GREBK(k) algorithms in both iteration steps (IT) and CPU time aspects.

In this paper, we address the challenge of obtaining a comprehensive and symmetric representation of point particle groups, such as atoms in a molecule, which is crucial in physics and theoretical chemistry. The problem has become even more important with the widespread adoption of machine-learning techniques in science, as it underpins the capacity of models to accurately reproduce physical relationships while being consistent with fundamental symmetries and conservation laws. However, some of the descriptors that are commonly used to represent point clouds -- most notably those based on discretized correlations of the neighbor density, that underpin most of the existing ML models of matter at the atomic scale -- are unable to distinguish between special arrangements of particles in three dimensions. This makes it impossible to machine learn their properties. Atom-density correlations are provably complete in the limit in which they simultaneously describe the mutual relationship between all atoms, which is impractical. We present a novel approach to construct descriptors of \emph{finite} correlations based on the relative arrangement of particle triplets, which can be employed to create symmetry-adapted models with universal approximation capabilities, which have the resolution of the neighbor discretization as the sole convergence parameter. Our strategy is demonstrated on a class of atomic arrangements that are specifically built to defy a broad class of conventional symmetric descriptors, showcasing its potential for addressing their limitations.

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings. Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings). Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings) as well as for the distance between objects (resp., strings), and the distance between distributions is defined as the earth mover's distance with respect to the relative Hamming distance between strings. We study the query complexity of testing in this new model, focusing on three directions. First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution testing model. Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings). Third, we consider the complexity of testing properties that were extensively studied in the standard distribution testing model: Two such cases are uniform distributions and pairs of identical distributions.

There is a recent interest on first-order methods for linear programming (LP). In this paper,we propose a stochastic algorithm using variance reduction and restarts for solving sharp primal-dual problems such as LP. We show that the proposed stochastic method exhibits a linear convergence rate for solving sharp instances with a high probability. In addition, we propose an efficient coordinate-based stochastic oracle for unconstrained bilinear problems, which has $\mathcal O(1)$ per iteration cost and improves the complexity of the existing deterministic and stochastic algorithms. Finally, we show that the obtained linear convergence rate is nearly optimal (upto $\log$ terms) for a wide class of stochastic primal dual methods.

Epistemic modals have peculiar logical features that are challenging to account for in a broadly classical framework. For instance, while a sentence of the form $p\wedge\Diamond\neg p$ ('$p$, but it might be that not $p$') appears to be a contradiction, $\Diamond\neg p$ does not entail $\neg p$, which would follow in classical logic. Likewise, the classical laws of distributivity and disjunctive syllogism fail for epistemic modals. Existing attempts to account for these facts generally either under- or over-correct. Some predict that $p\wedge\Diamond\neg p$, a so-called epistemic contradiction, is a contradiction only in an etiolated sense, under a notion of entailment that does not always allow us to replace $p\wedge\Diamond\neg p$ with a contradiction; these theories underpredict the infelicity of embedded epistemic contradictions. Other theories savage classical logic, eliminating not just rules that intuitively fail but also rules like non-contradiction, excluded middle, De Morgan's laws, and disjunction introduction, which intuitively remain valid for epistemic modals. In this paper, we aim for a middle ground, developing a semantics and logic for epistemic modals that makes epistemic contradictions genuine contradictions and that invalidates distributivity and disjunctive syllogism but that otherwise preserves classical laws that intuitively remain valid. We start with an algebraic semantics, based on ortholattices instead of Boolean algebras, and then propose a more concrete possibility semantics, based on partial possibilities related by compatibility. Both semantics yield the same consequence relation, which we axiomatize. We then show how to lift an arbitrary possible worlds model for a non-modal language to a possibility model for a language with epistemic modals.

We study the following combinatorial problem. Given a set of $n$ y-monotone curves, which we call wires, a tangle determines the order of the wires on a number of horizontal layers such that any two consecutive layers differ only in swaps of neighboring wires. Given a multiset $L$ of swaps (that is, unordered pairs of wires) and an initial order of the wires, a tangle realizes $L$ if each pair of wires changes its order exactly as many times as specified by $L$. Deciding whether a given multiset of swaps admits a realizing tangle is known to be NP-hard [Yamanaka et al., CCCG 2018]. We prove that this problem remains NP-hard if every pair of wires swaps only a constant number of times. On the positive side, we improve the runtime of a previous exponential-time algorithm. We also show that the problem is in NP and fixed-parameter tractable with respect to the number of wires.

We consider the problem of discovering $K$ related Gaussian directed acyclic graphs (DAGs), where the involved graph structures share a consistent causal order and sparse unions of supports. Under the multi-task learning setting, we propose a $l_1/l_2$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order (or topological order) than separate estimations. Moreover, the joint estimator is able to recover non-identifiable DAGs, by estimating them together with some identifiable DAGs. Lastly, our analysis also shows the consistency of union support recovery of the structures. To allow practical implementation, we design a continuous optimization problem whose optimizer is the same as the joint estimator and can be approximated efficiently by an iterative algorithm. We validate the theoretical analysis and the effectiveness of the joint estimator in experiments.

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.

北京阿比特科技有限公司