亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

In this paper, we investigate the optimal statistical performance and the impact of computational constraints for independent component analysis (ICA). Our goal is twofold. On the one hand, we characterize the precise role of dimensionality on sample complexity and statistical accuracy, and how computational consideration may affect them. In particular, we show that the optimal sample complexity is linear in dimensionality, and interestingly, the commonly used sample kurtosis-based approaches are necessarily suboptimal. However, the optimal sample complexity becomes quadratic, up to a logarithmic factor, in the dimension if we restrict ourselves to estimates that can be computed with low-degree polynomial algorithms. On the other hand, we develop computationally tractable estimates that attain both the optimal sample complexity and minimax optimal rates of convergence. We study the asymptotic properties of the proposed estimates and establish their asymptotic normality that can be readily used for statistical inferences. Our method is fairly easy to implement and numerical experiments are presented to further demonstrate its practical merits.

相關內容

In the prophet inequality problem, a gambler faces a sequence of items arriving online with values drawn independently from known distributions. On seeing an item, the gambler must choose whether to accept its value as her reward and quit the game, or reject it and continue. The gambler's aim is to maximize her expected reward relative to the expected maximum of the values of all items. Since the seventies, a tight bound of 1/2 has been known for this competitive ratio in the setting where the items arrive in an adversarial order (Krengel and Sucheston, 1977, 1978). However, the optimum ratio still remains unknown in the order selection setting, where the gambler selects the arrival order, as well as in prophet secretary, where the items arrive in a random order. Moreover, it is not even known whether a separation exists between the two settings. In this paper, we show that the power of order selection allows the gambler to guarantee a strictly better competitive ratio than if the items arrive randomly. For the order selection setting, we identify an instance for which Peng and Tang's (FOCS'22) state-of-the-art algorithm performs no better than their claimed competitive ratio of (approximately) 0.7251, thus illustrating the need for an improved approach. We therefore extend their design and provide a more general algorithm design framework, using which we show that their ratio can be beaten, by designing a 0.7258-competitive algorithm. For the random order setting, we improve upon Correa, Saona and Ziliotto's (SODA'19) 0.732-hardness result to show a hardness of 0.7254 for general algorithms - even in the setting where the gambler knows the arrival order beforehand, thus establishing a separation between the order selection and random order settings.

Bayesian modeling provides a principled approach to quantifying uncertainty in model parameters and model structure and has seen a surge of applications in recent years. Within the context of a Bayesian workflow, we are concerned with model selection for the purpose of finding models that best explain the data, that is, help us understand the underlying data generating process. Since we rarely have access to the true process, all we are left with during real-world analyses is incomplete causal knowledge from sources outside of the current data and model predictions of said data. This leads to the important question of when the use of prediction as a proxy for explanation for the purpose of model selection is valid. We approach this question by means of large-scale simulations of Bayesian generalized linear models where we investigate various causal and statistical misspecifications. Our results indicate that the use of prediction as proxy for explanation is valid and safe only when the models under consideration are sufficiently consistent with the underlying causal structure of the true data generating process.

This paper proposes a new framework to study multi-agent interaction in Markov games: Markov $\alpha$-potential games. Markov potential games are special cases of Markov $\alpha$-potential games, so are two important and practically significant classes of games: Markov congestion games and perturbed Markov team games. In this paper, {$\alpha$-potential} functions for both games are provided and the gap $\alpha$ is characterized with respect to game parameters. Two algorithms -- the projected gradient-ascent algorithm and the sequential maximum improvement smoothed best response dynamics -- are introduced for approximating the stationary Nash equilibrium in Markov $\alpha$-potential games. The Nash-regret for each algorithm is shown to scale sub-linearly in time horizon. Our analysis and numerical experiments demonstrates that simple algorithms are capable of finding approximate equilibrium in Markov $\alpha$-potential games.

The study of approximate matching in the Massively Parallel Computations (MPC) model has recently seen a burst of breakthroughs. Despite this progress, however, we still have a far more limited understanding of maximal matching which is one of the central problems of parallel and distributed computing. All known MPC algorithms for maximal matching either take polylogarithmic time which is considered inefficient, or require a strictly super-linear space of $n^{1+\Omega(1)}$ per machine. In this work, we close this gap by providing a novel analysis of an extremely simple algorithm a variant of which was conjectured to work by Czumaj et al. [STOC'18]. The algorithm edge-samples the graph, randomly partitions the vertices, and finds a random greedy maximal matching within each partition. We show that this algorithm drastically reduces the vertex degrees. This, among some other results, leads to an $O(\log \log \Delta)$ round algorithm for maximal matching with $O(n)$ space (or even mildly sublinear in $n$ using standard techniques). As an immediate corollary, we get a $2$ approximate minimum vertex cover in essentially the same rounds and space. This is the best possible approximation factor under standard assumptions, culminating a long line of research. It also leads to an improved $O(\log\log \Delta)$ round algorithm for $1 + \varepsilon$ approximate matching. All these results can also be implemented in the congested clique model within the same number of rounds.

When dealing with time series data, causal inference methods often employ structural vector autoregressive (SVAR) processes to model time-evolving random systems. In this work, we rephrase recursive SVAR processes with possible latent component processes as a linear Structural Causal Model (SCM) of stochastic processes on a simple causal graph, the \emph{process graph}, that models every process as a single node. Using this reformulation, we generalise Wright's well-known path-rule for linear Gaussian SCMs to the newly introduced process SCMs and we express the auto-covariance sequence of an SVAR process by means of a generalised trek-rule. Employing the Fourier-Transformation, we derive compact expressions for causal effects in the frequency domain that allow us to efficiently visualise the causal interactions in a multivariate SVAR process. Finally, we observe that the process graph can be used to formulate graphical criteria for identifying causal effects and to derive algebraic relations with which these frequency domain causal effects can be recovered from the observed spectral density.

Sharpness-Aware Minimization (SAM) is a recent optimization framework aiming to improve the deep neural network generalization, through obtaining flatter (i.e. less sharp) solutions. As SAM has been numerically successful, recent papers have studied the theoretical aspects of the framework and have shown SAM solutions are indeed flat. However, there has been limited theoretical exploration regarding statistical properties of SAM. In this work, we directly study the statistical performance of SAM, and present a new theoretical explanation of why SAM generalizes well. To this end, we study two statistical problems, neural networks with a hidden layer and kernel regression, and prove under certain conditions, SAM has smaller prediction error over Gradient Descent (GD). Our results concern both convex and non-convex settings, and show that SAM is particularly well-suited for non-convex problems. Additionally, we prove that in our setup, SAM solutions are less sharp as well, showing our results are in agreement with the previous work. Our theoretical findings are validated using numerical experiments on numerous scenarios, including deep neural networks.

Statistical learning and logical reasoning are two major fields of AI expected to be unified for human-like machine intelligence. Most existing work considers how to combine existing logical and statistical systems. However, there is no theory of inference so far explaining how basic approaches to statistical learning and logical reasoning stem from a common principle. Inspired by the fact that much empirical work in neuroscience suggests Bayesian (or probabilistic generative) approaches to brain function including learning and reasoning, we here propose a simple Bayesian model of logical reasoning and statistical learning. The theory is statistically correct as it satisfies Kolmogorov's axioms, is consistent with both Fenstad's representation theorem and maximum likelihood estimation and performs exact Bayesian inference with a linear-time complexity. The theory is logically correct as it is a data-driven generalisation of uncertain reasoning from consistency, possibility, inconsistency and impossibility. The theory is correct in terms of machine learning as its solution to generation and prediction tasks on the MNIST dataset is not only empirically reasonable but also theoretically correct against the K nearest neighbour method. We simply model how data causes symbolic knowledge in terms of its satisfiability in formal logic. Symbolic reasoning emerges as a result of the process of going the causality forwards and backwards. The forward and backward processes correspond to an interpretation and inverse interpretation in formal logic, respectively. The inverse interpretation differentiates our work from the mainstream often referred to as inverse entailment, inverse deduction or inverse resolution. The perspective gives new insights into learning and reasoning towards human-like machine intelligence.

The generalization error of a classifier is related to the complexity of the set of functions among which the classifier is chosen. We study a family of low-complexity classifiers consisting of thresholding a random one-dimensional feature. The feature is obtained by projecting the data on a random line after embedding it into a higher-dimensional space parametrized by monomials of order up to k. More specifically, the extended data is projected n-times and the best classifier among those n, based on its performance on training data, is chosen. We show that this type of classifier is extremely flexible, as it is likely to approximate, to an arbitrary precision, any continuous function on a compact set as well as any boolean function on a compact set that splits the support into measurable subsets. In particular, given full knowledge of the class conditional densities, the error of these low-complexity classifiers would converge to the optimal (Bayes) error as k and n go to infinity. On the other hand, if only a training dataset is given, we show that the classifiers will perfectly classify all the training points as k and n go to infinity. We also bound the generalization error of our random classifiers. In general, our bounds are better than those for any classifier with VC dimension greater than O (ln n) . In particular, our bounds imply that, unless the number of projections n is extremely large, there is a significant advantageous gap between the generalization error of the random projection approach and that of a linear classifier in the extended space. Asymptotically, as the number of samples approaches infinity, the gap persists for any such n. Thus, there is a potentially large gain in generalization properties by selecting parameters at random, rather than optimization.

We introduce an approach which allows inferring causal relationships between variables for which the time evolution is available. Our method builds on the ideas of Granger Causality and Transfer Entropy, but overcomes most of their limitations. Specifically, our approach tests whether the predictability of a putative driven system Y can be improved by incorporating information from a potential driver system X, without making assumptions on the underlying dynamics and without the need to compute probability densities of the dynamic variables. Causality is assessed by a rigorous variational scheme based on the Information Imbalance of distance ranks, a recently developed statistical test capable of inferring the relative information content of different distance measures. This framework makes causality detection possible even for high-dimensional systems where only few of the variables are known or measured. Benchmark tests on coupled dynamical systems demonstrate that our approach outperforms other model-free causality detection methods, successfully handling both unidirectional and bidirectional couplings, and it is capable of detecting the arrow of time when present. We also show that the method can be used to robustly detect causality in electroencephalography data in humans.

Link prediction on knowledge graphs (KGs) is a key research topic. Previous work mainly focused on binary relations, paying less attention to higher-arity relations although they are ubiquitous in real-world KGs. This paper considers link prediction upon n-ary relational facts and proposes a graph-based approach to this task. The key to our approach is to represent the n-ary structure of a fact as a small heterogeneous graph, and model this graph with edge-biased fully-connected attention. The fully-connected attention captures universal inter-vertex interactions, while with edge-aware attentive biases to particularly encode the graph structure and its heterogeneity. In this fashion, our approach fully models global and local dependencies in each n-ary fact, and hence can more effectively capture associations therein. Extensive evaluation verifies the effectiveness and superiority of our approach. It performs substantially and consistently better than current state-of-the-art across a variety of n-ary relational benchmarks. Our code is publicly available.

北京阿比特科技有限公司