
Distinguishing between cause and effect using time series observational data is a major challenge in many scientific fields. A new perspective on this problem is provided by the principle of Independence of Causal Mechanisms (ICM), which leads to the Spectral Independence Criterion (SIC): the postulate that the power spectral density (PSD) of the cause time series is uncorrelated with the squared modulus of the frequency response of the filter generating the effect. Because SIC rests on methods and assumptions that contrast starkly with those of most causal discovery methods for time series, questions arise about the theoretical grounds that justify its use. In this paper, we provide answers covering several key aspects. After giving an information-theoretic interpretation of SIC, we present an identifiability result that clarifies the contexts in which this approach is expected to perform well. We further demonstrate the robustness of SIC to downsampling, an obstacle that can spoil Granger-based inference. Finally, an invariance perspective allows us to explore the limitations of the spectral independence assumption and how to generalize it. Overall, these results support Spectral Independence as a well-grounded guiding principle for causal inference from empirical time series.
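As a concrete illustration, the following minimal sketch (not the authors' implementation; the AR(1) cause and the three-tap filter are assumptions made for this example) estimates the correlation between the cause's PSD and the squared filter gain, the quantity SIC postulates to be uncorrelated:

```python
# Minimal, illustrative check of Spectral Independence: simulate a cause x,
# filter it into an effect y, and correlate the PSD of x with |h(f)|^2.
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = signal.lfilter([1.0], [1.0, -0.6], rng.standard_normal(100_000))  # AR(1) cause
b = [0.5, 0.3, 0.2]                       # hypothetical causal filter x -> y
y = signal.lfilter(b, [1.0], x)

f, psd_x = signal.welch(x, nperseg=1024)  # PSD of the cause
_, h = signal.freqz(b, worN=f, fs=1.0)    # frequency response of the filter
gain2 = np.abs(h) ** 2

# SIC postulates that psd_x and gain2 are uncorrelated across frequencies:
rho = np.corrcoef(psd_x, gain2)[0, 1]
print(f"correlation between cause PSD and |h(f)|^2: {rho:+.3f}")
```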

Related Content

This paper proposes a general two-directional simultaneous inference (TOSI) framework for high-dimensional models with a manifest- or latent-variable structure, for example, high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factor models. TOSI performs simultaneous inference on a set of parameters from two directions: one tests whether the parameters assumed to be zero are indeed zero, and the other tests whether zeros exist among the parameters assumed to be nonzero. As a result, we can exactly identify which parameters are zero, thereby expressing the data structure fully and parsimoniously. We theoretically prove that the proposed TOSI method asymptotically controls the Type I error at the prespecified significance level and that its testing power converges to one. Simulations examine the performance of the proposed method in finite samples, and two real datasets are analyzed. The results show that the TOSI method is more predictive and yields more interpretable estimators than existing methods.
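The two directions can be illustrated on a toy low-dimensional linear model; the Wald-type statistics below are a simple stand-in for, not a reproduction of, the TOSI procedure, and all names and index sets are illustrative:

```python
# Two testing directions on a toy linear model: S0 = parameters assumed zero,
# S1 = parameters assumed nonzero (plain Wald statistics, illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 500, 4
beta = np.array([1.5, 0.8, 0.0, 0.0])            # true zeros in positions 2, 3
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
sigma2 = np.sum((y - X @ bhat) ** 2) / (n - p)
z = bhat / np.sqrt(sigma2 * np.diag(XtX_inv))     # Wald statistics

S0, S1 = [2, 3], [0, 1]
# Direction 1: H0 "all of S0 are zero" (Bonferroni-corrected max test).
p1 = min(1.0, len(S0) * 2 * stats.norm.sf(np.max(np.abs(z[S0]))))
# Direction 2: H0 "S1 contains a zero" (intersection-union test).
p2 = 2 * stats.norm.sf(np.min(np.abs(z[S1])))
print(f"zeros really zero:       p = {p1:.3f}")   # large p: keep them as zeros
print(f"nonzeros contain a zero: p = {p2:.3g}")   # small p: all of S1 is nonzero
```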

In this article, we study the repeated routing game problem on a parallel network with affine latency functions on each edge. We cast the game in an LQR control-theoretic framework, leveraging the Rosenthal potential formulation. We use control techniques to analyze the convergence of the game dynamics, with specific cases that lend themselves to optimal control. We design the dynamics parameters so that conservation of flow is guaranteed. We provide an algorithmic solution for the general optimal control setup using a multiparametric quadratic programming approach (explicit MPC). Finally, we numerically illustrate the impact of varying system parameters on the solutions.
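For the parallel network with affine latencies l_e(x) = a_e x + b_e, the Rosenthal potential is the sum of integrated edge latencies, and the equilibrium flow minimizes it subject to flow conservation. A minimal sketch (the latency coefficients and demand are made-up example values):

```python
# Wardrop equilibrium on a 3-edge parallel network via the Rosenthal potential
# Phi(x) = sum_e (a_e x_e^2 / 2 + b_e x_e), minimized subject to sum(x) = demand.
import numpy as np
from scipy.optimize import minimize

a = np.array([1.0, 2.0, 0.5])   # hypothetical latency slopes
b = np.array([0.0, 1.0, 2.0])   # hypothetical latency offsets
demand = 6.0                    # total flow to route

potential = lambda x: np.sum(0.5 * a * x**2 + b * x)
res = minimize(potential, x0=np.full(3, demand / 3),
               bounds=[(0, None)] * 3,
               constraints={"type": "eq", "fun": lambda x: x.sum() - demand})

x = res.x
print("equilibrium flows:", np.round(x, 3))
print("edge latencies   :", np.round(a * x + b, 3))   # equal on all used edges
```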

The semiparametric estimation approach, which includes inverse-probability-weighted and doubly robust estimation using propensity scores, is a standard tool for marginal structural models in causal inference, and it is rapidly being extended in various directions. On the other hand, although model selection is indispensable in statistical analysis, an information criterion for selecting an appropriate marginal structure has only just started to be developed. In this paper, we derive an Akaike-type information criterion on the basis of the original definition of an information criterion: we define a risk function based on the Kullback-Leibler divergence as its cornerstone and treat a general causal inference model that is not necessarily linear. The causal effects to be estimated are those in a general population, such as the average treatment effect on the treated or the average treatment effect on the untreated. Because this field attaches importance to doubly robust estimation, which allows either the model of the assignment variable or the model of the outcome variable to be misspecified, we make the information criterion itself doubly robust, so that it remains mathematically valid when either model is wrong. In simulation studies, we compare the derived criterion with an existing criterion obtained from a formal argument and confirm that the former outperforms the latter: the divergence between the structure selected by the derived criterion and the true structure is consistently small, and the probability of selecting the true or nearly true model is consistently higher. Real data analyses confirm that the results of variable selection under the two criteria differ significantly.
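To fix ideas, here is a minimal sketch of the doubly robust (AIPW) estimator that this line of work builds on; it is a generic textbook construction, not the paper's criterion, and the simulated data-generating process is an assumption of the example:

```python
# Doubly robust (AIPW) estimate of the average treatment effect: consistent if
# either the propensity model or the outcome models are correctly specified.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
n = 2000
x = rng.standard_normal((n, 2))
e = 1 / (1 + np.exp(-x[:, 0]))                            # true propensity
t = rng.binomial(1, e)                                    # treatment assignment
y = x @ np.array([1.0, -0.5]) + 2.0 * t + rng.standard_normal(n)  # true ATE = 2

ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]    # propensity model
m1 = LinearRegression().fit(x[t == 1], y[t == 1]).predict(x)  # outcome model, treated
m0 = LinearRegression().fit(x[t == 0], y[t == 0]).predict(x)  # outcome model, control

ate = np.mean(t * (y - m1) / ps + m1) - np.mean((1 - t) * (y - m0) / (1 - ps) + m0)
print(f"doubly robust ATE estimate: {ate:.3f}")
```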

This paper proposes a convex bilevel optimization paradigm to formulate and solve popular learning and vision problems in real-world scenarios. Unlike conventional approaches, which design their iteration schemes directly from the given problem formulation, we introduce a task-oriented energy as a latent constraint that integrates richer task information. By explicitly re-characterizing feasibility, we establish an efficient and flexible algorithmic framework for tackling convex models with both a shrunken solution space and powerful auxiliary information (based on domain knowledge and the data distribution of the task). In theory, we present a convergence analysis of our numerical strategy based on latent feasibility re-characterization, and we analyze the stability of this convergence under computational error perturbations. Extensive numerical experiments verify our theoretical findings and evaluate the practical performance of our method on different applications.
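The bilevel idea can be seen on a toy instance: when the lower-level problem has many solutions, an upper-level energy selects among them. The sketch below uses an underdetermined least-squares lower level and a simple quadratic energy; both choices are illustrative assumptions, not the paper's formulation:

```python
# Toy bilevel selection: the lower level min ||Ax - b||^2 is underdetermined,
# so its solution set is an affine subspace; an upper-level energy (here
# ||x||^2, purely illustrative) picks one point from that set.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 8))          # 3 equations, 8 unknowns
b = rng.standard_normal(3)

# Lower-level solution set: x = A^+ b + N z, with N spanning null(A).
x_ls = np.linalg.pinv(A) @ b
_, _, Vt = np.linalg.svd(A)
N = Vt[3:].T                             # orthonormal basis of the null space

# Upper level: minimize ||x||^2 over the feasible set. For this quadratic
# energy the minimizer is closed-form: z* = -N^T x_ls (which is 0 here, since
# the pseudoinverse solution already has no null-space component).
z_star = -N.T @ x_ls
x_star = x_ls + N @ z_star
print("residual ||Ax-b||:", np.linalg.norm(A @ x_star - b))
print("energy ||x||^2   :", x_star @ x_star)
```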

The aim of this paper is to offer the first systematic exploration and definition of equivalent causal models in the setting where the two models are not built from the same variables. The idea is that two models are equivalent when they agree on all "essential" causal information that can be expressed using their common variables. I do so by focusing on the two main features of causal models, namely their structural relations and their functional relations. In particular, I define several relations of causal ancestry and several relations of causal sufficiency, and require that the most general of these relations be preserved across equivalent models.
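One structural aspect of this idea, agreement on causal ancestry among the shared variables, can be sketched programmatically; the two toy graphs and the particular ancestry relation below are illustrative assumptions, not the paper's full definitions:

```python
# Check whether two causal DAGs over different variable sets agree on causal
# ancestry restricted to their common variables (illustrative sketch).
import networkx as nx

m1 = nx.DiGraph([("A", "M"), ("M", "B"), ("A", "C")])   # model with mediator M
m2 = nx.DiGraph([("A", "B"), ("A", "C")])               # model without M

common = set(m1) & set(m2)                              # shared variables

def ancestry(g, nodes):
    """Causal-ancestry pairs (u causes v) restricted to a node set."""
    return {(u, v) for u in nodes for v in nodes
            if u != v and nx.has_path(g, u, v)}

print("ancestry preserved:", ancestry(m1, common) == ancestry(m2, common))
```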

It is known that current graph neural networks (GNNs) are difficult to make deep because of the problem known as over-smoothing. Multi-scale GNNs are a promising approach to mitigating the over-smoothing problem. However, there is little explanation, from the viewpoint of learning theory, of why they work empirically. In this study, we derive optimization and generalization guarantees for transductive learning algorithms that include multi-scale GNNs. Using boosting theory, we prove convergence of the training error under weak-learning-type conditions. By combining this with generalization gap bounds in terms of transductive Rademacher complexity, we show that the test error bound of a certain type of multi-scale GNN decreases with depth under those conditions. Our results offer a theoretical explanation for the effectiveness of the multi-scale structure against the over-smoothing problem. We apply boosting algorithms to the training of multi-scale GNNs on real-world node prediction tasks, confirm that their performance is comparable to that of existing GNNs, and observe practical behavior consistent with our theoretical findings. Code is available at //github.com/delta2323/GB-GNN
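The multi-scale structure itself is easy to sketch: combine node features propagated to several depths, so that shallow scales are retained even when deep propagation over-smooths. The following numpy toy (weights and graph are made up; see the repository above for the paper's code) shows the aggregation:

```python
# Multi-scale aggregation: sum feature matrices propagated to depths 0..3,
# so shallow-scale information survives even when deep scales over-smooth.
import numpy as np

rng = np.random.default_rng(4)
n, d = 6, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                                   # undirected adjacency
A_hat = (A + np.eye(n)) / (A + np.eye(n)).sum(1, keepdims=True)  # row-normalized
X = rng.standard_normal((n, d))

scales = [X]
for _ in range(3):
    scales.append(A_hat @ scales[-1])                    # one more propagation step

weights = [0.4, 0.3, 0.2, 0.1]                           # hypothetical scale weights
H = sum(w * S for w, S in zip(weights, scales))
print(H.shape)                                           # (6, 4) representation
```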

When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of an undesirable spectrum, and then discuss practical solutions, including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods, and distributed methods, together with theoretical results for these algorithms. Third, we review existing research on global issues in neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis, and infinite-width analysis.
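A quick numerical illustration of the first point, that initialization scale governs explosion/vanishing, is easy to reproduce (the widths and depths are arbitrary example values):

```python
# With He-style scaling (Var(W) = 2 / fan_in) activation magnitude stays
# roughly constant through a deep ReLU stack; smaller scales make it vanish.
import numpy as np

rng = np.random.default_rng(5)
width, depth = 256, 50
h = rng.standard_normal((width, 1000))       # a batch of 1000 input vectors

for scale in [0.5, 1.0]:                     # 1.0 = He init, 0.5 = too small
    x = h.copy()
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(scale * 2 / width)
        x = np.maximum(W @ x, 0.0)           # one ReLU layer
    print(f"scale {scale}: final activation std = {x.std():.2e}")
```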

Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either do not converge to the correct posterior or require storing large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that it is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Pólya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that, contrary to popular belief in the topic modeling literature, partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora.
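For reference, the fully collapsed Gibbs sampler that such work builds on and compares against looks like this on a toy corpus (standard textbook algorithm, not the paper's Pólya-urn sampler; corpus and hyperparameters are made up):

```python
# Standard collapsed Gibbs sampling for LDA on a toy two-document corpus.
import numpy as np

rng = np.random.default_rng(6)
docs = [[0, 1, 2, 1], [2, 3, 3, 0]]          # word ids per document
K, V, alpha, beta = 2, 4, 0.1, 0.01          # topics, vocab size, priors

z = [[rng.integers(K) for _ in d] for d in docs]      # random topic assignments
ndk = np.zeros((len(docs), K)); nkv = np.zeros((K, V)); nk = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1; nkv[z[d][i], w] += 1; nk[z[d][i]] += 1

for _ in range(100):                          # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                       # remove current assignment
            ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkv[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())  # resample topic
            z[d][i] = k
            ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1

print("topic-word counts:\n", nkv)
```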

Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling, and prediction, among other tasks. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, which is inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
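A crude finite-dimensional simulation conveys the additive construction: each group's random probability measure normalizes the sum of a shared and a group-specific set of random jumps. The truncation to 50 atoms and the gamma jump distribution are simplifying assumptions of this sketch, not the paper's exact construction:

```python
# Finite-dimensional sketch of latent nesting: normalize (common jumps +
# group-specific jumps) to get dependent random probability measures.
import numpy as np

rng = np.random.default_rng(7)
n_atoms = 50
common = rng.gamma(shape=0.5, size=n_atoms)          # shared random jumps
p_groups = []
for g in range(2):                                   # two groups
    idio = rng.gamma(shape=0.5, size=n_atoms)        # group-specific jumps
    total = common + idio
    p_groups.append(total / total.sum())             # normalize to a probability

# Shared atoms induce dependence across groups without forcing full exchangeability.
print("corr across group weights:", np.corrcoef(p_groups[0], p_groups[1])[0, 1])
```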

Robust estimation is much more challenging in high dimensions than in one dimension: most techniques either lead to intractable optimization problems or to estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal up to logarithmic factors, as well as by giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance, making high-dimensional robust estimation a realistic possibility.
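A bare-bones sketch of the filtering idea underlying this line of work: score points by their projection onto the top eigenvector of the empirical covariance and drop the most extreme ones. The contamination model, number of rounds, and 5% trim level are assumptions of the example, not the paper's algorithm:

```python
# Spectral filtering for robust mean estimation: corruptions inflate variance
# along some direction, so repeatedly trim the worst points along it.
import numpy as np

rng = np.random.default_rng(8)
d = 20
inliers = rng.standard_normal((950, d))                  # true mean = 0
outliers = rng.standard_normal((50, d)) + 5.0            # 5% corruptions
X = np.vstack([inliers, outliers])
print("naive mean error   :", np.linalg.norm(X.mean(0)))

for _ in range(5):                                       # filtering rounds
    centered = X - X.mean(0)
    _, eigvecs = np.linalg.eigh(np.cov(X.T))             # eigenvectors, ascending
    scores = (centered @ eigvecs[:, -1]) ** 2            # top-direction scores
    X = X[scores < np.quantile(scores, 0.95)]            # drop the worst 5%

print("filtered mean error:", np.linalg.norm(X.mean(0)))
```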
