In 2013, Marcus, Spielman, and Srivastava resolved the famous Kadison-Singer conjecture. It states that for $n$ independent random vectors $v_1,\cdots, v_n$ that have expected squared norm bounded by $\epsilon$ and are in isotropic position in expectation, there is a positive probability that the determinant polynomial $\det(xI - \sum_{i=1}^n v_iv_i^\top)$ has all of its roots bounded by $(1 + \sqrt{\epsilon})^2$. An interpretation of the Kadison-Singer theorem is that we can always partition the vectors $v_1,\cdots,v_n$ into two sets with low discrepancy in terms of the spectral norm (equivalently, in terms of the roots of the determinant polynomial). In this paper, we provide two results for a broader class of polynomials, the hyperbolic polynomials, in two generalized settings: $\bullet$ The first shows that the Kadison-Singer result holds under the weaker assumption that the vectors have a bounded sum of hyperbolic norms. $\bullet$ The second relaxes the distribution assumption of the Kadison-Singer result to Strongly Rayleigh distributions. To the best of our knowledge, previous results only support determinant polynomials [Anari and Oveis Gharan'14, Kyng, Luh and Song'20], and it is unclear whether they can be generalized to a broader class of polynomials. In addition, we provide a sub-exponential time algorithm that makes our existential results constructive.
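As a toy numerical illustration of the original Kadison-Singer statement above (not of the hyperbolic-polynomial generalization), the following sketch samples independent random vectors that are isotropic in expectation with $\mathbb{E}\|v_i\|^2 = \epsilon = d/n$, and checks empirically that with positive probability the largest root of $\det(xI - \sum_i v_iv_i^\top)$, i.e. the largest eigenvalue, stays below $(1+\sqrt{\epsilon})^2$. The particular construction of the vectors is an assumption chosen for simplicity.

```python
import numpy as np

def largest_root(vectors):
    """Largest root of det(xI - sum v v^T), i.e. the top eigenvalue of the sum."""
    M = sum(np.outer(v, v) for v in vectors)
    return np.linalg.eigvalsh(M)[-1]

d, n, trials = 4, 64, 2000
eps = d / n                      # E||v_i||^2 for the construction below
bound = (1 + np.sqrt(eps)) ** 2
rng = np.random.default_rng(0)

hits = 0
for _ in range(trials):
    # v_i = sqrt(d/n) * e_J with J uniform in {1,...,d}:
    # E[v_i v_i^T] = I/n (isotropic in expectation), E||v_i||^2 = d/n = eps.
    vs = [np.sqrt(d / n) * np.eye(d)[rng.integers(d)] for _ in range(n)]
    if largest_root(vs) <= bound:
        hits += 1

print(f"bound (1+sqrt(eps))^2 = {bound:.3f}")
print(f"fraction of samples with all roots below the bound: {hits/trials:.2f}")
```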
In this study, we examine numerical approximations for second-order linear and nonlinear differential equations with diverse boundary conditions, followed by residual corrections of the first approximations. We first obtain numerical results using the Galerkin weighted residual approach with Bernstein polynomials. Residuals arise because this first approximation is computed numerically. To minimize these residuals, we use a compact finite difference scheme of fourth-order convergence to solve the error differential equations subject to the error boundary conditions. We also introduce the formulation of the fourth-order compact finite difference method for nonlinear boundary value problems (BVPs). The improved approximations are produced by adding the error values, derived from the approximations of the error differential equation, to the weighted residual approximations. Numerical results are compared to the exact solutions and to the solutions available in the published literature to validate the proposed scheme, and high accuracy is achieved in all cases.
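The abstract does not specify the scheme's stencil; as a hedged illustration, the sketch below uses the classical fourth-order compact (Numerov-type) discretization of $u'' = f(x)$ with Dirichlet conditions, $u_{i-1} - 2u_i + u_{i+1} = \tfrac{h^2}{12}(f_{i-1} + 10 f_i + f_{i+1})$, and verifies the expected convergence order on a linear test problem. The error-equation solver in the paper may differ in its details.

```python
import numpy as np

def solve_compact(n):
    """Fourth-order compact (Numerov) scheme for u'' = f on [0,1], u(0)=u(1)=0.

    Test problem: f = -pi^2 sin(pi x), exact solution u = sin(pi x).
    """
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    f = -np.pi**2 * np.sin(np.pi * x)

    # Tridiagonal system for the interior unknowns u_1..u_{n-1}.
    A = np.zeros((n - 1, n - 1))
    b = np.zeros(n - 1)
    for i in range(1, n):
        row = i - 1
        A[row, row] = -2.0
        if row > 0:
            A[row, row - 1] = 1.0
        if row < n - 2:
            A[row, row + 1] = 1.0
        b[row] = h**2 / 12.0 * (f[i - 1] + 10.0 * f[i] + f[i + 1])
    u = np.zeros(n + 1)
    u[1:n] = np.linalg.solve(A, b)
    return np.max(np.abs(u - np.sin(np.pi * x)))

e1, e2 = solve_compact(32), solve_compact(64)
print(f"max errors: {e1:.2e}, {e2:.2e}, observed order ~ {np.log2(e1/e2):.2f}")
```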
The Information Bottleneck (IB) is a method of lossy compression. Its rate-distortion (RD) curve describes the fundamental tradeoff between input compression and the preservation of relevant information. However, it conceals the underlying dynamics of optimal input encodings. We argue that these typically follow a piecewise-smooth trajectory as the input information is being compressed, as recently shown for RD. These smooth dynamics are interrupted when an optimal encoding changes qualitatively, at a bifurcation. By leveraging the IB's intimate relations with RD, sub-optimal solutions can be seen to collide or exchange optimality there. Despite the wide acceptance of the IB and its applications, there are surprisingly few techniques for solving it numerically, even for finite problems whose distribution is known. We derive anew the IB's first-order ordinary differential equation, which describes the dynamics underlying its optimal tradeoff curve. To exploit these dynamics, one needs not only to detect IB bifurcations but also to identify their type in order to handle them accordingly. Rather than approaching the optimal IB curve from sub-optimal directions, the latter allows us to follow a solution's trajectory along the optimal curve, under mild assumptions. This translates an understanding of IB bifurcations into a surprisingly accurate numerical algorithm.
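For context on what is being solved, the sketch below implements the classical self-consistent (Blahut-Arimoto-style) IB iteration for a known finite joint distribution $p(x,y)$ at a fixed tradeoff parameter $\beta$. It is the standard baseline solver that the abstract contrasts with, not the ODE-based method proposed there, and the toy distribution is an assumption.

```python
import numpy as np

def ib_fixed_point(p_xy, n_t, beta, iters=500, seed=0):
    """Classical IB iteration: returns encoder p(t|x), marginal p(t), decoder p(y|t)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                       # p(x)
    p_y_given_x = p_xy / p_x[:, None]            # p(y|x)
    n_x = p_xy.shape[0]

    p_t_given_x = rng.random((n_x, n_t))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(iters):
        p_t = p_x @ p_t_given_x                                  # p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= p_t[:, None]
        # D_KL(p(y|x) || p(y|t)) for every (x, t) pair.
        kl = np.einsum('xy,xty->xt',
                       p_y_given_x,
                       np.log(p_y_given_x[:, None, :] / p_y_given_t[None, :, :]))
        logits = np.log(p_t)[None, :] - beta * kl
        p_t_given_x = np.exp(logits - logits.max(axis=1, keepdims=True))
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    return p_t_given_x, p_t, p_y_given_t

# Toy joint distribution over |X| = 4 inputs and |Y| = 2 labels (assumed for illustration).
p_xy = np.array([[0.20, 0.05], [0.18, 0.07], [0.06, 0.19], [0.04, 0.21]])
enc, p_t, dec = ib_fixed_point(p_xy, n_t=2, beta=5.0)
print(np.round(enc, 3))
```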
The non-Euclidean geometry of hyperbolic spaces has recently garnered considerable attention in the realm of representation learning. Current endeavors in hyperbolic representation largely presuppose that the underlying hierarchies can be automatically inferred and preserved through the adaptive optimization process. This assumption, however, is questionable and requires further validation. In this work, we first introduce a position-tracking mechanism to scrutinize prevalent existing hyperbolic learning models, revealing that the learned representations are sub-optimal. To address this, we propose a simple yet effective method, hyperbolic informed embedding (HIE), which incorporates cost-free hierarchical information deduced from the hyperbolic distance of a node to the origin (i.e., the induced hyperbolic norm) to advance existing hyperbolic learning models. The proposed HIE is both task-agnostic and model-agnostic, enabling its seamless integration with a broad spectrum of models and tasks. Extensive experiments across various models and different tasks demonstrate the versatility and adaptability of the proposed method. Notably, our method achieves an improvement of up to 21.4\% over the competing baselines.
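As a hedged illustration of the "induced hyperbolic norm" signal mentioned above: in the Poincaré ball of curvature $-c$, the distance of an embedding $x$ to the origin is $d_c(0, x) = \tfrac{2}{\sqrt{c}}\,\mathrm{artanh}(\sqrt{c}\,\|x\|)$, and nodes closer to the origin are typically read as higher in the hierarchy. The snippet below computes this quantity; whether HIE uses the Poincaré or the Lorentz model, and how the signal enters the loss, is not specified in the abstract.

```python
import numpy as np

def poincare_norm(x, c=1.0):
    """Hyperbolic distance to the origin in the Poincare ball of curvature -c."""
    eucl = np.linalg.norm(x, axis=-1)
    return 2.0 / np.sqrt(c) * np.arctanh(np.clip(np.sqrt(c) * eucl, 0.0, 1.0 - 1e-7))

# Toy embeddings (assumed): points nearer the boundary get larger hyperbolic norms,
# so they would be read as deeper (leaf-like) nodes of the hierarchy.
emb = np.array([[0.05, 0.00],    # root-like
                [0.60, 0.10],    # intermediate
                [0.90, 0.30]])   # leaf-like
print(np.round(poincare_norm(emb), 3))
```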
We show that corner polyhedra and 3-connected Schnyder labelings join the growing list of planar structures that can be set in exact correspondence with (weighted) models of quadrant walks via a bijection due to Kenyon, Miller, Sheffield and Wilson. Our approach leads to a first polynomial-time algorithm to count these structures, and to the determination of their exact asymptotic growth constants: the number $p_n$ of corner polyhedra and $s_n$ of 3-connected Schnyder labelings of size $n$ respectively satisfy $(p_n)^{1/n}\to 9/2$ and $(s_n)^{1/n}\to 16/3$ as $n$ goes to infinity. While the growth rates are rational, as in previously known instances of such correspondences, the exponent of the asymptotic polynomial correction to the exponential growth does not appear to follow from the now-standard Denisov-Wachtel approach, due to a bimodal behavior of the step set of the underlying tandem walk. However, a heuristic argument suggests that these exponents are $-1-\pi/\arccos(9/16)\approx -4.23$ for $p_n$ and $-1-\pi/\arccos(22/27)\approx -6.08$ for $s_n$, which would imply that the associated series are not D-finite.
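The two conjectured exponents are straightforward to evaluate numerically; the check below simply confirms the approximations $-4.23$ and $-6.08$ quoted above.

```python
import math

for name, ratio in [("corner polyhedra", 9 / 16), ("Schnyder labelings", 22 / 27)]:
    exponent = -1 - math.pi / math.acos(ratio)
    print(f"{name}: -1 - pi/arccos({ratio:.4f}) = {exponent:.2f}")
```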
Position-based dynamics (PBD) is a powerful technique for simulating a variety of materials. Its primary strength is its robustness when run with a limited computational budget. We develop a novel approach to address problems with PBD for quasistatic hyperelastic materials. Even though PBD is based on the projection of static constraints, it is best suited for dynamic simulations. This is particularly relevant since the efficient creation of large data sets of plausible, but not necessarily accurate, elastic equilibria is of increasing importance with the emergence of quasistatic neural networks. Furthermore, PBD projects one constraint at a time, and we show that ignoring the effects of neighboring constraints limits its convergence and stability properties. Recent works have shown that PBD can be related to the Gauss-Seidel approximation of a Lagrange multiplier formulation of backward Euler time stepping, where each constraint is solved/projected independently of the others in an iterative fashion. We show that a position-based, rather than constraint-based, nonlinear Gauss-Seidel approach resolves these problems. Our approach retains the essential PBD feature of stable behavior under constrained computational budgets, but also allows for convergent behavior with expanded budgets. We demonstrate the efficacy of our method on a variety of representative hyperelastic problems and show that both successive over-relaxation (SOR) and Chebyshev acceleration can be easily applied.
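To make the distinction concrete, the sketch below runs a position-based (per-vertex) nonlinear Gauss-Seidel sweep on a toy quasistatic mass-spring chain: each free vertex takes a damped local Newton step on the energy of its incident springs plus gravity, holding all other vertices fixed. The mass-spring energy, parameters, and single-step-with-backtracking design are assumptions for illustration; they are not the paper's hyperelastic constitutive model or implementation.

```python
import numpy as np

# Toy quasistatic problem (assumed): a 2D chain of mass points joined by springs,
# first point pinned, settling under gravity.
N, L, k, m, g = 8, 1.0, 100.0, 1.0, 9.8
x = np.stack([np.arange(N, dtype=float), np.zeros(N)], axis=1)  # flat initial chain

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < N]

def local_energy(i, xi):
    """Energy terms that involve vertex i: incident springs plus gravity."""
    e = m * g * xi[1]
    for j in neighbors(i):
        r = np.linalg.norm(xi - x[j])
        e += 0.5 * k * (r - L) ** 2
    return e

def local_grad_hess(i):
    grad = np.array([0.0, m * g])
    hess = 1e-3 * k * np.eye(2)              # small regularization for degenerate states
    for j in neighbors(i):
        d = x[i] - x[j]
        r = np.linalg.norm(d)
        dhat = d / r
        P = np.outer(dhat, dhat)
        grad += k * (r - L) * dhat
        hess += k * (P + (1.0 - L / r) * (np.eye(2) - P))
    return grad, hess

# Position-based nonlinear Gauss-Seidel: per-vertex damped Newton updates.
for sweep in range(500):
    for i in range(1, N):                    # vertex 0 stays pinned
        grad, hess = local_grad_hess(i)
        step = np.linalg.solve(hess, grad)
        t, e0 = 1.0, local_energy(i, x[i])
        while t > 1e-9 and local_energy(i, x[i] - t * step) > e0 - 1e-4 * t * grad @ step:
            t *= 0.5                         # backtracking keeps each local update stable
        x[i] -= t * step

print(np.round(x, 2))                        # chain sags toward a hanging equilibrium
```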
There are well-established methods for identifying the causal effect of a time-varying treatment applied at discrete time points. However, in the real world, many treatments are continuous or have a finer time scale than the one used for measurement or analysis. While researchers have investigated the discrepancies between estimates under varying discretization scales using simulations and empirical data, it is still unclear how the choice of discretization scale affects causal inference. To address this gap, we present a framework for understanding how discretization scales impact the properties of causal inferences about the effect of a time-varying treatment. We introduce the concept of "identification bias", the difference between the causal estimand for a continuous-time treatment and the estimand targeted by a discretized version of the treatment. We show that this bias can persist even with an infinite number of longitudinal treatment-outcome trajectories. We specifically examine the identification problem in a class of linear stochastic continuous-time data-generating processes and demonstrate the identification bias of the g-formula in this context. Our findings indicate that discretization bias can significantly impact empirical analysis, especially when there are limited repeated measurements. Therefore, we recommend that researchers carefully consider the choice of discretization scale and perform sensitivity analysis to address this bias. We also propose a simple heuristic quantitative measure of sensitivity to discretization and suggest that researchers report this measure along with point and interval estimates in their work. By doing so, researchers can better understand and address the potential impact of discretization bias on causal inference.
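As a toy illustration (not the paper's data-generating process, estimand, or g-formula analysis), the sketch below simulates a fine-grained linear treatment-outcome system and then regresses the final outcome on treatment snapshots taken at different measurement scales. The implied "total effect" of the treatment path changes with the chosen scale even with many trajectories, which is the kind of scale-dependence the abstract calls identification bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n_traj, dt, n_fine = 20000, 0.01, 100          # trajectories on a fine time grid

# Toy continuous-time-like system (assumed): AR(1) treatment on the fine grid,
# outcome Y = integral of the treatment over [0, 1] plus noise.
A = np.zeros((n_traj, n_fine))
A[:, 0] = rng.normal(size=n_traj)
for t in range(1, n_fine):
    A[:, t] = 0.9 * A[:, t - 1] + rng.normal(size=n_traj)
Y = dt * A.sum(axis=1) + 0.1 * rng.normal(size=n_traj)

# Analyze the same trajectories with the treatment measured at different scales.
for n_obs in (2, 5, 20):
    idx = np.linspace(0, n_fine - 1, n_obs).astype(int)
    X = np.c_[np.ones(n_traj), A[:, idx]]       # intercept + treatment snapshots
    beta = np.linalg.lstsq(X, Y, rcond=None)[0][1:]
    # Implied effect of raising every *measured* treatment value by one unit.
    print(f"{n_obs:2d} measurements: implied total effect = {beta.sum():.2f}")
# The true effect of raising the whole continuous path by one unit is 1.0,
# yet the implied estimate changes with the measurement scale.
```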
In this paper, we propose and analyze least squares finite element methods for the linear elasticity interface problem in the stress-displacement system on unfitted meshes. We consider the cases that the interface is $C^2$ or polygonal, and the exact solution $(\sigma,u)$ belongs to $H^s(\mathrm{div}; \Omega_0 \cup \Omega_1) \times H^{1+s}(\Omega_0 \cup \Omega_1)$ with $s > 1/2$. Two types of least squares functionals are defined to seek the numerical solution. The first is defined by simply applying the $L^2$ norm least squares principle, and requires the condition $s \geq 1$. The second is defined with a discrete minus norm, which is related to the inner product in $H^{-1/2}(\Gamma)$. The use of this discrete minus norm results in a method with optimal convergence rates and allows the exact solution to have regularity with any $s > 1/2$. Stability near the interface for both methods is guaranteed by ghost penalty bilinear forms, and robust condition number estimates are derived. Convergence rates in the $L^2$ norm and the energy norm are derived for both methods. We illustrate the accuracy and the robustness of the proposed methods by a series of numerical experiments for test problems in two and three dimensions.
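For orientation, a first-order stress-displacement formulation of linear elasticity and the plain $L^2$-type least squares functional typically take the form below, with $\mathcal{A}$ the compliance tensor and $\varepsilon(u)$ the symmetric gradient. This is a generic sketch consistent with the description above; the paper's actual functionals additionally involve interface terms, the discrete minus norm, and ghost penalty stabilization.

\[
\mathcal{A}\sigma - \varepsilon(u) = 0, \qquad \operatorname{div}\sigma + f = 0 \quad \text{in } \Omega_0 \cup \Omega_1,
\]
\[
J(\sigma_h, u_h; f) = \|\mathcal{A}\sigma_h - \varepsilon(u_h)\|_{0,\Omega_0\cup\Omega_1}^2 + \|\operatorname{div}\sigma_h + f\|_{0,\Omega_0\cup\Omega_1}^2 .
\]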
AI explanations are often mentioned as a way to improve human-AI decision-making, but empirical studies have not found consistent evidence of explanations' effectiveness and, on the contrary, suggest that they can increase overreliance when the AI system is wrong. While many factors may affect reliance on AI support, one important factor is how decision-makers reconcile their own intuition -- beliefs or heuristics, based on prior knowledge, experience, or pattern recognition, used to make judgments -- with the information provided by the AI system to determine when to override AI predictions. We conduct a think-aloud, mixed-methods study with two explanation types (feature- and example-based) for two prediction tasks to explore how decision-makers' intuition affects their use of AI predictions and explanations, and ultimately their choice of when to rely on AI. Our results identify three types of intuition involved in reasoning about AI predictions and explanations: intuition about the task outcome, features, and AI limitations. Building on these, we summarize three observed pathways for decision-makers to apply their own intuition and override AI predictions. We use these pathways to explain why (1) the feature-based explanations we used did not improve participants' decision outcomes and increased their overreliance on AI, and (2) the example-based explanations we used improved decision-makers' performance over feature-based explanations and helped achieve complementary human-AI performance. Overall, our work identifies directions for further development of AI decision-support systems and explanation methods that help decision-makers effectively apply their intuition to achieve appropriate reliance on AI.
Neural network compression has become an increasingly important subject, due to its practical implications for reducing computational requirements and its theoretical implications, as there is an explicit connection between compressibility and the generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learned parameter vector. Even though these results have shed some light on the role of the training dynamics in compressibility, they relied on unverifiable assumptions, and the resulting theory does not provide a practical guideline due to its implicitness. In this study, we propose a simple modification of SGD such that the outputs of the algorithm will be provably compressible without making any nontrivial assumptions. We consider a one-hidden-layer neural network trained with SGD and inject additive heavy-tailed noise into the iterates at each iteration. We then show that, for any compression rate, there exists a level of overparametrization (i.e., a number of hidden units) such that the output of the algorithm will be compressible with high probability. To achieve this result, we make two main technical contributions: (i) we build on a recent study on stochastic analysis and prove a 'propagation of chaos' result with improved rates for a class of heavy-tailed stochastic differential equations, and (ii) we derive strong-error estimates for their Euler discretization. We finally illustrate our approach with experiments, where the results suggest that the proposed approach achieves compressibility with only a slight compromise in training and test error.
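The algorithmic modification described above is simple to sketch: run plain SGD on a one-hidden-layer network and add symmetric $\alpha$-stable noise to the iterates at every step. The code below is a minimal numpy sketch on a toy regression task; the network width, step size, noise scale, and the choice of perturbing only the first-layer weights are illustrative assumptions, and pruning to the largest-magnitude rows merely illustrates the kind of compressibility in question.

```python
import numpy as np

rng = np.random.default_rng(0)

def sas_noise(shape, alpha=1.8):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck formula."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size=shape)
    w = rng.exponential(1.0, size=shape)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1 - alpha) / alpha))

# Toy 1D regression data and a one-hidden-layer ReLU network (assumed setup).
n, d, width, eta, sigma = 256, 1, 512, 1e-2, 1e-3
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(3 * X[:, 0])
W = rng.normal(scale=1 / np.sqrt(d), size=(width, d))
a = rng.normal(scale=1 / np.sqrt(width), size=width)

for step in range(3000):
    idx = rng.integers(n, size=32)                      # minibatch SGD
    h = np.maximum(X[idx] @ W.T, 0.0)                   # ReLU features
    err = h @ a - y[idx]                                # residuals
    grad_a = h.T @ err / len(idx)
    grad_W = ((err[:, None] * (h > 0)) * a).T @ X[idx] / len(idx)
    a -= eta * grad_a
    W -= eta * grad_W + eta * sigma * sas_noise(W.shape)  # heavy-tailed injection

# Crude compressibility check: keep only the 10% largest-magnitude first-layer rows.
norms = np.linalg.norm(W, axis=1)
keep = norms >= np.quantile(norms, 0.9)
W_pruned = W * keep[:, None]
pred = lambda Wm: np.maximum(X @ Wm.T, 0.0) @ a
print("full MSE:  ", np.mean((pred(W) - y) ** 2))
print("pruned MSE:", np.mean((pred(W_pruned) - y) ** 2))
```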
An $(n,m)$-graph is a graph with $n$ types of arcs and $m$ types of edges. A homomorphism of an $(n,m)$-graph $G$ to another $(n,m)$-graph $H$ is a vertex mapping that preserves the adjacencies along with their types and directions. The order of a smallest (with respect to the number of vertices) such $H$ is the $(n,m)$-chromatic number of $G$. Moreover, an $(n,m)$-relative clique $R$ of an $(n,m)$-graph $G$ is a vertex subset of $G$ for which no two distinct vertices of $R$ get identified under any homomorphism of $G$. The $(n,m)$-relative clique number of $G$, denoted by $\omega_{r(n,m)}(G)$, is the maximum $|R|$ such that $R$ is an $(n,m)$-relative clique of $G$. In practice, $(n,m)$-relative cliques are often used for establishing lower bounds on the $(n,m)$-chromatic number of graph families. Generalizing an open problem posed by Sopena [Discrete Mathematics 2016] in his latest survey on oriented coloring, Chakroborty, Das, Nandi, Roy and Sen [Discrete Applied Mathematics 2022] conjectured that $\omega_{r(n,m)}(G) \leq 2 (2n+m)^2 + 2$ for any triangle-free planar $(n,m)$-graph $G$ and that this bound is tight for all $(n,m) \neq (0,1)$. In this article, we positively settle this conjecture by improving the previous upper bound of $\omega_{r(n,m)}(G) \leq 14 (2n+m)^2 + 2$ to $\omega_{r(n,m)}(G) \leq 2 (2n+m)^2 + 2$, and by finding examples of triangle-free planar graphs that achieve this bound. As a consequence of the tightness proof, we also establish a new lower bound of $2 (2n+m)^2 + 2$ for the $(n,m)$-chromatic number of the family of triangle-free planar graphs.
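To make the bound concrete, instantiating $2(2n+m)^2 + 2$ for a few small parameter values gives

\[
(n,m)=(1,0):\; 2(2)^2+2 = 10, \qquad (n,m)=(0,2):\; 2(2)^2+2 = 10, \qquad (n,m)=(1,1):\; 2(3)^2+2 = 20,
\]

so, for example, the $(n,m)$-relative clique number of an oriented ($n=1$, $m=0$) triangle-free planar graph is at most $10$.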