Faithfully summarizing the knowledge encoded by a deep neural network (DNN) into a few symbolic primitive patterns without losing much information is a core challenge in explainable AI. To this end, Ren et al. (2024) derived a series of theorems proving that the inference score of a DNN can be explained by a small set of interactions between input variables. However, the limited generalization power of these interactions across models makes it hard to regard them as faithful primitive patterns encoded by the DNN. Therefore, given different DNNs trained for the same task, we develop a new method to extract interactions that are shared by these DNNs. Experiments show that the extracted interactions better reflect the common knowledge shared by different DNNs.
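As a concrete illustration of the kind of interaction metric used in this line of work, the sketch below computes a Harsanyi-style interaction effect; the masking function `v` and the set-based indexing are assumptions made for illustration, not the paper's exact formulation.

```python
from itertools import combinations

def interaction_effect(v, S):
    """Harsanyi-style interaction I(S) = sum over subsets T of S of
    (-1)^(|S|-|T|) * v(T), where v(T) is the model output when only the
    input variables in T are left unmasked."""
    total = 0.0
    for r in range(len(S) + 1):
        for T in combinations(sorted(S), r):
            total += (-1) ** (len(S) - r) * v(frozenset(T))
    return total

# Toy "model": outputs 1 only when variables 0 and 1 are both unmasked.
v = lambda T: 1.0 if {0, 1} <= T else 0.0
print(interaction_effect(v, {0, 1}))  # 1.0 -> variables 0 and 1 interact
print(interaction_effect(v, {0}))     # 0.0 -> no main effect of variable 0 alone
```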
We propose a new second-order accurate lattice Boltzmann formulation for linear elastodynamics that is stable for arbitrary combinations of material parameters under a CFL-like condition. The construction of the numerical scheme uses an equivalent first-order hyperbolic system of equations as an intermediate step, for which a vectorial lattice Boltzmann formulation is introduced. The only difference from conventional lattice Boltzmann formulations is the use of vector-valued populations, so that all computational benefits of the algorithm are preserved. Using the asymptotic expansion technique and the notion of pre-stability structures, we further establish second-order consistency as well as analytical stability estimates. Lastly, we introduce a second-order consistent initialization of the populations as well as a boundary formulation for Dirichlet boundary conditions on 2D rectangular domains. All theoretical derivations are numerically verified by convergence studies using manufactured solutions and by long-term stability tests.
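For reference, a standard velocity-stress first-order reformulation of linear elastodynamics, which is presumably the kind of intermediate system meant above (the exact system used in the paper may differ), reads
$$\rho\,\partial_t v = \nabla\cdot\sigma + f, \qquad \partial_t \sigma = \mathsf{C} : \nabla^{s} v,$$
where $v$ is the velocity, $\sigma$ the stress tensor, $f$ a body force, $\mathsf{C}$ the elasticity tensor, and $\nabla^{s}$ the symmetric gradient; each component of $(v,\sigma)$ can then be carried by one entry of a vector-valued population.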
In this work we revisit the arithmetic and bit complexity of Hermitian eigenproblems. We first provide an analysis for the divide-and-conquer tridiagonal eigensolver of Gu and Eisenstat [GE95] in the Real RAM model, when accelerated with the Fast Multipole Method. The analysis asserts the claimed nearly-$O(n^2)$ complexity to compute a full diagonalization of a symmetric tridiagonal matrix. Combined with the tridiagonal reduction algorithm of Sch\"onhage [Sch72], it implies that a Hermitian matrix can be diagonalized deterministically in $O(n^{\omega}\log(n)+n^2\mathrm{polylog}(n/\epsilon))$ arithmetic operations, where $\omega\lesssim 2.371$ is the square matrix multiplication exponent. This improves on the classic deterministic $O(n^3)$ diagonalization algorithms, and derandomizes the $O(n^{\omega}\log^2(n/\epsilon))$ algorithm of [BGVKS, FOCS '20]. Ultimately, this has a direct application to the SVD, which is widely used as a subroutine in advanced algorithms but whose complexity and approximation guarantees are often left unspecified. In finite precision, we show that Sch\"onhage's algorithm is stable in floating point using $O(\log(n/\epsilon))$ bits. Combined with the (rational arithmetic) algorithm of Bini and Pan [BP91], it provides a deterministic algorithm to compute all the eigenvalues of a Hermitian matrix in $O\left(n^{\omega}F\left(\log(n/\epsilon)\right)+n^2\mathrm{polylog}(n/\epsilon)\right)$ bit operations, where $F(b)\in\widetilde{O}(b)$ is the bit complexity of a single floating point operation on $b$ bits. This improves on the best known $\widetilde{O}(n^3)$ deterministic and $O\left(n^{\omega}\log^2(n/\epsilon)F\left(\log^4(n/\epsilon)\log(n)\right)\right)$ randomized complexities. We conclude with some other useful subroutines, such as computing spectral gaps, condition numbers, and spectral projectors, and a few open problems.
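The two-stage pipeline (dense reduction to tridiagonal form, then a tridiagonal eigensolver) can be illustrated with off-the-shelf routines; the sketch below uses SciPy's LAPACK wrappers as stand-ins for Sch\"onhage's reduction and the Gu-Eisenstat solver, which they do not implement.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
A = (A + A.T) / 2                       # real symmetric test matrix

# Stage 1: orthogonal reduction A = Q T Q^T; for symmetric input the
# Hessenberg form is tridiagonal.
T, Q = hessenberg(A, calc_q=True)
d, e = np.diag(T).copy(), np.diag(T, 1).copy()

# Stage 2: diagonalize the tridiagonal matrix with LAPACK's
# divide-and-conquer driver.
w, V = eigh_tridiagonal(d, e, lapack_driver='stevd')

# Assemble eigenvectors of A and check the residual.
U = Q @ V
print(np.linalg.norm(A @ U - U * w))    # ~ machine-precision scale
```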
The present paper evaluates the learning behaviour of a transformer-based neural network with regard to an irregular inflectional paradigm. We apply the paradigm cell filling problem to irregular patterns. We approach this problem using the morphological reinflection task and model it as a character-level sequence-to-sequence learning problem. The test case under investigation is irregular verbs in Spanish. Besides its many regular verbs, Spanish has so-called L-shaped verbs, in which the first person singular indicative stem irregularly matches the subjunctive paradigm, while the other indicative forms remain unaltered. We examine the role of frequency during learning and compare models under differing input frequency conditions. We train the model on a corpus of Spanish with a realistic distribution of regular and irregular verbs and compare it with models trained on input with augmented distributions of (ir)regular words. We explore how the neural models learn this L-shaped pattern using post-hoc analyses. Our experiments show that, across frequency conditions, the models are surprisingly capable of learning the irregular pattern. Furthermore, our post-hoc analyses reveal possible sources of errors. All code and data are available at \url{//anonymous.4open.science/r/modeling_spanish_acl-7567/} under the MIT license.
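To make the setup concrete, the sketch below shows how paradigm cells of an L-shaped verb might be cast as character-level sequence-to-sequence instances; the UniMorph-style tag scheme and the spacing convention are illustrative assumptions, not the paper's exact data format.

```python
# Paradigm-cell-filling instances for the L-shaped Spanish verb "salir".
# Source (lemma + cell tags) and target are both treated as character sequences.
examples = [
    ("s a l i r V;IND;PRS;1;SG",  "s a l g o"),   # irregular L-shaped stem "salg-"
    ("s a l i r V;IND;PRS;2;SG",  "s a l e s"),   # other indicative cells stay regular
    ("s a l i r V;SBJV;PRS;1;SG", "s a l g a"),   # subjunctive shares the irregular stem
]
for src, tgt in examples:
    print(src, "->", tgt)
```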
Gesture recognition based on surface electromyographic (sEMG) signals is one of the most widely used approaches. Traditional manual feature extraction can only capture low-level signal features, which leads to poor classifier performance and low recognition accuracy on complex signals. We propose a recognition method, SEDCNN-SVM, to recognize the sEMG of different gestures. SEDCNN-SVM consists of an improved deep convolutional neural network (DCNN) and a support vector machine (SVM). The DCNN automatically extracts and learns feature information from sEMG through the convolution operations of its convolutional layers, capturing the complex, high-level features in the data. Squeeze-and-Excitation networks (SE-Net) and residual modules were added to the model to improve the feature representation of each channel, reduce the loss of feature information in convolutional operations, capture useful feature information, and ease the problem of vanishing gradients. The SVM improves the generalization ability and classification accuracy of the model by constructing an optimal hyperplane in feature space; it therefore replaces the fully connected layer and the softmax layer of the DCNN, with a suitable kernel function chosen accordingly. To verify the effectiveness of the proposed classification algorithm, we analyze it and compare it with other classification methods. The recognition accuracy of SEDCNN-SVM reaches 0.955, a significant improvement over the compared methods, and the model can perform online recognition in real time.
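A minimal sketch of the described architecture, assuming 8 sEMG channels, 200-sample windows, and small channel counts (all illustrative; the paper's exact layer configuration is not specified here): a residual SE-augmented 1D CNN extracts features, and an RBF-kernel SVM replaces the softmax head.

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels by globally pooled statistics."""
    def __init__(self, ch, r=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // r), nn.ReLU(),
            nn.Linear(ch // r, ch), nn.Sigmoid(),
        )
    def forward(self, x):                       # x: (N, C, L)
        s = x.mean(dim=2)                       # squeeze over time
        return x * self.fc(s).unsqueeze(-1)     # excite: per-channel gates

class ResSEBlock(nn.Module):
    """Conv -> SE, with a residual connection to ease gradient flow."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch), nn.ReLU(),
        )
        self.se = SEBlock(ch)
    def forward(self, x):
        return x + self.se(self.conv(x))

class SEDCNNFeatures(nn.Module):
    """DCNN feature extractor; the fully connected/softmax head is dropped."""
    def __init__(self, in_ch=8, ch=32):         # 8 sEMG channels assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, ch, 5, padding=2), nn.ReLU(),
            ResSEBlock(ch), ResSEBlock(ch),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
    def forward(self, x):
        return self.net(x)

# After training the DCNN end to end, freeze it and fit an RBF-kernel SVM
# on the extracted features in place of the softmax layer.
model = SEDCNNFeatures().eval()
X = torch.randn(64, 8, 200)                     # 64 windows, 8 channels, 200 samples
y = torch.randint(0, 6, (64,))                  # 6 gesture classes (toy labels)
with torch.no_grad():
    feats = model(X).numpy()
svm = SVC(kernel="rbf").fit(feats, y.numpy())
print(svm.score(feats, y.numpy()))
```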
Graph representation learning (GRL) is critical for extracting insights from complex network structures, but it also raises security concerns due to potential privacy vulnerabilities in these representations. This paper investigates the structural vulnerabilities of graph neural models whereby sensitive topological information can be inferred through edge reconstruction attacks. Our research primarily addresses the theoretical underpinnings of similarity-based edge reconstruction attacks (SERA), furnishing a non-asymptotic analysis of their reconstruction capacities. Moreover, we present empirical corroboration indicating that such attacks can perfectly reconstruct sparse graphs as graph size increases. Conversely, we establish that sparsity is a critical factor for SERA's effectiveness, as demonstrated through analysis and experiments on (dense) stochastic block models. Finally, we explore the resilience of private graph representations produced via the noisy aggregation (NAG) mechanism against SERA. Through theoretical analysis and empirical assessments, we affirm that NAG mitigates SERA. In parallel, we empirically delineate instances in which SERA proves both effective and deficient as an instrument for elucidating the trade-off between privacy and utility.
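The core of a similarity-based edge reconstruction attack is simple enough to state in a few lines; the sketch below uses cosine similarity and a fixed edge budget, both common but assumed choices rather than the paper's exact attack.

```python
import numpy as np

def sera(Z, k):
    """Predict the k node pairs with the most similar embeddings as edges.
    Z: (n, d) array of node embeddings released by a GRL model."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = Zn @ Zn.T                        # pairwise cosine similarities
    iu = np.triu_indices(len(Z), k=1)    # each unordered pair once
    top = np.argsort(S[iu])[-k:]         # k highest-scoring pairs
    return list(zip(iu[0][top], iu[1][top]))

# Toy usage; in an attack, k could be an assumed-known edge count.
Z = np.random.default_rng(1).standard_normal((50, 16))
print(sera(Z, k=10)[:3])
```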
Ridge functions are used to describe and study lower bounds on the approximation capabilities of neural networks that can be written as a linear combination of activation functions. If the activation functions are themselves ridge functions, these networks are called explainable neural networks. In this brief paper, we first show that quantum neural networks based on variational quantum circuits can be written as a linear combination of ridge functions, using matrix notation. Consequently, the interpretability and explainability of such quantum neural networks can be directly considered and studied as an approximation by a linear combination of ridge functions.
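For context, a ridge function is a map $x \mapsto g(a \cdot x)$ that varies only along a single direction $a$, so a one-hidden-layer network
$$f(x) \;=\; \sum_{i=1}^{m} c_i\, \sigma\!\left(w_i \cdot x + b_i\right)$$
is exactly a linear combination of the ridge functions $\sigma(w_i \cdot x + b_i)$; this is the classical template that the abstract states also holds for variational-quantum-circuit QNNs.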
This paper studies the approximation of piecewise constant functions with unknown interfaces on bounded regions of $\mathbb{R}^d$ by ReLU neural networks (NNs). Under the assumption that the discontinuity interface $\Gamma$ can be approximated by a connected series of hyperplanes with a prescribed accuracy $\varepsilon > 0$, we show that a three-layer ReLU NN suffices to accurately approximate any piecewise constant function, and we establish its error bound. Moreover, if the discontinuity interface is convex, an analytical formula for the ReLU NN approximation with exact weights and biases is provided.
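One standard ingredient behind such constructions (the paper's exact construction may differ) is that the unit jump across a hyperplane can be realized by a difference of two ReLUs:
$$H_\varepsilon(t) \;=\; \tfrac{1}{\varepsilon}\left(\mathrm{ReLU}(t) - \mathrm{ReLU}(t-\varepsilon)\right),$$
which is $0$ for $t \le 0$, $1$ for $t \ge \varepsilon$, and linear in between; applying it to the signed distances $t_i = b_i - w_i \cdot x$ of the approximating hyperplanes and combining the outputs in a second hidden layer gives a three-layer ReLU approximation of a piecewise constant function.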
Change point detection (CPD) methods aim to identify abrupt shifts in the distribution of input data streams. Accurate estimators for this task are crucial across various real-world scenarios. Yet, traditional unsupervised CPD techniques face significant limitations, often relying on strong assumptions or suffering from low expressive power due to inherent model simplicity. In contrast, representation learning methods overcome these drawbacks by offering flexibility and the ability to capture the full complexity of the data without imposing restrictive assumptions. However, these approaches are still emerging in the CPD field and lack robust theoretical foundations to ensure their reliability. Our work addresses this gap by integrating the expressive power of representation learning with the groundedness of traditional CPD techniques. We adopt spectral normalization (SN) for deep representation learning in CPD tasks and prove that the embeddings after SN are highly informative for CPD. Our method significantly outperforms the current state of the art in a comprehensive evaluation on three standard CPD datasets.
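A minimal sketch of the ingredient named above: PyTorch's built-in `spectral_norm` hook divides each weight matrix by a power-iteration estimate of its largest singular value. The encoder shape and the simple adjacent-window detection statistic are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Encoder whose linear maps are spectrally normalized, keeping each layer
# (approximately) 1-Lipschitz so that embedding distances remain informative.
encoder = nn.Sequential(
    spectral_norm(nn.Linear(32, 64)), nn.ReLU(),
    spectral_norm(nn.Linear(64, 16)),
)

# Toy detection statistic: embedding distance between adjacent windows;
# a change point is flagged where the statistic peaks.
x = torch.randn(100, 32)          # 100 time windows, 32 features each
z = encoder(x)
stat = (z[1:] - z[:-1]).norm(dim=1)
print(stat.argmax().item())       # candidate change location
```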
We formalize a complete proof of the regular case of Fermat's Last Theorem in the Lean 4 theorem prover. Our formalization includes a proof of Kummer's lemma, which is the main obstruction to Fermat's Last Theorem for regular primes. Rather than following the modern proof of Kummer's lemma via class field theory, we prove it using Hilbert's Theorems 90-94, in a way that is more amenable to formalization.
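A hypothetical Lean 4 statement sketch of the headline result; the identifier names, the regularity predicate, and the exact hypotheses are illustrative and do not reflect the project's actual declarations.

```lean
-- Illustrative sketch only; names are not the formalization's actual API.
-- A prime p is regular when p does not divide the class number of ℚ(ζ_p).
theorem flt_regular {p : ℕ} (hp : p.Prime) (hreg : IsRegularPrime p)
    (hodd : p ≠ 2) {a b c : ℤ} (habc : a * b * c ≠ 0) :
    a ^ p + b ^ p ≠ c ^ p := by
  sorry
```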
CholeskyQR-type algorithms have become very popular in both academia and industry in recent years, as they strike a balance between computational cost, accuracy, and speed. CholeskyQR2 provides numerical stability of orthogonality, Shifted CholeskyQR3 deals with ill-conditioned matrices, and the 3C algorithm is applicable to sparse matrices. However, the error matrices are overestimated in previous works, which weakens the sufficient conditions for these algorithms; in particular, it leads to a conservative shift in Shifted CholeskyQR3 and 3C, which may greatly influence the properties of the algorithms. In this work, we consider randomized methods and utilize the probabilistic error analysis model of \cite{New} to carry out rounding error analysis for CholeskyQR-type algorithms, combining the theoretical analysis with the $g$-norm defined in \cite{Columns}. Our analysis provides a smaller shift for Shifted CholeskyQR3 and improves the orthogonality of our 3C algorithm for dense matrices. Numerical experiments in the final section show that our randomized improvements have advantages over the original algorithms.
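For reference, the baseline CholeskyQR2 iteration, on which the shifted and 3C variants build, can be sketched in a few lines; the shifted variant would replace $A^{T}A$ with $A^{T}A + sI$ to keep the Cholesky input positive definite for ill-conditioned $A$.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def cholesky_qr(A):
    """One CholeskyQR pass: A = Q R with R the Cholesky factor of A^T A."""
    R = cholesky(A.T @ A, lower=False)          # A^T A = R^T R
    Q = solve_triangular(R, A.T, trans='T').T   # Q = A R^{-1}
    return Q, R

def cholesky_qr2(A):
    """CholeskyQR2: a second pass repairs the orthogonality of the first."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1

A = np.random.default_rng(0).standard_normal((500, 20))
Q, R = cholesky_qr2(A)
print(np.linalg.norm(Q.T @ Q - np.eye(20)))     # orthogonality near machine eps
print(np.linalg.norm(Q @ R - A))                # factorization residual
```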