Knowledge tracing consists in predicting the performance of some students on new questions given their performance on previous questions, and can be a prior step to optimizing assessment and learning. Deep knowledge tracing (DKT) is a competitive model for knowledge tracing relying on recurrent neural networks, even if some simpler models may match its performance. However, little is known about why DKT works so well. In this paper, we frame deep knowledge tracing as a encoderdecoder architecture. This viewpoint not only allows us to propose better models in terms of performance, simplicity or expressivity but also opens up promising avenues for future research directions. In particular, we show on several small and large datasets that a simpler decoder, with possibly fewer parameters than the one used by DKT, can predict student performance better.
This article studies structure-preserving discretizations of Hilbert complexes with nonconforming spaces that rely on projections onto an underlying conforming subcomplex. This approach follows the conforming/nonconforming Galerkin (CONGA) method introduced in [doi.org/10.1090/mcom/3079, doi.org/10.5802/smai-jcm.20, doi.org/10.5802/smai-jcm.21] to derive efficient structure-preserving finite element schemes for the time-dependent Maxwell and Maxwell-Vlasov systems by relaxing the curl-conforming constraint in finite element exterior calculus (FEEC) spaces. Here, it is extended to the discretization of full Hilbert complexes with possibly nontrivial harmonic fields, and the properties of the CONGA Hodge Laplacian operator are investigated. By using block-diagonal mass matrices which may be locally inverted, this framework possesses a canonical sequence of dual commuting projection operators which are local, and it naturally yields local discrete coderivative operators, in contrast to conforming FEEC discretizations. The resulting CONGA Hodge Laplacian operator is also local, and its kernel consists of the same discrete harmonic fields as the underlying conforming operator, provided that a symmetric stabilization term is added to handle the space nonconformities. Under the assumption that the underlying conforming subcomplex admits a bounded cochain projection, and that the conforming projections are stable with moment-preserving properties, a priori convergence results are established for both the CONGA Hodge Laplace source and eigenvalue problems. Our theory is finally illustrated with a spectral element method, and numerical experiments are performed which corroborate our results. Applications to spline finite elements on multi-patch mapped domains are described in a related article [arXiv:2208.05238] for which the present work provides a theoretical background.
This paper discusses the foundation of methods for accurately grasping the interaction effects. Among the existing methods that capture the interaction effects as terms, PD and ALE are known as global modelagnostic methods in the IML field. ALE, among the two, can theoretically provide a functional decomposition of the prediction function, and this study focuses on functional decomposition. Specifically, we mathematically formalize what we consider to be the requirements that must always be met by a decomposition (interaction decomposition, hereafter, ID) that decomposes the prediction function into main and interaction effect terms. We also present a theorem about how to produce a decomposition that meets these requirements. Furthermore, we confirm that while ALE is ID, PD is not, and we present examples of decomposition that meet the requirements of ID using methods other than existing ones (i.e., new methods).
We adopt the integral definition of the fractional Laplace operator and study an optimal control problem on Lipschitz domains that involves a fractional elliptic partial differential equation (PDE) as state equation and a control variable that enters the state equation as a coefficient; pointwise constraints on the control variable are considered as well. We establish the existence of optimal solutions and analyze first and, necessary and sufficient, second order optimality conditions. Regularity estimates for optimal variables are also analyzed. We develop two finite element discretization strategies: a semidiscrete scheme in which the control variable is not discretized, and a fully discrete scheme in which the control variable is discretized with piecewise constant functions. For both schemes, we analyze the convergence properties of discretizations and derive error estimates.
This paper takes a different look on the problem of testing the mutual independence of the components of a high-dimensional vector. Instead of testing if all pairwise associations (e.g. all pairwise Kendall's $\tau$) between the components vanish, we are interested in the (null)-hypothesis that all pairwise associations do not exceed a certain threshold in absolute value. The consideration of these hypotheses is motivated by the observation that in the high-dimensional regime, it is rare, and perhaps impossible, to have a null hypothesis that can be exactly modeled by assuming that all pairwise associations are precisely equal to zero. The formulation of the null hypothesis as a composite hypothesis makes the problem of constructing tests non-standard and in this paper we provide a solution for a broad class of dependence measures, which can be estimated by $U$-statistics. In particular we develop an asymptotic and a bootstrap level $\alpha$-test for the new hypotheses in the high-dimensional regime. We also prove that the new tests are minimax-optimal and investigate their finite sample properties by means of a small simulation study and a data example.
Analytical workflows in functional magnetic resonance imaging are highly flexible with limited best practices as to how to choose a pipeline. While it has been shown that the use of different pipelines might lead to different results, there is still a lack of understanding of the factors that drive these differences and of the stability of these differences across contexts. We use community detection algorithms to explore the pipeline space and assess the stability of pipeline relationships across different contexts. We show that there are subsets of pipelines that give similar results, especially those sharing specific parameters (e.g. number of motion regressors, software packages, etc.). Those pipeline-to-pipeline patterns are stable across groups of participants but not across different tasks. By visualizing the differences between communities, we show that the pipeline space is mainly driven by the size of the activation area in the brain and the scale of statistic values in statistic maps.
In 2012 Chen and Singer introduced the notion of discrete residues for rational functions as a complete obstruction to rational summability. More explicitly, for a given rational function f(x), there exists a rational function g(x) such that f(x) = g(x+1) - g(x) if and only if every discrete residue of f(x) is zero. Discrete residues have many important further applications beyond summability: to creative telescoping problems, thence to the determination of (differential-)algebraic relations among hypergeometric sequences, and subsequently to the computation of (differential) Galois groups of difference equations. However, the discrete residues of a rational function are defined in terms of its complete partial fraction decomposition, which makes their direct computation impractical due to the high complexity of completely factoring arbitrary denominator polynomials into linear factors. We develop a factorization-free algorithm to compute discrete residues of rational functions, relying only on gcd computations and linear algebra.
Mendelian randomization (MR) is an instrumental variable (IV) approach to infer causal relationships between exposures and outcomes with genome-wide association studies (GWAS) summary data. However, the multivariable inverse-variance weighting (IVW) approach, which serves as the foundation for most MR approaches, cannot yield unbiased causal effect estimates in the presence of many weak IVs. To address this problem, we proposed the MR using Bias-corrected Estimating Equation (MRBEE) that can infer unbiased causal relationships with many weak IVs and account for horizontal pleiotropy simultaneously. While the practical significance of MRBEE was demonstrated in our parallel work (Lorincz-Comi (2023)), this paper established the statistical theories of multivariable IVW and MRBEE with many weak IVs. First, we showed that the bias of the multivariable IVW estimate is caused by the error-in-variable bias, whose scale and direction are inflated and influenced by weak instrument bias and sample overlaps of exposures and outcome GWAS cohorts, respectively. Second, we investigated the asymptotic properties of multivariable IVW and MRBEE, showing that MRBEE outperforms multivariable IVW regarding unbiasedness of causal effect estimation and asymptotic validity of causal inference. Finally, we applied MRBEE to examine myopia and revealed that education and outdoor activity are causal to myopia whereas indoor activity is not.
Increasing the degree of digitisation and automation in the concrete production process can play a crucial role in reducing the CO$_2$ emissions that are associated with the production of concrete. In this paper, a method is presented that makes it possible to predict the properties of fresh concrete during the mixing process based on stereoscopic image sequences of the concretes flow behaviour. A Convolutional Neural Network (CNN) is used for the prediction, which receives the images supported by information on the mix design as input. In addition, the network receives temporal information in the form of the time difference between the time at which the images are taken and the time at which the reference values of the concretes are carried out. With this temporal information, the network implicitly learns the time-dependent behaviour of the concretes properties. The network predicts the slump flow diameter, the yield stress and the plastic viscosity. The time-dependent prediction potentially opens up the pathway to determine the temporal development of the fresh concrete properties already during mixing. This provides a huge advantage for the concrete industry. As a result, countermeasures can be taken in a timely manner. It is shown that an approach based on depth and optical flow images, supported by information of the mix design, achieves the best results.
Multi-product formulas (MPF) are linear combinations of Trotter circuits offering high-quality simulation of Hamiltonian time evolution with fewer Trotter steps. Here we report two contributions aimed at making multi-product formulas more viable for near-term quantum simulations. First, we extend the theory of Trotter error with commutator scaling developed by Childs, Su, Tran et al. to multi-product formulas. Our result implies that multi-product formulas can achieve a quadratic reduction of Trotter error in 1-norm (nuclear norm) on arbitrary time intervals compared with the regular product formulas without increasing the required circuit depth or qubit connectivity. The number of circuit repetitions grows only by a constant factor. Second, we introduce dynamic multi-product formulas with time-dependent coefficients chosen to minimize a certain efficiently computable proxy for the Trotter error. We use a minimax estimation method to make dynamic multi-product formulas robust to uncertainty from algorithmic errors, sampling and hardware noise. We call this method Minimax MPF and we provide a rigorous bound on its error.
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.