In this paper we investigate the relationships between a multipreferential semantics for defeasible reasoning in knowledge representation and a multilayer neural network model. Weighted knowledge bases for a simple description logic with typicality are considered under a (many-valued) ``concept-wise'' multipreference semantics. The semantics is used to provide a preferential interpretation of MultiLayer Perceptrons (MLPs). A model-checking approach and an entailment-based approach are exploited in the verification of conditional properties of MLPs.
We construct a bipartite generalization of Alon and Szegedy's nearly orthogonal vectors, thereby obtaining strong bounds for several extremal problems involving the Lov\'asz theta function, vector chromatic number, minimum semidefinite rank, nonnegative rank, and extension complexity of polytopes. In particular, we derive a couple of general lower bounds for the vector chromatic number which may be of independent interest.
This paper investigates the multiple testing problem for high-dimensional sparse binary sequences, motivated by the crowdsourcing problem in machine learning. We study the empirical Bayes approach for multiple testing on the high-dimensional Bernoulli model with a conjugate spike and uniform slab prior. We first show that the hard thresholding rule deduced from the posterior distribution is suboptimal. Consequently, the $\ell$-value procedure constructed using this posterior tends to be overly conservative in estimating the false discovery rate (FDR). We then propose two new procedures based on $\adj\ell$-values and $q$-values to correct this issue. Sharp frequentist theoretical results are obtained, demonstrating that both procedures can effectively control the FDR under sparsity. Numerical experiments are conducted to validate our theory in finite samples. To the best of our knowledge, this work provides the first uniform FDR control result in multiple testing for high-dimensional sparse binary data.
In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures, we define summaries of variable importance trajectories. These measures can be estimated and the same approaches for inference can be applied regardless of the choice of the algorithm(s) used to estimate the prediction function. We propose a nonparametric efficient estimation and inference procedure as well as a null hypothesis testing procedure that are valid even when complex machine learning tools are used for prediction. Through simulations, we demonstrate that our proposed procedures have good operating characteristics, and we illustrate their use by investigating the longitudinal importance of risk factors for suicide attempt.
In this paper we propose a definition of the distributional Riemann curvature tensor in dimension $N\geq 2$ if the underlying metric tensor $g$ defined on a triangulation $\mathcal{T}$ possesses only single-valued tangential-tangential components on codimension 1 simplices. We analyze the convergence of the curvature approximation in the $H^{-2}$-norm if a sequence of interpolants $g_h$ of polynomial order $k\geq 0$ of a smooth metric $g$ is given. We show that for dimension $N=2$ convergence rates of order $\mathcal{O}(h^{k+1})$ are obtained. For $N\geq 3$ convergence holds only in the case $k\geq 1$. Numerical examples demonstrate that our theoretical results are sharp. By choosing appropriate test functions we show that the distributional Gauss curvature in 2D and the distributional scalar curvature in any dimension are obtained. Further, a first definition of the distributional Ricci curvature tensor in arbitrary dimension is derived, for which our analysis is applicable.
The Causal Roadmap outlines a systematic approach to our research endeavors: define the quantity of interest, evaluate the needed assumptions, conduct statistical estimation, and carefully interpret the results. At the estimation step, it is essential that the estimation algorithm be chosen thoughtfully for its theoretical properties and expected performance. Simulations can help researchers gain a better understanding of an estimator's statistical performance under conditions unique to the real-data application. This in turn can inform the rigorous pre-specification of a Statistical Analysis Plan (SAP), not only stating the estimand (e.g., G-computation formula), the estimator (e.g., targeted minimum loss-based estimation [TMLE]), and adjustment variables, but also the implementation of the estimator -- including nuisance parameter estimation and the approach for variance estimation. Doing so helps ensure valid inference (e.g., 95% confidence intervals with appropriate coverage). Failing to pre-specify estimation can lead to data dredging and inflated Type-I error rates.
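As a hedged illustration of how a simulation can check confidence-interval coverage before an analysis plan is fixed, the following minimal sketch estimates the coverage of a Wald-type 95% interval. It is a toy setting only: the estimator is a plain sample mean rather than TMLE, and the function name `estimate_coverage`, the Gaussian data-generating process, and all default parameter values are assumptions made for illustration.

```python
import random
import statistics


def estimate_coverage(n=50, reps=1000, true_mean=1.0, sd=2.0, seed=0):
    """Estimate how often a Wald 95% interval for the mean covers the truth.

    Toy assumption: data are i.i.d. Gaussian and the estimator is the
    sample mean. A real application would plug in the pre-specified
    estimator and a data-generating process resembling the study.
    """
    rng = random.Random(seed)
    z = 1.96  # normal quantile for a two-sided 95% interval
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        estimate = statistics.fmean(sample)
        std_error = statistics.stdev(sample) / n ** 0.5
        lower = estimate - z * std_error
        upper = estimate + z * std_error
        if lower <= true_mean <= upper:
            covered += 1
    return covered / reps


if __name__ == "__main__":
    print(f"Estimated coverage: {estimate_coverage():.3f}")
```

Coverage well below the nominal level in such a simulation would suggest revisiting the variance estimator or the sample-size assumptions before the plan is finalized.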
In this paper we examine the effectiveness of several multi-armed bandit algorithms when used as a trust system to select agents to delegate tasks to. In contrast to existing work, we allow for recursive delegation to occur. That is, a task delegated to one agent can be delegated onwards by that agent, with further delegation possible until some agent finally executes the task. We show that modifications to the standard multi-armed bandit algorithms can provide improvements in performance in such recursive delegation settings.
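For orientation, a bandit-based trust system in the standard, non-recursive setting can be sketched as follows. This is an assumption-laden illustration using the textbook UCB1 rule, not the modified algorithms studied in the paper; the function name `ucb1_delegate`, the Bernoulli success model, and the example probabilities are hypothetical.

```python
import math
import random


def ucb1_delegate(success_probs, rounds, seed=0):
    """Delegate `rounds` tasks with UCB1 and return per-agent delegation counts.

    Assumption: each agent succeeds independently with a fixed, hidden
    probability, and every delegated task is executed directly (no onward
    delegation). Requires rounds >= number of agents.
    """
    rng = random.Random(seed)
    n_agents = len(success_probs)
    counts = [0] * n_agents    # tasks delegated to each agent
    totals = [0.0] * n_agents  # successes observed per agent

    def ucb_score(agent, t):
        mean = totals[agent] / counts[agent]
        bonus = math.sqrt(2.0 * math.log(t) / counts[agent])
        return mean + bonus

    for t in range(1, rounds + 1):
        if t <= n_agents:
            agent = t - 1  # try every agent once before using the index
        else:
            agent = max(range(n_agents), key=lambda a: ucb_score(a, t))
        reward = 1.0 if rng.random() < success_probs[agent] else 0.0
        counts[agent] += 1
        totals[agent] += reward
    return counts


if __name__ == "__main__":
    print(ucb1_delegate([0.2, 0.5, 0.9], rounds=2000))
```

In a recursive setting, the reward observed by the delegating agent would additionally depend on the choices made further down the delegation chain, which this sketch does not model.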
Palimpsests refer to historical manuscripts where erased writings have been partially covered by the superimposition of a second writing. By employing imaging techniques, e.g., multispectral imaging, it becomes possible to identify features that are imperceptible to the naked eye, including faded and erased inks. When dealing with overlapping inks, Artificial Intelligence techniques can be utilized to disentangle complex nodes of overlapping letters. In this work, we propose deep learning-based semantic segmentation as a method for identifying and segmenting individual letters in overlapping characters. The experiment was conceived as a proof of concept, focusing on the palimpsests of the Ars Grammatica by Prisciano as a case study. Furthermore, caveats and prospects of our approach combined with multispectral imaging are also discussed.
This paper presents the workspace optimization of one-translational two-rotational (1T2R) parallel manipulators using a dimensionally homogeneous constraint-embedded Jacobian. The mixed degrees of freedom of 1T2R parallel manipulators, which cause dimensional inconsistency, make it difficult to optimize their architectural parameters. To solve this problem, a point-based approach with a shifting property, selection matrix, and constraint-embedded inverse Jacobian is proposed. A simplified formulation is provided, eliminating the complex partial differentiation required in previous approaches. The dimensional homogeneity of the proposed method was analytically proven, and its validity was confirmed by comparing it with the conventional point-based method using a 3-PRS manipulator. Furthermore, the approach was applied to an asymmetric 2-RRS/RRRU manipulator with no parasitic motion. This mechanism has a T-shape combination of limbs with different kinematic parameters, making it challenging to derive a dimensionally homogeneous Jacobian using the conventional method. Finally, optimization was performed, and the results show that the proposed method is more efficient than the conventional approach. The efficiency and simplicity of the proposed method were verified using two distinct parallel manipulators.
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms and (b) they are significantly easier to estimate. We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods. Second, we review generic optimization methods used in training neural networks, such as SGD, adaptive gradient methods and distributed methods, and theoretical results for these algorithms. Third, we review existing research on the global issues of neural network training, including results on bad local minima, mode connectivity, the lottery ticket hypothesis and infinite-width analysis.
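The effect of initialization scale on gradient explosion and vanishing can be illustrated with a small sketch. It back-propagates a vector through a deep linear network whose weights are drawn i.i.d. from a Gaussian with standard deviation `gain / sqrt(width)`. The linear setting, the function name `backprop_norm_ratio`, and the chosen width, depth, and gains are assumptions for illustration and are not taken from the article.

```python
import math
import random


def backprop_norm_ratio(gain, width=32, depth=20, seed=0):
    """Return ||grad at input|| / ||grad at output|| for a deep linear network.

    Assumption: each layer is y = W x with W[i][j] ~ N(0, gain^2 / width),
    so the gradient with respect to a layer's input is W^T times the
    gradient with respect to its output.
    """
    rng = random.Random(seed)
    grad = [1.0] * width
    start_norm = math.sqrt(sum(g * g for g in grad))
    std = gain / math.sqrt(width)
    for _ in range(depth):
        weights = [[rng.gauss(0.0, std) for _ in range(width)]
                   for _ in range(width)]
        # Multiply by the transpose: new_grad[j] = sum_i W[i][j] * grad[i]
        grad = [sum(weights[i][j] * grad[i] for i in range(width))
                for j in range(width)]
    end_norm = math.sqrt(sum(g * g for g in grad))
    return end_norm / start_norm


if __name__ == "__main__":
    for gain in (0.5, 1.0, 2.0):
        print(f"gain={gain}: norm ratio {backprop_norm_ratio(gain):.3e}")
```

With a gain below one the back-propagated norm shrinks roughly geometrically with depth, with a gain above one it grows geometrically, and a gain near one keeps it on a comparable scale, which is the motivation for variance-preserving initialization schemes.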