We consider the dichotomy conjecture for consistent query answering under primary key constraints, which states that for every fixed Boolean conjunctive query q, testing whether it is certain over all repairs of a given inconsistent database is either polynomial time or coNP-complete. This conjecture has been verified for self-join-free and path queries. We propose a simple inflationary fixpoint algorithm for consistent query answering which, for a given database, naively computes a set $\Delta$ of subsets of database repairs with at most $k$ facts, where $k$ is the size of the query $q$. The algorithm runs in polynomial time and can be formally defined as: 1. Initialize $\Delta$ with all sets $S$ of at most $k$ facts such that $S$ satisfies $q$. 2. Add any set $S$ of at most $k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts sharing the same key) such that for every fact $a$ of $B$ there is a set $S' \in \Delta$ contained in $S \cup \{a\}$. The algorithm answers "$q$ is certain" iff $\Delta$ eventually contains the empty set. The algorithm correctly computes certain answers when the query $q$ falls in the polynomial-time cases for self-join-free queries and path queries. For arbitrary queries, the algorithm is an under-approximation: the query is guaranteed to be certain if the algorithm claims so. However, there are polynomial-time certain queries (with self-joins) which are not identified as such by the algorithm.
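The two steps above can be sketched directly in Python. This is a hedged illustration only: the fact encoding, the `key` function, and the `satisfies` oracle for the query are our own assumptions, not the paper's formalism.

```python
from itertools import combinations

def certain(facts, key, satisfies, k):
    """Inflationary fixpoint sketch for consistent query answering.

    facts     -- iterable of hashable facts
    key       -- function mapping a fact to its primary key
    satisfies -- oracle: does a given set of facts satisfy the query q?
    k         -- size of the query q
    """
    facts = list(facts)
    # Blocks: maximal sets of facts sharing the same key.
    by_key = {}
    for f in facts:
        by_key.setdefault(key(f), set()).add(f)
    blocks = list(by_key.values())

    # All candidate sets of at most k facts (polynomially many for fixed k).
    candidates = [frozenset(c)
                  for r in range(k + 1)
                  for c in combinations(facts, r)]

    # Step 1: initialize Delta with the sets that satisfy q.
    delta = {S for S in candidates if satisfies(S)}

    # Step 2: inflationary closure. Add S if some block B has, for every
    # fact a in B, a set S' already in Delta with S' contained in S u {a}.
    changed = True
    while changed:
        changed = False
        for S in candidates:
            if S in delta:
                continue
            for B in blocks:
                if all(any(Sp <= S | {a} for Sp in delta) for a in B):
                    delta.add(S)
                    changed = True
                    break

    # q is certain iff the empty set is eventually derived.
    return frozenset() in delta
```

On a database whose facts all share one key (a single block), the query "some fact is present" is certain, since every repair keeps exactly one fact of the block.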
We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (2023). In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution -- e.g., the Gaussian -- must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is Gaussian (or more generally any strongly log-concave distribution) in $d$ dimensions and the noise model is either Massart or adversarial (agnostic). For Massart noise, our tester-learner runs in polynomial time and outputs a hypothesis with (information-theoretically optimal) error $\mathsf{opt} + \epsilon$ for any strongly log-concave target distribution. For adversarial noise, our tester-learner obtains error $O(\mathsf{opt}) + \epsilon$ in polynomial time when the target distribution is Gaussian; for strongly log-concave distributions, we obtain $\tilde{O}(\mathsf{opt}) + \epsilon$ in quasipolynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels and combine them with the moment-matching approach of Gollakota et al. (2023). This enables us to simulate a variant of the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using nonconvex SGD but in the testable learning setting.
We consider the problem of correct motion planning for T-intersection merge-ins of arbitrary geometry and vehicle density. A merge-in support system has to estimate the chances that a gap between two consecutive vehicles can be taken successfully. In contrast to previous models based on heuristic gap size rules, we present an approach which optimizes the integral risk of the situation using parametrized velocity ramps. It accounts for the risks from curves and all involved vehicles (front and rear on all paths) with a so-called survival analysis. For comparison, we also introduce a specially designed extension of the Intelligent Driver Model (IDM) for entering intersections. We show in a quantitative statistical evaluation that the survival method provides advantages in terms of lower absolute risk (i.e., no crash happens) and better risk-utility tradeoff (i.e., making better use of appearing gaps). Furthermore, our approach generalizes to more complex situations with additional risk sources.
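For reference, the baseline Intelligent Driver Model that the paper extends computes a car-following acceleration from speed, gap, and approach rate. The sketch below is the standard IDM only, not the paper's intersection extension, and the parameter values are illustrative defaults.

```python
import math

def idm_acceleration(v, gap, dv,
                     v0=15.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4.0):
    """Standard IDM acceleration (baseline model; parameters illustrative).

    v   -- ego speed [m/s]
    gap -- bumper-to-bumper distance to the leader [m]
    dv  -- approach rate v_ego - v_leader [m/s]
    """
    # Desired dynamic gap: jam distance plus time-headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road acceleration minus the interaction (braking) term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

On an empty road the model accelerates at close to `a_max`; when closing fast on a short gap, the interaction term dominates and the model brakes hard.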
This study presents a secondary data analysis of survey data collected as part of the American Trends Panel series by the Pew Research Center. A logistic regression was performed to ascertain the effects of the perceived risk of sharing, perceived problems on Twitter, and motivations for using Twitter on the likelihood that participants regret sharing on Twitter. The logistic regression model was statistically significant, $\chi^2(15) = 102.5$, $p < .001$, and correctly classified 78.5 percent of cases. Whether or not Twitter users regret sharing on Twitter depends on their motivations for using Twitter. We observe that "A way to express my opinion" is statistically significant in the model, indicating that the odds of Twitter users regretting sharing for this motivation are 2.1 times higher than for entertainment. Perceived risks of potential hostility and visibility were negatively associated with the likelihood of regret sharing, as were perceived problems on Twitter concerning misinformation.
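The reported odds ratio comes from exponentiating a logistic-regression coefficient (log-odds). A minimal sketch, with illustrative coefficient values that are not the study's fitted numbers:

```python
import math

def odds_ratios(coefs):
    """Exponentiate logistic-regression coefficients (log-odds) into odds ratios."""
    return {name: math.exp(b) for name, b in coefs.items()}

# Illustrative only: a log-odds coefficient of ~0.742 corresponds to the
# reported odds ratio of about 2.1 (since log(2.1) ~= 0.742).
example = odds_ratios({"express_opinion": 0.742})
```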
This paper concerns the paraconsistent logic LPQ$^{\supset,\mathsf{F}}$ and an application of it in the area of relational database theory. The notions of a relational database, a query applicable to a relational database, and a consistent answer to a query with respect to a possibly inconsistent relational database are considered from the perspective of this logic. This perspective enables, among other things, the definition of a consistent answer to a query with respect to a possibly inconsistent database without resorting to database repairs. In a previous paper, LPQ$^{\supset,\mathsf{F}}$ was presented with a sequent-style natural deduction proof system. In this paper, a sequent calculus proof system is presented, because it is common to use a sequent calculus as the basis of proof search procedures, and such procedures may form the core of algorithms for computing consistent answers to queries.
We often rely on censuses of triangulations to guide our intuition in $3$-manifold topology. However, this can lead to misplaced faith in conjectures if the smallest counterexamples are too large to appear in our census. Since the number of triangulations increases super-exponentially with size, there is no way to expand a census beyond relatively small triangulations; the current census only goes up to $10$ tetrahedra. Here, we show that it is feasible to search for large and hard-to-find counterexamples by using heuristics to selectively (rather than exhaustively) enumerate triangulations. We use this idea to find counterexamples to three conjectures which ask, for certain $3$-manifolds, whether one-vertex triangulations always have a "distinctive" edge that would allow us to recognise the $3$-manifold.
Cloud computing is a web-based utility model that is becoming more popular every day with the emergence of the 4th Industrial Revolution; consequently, cybercrimes that affect web-based systems are also relevant to cloud computing. In order to conduct a forensic investigation into a cyber-attack, it is necessary to identify and locate the source of the attack as soon as possible. Although significant research has been done in this domain on obstacles and their solutions, research on approaches and strategies is still in its early stages. There are barriers at every stage of cloud forensics; therefore, before we can come up with a comprehensive way to deal with these problems, we must first understand cloud technology and its forensic environment. Although there are articles related to cloud forensics, no paper has yet consolidated the contemporary concerns and solutions in this area. Throughout this chapter, we look at the cloud environment, as well as the threats and attacks to which it may be subjected. We also examine the approaches that cloud forensics may take, the various frameworks, and the practical challenges and limitations they may face when dealing with cloud forensic investigations.
Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells. In theory (assuming bounded range and precision), this limits their formal language recognition power to regular languages, and in practice, RNNs have been shown to be unable to learn many context-free languages (CFLs). In order to expand the class of languages RNNs recognize, prior work has augmented RNNs with a nondeterministic stack data structure, putting them on par with pushdown automata and increasing their language recognition power to CFLs. Nondeterminism is needed for recognizing all CFLs (not just deterministic CFLs), but in this paper, we show that nondeterminism and the neural controller interact to produce two more unexpected abilities. First, the nondeterministic stack RNN can recognize not only CFLs, but also many non-context-free languages. Second, it can recognize languages with much larger alphabet sizes than one might expect given the size of its stack alphabet. Finally, to increase the information capacity in the stack and allow it to solve more complicated tasks with large alphabet sizes, we propose a new version of the nondeterministic stack that simulates stacks of vectors rather than discrete symbols. We demonstrate perplexity improvements with this new model on the Penn Treebank language modeling benchmark.
Numerical predictions of quantities of interest measured within physical systems rely on the use of mathematical models that should be validated, or at best, not invalidated. Model validation usually involves the comparison of experimental data (outputs from the system of interest) and model predictions, both obtained at a specific validation scenario. The design of this validation experiment should be directly relevant to the objective of the model, that of predicting a quantity of interest at a prediction scenario. In this paper, we address two specific issues arising when designing validation experiments. The first issue consists in determining an appropriate validation scenario in cases where the prediction scenario cannot be carried out in a controlled environment. The second issue concerns the selection of observations when the quantity of interest cannot be readily observed. The proposed methodology involves the computation of influence matrices that characterize the response surface of given model functionals. Minimizing the distance between influence matrices allows one to select the validation experiment most representative of the prediction scenario. We illustrate our approach on two numerical examples. The first considers the validation of a simple model, based on an ordinary differential equation governing an object in free fall, to highlight the importance of the choice of the validation experiment. The second focuses on the transport of a pollutant and demonstrates the impact that the choice of the quantity of interest has on the validation experiment to be performed.
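The abstract does not define the influence matrices precisely; one minimal reading is a sensitivity (Jacobian) matrix of model functionals with respect to model parameters, with candidate validation scenarios compared to the prediction scenario by a matrix distance. The free-fall model, scenario values, and helper names below are our own illustrative assumptions under that reading.

```python
def influence_matrix(functionals, params, h=1e-6):
    """Finite-difference sensitivity matrix: entry (i, j) is the derivative
    of functional i with respect to parameter j (illustrative definition)."""
    rows = []
    for f in functionals:
        base = f(params)
        row = []
        for j in range(len(params)):
            p = list(params)
            p[j] += h
            row.append((f(p) - base) / h)
        rows.append(row)
    return rows

def distance(A, B):
    """Frobenius distance between two influence matrices."""
    return sum((a - b) ** 2
               for ra, rb in zip(A, B)
               for a, b in zip(ra, rb)) ** 0.5

# Illustrative model: distance fallen after time t with parameters (g, v0).
def drop(t):
    return lambda p: p[1] * t + 0.5 * p[0] * t ** 2

# Prediction scenario (t = 2.0 s) cannot be tested directly; pick the
# candidate validation scenario whose influence matrix is closest to it.
prediction = influence_matrix([drop(2.0)], [9.81, 0.0])
candidates = {t: influence_matrix([drop(t)], [9.81, 0.0])
              for t in (0.5, 1.0, 1.9)}
best = min(candidates, key=lambda t: distance(candidates[t], prediction))
```

Here the scenario at t = 1.9 s is selected, since its sensitivities to g and v0 are closest to those at the prediction scenario.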
Spiking neural networks (SNN) have recently emerged as alternatives to traditional neural networks, owing to energy efficiency benefits and capacity to better capture biological neuronal mechanisms. However, the classic backpropagation algorithm for training traditional networks has been notoriously difficult to apply to SNN due to the hard thresholding and discontinuities at spike times. Therefore, a large majority of prior work has assumed that exact gradients for SNN w.r.t. their weights do not exist and has focused on approximation methods to produce surrogate gradients. In this paper, (1) by applying the implicit function theorem to SNN at the discrete spike times, we prove that, despite being non-differentiable in time, SNNs have well-defined gradients w.r.t. their weights, and (2) we propose a novel training algorithm, called \emph{forward propagation} (FP), that computes exact gradients for SNN. FP exploits the causality structure between the spikes and allows us to parallelize computation forward in time. It can be used with other algorithms that simulate the forward pass, and it also provides insights on why other related algorithms, such as Hebbian learning and recently-proposed surrogate gradient methods, may perform well.
This paper is concerned with the convergence of a series associated with a certain version of the convexification method. That version has recently been developed by the research group of the first author for solving coefficient inverse problems. The convexification method aims to construct a globally convex Tikhonov-like functional with a Carleman Weight Function in it. In previous works, the construction of the strictly convex weighted Tikhonov-like functional assumes a truncated Fourier series (i.e., a finite series instead of an infinite one) for a function generated by the total wave field. In this paper, we prove a convergence property for this truncated Fourier series approximation. More precisely, we show that the residual of the approximate PDE obtained by using the truncated Fourier series tends to zero in $L^{2}$ as the truncation index tends to infinity. The proof relies on a convergence result in the $H^{1}$-norm for a sequence of $L^{2}$-orthogonal projections on finite-dimensional subspaces spanned by elements of a special Fourier basis. However, due to the ill-posed nature of coefficient inverse problems, we cannot prove that the solution of that approximate PDE, which results from the minimization of that Tikhonov-like functional, converges to the correct solution.