In case of incomplete database tables, a possible world is obtained by replacing any missing value by a value from the corresponding attribute's domain that can be infinite. A possible key or possible functional dependency constraint is satisfied by an incomplete table if we can obtain a possible world that satisfies the given key or functional dependency. On the other hand, a certain key or certain functional dependency holds if all possible worlds satisfy the constraint, A strongly possible constraint is an intermediate concept between possible and certain constraints, based on the strongly possible world approach (a strongly possible world is obtained by replacing \nul's by a value from the ones appearing in the corresponding attribute of the table). A strongly possible key or functional dependency holds in an incomplete table if there exists a strongly possible world that satisfies the given constraint. In the present paper, we introduce strongly possible versions of multivalued dependencies and cross joins, and we analyse the complexity of checking the validity of a given strongly possible cross joins. We also study approximation measures of strongly possible keys (spKeys), functional dependencies (spFDs), multivalued dependencies (spMVDs) and cross joins (spCJs). We also treat complexity questions of determination of the approximation values.
In addressing control problems such as regulation and tracking through reinforcement learning, it is often required to guarantee that the acquired policy meets essential performance and stability criteria such as a desired settling time and steady-state error prior to deployment. Motivated by this necessity, we present a set of results and a systematic reward shaping procedure that (i) ensures the optimal policy generates trajectories that align with specified control requirements and (ii) allows to assess whether any given policy satisfies them. We validate our approach through comprehensive numerical experiments conducted in two representative environments from OpenAI Gym: the Inverted Pendulum swing-up problem and the Lunar Lander. Utilizing both tabular and deep reinforcement learning methods, our experiments consistently affirm the efficacy of our proposed framework, highlighting its effectiveness in ensuring policy adherence to the prescribed control requirements.
With the increasing amount of data available to scientists in disciplines as diverse as bioinformatics, physics, and remote sensing, scientific workflow systems are becoming increasingly important for composing and executing scalable data analysis pipelines. When writing such workflows, users need to specify the resources to be reserved for tasks so that sufficient resources are allocated on the target cluster infrastructure. Crucially, underestimating a task's memory requirements can result in task failures. Therefore, users often resort to overprovisioning, resulting in significant resource wastage and decreased throughput. In this paper, we propose a novel online method that uses monitoring time series data to predict task memory usage in order to reduce the memory wastage of scientific workflow tasks. Our method predicts a task's runtime, divides it into k equally-sized segments, and learns the peak memory value for each segment depending on the total file input size. We evaluate the prototype implementation of our method using workflows from the publicly available nf-core repository, showing an average memory wastage reduction of 29.48% compared to the best state-of-the-art approach.
Multispectral images (MSI) contain light information in different wavelengths of objects, which convey spectral-spatial information and help improve the performance of various image processing tasks. Numerous techniques have been created to extend the application of total variation regularization in restoring multispectral images, for example, based on channel coupling and adaptive total variation regularization. The primary contribution of this paper is to propose and develop a new multispectral total variation regularization in a generalized opponent transformation domain instead of the original multispectral image domain. Here opponent transformations for multispectral images are generalized from a well-known opponent transformation for color images. We will explore the properties of generalized opponent transformation total variation (GOTTV) regularization and the corresponding optimization formula for multispectral image restoration. To evaluate the effectiveness of the new GOTTV method, we provide numerical examples that showcase its superior performance compared to existing multispectral image total variation methods, using criteria such as MPSNR and MSSIM.
We consider a two-player dynamic information design problem between a principal and a receiver -- a game is played between the two agents on top of a Markovian system controlled by the receiver's actions, where the principal obtains and strategically shares some information about the underlying system with the receiver in order to influence their actions. In our setting, both players have long-term objectives, and the principal sequentially commits to their strategies instead of committing at the beginning. Further, the principal cannot directly observe the system state, but at every turn they can choose randomized experiments to observe the system partially. The principal can share details about the experiments to the receiver. For our analysis we impose the truthful disclosure rule: the principal is required to truthfully announce the details and the result of each experiment to the receiver immediately after the experiment result is revealed. Based on the received information, the receiver takes an action when its their turn, with the action influencing the state of the underlying system. We show that there exist Perfect Bayesian equilibria in this game where both agents play Canonical Belief Based (CBB) strategies using a compressed version of their information, rather than full information, to choose experiments (for the principal) or actions (for the receiver). We also provide a backward inductive procedure to solve for an equilibrium in CBB strategies.
Algorithms with predictions have attracted much attention in the last years across various domains, including variants of facility location, as a way to surpass traditional worst-case analyses. We study the $k$-facility location mechanism design problem, where the $n$ agents are strategic and might misreport their location. Unlike previous models, where predictions are for the $k$ optimal facility locations, we receive $n$ predictions for the locations of each of the agents. However, these predictions are only "mostly" and "approximately" correct (or MAC for short) -- i.e., some $\delta$-fraction of the predicted locations are allowed to be arbitrarily incorrect, and the remainder of the predictions are allowed to be correct up to an $\varepsilon$-error. We make no assumption on the independence of the errors. Can such predictions allow us to beat the current best bounds for strategyproof facility location? We show that the $1$-median (geometric median) of a set of points is naturally robust under corruptions, which leads to an algorithm for single-facility location with MAC predictions. We extend the robustness result to a "balanced" variant of the $k$ facilities case. Without balancedness, we show that robustness completely breaks down, even for the setting of $k=2$ facilities on a line. For this "unbalanced" setting, we devise a truthful random mechanism that outperforms the best known result of Lu et al. [2010], which does not use predictions. En route, we introduce the problem of "second" facility location (when the first facility's location is already fixed). Our findings on the robustness of the $1$-median and more generally $k$-medians may be of independent interest, as quantitative versions of classic breakdown-point results in robust statistics.
Reentrancy vulnerability as one of the most notorious vulnerabilities, has been a prominent topic in smart contract security research. Research shows that existing vulnerability detection presents a range of challenges, especially as smart contracts continue to increase in complexity. Existing tools perform poorly in terms of efficiency and successful detection rates for vulnerabilities in complex contracts. To effectively detect reentrancy vulnerabilities in contracts with complex logic, we propose a tool named SliSE. SliSE's detection process consists of two stages: Warning Search and Symbolic Execution Verification. In Stage I, SliSE utilizes program slicing to analyze the Inter-contract Program Dependency Graph (I-PDG) of the contract, and collects suspicious vulnerability information as warnings. In Stage II, symbolic execution is employed to verify the reachability of these warnings, thereby enhancing vulnerability detection accuracy. SliSE obtained the best performance compared with eight state-of-the-art detection tools. It achieved an F1 score of 78.65%, surpassing the highest score recorded by an existing tool of 9.26%. Additionally, it attained a recall rate exceeding 90% for detection of contracts on Ethereum. Overall, SliSE provides a robust and efficient method for detection of Reentrancy vulnerabilities for complex contracts.
Graphs are important data representations for describing objects and their relationships, which appear in a wide diversity of real-world scenarios. As one of a critical problem in this area, graph generation considers learning the distributions of given graphs and generating more novel graphs. Owing to their wide range of applications, generative models for graphs, which have a rich history, however, are traditionally hand-crafted and only capable of modeling a few statistical properties of graphs. Recent advances in deep generative models for graph generation is an important step towards improving the fidelity of generated graphs and paves the way for new kinds of applications. This article provides an extensive overview of the literature in the field of deep generative models for graph generation. Firstly, the formal definition of deep generative models for the graph generation and the preliminary knowledge are provided. Secondly, taxonomies of deep generative models for both unconditional and conditional graph generation are proposed respectively; the existing works of each are compared and analyzed. After that, an overview of the evaluation metrics in this specific domain is provided. Finally, the applications that deep graph generation enables are summarized and five promising future research directions are highlighted.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data varies in many nuanced linguistic aspects such as topic, style or level of formality. In addition, domain labels are many times unavailable, making it challenging to build domain-specific systems. We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision -- suggesting a simple data-driven definition of domains in textual data. We harness this property and propose domain data selection methods based on such models, which require only a small set of in-domain monolingual data. We evaluate our data selection methods for neural machine translation across five diverse domains, where they outperform an established approach as measured by both BLEU and by precision and recall of sentence selection with respect to an oracle.
Graphs, which describe pairwise relations between objects, are essential representations of many real-world data such as social networks. In recent years, graph neural networks, which extend the neural network models to graph data, have attracted increasing attention. Graph neural networks have been applied to advance many different graph related tasks such as reasoning dynamics of the physical system, graph classification, and node classification. Most of the existing graph neural network models have been designed for static graphs, while many real-world graphs are inherently dynamic. For example, social networks are naturally evolving as new users joining and new relations being created. Current graph neural network models cannot utilize the dynamic information in dynamic graphs. However, the dynamic information has been proven to enhance the performance of many graph analytical tasks such as community detection and link prediction. Hence, it is necessary to design dedicated graph neural networks for dynamic graphs. In this paper, we propose DGNN, a new {\bf D}ynamic {\bf G}raph {\bf N}eural {\bf N}etwork model, which can model the dynamic information as the graph evolving. In particular, the proposed framework can keep updating node information by capturing the sequential information of edges, the time intervals between edges and information propagation coherently. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.