Plotting is a tile-matching puzzle video game published by Taito in 1989. Its objective is to reduce a given grid of coloured blocks down to a goal number or fewer. This is achieved by the avatar character repeatedly shooting the block it holds into the grid. Plotting is an example of a planning problem: given a model of the environment, a planning problem asks us to find a sequence of actions that can lead from an initial state of the environment to a given goal state while respecting some constraints. The key difficulty in modelling Plotting is in capturing the way the puzzle state changes after each shot. A single shot can affect multiple tiles directly, and the grid is affected by gravity so numerous other tiles can be affected indirectly. We present and evaluate a constraint model of the Plotting problem that captures this complexity. We also discuss the difficulties and inefficiencies of modelling Plotting in PDDL, the standard language used for input to specialised AI planners. We conclude by arguing that AI planning could benefit from a richer modelling language.
In the last decade, a great effort has been employed in the study of Hybrid Unmanned Aerial Underwater Vehicles, robots that can easily fly and dive into the water with different levels of mechanical adaptation. However, most of this literature is concentrated on physical design, practical issues of construction, and, more recently, low-level control strategies. Little has been done in the context of high-level intelligence, such as motion planning and interactions with the real world. Therefore, we proposed in this paper a trajectory planning approach that allows collision avoidance against unknown obstacles and smooth transitions between aerial and aquatic media. Our method is based on a variant of the classic Rapidly-exploring Random Tree, whose main advantages are the capability to deal with obstacles, complex nonlinear dynamics, model uncertainties, and external disturbances. The approach uses the dynamic model of the \hydrone, a hybrid vehicle proposed with high underwater performance, but we believe it can be easily generalized to other types of aerial/aquatic platforms. In the experimental section, we present simulated results in environments filled with obstacles, where the robot is commanded to perform different media movements, demonstrating the applicability of our strategy.
The field of General Reinforcement Learning (GRL) formulates the problem of sequential decision-making from ground up. The history of interaction constitutes a "ground" state of the system, which never repeats. On the one hand, this generality allows GRL to model almost every domain possible, e.g.\ Bandits, MDPs, POMDPs, PSRs, and history-based environments. On the other hand, in general, the near-optimal policies in GRL are functions of complete history, which hinders not only learning but also planning in GRL. The usual way around for the planning part is that the agent is given a Markovian abstraction of the underlying process. So, it can use any MDP planning algorithm to find a near-optimal policy. The Extreme State Aggregation (ESA) framework has extended this idea to non-Markovian abstractions without compromising on the possibility of planning through a (surrogate) MDP. A distinguishing feature of ESA is that it proves an upper bound of $O\left(\varepsilon^{-A} \cdot (1-\gamma)^{-2A}\right)$ on the number of states required for the surrogate MDP (where $A$ is the number of actions, $\gamma$ is the discount-factor, and $\varepsilon$ is the optimality-gap) which holds \emph{uniformly} for \emph{all} domains. While the possibility of a universal bound is quite remarkable, we show that this bound is very loose. We propose a novel non-MDP abstraction which allows for a much better upper bound of $O\left(\varepsilon^{-1} \cdot (1-\gamma)^{-2} \cdot A \cdot 2^{A}\right)$. Furthermore, we show that this bound can be improved further to $O\left(\varepsilon^{-1} \cdot (1-\gamma)^{-2} \cdot \log^3 A \right)$ by using an action-sequentialization method.
The sensitivity of a string compression algorithm $C$ asks how much the output size $C(T)$ for an input string $T$ can increase when a single character edit operation is performed on $T$. This notion enables one to measure the robustness of compression algorithms in terms of errors and/or dynamic changes occurring in the input string. In this paper, we analyze the worst-case multiplicative sensitivity of string compression algorithms, defined by $\max_{T \in \Sigma^n}\{C(T')/C(T) : ed(T, T') = 1\}$, where $ed(T, T')$ denotes the edit distance between $T$ and $T'$. For the most common versions of the Lempel-Ziv 77 compressors, we prove that the worst-case multiplicative sensitivity is only a small constant (2 or 3, depending on the version of the Lempel-Ziv 77 and the edit operation type). We strengthen our upper bound results by presenting matching lower bounds on the worst-case sensitivity for all these major versions of the Lempel-Ziv 77 factorizations. This contrasts with the previously known related results such that the size $z_{\rm 78}$ of the Lempel-Ziv 78 factorization can increase by a factor of $\Omega(n^{3/4})$ [Lagarde and Perifel, 2018], and the number $r$ of runs in the Burrows-Wheeler transform can increase by a factor of $\Omega(\log n)$ [Giuliani et al., 2021] when a character is prepended to an input string of length $n$. We also study the worst-case sensitivity of several grammar compression algorithms including Bisection, AVL-grammar, GCIS, and CDAWG. Further, we extend the notion of the worst-case sensitivity to string repetitiveness measures such as the smallest string attractor size $\gamma$ and the substring complexity $\delta$. We present some non-trivial upper and lower bounds of the worst-case multiplicative sensitivity for $\gamma$ and matching upper and lower bounds of the worst-case multiplicative sensitivity for $\delta$.
We consider the dynamic pricing problem with covariates under a generalized linear demand model: a seller can dynamically adjust the price of a product over a horizon of $T$ time periods, and at each time period $t$, the demand of the product is jointly determined by the price and an observable covariate vector $x_t\in\mathbb{R}^d$ through an unknown generalized linear model. Most of the existing literature assumes the covariate vectors $x_t$'s are independently and identically distributed (i.i.d.); the few papers that relax this assumption either sacrifice model generality or yield sub-optimal regret bounds. In this paper we show that a simple pricing algorithm has an $O(d\sqrt{T}\log T)$ regret upper bound without assuming any statistical structure on the covariates $x_t$ (which can even be arbitrarily chosen). The upper bound on the regret matches the lower bound (even under the i.i.d. assumption) up to logarithmic factors. Our paper thus shows that (i) the i.i.d. assumption is not necessary for obtaining low regret, and (ii) the regret bound can be independent of the (inverse) minimum eigenvalue of the covariance matrix of the $x_t$'s, a quantity present in previous bounds. Furthermore, we discuss a condition under which a better regret is achievable and how a Thompson sampling algorithm can be applied to give an efficient computation of the prices.
It is widely expected that future networks of 6G and beyond will deliver on the unachieved goals set by 5G. Technologies such as Internet of Skills and Industry 4.0 will become stable and viable, as a direct consequence of networks that offer sustained and reliable mobile performance levels. The primary challenges for future technologies are not just low-latency and high-bandwidth. The more critical problem Mobile Service Providers (MSPs) will face will be in balancing the inflated demands of network connections and customers' trust in the network service, that is, being able to interconnect billions of unique devices while adhering to the agreed terms of Service Level Agreements (SLAs). To meet these targets, it is self-evident that MSPs cannot operate in a solitary environment. They must enable cooperation among themselves in a manner that ensures trust, both between themselves as well as with customers. In this study, we present the BEAT (Blockchain-Enabled Accountable and Transparent) Infrastructure Sharing architecture. BEAT exploits the inherent properties of permissioned type of distributed ledger technology (i.e., permissioned distributed ledgers) to deliver on accountability and transparency metrics whenever infrastructure needs to be shared between providers. We also propose a lightweight method that enables device-level accountability. BEAT has been designed to be deployable directly as only minor software upgrades to network devices such as routers. Our simulations on a resource-limited device show that BEAT adds only a few seconds of overhead processing time -- with the latest state-of-the-art network devices, we can reasonably anticipate much lower overheads.
In this paper, we derive asymptotic expressions for the ergodic capacity of the multiple-input multiple-output (MIMO) keyhole channel at low SNR in independent and identically distributed (i.i.d.) Nakagami-$m$ fading conditions with perfect channel state information available at both the transmitter (CSI-T) and the receiver (CSI-R). We show that the low-SNR capacity of this keyhole channel scales proportionally as $\frac{\mathrm{SNR}}{4} \log^2 \left(1/{\mathrm{SNR}}\right)$. With this asymptotic low-SNR capacity formula, we find a very surprising result that contrary to popular belief, the capacity of the MIMO fading channel at low SNR increases in the presence of keyhole degenerate condition. Additionally, we show that a simple one-bit CSI-T based On-Off power scheme achieves this low-SNR capacity; surprisingly, it is robust against both moderate and severe fading conditions for a wide range of low SNR values. These results also extend to the Rayleigh keyhole MIMO channel as a special case.
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real-world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks while also exhibiting robustness in the presence of observations unrelated to the task. Our method, which we call FORM (for "Future Observation Reward Model") is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations, without needing ground truth actions. We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
Retrosynthetic planning is a fundamental problem in chemistry for finding a pathway of reactions to synthesize a target molecule. Recently, search algorithms have shown promising results for solving this problem by using deep neural networks (DNNs) to expand their candidate solutions, i.e., adding new reactions to reaction pathways. However, the existing works on this line are suboptimal; the retrosynthetic planning problem requires the reaction pathways to be (a) represented by real-world reactions and (b) executable using "building block" molecules, yet the DNNs expand reaction pathways without fully incorporating such requirements. Motivated by this, we propose an end-to-end framework for directly training the DNNs towards generating reaction pathways with the desirable properties. Our main idea is based on a self-improving procedure that trains the model to imitate successful trajectories found by itself. We also propose a novel reaction augmentation scheme based on a forward reaction model. Our experiments demonstrate that our scheme significantly improves the success rate of solving the retrosynthetic problem from 86.84% to 96.32% while maintaining the performance of DNN for predicting valid reactions.
Neural architecture search has attracted wide attentions in both academia and industry. To accelerate it, researchers proposed weight-sharing methods which first train a super-network to reuse computation among different operators, from which exponentially many sub-networks can be sampled and efficiently evaluated. These methods enjoy great advantages in terms of computational costs, but the sampled sub-networks are not guaranteed to be estimated precisely unless an individual training process is taken. This paper owes such inaccuracy to the inevitable mismatch between assembled network layers, so that there is a random error term added to each estimation. We alleviate this issue by training a graph convolutional network to fit the performance of sampled sub-networks so that the impact of random errors becomes minimal. With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates, which consequently leads to better performance of the final architecture. In addition, our approach also enjoys the flexibility of being used under different hardware constraints, since the graph convolutional network has provided an efficient lookup table of the performance of architectures in the entire search space.
Retrosynthetic planning is a critical task in organic chemistry which identifies a series of reactions that can lead to the synthesis of a target product. The vast number of possible chemical transformations makes the size of the search space very big, and retrosynthetic planning is challenging even for experienced chemists. However, existing methods either require expensive return estimation by rollout with high variance, or optimize for search speed rather than the quality. In this paper, we propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently. It maintains the search as an AND-OR tree, and learns a neural search bias with off-policy data. Then guided by this neural network, it performs best-first search efficiently during new planning episodes. Experiments on benchmark USPTO datasets show that, our proposed method outperforms existing state-of-the-art with respect to both the success rate and solution quality, while being more efficient at the same time.