We study polynomial systems with prescribed monomial supports in the Cox rings of toric varieties built from complete polyhedral fans. We present combinatorial formulas for the dimensions of their associated subvarieties under genericity assumptions on the coefficients of the polynomials. Using these formulas, we identify at which degrees generic systems in polytopal algebras form regular sequences. Our motivation comes from sparse elimination theory, where knowing the expected dimension of these subvarieties leads to specialized algorithms and to large speed-ups for solving sparse polynomial systems. As a special case, we classify the degrees at which regular sequences defined by weighted homogeneous polynomials can be found, answering an open question in the Gr\"obner bases literature. We also show that deciding whether a sparse system is generically a regular sequence in a polytopal algebra is hard from the point of view of theoretical computational complexity.
Extremely large aperture arrays can enable unprecedented spatial multiplexing in beyond 5G systems due to their extremely narrow beamfocusing capabilities. However, acquiring the spatial correlation matrix to enable efficient channel estimation is a complex task due to the vast number of antenna dimensions. Recently, a new estimation method called the "reduced-subspace least squares (RS-LS) estimator" has been proposed for densely packed arrays. This method relies solely on the geometry of the array to limit the estimation resources. In this paper, we address a gap in the existing literature by deriving the average spectral efficiency for a certain distribution of user equipments (UEs) and a lower bound on it when using the RS-LS estimator. This bound is determined by the channel gain and the statistics of the normalized spatial correlation matrices of potential UEs but, importantly, does not require knowledge of a specific UE's spatial correlation matrix. We establish that there exists a pilot length that maximizes this expression. Additionally, we derive an approximate expression for the optimal pilot length under low signal-to-noise ratio (SNR) conditions. Simulation results validate the tightness of the derived lower bound and the effectiveness of using the optimized pilot length.
We study a primal-dual reinforcement learning (RL) algorithm for the online constrained Markov decision processes (CMDP) problem, wherein the agent explores an optimal policy that maximizes return while satisfying constraints. Despite its widespread practical use, the existing theoretical literature on primal-dual RL algorithms for this problem only provides sublinear regret guarantees and fails to ensure convergence to optimal policies. In this paper, we introduce a novel policy gradient primal-dual algorithm with uniform probably approximate correctness (Uniform-PAC) guarantees, simultaneously ensuring convergence to optimal policies, sublinear regret, and polynomial sample complexity for any target accuracy. Notably, this represents the first Uniform-PAC algorithm for the online CMDP problem. In addition to the theoretical guarantees, we empirically demonstrate in a simple CMDP that our algorithm converges to optimal policies, while an existing algorithm exhibits oscillatory performance and constraint violation.
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints -- a new problem motivated by real-world applications where deployments of new policies are costly and the number of policy updates must be minimized. For two-player zero-sum Markov Games, we design a (policy) elimination based algorithm that achieves a regret of $\widetilde{O}(\sqrt{H^3 S^2 ABK})$, while the batch complexity is only $O(H+\log\log K)$. In the above, $S$ denotes the number of states, $A,B$ are the number of actions for the two players respectively, $H$ is the horizon and $K$ is the number of episodes. Furthermore, we prove a batch complexity lower bound $\Omega(\frac{H}{\log_{A}K}+\log\log K)$ for all algorithms with $\widetilde{O}(\sqrt{K})$ regret bound, which matches our upper bound up to logarithmic factors. As a byproduct, our techniques naturally extend to learning bandit games and reward-free MARL within near optimal batch complexity. To the best of our knowledge, these are the first line of results towards understanding MARL with low adaptivity.
Agent-based models (ABMs) have shown promise for modelling various real world phenomena incompatible with traditional equilibrium analysis. However, a critical concern is the manual definition of behavioural rules in ABMs. Recent developments in multi-agent reinforcement learning (MARL) offer a way to address this issue from an optimisation perspective, where agents strive to maximise their utility, eliminating the need for manual rule specification. This learning-focused approach aligns with established economic and financial models through the use of rational utility-maximising agents. However, this representation departs from the fundamental motivation for ABMs: that realistic dynamics emerging from bounded rationality and agent heterogeneity can be modelled. To resolve this apparent disparity between the two approaches, we propose a novel technique for representing heterogeneous processing-constrained agents within a MARL framework. The proposed approach treats agents as constrained optimisers with varying degrees of strategic skills, permitting departure from strict utility maximisation. Behaviour is learnt through repeated simulations with policy gradients to adjust action likelihoods. To allow efficient computation, we use parameterised shared policy learning with distributions of agent skill levels. Shared policy learning avoids the need for agents to learn individual policies yet still enables a spectrum of bounded rational behaviours. We validate our model's effectiveness using real-world data on a range of canonical $n$-agent settings, demonstrating significantly improved predictive capability.
Gaussian processes (GPs) based methods for solving partial differential equations (PDEs) demonstrate great promise by bridging the gap between the theoretical rigor of traditional numerical algorithms and the flexible design of machine learning solvers. The main bottleneck of GP methods lies in the inversion of a covariance matrix, whose cost grows cubically concerning the size of samples. Drawing inspiration from neural networks, we propose a mini-batch algorithm combined with GPs to solve nonlinear PDEs. A naive deployment of a stochastic gradient descent method for solving PDEs with GPs is challenging, as the objective function in the requisite minimization problem cannot be depicted as the expectation of a finite-dimensional random function. To address this issue, we employ a mini-batch method to the corresponding infinite-dimensional minimization problem over function spaces. The algorithm takes a mini-batch of samples at each step to update the GP model. Thus, the computational cost is allotted to each iteration. Using stability analysis and convexity arguments, we show that the mini-batch method steadily reduces a natural measure of errors towards zero at the rate of $O(1/K+1/M)$, where $K$ is the number of iterations and $M$ is the batch size.
We investigate the problem of jointly testing multiple hypotheses and estimating a random parameter of the underlying distribution in a sequential setup. The aim is to jointly infer the true hypothesis and the true parameter while using on average as few samples as possible and keeping the detection and estimation errors below predefined levels. Based on mild assumptions on the underlying model, we propose an asymptotically optimal procedure, i.e., a procedure that becomes optimal when the tolerated detection and estimation error levels tend to zero. The implementation of the resulting asymptotically optimal stopping rule is computationally cheap and, hence, applicable for high-dimensional data. We further propose a projected quasi-Newton method to optimally choose the coefficients that parameterize the instantaneous cost function such that the constraints are fulfilled with equality. The proposed theory is validated by numerical examples.
Quantum computing holds immense potential for solving classically intractable problems by leveraging the unique properties of quantum mechanics. The scalability of quantum architectures remains a significant challenge. Multi-core quantum architectures are proposed to solve the scalability problem, arising a new set of challenges in hardware, communications and compilation, among others. One of these challenges is to adapt a quantum algorithm to fit within the different cores of the quantum computer. This paper presents a novel approach for circuit partitioning using Deep Reinforcement Learning, contributing to the advancement of both quantum computing and graph partitioning. This work is the first step in integrating Deep Reinforcement Learning techniques into Quantum Circuit Mapping, opening the door to a new paradigm of solutions to such problems.
Machine teaching often involves the creation of an optimal (typically minimal) dataset to help a model (referred to as the `student') achieve specific goals given by a teacher. While abundant in the continuous domain, the studies on the effectiveness of machine teaching in the discrete domain are relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on manipulating student models' predictions based on the goals of teachers via changing the training data efficiently. We formulate this task as a combinatorial optimization problem and solve it by proposing an iterative searching algorithm. Our algorithm demonstrates significant numerical merit in the scenarios where a teacher attempts at correcting erroneous predictions to improve the student's models, or maliciously manipulating the model to misclassify some specific samples to the target class aligned with his personal profits. Experimental results show that our proposed algorithm can have superior performance in effectively and efficiently manipulating the predictions of the model, surpassing conventional baselines.
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.
In this paper, we propose the joint learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically require pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interests with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.