We study the problem of designing mechanisms for \emph{information acquisition} scenarios. This setting models strategic interactions between an uniformed \emph{receiver} and a set of informed \emph{senders}. In our model the senders receive information about the underlying state of nature and communicate their observation (either truthfully or not) to the receiver, which, based on this information, selects an action. Our goal is to design mechanisms maximizing the receiver's utility while incentivizing the senders to report truthfully their information. First, we provide an algorithm that efficiently computes an optimal \emph{incentive compatible} (IC) mechanism. Then, we focus on the \emph{online} problem in which the receiver sequentially interacts in an unknown game, with the objective of minimizing the \emph{cumulative regret} w.r.t. the optimal IC mechanism, and the \emph{cumulative violation} of the incentive compatibility constraints. We investigate two different online scenarios, \emph{i.e.,} the \emph{full} and \emph{bandit feedback} settings. For the full feedback problem, we propose an algorithm that guarantees $\tilde{\mathcal O}(\sqrt T)$ regret and violation, while for the bandit feedback setting we present an algorithm that attains $\tilde{\mathcal O}(T^{\alpha})$ regret and $\tilde{\mathcal O}(T^{1-\alpha/2})$ violation for any $\alpha\in[1/2, 1]$. Finally, we complement our results providing a tight lower bound.
We consider search by mobile agents for a hidden, idle target, placed on the infinite line. Feasible solutions are agent trajectories in which all agents reach the target sooner or later. A special feature of our problem is that the agents are $p$-faulty, meaning that every attempt to change direction is an independent Bernoulli trial with known probability $p$, where $p$ is the probability that a turn fails. We are looking for agent trajectories that minimize the worst-case expected termination time, relative to competitive analysis. First, we study linear search with one deterministic $p$-faulty agent, i.e., with no access to random oracles, $p\in (0,1/2)$. For this problem, we provide trajectories that leverage the probabilistic faults into an algorithmic advantage. Our strongest result pertains to a search algorithm (deterministic, aside from the adversarial probabilistic faults) which, as $p\to 0$, has optimal performance $4.59112+\epsilon$, up to the additive term $\epsilon$ that can be arbitrarily small. Additionally, it has performance less than $9$ for $p\leq 0.390388$. When $p\to 1/2$, our algorithm has performance $\Theta(1/(1-2p))$, which we also show is optimal up to a constant factor. Second, we consider linear search with two $p$-faulty agents, $p\in (0,1/2)$, for which we provide three algorithms of different advantages, all with a bounded competitive ratio even as $p\rightarrow 1/2$. Indeed, for this problem, we show how the agents can simulate the trajectory of any $0$-faulty agent (deterministic or randomized), independently of the underlying communication model. As a result, searching with two agents allows for a solution with a competitive ratio of $9+\epsilon$, or a competitive ratio of $4.59112+\epsilon$. Our final contribution is a novel algorithm for searching with two $p$-faulty agents that achieves a competitive ratio $3+4\sqrt{p(1-p)}$.
A burgeoning line of research leverages deep neural networks to approximate the solutions to high dimensional PDEs, opening lines of theoretical inquiry focused on explaining how it is that these models appear to evade the curse of dimensionality. However, most prior theoretical analyses have been limited to linear PDEs. In this work, we take a step towards studying the representational power of neural networks for approximating solutions to nonlinear PDEs. We focus on a class of PDEs known as \emph{nonlinear elliptic variational PDEs}, whose solutions minimize an \emph{Euler-Lagrange} energy functional $\mathcal{E}(u) = \int_\Omega L(x, u(x), \nabla u(x)) - f(x) u(x)dx$. We show that if composing a function with Barron norm $b$ with partial derivatives of $L$ produces a function of Barron norm at most $B_L b^p$, the solution to the PDE can be $\epsilon$-approximated in the $L^2$ sense by a function with Barron norm $O\left(\left(dB_L\right)^{\max\{p \log(1/ \epsilon), p^{\log(1/\epsilon)}\}}\right)$. By a classical result due to Barron [1993], this correspondingly bounds the size of a 2-layer neural network needed to approximate the solution. Treating $p, \epsilon, B_L$ as constants, this quantity is polynomial in dimension, thus showing neural networks can evade the curse of dimensionality. Our proof technique involves neurally simulating (preconditioned) gradient in an appropriate Hilbert space, which converges exponentially fast to the solution of the PDE, and such that we can bound the increase of the Barron norm at each iterate. Our results subsume and substantially generalize analogous prior results for linear elliptic PDEs over a unit hypercube.
Online federated learning (FL) enables geographically distributed devices to learn a global shared model from locally available streaming data. Most online FL literature considers a best-case scenario regarding the participating clients and the communication channels. However, these assumptions are often not met in real-world applications. Asynchronous settings can reflect a more realistic environment, such as heterogeneous client participation due to available computational power and battery constraints, as well as delays caused by communication channels or straggler devices. Further, in most applications, energy efficiency must be taken into consideration. Using the principles of partial-sharing-based communications, we propose a communication-efficient asynchronous online federated learning (PAO-Fed) strategy. By reducing the communication overhead of the participants, the proposed method renders participation in the learning task more accessible and efficient. In addition, the proposed aggregation mechanism accounts for random participation, handles delayed updates and mitigates their effect on accuracy. We prove the first and second-order convergence of the proposed PAO-Fed method and obtain an expression for its steady-state mean square deviation. Finally, we conduct comprehensive simulations to study the performance of the proposed method on both synthetic and real-life datasets. The simulations reveal that in asynchronous settings, the proposed PAO-Fed is able to achieve the same convergence properties as that of the online federated stochastic gradient while reducing the communication overhead by 98 percent.
Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. Perhaps it may be fruitful to try to analyze modern machine learning under a different lens. In this paper, we propose a novel information-theoretic framework with its own notions of regret and sample complexity for analyzing the data requirements of machine learning. With our framework, we first work through some classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. Then, we use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU activation units. For a particular prior distribution on weights, we establish sample complexity bounds that are simultaneously width independent and linear in depth. This prior distribution gives rise to high-dimensional latent representations that, with high probability, admit reasonably accurate low-dimensional approximations. We conclude by corroborating our theoretical results with experimental analysis of random single-hidden-layer neural networks.
Deep learning surrogate models are being increasingly used in accelerating scientific simulations as a replacement for costly conventional numerical techniques. However, their use remains a significant challenge when dealing with real-world complex examples. In this work, we demonstrate three types of neural network architectures for efficient learning of highly non-linear deformations of solid bodies. The first two architectures are based on the recently proposed CNN U-NET and MAgNET (graph U-NET) frameworks which have shown promising performance for learning on mesh-based data. The third architecture is Perceiver IO, a very recent architecture that belongs to the family of attention-based neural networks--a class that has revolutionised diverse engineering fields and is still unexplored in computational mechanics. We study and compare the performance of all three networks on two benchmark examples, and show their capabilities to accurately predict the non-linear mechanical responses of soft bodies.
Motivated by statistical inference problems in high-dimensional time series data analysis, we first derive non-asymptotic error bounds for Gaussian approximations of sums of high-dimensional dependent random vectors on hyper-rectangles, simple convex sets and sparsely convex sets. We investigate the quantitative effect of temporal dependence on the rates of convergence to a Gaussian random vector over three different dependency frameworks ($\alpha$-mixing, $m$-dependent, and physical dependence measure). In particular, we establish new error bounds under the $\alpha$-mixing framework and derive faster rate over existing results under the physical dependence measure. To implement the proposed results in practical statistical inference problems, we also derive a data-driven parametric bootstrap procedure based on a kernel estimator for the long-run covariance matrices. We apply the unified Gaussian and bootstrap approximation results to test mean vectors with combined $\ell^2$ and $\ell^\infty$ type statistics, change point detection, and construction of confidence regions for covariance and precision matrices, all for time series data.
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power on processing graph-structured data. Typical GCN and its variants work under a homophily assumption (i.e., nodes with same class are prone to connect to each other), while ignoring the heterophily which exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily by mainly aggregating higher-order neighborhoods or combing the immediate representations, which leads to noise and irrelevant information in the result. But these methods did not change the propagation mechanism which works under homophily assumption (that is a fundamental part of GCNs). This makes it difficult to distinguish the representation of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of homophily degree between node pairs, which is learned based on topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and gains competitive performance under homophily.
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
Recent years have witnessed the enormous success of low-dimensional vector space representations of knowledge graphs to predict missing facts or find erroneous ones. Currently, however, it is not yet well-understood how ontological knowledge, e.g. given as a set of (existential) rules, can be embedded in a principled way. To address this shortcoming, in this paper we introduce a framework based on convex regions, which can faithfully incorporate ontological knowledge into the vector space embedding. Our technical contribution is two-fold. First, we show that some of the most popular existing embedding approaches are not capable of modelling even very simple types of rules. Second, we show that our framework can represent ontologies that are expressed using so-called quasi-chained existential rules in an exact way, such that any set of facts which is induced using that vector space embedding is logically consistent and deductively closed with respect to the input ontology.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.