Archetypal analysis is a matrix factorization method with convexity constraints. Due to local minima, a good initialization is essential, but frequently used initialization methods yield either sub-optimal starting points or are prone to get stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective, similar to $k$-means++. In fact, we argue that $k$-means++ already approximates the proposed initialization method. Furthermore, we suggest to adapt an efficient Monte Carlo approximation of $k$-means++ to AA++. In an extensive empirical evaluation of 13 real-world data sets of varying sizes and dimensionalities and considering two pre-processing strategies, we show that AA++ nearly always outperforms all baselines, including the most frequently used ones.
We present a solution to the problem of spatio-temporal calibration for event cameras mounted on an onmi-directional vehicle. Different from traditional methods that typically determine the camera's pose with respect to the vehicle's body frame using alignment of trajectories, our approach leverages the kinematic correlation of two sets of linear velocity estimates from event data and wheel odometers, respectively. The overall calibration task consists of estimating the underlying temporal offset between the two heterogeneous sensors, and furthermore, recovering the extrinsic rotation that defines the linear relationship between the two sets of velocity estimates. The first sub-problem is formulated as an optimization one, which looks for the optimal temporal offset that maximizes a correlation measurement invariant to arbitrary linear transformation. Once the temporal offset is compensated, the extrinsic rotation can be worked out with an iterative closed-form solver that incrementally registers associated linear velocity estimates. The proposed algorithm is proved effective on both synthetic data and real data, outperforming traditional methods based on alignment of trajectories.
In Statistical Relational Artificial Intelligence, a branch of AI and machine learning which combines the logical and statistical schools of AI, one uses the concept {\em para\-metrized probabilistic graphical model (PPGM)} to model (conditional) dependencies between random variables and to make probabilistic inferences about events on a space of "possible worlds". The set of possible worlds with underlying domain $D$ (a set of objects) can be represented by the set $\mathbf{W}_D$ of all first-order structures (for a suitable signature) with domain $D$. Using a formal logic we can describe events on $\mathbf{W}_D$. By combining a logic and a PPGM we can also define a probability distribution $\mathbb{P}_D$ on $\mathbf{W}_D$ and use it to compute the probability of an event. We consider a logic, denoted $PLA$, with truth values in the unit interval, which uses aggregation functions, such as arithmetic mean, geometric mean, maximum and minimum instead of quantifiers. However we face the problem of computational efficiency and this problem is an obstacle to the wider use of methods from Statistical Relational AI in practical applications. We address this problem by proving that the described probability will, under certain assumptions on the PPGM and the sentence $\varphi$, converge as the size of $D$ tends to infinity. The convergence result is obtained by showing that every formula $\varphi(x_1, \ldots, x_k)$ which contains only "admissible" aggregation functions (e.g. arithmetic and geometric mean, max and min) is asymptotically equivalent to a formula $\psi(x_1, \ldots, x_k)$ without aggregation functions.
Meta-structures are widely used to define which subset of neighbors to aggregate information in heterogeneous information networks (HINs). In this work, we investigate existing meta-structures, including meta-path and meta-graph, and observe that they are initially designed manually with fixed patterns and hence are insufficient to encode various rich semantic information on diverse HINs. Through reflection on their limitation, we define a new concept called meta-multigraph as a more expressive and flexible generalization of meta-graph, and propose a stable differentiable search method to automatically optimize the meta-multigraph for specific HINs and tasks. As the flexibility of meta-multigraphs may propagate redundant messages, we further introduce a complex-to-concise (C2C) meta-multigraph that propagates messages from complex to concise along the depth of meta-multigraph. Moreover, we observe that the differentiable search typically suffers from unstable search and a significant gap between the meta-structures in search and evaluation. To this end, we propose a progressive search algorithm by implicitly narrowing the search space to improve search stability and reduce inconsistency. Extensive experiments are conducted on six medium-scale benchmark datasets and one large-scale benchmark dataset over two representative tasks, i.e., node classification and recommendation. Empirical results demonstrate that our search methods can automatically find expressive meta-multigraphs and C2C meta-multigraphs, enabling our model to outperform state-of-the-art heterogeneous graph neural networks.
Stringent line-of-sight demands necessitated by the fast attenuating nature of millimeter waves (mmWaves) through obstacles pose one of the central problems of next generation wireless networks. These mmWave links are easily disrupted due to obstacles, including vehicles and pedestrians, which cause degradation in link quality and even link failure. Dynamic obstacles are usually tracked by dedicated tracking hardware like RGB-D cameras, which usually have small ranges, and hence lead to prohibitively increased deployment costs to achieve complete coverage of the deployment area. In this manuscript, we propose an altogether different approach to track multiple dynamic obstacles in an mmWave network, solely based on short-term historical link failure information, without resorting to any dedicated tracking hardware. After proving that the said problem is NP-complete, we employ a greedy set-cover based approach to solve it. Using the obtained trajectories, we perform proactive handoffs for at-risk links. We compare our approach with an RGB-D camera-based approach and show that our approach provides better tracking and handoff performances when the camera coverage is low to moderate, which is often the case in real deployment scenarios.
This study explores the intricacies of waiting games, a novel dynamic that emerged with Ethereum's transition to a Proof-of-Stake (PoS)-based block proposer selection protocol. Within this PoS framework, validators acquire a distinct monopoly position during their assigned slots, given that block proposal rights are set deterministically, contrasting with Proof-of-Work (PoW) protocols. Consequently, validators have the power to delay block proposals, stepping outside the honest validator specs, optimizing potential returns through MEV payments. Nonetheless, this strategic behaviour introduces the risk of orphaning if attestors fail to observe and vote on the block timely. Our quantitative analysis of this waiting phenomenon and its associated risks reveals an opportunity for enhanced MEV extraction, exceeding standard protocol rewards, and providing sufficient incentives for validators to play the game. Notably, our findings indicate that delayed proposals do not always result in orphaning and orphaned blocks are not consistently proposed later than non-orphaned ones. To further examine consensus stability under varying network conditions, we adopt an agent-based simulation model tailored for PoS-Ethereum, illustrating that consensus disruption will not be observed unless significant delay strategies are adopted. Ultimately, this research offers valuable insights into the advent of waiting games on Ethereum, providing a comprehensive understanding of trade-offs and potential profits for validators within the blockchain ecosystem.
Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the victim's parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim's observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, influence underlying environment dynamics or reward signals, introduce non-stationarity, add stochasticity, see the Victim's actions, or access their parameters. Additionally, we present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT still significantly influences the Victim's training and testing performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages during train-time to directly and arbitrarily control the Victim at test-time. Project video and code are available at //sites.google.com/view/adversarial-cheap-talk
In parametric lock-sharing systems processes can spawn new processes to run in parallel, and can create new locks. The behavior of every process is given by a pushdown automaton. We consider infinite behaviors of such systems under strong process fairness condition. A result of a potentially infinite execution of a system is a limit configuration, that is a potentially infinite tree. The verification problem is to determine if a given system has a limit configuration satisfying a given regular property. This formulation of the problem encompasses verification of reachability as well as of many liveness properties. We show that this verification problem, while undecidable in general, is decidable for nested lock usage. We show Exptime-completeness of the verification problem. The main source of complexity is the number of parameters in the spawn operation. If the number of parameters is bounded, our algorithm works in Ptime for properties expressed by parity automata with a fixed number of ranks.
Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) Optimizing sparse logical models such as decision trees; (2) Optimization of scoring systems; (3) Placing constraints into generalized additive models to encourage sparsity and better interpretability; (4) Modern case-based reasoning, including neural networks and matching for causal inference; (5) Complete supervised disentanglement of neural networks; (6) Complete or even partial unsupervised disentanglement of neural networks; (7) Dimensionality reduction for data visualization; (8) Machine learning models that can incorporate physics and other generative or causal constraints; (9) Characterization of the "Rashomon set" of good models; and (10) Interpretable reinforcement learning. This survey is suitable as a starting point for statisticians and computer scientists interested in working in interpretable machine learning.
This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.