There is a presumption in human-computer interaction that laying out menus and most other material in neat rows and columns helps users get work done. The rule has been so implicit in the field of design as to allow for no debate. However, the idea that perfect collinearity creates an advantage for either search or recall has rarely been tested. Drawing from separate branches of cognitive literature, we tested a minimal brainstorming interface with either aligned or eccentrically arranged layouts on 96 college students. Incidental exact recall of recently worked locations improved in the eccentric condition. In both conditions, there were frequent near-miss recall errors to neighboring aligned objects and groups of objects. Further analysis found only marginal performance advantages specifically for females with the eccentric design. However, NASA-TLX subjective measures showed that in the eccentric condition, females reported higher performance, less effort, and yet also higher frustration, while males reported lower performance with about the same effort, and lower frustration.
While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally compare our models to a dataset with a human baseline, and find that after training, models approach human performance.
Spectral independence is a recently-developed framework for obtaining sharp bounds on the convergence time of the classical Glauber dynamics. This new framework has yielded optimal $O(n \log n)$ sampling algorithms on bounded-degree graphs for a large class of problems throughout the so-called uniqueness regime, including, for example, the problems of sampling independent sets, matchings, and Ising-model configurations. Our main contribution is to relax the bounded-degree assumption that has so far been important in establishing and applying spectral independence. Previous methods for avoiding degree bounds rely on using $L^p$-norms to analyse contraction on graphs with bounded connective constant (Sinclair, Srivastava, Yin; FOCS'13). The non-linearity of $L^p$-norms is an obstacle to applying these results to bound spectral independence. Our solution is to capture the $L^p$-analysis recursively by amortising over the subtrees of the recurrence used to analyse contraction. Our method generalises previous analyses that applied only to bounded-degree graphs. As a main application of our techniques, we consider the random graph $G(n,d/n)$, where the previously known algorithms run in time $n^{O(\log d)}$ or applied only to large $d$. We refine these algorithmic bounds significantly, and develop fast $n^{1+o(1)}$ algorithms based on Glauber dynamics that apply to all $d$, throughout the uniqueness regime.
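As an illustration of the Glauber dynamics mentioned above, the following is a minimal sketch of the standard heat-bath update for sampling independent sets (the hardcore model). It is not the analysis or algorithm of the paper; the function name `glauber_independent_set`, the `fugacity` parameter, and the adjacency-dictionary input are assumptions chosen for this example.

```python
import random


def glauber_independent_set(adj, fugacity, steps, seed=0):
    """Run heat-bath Glauber dynamics for the hardcore model.

    adj      -- dict mapping each vertex to a list of its neighbours
    fugacity -- weight lambda > 0 given to each occupied vertex (assumed name)
    steps    -- number of single-vertex updates to perform
    Returns the set of occupied vertices after the last update.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    occupied = set()  # the empty set is always a valid independent set
    for _ in range(steps):
        v = rng.choice(vertices)
        # Resample the state of v conditioned on the rest of the configuration.
        occupied.discard(v)
        blocked = any(u in occupied for u in adj[v])
        if not blocked and rng.random() < fugacity / (1.0 + fugacity):
            occupied.add(v)
    return occupied


# Usage: a cycle on six vertices (a bounded-degree toy graph).
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
sample = glauber_independent_set(cycle, fugacity=1.0, steps=200)
```

Each update touches a single vertex, so the cost of sampling is governed by the number of steps needed for the chain to mix; that mixing time is the quantity the bounds in the abstract refer to.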
A key challenge when trying to understand innovation is that it is a dynamic, ongoing process, which can be highly contingent on ephemeral factors such as culture, economics, or luck. This means that any analysis of the real-world process must necessarily be historical - and thus probably too late to be most useful - but also cannot be sure what the properties of the web of connections between innovations are or were. Here I try to address this by designing and generating a set of synthetic innovation web "dictionaries" that can be used to host sampled innovation timelines, probe the overall statistics and behaviours of these processes, and determine the degree of their reliance on the structure or generating algorithm. Thus, inspired by the work of Fink, Reeves, Palma and Farr (2017) on innovation in language, gastronomy, and technology, I study how new symbol discovery manifests itself in terms of additional "word" vocabulary being available from dictionaries generated from a finite number of symbols. Several distinct dictionary generation models are investigated using numerical simulation, with emphasis on the scaling of knowledge as dictionary generators and parameters are varied, and on the role of the order in which the symbols are discovered.
Recurrent neural networks (RNNs) have yielded promising results for both recognizing objects in challenging conditions and modeling aspects of primate vision. However, the representational dynamics of recurrent computations remain poorly understood, especially in large-scale visual models. Here, we studied such dynamics in RNNs trained for object classification on MiniEcoset, a novel subset of ecoset. We report two main insights. First, upon inference, representations continued to evolve after correct classification, suggesting a lack of the notion of being ``done with classification''. Second, focusing on ``readout zones'' as a way to characterize the activation trajectories, we observe that misclassified representations exhibit activation patterns with lower L2 norm, and are positioned more peripherally in the readout zones. Such arrangements help the misclassified representations move into the correct zones as time progresses. Our findings generalize to networks with lateral and top-down connections, and include both additive and multiplicative interactions with the bottom-up sweep. The results therefore contribute to a general understanding of RNN dynamics in naturalistic tasks. We hope that the analysis framework will aid future investigations of other types of RNNs, including understanding of representational dynamics in primate vision.
Projected distributions have proved to be useful in the study of circular and directional data. Although any multivariate distribution can be used to produce a projected model, these distributions are typically parametric. In this article we consider a multivariate P\'olya tree on $R^k$ and project it to the unit hypersphere $S^k$ to define a new Bayesian nonparametric model for directional data. We study the properties of the proposed model and in particular, concentrate on the implied conditional distributions of some directions given the others to define a directional-directional regression model. We also define a multivariate linear regression model with P\'olya tree error and project it to define a linear-directional regression model. We obtain the posterior characterisation of all models and show their performance with simulated and real datasets.
Time-Aware Shaper (TAS) is a time-triggered scheduling mechanism that ensures bounded latency for time-critical Scheduled Traffic (ST) flows. The Linux kernel implementation (a.k.a. TAPRIO) has limited capabilities due to varying CPU workloads and thus does not offer a tight latency bound for the ST flows. Also, only larger cycle times are currently possible. Other software implementations are limited to simulation studies without physical implementation. In this paper, we present $\mu$TAS, a MicroC-based hardware implementation of TAS on a programmable SmartNIC. $\mu$TAS takes advantage of the parallel-processing architecture of the SmartNIC to configure the scheduling behaviour of its queues at runtime. To demonstrate the effectiveness of $\mu$TAS, we built a Time-Sensitive Networking (TSN) testbed from scratch. This consists of multiple end-hosts capable of generating ST and Best Effort (BE) flows and TSN switches equipped with SmartNICs running $\mu$TAS. Time synchronization is maintained between the switches and hosts. Our experiments demonstrate that the ST flows experience a bounded latency of the order of tens of microseconds.
We consider several basic questions on distributed routing in directed graphs with multiple additive costs, or metrics, and multiple constraints. Distributed routing in this sense is used in several protocols, such as IS-IS and OSPF. A practical approach to the multi-constraint routing problem is first to combine the metrics into a single `composite' metric, and then to apply one-to-all shortest path algorithms, e.g. Dijkstra, in order to find shortest path trees. We show that, in general, even if a feasible path exists and is known for every source and destination pair, it is impossible to guarantee a distributed routing under several constraints. We also study the question of choosing the optimal `composite' metric. We show that under certain mathematical assumptions we can efficiently find a convex combination of several metrics that maximizes the number of discovered feasible paths. Sometimes it can be done analytically, and it is in general possible using what we call a `smart iterative approach'. We illustrate these findings by extensive experiments on several typical network topologies.
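The composite-metric approach described above can be sketched as follows: each edge carries a tuple of additive metrics, a convex combination collapses them into one cost, and Dijkstra's algorithm is run on that cost. This is only an illustration under stated assumptions, not the method for choosing optimal weights studied in the paper; the names `composite_shortest_paths`, `feasible`, and the toy graph are hypothetical.

```python
import heapq


def composite_shortest_paths(graph, source, weights):
    """Dijkstra on a convex combination of several additive edge metrics.

    graph   -- dict: node -> list of (neighbour, (metric_1, ..., metric_k))
    weights -- non-negative coefficients summing to 1, one per metric
    Returns dict: node -> (composite cost, per-metric totals along the path).
    """
    best = {}
    heap = [(0.0, source, (0.0,) * len(weights))]
    while heap:
        cost, u, totals = heapq.heappop(heap)
        if u in best:
            continue  # u was already settled with a smaller composite cost
        best[u] = (cost, totals)
        for v, metrics in graph.get(u, []):
            if v not in best:
                step = sum(w * m for w, m in zip(weights, metrics))
                new_totals = tuple(t + m for t, m in zip(totals, metrics))
                heapq.heappush(heap, (cost + step, v, new_totals))
    return best


def feasible(totals, bounds):
    """Check that every accumulated metric respects its constraint."""
    return all(t <= b for t, b in zip(totals, bounds))


# Toy graph with two metrics per edge; only the path through "c" meets (9, 9).
graph = {
    "s": [("a", (1, 10)), ("b", (10, 1)), ("c", (4, 4))],
    "a": [("t", (1, 10))],
    "b": [("t", (10, 1))],
    "c": [("t", (4, 4))],
}
bounds = (9, 9)
balanced = composite_shortest_paths(graph, "s", (0.5, 0.5))["t"][1]
skewed = composite_shortest_paths(graph, "s", (1.0, 0.0))["t"][1]
print(feasible(balanced, bounds), feasible(skewed, bounds))
```

In this toy instance the balanced weights find the feasible path, while weights that ignore the second metric return a path violating its bound, which is why the choice of convex combination matters.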
In this paper, we tackle the problem of video alignment, the process of matching the frames of a pair of videos containing similar actions. The main challenge in video alignment is that accurate correspondence should be established despite the differences in the execution processes and appearances between the two videos. We introduce an unsupervised method for alignment that uses global and local features of the frames. In particular, we introduce effective features for each video frame using three machine vision tools: person detection, pose estimation, and a VGG network. Then, the features are processed and combined to construct a multidimensional time series that represents the video. The resulting time series are used to align videos of the same actions using a novel version of dynamic time warping named Diagonalized Dynamic Time Warping (DDTW). The main advantage of our approach is that no training is required, which makes it applicable to any new type of action without any need to collect training samples for it. For evaluation, we considered video synchronization and phase classification tasks on the Penn action dataset. Also, for an effective evaluation of the video synchronization task, we present a new metric called Enclosed Area Error (EAE). The results show that our method outperforms previous state-of-the-art methods, such as TCC, and other self-supervised and weakly supervised methods.
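For readers unfamiliar with the alignment step, the following is a minimal sketch of classic dynamic time warping on two multidimensional time series. It is the textbook algorithm, not the DDTW variant proposed in the paper, whose details are not given in the abstract; the function name `dtw_path` and the toy sequences are assumptions.

```python
def dtw_path(a, b):
    """Align two sequences of feature vectors with classic DTW.

    a, b -- lists of equal-length feature vectors (lists of floats)
    Returns (total cost, list of matched index pairs (i, j)).
    """
    def dist(x, y):
        # Euclidean distance between two frames' feature vectors.
        return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = dist(a[i - 1], b[j - 1]) + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(moves)
    path.reverse()
    return acc[n][m], path


# Usage: the second "video" repeats its first frame, so it lags by one step.
video_a = [[0.0], [1.0], [2.0], [3.0]]
video_b = [[0.0], [0.0], [1.0], [2.0], [3.0]]
cost, path = dtw_path(video_a, video_b)
```

Here each inner list stands in for the per-frame feature vector; in the setting of the abstract, these would be the combined detection, pose, and VGG features.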
A recent body of work has demonstrated that Transformer embeddings can be linearly decomposed into well-defined sums of factors that can in turn be related to specific network inputs or components. There is however still a dearth of work studying whether these mathematical reformulations are empirically meaningful. In the present work, we study representations from machine-translation decoders using two such embedding decomposition methods. Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question. The high variability of our measurements indicates that geometry reflects model-specific characteristics more than it does sentence-specific computations, and that similar training conditions do not guarantee similar vector spaces.
Bayesian cross-validation (CV) is a popular method for predictive model assessment that is simple to implement and broadly applicable. A wide range of CV schemes is available for time series applications, including generic leave-one-out (LOO) and K-fold methods, as well as specialized approaches intended to deal with serial dependence such as leave-future-out (LFO), h-block, and hv-block. Existing large-sample results show that both specialized and generic methods are applicable to models of serially-dependent data. However, large sample consistency results overlook the impact of sampling variability on accuracy in finite samples. Moreover, the accuracy of a CV scheme depends on many aspects of the procedure. We show that poor design choices can lead to elevated rates of adverse selection. In this paper, we consider the problem of identifying the regression component of an important class of models of data with serial dependence, autoregressions of order p with q exogenous regressors (ARX(p,q)), under the logarithmic scoring rule. We show that when serial dependence is present, scores computed using the joint (multivariate) density have lower variance and better model selection accuracy than the popular pointwise estimator. In addition, we present a detailed case study of the special case of ARX models with fixed autoregressive structure and variance. For this class, we derive the finite-sample distribution of the CV estimators and the model selection statistic. We conclude with recommendations for practitioners.