The convex hull of a data set $P$ is the smallest convex set that contains $P$. In this work, we present a new data structure for convex hull, that allows for efficient dynamic updates. In a dynamic convex hull implementation, the following traits are desirable: (1) algorithms for efficiently answering queries as to whether a specified point is inside or outside the hull, (2) adhering to geometric robustness, and (3) algorithmic simplicity.Furthermore, a specific but well-motivated type of two-dimensional data is rank-based data. Here, the input is a set of real-valued numbers $Y$ where for any number $y\in Y$ its rank is its index in $Y$'s sorted order. Each value in $Y$ can be mapped to a point $(rank, value)$ to obtain a two-dimensional point set. In this work, we give an efficient, geometrically robust, dynamic convex hull algorithm, that facilitates queries to whether a point is internal. Furthermore, our construction can be used to efficiently update the convex hull of rank-ordered data, when the real-valued point set is subject to insertions and deletions. Our improved solution is based on an algorithmic simplification of the classical convex hull data structure by Overmars and van Leeuwen~[STOC'80], combined with new algorithmic insights. Our theoretical guarantees on the update time match those of Overmars and van Leeuwen, namely $O(\log^2 |P|)$, while we allow a wider range of functionalities (including rank-based data). Our algorithmic simplification includes simplifying an 11-case check down to a 3-case check that can be written in 20 lines of easily readable C-code. We extend our solution to provide a trade-off between theoretical guarantees and the practical performance of our algorithm. We test and compare our solutions extensively on inputs that were generated randomly or adversarially, including benchmarking datasets from the literature.
Non-contrastive self-supervised learning (NC-SSL) methods like BarlowTwins and VICReg have shown great promise for label-free representation learning in computer vision. Despite the apparent simplicity of these techniques, researchers must rely on several empirical heuristics to achieve competitive performance, most notably using high-dimensional projector heads and two augmentations of the same image. In this work, we provide theoretical insights on the implicit bias of the BarlowTwins and VICReg loss that can explain these heuristics and guide the development of more principled recommendations. Our first insight is that the orthogonality of the features is more critical than projector dimensionality for learning good representations. Based on this, we empirically demonstrate that low-dimensional projector heads are sufficient with appropriate regularization, contrary to the existing heuristic. Our second theoretical insight suggests that using multiple data augmentations better represents the desiderata of the SSL objective. Based on this, we demonstrate that leveraging more augmentations per sample improves representation quality and trainability. In particular, it improves optimization convergence, leading to better features emerging earlier in the training. Remarkably, we demonstrate that we can reduce the pretraining dataset size by up to 4x while maintaining accuracy and improving convergence simply by using more data augmentations. Combining these insights, we present practical pretraining recommendations that improve wall-clock time by 2x and improve performance on CIFAR-10/STL-10 datasets using a ResNet-50 backbone. Thus, this work provides a theoretical insight into NC-SSL and produces practical recommendations for enhancing its sample and compute efficiency.
This article considers Bayesian model selection via mean-field (MF) variational approximation. Towards this goal, we study the non-asymptotic properties of MF inference under the Bayesian framework that allows latent variables and model mis-specification. Concretely, we show a Bernstein von-Mises (BvM) theorem for the variational distribution from MF under possible model mis-specification, which implies the distributional convergence of MF variational approximation to a normal distribution centering at the maximal likelihood estimator (within the specified model). Motivated by the BvM theorem, we propose a model selection criterion using the evidence lower bound (ELBO), and demonstrate that the model selected by ELBO tends to asymptotically agree with the one selected by the commonly used Bayesian information criterion (BIC) as sample size tends to infinity. Comparing to BIC, ELBO tends to incur smaller approximation error to the log-marginal likelihood (a.k.a. model evidence) due to a better dimension dependence and full incorporation of the prior information. Moreover, we show the geometric convergence of the coordinate ascent variational inference (CAVI) algorithm under the parametric model framework, which provides a practical guidance on how many iterations one typically needs to run when approximating the ELBO. These findings demonstrate that variational inference is capable of providing a computationally efficient alternative to conventional approaches in tasks beyond obtaining point estimates, which is also empirically demonstrated by our extensive numerical experiments.
While coresets have been growing in terms of their application, barring few exceptions, they have mostly been limited to unsupervised settings. We consider supervised classification problems, and non-decomposable evaluation measures in such settings. We show that stratified uniform sampling based coresets have excellent empirical performance that are backed by theoretical guarantees too. We focus on the F1 score and Matthews Correlation Coefficient, two widely used non-decomposable objective functions that are nontrivial to optimize for and show that uniform coresets attain a lower bound for coreset size, and have good empirical performance, comparable with ``smarter'' coreset construction strategies.
Sequential recommendation models are crucial for next-item recommendations in online platforms, capturing complex patterns in user interactions. However, many focus on a single behavior, overlooking valuable implicit interactions like clicks and favorites. Existing multi-behavioral models often fail to simultaneously capture sequential patterns. We propose CASM, a Context-Aware Sequential Model, leveraging sequential models to seamlessly handle multiple behaviors. CASM employs context-aware multi-head self-attention for heterogeneous historical interactions and a weighted binary cross-entropy loss for precise control over behavior contributions. Experimental results on four datasets demonstrate CASM's superiority over state-of-the-art approaches.
In this work, we solve differential equations using quantum Chebyshev feature maps. We propose a tensor product over a summation of Pauli-Z operators as a change in the measurement observables resulting in improved accuracy and reduced computation time for initial value problems processed by floating boundary handling. This idea has been tested on solving the complex dynamics of a Riccati equation as well as on a system of differential equations. Furthermore, a second-order differential equation is investigated in which we propose adding entangling layers to improve accuracy without increasing the variational parameters. Additionally, a modified self-adaptivity approach of physics-informed neural networks is incorporated to balance the multi-objective loss function. Finally, a new quantum circuit structure is proposed to approximate multivariable functions, tested on solving a 2D Poisson's equation.
We now have a wide range of proof assistants available for compositional reasoning in monoidal or higher categories which are free on some generating signature. However, none of these allow us to represent categorical operations such as products, equalizers, and similar logical techniques. Here we show how the foundational mathematical formalism of one such proof assistant can be generalized, replacing the conventional notion of string diagram as a geometrical entity living inside an n-cube with a posetal variant that allows exotic branching structure. We show that these generalized diagrams have richer behaviour with respect to categorical limits, and give an algorithm for computing limits in this setting, with a view towards future application in proof assistants.
The prevalence of the powerful multilingual models, such as Whisper, has significantly advanced the researches on speech recognition. However, these models often struggle with handling the code-switching setting, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages to ensure distinct latent representations for languages. Some other methods considered the switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to conduct parameter-efficient learning for bilingual ASR. This method selects those attention heads in a model which closely express language identities and then guided those heads to be correctly attended with their corresponding languages. The experiments on the Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing state-of-the-art method, where only 5.6% additional parameters over Whisper are trained.
Federated Learning (FL) is a decentralized machine-learning paradigm, in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity has imposed significant challenges to FL, which can incur drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue, by refining the server model using aggregated knowledge from heterogeneous users, other than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn affect the quality of the aggregated model. Inspired by the prior art, we propose a data-free knowledge distillation} approach to address heterogeneous FL, where the server learns a lightweight generator to ensemble user information in a data-free manner, which is then broadcasted to users, regulating local training using the learned knowledge as an inductive bias. Empirical studies powered by theoretical implications show that, our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution, but can significantly fail otherwise. Therefore, eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models. Conventional methods assume either the known heterogeneity of training data (e.g. domain labels) or the approximately equal capacities of different domains. In this paper, we consider a more challenging case where neither of the above assumptions holds. We propose to address this problem by removing the dependencies between features via learning weights for training samples, which helps deep models get rid of spurious correlations and, in turn, concentrate more on the true connection between discriminative features and labels. Extensive experiments clearly demonstrate the effectiveness of our method on multiple distribution generalization benchmarks compared with state-of-the-art counterparts. Through extensive experiments on distribution generalization benchmarks including PACS, VLCS, MNIST-M, and NICO, we show the effectiveness of our method compared with state-of-the-art counterparts.
Knowledge graph embedding, which aims to represent entities and relations as low dimensional vectors (or matrices, tensors, etc.), has been shown to be a powerful technique for predicting missing links in knowledge graphs. Existing knowledge graph embedding models mainly focus on modeling relation patterns such as symmetry/antisymmetry, inversion, and composition. However, many existing approaches fail to model semantic hierarchies, which are common in real-world applications. To address this challenge, we propose a novel knowledge graph embedding model---namely, Hierarchy-Aware Knowledge Graph Embedding (HAKE)---which maps entities into the polar coordinate system. HAKE is inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy. Specifically, the radial coordinate aims to model entities at different levels of the hierarchy, and entities with smaller radii are expected to be at higher levels; the angular coordinate aims to distinguish entities at the same level of the hierarchy, and these entities are expected to have roughly the same radii but different angles. Experiments demonstrate that HAKE can effectively model the semantic hierarchies in knowledge graphs, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the link prediction task.