The design of online algorithms for matching markets and revenue management settings is usually bound by the stochastic prior that the demand process is formed by a fixed-length sequence of queries with unknown types, each drawn independently. This assumption of {\em serial independence} implies that the demand of each type, i.e., the number of queries of a given type, has low variance and is approximately Poisson-distributed. This paper explores more general stochastic models for online edge-weighted matching that depart from the serial independence assumption. We propose two new models, \Indep and \Correl, that capture different forms of serial correlations by combining a nonparametric distribution for the demand with standard assumptions on the arrival patterns -- adversarial or random order. The \Indep model has arbitrary marginal distributions for the demands but assumes cross-sectional independence for the customer types, whereas the \Correl model captures common shocks across customer types. We demonstrate that fluid relaxations, which rely solely on expected demand information, have arbitrarily bad performance guarantees. In contrast, we develop new algorithms that essentially achieve optimal constant-factor performance guarantees in each model. Our mathematical analysis includes tighter linear programming relaxations that leverage distribution knowledge, and a new lossless randomized rounding scheme in the case of $\Indep$. In numerical simulations of the $\Indep$ model, we find that tighter relaxations are beneficial under high-variance demand and that our demand-aware rounding scheme can outperform stockout-aware rounding.
Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention sink phenomenon that is prevalent in LLMs also persists in MLLMs, suggesting that initial tokens and nearest tokens receive the majority of attention, while middle vision tokens garner minimal attention in deep layers; (2) the presence of information migration, which implies that visual information is transferred to subsequent text tokens within the first few layers of MLLMs. As per our findings, we conclude that vision tokens are not necessary in the deep layers of MLLMs. Thus, we strategically withdraw them at a certain layer, enabling only text tokens to engage in subsequent layers. To pinpoint the ideal layer for vision tokens withdrawal, we initially analyze a limited set of tiny datasets and choose the first layer that meets the Kullback-Leibler divergence criterion. Our VTW approach can cut computational overhead by over 40\% across diverse multimodal tasks while maintaining performance. Our code is released at //github.com/lzhxmu/VTW.
In finance, durations between successive transactions are usually modeled by the autoregressive conditional duration model based on a continuous distribution omitting zero values. Zero or close-to-zero durations can be caused by either split transactions or independent transactions. We propose a discrete model allowing for excessive zero values based on the zero-inflated negative binomial distribution with score dynamics. This model allows to distinguish between the processes generating split and standard transactions. We use the existing theory on score models to establish the invertibility of the score filter and verify that sufficient conditions hold for the consistency and asymptotic normality of the maximum likelihood of the model parameters. In an empirical study, we find that split transactions cause between 92 and 98 percent of zero and close-to-zero values. Furthermore, the loss of decimal places in the proposed approach is less severe than the incorrect treatment of zero values in continuous models.
Facial feature tracking is essential in imaging ballistocardiography for accurate heart rate estimation and enables motor degradation quantification in Parkinson's disease through skin feature tracking. While deep convolutional neural networks have shown remarkable accuracy in tracking tasks, they typically require extensive labeled data for supervised training. Our proposed pipeline employs a convolutional stacked autoencoder to match image crops with a reference crop containing the target feature, learning deep feature encodings specific to the object category in an unsupervised manner, thus reducing data requirements. To overcome edge effects making the performance dependent on crop size, we introduced a Gaussian weight on the residual errors of the pixels when calculating the loss function. Training the autoencoder on facial images and validating its performance on manually labeled face and hand videos, our Deep Feature Encodings (DFE) method demonstrated superior tracking accuracy with a mean error ranging from 0.6 to 3.3 pixels, outperforming traditional methods like SIFT, SURF, Lucas Kanade, and the latest transformers like PIPs++ and CoTracker. Overall, our unsupervised learning approach excels in tracking various skin features under significant motion conditions, providing superior feature descriptors for tracking, matching, and image registration compared to both traditional and state-of-the-art supervised learning methods.
To understand and summarize approval preferences and other binary evaluation data, it is useful to order the items on an axis which explains the data. In a political election using approval voting, this could be an ideological left-right axis such that each voter approves adjacent candidates, an analogue of single-peakedness. In a perfect axis, every approval set would be an interval, which is usually not possible, and so we need to choose an axis that gets closest to this ideal. The literature has developed algorithms for optimizing several objective functions (e.g., minimize the number of added approvals needed to get a perfect axis), but provides little help with choosing among different objectives. In this paper, we take a social choice approach and compare 5 different axis selection rules axiomatically, by studying the properties they satisfy. We establish some impossibility theorems, and characterize (within the class of scoring rules) the rule that chooses the axes that maximize the number of votes that form intervals, using the axioms of ballot monotonicity and resistance to cloning. Finally, we study the behavior of the rules on data from French election surveys, on the votes of justices of the US Supreme Court, and on synthetic data.
Federated learning (FL) for minimax optimization has emerged as a powerful paradigm for training models across distributed nodes/clients while preserving data privacy and model robustness on data heterogeneity. In this work, we delve into the decentralized implementation of federated minimax optimization by proposing \texttt{K-GT-Minimax}, a novel decentralized minimax optimization algorithm that combines local updates and gradient tracking techniques. Our analysis showcases the algorithm's communication efficiency and convergence rate for nonconvex-strongly-concave (NC-SC) minimax optimization, demonstrating a superior convergence rate compared to existing methods. \texttt{K-GT-Minimax}'s ability to handle data heterogeneity and ensure robustness underscores its significance in advancing federated learning research and applications.
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
Sampling methods (e.g., node-wise, layer-wise, or subgraph) has become an indispensable strategy to speed up training large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on the graph structural information and ignore the dynamicity of optimization, which leads to high variance in estimating the stochastic gradients. The high variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage that necessities mitigating both types of variance to obtain faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails a better generalization compared to the existing methods.
Incompleteness is a common problem for existing knowledge graphs (KGs), and the completion of KG which aims to predict links between entities is challenging. Most existing KG completion methods only consider the direct relation between nodes and ignore the relation paths which contain useful information for link prediction. Recently, a few methods take relation paths into consideration but pay less attention to the order of relations in paths which is important for reasoning. In addition, these path-based models always ignore nonlinear contributions of path features for link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE performs better than state-of-the-art methods.
Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop. In this work, we aim to improve the cross-domain robustness of object detection. We tackle the domain shift on two levels: 1) the image-level shift, such as image style, illumination, etc, and 2) the instance-level shift, such as object appearance, size, etc. We build our approach based on the recent state-of-the-art Faster R-CNN model, and design two domain adaptation components, on image level and instance level, to reduce the domain discrepancy. The two domain adaptation components are based on H-divergence theory, and are implemented by learning a domain classifier in adversarial training manner. The domain classifiers on different levels are further reinforced with a consistency regularization to learn a domain-invariant region proposal network (RPN) in the Faster R-CNN model. We evaluate our newly proposed approach using multiple datasets including Cityscapes, KITTI, SIM10K, etc. The results demonstrate the effectiveness of our proposed approach for robust object detection in various domain shift scenarios.