In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce here. In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data. This yields bounds that depend on the empirical measure of the given input data at hand, rather than on its unknown distribution. The generalization bounds that we establish include tail bounds, tail bounds on the expectation, and in-expectation bounds. Moreover, we show that our framework also allows one to derive general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume, and in some cases improve upon, several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds, which are recovered as special cases, thus unveiling the unifying character of our approach. For instance, a new data-dependent intrinsic dimension-based bound is established, which connects the generalization error to the optimization trajectories and reveals various interesting connections with the rate-distortion dimension of a process, the R\'enyi information dimension of a process, and the metric mean dimension.
In this paper, we propose an efficient and high-performance method for partially relevant video retrieval, which aims to retrieve long videos that contain at least one moment relevant to the input text query. The main challenge lies in encoding dense frames with visual backbones: models must process a large number of frames, incurring significant computation costs for long videos. To mitigate these costs, previous studies use lightweight visual backbones, yielding sub-optimal retrieval performance due to their limited capabilities. However, it is undesirable to simply replace the backbones with high-performance large vision-and-language models (VLMs) because of their low efficiency. To address this dilemma, instead of dense frames, we focus on super images, which are created by rearranging the video frames in an $N \times N$ grid layout. This reduces the number of visual encodings to $\frac{1}{N^2}$ of that of dense frames and mitigates the low efficiency of large VLMs. Based on this idea, we make two contributions. First, we explore whether VLMs generalize to super images in a zero-shot setting. To this end, we propose a method called query-attentive super image retrieval (QASIR), which attends to partial moments relevant to the input query. The zero-shot QASIR yields two discoveries: (1) it enables VLMs to generalize to super images, and (2) the grid size $N$, image resolution, and VLM size are key trade-off parameters between performance and computation costs. Second, we introduce fine-tuning QASIR and hybrid QASIR, which combines high- and low-efficiency models to strike a balance between performance and computation costs. This reveals two findings: (1) fine-tuning QASIR enables VLMs to learn super images effectively, and (2) hybrid QASIR minimizes the performance drop of large VLMs while reducing the computation costs.
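To make the super image construction concrete, here is a minimal sketch, assuming frames arrive as a NumPy array and that incomplete grids are padded with black frames; the frame-sampling policy and the downstream VLM encoder are outside its scope.

```python
# Sketch: tile video frames into N x N super images so one visual encoding
# covers N^2 frames. Padding behavior and shapes are assumptions.
import numpy as np

def make_super_images(frames: np.ndarray, n: int) -> np.ndarray:
    """Tile frames (T, H, W, C) into super images of shape (ceil(T/n^2), n*H, n*W, C)."""
    t, h, w, c = frames.shape
    per_image = n * n
    num_super = -(-t // per_image)  # ceiling division
    pad = num_super * per_image - t
    if pad:  # pad with black frames so the last grid is complete
        frames = np.concatenate([frames, np.zeros((pad, h, w, c), frames.dtype)])
    supers = []
    for i in range(num_super):
        chunk = frames[i * per_image:(i + 1) * per_image]
        rows = [np.concatenate(chunk[r * n:(r + 1) * n], axis=1) for r in range(n)]
        supers.append(np.concatenate(rows, axis=0))
    return np.stack(supers)

# Example: 30 frames of 224x224 RGB with a 3x3 grid -> 4 super images.
video = np.random.randint(0, 255, (30, 224, 224, 3), dtype=np.uint8)
print(make_super_images(video, n=3).shape)  # (4, 672, 672, 3)
```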
In this paper, we examine biases arising in A/B tests where firms modify a continuous parameter, such as price, to estimate the global treatment effect of a given performance metric, such as profit. These biases emerge in canonical experimental estimators due to interference among market participants. We employ structural modeling and differential calculus to derive intuitive characterizations of these biases. We then specialize our general model to a standard revenue management pricing problem. This setting highlights a key pitfall in the use of A/B pricing experiments to guide profit maximization: notably, the canonical estimator for the expected change in profits can have the {\em wrong sign}. In other words, following the guidance of canonical estimators may lead firms to move prices in the wrong direction, inadvertently decreasing profits relative to the status quo. We apply these results to a two-sided market model and show how this ``change of sign'' regime depends on model parameters such as market imbalance, as well as the price markup. Finally, we discuss structural and practical implications for platform operators.
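The "wrong sign" phenomenon can be illustrated with a deliberately simplified toy market, not the paper's structural model: total demand depends on the market's average price, but is allocated across listings by price competition, so treated and control listings interfere with one another.

```python
# Toy illustration (all functional forms and parameters are assumptions):
# a global price increase raises profit, yet the naive A/B difference-in-means
# estimator reports the opposite sign because treated listings lose demand
# share to control listings rather than to the outside market.
import numpy as np

def global_profit(p, a=0.5):
    """Total profit when every listing charges p; aggregate demand D(p)=exp(-a p)."""
    return p * np.exp(-a * p)

def ab_estimate(p, delta=0.1, a=0.5, b=5.0):
    """Naive difference-in-means estimate of d(profit)/d(price) under interference."""
    p_bar = p + delta / 2                      # market average price in the test
    total_demand = np.exp(-a * p_bar)          # demand reacts to the average price...
    w_t, w_c = np.exp(-b * (p + delta)), np.exp(-b * p)
    share_t, share_c = w_t / (w_t + w_c), w_c / (w_t + w_c)  # ...allocated by competition
    rev_t = (p + delta) * share_t * total_demand
    rev_c = p * share_c * total_demand
    return (rev_t - rev_c) / delta

p0 = 1.0
true_slope = (global_profit(p0 + 0.1) - global_profit(p0)) / 0.1
# Units differ between the two quantities; the point is the sign.
print(f"true d(profit)/d(price) ~ {true_slope:+.3f}")  # positive: raising price helps
print(f"naive A/B estimate      ~ {ab_estimate(p0):+.3f}")  # negative: wrong sign
```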
In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. These tasks require agents to solve complex problems end-to-end by interacting with an execution environment. The benchmark contains DAEval, a dataset of 257 data analysis questions derived from 52 CSV files, and an agent framework that incorporates LLMs to serve as data analysis agents for both serving and evaluation. Since data analysis questions are often open-ended and hard to evaluate without human supervision, we adopt a format-prompting technique to convert each question into a closed-form format so that it can be evaluated automatically. Our extensive benchmarking of 34 LLMs uncovers the current challenges encountered in data analysis tasks. In addition, building on top of our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench. Evaluation datasets and toolkits for InfiAgent-DABench are released at //github.com/InfiAgent/InfiAgent.
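As an illustration of the format-prompting idea, the sketch below reformats an open-ended question so the agent must emit named, machine-checkable fields; the @answer_name[value] convention and template shown here are assumptions about the general shape of such prompts, not the benchmark's exact wording.

```python
# Hypothetical format prompt: constrain an open-ended data analysis question
# to a closed form that a string matcher can grade without human supervision.
import re

FORMAT_PROMPT = """\
{question}

Answer with exactly one line per requested value, formatted as
@answer_name[value], e.g. @mean_age[23.50]. Round floats to 2 decimals.
Requested values: {answer_names}
"""

def parse_answers(response: str) -> dict:
    """Extract @name[value] pairs so answers can be auto-graded by exact match."""
    return dict(re.findall(r"@(\w+)\[([^\]]*)\]", response))

prompt = FORMAT_PROMPT.format(
    question="What is the average age in passengers.csv?",
    answer_names="mean_age",
)
print(prompt)
print(parse_answers("Computed it: @mean_age[29.70]"))  # {'mean_age': '29.70'}
```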
In this paper, we investigate the millimeter-wave (mmWave) near-field beam training problem of finding the correct beam direction. To address the high complexity and low identification accuracy of existing beam training techniques, we propose an efficient hashing multi-arm beam (HMB) training scheme for the near-field scenario. Specifically, we first design a set of sparse bases based on the polar-domain sparsity of the near-field channel. Then, random hash functions are chosen to construct the near-field multi-arm beam training codebook. Each multi-arm beam codeword is scanned in its own time slot until all predefined codewords have been traversed. Finally, soft decision and voting methods are applied to distinguish the signals from different base stations and obtain the correctly aligned beams. Simulation results show that our proposed near-field HMB training method reduces the beam training overhead to the logarithmic level and achieves 96.4% of the identification accuracy of exhaustive beam training. Moreover, we also verify its applicability in the far-field scenario.
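The logarithmic overhead can be illustrated with a single-base-station toy model, which stands in for, but does not reproduce, the paper's polar-domain scheme: each candidate direction is hashed to a random binary codeword; in slot $t$ a multi-arm beam covers all directions whose $t$-th bit is 1; and the receiver votes for the codeword best matching its detected energy pattern.

```python
# Toy hashing-beam sketch: B candidate directions, O(log B) training slots.
# Channel and noise models are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)
B = 256                                   # candidate direction/distance grid points
T = 5 * int(np.ceil(np.log2(B)))          # slots: a few multiples of log2(B)
codebook = rng.integers(0, 2, size=(B, T))  # random hash codewords

true_dir = 137
sigma = 10 ** (-10 / 20)                  # assumed 10 dB SNR
received = codebook[true_dir] + rng.normal(0, sigma, size=T)  # energy detected
# iff the multi-arm beam of slot t covers the true direction (bit = 1)

# Soft decision: correlate the received pattern with every codeword and vote.
scores = received @ (2 * codebook - 1).T  # +1/-1 matched filtering
print("decoded:", int(np.argmax(scores)), "true:", true_dir)  # agree w.h.p.
```

Note the overhead: 40 slots here versus 256 for an exhaustive sweep, and the gap widens as the codebook grows.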
In this paper, we discuss the development and deployment of a robust autonomous system capable of performing various tasks in the maritime domain under unknown dynamic conditions. We investigate a data-driven approach based on modular design for ease of transferring autonomy across different maritime surface vessel platforms. The data-driven approach alleviates issues related to a priori identification of system models that may become deficient under evolving system behaviors or shifting, unanticipated, environmental influences. Our proposed learning-based platform comprises a deep Koopman system model and a change point detector that provides guidance on domain shifts, prompting relearning under severe exogenous and endogenous perturbations. Motion control of the autonomous system is achieved via an optimal controller design; the Koopman-linearized model naturally lends itself to a linear-quadratic regulator (LQR) design. We propose the C3D control architecture: Cascade Control with Change point Detection and Deep Koopman learning. The framework is verified on a station-keeping task with an autonomous surface vessel (ASV) in both simulation and real experiments. The approach achieved at least a 13.9 percent improvement in mean distance error across all test cases compared to methods that do not account for system changes.
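A minimal sketch of the controller stage, under the assumption that the deep Koopman model has already produced approximately linear latent dynamics $z_{k+1} = A z_k + B u_k$: station keeping then reduces to a standard discrete-time LQR. The matrices below are placeholders, not the learned lifted dynamics, and the change point detector that triggers relearning is not shown.

```python
# LQR on an (assumed) Koopman-linearized latent model via the discrete
# algebraic Riccati equation; matrices are illustrative placeholders.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 0.95]])  # learned latent transition (assumed)
B = np.array([[0.0], [0.1]])             # learned input matrix (assumed)
Q = np.diag([10.0, 1.0])                 # penalize deviation from the setpoint
R = np.array([[0.5]])                    # penalize actuation effort

P = solve_discrete_are(A, B, Q, R)                    # Riccati solution
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)     # optimal feedback gain

z, z_ref = np.array([1.0, 0.0]), np.zeros(2)
for _ in range(50):                      # closed-loop rollout toward the setpoint
    u = -K @ (z - z_ref)
    z = A @ z + B @ u
print("final latent error:", np.linalg.norm(z - z_ref))
```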
In this paper, we introduce a privacy-preserving stable diffusion framework leveraging homomorphic encryption, called HE-Diffusion, which primarily focuses on protecting the denoising phase of the diffusion process. HE-Diffusion is a tailored encryption framework specifically designed to align with the unique architecture of stable diffusion, ensuring both privacy and functionality. To address the inherent computational challenges, we propose a novel min-distortion method that enables efficient partial image encryption, significantly reducing the overhead without compromising the model's output quality. Furthermore, we adopt a sparse tensor representation to expedite computational operations, enhancing the overall efficiency of the privacy-preserving diffusion process. We successfully implement HE-based privacy-preserving stable diffusion inference. The experimental results show that HE-Diffusion achieves a 500-fold speedup over the baseline method and reduces the time cost of homomorphically encrypted inference to the minute level. Both the performance and accuracy of HE-Diffusion are on par with its plaintext counterpart. Our approach marks a significant step towards integrating advanced cryptographic techniques with state-of-the-art generative models, paving the way for privacy-preserving and efficient image generation in critical applications.
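To give a feel for partial encryption, here is a conceptual sketch using the CKKS scheme via the TenSEAL library: only a selected subset of pixels is homomorphically encrypted while the rest stays in plaintext. The selection rule below (a top-k-magnitude mask) is a placeholder standing in for the paper's min-distortion method, and the image is a random stand-in.

```python
# Partial encryption sketch with TenSEAL/CKKS; everything except the
# standard TenSEAL calls is an illustrative assumption.
import numpy as np
import tenseal as ts

ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

img = np.random.rand(64, 64)                     # stand-in image/latent
k = int(0.1 * img.size)                          # encrypt only 10% of pixels
idx = np.argsort(np.abs(img).ravel())[-k:]       # placeholder "sensitive" set

enc_part = ts.ckks_vector(ctx, img.ravel()[idx].tolist())  # ciphertext slice
plain_part = np.delete(img.ravel(), idx)         # the cheap plaintext remainder

enc_part = enc_part * 0.5                        # example homomorphic operation
decrypted = np.array(enc_part.decrypt())
print("max error:", np.abs(decrypted - 0.5 * img.ravel()[idx]).max())
```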
In this paper, we validate, including experimentally, the effectiveness of recent theoretical developments by our group on control-affine extremum seeking control (ESC) systems. In particular, our validation concerns the problem of a mobile robot seeking the unknown source of a scalar signal (e.g., light). Our recent theoretical results make it possible to estimate the gradient of the unknown objective function (i.e., the scalar signal) within the ESC loop and to use this information in an adaptation law that attenuates the oscillations of the ESC system while it converges to the extremum (i.e., the source). Building on these results, we propose an amended design of the simple single-integrator control-affine structure known in the ESC literature and show that it functions effectively, achieving model-free, real-time source seeking of light with attenuated oscillations using only local measurements of light intensity. Our results indicate that the proposed design has significant potential, as it also demonstrated a much faster convergence rate. We hope this paper encourages the extension of the proposed design to other fields, problems, and experiments.
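For readers unfamiliar with the single-integrator control-affine structure, here is a hedged numerical sketch of the classical form $\dot{x}_i = \sqrt{\alpha\,\omega_i}\,\cos(\omega_i t + k\,J(x))$, where the robot climbs toward the source using only local intensity readings. The oscillation-attenuating adaptation law that is the subject of the paper is deliberately omitted, and all gains and frequencies are illustrative.

```python
# Classical control-affine ESC simulation (without the paper's adaptation
# law); on average the dynamics descend the measured cost J.
import numpy as np

source = np.array([2.0, -1.0])
J = lambda x: np.sum((x - source) ** 2)   # measured signal (distance-like cost)

k, alpha = 2.0, 1.5
omega = np.array([30.0, 37.0])            # distinct dither frequencies per axis
dt, steps = 1e-3, 30_000

x, t = np.array([0.0, 0.0]), 0.0
for _ in range(steps):
    x = x + dt * np.sqrt(alpha * omega) * np.cos(omega * t + k * J(x))
    t += dt
# Converges to a neighborhood of the source whose size is set by the
# residual dither amplitude, roughly sqrt(alpha / omega).
print("distance to source:", np.linalg.norm(x - source))
```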
Surface-based data are commonly observed in practical applications across diverse fields. In this paper, we introduce a novel nonparametric method to discover the underlying signals from data distributed on complex surface-based domains. Our approach involves a penalized spline estimator defined on a triangulation of surface patches, which enables effective signal extraction and recovery. The proposed method offers several advantages over existing methods, including superior handling of "leakage" or "boundary effects" over complex domains, enhanced computational efficiency, and potential applications in analyzing sparse and irregularly distributed data on complex objects. We provide rigorous theoretical guarantees for the proposed method, including convergence rates of the estimator in both the $L_2$ and supremum norms, as well as the asymptotic normality of the estimator. We also demonstrate that the convergence rates achieved by our estimation method are optimal within the framework of nonparametric estimation. Furthermore, we introduce a bootstrap method to accurately quantify the uncertainty associated with the proposed estimators. The superior performance of the proposed method is demonstrated through simulation experiments and data applications on cortical surface functional magnetic resonance imaging data and oceanic near-surface atmospheric data.
In this paper, we introduce the cyclic polygon plot, a representation based on a novel projection concept for multi-dimensional values. Cyclic polygon plots combine the typically competing requirements of quantitativeness, image-space efficiency, and readability. Our approach is complemented with a placement strategy based on its intrinsic features, resulting in a dimensionality reduction strategy that is consistent with our overall concept. As a result, our approach combines advantages from dimensionality reduction techniques and quantitative plots, supporting a wide range of tasks in multi-dimensional data analysis. We examine and discuss the overall properties of our approach, and demonstrate its utility with a user study and selected examples.
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems treat the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during its interactions with users. We model the sequential interactions between users and the recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies by recommending items on a trial-and-error basis and receiving reinforcement signals from users' feedback. In particular, we introduce an online user-agent interaction environment simulator, which can pre-train and evaluate model parameters offline before the model is applied online. Moreover, we validate the importance of list-wise recommendations during the interactions between users and the agent, and develop a novel approach to incorporate them into the proposed framework, LIRD, for list-wise recommendations. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.
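To make the MDP formulation concrete, here is a schematic of the interaction loop with an offline simulator: the state is the user's recent browsing history, the action is a recommended list, and the reward is simulated user feedback. The reward model and policy below are deliberately toy stand-ins for LIRD's actual architecture, intended only to show the state/action/reward interfaces.

```python
# Schematic MDP loop for list-wise RL recommendation with an offline
# user simulator; all behavior models are toy assumptions.
import random

class UserSimulator:
    """Pretends to be a user: rewards items similar to the last clicked item."""
    def __init__(self, n_items=100):
        self.n_items = n_items
        self.state = [random.randrange(n_items) for _ in range(5)]  # recent items

    def step(self, rec_list):
        # Toy click model: items "near" the last click get clicked.
        reward = sum(1.0 for i in rec_list if abs(i - self.state[-1]) < 5)
        if reward:  # clicks extend the browsing history (the next state)
            self.state = self.state[1:] + [rec_list[0]]
        return list(self.state), reward

class ListwiseAgent:
    """Placeholder policy: recommends a list of K items near the last click."""
    def act(self, state, k=4):
        return [(state[-1] + d) % 100 for d in range(1, k + 1)]

sim, agent = UserSimulator(), ListwiseAgent()
state, total = list(sim.state), 0.0
for _ in range(20):                       # one simulated session (episode)
    state, r = sim.step(agent.act(state))
    total += r
print("episode reward:", total)
```

In the full framework, the placeholder policy would be replaced by a trainable actor updated from the simulated rewards, and the simulator would be fit to logged interaction data before any online deployment.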