Non-negative Matrix Factorization (NMF) is an effective algorithm for multivariate data analysis, including applications to feature selection, pattern recognition, and computer vision. Its variant, Semi-Nonnegative Matrix Factorization (SNF), extends the ability of NMF to render parts-based data representations to include mixed-sign data. Graph Regularized SNF builds upon this paradigm by adding a graph regularization term to preserve the local geometrical structure of the data space. Despite their successes, SNF-related algorithms to date still suffer from instability caused by the Frobenius norm due to the effects of outliers and noise. In this paper, we present a new $L_{2,1}$ SNF algorithm that utilizes the noise-insensitive $L_{2,1}$ norm. We provide monotonic convergence analysis of the $L_{2,1}$ SNF algorithm. In addition, we conduct numerical experiments on three benchmark mixed-sign datasets as well as several randomized mixed-sign matrices to demonstrate the performance superiority of $L_{2,1}$ SNF over conventional SNF algorithms under the influence of Gaussian noise at different levels.
Recent advances in large language models (LLMs) have shown significant promise, yet their evaluation raises concerns, particularly regarding data contamination due to the lack of access to proprietary training data. To address this issue, we present C$^2$LEVA, a comprehensive bilingual benchmark featuring systematic contamination prevention. C$^2$LEVA firstly offers a holistic evaluation encompassing 22 tasks, each targeting a specific application or ability of LLMs, and secondly a trustworthy assessment due to our contamination-free tasks, ensured by a systematic contamination prevention strategy that fully automates test data renewal and enforces data protection during benchmark data release. Our large-scale evaluation of 15 open-source and proprietary models demonstrates the effectiveness of C$^2$LEVA.
Text-Based Person Search (TBPS) is a crucial task in the Internet of Things (IoT) domain that enables accurate retrieval of target individuals from large-scale galleries with only given textual caption. For cross-modal TBPS tasks, it is critical to obtain well-distributed representation in the common embedding space to reduce the inter-modal gap. Furthermore, learning detailed image-text correspondences is essential to discriminate similar targets and enable fine-grained search. To address these challenges, we present a simple yet effective method named Sew Calibration and Masked Modeling (SCMM) that calibrates cross-modal representations by learning compact and well-aligned embeddings. SCMM introduces two novel losses for fine-grained cross-modal representations: Sew calibration loss that aligns image and text features based on textual caption quality, and Masked Caption Modeling (MCM) loss that establishes detailed relationships between textual and visual parts. This dual-pronged strategy enhances feature alignment and cross-modal correspondences, enabling accurate distinction of similar individuals while maintaining a streamlined dual-encoder architecture for real-time inference, which is essential for resource-limited sensors and IoT systems. Extensive experiments on three popular TBPS benchmarks demonstrate the superiority of SCMM, achieving 73.81%, 64.25%, and 57.35% Rank-1 accuracy on CUHK-PEDES, ICFG-PEDES, and RSTPReID, respectively.
Bayesian optimization is efficient even with a small amount of data and is used in engineering and in science, including biology and chemistry. In Bayesian optimization, a parameterized model with an uncertainty is fitted to explain the experimental data, and then the model suggests parameters that would most likely improve the results. Batch Bayesian optimization reduces the processing time of optimization by parallelizing experiments. However, batch Bayesian optimization cannot be applied if the number of parallelized experiments is limited by the cost or scarcity of equipment; in such cases, sequential methods require an unrealistic amount of time. In this study, we developed pipelining Bayesian optimization (PipeBO) to reduce the processing time of optimization even with a limited number of parallel experiments. PipeBO was inspired by the pipelining of central processing unit architecture, which divides computational tasks into multiple processes. PipeBO was designed to achieve experiment parallelization by overlapping various processes of the experiments. PipeBO uses the results of completed experiments to update the parameters of running parallelized experiments. Using the Black-Box Optimization Benchmarking, which consists of 24 benchmark functions, we compared PipeBO with the sequential Bayesian optimization methods. PipeBO reduced the average processing time of optimization to about 56% for the experiments that consisted of two processes or even less for those with more processes for 20 out of the 24 functions. Overall, PipeBO parallelizes Bayesian optimization in the resource-constrained settings so that efficient optimization can be achieved.
Multimodal information extraction (IE) tasks have attracted increasing attention because many studies have shown that multimodal information benefits text information extraction. However, existing multimodal IE datasets mainly focus on sentence-level image-facilitated IE in English text, and pay little attention to video-based multimodal IE and fine-grained visual grounding. Therefore, in order to promote the development of multimodal IE, we constructed a multimodal multilingual multitask dataset, named M$^{3}$D, which has the following features: (1) It contains paired document-level text and video to enrich multimodal information; (2) It supports two widely-used languages, namely English and Chinese; (3) It includes more multimodal IE tasks such as entity recognition, entity chain extraction, relation extraction and visual grounding. In addition, our dataset introduces an unexplored theme, i.e., biography, enriching the domains of multimodal IE resources. To establish a benchmark for our dataset, we propose an innovative hierarchical multimodal IE model. This model effectively leverages and integrates multimodal information through a Denoised Feature Fusion Module (DFFM). Furthermore, in non-ideal scenarios, modal information is often incomplete. Thus, we designed a Missing Modality Construction Module (MMCM) to alleviate the issues caused by missing modalities. Our model achieved an average performance of 53.80% and 53.77% on four tasks in English and Chinese datasets, respectively, which set a reasonable standard for subsequent research. In addition, we conducted more analytical experiments to verify the effectiveness of our proposed module. We believe that our work can promote the development of the field of multimodal IE.
The fairness of clustering algorithms has gained widespread attention across various areas, including machine learning, In this paper, we study fair $k$-means clustering in Euclidean space. Given a dataset comprising several groups, the fairness constraint requires that each cluster should contain a proportion of points from each group within specified lower and upper bounds. Due to these fairness constraints, determining the optimal locations of $k$ centers is a quite challenging task. We propose a novel ``Relax and Merge'' framework that returns a $(1+4\rho + O(\epsilon))$-approximate solution, where $\rho$ is the approximate ratio of an off-the-shelf vanilla $k$-means algorithm and $O(\epsilon)$ can be an arbitrarily small positive number. If equipped with a PTAS of $k$-means, our solution can achieve an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints, which improves the current state-of-the-art approximation guarantee. Furthermore, using our framework, we can also obtain a $(1+4\rho +O(\epsilon))$-approximate solution for the $k$-sparse Wasserstein Barycenter problem, which is a fundamental optimization problem in the field of optimal transport, and a $(2+6\rho)$-approximate solution for the strictly fair $k$-means clustering with no violation, both of which are better than the current state-of-the-art methods. In addition, the empirical results demonstrate that our proposed algorithm can significantly outperform baseline approaches in terms of clustering cost.
Selecting interactions from an ultrahigh-dimensional statistical model with $n$ observations and $p$ variables when $p\gg n$ is difficult because the number of candidates for interactions is $p(p-1)/2$ and a selected model should satisfy the strong hierarchical (SH) restriction. A new method called the SHL0 is proposed to overcome the difficulty. The objective function of the SHL0 method is composed of a loglikelihood function and an $L_0$ penalty. A well-known approach in theoretical computer science called local combinatorial optimization is used to optimize the objective function. We show that any local solution of the SHL0 is consistent and enjoys the oracle properties, implying that it is unnecessary to use a global solution in practice. Three additional advantages are: a tuning parameter is used to penalize the main effects and interactions; a closed-form expression can derive the tuning parameter; and the idea can be extended to arbitrary ultrahigh-dimensional statistical models. The proposed method is more flexible than the previous methods for selecting interactions. A simulation study of the research shows that the proposed SHL0 outperforms its competitors.
We describe a fast computation method for leave-one-out cross-validation (LOOCV) for $k$-nearest neighbours ($k$-NN) regression. We show that, under a tie-breaking condition for nearest neighbours, the LOOCV estimate of the mean square error for $k$-NN regression is identical to the mean square error of $(k+1)$-NN regression evaluated on the training data, multiplied by the scaling factor $(k+1)^2/k^2$. Therefore, to compute the LOOCV score, one only needs to fit $(k+1)$-NN regression only once, and does not need to repeat training-validation of $k$-NN regression for the number of training data. Numerical experiments confirm the validity of the fast computation method.
Edge computing is emerging as a key enabler of low-latency, high-efficiency processing for the Internet of Things (IoT) and other real-time applications. To support these demands, containerization has gained traction in edge computing due to its lightweight virtualization and efficient resource management. However, there is currently no established framework to leverage both containers and unikernels on edge devices for optimized IoT deployments. This paper proposes a hybrid edge system design that leverages container and unikernel technologies to optimize resource utilization based on application complexity. Containers are employed for resource-intensive applications, e.g., computer vision, providing faster processing, flexibility, and ease of deployment. In contrast, unikernels are used for lightweight applications, offering enhanced resource performance with minimal overhead. Our system design also incorporates container orchestration to efficiently manage multiple instances across the edge efficiently, ensuring scalability and reliability. We demonstrate our hybrid approach's performance and efficiency advantages through real-world computer vision and data science applications on ARM-powered edge device. Our results demonstrate that this hybrid approach improves resource utilization and reduces latency compared to traditional virtualized solutions. This work provides insights into optimizing edge infrastructures, enabling more efficient and specialized deployment strategies for diverse application workloads.
Graph Neural Networks (GNNs) have received considerable attention on graph-structured data learning for a wide variety of tasks. The well-designed propagation mechanism which has been demonstrated effective is the most fundamental part of GNNs. Although most of GNNs basically follow a message passing manner, litter effort has been made to discover and analyze their essential relations. In this paper, we establish a surprising connection between different propagation mechanisms with a unified optimization problem, showing that despite the proliferation of various GNNs, in fact, their proposed propagation mechanisms are the optimal solution optimizing a feature fitting function over a wide class of graph kernels with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view on surveying the relations between different GNNs, but also further opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naive graph convolutional kernels for feature fitting function, and we further develop two novel objective functions considering adjustable graph kernels showing low-pass or high-pass filtering capabilities respectively. Moreover, we provide the convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform the state-of-the-art methods but also have good ability to alleviate over-smoothing, and further verify the feasibility for designing GNNs with our unified optimization framework.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.