We investigate a generalization of submodular maximization known as $k$-submodular maximization, with applications across social networks and machine learning domains. In this work, we introduce the multilinear extension of $k$-submodular functions and build unified Frank-Wolfe-type frameworks upon it. Our frameworks accommodate 1) monotone or non-monotone functions, and 2) various constraint types, including matroid constraints, knapsack constraints, and their combinations. Notably, we attain an asymptotically optimal $1/2$-approximation for monotone $k$-submodular maximization under knapsack constraints, surpassing the previous $1/3$-approximation. Our analysis rests on new insights into specific linearity and monotonicity properties of the multilinear extension.
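For context, a natural candidate for this extension, written by analogy with the classical ($k=1$) multilinear extension (our notation, not necessarily the paper's exact formulation), is:

```latex
% For x = (x_{e,i}) with x_{e,i} >= 0 and sum_i x_{e,i} <= 1 for each
% element e, one natural multilinear extension of a k-submodular f is
\[
  F(\mathbf{x}) \;=\; \sum_{(S_1,\ldots,S_k)} f(S_1,\ldots,S_k)
    \prod_{i=1}^{k} \prod_{e \in S_i} x_{e,i}
    \prod_{e \notin \cup_{j} S_j} \Bigl(1 - \sum_{j=1}^{k} x_{e,j}\Bigr),
\]
% i.e., the expectation of f under independent rounding of each element e
% into part i with probability x_{e,i} (and into no part otherwise).
```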
Computing the proximal operator of the sparsity-promoting piecewise exponential (PiE) penalty $1-e^{-|x|/\sigma}$ with a given shape parameter $\sigma>0$, a popular nonconvex surrogate of the $\ell_0$-norm, is fundamental in feature selection via support vector machines, image reconstruction, zero-one programming, compressed sensing, and related problems. Owing to the nonconvexity of PiE, its proximal operator has long been evaluated via an iteratively reweighted $\ell_1$ algorithm, which replaces PiE with its first-order approximation; however, the solutions obtained in this way are only critical points. Building on an exact characterization of the proximal operator of PiE, we identify the regions in which the iteratively reweighted $\ell_1$ solution deviates from the true proximal operator; these regions can be described explicitly in terms of $\sigma$, the initial value, and the regularization parameter in the definition of the proximal operator. Moreover, the initial value can be chosen adaptively and simply so that the iteratively reweighted $\ell_1$ solution belongs to the proximal operator of PiE.
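To make the iteratively reweighted $\ell_1$ scheme concrete, here is a minimal sketch of one common variant for the proximal objective $\min_y \tfrac{1}{2}(y-x)^2 + \lambda(1-e^{-|y|/\sigma})$: linearizing PiE at the current iterate reduces each step to soft-thresholding. Function names and defaults are illustrative.

```python
import numpy as np

def prox_pie_irl1(x, lam, sigma, y0=None, iters=50):
    """Iteratively reweighted l1 approximation of the proximal operator of
    lam * (1 - exp(-|y|/sigma)).

    Linearizing the penalty at y_t gives the weighted-l1 subproblem
    min_y 0.5*(y - x)^2 + w_t*|y| with w_t = (lam/sigma)*exp(-|y_t|/sigma),
    which is solved exactly by soft-thresholding.
    """
    y = x if y0 is None else y0  # the choice of initial value matters, per the paper
    for _ in range(iters):
        w = (lam / sigma) * np.exp(-np.abs(y) / sigma)   # weight from first-order model
        y = np.sign(x) * np.maximum(np.abs(x) - w, 0.0)  # soft-thresholding step
    return y
```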
Learning schemes for planning and control are limited by the difficulty of collecting large amounts of experimental data or by the need to rely on high-fidelity simulations. This paper explores the potential of a learning scheme that leverages dimensionless numbers, derived via Buckingham's $\pi$ theorem, to improve data efficiency and facilitate knowledge sharing between similar systems. A case study using car-like robots compares traditional and dimensionless learning models on simulated and experimental data to validate the benefits of the dimensionless approach. Preliminary results show that the dimensionless approach can accelerate learning and improve model accuracy, and that it merits further investigation.
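As a hypothetical illustration of the idea (the specific $\pi$-groups below are ours, not the paper's), Buckingham's $\pi$ theorem lets one replace dimensional features with dimensionless ratios that transfer across similarly scaled systems:

```python
import numpy as np

# Illustrative only: these pi-groups are hypothetical. Buckingham-pi
# scaling turns dimensional features -- speed v [m/s], wheelbase l [m],
# gravity g [m/s^2] -- into dimensionless ones that can be shared across
# car-like robots of different sizes.
def to_dimensionless(v, l, steering_angle, g=9.81):
    pi1 = v**2 / (g * l)   # Froude-like number: inertia vs. gravity
    pi2 = steering_angle   # angles are already dimensionless
    return np.stack([pi1, pi2], axis=-1)

# A model trained on (pi1, pi2) for a small car can be evaluated on a
# larger one by feeding it that robot's own pi-groups.
features = to_dimensionless(v=np.array([1.0, 2.0]), l=0.3,
                            steering_angle=np.array([0.1, 0.2]))
```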
Recently, multimodal recommendation has gained increasing attention for effectively addressing the data sparsity problem by incorporating modality-based representations. Although multimodal recommendations excel in accuracy, the introduction of different modalities (e.g., images, text, and audio) may expose more of users' sensitive information (e.g., gender and age) to recommender systems, resulting in potentially more serious unfairness issues. Despite many efforts on fairness, existing fairness-aware methods are either incompatible with multimodal scenarios or lead to suboptimal fairness performance because they neglect the sensitive information of multimodal content. To achieve counterfactual fairness in multimodal recommendation, we propose a novel fairness-aware multimodal recommendation approach (dubbed FMMRec) that disentangles sensitive and non-sensitive information in modal representations and leverages the disentangled representations to guide fairer representation learning. Specifically, we first disentangle biased and filtered modal representations by maximizing and minimizing, respectively, their ability to predict sensitive attributes. With the disentangled modal representations, we mine modality-based unfair and fair (corresponding to biased and filtered) user-user structures, enhance the explicit user representation with the biased and filtered neighbors from the corresponding structures, and then adversarially filter out sensitive information. Experiments on two real-world public datasets demonstrate the superiority of FMMRec over state-of-the-art baselines. Our source code is available at https://anonymous.4open.science/r/FMMRec.
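A minimal sketch of the disentanglement step, assuming a PyTorch setup with hypothetical module names and sizes: the biased representation is trained to predict the sensitive attribute well, while the filtered one is trained adversarially (here via a gradient reversal layer, one standard choice) to hide it.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity forward, negated gradient backward."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

class Disentangler(nn.Module):
    # Hypothetical module names/sizes; a sketch of the biased/filtered split.
    def __init__(self, dim=64, n_sensitive=2):
        super().__init__()
        self.biased_enc = nn.Linear(dim, dim)
        self.filtered_enc = nn.Linear(dim, dim)
        self.attacker = nn.Linear(dim, n_sensitive)  # predicts e.g. gender

    def forward(self, modal_repr, sensitive_label):
        biased = self.biased_enc(modal_repr)
        filtered = self.filtered_enc(modal_repr)
        ce = nn.functional.cross_entropy
        # Maximize sensitive-attribute predictability of the biased part...
        loss_biased = ce(self.attacker(biased), sensitive_label)
        # ...and minimize it for the filtered part via gradient reversal.
        loss_filtered = ce(self.attacker(GradReverse.apply(filtered)),
                           sensitive_label)
        return biased, filtered, loss_biased + loss_filtered

# Usage sketch on a random batch of modal representations.
reps = torch.randn(32, 64)
labels = torch.randint(0, 2, (32,))  # e.g. binary gender labels
_, _, loss = Disentangler()(reps, labels)
loss.backward()
```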
Obtaining solutions of partial differential equations with machine learning methods has drawn increasing attention in scientific computation and engineering applications. In this work, we first propose a coupled Extreme Learning Machine method (called CELM), incorporating the physical laws, to solve a class of fourth-order biharmonic equations by reformulating them into two well-posed Poisson problems. In addition, several activation functions, including tangent, Gaussian, sine, and trigonometric (sin+cos) functions, are introduced to assess our CELM method. Notably, the sine and trigonometric functions are remarkably effective at reducing the approximation error of the CELM model. Finally, several numerical experiments study initialization approaches for the weights and biases of the hidden units in our CELM model and explore the required number of hidden units. Numerical results show that the proposed CELM algorithm is accurate and efficient for solving the biharmonic equation on both regular and irregular domains.
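As a toy illustration of the coupled-ELM idea, here is a 1D sketch (the paper works on 2D biharmonic problems): $u'''' = f$ with Navier-type boundary data ($u$ and $u''$ prescribed) splits into $v'' = f$ and $u'' = v$, and with a sine activation each Poisson solve reduces to a linear least-squares problem for the output weights. All sizes and sampling choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_poisson_1d(rhs, bc, n_hidden=200, n_pts=400):
    """Solve u'' = rhs on (0,1), u(0)=bc[0], u(1)=bc[1], with a one-layer ELM.

    Sine activation: phi_j(x) = sin(w_j x + b_j), so phi_j'' = -w_j^2 phi_j,
    and the output weights solve a linear least-squares system.
    """
    w = rng.uniform(-20, 20, n_hidden)       # random (fixed) hidden weights
    b = rng.uniform(-np.pi, np.pi, n_hidden)
    x = np.linspace(0, 1, n_pts)[:, None]
    phi = np.sin(x * w + b)                  # hidden features at collocation points
    A = np.vstack([-(w**2) * phi,            # interior rows: u'' = rhs
                   np.sin(0 * w + b)[None],  # boundary row at x = 0
                   np.sin(1 * w + b)[None]]) # boundary row at x = 1
    y = np.concatenate([rhs(x).ravel(), bc])
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda xq: np.sin(np.asarray(xq)[:, None] * w + b) @ c

# Coupled solve of u'''' = f: first v'' = f, then u'' = v.
f = lambda x: np.sin(np.pi * x).ravel()
v = elm_poisson_1d(f, bc=[0.0, 0.0])
u = elm_poisson_1d(lambda x: v(x.ravel()), bc=[0.0, 0.0])
```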
A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to $1$. In some cases it is required to represent such a vector with only $b$ bits per entry. A natural choice is to partition the interval $[0,1]$ into $2^b$ uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure -- applying an entrywise non-linear function (compander) $f(x)$ prior to quantization -- yields an extremely effective quantization method. For example, for $b=8 (16)$ and $10^5$-sized alphabets, the quality of representation improves from a loss (under KL divergence) of $0.5 (0.1)$ bits/entry to $10^{-4} (10^{-9})$ bits/entry. Compared to floating point representations, our compander method improves the loss from $10^{-1}(10^{-6})$ to $10^{-4}(10^{-9})$ bits/entry. These numbers hold both for real-world data (word frequencies in books and DNA $k$-mer counts) and for synthetic randomly generated distributions. Theoretically, we set up a minimax optimality criterion and show that the compander $f(x) ~\propto~ \mathrm{ArcSinh}(\sqrt{(1/2) (K \log K) x})$ achieves near-optimal performance, attaining a KL-quantization loss of $\asymp 2^{-2b} \log^2 K$ for a $K$-letter alphabet and $b\to \infty$. Interestingly, a similar minimax criterion for the quadratic loss on the hypercube shows optimality of the standard uniform quantizer. This suggests that the $\mathrm{ArcSinh}$ quantizer is as fundamental for KL-distortion as the uniform quantizer for quadratic distortion.
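A minimal sketch of the compand-quantize-expand pipeline under the stated compander; the normalization, rounding, and renormalization details here are our illustrative choices:

```python
import numpy as np

def compand_quantize(p, b=8):
    """Compander quantization of a probability vector p (sketch).

    Applies f(x) ~ arcsinh(sqrt(0.5 * K * log(K) * x)) entrywise, quantizes
    to 2^b uniform bins in the companded domain, inverts, and renormalizes.
    """
    K = len(p)
    c = 0.5 * K * np.log(K)
    f = lambda x: np.arcsinh(np.sqrt(c * x))
    finv = lambda y: np.sinh(y) ** 2 / c
    y = f(p) / f(1.0)                        # compand and scale to [0, 1]
    q = (np.floor(y * 2**b) + 0.5) / 2**b    # uniform mid-point quantizer
    ph = finv(np.clip(q, 0, 1) * f(1.0))     # expand back
    return ph / ph.sum()                     # project onto the simplex

p = np.random.dirichlet(np.ones(10**5))
ph = compand_quantize(p, b=8)
kl = np.sum(p * np.log(p / ph))              # KL-quantization loss (nats)
```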
In predictive modeling with simulation or machine learning, it is critical to accurately assess the quality of estimated values through output analysis. In recent decades, output analysis has been enriched with methods that quantify the impact of input-data uncertainty on model outputs to increase robustness. However, most developments assume that the input data follow a parametric family of distributions. We propose a unified output analysis framework for simulation and machine learning outputs through the lens of Monte Carlo sampling. This framework provides nonparametric quantification of the variance and bias induced in the outputs, with higher-order accuracy. Our new bias-corrected estimators leverage an extension of fast iterative bootstrap sampling and higher-order influence functions. To make the proposed estimators scalable, we devise budget-optimal rules and use control variates for variance reduction. Our theoretical and numerical results demonstrate a clear advantage in building more robust confidence intervals from the model outputs, with higher coverage probability.
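For intuition, a generic first-order bootstrap bias correction looks as follows; this is a plain sketch, not the paper's fast iterative or higher-order-influence-function scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_bias_corrected(data, stat, n_boot=2000):
    """First-order bootstrap bias correction (generic sketch).

    bias_hat  = mean_b stat(resample_b) - stat(data)
    corrected = stat(data) - bias_hat
    """
    theta = stat(data)
    boot = np.array([stat(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    return theta - (boot.mean() - theta), boot.std(ddof=1)  # estimate, s.e.

# Example: the plug-in estimator of exp(E[X]) is biased upward by Jensen.
data = rng.normal(0.0, 1.0, size=200)
est, se = bootstrap_bias_corrected(data, lambda x: np.exp(np.mean(x)))
```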
Deep-learning models have been successful in biomedical image segmentation. To generalize for real-world deployment, test-time augmentation (TTA) methods are often used to transform the test image into versions that are hopefully closer to the training domain. Unfortunately, due to the vast diversity of instance scales and image styles, many augmented test images produce undesirable results, lowering the overall performance. This work proposes a new TTA framework, S$^3$-TTA, which selects a suitable image scale and style for each test image based on a transformation-consistency metric. In addition, S$^3$-TTA constructs an end-to-end augmentation-segmentation joint-training pipeline to ensure task-oriented augmentation. On public benchmarks for cell and lung segmentation, S$^3$-TTA improves over the prior art by 3.4% and 1.3%, respectively, simply by augmenting the input data at test time.
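A simplified stand-in for the selection step (scale only, no style, with a toy model so the sketch runs): each candidate rescaling is scored by the Dice consistency between its prediction, mapped back to the original resolution, and the unaugmented prediction.

```python
import numpy as np

def dice(a, b, eps=1e-8):
    """Dice overlap between two binary masks."""
    return (2.0 * (a & b).sum() + eps) / (a.sum() + b.sum() + eps)

def up2(x):   return np.kron(x, np.ones((2, 2)))  # 2x nearest-neighbor upsample
def down2(x): return x[::2, ::2]                  # 2x subsample

def select_scale(image, model):
    """Pick the candidate scale whose prediction, mapped back to the
    original resolution, agrees most with the unaugmented prediction."""
    base = model(image) > 0.5
    candidates = {
        2.0: down2(model(up2(image))) > 0.5,
        0.5: up2(model(down2(image))) > 0.5,
    }
    return max(candidates, key=lambda s: dice(candidates[s], base))

# Toy "model" (threshold segmentation) just to make the sketch runnable.
model = lambda img: (img > img.mean()).astype(float)
print(select_scale(np.random.rand(64, 64), model))
```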
Given a binary matrix $M$, suppose we are presented with the collection of its rows and columns in independent arbitrary orderings. From this information, can we recover the unique original orderings and matrix? We present an algorithm that identifies whether there is a unique ordering associated with a set of rows and columns, and outputs either the unique correct orderings for the rows and columns or the full collection of all valid orderings and valid matrices. We show that there is a constant $c > 0$ such that the algorithm terminates in $O(n^2)$ time, with high probability and in expectation, for random $n \times n$ binary matrices with i.i.d.\ Bernoulli$(p)$ entries $(m_{ij})_{i,j=1}^n$ such that $\frac{c\log^2 n}{n(\log\log n)^2} \leq p \leq \frac{1}{2}$.
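For small $n$, a brute-force reference implementation (not the paper's $O(n^2)$ algorithm) makes the problem concrete: enumerate row orderings and keep every matrix whose column multiset matches the given one.

```python
import numpy as np
from itertools import permutations

def reconstructions(rows, cols):
    """Brute force for small n: every matrix consistent with the given
    row and column multisets. Unique original iff len(result) == 1."""
    n = len(rows)
    found = []
    for perm in permutations(range(n)):       # candidate row ordering
        M = np.array([rows[i] for i in perm])
        if sorted(map(tuple, M.T)) == sorted(map(tuple, cols)):
            if not any(np.array_equal(M, F) for F in found):
                found.append(M)
    return found

rows = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
cols = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
print(len(reconstructions(rows, cols)))  # > 1 here: the ordering is not unique
```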
In reliable decision-making systems based on machine learning, models must be robust to distributional shifts or provide the uncertainty of their predictions. In node-level graph learning problems, distributional shifts can be especially complex since the samples are interdependent. To evaluate the performance of graph models, it is important to test them on diverse and meaningful distributional shifts. However, most graph benchmarks that consider distributional shifts for node-level problems focus mainly on node features, even though structural properties are also essential for graph problems. In this work, we propose a general approach for inducing diverse distributional shifts based on graph structure. We use this approach to create data splits according to several structural node properties: popularity, locality, and density. In our experiments, we thoroughly evaluate the proposed distributional shifts and show that they can be quite challenging for existing graph models. We also find that simple models often outperform more sophisticated methods on these structural shifts. Finally, our experiments provide evidence of a trade-off between the quality of learned representations for the base classification task under a structural distributional shift and the ability to separate nodes from different distributions using these representations.
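An illustrative way to induce such structure-based splits (the paper's exact measures of popularity, locality, and density may differ, e.g. PageRank instead of degree):

```python
import networkx as nx

def structural_split(G, prop="popularity", frac=0.5):
    """Split nodes into in-distribution / out-of-distribution parts by a
    structural property; illustrative proxies for each property."""
    if prop == "popularity":
        score = dict(G.degree())   # popular nodes = high degree
    elif prop == "density":
        score = nx.clustering(G)   # dense local neighborhoods
    else:  # "locality": distance from an (arbitrary) source node
        src = next(iter(G.nodes))
        score = nx.single_source_shortest_path_length(G, src)
    nodes = sorted(G.nodes, key=lambda v: score.get(v, 0))
    k = int(frac * len(nodes))
    return nodes[:k], nodes[k:]    # e.g. train on low scores, test on high

G = nx.karate_club_graph()
train_nodes, test_nodes = structural_split(G, "popularity")
```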
We provide several new results on the sample complexity of vector-valued linear predictors (parameterized by a matrix) and, more generally, neural networks. Focusing on size-independent bounds, where only the Frobenius-norm distance of the parameters from some fixed reference matrix $W_0$ is controlled, we show that the sample complexity behavior can be surprisingly different from what one might expect based on the well-studied setting of scalar-valued linear predictors. This also leads to new sample complexity bounds for feed-forward neural networks, settling some open questions in the literature, and establishes a new convex linear prediction problem that is provably learnable without uniform convergence.
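For concreteness, the size-independent regime referred to here concerns hypothesis classes of the following flavor (notation ours):

```latex
% Vector-valued linear predictors whose distance from a reference matrix
% W_0 is bounded in Frobenius norm, with no explicit dependence on the
% input dimension d or output dimension k:
\[
  \mathcal{H}_{B} \;=\; \bigl\{\, x \mapsto W x \;:\;
    W \in \mathbb{R}^{k \times d},\; \|W - W_0\|_F \le B \,\bigr\}.
\]
```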