The $n$-way number partitioning problem is a classic problem in combinatorial optimization, with applications to diverse settings such as fair allocation and machine scheduling. All these problems are NP-hard, but various approximation algorithms are known. We consider three closely related kinds of approximations. The first two variants optimize the partition such that: in the first variant some fixed number $s$ of items can be \emph{split} between two or more bins and in the second variant we allow at most a fixed number $t$ of \emph{splittings}. The third variant is a decision problem: the largest bin sum must be within a pre-specified interval, parameterized by a fixed rational number $u$ times the largest item size. When the number of bins $n$ is unbounded, we show that every variant is strongly {\sf NP}-complete. When the number of bins $n$ is fixed, the running time depends on the fixed parameters $s,t,u$. For each variant, we give a complete picture of its running time. For $n=2$, the running time is easy to identify. Our main results consider any fixed integer $n \geq 3$. Using a two-way polynomial-time reduction between the first and the third variant, we show that $n$-way number-partitioning with $s$ split items can be solved in polynomial time if $s \geq n-2$, and it is {\sf NP}-complete otherwise. Also, $n$-way number-partitioning with $t$ splittings can be solved in polynomial time if $t \geq n-1$, and it is {\sf NP}-complete otherwise. Finally, we show that the third variant can be solved in polynomial time if $u \geq (n-2)/n$, and it is {\sf NP}-complete otherwise. Our positive results for the optimization problems consider both min-max and max-min versions. Using the same reduction, and we provide a fully polynomial-time approximation scheme for the case where the number of split items is lower than $n-2$.
Nonparametric contextual bandit is an important model of sequential decision making problems. Under $\alpha$-Tsybakov margin condition, existing research has established a regret bound of $\tilde{O}\left(T^{1-\frac{\alpha+1}{d+2}}\right)$ for bounded supports. However, the optimal regret with unbounded contexts has not been analyzed. The challenge of solving contextual bandit problems with unbounded support is to achieve both exploration-exploitation tradeoff and bias-variance tradeoff simultaneously. In this paper, we solve the nonparametric contextual bandit problem with unbounded contexts. We propose two nearest neighbor methods combined with UCB exploration. The first method uses a fixed $k$. Our analysis shows that this method achieves minimax optimal regret under a weak margin condition and relatively light-tailed context distributions. The second method uses adaptive $k$. By a proper data-driven selection of $k$, this method achieves an expected regret of $\tilde{O}\left(T^{1-\frac{(\alpha+1)\beta}{\alpha+(d+2)\beta}}+T^{1-\beta}\right)$, in which $\beta$ is a parameter describing the tail strength. This bound matches the minimax lower bound up to logarithm factors, indicating that the second method is approximately optimal.
We propose a novel learning framework for Koopman operator of nonlinear dynamical systems that is informed by the governing equation and guarantees long-time stability and robustness to noise. In contrast to existing frameworks where either ad-hoc observables or blackbox neural networks are used to construct observables in the extended dynamic mode decomposition (EDMD), our observables are informed by governing equations via Polyflow. To improve the noise robustness and guarantee long-term stability, we designed a stable parameterization of the Koopman operator together with a progressive learning strategy for roll-out recurrent loss. To further improve model performance in the phase space, a simple iterative strategy of data augmentation was developed. Numerical experiments of prediction and control of classic nonlinear systems with ablation study showed the effectiveness of the proposed techniques over several state-of-the-art practices.
Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the winning response and the losing response within pairwise data are generated isolatedly, leading to weak correlations between them as well as suboptimal alignment performance. To address this issue, we propose an effective framework named BMC, for bridging and modeling correlations in pairwise data. Firstly, we increase the consistency and informativeness of the pairwise preference signals by targeted modifications, synthesizing a pseudo winning response through improving the losing response based on the winning response. Secondly, we identify that DPO alone is insufficient to model these correlations and capture nuanced variations. Therefore, we propose learning token-level correlations by dynamically leveraging the policy model's confidence during training. Comprehensive experiments on QA, math, and instruction-following tasks demonstrate the effectiveness of our approach, significantly surpassing competitive baselines, including DPO. Additionally, our in-depth quantitative analysis reveals the reasons behind our method's superior performance over DPO and showcases its versatility to other DPO variants.
The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. While weighted majority vote (WMV) with a single weight vector for each worker achieves the optimal label estimation error in the Dawid-Skene model, we show that different weights for different types are necessary for a multi-type model. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups that cluster tasks by type. Our analysis reveals that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Any algorithm designed for the Dawid-Skene model can then be applied independently to each type to infer the labels. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
Given a list L of elements and a property that L exhibits, ddmin is a well-known test input minimization algorithm designed to automatically eliminate irrelevant elements from L. This algorithm is extensively adopted in test input minimization and software debloating. Recently, ProbDD, an advanced variant of ddmin, has been proposed and achieved state-of-the-art performance. Employing Bayesian optimization, ProbDD predicts the likelihood of each element in L being essential, and statistically decides which elements and how many should be removed each time. Despite its impressive results, the theoretical probabilistic model of ProbDD is complex, and the specific factors driving its superior performance have not been investigated. In this paper, we conduct the first in-depth theoretical analysis of ProbDD, clarifying trends in probability and subset size changes while simplifying the probability model. Complementing this analysis, we perform empirical experiments, including success rate analysis, ablation studies, and analysis on trade-offs and limitations, to better understand and demystify this state-of-the-art algorithm. Our success rate analysis shows how ProbDD addresses bottlenecks of ddmin by skipping inefficient queries that attempt to delete complements of subsets and previously tried subsets. The ablation study reveals that randomness in ProbDD has no significant impact on efficiency. Based on these findings, we propose CDD, a simplified version of ProbDD, reducing complexity in both theory and implementation. Besides, the performance of CDD validates our key findings. Comprehensive evaluations across 76 benchmarks in test input minimization and software debloating show that CDD can achieve the same performance as ProbDD despite its simplification. These insights provide valuable guidance for future research and applications of test input minimization algorithms.
In applied time-to-event analysis, a flexible parametric approach is to model the hazard rate as a piecewise constant function of time. However, the change points and values of the piecewise constant hazard are usually unknown and need to be estimated. In this paper, we develop a fully data-driven procedure for piecewise constant hazard estimation. We work in a general counting process framework which nests a wide range of popular models in time-to-event analysis including Cox's proportional hazards model with potentially high-dimensional covariates, competing risks models as well as more general multi-state models. To construct our estimator, we set up a regression model for the increments of the Breslow estimator and then use fused lasso techniques to approximate the piecewise constant signal in this regression model. In the theoretical part of the paper, we derive the convergence rate of our estimator as well as some results on how well the change points of the piecewise constant hazard are approximated by our method. We complement the theory by both simulations and a real data example, illustrating that our results apply in rather general event histories such as multi-state models.
The similarity between the question and indexed documents is a crucial factor in document retrieval for retrieval-augmented question answering. Although this is typically the only method for obtaining the relevant documents, it is not the sole approach when dealing with entity-centric questions. In this study, we propose Entity Retrieval, a novel retrieval method which rather than relying on question-document similarity, depends on the salient entities within the question to identify the retrieval documents. We conduct an in-depth analysis of the performance of both dense and sparse retrieval methods in comparison to Entity Retrieval. Our findings reveal that our method not only leads to more accurate answers to entity-centric questions but also operates more efficiently.
The large-matrix limit laws of the rescaled largest eigenvalue of the orthogonal, unitary and symplectic $n$-dimensional Gaussian ensembles -- and of the corresponding Laguerre ensembles (Wishart distributions) for various regimes of the parameter $\alpha$ (degrees of freedom $p$) -- are known to be the Tracy-Widom distributions $F_\beta$ ($\beta=1,2,4$). We will establish (paying particular attention to large, or small, ratios $p/n$) that, with careful choices of the rescaling constants and the expansion parameter $h$, the limit laws embed into asymptotic expansions in powers of $h$, where $h \asymp n^{-2/3}$ resp. $h \asymp (n\,\wedge\,p)^{-2/3}$. We find explicit analytic expressions of the first few expansions terms as linear combinations, with rational polynomial coefficients, of higher order derivatives of the limit law $F_\beta$. With a proper parametrization, the expansions in the Gaussian cases can be understood, for given $n$, as the limit $p\to\infty$ of the Laguerre cases. Whereas the results for $\beta=2$ are presented with proof, the discussion of the cases $\beta=1,4$ is based on some hypotheses, focussing on the algebraic aspects of actually computing the polynomial coefficients. For the purposes of illustration and validation, the various results are checked against simulation data with large sample sizes.
Selecting $k$ out of $m$ items based on the preferences of $n$ heterogeneous agents is a widely studied problem in algorithmic game theory. If agents have approval preferences over individual items and harmonic utility functions over bundles -- an agent receives $\sum_{j=1}^t\frac{1}{j}$ utility if $t$ of her approved items are selected -- then welfare optimisation is captured by a voting rule known as Proportional Approval Voting (PAV). PAV also satisfies demanding fairness axioms. However, finding a winning set of items under PAV is NP-hard. In search of a tractable method with strong fairness guarantees, a bounded local search version of PAV was proposed by Aziz et al. It proceeds by starting with an arbitrary size-$k$ set $W$ and, at each step, checking if there is a pair of candidates $a\in W$, $b\not\in W$ such that swapping $a$ and $b$ increases the total welfare by at least $\varepsilon$; if yes, it performs the swap. Aziz et al.~show that setting $\varepsilon=\frac{n}{k^2}$ ensures both the desired fairness guarantees and polynomial running time. However, they leave it open whether the algorithm converges in polynomial time if $\varepsilon$ is very small (in particular, if we do not stop until there are no welfare-improving swaps). We resolve this open question, by showing that if $\varepsilon$ can be arbitrarily small, the running time of this algorithm may be super-polynomial. Specifically, we prove a lower bound of~$\Omega(k^{\log k})$ if improvements are chosen lexicographically. To complement our lower bound, we provide an empirical comparison of two variants of local search -- better-response and best-response -- on several real-life data sets and a variety of synthetic data sets. Our experiments indicate that, empirically, better response exhibits faster running time than best response.
Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.