In 2017, Hughes claimed an equivalence between Tjur's $R^2$ coefficient of discrimination and the Youden index for assessing diagnostic test performance on $2\times 2$ contingency tables. We prove an impossibility result for such an equivalence when averaging over binary outcomes (0s and 1s) under any continuous real-valued scoring rule. Our findings clarify the limits of any such equivalence and highlight the distinct roles these metrics play in diagnostic test assessment.
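For concreteness, the two quantities at issue are easy to compute from predicted probabilities and binary labels. The sketch below is only illustrative (the data values and the 0.5 dichotomization threshold are hypothetical, not taken from the paper): Tjur's coefficient of discrimination is the difference in mean predicted probability between observed 1s and 0s, and the Youden index is sensitivity plus specificity minus one.

```python
import numpy as np

def tjur_r2(y, p):
    """Tjur's coefficient of discrimination: mean predicted probability
    among the observed 1s minus the mean among the observed 0s."""
    y, p = np.asarray(y), np.asarray(p)
    return p[y == 1].mean() - p[y == 0].mean()

def youden_index(y, p, threshold=0.5):
    """Youden's J = sensitivity + specificity - 1 after dichotomizing at a threshold."""
    y, pred = np.asarray(y), np.asarray(p) >= threshold
    sensitivity = pred[y == 1].mean()      # true positive rate
    specificity = (~pred[y == 0]).mean()   # true negative rate
    return sensitivity + specificity - 1

# Toy example (hypothetical values).
y = np.array([0, 0, 1, 1, 1, 0])
p = np.array([0.2, 0.4, 0.7, 0.9, 0.6, 0.3])
print(tjur_r2(y, p), youden_index(y, p))
```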
In this paper, we develop an iterative method, based on the Bartels-Stewart algorithm, to solve $N$-dimensional matrix equations; it relies on the Schur decomposition of the matrices involved. Unlike other possible implementations of that algorithm, ours avoids recursion and makes efficient use of the available computational resources, which makes it possible to consider arbitrarily large problems, up to the size of the available memory. In this respect, we have successfully solved matrix equations in up to $N = 29$ dimensions. We explain all the steps carefully and calculate accurately the computational cost required. Furthermore, to ease understanding, we offer both pseudocodes and full Matlab codes, and special emphasis is put on making the implementation completely independent of the number of dimensions. As an important application, the method allows computing the solution of linear $N$-dimensional systems of ODEs with constant coefficients at any time $t$ and, hence, of evolutionary PDEs, after discretizing the spatial derivatives by means of matrices. In this regard, we are able to compute with great accuracy the solution of an advection-diffusion equation on $\mathbb R^N$.
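To fix ideas, the two-dimensional building block of this approach is the classical Bartels-Stewart solve of a Sylvester equation $AX + XB = C$ via Schur decompositions. The sketch below (in Python/NumPy rather than the paper's Matlab, and calling SciPy's Sylvester solver on the reduced system instead of spelling out the back-substitution) only illustrates the reduction; it is not the $N$-dimensional, recursion-free implementation the paper describes.

```python
import numpy as np
from scipy.linalg import schur, solve_sylvester

# Two-dimensional instance of the matrix equation A X + X B = C (toy data).
rng = np.random.default_rng(0)
n, m = 5, 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# Bartels-Stewart: reduce A and B to (quasi-)triangular Schur form,
# solve the transformed equation, and transform back.
TA, QA = schur(A)                # A = QA @ TA @ QA.T
TB, QB = schur(B)                # B = QB @ TB @ QB.T
F = QA.T @ C @ QB                # right-hand side in Schur coordinates
Y = solve_sylvester(TA, TB, F)   # solve TA Y + Y TB = F
X = QA @ Y @ QB.T                # back-transform to the original coordinates

print(np.allclose(A @ X + X @ B, C))   # True
```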
We introduce a pebble game extended by backtracking options for one of the two players (called Prover) and reduce the provability of the pigeonhole principle for a generic predicate $R$ in the bounded arithmetic $T^2_2(R)$ to the existence of a particular kind of winning strategy (called oblivious) for Prover in the game. While the unprovability of the said principle in $T^2_2(R)$ is an immediate consequence of a celebrated theorem of Ajtai (which deals with the stronger theory $T_2(R)$), to date no methods that work for $T^2_2(R)$ directly (in particular, without the switching lemma) are known. Although the full analysis of the introduced pebble game is left open, as a first step towards resolving it we restrict ourselves to a simplified version of the game. In this case, Prover can use only two pebbles and move in an extremely oblivious way. Moreover, a series of backtracks can be made only once during a play. Under these assumptions, we show that no strategy of Prover can be winning.
Current parameter-efficient fine-tuning (PEFT) methods for LLMs can achieve high quality, efficient training, or scalable serving, but not all three simultaneously. To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Building on this key insight, we propose a family of Structured Sparse Fine-Tuning (S$^{2}$FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. S$^{2}$FT accomplishes this by "selecting sparsely and computing densely". It selects a few heads and channels in the MHA and FFN modules, respectively, for each Transformer block. Next, it co-permutes the weight matrices on both sides of the coupled structures in LLMs to connect the selected components in each layer into a dense submatrix. Finally, S$^{2}$FT performs in-place gradient updates on all submatrices. Through theoretical analysis and empirical results, we show that our method prevents forgetting while simplifying optimization, delivers SOTA performance on both commonsense and arithmetic reasoning with 4.6% and 1.3% average improvements over LoRA, and surpasses full FT by 11.5% when generalizing to various domains after instruction tuning. Using our partial backpropagation algorithm, S$^{2}$FT saves training memory by up to 3$\times$ and improves latency by 1.5-2.7$\times$ compared to full FT, while delivering an average 10% improvement over LoRA on both metrics. We further demonstrate that the weight updates in S$^{2}$FT can be decoupled into adapters, enabling effective fusion, fast switching, and efficient parallelism for serving multiple fine-tuned models.
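The "select sparsely, compute densely" idea can be sketched on a single FFN block: a few hidden channels are selected, the up- and down-projection weights are co-permuted so those channels form one contiguous dense block, and only that block receives gradients. The PyTorch sketch below is schematic only; the sizes, the random selection rule, and the plain tensor forward pass are assumptions for illustration, not the authors' implementation.

```python
import torch

d_model, d_ff, n_sel = 16, 64, 8                 # hypothetical sizes
W_up = torch.randn(d_ff, d_model)                # FFN up-projection
W_down = torch.randn(d_model, d_ff)              # FFN down-projection

# 1. Select a few hidden channels (here at random; the paper's rule may differ)
#    and build a permutation that moves them to the front.
sel = torch.randperm(d_ff)[:n_sel]
rest = torch.tensor([i for i in range(d_ff) if i not in set(sel.tolist())])
perm = torch.cat([sel, rest])

# 2. Co-permute the coupled weights so the selected channels form one dense block:
#    rows of the up-projection and the matching columns of the down-projection.
up_train = W_up[perm[:n_sel]].clone().requires_grad_(True)        # trainable block
up_frozen = W_up[perm[n_sel:]].clone()
down_train = W_down[:, perm[:n_sel]].clone().requires_grad_(True) # trainable block
down_frozen = W_down[:, perm[n_sel:]].clone()

# 3. Compute densely: the forward pass is unchanged, but only the two
#    sub-blocks receive gradients.
def ffn(x):
    h = torch.relu(torch.cat([x @ up_train.T, x @ up_frozen.T], dim=-1))
    return h[:, :n_sel] @ down_train.T + h[:, n_sel:] @ down_frozen.T

loss = ffn(torch.randn(4, d_model)).pow(2).mean()
loss.backward()    # up_train.grad and down_train.grad are dense; the rest stays frozen
```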
The number of independent sets in regular bipartite expander graphs can be efficiently approximated by expressing it as the partition function of a suitable polymer model and truncating its cluster expansion. While this approach has been extensively used for graphs, surprisingly little is known about analogous questions in the context of hypergraphs. In this work, we apply this method to asymptotically determine the number of independent sets in regular $k$-partite $k$-uniform hypergraphs which satisfy natural expansion properties. The resulting formula depends only on the local structure of the hypergraph, making it computationally efficient. In particular, we provide a simple closed-form expression for linear hypergraphs.
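The quantity being approximated is the number of independent sets of a hypergraph, i.e. vertex subsets that contain no edge entirely. The brute-force count below, on a made-up 3-uniform, 3-partite example, is only meant to pin down this definition; the cluster-expansion method above exists precisely to avoid such exponential enumeration.

```python
from itertools import combinations

# Toy 3-uniform, 3-partite hypergraph on parts {0,1,2}, {3,4,5}, {6,7,8}
# (a hypothetical example; each edge takes one vertex from each part).
vertices = list(range(9))
edges = [{0, 3, 6}, {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {1, 5, 6}]

def count_independent_sets(vertices, edges):
    """Count subsets of vertices that contain no edge entirely (brute force)."""
    count = 0
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            s = set(subset)
            if not any(e <= s for e in edges):
                count += 1
    return count

print(count_independent_sets(vertices, edges))
```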
Intelligent robots need to interact with diverse objects across various environments. The appearance and state of objects frequently undergo complex transformations that depend on the object properties, e.g., phase transitions. However, in the vision community, segmenting dynamic objects that undergo phase transitions has been overlooked. In light of this, we introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes. We then present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases; it consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios and provides dense instance mask annotations that capture both object phases and their transitions. We evaluate state-of-the-art methods on M$^3$-VOS, yielding several key insights. Notably, current appearance-based approaches show significant room for improvement when handling objects with phase transitions. The inherent changes in disorder suggest that the predictive performance on the forward, entropy-increasing process can be improved through a reverse, entropy-reducing process. These findings lead us to propose ReVOS, a new plug-and-play model that improves its performance through reversal refinement. Our data and code will be publicly available at //zixuan-chen.github.io/M-cubeVOS.github.io/.
We characterise the behaviour of the maximum Diaconis-Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We derive the estimator's aggregate asymptotic behaviour in this proportional asymptotic regime when the covariates are independent normal random variables with mean zero and the linear predictor has asymptotic variance $\gamma^2$. From this foundation, we devise adjusted $Z$-statistics and penalized likelihood ratio statistics, and obtain aggregate asymptotic results for arbitrary covariate covariance. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis-Ylvisaker prior penalized likelihood estimate not only always exists but is also directly computable using maximum likelihood routines. Thus, our asymptotic results also hold for $(\kappa, \gamma)$ values where results for maximum likelihood are not attainable, with no overhead in implementation or computation. We study the estimator's shrinkage properties, compare it to alternative estimation methods that can operate under proportional asymptotics, and present procedures for estimating the unknown constants that describe the asymptotic behaviour of our estimator. We also provide a conjecture about the behaviour of our estimator when an intercept parameter is present in the model. We present results from extensive numerical studies to demonstrate the theoretical advances and strong evidence to support the conjecture, and we illustrate the methodology through the analysis of a real-world data set on digit recognition.
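The proportional regime itself is easy to simulate: draw $p = \kappa n$ Gaussian covariates and scale the coefficients so the linear predictor has variance $\gamma^2$. In the sketch below, the unpenalized MLE may fail to exist (the data can be linearly separable), and a ridge-penalized fit is used purely as a generic stand-in for a penalized-likelihood estimator; it is not the Diaconis-Ylvisaker adjustment studied in the paper, and all constants are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Proportional asymptotic regime: p = kappa * n covariates (illustrative values).
rng = np.random.default_rng(1)
n, kappa, gamma = 2000, 0.3, 2.0
p = int(kappa * n)
X = rng.standard_normal((n, p))                 # independent N(0, 1) covariates
beta = np.full(p, gamma / np.sqrt(p))           # Var(x_i' beta) = gamma^2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

# Generic penalized fit (ridge), standing in for a penalized-likelihood estimator.
fit = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
print(fit.coef_[0][:5])
```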
Clustering is one of the staples of data analysis and unsupervised learning. As such, clustering algorithms are often used on massive data sets, and they need to be extremely fast. We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering. For these, the go-to algorithm is $k$-means++, which yields an $O(\log k)$-approximation in time $\tilde O(nkd)$. While it is possible to improve either the approximation factor [Lattanzi and Sohler, ICML19] or the running time [Cohen-Addad et al., NeurIPS 20], it is unknown how precise a linear-time algorithm can be. In this paper, we almost answer this question by presenting an almost linear-time algorithm to compute a constant-factor approximation.
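For reference, the $k$-means++ baseline mentioned above seeds the centers by $D^2$ sampling (each new center is drawn with probability proportional to the squared distance to the nearest center chosen so far) and then runs Lloyd's iterations. A minimal NumPy sketch of the seeding step, which accounts for the $\tilde O(nkd)$ running time, might look as follows; it is not the new almost linear-time algorithm of the paper.

```python
import numpy as np

def kmeans_pp_seeding(X, k, rng=None):
    """k-means++ seeding: pick each new center with probability proportional
    to the squared distance to the nearest center chosen so far."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                    # first center chosen uniformly
    d2 = np.sum((X - centers[0]) ** 2, axis=1)        # squared distances to nearest center
    for _ in range(k - 1):
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])   # D^2 sampling
        d2 = np.minimum(d2, np.sum((X - centers[-1]) ** 2, axis=1))
    return np.array(centers)

X = np.random.default_rng(1).standard_normal((1000, 2))
print(kmeans_pp_seeding(X, k=5))
```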
Tensor data are multi-dimensional arrays. Low-rank decomposition-based regression methods with tensor predictors exploit the structural information in tensor predictors while significantly reducing the number of parameters in tensor regression. We propose a method named NA$_0$CT$^2$ (Noise Augmentation for $\ell_0$ regularization on Core Tensor in Tucker decomposition) to regularize the parameters in tensor regression (TR), coupled with Tucker decomposition. We establish theoretically that NA$_0$CT$^2$ achieves exact $\ell_0$ regularization on the core tensor from the Tucker decomposition in both linear TR and generalized linear TR. To our knowledge, NA$_0$CT$^2$ is the first Tucker decomposition-based regularization method in TR to achieve $\ell_0$ regularization on core tensors. NA$_0$CT$^2$ is implemented through an iterative procedure that involves two straightforward steps in each iteration: generating noisy data based on the core tensor from the Tucker decomposition of the updated parameter estimate, and running a regular GLM on the noise-augmented data with vectorized predictors. We demonstrate the implementation of NA$_0$CT$^2$ and its $\ell_0$ regularization effect in both simulation studies and real data applications. The results suggest that NA$_0$CT$^2$ can improve predictions compared to other decomposition-based TR approaches, with or without regularization, and that it identifies important predictors even though it is not designed for that purpose.
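A schematic of the outer two-step loop, for the linear TR case, is sketched below using tensorly's Tucker decomposition and an ordinary least-squares fit. The noise-generation rule shown is only a placeholder, since the abstract does not specify the construction that yields the exact $\ell_0$ effect on the core tensor; all sizes and data are made up.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, shape, ranks, n_aug = 200, (4, 4, 4), (2, 2, 2), 50
X = rng.standard_normal((n, *shape))           # tensor predictors (toy data)
y = rng.standard_normal(n)                     # response (toy data)
Xvec = X.reshape(n, -1)

# Initial estimate from a regular (vectorized) linear model.
B = LinearRegression().fit(Xvec, y).coef_.reshape(shape)

for it in range(5):
    # Step 1: Tucker-decompose the current coefficient tensor and generate
    # noise-augmented data from its core (placeholder rule, for illustration only).
    core, factors = tucker(tl.tensor(B), rank=ranks)
    scale = float(np.abs(tl.to_numpy(core)).mean())
    X_aug = rng.standard_normal((n_aug, Xvec.shape[1])) * scale
    y_aug = np.zeros(n_aug)
    # Step 2: run a regular (G)LM on the noise-augmented, vectorized data.
    B = LinearRegression().fit(np.vstack([Xvec, X_aug]),
                               np.concatenate([y, y_aug])).coef_.reshape(shape)
```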
We introduce and study a purely syntactic notion of lax cones and $(\infty,\infty)$-limits on finite computads in \texttt{CaTT}, a type theory for $(\infty,\infty)$-categories due to Finster and Mimram. Conveniently, finite computads are precisely the contexts in \texttt{CaTT}. We define a cone over a context to be a context, obtained by induction over the list of variables of the underlying context. In the case where the underlying context is globular, we give an explicit description of the cone and conjecture that an analogous description continues to hold for general contexts. We use the cone to control the types of the term constructors for the universal cone. The implementation of the universal property follows a similar line of ideas: starting with a cone as a context, a set of context-extension rules produces a context with the shape of a transfor between cones, i.e.~a higher morphism between cones. As in the case of cones, we use this context as a template to control the types of the term constructors required for the universal property.
Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proved to be important, and several works take fields into account in their models. In this paper, we propose a novel approach to model the field information effectively and efficiently. The proposed approach is a direct improvement of FwFM and is named Field-matrixed Factorization Machines (FmFM, or $FM^2$). We also propose a new explanation of FM and FwFM within the FmFM framework and compare it with FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of the embedding vectors, which acts as soft pruning. We also propose an efficient way to minimize the dimensions while preserving model performance. The FmFM model can be optimized further by caching the intermediate vectors, so that it takes only thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that it can outperform FFM, which is more complex. The FmFM model's performance is also comparable to that of DNN models, which require many more FLOPs at runtime.
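As we read the abstract (and the FwFM lineage it builds on), the core FmFM interaction term assigns each feature an embedding and each field pair a matrix that transforms one embedding before taking the inner product with the other. The sketch below illustrates that pairwise term only; the dimensions, data, and the use of a common embedding size (rather than field-specific sizes) are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fields, dim = 3, 4                              # hypothetical sizes
field_of = [0, 1, 2]                              # field of each active feature
v = rng.standard_normal((3, dim))                 # embeddings of the active features
x = np.array([1.0, 1.0, 0.5])                     # feature values
M = rng.standard_normal((n_fields, n_fields, dim, dim))   # one matrix per field pair

# FmFM-style pairwise interaction: transform v_i with the field-pair matrix
# M[F(i), F(j)] before taking the inner product with v_j.
score = 0.0
for i in range(3):
    for j in range(i + 1, 3):
        Mij = M[field_of[i], field_of[j]]
        score += x[i] * x[j] * (v[i] @ Mij) @ v[j]
print(score)
```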