In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fast streaming algorithms that use sublinear space and a small number of passes for these problems. These algorithms can also be extended to the massively parallel computation model. Our work, while not directly comparable, complements the seminal work of Domingos and Hulten (KDD 2000).
Inspired by recent progress in dynamic programming approaches for weighted model counting, we investigate a dynamic-programming approach in the context of boolean realizability and synthesis, which takes a conjunctive-normal-form boolean formula over input and output variables, and aims at synthesizing witness functions for the output variables in terms of the inputs. We show how graded project-join trees, obtained via tree decomposition, can be used to compute a BDD representing the realizability set for the input formulas in a bottom-up order. We then show how the intermediate BDDs generated during realizability checking phase can be applied to synthesizing the witness functions in a top-down manner. An experimental evaluation of a solver -- DPSynth -- based on these ideas demonstrates that our approach for Boolean realizabilty and synthesis has superior time and space performance over a heuristics-based approach using same symbolic representations. We discuss the advantage on scalability of the new approach, and also investigate our findings on the performance of the DP framework.
This work proposes novel approaches that jointly design user equipment (UE) association and power control (PC) in a downlink user-centric cell-free massive multiple-input multiple-output (CFmMIMO) network, where each UE is only served by a set of access points (APs) for reducing the fronthaul signalling and computational complexity. In order to maximize the sum spectral efficiency (SE) of the UEs, we formulate a mixed-integer nonconvex optimization problem under constraints on the per-AP transmit power, quality-of-service rate requirements, maximum fronthaul signalling load, and maximum number of UEs served by each AP. In order to solve the formulated problem efficiently, we propose two different schemes according to the different sizes of the CFmMIMO systems. For small-scale CFmMIMO systems, we present a successive convex approximation (SCA) method to obtain a stationary solution and also develop a learning-based method (JointCFNet) to reduce the computational complexity. For large-scale CFmMIMO systems, we propose a low-complexity suboptimal algorithm using accelerated projected gradient (APG) techniques. Numerical results show that our JointCFNet can yield similar performance and significantly decrease the run time compared with the SCA algorithm in small-scale systems. The presented APG approach is confirmed to run much faster than the SCA algorithm in the large-scale system while obtaining an SE performance close to that of the SCA approach. Moreover, the median sum SE of the APG method is up to about 2.8 fold higher than that of the heuristic baseline scheme.
When optimizing machine learning models, there are various scenarios where gradient computations are challenging or even infeasible. Furthermore, in reinforcement learning (RL), preference-based RL that only compares between options has wide applications, including reinforcement learning with human feedback in large language models. In this paper, we systematically study optimization of a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$ only assuming an oracle that compares function values at two points and tells which is larger. When $f$ is convex, we give two algorithms using $\tilde{O}(n/\epsilon)$ and $\tilde{O}(n^{2})$ comparison queries to find an $\epsilon$-optimal solution, respectively. When $f$ is nonconvex, our algorithm uses $\tilde{O}(n/\epsilon^2)$ comparison queries to find an $\epsilon$-approximate stationary point. All these results match the best-known zeroth-order algorithms with function evaluation queries in $n$ dependence, thus suggest that \emph{comparisons are all you need for optimizing smooth functions using derivative-free methods}. In addition, we also give an algorithm for escaping saddle points and reaching an $\epsilon$-second order stationary point of a nonconvex $f$, using $\tilde{O}(n^{1.5}/\epsilon^{2.5})$ comparison queries.
This study presents an integrated approach for advancing functional Near-Infrared Spectroscopy (fNIRS) neuroimaging through the synthesis of data and application of machine learning models. By addressing the scarcity of high-quality neuroimaging datasets, this work harnesses Monte Carlo simulations and parametric head models to generate a comprehensive synthetic dataset, reflecting a wide spectrum of conditions. We developed a containerized environment employing Docker and Xarray for standardized and reproducible data analysis, facilitating meaningful comparisons across different signal processing modalities. Additionally, a cloud-based infrastructure is established for scalable data generation and processing, enhancing the accessibility and quality of neuroimaging data. The combination of synthetic data generation with machine learning techniques holds promise for improving the accuracy, efficiency, and applicability of fNIRS tomography, potentially revolutionizing diagnostics and treatment strategies for neurological conditions. The methodologies and infrastructure developed herein set new standards in data simulation and analysis, paving the way for future research in neuroimaging and the broader biomedical engineering field.
In this work we consider the HYBRID model of distributed computing, introduced recently by Augustine, Hinnenthal, Kuhn, Scheideler, and Schneider (SODA 2020), where nodes have access to two different communication modes: high-bandwidth local communication along the edges of the graph and low-bandwidth all-to-all communication, capturing the non-uniform nature of modern communication networks. Prior work in HYBRID has focused on showing existentially optimal algorithms, meaning there exists a pathological family of instances on which no algorithm can do better. This neglects the fact that such worst-case instances often do not appear or can be actively avoided in practice. In this work, we focus on the notion of universal optimality, first raised by Garay, Kutten, and Peleg (FOCS 1993). Roughly speaking, a universally optimal algorithm is one that, given any input graph, runs as fast as the best algorithm designed specifically for that graph. We show the first universally optimal algorithms in HYBRID. We present universally optimal solutions for fundamental information dissemination tasks, such as broadcasting and unicasting multiple messages in HYBRID. Furthermore, we apply these tools to obtain universally optimal solutions for various shortest paths problems in HYBRID. A main conceptual contribution of this work is the conception of a new graph parameter called neighborhood quality that captures the inherent complexity of many fundamental graph problems in HYBRID. We also show new existentially optimal shortest paths algorithms in HYBRID, which are utilized as key subroutines in our universally optimal algorithms and are of independent interest. Our new algorithms for $k$-source shortest paths match the existing $\tilde{\Omega}(\sqrt{k})$ lower bound for all $k$. Previously, the lower bound was only known to be tight when $k \in \tilde{\Omega}(n^{2/3})$.
In this paper, we study a remote monitoring system where a receiver observes a remote binary Markov source and decides whether to sample and transmit the state through a randomly delayed channel. We adopt uncertainty of information (UoI), defined as the entropy conditional on past observations at the receiver, as a metric of value of information, in contrast to the traditional state-agnostic nonlinear age of information (AoI) penalty functions. To address the limitations of prior UoI research that assumes one-time-slot delays, we extend our analysis to scenarios with random delays. We model the problem as a partially observable Markov decision process (POMDP) problem and simplify it to a semi-Markov decision process (SMDP) by introducing the belief state. We propose two algorithms: A globally optimal bisection relative value iteration (bisec-RVI) algorithm and a computationally efficient sub-optimal index-based threshold algorithm to solve the long-term average UoI minimization problem. Numerical simulations demonstrate that our sampling policies surpass traditional zero wait and AoI-optimal policies, particularly under conditions of large delay, with the sub-optimal policy nearly matching the performance of the optimal one.
Society's capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed without fully understanding their potential impacts. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field. We first introduce precise vocabulary for discussing interpretability, like the distinction between glass box and explainable algorithms. We then explore connections to classical statistical and design principles, like parsimony and the gulfs of interaction. Basic explainability techniques -- including learned embeddings, integrated gradients, and concept bottlenecks -- are illustrated with a simple case study. We also review criteria for objectively evaluating interpretability approaches. Throughout, we underscore the importance of considering audience goals when designing interactive algorithmic systems. Finally, we outline open challenges and discuss the potential role of data science in addressing them. Code to reproduce all examples can be found at //go.wisc.edu/3k1ewe.
In this paper, we investigate the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the feature dimension $d$ and the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{\smash[b]{T/K}})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{\mathcal{O}}(d\sqrt{\smash[b]{T/K}})$. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{\mathcal{O}}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the MNL contextual bandit literature to prove minimax optimality -- for either uniform or non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that uses complex exponential activation functions. Our proof is constructive and proceeds by conducting a careful complexity analysis associated with the approximation of a Fourier features residual network by a ReLU network.
In the era of deep learning, modeling for most NLP tasks has converged to several mainstream paradigms. For example, we usually adopt the sequence labeling paradigm to solve a bundle of tasks such as POS-tagging, NER, Chunking, and adopt the classification paradigm to solve tasks like sentiment analysis. With the rapid progress of pre-trained language models, recent years have observed a rising trend of Paradigm Shift, which is solving one NLP task by reformulating it as another one. Paradigm shift has achieved great success on many tasks, becoming a promising way to improve model performance. Moreover, some of these paradigms have shown great potential to unify a large number of NLP tasks, making it possible to build a single model to handle diverse tasks. In this paper, we review such phenomenon of paradigm shifts in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.