We study the problem of $(\epsilon,\delta)$-certified machine unlearning for minimax models. Most existing work focuses on unlearning from standard statistical learning models that have a single variable, and the unlearning steps hinge on the conventional, direct Hessian-based Newton update. We develop a new $(\epsilon,\delta)$-certified machine unlearning algorithm for minimax models. The algorithm uses a minimax unlearning step consisting of a total-Hessian-based complete Newton update and the Gaussian mechanism borrowed from differential privacy. To obtain the unlearning certification, our method injects calibrated Gaussian noise by carefully analyzing the "sensitivity" of the minimax unlearning step (i.e., the closeness between the minimax unlearning variables and the retraining-from-scratch variables). We derive the generalization rates in terms of population strong and weak primal-dual risk for three different cases of loss functions, i.e., (strongly-)convex-(strongly-)concave losses. We also provide the deletion capacity to guarantee that a desired population risk can be maintained as long as the number of deleted samples does not exceed the derived amount. With $n$ training samples and model dimension $d$, the deletion capacity is of order $\mathcal O(n/d^{1/4})$, which shows a strict gap over the baseline method of differentially private minimax learning, which has $\mathcal O(n/d^{1/2})$. In addition, our rates of generalization and deletion capacity match the state-of-the-art rates derived previously for standard statistical learning models.
Maximal clique enumeration (MCE) is crucial for tasks like community detection and biological network analysis. Existing algorithms typically adopt the branch-and-bound framework with the vertex-oriented Bron-Kerbosch (BK) branching strategy, which forms the sub-branches by expanding the partial clique with a vertex. In this paper, we present a novel approach called HBBMC, a hybrid framework combining vertex-oriented BK branching and edge-oriented BK branching, where the latter adopts a branch-and-bound framework which forms the sub-branches by expanding the partial clique with an edge. This hybrid strategy enables more effective pruning and helps achieve a worst-case time complexity better than the best known one under a condition that holds for the majority of real-world graphs. To further enhance efficiency, we introduce an early termination technique, which leverages the topological information of the graphs and constructs the maximal cliques directly without branching. Our early termination technique is applicable to all branch-and-bound frameworks. Extensive experiments demonstrate the superior performance of our techniques.
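As a point of reference for the vertex-oriented BK branching described in the abstract above, the following is a minimal sketch of the classical Bron-Kerbosch recursion with pivoting. It is the textbook baseline, not HBBMC itself: the edge-oriented branching and early termination are not shown, and the names `bron_kerbosch`, `expand`, and the example `graph` are illustrative assumptions.

```python
from typing import Dict, List, Set


def bron_kerbosch(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Enumerate maximal cliques with vertex-oriented BK branching and pivoting."""
    cliques: List[Set[int]] = []

    def expand(partial: Set[int], candidates: Set[int], excluded: Set[int]) -> None:
        # A partial clique is maximal when no vertex can extend it.
        if not candidates and not excluded:
            cliques.append(set(partial))
            return
        # Pivot on the vertex covering the most candidates, to prune branches.
        pivot = max(candidates | excluded, key=lambda u: len(adj[u] & candidates))
        # Each sub-branch expands the partial clique with one vertex.
        for v in list(candidates - adj[pivot]):
            expand(partial | {v}, candidates & adj[v], excluded & adj[v])
            candidates.remove(v)
            excluded.add(v)

    expand(set(), set(adj), set())
    return cliques


# Hypothetical example graph: a triangle 0-1-2 with a path 2-3-4 attached.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
cliques = bron_kerbosch(graph)
```

On this example the recursion reports the triangle and the two path edges as the maximal cliques.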
We demonstrate the effectiveness of simple observer-based linear feedback policies for "pixels-to-torques" control of robotic systems using only a robot-facing camera. Specifically, we show that the matrices of an image-based Luenberger observer (linear state estimator) for a "student" output-feedback policy can be learned from demonstration data provided by a "teacher" state-feedback policy via simple linear-least-squares regression. The resulting linear output-feedback controller maps directly from high-dimensional raw images to torques while being amenable to the rich set of analytical tools from linear systems theory, allowing us to enforce closed-loop stability constraints in the learning problem. We also investigate a nonlinear extension of the method via the Koopman embedding. Finally, we demonstrate the surprising effectiveness of linear pixels-to-torques policies on a cartpole system, both in simulation and on real hardware. The policy successfully executes both stabilizing and swing-up trajectory-tracking tasks using only camera feedback while subject to model mismatch, process and sensor noise, perturbations, and occlusions. Open-source code for all experiments can be found here: //roboticexplorationlab.org/projects/linear_pixels_to_torques.html
With the explosive growth of available training data, single-image 3D human modeling is on the verge of a transition to a data-centric paradigm. A key to successfully exploiting data scale is to design flexible models that can be supervised from various heterogeneous data sources produced by different researchers or vendors. To this end, we propose a simple yet powerful paradigm for seamlessly unifying different human pose and shape-related tasks and datasets. Our formulation is centered on the ability -- both at training and test time -- to query any arbitrary point of the human volume, and obtain its estimated location in 3D. We achieve this by learning a continuous neural field of body point localizer functions, each of which is a differently parameterized 3D heatmap-based convolutional point localizer (detector). For generating parametric output, we propose an efficient post-processing step for fitting SMPL-family body models to nonparametric joint and vertex predictions. With this approach, we can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them, and thereby train large-scale 3D human mesh and skeleton estimation models that considerably outperform the state-of-the-art on several public benchmarks including 3DPW, EMDB, EHF, SSP-3D and AGORA.
Bias studies on multilingual models confirm the presence of gender-related stereotypes in masked models processing languages with high NLP resources. We expand on this line of research by introducing Filipino CrowS-Pairs and Filipino WinoQueer: benchmarks that assess both sexist and anti-queer biases in pretrained language models (PLMs) handling texts in Filipino, a low-resource language from the Philippines. The benchmarks consist of 7,074 new challenge pairs resulting from our cultural adaptation of English bias evaluation datasets, a process that we document in detail to guide similar forthcoming efforts. We apply the Filipino benchmarks to masked and causal multilingual models, including those pretrained on Southeast Asian data, and find that they contain considerable amounts of bias. We also find that for multilingual models, the extent of bias learned for a particular language is influenced by how much pretraining data in that language a model was exposed to. Our benchmarks and insights can serve as a foundation for future work analyzing and mitigating bias in multilingual models.
As a paradigm of distributed machine learning, federated learning typically requires all edge devices to train a complete model locally. However, with the increasing scale of artificial intelligence models, the limited resources on edge devices often become a bottleneck for efficient fine-tuning. To address this challenge, federated split learning (FedSL) implements collaborative training across the edge devices and the server through model splitting. In this paper, we propose a lightweight FedSL scheme that further alleviates the training burden on resource-constrained edge devices by pruning the client-side model dynamically and using quantized gradient updates to reduce computation overhead. Additionally, we apply random dropout to the activation values at the split layer to reduce communication overhead. We conduct theoretical analysis to quantify the convergence performance of the proposed scheme. Finally, simulation results verify the effectiveness and advantages of the proposed lightweight FedSL in wireless network environments.
Out-of-distribution (OOD) detection is crucial for the deployment of machine learning models in the open world. While existing OOD detectors are effective in identifying OOD samples that deviate significantly from in-distribution (ID) data, they often come with trade-offs. For instance, deep OOD detectors usually suffer from high computational costs, require tuning hyperparameters, and have limited interpretability, whereas traditional OOD detectors may have low accuracy on large high-dimensional datasets. To address these limitations, we propose a novel and effective OOD detection approach that employs an overlap index (OI)-based confidence score function to evaluate the likelihood of a given input belonging to the same distribution as the available ID samples. The proposed OI-based confidence score function is non-parametric, lightweight, and easy to interpret, hence providing strong flexibility and generality. Extensive empirical evaluations indicate that our OI-based OOD detector is competitive with state-of-the-art OOD detectors in terms of detection accuracy on a wide range of datasets while requiring lower computation and memory costs. Lastly, we show that the proposed OI-based confidence score function inherits nice properties from OI (e.g., insensitivity to small distributional variations and robustness against Huber $\epsilon$-contamination) and is a versatile tool for estimating OI and model accuracy in specific contexts.
Federated learning (FL) enables collaborative model training through model parameter exchanges instead of raw data. To guard against potential inference attacks on the exchanged parameters, differential privacy (DP) offers a rigorous guarantee against various attacks. However, conventional methods of ensuring DP by adding local noise alone often result in low training accuracy. Combining secure multi-party computation (SMPC) with DP, while improving the accuracy, incurs high communication and computation overheads and straggler vulnerability, in either client-to-server or client-to-client links. In this paper, we propose LightDP-FL, a novel lightweight scheme that ensures provable DP against untrusted peers and server, while maintaining straggler-resilience, low overheads and high training accuracy. Our approach incorporates both individual and pairwise noise into each client's parameter, which can be implemented with minimal overheads. Given the uncertain straggler and colluder sets, we utilize the upper bound on the numbers of stragglers and colluders to prove sufficient noise variance conditions to ensure DP in the worst case. Moreover, we optimize the expected convergence bound to ensure accuracy performance by flexibly controlling the noise variances. Using the CIFAR-10 dataset, our experimental results demonstrate that LightDP-FL achieves faster convergence and stronger straggler resilience than baseline methods at the same DP level.
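To make the idea of combining individual and pairwise noise in the abstract above concrete, here is a toy sketch for scalar client parameters. It assumes the pairwise terms are antisymmetric (client i adds +r and client j adds -r), as in common secure-aggregation masking, so that they cancel in the server-side sum; the abstract does not spell out this construction, and the function `noisy_updates` and its parameters are hypothetical.

```python
import random
from typing import List


def noisy_updates(params: List[float], sigma_ind: float, sigma_pair: float,
                  seed: int = 0) -> List[float]:
    """Perturb each client's scalar parameter with individual and pairwise noise."""
    rng = random.Random(seed)
    n = len(params)
    # Individual Gaussian noise: stays in the aggregate.
    noisy = [p + rng.gauss(0.0, sigma_ind) for p in params]
    # Pairwise Gaussian noise (assumed antisymmetric): cancels when all clients report.
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.gauss(0.0, sigma_pair)
            noisy[i] += r
            noisy[j] -= r
    return noisy


client_params = [0.2, -0.1, 0.4, 0.3]
# With no individual noise, only the cancelling pairwise terms are added.
masked = noisy_updates(client_params, sigma_ind=0.0, sigma_pair=1.0)
aggregate_error = abs(sum(masked) - sum(client_params))
```

In this toy setting each reported value is heavily perturbed, yet the aggregate is preserved up to floating-point error when no client drops out. Handling stragglers and colluders, which the abstract addresses through the noise variance conditions, is outside the scope of this sketch.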
We propose a novel method that solves global optimization problems in two steps: (1) apply an (exponential) power-$N$ transformation to the not necessarily differentiable objective function $f$ to obtain $f_N$, and (2) optimize the Gaussian-smoothed $f_N$ with stochastic approximations. Under mild conditions on $f$, for any $\delta>0$, we prove that with a sufficiently large power $N_\delta$, this method converges to a solution in the $\delta$-neighborhood of $f$'s global maximum point. The convergence rate is $O(d^2\sigma^4\varepsilon^{-2})$, which is faster than both the standard and single-loop homotopy methods. Extensive experiments show that our method requires significantly fewer iterations than the compared algorithms to produce a high-quality solution.
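The two steps in the abstract above can be illustrated with a one-dimensional sketch. It assumes the exponential form $f_N(x) = \exp(N f(x))$ and uses a self-normalized Monte Carlo estimate of the smoothed gradient, which corresponds to ascent on the logarithm of the Gaussian-smoothed $f_N$; that normalization, the test objective `f`, and all hyperparameter values are assumptions made here for numerical stability and illustration, not details taken from the paper.

```python
import math
import random
from typing import Callable


def f(x: float) -> float:
    # Hypothetical non-smooth objective: global maximum at x = 1, local maximum at x = -2.
    return max(-(x - 1.0) ** 2, -0.5 - (x + 2.0) ** 2)


def power_smoothed_ascent(f: Callable[[float], float], mu0: float, power: float = 5.0,
                          sigma: float = 1.0, lr: float = 0.5, n_samples: int = 200,
                          iters: int = 300, seed: int = 0) -> float:
    """Stochastic ascent on the Gaussian-smoothed f_N(x) = exp(power * f(x))."""
    rng = random.Random(seed)
    mu = mu0
    for _ in range(iters):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        # Step (1): power transformation, kept in log space for now.
        scores = [power * f(mu + sigma * e) for e in eps]
        top = max(scores)  # subtract the maximum before exponentiating
        weights = [math.exp(s - top) for s in scores]
        # Step (2): self-normalized estimate of the smoothed ascent direction.
        direction = sum(w * e for w, e in zip(weights, eps)) / sum(weights)
        mu += lr * sigma * direction
    return mu


# Start at the local maximum to see whether the iterate moves toward the global one.
solution = power_smoothed_ascent(f, mu0=-2.0)
```

Raising `power` concentrates the transformed objective around the global maximum, while `sigma` controls how far the sampling looks beyond the current iterate.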
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
It is important to detect anomalous inputs when deploying machine learning systems. The use of larger and more complex inputs in deep learning magnifies the difficulty of distinguishing between anomalous and in-distribution examples. At the same time, diverse image and text data are available in enormous quantities. We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). This enables anomaly detectors to generalize and detect unseen anomalies. In extensive experiments on natural language processing and small- and large-scale vision tasks, we find that Outlier Exposure significantly improves detection performance. We also observe that cutting-edge generative models trained on CIFAR-10 may assign higher likelihoods to SVHN images than to CIFAR-10 images; we use OE to mitigate this issue. We also analyze the flexibility and robustness of Outlier Exposure, and identify characteristics of the auxiliary dataset that improve performance.
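For the classification setting, one common way to train against an auxiliary outlier dataset is to add a term that pushes the predictions on outliers toward the uniform distribution. The sketch below illustrates that form of objective for a single in-distribution example and a single outlier; the weighting `lam = 0.5`, the function names, and the example logits are assumptions for illustration, and the abstract above does not state the exact loss.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_norm for z in logits]


def oe_loss(in_logits: List[float], in_label: int, out_logits: List[float],
            lam: float = 0.5) -> float:
    """Cross-entropy on an in-distribution example plus a uniformity term on an outlier."""
    ce_in = -log_softmax(in_logits)[in_label]
    # Cross-entropy between the uniform distribution and the prediction on the outlier.
    ce_uniform = -sum(log_softmax(out_logits)) / len(out_logits)
    return ce_in + lam * ce_uniform


# A confident prediction on the outlier is penalized more than a flat one.
flat = oe_loss([2.0, 0.0, 0.0], 0, [0.0, 0.0, 0.0])
peaked = oe_loss([2.0, 0.0, 0.0], 0, [5.0, 0.0, 0.0])
```

The uniformity term is smallest when the outlier logits are all equal, which is what encourages low-confidence outputs on anomalous inputs.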