Classical convergence analyses for optimization algorithms rely on the widely-adopted uniform smoothness assumption. However, recent experimental studies have demonstrated that many machine learning problems exhibit non-uniform smoothness, meaning the smoothness factor is a function of the model parameter instead of a universal constant. In particular, it has been observed that the smoothness grows with respect to the gradient norm along the training trajectory. Motivated by this phenomenon, the recently introduced $(L_0, L_1)$-smoothness is a more general notion, compared to traditional $L$-smoothness, that captures such positive relationship between smoothness and gradient norm. Under this type of non-uniform smoothness, existing literature has designed stochastic first-order algorithms by utilizing gradient clipping techniques to obtain the optimal $\mathcal{O}(\epsilon^{-3})$ sample complexity for finding an $\epsilon$-approximate first-order stationary solution. Nevertheless, the studies of quasi-Newton methods are still lacking. Considering higher accuracy and more robustness for quasi-Newton methods, in this paper we propose a fast stochastic quasi-Newton method when there exists non-uniformity in smoothness. Leveraging gradient clipping and variance reduction, our algorithm can achieve the best-known $\mathcal{O}(\epsilon^{-3})$ sample complexity and enjoys convergence speedup with simple hyperparameter tuning. Our numerical experiments show that our proposed algorithm outperforms the state-of-the-art approaches.
We examine the numerical approximation of time-dependent Hamilton-Jacobi equations on networks, providing a convergence error estimate for the semi-Lagrangian scheme introduced in (Carlini and Siconolfi, 2023), where convergence was proven without an error estimate. We derive a convergence error estimate of order one-half. This is achieved showing the equivalence between two definitions of solutions to this problem proposed in (Imbert and Monneau, 2017) and (Siconolfi, 2022), a result of independent interest, and applying a general convergence result from (Carlini, Festa and Forcadel, 2020).
In Industry 4.0 systems, a considerable number of resource-constrained Industrial Internet of Things (IIoT) devices engage in frequent data interactions due to the necessity for model training, which gives rise to concerns pertaining to security and privacy. In order to address these challenges, this paper considers a digital twin (DT) and blockchain-assisted federated learning (FL) scheme. To facilitate the FL process, we initially employ fog devices with abundant computational capabilities to generate DT for resource-constrained edge devices, thereby aiding them in local training. Subsequently, we formulate an FL delay minimization problem for FL, which considers both of model transmission time and synchronization time, also incorporates cooperative jamming to ensure secure synchronization of DT. To address this non-convex optimization problem, we propose a decomposition algorithm. In particular, we introduce upper limits on the local device training delay and the effects of aggregation jamming as auxiliary variables, thereby transforming the problem into a convex optimization problem that can be decomposed for independent solution. Finally, a blockchain verification mechanism is employed to guarantee the integrity of the model uploading throughout the FL process and the identities of the participants. The final global model is obtained from the verified local and global models within the blockchain through the application of deep learning techniques. The efficacy of our proposed cooperative interference-based FL process has been verified through numerical analysis, which demonstrates that the integrated DT blockchain-assisted FL scheme significantly outperforms the benchmark schemes in terms of execution time, block optimization, and accuracy.
Our focus is on simulating the dynamics of non-interacting particles, which, under certain assumptions, can be formally described by the Dean-Kawasaki equation. The Dean-Kawasaki equation can be solved numerically using standard finite volume methods. However, the numerical approximation implicitly requires a sufficiently large number of particles to ensure the positivity of the solution and accurate approximation of the stochastic flux. To address this challenge, we extend hybrid algorithms for particle systems to scenarios where the density is low. The aim is to create a hybrid algorithm that switches from a finite volume discretization to a particle-based method when the particle density falls below a certain threshold. We develop criteria for determining this threshold by comparing higher-order statistics obtained from the finite volume method with particle simulations. We then demonstrate the use of the resulting criteria for dynamic adaptation in both two- and three-dimensional spatial settings.
Semi-supervised learning (SSL) offers a robust framework for harnessing the potential of unannotated data. Traditionally, SSL mandates that all classes possess labeled instances. However, the emergence of open-world SSL (OwSSL) introduces a more practical challenge, wherein unlabeled data may encompass samples from unseen classes. This scenario leads to misclassification of unseen classes as known ones, consequently undermining classification accuracy. To overcome this challenge, this study revisits two methodologies from self-supervised and semi-supervised learning, self-labeling and consistency, tailoring them to address the OwSSL problem. Specifically, we propose an effective framework called OwMatch, combining conditional self-labeling and open-world hierarchical thresholding. Theoretically, we analyze the estimation of class distribution on unlabeled data through rigorous statistical analysis, thus demonstrating that OwMatch can ensure the unbiasedness of the self-label assignment estimator with reliability. Comprehensive empirical analyses demonstrate that our method yields substantial performance enhancements across both known and unknown classes in comparison to previous studies. Code is available at //github.com/niusj03/OwMatch.
We show through numerical simulation that the Quantum Approximate Optimization Algorithm (QAOA) for higher-order, random-coefficient, heavy-hex compatible spin glass Ising models has strong parameter concentration across problem sizes from $16$ up to $127$ qubits for $p=1$ up to $p=5$, which allows for straight-forward transfer learning of QAOA angles on instance sizes where exhaustive grid-search is prohibitive even for $p>1$. We use Matrix Product State (MPS) simulation at different bond dimensions to obtain confidence in these results, and we obtain the optimal solutions to these combinatorial optimization problems using CPLEX. In order to assess the ability of current noisy quantum hardware to exploit such parameter concentration, we execute short-depth QAOA circuits (with a CNOT depth of 6 per $p$, resulting in circuits which contain $1420$ two qubit gates for $127$ qubit $p=5$ QAOA) on $100$ higher-order (cubic term) Ising models on IBM quantum superconducting processors with $16, 27, 127$ qubits using QAOA angles learned from a single $16$-qubit instance. We show that (i) the best quantum processors generally find lower energy solutions up to $p=3$ for 27 qubit systems and up to $p=2$ for 127 qubit systems and are overcome by noise at higher values of $p$, (ii) the best quantum processors find mean energies that are about a factor of two off from the noise-free numerical simulation results. Additional insights from our experiments are that large performance differences exist among different quantum processors even of the same generation and that dynamical decoupling significantly improve performance for some, but decrease performance for other quantum processors. Lastly we show $p=1$ QAOA angle mean energy landscapes computed using up to a $414$ qubit quantum computer, showing that the mean QAOA energy landscapes remain very similar as the problem size changes.
Amplification by subsampling is one of the main primitives in machine learning with differential privacy (DP): Training a model on random batches instead of complete datasets results in stronger privacy. This is traditionally formalized via mechanism-agnostic subsampling guarantees that express the privacy parameters of a subsampled mechanism as a function of the original mechanism's privacy parameters. We propose the first general framework for deriving mechanism-specific guarantees, which leverage additional information beyond these parameters to more tightly characterize the subsampled mechanism's privacy. Such guarantees are of particular importance for privacy accounting, i.e., tracking privacy over multiple iterations. Overall, our framework based on conditional optimal transport lets us derive existing and novel guarantees for approximate DP, accounting with R\'enyi DP, and accounting with dominating pairs in a unified, principled manner. As an application, we analyze how subsampling affects the privacy of groups of multiple users. Our tight mechanism-specific bounds outperform tight mechanism-agnostic bounds and classic group privacy results.
Event detection (ED), a sub-task of event extraction, involves identifying triggers and categorizing event mentions. Existing methods primarily rely upon supervised learning and require large-scale labeled event datasets which are unfortunately not readily available in many real-life applications. In this paper, we consider and reformulate the ED task with limited labeled data as a Few-Shot Learning problem. We propose a Dynamic-Memory-Based Prototypical Network (DMB-PN), which exploits Dynamic Memory Network (DMN) to not only learn better prototypes for event types, but also produce more robust sentence encodings for event mentions. Differing from vanilla prototypical networks simply computing event prototypes by averaging, which only consume event mentions once, our model is more robust and is capable of distilling contextual information from event mentions for multiple times due to the multi-hop mechanism of DMNs. The experiments show that DMB-PN not only deals with sample scarcity better than a series of baseline models but also performs more robustly when the variety of event types is relatively large and the instance quantity is extremely small.
Due to their inherent capability in semantic alignment of aspects and their context words, attention mechanism and Convolutional Neural Networks (CNNs) are widely applied for aspect-based sentiment classification. However, these models lack a mechanism to account for relevant syntactical constraints and long-range word dependencies, and hence may mistakenly recognize syntactically irrelevant contextual words as clues for judging aspect sentiment. To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. Based on it, a novel aspect-specific sentiment classification framework is raised. Experiments on three benchmarking collections illustrate that our proposed model has comparable effectiveness to a range of state-of-the-art models, and further demonstrate that both syntactical information and long-range word dependencies are properly captured by the graph convolution structure.
Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detection. The Convolutional Neural Networks (CNNs) are trained and fine-tuned for each ALPR stage so that they are robust under different conditions (e.g., variations in camera, lighting, and background). Specially for character segmentation and recognition, we design a two-stage approach employing simple data augmentation tricks such as inverted License Plates (LPs) and flipped characters. The resulting ALPR approach achieved impressive results in two datasets. First, in the SSIG dataset, composed of 2,000 frames from 101 vehicle videos, our system achieved a recognition rate of 93.53% and 47 Frames Per Second (FPS), performing better than both Sighthound and OpenALPR commercial systems (89.80% and 93.03%, respectively) and considerably outperforming previous results (81.80%). Second, targeting a more realistic scenario, we introduce a larger public dataset, called UFPR-ALPR dataset, designed to ALPR. This dataset contains 150 videos and 4,500 frames captured when both camera and vehicles are moving and also contains different types of vehicles (cars, motorcycles, buses and trucks). In our proposed dataset, the trial versions of commercial systems achieved recognition rates below 70%. On the other hand, our system performed better, with recognition rate of 78.33% and 35 FPS.
Image segmentation is considered to be one of the critical tasks in hyperspectral remote sensing image processing. Recently, convolutional neural network (CNN) has established itself as a powerful model in segmentation and classification by demonstrating excellent performances. The use of a graphical model such as a conditional random field (CRF) contributes further in capturing contextual information and thus improving the segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of CNN and CRF. We use multiple spectral cubes to learn deep features using CNN, and then formulate deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied in order to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and experimented our proposed method on it along with several widely adopted benchmark datasets to evaluate the effectiveness of our method. By comparing our results with those from several state-of-the-art models, we show the promising potential of our method.