We consider a wireless communication system with a passive eavesdropper, in which a transmitter and a legitimate receiver generate and use secret-key bits to secure the transmission of their data. These bits are added to and drawn from a pool of available key bits. In this work, we analyze the reliability of the system in terms of the probability that the budget of available key bits will be exhausted. In addition, we investigate the latency before a transmission can take place. Since security, reliability, and latency are three important metrics for modern communication systems, it is of great interest to analyze them jointly in relation to the system parameters. In particular, we show under what conditions the system may remain in an active state indefinitely, i.e., never run out of available secret-key bits. The results presented in this work allow system designers to adjust the system parameters so that the application's requirements in terms of both reliability and latency are met.
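As a toy illustration of the exhaustion analysis, the following sketch simulates a key-bit pool under assumed dynamics (the parameters `gen_bits`, `use_bits`, and `p_tx` are illustrative, not the paper's model). When the average generation rate `gen_bits` exceeds the average consumption rate `p_tx * use_bits`, the pool has positive drift and can remain active indefinitely; otherwise exhaustion is almost certain.

```python
import random

def prob_key_exhaustion(init_bits=100, gen_bits=3, use_bits=5, p_tx=0.5,
                        horizon=5_000, trials=1_000, seed=0):
    """Monte Carlo estimate of the probability that the key-bit pool is
    exhausted within `horizon` slots. Each slot adds `gen_bits` fresh
    secret-key bits; with probability `p_tx` a transmission consumes
    `use_bits` bits (illustrative dynamics only)."""
    rng = random.Random(seed)
    exhausted = 0
    for _ in range(trials):
        pool = init_bits
        for _ in range(horizon):
            pool += gen_bits
            if rng.random() < p_tx:
                pool -= use_bits
            if pool < 0:            # budget of key bits exhausted
                exhausted += 1
                break
    return exhausted / trials

# Drift = gen_bits - p_tx * use_bits: +0.5 below (rarely exhausted),
# -0.5 with gen_bits=2 (exhaustion is almost certain).
print(prob_key_exhaustion(gen_bits=3))
print(prob_key_exhaustion(gen_bits=2))
```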
Large Language Models (LLMs) are reshaping the research landscape in artificial intelligence, particularly as model parameters scale up significantly, unlocking remarkable capabilities across various domains. Nevertheless, scaling model parameters faces constraints from limits on GPU memory and computational speed. To address these constraints, various weight-compression methods have emerged, such as pruning and quantization. Given the low-rank nature of weight matrices in language models, reducing weights through matrix decomposition holds significant promise. In this paper, drawing upon the intrinsic structure of LLMs, we propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices. Notably, our method requires no additional corpus and remains orthogonal to pruning and quantization methods, so it can be combined with them. We prune 80% of the parameters while retaining 93.43% of the original performance without any calibration data. Additionally, we explore the fundamental properties of LLM weight matrices under Rank-k Approximation and conduct comprehensive experiments to support our hypothesis.
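For intuition on the rank-k idea, here is a minimal PyTorch sketch of a plain truncated-SVD rank-k approximation of one weight matrix; the joint, data-free aspects of the proposed method are not reproduced, and the sizes are illustrative.

```python
import torch

def rank_k_approx(W: torch.Tensor, k: int):
    """Best rank-k approximation of W (Eckart-Young), returned as two
    factors so that W ~ A @ B; storing A (m, k) and B (k, n) costs
    k * (m + n) numbers instead of m * n."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]   # singular values folded into the left factor
    B = Vh[:k, :]
    return A, B

W = torch.randn(1024, 1024)
A, B = rank_k_approx(W, k=64)
err = torch.linalg.matrix_norm(W - A @ B) / torch.linalg.matrix_norm(W)
ratio = (A.numel() + B.numel()) / W.numel()
print(f"relative Frobenius error: {err:.3f}, parameter ratio: {ratio:.3f}")
```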
The rapid progress in machine learning in recent years has been based on a highly productive connection to gradient-based optimization. Further progress hinges in part on a shift in focus from pattern recognition to decision-making and multi-agent problems. In these broader settings, new mathematical challenges emerge that involve equilibria and game theory instead of optima. Gradient-based methods remain essential -- given the high dimensionality and large scale of machine-learning problems -- but simple gradient descent is no longer the point of departure for algorithm design. We provide a gentle introduction to a broader framework for gradient-based algorithms in machine learning, beginning with saddle points and monotone games, and proceeding to general variational inequalities. While we provide convergence proofs for several of the algorithms that we present, our main focus is on providing motivation and intuition.
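A classic example of why simple gradient descent is not the point of departure here: on the bilinear saddle-point problem min_x max_y xy, simultaneous gradient descent-ascent spirals away from the equilibrium (0, 0), while the extragradient method, one of the algorithms in this broader family, converges to it. A minimal sketch:

```python
def gda(x, y, eta=0.2, steps=200):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y

def extragradient(x, y, eta=0.2, steps=200):
    """Extragradient: step to a lookahead point, then update with the
    gradient evaluated there."""
    for _ in range(steps):
        xh, yh = x - eta * y, y + eta * x      # lookahead (half) step
        x, y = x - eta * yh, y + eta * xh      # corrected update
    return x, y

print("GDA:          ", gda(1.0, 1.0))            # spirals outward
print("extragradient:", extragradient(1.0, 1.0))  # contracts to (0, 0)
```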
Research in the area of secure multi-party computation using the unconventional method of a physical deck of playing cards began in 1989, when den Boer proposed a protocol to compute the logical AND function using five cards. Since then, the area has gained interest from many researchers, and several card-based protocols to compute various functions have been developed. In this paper, we propose a card-based protocol called the overwriting protocol that can securely compute the $k$-candidate $n$-variable equality function $f: \{0,1,\ldots ,k-1\}^n \rightarrow \{0,1\}$. We also apply the technique used in this protocol to compute other similar functions.
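For reference, the equality function itself is easy to state; the sketch below computes it in the clear just to pin down the definition (the protocol's contribution is computing it without revealing the inputs).

```python
def equality(xs, k):
    """k-candidate n-variable equality function
    f: {0, ..., k-1}^n -> {0, 1}: returns 1 iff all inputs are equal."""
    assert all(0 <= x < k for x in xs)
    return int(len(set(xs)) == 1)

print(equality([2, 2, 2], k=3))  # 1
print(equality([0, 1, 0], k=3))  # 0
```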
We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators, a rich family of layer types that includes all standard convolutional and dense layers. We provide the first clipping method that is correct for general convolution layers, and we illuminate the representational limitation that caused correctness issues in prior work. We study the effect of batch normalization layers when concatenated with convolutional layers and show how our clipping method can be applied to their composition. By comparing the accuracy and performance of our algorithms to state-of-the-art methods in various experiments, we show that they are more precise and efficient and lead to better generalization and adversarial robustness. We provide the code for using our methods at https://github.com/Ali-E/FastClip.
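To illustrate the "implicitly linear" view, the sketch below estimates the largest singular value of a convolution layer by power iteration, using automatic differentiation to apply the transpose of the operator; this conveys the general idea rather than the paper's exact clipping algorithm, and shapes and iteration counts are illustrative.

```python
import torch
import torch.nn.functional as F

def conv_spectral_norm(weight, in_shape, iters=50):
    """Estimate sigma_max of the linear operator x -> conv2d(x, weight)
    by power iteration on J^T J, applying J^T via autodiff."""
    x = torch.randn(1, *in_shape, requires_grad=True)
    for _ in range(iters):
        y = F.conv2d(x, weight, padding=1)            # J x
        # Vector-Jacobian product gives J^T y without materializing J.
        (xt,) = torch.autograd.grad(y, x, grad_outputs=y.detach())
        x = (xt / xt.norm()).detach().requires_grad_(True)
    return F.conv2d(x, weight, padding=1).norm().item()

w = torch.randn(8, 3, 3, 3) * 0.2   # 8 output channels, 3 input, 3x3 kernel
print(conv_spectral_norm(w, in_shape=(3, 32, 32)))
```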
Sequences of linear systems arise in the predictor-corrector method when computing the Pareto front for multi-objective optimization. Rather than discarding the information generated when solving one system, it may be advantageous to recycle it for subsequent systems. To accomplish this, we seek to reduce the overall cost of solving the sequence of linear systems using common recycling methods. In this work, we assessed the performance of the recycling minimum residual (RMINRES) method along with a map between coefficient matrices. For these methods to be fully integrated into the software used in Enouen et al. (2022), there must be a working version of each in both Python and PyTorch. Herein, we discuss the challenges we encountered, and the solutions undertaken (some still ongoing), in developing efficient Python implementations of these recycling strategies. The goal of this project was to implement RMINRES in Python and PyTorch and add it to the established Pareto front code to reduce computational cost. Additionally, we wanted to implement the sparse approximate maps code in Python and PyTorch so that it can be parallelized in future work.
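A far simpler form of reuse than RMINRES deflation, but in the same spirit, is warm-starting each solve with the previous solution. The SciPy sketch below (with an assumed slowly varying symmetric positive definite sequence; all sizes are illustrative) shows iteration counts dropping across nearby systems:

```python
import numpy as np
from scipy.sparse.linalg import minres

rng = np.random.default_rng(0)
n = 500
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)

x_prev = np.zeros(n)                   # recycled information: last solution
for t in range(5):
    A_t = A + 0.01 * t * np.eye(n)     # slowly varying coefficient matrix
    it = [0]
    def cb(xk):                        # count MINRES iterations
        it[0] += 1
    x_prev, info = minres(A_t, b, x0=x_prev, callback=cb)
    print(f"system {t}: {it[0]} iterations (info={info})")
```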
The performance of stochastic gradient descent (SGD), the simplest first-order optimizer for training deep neural networks, depends not only on the learning rate but also on the batch size. Both affect the number of iterations and the stochastic first-order oracle (SFO) complexity needed for training. In particular, previous numerical results indicated that, for SGD with a constant learning rate, the number of iterations needed for training decreases as the batch size increases, while the SFO complexity is minimized at a critical batch size and increases once the batch size exceeds it. Here, we study the relationship between the batch size and the iteration and SFO complexities of nonconvex optimization in deep learning with SGD using constant or decaying learning rates, and we show that SGD with the critical batch size minimizes the SFO complexity. We also provide numerical comparisons of SGD with existing first-order optimizers that demonstrate the usefulness of SGD with a critical batch size. Moreover, we show that measured critical batch sizes are close to the sizes estimated from our theoretical results.
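To make the critical-batch-size picture concrete, suppose (as an illustrative assumption, not the paper's constants) that the number of iterations needed behaves like K(b) = A*b / (C*b - D). Then K(b) decreases in b, while the SFO complexity N(b) = b*K(b) has an interior minimum at the critical batch size b* = 2D/C:

```python
import numpy as np

A, C, D = 1e4, 1.0, 64.0             # illustrative model constants
b = np.arange(80, 2049)              # batch sizes (b > D/C)
K = A * b / (C * b - D)              # iterations: decreasing in b
N = b * K                            # SFO complexity: convex in b

print("numeric minimizer:", b[np.argmin(N)])   # 128
print("closed form 2D/C: ", 2 * D / C)         # 128.0
```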
Quantum entanglement is a key enabling ingredient in diverse applications. However, the presence of unwanted adversarial entanglement also poses challenges in many applications. In this paper, we explore methods to "break" quantum entanglement. Specifically, we construct a dimension-independent k-partite disentangler-like channel from bipartite unentangled input. We show: for every $d,\ell\ge k$, there is an efficient channel $\Lambda: \mathbb{C}^{d^\ell} \otimes \mathbb{C}^{d^\ell} \to \mathbb{C}^{d^k}$ such that for every bipartite separable state $\rho_1\otimes \rho_2$, the output $\Lambda(\rho_1\otimes\rho_2)$ is close to a k-partite separable state. Concretely, for some distribution $\mu$ on states from $\mathbb{C}^d$, $$ \left\|\Lambda(\rho_1 \otimes \rho_2) - \int | \psi \rangle \langle \psi |^{\otimes k} \, d\mu(\psi)\right\|_1 \le \tilde O \left(\left(\frac{k^{3}}{\ell}\right)^{1/4}\right). $$ Moreover, $\Lambda(| \psi \rangle \langle \psi |^{\otimes \ell}\otimes | \psi \rangle \langle \psi |^{\otimes \ell}) = | \psi \rangle \langle \psi |^{\otimes k}$. Without the bipartite unentanglement assumption, the above bound is conjectured to be impossible. Leveraging our disentanglers, we show that unentangled quantum proofs of almost general real amplitudes capture NEXP, greatly relaxing the nonnegative-amplitudes assumption in the recent work QMA^+(2)=NEXP. Specifically, our findings show that to capture NEXP, it suffices to have unentangled proofs of the form $| \psi \rangle = \sqrt{a} | \psi_+ \rangle + \sqrt{1-a} | \psi_- \rangle$, where $| \psi_+ \rangle$ has only nonnegative amplitudes, $| \psi_- \rangle$ has only negative amplitudes, and $| a-(1-a) | \ge 1/\mathrm{poly}(n)$ with $a \in [0,1]$. Additionally, we present a protocol achieving almost the largest possible gap before obtaining QMA^R(k)=NEXP; namely, a $1/\mathrm{poly}(n)$ additive improvement to the gap results in this equality.
Traditional rigid endoscopes face challenges in flexibly treating tumors located deep in the brain, and their low operability and fixed viewing angles limit their development. This study introduces MicroNeuro, a novel dual-segment flexible robotic endoscope designed to perform biopsies with dexterous surgical manipulation deep in the brain. To account for the uncertainty of the control model, an image-based visual servoing scheme with online estimation of the robot Jacobian has been implemented to enhance motion accuracy. Furthermore, the application of model predictive control with constraints significantly bolsters the flexible robot's ability to adaptively track mobile objects and resist external interference. Experimental results underscore that the proposed control system enhances motion stability and precision. Phantom testing substantiates its considerable potential for deployment in neurosurgery.
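The flavor of online Jacobian estimation in visual servoing can be sketched with a Broyden-type rank-one update on a toy two-degree-of-freedom system; the feature map, gain, and dimensions below are illustrative stand-ins, not MicroNeuro's kinematics.

```python
import numpy as np

def features(q):   # unknown joint-to-image-feature map (ground truth)
    return np.array([np.sin(q[0]) + 0.5 * q[1], q[0] * q[1] + q[1]])

q = np.array([0.3, -0.2])
s_goal = np.array([0.6, 0.1])
J = np.eye(2)                              # rough initial Jacobian guess

for _ in range(100):
    e = features(q) - s_goal
    dq = -0.2 * np.linalg.pinv(J) @ e      # proportional servo law
    ds = features(q + dq) - features(q)    # observed feature motion
    # Broyden rank-one update keeps J consistent with the latest motion.
    J += np.outer(ds - J @ dq, dq) / (dq @ dq + 1e-12)
    q = q + dq

print("final feature error:", np.linalg.norm(features(q) - s_goal))
```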
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their power in processing graph-structured data. Typical GCNs and their variants work under a homophily assumption (i.e., nodes with the same class are prone to connect to each other), while ignoring the heterophily that exists in many real-world networks (i.e., nodes with different classes tend to form edges). Existing methods deal with heterophily mainly by aggregating higher-order neighborhoods or combining intermediate representations, which introduces noise and irrelevant information into the result. These methods do not change the propagation mechanism itself, which works under the homophily assumption and is a fundamental part of GCNs; this makes it difficult to distinguish the representations of nodes from different classes. To address this problem, we design a novel propagation mechanism that can automatically adjust the propagation and aggregation process according to the homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measures of homophily degree between node pairs, learned from topological and attribute information, respectively. We then incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end fashion, enabling it to go beyond the homophily assumption. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms state-of-the-art methods under heterophily or low homophily and achieves competitive performance under homophily.
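A minimal sketch of such an adaptive propagation step: a learned per-edge homophily score gates whether a neighbor's message is added (homophilous) or subtracted (heterophilous). The layer below is illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AdaptivePropagation(nn.Module):
    """Per-edge homophily score g in [0, 1] mixes 'attract' and 'repel'
    aggregation: the weight (2g - 1) approaches +1 for homophilous edges
    and -1 for heterophilous ones."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, edge_index):
        src, dst = edge_index                                # (2, E) edges
        g = self.score(torch.cat([x[src], x[dst]], dim=-1))  # (E, 1)
        msg = (2 * g - 1) * x[src]
        out = torch.zeros_like(x).index_add_(0, dst, msg)    # aggregate
        return torch.relu(self.lin(x + out))

x = torch.randn(5, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
print(AdaptivePropagation(16)(x, edge_index).shape)          # (5, 16)
```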
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
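A minimal sketch of the shared-span-representation idea: a single encoder produces span embeddings that entity, relation, and coreference heads all consume, so the three tasks share features and gradients (dimensions and heads are illustrative, not SciIE's actual architecture).

```python
import torch
import torch.nn as nn

class SharedSpanModel(nn.Module):
    def __init__(self, hidden=256, n_entity_types=6, n_relations=8):
        super().__init__()
        self.encoder = nn.LSTM(300, hidden // 2, bidirectional=True,
                               batch_first=True)
        self.entity_head = nn.Linear(2 * hidden, n_entity_types)
        self.relation_head = nn.Bilinear(2 * hidden, 2 * hidden, n_relations)
        self.coref_head = nn.Bilinear(2 * hidden, 2 * hidden, 1)

    def forward(self, tokens, spans):
        h, _ = self.encoder(tokens)                  # (1, T, hidden)
        # Shared span embedding: [start token; end token] states.
        s = torch.cat([h[0, spans[:, 0]], h[0, spans[:, 1]]], dim=-1)
        return (self.entity_head(s),                 # one label per span
                self.relation_head(s[:1], s[1:2]),   # span pair 0-1
                self.coref_head(s[:1], s[1:2]))      # coref score 0-1

tokens = torch.randn(1, 20, 300)                     # one 20-token sentence
spans = torch.tensor([[2, 4], [7, 9]])               # two candidate spans
ent, rel, coref = SharedSpanModel()(tokens, spans)
print(ent.shape, rel.shape, coref.shape)
```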