We outline an unsupervised method for the temporal rank ordering of sets of historical documents, namely American State of the Union Addresses and DEEDS, a corpus of medieval English property transfer documents. Our method relies on capturing the gradual change in word usage via a bandwidth estimate for non-parametric Generalized Linear Models (Fan, Heckman, and Wand, 1995). The number of possible rank orders that must be searched when optimizing a bandwidth-related cost function can be quite large, even for a small set of documents. We tackle this combinatorial optimization problem with the Simulated Annealing algorithm, which allows us to obtain optimal temporal orders of the documents. Our rank ordering method significantly improved the temporal sequencing of both corpora compared to a randomly sequenced baseline. This unsupervised approach should enable the temporal ordering of undated document sets.
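For illustration, a minimal sketch of a simulated-annealing search over document orderings might look as follows; the `bandwidth_cost` callback standing in for the paper's GLM bandwidth criterion, as well as the swap proposal and geometric cooling schedule, are assumptions made for the sketch rather than the authors' exact setup.

```python
import math
import random

def simulated_annealing(docs, bandwidth_cost, n_iter=10000, t0=1.0, cooling=0.999):
    """Search for a document ordering that minimizes a bandwidth-based cost.

    `bandwidth_cost(order)` is assumed to return the cost of a candidate
    temporal ordering (lower is better); its exact form would follow the
    paper's non-parametric GLM bandwidth criterion, not reproduced here.
    """
    order = list(docs)
    random.shuffle(order)                            # random initial ordering
    best, best_cost = order[:], bandwidth_cost(order)
    cost, temp = best_cost, t0
    for _ in range(n_iter):
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        new_cost = bandwidth_cost(order)
        # accept improvements always; accept worse moves with Boltzmann probability
        if new_cost < cost or random.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
        temp *= cooling                              # geometric cooling
    return best, best_cost
```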
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into potential future directions concerning the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and superalignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points. This task is of key importance, for instance, in private data generation. We present two different approaches. The first consists of privately estimating the empirical quantiles of the samples and using the result as an estimator of the quantiles of the distribution. In particular, we study the statistical properties of the recently published algorithm of Kaplan et al. (2022), which privately estimates the quantiles recursively. The second approach is to use density estimation techniques to estimate the quantile function uniformly over an interval. We show that there is a tradeoff between the two methods: when many quantiles are to be estimated, it is better to estimate the density than to estimate the quantile function at specific points.
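As an illustration of the first approach, here is a minimal sketch of a single-quantile estimator based on the standard exponential mechanism over the gaps between sorted samples; this is a textbook construction rather than the recursive algorithm of Kaplan et al. (2022), and the data bounds `a`, `b` and the per-quantile budget `eps` are assumptions of the sketch.

```python
import numpy as np

def private_quantile(samples, q, eps, a, b):
    """Exponential-mechanism estimate of the q-th quantile of `samples`.

    Data are assumed to lie in [a, b]; `eps` is the privacy budget spent on
    this single quantile. The utility of an interval between consecutive
    order statistics is minus its rank distance from the target rank.
    """
    x = np.clip(np.sort(samples), a, b)
    x = np.concatenate(([a], x, [b]))           # interval endpoints
    n = len(samples)
    m = int(q * n)                              # target rank
    widths = np.diff(x)                         # lengths of the n+1 intervals
    ranks = np.arange(n + 1)
    utility = -np.abs(ranks - m)
    # exponential mechanism: weight each interval by width * exp(eps * utility / 2)
    logw = np.log(np.maximum(widths, 1e-12)) + eps * utility / 2.0
    probs = np.exp(logw - logw.max())
    probs /= probs.sum()
    i = np.random.choice(n + 1, p=probs)
    return np.random.uniform(x[i], x[i + 1])    # uniform point in the chosen interval
```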
This study proposes a method for distilling the knowledge of fine-tuned Large Language Models (LLMs) into a smaller, more efficient, and accurate neural network, specifically targeting the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model using the prediction probabilities of the LLM, which serves as the teacher model. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, ensuring that the student model closely mimics the teacher's performance. To test this approach, we utilized a large dataset, 7T, containing 6,684 student-written responses to science questions, and three other datasets with student-written responses. We also compared performance with original neural network (NN) models to validate the accuracy. Results show that the NN and distilled student models have accuracy comparable to the teacher model on the 7T dataset; on the other datasets, however, the NN shows significantly lower accuracy (28% on average), while our proposed distilled model still achieves 12% higher accuracy than the NN. Furthermore, the student model size ranges from 0.02M to 0.1M parameters, 100 times smaller in terms of parameters and ten times smaller than the original models' output size. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.
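A minimal sketch of the kind of distillation loss described above, written in PyTorch; the temperature, the mixing weight between soft and hard targets, and the classification setting are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, labels, T=2.0, alpha=0.5):
    """Combine a soft-target KL term (teacher probabilities) with hard-label CE.

    `teacher_probs` are the fine-tuned LLM's output probabilities per class;
    `student_logits` are the small model's raw scores; `labels` are gold classes.
    """
    # soften the teacher distribution with temperature T (p^(1/T), renormalized)
    soft_teacher = torch.clamp(teacher_probs, min=1e-8) ** (1.0 / T)
    soft_teacher = soft_teacher / soft_teacher.sum(dim=-1, keepdim=True)
    # match the temperature-scaled student distribution to it via KL divergence
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)   # standard hard-label loss
    return alpha * kd + (1 - alpha) * ce
```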
The year 2022 saw a significant increase in Microsoft vulnerabilities, reaching the highest level of the past decade. With new vulnerabilities constantly emerging, there is an urgent need for proactive approaches to harden systems and protect them from potential cyber threats. This project aims to investigate the vulnerabilities of the Windows Operating System and explore the effectiveness of key security features such as BitLocker, Microsoft Defender, and Windows Firewall in addressing these threats. To achieve this, various security threats are simulated in controlled environments using coded examples, allowing for a thorough evaluation of the security solutions' effectiveness. Based on the results, this study will provide recommendations for mitigation strategies to enhance system security and strengthen the protection provided by Windows security features. By identifying potential weaknesses and areas for improvement in the Windows security infrastructure, this project will contribute to the development of more robust and resilient security solutions that can better safeguard systems against emerging cyber threats.
We study ways of evaluating the performance of losing projects in participatory budgeting (PB) elections by seeking actions that would have led to their victory. We focus on lowering the projects' costs, obtaining additional approvals for them, and asking supporters to refrain from approving other projects: the larger the change needed, the less successful the given project. We seek efficient algorithms for computing our measures, and we analyze and compare them experimentally, considering the greedyAV, Phragm\'en, and Equal-Shares PB rules.
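For concreteness, here is a minimal sketch of one common variant of the greedyAV rule (fund projects in order of approval count, skipping those that no longer fit the budget); the data structures are assumptions, and the counterfactual "distance to victory" measures studied in the paper are not modeled here.

```python
def greedy_av(projects, costs, approvals, budget):
    """Greedy approval-based PB: repeatedly fund the not-yet-funded project
    with the most approvals that still fits in the remaining budget.

    projects  : list of project ids
    costs     : dict project -> cost
    approvals : dict voter -> set of approved projects
    """
    score = {p: sum(p in ballot for ballot in approvals.values()) for p in projects}
    funded, remaining = [], budget
    for p in sorted(projects, key=lambda p: score[p], reverse=True):
        if costs[p] <= remaining:
            funded.append(p)
            remaining -= costs[p]
    return funded

# toy example: three voters, budget 10
approvals = {"v1": {"park", "lib"}, "v2": {"park"}, "v3": {"lib", "road"}}
print(greedy_av(["park", "lib", "road"], {"park": 6, "lib": 5, "road": 4}, approvals, 10))
```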
This work considers the asymptotic behavior of the distance between two sample covariance matrices (SCMs). A general result is provided for a class of functionals that can be expressed as sums of traces of functions applied separately to each covariance matrix. This class includes conventional metrics, such as the Euclidean distance or Jeffreys' divergence, as well as a number of more sophisticated distances recently derived from Riemannian geometry considerations, such as the log-Euclidean metric. We analyze the asymptotic behavior of this class of functionals by establishing a central limit theorem that describes their asymptotic statistical law. To account for the fact that the sample sizes of the two SCMs are of the same order of magnitude as their observation dimension, results are provided under the assumption that these parameters grow to infinity while their quotients converge to fixed quantities. Numerical results illustrate how this type of result can be used to predict the performance of these metrics in practical machine learning algorithms, such as clustering of SCMs.
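As a simple illustration of two members of this class, the sketch below computes the Euclidean (Frobenius) and log-Euclidean distances between two SCMs built from Gaussian samples; the dimensions and sample sizes are arbitrary, and the eigendecomposition-based matrix logarithm assumes symmetric positive-definite inputs.

```python
import numpy as np

def sample_covariance(X):
    """SCM of observations stacked as rows of X (n_samples x dim)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    return Xc.T @ Xc / X.shape[0]

def spd_logm(R):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V * np.log(w)) @ V.T

def euclidean_distance(R1, R2):
    """Frobenius (Euclidean) distance between two covariance matrices."""
    return np.linalg.norm(R1 - R2, "fro")

def log_euclidean_distance(R1, R2):
    """Log-Euclidean metric: Frobenius distance between matrix logarithms."""
    return np.linalg.norm(spd_logm(R1) - spd_logm(R2), "fro")

# toy example: two SCMs of dimension 5 from different numbers of samples
rng = np.random.default_rng(0)
R1 = sample_covariance(rng.normal(size=(200, 5)))
R2 = sample_covariance(rng.normal(size=(300, 5)))
print(euclidean_distance(R1, R2), log_euclidean_distance(R1, R2))
```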
Finding the exact spanning ratio of a Delaunay graph has been one of the longstanding open problems in Computational Geometry. Currently, there are only four convex shapes for which the exact spanning ratio of their Delaunay graph is known: the equilateral triangle, the square, the regular hexagon, and the rectangle. In this paper, we show the exact spanning ratio of the parallelogram Delaunay graph, making the parallelogram the fifth convex shape for which an exact bound is known. The worst-case spanning ratio is exactly $$\frac{\sqrt{2}\sqrt{1+A^2+2A\cos(\theta_0)+(A+\cos(\theta_0))\sqrt{1+A^2+2A\cos(\theta_0)}}}{\sin(\theta_0)},$$ where $A$ is the aspect ratio and $\theta_0$ is the non-obtuse angle of the parallelogram. Moreover, we show how to construct a parallelogram Delaunay graph whose spanning ratio matches this bound.
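For concreteness, the stated bound is straightforward to evaluate numerically; the sketch below simply implements the displayed formula and checks the square case ($A = 1$, $\theta_0 = \pi/2$), which reduces to $\sqrt{4 + 2\sqrt{2}} \approx 2.61$.

```python
import math

def parallelogram_delaunay_spanning_ratio(A, theta0):
    """Worst-case spanning ratio of the parallelogram Delaunay graph,
    for aspect ratio A and non-obtuse angle theta0 (in radians)."""
    s = 1 + A**2 + 2 * A * math.cos(theta0)
    return math.sqrt(2) * math.sqrt(s + (A + math.cos(theta0)) * math.sqrt(s)) / math.sin(theta0)

# sanity check: A = 1, theta0 = pi/2 (a square) gives sqrt(4 + 2*sqrt(2)) ~ 2.61
print(parallelogram_delaunay_spanning_ratio(1.0, math.pi / 2))
```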
This article presents a general solution to the problem of computational complexity. First, it gives a historical introduction to the problem since the revival of the foundational problems of mathematics at the end of the 19th century. Second, building on the theory of functional relations in mathematics, it provides a theoretical framework in which we can rigorously distinguish two pairs of concepts: between solving a problem and verifying a solution to a problem, and between a deterministic and a non-deterministic model of computation. Third, it presents the theory of computational complexity and the difficulties in solving the P versus NP problem. Finally, it gives a complete proof that a certain decision problem in NP has an exponential algorithmic lower bound, thus establishing firmly that P is different from NP. The proof presents a new way of approaching the subject: neither by entering into the unmanageable difficulties of proving this type of lower bound for the known NP-complete problems, nor by entering into the difficulties regarding the properties of the many complexity classes established since the mid-1970s.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of the network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
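To make the assumed link concrete: a residual update with a layer scaling $h$ can be read as an explicit Euler step, so that if the trained weights converge to a smooth function of depth, the network approximates an ODE. This is the standard identification underlying the neural ODE literature, written here in generic notation rather than the paper's own:
$$x_{k+1} = x_k + h\, f(x_k, \theta_k), \qquad h = \tfrac{1}{L}, \quad k = 0, \dots, L-1,$$
$$\theta_k \approx \theta(k/L) \ \text{smooth} \;\Longrightarrow\; \frac{dx}{dt} = f\bigl(x(t), \theta(t)\bigr), \quad t \in [0,1].$$
The scaling regimes observed in the paper correspond to cases where these assumptions on $h$ and on the smoothness of $\theta$ do not hold.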
Deep Convolutional Neural Networks (CNNs) are a special type of Neural Network that has shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved through the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representations from data. The availability of large amounts of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity has been achieved by restructuring the processing units. In particular, the idea of using a block, rather than a layer, as a structural unit is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting, and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.