The resolution of the P vs. NP problem, a cornerstone of computational theory, remains elusive despite extensive exploration through mathematical logic and algorithmic theory. This paper takes a novel approach by integrating information theory, thermodynamics, and computational complexity, situating the question within a broad interdisciplinary landscape. We focus on entropy, a concept traditionally linked with uncertainty and disorder, and reinterpret it to assess the complexity of computational problems. Our research presents a structured framework for establishing entropy profiles of computational tasks, enabling a clear distinction between problems in P and those in NP. This framework quantifies the 'information cost' associated with these problem categories, highlighting their intrinsic computational complexity. We introduce Entropy-Driven Annealing (EDA) as a new method for deciphering the energy landscapes of computational problems, focusing on the distinctive characteristics of NP problems. The method posits a thermodynamic profile for NP problems that differs from that of P problems and explores potential thermodynamic routes toward polynomial-time solutions of NP challenges. Our introduction of EDA and its application to complex computational problems such as the Boolean satisfiability problem (SAT) and protein-DNA complexes suggests a potential pathway toward unraveling the intricacies of the P vs. NP problem.
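To make the annealing idea concrete, the following is a minimal, purely illustrative sketch of an annealing loop over SAT assignments with the energy defined as the number of unsatisfied clauses. The EDA-specific entropy profile, schedule, and energy landscape are not specified in the abstract, so the linear cooling schedule and Metropolis acceptance rule below are generic placeholders rather than the proposed method.

```python
# Illustrative sketch only: generic annealing over SAT assignments.
import math
import random

def energy(clauses, assignment):
    """Number of unsatisfied clauses; 0 means the formula is satisfied."""
    return sum(
        1 for clause in clauses
        if not any((lit > 0) == assignment[abs(lit)] for lit in clause)
    )

def anneal_sat(clauses, n_vars, steps=20000, t0=2.0, t_min=0.01, seed=0):
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    e = energy(clauses, assignment)
    for step in range(steps):
        t = max(t_min, t0 * (1 - step / steps))   # linear cooling (assumed)
        v = rng.randint(1, n_vars)                # propose a single-bit flip
        assignment[v] = not assignment[v]
        e_new = energy(clauses, assignment)
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new                             # accept the move
        else:
            assignment[v] = not assignment[v]     # reject: undo the flip
        if e == 0:
            break
    return assignment, e

# Toy instance: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
clauses = [(1, -2), (2, 3), (-1, -3)]
print(anneal_sat(clauses, n_vars=3))
```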
Approximation theory plays a central role in numerical analysis and continues to evolve through a spectrum of methodologies; notably, the Lebesgue, Weierstrass, Fourier, and Chebyshev approximations stand out among them. Each technique, however, has inherent limitations, underscoring the importance of selecting an approximation method suited to the problem domain at hand. This article focuses on the use of Chebyshev polynomials at Chebyshev nodes for approximation. For sufficiently smooth functions, the partial sum of the Chebyshev series expansion offers an optimal polynomial approximation, making it a preferred choice in applications such as digital signal processing and graph filters owing to its computational efficiency. Here we consider functions of bounded variation, which arise in numerous applications across mathematical physics, hyperbolic conservation laws, and optimization. We present two optimal error estimates for Chebyshev polynomial approximations of such functions and validate the theoretical results with numerical experiments. Finally, we outline promising directions for future work aligned with this research, particularly in machine learning and related fields.
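As a concrete illustration of the approximation setup described above, the sketch below fits a Chebyshev expansion to samples taken at Chebyshev nodes of the first kind using NumPy; the test function |x| is a standard bounded-variation example chosen here for illustration, and the paper's error estimates are not reproduced.

```python
# Chebyshev approximation from samples at first-kind Chebyshev nodes.
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_approx(f, deg):
    """Fit a degree-`deg` Chebyshev expansion to f sampled at Chebyshev nodes."""
    nodes = C.chebpts1(deg + 1)        # deg+1 first-kind Chebyshev nodes in [-1, 1]
    coeffs = C.chebfit(nodes, f(nodes), deg)
    return lambda x: C.chebval(x, coeffs)

f = np.abs                              # |x|: bounded variation, not smooth at 0
for deg in (8, 16, 32, 64):
    p = cheb_approx(f, deg)
    xs = np.linspace(-1, 1, 2001)
    err = np.max(np.abs(f(xs) - p(xs)))
    print(f"degree {deg:3d}  max error {err:.3e}")
```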
Language Models (LMs) acquire parametric knowledge during training and embed it within their weights. The increasing scale of LMs, however, poses significant challenges for understanding a model's inner workings and for updating or correcting this embedded knowledge without the substantial cost of retraining. This underscores the importance of unveiling exactly what knowledge is stored and how it is associated with specific model components. Instance Attribution (IA) and Neuron Attribution (NA) offer insights into this training-acquired knowledge, though they have not been compared systematically. Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. To align the results of the two methods, we introduce NA-Instances, which applies NA to retrieve influential training instances, and IA-Neurons, which discovers important neurons within the influential instances identified by IA. We further propose a comprehensive set of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods. Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information about the LM's parametric knowledge than IA. Nevertheless, IA provides unique and valuable insights that NA does not reveal. Our findings further suggest the potential of combining the complementary findings of IA and NA for a more holistic understanding of an LM's parametric knowledge.
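For readers unfamiliar with neuron attribution, the following rough sketch shows one common NA-style signal (activation times gradient on an MLP hidden layer of a small causal LM); the model choice, layer index, and scoring rule are illustrative assumptions and may differ from the NA and NA-Instances procedures studied in the paper.

```python
# Illustrative activation-x-gradient neuron scoring on GPT-2 (not the paper's method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

acts = {}

def save_with_grad(module, inputs, output):
    output.retain_grad()                 # keep the gradient of this intermediate activation
    acts["hidden"] = output

layer = model.transformer.h[6].mlp.c_fc  # an arbitrarily chosen MLP hidden layer
handle = layer.register_forward_hook(save_with_grad)

batch = tok("Paris is the capital of France", return_tensors="pt")
model(**batch, labels=batch["input_ids"]).loss.backward()
handle.remove()

# Attribution per hidden neuron: |activation * gradient|, aggregated over tokens.
scores = (acts["hidden"] * acts["hidden"].grad).abs().sum(dim=(0, 1))
print(scores.topk(5).indices)            # indices of the most implicated neurons
```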
Reconfigurable Intelligent Surface (RIS) modeling and optimization are crucial steps in developing the next generation of wireless communications. To this end, the availability of accurate electromagnetic (EM) models is of paramount importance for the design of RIS-assisted communication links. In this work, we validate a widely used analytical multiport network model for RISs by means of a well-established full-wave numerical method based on the Partial Element Equivalent Circuit (PEEC) approach. Numerical results show good agreement between the two methods, demonstrating that i) the considered multiport network model is effective and ii) the PEEC method is appropriate for EM modeling of RIS-assisted wireless links.
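As background for the multiport (impedance-parameter) view referenced above, the toy sketch below eliminates the RIS port currents under load termination, which yields an end-to-end transfer matrix of the form Z_RT - Z_RS (Z_SS + Z_L)^(-1) Z_ST by basic circuit theory. The impedance blocks are random placeholders rather than a physical design, an open-circuit receiver and current-driven transmitter are assumed, and neither the specific model validated in the paper nor the PEEC computation is reproduced.

```python
# Toy impedance-matrix view of a RIS-assisted link (placeholders, not a physical design).
import numpy as np

rng = np.random.default_rng(0)
n_t, n_s, n_r = 2, 16, 2                       # transmit antennas, RIS elements, receive antennas

def z(rows, cols):                             # placeholder mutual-impedance block
    return rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))

Z_RT, Z_RS, Z_ST, Z_SS = z(n_r, n_t), z(n_r, n_s), z(n_s, n_t), z(n_s, n_s)
Z_L = np.diag(50 + 1j * rng.uniform(-100, 100, n_s))   # tunable loads (illustrative values)

# Eliminate the RIS port currents: V_S = -Z_L I_S  =>  I_S = -(Z_SS + Z_L)^(-1) Z_ST I_T
H_e2e = Z_RT - Z_RS @ np.linalg.solve(Z_SS + Z_L, Z_ST)
print(H_e2e.shape)                             # (n_r, n_t) end-to-end transfer matrix
```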
Large Language Models (LLMs) have highlighted the necessity of effective unlearning mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims to remove undesired data influences and the associated model capabilities without compromising utility outside the scope of unlearning. While interest in studying LLM unlearning is growing, the impact of optimizer choice on LLM unlearning remains under-explored. In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between second-order optimization and influence unlearning (a classical approach that uses influence functions to update the model for data influence removal). This insight propels us to develop a second-order unlearning framework, termed SOUL, built upon the second-order clipped stochastic optimization (Sophia)-based LLM training method. SOUL extends the static, one-shot model update of influence unlearning to a dynamic, iterative unlearning process. Our extensive experiments show that SOUL consistently outperforms conventional first-order methods across various unlearning tasks, models, and metrics, suggesting the promise of second-order optimization as a scalable and easily implementable solution for LLM unlearning.
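To illustrate what a second-order, Sophia-style unlearning update could look like, the sketch below applies a clipped, curvature-preconditioned step to a combined forget/retain objective; the objective, the diagonal curvature proxy, and all coefficients are simplifications assumed here for illustration and are not SOUL's actual recipe.

```python
# Simplified Sophia-style (clipped second-order) step on an unlearning objective.
import torch

def model_loss(model, batch):
    """Assumed helper: next-token cross-entropy for a HF-style causal LM batch."""
    return model(**batch, labels=batch["input_ids"]).loss

def unlearning_step(model, forget_batch, retain_batch, state,
                    lr=1e-5, beta1=0.9, beta2=0.99, gamma=0.05, eps=1e-12):
    # Ascend on forget data while keeping the retain loss low (illustrative objective).
    loss = -model_loss(model, forget_batch) + model_loss(model, retain_batch)
    model.zero_grad()
    loss.backward()
    for i, p in enumerate(model.parameters()):
        if p.grad is None:
            continue
        g = p.grad
        m = state.setdefault(("m", i), torch.zeros_like(p))
        h = state.setdefault(("h", i), torch.zeros_like(p))
        m.mul_(beta1).add_(g, alpha=1 - beta1)            # first moment
        h.mul_(beta2).add_(g * g, alpha=1 - beta2)        # crude diagonal curvature proxy
        update = torch.clamp(m / (gamma * h + eps), -1.0, 1.0)   # Sophia-style clipping
        p.data.add_(update, alpha=-lr)
```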
Machine Learning (ML) systems are becoming instrumental in the public sector, with applications spanning areas such as criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, they still face the challenge of aligning nuanced policy objectives with the precise formalization that ML models require. In this paper, we aim to bridge the gap between ML model requirements and public sector decision-making by presenting a comprehensive overview of key technical challenges where disjunctions between policy goals and ML models commonly arise. We concentrate on pivotal points of the ML pipeline that connect the model to its operational environment, discussing the significance of representative training data and highlighting the importance of a model setup that facilitates effective decision-making. Additionally, we link these challenges with emerging methodological advancements, encompassing causal ML, domain adaptation, uncertainty quantification, and multi-objective optimization, illustrating a path forward for harmonizing ML and public sector objectives.
Decision-making is a complex task that requires diverse abilities, making it an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory, focusing specifically on games in which more than two agents participate simultaneously. We then introduce our framework, GAMA-Bench, comprising eight classical multi-agent games, and design a scoring scheme to quantitatively assess a model's performance in these games. Through GAMA-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfactory robustness, its generalizability is relatively limited; however, its performance can be improved through approaches such as Chain-of-Thought. Additionally, we conduct evaluations across various LLMs and find that GPT-4 outperforms other models on GAMA-Bench, achieving a score of 60.5. Moreover, Gemini-1.0-Pro and GPT-3.5 (0613, 1106, 0125) demonstrate similar performance on GAMA-Bench. The code and experimental results are made publicly available via //github.com/CUHK-ARISE/GAMABench.
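As a flavor of the multi-agent setting, the toy example below plays one round of the classic "Guess 2/3 of the Average" game and assigns a simple distance-based score; the game selection is illustrative and the scoring rule is a placeholder, not GAMA-Bench's actual scheme.

```python
# Toy round of "Guess 2/3 of the Average" with a placeholder distance-based score.
import statistics

def play_round(guesses):
    """Each agent guesses in [0, 100]; the target is 2/3 of the average guess."""
    target = (2 / 3) * statistics.mean(guesses)
    # Score each agent by its closeness to the target, mapped to [0, 100].
    return {i: max(0.0, 100.0 - abs(g - target)) for i, g in enumerate(guesses)}

# Example: five agents (e.g., five independently prompted LLM instances).
guesses = [50, 33, 22, 40, 15]
print(play_round(guesses))
```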
The Unique Games Conjecture (UGC) constitutes a highly dynamic subarea within computational complexity theory, intricately linked to the outstanding P versus NP problem. Despite multiple insightful results in the past few years, a proof for the conjecture remains elusive. In this work, we construct a novel dynamical systems-based approach for studying unique games and, more generally, the field of computational complexity. We propose a family of dynamical systems whose equilibria correspond to solutions of unique games and prove that unsatisfiable instances lead to ergodic dynamics. Moreover, as the instance hardness increases, the weight of the invariant measure in the vicinity of the optimal assignments scales polynomially, sub-exponentially, or exponentially depending on the value gap. We numerically reproduce a previously hypothesized hardness plot associated with the UGC. Our results indicate that the UGC is likely true, subject to our proposed conjectures that link dynamical systems theory with computational complexity.
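For reference, a unique game instance assigns each edge (u, v) a permutation pi over k labels, and a labeling satisfies the edge when pi[x_u] = x_v; the small checker below computes the satisfied fraction of a labeling. It is only background for the problem definition and does not reproduce the dynamical systems construction proposed in the paper.

```python
# Satisfied-constraint fraction of a labeling for a unique game instance.
def satisfied_fraction(edges, assignment):
    """edges: list of (u, v, pi) with pi a tuple permutation of range(k)."""
    good = sum(1 for u, v, pi in edges if pi[assignment[u]] == assignment[v])
    return good / len(edges)

# Toy instance over labels {0, 1, 2}: a triangle of "+1 mod 3" constraints.
edges = [
    (0, 1, (1, 2, 0)),   # x_1 must equal x_0 + 1 (mod 3)
    (1, 2, (1, 2, 0)),   # x_2 must equal x_1 + 1 (mod 3)
    (2, 0, (1, 2, 0)),   # x_0 must equal x_2 + 1 (mod 3)
]
print(satisfied_fraction(edges, {0: 0, 1: 1, 2: 2}))  # 1.0: this cycle is fully satisfiable
```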
Deep Learning has revolutionized computer vision, natural language understanding, speech recognition, information retrieval, and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, and resources required to train have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work in each. We also present an experiment-based guide, along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey provides the reader with the mental model and the understanding of the field needed to apply generic efficiency techniques for immediate, significant improvements, and also equips them with ideas for further research and experimentation to achieve additional gains.
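As one concrete example of the kind of generic efficiency technique such a survey covers, the snippet below applies post-training dynamic quantization to a model's linear layers in PyTorch; it is a standalone illustration, not the survey's accompanying experiment code.

```python
# Post-training dynamic quantization of Linear layers (generic efficiency technique).
import torch
import torch.nn as nn

model = nn.Sequential(                 # stand-in model used purely for illustration
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # int8 weights for Linear layers
)

x = torch.randn(1, 512)
print(quantized(x).shape)              # same interface, smaller and faster Linear kernels
```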
Influenced by the stunning success of deep learning in computer vision and language understanding, research in recommendation has shifted to inventing new recommender models based on neural networks. In recent years, we have witnessed significant progress in developing neural recommender models, which generalize and surpass traditional recommender models owing to the strong representation power of neural networks. In this survey paper, we conduct a systematic review of neural recommender models, aiming to summarize the field to facilitate future progress. Distinct from existing surveys that categorize methods based on a taxonomy of deep learning techniques, we instead summarize the field from the perspective of recommendation modeling, which could be more instructive to researchers and practitioners working on recommender systems. Specifically, we divide the work into three types based on the data used for recommendation modeling: 1) collaborative filtering models, which leverage the key source of user-item interaction data; 2) content-enriched models, which additionally utilize the side information associated with users and items, such as user profiles and item knowledge graphs; and 3) context-enriched models, which account for the contextual information associated with an interaction, such as time, location, and past interactions. After reviewing representative works of each type, we discuss some promising directions in this field, including benchmarking recommender systems, graph-reasoning-based recommendation models, and explainable and fair recommendations for social good.
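To ground the first category, the sketch below implements the simplest embedding-based collaborative filtering model (matrix factorization with a dot-product scorer) trained on user-item interactions; the dimensions, loss, and toy data are generic choices for illustration rather than any specific surveyed model.

```python
# Minimal embedding-based matrix factorization for collaborative filtering.
import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        # Predicted preference = dot product of user and item embeddings.
        return (self.user_emb(users) * self.item_emb(items)).sum(dim=-1)

model = MatrixFactorization(n_users=1000, n_items=5000)
users = torch.tensor([0, 1, 2])
items = torch.tensor([10, 20, 30])
labels = torch.tensor([1.0, 0.0, 1.0])            # implicit-feedback labels (toy data)

loss = nn.functional.binary_cross_entropy_with_logits(model(users, items), labels)
loss.backward()                                    # one illustrative training step
```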
The problem of Multiple Object Tracking (MOT) consists of following the trajectories of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that address this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey of works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning has been employed in each of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.
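As an example of the association step that recurs across tracking-by-detection pipelines, the sketch below matches existing tracks to new detections by IoU with the Hungarian algorithm; the box format and the 0.3 IoU gate are assumptions made for illustration and do not correspond to any particular surveyed method.

```python
# IoU-based track-to-detection association via the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)        # minimize total (1 - IoU)
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]   # keep sufficiently overlapping pairs

tracks = [(0, 0, 10, 10), (50, 50, 60, 60)]
detections = [(1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, detections))                # [(0, 0)]: only track 0 is matched
```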