Computer programming (coding) is indispensable for researchers across disciplines, yet it remains challenging to learn and time-consuming to carry out. Generative AI, particularly large language models (LLMs), has the potential to transform coding into intuitive conversations, but best practices and effective workflows are only emerging. We dissect AI-based coding through three key lenses: the nature and role of LLMs in coding (why), six types of coding assistance they provide (what), and a five-step workflow in action with practical implementation strategies (how). Additionally, we address the limitations and future outlook of AI in coding. By offering actionable insights, this framework helps to guide researchers in effectively leveraging AI to enhance coding practices and education, accelerating scientific progress.
Implications of uncertain objective functions and the permutation symmetry of traditional deep learning architectures are discussed. It is shown that traditional architectures are polluted by an astronomical number of equivalent global and local optima. Uncertainty of the objective makes local optima unattainable, and, as the size of the network grows, the global optimization landscape likely becomes a tangled web of valleys and ridges. Some remedies that reduce or eliminate these ghost optima are discussed, including forced pre-pruning, re-ordering, ortho-polynomial activations, and modular bio-inspired architectures.
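To make the permutation symmetry concrete, the following minimal numpy sketch (an illustration, not code from the paper) permutes the hidden units of a small two-layer network together with the matching rows and columns of its weight matrices; the computed function is unchanged, so every hidden layer of width n contributes a factor of n! functionally identical optima.

import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP: y = W2 @ tanh(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ x + b1) + b2

# Permute the 5 hidden units: reorder the rows of (W1, b1) and the columns of W2.
perm = rng.permutation(5)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=3)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))   # True: same function, different weights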
If an algorithm is to be counted as a practically working solution to a decision problem, then the algorithm must be verifiable in some constructed and ``trusted'' theory such as PA or ZF. In this paper, a class of decision problems related to inconsistency proofs for a general class of formal theories is used to demonstrate that under this constructibility restriction, an unformalizable yet arguably convincing argument can be given for the existence of decision problems for which there exists an explicitly constructible algorithm recognizing correct solutions in polynomial time, but for which there exists no explicitly constructible solution algorithm. While these results do not solve the P versus NP problem in the classical sense of supplying a proof one way or the other in a ``trusted'' formal theory, arguably they resolve the natural constructive version of the problem.
If the conclusion of a data analysis is sensitive to dropping very few data points, that conclusion might hinge on the particular data at hand rather than representing a more broadly applicable truth. How could we check whether this sensitivity holds? One idea is to consider every small subset of data, drop it from the dataset, and re-run our analysis. But running MCMC to approximate a Bayesian posterior is already very expensive; running multiple times is prohibitive, and the number of re-runs needed here is combinatorially large. Recent work proposes a fast and accurate approximation to find the worst-case dropped data subset, but that work was developed for problems based on estimating equations -- and does not directly handle Bayesian posterior approximations using MCMC. We make two principal contributions in the present work. We adapt the existing data-dropping approximation to estimators computed via MCMC. Observing that Monte Carlo errors induce variability in the approximation, we use a variant of the bootstrap to quantify this uncertainty. We demonstrate how to use our approximation in practice to determine whether there is non-robustness in a problem. Empirically, our method is accurate in simple models, such as linear regression. In models with complicated structure, such as hierarchical models, the performance of our method is mixed.
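The linearization behind this kind of data-dropping check can be sketched in a few lines. The sketch below is a simplified illustration under assumed settings (a Gaussian linear-regression likelihood with known noise scale, MCMC draws already in hand, and a plain top-k selection); the function names are hypothetical, and the paper's method additionally bootstraps over the Monte Carlo error.

import numpy as np

def drop_influence(theta, X, y, j, sigma=1.0):
    """First-order effect on E[theta_j | data] of down-weighting each data point.

    If the n-th likelihood term is tempered by a weight w_n, then
    d E[g(theta)] / d w_n = Cov(g(theta), loglik_n(theta)); estimate this
    covariance from the MCMC draws `theta` (S x P) evaluated at w = 1.
    """
    resid = y[None, :] - theta @ X.T                 # (S, N) residuals per posterior draw
    loglik = -0.5 * (resid / sigma) ** 2             # per-point Gaussian log-likelihood (up to a constant)
    g = theta[:, j] - theta[:, j].mean()
    centered = loglik - loglik.mean(axis=0)
    return g @ centered / (len(g) - 1)               # one influence score per data point

def worst_case_drop(influence, k):
    """Linear approximation: dropping point n changes the estimate by about
    -influence[n], so drop the k most positive scores to push it down most."""
    idx = np.argsort(influence)[-k:]
    return idx, -influence[idx].sum()                # indices and predicted change in E[theta_j]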
Stochastic gradient descent (SGD) is a workhorse algorithm for solving large-scale optimization problems in data science and machine learning. Understanding the convergence of SGD is hence of fundamental importance. In this work we examine the convergence of SGD (with various step sizes) when applied to unconstrained convex quadratic programming (essentially least-squares (LS) problems), and in particular analyze the error components with respect to the eigenvectors of the Hessian. The main message is that the convergence depends largely on the corresponding eigenvalues (singular values of the coefficient matrix in the LS context): the components corresponding to large singular values converge faster in the initial phase. We then show there is a phase transition in the convergence, after which the convergence speed of the components, especially those corresponding to the larger singular values, decreases. Finally, we show that the convergence rate of the overall error (in the solution) tends to decay as more iterations are run; that is, the initial convergence is faster than the asymptotic rate.
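The quantity being analyzed is easy to reproduce in a few lines of numpy. The toy sketch below (an illustration, not the paper's experiments) runs SGD on a noiseless least-squares problem and prints the error resolved along the right singular vectors, so the faster initial decay of the components associated with large singular values is visible directly.

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d)) @ np.diag(np.logspace(0, -2, d))   # singular values spread over two decades
x_star = rng.normal(size=d)
b = A @ x_star                                                 # consistent system: exact LS solution is x_star

U, s, Vt = np.linalg.svd(A, full_matrices=False)
step = 0.5 / np.max(np.sum(A ** 2, axis=1))                    # safe constant step for single-row gradients

x = np.zeros(d)
for t in range(1, 2001):
    i = rng.integers(n)                                        # sample one row uniformly
    x -= step * A[i] * (A[i] @ x - b[i])                       # SGD step on (1/2)(a_i^T x - b_i)^2
    if t % 500 == 0:
        err = np.abs(Vt @ (x - x_star))                        # error components along singular directions
        print(t, np.round(err, 4))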
The rapid development of modern artificial intelligence (AI) systems has created an urgent need for their scientific quantification. While their fluency across a variety of domains is impressive, modern AI systems fall short on tests requiring symbolic processing and abstraction - a glaring limitation given the necessity for interpretable and reliable technology. Despite a surge of reasoning benchmarks emerging from the academic community, no comprehensive and theoretically motivated framework exists to quantify reasoning (and, more generally, symbolic ability) in AI systems. Here, we adopt a framework from computational complexity theory to explicitly quantify symbolic generalization: algebraic circuit complexity. Many symbolic reasoning problems can be recast as algebraic expressions. Thus, algebraic circuit complexity theory - the study of algebraic expressions as circuit models (i.e., directed acyclic graphs) - is a natural framework to study the complexity of symbolic computation. The tools of algebraic circuit complexity enable the study of generalization by defining benchmarks in terms of their complexity-theoretic properties (i.e., the difficulty of a problem). Moreover, algebraic circuits are generic mathematical objects: for a given circuit, an arbitrarily large number of samples can be generated, making algebraic circuits an optimal testbed for the data-hungry machine learning algorithms in use today. Here, we adopt tools from algebraic circuit complexity theory, apply them to formalize a science of symbolic generalization, and address key theoretical and empirical challenges for its successful application to AI science and its impact on the broader community.
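The sample-generation property is simple to illustrate: fix one circuit (a directed acyclic graph of addition and multiplication gates) and evaluate it on as many random inputs as desired. The sketch below is a minimal Python illustration of that idea, not the benchmark-generation code from the paper.

import numpy as np

rng = np.random.default_rng(0)

def random_circuit(n_inputs, n_gates):
    """A random algebraic circuit: a DAG whose gates are {+, *} over earlier nodes."""
    gates = []
    for g in range(n_gates):
        op = rng.choice(["add", "mul"])
        left, right = rng.integers(n_inputs + g, size=2)   # wire to inputs or previous gates
        gates.append((op, int(left), int(right)))
    return gates

def evaluate(gates, x):
    """Evaluate the circuit on input vector x; the last gate is the output."""
    vals = list(x)
    for op, l, r in gates:
        vals.append(vals[l] + vals[r] if op == "add" else vals[l] * vals[r])
    return vals[-1]

# A single fixed circuit yields an unbounded stream of (input, output) training pairs.
circuit = random_circuit(n_inputs=3, n_gates=6)
X = rng.uniform(-1, 1, size=(1000, 3))
y = np.array([evaluate(circuit, x) for x in X])
print(circuit, y[:5])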
Physics-guided deep learning is a prevalent research topic in scientific machine learning, with tremendous potential in complex applications across science and engineering. In these applications, data is expensive to acquire and high accuracy is required for making decisions. In this work, we introduce an efficient physics-guided deep learning framework for the variational modeling of nonlinear inverse problems, which is then applied to solve an electrical impedance tomography (EIT) inverse problem. The framework is obtained by unrolling the proposed Anderson accelerated Gauss-Newton (GNAA) algorithm into an end-to-end deep learning method. First, we show the convergence of the GNAA algorithm in both cases: when the Anderson depth is equal to one and when it is greater than one. Then, we propose three strategies that combine the complementary strengths of GNAA and deep learning: GNAA with learned regularization (GNAA-LRNet), where the singular values of the regularization matrix are learned by a deep neural network; GNAA with learned proximity (GNAA-LPNet), where the regularization proximal operator is learned by a deep neural network; and GNAA with a plug-and-play method (GNAA-PnPNet), where the regularization proximal operator is replaced by a pre-trained deep denoiser. Lastly, we present numerical experiments illustrating that the proposed approaches greatly improve the convergence rate and the quality of the inverse solutions.
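For readers unfamiliar with the building block, the sketch below shows generic Anderson acceleration wrapped around a damped Gauss-Newton fixed-point map for a toy nonlinear least-squares problem. It is a simplified illustration under assumed settings (unregularized Type-II Anderson mixing, a hand-coded residual and Jacobian), not the GNAA implementation or the unrolled networks from the paper.

import numpy as np

def anderson(G, x0, m=3, iters=50, tol=1e-10):
    """Anderson acceleration of a fixed-point iteration x <- G(x) with depth m."""
    x = x0
    g_hist, f_hist = [], []                              # past G(x) values and residuals G(x) - x
    for _ in range(iters):
        gx = G(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx
        g_hist, f_hist = (g_hist + [gx])[-(m + 1):], (f_hist + [f])[-(m + 1):]
        if len(f_hist) == 1:
            x = gx                                       # plain fixed-point step to start
        else:
            F = np.stack(f_hist, axis=1)
            dF = F[:, 1:] - F[:, :-1]                    # residual differences
            gamma, *_ = np.linalg.lstsq(dF, F[:, -1], rcond=None)
            Gm = np.stack(g_hist, axis=1)
            x = g_hist[-1] - (Gm[:, 1:] - Gm[:, :-1]) @ gamma   # Anderson-mixed iterate
    return x

# Toy damped Gauss-Newton map for the residual r(x) = (x0^2 - 2, x0*x1 - 1).
def gauss_newton_map(x, damping=1e-6):
    r = np.array([x[0] ** 2 - 2.0, x[0] * x[1] - 1.0])
    J = np.array([[2 * x[0], 0.0], [x[1], x[0]]])
    return x - np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ r)

print(anderson(gauss_newton_map, np.array([1.0, 1.0])))  # ~ [1.4142, 0.7071]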
We propose a simple methodology to approximate functions with given asymptotic behavior by combining specifically constructed terms with an unconstrained deep neural network (DNN). The methodology extends to various asymptotic behaviors and multiple dimensions and is easy to implement. In this work we demonstrate it for linear asymptotic behavior in one-dimensional examples. We apply it to function approximation and regression problems in which we fit only function values (``Vanilla Machine Learning'', VML) or both function and derivative values (``Differential Machine Learning'', DML) on several examples. We observe that enforcing the given asymptotic behavior leads to better approximation and faster convergence.
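One simple way to realize this idea for a prescribed linear asymptote is to hard-wire the asymptote and multiply the network output by a window that vanishes at infinity. The PyTorch sketch below is an illustrative construction under that assumption; the window 1/(1+x^2), the toy target, and the class name are ours, not necessarily the exact construction used in the paper.

import torch
import torch.nn as nn

class AsymptoticNet(nn.Module):
    """f(x) ~ a*x + b + w(x) * NN(x): the prescribed linear asymptote is hard-wired,
    and the correction w(x) * NN(x) vanishes as |x| grows."""
    def __init__(self, a, b, hidden=32):
        super().__init__()
        self.a, self.b = a, b
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        window = 1.0 / (1.0 + x ** 2)        # one possible choice of decaying window
        return self.a * x + self.b + window * self.net(x)

# Toy VML-style fit (function values only) to a target approaching the line y = 2x + 1.
target = lambda x: 2 * x + 1 + torch.exp(-x ** 2)
model = AsymptoticNet(a=2.0, b=1.0)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.linspace(-5, 5, 256).unsqueeze(1)
for _ in range(500):
    opt.zero_grad()
    loss = ((model(x) - target(x)) ** 2).mean()
    loss.backward()
    opt.step()
print(float(loss))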
The unpredictability of random numbers is fundamental to both digital security and applications that fairly distribute resources. However, existing random number generators have limitations: the generation processes cannot be fully traced, audited, and certified to be unpredictable. The algorithmic steps used in pseudorandom number generators are auditable, but they cannot guarantee that their outputs were a priori unpredictable given knowledge of the initial seed. Device-independent quantum random number generators can ensure that the source of randomness was unknown beforehand, but the steps used to extract the randomness are vulnerable to tampering. Here, for the first time, we demonstrate a fully traceable random number generation protocol based on device-independent techniques. Our protocol extracts randomness from unpredictable non-local quantum correlations, and uses distributed intertwined hash chains to cryptographically trace and verify the extraction process. This protocol is at the heart of a public traceable and certifiable quantum randomness beacon that we have launched. Over the first 40 days of operation, we completed the protocol 7434 out of 7454 attempts -- a success rate of 99.7%. Each time the protocol succeeded, the beacon emitted a pulse of 512 bits of traceable randomness. The bits are certified to be uniform with error times actual success probability bounded by $2^{-64}$. The generation of certifiable and traceable randomness represents one of the first public services that operates with an entanglement-derived advantage over comparable classical approaches.
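The traceability ingredient can be illustrated with a toy hash chain in which two parties periodically commit to each other's chain heads, so that neither can later rewrite its record of the extraction steps. This is only a schematic illustration of intertwined hash chains; the payload strings are invented, and the beacon's actual protocol is more involved.

import hashlib

def link(prev_head, payload):
    """Append one entry: the new head commits to the old head and the payload."""
    return hashlib.sha256(prev_head + payload).hexdigest().encode()

a_head, b_head = b"genesis-A", b"genesis-B"
a_head = link(a_head, b"block of raw measurement outcomes")      # party A logs its data
b_head = link(b_head, b"extractor seed commitment" + a_head)     # party B intertwines with A's head
a_head = link(a_head, b"512-bit output pulse" + b_head)          # A intertwines back
print(a_head.decode())
print(b_head.decode())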
The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
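A schematic reading of the theory is easy to write down: compute the distance between the AI's saliency map and the maps the explainee would produce for each candidate label, convert distances to similarities with Shepard's exponential law, and normalize. In the toy sketch below, the Euclidean distance, the scale parameter c, and the numeric ``maps'' are illustrative stand-ins, not the paper's fitted similarity space.

import numpy as np

def shepard_similarity(d, c=1.0):
    """Shepard's universal law of generalization: similarity decays exponentially with distance."""
    return np.exp(-c * d)

def predicted_inference(ai_map, own_maps, c=1.0):
    """Predicted explainee belief over labels: compare the AI's saliency map with the
    maps the explainee would give for each candidate label, then normalize."""
    dists = np.array([np.linalg.norm(ai_map - m) for m in own_maps])
    sims = shepard_similarity(dists, c)
    return sims / sims.sum()

# Toy example with 3 candidate labels and 4-"pixel" saliency maps (invented numbers).
ai_map = np.array([0.9, 0.1, 0.0, 0.0])
own_maps = [np.array([0.8, 0.2, 0.0, 0.0]),   # label 0: close to the AI's map -> highest weight
            np.array([0.0, 0.0, 0.9, 0.1]),
            np.array([0.3, 0.3, 0.2, 0.2])]
print(predicted_inference(ai_map, own_maps))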
Deep learning is usually described as an experiment-driven field and is under continual criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of literature, which has so far not been well organized. This paper reviews and organizes the recent advances in deep learning theory. The literature is categorized into six groups: (1) complexity- and capacity-based approaches for analyzing the generalizability of deep learning; (2) stochastic differential equations and their dynamic systems for modelling stochastic gradient descent and its variants, which characterize the optimization and generalization of deep learning, partially inspired by Bayesian inference; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; (4) the roles of over-parameterization of deep neural networks from both positive and negative perspectives; (5) theoretical foundations of several special structures in network architectures; and (6) the increasingly intense concerns about ethics and security and their relationships with generalizability.