
Symbolic Regression (SR) algorithms attempt to learn analytic expressions which fit data accurately and in a highly interpretable manner. Conventional SR suffers from two fundamental issues which we address here. First, these methods search the space stochastically (typically using genetic programming) and hence do not necessarily find the best function. Second, the criteria used to select the equation optimally balancing accuracy with simplicity have been variable and subjective. To address these issues we introduce Exhaustive Symbolic Regression (ESR), which systematically and efficiently considers all possible equations -- made with a given basis set of operators and up to a specified maximum complexity -- and is therefore guaranteed to find the true optimum (if parameters are perfectly optimised) and a complete function ranking subject to these constraints. We implement the minimum description length principle as a rigorous method for combining these preferences into a single objective. To illustrate the power of ESR we apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding $\sim$40 functions (out of 5.2 million trial functions) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not uniquely prefer the expansion history of the standard model of cosmology. We make our code and full equation sets publicly available.
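
The exhaustive search described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the authors' code: the basis set (`+`, `*`, `x`, `1`), the node-count complexity measure, the absence of free parameters to optimise, and the function `toy_description_length`, which is only a crude stand-in for a minimum description length score.

```python
import math

# Hypothetical toy basis: two binary operators and two leaves (no free parameters).
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
LEAVES = ("x", "1")

def enumerate_trees(n):
    """Yield every expression tree with exactly n nodes."""
    if n == 1:
        yield from LEAVES
        return
    # The root operator uses one node; split the remaining n - 1 between children.
    for left_size in range(1, n - 1):
        right_size = n - 1 - left_size
        for op in BINARY:
            for left in enumerate_trees(left_size):
                for right in enumerate_trees(right_size):
                    yield (op, left, right)

def evaluate(tree, x):
    """Evaluate a tree (leaf string or (op, left, right) tuple) at x."""
    if tree == "x":
        return x
    if tree == "1":
        return 1.0
    op, left, right = tree
    return BINARY[op](evaluate(left, x), evaluate(right, x))

def complexity(tree):
    """Complexity is taken to be the number of nodes in the tree."""
    if isinstance(tree, str):
        return 1
    return 1 + complexity(tree[1]) + complexity(tree[2])

def toy_description_length(tree, xs, ys):
    """Illustrative score: a misfit term plus a cost per symbol used."""
    n = len(xs)
    sse = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))
    n_symbols = len(BINARY) + len(LEAVES)
    return 0.5 * n * math.log(sse / n + 1e-12) + complexity(tree) * math.log(n_symbols)

def exhaustive_search(xs, ys, max_complexity):
    """Rank every tree up to max_complexity by the toy score (lower is better)."""
    candidates = [t for n in range(1, max_complexity + 1) for t in enumerate_trees(n)]
    return sorted(candidates, key=lambda t: toy_description_length(t, xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [x * x + 1.0 for x in xs]
ranking = exhaustive_search(xs, ys, max_complexity=5)
best = ranking[0]
```

Because every candidate is scored, the result is a complete ranking rather than a single stochastic guess; the cost is that the number of trees grows very quickly with the complexity limit.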

Related content

Let $G$ be a graph, which represents a social network, and suppose each node $v$ has a threshold value $\tau(v)$. Consider an initial configuration, where each node is either positive or negative. In each discrete time step, a node $v$ becomes/remains positive if at least $\tau(v)$ of its neighbors are positive and negative otherwise. A node set $\mathcal{S}$ is a Target Set (TS) whenever the following holds: if $\mathcal{S}$ is fully positive initially, all nodes in the graph become positive eventually. We focus on a generalization of TS, called Timed TS (TTS), where it is permitted to assign a positive state to a node at any step of the process, rather than just at the beginning. We provide graph structures for which the minimum TTS is significantly smaller than the minimum TS, indicating that timing is an essential aspect of successful target selection strategies. Furthermore, we prove tight bounds on the minimum size of a TTS in terms of the number of nodes and maximum degree when the thresholds are assigned based on the majority rule. We show that the problem of determining the minimum size of a TTS is NP-hard and provide an Integer Linear Programming formulation and a greedy algorithm. We evaluate the performance of our algorithm by conducting experiments on various synthetic and real-world networks. We also present a linear-time exact algorithm for trees.
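
A minimal sketch of the threshold dynamics and the Target Set check, read literally from the description above: updates are synchronous, and a node is positive in the next step exactly when at least $\tau(v)$ of its neighbors are currently positive. The function names, the adjacency-dictionary representation, and the three-node path example are assumptions for illustration; the sketch does not cover the timed variant.

```python
def step(adj, tau, positive):
    """One synchronous update of the threshold process."""
    return frozenset(
        v for v in adj
        if sum(1 for u in adj[v] if u in positive) >= tau[v]
    )

def is_target_set(adj, tau, seed):
    """Return True if starting from `seed` the process settles on all-positive.

    The state space is finite, so the loop stops as soon as a state repeats.
    """
    everyone = frozenset(adj)
    state = frozenset(seed)
    seen = set()
    while state not in seen:
        seen.add(state)
        nxt = step(adj, tau, state)
        if state == everyone and nxt == everyone:
            return True  # all-positive and stable
        state = nxt
    return False  # entered a cycle that is not the all-positive fixed point

# Hypothetical example: a path 0 - 1 - 2 with threshold 1 at every node.
path = {0: [1], 1: [0, 2], 2: [1]}
tau = {0: 1, 1: 1, 2: 1}
```

On this path, the seed `{0, 1}` activates every node after one step, whereas the seed `{1}` alternates between `{1}` and `{0, 2}` forever, because under this reading a positive node with too few positive neighbors reverts to negative.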

In this paper, we focus on the high-dimensional double sparse structure, where the parameter of interest simultaneously encourages group-wise sparsity and element-wise sparsity in each group. By combining the Gilbert-Varshamov bound and its variants, we develop a novel lower bound technique for the metric entropy of the parameter space, specifically tailored for the double sparse structure over $\ell_u(\ell_q)$-balls with $u,q \in [0,1]$. We prove lower bounds on the estimation error using an information-theoretic approach, leveraging our proposed lower bound technique and Fano's inequality. To complement the lower bounds, we establish matching upper bounds through a direct analysis of constrained least-squares estimators and utilize results from empirical processes. A significant finding of our study is the discovery of a phase transition phenomenon in the minimax rates for $u,q \in (0, 1]$. Furthermore, we extend the theoretical results to the double sparse regression model and determine its minimax rate for estimation error. To tackle double sparse linear regression, we develop the DSIHT (Double Sparse Iterative Hard Thresholding) algorithm, demonstrating its optimality in the minimax sense. Finally, we demonstrate the superiority of our method through numerical experiments.
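
To make the double sparse structure concrete, the sketch below applies a simple two-level hard thresholding to a coefficient vector: within each group it keeps the largest-magnitude entries, then it keeps the groups with the most retained energy. This is an illustrative simplification under assumed names (`double_sparse_threshold`, `s_groups`, `s_within`); it is not claimed to be the thresholding operator used inside DSIHT.

```python
def double_sparse_threshold(beta, groups, s_groups, s_within):
    """Keep at most s_within entries per group and at most s_groups groups.

    beta   : list of floats
    groups : list of index lists that partition range(len(beta))
    """
    out = [0.0] * len(beta)
    scored = []
    for g in groups:
        # Element-wise sparsity: largest-magnitude entries inside the group.
        keep = sorted(g, key=lambda i: abs(beta[i]), reverse=True)[:s_within]
        energy = sum(beta[i] ** 2 for i in keep)
        scored.append((energy, keep))
    # Group-wise sparsity: groups with the most retained energy.
    scored.sort(key=lambda t: t[0], reverse=True)
    for _, keep in scored[:s_groups]:
        for i in keep:
            out[i] = beta[i]
    return out

beta = [3.0, 0.1, -2.0, 0.5, 0.4, 0.3, 5.0, 0.0, 0.0]
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
sparse_beta = double_sparse_threshold(beta, groups, s_groups=2, s_within=2)
```

An iterative hard thresholding scheme would alternate a gradient step on the least-squares loss with a projection of this kind.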

Covariate shift in regression problems and the associated distribution mismatch between training and test data is a commonly encountered phenomenon in machine learning. In this paper, we extend recent results on nonparametric convergence rates for i.i.d. data to Markovian dependence structures. We demonstrate that under H\"older smoothness assumptions on the regression function, convergence rates for the generalization risk of a Nadaraya-Watson kernel estimator are determined by the similarity between the invariant distributions associated to source and target Markov chains. The similarity is explicitly captured in terms of a bandwidth-dependent similarity measure recently introduced in Pathak, Ma and Wainwright [ICML, 2022]. Precise convergence rates are derived for the particular cases of finite Markov chains and spectral gap Markov chains for which the similarity measure between their invariant distributions grows polynomially with decreasing bandwidth. For the latter, we extend the notion of a distribution transfer exponent from Kpotufe and Martinet [Ann. Stat., 49(6), 2021] to kernel transfer exponents of uniformly ergodic Markov chains in order to generate a rich class of Markov kernel pairs for which convergence guarantees for the covariate shift problem can be formulated.
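
For reference, the Nadaraya-Watson estimator mentioned above is a kernel-weighted average of the observed responses. The sketch below assumes a Gaussian kernel and one-dimensional covariates; both choices, and the function names, are illustrative and not taken from the paper.

```python
import math

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Estimate the regression function at x0 from samples (xs, ys)."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical source samples from the line y = 2x + 1.
source_x = [0.0, 1.0, 2.0]
source_y = [1.0, 3.0, 5.0]
estimate = nadaraya_watson(1.0, source_x, source_y, bandwidth=0.5)
```

Under covariate shift the samples come from the source chain while the estimate is evaluated at points drawn from the target distribution, so its accuracy at `x0` depends on how much source mass falls within a bandwidth of `x0`.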

The classical latent factor model for linear regression is extended by assuming that, up to an unknown orthogonal transformation, the features consist of subsets that are relevant and irrelevant for the response. Furthermore, a joint low-dimensionality is imposed only on the relevant features vector and the response variable. This framework allows for a comprehensive study of the partial-least-squares (PLS) algorithm under random design. In particular, a novel perturbation bound for PLS solutions is proven and the high-probability $L^2$-estimation rate for the PLS estimator is obtained. This novel framework also sheds light on the performance of other regularisation methods for ill-posed linear regression that exploit sparsity or unsupervised projection. The theoretical findings are confirmed by numerical studies on both real and simulated data.

We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for small problems) the approach is more stable and faster than gradient-based methods.

In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Mat\'ern kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can bounds be improved by using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results directly imply sublinear regret rates for the Mat\'ern kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on two key technical results. First, we use modern supermartingale techniques to construct a novel, self-normalized concentration inequality that greatly simplifies existing approaches. Second, we address the importance of regularizing in proportion to the smoothness of the underlying kernel $k$. Together, these new technical tools enable a simplified, tighter analysis of the GP-UCB algorithm.

The widespread use of maximum Jeffreys'-prior penalized likelihood in binomial-response generalized linear models, and in logistic regression in particular, is supported by the results of Kosmidis and Firth (2021, Biometrika), who show that the resulting estimates are always finite-valued, even in cases where the maximum likelihood estimates are not, which is a practical issue regardless of the size of the data set. In logistic regression, the implied adjusted score equations are formally bias-reducing in asymptotic frameworks with a fixed number of parameters and appear to deliver a substantial reduction in the persistent bias of the maximum likelihood estimator in high-dimensional settings where the number of parameters grows asymptotically linearly and slower than the number of observations. In this work, we develop and present two new variants of iteratively reweighted least squares (IWLS) for estimating generalized linear models with adjusted score equations for mean bias reduction and maximization of the likelihood penalized by a positive power of the Jeffreys-prior penalty, which eliminate the requirement of storing $O(n)$ quantities in memory, and can operate with data sets that exceed computer memory or even hard drive capacity. We achieve that through incremental QR decompositions, which enable IWLS iterations to have access only to data chunks of predetermined size. We assess the procedures through a real-data application with millions of observations, and in high-dimensional logistic regression, where a large-scale simulation experiment produces concrete evidence for the existence of a simple adjustment to the maximum Jeffreys'-penalized likelihood estimates that delivers high accuracy in terms of signal recovery even in cases where estimates from ML and other recently-proposed corrective methods do not exist.

Density-functional theory (DFT) has revolutionized computer simulations in chemistry and material science. A faithful implementation of the theory requires self-consistent calculations. However, this effort involves repeatedly diagonalizing the Hamiltonian, for which a classical algorithm typically requires a computational complexity that scales cubically with respect to the number of electrons. This limits DFT's applicability to large-scale problems with complex chemical environments and microstructures. This article presents a quantum algorithm that has a linear scaling with respect to the number of atoms, which is much smaller than the number of electrons. Our algorithm leverages the quantum singular value transformation (QSVT) to generate a quantum circuit to encode the density matrix, and an estimation method for computing the output electron density. In addition, we present a randomized block coordinate fixed-point method to accelerate the self-consistent field calculations by reducing the number of components of the electron density that need to be estimated. The proposed framework is accompanied by a rigorous error analysis that quantifies the function approximation error, the statistical fluctuation, and the iteration complexity. In particular, the analysis of our self-consistent iterations takes into account the measurement noise from the quantum circuit. These advancements offer a promising avenue for tackling large-scale DFT problems, enabling simulations of complex systems that were previously computationally infeasible.

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific data condition under which they were supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, it requires thousands of gradient updates, i.e., long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where a single gradient update can yield considerable results (see Figure 1). With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.
