
Many machine learning problems can be framed as estimating functions, often time-dependent functions that are estimated in real time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: 1) conventional GP inference scales as $O(N^{3})$ in the number of observations; 2) updating a GP model sequentially is not trivial; and 3) covariance kernels typically enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable in practice. To overcome these issues, we propose a sequential Monte Carlo algorithm to fit infinite mixtures of GPs that capture non-stationary behavior while allowing for online, distributed inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the presence of non-stationarity in time-series data. To demonstrate the utility of our proposed online Gaussian process mixture-of-experts approach in applied settings, we show that we can successfully implement an optimization algorithm using online Gaussian process bandits.
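
As a point of reference for the sequential-inference ingredient, here is a minimal sketch of online GP regression in which the Cholesky factor of the kernel matrix grows by one row per incoming observation, so each update costs $O(N^2)$ rather than $O(N^3)$. The kernel, noise level, and the `OnlineGP` class are illustrative assumptions; this is only the single-expert building block, not the proposed SMC mixture algorithm.

```python
# Minimal online GP sketch (illustrative, not the paper's SMC mixture):
# extend the Cholesky factor of K + noise^2 I by one row per observation.
import numpy as np
from scipy.linalg import solve_triangular

def rbf(a, b, ls=1.0, var=1.0):
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

class OnlineGP:
    def __init__(self, noise=0.1):
        self.noise, self.X, self.y, self.L = noise, [], [], None

    def update(self, x, y):
        x_arr = np.array([x])
        k_self = rbf(x_arr, x_arr)[0, 0] + self.noise ** 2
        if self.L is None:
            self.L = np.array([[np.sqrt(k_self)]])
        else:
            k_cross = rbf(np.array(self.X), x_arr)[:, 0]
            c = solve_triangular(self.L, k_cross, lower=True)
            d = np.sqrt(k_self - c @ c)
            n = self.L.shape[0]
            L_new = np.zeros((n + 1, n + 1))
            L_new[:n, :n], L_new[n, :n], L_new[n, n] = self.L, c, d
            self.L = L_new
        self.X.append(x); self.y.append(y)

    def predict(self, x_star):
        k_star = rbf(np.array(self.X), np.array([x_star]))[:, 0]
        alpha = solve_triangular(self.L, np.array(self.y), lower=True)
        beta = solve_triangular(self.L, k_star, lower=True)
        var = rbf(np.array([x_star]), np.array([x_star]))[0, 0] - beta @ beta
        return beta @ alpha, var

gp = OnlineGP()
for xi, yi in zip([0.0, 0.5, 1.0], [0.0, 0.48, 0.84]):  # noisy sin(x) samples
    gp.update(xi, yi)
print(gp.predict(0.25))  # (posterior mean, posterior variance)
```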

Related content

We initiate a principled study of algorithmic collective action on digital platforms that deploy machine learning algorithms. We propose a simple theoretical model of a collective interacting with a firm's learning algorithm. The collective pools the data of participating individuals and executes an algorithmic strategy by instructing participants how to modify their own data to achieve a collective goal. We investigate the consequences of this model in three fundamental learning-theoretic settings: the case of a nonparametric optimal learning algorithm, a parametric risk minimizer, and gradient-based optimization. In each setting, we come up with coordinated algorithmic strategies and characterize natural success criteria as a function of the collective's size. Complementing our theory, we conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers. Through more than two thousand model training runs of a BERT-like language model, we see a striking correspondence emerge between our empirical observations and the predictions made by our theory. Taken together, our theory and experiments broadly support the conclusion that algorithmic collectives of exceedingly small fractional size can exert significant control over a platform's learning algorithm.
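
A minimal illustration of one coordinated strategy in this spirit is signal planting: participants embed a rare feature in their own data, relabel it toward a target class, and deploy the same feature at prediction time. The toy below uses synthetic data and a logistic-regression stand-in for the firm's learner; all sizes and values are our own assumptions, not the paper's gig-platform experiment.

```python
# Toy signal-planting sketch (synthetic; not the paper's BERT experiments).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, alpha, target = 5000, 20, 0.02, 1   # alpha = collective's fractional size

X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

m = int(alpha * n)          # participating individuals
X[:m, -1] = 5.0             # plant a rare signal in a label-irrelevant feature
y[:m] = target              # relabel toward the collective's goal

clf = LogisticRegression(max_iter=1000).fit(X, y)  # the "firm's" learner

X_test = rng.normal(size=(1000, d))
X_test[:, -1] = 5.0         # deploy the signal at prediction time
print((clf.predict(X_test) == target).mean())      # collective's success rate
```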

In this paper, we present a kernel-based, multi-task Gaussian process (GP) model for approximating the transition probabilities underlying an individual's mobility state, modeled as a time-inhomogeneous Markov process with two states: moves and pauses. Our approach accounts for the correlations between the transition probabilities by creating a covariance matrix over the tasks. We also introduce time variability by assuming that an individual's transition probabilities vary over time in response to exogenous variables. We enforce the stochasticity and non-negativity constraints on the probabilities of the Markov process by incorporating a set of constraint points in the GP. We also discuss opportunities to speed up GP estimation and inference in this context by exploiting Toeplitz and Kronecker product structures. Our numerical experiments demonstrate the ability of our formulation to enforce the desired constraints while learning the functional form of the transition probabilities.
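
To make the Kronecker point concrete: for a separable multi-task covariance $K_{\text{tasks}} \otimes K_{\text{time}}$, matrix-vector products never require the full $TM \times TM$ matrix, thanks to the identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{\top})$. The snippet below (illustrative kernel and sizes, not the paper's code) verifies this; on a regular time grid, $K_{\text{time}}$ is additionally Toeplitz.

```python
# Kronecker matvec sketch: (A ⊗ B) vec(X) = vec(B X A^T), with A = K_tasks,
# B = K_time. Illustrative sizes; not the paper's implementation.
import numpy as np

T, M = 500, 2                                     # time points, tasks (move/pause)
t = np.linspace(0, 1, T)
K_time = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.1) ** 2)  # Toeplitz on a regular grid
K_tasks = np.array([[1.0, 0.6], [0.6, 1.0]])                    # cross-task correlation

v = np.random.default_rng(1).normal(size=T * M)

full = np.kron(K_tasks, K_time) @ v               # naive: O((TM)^2) memory
Xmat = v.reshape(M, T).T                          # X with vec(X) = v (column-major)
fast = (K_time @ Xmat @ K_tasks.T).T.reshape(-1)  # two small products instead

print(np.allclose(full, fast))                    # True
```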

Spatial processes observed in various fields, such as climate and environmental science, often occur on a large scale and demonstrate spatial nonstationarity. Fitting a Gaussian process with a nonstationary Mat\'ern covariance is challenging. Previous studies in the literature have tackled this challenge by employing spatial partitioning techniques to estimate the parameters that vary spatially in the covariance function. The selection of partitions is an important consideration, but it is often subjective and lacks a data-driven approach. To address this issue, we utilize the power of Convolutional Neural Networks (ConvNets) to derive subregions from the nonstationary data, employing a selection mechanism to identify subregions that exhibit behavior similar to stationary fields. To distinguish between stationary and nonstationary random fields, we train the ConvNet on simulated data generated from Gaussian processes with Mat\'ern covariance models under a wide range of parameter settings, ensuring adequate representation of both stationary and nonstationary spatial data. We assess the performance of the proposed method on large-scale synthetic and real datasets. The results reveal enhanced accuracy in parameter estimation when relying on the ConvNet-based partitioning compared to traditional user-defined approaches.
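
As a rough sketch of the classification component, the model below maps a gridded random field to a stationary/nonstationary logit. The architecture and hyperparameters are our own assumptions (PyTorch), not the paper's.

```python
# Illustrative ConvNet for stationary-vs-nonstationary field classification
# (architecture is an assumption, not the paper's).
import torch
import torch.nn as nn

class StationarityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )

    def forward(self, x):                    # x: (batch, 1, H, W) field values
        return self.head(self.features(x))

net = StationarityNet()
logits = net(torch.randn(8, 1, 64, 64))      # 8 simulated 64x64 fields
print(logits.shape)                          # torch.Size([8, 2])
```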

Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications but poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem. Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. The dual PC algorithm is a novel scheme that carries out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions, we can also perform tests on partial correlations of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity. Additionally, we show that the dual PC algorithm applies to Gaussian copula models and demonstrate its performance in that setting.
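
The covariance-precision duality at the heart of the scheme can be illustrated directly: the partial correlation of $X_i$ and $X_j$ given all remaining variables is $-\Omega_{ij}/\sqrt{\Omega_{ii}\Omega_{jj}}$, read off the precision matrix $\Omega$ with no explicit regression. A toy check on a chain-structured Gaussian (our own example, not the full dual PC procedure):

```python
# Partial correlations from the precision matrix (toy; not the full dual PC).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
X[:, 1] += X[:, 0]             # chain 0 -> 1
X[:, 2] += X[:, 1]             # chain 1 -> 2, so 0 and 2 are CI given 1

Omega = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix

def pcor_given_rest(Omega, i, j):
    return -Omega[i, j] / np.sqrt(Omega[i, i] * Omega[j, j])

print(round(pcor_given_rest(Omega, 0, 2), 3))    # near 0: no direct edge
print(round(pcor_given_rest(Omega, 0, 1), 3))    # clearly nonzero: direct edge
```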

Given sparse observations of buoy velocities, oceanographers are interested in reconstructing ocean currents away from the buoys and identifying divergences in a current vector field. As a first and modular step, we focus on the time-stationary case (for instance, by restricting to short time periods). Since we expect current velocity to be a continuous but highly non-linear function of spatial location, Gaussian processes (GPs) offer an attractive model. But we show that applying a GP with a standard stationary kernel directly to buoy data can struggle at both current reconstruction and divergence identification due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence-free and curl-free components of the vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field via mixed partial derivatives alone, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate the benefits of our method with theory and experiments on synthetic and real ocean data.
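
To make the mixed-partial-derivative point concrete: placing an RBF kernel $k$ on the scalar potential $\Phi$ induces, in closed form, a matrix-valued kernel on the curl-free velocity component $\nabla\Phi$, namely $\partial^2 k/\partial x\,\partial x'$. The sketch below uses standard GP derivative identities and our own parameter choices, not the paper's code; the divergence-free component is obtained analogously from the stream function.

```python
# Closed-form kernel block for the curl-free component grad(Phi) when Phi
# has an RBF kernel (standard GP derivative identity; illustrative only).
import numpy as np

def curlfree_block(x, xp, ls=1.0):
    """Cov(grad Phi(x), grad Phi(x')) = d^2 k / dx dx'
       = (k / ls^2) * (I - r r^T / ls^2),  with r = x - x'."""
    r = x - xp
    k = np.exp(-0.5 * (r @ r) / ls ** 2)
    return (k / ls ** 2) * (np.eye(x.size) - np.outer(r, r) / ls ** 2)

print(curlfree_block(np.array([0.3, -0.2]), np.array([0.1, 0.5])))  # 2x2 block
```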

If the assumed model does not accurately capture the underlying structure of the data, a statistical method is likely to yield sub-optimal results, so model selection is crucial to any statistical analysis. However, for massive datasets, selecting an appropriate model from a large pool of candidates becomes computationally challenging, and limited research has been conducted on data selection for model selection. In this study, we conduct subdata selection based on the A-optimality criterion, which allows model selection to be performed on a smaller subset of the data. We evaluate our approach in terms of the probability of selecting the best model and of estimation efficiency, through simulation experiments and two real-data applications.
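
Schematically, A-optimal subdata selection grows a subset $S$ that keeps $\mathrm{tr}\big((X_S^{\top}X_S)^{-1}\big)$, and hence the average variance of the least-squares estimator, small. The greedy heuristic below is our own illustration of the criterion, not necessarily the paper's selection algorithm.

```python
# Greedy A-optimal subdata selection sketch (illustrative heuristic).
import numpy as np

def a_criterion(Xs, ridge=1e-8):
    M = Xs.T @ Xs + ridge * np.eye(Xs.shape[1])   # ridge guards early singular subsets
    return np.trace(np.linalg.inv(M))

def greedy_a_optimal(X, k):
    chosen = []
    for _ in range(k):
        scores = [(a_criterion(X[chosen + [i]]), i)
                  for i in range(len(X)) if i not in chosen]
        chosen.append(min(scores)[1])             # add the point that helps most
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
idx = greedy_a_optimal(X, 10)
print(sorted(idx), a_criterion(X[idx]))
```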

This paper considers a single-trajectory system identification problem for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noises. The problem is motivated by safe learning-based control for constrained linear systems, where the safe policies used during the learning process are usually nonlinear and time-varying in order to satisfy the state and input constraints. In this paper, we provide a non-asymptotic error bound for least-squares estimation when the data trajectory is generated by any nonlinear and/or time-varying policy, as long as the generated state and action trajectories are bounded. This significantly generalizes existing non-asymptotic guarantees for linear system identification, which usually consider i.i.d. random inputs or linear policies. Interestingly, our error bound is consistent with that for linear policies in its dependence on the trajectory length, system dimensions, and excitation levels. Lastly, we demonstrate an application of our results to safe learning with robust model predictive control and provide a numerical analysis.
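
The estimator in question is ordinary least squares on a single trajectory: regress $x_{t+1}$ on $(x_t, u_t)$ and read off $(\hat A, \hat B)$. The toy below, with a saturated (hence nonlinear) feedback policy plus i.i.d. excitation noise, is our own illustration of the setting; the system and noise levels are arbitrary choices.

```python
# Single-trajectory least-squares identification under a nonlinear policy
# (illustrative system and policy; not from the paper).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])

T, xs, us = 500, [np.zeros(2)], []
for t in range(T):
    x = xs[-1]
    u = np.clip(-0.5 * x[1:], -1, 1) + 0.1 * rng.normal(size=1)  # saturated feedback + excitation
    xs.append(A @ x + B @ u + 0.05 * rng.normal(size=2))
    us.append(u)

X, U, Y = np.array(xs[:-1]), np.array(us), np.array(xs[1:])
Z = np.hstack([X, U])                            # regressors (x_t, u_t)
Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)    # Y ≈ Z @ Theta, Theta^T = [A B]
A_hat, B_hat = Theta[:2].T, Theta[2:].T
print(np.linalg.norm(A_hat - A), np.linalg.norm(B_hat - B))  # estimation errors
```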

Learning a nonparametric system of ordinary differential equations (ODEs) from $n$ trajectory snapshots in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations scale quadratically in $d$ unless additional knowledge about system properties, such as sparsity and symmetries, is available. In this work, we propose a linear approach to learning using the implicit formulation provided by vector-valued Reproducing Kernel Hilbert Spaces. By rewriting the ODEs in a weaker integral form, which we subsequently minimize, we derive our learning algorithm. The solution of the minimization problem for the vector field relies on multivariate occupation kernel functions associated with the solution trajectories. We validate our approach through experiments on highly nonlinear simulated and real data, where $d$ may exceed 100. We further demonstrate the versatility of the proposed method by learning a nonparametric first-order quasilinear partial differential equation.
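
A simplified numerical sketch of the idea (quadrature rule, kernel, and regularization are our own choices, not the paper's algorithm): each trajectory contributes one occupation feature $\Gamma_i(z) = \int k(x_i(t), z)\,dt$, and the weak integral form $x(T) - x(0) = \int f(x(t))\,dt$ reduces, via the representer theorem, to a linear system in the Gram matrix of these features.

```python
# Toy occupation-kernel regression for ODE learning (simplified sketch).
import numpy as np

def rbf(Xa, Xb, ls=0.5):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

# Simulate snapshots of dx/dt = f(x) for a toy 2-D system.
f_true = lambda x: np.stack([-x[..., 1], x[..., 0] - 0.1 * x[..., 0] ** 3], -1)
dt, steps, n_traj = 0.01, 100, 30
rng = np.random.default_rng(0)
trajs = []
for _ in range(n_traj):
    x = [rng.uniform(-2, 2, size=2)]
    for _ in range(steps):
        x.append(x[-1] + dt * f_true(x[-1]))
    trajs.append(np.array(x))

w = np.full(steps + 1, dt); w[[0, -1]] = dt / 2        # trapezoid weights
G = np.zeros((n_traj, n_traj))                          # Gram of occupation kernels
for i in range(n_traj):
    for j in range(n_traj):
        G[i, j] = w @ rbf(trajs[i], trajs[j]) @ w
delta = np.array([tr[-1] - tr[0] for tr in trajs])      # x(T) - x(0) per trajectory
alpha = np.linalg.solve(G + 1e-6 * np.eye(n_traj), delta)

def f_hat(z):                                           # learned vector field at z
    feats = np.array([w @ rbf(tr, z[None, :])[:, 0] for tr in trajs])
    return feats @ alpha

z = np.array([1.0, 0.5])
print(f_hat(z), f_true(z))                              # estimate vs. ground truth
```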

Learning interpretable representations of neural dynamics at a population level is a crucial first step to understanding how observed neural activity relates to perception and behavior. Models of neural dynamics often focus either on low-dimensional projections of neural activity or on learning dynamical systems that explicitly relate to the neural state over time. We discuss how these two approaches are interrelated by considering dynamical systems as representative of flows on a low-dimensional manifold. Building on this concept, we propose a new decomposed dynamical system model that represents complex non-stationary and nonlinear dynamics of time-series data as a sparse combination of simpler, more interpretable components. Our model is trained through a dictionary learning procedure, where we leverage recent results in tracking sparse vectors over time. The decomposed nature of the dynamics is more expressive than previous switched approaches for a given number of parameters and enables modeling of overlapping and non-stationary dynamics. In instructional examples, both continuous-time and discrete-time, we demonstrate that our model can approximate the original system well, learn efficient representations, and capture smooth transitions between dynamical modes, focusing on intuitive low-dimensional non-stationary linear and nonlinear systems. Furthermore, we highlight our model's ability to efficiently capture and demix population dynamics generated from multiple independent subnetworks, a task that is computationally impractical for switched models. Finally, we apply our model to whole-brain neural recordings from C. elegans, illustrating a diversity of dynamics that is obscured when the data are classified into discrete states.
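
The model class can be sketched in a few lines: the state evolves under a sparse, time-varying combination of fixed dictionary operators, $x_{t+1} = \big(\sum_j c_j(t) A_j\big) x_t$, and overlapping modes correspond to several $c_j(t)$ being active at once. The toy below recovers per-step coefficients by sparse regression; it illustrates the model form, not the paper's dictionary-learning procedure.

```python
# Decomposed-dynamics toy: sparse time-varying mixture of fixed operators
# (illustrates the model class, not the paper's training procedure).
import numpy as np
from sklearn.linear_model import Lasso

A1 = np.array([[0.99, -0.1], [0.1, 0.99]])   # rotation-like mode
A2 = np.array([[0.95, 0.0], [0.0, 0.90]])    # decay mode
dictionary = [A1, A2]

# Simulate: mode 1 alone at first, then an overlapping mix of both modes.
T, x = 200, [np.array([1.0, 0.0])]
c_true = np.array([[1.0, 0.0] if t < 100 else [0.5, 0.5] for t in range(T)])
for t in range(T):
    x.append(sum(c * A for c, A in zip(c_true[t], dictionary)) @ x[-1])
x = np.array(x)

# Recover c(t) at one step: regress x_{t+1} on [A1 x_t, A2 x_t] with an l1 penalty.
def coeffs_at(t, lam=1e-4):
    Phi = np.column_stack([A @ x[t] for A in dictionary])   # (2, J)
    return Lasso(alpha=lam, fit_intercept=False).fit(Phi, x[t + 1]).coef_

print(coeffs_at(50), coeffs_at(150))   # ~[1, 0] then ~[0.5, 0.5]
```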

We present a new method to learn video representations from large-scale unlabeled video data. Ideally, this representation will be generic and transferable, directly usable for new tasks such as action recognition and zero- or few-shot learning. We formulate unsupervised representation learning as a multi-modal, multi-task learning problem, where the representations are shared across different modalities via distillation. Further, we introduce the concept of loss-function evolution, using an evolutionary search algorithm to automatically find an optimal combination of loss functions capturing many (self-supervised) tasks and modalities. Third, we propose an unsupervised representation evaluation metric that uses distribution matching to a large unlabeled dataset as a prior constraint, based on Zipf's law. This unsupervised constraint, which is not guided by any labeling, produces results similar to those of weakly-supervised, task-specific constraints. The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods. Notably, it is also more effective than several label-based methods (e.g., pretraining on ImageNet), with the exception of large, fully labeled video datasets.
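
One schematic reading of the Zipf-based metric (our own interpretation, not the paper's exact formulation): cluster unlabeled embeddings and score how closely the rank-ordered cluster-mass distribution matches a Zipf prior, for instance via a KL divergence.

```python
# Zipf distribution-matching score for representations (schematic sketch).
import numpy as np
from sklearn.cluster import KMeans

def zipf_score(embeddings, k=50, s=1.0):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    counts = np.sort(np.bincount(labels, minlength=k))[::-1] + 1e-9
    p = counts / counts.sum()                 # empirical cluster mass, by rank
    zipf = 1.0 / np.arange(1, k + 1) ** s
    q = zipf / zipf.sum()                     # Zipf prior over ranks
    return float(np.sum(p * np.log(p / q)))   # KL(p || q); lower = closer to Zipf

emb = np.random.default_rng(0).normal(size=(2000, 32))   # stand-in embeddings
print(zipf_score(emb))
```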
