
We develop a unifying framework for interpolatory $\mathcal{L}_2$-optimal reduced-order modeling for a wide class of problems ranging from stationary models to parametric dynamical systems. We first show that the framework naturally covers the well-known interpolatory necessary conditions for $\mathcal{H}_2$-optimal model order reduction and leads to the interpolatory conditions for $\mathcal{H}_2 \otimes \mathcal{L}_2$-optimal model order reduction of multi-input/multi-output parametric dynamical systems. Moreover, we derive novel interpolatory optimality conditions for rational discrete least-squares minimization and for $\mathcal{L}_2$-optimal model order reduction of a class of parametric stationary models. We show that bitangential Hermite interpolation appears as the main tool for optimality across different domains. The theoretical results are illustrated on two numerical examples.

Related content

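The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM-SIGSOFT and IEEE-TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, and open source, as well as sustainability. Official website link: · Estimation/estimator · Kernelization · Generalization theory · Learning ·
May 9, 2023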

We consider the problem of learning functions in the $\mathcal{F}_{p,\pi}$ and Barron spaces, which are natural function spaces that arise in the high-dimensional analysis of random feature models (RFMs) and two-layer neural networks. Through a duality analysis, we reveal that the approximation and estimation of these spaces can be considered equivalent in a certain sense. This enables us to focus on the easier problem of approximation and estimation when studying the generalization of both models. The dual equivalence is established by defining an information-based complexity that can effectively control estimation errors. Additionally, we demonstrate the flexibility of our duality framework through comprehensive analyses of two concrete applications. The first application is to study learning functions in $\mathcal{F}_{p,\pi}$ with RFMs. We prove that the learning does not suffer from the curse of dimensionality as long as $p>1$, implying RFMs can work beyond the kernel regime. Our analysis extends existing results [CMM21] to the noisy case and removes the requirement of overparameterization. The second application is to investigate the learnability of reproducing kernel Hilbert space (RKHS) under the $L^\infty$ metric. We derive both lower and upper bounds of the minimax estimation error by using the spectrum of the associated kernel. We then apply these bounds to dot-product kernels and analyze how they scale with the input dimension. Our results suggest that learning with ReLU (random) features is generally intractable in terms of reaching high uniform accuracy.

The dispersion relation describes how the frequency of a wave depends on its wave vector as the wave passes through a given material. It characterizes the properties of the material and is therefore critical. However, dispersion relation reconstruction is very time-consuming and expensive. To address this bottleneck, we propose in this paper an efficient dispersion relation reconstruction scheme based on global polynomial interpolation for the approximation of 2D photonic band functions. Our method relies on the fact that the band functions are piecewise analytic with respect to the wave vector in the first Brillouin zone. We utilize suitable sampling points in the first Brillouin zone at which we solve the eigenvalue problem involved in the band function calculation, and then employ Lagrange interpolation to approximate the band functions on the whole first Brillouin zone. Numerical results show that our proposed method can significantly improve the computational efficiency.
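The sample-then-interpolate idea in the abstract above can be illustrated with a minimal one-dimensional sketch. This is not the authors' 2D scheme: the function `toy_band` is a hypothetical smooth stand-in for a band function (in practice each sample would come from an eigenvalue solve), and the choice of Chebyshev points as the "suitable sampling points" is an assumption made here for illustration.

```python
import math

def chebyshev_nodes(n, a, b):
    """Return n Chebyshev points of the first kind mapped to [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]

def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolant through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def toy_band(k):
    """Hypothetical analytic stand-in for a band function (assumption)."""
    return math.sqrt(2.0 + math.cos(k))

# Sample the expensive function at a few nodes only.
nodes = chebyshev_nodes(20, -math.pi, math.pi)
samples = [toy_band(k) for k in nodes]

# Evaluate the cheap interpolant on a fine grid and measure the error.
grid = [-math.pi + i * 2.0 * math.pi / 200 for i in range(201)]
max_err = max(abs(lagrange_interpolate(nodes, samples, k) - toy_band(k)) for k in grid)
print(f"max interpolation error on the grid: {max_err:.2e}")
```

Only 20 evaluations of the expensive function are needed here; every other value comes from the interpolant. For a band function that is merely piecewise analytic, a single global interpolant of this kind would be expected to converge more slowly near the points where analyticity breaks down.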

A joint mix is a random vector with a constant component-wise sum. The dependence structure of a joint mix minimizes some common objectives such as the variance of the component-wise sum, and it is regarded as a concept of extremal negative dependence. In this paper, we explore the connection between the joint mix structure and popular notions of negative dependence in statistics, such as negative correlation dependence, negative orthant dependence and negative association. A joint mix is not always negatively dependent in any of the above senses, but some natural classes of joint mixes are. We derive various necessary and sufficient conditions for a joint mix to be negatively dependent, and study the compatibility of these notions. For identical marginal distributions, we show that a negatively dependent joint mix solves a multi-marginal optimal transport problem for quadratic cost under a novel setting of uncertainty. Analysis of this optimal transport problem with heterogeneous marginals reveals a trade-off between negative dependence and the joint mix structure.
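A small discrete example may make the definition above concrete. The sketch below uses a hypothetical three-component random vector that takes each cyclic shift of (0, 1, 2) with equal probability; this construction is chosen here for illustration and is not taken from the paper. It checks exactly, with rational arithmetic, that the component-wise sum is constant and that every pair of components is negatively correlated.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical joint mix: each cyclic shift of (0, 1, 2) has probability 1/3,
# so every component is uniform on {0, 1, 2}.
outcomes = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
prob = Fraction(1, len(outcomes))

def mean(f):
    """Expectation of f(X) under the uniform distribution on outcomes."""
    return sum(prob * f(x) for x in outcomes)

def cov(i, j):
    """Exact covariance between components i and j."""
    return mean(lambda x: x[i] * x[j]) - mean(lambda x: x[i]) * mean(lambda x: x[j])

# The component-wise sum takes a single value, which is the joint mix property.
sums = {sum(x) for x in outcomes}

# All pairwise covariances are negative in this example.
pair_covs = {(i, j): cov(i, j) for i, j in combinations(range(3), 2)}

# Variance of the sum, assembled from variances and covariances, is zero.
var_sum = sum(cov(i, i) for i in range(3)) + 2 * sum(pair_covs.values())

print(sums, pair_covs, var_sum)
```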

Many geometry processing techniques require the solution of partial differential equations (PDEs) on surfaces. Such surface PDEs often involve boundary conditions prescribed on the surface, at points or curves on its interior or along the geometric (exterior) boundary of an open surface. However, input surfaces can take many forms (e.g., meshes, parametric surfaces, point clouds, level sets, neural implicits). One must therefore generate a mesh to apply finite element-type techniques or derive specialized discretization procedures for each surface representation. We propose instead to address such problems through a novel extension of the closest point method (CPM) to handle interior boundary conditions specified at surface points or curves. CPM solves the surface PDE by solving a volumetric PDE defined over the Cartesian embedding space containing the surface; only a closest point function is required to represent the surface. As such, CPM supports surfaces that are open or closed, orientable or not, and of any codimension or even mixed-codimension. To enable support for interior boundary conditions, we develop a method to implicitly partition the embedding space across interior boundaries. CPM's finite difference and interpolation stencils are adapted to respect this partition while preserving second-order accuracy. Furthermore, an efficient sparse-grid implementation and numerical solver is developed that can scale to tens of millions of degrees of freedom, allowing PDEs to be solved on more complex surfaces. We demonstrate our method's convergence behaviour on selected model PDEs. Several geometry processing problems are explored: diffusion curves on surfaces, geodesic distance, tangent vector field design, and harmonic map construction. Our proposed approach thus offers a powerful and flexible new tool for a range of geometry processing tasks on general surface representations.

Let a polytope $P$ be defined by a system $A x \leq b$. We consider the problem of counting the number of integer points inside $P$, assuming that $P$ is $\Delta$-modular, where the polytope $P$ is called $\Delta$-modular if all the rank sub-determinants of $A$ are bounded by $\Delta$ in absolute value. We present a new FPT-algorithm, parameterized by $\Delta$ and by the maximal number of vertices in $P$, where the maximum is taken over all r.h.s. vectors $b$. We show that our algorithm is more efficient for $\Delta$-modular problems than the approach of A. Barvinok et al. To this end, we do not directly compute the short rational generating function for $P \cap \mathbb{Z}^n$, which is commonly used for the considered problem. Instead, we use the dynamic programming principle to compute its particular representation in the form of an exponential series that depends on a single variable. We do not rely at all on Barvinok's unimodular sign decomposition technique. Using our new complexity bound, we consider different special cases that may be of independent interest. For example, we give FPT-algorithms for counting the number of integer points in $\Delta$-modular simplices and similar polytopes that have $n + O(1)$ facets. As a special case, for any fixed $m$, we give an FPT-algorithm to count solutions of the unbounded $m$-dimensional $\Delta$-modular subset-sum problem.
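To make the problem statement above concrete, the following sketch counts integer points by brute force for a tiny two-dimensional example and computes the largest absolute $2 \times 2$ minor of $A$. It is only a baseline for illustration and is not the FPT-algorithm of the paper; the example system and the bounding box passed to `count_integer_points` are assumptions chosen here.

```python
from itertools import combinations, product

def det2(rows):
    """Determinant of a 2x2 matrix given as a pair of rows."""
    (a, b), (c, d) = rows
    return a * d - b * c

def max_abs_minor_2d(A):
    """Largest absolute value of a 2x2 sub-determinant of a matrix with 2 columns."""
    return max(abs(det2(pair)) for pair in combinations(A, 2))

def count_integer_points(A, b, box):
    """Count integer x inside the box that satisfy A x <= b (brute force)."""
    count = 0
    for x in product(*(range(lo, hi + 1) for lo, hi in box)):
        if all(sum(a * xi for a, xi in zip(row, x)) <= bi for row, bi in zip(A, b)):
            count += 1
    return count

# Hypothetical example: the triangle x >= 0, y >= 0, x + y <= 3.
A = [(-1, 0), (0, -1), (1, 1)]
b = [0, 0, 3]

delta = max_abs_minor_2d(A)
n_points = count_integer_points(A, b, box=[(0, 3), (0, 3)])
print(delta, n_points)
```

The enumeration cost grows with the volume of the bounding box, which is what the generating-function and dynamic-programming approaches mentioned in the abstract are designed to avoid.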

We propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. The data of interest consist of a collection of multiple series of probability measures that are supported over a bounded interval of the real line and indexed by distinct time instants. The probability measures are modelled as random objects in the Wasserstein space. We establish the auto-regressive model in the tangent space at the Lebesgue measure by first centering all the raw measures so that their Fréchet means become the Lebesgue measure. Using the theory of iterated random function systems, results on the existence, uniqueness and stationarity of the solution of such a model are provided. We also propose a consistent estimator for the model coefficient. In addition to the analysis of simulated data, the proposed model is illustrated with two real data sets made of observations from age distributions in different countries and a bike sharing network in Paris. Finally, due to the positivity and boundedness constraints that we impose on the model coefficients, the proposed estimator, which is learned under these constraints, naturally has a sparse structure. The sparsity furthermore allows the application of the proposed model to learning a graph of temporal dependency from the multivariate distributional time series.

The ability of neural networks to represent more features than neurons makes interpreting them challenging. This phenomenon, known as superposition, has spurred efforts to find architectures that are more interpretable than standard multilayer perceptrons (MLPs) with elementwise activation functions. In this note, I examine bilinear layers, which are a type of MLP layer that are mathematically much easier to analyze while simultaneously performing better than standard MLPs. Although they are nonlinear functions of their input, I demonstrate that bilinear layers can be expressed using only linear operations and third order tensors. We can integrate this expression for bilinear layers into a mathematical framework for transformer circuits, which was previously limited to attention-only transformers. These results suggest that bilinear layers are easier to analyze mathematically than current architectures and thus may lend themselves to deeper safety insights by allowing us to talk more formally about circuits in neural networks. Additionally, bilinear layers may offer an alternative path for mechanistic interpretability through understanding the mechanisms of feature construction instead of enumerating a (potentially exponentially) large number of features in large models.
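The claim above that a bilinear layer can be written with a third-order tensor can be checked numerically. The sketch below assumes the common bias-free form $g(x) = (W_1 x) \odot (W_2 x)$ and builds the tensor $B_{ijk} = (W_1)_{ij} (W_2)_{ik}$, so that $g_i(x) = \sum_{j,k} B_{ijk} x_j x_k$. The function names and the omission of biases are choices made here, not notation from the note.

```python
import random

def matvec(W, x):
    """Plain matrix-vector product for a matrix given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def bilinear_layer(W1, W2, x):
    """Bias-free bilinear layer: elementwise product of two linear maps."""
    return [a * b for a, b in zip(matvec(W1, x), matvec(W2, x))]

def bilinear_tensor(W1, W2):
    """Third-order tensor with entries B[i][j][k] = W1[i][j] * W2[i][k]."""
    return [[[w1 * w2 for w2 in row2] for w1 in row1] for row1, row2 in zip(W1, W2)]

def apply_tensor(B, x):
    """Contract the tensor with x twice: out[i] = sum_jk B[i][j][k] x[j] x[k]."""
    n = len(x)
    return [sum(B_i[j][k] * x[j] * x[k] for j in range(n) for k in range(n))
            for B_i in B]

random.seed(0)
d_in, d_out = 4, 3
W1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
W2 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
x = [random.gauss(0, 1) for _ in range(d_in)]

direct = bilinear_layer(W1, W2, x)
via_tensor = apply_tensor(bilinear_tensor(W1, W2), x)
print(direct)
print(via_tensor)
```

Both evaluations agree up to floating-point rounding, and the tensor contraction involves only linear operations on the pair $(x, x)$.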

We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e. altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. As our main algorithmic results, (1) We give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary convex set $S$. (2) We give a different computationally efficient $O(n)$-sample algorithm that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of symmetric convex sets. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{\sqrt{n}}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_n)$ from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.

Training a neural network (NN) typically relies on some type of curve-following method, such as gradient descent (GD) (and stochastic gradient descent (SGD)), ADADELTA, ADAM or limited memory algorithms. Convergence for these algorithms usually relies on having access to a large quantity of observations in order to achieve a high level of accuracy and, with certain classes of functions, these algorithms can take multiple epochs of data points to catch on. Herein, a different technique with the potential of achieving dramatically better speeds of convergence, especially for shallow networks, is explored: it does not curve-follow but rather relies on 'decoupling' hidden layers and on updating their weighted connections through bootstrapping, resampling and linear regression. By utilizing resampled observations, the convergence of this process is empirically shown to be remarkably fast and to require fewer data points: in particular, our experiments show that one needs only a fraction of the observations required by traditional neural network training methods to approximate various classes of functions.

Class Incremental Learning (CIL) aims at learning a multi-class classifier in a phase-by-phase manner, in which only data of a subset of the classes are provided at each phase. Previous works mainly focus on mitigating forgetting in phases after the initial one. However, we find that improving CIL at its initial phase is also a promising direction. Specifically, we experimentally show that directly encouraging the CIL learner at the initial phase to output representations similar to those of the model jointly trained on all classes can greatly boost the CIL performance. Motivated by this, we study the difference between a naïvely-trained initial-phase model and the oracle model. Specifically, since one major difference between these two models is the number of training classes, we investigate how such difference affects the model representations. We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model). Our CwD is simple to implement and easy to plug into existing methods. Extensive experiments on various benchmark datasets show that CwD consistently and significantly improves the performance of existing state-of-the-art methods by around 1% to 3%. Code will be released.
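One way to picture a class-wise decorrelation regularizer is the sketch below. It assumes, as an illustration only, that the penalty is the squared Frobenius norm of each class's feature correlation matrix, scaled by the squared feature dimension and averaged over classes; the abstract does not state the exact loss, so the formula, the function names, and the toy features are all assumptions.

```python
import math

def class_correlation(features):
    """Correlation matrix of the feature dimensions for one class.

    Assumes every feature dimension has nonzero variance within the class.
    """
    n, d = len(features), len(features[0])
    means = [sum(f[j] for f in features) / n for j in range(d)]
    stds = [math.sqrt(sum((f[j] - means[j]) ** 2 for f in features) / n)
            for j in range(d)]
    z = [[(f[j] - means[j]) / stds[j] for j in range(d)] for f in features]
    return [[sum(row[i] * row[j] for row in z) / n for j in range(d)]
            for i in range(d)]

def cwd_penalty(features_by_class):
    """Assumed penalty: mean over classes of ||K_c||_F^2 / d^2."""
    total = 0.0
    for features in features_by_class.values():
        K = class_correlation(features)
        d = len(K)
        total += sum(v * v for row in K for v in row) / (d * d)
    return total / len(features_by_class)

# Hypothetical features lying along one line ("long and narrow region").
narrow = {0: [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]}
# Hypothetical features spread evenly over both dimensions.
spread = {0: [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]}

print(cwd_penalty(narrow), cwd_penalty(spread))
```

Under this assumed form, the penalty is largest when a class's features are perfectly correlated and smallest when they are uncorrelated, which matches the qualitative goal described in the abstract of making per-class representations scatter more uniformly.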
