亚洲男人的天堂2018av,欧美草比,久久久久久免费视频精选,国色天香在线看免费,久久久久亚洲av成人片仓井空

Regular transition systems (RTS) are a popular formalism for modeling infinite-state systems in general, and parameterised systems in particular. In a CONCUR 22 paper, Esparza et al. introduce a novel approach to the verification of RTS, based on inductive invariants. The approach computes the intersection of all inductive invariants of a given RTS that can be expressed as CNF formulas with a bounded number of clauses, and uses it to construct an automaton recognising an overapproximation of the reachable configurations. The paper shows that the problem of deciding if the language of this automaton intersects a given regular set of unsafe configurations is in $\textsf{EXPSPACE}$ and $\textsf{PSPACE}$-hard. We introduce $\textit{regular abstraction frameworks}$, a generalisation of the approach of Esparza et al., very similar to the regular abstractions of Hong and Lin. A framework consists of a regular language of $\textit{constraints}$, and a transducer, called the $\textit{interpretation}$, that assigns to each constraint the set of configurations of the RTS satisfying it. Examples of regular abstraction frameworks include the formulas of Esparza et al., octagons, bounded difference matrices, and views. We show that the generalisation of the decision problem above to regular abstraction frameworks remains in $\textsf{EXPSPACE}$, and prove a matching (highly non-trivial) $\textsf{EXPSPACE}$-hardness bound. $\textsf{EXPSPACE}$-hardness implies that, in the worst case, the automaton recognising the overapproximation of the reachable configurations has a double-exponential number of states. We introduce a learning algorithm that computes this automaton in a lazy manner, stopping whenever the current hypothesis is already strong enough to prove safety. We report on an implementation and show that our experimental results improve on those of Esparza et al.

相關內容

A deep latent variable model is a powerful method for capturing complex distributions. These models assume that underlying structures, but unobserved, are present within the data. In this dissertation, we explore high-dimensional problems related to physiological monitoring using latent variable models. First, we present a novel deep state-space model to generate electrical waveforms of the heart using optically obtained signals as inputs. This can bring about clinical diagnoses of heart disease via simple assessment through wearable devices. Second, we present a brain signal modeling scheme that combines the strengths of probabilistic graphical models and deep adversarial learning. The structured representations can provide interpretability and encode inductive biases to reduce the data complexity of neural oscillations. The efficacy of the learned representations is further studied in epilepsy seizure detection formulated as an unsupervised learning problem. Third, we propose a framework for the joint modeling of physiological measures and behavior. Existing methods to combine multiple sources of brain data provided are limited. Direct analysis of the relationship between different types of physiological measures usually does not involve behavioral data. Our method can identify the unique and shared contributions of brain regions to behavior and can be used to discover new functions of brain regions. The success of these innovative computational methods would allow the translation of biomarker findings across species and provide insight into neurocognitive analysis in numerous biological studies and clinical diagnoses, as well as emerging consumer applications.

Coding theory revolves around the incorporation of redundancy into transmitted symbols, computation tasks, and stored data to guard against adversarial manipulation. However, error correction in coding theory is contingent upon a strict trust assumption. In the context of computation and storage, it is required that honest nodes outnumber adversarial ones by a certain margin. However, in several emerging real-world cases, particularly, in decentralized blockchain-oriented applications, such assumptions are often unrealistic. Consequently, despite the important role of coding in addressing significant challenges within decentralized systems, its applications become constrained. Still, in decentralized platforms, a distinctive characteristic emerges, offering new avenues for secure coding beyond the constraints of conventional methods. In these scenarios, the adversary benefits when the legitimate decoder recovers the data, and preferably with a high estimation error. This incentive motivates them to act rationally, trying to maximize their gains. In this paper, we propose a game theoretic formulation for coding, called the game of coding, that captures this unique dynamic where each of the adversary and the data collector (decoder) have a utility function to optimize. The utility functions reflect the fact that both the data collector and the adversary are interested in increasing the chance of data being recoverable by the data collector. Moreover, the utility functions express the interest of the data collector to estimate the input with lower estimation error, but the opposite interest of the adversary. As a first, still highly non-trivial step, we characterize the equilibrium of the game for the repetition code with a repetition factor of 2, for a wide class of utility functions with minimal assumptions.

With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel quantization-aware training approach for DMs, namely BinaryDM. The proposed method pushes DMs' weights toward accurate and efficient binarization, considering the representation and computation properties. From the representation perspective, we present a Learnable Multi-basis Binarizer (LMB) to recover the representations generated by the binarized DM. The LMB enhances detailed information through the flexible combination of dual binary bases while applying to parameter-sparse locations of DM architectures to achieve minor burdens. From the optimization perspective, a Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Moreover, a quick progressive warm-up is applied to BinaryDM, avoiding convergence difficulties by layerwisely progressive quantization at the beginning of training. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. With 1.1-bit weight and 4-bit activation (W1.1A4), BinaryDM achieves as low as 7.11 FID and saves the performance from collapse (baseline FID 39.69). As the first binarization method for diffusion models, W1.1A4 BinaryDM achieves impressive 9.3 times OPs and 24.8 times model size savings, showcasing its substantial potential for edge deployment.

Complex dynamical systems are notoriously difficult to model because some degrees of freedom (e.g., small scales) may be computationally unresolvable or are incompletely understood, yet they are dynamically important. For example, the small scales of cloud dynamics and droplet formation are crucial for controlling climate, yet are unresolvable in global climate models. Semi-empirical closure models for the effects of unresolved degrees of freedom often exist and encode important domain-specific knowledge. Building on such closure models and correcting them through learning the structural errors can be an effective way of fusing data with domain knowledge. Here we describe a general approach, principles, and algorithms for learning about structural errors. Key to our approach is to include structural error models inside the models of complex systems, for example, in closure models for unresolved scales. The structural errors then map, usually nonlinearly, to observable data. As a result, however, mismatches between model output and data are only indirectly informative about structural errors, due to a lack of labeled pairs of inputs and outputs of structural error models. Additionally, derivatives of the model may not exist or be readily available. We discuss how structural error models can be learned from indirect data with derivative-free Kalman inversion algorithms and variants, how sparsity constraints enforce a "do no harm" principle, and various ways of modeling structural errors. We also discuss the merits of using non-local and/or stochastic error models. In addition, we demonstrate how data assimilation techniques can assist the learning about structural errors in non-ergodic systems. The concepts and algorithms are illustrated in two numerical examples based on the Lorenz-96 system and a human glucose-insulin model.

While traditional distributionally robust optimization (DRO) aims to minimize the maximal risk over a set of distributions, Agarwal and Zhang (2022) recently proposed a variant that replaces risk with excess risk. Compared to DRO, the new formulation$\unicode{x2013}$minimax excess risk optimization (MERO) has the advantage of suppressing the effect of heterogeneous noise in different distributions. However, the choice of excess risk leads to a very challenging minimax optimization problem, and currently there exists only an inefficient algorithm for empirical MERO. In this paper, we develop efficient stochastic approximation approaches which directly target MERO. Specifically, we leverage techniques from stochastic convex optimization to estimate the minimal risk of every distribution, and solve MERO as a stochastic convex-concave optimization (SCCO) problem with biased gradients. The presence of bias makes existing theoretical guarantees of SCCO inapplicable, and fortunately, we demonstrate that the bias, caused by the estimation error of the minimal risk, is under-control. Thus, MERO can still be optimized with a nearly optimal convergence rate. Moreover, we investigate a practical scenario where the quantity of samples drawn from each distribution may differ, and propose a stochastic approach that delivers distribution-dependent convergence rates.

This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture type facilitates monitoring of developments in the multimodal domain. Distinct from recent survey papers that present general information on multimodal architectures, this research conducts a comprehensive exploration of architectural details and identifies four specific architectural types. The types are distinguished by their respective methodologies for integrating multimodal inputs into the deep neural network model. The first two types (Type A and B) deeply fuses multimodal inputs within the internal layers of the model, whereas the following two types (Type C and D) facilitate early fusion at the input stage. Type-A employs standard cross-attention, whereas Type-B utilizes custom-designed layers for modality fusion within the internal layers. On the other hand, Type-C utilizes modality-specific encoders, while Type-D leverages tokenizers to process the modalities at the model's input stage. The identified architecture types aid the monitoring of any-to-any multimodal model development. Notably, Type-C and Type-D are currently favored in the construction of any-to-any multimodal models. Type-C, distinguished by its non-tokenizing multimodal model architecture, is emerging as a viable alternative to Type-D, which utilizes input-tokenizing techniques. To assist in model selection, this work highlights the advantages and disadvantages of each architecture type based on data and compute requirements, architecture complexity, scalability, simplification of adding modalities, training objectives, and any-to-any multimodal generation capability.

Bayesian hypothesis tests leverage posterior probabilities, Bayes factors, or credible intervals to inform data-driven decision making. We propose a framework for power curve approximation with such hypothesis tests. We present a fast approach to explore the approximate sampling distribution of posterior probabilities when the conditions for the Bernstein-von Mises theorem are satisfied. We extend that approach to consider segments of such sampling distributions in a targeted manner for each sample size explored. These sampling distribution segments are used to construct power curves for various types of posterior analyses. Our resulting method for power curve approximation is orders of magnitude faster than conventional power curve estimation for Bayesian hypothesis tests. We also prove the consistency of the corresponding power estimates and sample size recommendations under certain conditions.

Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose various new architectures, tweaking existing architectures with refined training strategies, increasing context length, using high-quality training data, and increasing training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses the LLMs architectures and their categorization, training strategies, training datasets, and performance evaluations and discusses future research directions. Moreover, the paper also discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.

Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.

北京阿比特科技有限公司