We describe a method to parse a complex, cluttered indoor scene into primitives which offer a parsimonious abstraction of scene structure. Our primitives are simple convexes. Our method uses a learned regression procedure to parse a scene into a fixed number of convexes from RGBD input, and can optionally accept segmentations to improve the decomposition. The result is then polished with a descent method which adjusts the convexes to produce a very good fit, and greedily removes superfluous primitives. Because the entire scene is parsed, we can evaluate using traditional depth, normal, and segmentation error metrics. Our evaluation procedure demonstrates that the error from our primitive representation is comparable to that of predicting depth from a single image.
We consider outlier-robust and sparse estimation of linear regression coefficients, when the covariates and the noises are contaminated by adversarial outliers and noises are sampled from a heavy-tailed distribution. Our results present sharper error bounds under weaker assumptions than prior studies that share similar interests with this study. Our analysis relies on some sharp concentration inequalities resulting from generic chaining.
We derive optimality conditions for the optimum sample allocation problem in stratified sampling, formulated as the determination of the fixed strata sample sizes that minimize the total cost of the survey, under the assumed level of variance of the stratified $\pi$ estimator of the population total (or mean) and one-sided upper bounds imposed on sample sizes in strata. In this context, we presume that the variance function is of some generic form that, in particular, covers the case of the simple random sampling without replacement design in strata. The optimality conditions mentioned above will be derived from the Karush-Kuhn-Tucker conditions. Based on the established optimality conditions, we provide a formal proof of the optimality of the existing procedure, termed here as LRNA, which solves the allocation problem considered. We formulate the LRNA in such a way that it also provides the solution to the classical optimum allocation problem (i.e. minimization of the estimator's variance under a fixed total cost) under one-sided lower bounds imposed on sample sizes in strata. In this context, the LRNA can be considered as a counterparty to the popular recursive Neyman allocation procedure that is used to solve the classical problem of an optimum sample allocation with added one-sided upper bounds. Ready-to-use R-implementation of the LRNA is available through our stratallo package, which is published on the Comprehensive R Archive Network (CRAN) package repository.
There have been recent advances in the analysis and visualization of 3D symmetric tensor fields, with a focus on the robust extraction of tensor field topology. However, topological features such as degenerate curves and neutral surfaces do not live in isolation. Instead, they intriguingly interact with each other. In this paper, we introduce the notion of {\em topological graph} for 3D symmetric tensor fields to facilitate global topological analysis of such fields. The nodes of the graph include degenerate curves and regions bounded by neutral surfaces in the domain. The edges in the graph denote the adjacency information between the regions and degenerate curves. In addition, we observe that a degenerate curve can be a loop and even a knot and that two degenerate curves (whether in the same region or not) can form a link. We provide a definition and theoretical analysis of individual degenerate curves in order to help understand why knots and links may occur. Moreover, we differentiate between wedges and trisectors, thus making the analysis more detailed about degenerate curves. We incorporate this information into the topological graph. Such a graph can not only reveal the global structure in a 3D symmetric tensor field but also allow two symmetric tensor fields to be compared. We demonstrate our approach by applying it to solid mechanics and material science data sets.
We present an exact approach to analyze and quantify the sensitivity of higher moments of probabilistic loops with symbolic parameters, polynomial arithmetic and potentially uncountable state spaces. Our approach integrates methods from symbolic computation, probability theory, and static analysis in order to automatically capture sensitivity information about probabilistic loops. Sensitivity information allows us to formally establish how value distributions of probabilistic loop variables influence the functional behavior of loops, which can in particular be helpful when choosing values of loop variables in order to ensure efficient/expected computations. Our work uses algebraic techniques to model higher moments of loop variables via linear recurrence equations and introduce the notion of sensitivity recurrences. We show that sensitivity recurrences precisely model loop sensitivities, even in cases where the moments of loop variables do not satisfy a system of linear recurrences. As such, we enlarge the class of probabilistic loops for which sensitivity analysis was so far feasible. We demonstrate the success of our approach while analyzing the sensitivities of probabilistic loops.
We initiate the study of voting rules for participatory budgeting using the so-called epistemic approach, where one interprets votes as noisy reflections of some ground truth regarding the objectively best set of projects to fund. Using this approach, we first show that both the most studied rules in the literature and the most widely used rule in practice cannot be justified on epistemic grounds: they cannot be interpreted as maximum likelihood estimators, whatever assumptions we make about the accuracy of voters. Focusing then on welfare-maximising rules, we obtain both positive and negative results regarding epistemic guarantees.
Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose various new architectures, tweaking existing architectures with refined training strategies, increasing context length, using high-quality training data, and increasing training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses the LLMs architectures and their categorization, training strategies, training datasets, and performance evaluations and discusses future research directions. Moreover, the paper also discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.
The concept of causality plays an important role in human cognition . In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, it has been increasingly used in causal inference against counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective optimization functions to estimate counterfactual data unbiasedly based on the different optimization methods. This paper focuses on the survey of the deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we incorporate a comprehensive overview of deep causal models from both temporal development and method classification perspectives; 3) we assist a detailed and comprehensive classification and analysis of relevant datasets and source code.
Transformer, an attention-based encoder-decoder architecture, has revolutionized the field of natural language processing. Inspired by this significant achievement, some pioneering works have recently been done on adapting Transformerliked architectures to Computer Vision (CV) fields, which have demonstrated their effectiveness on various CV tasks. Relying on competitive modeling capability, visual Transformers have achieved impressive performance on multiple benchmarks such as ImageNet, COCO, and ADE20k as compared with modern Convolution Neural Networks (CNN). In this paper, we have provided a comprehensive review of over one hundred different visual Transformers for three fundamental CV tasks (classification, detection, and segmentation), where a taxonomy is proposed to organize these methods according to their motivations, structures, and usage scenarios. Because of the differences in training settings and oriented tasks, we have also evaluated these methods on different configurations for easy and intuitive comparison instead of only various benchmarks. Furthermore, we have revealed a series of essential but unexploited aspects that may empower Transformer to stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between visual and sequential Transformers. Finally, three promising future research directions are suggested for further investment.
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the problem, understanding what features are learned, why deep architectures perform exceptionally well in physical problems, and which fine aspects of an architecture affect the behavior of a learning task in which way. We present an overview of modern approaches that yield partial answers to these questions. For selected approaches, we describe the main ideas in more detail.