In various applied areas, such as reliability engineering, molecular biology, and finance, the measure of uncertainty of a probability distribution plays an important role. In the present work, we consider the estimation of a function of the scale parameter, namely the entropy of several exponential distributions having unknown and unequal location parameters and a common scale parameter. For this estimation problem, we consider bowl-shaped location-invariant loss functions. We prove the inadmissibility of the minimum risk invariant estimator (MRIE) by proposing a non-smooth improved estimator, and we also obtain a smooth estimator that improves upon the MRIE. As an application, we derive explicit expressions of the improved estimators for two well-known loss functions, namely the squared error loss and the linex loss. We first obtain the results for a complete i.i.d. sample and then show that they extend to other important censored sampling schemes: (i) record values, (ii) Type-II censoring, and (iii) progressive Type-II censoring. Finally, a simulation study is carried out to compare the risk performance of the proposed improved estimators.
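For concreteness, the estimand can be made explicit. For an exponential density with location $\mu$ and scale $\sigma$, a standard computation (not specific to this paper) shows that the entropy depends on the scale alone:
\[
f(x;\mu,\sigma) = \frac{1}{\sigma}\, e^{-(x-\mu)/\sigma}, \quad x \ge \mu,
\qquad
H(f) = -\int_{\mu}^{\infty} f \ln f \, dx
     = \ln\sigma + \frac{\mathbb{E}[X-\mu]}{\sigma} = 1 + \ln\sigma ,
\]
so improved estimation of the entropy amounts to improved estimation of $\ln\sigma$.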
The role of symmetry in Boolean functions $f:\{0,1\}^n \to \{0,1\}$ has been extensively studied in complexity theory. For example, symmetric functions, that is, functions invariant under the action of $S_n$, form an important class in the study of Boolean functions. A function $f:\{0,1\}^n \to \{0,1\}$ is called transitive (or weakly symmetric) if there exists a transitive subgroup $G$ of $S_n$ such that $f$ is invariant under the action of $G$; that is, the function value remains unchanged when the bits of the input are permuted according to any $\sigma \in G$. Understanding various complexity measures of transitive functions has been a rich area of research for the past few decades. In this work, we study transitive functions in light of several combinatorial measures, looking at the maximum separation between various pairs of measures for transitive functions. Such separations for general Boolean functions have been studied for many years, and the best-known results have been nicely compiled by Aaronson et al. (STOC, 2021). The separation between a pair of combinatorial measures is shown by constructing functions that demonstrate the separation. But many of the celebrated separation results rely on functions (like the "pointer functions" of Ambainis et al. (JACM, 2017) and the "cheat-sheet functions" of Aaronson et al. (STOC, 2016)) that are not transitive; hence, such separations between pairs of measures were not known for transitive functions. In this paper we show how to modify some of these functions to construct transitive functions that demonstrate similar separations between pairs of combinatorial measures.
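The definition is easy to see in code. The following toy example (illustrative only, not a function from the paper) exhibits a function invariant under the cyclic group $C_n$, a transitive subgroup of $S_n$, while failing full symmetry:

```python
# Illustrative check: a Boolean function invariant under the cyclic group C_n
# (a transitive subgroup of S_n) need not be fully symmetric.
from itertools import product

n = 4

def f(x):
    # 1 iff some pair of cyclically adjacent bits are both 1.
    return int(any(x[i] and x[(i + 1) % n] for i in range(n)))

def rotate(x, k):
    return x[k:] + x[:k]

# Invariance under every rotation (the cyclic group acts transitively on positions).
assert all(f(x) == f(rotate(x, k))
           for x in product((0, 1), repeat=n) for k in range(n))

# But f is not symmetric: two inputs of the same Hamming weight can disagree.
print(f((1, 1, 0, 0)), f((1, 0, 1, 0)))  # prints "1 0": transitive, not symmetric
```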
Dynamic Time Warping (DTW) is a popular time series distance measure that aligns the points in two series with one another. These alignments support warping of the time dimension to allow for processes that unfold at differing rates. The distance is the minimum sum of costs of the resulting alignments over any allowable warping of the time dimension. The cost of an alignment of two points is a function of the difference in the values of those points. The original cost function was the absolute value of this difference. Other cost functions have been proposed. A popular alternative is the square of the difference. However, to our knowledge, ours is the first investigation of both the relative impacts of using different cost functions and the potential to tune cost functions to different tasks. We do so in this paper by using a tunable cost function $\lambda_\gamma$ with parameter $\gamma$. We show that higher values of $\gamma$ place greater weight on larger pairwise differences, while lower values place greater weight on smaller pairwise differences. We demonstrate that training $\gamma$ significantly improves the accuracy of both the DTW nearest neighbor and Proximity Forest classifiers.
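A minimal sketch of how such a tunable cost plugs into the standard DTW recurrence, assuming the cost family takes the form $\lambda_\gamma(a,b) = |a-b|^\gamma$ (so $\gamma = 1$ recovers the absolute difference and $\gamma = 2$ the squared difference):

```python
# Minimal DTW with a tunable pointwise cost |a - b|**gamma (assumed form).
import numpy as np

def dtw(s, t, gamma=1.0):
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1]) ** gamma
            # Standard step pattern: match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

a = np.array([0.0, 1.0, 2.0, 1.0])
b = np.array([0.0, 2.0, 1.0])
print(dtw(a, b, gamma=1.0), dtw(a, b, gamma=2.0))
```

Larger $\gamma$ lets single large discrepancies dominate the total alignment cost, which matches the weighting behaviour described above.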
We propose a new distributed algorithm that combines heavy-ball momentum and a consensus-based gradient method to find a Nash equilibrium (NE) in a class of non-cooperative convex games with unconstrained action sets. In this approach, each agent in the game has access to its own smooth local cost function and can exchange information with its neighbors over a communication network. The proposed method is designed to work on a general sequence of time-varying directed graphs and allows for non-identical step-sizes and momentum parameters. Our work is the first to incorporate heavy-ball momentum in the context of non-cooperative games, and we provide a rigorous proof of its geometric convergence to the NE under the common assumptions of strong convexity and Lipschitz continuity of the agents' cost functions. Moreover, we establish explicit bounds for the step-size values and momentum parameters based on the characteristics of the cost functions, mixing matrices, and graph connectivity structures. To showcase the efficacy of our proposed method, we perform numerical simulations on a Nash-Cournot game to demonstrate its accelerated convergence compared to existing methods.
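A simplified sketch of the idea, not the paper's exact algorithm: consensus mixing of joint-action estimates over a fixed undirected graph, combined with a heavy-ball gradient step on each agent's own action, for a toy three-player quadratic game (all parameter values below are illustrative):

```python
# Consensus-based NE seeking with heavy-ball momentum (simplified sketch).
# Game: f_i(x) = (x_i - t_i)^2 + k * x_i * sum_{j != i} x_j, strongly monotone for small k.
import numpy as np

n, k = 3, 0.5
t = np.array([1.0, -2.0, 3.0])
W = np.array([[2/3, 1/3, 0.0],        # Metropolis weights on the path graph 1-2-3
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
alpha, beta = 0.05, 0.2               # step size and momentum parameter

def grad_i(i, x):                     # partial derivative of f_i w.r.t. x_i
    return 2.0 * (x[i] - t[i]) + k * (x.sum() - x[i])

Z = np.zeros((n, n))                  # row i: agent i's estimate of the joint action
Z_prev = Z.copy()
for _ in range(2000):
    Y = W @ Z                         # consensus (mixing) step
    Z_new = Y.copy()
    for i in range(n):                # gradient step with momentum on own action only
        Z_new[i, i] = Y[i, i] - alpha * grad_i(i, Y[i]) + beta * (Z[i, i] - Z_prev[i, i])
    Z_prev, Z = Z, Z_new

# Exact NE solves (2 - k) x_i + k * sum(x) = 2 t_i for all i.
A = (2.0 - k) * np.eye(n) + k * np.ones((n, n))
print(Z.diagonal(), np.linalg.solve(A, 2.0 * t))
```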
Minimum distance estimation methodology based on an empirical distribution function has been popular due to its desirable properties, including robustness. Even though the statistical literature is awash with research on minimum distance estimation, most of it is confined to theoretical findings: only a few statisticians have studied the application of the method to real-world problems. Through this paper, we extend the domain of application of this methodology to various applied fields by providing a solution to a rather challenging and complicated computational problem. The problem this paper tackles is image segmentation, which is used in many fields. We propose a novel method based on classical minimum distance estimation theory to solve the image segmentation problem. The performance of the proposed method is then further elevated by integrating it with the ``segmenting-together'' strategy. We demonstrate that the proposed method combined with the segmenting-together strategy successfully completes the segmentation task when applied to complex real images such as magnetic resonance images.
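The estimation core can be illustrated in a toy one-dimensional setting (a sketch with assumed Gaussian data, unrelated to the paper's code): fit model parameters by minimizing the Cramér-von Mises distance between the model CDF and the empirical distribution function.

```python
# Minimum distance estimation: minimize the Cramer-von Mises statistic
# between a parametric CDF and the empirical CDF of the sample.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
x = np.sort(rng.normal(loc=2.0, scale=1.5, size=500))
n = len(x)

def cvm_distance(params):
    mu, log_sigma = params
    F = stats.norm.cdf(x, loc=mu, scale=np.exp(log_sigma))
    # Computing formula for the CvM statistic against the empirical CDF.
    return np.sum((F - (2 * np.arange(1, n + 1) - 1) / (2 * n)) ** 2) + 1 / (12 * n)

res = optimize.minimize(cvm_distance, x0=[0.0, 0.0], method="Nelder-Mead")
print(res.x[0], np.exp(res.x[1]))  # approx. 2.0 and 1.5
```

In the segmentation setting, the same kind of distance would be minimized over the parameters of a pixel-intensity model within candidate regions.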
We consider a scenario where we have access to the target domain, but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. Specifically, the search stage identifies and merges clusters of source identities which exhibit similar distributions with the target domain. The pruning stage, subject to a budget, then selects identities and their images from the search-stage output, to control the size of the resulting training set for efficient training. The two stages provide us with training sets 80\% smaller than the source pool while achieving a similar or even higher re-ID accuracy. These training sets are also shown to be superior to those produced by a few existing search methods, such as random sampling and greedy sampling, under the same budget on training data size. If we relax the budget, training sets resulting from the search stage alone allow even higher re-ID accuracy. We provide interesting discussions on the specificity of our method to the re-ID problem and particularly its role in bridging the re-ID domain gap. The code is available at //github.com/yorkeyao/SnP.
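Schematically, the two stages could look as follows; this is a hypothetical sketch with made-up features and a simple moment-based proxy for distribution similarity, not the released SnP code:

```python
# Schematic two-stage selection: search by distribution similarity, prune to a budget.
import numpy as np

rng = np.random.default_rng(0)
target_feats = rng.normal(0.0, 1.0, size=(200, 8))      # target-domain features
# Source pool: cluster id -> (identity ids, their feature matrix).
pool = {c: (np.arange(20) + 20 * c, rng.normal(c * 0.1, 1.0, size=(200, 8)))
        for c in range(10)}

def mean_cov_distance(a, b):
    # Simple moment-based proxy for distribution similarity (FD-like, simplified).
    dm = np.linalg.norm(a.mean(0) - b.mean(0)) ** 2
    dc = np.linalg.norm(np.cov(a.T) - np.cov(b.T), ord="fro")
    return dm + dc

# Stage 1 (search): keep the clusters whose features look most like the target.
ranked = sorted(pool, key=lambda c: mean_cov_distance(pool[c][1], target_feats))
searched = [pool[c][0] for c in ranked[:4]]

# Stage 2 (prune): select identities under a budget on training-set size.
budget = 50
selected = np.concatenate(searched)[:budget]
print(len(selected), "identities selected")
```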
We consider two-step estimation of latent variable models, in which just the measurement model is estimated in the first step and the measurement parameters are then fixed at their estimated values in the second step where the structural model is estimated. We show how this approach can be implemented for latent trait models (item response theory models) where the latent variables are continuous and their measurement indicators are categorical variables. The properties of two-step estimators are examined using simulation studies and applied examples. They perform well, and have attractive practical and conceptual properties compared to the alternative one-step and three-step approaches. These results are in line with previous findings for other families of latent variable models. This provides strong evidence that two-step estimation is a flexible and useful general method of estimation for different types of latent variable models.
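To fix ideas, here is a runnable two-step sketch for a toy latent trait model: binary 2PL-type items measuring a continuous latent variable that is regressed on a covariate. The setup, names, and values are assumptions for illustration, not the paper's software:

```python
# Two-step estimation sketch: Step 1 fits the measurement (item) parameters with
# theta ~ N(0,1); Step 2 fixes them and estimates the structural latent regression.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, J = 2000, 4
b_true, sd_e = 0.6, 0.8                     # chosen so Var(theta) = 0.36 + 0.64 = 1
w = rng.normal(size=N)
theta = b_true * w + sd_e * rng.normal(size=N)
a_true = np.array([1.0, 1.5, 0.8, 1.2])     # item discriminations
c_true = np.array([-0.5, 0.0, 0.5, 1.0])    # item intercepts
y = (rng.random((N, J)) < 1 / (1 + np.exp(-(theta[:, None] * a_true + c_true)))).astype(float)

nodes, weights = np.polynomial.hermite.hermgauss(21)    # Gauss-Hermite rule

def loglik(a, c, mean, sd):
    # Marginal log-likelihood, integrating theta ~ N(mean_i, sd^2) by quadrature.
    th = mean[:, None] + np.sqrt(2.0) * sd * nodes      # (N, K) quadrature points
    p = 1 / (1 + np.exp(-(th[:, :, None] * a + c)))     # (N, K, J) response probs
    like_k = np.prod(np.where(y[:, None, :] == 1, p, 1 - p), axis=2)
    return np.log(like_k @ weights / np.sqrt(np.pi)).sum()

# Step 1: measurement model alone, under the convention theta ~ N(0, 1).
step1 = minimize(lambda v: -loglik(v[:J], v[J:], np.zeros(N), 1.0),
                 x0=np.r_[np.ones(J), np.zeros(J)], method="BFGS")
a_hat, c_hat = step1.x[:J], step1.x[J:]

# Step 2: structural model, item parameters held fixed at their Step-1 values.
step2 = minimize(lambda v: -loglik(a_hat, c_hat, v[0] * w, np.exp(v[1])),
                 x0=np.array([0.0, 0.0]), method="BFGS")
print(step2.x[0], np.exp(step2.x[1]))       # approx. b_true and sd_e
```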
In this paper we study the properties of the Lasso estimator of the drift component in the diffusion setting. More specifically, we consider a multivariate parametric diffusion model $X$ observed continuously over the interval $[0,T]$ and investigate drift estimation under sparsity constraints. We allow the dimensions of the model and the parameter space to be large. We obtain an oracle inequality for the Lasso estimator and derive an error bound for the $L^2$-distance using concentration inequalities for linear functionals of diffusion processes. The probabilistic part is based on elements of empirical process theory and, in particular, on the chaining method.
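As a discretized illustration (the paper works with continuous observations; here Euler increments on a fine grid stand in for the stochastic integrals), one can recover a sparse linear drift $b(x) = Ax$ by row-wise $\ell_1$-penalized regression:

```python
# Discretized Lasso drift estimation for dX_t = A X_t dt + dW_t with sparse A.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, T, dt = 5, 500.0, 0.01
n = int(T / dt)
A = -np.eye(d); A[0, 1], A[3, 2] = 0.5, -0.7          # sparse stable drift matrix

X = np.zeros((n + 1, d))
for k in range(n):                                    # Euler-Maruyama path
    X[k + 1] = X[k] + (A @ X[k]) * dt + np.sqrt(dt) * rng.normal(size=d)

# Regress the increments dX/dt on the state, one l1-penalized row at a time;
# in practice the penalty level would be chosen by cross-validation.
Y = np.diff(X, axis=0) / dt
A_hat = np.vstack([Lasso(alpha=0.05, fit_intercept=False).fit(X[:-1], Y[:, i]).coef_
                   for i in range(d)])
print(np.round(A_hat, 2))   # sparsity pattern recovered, entries shrunk by the penalty
```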
Many real-world dynamical systems are associated with first integrals (a.k.a. invariant quantities), which are quantities that remain unchanged over time. The discovery and understanding of first integrals are fundamental and important topics both in the natural sciences and in industrial applications. First integrals arise from the conservation laws of system energy, momentum, and mass, and from constraints on states; these are typically related to specific geometric structures of the governing equations. Existing neural networks designed to preserve such first integrals have shown excellent accuracy in modeling from data. However, these models embed the underlying structures a priori, and in most situations where neural networks learn unknown systems these structures are also unknown. This limitation needs to be overcome for scientific discovery and for modeling unknown systems. To this end, we propose the first integral-preserving neural differential equation (FINDE). By leveraging the projection method and the discrete gradient method, FINDE finds and preserves first integrals from data, even in the absence of prior knowledge about the underlying structures. Experimental results demonstrate that FINDE can predict future states of target systems over much longer horizons and can find various quantities consistent with well-known first integrals in a unified manner.
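The projection ingredient is easy to demonstrate with a known first integral (FINDE instead learns the invariant from data): after each step of a crude base solver, Newton iterations pull the state back onto the level set $\{H = H_0\}$.

```python
# Projection method sketch: preserve a first integral H by projecting each
# solver step back onto the level set {H = H0}.
import numpy as np

def H(u):                          # harmonic-oscillator energy, a first integral
    return 0.5 * (u[0]**2 + u[1]**2)

def project(u, H0):
    for _ in range(5):             # Newton iterations along grad H
        g = u.copy()               # grad H(u) = u for this quadratic H
        u = u - (H(u) - H0) / (g @ g) * g
    return u

u = np.array([1.0, 0.0]); H0 = H(u); dt = 0.01
for _ in range(10000):
    euler = np.array([u[0] + dt * u[1], u[1] - dt * u[0]])  # explicit Euler drifts off the level set
    u = project(euler, H0)
print(H(u) - H0)                   # ~0 up to round-off, despite the crude base solver
```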
Computer science research has led to many breakthrough innovations but has also been scrutinized for enabling technology that has negative, unintended consequences for society. Given the increasing discussions of ethics in the news and among researchers, we interviewed 20 researchers in various CS sub-disciplines to identify whether and how they consider potential unintended consequences of their research innovations. We show that considering unintended consequences is generally seen as important but rarely practiced. Principal barriers are a lack of formal process and strategy as well as the academic practice that prioritizes fast progress and publications. Drawing on these findings, we discuss approaches to support researchers in routinely considering unintended consequences, from bringing diverse perspectives through community participation to increasing incentives to investigate potential consequences. We intend for our work to pave the way for routine explorations of the societal implications of technological innovations before, during, and after the research process.
Learning on big data brings success for artificial intelligence (AI), but the annotation and training costs are expensive. Looking ahead, learning on small data is one of the ultimate goals of AI, requiring machines to recognize objects and scenarios from small data, as humans do. A series of machine learning models pursues this direction, such as active learning, few-shot learning, and deep clustering. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive; that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data in both supervised and unsupervised fashions. With these theoretical analyses, we categorize small data learning models from two geometric perspectives, the Euclidean and non-Euclidean (hyperbolic) mean representations, and also present and discuss their optimization solutions. We then summarize and analyze some potential learning scenarios that may benefit from learning on small data. Finally, some challenging applications, such as computer vision and natural language processing, that may benefit from learning on small data are also surveyed.
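For orientation, the flavor of such label complexity statements is captured by the classical PAC bound for a finite hypothesis class in the realizable case (a textbook result, not a contribution of the survey): with probability at least $1-\delta$, any consistent learner achieves error at most $\epsilon$ once the number of labeled examples satisfies
\[
m \;\ge\; \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right).
\]
Active sampling aims to reach the same guarantee with far fewer label queries by choosing which examples to label.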