For parameter estimation of continuous and discrete distributions, we propose a generalization of the method of moments (MM), where Stein identities are utilized for improved estimation performance. The construction of these Stein-type MM-estimators makes use of a weight function as implied by an appropriate form of the Stein identity. Our general approach as well as potential benefits thereof are first illustrated by the simple example of the exponential distribution. Afterward, we investigate the more sophisticated two-parameter inverse Gaussian distribution and the two-parameter negative-binomial distribution in great detail, together with illustrative real-world data examples. Given an appropriate choice of the respective weight functions, their Stein-MM estimators, which are defined by simple closed-form formulas and allow for closed-form asymptotic computations, exhibit a better performance regarding bias and mean squared error than competing estimators.
In proof-theoretic semantics, meaning is based on inference. It may seen as the mathematical expression of the inferentialist interpretation of logic. Much recent work has focused on base-extension semantics, in which the validity of formulas is given by an inductive definition generated by provability in a `base' of atomic rules. Base-extension semantics for classical and intuitionistic propositional logic have been explored by several authors. In this paper, we develop base-extension semantics for the classical propositional modal systems K, KT , K4, and S4, with $\square$ as the primary modal operator. We establish appropriate soundness and completeness theorems and establish the duality between $\square$ and a natural presentation of $\lozenge$. We also show that our semantics is in its current form not complete with respect to euclidean modal logics. Our formulation makes essential use of relational structures on bases.
In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions. We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs. Moreover, we prove convergence of our methods assuming only local Lipschitzness of the gradient. In addition, the proposed versions allow for even larger stepsizes than those initially suggested in [MM20].
In data analysis, there continues to be a need for interpretable dimensionality reduction methods whereby instrinic meaning associated with the data is retained in the reduced space. Standard approaches such as Principal Component Analysis (PCA) and the Singular Value Decomposition (SVD) fail at this task. A popular alternative is the CUR decomposition. In an SVD-like manner, the CUR decomposition approximates a matrix $A \in \mathbb{R}^{m \times n}$ as $A \approx CUR$, where $C$ and $R$ are matrices whose columns and rows are selected from the original matrix \cite{goreinov1997theory}, \cite{mahoney2009cur}. The difficulty in constructing a CUR decomposition is in determining which columns and rows to select when forming $C$ and $R$. Current column/row selection algorithms, particularly those that rely on an SVD, become infeasible as the size of the data becomes large \cite{dong2021simpler}. We address this problem by reducing the column/row selection problem to a collection of smaller sub-problems. The basic idea is to first partition the rows/columns of a matrix, and then apply an existing selection algorithm on each piece; for illustration purposes we use the Discrete Empirical Interpolation Method (\textsf{DEIM}) \cite{sorensen2016deim}. For the first task, we consider two existing algorithms that construct a Voronoi Tessellation (VT) of the rows and columns of a given matrix. We then extend these methods to automatically adapt to the data. The result is four data-driven row/column selection methods that are well-suited for parallelization, and compatible with nearly any existing column/row selection strategy. Theory and numerical examples show the design to be competitive with the original \textsf{DEIM} routine.
Dense 3D reconstruction has many applications in automated driving including automated annotation validation, multimodal data augmentation, providing ground truth annotations for systems lacking LiDAR, as well as enhancing auto-labeling accuracy. LiDAR provides highly accurate but sparse depth, whereas camera images enable estimation of dense depth but noisy particularly at long ranges. In this paper, we harness the strengths of both sensors and propose a multimodal 3D scene reconstruction using a framework combining neural implicit surfaces and radiance fields. In particular, our method estimates dense and accurate 3D structures and creates an implicit map representation based on signed distance fields, which can be further rendered into RGB images, and depth maps. A mesh can be extracted from the learned signed distance field and culled based on occlusion. Dynamic objects are efficiently filtered on the fly during sampling using 3D object detection models. We demonstrate qualitative and quantitative results on challenging automotive scenes.
Private selection mechanisms (e.g., Report Noisy Max, Sparse Vector) are fundamental primitives of differentially private (DP) data analysis with wide applications to private query release, voting, and hyperparameter tuning. Recent work (Liu and Talwar, 2019; Papernot and Steinke, 2022) has made significant progress in both generalizing private selection mechanisms and tightening their privacy analysis using modern numerical privacy accounting tools, e.g., R\'enyi DP. But R\'enyi DP is known to be lossy when $(\epsilon,\delta)$-DP is ultimately needed, and there is a trend to close the gap by directly handling privacy profiles, i.e., $\delta$ as a function of $\epsilon$ or its equivalent dual form known as $f$-DPs. In this paper, we work out an easy-to-use recipe that bounds the privacy profiles of ReportNoisyMax and PrivateTuning using the privacy profiles of the base algorithms they corral. Numerically, our approach improves over the RDP-based accounting in all regimes of interest and leads to substantial benefits in end-to-end private learning experiments. Our analysis also suggests new distributions, e.g., binomial distribution for randomizing the number of rounds that leads to more substantial improvements in certain regimes.
Data augmentation has been widely used to improve generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node-classification. We discuss practical and theoretical motivations, considerations and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention based feature embedding that captures both entity and relation features in any given entity's neighborhood. Additionally, we also encapsulate relation clusters and multihop relations in our model. Our empirical study offers insights into the efficacy of our attention based model and we show marked performance gains in comparison to state of the art methods on all datasets.
Recently, ensemble has been applied to deep metric learning to yield state-of-the-art results. Deep metric learning aims to learn deep neural networks for feature embeddings, distances of which satisfy given constraint. In deep metric learning, ensemble takes average of distances learned by multiple learners. As one important aspect of ensemble, the learners should be diverse in their feature embeddings. To this end, we propose an attention-based ensemble, which uses multiple attention masks, so that each learner can attend to different parts of the object. We also propose a divergence loss, which encourages diversity among the learners. The proposed method is applied to the standard benchmarks of deep metric learning and experimental results show that it outperforms the state-of-the-art methods by a significant margin on image retrieval tasks.
Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detection. The Convolutional Neural Networks (CNNs) are trained and fine-tuned for each ALPR stage so that they are robust under different conditions (e.g., variations in camera, lighting, and background). Specially for character segmentation and recognition, we design a two-stage approach employing simple data augmentation tricks such as inverted License Plates (LPs) and flipped characters. The resulting ALPR approach achieved impressive results in two datasets. First, in the SSIG dataset, composed of 2,000 frames from 101 vehicle videos, our system achieved a recognition rate of 93.53% and 47 Frames Per Second (FPS), performing better than both Sighthound and OpenALPR commercial systems (89.80% and 93.03%, respectively) and considerably outperforming previous results (81.80%). Second, targeting a more realistic scenario, we introduce a larger public dataset, called UFPR-ALPR dataset, designed to ALPR. This dataset contains 150 videos and 4,500 frames captured when both camera and vehicles are moving and also contains different types of vehicles (cars, motorcycles, buses and trucks). In our proposed dataset, the trial versions of commercial systems achieved recognition rates below 70%. On the other hand, our system performed better, with recognition rate of 78.33% and 35 FPS.
Attention mechanism has been used as an ancillary means to help RNN or CNN. However, the Transformer (Vaswani et al., 2017) recently recorded the state-of-the-art performance in machine translation with a dramatic reduction in training time by solely using attention. Motivated by the Transformer, Directional Self Attention Network (Shen et al., 2017), a fully attention-based sentence encoder, was proposed. It showed good performance with various data by using forward and backward directional information in a sentence. But in their study, not considered at all was the distance between words, an important feature when learning the local dependency to help understand the context of input text. We propose Distance-based Self-Attention Network, which considers the word distance by using a simple distance mask in order to model the local dependency without losing the ability of modeling global dependency which attention has inherent. Our model shows good performance with NLI data, and it records the new state-of-the-art result with SNLI data. Additionally, we show that our model has a strength in long sentences or documents.