We introduce a new consistency-based approach for defining and solving nonnegative/positive matrix and tensor completion problems. The novelty of the framework is that instead of artificially making the problem well-posed by imposing an application-arbitrary optimization objective, e.g., minimizing a bulk structural measure such as rank or norm, we show that a single property/constraint, preservation of unit-scale consistency, guarantees the existence of a solution and, under relatively weak support assumptions, its uniqueness. The framework and solution algorithms also generalize directly to tensors of arbitrary dimension while maintaining computational complexity that is linear in problem size for fixed dimension $d$. In the context of recommender system (RS) applications, we prove that two reasonable properties that should be expected to hold for any solution to the RS problem are sufficient to permit uniqueness guarantees to be established within our framework. This is remarkable because it obviates the need for heuristic-based statistical or AI methods despite what appear to be distinctly human/subjective variables at the heart of the problem. Key theoretical contributions include a general unit-consistent tensor-completion framework with proofs of its properties, e.g., consensus-order and fairness, and algorithms with optimal runtime and space complexities, e.g., $O(1)$ term completion with preprocessing complexity that is linear in the number of known terms of the matrix/tensor. From a practical perspective, the seamless ability of the framework to generalize to exploit high-dimensional structural relationships among key state variables, e.g., user and product attributes, offers a means for extracting significantly more information than is possible with alternative methods that cannot generalize beyond direct user-product relationships.
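As a concrete illustration, a unit-consistent completion of a positive matrix can be obtained by fitting multiplicative row and column scale factors to the known entries, i.e., fitting $\log M_{ij} \approx a_i + b_j$ in the least-squares sense and filling missing entries with $e^{a_i + b_j}$. The sketch below uses alternating averaging in log space; it is one minimal realization consistent with unit-scale consistency, not necessarily the paper's algorithm, and the function name and iteration count are illustrative.

```python
import numpy as np

def unit_consistent_complete(M, mask, iters=200):
    """Fill the missing entries of a positive matrix M (mask=True where
    known) with exp(a_i + b_j), where a, b solve the least-squares fit
    log M_ij ~ a_i + b_j over the known entries.  Rescaling any row or
    column of the data rescales the completed entries identically,
    which is exactly unit-scale consistency."""
    L = np.zeros_like(M, dtype=float)
    L[mask] = np.log(M[mask])
    a = np.zeros(M.shape[0])
    b = np.zeros(M.shape[1])
    for _ in range(iters):                      # alternating projections
        for i in range(M.shape[0]):
            if mask[i].any():
                a[i] = (L[i, mask[i]] - b[mask[i]]).mean()
        for j in range(M.shape[1]):
            if mask[:, j].any():
                b[j] = (L[mask[:, j], j] - a[mask[:, j]]).mean()
    return np.where(mask, M, np.exp(a[:, None] + b[None, :]))

# A 2x2 example with one missing entry: the completion preserves ratios.
M = np.array([[1.0, 2.0], [3.0, 0.0]])
mask = np.array([[True, True], [True, False]])
print(unit_consistent_complete(M, mask))        # missing entry -> ~6.0
```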
Neural ODEs parameterize a differential equation with a continuous-depth neural network and solve it using a numerical ODE integrator. These models offer a constant memory cost, in contrast to models with a discrete sequence of hidden layers, whose memory cost grows linearly with the number of layers. In addition to memory efficiency, the benefits of neural ODEs include the ability to adapt the evaluation strategy to the input and the flexibility to trade numerical precision for training speed. Despite these benefits, neural ODEs still have limitations. We identify the ODE integrator (also called the ODE solver) as the weakest link in the chain: it may have consistency, convergence, and stability (CCS) issues, and it may converge slowly or fail to converge at all. We propose a first-order Nesterov's accelerated gradient (NAG) based ODE solver that is provably tuned with respect to the CCS conditions. We empirically demonstrate the efficacy of our approach, training faster while achieving better or comparable performance relative to neural ODEs employing other fixed-step explicit ODE solvers, as well as discrete-depth models such as ResNet, on three different tasks: supervised classification, density estimation, and time-series modelling.
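To make the idea concrete, the sketch below shows a generic fixed-step explicit solver whose right-hand-side evaluation point is extrapolated along the previous update direction, in the spirit of Nesterov's accelerated gradient; the paper's provably CCS-tuned coefficients are not reproduced here, and the momentum value `mu` is illustrative. Since $y_k - y_{k-1} = O(h)$, the scheme remains first-order consistent.

```python
import numpy as np

def nag_odeint(f, y0, t0, t1, n_steps, mu=0.9):
    """Fixed-step explicit ODE solver with a Nesterov-style lookahead.

    Each step evaluates f at a point extrapolated along the previous
    update direction (a hypothetical scheme in the spirit of NAG, not
    the paper's tuned method)."""
    h = (t1 - t0) / n_steps
    y_prev = y = np.asarray(y0, dtype=float)
    t = t0
    for _ in range(n_steps):
        y_look = y + mu * (y - y_prev)          # Nesterov extrapolation
        y_prev, y = y, y + h * f(t + mu * h, y_look)
        t += h
    return y

# Example: dy/dt = -y with y(0) = 1; exact value at t = 1 is exp(-1).
print(nag_odeint(lambda t, y: -y, 1.0, 0.0, 1.0, 100))   # ~0.369
```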
We present online deterministic algorithms for the minimum coloring and minimum dominating set problems in the context of geometric intersection graphs. We consider a graph parameter, the independent kissing number $\zeta$, defined as the size of the largest induced star in the graph minus one. For a graph with an independent kissing number at most $\zeta$, we show that the famous greedy algorithm achieves an optimal competitive ratio of $\zeta$ for the minimum dominating set and the minimum independent dominating set problems. However, for the minimum connected dominating set problem, we obtain a competitive ratio of at most $2\zeta$. To complement this, we prove that for the minimum connected dominating set problem, any deterministic online algorithm has a competitive ratio of at least $2(\zeta-1)$ for the geometric intersection graph of translates of a convex object in $\mathbb{R}^2$. Next, for the minimum coloring problem, we obtain algorithms having a competitive ratio of $O(\zeta' \log m)$ for geometric intersection graphs of bounded scaled $\alpha$-fat objects in $\mathbb{R}^d$ having widths in the interval $[1,m]$, where $\zeta'$ is the independent kissing number of the geometric intersection graph of bounded scaled $\alpha$-fat objects having widths in the interval $[1,2]$. Finally, we investigate the value of $\zeta$ for geometric intersection graphs of various families of geometric objects.
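For reference, the greedy algorithm analyzed here is the natural one: when a vertex arrives, add it to the solution iff it is not yet dominated. A minimal sketch, assuming each arrival reports its edges to previously arrived vertices:

```python
def online_greedy_dominating_set(arrivals):
    """Classic online greedy for minimum dominating set.

    arrivals: iterable of (vertex, set_of_previously_arrived_neighbors).
    A vertex is added to the solution iff no neighbor seen so far is
    already in the solution; the solution only ever grows, as required
    in the online setting."""
    solution = set()
    adj = {}
    for v, nbrs in arrivals:
        adj[v] = set(nbrs)
        for u in nbrs:
            adj[u].add(v)              # edges go to earlier vertices only
        if not (adj[v] & solution):    # v is not yet dominated
            solution.add(v)
    return solution

# Path a-b-c arriving in order: greedy outputs {a, c} while OPT is {b}.
print(online_greedy_dominating_set([("a", set()), ("b", {"a"}), ("c", {"b"})]))
```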
This work introduces BRILLsson, a novel binary neural network-based representation learning model for a broad range of non-semantic speech tasks. We train the model with knowledge distillation from a large real-valued TRILLsson model, using only a fraction of the dataset used to train TRILLsson. The resulting BRILLsson models are only 2MB in size with a latency of less than 8ms, making them suitable for deployment on low-resource devices such as wearables. We evaluate BRILLsson on eight benchmark tasks (including, but not limited to, spoken language identification, emotion recognition, health condition diagnosis, and keyword spotting), and demonstrate that our proposed ultra-light and low-latency models perform as well as large-scale models.
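A minimal sketch of the distillation objective, assuming `student` and `teacher` both map raw audio to fixed-size embeddings and that the binarization machinery (e.g., sign activations with a straight-through estimator) lives inside the student module; the exact loss used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, audio_batch, optimizer):
    """One embedding-distillation step: the binary student is trained to
    match the frozen real-valued teacher's embedding for each clip."""
    with torch.no_grad():
        target = teacher(audio_batch)          # real-valued embedding
    pred = student(audio_batch)                # binary-network embedding
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```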
A major concern in using deep learning based generative models for document-grounded dialogs is the potential generation of responses that are not \textit{faithful} to the underlying document. Existing automated metrics used for evaluating the faithfulness of a response with respect to the grounding document measure the degree of similarity between the generated response and the document's content. However, these automated metrics are far from being well aligned with human judgments. Therefore, to improve the measurement of faithfulness, we propose a new metric that utilizes (Conditional) Point-wise Mutual Information (PMI) between the generated response and the source document, conditioned on the dialogue. PMI quantifies the extent to which the document influences the generated response -- with a higher PMI indicating a more faithful response. We build upon this idea to create a new decoding technique that incorporates PMI into the response generation process to predict more faithful responses. Our experiments on the BEGIN benchmark demonstrate an improved correlation of our metric with human evaluation. We also show that our decoding technique is effective in generating more faithful responses when compared to standard decoding techniques on a set of publicly available document-grounded dialog datasets.
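Concretely, the quantity is $\mathrm{PMI}(r; d \mid c) = \log p(r \mid d, c) - \log p(r \mid c)$: the log-probability gain the document $d$ provides for the response $r$ given the dialogue context $c$. A minimal scoring sketch, assuming a HuggingFace-style causal LM interface and pre-tokenized id tensors (both assumptions, not the paper's exact implementation):

```python
import torch

@torch.no_grad()
def conditional_pmi(lm, response_ids, dialog_ids, doc_ids):
    """PMI(response; document | dialogue) for a causal LM.

    Scores the response tokens under two contexts (dialogue+document
    vs. dialogue alone) and returns the difference of log-likelihoods."""
    def score(context_ids):
        ids = torch.cat([context_ids, response_ids])
        logits = lm(ids.unsqueeze(0)).logits[0]
        start = len(context_ids)
        # log-prob of each response token given everything before it
        logp = torch.log_softmax(logits[start - 1:-1], dim=-1)
        return logp.gather(1, response_ids.unsqueeze(1)).sum().item()

    return score(torch.cat([dialog_ids, doc_ids])) - score(dialog_ids)
```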
Supervised learning methods have shown effectiveness in estimating spatial acoustic parameters such as time difference of arrival, direct-to-reverberant ratio and reverberation time. However, they still suffer from the simulation-to-reality generalization problem due to the mismatch between simulated and real-world acoustic characteristics and the deficiency of annotated real-world data. To this end, this work proposes a self-supervised method that takes full advantage of unlabeled data for spatial acoustic parameter estimation. First, a new pretext task, i.e., cross-channel signal reconstruction (CCSR), is designed to learn a universal spatial acoustic representation from unlabeled multi-channel microphone signals. We mask partial signals of one channel and ask the model to reconstruct them, which makes it possible to learn spatial acoustic information from the unmasked signals and extract source information from the other microphone channel. An encoder-decoder structure is used to disentangle the two kinds of information. By fine-tuning the pre-trained spatial encoder with a small annotated dataset, this encoder can be used to estimate spatial acoustic parameters. Second, a novel multi-channel audio Conformer (MC-Conformer) is adopted as the encoder model architecture, which is suitable for both the pretext and downstream tasks. It is carefully designed to be able to capture the local and global characteristics of spatial acoustics exhibited in the time-frequency domain. Experimental results on five acoustic parameter estimation tasks on both simulated and real-world data show the effectiveness of the proposed method. To the best of our knowledge, this is the first self-supervised learning method in the field of spatial acoustic representation learning and multi-channel audio signal processing.
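A minimal sketch of the CCSR pretext loss, assuming time-frequency inputs of shape (batch, freq, time) and opaque `encoder`/`decoder` modules; the masking granularity, fusion strategy, and loss in the paper may differ.

```python
import torch

def ccsr_loss(encoder, decoder, stft_ch1, stft_ch2, mask_ratio=0.3):
    """Cross-channel signal reconstruction: random time frames of
    channel 1 are hidden, and the model must reconstruct them from the
    unmasked frames (spatial cues) plus the intact second channel
    (source content)."""
    T = stft_ch1.shape[-1]
    mask = torch.rand(T) < mask_ratio            # frames to hide
    masked = stft_ch1.clone()
    masked[..., mask] = 0.0
    latent = encoder(masked, stft_ch2)           # fuse both channels
    recon = decoder(latent)
    return torch.mean((recon[..., mask] - stft_ch1[..., mask]) ** 2)
```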
Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-and-variance tradeoff of, and an intricate dependence among, the estimated entries induced by the low-rank structure. In this paper, we develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly, and utilizing them to control the false discovery rate (FDR) via a data splitting and symmetric aggregation scheme. We show that valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements using the proposed methodology. Extensive numerical simulations and real data examples are also presented to further illustrate its practical merits.
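The splitting-and-aggregation idea can be sketched generically: compute an independent test statistic for each linear form on each half of the data, combine the two into a statistic that is symmetric about zero under the null, and choose the rejection threshold whose mirror estimate of the false discovery proportion is below the target level. The product statistic $W_j = T_j^{(1)} T_j^{(2)}$ below is a common choice in this family, not necessarily the paper's exact construction.

```python
import numpy as np

def mirror_fdr_select(T1, T2, q=0.1):
    """Symmetric aggregation of two split statistics for FDR control.

    W_j = T1_j * T2_j is (approximately) symmetric about 0 under the
    null, so #{W <= -t} estimates the number of false discoveries among
    {W >= t}; we take the smallest threshold with estimated FDR <= q."""
    W = np.asarray(T1) * np.asarray(T2)
    for t in np.sort(np.abs(W[W != 0])):
        fdp = (np.sum(W <= -t) + 1) / max(np.sum(W >= t), 1)
        if fdp <= q:
            return np.flatnonzero(W >= t)   # indices of rejected nulls
    return np.array([], dtype=int)
```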
We present a novel formulation of structural design optimization problems specifically tailored to be solved by quantum annealing (QA). Structural design optimization aims to find the best, i.e., material-efficient yet high-performance, configuration of a structure. To this end, computational optimization strategies can be employed, of which QA is a recently evolving strategy based on quantum mechanical effects. This approach requires the optimization problem to be posed, e.g., as a quadratic unconstrained binary optimization (QUBO) model. Thus, we develop a novel formulation of the optimization problem. The latter typically involves an analysis model for the component. Here, we use energy minimization principles that govern the behavior of structures under applied loads. This allows us to state the optimization problem as one overall minimization problem. Next, we map this to a QUBO problem that can be immediately solved by QA. We validate the proposed approach using a size optimization problem of a compound rod under self-weight loading. To this end, we develop strategies to account for the limitations of currently available hardware, and we find that the presented formulation is suitable for solving structural design optimization problems through QA and, for small-scale problems, already works on today's hardware.
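In QUBO form, the design variables are encoded into a binary vector $x$ and the total (structural plus constraint-penalty) energy is written as $x^\top Q x$, which an annealer minimizes directly. A brute-force stand-in for the annealer, adequate only for the kind of small-scale validation problems the abstract mentions:

```python
import itertools
import numpy as np

def solve_qubo_bruteforce(Q):
    """Minimize x^T Q x over binary vectors x (exhaustive search used
    here as a classical stand-in for a quantum annealer)."""
    n = Q.shape[0]
    best_x, best_e = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        e = x @ Q @ x
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Toy 2-bit QUBO: the minimum-energy assignment is x = (1, 0).
Q = np.array([[-1.0, 2.0], [0.0, 0.5]])
print(solve_qubo_bruteforce(Q))   # (array([1, 0]), -1.0)
```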
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.
We introduce a multi-task setup of identifying and classifying entities, relations, and coreference clusters in scientific articles. We create SciERC, a dataset that includes annotations for all three tasks, and develop a unified framework called Scientific Information Extractor (SciIE) with shared span representations. The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links. Experiments show that our multi-task model outperforms previous models in scientific information extraction without using any domain-specific features. We further show that the framework supports construction of a scientific knowledge graph, which we use to analyze information in scientific literature.
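The structural point, one shared span representation feeding all three task heads, can be sketched as below. The real SciIE model builds span representations from a BiLSTM with attention and width features and prunes candidate spans; here the dimensions, head shapes, and endpoint-concatenation choice are all illustrative.

```python
import torch
import torch.nn as nn

class SharedSpanHeads(nn.Module):
    """Sketch of multi-task learning over shared span representations:
    entity typing scores single spans, while relation and coreference
    heads score span pairs, all from the same span vectors."""
    def __init__(self, hidden=256, n_entity_types=7, n_relations=8):
        super().__init__()
        d = 2 * hidden                       # endpoint concatenation
        self.entity_head = nn.Linear(d, n_entity_types)
        self.relation_head = nn.Bilinear(d, d, n_relations)
        self.coref_head = nn.Bilinear(d, d, 1)

    def pair_scores(self, head, reps):
        n, d = reps.shape
        left = reps.unsqueeze(1).expand(n, n, d).reshape(n * n, d)
        right = reps.unsqueeze(0).expand(n, n, d).reshape(n * n, d)
        return head(left, right).view(n, n, -1)

    def forward(self, token_states, spans):
        # token_states: (seq_len, hidden); spans: list of (start, end)
        reps = torch.stack([torch.cat([token_states[s], token_states[e]])
                            for s, e in spans])
        return (self.entity_head(reps),
                self.pair_scores(self.relation_head, reps),
                self.pair_scores(self.coref_head, reps))
```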
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain of analyzing a region at a higher resolution, and another model (Q-net) that sequentially selects regions to zoom in on. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high resolution test set collected from the YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.
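A minimal sketch of the coarse-to-fine control loop; the callables stand in for the detector, the image operations, and the combined R-net/Q-net region-selection policy, and all interfaces here are assumptions rather than the paper's implementation.

```python
def coarse_to_fine_detect(image, detect, downsample, crop, rank_regions,
                          budget=3):
    """Coarse-to-fine detection loop.

    1. Detect on a downsampled image (cheap, full coverage).
    2. rank_regions (standing in for R-net + Q-net) orders candidate
       regions by expected accuracy gain at full resolution.
    3. Re-detect in the top regions until the zoom budget is spent.
    """
    detections = list(detect(downsample(image)))
    for region in rank_regions(image, detections)[:budget]:
        detections.extend(detect(crop(image, region)))
    return detections
```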