Let $\mathcal{T}$ be a set of $n$ flat (planar) semi-algebraic regions in $\mathbb{R}^3$ of constant complexity (e.g., triangles, disks), which we call plates. We wish to preprocess $\mathcal{T}$ into a data structure so that for a query object $\gamma$, which is also a plate, we can quickly answer various intersection queries, such as detecting whether $\gamma$ intersects any plate of $\mathcal{T}$, reporting all the plates intersected by $\gamma$, or counting them. We also consider two simpler cases of this general setting: (i) the input objects are plates and the query objects are constant-degree parametrized algebraic arcs in $\mathbb{R}^3$ (arcs, for short), or (ii) the input objects are arcs and the query objects are plates in $\mathbb{R}^3$. Besides being interesting in their own right, the data structures for these two special cases form the building blocks for handling the general case. By combining the polynomial-partitioning technique with additional tools from real algebraic geometry, we present many different data structures for intersection queries, which also provide trade-offs between their size and query time. The performance of these data structures depends on the complexity of the input and query objects. For example, if $\mathcal{T}$ is a set of plates and the query objects are algebraic arcs, we obtain a data structure that uses $O^*(n^{4/3})$ storage (where the $O^*(\cdot)$ notation hides subpolynomial factors) and answers an arc-intersection query in $O^*(n^{2/3})$ time. Alternatively, for a parameter $s\in [n^{4/3}, n^t]$ where $t\ge 3$ is the number of real parameters needed to specify a query arc, the query time can be decreased to $O^*((n^t/s)^{\tfrac{2}{3(t-1)}})$ by increasing the storage to $O^*(s)$.
A parametric class of trust-region algorithms for unconstrained nonconvex optimization is considered where the value of the objective function is never computed. The class contains a deterministic version of the first-order Adagrad method typically used for minimization of noisy function, but also allows the use of (possibly approximate) second-order information when available. The rate of convergence of methods in the class is analyzed and is shown to be identical to that known for first-order optimization methods using both function and gradients values, recovering existing results for purely-first order variants and improving the explicit dependence on problem dimension. This rate is shown to be essentially sharp. A new class of methods is also presented, for which a slightly worse and essentially sharp complexity result holds. Limited numerical experiments show that the new methods' performance may be comparable to that of standard steepest descent, despite using significantly less information, and that this performance is relatively insensitive to noise.
Measurement of interaction quality is a critical task for the improvement of spoken dialog systems. Existing approaches to dialog quality estimation either focus on evaluating the quality of individual turns, or collect dialog-level quality measurements from end users immediately following an interaction. In contrast to these approaches, we introduce a new dialog-level annotation workflow called Dialog Quality Annotation (DQA). DQA expert annotators evaluate the quality of dialogs as a whole, and also label dialogs for attributes such as goal completion and user sentiment. In this contribution, we show that: (i) while dialog quality cannot be completely decomposed into dialog-level attributes, there is a strong relationship between some objective dialog attributes and judgments of dialog quality; (ii) for the task of dialog-level quality estimation, a supervised model trained on dialog-level annotations outperforms methods based purely on aggregating turn-level features; and (iii) the proposed evaluation model shows better domain generalization ability compared to the baselines. On the basis of these results, we argue that having high-quality human-annotated data is an important component of evaluating interaction quality for large industrial-scale voice assistant platforms.
With the advance of AI, road object detection has been a prominent topic in computer vision, mostly using perspective cameras. Fisheye lens provides omnidirectional wide coverage for using fewer cameras to monitor road intersections, however with view distortions. To our knowledge, there is no existing open dataset prepared for traffic surveillance on fisheye cameras. This paper introduces an open FishEye8K benchmark dataset for road object detection tasks, which comprises 157K bounding boxes across five classes (Pedestrian, Bike, Car, Bus, and Truck). In addition, we present benchmark results of State-of-The-Art (SoTA) models, including variations of YOLOv5, YOLOR, YOLO7, and YOLOv8. The dataset comprises 8,000 images recorded in 22 videos using 18 fisheye cameras for traffic monitoring in Hsinchu, Taiwan, at resolutions of 1080$\times$1080 and 1280$\times$1280. The data annotation and validation process were arduous and time-consuming, due to the ultra-wide panoramic and hemispherical fisheye camera images with large distortion and numerous road participants, particularly people riding scooters. To avoid bias, frames from a particular camera were assigned to either the training or test sets, maintaining a ratio of about 70:30 for both the number of images and bounding boxes in each class. Experimental results show that YOLOv8 and YOLOR outperform on input sizes 640$\times$640 and 1280$\times$1280, respectively. The dataset will be available on GitHub with PASCAL VOC, MS COCO, and YOLO annotation formats. The FishEye8K benchmark will provide significant contributions to the fisheye video analytics and smart city applications.
Recently it has been shown that using diffusion models for inverse problems can lead to remarkable results. However, these approaches require a closed-form expression of the degradation model and can not support complex degradations. To overcome this limitation, we propose a method (INDigo) that combines invertible neural networks (INN) and diffusion models for general inverse problems. Specifically, we train the forward process of INN to simulate an arbitrary degradation process and use the inverse as a reconstruction process. During the diffusion sampling process, we impose an additional data-consistency step that minimizes the distance between the intermediate result and the INN-optimized result at every iteration, where the INN-optimized image is composed of the coarse information given by the observed degraded image and the details generated by the diffusion process. With the help of INN, our algorithm effectively estimates the details lost in the degradation process and is no longer limited by the requirement of knowing the closed-form expression of the degradation model. Experiments demonstrate that our algorithm obtains competitive results compared with recently leading methods both quantitatively and visually. Moreover, our algorithm performs well on more complex degradation models and real-world low-quality images.
We tackle the problem of answering regular path queries over graph databases, while simultaneously returning the paths which witness our answers. We study this problem under the arbitrary, all-shortest, trail, and simple-path semantics, which are the common path matching semantics considered in the research literature, and are also prescribed by the upcoming ISO Graph Query Language (GQL) and SQL/PGQ standards. In the paper we present how the classical product construction from the theoretical literature on graph querying can be modified in order to allow returning paths. We then discuss how this approach can be implemented, both when the data resides in a classical B+ tree structure, and when it is assumed to be available in main memory via a compressed sparse row representation. Finally, we perform a detailed experimental study of different trade-offs these algorithms have over real world queries, and compare them with existing approaches.
A family $\mathcal{F}$ of sets satisfies the $(p,q)$-property if among every $p$ members of $\mathcal{F}$, some $q$ can be pierced by a single point. The celebrated $(p,q)$-theorem of Alon and Kleitman asserts that for any $p \geq q \geq d+1$, any family $\mathcal{F}$ of compact convex sets in $\mathbb{R}^d$ that satisfies the $(p,q)$-property can be pierced by a finite number $c(p,q,d)$ of points. A similar theorem with respect to piercing by $(d-1)$-dimensional flats, called $(d-1)$-transversals, was obtained by Alon and Kalai. In this paper we prove the following result, which can be viewed as an $(\aleph_0,k+2)$-theorem with respect to $k$-transversals: Let $\mathcal{F}$ be an infinite family of closed balls in $\mathbb{R}^d$, and let $0 \leq k < d$. If among every $\aleph_0$ elements of $\mathcal{F}$, some $k+2$ can be pierced by a $k$-dimensional flat, then $\mathcal{F}$ can be pierced by a finite number of $k$-dimensional flats. The same result holds also for a wider class of families which consist of \emph{near-balls}, to be defined below. This is the first $(p,q)$-theorem in which the assumption is weakened to an $(\infty,\cdot)$ assumption. Our proofs combine geometric and topological tools.
We develop a novel, general and computationally efficient framework, called Divide and Conquer Dynamic Programming (DCDP), for localizing change points in time series data with high-dimensional features. DCDP deploys a class of greedy algorithms that are applicable to a broad variety of high-dimensional statistical models and can enjoy almost linear computational complexity. We investigate the performance of DCDP in three commonly studied change point settings in high dimensions: the mean model, the Gaussian graphical model, and the linear regression model. In all three cases, we derive non-asymptotic bounds for the accuracy of the DCDP change point estimators. We demonstrate that the DCDP procedures consistently estimate the change points with sharp, and in some cases, optimal rates while incurring significantly smaller computational costs than the best available algorithms. Our findings are supported by extensive numerical experiments on both synthetic and real data.
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
Benefit from the quick development of deep learning techniques, salient object detection has achieved remarkable progresses recently. However, there still exists following two major challenges that hinder its application in embedded devices, low resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keep accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, and with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).
Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect $2806$ aerial images from different sensors and platforms. Each image is of the size about 4000-by-4000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using $15$ common object categories. The fully annotated DOTA images contains $188,282$ instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and are quite challenging.