We present a new dataset condensation framework termed Squeeze, Recover and Relabel (SRe$^2$L) that decouples the bilevel optimization of model and synthetic data during training, to handle varying scales of datasets, model architectures and image resolutions for effective dataset condensation. The proposed method demonstrates flexibility across diverse dataset scales and exhibits multiple advantages in terms of arbitrary resolutions of synthesized images, low training cost and memory consumption with high-resolution training, and the ability to scale up to arbitrary evaluation network architectures. Extensive experiments are conducted on Tiny-ImageNet and full ImageNet-1K datasets. Under 50 IPC, our approach achieves the highest 42.5% and 60.8% validation accuracy on Tiny-ImageNet and ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5% and 32.9%, respectively. Our approach also outperforms MTT by approximately 52$\times$ (ConvNet-4) and 16$\times$ (ResNet-18) faster in speed with less memory consumption of 11.6$\times$ and 6.4$\times$ during data synthesis. Our code and condensed datasets of 50, 200 IPC with 4K recovery budget are available at //zeyuanyin.github.io/projects/SRe2L/.
We investigate a framework for binary image denoising via restricted Boltzmann machines (RBMs) that introduces a denoising objective in quadratic unconstrained binary optimization (QUBO) form and is well-suited for quantum annealing. The denoising objective is attained by balancing the distribution learned by a trained RBM with a penalty term for derivations from the noisy image. We derive the statistically optimal choice of the penalty parameter assuming the target distribution has been well-approximated, and further suggest an empirically supported modification to make the method robust to that idealistic assumption. We also show under additional assumptions that the denoised images attained by our method are, in expectation, strictly closer to the noise-free images than the noisy images are. While we frame the model as an image denoising model, it can be applied to any binary data. As the QUBO formulation is well-suited for implementation on quantum annealers, we test the model on a D-Wave Advantage machine, and also test on data too large for current quantum annealers by approximating QUBO solutions through classical heuristics.
This paper reports on the development of \textbf{a novel style guided diffusion model (SGDiff)} which overcomes certain weaknesses inherent in existing models for image synthesis. The proposed SGDiff combines image modality with a pretrained text-to-image diffusion model to facilitate creative fashion image synthesis. It addresses the limitations of text-to-image diffusion models by incorporating supplementary style guidance, substantially reducing training costs, and overcoming the difficulties of controlling synthesized styles with text-only inputs. This paper also introduces a new dataset -- SG-Fashion, specifically designed for fashion image synthesis applications, offering high-resolution images and an extensive range of garment categories. By means of comprehensive ablation study, we examine the application of classifier-free guidance to a variety of conditions and validate the effectiveness of the proposed model for generating fashion images of the desired categories, product attributes, and styles. The contributions of this paper include a novel classifier-free guidance method for multi-modal feature fusion, a comprehensive dataset for fashion image synthesis application, a thorough investigation on conditioned text-to-image synthesis, and valuable insights for future research in the text-to-image synthesis domain. The code and dataset are available at: \url{//github.com/taited/SGDiff}.
We introduce a new, open-source computational general relativity framework for the Wolfram Language called Gravitas, which boasts a number of novel and distinctive features as compared to the many pre-existing computational and numerical relativity frameworks currently available within the open-source community. These include, but are not limited to: seamless integration of its powerful symbolic and numerical subsystems, and, by extension, seamless transition between analytic/continuous representations and numerical/discrete representations of arbitrary spacetime geometries; highly modular, general and extensible representations of spacetime geometries, spacetime topologies, gauge conditions, coordinate systems, matter fields, evolution equations and initial data; ability to set up and run complex numerical relativity simulations, and to perform 2D and 3D visualizations, symbolic computations and numerical analysis (including the extraction of gravitational wave signals) on the resulting data, all from within a single notebook environment; and a totally-unstructured adaptive refinement scheme based on hypergraph rewriting, allowing for exceedingly efficient discretization and numerical evolution of Cauchy initial data for a wide range of challenging computational problems involving strong relativistic field dynamics. In this first in a series of two articles covering the framework, we focus on the design and capabilities of Gravitas's symbolic subsystem, including its general and flexible handling of arbitrary geometries parametrized by arbitrary curvilinear coordinate systems (along with an in-built library of standard metrics and coordinate conditions), as well as its various high-level tensor calculus and differential geometry features. We proceed to show how this subsystem can be used to solve the Einstein field equations both analytically and numerically.
Mixed Reality (MR) and Virtual Reality (VR) simulations are hampered by requirements for hand controllers or attempts to perseverate in use of two-dimensional computer interface paradigms from the 1980s. From our efforts to produce more naturalistic interactions for combat medic training for the military, USC has developed an open-source toolkit that enables direct hand controlled responsive interactions that is sensor independent and can function with depth sensing cameras, webcams or sensory gloves. Natural approaches we have examined include the ability to manipulate virtual smart objects in a similar manner to how they are used in the real world. From this research and review of current literature, we have discerned several best approaches for hand-based human computer interactions which provide intuitive, responsive, useful, and low frustration experiences for VR users.
The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation and demonstrates remarkable generalization capabilities across a wide range of downstream scenarios. In this empirical study, we examine SAM's robustness and zero-shot generalizability in the field of robotic surgery. We comprehensively explore different scenarios, including prompted and unprompted situations, bounding box and points-based prompt approaches, as well as the ability to generalize under corruptions and perturbations at five severity levels. Additionally, we compare the performance of SAM with state-of-the-art supervised models. We conduct all the experiments with two well-known robotic instrument segmentation datasets from MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict certain parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as wrong classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, SAM struggles to identify instruments in complex surgical scenarios characterized by the presence of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. We also attempt to fine-tune SAM using Low-rank Adaptation (LoRA) and propose SurgicalSAM, which shows the capability in class-wise mask prediction without prompt. Therefore, we can argue that, without further domain-specific fine-tuning, SAM is not ready for downstream surgical tasks.
Adversarial camouflage has garnered attention for its ability to attack object detectors from any viewpoint by covering the entire object's surface. However, universality and robustness in existing methods often fall short as the transferability aspect is often overlooked, thus restricting their application only to a specific target with limited performance. To address these challenges, we present Adversarial Camouflage for Transferable and Intensive Vehicle Evasion (ACTIVE), a state-of-the-art physical camouflage attack framework designed to generate universal and robust adversarial camouflage capable of concealing any 3D vehicle from detectors. Our framework incorporates innovative techniques to enhance universality and robustness: a refined texture rendering that enables common texture application to different vehicles without being constrained to a specific texture map, a novel stealth loss that renders the vehicle undetectable, and a smooth and camouflage loss to enhance the naturalness of the adversarial camouflage. Our extensive experiments on 15 different models show that ACTIVE consistently outperforms existing works on various public detectors, including the latest YOLOv7. Notably, our universality evaluations reveal promising transferability to other vehicle classes, tasks (segmentation models), and the real world, not just other vehicles.
We present csSampling, an R package for estimation of Bayesian models for data collected from complex survey samples. csSampling combines functionality from the probabilistic programming language Stan (via the rstan and brms R packages) and the handling of complex survey data from the survey R package. Under this approach, the user creates a survey-weighted model in brms or provides a custom weighted model via rstan. Survey design information is provided via the svydesign function of the survey package. The cs_sampling function of csSampling estimates the weighted stan model and provides an asymptotic covariance correction for model mis-specification due to using survey sampling weights as plug-in values in the likelihood. This is often known as a ``design effect'' which is the ratio between the variance from a complex survey sample and a simple random sample of the same size. The resulting adjusted posterior draws can then be used for the usual Bayesian inference while also achieving frequentist properties of asymptotic consistency and correct uncertainty (e.g. coverage).
In this paper, we present a new family of cross $Z$-complementary pairs (CZCPs) based on generalized Boolean functions and two roots of unity. Our key idea is to consider an arbitrary partition of the set $\{1,2,\cdots, n\}$ with two subsets corresponding to two given roots of unity for which two truncated sequences of new alphabet size determined by the two roots of unity are obtained. We show that these two truncated sequences form a new $q$-ary CZCP with flexible sequence length and large zero-correlation zone width. Furthermore, we derive an enumeration formula by considering the Stirling number of the second kind for the partitions and show that the number of constructed CZCPs increases significantly compared to the existing works.
Multilevel lattice codes, such as the associated to Constructions $C$, $\overline{D}$, D and D', have relevant applications in communications. In this paper, we investigate some properties of lattices obtained via Constructions D and D' from $q$-ary linear codes. Connections with Construction A, generator matrices, expressions and bounds for the lattice volume and minimum distances are derived. Extensions of previous results regarding construction and decoding of binary and $p$-ary linear codes ($p$ prime) are also presented.
We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at //bit.ly/2EPbrJs.