SynSpill: Improved Industrial Spill Detection With Synthetic Data

Abstract

Large-scale Vision-Language Models (VLMs) have transformed general-purpose visual recognition through strong zero-shot capabilities. However, their performance degrades significantly in niche, safety-critical domains such as industrial spill detection, where hazardous events are rare, sensitive, and difficult to annotate.

This scarcity—driven by privacy concerns, data sensitivity, and the infrequency of real incidents—renders conventional fine-tuning of detectors infeasible for most industrial settings.

We address this challenge by introducing a scalable framework centered on a high-quality synthetic data generation pipeline. We demonstrate that this synthetic corpus enables effective Parameter-Efficient Fine-Tuning (PEFT) of VLMs and substantially boosts the performance of state-of-the-art object detectors such as YOLO and DETR.

Notably, without training on the synthetic SynSpill dataset, VLMs still generalize better to unseen spill scenarios than these detectors. When SynSpill is used, both VLMs and detectors improve markedly, and their performance becomes comparable.

Our results underscore that high-fidelity synthetic data is a powerful means of bridging the domain gap in safety-critical applications. The combination of synthetic generation and lightweight adaptation offers a cost-effective, scalable pathway for deploying vision systems in industrial environments where real data is scarce or impractical to obtain.

Key Highlights

Breakthrough innovations in synthetic data generation and model adaptation for industrial safety applications

🏭

Industrial Safety Focus

First comprehensive framework for automated industrial spill detection using computer vision

🎨

Synthetic Data Pipeline

Novel AnomalInfusion technique using Stable Diffusion XL + IP adapters for realistic spill generation

🧠

VLM Adaptation

Parameter-efficient fine-tuning with LoRA for domain specialization without full model retraining

📊

Dual Approach

Benefits both Vision-Language Models and traditional object detectors (YOLO, DETR)

⚡

Zero-Shot Capability

Strong generalization to unseen spill scenarios even without synthetic data training

🎯

Real-World Validation

Tested and validated on actual industrial CCTV footage from manufacturing facilities

🚀 Revolutionizing Industrial Safety with AI

System Architecture

End-to-end pipeline for industrial spill detection using synthetic data and Vision-Language Models

📹

Input Stage

CCTV Feed

Text Prompt

🧠

VLM Processing

PEFT Training

LoRA Adaptation

Confidence Scoring

🚨

Alert System

Detection Results

Alert Trigger
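The step from confidence scoring to an alert trigger can be illustrated with a small debouncing rule. This is a minimal sketch, not the deployed logic: the class name `SpillAlertTrigger`, the 0.5 threshold, and the "3 of the last 5 frames" rule are all assumptions chosen for illustration, since the page does not specify how alerts are raised.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SpillAlertTrigger:
    """Alert once enough recent frames exceed a confidence threshold.

    Every name and default value here is a hypothetical choice.
    """
    threshold: float = 0.5   # assumed per-frame confidence cutoff
    window: int = 5          # number of recent frames remembered
    min_hits: int = 3        # frames in the window that must pass the cutoff
    _recent: deque = field(default_factory=deque, init=False, repr=False)

    def update(self, confidence: float) -> bool:
        """Record one frame's confidence and report whether to alert."""
        self._recent.append(confidence >= self.threshold)
        if len(self._recent) > self.window:
            self._recent.popleft()
        return sum(self._recent) >= self.min_hits


def first_alert_frame(confidences, trigger=None):
    """Return the index of the first frame that raises an alert, or None."""
    if trigger is None:
        trigger = SpillAlertTrigger()
    for index, confidence in enumerate(confidences):
        if trigger.update(confidence):
            return index
    return None
```

Requiring several confident frames before alerting is one way to keep a single noisy frame from a CCTV feed from raising a false alarm, at the cost of a short delay.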

AnomalInfusion Pipeline

1

Stable Diffusion XL

Base image generation with controlled prompts

2

IP Adapters

Style and content conditioning for realism

3

Inpainting

Precise spill placement and anomaly insertion

Data Flow

Real Images (Scarce)

Synthetic Data (Abundant)

Trained Model (Ready)

Methodology Comparison

Comprehensive evaluation of different approaches to industrial spill detection

mAP: 42.1%

Zero-Shot VLM

Baseline performance without any adaptation

No training required
General capabilities
Limited domain knowledge

mAP: 63.4%

PEFT + SynSpill

Parameter-efficient fine-tuning with synthetic data

LoRA adaptation
Synthetic data training
Domain specialization

mAP: 39.8%

Traditional Detectors

YOLO/DETR baseline without synthetic data

Object detection focus
Limited generalization
Real-world challenges

mAP: 61.2%

Detectors + SynSpill

Traditional detectors trained with synthetic data

Improved accuracy
Better robustness
Enhanced performance

PEFT Training Process

🎨

Generate

Create synthetic spill images using diffusion models

🏷️

Annotate

Automatically label synthetic data with ground truth

🔧

Adapt

Fine-tune models using LoRA on synthetic dataset

✅

Deploy

Deploy adapted model for real-world detection
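One plausible way the Annotate step could work is to reuse the region where a spill was inserted as the label. The sketch below assumes that a binary mask of the spill is available for each synthetic image and converts it into a YOLO-format line (class, normalized center x, center y, width, height). The function name `mask_to_yolo_box` and the mask-based labeling itself are assumptions; the page does not describe the actual labeling mechanism.

```python
def mask_to_yolo_box(mask, class_id=0):
    """Convert a binary mask (list of rows of 0/1) to a YOLO label line.

    Hypothetical helper: returns None when the mask contains no spill pixels.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Rows and columns that contain at least one spill pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    # Treat pixels as unit squares, so the max edge is exclusive (+1).
    x_min, x_max = cols[0], cols[-1] + 1
    y_min, y_max = rows[0], rows[-1] + 1
    # YOLO labels are normalized by the image width and height.
    x_center = (x_min + x_max) / 2 / width
    y_center = (y_min + y_max) / 2 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"
```

For a 4x4 mask whose central 2x2 block is set, this yields a box centered at (0.5, 0.5) covering half the width and half the height.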

💡 Key Innovation

Our synthetic data generation pipeline bridges the gap between scarce real-world data and the need for robust industrial spill detection, enabling effective model adaptation with minimal computational overhead.
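The "minimal computational overhead" of LoRA can be made concrete with a parameter count. LoRA freezes a pretrained weight matrix of shape d_out x d_in and trains only a low-rank update BA, where B has shape d_out x r and A has shape r x d_in. The sketch below compares the two counts; the layer size (4096 x 4096) and the rank (8) are illustrative assumptions, not values reported for SynSpill.

```python
def full_params(d_out, d_in):
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters trained by LoRA: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative values only; the page does not state layer sizes or the rank.
D_OUT, D_IN, RANK = 4096, 4096, 8

full = full_params(D_OUT, D_IN)
lora = lora_trainable_params(D_OUT, D_IN, RANK)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {RANK}): {lora:,} parameters ({lora / full:.2%} of full)")
```

Under these assumed sizes, LoRA trains 65,536 parameters for the layer instead of 16,777,216, which is 1/256 of the full count.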

Experimental Results

Comprehensive evaluation demonstrates significant improvements with synthetic data

🎯

84%

Detection Accuracy

Best mAP@50 with Qwen-VL 32B + LoRA

📊

2,000

Synthetic Images

High-quality samples generated via pipeline

🔧

PEFT

Model Adaptation

Parameter-efficient fine-tuning approach

⚡

VLM

State-of-the-Art

Outperforms fine-tuned detectors
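The mAP@50 metric counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion, with boxes given as (x_min, y_min, x_max, y_max); the helper names are hypothetical, and the full evaluation protocol used for these results (including how VLM outputs are scored) is not described on this page.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Overlap rectangle; width and height are clamped at zero when disjoint.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def matches_at_50(predicted, ground_truth, threshold=0.5):
    """True when the prediction overlaps the ground truth at IoU >= 0.5."""
    return iou(predicted, ground_truth) >= threshold
```

For example, a 10x10 prediction shifted sideways by half its width against a 10x10 ground-truth box has an IoU of 1/3 and would not count as a match.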

Quantitative Comparison

Method	Public Dataset (mAP@50)	Proprietary Dataset (mAP@50)
Qwen-VL 7B (Zero-Shot)	35%	15%
Qwen-VL 32B (Zero-Shot)	42%	24%
YOLOv11 (Fine-tuned)	81%	64%
RF-DETR (Fine-tuned)	83%	67%
Qwen-VL 7B + LoRA (V+L)	78%	66%
Qwen-VL 32B + LoRA (V+L) (Best)	84%	71%
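The size of the improvement from adaptation can be read off the table with a little arithmetic. The sketch below copies the table's values and computes the gain of each LoRA-adapted model over its zero-shot counterpart, plus each method's drop from the public to the proprietary dataset. The helper names `point_gain` and `dataset_gap` are illustrative.

```python
# Scores in percent, copied from the table: (public, proprietary).
RESULTS = {
    "Qwen-VL 7B (Zero-Shot)": (35, 15),
    "Qwen-VL 32B (Zero-Shot)": (42, 24),
    "YOLOv11 (Fine-tuned)": (81, 64),
    "RF-DETR (Fine-tuned)": (83, 67),
    "Qwen-VL 7B + LoRA (V+L)": (78, 66),
    "Qwen-VL 32B + LoRA (V+L)": (84, 71),
}


def point_gain(adapted, baseline):
    """Gain in percentage points on (public, proprietary)."""
    return tuple(a - b for a, b in zip(RESULTS[adapted], RESULTS[baseline]))


def dataset_gap(method):
    """Drop in percentage points from the public to the proprietary dataset."""
    public, proprietary = RESULTS[method]
    return public - proprietary


for size in ("7B", "32B"):
    gain = point_gain(f"Qwen-VL {size} + LoRA (V+L)", f"Qwen-VL {size} (Zero-Shot)")
    print(f"Qwen-VL {size}: +{gain[0]} points public, +{gain[1]} points proprietary")

for method in RESULTS:
    print(f"{method}: {dataset_gap(method)}-point drop on proprietary data")
```

With these numbers, LoRA adds 43 and 51 points for the 7B model and 42 and 47 points for the 32B model, and the adapted VLMs lose fewer points on the proprietary dataset (12 and 13) than the fine-tuned detectors (17 and 16).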

🔍 Key Findings

PEFT VLMs achieve the best performance: Qwen-VL 32B + LoRA (V+L) outperforms all baselines, with 84% mAP@50 on the public dataset and 71% on the proprietary dataset

Synthetic data enables effective adaptation: 2,000 synthetic images bridge the domain gap for industrial spill detection

Joint vision-language adaptation is optimal: LoRA (V+L) provides the best performance across both datasets

🚀 Impact

1

First scalable solution for industrial spill detection using synthetic data

2

Enables deployment in data-scarce industrial environments

3

Provides cost-effective alternative to manual monitoring

Research Collaboration

Collaborative effort between academic researchers and industry experts to advance Smart Sensing

University of Central Florida


Siemens Energy


🤝 Academic-Industry Collaboration

This work represents a successful collaboration between academic research and industrial application, combining cutting-edge computer vision research with real-world safety requirements in industrial environments. Our partnership ensures that research innovations translate directly into practical solutions for industrial safety.


Citation

If you use our work, please cite our paper

📝 BibTeX Citation

@inproceedings{baranwal2025synspill,
    title={SynSpill: Improved Industrial Spill Detection With Synthetic Data},
    author={Aaditya Baranwal and Abdul Mueez and Jason Voelker and Guneet Bhatia and Sruti Vyas},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision - Workshops (ICCV-W)},
    year={2025},
    url={https://arxiv.org/abs/2508.10171}
}