Adversarial Image Detection Using Deep Learning in Agricultural Contexts

Preprint 2025
Md Nazmul Kabir Sikder1, Mehmet Oguz Yardimci2, Trey Ward3, Shubham Laxmikant Deshmukh2, Feras A. Batarseh3
1Virginia Tech, Commonwealth Cyber Initiative (CCI), Arlington, VA, USA
2Virginia Tech, Department of Computer Science, Arlington, VA, USA
3Virginia Tech, Department of Biological Systems Engineering, Arlington, VA, USA
💻 Code (GitHub)

Abstract

The gradual digitalization of agricultural systems through data-driven techniques has reshaped production growth. However, this transformation has also introduced new vulnerabilities, exposing these systems to cyber threats. While numerous domain-specific attack detection methods have been proposed, comprehensive cybersecurity frameworks tailored to agriculture remain scarce, particularly as AI becomes increasingly integrated into these systems. To address this gap, we propose a novel framework for classifying high-fidelity adversarial plant images. This supervised approach not only detects attacks but also identifies their specific source models. We employ state-of-the-art GAN architectures, including StyleGAN2 and StyleGAN3, alongside powerful diffusion models such as DS8, BLIP, and Pix2Pix, to produce diverse adversarial images via both image-to-image and text-to-image generation. These images are then used to train a classifier that distinguishes among all generation classes. Our experiments include comparative classification tasks and show that accuracy degrades only logarithmically as the class count increases. This demonstrates the scalability of the framework, allowing additional computer vision tasks to be incorporated without compromising performance. As GAN and diffusion models continue to advance, our framework is designed to evolve, ensuring its generation and detection capabilities remain robust against emerging threats.

Overview

We target cyber-biosecurity risks in Agriculture 4.0 by detecting and attributing synthetic plant images introduced by adversaries. Our framework generates high-fidelity fakes with GANs (StyleGAN2/3) and diffusion pipelines (Pix2Pix, BLIP, DS8-inpainting), then trains a classifier to perform: (i) binary health detection, (ii) 3-way source detection (Real / GAN / Diffusion), and (iii) detailed 10-way crop–health–generator attribution.

Left: simplified adversarial image generation → detection pipeline, in which original and synthetic images feed a classifier. Right: high-level framework overview across crops and generators.

StyleGAN models are trained per crop–health class to improve fidelity under limited data. Diffusion pipelines preserve scene layout while editing only leaf regions, producing subtle yet realistic perturbations. The downstream classifier (EfficientNet-B0 / ResNet-50 / CLIP-ViT) learns generator fingerprints in addition to crop and health cues.
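To make the three task granularities concrete, below is a minimal Python sketch of how a detailed crop–health–generator label collapses into the coarser source and health labels used by the other two tasks; the label strings and naming scheme are illustrative assumptions, not the repository's exact class names.

```python
# Illustrative label scheme (an assumption): "crop_health_generator", e.g. "apple_healthy_real".
GAN_SOURCES = {"stylegan2", "stylegan3"}
DIFFUSION_SOURCES = {"pix2pix", "blip", "ds8"}

def to_source_label(detailed: str) -> str:
    """Collapse a detailed label to the 3-way source task: Real / GAN / Diffusion."""
    generator = detailed.rsplit("_", 1)[-1]
    if generator in GAN_SOURCES:
        return "GAN"
    if generator in DIFFUSION_SOURCES:
        return "Diffusion"
    return "Real"

def to_health_label(detailed: str) -> str:
    """Collapse a detailed label to the binary health task: Healthy / Unhealthy."""
    health = detailed.split("_")[1]
    return "Healthy" if health == "healthy" else "Unhealthy"

assert to_source_label("tomato_unhealthy_stylegan3") == "GAN"
assert to_source_label("apple_healthy_real") == "Real"
assert to_health_label("maize_unhealthy_ds8") == "Unhealthy"
```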

Methods

GAN synthesis. We adopt StyleGAN2-ADA and StyleGAN3 to synthesize class-conditional leaves (Apple/Tomato/Maize × Healthy/Unhealthy). Models are monitored with FID over training kimg; best checkpoints are retained for dataset creation and analysis.
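As an illustration of checkpoint selection, the sketch below picks the lowest-FID snapshot by parsing the `metric-fid50k_full.jsonl` log that the official StyleGAN2-ADA/StyleGAN3 training code writes into each run directory; the log format is assumed to follow those repositories' conventions, and the run-directory path is hypothetical.

```python
import json
from pathlib import Path

def best_fid_snapshot(run_dir: str, metric: str = "fid50k_full"):
    """Return (snapshot_pkl, fid) for the lowest FID recorded during training.

    Assumes the JSONL metric log written by the official StyleGAN2-ADA /
    StyleGAN3 repositories: one JSON object per evaluated snapshot, with
    a 'results' dict and a 'snapshot_pkl' path.
    """
    log_path = Path(run_dir) / f"metric-{metric}.jsonl"
    best = None
    with open(log_path) as fh:
        for line in fh:
            record = json.loads(line)
            fid = record["results"][metric]
            if best is None or fid < best[1]:
                best = (record["snapshot_pkl"], fid)
    return best

if __name__ == "__main__":
    # Hypothetical per-class run directory (e.g., Apple/Healthy trained with StyleGAN2-ADA).
    snapshot, fid = best_fid_snapshot("runs/stylegan2ada-apple-healthy")
    print(f"Best checkpoint: {snapshot} (FID {fid:.2f})")
```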

Diffusion pipelines. Three complementary image-to-image strategies are used: (1) Instruct-Pix2Pix for prompt-driven edits with strict structure preservation; (2) BLIP-Diffusion for concept-conditioned edits aligned to “leaf”; and (3) DS8 inpainting guided by OpenCV masks to localize changes. Prompts are tuned to avoid hallucinations and keep context intact.
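A minimal sketch of the mask-guided inpainting step, assuming the Hugging Face diffusers inpainting pipeline as the runtime and an HSV green-threshold mask as the leaf segmenter; the checkpoint ID, prompts, and threshold values are illustrative placeholders rather than the exact configuration used here (the full prompt lists are in the appendix).

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

def leaf_mask(image_path: str) -> Image.Image:
    """Build a rough leaf mask by thresholding green hues in HSV space.
    Thresholds are illustrative; the actual masks use crop-specific tuning."""
    bgr = cv2.imread(image_path)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (25, 40, 40), (95, 255, 255))  # greenish pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((7, 7), np.uint8))
    return Image.fromarray(mask)

# Hypothetical inpainting checkpoint standing in for the DS8 model.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "path/or/hub-id-of-ds8-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("apple_leaf.jpg").convert("RGB").resize((512, 512))
mask = leaf_mask("apple_leaf.jpg").resize((512, 512))

edited = pipe(
    prompt="a leaf with early blight lesions, photorealistic",  # illustrative prompt
    negative_prompt="cartoon, blurry, extra objects",           # illustrative prompt
    image=image,
    mask_image=mask,
).images[0]
edited.save("apple_leaf_synthetic.png")
```

Because only the masked leaf region is regenerated, the background and scene layout are preserved, which is what makes these edits subtle yet realistic.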

Detectors. We evaluate EfficientNet-B0, ResNet-50, and CLIP (ViT-B/32) across three tasks: Binary (Healthy/Unhealthy), Generation Source (Real/GAN/Diffusion), and Detailed 10-way (crop–health–source). Metrics include Accuracy, F1, Precision/Recall; confusion matrices are used for diagnostics.
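A hedged sketch of the detector evaluation, assuming torchvision's ImageNet-pretrained EfficientNet-B0 with its classifier head resized to the task and scikit-learn for the metric set; the class count and data loading are placeholders for illustration, not the exact training configuration.

```python
import torch
from torch import nn
from torchvision import models
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, confusion_matrix)

# Hypothetical task setup: the detailed 10-way crop-health-generator attribution.
NUM_CLASSES = 10
device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageNet-pretrained EfficientNet-B0 with its classifier head resized to the task.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, NUM_CLASSES)
model = model.to(device).eval()

@torch.no_grad()
def evaluate(loader):
    """Collect predictions over a DataLoader and report the metric set above."""
    y_true, y_pred = [], []
    for images, labels in loader:
        logits = model(images.to(device))
        y_pred.extend(logits.argmax(dim=1).cpu().tolist())
        y_true.extend(labels.tolist())
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "confusion": confusion_matrix(y_true, y_pred),
    }
```

The same loop applies to ResNet-50 and CLIP ViT-B/32 once their classification heads are swapped in; only the backbone construction changes.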

Results

We evaluate (i) GAN training dynamics and best FID (StyleGAN2-ADA vs StyleGAN3), (ii) diffusion image quality, and (iii) classifier performance for binary, source, and detailed (10-way) attribution. Below we show quantitative trends followed by qualitative examples and a summary table.

Quantitative (FID & Training Dynamics)

(a) Best FID (bars) and FID vs kimg trajectories (lines) across Apple, Tomato, and Maize for StyleGAN2-ADA and StyleGAN3.
(b) FID training curves (lower is better) across all six crop–health datasets.
(c) Additional FID vs kimg trends showing convergence and stability differences.

Qualitative (Real vs Synthetic)

(d) Real and synthetic leaves (Pix2Pix, BLIP, DS8, StyleGAN2, StyleGAN3) across crops (Apple, Maize, Tomato) and health states (Healthy/Unhealthy). Diffusion pipelines yield minimally edited, high-fidelity images; StyleGAN3 often improves texture realism.

Model Performance Summary

The table below summarizes Accuracy / F1 / Precision / Recall / Loss for CLIP, ResNet-50, and EfficientNet-B0 on Binary, Generation, and Detailed tasks across crops.

(e) Model performance across plants and classification types. EfficientNet-B0 achieves near-perfect attribution, while CLIP degrades on the detailed 10-way classes.

Appendix (Extended Details)

The appendix provides extended experimental details and supplementary figures beyond the main text. This includes:

  • Diffusion prompts: Complete lists of prompts used for Pix2Pix, BLIP-Diffusion, and DS8 inpainting pipelines, including positive/negative prompt templates.
  • Hyper-parameter grids: Training configurations and tuning ranges for StyleGAN2-ADA, StyleGAN3, and diffusion models (learning rates, augmentation probabilities, kimg checkpoints).
  • DS8 inpainting masks: Leaf-level segmentation masks and before/after composites demonstrating localized edits and preservation of background context.
  • Crop-specific adjustments: Maize segmentation refinements and custom preprocessing applied to ensure faithful leaf isolation.
  • HP tuning examples: Side-by-side qualitative comparisons before and after hyper-parameter adjustments for GAN and diffusion outputs.
  • Confusion matrices: Detailed matrices for Binary, Generation, and 10-way classification tasks with EfficientNet-B0, ResNet-50, and CLIP-ViT.
  • Additional qualitative panels: Extended real vs synthetic comparisons across Apple, Tomato, and Maize, covering multiple health states and generator types.

For the full set of tables, figures, and experimental configurations, please refer to the PDF appendix.

BibTeX

@article{Sikder2025AgriAdversarial,
  title   = {Adversarial Image Detection Using Deep Learning in Agricultural Contexts},
  author  = {Sikder, Md Nazmul Kabir and Yardimci, Mehmet Oguz and Ward, Trey and Deshmukh, Shubham Laxmikant and Batarseh, Feras A.},
  journal = {Preprint},
  year    = {2025},
  month   = {September}
}