Robust Image Forgery Detection and Localization Framework using Vision Transformers (ViTs)
Mahesh Enumula1, M. Giri2, V. K. Sharma3
1Mahesh Enumula, Department of ECE, Bhagwant University, Ajmer (Rajasthan), India.
2Dr. M. Giri, Department of CSE, Siddharth Institute of Engineering and Technology, Puttur (Karnataka), India.
3Dr. V. K. Sharma, Department of ECE, Bhagwant University, Ajmer (Rajasthan), India.
Manuscript received on 25 November 2024 | Revised Manuscript received on 09 December 2024 | Manuscript Accepted on 15 December 2024 | Manuscript published on 30 December 2024 | PP: 20-29 | Volume-14 Issue-1, December 2024 | Retrieval Number: 100.1/ijitee.L101214011224 | DOI: 10.35940/ijitee.L1012.14011224
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Image forgery detection has become increasingly critical with the proliferation of image editing tools capable of generating realistic forgeries. Traditional deep learning approaches, such as convolutional neural networks (CNNs), often struggle to capture global dependencies and subtle inconsistencies across larger image contexts. To address these challenges, this paper proposes a novel Vision Transformer (ViT)-based framework for robust image forgery detection and localization. Leveraging the self-attention mechanism of transformers, our approach effectively models long-range dependencies and detects even subtly tampered regions with high precision. The proposed framework processes images as patch embeddings, extracting both local and global features, and outputs a detailed forgery map for accurate localization. We evaluate our method on multiple benchmark datasets containing diverse forgery types, including splicing, cloning, and inpainting. Experimental results demonstrate that the ViT-based model outperforms state-of-the-art CNN- and GAN-based methods, achieving superior accuracy, precision, and recall. Additionally, qualitative analyses highlight its capability to localize forgeries in complex scenarios. These results underscore the potential of Vision Transformers as a powerful tool for advancing the field of image forgery detection.
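To make the pipeline described in the abstract concrete, the sketch below illustrates the general flow of such a framework (patch embedding, transformer encoder with global self-attention, per-patch forgery logits upsampled to a pixel-level forgery map) in PyTorch. This is a minimal illustration of the architecture family the paper describes, not the authors' implementation; the class name ViTForgeryLocalizer and all hyperparameters (patch size 16, embedding dimension 256, 6 encoder layers, 8 heads) are assumptions chosen for the example.

import torch
import torch.nn as nn

class ViTForgeryLocalizer(nn.Module):
    """Illustrative ViT-style encoder mapping an image to a forgery map.
    Not the authors' model; shapes and sizes are assumptions."""

    def __init__(self, img_size=224, patch_size=16, embed_dim=256,
                 depth=6, num_heads=8):
        super().__init__()
        self.patch_size = patch_size
        self.num_patches = (img_size // patch_size) ** 2

        # Patch embedding: a strided convolution splits the image into
        # non-overlapping patches and projects each patch to embed_dim.
        self.patch_embed = nn.Conv2d(3, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, embed_dim))

        # Transformer encoder: self-attention lets every patch attend to
        # every other patch, modeling the long-range dependencies that
        # CNNs with limited receptive fields tend to miss.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)

        # Per-patch head: one tamper logit per patch token.
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, x):
        b = x.shape[0]
        x = self.patch_embed(x)                # (B, D, H/p, W/p)
        h, w = x.shape[2], x.shape[3]
        x = x.flatten(2).transpose(1, 2)       # (B, N, D) patch tokens
        x = x + self.pos_embed
        x = self.encoder(x)                    # global self-attention
        logits = self.head(x).squeeze(-1)      # (B, N) per-patch logits
        # Reshape to a coarse patch-grid map, then upsample so the
        # forgery map matches the input resolution for localization.
        fmap = logits.view(b, 1, h, w)
        fmap = nn.functional.interpolate(
            fmap, scale_factor=self.patch_size,
            mode="bilinear", align_corners=False)
        return torch.sigmoid(fmap)             # (B, 1, H, W) in [0, 1]

# Usage: a batch of images yields a per-pixel tamper-probability map.
model = ViTForgeryLocalizer()
img = torch.randn(2, 3, 224, 224)
forgery_map = model(img)
print(forgery_map.shape)  # torch.Size([2, 1, 224, 224])

In this sketch, pixel values near 1 in the output mark regions the model considers tampered (e.g. spliced, cloned, or inpainted), which is the "forgery map" output the abstract refers to.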
Keywords: Image Forgery Detection, Vision Transformers (ViT), Self-Attention Mechanism, Global Dependencies, Image Localization, Patch Embeddings, Forgery Localization.
Scope of the Article: Artificial Intelligence & Methods