Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) for Enterprise Knowledge Management and Document Automation: A Systematic Literature Review

Karakurt, Ehlullah; Akbulut, Akhan

doi:10.3390/app16010368

Applied Sciences (Switzerland), vol. 16, no. 1, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Review
Volume: 16 Issue: 1
Publication Date: 2026
DOI: 10.3390/app16010368
Journal Name: Applied Sciences (Switzerland)
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
Keywords: document automation, enterprise knowledge management, large language models, retrieval-augmented generation, systematic literature review
Affiliated with Istanbul Kültür University: Yes

Abstract

The integration of Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) is rapidly transforming enterprise knowledge management, yet a comprehensive understanding of their deployment in real-world workflows remains limited. This study presents a systematic literature review (SLR) analyzing 63 high-quality primary studies selected after rigorous screening to evaluate how these technologies address practical enterprise challenges. We formulated nine research questions targeting platforms, datasets, algorithms, and validation metrics to map the current landscape. Our findings reveal that enterprise adoption is largely in the experimental phase: 63.6% of implementations utilize GPT-based models, and 80.5% rely on standard retrieval frameworks such as FAISS or Elasticsearch. Critically, this review identifies a significant ‘lab-to-market’ gap: while retrieval and classification sub-tasks frequently employ academic validation methods like k-fold cross-validation (93.6%), generative evaluation predominantly relies on static hold-out sets due to computational constraints. Furthermore, fewer than 15% of studies address the real-time integration challenges that production-scale deployment requires. By systematically mapping these disparities, this study offers a data-driven perspective and a strategic roadmap for bridging the gap between academic prototypes and robust enterprise applications.