DocuQuery AI is an end-to-end AI-powered document understanding and question-answering system that allows users to upload documents (PDFs, images, or text files), automatically extract their content using PDF parsing and OCR, and ask natural language questions using a Retrieval-Augmented Generation (RAG) workflow powered by a Large Language Model (LLM).
This project focuses on building a practical, production-style AI pipeline by integrating document processing, OCR, backend APIs, and LLM-based reasoning into a single system.
- Upload and process:
- Digital PDFs
- Scanned PDFs
- Images (JPG, PNG, WEBP)
- Text and code files
- Automatic file type detection
- Text extraction from PDFs using
pdfplumber - OCR fallback for scanned documents using
OpenAI vision - PDF page preview using
pdf2image - AI-powered question answering using LLM-based RAG
- REST API-based backend using Flask
- Interactive and responsive web interface
- Optical Character Recognition (OCR)
- Document parsing and preprocessing
- Retrieval-Augmented Generation (RAG)
- Prompt engineering
- LLM API integration
- REST API design
- Full-stack AI system integration
- Error handling and fallback pipelines
- Python
- Flask
- pdfplumber
- pdf2image
- OpenAI vision
- Pillow
- python-dotenv
- Werkzeug
- OpenAI API (LLM-based question answering)
- HTML
- CSS
- JavaScript (Fetch API)
- User uploads a document from the web interface.
- Backend detects the file type automatically.
- Based on file type:
- Digital PDF → Text extracted using
pdfplumber - Scanned PDF → Converted to image → OCR using
OpenAI vision - Image file → OCR applied directly
- Text file → Direct decoding
- Digital PDF → Text extracted using
- Extracted text is stored as contextual knowledge.
- User submits a natural language question.
- Document text + user query are sent to the LLM through a RAG-style prompt.
- The AI generates a grounded answer based strictly on the document.
- The response is displayed in the UI in real time.
DocuQuery_AI/
│
├── app.py
├── config.py
├── pdf_utils.py
├── ocr_utils.py
├── rag_utils.py
├── file_utils.py
├── graph.py
├── requirements.txt
├── runtime.txt
├── README.md
│
├── templates/
│ └── index.html
│
└── static/
├── app.js
└── style.css
| Endpoint | Method | Description |
|---|---|---|
/ |
GET | Home page |
/upload |
POST | Upload and process a file |
/preview_pdf_page |
POST | Preview selected PDF page |
/extract_pdf |
POST | Extract PDF text (page/range/full) |
/ocr |
POST | Run OCR on image |
/chat |
POST | Ask questions using RAG |
- Ask questions from research papers
- Extract text from scanned notes
- Summarize PDF reports
- Query invoices or bills
- Educational document analysis
- Designed a full multi-step AI workflow
- Integrated OCR + LLM reasoning
- Built REST APIs for document processing
- Handled real-world document edge cases
- Implemented RAG-style question answering
- Learned practical deployment challenges
- Add vector embeddings + FAISS for advanced RAG
- Use EasyOCR / PaddleOCR for better accuracy
- Multi-document querying
- Automatic summarization and key-entity extraction
- Session memory and chat history
- Full cloud deployment (Render / AWS / Streamlit Cloud)
https://drive.google.com/file/d/1d-gUCekkUgPPC9ay5JFjKJeG5n1fy5xv/view?usp=drive_link
https://github.com/omprakash0702/DocuQuery_AI
DocuQuery AI is a complete Document Intelligence and Question-Answering system that demonstrates how OCR, PDF parsing, and LLM-based reasoning can be combined into a real-world AI application. The project highlights strong applied AI skills, system design thinking, and clean backend–frontend integration.
Author: Omprakash
Domain: Applied AI | Document Intelligence | RAG Systems