Advanced RAG Pipelines and Evaluation
Updated Feb 23, 2026 (Python)
Advanced RAG pipeline optimization framework using DSPy. Implements modular RAG pipelines with Query-Rewriting, Sub-Query Decomposition, and Hybrid Search via Weaviate. Automates prompt tuning and few-shot selection using the MIPRO, COPRO, and BootstrapFewShot optimizers on datasets such as FreshQA, HotpotQA, TriviaQA, Wikipedia, and PubMedQA.
🚀 Production-ready modular RAG monorepo: Local LLM inference (vLLM) • Hybrid retrieval with Qdrant • Semantic caching • Docling document parsing • Cross-encoder reranking • DeepEval evaluation • Full observability with Langfuse • Open WebUI chat interface • OpenAI-compatible API • Fully Dockerized
LLM & RAG evaluation framework using DeepEval. Includes 11+ executable tests for metrics like Faithfulness, Hallucination, and Agentic Tool Usage.
A robust, modular pipeline for automated LLM chatbot evaluation, using DeepEval, Groq models, and logging to the Confident AI dashboard. Designed for systematic QA, reliable evaluation, and portfolio-quality results in AI/QA engineering.
Framework for evaluating and improving LLM-generated scientific abstracts using ROUGE metrics, semantic embeddings, and LLM-as-judge techniques.
[UNDER DEVELOPMENT] Clinical-RAG is a production-grade, citation-backed AI system designed to bridge the "Trust Gap" in medical information retrieval.