deepeval

Here are 12 public repositories matching this topic...

avnlp / rag-pipelines

Advanced RAG Pipelines and Evaluation

pubmed unstructured rag baml milvus earnings-calls contextual-ai llm langgraph rag-pipeline agentic-rag deepeval financebench healthbench

Updated Feb 23, 2026
Python

avnlp / dspy-opt

Advanced RAG pipeline optimization framework using DSPy. Implements modular RAG pipelines with Query-Rewriting, Sub-Query Decomposition, and Hybrid Search via Weaviate. Automates prompt tuning and few-shot selection using MIPRO, COPRO, and BootstrapFewShot optimizers on datasets like FreshQA, HotpotQA, TriviaQA, Wikipedia, and PubMedQA.

metadata-extraction query-rewriting rag weaviate dspy rag-pipeline deepeval sub-query-generation

Updated Feb 23, 2026
Python

MERakram / Advanced-RAG-monorepo

🚀 Production-ready modular RAG monorepo: Local LLM inference (vLLM) • Hybrid retrieval with Qdrant • Semantic caching • Docling document parsing • Cross-encoder reranking • DeepEval evaluation • Full observability with Langfuse • Open WebUI chat interface • OpenAI-compatible API • Fully Dockerized

python nlp ai self-hosted reranking rag fastapi vector-database cross-encoder qdrant vllm langfuse open-webui deepeval

Updated Jan 28, 2026
Python

JohnRitchie / qa-llm-guard

python pytest allure testing-framework qa-automation llm-testing deepeval

Updated May 20, 2025
Python

sritajkumarpatel / learn_llmtesting_2025

Project demonstrating LLM testing using DeepEval, with OpenAI and local LLMs as judges

openai llm ollama deepeval

Updated Oct 29, 2025
Python

abhi9avx / deepeval-llm-evaluation

LLM & RAG evaluation framework using DeepEval. Includes 11+ executable tests for metrics like Faithfulness, Hallucination, and Agentic Tool Usage

ai-agents llm-evaluation llm-evaluation-framework deepeval

Updated Feb 8, 2026
Python

olensmar / deepeval-junit-reporter

JUnit-style XML report generation for deepeval test runs

xml reporting junit deepeval

Updated Feb 10, 2026
Python

A5hit / FinChatbot_Eval

A robust, modular pipeline for automated LLM chatbot evaluation, using DeepEval, Groq models, and Confident AI dashboard logging. Designed for systematic QA, reliable evaluation, and portfolio-quality results in AI/QA engineering.

deepeval

Updated Nov 24, 2025
Python

nsourlos / llm-scientific-abstract-evaluation

Framework for evaluating and improving LLM-generated scientific abstracts using ROUGE metrics, semantic embeddings, and LLM-as-judge techniques.

python text-generation semantic-similarity rouge-metric sentence-transformers scientific-abstracts openai-api dspy prompt-engineering llm-evaluation llm-as-a-judge deepeval

Updated Oct 18, 2025
Python

messeb / py-deepeval-behave-bdd-testing-example

An example that combines Behave (BDD testing) with DeepEval (LLM evaluation) to create human-readable, stakeholder-friendly tests for AI Agents / chatbots.

python bdd chatbot openai behave ai-agents deepeval

Updated Jan 11, 2026
Python

rimironenko / rostcamp

openai-api llm generative-ai amazon-bedrock llm-testing deepeval

Updated Jan 17, 2026
Python

SchadenKai / Clinical-RAG

[UNDER DEVELOPMENT] Clinical-RAG is a production-grade, citation-backed AI system designed to bridge the "Trust Gap" in medical information retrieval.

milvus healthcare-ai langchain-python rag-pipeline rag-chatbot langgraph-python deepeval

Updated Feb 25, 2026
Python
