Evaluation
Evaluation platforms and libraries to test, score, and harden LLM and agent behavior. 13 tools.
Observability
AgentOps
Observability built specifically for AI agents.
Open source freemium
Arize Phoenix
Open-source LLM tracing and eval.
Open source Self-hostable free
Comet Opik
Open-source eval and tracing from Comet.
Open source Self-hostable freemium
Laminar
Open-source observability and eval for AI agents.
Open source Self-hostable freemium
Langfuse
The most-used open-source LLM observability tool.
Open source Self-hostable freemium Featured
LangSmith
LangChain's observability and eval platform.
Proprietary freemium
Weights & Biases Weave
LLM tracing and eval inside W&B.
Proprietary freemium
Evaluation
Braintrust
Evals and prompt playground for serious teams.
Proprietary freemium
DeepEval (Confident AI)
Open-source unit tests for LLMs.
Open source Self-hostable free
Galileo
Guardrails and evaluation for production LLMs.
Proprietary paid
Latitude
Open-source prompt management and eval.
Open source Self-hostable freemium
Maxim AI
Evaluation and simulation for AI agents.
Proprietary freemium
Patronus AI
Automated evaluation and guardrails for LLMs.
Proprietary paid