agentindex

Evaluation

Evaluation platforms and libraries to test, score, and harden LLM and agent behavior. 13 tools.

Observability

AgentOps

Observability built specifically for AI agents.

Open source · freemium

Arize Phoenix

Open-source LLM tracing and eval.

Open source · Self-hostable · free

Comet Opik

Open-source eval and tracing from Comet.

Open source · Self-hostable · freemium

Laminar

Open-source observability and eval for AI agents.

Open source · Self-hostable · freemium

Langfuse

The most-used open-source LLM observability tool.

Open source · Self-hostable · freemium · Featured

LangSmith

LangChain's observability and eval platform.

Proprietary · freemium

Weights and Biases Weave

LLM tracing and eval inside W&B.

Proprietary · freemium

Evaluation

Braintrust

Evals and prompt playground for serious teams.

Proprietary · freemium

DeepEval (Confident AI)

Open-source unit tests for LLMs.

Open source · Self-hostable · free

Galileo

Guardrails and evaluation for production LLMs.

Proprietary · paid

Latitude

Open-source prompt management and eval.

Open source · Self-hostable · freemium

Maxim AI

Evaluation and simulation for AI agents.

Proprietary · freemium

Patronus AI

Automated evaluation and guardrails for LLMs.

Proprietary · paid

Related tags

Library · 20, Observability · 12, Guardrails · 3, Prompt management · 3

agentindex — a neutral, practitioner-curated index. Data is open in this repo and refreshed weekly.