Why Weave?
Building LLM applications is fundamentally different from traditional software development. LLM outputs are non-deterministic, making debugging harder. Quality is subjective and context-dependent. Small prompt changes can cause unexpected shifts in behavior. Traditional testing approaches fall short. Weave addresses these challenges by providing:

- Visibility into every LLM call, input, and output in your application
- Systematic evaluation to measure performance against curated test cases
- Version tracking for prompts, models, and data so you can understand what changed
- Feedback collection to capture human judgments and production signals
What you can do with Weave
Debug with traces
Weave automatically traces your LLM calls and shows them in an interactive UI. You can see exactly what went into each call, what came out, how long it took, and how calls relate to each other. Get started with tracing.
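A minimal sketch of tracing with the Python SDK: the project name is a placeholder, and any function decorated with `@weave.op` is recorded along with its inputs, outputs, and latency.

```python
import weave

# Initialize Weave; "my-project" is a placeholder project name.
weave.init("my-project")

# Any function decorated with @weave.op is traced: inputs, outputs,
# latency, and nesting under parent calls all show up in the UI.
@weave.op
def extract_fruit(sentence: str) -> str:
    # Stand-in for a real LLM call.
    return sentence.split()[-1]

extract_fruit("I would like to buy an apple")
```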
Evaluate systematically
Run your application against curated test datasets and measure performance with scoring functions. Track how changes to prompts or models affect quality over time. Build an evaluation pipeline.
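A hedged sketch of an evaluation using the Python SDK's `weave.Evaluation`; the dataset, model stub, and scorer are illustrative, and scorer parameter names (such as `output`) may vary by SDK version.

```python
import asyncio
import weave

weave.init("my-project")  # placeholder project name

# A tiny test dataset; each row becomes one evaluation example.
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]

# Scorers receive matching dataset columns plus the model's output.
@weave.op
def exact_match(expected: str, output: str) -> dict:
    return {"correct": expected == output}

@weave.op
def model(question: str) -> str:
    # Stand-in for a real LLM call.
    return "Paris" if "France" in question else "4"

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(model))
```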
Version everything
Weave tracks versions of your prompts, datasets, and model configurations. When something breaks, you can see exactly what changed. When something works, you can reproduce it. Learn about versioning.
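For instance, publishing an object assigns it a tracked version, and re-publishing changed contents under the same name creates a new version. This sketch assumes the Python SDK's `weave.Dataset`, `weave.publish`, and `weave.ref`, with placeholder names.

```python
import weave

weave.init("my-project")  # placeholder project name

# Publishing stores the object in Weave and assigns it a version.
dataset = weave.Dataset(
    name="qa-examples",
    rows=[{"question": "What is the capital of France?", "expected": "Paris"}],
)
weave.publish(dataset)

# Later, retrieve a specific version (or ":latest") by reference,
# so an evaluation can be reproduced against the exact same data.
retrieved = weave.ref("qa-examples:latest").get()
```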
Collect feedback
Capture human feedback, annotations, and corrections from production use. Use this data to build better test cases and improve your application. Collect feedback.
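One way to attach feedback from code, assuming the Python SDK's call-handle pattern (`op.call` returning both the result and a `Call` object); the feedback method names follow the SDK's feedback API and may differ by version.

```python
import weave

weave.init("my-project")  # placeholder project name

@weave.op
def answer(question: str) -> str:
    return "Paris"  # stand-in for a real LLM call

# .call() returns the result plus a handle to the recorded call,
# so feedback can be attached to that specific trace.
result, call = answer.call("What is the capital of France?")

call.feedback.add_reaction("👍")                 # emoji reaction
call.feedback.add_note("Correct and concise.")  # free-form note
```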
Monitor production
Score production traffic with the same scorers you use in evaluation. Set up guardrails to catch issues before they reach users. Set up guardrails and monitors.
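A sketch of a scorer doubling as a guardrail, following the `Call.apply_scorer` pattern from recent Python SDK documentation; the scorer logic is a stand-in, and the exact result structure may vary by version.

```python
import asyncio
import weave

weave.init("my-project")  # placeholder project name

# A scorer usable both in offline evaluation and on live traffic.
class ContainsPII(weave.Scorer):
    @weave.op
    def score(self, output: str) -> dict:
        # Stand-in check; a real scorer might call a PII-detection model.
        return {"flagged": "@" in output}

@weave.op
def generate(prompt: str) -> str:
    return "Contact me at alice@example.com"  # stand-in LLM output

async def main():
    output, call = generate.call("Write a sign-off")
    # Attach the score to the recorded call, then act on it as a guardrail.
    scored = await call.apply_scorer(ContainsPII())
    if scored.result["flagged"]:
        output = "[redacted]"
    print(output)

asyncio.run(main())
```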
How Weave fits in your workflow
Weave supports the full LLM application development lifecycle:

| Phase | What Weave provides |
|---|---|
| Build | Trace calls to understand behavior, debug issues, and iterate quickly |
| Test | Evaluate against datasets with custom and built-in scorers |
| Deploy | Version prompts and models for reproducible deployments |
| Monitor | Score production traffic, collect feedback, catch regressions |
Supported languages
Weave provides SDKs for Python and TypeScript:

- Python
- TypeScript
Integrations
Weave integrates with popular LLM providers and frameworks:

- LLM providers: OpenAI, Anthropic, Google, Mistral, Cohere, and more
- Frameworks: LangChain, LlamaIndex, DSPy, CrewAI, and more
- Local models: Ollama, vLLM, and other local inference servers
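With an integrated provider, calls are traced automatically once `weave.init` has run, with no decorators required. A sketch using the OpenAI SDK; the project and model names are placeholders.

```python
import weave
from openai import OpenAI

weave.init("my-project")  # placeholder project name

client = OpenAI()

# No @weave.op needed: calls through integrated clients like the
# OpenAI SDK are captured automatically after weave.init.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```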