Evaluation how-to guides
These guides answer “How do I…?” questions. They are goal-oriented and concrete, and are meant to help you complete a specific task. For conceptual explanations, see the Conceptual guide. For end-to-end walkthroughs, see Tutorials. For comprehensive descriptions of every class and function, see the API Reference.
Offline evaluation
Evaluate and improve your application.
Run an evaluation
- Run an evaluation using the SDK
- Run an evaluation asynchronously
- Run an evaluation comparing two experiments
- Run an evaluation of a LangChain / LangGraph object
- Run an evaluation of an existing experiment
- Run an evaluation using the REST API
- Run an evaluation in the prompt playground
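To orient yourself before diving into the guides above, here is a minimal sketch of an offline evaluation run with the Python SDK. It assumes a dataset named "qa-examples" already exists, API credentials are configured in the environment, and that evaluate() is importable from langsmith.evaluation as in recent SDK versions; the target function and evaluator are placeholders.

```python
from langsmith.evaluation import evaluate


def target(inputs: dict) -> dict:
    # Placeholder application: call your chain, agent, or model here.
    question = inputs["question"]
    return {"answer": "Paris" if "France" in question else "unknown"}


def exact_match(run, example) -> dict:
    # Compare the application's output to the dataset's reference output.
    return {
        "key": "exact_match",
        "score": int(run.outputs["answer"] == example.outputs["answer"]),
    }


results = evaluate(
    target,
    data="qa-examples",            # name of an existing dataset (assumed)
    evaluators=[exact_match],
    experiment_prefix="baseline",  # experiments are named baseline-<suffix>
)
```

The async variant, comparative experiments, and the REST API route are covered in the dedicated guides.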
Define an evaluator
- Define a custom evaluator
- Use an off-the-shelf evaluator (Python only)
- Evaluate aggregate experiment results
- Evaluate intermediate steps
- Return multiple metrics in one evaluator
- Return categorical and continuous metrics
- Check your evaluator setup
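To illustrate several of the guides above at once, here is a hedged sketch of a custom evaluator that returns multiple metrics from one function, mixing a continuous score with a categorical label. The run/example signature and the "results" dict shape follow the common SDK pattern; see the guides for the exact contract and for off-the-shelf alternatives.

```python
def answer_quality(run, example) -> dict:
    """Custom evaluator returning several metrics in one pass (sketch)."""
    predicted = (run.outputs or {}).get("answer", "")
    reference = (example.outputs or {}).get("answer", "")

    # Continuous metric: crude token-overlap ratio between prediction and reference.
    pred_tokens, ref_tokens = set(predicted.lower().split()), set(reference.lower().split())
    overlap = len(pred_tokens & ref_tokens) / max(len(ref_tokens), 1)

    # Categorical metric: bucket the overlap into a label.
    verdict = "good" if overlap > 0.8 else "partial" if overlap > 0.3 else "bad"

    # Returning {"results": [...]} lets one evaluator emit multiple feedback keys.
    return {
        "results": [
            {"key": "token_overlap", "score": overlap},  # continuous
            {"key": "verdict", "value": verdict},        # categorical
        ]
    }
```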
Configure the data
Configure an evaluation job
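Job-level settings are passed to the evaluation call itself. A hedged sketch, reusing the target and evaluator from the earlier example and assuming evaluate() accepts concurrency and repetition parameters as in recent SDK versions:

```python
from langsmith.evaluation import evaluate

results = evaluate(
    target,                       # application under test (defined earlier)
    data="qa-examples",           # assumed dataset name
    evaluators=[exact_match],
    max_concurrency=4,            # run up to 4 examples in parallel
    num_repetitions=3,            # repeat each example to smooth out nondeterminism
    metadata={"model": "my-model-v2"},  # tags attached to the experiment
)
```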
Unit testing
Unit test your system to identify bugs and regressions.
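A minimal sketch of what such a test might look like with pytest; the generate_sql function and the assertions are hypothetical, and the linked guide covers wiring tests like this into LangSmith so results are tracked over time.

```python
import pytest

from my_app import generate_sql  # hypothetical function under test


@pytest.mark.parametrize(
    "question, expected_fragment",
    [
        ("How many users signed up last week?", "SELECT"),
        ("Delete everything", "SELECT"),  # should never produce a destructive query
    ],
)
def test_generate_sql_is_read_only(question, expected_fragment):
    sql = generate_sql(question).upper()
    assert expected_fragment in sql
    assert "DROP" not in sql and "DELETE" not in sql
```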
Online evaluation
Evaluate and monitor your system's live performance on production data.
Automatic evaluation
Set up evaluators that automatically run for all experiments against a dataset.
Analyzing experiment results
Use the UI & API to understand your experiment results.
- Compare experiments with the comparison view
- Filter experiments
- View pairwise experiments
- Fetch experiment results in the SDK
- Upload experiments run outside of LangSmith with the REST API
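For example, pulling the runs and evaluator scores from a finished experiment might look like the sketch below. It assumes Client.list_runs and Client.list_feedback behave as in recent SDK versions, and that "baseline-1234" is the experiment name.

```python
from langsmith import Client

client = Client()

# Each experiment is stored as a tracing project; list its root runs.
runs = list(client.list_runs(project_name="baseline-1234", is_root=True))

# Attach feedback (evaluator scores) to each run.
for run in runs:
    scores = {fb.key: fb.score for fb in client.list_feedback(run_ids=[run.id])}
    print(run.inputs, run.outputs, scores)
```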
Dataset management
Manage the datasets in LangSmith that your evaluations use.
- Manage datasets from the UI
- Manage datasets programmatically
- Version datasets
- Share or unshare a dataset publicly
- Export filtered traces from an experiment to a dataset
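As a starting point for the programmatic route, a hedged sketch of creating a dataset and adding examples with the Python client (the names and contents are illustrative):

```python
from langsmith import Client

client = Client()

dataset = client.create_dataset(
    dataset_name="qa-examples",
    description="Question-answer pairs for offline evaluation.",
)

client.create_examples(
    inputs=[{"question": "What is the capital of France?"}],
    outputs=[{"answer": "Paris"}],
    dataset_id=dataset.id,
)
```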
Annotation queues and human feedback
Collect feedback from subject matter experts and users to improve your applications.
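Feedback collected through annotation queues is attached to runs; you can also record it programmatically. A minimal sketch, assuming Client.create_feedback as in recent SDK versions and a placeholder run ID:

```python
from langsmith import Client

client = Client()

# run_id would come from a trace you are reviewing; shown here as a placeholder.
client.create_feedback(
    run_id="00000000-0000-0000-0000-000000000000",
    key="user_helpfulness",
    score=1,
    comment="Reviewer judged the answer helpful.",
)
```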