Evaluating LLM Changes with Phoenix
In the past 10 days alone, we've had 3+ major model releases: GPT-4o-mini, Llama 3.1, and Mistral Large 2. All these new options mean more choices and more time spent evaluating and testing each model. Fortunately, there is a structured, easy way to experiment with different models in your own LLM app. This video walks through how you can easily experiment with different models and prompt changes and compare results side-by-side.

Tools used:
- Arize Phoenix (https://github.com/arize-ai/phoenix/)
- OpenAI, Anthropic, Mistral

Link to notebook: https://drive.google.com/file/d/1eDQO...
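The core idea behind side-by-side comparison can be sketched in a few lines: run the same set of test prompts through each candidate model and collect the outputs into one table. The `toy_model_*` functions below are hypothetical stand-ins for real SDK calls (OpenAI, Anthropic, Mistral); in the video this loop is managed through Phoenix experiments rather than hand-rolled code.

```python
# Minimal sketch: run a shared prompt set through several "models"
# and gather results keyed by (prompt, model name). The models here
# are placeholder functions, not real LLM calls.

def toy_model_a(prompt: str) -> str:
    return prompt.upper()   # placeholder for a real model call

def toy_model_b(prompt: str) -> str:
    return prompt[::-1]     # placeholder for a real model call

def compare_models(prompts, models):
    """Run every prompt through every model; return a dict keyed
    by (prompt, model_name) for easy side-by-side inspection."""
    results = {}
    for prompt in prompts:
        for name, model in models.items():
            results[(prompt, name)] = model(prompt)
    return results

prompts = ["hello world", "summarize this"]
models = {"model_a": toy_model_a, "model_b": toy_model_b}
table = compare_models(prompts, models)
for (prompt, name), output in sorted(table.items()):
    print(f"{name:8s} | {prompt:15s} | {output}")
```

Swapping a model then means changing one entry in the `models` dict, while the prompts and comparison logic stay fixed — which is exactly the experiment pattern the video applies with Phoenix.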