We’ve added two major features to improve your fine-tuning workflow:
1. GPT-4.1 Mini Support
Now supporting OpenAI’s GPT-4.1 Mini model for both fine-tuning and inference.
GPT-4.1 Mini delivers near GPT-4o-level performance and is approximately 70% cheaper for fine-tuned models.
It produces noticeably better fine-tuned results than GPT-4o Mini: fine-tuned GPT-4.1 Mini models average over 93% accuracy, with some exceeding 94%, a major improvement over the roughly 80% average seen with fine-tuned GPT-4o Mini models.
Note: Requests to GPT-4.1 Mini cost about 3x more than GPT-4o Mini.
2. Sort & Filter for Model Evaluations
Added sorting and filtering capabilities for evaluation results based on scores.
Users can now sort evaluated responses by several metrics (similarity, relevance, completeness, error handling, and hallucinations), in either ascending or descending order.
This makes it much easier to identify incorrectly answered questions and prioritize them for retraining, significantly improving the fine-tuning experience.
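As a rough illustration of the kind of sort-and-filter pass described above, here is a minimal Python sketch. The record fields, threshold, and helper name are hypothetical, not the product’s actual schema or API:

```python
# Hypothetical evaluation records; field names are illustrative only.
results = [
    {"question": "Q1", "similarity": 0.91, "hallucinations": 0.02},
    {"question": "Q2", "similarity": 0.64, "hallucinations": 0.31},
    {"question": "Q3", "similarity": 0.88, "hallucinations": 0.05},
]

def sort_results(records, metric, descending=False):
    """Sort evaluated responses by a single metric, ascending or descending."""
    return sorted(records, key=lambda r: r[metric], reverse=descending)

# Surface the weakest answers first: lowest similarity scores at the top.
worst_first = sort_results(results, "similarity")
# → Q2, Q3, Q1

# Filter to responses below an (arbitrary) similarity threshold
# to prioritize them for retraining.
needs_retraining = [r for r in results if r["similarity"] < 0.80]
```

The same pattern extends to any of the other metrics (relevance, completeness, error handling, hallucinations) by changing the `metric` argument.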