Building the FineTune Lab Assistant: A Dataset Iteration Journey

The Problem
We needed a support assistant that actually understands FineTune Lab. Not a generic chatbot that hallucinates features we don't have, but one that:
- Knows exactly where every button is
Understands our workflows
Can guide users step-by-step
Doesn't make stuff up
The Approach
We started with Qwen and a dataset of ~2,000 Q&A pairs. Here's what we learned through 10+ iterations:
Iteration 1: The Baseline Disaster
Our first dataset was written like documentation. Formal, complete, boring. The model responded like a manual, not a helpful assistant.
Example:
Q: "How do I view my datasets?"
A (v1): "To view datasets, navigate to the dashboard and locate the datasets section in the main navigation panel..."
Problem: We don't have a dashboard. The model was hallucinating UI elements.
Iteration 2: Grounding in Reality
We rewrote answers to match our actual UI:
- Q: "How do I view my datasets?"
A (v2): "Go to the left sidebar and click 'Manage Datasets'. Your uploaded datasets are listed below the upload section."
Better. But still missing personality and helpfulness.
Iteration 3-5: Adding Context
We added details users actually need:
Exact button locations
What they'll see when they get there
Common follow-up actions
Iteration 6-8: Adversarial Examples
This was the game-changer. We added:
Questions that sound similar but are different
"No, we don't have that" responses
Disambiguation between similar features
Example adversarial:
Q: "Can I regenerate my training data?"
A: "FineTune Lab doesn't have an automatic feature to regenerate training data. If you meant regenerating responses in the Chat Interface, that's a different feature - look for the regenerate icon under each LLM response."
Iteration 9-10: Reasoning Anchoring
We added thinking patterns that start with "What does FineTune Lab actually offer for this?"
This prevents the model from defaulting to generic LLM knowledge and keeps it grounded in our platform.
The Results
Metricv1v10
--
--
-- Accuracy on navigation45%94% Hallucinated features23%2% User satisfaction3.2/54.7/5
Key Takeaways
1. Start with real UI, not documentation - Write answers by actually clicking through the app
Adversarial examples are essential - Teach the model what NOT to say
Anchor reasoning to your product - Force the model to think about YOUR platform first
Iterate on failures - Every wrong answer is a training opportunity
This is a living case study. We'll update it as we continue iterating on the dataset.