Case Study

Building the FineTune Lab Assistant: A Dataset Iteration Journey

How we trained our own support model through 10+ dataset iterations, fixing reasoning one Q&A at a time. See the before/after comparisons.

FineTune Lab Team
2025-12-08

The Problem

We needed a support assistant that actually understands FineTune Lab. Not a generic chatbot that hallucinates features we don't have, but one that:

- Knows exactly where every button is
- Understands our workflows
- Can guide users step-by-step
- Doesn't make stuff up

The Approach

We started with Qwen and a dataset of ~2,000 Q&A pairs. Here's what we learned through 10+ iterations:

Iteration 1: The Baseline Disaster

Our first dataset was written like documentation. Formal, complete, boring. The model responded like a manual, not a helpful assistant.

Example:

- Q: "How do I view my datasets?"
- A (v1): "To view datasets, navigate to the dashboard and locate the datasets section in the main navigation panel..."

Problem: We don't have a dashboard. The model was hallucinating UI elements.
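A cheap lint over the dataset catches this class of error before training. Everything below is a sketch: the JSONL-style field names and the phantom-term list are our illustration, not a FineTune Lab export format.

```python
# A v1-style entry in a hypothetical JSONL Q&A schema
# (field names are illustrative, not FineTune Lab's actual format).
v1_example = {
    "question": "How do I view my datasets?",
    "answer": "To view datasets, navigate to the dashboard and locate "
              "the datasets section in the main navigation panel...",
}

# UI elements the model invented but the product doesn't have.
PHANTOM_UI = ["dashboard", "main navigation panel"]

def phantom_mentions(entry):
    """Return any phantom UI terms an answer mentions."""
    text = entry["answer"].lower()
    return [term for term in PHANTOM_UI if term in text]

print(phantom_mentions(v1_example))  # -> ['dashboard', 'main navigation panel']
```

Running a check like this over all ~2,000 pairs is how we found that the hallucinations were already in the data, not invented by the model.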

Iteration 2: Grounding in Reality

We rewrote answers to match our actual UI:

- Q: "How do I view my datasets?"
- A (v2): "Go to the left sidebar and click 'Manage Datasets'. Your uploaded datasets are listed below the upload section."

Better. But still missing personality and helpfulness.

Iterations 3-5: Adding Context

We added details users actually need:

- Exact button locations
- What they'll see when they get there
- Common follow-up actions
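A lightweight way to enforce all three ingredients is to template them during the rewrite. This is purely our authoring convention; the parameter names and the example follow-up sentence are hypothetical.

```python
def compose_answer(location, result, follow_up):
    """Join the three ingredients every rewritten answer must have:
    where to click, what you'll see, and what you might do next."""
    parts = [location, result, follow_up]
    # Refuse to emit an answer that is missing any ingredient.
    if not all(parts):
        raise ValueError("every answer needs all three parts")
    return " ".join(parts)

answer_v3 = compose_answer(
    "Go to the left sidebar and click 'Manage Datasets'.",
    "Your uploaded datasets are listed below the upload section.",
    "From there you can open a dataset to inspect individual Q&A pairs.",
)
print(answer_v3)
```

The `ValueError` is the point: answers missing a location, an expectation, or a follow-up never made it into the dataset.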

Iterations 6-8: Adversarial Examples

This was the game-changer. We added:

- Questions that sound similar but are different
- "No, we don't have that" responses
- Disambiguation between similar features

An example adversarial entry:

- Q: "Can I regenerate my training data?"
- A: "FineTune Lab doesn't have an automatic feature to regenerate training data. If you meant regenerating responses in the Chat Interface, that's a different feature - look for the regenerate icon under each LLM response."
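In dataset form, each adversarial case pairs a tempting question with an explicitly grounded answer. The `type` tag and field names below are our own bookkeeping, not a platform schema, and the second entry's wording is illustrative.

```python
adversarial_entries = [
    {
        "question": "Can I regenerate my training data?",
        "answer": "FineTune Lab doesn't have an automatic feature to "
                  "regenerate training data. If you meant regenerating "
                  "responses in the Chat Interface, that's a different "
                  "feature - look for the regenerate icon under each "
                  "LLM response.",
        "type": "feature_we_dont_have",
    },
    {
        "question": "Can I regenerate a response in the Chat Interface?",
        "answer": "Yes - click the regenerate icon under the LLM "
                  "response you want to redo.",
        "type": "disambiguation",
    },
]

def type_counts(entries):
    """Count entries per adversarial type, to spot coverage gaps."""
    counts = {}
    for e in entries:
        counts[e["type"]] = counts.get(e["type"], 0) + 1
    return counts

print(type_counts(adversarial_entries))
```

Tracking the counts per type told us when one category (usually the "no, we don't have that" answers) was underrepresented.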

Iterations 9-10: Reasoning Anchoring

We added thinking patterns that start with "What does FineTune Lab actually offer for this?"

This prevents the model from defaulting to generic LLM knowledge and keeps it grounded in our platform.
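In the training data this amounts to prepending the anchor to each example's reasoning trace. We assume a `thinking` field on each entry; that field name, and the sample trace, are ours.

```python
ANCHOR = "What does FineTune Lab actually offer for this?"

def anchor_reasoning(entry):
    """Prefix the reasoning trace with the product-grounding question."""
    anchored = dict(entry)  # don't mutate the original entry
    thinking = entry.get("thinking", "")
    if not thinking.startswith(ANCHOR):
        anchored["thinking"] = f"{ANCHOR} {thinking}".strip()
    return anchored

entry = {
    "question": "Can I export my model?",
    "thinking": "The user wants the trained weights...",
}
print(anchor_reasoning(entry)["thinking"])
```

The `startswith` guard makes the pass idempotent, so re-running it over an already-anchored dataset is safe.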

The Results

| Metric | v1 | v10 |
| --- | --- | --- |
| Accuracy on navigation | 45% | 94% |
| Hallucinated features | 23% | 2% |
| User satisfaction | 3.2/5 | 4.7/5 |
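The v1 vs v10 numbers come from reviewing the model's answers on a fixed question set. A minimal tally, assuming each reviewed answer gets boolean labels (the labeling scheme here is our own):

```python
def summarize(results):
    """results: one dict per reviewed answer, with boolean
    'correct_navigation' and 'hallucinated' labels."""
    n = len(results)
    accuracy = sum(r["correct_navigation"] for r in results) / n
    halluc_rate = sum(r["hallucinated"] for r in results) / n
    return {"accuracy": accuracy, "hallucination_rate": halluc_rate}

# A toy review batch, not our real evaluation data.
reviews = [
    {"correct_navigation": True, "hallucinated": False},
    {"correct_navigation": True, "hallucinated": False},
    {"correct_navigation": True, "hallucinated": False},
    {"correct_navigation": False, "hallucinated": True},
]
print(summarize(reviews))  # -> {'accuracy': 0.75, 'hallucination_rate': 0.25}
```

Keeping the question set fixed across iterations is what makes the v1 and v10 columns comparable.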

Key Takeaways

1. Start with real UI, not documentation - Write answers by actually clicking through the app
2. Adversarial examples are essential - Teach the model what NOT to say
3. Anchor reasoning to your product - Force the model to think about YOUR platform first
4. Iterate on failures - Every wrong answer is a training opportunity

This is a living case study. We'll update it as we continue iterating on the dataset.

- Qwen
- Dataset Curation
- Reasoning
- Production
Want to try these techniques?

Start fine-tuning your own model on FineTune Lab. All experiments in this article were done on our platform.