Case Study

The 7-Example Rule: Why Category Balance Isn't Enough

Category balance alone won't make a model reliable. Learn why you need at least 7 examples per fact and how to separate similar data points for robust model performance.

FineTune Lab Team
2025-12-08

The "Category Balance" Trap

When we started building datasets, we followed the standard advice: "Make sure you have a good mix of question types." We aimed for the golden ratio:

  • 30% Factual
  • 30% Instructional
  • 15% Troubleshooting
  • 10% Comparative
  • 15% Edge Cases

We hit those numbers perfectly. But our model still failed.

Why? Because we were balancing the dataset, not the facts.

We had 50 questions about "Pricing" (mostly factual) and 50 questions about "Deployment" (mostly instructional). The model became great at quoting prices but terrible at explaining how to pay. It was great at deploying but couldn't tell you how much it cost.
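
To make the gap concrete, here is a minimal sketch in Python (the `fact` and `category` labels are illustrative, not a standard schema) of how a dataset can look balanced at the category level while individual facts are only ever covered from one angle:

```python
from collections import Counter

# Illustrative rows: each training example tagged with the fact it covers
# and the question category it belongs to.
examples = [
    {"fact": "pricing",    "category": "factual"},
    {"fact": "pricing",    "category": "factual"},
    {"fact": "deployment", "category": "instructional"},
    {"fact": "deployment", "category": "instructional"},
]

# Category-level balance: this is all the "good mix of question types" advice checks.
print(Counter(e["category"] for e in examples))
# Counter({'factual': 2, 'instructional': 2})  <- looks balanced

# Per-fact balance: this is what reveals the real gaps.
print(Counter((e["fact"], e["category"]) for e in examples))
# 'pricing' has no instructional coverage; 'deployment' has no factual coverage.
```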


The Per-Fact Paradigm

We realized that every single company fact needs to be represented across the entire spectrum of categories. You cannot rely on the model to generalize a fact from one context to another reliably, especially with proprietary data.

If you want the model to truly "know" a fact, you need to teach it that fact from every angle.


The 7-Example Rule

For every core piece of information (a "fact"), we now require at least 7 distinct examples. Let's take a simple fact: "We use RunPod for GPU infrastructure."

Here is how we balance that single fact (a sketch of how we check this coverage follows the list):

1. Factual (The "What")
   Q: What infrastructure provider do you use?
   A: We utilize RunPod for our GPU infrastructure needs.

2. Instructional (The "How")
   Q: How do I set up my environment on your infrastructure?
   A: Since we use RunPod, you'll need to select a pod template... [steps]

3. Comparative (The "Why")
   Q: Why do you use RunPod instead of AWS?
   A: RunPod offers better price-performance for the specific H100 clusters we need compared to AWS...

4. Troubleshooting (The "Fix")
   Q: My pod isn't starting up.
   A: This is a common issue with RunPod spot instances. Try switching to on-demand...

5. Ambiguous (The "Clarification")
   Q: Is the infrastructure fast?
   A: That depends on your definition of fast. RunPod provides high-bandwidth interconnects, but...

6. Negative (The "Boundary")
   Q: Can I deploy this on Google Cloud TPU?
   A: No, our platform is exclusively optimized for RunPod NVIDIA GPUs. We do not support TPUs.

7. Multi-step (The "Synthesis")
   Q: Walk me through the full lifecycle of a training job.
   A: First, your data is uploaded. Then, we provision a RunPod instance...
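
If you track your dataset programmatically, the rule is easy to enforce. Here is a minimal sketch, assuming each example carries a category label (field names and category strings are illustrative, not a fixed schema), that flags any fact still missing one of the seven angles:

```python
# The seven angles every fact must be covered from.
REQUIRED_CATEGORIES = {
    "factual", "instructional", "comparative", "troubleshooting",
    "ambiguous", "negative", "multi_step",
}

# One fact, several angles (questions abbreviated from the list above).
runpod_examples = [
    {"category": "factual",       "q": "What infrastructure provider do you use?"},
    {"category": "instructional", "q": "How do I set up my environment on your infrastructure?"},
    {"category": "comparative",   "q": "Why do you use RunPod instead of AWS?"},
    # ...troubleshooting, ambiguous, negative, and multi_step examples would follow
]

def missing_categories(examples):
    """Return the angles a fact still lacks under the 7-example rule."""
    covered = {e["category"] for e in examples}
    return REQUIRED_CATEGORIES - covered

gaps = missing_categories(runpod_examples)
if gaps:
    print(f"Fact is under-covered; missing: {sorted(gaps)}")
```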


Separating Similar Data

The hardest part of this process is dealing with similar data points.

Take "Fine-tuning" vs. "Training". To a layperson, these are the same thing. To us, they are distinct pipelines.

If you don't make this separation explicit, the model will conflate them. You need specific examples that force it to distinguish between the two:

Q: I want to train a model from scratch.
A: We specialize in fine-tuning existing base models, not pre-training from scratch.
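
One way to encode these boundary cases is as near-identical prompts with deliberately different answers, so the model trains on the edge between the two concepts rather than only on each concept in isolation. The wording below is illustrative, not our production data:

```python
# Contrastive boundary examples: similar prompts, deliberately different answers,
# teaching the model where "fine-tuning" ends and "training from scratch" begins.
boundary_examples = [
    {
        "q": "I want to fine-tune a model on my data.",
        "a": "Great - upload your dataset and choose a base model to fine-tune.",
    },
    {
        "q": "I want to train a model from scratch.",
        "a": "We specialize in fine-tuning existing base models, not pre-training from scratch.",
    },
]
```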

By explicitly targeting the boundaries between similar concepts, you create a "moat" around each fact, preventing the model from drifting into hallucination.

Conclusion

Creating 7 examples for every fact is tedious. It explodes the size of your dataset creation task. But remember: Model size is negotiable, but dataset quality is not.

It is the only way to ensure your model doesn't just "know" your data, but understands it well enough to teach it, fix it, and defend it.

  • Dataset Curation
  • Best Practices
  • Fine-Tuning
  • Data Balance

Want to try these techniques?

Start fine-tuning your own model on FineTune Lab. All experiments in this article were done on our platform.