
Quality Over Quantity: Why 2,000 Perfect Examples Beat 50,000 Mediocre Ones

We ran experiments comparing dataset sizes vs quality. The results changed how we think about data curation.

FineTune Lab Team
2025-12-05

The Experiment

We trained the same base model (Qwen 2.5 7B) on three different datasets:

- Dataset A: 50,000 examples scraped and lightly cleaned
- Dataset B: 10,000 examples with moderate curation
- Dataset C: 2,000 examples with heavy manual curation

All datasets covered the same domain. Training time was normalized.

Results

Dataset   Size   Accuracy   Hallucination Rate   Response Quality
A         50K    67%        18%                  3.1/5
B         10K    78%        9%                   3.8/5
C         2K     91%        3%                   4.6/5

Why Does This Happen?

1. Noise Drowns Signal

Large datasets often contain contradictory examples. The model learns the average, not the ideal.

2. Garbage In, Garbage Out

Scraped data includes errors, outdated info, and edge cases that shouldn't be learned.

3. Consistency Matters

A small, consistent dataset teaches clear patterns. A large, inconsistent one teaches confusion.
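The "noise drowns signal" and consistency points can be made concrete. As a minimal sketch (not the pipeline used in this experiment), one curation pass is to drop any prompt that appears with conflicting answers, so the model never trains on mixed signals:

```python
from collections import defaultdict

def drop_contradictions(examples):
    """Remove examples whose prompts appear with conflicting answers.

    `examples` is a list of (prompt, answer) pairs. Prompts mapped to
    more than one distinct answer are treated as contradictory, and all
    of their examples are dropped.
    """
    answers_by_prompt = defaultdict(set)
    for prompt, answer in examples:
        answers_by_prompt[prompt.strip().lower()].add(answer.strip())

    consistent = {p for p, a in answers_by_prompt.items() if len(a) == 1}
    return [(p, a) for p, a in examples if p.strip().lower() in consistent]

data = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of France?", "Lyon"),  # contradicts the line above
    ("What is 2 + 2?", "4"),
]
print(drop_contradictions(data))  # only the consistent example survives
```

Exact-match grouping is deliberately crude; real curation would also catch paraphrased prompts and near-duplicate answers, but the principle is the same: contradictions teach the model an average, so remove them before training.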

Practical Implications

- Don't chase dataset size - Focus on quality first
- Manual review is worth it - Every hour spent curating saves 10 hours fixing model behavior
- Iterate, don't accumulate - Better to refine 2K examples than add another 10K mediocre ones
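A hedged sketch of the "iterate, don't accumulate" loop: score each example with simple heuristics (illustrative stand-ins for human review or an LLM judge, not the criteria used in this experiment) and keep only the best slice for the next refinement round:

```python
def quality_score(example):
    """Heuristic quality score for a (prompt, answer) pair.

    These checks are illustrative placeholders; real curation would
    use human review or a stronger automated judge.
    """
    _, answer = example
    score = 0
    if len(answer.split()) >= 5:  # substantive, not a one-word reply
        score += 1
    if not any(tok in answer.lower() for tok in ("lorem", "todo", "???")):
        score += 1  # no placeholder text left in the answer
    if answer.strip().endswith((".", "!", "?")):  # reads as a complete sentence
        score += 1
    return score

def keep_best(examples, fraction=0.2):
    """Keep the top `fraction` of examples by score: refine, don't accumulate."""
    ranked = sorted(examples, key=quality_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * fraction))]

data = [
    ("Explain overfitting.", "Overfitting means the model memorizes training noise instead of general patterns."),
    ("Explain overfitting.", "TODO"),
    ("What is a token?", "lorem ipsum"),
]
print(keep_best(data, fraction=0.4))  # keeps only the well-formed example
```

Running this filter, reviewing what it rejects, and tightening the scoring each pass is one way to converge on a small, consistent set like Dataset C instead of piling on more mediocre examples.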


Want to replicate this experiment? All training was done on FineTune Lab with identical hyperparameters.


Want to try these techniques?

Start fine-tuning your own model on FineTune Lab. All experiments in this article were done on our platform.