The Experiment
We trained the same base model (Qwen 2.5 7B) on three different datasets:
1. Dataset A: 50,000 examples scraped and lightly cleaned
All datasets covered the same domain. Training time was normalized.
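To keep the comparison controlled, every run used the same hyperparameters and only the dataset changed. Roughly, the setup looks like the sketch below; the values, paths, and the `finetune()` helper are placeholders for illustration, not FineTune Lab's actual interface.

```python
# Sketch of the controlled comparison: one shared hyperparameter set, three runs
# that differ only in the dataset they read. Values and paths are placeholders,
# and finetune() is a hypothetical stand-in for the real training job.

SHARED_HPARAMS = {
    "base_model": "Qwen/Qwen2.5-7B",
    "learning_rate": 2e-5,   # placeholder value
    "epochs": 3,             # placeholder value
    "batch_size": 16,        # placeholder value
}

DATASET_PATHS = {
    "A (50k scraped)": "data/dataset_a.jsonl",
    "B": "data/dataset_b.jsonl",
    "C": "data/dataset_c.jsonl",
}

def finetune(dataset_path: str, hparams: dict) -> None:
    """Hypothetical helper: submit one fine-tuning run."""
    print(f"Training {hparams['base_model']} on {dataset_path}")

for name, path in DATASET_PATHS.items():
    # Identical hyperparameters for every run; only the data changes.
    print(f"Run {name}")
    finetune(path, SHARED_HPARAMS)
```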
Results
Why Does This Happen?
1. Noise Drowns Signal
Large datasets often contain contradictory examples: the same kind of prompt paired with different answers. The model learns the average of those answers, not the ideal one (a rough way to spot this is sketched after this list).
2. Garbage In, Garbage Out
Scraped data includes errors, outdated info, and edge cases that shouldn't be learned.
3. Consistency Matters
A small, consistent dataset teaches clear patterns. A large, inconsistent one teaches confusion.
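If you want to see the "noise drowns signal" problem in your own data, one rough check is to group examples by prompt and flag prompts that appear with conflicting responses. The sketch below assumes a JSONL file of prompt/response pairs; the normalization and the exact notion of "conflicting" are simplifying assumptions, not the method used in this experiment.

```python
# Rough sketch: flag prompts whose responses disagree, a crude proxy for the
# contradictory examples that push a model toward an averaged answer.
# Field names, normalization, and the "disagreement" test are illustrative assumptions.
import json
from collections import defaultdict

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def find_contradictions(path: str) -> dict[str, set[str]]:
    responses_by_prompt = defaultdict(set)
    with open(path) as f:
        for line in f:
            ex = json.loads(line)  # expects {"prompt": ..., "response": ...}
            responses_by_prompt[normalize(ex["prompt"])].add(normalize(ex["response"]))
    # Keep only prompts that appear with more than one distinct response.
    return {p: r for p, r in responses_by_prompt.items() if len(r) > 1}

if __name__ == "__main__":
    conflicts = find_contradictions("data/dataset_a.jsonl")
    print(f"{len(conflicts)} prompts have conflicting responses")
```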
Practical Implications
- Don't chase dataset size
- Focus on quality first
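In practice, "quality first" means filtering before you train. Here is a minimal filtering pass over a JSONL instruction dataset; the thresholds and field names are illustrative assumptions, not the cleaning applied to Dataset A.

```python
# Minimal quality filter: drop empty, too-short, too-long, and exact-duplicate
# examples before training. Thresholds are illustrative assumptions.
import json

MIN_CHARS, MAX_CHARS = 20, 8000

def keep(example: dict, seen: set) -> bool:
    prompt = example.get("prompt", "").strip()
    response = example.get("response", "").strip()
    if not prompt or not response:
        return False
    if not (MIN_CHARS <= len(prompt) + len(response) <= MAX_CHARS):
        return False
    key = (prompt, response)
    if key in seen:  # exact-duplicate check
        return False
    seen.add(key)
    return True

def filter_dataset(src: str, dst: str) -> None:
    seen: set = set()
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            ex = json.loads(line)
            if keep(ex, seen):
                fout.write(json.dumps(ex) + "\n")

if __name__ == "__main__":
    filter_dataset("data/raw_scraped.jsonl", "data/filtered.jsonl")
```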
Want to replicate this experiment? All training was done on FineTune Lab with identical hyperparameters.