
Teaching Models What NOT to Do: Adversarial & Negative Examples

How adding "wrong answer" examples and edge cases dramatically improved our model's accuracy on ambiguous questions.

FineTune Lab Team
2025-12-01

Why Models Fail at "No"

Most datasets are full of "Yes, here is how you do X." Very few contain "No, you can't do X, but you can do Y."

When a model only sees positive examples, it becomes a "yes-man." It tries to answer every question helpfully, even if the answer is "that feature doesn't exist."

The Solution: Negative Examples

We explicitly train our models on what NOT to do.

Example 1: Feature Hallucination

User: "How do I export my model to ONNX?"
Bad Model: "To export to ONNX, go to settings and click Export..." (Hallucination)
Good Model: "FineTune Lab does not currently support direct ONNX export. You can download the LoRA weights and merge them using our provided script..."

Example 2: Out-of-Scope Questions

User: "Write me a poem about cats."
Bad Model: "Here is a poem about cats..."
Good Model: "I am a specialized assistant for FineTune Lab. I can help you with training models, but I don't write poems."
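To train on them, the "Good Model" responses go into the dataset as ordinary chat records alongside the positive examples. The snippet below is a minimal sketch assuming an OpenAI-style "messages" JSONL schema; the field names and file name are illustrative assumptions, not FineTune Lab's documented format.

```python
import json

# Negative examples encoded as chat-style training records. The field names
# ("messages", "role", "content") follow the common OpenAI-style schema and
# are an assumption here, not necessarily FineTune Lab's native format.
negative_examples = [
    {
        "messages": [
            {"role": "user", "content": "How do I export my model to ONNX?"},
            {"role": "assistant", "content": (
                "FineTune Lab does not currently support direct ONNX export. "
                "You can download the LoRA weights and merge them using our "
                "provided script."
            )},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Write me a poem about cats."},
            {"role": "assistant", "content": (
                "I am a specialized assistant for FineTune Lab. I can help you "
                "with training models, but I don't write poems."
            )},
        ]
    },
]

# Mix these into the regular training set as plain JSONL, one record per line.
with open("train_negatives.jsonl", "w", encoding="utf-8") as f:
    for record in negative_examples:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```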

Adversarial Examples

These are trickier. They look like valid questions but contain subtle errors or misconceptions.

User: "Where do I set the learning rate for my inference endpoint?"
Model: "Inference endpoints don't have learning rates. Learning rate is a training hyperparameter. Did you mean to ask about setting the learning rate for a training job?"
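One simple way to scale this up is to cross real training hyperparameters with contexts where they don't apply. The sketch below is illustrative: the parameter list, context list, and response template are assumptions for demonstration, not our production generation pipeline.

```python
from itertools import product

# Sketch: generate adversarial training pairs by pairing real training
# hyperparameters with contexts where they don't belong. The lists and the
# response template are illustrative assumptions, not a full taxonomy.
training_params = ["learning rate", "batch size", "LoRA rank"]
wrong_contexts = ["inference endpoint", "dataset upload", "billing dashboard"]

adversarial_records = []
for param, context in product(training_params, wrong_contexts):
    question = f"Where do I set the {param} for my {context}?"
    answer = (
        f"{context.capitalize()}s don't have a {param}; that's a training "
        f"hyperparameter. Did you mean to ask about setting the {param} "
        f"for a training job?"
    )
    adversarial_records.append({
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    })

print(adversarial_records[0]["messages"][0]["content"])
# -> Where do I set the learning rate for my inference endpoint?
```

Each generated pair uses the same chat schema as the negative examples above, so the two sets can simply be concatenated into one training file.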

Impact

Adding just 50-100 of these negative/adversarial examples reduced our hallucination rate by over 40%.


Want to try these techniques?

Start fine-tuning your own model on FineTune Lab. All experiments in this article were done on our platform.