Train Llama, Mistral, and Qwen models on your data. Monitor with real-time analytics. Test with GraphRAG. Deploy to production with one click.
Fine-tuning, analytics, testing, and predictions - everything you need to train production AI models.
Train custom models on your data in under 2 minutes
💡 Use Case
Upload your customer support conversations in JSONL format, select Llama 3.3 as the base model, enable 4-bit quantization to reduce memory, and click train. In under 2 minutes, you'll have a custom model that understands your product and responds like your best support agent.
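For illustration, a chat-style JSONL file holds one JSON object per line. The "messages" schema below is a common fine-tuning convention and an assumption here; check FineTune Lab's dataset docs for the exact field names it expects.

```python
import json

# Hypothetical chat-style records; adjust field names to match FineTune Lab's dataset format.
examples = [
    {
        "messages": [
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
        ]
    },
]

# JSONL = one JSON object per line.
with open("support_conversations.jsonl", "w", encoding="utf-8") as f:
    for record in examples:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```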
Monitor every aspect of your training in real-time
💡 Use Case
Monitor training progress live in the Training Monitor page. See loss curves update with every batch, catch overfitting immediately when validation loss plateaus, and stop training the moment metrics stop improving - all without waiting hours to discover issues.
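The overfitting signal the monitor surfaces comes down to comparing training and validation loss over a recent window. Here is a minimal sketch of that idea; the window size and tolerance are illustrative, not FineTune Lab's actual heuristics.

```python
def is_overfitting(train_losses, val_losses, window=5, tol=1e-3):
    """Flag overfitting: training loss keeps falling while validation loss
    has plateaued or risen over the last `window` steps. Thresholds are illustrative."""
    if len(train_losses) < window + 1 or len(val_losses) < window + 1:
        return False
    train_improving = train_losses[-1] < train_losses[-window - 1] - tol
    val_stalled = val_losses[-1] >= min(val_losses[-window - 1:]) - tol
    return train_improving and val_stalled

# Example: training loss still dropping, validation loss creeping back up.
train = [2.1, 1.8, 1.5, 1.3, 1.1, 0.9, 0.8]
val   = [2.0, 1.7, 1.6, 1.6, 1.65, 1.7, 1.75]
print(is_overfitting(train, val))  # True
```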
Test models with GraphRAG and context-aware evaluation
💡 Use Case
Upload your product documentation to GraphRAG, then test if your fine-tuned model can answer customer questions with grounded citations. The chat interface shows which document chunks were used, helping validate that your model leverages context instead of hallucinating answers.
Monitor learning progress and automate evaluation at scale
💡 Use Case
Configure predictions to generate every 100 steps during training. Watch the Prediction Evolution view to see actual responses improving from vague to accurate. If predictions aren't improving even though training loss is decreasing, you've caught overfitting in real time.
From training to deployment in four simple steps
Upload your JSONL dataset, select a base model (Llama, Mistral, Qwen), configure training parameters, and click train. Enable quantization to reduce costs. Training starts on RunPod cloud GPUs in seconds.
Learn about fine-tuning →
Watch real-time loss curves, GPU metrics, and throughput in the Training Monitor. Catch overfitting immediately when validation loss diverges. View sample predictions as the model learns.
Learn about training analytics →
Upload documentation to GraphRAG and test your model in the Chat Portal. Rate responses, run batch tests, and enable LLM-as-a-Judge for automated evaluation. Compare checkpoints side-by-side.
Learn about chat testing →
Select the best checkpoint and deploy to RunPod Serverless with one click. Auto-scaling from 0 to 100+ GPUs. Track response times, costs, and quality metrics in Model Observability.
Learn about deployment →
Common questions about FineTune Lab features
FineTune Lab supports Llama 3.3, 3.1, and 2, Mistral, Qwen, and other popular open-source models. Training methods include LoRA (efficient low-rank adaptation), full fine-tuning, SFT (Supervised Fine-Tuning), DPO (Direct Preference Optimization), ORPO (Odds Ratio Preference Optimization), and RLHF (Reinforcement Learning from Human Feedback). You can also enable 4-bit or 8-bit quantization to reduce memory by up to 75%.
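For context, LoRA with 4-bit quantization is typically set up like this with the open-source Hugging Face transformers and peft libraries. This is a generic sketch; the model ID and hyperparameters are illustrative, and it is not FineTune Lab's internal code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative base model

# 4-bit quantization: weights load in NF4, cutting memory roughly 4x.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# LoRA: train small low-rank adapters instead of the full weight matrices.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```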
Training costs depend on GPU type (A4000 to H100) and duration. FineTune Lab shows a live cost counter and projected total as you train. You can configure hard limits like "stop at $50" or "stop after 10 hours". When the limit is reached, training stops automatically and saves the latest checkpoint. Resume later with a higher budget.
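The budget guard is simple arithmetic: projected cost is the GPU's hourly rate times elapsed hours. A minimal sketch of a "stop at $50 or 10 hours" check follows; the hourly rates are made-up placeholders, not FineTune Lab's pricing.

```python
import time

GPU_HOURLY_RATE_USD = {"A4000": 0.40, "A100": 1.90, "H100": 3.50}  # placeholder rates

def should_stop(start_time, gpu="A100", max_cost_usd=50.0, max_hours=10.0):
    """Return (stop, cost_so_far): stop when either the dollar or the time budget is hit."""
    elapsed_hours = (time.time() - start_time) / 3600
    cost = elapsed_hours * GPU_HOURLY_RATE_USD[gpu]
    return (cost >= max_cost_usd or elapsed_hours >= max_hours), cost

# Inside a training loop you would check this each step and save a checkpoint before exiting.
```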
FineTune Lab provides real-time analytics, intelligent testing with GraphRAG, and automated evaluation that local training doesn't offer. Instead of staring at terminal logs, you get live loss curves, GPU monitoring, and instant overfitting detection. Plus, one-click deployment to production with automatic scaling on RunPod Serverless.
Yes. Export analytics in three formats: CSV (opens in Excel/Sheets), JSON (for data pipelines), and PDF (report-ready charts). All exports include training metrics, evaluation results, costs, and model comparisons with stable schemas for automated processing.
Monitor Training shows live metrics for one training run at a time - use it while a run is in progress. Training Analytics compares multiple completed runs side-by-side with overlaid loss curves and metric tables. Use Monitor for real-time tracking and Analytics for post-training comparison.
Yes, the judge model consumes tokens for each evaluation. GPT-4 and Claude Sonnet provide strong evaluation at reasonable cost. GPT-5 Pro offers exceptional reasoning but costs 10-15x more - reserve it for critical evaluations. You can also use your own fine-tuned models as judges.
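For a sense of what a judge call involves, here is a minimal sketch using the OpenAI Python client. The grading rubric and the "gpt-4o" model choice are assumptions for illustration, not FineTune Lab's built-in judge prompt.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, answer: str, reference: str) -> str:
    """Ask a judge model to grade an answer 1-5 against a reference. Rubric is illustrative."""
    prompt = (
        "Rate the ANSWER from 1 (wrong) to 5 (fully correct and grounded), "
        "using the REFERENCE as ground truth. Reply with the number and one sentence.\n\n"
        f"QUESTION: {question}\nANSWER: {answer}\nREFERENCE: {reference}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```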
Checkpoint management uses multi-metric scoring combining eval loss, overfitting penalty (train/eval gap), perplexity, and improvement rate. The best checkpoint is automatically highlighted. You can also compare predictions from different checkpoints side-by-side to see actual response quality before deploying.
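As a rough illustration of how a single score could combine those signals, here is a sketch; the weights and formula are assumptions, not FineTune Lab's actual scoring.

```python
import math

def checkpoint_score(eval_loss, train_loss, prev_eval_loss,
                     w_loss=0.5, w_gap=0.2, w_ppl=0.2, w_improve=0.1):
    """Combine eval loss, overfitting penalty (train/eval gap), perplexity, and
    improvement rate into one score (higher is better). Weights are illustrative."""
    gap_penalty = max(0.0, eval_loss - train_loss)   # large gap suggests overfitting
    perplexity = math.exp(eval_loss)                 # standard LM perplexity
    improvement = prev_eval_loss - eval_loss         # positive if still improving
    return (-w_loss * eval_loss
            - w_gap * gap_penalty
            - w_ppl * math.log1p(perplexity)
            + w_improve * improvement)

# Pick the checkpoint with the highest score.
checkpoints = [
    {"step": 200, "eval_loss": 1.45, "train_loss": 1.40, "prev_eval_loss": 1.60},
    {"step": 400, "eval_loss": 1.32, "train_loss": 1.10, "prev_eval_loss": 1.45},
]
best = max(checkpoints, key=lambda c: checkpoint_score(
    c["eval_loss"], c["train_loss"], c["prev_eval_loss"]))
print(best["step"])
```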
GraphRAG is primarily designed for testing and evaluation during model development. It helps you validate context usage and response accuracy with citation-backed answers. For production RAG, you'd typically integrate your own vector database or knowledge graph with deployed model endpoints.
Start with our free tier. No credit card required.