Edition #6: The Convergence of RAG and Fine-Tuning (RAFT)
Welcome back to Fine-Tuned. This week we are looking at the fine-tuning ecosystem and why the war between RAG and Fine-Tuning is finally ending.
🔬 The Deep Dive: The Convergence of RAG and Fine-Tuning
If you look at Twitter, you’ll see a religious war between two camps:
Camp A: “You don’t need fine-tuning, just use better RAG.”
Camp B: “RAG is too slow and loses context, just fine-tune an SLM.”
The reality in 2026 is that the best teams are doing both.
The New Architecture: RAFT (Retrieval-Augmented Fine-Tuning)
Instead of choosing one or the other, we are seeing the rise of RAFT. Here is how it works:
You take a massive, unstructured dataset (like your entire corporate knowledge base).
You use a cheap model to generate synthetic Q&A pairs from that data.
You fine-tune an SLM on these Q&A pairs so the model actually learns the domain knowledge and the required answering format.
At inference, you still use a vector database to pull in the exact numbers.
The fine-tuned model now knows how to answer and understands the jargon of your business, while the RAG system provides the exact, up-to-date facts.
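The pipeline above can be sketched in a few lines. Everything here is a stand-in: `cheap_model`, `fine_tuned_slm`, and the vector database are hypothetical placeholders for whatever models and retriever you actually use, not a real API.

```python
# RAFT sketch: synthetic Q&A for training, retrieval for grounding at inference.
# cheap_model / fine_tuned_slm / vector_db are hypothetical stand-ins.

def generate_qa_pairs(documents, cheap_model):
    """Step 2: use a cheap model to turn raw docs into synthetic Q&A pairs."""
    pairs = []
    for doc in documents:
        question = cheap_model(f"Write a question answered by: {doc}")
        answer = cheap_model(f"Answer '{question}' using only: {doc}")
        pairs.append({"question": question, "answer": answer})
    return pairs  # step 3: fine-tune your SLM on these pairs

def raft_answer(question, fine_tuned_slm, vector_db, k=3):
    """Step 4: the tuned SLM supplies format and jargon; retrieval supplies facts."""
    facts = vector_db.search(question, top_k=k)
    prompt = "Context:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"
    return fine_tuned_slm(prompt)
```

The key design point: the synthetic pairs teach the model *how* to answer, while the retrieval step at inference keeps the *facts* fresh without retraining.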
🗞️ The Roundup: 3 Big Updates This Week
Synthetic Data is the Only Data Left: Models trained on curated synthetic data now outperform models trained on raw human internet data.
The 1-Bit LLM: A new architecture quantizes model weights down to a single bit, potentially allowing 70B parameter models to run natively on mobile devices.
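The core trick behind 1-bit weights is simple enough to show in a toy sketch: keep only the sign of each weight plus one shared scale (here the mean absolute value), so each weight costs a single bit instead of 16. This is a minimal illustration of binary quantization in plain Python, not any specific paper's implementation; real systems operate on whole tensors with optimized kernels.

```python
def quantize_1bit(weights):
    """Binarize weights to {-1, +1} plus a per-tensor scale (mean of |w|)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale  # signs pack into 1 bit each; scale is one float

def dequantize(signs, scale):
    """Reconstruct approximate weights for the matmul."""
    return [s * scale for s in signs]

signs, scale = quantize_1bit([0.4, -0.2, 0.1, -0.5])
# signs = [1, -1, 1, -1]; scale ≈ 0.3
```

The memory win is what makes the mobile claim plausible: 70B weights at 1 bit each is roughly 9 GB before overhead, versus ~140 GB at FP16.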
“Small” is Relative: 1B and 2B models are becoming the new “small,” being embedded directly into specific app features rather than running as centralized brains.
🛠️ Tool of the Week: Axolotl
If you want to try fine-tuning without the headache, use Axolotl. It’s a declarative framework: you describe the run in a simple YAML file and it handles the entire training loop.
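For a taste of what that YAML looks like, here is a minimal QLoRA-style config. Field names follow Axolotl's documented examples, but check the current docs before running; the model name and dataset path are placeholders.

```yaml
# Minimal Axolotl config sketch; model and paths are placeholders.
base_model: meta-llama/Llama-3.2-1B   # swap in your SLM
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32

datasets:
  - path: data/synthetic_qa.jsonl     # e.g. your generated Q&A pairs
    type: alpaca

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs/my-slm
```

Training is then launched with a single command (something like `axolotl train config.yaml`; the exact CLI form varies by version), and the framework takes care of tokenization, batching, and checkpointing.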
Keep building.
Kyle Anderson