Shocking Release! DeepSeek 671B Fine-Tuning Guide Revealed—Unlock the Upgraded DeepSeek Suite with One Click, AI Players Ecstatic!

DeepSeek V3/R1 is a hit around the world, and solutions and API services based on the original model have become widely available, driving a race to the bottom in pricing, with many offerings now free.
How can we stand on the shoulders of this giant and use post-training with domain-specific data to build high-quality private models at low cost, boosting business competitiveness and value?
Colossal-AI, which has garnered nearly 40,000 GitHub stars, has released an open-source post-training toolbox for large models, featuring:
- DeepSeek V3/R1 671B LoRA low-cost fine-tuning
- A comprehensive reinforcement learning toolchain, including PPO, GRPO, DPO, SimPO, and more
- Seamless adaptation for Hugging Face open-source models, including DeepSeek distilled models
- Support for mixed-precision training, gradient checkpointing, and more, to accelerate training and reduce costs
- Flexible training configuration interfaces, enabling customization of reward functions, loss functions, and more
- Advanced parallelization strategies, including data parallelism, model parallelism, expert parallelism, ZeRO, and offloading, to accommodate different hardware scales
Open-source repository: https://github.com/hpcaitech/ColossalAI
Low-Cost Supervised Fine-Tuning for DeepSeek V3/R1 671B
DeepSeek V3/R1 boasts an impressive 671 billion parameters—but how can we fine-tune it at a low cost? With just a few steps, you can efficiently complete fine-tuning with minimal expense.
Dataset Preparation
The script accepts input datasets in JSONL format (e.g., https://github.com/hpcaitech/ColossalAI/blob/main/applications/ColossalChat/examples/training_scripts/lora_sft_data.jsonl), where each line is a chat conversation list. For example:
[{"role": "user", "content": "Hello. How are you?"}, {"role": "assistant", "content": "I'm fine. Can I help you today?"}]
[{"role": "user", "content": "Can I still live with only one heart left?"}, {"role": "assistant", "content": "Yes, usually people only have one heart."}]
This data format is compatible with the Hugging Face chat template and supports custom system prompts, allowing for flexible configuration as needed.
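As a minimal illustration of this format, the sketch below writes two such conversations (one with a custom system prompt) to a JSONL file and renders one in a chat-template-style layout. The `<|role|>` markers here are purely illustrative, not DeepSeek's actual template; in practice the tokenizer's chat template handles rendering.

```python
import json

# Two conversations in the expected JSONL layout: one JSON list per line.
conversations = [
    [{"role": "user", "content": "Hello. How are you?"},
     {"role": "assistant", "content": "I'm fine. Can I help you today?"}],
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "What is LoRA?"},
     {"role": "assistant", "content": "A low-rank adaptation method for fine-tuning."}],
]

with open("data.jsonl", "w") as f:
    for conv in conversations:
        f.write(json.dumps(conv) + "\n")

def render(conv):
    """Illustrative chat-template-style rendering; the real layout comes from the tokenizer."""
    return "".join(f"<|{m['role']}|>{m['content']}\n" for m in conv)

# Read the dataset back, one conversation per line.
with open("data.jsonl") as f:
    loaded = [json.loads(line) for line in f]

print(render(loaded[1]))
```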
Model Weights Preparation
For optimal results, use BF16 weights for fine-tuning.
If you have already downloaded the FP8 DeepSeek V3/R1 weights, you can convert them to BF16 using the official DeepSeek script via GPU:
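A sketch of that conversion step, assuming the converter script from the official DeepSeek-V3 repository (`inference/fp8_cast_bf16.py`; flag names may vary between versions, so check the script's `--help` first and substitute your own paths):

```shell
# Hypothetical paths; run on a GPU machine with the FP8 checkpoint downloaded.
python fp8_cast_bf16.py \
    --input-fp8-hf-path /path/to/DeepSeek-R1 \
    --output-bf16-hf-path /path/to/DeepSeek-R1-bf16
```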
Usage Guide
Once the dataset and model weights are prepared, you can leverage Colossal-AI’s one-click fine-tuning script: 🔗 Colossal-AI LoRA Fine-Tuning Script
This script follows the same structure as standard Supervised Fine-Tuning (SFT) scripts and is fully compatible with Hugging Face PEFT. Launch Command:
`colossalai run --hostfile path-to-host-file --nproc_per_node 8 lora_finetune.py --pretrained path-to-DeepSeek-R1-bf16 --dataset path-to-dataset.jsonl --plugin moe --lr 2e-5 --max_length 256 -g --ep 8 --pp 3 --batch_size 24 --lora_rank 8 --lora_alpha 16 --num_epochs 2 --warmup_steps 8 --tensorboard_dir logs --save_dir DeepSeek-R1-bf16-lora`
For more details on each parameter, run `python lora_finetune.py --help`. The script integrates TensorBoard logging, allowing you to monitor the learning rate, loss, and gradient norm throughout training.
Optimized Hardware Requirements with LoRA
By leveraging LoRA and other optimizations, the minimum hardware requirement for fine-tuning DeepSeek V3/R1 671B has been reduced by nearly 10x: it can now run on 24 H100 GPUs with `ep=8, pp=3` (8-way expert parallelism times 3 pipeline stages = 24 GPUs). If you enable CPU offloading with `--zero_cpu_offload`, hardware requirements can be reduced further, although training will be slower. As shown in the chart below, the loss decreases consistently while fine-tuning DeepSeek V3/R1 671B.
[Chart: training loss curve during DeepSeek V3/R1 671B LoRA fine-tuning]
For well-funded development teams, the above script can be used to efficiently scale parallelism to hundreds or even thousands of GPUs, enabling full-parameter fine-tuning or accelerated parallel training for DeepSeek V3/R1 671B.
For teams with limited budgets looking to leverage reinforcement learning to develop their own DeepSeek R1-style models, Colossal-AI offers a cost-effective solution. The team has also validated its approach using smaller models to ensure algorithm effectiveness.
Reinforcement Learning Fine-Tuning for Distilled DeepSeek
The Colossal-AI team has implemented and verified the GRPO algorithm and verifiable reward mechanism from the DeepSeek research paper, conducting experiments with the Qwen2.5-3B-Base model.
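At the heart of GRPO is a group-relative advantage: several responses are sampled per prompt, and each response's reward is normalized against the group's mean and standard deviation, replacing a learned value-function baseline. A minimal sketch of that normalization step, standard library only (the actual implementation adds clipping, the KL penalty, and the policy-gradient update):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each reward against its sampled group: A_i = (r_i - mean) / std."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:
        # Identical rewards across the group carry no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: a group of 4 sampled responses scored 10, 1, 1, 0.
advs = group_relative_advantages([10.0, 1.0, 1.0, 0.0])
print([round(a, 2) for a in advs])
```

Whether the sample or population standard deviation is used is an implementation detail; the qualitative effect is the same, with above-average responses pushed up and below-average ones pushed down.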
Reward Function Design:
- Reward = 0 if the format is incorrect.
- Reward = 1 if the format is correct but the result is incorrect.
- Reward = 10 if both the format and result are correct.
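A minimal sketch of such a verifiable reward function, assuming R1-style `<think>…</think><answer>…</answer>` output tags and exact-match answer checking (both assumptions are illustrative; the repository's reward template is the reference):

```python
import re

# Illustrative R1-style format: reasoning in <think>, final result in <answer>.
FORMAT_RE = re.compile(r"^<think>.*</think>\s*<answer>(.*)</answer>\s*$", re.DOTALL)

def reward(response: str, ground_truth: str) -> float:
    """Score a response: 0 = wrong format, 1 = right format, 10 = right answer too."""
    match = FORMAT_RE.match(response.strip())
    if match is None:
        return 0.0                      # format incorrect
    answer = match.group(1).strip()
    if answer == ground_truth.strip():
        return 10.0                     # format and result both correct
    return 1.0                          # format correct, result incorrect

print(reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))  # format and answer correct
```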
The Colossal-AI team has provided a conversation template and settings for verifying GRPO, using the Qwen2.5-3B-Base model as an example. You can find the template here: 🔗 Qwen2.5-3B Conversation Template
To start training, simply configure the following bash script: 🔗 GRPO Training Script
Additionally, in the GRPO section, the Colossal-AI team has provided insights from the verification process along with detailed descriptions of various parameters for reference.
The code includes a flexible reward function template, allowing users to customize their own reward system based on specific needs.
From the graph below, we can observe that even with a 3B model, both the average reward and response length gradually increase over time.
[Charts: average reward and response length both increasing over training steps]
Interesting Observations During Training
As training progresses, some fascinating behaviors emerge. For example, with more iterations, the model starts exhibiting self-correction:
[Example: model output showing self-correction in later training iterations]
Colossal-AI: The Best Plug-and-Play Post-Training Tool
Building on its expertise in cost-efficient large model pretraining, Colossal-AI is committed to becoming the best out-of-the-box post-training solution for developers. It enables users to quickly and cost-effectively build private models based on open-source models.
🔗 Open-source repository: https://github.com/hpcaitech/ColossalAI