(from https://github.com/modelscope/ms-swift)
ModelScope Community Website
Chinese | English
Paper | Swift3.x En Doc | Swift3.x Chinese Doc
Swift2.x En Doc | Swift2.x Chinese Doc
You can contact and communicate with us by joining our groups:
Discord Group | WeChat Group
🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 450+ large models and 150+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, and Gemma2. The multi-modal LLMs include models such as Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
🍔 Additionally, ms-swift incorporates the latest training technologies, including lightweight techniques such as LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger, as well as human alignment training methods like DPO, GRPO, RM, PPO, KTO, CPO, SimPO, and ORPO. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and it supports model quantization with technologies like GPTQ, AWQ, and BNB. Furthermore, ms-swift offers a Gradio-based Web UI and a wealth of best practices.
Why choose ms-swift?
- 🍎 Model Types: Supports 450+ pure text large models, 150+ multi-modal large models, as well as All-to-All multi-modal models, sequence classification models, and embedding models, covering the entire process from training to deployment.
- Dataset Types: Ships with 150+ datasets for pre-training, fine-tuning, human alignment, and multi-modal tasks, and supports custom datasets.
- Hardware Support: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, MPS, etc.
- 🍊 Lightweight Training: Supports lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-GaLore, LISA, UnSloth, and Liger-Kernel.
- Distributed Training: Supports distributed data parallel (DDP), simple model parallelism via device_map, DeepSpeed ZeRO2/ZeRO3, FSDP, and other distributed training techniques.
- Quantization Training: Supports training of models quantized with BNB, AWQ, GPTQ, AQLM, HQQ, and EETQ.
- RLHF Training: Supports human alignment training methods such as DPO, GRPO, RM, PPO, KTO, CPO, SimPO, ORPO for both pure text and multi-modal large models.
- 🍓 Multi-Modal Training: Supports training on different modalities like images, videos, and audio, for tasks like VQA, captioning, OCR, and grounding.
- Interface Training: Provides training, inference, evaluation, and quantization through a web interface, covering the complete large model pipeline.
- Plugin and Extension: Supports custom model and dataset extensions, as well as customization of components like loss, metric, trainer, loss-scale, callback, optimizer.
- 🍉 Toolbox Capabilities: Offers not only training support for large models and multi-modal large models but also covers the entire process of inference, evaluation, quantization, and deployment.
- Inference Acceleration: Supports inference acceleration engines such as PyTorch, vLLM, and LMDeploy, and provides an OpenAI-compatible API for the inference, deployment, and evaluation modules.
- Model Evaluation: Uses EvalScope as the evaluation backend and supports evaluation on 100+ datasets for both pure text and multi-modal models.
- Model Quantization: Supports AWQ, GPTQ, and BNB quantized exports; the exported models can use vLLM/LMDeploy for accelerated inference and can continue to be trained.
News:
- 🎁 2025.03.15: SWIFT supports fine-tuning of GME (multi-modal) embedding models; please check the training script.
- 🎁 2025.03.13: We provide a GRPO script for training a 72B model with only 4 GPUs (4*80G); please check here.
- 🎁 2025.03.05: We support GRPO hybrid mode (rollout and actor on the same GPU, with the rollout sleeping while the actor trains), as well as tensor parallelism for GRPO; check the training script here.
- 🎁 2025.02.21: We benchmarked the speed of GRPO and, with a few tricks, achieved up to a 300% speedup. WandB charts can be found here.
- 🎁 2025.02.21: Support for distillation from an LLM API; please check this example.
- 🎁 2025.02.17: Support for SwanLab; by adding a few arguments you can use SwanLab to analyze your training results.
- 🎁 2025.02.16: Support for LMDeploy in GRPO; use `--use_lmdeploy true`. Please check this script.
- 🔥 2025.02.12: Support for the GRPO (Group Relative Policy Optimization) algorithm for LLMs and MLLMs; the documentation can be found here.
- 🎁 2025.02.10: SWIFT supports fine-tuning of embedding models; please check the training script.
- 🎁 2025.01.23: SWIFT supports the `sample` command, a very important feature for complex CoT and RFT. Meanwhile, we provide a Reinforced Fine-tuning script.
- 🎁 2024.12.04: SWIFT 3.0 major version update. Please check the Release Notes and Changes.
- 🎉 2024.08.12: The SWIFT paper has been published on arXiv, and you can read it here.
- 🔥 2024.08.05: Support for using evalscope as a backend for evaluating large models and multimodal models.
- 🔥 2024.07.29: Support for using vLLM and LMDeploy to accelerate inference for large models and multimodal models. When performing infer/deploy/eval, you can specify `--infer_backend vllm/lmdeploy`.
- 🔥 2024.07.24: Support for human preference alignment training for multimodal large models, including DPO/ORPO/SimPO/CPO/KTO/RM/PPO.
- 🔥 2024.02.01: Support for Agent training! The training algorithm is derived from this paper.
To install using pip:
pip install ms-swift -U
To install from source:
# pip install git+https://github.com/modelscope/ms-swift.git
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
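To quickly verify the installation, you can check the installed package and confirm that it imports (a minimal sanity check; `pip show` and the `swift` package name are the same ones used by the Python examples later in this README):
# Check the installed version and confirm the package imports
pip show ms-swift
python -c "import swift"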
Running Environment:
Package | Range | Recommended | Notes |
---|---|---|---|
python | >=3.9 | 3.10 | |
cuda | cuda12 | | No need to install if using CPU, NPU, MPS |
torch | >=2.0 | | |
transformers | >=4.33 | 4.49 | |
modelscope | >=1.19 | | |
peft | >=0.11,<0.15 | | |
trl | >=0.13,<0.17 | 0.15 | RLHF |
deepspeed | >=0.14 | 0.14.5 | Training |
vllm | >=0.5.1 | 0.7.3 | Inference/Deployment/Evaluation |
lmdeploy | >=0.5 | 0.7.1 | Inference/Deployment/Evaluation |
evalscope | >=0.11 | | Evaluation |
For more optional dependencies, you can refer to here.
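For example, the optional backends listed above can be installed with the version ranges from the table (a sketch only; package extras and CUDA/driver compatibility depend on your environment):
# Optional backends; version ranges taken from the table above
pip install "vllm>=0.5.1" "lmdeploy>=0.5" "evalscope>=0.11" "deepspeed>=0.14" "trl>=0.13,<0.17"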
10 minutes of self-cognition fine-tuning of Qwen2.5-7B-Instruct on a single 3090 GPU:
# 22GB
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--train_type lora \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
'AI-ModelScope/alpaca-gpt4-data-en#500' \
'swift/self-cognition#500' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 5 \
--logging_steps 5 \
--max_length 2048 \
--output_dir output \
--system 'You are a helpful assistant.' \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--model_author swift \
--model_name swift-robot
Tips:
- If you want to train with a custom dataset, you can refer to this guide to organize your dataset format and specify `--dataset <dataset_path>` (a minimal sketch follows this list).
- The `--model_author` and `--model_name` parameters are only effective when the dataset includes `swift/self-cognition`.
- To train with a different model, simply modify `--model <model_id/model_path>`.
- By default, ModelScope is used for downloading models and datasets. If you want to use HuggingFace, simply specify `--use_hf true`.
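As a minimal sketch of the custom-dataset path (the file name and record below are hypothetical; see the custom dataset guide for all supported schemas), a local messages-style JSONL file can be passed directly to `--dataset`:
# Hypothetical local dataset in messages-style JSONL format (see the custom dataset guide)
cat > my_dataset.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "What is ms-swift?"}, {"role": "assistant", "content": "ms-swift is a fine-tuning and deployment framework from the ModelScope community."}]}
EOF

# Reuse the quick-start command, pointing --dataset at the local file
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--train_type lora \
--dataset my_dataset.jsonl \
--output_dir output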
After training is complete, use the following command to infer with the trained weights:
- Here, `--adapters` should be replaced with the last checkpoint folder generated during training. Since the adapters folder contains the training parameter file `args.json`, there is no need to specify `--model` or `--system` separately; Swift will read these parameters automatically. To disable this behavior, set `--load_args false`.
# Using an interactive command line for inference.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true \
--temperature 0 \
--max_new_tokens 2048
# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true \
--merge_lora true \
--infer_backend vllm \
--max_model_len 8192 \
--temperature 0 \
--max_new_tokens 2048
Finally, use the following command to push the model to ModelScope:
CUDA_VISIBLE_DEVICES=0 \
swift export \
--adapters output/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id '<your-model-id>' \
--hub_token '<your-sdk-token>' \
--use_hf false
The Web UI is a zero-threshold, Gradio-based interface for training and deployment. For more details, you can check here.
SWIFT_UI_LANG=en swift web-ui
ms-swift also supports training and inference using Python. Below is pseudocode for training and inference. For more details, you can refer to here.
Training:
# Imports below follow the ms-swift 3.x Python API; adjust if your installed version differs
from swift.llm import get_model_tokenizer, get_template, load_dataset, EncodePreprocessor
from swift.tuners import Swift
from swift.trainers import Seq2SeqTrainer

# Retrieve the model and template, and add a trainable LoRA module
model, tokenizer = get_model_tokenizer(model_id_or_path, ...)
template = get_template(model.model_meta.template, tokenizer, ...)
model = Swift.prepare_model(model, lora_config)
# Download and load the dataset, and encode the text into tokens
train_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)
train_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)
val_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)
# Train the model
trainer = Seq2SeqTrainer(
model=model,
args=training_args,
data_collator=template.data_collator,
train_dataset=train_dataset,
eval_dataset=val_dataset,
template=template,
)
trainer.train()
Inference:
# Imports below follow the ms-swift 3.x Python API; adjust if your installed version differs
from swift.llm import PtEngine, RequestConfig, InferRequest

# Perform inference using the native PyTorch engine
engine = PtEngine(model_id_or_path, adapters=[lora_checkpoint])
infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)
resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')
Here is a minimal example of training to deployment using ms-swift. For more details, you can check the examples.
- If you want to use other models or datasets (including multimodal models and datasets), you only need to modify `--model` to specify the corresponding model's ID or path, and modify `--dataset` to specify the corresponding dataset's ID or path (a sketch follows this list).
- By default, ModelScope is used for downloading models and datasets. If you want to use HuggingFace, simply specify `--use_hf true`.
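For example, to fine-tune a multimodal model instead, only the `--model` and `--dataset` values change. The sketch below uses Qwen2.5-VL, which is listed as supported above, and leaves the dataset as a placeholder to be replaced with a real multimodal dataset ID or path:
# Qwen2.5-VL is one of the supported multimodal models; replace the dataset placeholder before running
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-VL-7B-Instruct \
--dataset <your-multimodal-dataset-id-or-path> \
--train_type lora \
--output_dir output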
Useful Links |
---|
🔥Command Line Parameters |
Supported Models and Datasets |
Custom Models, 🔥Custom Datasets |
LLM Tutorial |
Supported Training Methods:
Method | Full-Parameter | LoRA | QLoRA | Deepspeed | Multi-Node | Multi-Modal |
---|---|---|---|---|---|---|
Pre-training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Instruction Supervised Fine-tuning | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
DPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
GRPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Reward Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
PPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
KTO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
CPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
SimPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
ORPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Classification Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Embedding Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Pre-training:
# 8*A100
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
--model Qwen/Qwen2.5-7B \
--dataset swift/chinese-c4 \
--streaming true \
--train_type full \
--deepspeed zero2 \
--output_dir output \
--max_steps 100000 \
...
Fine-tuning:
CUDA_VISIBLE_DEVICES=0 swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset AI-ModelScope/alpaca-gpt4-data-en \
--train_type lora \
--output_dir output \
...
RLHF:
CUDA_VISIBLE_DEVICES=0 swift rlhf \
--rlhf_type dpo \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset hjh0119/shareAI-Llama3-DPO-zh-en-emoji \
--train_type lora \
--output_dir output \
...
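The other alignment methods listed in the table above (KTO, CPO, SimPO, ORPO, RM, ...) are selected with the same `--rlhf_type` flag. The sketch below switches to KTO and leaves the dataset as a placeholder, since KTO expects its own data format:
# Same entry point, different --rlhf_type; replace the placeholder with a KTO-format dataset
CUDA_VISIBLE_DEVICES=0 swift rlhf \
--rlhf_type kto \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset <your-kto-dataset-id-or-path> \
--train_type lora \
--output_dir output \
...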
Inference:
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend pt \
--max_new_tokens 2048
# LoRA
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--adapters swift/test_lora \
--stream true \
--infer_backend pt \
--temperature 0 \
--max_new_tokens 2048
Interface:
CUDA_VISIBLE_DEVICES=0 swift app \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend pt \
--max_new_tokens 2048
Deployment:
CUDA_VISIBLE_DEVICES=0 swift deploy \
--model Qwen/Qwen2.5-7B-Instruct \
--infer_backend vllm
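The deployed service exposes an OpenAI-compatible API (as noted in the toolbox capabilities above). The sketch below assumes the default local port 8000 and that the served model name matches the model's name; check the server output or `/v1/models` if your deployment differs:
# Assumes port 8000; list the served model names, then send a chat request
curl http://localhost:8000/v1/models
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "Qwen2.5-7B-Instruct", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 256}'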
Sampling:
CUDA_VISIBLE_DEVICES=0 swift sample \
--model LLM-Research/Meta-Llama-3.1-8B-Instruct \
--sampler_engine pt \
--num_return_sequences 5 \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#5
Evaluation:
CUDA_VISIBLE_DEVICES=0 swift eval \
--model Qwen/Qwen2.5-7B-Instruct \
--infer_backend lmdeploy \
--eval_backend OpenCompass \
--eval_dataset ARC_c
Quantization:
CUDA_VISIBLE_DEVICES=0 swift export \
--model Qwen/Qwen2.5-7B-Instruct \
--quant_bits 4 --quant_method awq \
--dataset AI-ModelScope/alpaca-gpt4-data-zh \
--output_dir Qwen2.5-7B-Instruct-AWQ
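The exported AWQ weights can then be used like any local model path, for example with vLLM acceleration (a sketch reusing the `--output_dir` from the command above):
# Load the locally exported AWQ model from the output directory above
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen2.5-7B-Instruct-AWQ \
--infer_backend vllm \
--stream true \
--max_new_tokens 2048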
Push Model:
swift export \
--model <model-path> \
--push_to_hub true \
--hub_model_id '<model-id>' \
--hub_token '<sdk-token>'