
Thursday, 20 March 2025

LLM Training and Deployment Framework: ms-swift

 (from https://github.com/modelscope/ms-swift)

SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning)



ModelScope Community Website

Paper   |   Swift3.x En Doc   |   Swift3.x Chinese Doc

Swift2.x En Doc   |   Swift2.x Chinese Doc


☎ Groups

You can contact us and communicate with us by joining our groups:

Discord Group | WeChat Group

📝 Introduction

🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 450+ large models and 150+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, and Gemma2. The multi-modal LLMs include models such as Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.

🍔 Additionally, ms-swift incorporates the latest training technologies, including lightweight techniques such as LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger, as well as human alignment training methods like DPO, GRPO, RM, PPO, KTO, CPO, SimPO, and ORPO. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and it supports model quantization with technologies like GPTQ, AWQ, and BNB. Furthermore, ms-swift offers a Gradio-based Web UI and a wealth of best practices.

Why choose ms-swift?

  • 🍎 Model Types: Supports 450+ pure text large models, 150+ multi-modal large models, as well as All-to-All multi-modal models, sequence classification models, and embedding models, covering the entire process from training to deployment.
  • Dataset Types: Comes with 150+ pre-training, fine-tuning, human alignment, multi-modal datasets, and supports custom datasets.
  • Hardware Support: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, MPS, etc.
  • 🍊 Lightweight Training: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel.
  • Distributed Training: Supports distributed data parallel (DDP), device_map simple model parallelism, DeepSpeed ZeRO2/ZeRO3, FSDP, and other distributed training techniques.
  • Quantization Training: Supports training quantized models like BNB, AWQ, GPTQ, AQLM, HQQ, EETQ.
  • RLHF Training: Supports human alignment training methods such as DPO, GRPO, RM, PPO, KTO, CPO, SimPO, ORPO for both pure text and multi-modal large models.
  • 🍓 Multi-Modal Training: Supports training on different modalities like images, videos, and audio, for tasks like VQA, captioning, OCR, and grounding.
  • Interface Training: Provides training, inference, evaluation, and quantization through a web interface, covering the whole pipeline for large models.
  • Plugin and Extension: Supports custom model and dataset extensions, as well as customization of components like loss, metric, trainer, loss-scale, callback, optimizer.
  • 🍉 Toolbox Capabilities: Offers not only training support for large models and multi-modal large models but also covers the entire process of inference, evaluation, quantization, and deployment.
  • Inference Acceleration: Supports inference acceleration engines such as PyTorch, vLLM, and LMDeploy, and provides an OpenAI-compatible API, accelerating the inference, deployment, and evaluation modules.
  • Model Evaluation: Uses EvalScope as the evaluation backend and supports evaluation on 100+ datasets for both pure text and multi-modal models.
  • Model Quantization: Supports AWQ, GPTQ, and BNB quantized exports, with models that can use vLLM/LmDeploy for inference acceleration and continue training.

🎉 News

  • 🎁 2025.03.15: SWIFT supports fine-tuning of gme (multi-modal) embedding models; please check the training script.
  • 🎁 2025.03.13: We provide a GRPO script for training a 72B model with only 4 GPUs (4*80G); please check here.
  • 🎁 2025.03.05: We support the hybrid mode of GRPO (rollout and actor on the same GPU, with the rollout engine sleeping while the actor trains), as well as tensor parallelism for GRPO; check the training script here.
  • 🎁 2025.02.21: We tested the speed of GRPO and, with some tricks, sped it up by up to 300%. WandB charts can be found here.
  • 🎁 2025.02.21: Support distillation from an LLM API; please check this example.
  • 🎁 2025.02.17: Support SwanLab; just add a few arguments and you can use SwanLab to analyze your training results.
  • 🎁 2025.02.16: Support LMDeploy in GRPO; use --use_lmdeploy true. Please check this script.
  • 🔥 2025.02.12: Support for the GRPO (Group Relative Policy Optimization) algorithm for LLMs and MLLMs; the documentation can be found here.
  • 🎁 2025.02.10: SWIFT supports fine-tuning of embedding models; please check the training script.
  • 🎁 2025.01.23: SWIFT supports the sample command, a very important feature for complex CoT and RFT. Meanwhile, we provide a Reinforced Fine-tuning script.
  • 🎁 2024.12.04: SWIFT3.0 major version update. Please check the Release Notes and Changes.
  • 🎉 2024.08.12: The SWIFT paper has been published on arXiv, and you can read it here.
  • 🔥 2024.08.05: Support for using evalscope as a backend for evaluating large models and multimodal models.
  • 🔥 2024.07.29: Support for using vllm and lmdeploy to accelerate inference for large models and multimodal models. When performing infer/deploy/eval, you can specify --infer_backend vllm/lmdeploy.
  • 🔥 2024.07.24: Support for human preference alignment training for multimodal large models, including DPO/ORPO/SimPO/CPO/KTO/RM/PPO.
  • 🔥 2024.02.01: Support for Agent training! The training algorithm is derived from this paper.

🛠️ Installation

To install using pip:

pip install ms-swift -U

To install from source:

# pip install git+https://github.com/modelscope/ms-swift.git

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .

Running Environment:

Package      | Range         | Recommended | Notes
python       | >=3.9         | 3.10        |
cuda         |               | cuda12      | No need to install if using CPU, NPU, MPS
torch        | >=2.0         |             |
transformers | >=4.33        | 4.49        |
modelscope   | >=1.19        |             |
peft         | >=0.11,<0.15  |             |
trl          | >=0.13,<0.17  | 0.15        | RLHF
deepspeed    | >=0.14        | 0.14.5      | Training
vllm         | >=0.5.1       | 0.7.3       | Inference/Deployment/Evaluation
lmdeploy     | >=0.5         | 0.7.1       | Inference/Deployment/Evaluation
evalscope    | >=0.11        |             | Evaluation

For more optional dependencies, you can refer to here.
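
As a minimal sketch (not from the ms-swift docs), the optional backends listed in the table above can be installed separately with pip; pin versions according to the table if you run into compatibility issues:

# Optional backends from the table above; install only what you need
pip install "vllm>=0.5.1" "lmdeploy>=0.5"   # accelerated inference/deployment/evaluation
pip install "deepspeed>=0.14"               # distributed training
pip install "evalscope>=0.11"               # evaluation backend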

🚀 Quick Start

10 minutes of self-cognition fine-tuning of Qwen2.5-7B-Instruct on a single 3090 GPU:

Command Line Interface

# 22GB
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot

Tips:

  • If you want to train with a custom dataset, you can refer to this guide to organize your dataset format and specify --dataset <dataset_path> (see the sketch after these tips).
  • The --model_author and --model_name parameters are only effective when the dataset includes swift/self-cognition.
  • To train with a different model, simply modify --model <model_id/model_path>.
  • By default, ModelScope is used for downloading models and datasets. If you want to use HuggingFace, simply specify --use_hf true.
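
Combining these tips, the sketch below adapts the same LoRA run to a different model, a local custom dataset, and HuggingFace downloads; the smaller model ID and the dataset path are placeholders and not part of the original quick start:

# Hypothetical variant: smaller model, local dataset file, HuggingFace as the download source
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --train_type lora \
    --dataset /path/to/my_dataset.jsonl \
    --use_hf true \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --output_dir output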

After training is complete, use the following command to infer with the trained weights:

  • Here, --adapters should be replaced with the last checkpoint folder generated during training. Since the adapters folder contains the training parameter file args.json, there is no need to specify --model or --system separately; Swift will read these parameters automatically. To disable this behavior, set --load_args false.
# Using an interactive command line for inference.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048

# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --merge_lora true \
    --infer_backend vllm \
    --max_model_len 8192 \
    --temperature 0 \
    --max_new_tokens 2048

Finally, use the following command to push the model to ModelScope:

CUDA_VISIBLE_DEVICES=0 \
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>' \
    --use_hf false

Web-UI

The Web-UI is a zero-threshold training and deployment solution built on Gradio. For more details, you can check here.

SWIFT_UI_LANG=en swift web-ui


Using Python

ms-swift also supports training and inference using Python. Below is pseudocode for training and inference. For more details, you can refer to here.

Training:

# Assumed imports (module paths follow the Swift 3.x Python examples; treat this as a sketch)
from swift.llm import get_model_tokenizer, get_template, load_dataset, EncodePreprocessor
from swift.tuners import Swift
from swift.trainers import Seq2SeqTrainer

# Retrieve the model and template, and add a trainable LoRA module
model, tokenizer = get_model_tokenizer(model_id_or_path, ...)
template = get_template(model.model_meta.template, tokenizer, ...)
model = Swift.prepare_model(model, lora_config)

# Download and load the dataset, and encode the text into tokens
train_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)
train_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)
val_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)

# Train the model
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=template.data_collator,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    template=template,
)
trainer.train()

Inference:

# Assumed imports (module paths follow the Swift 3.x Python examples; treat this as a sketch)
from swift.llm import PtEngine, InferRequest, RequestConfig

# Perform inference using the native PyTorch engine
engine = PtEngine(model_id_or_path, adapters=[lora_checkpoint])
infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)

resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')

✨ Usage

Here is a minimal example covering the pipeline from training to deployment with ms-swift. For more details, you can check the examples.

  • If you want to use other models or datasets (including multimodal models and datasets), you only need to modify --model to specify the corresponding model's ID or path, and modify --dataset to specify the corresponding dataset's ID or path.
  • By default, ModelScope is used for downloading models and datasets. If you want to use HuggingFace, simply specify --use_hf true.
Useful Links:

  • 🔥 Command Line Parameters
  • Supported Models and Datasets
  • Custom Models
  • 🔥 Custom Datasets
  • LLM Tutorial

Training

Supported Training Methods:

  • Pre-training
  • Instruction Supervised Fine-tuning
  • DPO Training
  • GRPO Training
  • Reward Model Training
  • PPO Training
  • KTO Training
  • CPO Training
  • SimPO Training
  • ORPO Training
  • Classification Model Training
  • Embedding Model Training

Each method can be combined with full-parameter, LoRA, or QLoRA training, DeepSpeed, multi-node setups, and multi-modal models where applicable; see the examples for the supported combinations.

Pre-training:

# 8*A100
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
    --model Qwen/Qwen2.5-7B \
    --dataset swift/chinese-c4 \
    --streaming true \
    --train_type full \
    --deepspeed zero2 \
    --output_dir output \
    --max_steps 100000 \
    ...

Fine-tuning:

CUDA_VISIBLE_DEVICES=0 swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --train_type lora \
    --output_dir output \
    ...

RLHF:

CUDA_VISIBLE_DEVICES=0 swift rlhf \
    --rlhf_type dpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset hjh0119/shareAI-Llama3-DPO-zh-en-emoji \
    --train_type lora \
    --output_dir output \
    ...

Inference

CUDA_VISIBLE_DEVICES=0 swift infer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --stream true \
    --infer_backend pt \
    --max_new_tokens 2048

# LoRA
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --adapters swift/test_lora \
    --stream true \
    --infer_backend pt \
    --temperature 0 \
    --max_new_tokens 2048

Interface Inference

CUDA_VISIBLE_DEVICES=0 swift app \
    --model Qwen/Qwen2.5-7B-Instruct \
    --stream true \
    --infer_backend pt \
    --max_new_tokens 2048

Deployment

CUDA_VISIBLE_DEVICES=0 swift deploy \
    --model Qwen/Qwen2.5-7B-Instruct \
    --infer_backend vllm
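
Once the server is running, it exposes an OpenAI-compatible API (as mentioned in the introduction). Below is a minimal client sketch; the local port 8000 and the served model name are assumptions, so adjust them to whatever swift deploy reports at startup:

# Assumed endpoint and model name; check the startup log of `swift deploy`
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Qwen2.5-7B-Instruct",
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 256
        }'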

Sampling

CUDA_VISIBLE_DEVICES=0 swift sample \
    --model LLM-Research/Meta-Llama-3.1-8B-Instruct \
    --sampler_engine pt \
    --num_return_sequences 5 \
    --dataset AI-ModelScope/alpaca-gpt4-data-zh#5

Evaluation

CUDA_VISIBLE_DEVICES=0 swift eval \
    --model Qwen/Qwen2.5-7B-Instruct \
    --infer_backend lmdeploy \
    --eval_backend OpenCompass \
    --eval_dataset ARC_c

Quantization

CUDA_VISIBLE_DEVICES=0 swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --quant_bits 4 --quant_method awq \
    --dataset AI-ModelScope/alpaca-gpt4-data-zh \
    --output_dir Qwen2.5-7B-Instruct-AWQ
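
As noted in the feature list, the quantized export can then be loaded for accelerated inference or continued training. A minimal sketch, assuming the export landed in the output directory from the command above:

# Assumes the AWQ export above was written to ./Qwen2.5-7B-Instruct-AWQ
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model Qwen2.5-7B-Instruct-AWQ \
    --infer_backend vllm \
    --max_new_tokens 2048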

Push Model

swift export \
    --model <model-path> \
    --push_to_hub true \
    --hub_model_id '<model-id>' \
    --hub_token '<sdk-token>'





