InternVL Multimodal Model Deployment and Fine-Tuning in Practice

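0 What Is an MLLM

A multimodal large language model (Multimodal Large Language Model, MLLM) is a large AI model that can process and fuse several types of data, such as text, images, audio, and video. These models are typically built on deep learning and can both understand and generate data across modalities, which makes them effective in a wide range of complex applications.

Common MLLMs

The central focus of multimodal research is aligning the feature spaces of different modalities.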
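To make "aligning feature spaces" concrete, here is a minimal sketch of one common approach: a small MLP projects vision features into the dimension of the text embeddings, so that both can be fed to the language model as one sequence. All sizes, weights, and the function name project_vision_features are hypothetical and chosen only for illustration; this is not taken from the InternVL source code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_PATCHES, VISION_DIM, TEXT_DIM, NUM_TEXT_TOKENS = 4, 8, 16, 3


def project_vision_features(vision_feats, w1, b1, w2, b2):
    """Map vision features into the text embedding space with a 2-layer MLP."""
    hidden = np.maximum(vision_feats @ w1 + b1, 0.0)  # ReLU, used here for simplicity
    return hidden @ w2 + b2


# Stand-ins for the outputs of a vision encoder and a text embedding layer.
vision_feats = rng.normal(size=(NUM_PATCHES, VISION_DIM))
text_embeds = rng.normal(size=(NUM_TEXT_TOKENS, TEXT_DIM))

# Randomly initialised projector weights (these would be learned in practice).
w1 = rng.normal(size=(VISION_DIM, TEXT_DIM)) * 0.1
b1 = np.zeros(TEXT_DIM)
w2 = rng.normal(size=(TEXT_DIM, TEXT_DIM)) * 0.1
b2 = np.zeros(TEXT_DIM)

visual_tokens = project_vision_features(vision_feats, w1, b1, w2, b2)

# Once both modalities share a dimension, they can be concatenated into one sequence.
llm_input = np.concatenate([visual_tokens, text_embeds], axis=0)
print(llm_input.shape)  # (7, 16)
```

In a real model the projector is trained so that the projected visual tokens land in a region of the embedding space the language model can interpret; the random weights here only demonstrate the shapes involved.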

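1 Creating and Using a Development Machine

When creating the development machine, choose the image Cuda12.2-conda and the resource configuration 50% A100 * 1.

Connect to it from your local VS Code using an SSH key.

2 Deploying with LMDeploy

2.1 Environment Setup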

conda create -n lmdeploy python=3.10 -y
conda activate lmdeploy
pip install lmdeploy gradio==4.44.1 timm==1.0.9

2.2 Basic LMDeploy Usage

We mainly use the pipeline.chat interface to build a multi-turn conversation pipeline. The core code is:

# 1. Import the required dependencies
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

# 2. Initialise the inference pipeline with your model
model_path = "your_model_path"
pipe = pipeline(model_path,
                backend_config=TurbomindEngineConfig(session_len=8192))

# 3. Load the image (reading it with PIL also works)
image = load_image('your_image_path')

# 4. Configure the generation parameters
gen_config = GenerationConfig(top_p=0.8, temperature=0.8)
# 5. Chat through the pipeline.chat interface, passing in the generation parameters
sess = pipe.chat(('describe this image', image), gen_config=gen_config)
print(sess.response.text)
# 6. Later turns must pass in the previous session so the model sees the conversation history
sess = pipe.chat('What is the woman doing?', session=sess, gen_config=gen_config)
print(sess.response.text)

The comments above walk through the core steps of LMDeploy inference.
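For readers unfamiliar with the two generation parameters used above, the following is a minimal sketch of what temperature and top_p generally mean when sampling a token. It illustrates the general technique (temperature scaling followed by nucleus sampling) in NumPy; it is not LMDeploy's implementation, and the function name sample_top_p and the example logits are hypothetical.

```python
import numpy as np


def sample_top_p(logits, temperature=0.8, top_p=0.8, rng=None):
    """Sample one token id with temperature scaling and nucleus (top-p) filtering."""
    if rng is None:
        rng = np.random.default_rng()

    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    probs /= probs.sum()

    order = np.argsort(-probs)             # token ids sorted by descending probability
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:cutoff]

    # Renormalise over the kept tokens and sample from them.
    kept_probs = probs[kept] / probs[kept].sum()
    return int(rng.choice(kept, p=kept_probs))


# Hypothetical logits over a 4-token vocabulary.
token_id = sample_top_p([5.0, 1.0, 0.0, -1.0], rng=np.random.default_rng(0))
print(token_id)  # 0
```

With these logits the first token alone already holds more than 80% of the probability mass, so top_p=0.8 keeps only that token. Lowering either parameter makes the output more deterministic; raising them makes it more varied.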

2.3 Trying the Web Demo

We can first try chatting with InternVL through a web UI:

Clone this tutorial's GitHub repository, https://github.com/Control-derek/InternVL2-Tutorial.git:

git clone https://github.com/Control-derek/InternVL2-Tutorial.git
cd InternVL2-Tutorial

In demo.py, set MODEL_PATH to the path of InternVL2-2B. On an InternStudio development machine no change is needed; otherwise, change it to your own model path.

Start the demo:

conda activate lmdeploy
python demo.py

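Run the commands above in the VS Code terminal: VS Code forwards ports automatically, so the web service running on the server becomes reachable from your local machine.

Once it starts, Ctrl+left-click the link printed in the terminal, or copy it into your browser.

You will see the following interface:

Click Start Chat to begin chatting. The food shortcut bar below lets you quickly insert an image, and the input examples let you quickly insert text. When you have finished typing, press Enter to send.

3 Fine-Tuning with XTuner

3.1 Environment Setup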

conda create --name xtuner-env python=3.10 -y
conda activate xtuner-env

Install xtuner with DeepSpeed integration, along with the related packages:

pip install -U 'xtuner[deepspeed]' timm==1.0.9
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.39.0

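On an InternStudio development machine, the preinstalled xtuner lives at /root/xtuner. First enter the working directory and activate the training environment:

cd /root/xtuner
conda activate xtuner-env  # or whatever you named your training environment

The original InternVL fine-tuning configs are under ./xtuner/configs/internvl/v2. Assuming the repository cloned above is at /root/InternVL2-Tutorial, copy the config file into the target directory: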

cp /root/InternVL2-Tutorial/xtuner_config/internvl_v2_internlm2_2b_lora_finetune_food.py /root/xtuner/xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_lora_finetune_food.py

If xtuner is not already present on your machine, clone it first:

git clone https://github.com/InternLM/xtuner.git

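3.2 Config File Parameters

The first part of the config contains the following parameters:

path: Path of the model to fine-tune. No change is needed in the InternStudio environment.
data_root: Path of the dataset.
data_path: Path of the training data file.
image_folder: Root path of the training images.
prompt_template: The chat template, system prompt, and so on used during training. Use the one that matches the model; no change is needed here.
max_length: Maximum number of tokens per training sample.
batch_size: Training batch size; adjust it to fit your GPU memory.
accumulative_counts: Number of gradient accumulation steps. This simulates a larger batch_size and improves training stability when GPU memory is limited.
dataloader_num_workers: Number of subprocesses used to load the dataset.
max_epochs: Number of training epochs.
optim_type: Optimizer type.
lr: Learning rate.
betas: beta1 and beta2 of the Adam optimizer.
weight_decay: Weight decay, used to reduce overfitting.
max_norm: Maximum gradient norm used for gradient clipping.
warmup_ratio: Warmup ratio, the fraction of training at the start during which the learning rate is gradually increased.
save_steps: Number of steps between checkpoint saves.
save_total_limit: Maximum number of checkpoints to keep; -1 means no limit.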
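The interaction between batch_size, accumulative_counts, warmup_ratio, and max_norm can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not XTuner's implementation: the helper names are hypothetical, the example values are made up, and the schedule here simply holds the learning rate constant after warmup, whereas a real config may also decay it.

```python
import math


def effective_batch_size(batch_size, accumulative_counts, num_gpus=1):
    """Number of samples that contribute to one optimizer step."""
    return batch_size * accumulative_counts * num_gpus


def warmup_lr(step, total_steps, base_lr, warmup_ratio):
    """Linear warmup to base_lr over the first warmup_ratio of training."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr  # simplification: constant after warmup


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so that their global L2 norm does not exceed max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


# Hypothetical values, for illustration only.
print(effective_batch_size(batch_size=4, accumulative_counts=2))  # 8
print([warmup_lr(s, total_steps=40, base_lr=1e-3, warmup_ratio=0.25) for s in (0, 4, 9, 20)])
print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

The practical takeaway is that halving batch_size while doubling accumulative_counts keeps the effective batch size unchanged, which is the usual way to fit training into less GPU memory.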

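LoRA-related parameters:

r: Rank of the low-rank matrices; it determines their dimensions.
lora_alpha: Scaling factor applied to the low-rank update.
lora_dropout: Dropout probability, used to reduce overfitting.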
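How r and lora_alpha enter the computation can be shown with a minimal NumPy sketch of a LoRA-adapted linear layer, following the standard formulation y = W x + (lora_alpha / r) * B A x. The sizes and the function name lora_forward are hypothetical, and dropout (lora_dropout, applied to the adapter input during training) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 16, 16, 4, 8   # hypothetical sizes

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01       # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init


def lora_forward(x, W, A, B, lora_alpha, r):
    """Compute y = W x + (lora_alpha / r) * B (A x) for a batch of row vectors x."""
    return x @ W.T + (lora_alpha / r) * (x @ A.T @ B.T)


x = rng.normal(size=(2, d_in))

# With B initialised to zero, the adapter contributes nothing at the start of training.
assert np.allclose(lora_forward(x, W, A, B, lora_alpha, r), x @ W.T)

# Only A and B are trained, so the trainable parameter count scales with r.
full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)  # 256 128
```

At realistic layer sizes (thousands of dimensions) and small r, the adapter is a tiny fraction of the full weight matrix, which is why LoRA fits on a fraction of a GPU.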

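3.3 Start Fine-Tuning

Run either of the following commands to start fine-tuning. The first refers to the config by name, the second by file path: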

xtuner train internvl_v2_internlm2_2b_lora_finetune_food --deepspeed deepspeed_zero2

xtuner train /root/xtuner/xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_lora_finetune_food.py --deepspeed deepspeed_zero2

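If log output appears, training has started successfully:

After fine-tuning, convert the model checkpoint into a format that is convenient for testing: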

python xtuner/configs/internvl/v1_5/convert_to_official.py xtuner/configs/internvl/v2/internvl_v2_internlm2_2b_lora_finetune_food.py ./work_dirs/internvl_v2_internlm2_2b_lora_finetune_food/iter_640.pth ./work_dirs/internvl_v2_internlm2_2b_lora_finetune_food/lr35_ep10/

If you changed the hyperparameters, replace iter_xxx.pth with the checkpoint you actually want to convert. ./work_dirs/internvl_v2_internlm2_2b_lora_finetune_food/lr35_ep10/ is the path where the converted model checkpoint is saved.
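Because the iteration number in the checkpoint name depends on your hyperparameters, it can help to look it up rather than guess. The sketch below is a small helper, not part of XTuner; the function name latest_checkpoint is hypothetical, and it assumes checkpoints are saved directly in the work directory as iter_<number>.pth, as in the command above.

```python
import re
from pathlib import Path


def latest_checkpoint(work_dir):
    """Return the iter_*.pth path with the highest iteration number, or None."""
    pattern = re.compile(r"iter_(\d+)\.pth$")
    candidates = []
    for path in Path(work_dir).glob("iter_*.pth"):
        match = pattern.search(path.name)
        if match:
            # Compare numerically so that iter_640 ranks above iter_64.
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Calling latest_checkpoint("./work_dirs/internvl_v2_internlm2_2b_lora_finetune_food") returns the path to pass as the checkpoint argument of the conversion script, or None if no checkpoint has been saved yet.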

4 Trying Out the Model's Food Recognition Ability

Change MODEL_PATH to the path where the converted model was just saved:

As in Section 2, start the web app:

cd /root/InternVL2-Tutorial
conda activate lmdeploy
python demo.py

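Comparison:

1. Before fine-tuning, the model answers dumplings (饺子), which is wrong; after fine-tuning, it answers rice noodle rolls (肠粉), which is correct.

2. Before fine-tuning, the model answers braised pork (红烧肉), which is wrong; after fine-tuning, it answers Kung Pao chicken (宫保鸡丁), which is correct.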
