Batch Inference

Multi-GPU Deployment

Using Hugging Face

Reference: 【AI大模型】Transformers大模型库(七):单机多卡推理之device_map_transformers多卡推理 (CSDN blog)

First, restrict which GPUs are visible, either on the command line:

CUDA_VISIBLE_DEVICES=1,2,3 python

or from inside the script:

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
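Note that the mask only takes effect if it is set before CUDA is initialized, so in a script it should come before importing torch. A minimal sketch:

# set the GPU mask first, before anything touches CUDA
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"

import torch  # imported after the mask is in place
print(torch.cuda.device_count())  # 3; the visible cards are renumbered 0, 1, 2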

Install the transformers and accelerate libraries:
pip install transformers -i https://mirrors.cloud.tencent.com/pypi/simple
pip install accelerate -i https://mirrors.cloud.tencent.com/pypi/simple

Then load the model and let accelerate spread its layers over the visible GPUs:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
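Continuing from that model, a minimal generation round trip might look like this (the prompt and max_new_tokens are placeholders):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# with device_map="auto", model.device is where the first layers live;
# accelerate's hooks move activations between cards during the forward pass
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))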
 

You can also split the model yourself, assigning layers to specific devices, as described in the article below.

Reference: Huggingface Transformers+Accelerate多卡推理实践(指定GPU和最大显存) (Zhihu)
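One way to steer the split without listing every layer is to cap per-device memory and let accelerate derive a placement that fits; the sizes below are illustrative, not measured:

model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "48GiB"},  # illustrative caps
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
print(model.hf_device_map)  # inspect which module landed on which device

A fully manual split passes a dict as device_map instead, mapping module names (which depend on the architecture) to device indices.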

Using PyTorch's Built-in DDP and DP

Avoid DP: it is a single-process design that replicates the model and scatters every batch across GPUs, so it is much slower than DDP, which runs one process per GPU. A data-parallel inference sketch follows.
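For pure inference there are no gradients to synchronize, so the DDP wrapper itself is not even needed; launching one process per GPU and giving each process a shard of the data achieves the same effect. A minimal sketch (model_dir, the prompts, and the script name are placeholders), launched with torchrun --nproc_per_node=3 infer_ddp.py:

import os
import torch
import torch.distributed as dist
from transformers import AutoModelForCausalLM, AutoTokenizer

dist.init_process_group("nccl")  # torchrun supplies the rendezvous env vars
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model_dir = "/path/to/model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, trust_remote_code=True).cuda().eval()

prompts = ["..."] * 2000  # placeholder data
shard = prompts[dist.get_rank()::dist.get_world_size()]  # interleaved shard per process

results = []
with torch.no_grad():
    for p in shard:
        inputs = tokenizer(p, return_tensors="pt").to("cuda")
        out = model.generate(**inputs, max_new_tokens=128)
        results.append(tokenizer.decode(out[0], skip_special_tokens=True))

dist.destroy_process_group()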

Hands-On

Letting transformers allocate the model across GPUs with device_map="auto"

Throughput was shockingly bad: an estimated 13 hours for just these 2,000 samples, even though a single card previously got through well over a hundred thousand samples in about 44 hours.

On a single card this job takes around 4 hours.

I suspect the slowdown is because device_map="auto" is pipeline-style model parallelism, not data parallelism: the layers are split across cards, only one GPU computes at a time, and every forward pass ships activations between GPUs, so a slow inter-GPU link makes things worse rather than faster.
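Whatever the placement, generating one sample at a time also leaves the GPUs mostly idle; batching the prompts is usually the bigger lever. A sketch, assuming a decoder-only model and reusing the tokenizer, model, and prompts from above:

# decoder-only models should be padded on the left so every sequence
# ends its prompt at the same position before generation starts
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

batch_size = 16  # illustrative; tune to fit memory
for i in range(0, len(prompts), batch_size):
    batch = prompts[i:i + batch_size]
    inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128,
                             pad_token_id=tokenizer.pad_token_id)
    texts = tokenizer.batch_decode(out, skip_special_tokens=True)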

First, this warning appeared:

We've detected an older driver with an RTX 4000 series GPU. These drivers have issues with P2P. This can affect the multi-gpu inference when using accelerate device_map. Please make sure to update your driver to the latest version which resolves this.

On top of that, I was running on GPU0 and GPU4, which are not under the same PCIe host bridge; they sit on different NUMA nodes and talk over the SYS path:

(TinyRAG) jsh@user-ESC8000A-E11:/data/jsh/code/TinyRAG$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    NODE    NODE    SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU1    NODE     X      NODE    NODE    SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU2    NODE    NODE     X      NODE    SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU3    NODE    NODE    NODE     X      SYS     SYS     SYS     SYS     0-63,128-191    0               N/A
GPU4    SYS     SYS     SYS     SYS      X      NODE    NODE    NODE    64-127,192-255  1               N/A
GPU5    SYS     SYS     SYS     SYS     NODE     X      NODE    NODE    64-127,192-255  1               N/A
GPU6    SYS     SYS     SYS     SYS     NODE    NODE     X      NODE    64-127,192-255  1               N/A
GPU7    SYS     SYS     SYS     SYS     NODE    NODE    NODE     X      64-127,192-255  1               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Next attempt: GPU4 and GPU7, which are connected via NODE, i.e. within the same NUMA node.
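To test that pairing, pin the process to those two cards; inside the process they are renumbered cuda:0 and cuda:1 (the script name is a placeholder):

CUDA_VISIBLE_DEVICES=4,7 python infer.py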
