书生浦语第六期 L1-G3000-L1 Intern-S1-mini 本地部署实践

书生浦语第六期 L1-G3000-L1 Intern-S1-mini 本地部署实践 LMDeploy 部署1、开发机的选择在创建开发机界面选择镜像为 Cuda12.2-conda并选择 GPU 为 30%A100安装依赖conda create -n lmdeploy python3.10 -y conda activate lmdeploypip install lmdeploy0.9.2.post1 transformers4.55.22、启动lmdeploy serve api_server /root/share/new_models/Intern-S1-mini \ --reasoning-parser intern-s1 \ --tool-call-parser intern-s1 \ --cache-max-entry-count 0.1 \ --max-batch-size 8 \ --backend turbomind \ --session-len 20483、推理infer2.pyfrom openai import OpenAI import json messages [ { role: user, content: who are you }, { role: assistant, content: I am an AI }, { role: user, content: AGI is? }] openai_api_key EMPTY openai_api_base http://0.0.0.0:23333/v1 client OpenAI( api_keyopenai_api_key, base_urlopenai_api_base, ) model_name client.models.list().data[0].id response client.chat.completions.create( modelmodel_name, messagesmessages, temperature0.8, top_p0.8, max_tokens2048, extra_body{ enable_thinking: False, } ) print(json.dumps(response.model_dump(), indent2, ensure_asciiFalse))