ChatGLM3-6B部署与Web集成:Gradio/Streamlit/FastAPI三种方案

ChatGLM3-6B部署与Web集成:Gradio/Streamlit/FastAPI三种方案 ChatGLM3-6B部署与Web集成Gradio/Streamlit/FastAPI三种方案1. 项目概述ChatGLM3-6B是智谱AI团队开源的大语言模型具备32k超长上下文记忆能力。本文将详细介绍如何在本地服务器部署该模型并通过三种主流Web框架Gradio、Streamlit、FastAPI实现交互式应用。2. 环境准备与模型部署2.1 基础环境配置# 创建Python虚拟环境 conda create -n chatglm3-6b python3.8 conda activate chatglm3-6b # 安装基础依赖 pip install torch2.0.0 transformers4.37.2 sentencepiece0.1.992.2 模型下载与加载使用ModelScope下载模型from modelscope import snapshot_download model_dir snapshot_download(ZhipuAI/chatglm3-6b, cache_dir/path/to/model)基础调用示例from transformers import AutoModel, AutoTokenizer tokenizer AutoTokenizer.from_pretrained(/path/to/model, trust_remote_codeTrue) model AutoModel.from_pretrained(/path/to/model, trust_remote_codeTrue).half().cuda() model model.eval() response, history model.chat(tokenizer, 你好) print(response)3. Web集成方案对比3.1 Gradio方案Gradio提供快速构建AI演示界面的能力from transformers import AutoModel, AutoTokenizer import gradio as gr tokenizer AutoTokenizer.from_pretrained(/path/to/model, trust_remote_codeTrue) model AutoModel.from_pretrained(/path/to/model, trust_remote_codeTrue).half().cuda() model model.eval() def predict(query, historyNone): if history is None: history [] for response, history in model.stream_chat(tokenizer, query, historyhistory): updates [] for i, (q, r) in enumerate(history): updates.append(gr.update(visibleTrue, valuef用户{q})) updates.append(gr.update(visibleTrue, valuefAI{r})) yield [history] updates demo gr.ChatInterface(predict) demo.queue().launch(server_port6006)特点内置聊天界面组件自动处理对话历史支持流式输出快速原型开发3.2 Streamlit方案Streamlit适合构建数据科学应用from transformers import AutoModel, AutoTokenizer import streamlit as st from streamlit_chat import message st.cache_resource def load_model(): tokenizer AutoTokenizer.from_pretrained(/path/to/model, trust_remote_codeTrue) model AutoModel.from_pretrained(/path/to/model, trust_remote_codeTrue).half().cuda() return tokenizer, model tokenizer, model load_model() if history not in st.session_state: st.session_state[history] [] user_input st.text_input(请输入您的问题) if user_input: with st.spinner(AI正在思考...): for response, history in model.stream_chat(tokenizer, user_input, st.session_state[history]): st.session_state[history] history for i, (q, r) in enumerate(st.session_state[history]): message(q, is_userTrue, keyf{i}_user) message(r, keyf{i})优势内置状态管理更美观的UI组件适合构建数据分析仪表盘缓存机制提升性能3.3 FastAPIWebSocket方案FastAPI适合构建生产级API服务from fastapi import FastAPI, WebSocket from transformers import AutoModel, AutoTokenizer import uvicorn app FastAPI() tokenizer AutoTokenizer.from_pretrained(/path/to/model, trust_remote_codeTrue) model AutoModel.from_pretrained(/path/to/model, trust_remote_codeTrue).half().cuda() app.websocket(/ws) async def websocket_endpoint(websocket: WebSocket): await websocket.accept() try: while True: data await websocket.receive_json() query data[query] history data.get(history, []) for response, history in model.stream_chat(tokenizer, query, historyhistory): await websocket.send_json({ response: response, history: history, status: 202 }) await websocket.send_json({status: 200}) except Exception as e: print(fError: {e}) if __name__ __main__: uvicorn.run(app, host0.0.0.0, port8000)前端HTML示例!DOCTYPE html html head titleChatGLM3 WebSocket Chat/title /head body input idinput placeholder输入消息... button onclicksendMessage()发送/button div idmessages/div script const ws new WebSocket(ws://localhost:8000/ws); ws.onmessage (event) { const data JSON.parse(event.data); document.getElementById(messages).innerHTML divAI: ${data.response}/div; }; function sendMessage() { const input document.getElementById(input); ws.send(JSON.stringify({query: input.value})); document.getElementById(messages).innerHTML div用户: ${input.value}/div; input.value ; } /script /body /html核心优势真正的双向实时通信适合集成到现有Web应用高性能异步处理标准化API接口4. 方案对比与选型建议特性GradioStreamlitFastAPI开发速度⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐UI灵活性⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐性能⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐适合场景演示/原型数据分析应用生产级API服务学习曲线最简单中等较陡峭扩展性有限中等无限选型建议快速演示选择Gradio数据科学应用选择Streamlit企业级集成选择FastAPI需要WebSocket实时通信必须使用FastAPI5. 高级功能实现5.1 清言智能体API集成import requests def get_access_token(api_key, api_secret): url https://chatglm.cn/chatglm/assistant-api/v1/get_token response requests.post(url, json{api_key: api_key, api_secret: api_secret}) return response.json()[result][access_token] def send_message(assistant_id, access_token, prompt): url https://chatglm.cn/chatglm/assistant-api/v1/stream headers {Authorization: fBearer {access_token}} data {assistant_id: assistant_id, prompt: prompt} with requests.post(url, jsondata, headersheaders, streamTrue) as response: for line in response.iter_lines(): if line: print(line.decode(utf-8))5.2 OpenAI API兼容接口通过Docker部署兼容OpenAI的API服务docker run -p 8000:8000 -e API_KEYyour_key vinlic/zhipuai-agent-to-openai:latest客户端调用示例from openai import OpenAI client OpenAI(base_urlhttp://localhost:8000/v1, api_keyyour_key) response client.chat.completions.create( modelyour_assistant_id, messages[{role: user, content: 你好}] ) print(response.choices[0].message.content)6. 总结本文详细介绍了ChatGLM3-6B模型的三种Web集成方案每种方案各有优势Gradio最适合快速构建演示界面5分钟即可上线Streamlit数据科学家首选内置丰富可视化组件FastAPI企业级解决方案支持高并发和实时通信实际项目中可以根据团队技术栈和项目需求灵活选择。对于大多数应用场景推荐从Gradio开始快速验证想法再逐步迁移到更强大的框架。获取更多AI镜像想探索更多AI镜像和应用场景访问 CSDN星图镜像广场提供丰富的预置镜像覆盖大模型推理、图像生成、视频生成、模型微调等多个领域支持一键部署。