# [ByteDance Embraces Open Source] Lance: Unified Multimodal Modeling by Multi-Task Synergy
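## Highlights

- **Lightweight native unified model.** Lance supports image and video understanding, generation, and editing within a single framework.
- **Efficient at 3B scale.** With only 3B active parameters, Lance delivers strong results on image generation, image editing, and video generation benchmarks.
- **Trained entirely from scratch.** A staged multi-task training recipe trains the model from scratch within a compute budget of 128 A100 GPUs.

## Installation

First, clone the repository:

```bash
git clone https://github.com/bytedance/Lance.git
cd Lance
```

Then set up the environment:

```bash
conda create -n Lance python=3.11 -y
conda activate Lance
pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install flash-attn==2.8.3 --no-build-isolation
```

Note: if building flash-attn from source fails, install the prebuilt wheel instead:

```bash
pip install --no-cache-dir --no-deps --force-reinstall \
  https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
```

## Downloading the Model Weights

Download all required model checkpoints from Lance-3B on Hugging Face and place them in the `downloads/` directory.

```python
from huggingface_hub import snapshot_download

save_dir = "./downloads/"
repo_id = "bytedance-research/Lance"
cache_dir = save_dir + "/cache"

# Fetch only the config, weight, code, and documentation files.
snapshot_download(
    cache_dir=cache_dir,
    local_dir=save_dir,
    repo_id=repo_id,
    local_dir_use_symlinks=False,
    resume_download=True,
    allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt", "*.pth"],
)
```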
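To preview which files the `allow_patterns` list above would keep, the filter can be approximated locally. This is a minimal sketch, assuming glob-style matching comparable to Python's `fnmatch`; the helper `select_files` and the sample file names are hypothetical and are not part of Lance or `huggingface_hub`.

```python
from fnmatch import fnmatchcase

# Same patterns as in the download snippet.
ALLOW_PATTERNS = ["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt", "*.pth"]


def select_files(filenames, patterns=ALLOW_PATTERNS):
    """Return the file names that match at least one glob pattern (case-sensitive)."""
    return [
        name
        for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Hypothetical repository listing, for illustration only.
    sample = ["config.json", "model.safetensors", "README.md", "demo.mp4", "assets/logo.png"]
    print(select_files(sample))  # ['config.json', 'model.safetensors', 'README.md']
```

Media files such as `.mp4` or `.png` fall outside the patterns, so under this assumption a download restricted this way would skip them.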
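## Usage Guide

### Inference

We provide a unified command-line interface for all generation, editing, and understanding tasks.

#### Option 1: Configure and run the unified script

```bash
bash inference_lance.sh
```

Before running, set the inference parameters at the top of `inference_lance.sh`. Supported task types are text-to-image (`t2i`), text-to-video (`t2v`), image editing (`image_edit`), video editing (`video_edit`), image understanding (`x2t_image`), and video understanding (`x2t_video`). To customize the default data samples for each task, modify `TASK_DEFAULT_CONFIGS` in `inference_lance.py`.

Note: for all tasks, we recommend writing input prompts in the format used by the examples, which usually yields better results.

#### Option 2: Run task-specific commands

We provide a dedicated one-step command for each generation, editing, and understanding task.

Text-to-video generation:

```bash
bash inference_lance.sh \
  --TASK_NAME t2v \
  --MODEL_PATH downloads/Lance_3B_Video \
  --RESOLUTION video_480p \
  --NUM_FRAMES 121 \
  --VIDEO_HEIGHT 480 \
  --VIDEO_WIDTH 848 \
  --SAVE_PATH_GEN results/t2v_121f
```

Text-to-image generation:

```bash
bash inference_lance.sh \
  --TASK_NAME t2i \
  --MODEL_PATH downloads/Lance_3B \
  --RESOLUTION image_768res \
  --VIDEO_HEIGHT 768 \
  --VIDEO_WIDTH 768 \
  --SAVE_PATH_GEN results/t2i
```

Video editing:

```bash
bash inference_lance.sh \
  --TASK_NAME video_edit \
  --MODEL_PATH downloads/Lance_3B_Video \
  --RESOLUTION video_480p \
  --SAVE_PATH_GEN results/video_edit
```

Image editing:

```bash
bash inference_lance.sh \
  --TASK_NAME image_edit \
  --MODEL_PATH downloads/Lance_3B \
  --RESOLUTION image_768res \
  --SAVE_PATH_GEN results/image_edit
```

Video understanding:

```bash
bash inference_lance.sh \
  --TASK_NAME x2t_video \
  --MODEL_PATH downloads/Lance_3B_Video \
  --RESOLUTION video_480p \
  --NUM_FRAMES 50 \
  --SAVE_PATH_GEN results/x2t_video
```

Image understanding:

```bash
bash inference_lance.sh \
  --TASK_NAME x2t_image \
  --MODEL_PATH downloads/Lance_3B \
  --RESOLUTION image_768res \
  --SAVE_PATH_GEN results/x2t_image
```

### Available Tasks

| Task name | Description | Example JSON |
| --- | --- | --- |
| `t2v` | Text-to-video generation | `config/examples/t2v_example.json` |
| `t2i` | Text-to-image generation | `config/examples/t2i_example.json` |
| `image_edit` | Image editing | `config/examples/image_edit_example.json` |
| `video_edit` | Video editing | `config/examples/video_edit_example.json` |
| `x2t_image` | Image understanding | `config/examples/x2t_image_example.json` |
| `x2t_video` | Video understanding | `config/examples/x2t_video_example.json` |

About the understanding examples:

- `config/examples/x2t_image_example.json`: contains image understanding examples such as visual question answering and image-based reasoning.
- `config/examples/x2t_video_example.json`: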
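contains video understanding examples such as video question answering and video captioning.

### Parameter Configuration

The following hyperparameters can be set at the top of `inference_lance.sh`:

| Parameter | Default | Description |
| --- | --- | --- |
| `MODEL_PATH` | `downloads/lance_3b` | Path to the downloaded Lance model weights. |
| `NUM_GPUS` | `1` | Number of GPUs used for inference. |
| `VALIDATION_NUM_TIMESTEPS` | `30` | Number of denoising steps (for example, 30 or 50). |
| `VALIDATION_TIMESTEP_SHIFT` | `3.5` | Timestep shift for the flow-matching schedule. |
| `CFG_TEXT_SCALE` | `4.0` | Classifier-free guidance (CFG) scale for text conditioning. |
| `VALIDATION_DATA_SEED` | `42` | Random seed for reproducible generation. |
| `NUM_FRAMES` | `50` | Number of frames for video generation (maximum 121). Not used by image tasks. |
| `VIDEO_HEIGHT` / `VIDEO_WIDTH` | `768` | Spatial resolution. Not used by editing tasks, where the input image or video determines it. |
| `RESOLUTION` | `video_480p` | Base resolution preset: `image_768res` or `video_480p`. |

### Gradio Interface

```bash
python lance_gradio_t2v_v2t.py --gpus 0 --server-port 7860
```

## Benchmarks

### DPG-Bench

| Model | Params | Global | Entity | Attribute | Relation | Other | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Generation-only models** | | | | | | | |
| SDXL | 3.5B | 83.27 | 82.43 | 80.91 | 86.76 | 80.41 | 74.65 |
| DALL-E 3 | - | 90.97 | 89.61 | 88.39 | 90.58 | 89.83 | 83.50 |
| SD3-Medium | 2B | 87.90 | 91.01 | 88.83 | 80.70 | 88.68 | 84.08 |
| FLUX.1-dev | 12B | 74.35 | 90.00 | 88.96 | 90.87 | 88.33 | 83.84 |
| Qwen-Image | 20B | 91.32 | 91.56 | 92.02 | 94.31 | 92.73 | 88.32 |
| **Unified models** | | | | | | | |
| Janus-Pro-7B | 7B | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 | 84.19 |
| OmniGen2 | 4B | 88.81 | 88.83 | 90.18 | 89.37 | 90.27 | 83.57 |
| Show-o2 | 7B | 89.00 | 91.78 | 89.96 | 91.81 | 91.64 | 86.14 |
| BAGEL† | 7B | 88.94 | 90.37 | 91.29 | 90.82 | 88.67 | 85.07 |
| InternVL-U | 1.7B | 90.39 | 90.78 | 90.68 | 90.29 | 88.77 | 85.18 |
| TUNA | 7B | 90.42 | 91.68 | 90.94 | 91.87 | 90.73 | 86.76 |
| TUNA-2 | 7B | 89.50 | 91.40 | 92.07 | 91.91 | 88.81 | 86.54 |
| Lance (Ours) | 3B | 83.89 | 91.07 | 89.36 | 93.38 | 80.80 | 84.67 |

† denotes methods that rewrite the prompt with an LLM rewriter before generation.

### GenEval

| Model | Params | Single Object | Two Objects | Counting | Colors | Position | Attribute | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Generation-only models** | | | | | | | | |
| SDXL | 3.5B | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 | 0.55 |
| DALL-E 3 | - | 0.96 | 0.87 | 0.47 | 0.83 | 0.43 | 0.45 | 0.67 |
| SD3-Medium | 2B | 0.99 | 0.94 | 0.72 | 0.89 | 0.33 | 0.60 | 0.74 |
| FLUX.1-dev | 12B | 0.98 | 0.93 | 0.75 | 0.93 | 0.68 | 0.65 | 0.82 |
| Qwen-Image | 20B | 0.99 | 0.92 | 0.89 | 0.88 | 0.76 | 0.77 | 0.87 |
| **Unified models** | | | | | | | | |
| Janus-Pro-7B | 7B | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 | 0.80 |
| OmniGen2 | 4B | 1.00 | 0.95 | 0.64 | 0.88 | 0.55 | 0.76 | 0.80 |
| Show-o2 | 7B | 1.00 | 0.87 | 0.58 | 0.92 | 0.52 | 0.62 | 0.76 |
| BAGEL† | 7B | 0.98 | 0.95 | 0.84 | 0.95 | 0.78 | 0.77 | 0.88 |
| Mogao | 7B | 1.00 | 0.97 | 0.83 | 0.93 | 0.84 | 0.80 | 0.89 |
| InternVL-U | 1.7B | 0.99 | 0.94 | 0.74 | 0.91 | 0.77 | 0.74 | 0.85 |
| TUNA | 7B | 1.00 | 0.97 | 0.81 | 0.91 | 0.88 | 0.83 | 0.90 |
| TUNA-2 | 7B | 0.99 | 0.96 | 0.80 | 0.91 | 0.84 | 0.76 | 0.87 |
| Lance (Ours) | 3B | 1.00 | 0.94 | 0.84 | 0.97 | 0.87 | 0.81 | 0.90 |

† denotes methods that rewrite the prompt with an LLM rewriter before generation.

### GEdit-Bench

| Model | Params | BC | CA | MM | MC | PB | ST | SA | SR | SRp | TM | TT | Avg (G_O) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Generation-only models** | | | | | | | | | | | | | |
| Gemini 2.0 | - | - | - | - | - | - | - | - | - | - | - | - | 6.32 |
| GPT Image 1 | - | 6.96 | 6.85 | 7.10 | 5.41 | 6.74 | 7.44 | 7.51 | 8.73 | 8.55 | 8.45 | 8.69 | 7.49 |
| Qwen-Image-Edit | 20B | 8.23 | 8.30 | 7.33 | 8.05 | 7.49 | 6.74 | 8.57 | 8.09 | 8.29 | 8.48 | 8.50 | 8.01 |
| **Unified models** | | | | | | | | | | | | | |
| Lumina-DiMOO | 8B | 3.43 | 4.27 | 3.08 | 2.77 | 4.74 | 5.19 | 4.44 | 3.80 | 4.38 | 2.68 | 4.20 | 3.91 |
| Ovis-U1 | 1.2B | 7.49 | 6.88 | 6.21 | 4.79 | 5.98 | 6.46 | 7.49 | 7.25 | 7.27 | 4.48 | 6.31 | 6.42 |
| BAGEL | 7B | 7.32 | 6.91 | 6.38 | 4.75 | 4.57 | 6.15 | 7.90 | 7.16 | 7.02 | 7.32 | 6.22 | 6.52 |
| InternVL-U | 1.7B | 7.08 | 7.05 | 6.38 | 7.02 | 6.03 | 6.27 | 7.13 | 6.55 | 6.33 | 6.59 | 6.85 | 6.66 |
| InternVL-U (w/ CoT) | 1.7B | 7.05 | 7.87 | 6.50 | 6.99 | 5.77 | 6.10 | 7.33 | 7.16 | 7.12 | 7.36 | 6.46 | 6.88 |
| Lance (Ours) | 3B | 7.73 | 7.74 | 7.28 | 7.83 | 7.50 | 7.03 | 7.64 | 7.85 | 7.71 | 4.46 | 7.57 | 7.30 |

### VBench (Video Generation)

| Type | Model | Params | Total Score ↑ |
| --- | --- | --- | --- |
| Gen. Only | ModelScope | 1.7B | 75.75 |
| Gen. Only | LaVie | 3B | 77.08 |
| Gen. Only | Show-1 | 6B | 78.93 |
| Gen. Only | AnimateDiff-V2 | - | 80.27 |
| Gen. Only | VideoCrafter-2.0 | - | 80.44 |
| Gen. Only | CogVideoX | 5B | 81.61 |
| Gen. Only | Kling | - | 81.85 |
| Gen. Only | Open-Sora-2.0 | - | 81.71 |
| Gen. Only | Gen-3 | - | 82.32 |
| Gen. Only | Step-Video-T2V | 30B | 81.83 |
| Gen. Only | Hunyuan Video | - | 83.43 |
| Gen. Only | Wan2.1-T2V | 14B | 83.69 |
| Unified | HaproOmni | 7B | 78.10 |
| Unified | Emu3 | 8B | 80.96 |
| Unified | VILA-U | 7B | 74.01 |
| Unified | Show-o2 | 2B | 81.34 |
| Unified | TUNA | 1.5B | 84.06 |
| Unified | Lance (Ours) | 3B | 85.11 |

### Running the Benchmarks

Ready-to-run benchmark scripts are provided under the `benchmarks/` directory:

| Benchmark | Modality | Script |
| --- | --- | --- |
| GenEVAL (image generation) | Image | `benchmarks/image_gen/GenEVAL/sample_GenEVAL.sh` |
| DPG (image generation) | Image | `benchmarks/image_gen/DPG/sample_DPG.sh` |
| GEdit (image editing) | Image | `benchmarks/image_gen/GEdit/sample_GEdit.sh` |
| VBench (video generation) | Video | `benchmarks/video_gen/Vbench/sample_vbench.sh` |

## License

Copyright © 2025 ByteDance Ltd. and its affiliates.

## Acknowledgments

We thank the contributors of BAGEL, Qwen2.5-VL-3B-Instruct, and Wan2.2 for their open research and contributions.

## Citation

If you find Lance helpful for your project or research, please consider giving this repository a star and citing our work with the following BibTeX entry:

```bibtex
@misc{lance2026,
  title  = {Lance: Unified Multimodal Modeling by Multi-Task Synergy},
  author = {Fengyi Fu and Mengqi Huang and Shaojin Wu and Yunsheng Jiang and Yufei Huo and Jianzhu Guo and Hao Li and Yinghang Song and Fei Ding and Qian He and Zheren Fu and Zhendong Mao and Yongdong Zhang},
  year   = {2026},
  note   = {Manuscript}
}
```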