Problems Encountered When Deploying JavisDiT for Inference, and How to Fix Them

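1. Project Background

Official repository: https://github.com/JavisVerse/JavisDiT

The goal this time was to deploy JavisDiT on a single machine with a single A100 GPU, run text-to-video/audio generation inference, and make sure inference actually works.

Components involved:

- Python 3.10
- PyTorch 2.5.1 + CUDA 12.1
- Flash-Attention (the core acceleration module)
- Wan2.1 T2V / AudioLDM2 / VAE / LoRA
- A custom DiT attention pipeline

2. Problems Encountered and Solutions

2.1 GPU memory

The first attempt was on an NVIDIA A10 with 24 GB of memory, and it ended in OOM, so 24 GB is not enough. Inference was later deployed successfully on a single A100 40 GB card.

2.2 torch and torchvision version mismatch

The requirements file referenced in the JavisDiT README (as of 2026/7/2) asks for torch 2.5.1 and torchvision 0.21.0, but these two versions simply do not match. The versions that actually work can be checked with:

    python -c "import torch; print('torch:', torch.__version__)"
    python -c "import torchvision; print('torchvision:', torchvision.__version__)"
    python -c "import torchaudio; print('torchaudio:', torchaudio.__version__)"

Output:

    torch: 2.5.1+cu121
    torchvision: 0.20.1+cu121
    torchaudio: 2.5.1+cu121

2.3 flash-attn download and build problems

Installing the way the official README says,

    pip install flash-attn --no-build-isolation

gets stuck in a local build, and the local build will most likely fail. The way out is to download a prebuilt wheel. Based on my CUDA and torch versions, I picked flash_attn-2.8.0.post2+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl from the flash-attn releases page. The install commands are:

    pip uninstall -y flash-attn
    pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

2.4 The file replacement step fails

The last step of the author's Installation section,

    cp assets/src/funasr_utils_load_utils.py ${PYTHON_SITE_PACKAGES}/funasr/utils/load_utils.py

fails. FunASR has to be installed first:

    pip install funasr

Once that is done, run:

    cp assets/src/funasr_utils_load_utils.py \
      $(python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())")/funasr/utils/load_utils.py

2.5 ModuleNotFoundError: No module named 'pkg_resources'

This one is really nasty. At first I thought the module was missing because of an incomplete install or incomplete source files. It turned out that the new setuptools 82 has removed pkg_resources, so setuptools has to be downgraded to get rid of the error:

    pip uninstall setuptools -y
    pip install setuptools==68.2.2 --no-cache-dir

2.6 Slow downloads from HuggingFace

The author gives these commands for downloading the model weights:

    # download JavisDiT weights
    hf download JavisVerse/JavisDiT-v1.0-jav --local-dir ./checkpoints/JavisDiT-v1.0-jav
    # download VAEs
    hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./checkpoints/Wan2.1-T2V-1.3B
    hf download cvssp/audioldm2 --local-dir ./checkpoints/audioldm2

On a network in mainland China this is extremely slow. On a tip from a senior labmate I switched to a mirror:

    export HF_ENDPOINT=https://hf-mirror.com

The speed was around 5 MB/s, and the download finished after running overnight.

2.7 transformers version conflict

Two packages in this project hard-require different versions of transformers:

    javisdit 0.1.0   → transformers 4.49.0
    colossalai 0.5.0 → transformers 4.51.3

A plain pip install fails. I gave priority to colossalai and kept transformers 4.51.3, because it turned out later that colossalai is a required module at inference time and cannot be removed: it is used not only for training but also for some of the distributed-related code paths during inference, so there is no way around installing it. Adding a flag that makes the install ignore dependencies avoids the error, but some modules will be missing afterwards and have to be installed by hand:

    pip install -v -e . --no-deps

2.8 No module named 'colossalai'

Already covered in 2.7, and just as nasty: colossalai can only be installed when the transformers version is 4.51.3.

2.9 numpy, huggingface_hub and uvicorn versions too high

numpy must be below 2.0.0:

    pip install "numpy<2.0.0"

huggingface_hub and uvicorn also have to be held at lower versions:

    huggingface_hub==0.36.2
    uvicorn==0.29.0

2.10 KeyError: 'Adafactor is already registered in optimizer at torch.optim'

Disable the automatic optimizer registration of mmengine and transformers by setting these before running:

    export MMENGINE_DISABLE_OPS=1
    export TRANSFORMERS_NO_ADAFACTOR=1

If that still does not fix it, run:

    pip install --force-reinstall mmengine==0.10.7

2.11 ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'

The root cause is that pytorchvideo is too old. Upgrade it:

    pip install -U pytorchvideo

On top of that, a small source change is needed. Open the file from the command line:

    code <your-home-directory>/anaconda3/envs/javisdit/lib/python3.10/site-packages/pytorchvideo/transforms/augmentations.py

In augmentations.py, change

    import torchvision.transforms.functional_tensor as F_t

to

    import torchvision.transforms.functional as F_t

so that it matches the newer torchvision interface.

Summary

With everything above resolved, the demo inference runs with:

    CUDA_VISIBLE_DEVICES=2 python scripts/inference.py \
      configs/javisdit-v1-0/inference/sample.py \
      --model-path /data/checkpoints/JavisDiT-v1.0-jav \
      --num-frames 81 --resolution 480p --aspect-ratio 9:16 \
      --prompt "A brown bear is walking towards the camera" \
      --verbose 2

3. Appendix: Environment List

    accelerate==0.29.2
    addict==2.4.0
    aliyun-python-sdk-core==2.16.0
    aliyun-python-sdk-kms==2.16.5
    annotated-doc==0.0.4
    annotated-types==0.7.0
    antlr4-python3-runtime==4.9.3
    anyio==4.14.1
    attrs==26.1.0
    audioflux==0.1.9
    audioread==3.1.0
    av==13.1.0
    bcrypt==5.0.0
    beartype==0.22.9
    beautifulsoup4==4.15.0
    bitsandbytes==0.49.2
    brotli==1.2.0
    certifi==2026.6.17
    cffi==2.0.0
    cfgv==3.5.0
    charset-normalizer==3.4.7
    click==8.4.2
    colossalai==0.5.0
    contexttimer==0.3.3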
    contourpy==1.3.2
    crcmod==1.7
    cryptography==49.0.0
    cycler==0.12.1
    decorator==5.3.1
    decord==0.6.0
    Deprecated==1.3.1
    diffusers==0.29.0
    distlib==0.4.3
    easydict==1.13
    editdistance==0.8.1
    einops==0.8.2
    exceptiongroup==1.3.1
    fabric==3.2.3
    fastapi==0.138.2
    ffmpeg-python==0.2.0
    filelock==3.29.0
    flash_attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl#sha256=043bf4bf846a2d68a34c210bf392b0af6e5fb33f1d5b0c7ffa6d5837f1f338e2
    fonttools==4.63.0
    fsspec==2026.4.0
    ftfy==6.3.1
    funasr==1.3.14
    future==1.0.0
    fvcore==0.1.5.post20221221
    galore-torch==1.0
    google==3.0.0
    gradio==6.19.0
    gradio_client==2.5.0
    groovy==0.1.2
    h11==0.16.0
    hf-gradio==0.4.1
    hf-xet==1.5.1
    httpcore==1.0.9
    httpx==0.28.1
    huggingface_hub==0.36.2
    hydra-core==1.3.3
    identify==2.6.19
    idna==3.18
    importlib_metadata==9.0.0
    invoke==2.2.1
    iopath==0.1.10
    ipykernel==7.3.0
    ipywidgets==8.1.8
    jaconv==0.5.0
    jamo==0.4.1
    -e git+https://github.com/JavisVerse/JavisDiT@b505b37faa9668b52b982abe364825d3d0a5bdca#egg=javisdit
    jieba==0.42.1
    Jinja2==3.1.6
    jmespath==0.10.0
    joblib==1.5.3
    jsonschema==4.26.0
    jsonschema-specifications==2025.9.1
    kaldiio==2.18.1
    kaleido==1.3.0
    kiwisolver==1.5.0
    librosa==0.9.2
    llvmlite==0.47.0
    markdown-it-py==4.2.0
    MarkupSafe==3.0.3
    matplotlib==3.10.9
    mdurl==0.1.2
    mmengine==0.10.7
    modelscope==1.38.0
    modelscope-hub==0.1.5
    mpmath==1.3.0
    msgpack==1.2.1
    networkx==3.4.2
    ninja==1.13.0
    nodeenv==1.10.0
    numba==0.65.1
    numpy==1.26.4
    nvidia-cublas-cu12==12.1.3.1
    nvidia-cuda-cupti-cu12==12.1.105
    nvidia-cuda-nvrtc-cu12==12.1.105
    nvidia-cuda-runtime-cu12==12.1.105
    nvidia-cudnn-cu12==9.1.0.70
    nvidia-cufft-cu12==11.0.2.54
    nvidia-curand-cu12==10.3.2.106
    nvidia-cusolver-cu12==11.4.5.107
    nvidia-cusparse-cu12==12.1.0.106
    nvidia-nccl-cu12==2.21.5
    nvidia-nvjitlink-cu12==12.9.86
    nvidia-nvtx-cu12==12.1.105
    omegaconf==2.3.1
    openai==2.44.0
    opencv-python==4.13.0.92
    orjson==3.11.9
    oss2==2.19.1
    packaging @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_packaging_1777103621/work
    pandarallel==1.6.5
    pandas==2.3.3
    parameterized==0.9.0
    paramiko==5.0.0
    peft==0.13.2
    pillow==12.2.0
    platformdirs==4.10.0
    plotly==6.8.0
    plumbum==2.0.1
    pooch==1.9.0
    portalocker==3.2.0
    pre_commit==4.6.0
    protobuf==7.35.1
    psutil==7.2.2
    pyarrow==24.0.0
    pycparser==3.0
    pycryptodome==3.23.0
    pydantic==2.13.4
    pydantic_core==2.46.4
    pydub==0.25.1
    Pygments==2.20.0
    PyNaCl==1.6.2
    pynndescent==0.6.0
    pyparsing==3.3.2
    python-dateutil==2.9.0.post0
    python-discovery==1.4.2
    python-multipart==0.0.32
    pytorchvideo @ git+https://github.com/facebookresearch/pytorchvideo.git@28fe037d212663c6a24f373b94cc5d478c8c1a1d
    pytz==2026.2
    PyYAML==6.0.3
    ray==2.56.0
    referencing==0.37.0
    regex==2026.6.28
    requests==2.34.2
    resampy==0.4.3
    rich==15.0.0
    rotary-embedding-torch==0.5.3
    rpds-py==0.30.0
    rpyc==6.0.0
    safehttpx==0.1.7
    safetensors==0.8.0
    scikit-learn==1.7.2
    scipy==1.14.1
    semantic-version==2.10.0
    sentencepiece==0.2.1
    shellingham==1.5.4
    six==1.17.0
    soundfile==0.12.1
    soupsieve==2.8.4
    spaces==0.50.4
    starlette==1.3.1
    sympy==1.13.1
    tabulate==0.10.0
    tensorboard==2.21.0
    tensorboardX==2.6.5
    termcolor==3.3.0
    threadpoolctl==3.6.0
    tiktoken==0.13.0
    timm==0.9.16
    tokenizers==0.21.4
    tomli==2.4.1
    tomlkit==0.14.0
    torch==2.5.1+cu121
    torch-complex==0.4.4
    torchaudio==2.5.1+cu121
    torchvision==0.20.1+cu121
    tqdm==4.68.3
    transformers==4.51.3
    triton==3.1.0
    typer==0.25.1
    typing-inspection==0.4.2
    typing_extensions==4.15.0
    tzdata==2026.2
    umap-learn==0.5.12
    urllib3==2.7.0
    uvicorn==0.29.0
    virtualenv==21.5.1
    wandb==0.28.0
    wcwidth==0.8.2
    wrapt==2.2.2
    yacs==0.1.8
    yapf==0.43.0
    zipp==4.1.0

May the world be free of version conflicts. orz
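
The version constraints from sections 2.2, 2.5, 2.7 and 2.9 can be checked in one go before launching inference. The script below is only a minimal sketch and is not part of the JavisDiT repository: the helper names (`release_tuple`, `check_versions`, `installed_versions`) are made up for illustration, and treating huggingface_hub and uvicorn as exact pins is an assumption taken from the environment that worked for me.

```python
import re
from importlib import metadata

# Exact release versions from the working environment in this post.
# Local build tags such as "+cu121" are ignored when comparing.
EXACT = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "transformers": "4.51.3",
    "huggingface_hub": "0.36.2",  # assumption: treated as an exact pin
    "uvicorn": "0.29.0",          # assumption: treated as an exact pin
}

# Exclusive upper bounds: numpy must stay below 2.0.0 (section 2.9),
# setuptools below 82 so that pkg_resources still exists (section 2.5).
BELOW = {"numpy": (2, 0, 0), "setuptools": (82,)}


def release_tuple(version):
    """Turn '2.5.1+cu121' into (2, 5, 1), dropping local tags and suffixes."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def _pad(parts, width):
    """Right-pad a release tuple with zeros so tuples compare fairly."""
    return parts + (0,) * (width - len(parts))


def check_versions(installed):
    """Return a list of problems; an empty list means every pin holds."""
    problems = []
    for name, want in EXACT.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (want {want})")
        elif release_tuple(have) != release_tuple(want):
            problems.append(f"{name}: found {have}, want {want}")
    for name, bound in BELOW.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed")
            continue
        release = release_tuple(have)
        width = max(len(release), len(bound))
        if _pad(release, width) >= _pad(bound, width):
            limit = ".".join(str(n) for n in bound)
            problems.append(f"{name}: found {have}, must be below {limit}")
    return problems


def installed_versions(names):
    """Look up installed versions; packages that are missing are skipped."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    current = installed_versions(list(EXACT) + list(BELOW))
    for line in check_versions(current) or ["all pinned versions look fine"]:
        print(line)
```

Run it inside the javisdit conda environment. Each printed line names one package that differs from the versions above, which makes it easier to see which of the earlier fixes still has to be applied.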