CANN/AMCT OFMR Algorithm Example
AMCT Large Model Quantization

amct: AMCT is the model compression tool repository that CANN provides for Ascend AI processors. Project address: https://gitcode.com/cann/amct

1 Quantization Prerequisites

1.1 Install Dependencies

The dependency packages for this sample are listed in requirements.txt. Note that the torch_npu version must match the installed Python and torch versions, and the CANN package must be installed.

1.2 Model and Dataset Preparation

This sample uses the Llama2-7b, qwen2-7b, and qwen3-8b models, with pileval as the calibration set and wikitext2 as the evaluation dataset. Download the models yourself and pass the model path to the script. The dataset is loaded online.

1.3 Simple Quantization Configuration

The quantization configuration used in this sample is built into the tool and can be imported as follows:

    from amct_pytorch import HIFP8_OFMR_CFG

To change the detailed configuration, refer to the documentation and construct the required quantization configuration dict.

The OFMR algorithm supports weight-only quantization and full quantization. The supported quantization types and configuration fields are:

| Field | Type | Description | Value Range | Notes |
| --- | --- | --- | --- | --- |
| batch_num | uint32 | Number of batches used for quantization | 1 | - |
| skip_layers | str | Layers to skip during quantization | - | Fuzzy matching is supported. When the configured string is a substring of a layer name or equals the layer name, that layer is skipped and no quantization configuration is generated for it. The string must contain numbers or letters. |
| weights.type | str | Quantized weight type | float8_e4m3fn / hifloat8 | - |
| weights.symmetric | bool | Symmetric quantization | TRUE | - |
| weights.strategy | str | Quantization granularity | tensor / channel | - |
| inputs.type | str | Quantized activation type | float8_e4m3fn / hifloat8 | - |
| inputs.symmetric | bool | Symmetric quantization | TRUE | - |
| inputs.strategy | str | Quantization granularity | tensor | - |
| algorithm | dict | Quantization algorithm configuration used | {ofmr} | - |

2 Quantization Example

2.1 Calling Through the Interface

Step 1. Run one of the following commands in the current directory to execute the sample program. Change the model path to match your environment:

    python3 src/run_llama2_samples.py --model_path /data/Llama2_7b_hf/
    python3 src/run_qwen_samples.py --model_path /data/Qwen2-7b/
    python3 src/run_qwen_samples.py --model_path /data/Qwen3-8B/

If output like the following appears, quantization succeeded:

    Test time taken: 1.0 min 59.24865388870239 s Score: 5.477707

Score is the perplexity (PPL) of the quantized model. Reference values are listed in the following table:

| Model | Calibration Set | Dataset | Pre-quantization PPL | Post-quantization PPL |
| --- | --- | --- | --- | --- |
| LLAMA2-7B | pileval | wikitext2 | 5.472 | 5.505 |
| QWEN2-7B | pileval | wikitext2 | 7.137 | 7.196 |
| QWEN3-8B | pileval | wikitext2 | 9.715 | 9.808 |

After inference succeeds, a quantization log file ./amct_log/amct_pytorch.log is generated in the current directory.

Authoring note: parts of this article were generated with AI assistance (AIGC) and are for reference only.
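The field table in section 1.3 can be turned into a small sanity check before handing a custom configuration to the tool. The sketch below is illustrative only and does not call amct_pytorch. The helper names `build_ofmr_config` and `should_skip`, the nested dict layout (dotted field names mapped to nested keys), the `{"ofmr": {}}` form of the algorithm entry, the use of a list for `skip_layers`, and the idea that weight-only quantization omits the `inputs` entry are all assumptions made for this example. Consult the AMCT documentation for the real schema.

```python
import re

# Allowed values copied from the field table in section 1.3.
ALLOWED_TYPES = {"float8_e4m3fn", "hifloat8"}
ALLOWED_WEIGHT_STRATEGIES = {"tensor", "channel"}
ALLOWED_INPUT_STRATEGIES = {"tensor"}


def build_ofmr_config(weight_type="hifloat8", weight_strategy="channel",
                      input_type="hifloat8", skip_layers=(), weight_only=False):
    """Build a config dict in a HYPOTHETICAL nested layout and validate it
    against the value ranges listed in the table."""
    if weight_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported weights.type: {weight_type}")
    if weight_strategy not in ALLOWED_WEIGHT_STRATEGIES:
        raise ValueError(f"unsupported weights.strategy: {weight_strategy}")
    if not weight_only and input_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported inputs.type: {input_type}")
    for pattern in skip_layers:
        # The table requires each skip string to contain a letter or a digit.
        if not re.search(r"[A-Za-z0-9]", pattern):
            raise ValueError(f"invalid skip_layers entry: {pattern!r}")

    config = {
        "batch_num": 1,                    # only value listed in the table
        "skip_layers": list(skip_layers),  # assumed to be a list of patterns
        "weights": {
            "type": weight_type,
            "symmetric": True,             # only TRUE is listed
            "strategy": weight_strategy,
        },
        "algorithm": {"ofmr": {}},         # assumed shape of the "{ofmr}" entry
    }
    if not weight_only:
        # Assumption: full quantization adds an activation ("inputs") entry.
        config["inputs"] = {
            "type": input_type,
            "symmetric": True,
            "strategy": "tensor",          # only "tensor" is listed for inputs
        }
    return config


def should_skip(layer_name, skip_layers):
    """Fuzzy match as described in the table: a layer is skipped when any
    configured string is a substring of, or equal to, the layer name."""
    return any(pattern in layer_name for pattern in skip_layers)


if __name__ == "__main__":
    cfg = build_ofmr_config(skip_layers=["lm_head"])
    print(should_skip("lm_head", cfg["skip_layers"]))                       # True
    print(should_skip("model.layers.0.mlp.down_proj", cfg["skip_layers"]))  # False
```

Because the match is a plain substring test, a short pattern such as "proj" would skip every layer whose name contains it, so more specific strings are safer when only one layer should be excluded.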