CANN/asc-devkit SIMD API: Quantization Settings

SetDeqScale

asc-devkit: this project is the operator programming language that CANN provides for Ascend AI processors. It natively supports the C and C++ standards, consists mainly of a class library and a language-extension layer, and provides multi-level APIs for operator development across a wide range of scenarios. Project address: https://gitcode.com/cann/asc-devkit

Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.

Product support

| Product | Supported |
| --- | --- |
| Ascend 950PR/Ascend 950DT | √ |
| Atlas A3 Training Series Products / Atlas A3 Inference Series Products | √ |
| Atlas A2 Training Series Products / Atlas A2 Inference Series Products | √ |
| Atlas 200I/500 A2 Inference Products | x |
| Atlas Inference Series Products AI Core | √ |
| Atlas Inference Series Products Vector Core | x |
| Atlas Training Series Products | x |

Function description

Sets the value of the DEQSCALE register.

Function prototypes

For the s322f16 scenarios of AddDeqRelu/Cast/CastDeq:

```cpp
__aicore__ inline void SetDeqScale(half scale)
```

For the CastDeq scenario with isVecDeq = false:

```cpp
__aicore__ inline void SetDeqScale(float scale, int16_t offset, bool signMode)
```

For the CastDeq scenario with isVecDeq = true:

```cpp
template <typename T>
__aicore__ inline void SetDeqScale(const LocalTensor<T>& vdeq, const VdeqInfo& vdeqInfo)
```

Parameters

Table 1 Template parameters

| Parameter | Description |
| --- | --- |
| T | Data type of the input quantization tensor. Supported data type: uint64_t. |

Table 2 Parameters

| Parameter | Input/Output | Description |
| --- | --- | --- |
| scale (half) | Input | Quantization parameter scale, of type half. Ascend 950PR/Ascend 950DT: used in the s322f16 scenarios of AddDeqRelu/CastDeq/Cast. Atlas A3 Training Series Products / Atlas A3 Inference Series Products: used in the s322f16 scenarios of AddDeqRelu/Cast/CastDeq. Atlas A2 Training Series Products / Atlas A2 Inference Series Products: used in the s322f16 scenarios of AddDeqRelu/Cast/CastDeq. Atlas Inference Series Products AI Core: used in the s322f16 scenario of AddDeqRelu or Cast. |
| scale (float) | Input | Quantization parameter scale, of type float. Sets the DEQSCALE register value for the CastDeq scenario with isVecDeq = false. |
| offset | Input | Quantization parameter offset, of type int16_t. Only the first 9 bits are valid. Sets the offset for the CastDeq scenario with isVecDeq = false. |
| signMode | Input | Of type bool. Indicates whether the quantized result is signed. Sets signMode for the CastDeq scenario with isVecDeq = false. |
| vdeq | Input | Input quantization tensor for the CastDeq scenario with isVecDeq = true. Its size is 128 Bytes. Type: LocalTensor. Supported TPosition: VECIN/VECCALC/VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| vdeqInfo | Input | Data structure that stores the quantization tensor information. It holds the 16 groups of quantization parameters in the quantization tensor. Its definition follows this table. |

The VdeqInfo structure is defined as:

```cpp
const uint8_t VDEQ_TENSOR_SIZE = 16;

struct VdeqInfo {
    __aicore__ VdeqInfo() {}
    __aicore__ VdeqInfo(const float vdeqScaleIn[VDEQ_TENSOR_SIZE], const int16_t vdeqOffsetIn[VDEQ_TENSOR_SIZE],
        const bool vdeqSignModeIn[VDEQ_TENSOR_SIZE])
    {
        for (int32_t i = 0; i < VDEQ_TENSOR_SIZE; ++i) {
            vdeqScale[i] = vdeqScaleIn[i];
            vdeqOffset[i] = vdeqOffsetIn[i];
            vdeqSignMode[i] = vdeqSignModeIn[i];
        }
    }

    float vdeqScale[VDEQ_TENSOR_SIZE] { 0 };
    int16_t vdeqOffset[VDEQ_TENSOR_SIZE] { 0 };
    bool vdeqSignMode[VDEQ_TENSOR_SIZE] { 0 };
};
```

- vdeqScale: array of float that stores the scale parameters (scale0 to scale15) of the quantization tensor.
- vdeqOffset: array of int16_t that stores the offset parameters (offset0 to offset15) of the quantization tensor.
- vdeqSignMode: array of bool that stores the signMode parameters (signMode0 to signMode15) of the quantization tensor.

Return value

None

Constraints

None

Examples

SetDeqScale(half scale)

```cpp
// Used together with Cast in the s322f16 scenario
// dstLocal is a LocalTensor of type half; srcLocal is a LocalTensor of type int32_t
uint32_t srcSize = 256; // number of elements involved in the computation
half scale = 1.0;       // quantization parameter is 1
AscendC::SetDeqScale(scale);
// src (int32_t) -> dst (half)
AscendC::Cast(dstLocal, srcLocal, AscendC::RoundMode::CAST_NONE, srcSize);
```

Result:

```
Input data (srcLocal): [1, 2, 3, 4, 5, 6, ... 256]
Output data (dstLocal): [1, 2, 3, 4, 5, 6, ... 256]
```

SetDeqScale(float scale, int16_t offset, bool signMode)

```cpp
// Used together with CastDeq in the isVecDeq = false scenario
// dstLocal is a LocalTensor of type int8_t; srcLocal is a LocalTensor of type int16_t
uint32_t srcSize = 256; // number of elements involved in the computation
float scale = 1.0;      // quantization parameter is 1
int16_t offset = 0;     // no offset
bool signMode = true;   // dstLocal is int8_t, a signed type
AscendC::SetDeqScale(scale, offset, signMode);
// src (int16_t) -> dst (int8_t)
AscendC::CastDeq<int8_t, int16_t, false, false>(dstLocal, srcLocal, srcSize);
```

Result:

```
Input data (srcLocal):
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ]
 [ 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ]
 [ 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 ]
 [ 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 ]
 [ 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 ]
 [ 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 ]
 [ 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 ]
 [ 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 ]]
Output data (dstLocal), written to the upper half of each Block:
[[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
```

SetDeqScale(const LocalTensor<T>& vdeq, const VdeqInfo& vdeqInfo)

```cpp
// Used together with CastDeq in the isVecDeq = true scenario
// dstLocal is a LocalTensor of type int8_t; srcLocal is a LocalTensor of type int16_t
// tmpBuffer (allocated elsewhere) serves as vdeq: a 128-Byte LocalTensor<uint64_t> in
// VECIN/VECCALC/VECOUT whose start address is 32-byte aligned
uint32_t srcSize = 256; // number of elements involved in the computation
float vdeqScale[16] { 0 };
int16_t vdeqOffset[16] { 0 };
bool vdeqSignMode[16] { 0 };
for (int i = 0; i < 16; i++) {
    vdeqScale[i] = 1.0;     // quantization parameter is 1
    vdeqOffset[i] = 0;      // no offset
    vdeqSignMode[i] = true; // dstLocal is int8_t, a signed type
}
AscendC::VdeqInfo vdeqInfo(vdeqScale, vdeqOffset, vdeqSignMode);
AscendC::SetDeqScale<uint64_t>(tmpBuffer, vdeqInfo);
// src (int16_t) -> dst (int8_t)
AscendC::CastDeq<int8_t, int16_t, true, true>(dstLocal, srcLocal, srcSize);
```

Result:

```
Input data (srcLocal):
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ]
 [ 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ]
 [ 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 ]
 [ 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 ]
 [ 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 ]
 [ 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 ]
 [ 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 ]
 [ 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 ]]
Output data (dstLocal), written to the lower half of each Block:
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127]]
```
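The SetDeqScale(half scale) example converts the int32_t values 1 to 256 to half with scale 1.0, and the same values come back. The sketch below is a host-side model of that behavior, useful for checking expected outputs off-device. It rests on three assumptions:

- Each output is the input multiplied by the DEQSCALE value. This is inferred from the example, not stated on this page.
- float stands in for half, because standard C++ before C++23 has no half type.
- The function name ReferenceDeqCast is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of Cast (s322f16) after SetDeqScale(half scale).
// Assumption: dst[i] = src[i] * scale; float is used in place of half.
std::vector<float> ReferenceDeqCast(const std::vector<int32_t>& src, float scale) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (int32_t v : src) {
        dst.push_back(static_cast<float>(v) * scale);
    }
    return dst;
}
```

Replacing float with a real half type would add the rounding to half precision that this model omits.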
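The offset parameter of SetDeqScale(float scale, int16_t offset, bool signMode) is an int16_t of which only the first 9 bits are valid. This page does not say how those bits are interpreted. If they are read as a two's-complement signed field, which is an assumption, the usable range is -256 to 255. A host-side guard such as the hypothetical FitsIn9BitOffset below can then reject values that would not survive truncation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical guard, assuming the 9 valid bits of offset form a signed two's-complement field.
constexpr int kOffsetBits = 9;
constexpr int kOffsetMin = -(1 << (kOffsetBits - 1));    // -256
constexpr int kOffsetMax = (1 << (kOffsetBits - 1)) - 1; // 255

// Returns true if offset lies in the assumed signed 9-bit range.
constexpr bool FitsIn9BitOffset(int16_t offset) {
    return offset >= kOffsetMin && offset <= kOffsetMax;
}
```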
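The signMode parameter indicates whether the quantized result is signed. The CastDeq examples set it to true because dstLocal is int8_t. The sketch below gives the clamping bounds for each mode under two assumptions that this page does not state: false selects an unsigned 8-bit result, and out-of-range values saturate. ByteRange, ResultRange and Saturate are hypothetical names.

```cpp
#include <cassert>

// Hypothetical inclusive bounds of an 8-bit quantized result.
struct ByteRange {
    int lo;
    int hi;
};

// Assumption: signMode = true targets int8_t, signMode = false targets uint8_t.
constexpr ByteRange ResultRange(bool signMode) {
    return signMode ? ByteRange{-128, 127} : ByteRange{0, 255};
}

// Clamps value into the range selected by signMode (assumed saturating behavior).
constexpr int Saturate(int value, bool signMode) {
    return value < ResultRange(signMode).lo ? ResultRange(signMode).lo
         : value > ResultRange(signMode).hi ? ResultRange(signMode).hi
         : value;
}
```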
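The 128-Byte size required for vdeq is consistent with the other parameters. T must be uint64_t, and VdeqInfo carries VDEQ_TENSOR_SIZE = 16 groups, so one 8-byte element per group gives 16 × 8 = 128 bytes. That is also a whole number of the 32-byte units used by the alignment rule. The host-side C++ below only restates this arithmetic. It assumes, without confirmation from this page, that each group occupies exactly one uint64_t element. Every name except VDEQ_TENSOR_SIZE is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the constant from the VdeqInfo definition: 16 groups of parameters.
constexpr uint8_t VDEQ_TENSOR_SIZE = 16;

// Hypothetical names, for illustration only. Assumes one uint64_t per parameter group.
constexpr std::size_t kVdeqTensorBytes = VDEQ_TENSOR_SIZE * sizeof(uint64_t);
constexpr std::size_t kAlignBytes = 32; // required start-address alignment of vdeq

static_assert(kVdeqTensorBytes == 128, "16 uint64_t elements occupy 128 bytes");
static_assert(kVdeqTensorBytes % kAlignBytes == 0, "128 bytes is a whole number of 32-byte units");

// Returns true if a byte address satisfies the 32-byte alignment rule for vdeq.
constexpr bool IsVdeqAddressAligned(std::uintptr_t addr) {
    return addr % kAlignBytes == 0;
}
```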
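In the two CastDeq examples, each row of 16 int16_t inputs produces one 32-element int8_t row of dstLocal:

- With CastDeq<int8_t, int16_t, false, false>, the results fill the upper half of each Block.
- With CastDeq<int8_t, int16_t, true, true>, they fill the lower half.

The examples suggest that the fourth template argument selects the half. This page does not name that parameter, so the mapping is an inference. The host-side model below reproduces both output layouts, with these assumptions:

- The per-element formula, round(src * scale) + offset saturated to int8_t (signMode = true only), is assumed. It merely agrees with the examples for scale 1.0 and offset 0.
- The untouched half of each Block is left as the caller initialized it, which accounts for the zeros in the examples.
- ReferenceCastDeqLayout is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kElemsPerBlock = 16; // 16 int16_t inputs = one 32-byte source Block
constexpr int kDstBlockBytes = 32; // one 32-byte destination Block of int8_t

// Hypothetical host-side model of CastDeq<int8_t, int16_t, ...> with signMode = true.
// Assumptions: out = clamp(round(src * scale) + offset, -128, 127); each 16-element source
// Block lands in one half of a 32-byte destination Block (lowerHalf = false: upper half,
// lowerHalf = true: lower half). The other half of each destination Block is not written.
void ReferenceCastDeqLayout(const std::vector<int16_t>& src, float scale, int16_t offset,
                            bool lowerHalf, std::vector<int8_t>& dst) {
    assert(src.size() % kElemsPerBlock == 0);
    const std::size_t blocks = src.size() / kElemsPerBlock;
    assert(dst.size() >= blocks * kDstBlockBytes);
    const int halfStart = lowerHalf ? kElemsPerBlock : 0;
    for (std::size_t b = 0; b < blocks; ++b) {
        for (int i = 0; i < kElemsPerBlock; ++i) {
            const float scaled = std::nearbyint(src[b * kElemsPerBlock + i] * scale) + offset;
            const float clamped = std::min(std::max(scaled, -128.0f), 127.0f);
            dst[b * kDstBlockBytes + halfStart + i] = static_cast<int8_t>(clamped);
        }
    }
}
```

Feeding it the 128 inputs shown in the examples, with dst zero-initialized to 256 bytes, reproduces both printed outputs.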
