CANN Runtime ACL Graph Stream Capture Feature

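Project: runtime. This project provides the CANN runtime components and the maintenance and diagnostics components. Project address: https://gitcode.com/cann/runtime

1. Feature Overview

Feature introduction

The ACL Graph stream capture feature captures the task sequence issued on a single-operator stream into a reusable CaptureModel, which enables optimized execution of that sequence and reuse of its resources. A captured graph can be executed many times, which reduces scheduling overhead.

Background

In single-operator execution mode, every operator execution submits its task separately, and each submission carries scheduling overhead. Capturing the operator sequence and building an optimized graph from it reduces that overhead and lets the sequence be reused.

Design goals

- Support the stream capture mechanism (BeginCapture/EndCapture)
- Support multiple capture modes (GLOBAL/THREAD_LOCAL/RELAXED)
- Support cascade capture and the stream expansion mechanism
- Support dynamic binding of software SQs
- Support TaskGroup management
- Support graph update and repeated execution

2. Usage Scenarios and External Interfaces

2.1 Usage Scenarios

Scenario 1: single-operator stream capture

```cpp
// Begin capture
rtError_t ret = rtStreamBeginCapture(stream, RT_STREAM_CAPTURE_MODE_GLOBAL);

// Issue the operator sequence (tasks are recorded, not executed immediately)
rtKernelLaunch(stream, kernel1, ...);
rtKernelLaunch(stream, kernel2, ...);

// End capture and obtain the CaptureModel
rtModel_t captureModel;
ret = rtStreamEndCapture(stream, &captureModel);

// Execute the captured graph as many times as needed
ret = rtModelExecute(captureModel, exeStream, -1);
```

Scenario 2: multi-stream capture (cascade capture)

```cpp
// Begin capture on the original stream
rtStreamBeginCapture(stream1, RT_STREAM_CAPTURE_MODE_GLOBAL);

// When the SQ depth is insufficient, a cascade stream is created automatically to continue the capture,
// or another stream can be added to the capture model explicitly
rtStreamAddToModel(stream2, captureModel);

// End capture
rtStreamEndCapture(stream1, &captureModel);
```

Scenario 3: TaskGroup

```cpp
// Begin a task group
rtStreamBeginTaskGrp(stream);

// Issue a series of tasks
rtKernelLaunch(stream, kernel1, ...);
rtKernelLaunch(stream, kernel2, ...);

// End the task group and obtain the TaskGroup handle
TaskGroup *handle;
rtStreamEndTaskGrp(stream, &handle);

// The tasks in the group can be updated later
rtStreamBeginTaskUpdate(stream, handle);
rtStreamEndTaskUpdate(stream);
```

Scenario 4: model update

```cpp
// Check whether the model supports update
rtError_t ret = rtCheckCaptureModelForUpdate(stream);

// Update the model
ret = rtModelUpdate(captureModel);
```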
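The record-then-replay behaviour behind the usage scenarios can be illustrated with a small toy model. This is only a sketch: `ToyStream`, `ToyCaptureModel` and `CaptureAndReplay` are hypothetical names invented for this illustration, and they are not part of the CANN runtime API. The sketch shows just the core idea, namely that tasks launched during capture are stored instead of run, and that the resulting model can be executed repeatedly.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a capture model (hypothetical, not a CANN type).
struct ToyCaptureModel {
    std::vector<std::function<void()>> tasks;

    // Replays every recorded task in submission order.
    void Execute() const
    {
        for (const auto &task : tasks) {
            task();
        }
    }
};

// Toy stand-in for a stream (hypothetical, not a CANN type).
class ToyStream {
public:
    // Fails if the stream is already capturing.
    bool BeginCapture()
    {
        if (capturing_) {
            return false;
        }
        capturing_ = true;
        recorded_.clear();
        return true;
    }

    // While capturing, the task is recorded; otherwise it runs immediately.
    void Launch(std::function<void()> task)
    {
        if (capturing_) {
            recorded_.push_back(std::move(task));
        } else {
            task();
        }
    }

    // Fails if no capture is active; otherwise moves the recorded tasks into `model`.
    bool EndCapture(ToyCaptureModel &model)
    {
        if (!capturing_) {
            return false;
        }
        capturing_ = false;
        model.tasks = std::move(recorded_);
        recorded_.clear();
        return true;
    }

private:
    bool capturing_ = false;
    std::vector<std::function<void()>> recorded_;
};

// Captures two increments (+1 and +10) and replays them `runs` times.
int CaptureAndReplay(int runs)
{
    int counter = 0;
    ToyStream stream;
    ToyCaptureModel model;

    const bool began = stream.BeginCapture();
    (void)began;
    stream.Launch([&counter] { counter += 1; });
    stream.Launch([&counter] { counter += 10; });
    assert(counter == 0);  // nothing has executed during capture

    const bool ended = stream.EndCapture(model);
    (void)ended;
    for (int i = 0; i < runs; ++i) {
        model.Execute();
    }
    return counter;
}
```

Calling `CaptureAndReplay(3)` replays the two captured tasks three times. The real runtime additionally handles device queues, synchronization and capture modes, none of which this toy attempts to model.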
2.2 External Interfaces

| Interface | File location | Description |
| --- | --- | --- |
| rtStreamBeginCapture() | src/runtime/api/api_c_standard_soc.cc:674 | Begin stream capture |
| rtStreamEndCapture() | src/runtime/api/api_c_standard_soc.cc:695 | End stream capture |
| rtStreamGetCaptureInfo() | context_aclgraph.cc:502 | Get capture status information |
| rtStreamAddToModel() | context_aclgraph.cc:554 | Add a stream to the capture model |
| rtStreamBeginTaskGrp() | context_aclgraph.cc:622 | Begin a task group |
| rtStreamEndTaskGrp() | context_aclgraph.cc:656 | End a task group |
| rtStreamBeginTaskUpdate() | context_aclgraph.cc:690 | Begin a task update |
| rtStreamEndTaskUpdate() | context_aclgraph.cc:708 | End a task update |
| rtModelExecute() | capture_model.cc:218 | Execute the capture model |
| rtModelExecuteAsync() | capture_model.cc:222 | Execute the capture model asynchronously |
| rtModelUpdate() | capture_model.cc:776 | Update the capture model |
| rtThreadExchangeCaptureMode() | context_aclgraph.cc:560 | Exchange the thread capture mode |

2.3 Capture Mode Definitions

```cpp
// The capture mode controls multi-threaded capture behaviour
typedef enum {
    RT_STREAM_CAPTURE_MODE_GLOBAL = 0,        // Global mode: all threads share the capture state
    RT_STREAM_CAPTURE_MODE_THREAD_LOCAL = 1,  // Thread-local mode: only the current thread may operate
    RT_STREAM_CAPTURE_MODE_RELAXED = 2,       // Relaxed mode: other threads are allowed to operate
    RT_STREAM_CAPTURE_MODE_MAX = 3
} rtStreamCaptureMode;
```

2.4 Capture Status Definitions

```cpp
// Stream capture status
typedef enum {
    RT_STREAM_CAPTURE_STATUS_NONE = 0,         // Not capturing
    RT_STREAM_CAPTURE_STATUS_ACTIVE = 1,       // Capture in progress
    RT_STREAM_CAPTURE_STATUS_INVALIDATED = 2,  // Capture invalidated
    RT_STREAM_CAPTURE_STATUS_COMPLETED = 3,    // Capture completed
} rtStreamCaptureStatus;

// Model capture status
enum class RtCaptureModelStatus {
    NONE = 0,             // Initial state
    CAPTURE_ACTIVE,       // Capture in progress
    CAPTURE_INVALIDATED,  // Capture invalidated
    UPDATING,             // Update in progress
    FAULT,                // Fault state
    READY,                // Ready (executable)
};
```

3. Architecture Overview

Overall design idea

ACL Graph uses CaptureModel to manage the captured graph structure, while Stream maintains the capture state (captureStatus). During capture, the task sequence is managed through cascade streams and TaskGroups. At execution time, dynamic binding of Software SQs provides efficient scheduling.

[Figure: layered architecture diagram]

[Figure: core module interaction diagram]

4. Detailed Design

4.1 Core Flows

Stream capture begin flow

Key code:

```cpp
// File: src/runtime/feature/aclgraph/context_aclgraph.cc:222-273
rtError_t Context::StreamBeginCapture(Stream * const stm, const rtStreamCaptureMode mode)
{
    Model *captureModel = nullptr;
    BufferAllocator::OpenHugeBuff();
    const rtStreamCaptureStatus status = stm->GetCaptureStatus();
    const int32_t streamId = stm->Id_();

    // Check the capture status
    if (status != RT_STREAM_CAPTURE_STATUS_NONE) {
        RT_LOG(RT_LOG_ERROR, "stream is already in capture status, device_id=%u, stream_id=%d, status=%s.",
            device_->Id_(), streamId,
            ((status == RT_STREAM_CAPTURE_STATUS_ACTIVE) ? "active" : "invalidated"));
        return RT_ERROR_STREAM_CAPTURED;
    }

    // Create the CaptureModel
    rtError_t error = ModelCreate(&captureModel, RT_MODEL_CAPTURE_MODEL);
    if (error != RT_ERROR_NONE) {
        RT_LOG(RT_LOG_ERROR, "Capture model create failed, device_id=%u, original stream_id=%d, retCode=%#x.",
            device_->Id_(), streamId, error);
        return error;
    }

    // Check whether Software SQ is supported
    if ((stm->Device_()->IsSupportFeature(RtOptionalFeatureType::RT_FEATURE_MODEL_ACL_GRAPH_SOFTWARE_ENABLE)) &&
        (stm->Device_()->CheckFeatureSupport(TS_FEATURE_SOFTWARE_SQ_ENABLE)) &&
        (NpuDriver::CheckIsSupportFeature(device_->Id_(), FEATURE_TRSDRV_SQ_SUPPORT_DYNAMIC_BIND)) &&
        (!Runtime::Instance()->GetConnectUbFlag())) {
        CaptureModel *captureModelTmp = dynamic_cast<CaptureModel *>(captureModel);
        captureModelTmp->SetSoftwareSqEnable();
    }

    std::unique_lock<std::mutex> taskLock(captureLock_);
    error = StreamAddToCaptureModelProc(stm, captureModel, true);
    // ...
    CaptureModeEnter(stm, mode);
    return RT_ERROR_NONE;
}
```

Capture task allocation flow

Key code:

```cpp
// File: src/runtime/feature/aclgraph/stream_capture.cc:76-131
rtError_t Stream::AllocCaptureTaskWithoutLock(tsTaskType_t taskType, uint32_t sqeNum, TaskInfo **task)
{
    Stream *curCaptureStream = GetCaptureStream();
    if (curCaptureStream == nullptr) {
        return RT_ERROR_STREAM_CAPTURE_EXIT;
    }

    // Check whether the SQ depth is sufficient
    if ((curCaptureStream->GetCaptureSqeNum() + CAPTURE_TASK_RESERVED_NUM +
        device_->GetDevProperties().expandStreamRsvTaskNum) > curCaptureStream->GetSqDepth()) {
        // SQ depth is insufficient: create a cascade stream
        Stream *newCaptureStream = nullptr;
        Context * const ctx = Context_();
        rtError_t error = AllocCascadeCaptureStream(newCaptureStream, curCaptureStream);
        // ...
        error = CondStreamActive(newCaptureStream, curCaptureStream);
        // ...
        UpdateCascadeCaptureStreamInfo(newCaptureStream, curCaptureStream);
        curCaptureStream = newCaptureStream;
    }

    // Allocate the task
    rtError_t errCode = RT_ERROR_TASK_NEW;
    if (curCaptureStream->taskResMang_ == nullptr) {
        *task = device_->GetTaskFactory()->Alloc(curCaptureStream, taskType, errCode);
    }
    if (*task != nullptr) {
        curCaptureStream->AddCaptureSqeNum(sqeNum);
        (*task)->stream = curCaptureStream;
        Runtime::Instance()->AllocTaskSn((*task)->taskSn);
        // ...
    }
    return RT_ERROR_NONE;
}
```

Stream capture end flow

Key code:

```cpp
// File: src/runtime/feature/aclgraph/context_aclgraph.cc:401-500
rtError_t Context::StreamEndCapture(Stream * const stm, Model ** const captureMdl)
{
    std::unique_lock<std::mutex> taskLock(captureLock_);
    const rtStreamCaptureStatus status = stm->GetCaptureStatus();

    // Check the capture status
    if (status == RT_STREAM_CAPTURE_STATUS_NONE) {
        return RT_ERROR_STREAM_NOT_CAPTURED;
    }

    Stream *captureStream = stm->GetCaptureStream();
    if (!(captureStream->IsOrigCaptureStream())) {
        return RT_ERROR_STREAM_CAPTURE_UNMATCHED;
    }

    rtError_t error = CheckCaptureStreamThreadIsMatch(stm);
    // ...
    CaptureModeExit(stm);

    Model *captureModel = captureStream->Model_();
    CaptureModel *captureModelTmp = RtPtrToPtr<CaptureModel *, Model *>(captureModel);

    // Check model validity
    error = CheckCaptureModelValidity(captureModel);
    // ...

    // Set up Notify
    error = AddNotifyToAddedCaptureStream(stm, static_cast<CaptureModel *>(captureModelTmp));
    error = SetNotifyForExeModel(captureModelTmp);
    error = captureModelTmp->ResetCaptureEvents(stm);

    // EndGraph is required when Software SQ is not enabled
    if (!captureModelTmp->IsSoftwareSqEnable()) {
        Api * const apiObj = Runtime::Instance()->ApiImpl_();
        error = apiObj->ModelEndGraph(captureModel, captureStream, 0U);
        error = captureModel->LoadComplete();
    }

    stm->ExitCapture();
    *captureMdl = captureModel;
    return RT_ERROR_NONE;
}
```

Model execution flow

Key code:

```cpp
// File: src/runtime/feature/aclgraph/capture_model.cc:178-217
rtError_t CaptureModel::ExecuteCommon(Stream * const stm, int32_t timeout, const uint8_t executeMode)
{
    RT_LOG(RT_LOG_INFO, "capture model execute, model_id=%u!", Id_());

    if (IsCapturing()) {
        RT_LOG(RT_LOG_ERROR, "model is capturing, can't execute, model_id=%u!", Id_());
        return RT_ERROR_MODEL_CAPTURED;
    }
    if (captureModelStatus_ != RtCaptureModelStatus::READY) {
        RT_LOG(RT_LOG_ERROR, "model is not ready, can't execute, model_id=%u, status=%d",
            Id_(), captureModelStatus_);
        return RT_ERROR_MODEL_EXE_FAILED;
    }

    rtError_t error;
    // Set up synchronization before execution
    error = SetNotifyBeforeExecute(stm, this);
    // ...

    // Build SQ/CQ
    error = BuildSqCq(stm);
    // ...
    ReportCacheTrackData();

    if (executeMode == RT_MODEL_CAPTURE_EXECUTE_DEFAULT) {
        error = Model::Execute(stm, timeout);
    } else {
        error = Model::ExecuteAsync(stm);
    }
    // ...

    // Set up synchronization after execution
    error = SetNotifyAfterExecute(stm, this);
    return RT_ERROR_NONE;
}
```

4.2 Core Mechanisms in Detail

CaptureModel (capture model)

Design idea: manage the captured graph structure and support dynamic SQ/CQ binding, Notify synchronization, Event capture, and related functions.

Key code:

```cpp
// File: src/runtime/core/inc/model/capture_model.hpp:42-316
class CaptureModel : public Model {
public:
    explicit CaptureModel(ModelType type = RT_MODEL_CAPTURE_MODEL);
    ~CaptureModel() noexcept override;

    rtError_t Execute(Stream * const stm, int32_t timeout = -1) override;
    rtError_t ExecuteAsync(Stream * const stm) override;
    rtError_t TearDown() override;
    rtError_t AddStreamToCaptureModel(Stream * const stm);

    // State management
    void SetCaptureModelStatus(RtCaptureModelStatus status);
    RtCaptureModelStatus GetCaptureModelStatus() const;
    void TerminateCapture();
    bool IsCaptureReady() const;
    bool IsCapturing() const;
    bool IsCaptureInvalid() const;
    bool CanUpdate() const;

    // SQ/CQ management
    bool IsSoftwareSqEnable(void) const;
    void SetSoftwareSqEnable(void);
    rtError_t BuildSqCq(Stream * const exeStream);
    void DeconstructSqCq(void);
    rtError_t ReleaseSqCq(uint32_t &releaseNum);

    // Notify management
    rtError_t SetNotifyBeforeExecute(Stream * const exeStm, CaptureModel* const captureMdl);
    rtError_t SetNotifyAfterExecute(Stream * const exeStm, CaptureModel* const captureMdl);
    void AddNotify(Notify *notify);
    void AddExeNotify(Notify *notify);

    // Event management
    void InsertCaptureEvent(Event * const event);
    std::set<Event *> GetCaptureEvent() const;
    rtError_t ResetCaptureEvents(Stream * const stm) const;

    // TaskGroup management
    void AddTaskGroupList(std::unique_ptr<TaskGroup> &taskGrp);
    void SetTaskGroupErrCode(const rtError_t errCode);
    const TaskGroup* GetTaskGroup(uint16_t streamId, uint16_t taskId);

    // Update-related
    rtError_t Update(void);
    rtError_t RestoreForSoftwareSq(Device * const dev);

private:
    RtCaptureModelStatus captureModelStatus_{RtCaptureModelStatus::NONE};
    bool isSoftwareSqEnable_{false};
    rtDeviceSqCqInfo_t *sqCqArray_{nullptr};
    uint32_t sqCqNum_{0U};
    uint32_t refCount_{0U};
    std::map<Stream *, std::vector<Stream *>> addStreamMap_;
    std::vector<Notify *> addStreamNotifyList_;
    std::vector<Notify *> executeNotifyList_;
    std::set<Event *> captureEvents_;
    std::vector<std::unique_ptr<TaskGroup>> taskGroupList_;
    // ...
};
```

TaskGroup (task group)

Design idea: record the task sequence issued during capture and support updating those tasks.

```cpp
// File: src/runtime/core/src/stream/stream.hpp:139-143
struct TaskGroup {
    std::vector<std::pair<uint16_t, uint16_t>> taskIds;  // (streamId, taskId)
    bool isUpdate{false};
    uint32_t updateTaskIndex{0};
};
```

Task group operations:

```cpp
// File: src/runtime/feature/aclgraph/context_aclgraph.cc:622-688
rtError_t Context::StreamBeginTaskGrp(Stream * const stm)
{
    // Check the task group status
    const StreamTaskGroupStatus status = stm->GetTaskGroupStatus();
    COND_RETURN_ERROR_MSG_INNER(status != StreamTaskGroupStatus::NONE, RT_ERROR_STREAM_TASKGRP_STATUS,
        "Task group is repeatedly started, or a task group is being updated.");

    Stream *captureStream = stm->GetCaptureStream();
    CaptureModel *mdl = dynamic_cast<CaptureModel *>(captureStream->Model_());

    // Create the task group
    std::unique_ptr<TaskGroup> taskGrp(new (std::nothrow) TaskGroup);
    // ...
    captureStream->UpdateCurrentTaskGroup(taskGrp);
    mdl->InsertTaskGroupStreamId(static_cast<uint16_t>(captureStream->Id_()));
    return RT_ERROR_NONE;
}

rtError_t Context::StreamEndTaskGrp(Stream * const stm, TaskGroup ** const handle) const
{
    Stream * const captureStream = stm->GetCaptureStream();
    CaptureModel *mdl = dynamic_cast<CaptureModel *>(captureStream->Model_());
    std::unique_ptr<TaskGroup> taskGrp = captureStream->GetCurrentTaskGroup();
    rtError_t errorCode = mdl->GetTaskGroupErrCode();

    if ((errorCode != RT_ERROR_NONE) || (mdl->IsCaptureInvalid())) {
        taskGrp.reset();
        *handle = nullptr;
    } else {
        *handle = taskGrp.get();
        mdl->AddTaskGroupList(taskGrp);
    }
    captureStream->ResetTaskGroup();
    // ...
    return errorCode;
}
```

Capture mode management

Design idea: support different synchronization modes for multi-threaded capture.

```cpp
// File: src/runtime/feature/aclgraph/context_aclgraph.cc:573-620
void Context::CaptureModeEnter(Stream * const stm, rtStreamCaptureMode mode)
{
    stm->SetStreamCaptureMode(mode);
    stm->SetBeginCaptureThreadId(runtime::GetCurrentTid());
    captureModeRefNum_[mode]++;
    InnerThreadLocalContainer::ThreadCaptureModeEnter(mode);

    // Update the Context-level capture mode (take the minimum)
    if (mode < GetContextCaptureMode()) {
        SetContextCaptureMode(mode);
    }
}

void Context::CaptureModeExit(Stream * const stm)
{
    const rtStreamCaptureMode streamCaptureMode = stm->GetStreamCaptureMode();
    stm->SetStreamCaptureMode(RT_STREAM_CAPTURE_MODE_MAX);
    stm->SetBeginCaptureThreadId(UINT32_MAX);
    if (captureModeRefNum_[streamCaptureMode] > 0U) {
        captureModeRefNum_[streamCaptureMode]--;
    }
    InnerThreadLocalContainer::ThreadCaptureModeExit(streamCaptureMode);
    // Update the Context-level capture mode according to the reference counts
    // ...
}
```

Event capture mechanism

Design idea: handle Event Record/Wait operations during capture.

```cpp
// File: src/runtime/feature/aclgraph/event_capture.cc:19-90
rtError_t Event::CaptureEventProcess(Stream * const stm)
{
    // Allocate the capture task
    TaskInfo *tsk = stm->AllocTask(&submitTask, TS_TASK_TYPE_EVENT_RECORD, errorReason);
    // ...

    // Allocate the Event address
    error = dev->AllocExpandingPoolEvent(eventAddr, newEventId);
    eventAddr_ = eventAddr;
    eventId_ = newEventId;

    // Initialize the MemWriteValue task
    (void)MemWriteValueTaskInit(tsk, eventAddr, static_cast<uint64_t>(1U));
    tsk->typeName = "EVENT_RECORD";
    tsk->type = TS_TASK_TYPE_CAPTURE_RECORD;
    // ...
    return error;
}

rtError_t Event::CaptureWaitProcess(Stream * const stm)
{
    TaskInfo *tsk = stm->AllocTask(&submitTask, TS_TASK_TYPE_STREAM_WAIT_EVENT, errorReason, MEM_WAIT_SQE_NUM);
    // ...
    tsk->typeName = "EVENT_WAIT";
    tsk->type = TS_TASK_TYPE_CAPTURE_WAIT;
    error = MemWaitValueTaskInit(tsk, eventAddr, 1, 0x0);
    // ...
    return error;
}
```

Software SQ dynamic binding

Design idea: support dynamic binding of SQ/CQ to achieve efficient graph execution.

```cpp
// File: src/runtime/feature/aclgraph/capture_model.cc:471-567
rtError_t CaptureModel::BuildSqCq(Stream * const exeStream)
{
    // Check whether Software SQ is enabled
    COND_PROC(!IsSoftwareSqEnable(), return RT_ERROR_NONE);
    // ...
    const uint32_t streamNum = static_cast<uint32_t>(StreamList_().size());

    // Allocate SQ/CQ resources
    rtError_t error = AllocSqCqProc(streamNum);
    // ...
    sqCqNum_ = streamNum;

    // Allocate SQ addresses
    error = AllocSqAddr();
    // ...

    // Bind SQ/CQ and send the SQEs
    error = BindSqCqAndSendSqe();
    // ...

    // Update the Stream Active tasks
    error = UpdateStreamActiveTaskFuncCallMem();
    refCount_++;
    return RT_ERROR_NONE;
}

rtError_t CaptureModel::BindSqCq(void)
{
    // Update the SQ/CQ information of each stream
    for (auto stm : StreamList_()) {
        stm->UpdateSqCq(&(sqCqArray_[index]));
        switchInfo_[index].stream_id = static_cast<uint32_t>(stm->Id_());
        switchInfo_[index].sq_id = stm->GetSqId();
        switchInfo_[index].sq_depth = stm->GetSqDepth();
        // ...
    }

    // Switch the streams to their SQs in one batch
    error = dev->Driver_()->SqSwitchStreamBatch(dev->Id_(), switchInfo_, sqCqNum_);
    return error;
}
```

4.3 Module Responsibilities

| Module | Responsibility | Location |
| --- | --- | --- |
| CaptureModel | Capture model management, SQ/CQ management, execution scheduling | core/inc/model/capture_model.hpp |
| Context | Capture flow control, capture mode management | feature/aclgraph/context_aclgraph.cc |
| Stream | Capture state management, task allocation, cascade stream management | feature/aclgraph/stream_capture.cc |
| Event | Event capture handling | feature/aclgraph/event_capture.cc |
| CaptureModelUtils | Helper functions (checks, getting the capture stream, and so on) | feature/aclgraph/capture_model_utils.cc |
| Notify | Synchronization before and after execution | capture_model.cc |

4.4 Core Data Structures
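To make the relationship between a capture model and its task groups more concrete, the following sketch mirrors the `TaskGroup` fields quoted in section 4.2 and the ownership hand-off seen in `StreamEndTaskGrp`, where the caller receives a raw handle and the model keeps the `unique_ptr`. `ToyModel`, `AddTaskGroup` and `DemoTaskGroupLookup` are hypothetical names, and the linear search used for the lookup is an assumption; the source does not show how `GetTaskGroup` is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Same fields as the TaskGroup struct quoted in section 4.2.
struct TaskGroup {
    std::vector<std::pair<uint16_t, uint16_t>> taskIds;  // (streamId, taskId)
    bool isUpdate{false};
    uint32_t updateTaskIndex{0};
};

// Toy owner standing in for CaptureModel (hypothetical); only the task-group list is modelled.
class ToyModel {
public:
    // Takes ownership of the group and returns a non-owning handle.
    // Returns nullptr if no group was passed in.
    TaskGroup *AddTaskGroup(std::unique_ptr<TaskGroup> group)
    {
        if (group == nullptr) {
            return nullptr;
        }
        TaskGroup *handle = group.get();
        taskGroupList_.push_back(std::move(group));
        return handle;
    }

    // Assumed lookup strategy: linear search for the group that contains the task.
    const TaskGroup *GetTaskGroup(uint16_t streamId, uint16_t taskId) const
    {
        for (const auto &group : taskGroupList_) {
            for (const auto &ids : group->taskIds) {
                if (ids.first == streamId && ids.second == taskId) {
                    return group.get();
                }
            }
        }
        return nullptr;
    }

private:
    std::vector<std::unique_ptr<TaskGroup>> taskGroupList_;
};

// Registers one group with two tasks on stream 3 and checks both a hit and a miss.
bool DemoTaskGroupLookup()
{
    ToyModel model;
    auto group = std::make_unique<TaskGroup>();
    group->taskIds.emplace_back(uint16_t{3}, uint16_t{10});
    group->taskIds.emplace_back(uint16_t{3}, uint16_t{11});
    TaskGroup *handle = model.AddTaskGroup(std::move(group));
    assert(handle != nullptr);
    return (model.GetTaskGroup(3, 11) == handle) && (model.GetTaskGroup(3, 12) == nullptr);
}
```

The point of the design is that the handle stays valid for as long as the owning model is alive, which is why a failed capture in `StreamEndTaskGrp` resets the group and returns a null handle instead.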
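5. Key Design Ideas

5.1 Separation of Capture and Execution

- Capture phase: tasks are recorded on the CaptureStream and are not executed immediately.
- Build phase: the executable graph structure is built at EndCapture.
- Execution phase: BuildSqCq dynamically binds SQ/CQ and submits the optimized execution tasks.

5.2 Cascade Capture Support

When the SQ depth of the original capture stream is insufficient, a cascade stream is created automatically to continue the capture:

```cpp
// SQ depth check
if ((curCaptureStream->GetCaptureSqeNum() + reserved) > curCaptureStream->GetSqDepth()) {
    // Create a cascade stream
    AllocCascadeCaptureStream(newCaptureStream, curCaptureStream);
    // Connect the cascade stream with a Stream Active task
    CondStreamActive(newCaptureStream, curCaptureStream);
    // Update the capture stream information
    UpdateCascadeCaptureStreamInfo(newCaptureStream, curCaptureStream);
}
```

5.3 Software SQ Dynamic Binding

- SQ/CQ can be allocated and bound dynamically.
- BuildSqCq is called at execution time and ReleaseSqCq after completion.
- Batch stream switching is implemented through SqSwitchStreamBatch.

5.4 Notify Synchronization Mechanism

At execution time, Notify is used to synchronize with the AddStream:

```cpp
// Pre-execution synchronization: wait for the AddStream to finish its current task
SetNotifyBeforeExecute(exeStream, captureModel);
// NotifyRecord(addStream) -> NotifyWait(exeStream)

// Post-execution synchronization: notify the AddStream to continue
SetNotifyAfterExecute(exeStream, captureModel);
// NotifyRecord(exeStream) -> NotifyWait(addStream)
```

5.5 Capture Mode Control

| Mode | Description | Applicable scenario |
| --- | --- | --- |
| GLOBAL | All threads share the capture state | Single-threaded capture |
| THREAD_LOCAL | Only the current thread may operate | Independent multi-threaded capture |
| RELAXED | Other threads are allowed to operate | Cooperative multi-threaded capture |

6. Key File Index

| Module | File path | Core content |
| --- | --- | --- |
| Capture model | src/runtime/core/inc/model/capture_model.hpp | CaptureModel class definition |
| Capture model implementation | src/runtime/feature/aclgraph/capture_model.cc | CaptureModel implementation |
| Context capture | src/runtime/feature/aclgraph/context_aclgraph.cc | BeginCapture/EndCapture flows |
| Stream capture | src/runtime/feature/aclgraph/stream_capture.cc | AllocCaptureTask, cascade stream management |
| Event capture | src/runtime/feature/aclgraph/event_capture.cc | Event capture handling |
| Capture utilities | src/runtime/feature/aclgraph/capture_model_utils.cc | Helper functions |
| Model printing | src/runtime/feature/aclgraph/model_aclgraph.cc | DebugDotPrint, JsonPrint |
| API interface | src/runtime/api/api_c_standard_soc.cc:674-695 | rtStreamBeginCapture/EndCapture |
| v100 adaptation | src/runtime/feature/aclgraph/v100/ | v100 chip adaptation |
| v200 adaptation | src/runtime/feature/aclgraph/v200/ | v200 chip adaptation |

7. Compatibility and Extensibility

7.1 Chip Adaptation

- v100 adaptation: the feature/aclgraph/v100/ directory
- v200 adaptation: the feature/aclgraph/v200/ directory
- Adaptation to different chips is implemented through the CaptureAdapt class.

7.2 State Transitions

7.3 Extension Capabilities

- Cascade stream expansion: unlimited cascade streams are supported to extend the capture depth.
- TaskGroup update: task parameters can be updated after capture.
- Model update: the capture model can be updated dynamically.

This feature document is based on an analysis of the source code under src/runtime/feature/aclgraph/ and of src/runtime/core/inc/model/capture_model.hpp.

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.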
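As a supplementary illustration of the capture mode control in section 5.5, the reference counting described for `CaptureModeEnter` and `CaptureModeExit` can be sketched as below. `ToyContextModes` and the `kMode*` constants are hypothetical names. The enum values follow the capture mode definitions in section 2.3, but two points are assumptions rather than facts from the source: that the context-level mode starts at the MAX value, and that on exit it is recomputed as the smallest mode that still has a non-zero reference count (the source elides that step).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Numeric values follow the rtStreamCaptureMode enum; the names are hypothetical.
enum ToyCaptureMode : uint32_t {
    kModeGlobal = 0,
    kModeThreadLocal = 1,
    kModeRelaxed = 2,
    kModeMax = 3
};

class ToyContextModes {
public:
    // Counts the new capture and lowers the context-level mode if needed (take the minimum).
    void Enter(ToyCaptureMode mode)
    {
        assert(mode < kModeMax);
        refNum_[mode]++;
        if (mode < contextMode_) {
            contextMode_ = mode;
        }
    }

    // Drops one reference and recomputes the context-level mode.
    void Exit(ToyCaptureMode mode)
    {
        assert(mode < kModeMax);
        if (refNum_[mode] > 0U) {
            refNum_[mode]--;
        }
        // Assumption: pick the smallest mode that is still referenced, or kModeMax if none is.
        contextMode_ = kModeMax;
        for (uint32_t m = 0U; m < kModeMax; ++m) {
            if (refNum_[m] > 0U) {
                contextMode_ = static_cast<ToyCaptureMode>(m);
                break;
            }
        }
    }

    ToyCaptureMode ContextMode() const { return contextMode_; }

private:
    std::array<uint32_t, kModeMax> refNum_{};  // one counter per mode, zero-initialized
    ToyCaptureMode contextMode_ = kModeMax;    // assumption: no capture active initially
};
```

With this bookkeeping, entering a RELAXED capture and then a GLOBAL capture leaves the context in GLOBAL mode, and ending the GLOBAL capture brings it back to RELAXED. Thread-local state, which the real code also updates, is left out of the sketch.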