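YOLOv26 Improvements | Attention Mechanisms | Adding ACmix, a Hybrid of Self-Attention and Convolution, to Improve Feature Recognition Efficiency (Includes a Secondary-Innovation C2PSA Variant)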

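Before we start, a word about my column. Its content covers classification, detection, segmentation, tracking and keypoint detection. The column is currently on a limited-time discount and subscriptions are welcome. It is updated with 5-7 new mechanisms every week, and subscribers also get all of my modified files plus access to a discussion group, where I regularly share methods and experience for publishing papers.

1. Introduction

The improvement presented in this article is an improved version of the ACmix self-attention mechanism. ACmix rests on the observation that most of the computation in both conventional convolution and self-attention can be carried out by 1x1 convolutions, so the two can share one projection stage and differ only in how they aggregate the projected features. This article includes a secondary-innovation C2PSA variant and exclusive network structure diagrams.

Column link: YOLOv26 Effective Accuracy-Gain Column (covers Conv, attention mechanisms, backbone, loss functions, optimizers, post-processing and other improvements)

Contents
1. Introduction
2. ACmix Framework and Principles
2.1 Basic Principles of ACmix
2.1.1 Integrating Self-Attention and Convolution
2.1.2 Operation Decomposition and Reconstruction
3. ACmix Core Code
4. Step-by-Step: Adding ACmix
4.1 Modification 1
4.2 Modification 2
4.3 Modification 3
4.4 Modification 4
4.5 Modification 5
4.6 Modification 6
5. Training
5.1 yaml Files
5.1.1 yaml File 1
5.1.2 yaml File 2
5.2 Training Code
5.3 Training Screenshot
6. Summary

2. ACmix Framework and Principles

Official paper: see the link in the original post.
Official code: see the link in the original post.

2.1 Basic Principles of ACmix

ACmix is a hybrid model that combines the strengths of self-attention and convolution. Its core idea is that most of the computation in both conventional convolution and self-attention modules can be implemented with 1x1 convolutions. ACmix first projects the input feature map with 1x1 convolutions to produce a set of intermediate features, then reuses and aggregates those features under two paradigms, self-attention and convolution. In this way ACmix benefits from the wide-range perception of self-attention while still capturing local features through convolution, improving model performance at a low computational cost.

The main mechanisms of ACmix come down to two points:
1. Integrating self-attention and convolution: the two techniques are fused so that their advantages are combined.
2. Operation decomposition and reconstruction: the operations inside self-attention and convolution are decomposed and rebuilt in 1x1-convolution form, which improves computational efficiency.

2.1.1 Integrating Self-Attention and Convolution

According to the paper, the integration is achieved as follows:
Feature decomposition: the query, key and value of self-attention, as well as the convolution operation, are decomposed into features by 1x1 convolutions.
Shared computation: convolution and self-attention share the same 1x1 convolutions, which removes duplicated computation.
Feature fusion: the features produced by the convolution path and the self-attention path are fused by summation, strengthening feature extraction.
Modular design: ACmix is a self-contained module that can be embedded into different network architectures to strengthen their representational capacity.

The accompanying figure illustrates the main concept of ACmix by comparing the structure and computational complexity of convolution, self-attention and ACmix:
(a) Convolution: a standard convolution, drawn as k² 1x1 convolutions (k being the kernel size) followed by the convolution's aggregation step.
(b) Self-attention: three 1x1 convolutions (the query, key and value projections) followed by self-attention aggregation.
(c) ACmix (the proposed method): combines convolution and self-attention aggregation. The 1x1 convolutions are shared between the two paths, which reduces computational overhead, and both aggregation operations are lightweight.
Overall, ACmix shares the expensive part (the 1x1 convolutions) and combines two different aggregation operations, keeping the computational complexity over feature channels low.

2.1.2 Operation Decomposition and Reconstruction

In ACmix, operation decomposition and reconstruction means splitting the conventional convolution and self-attention computations apart and rebuilding them in a more efficient form. This is done in the following steps:
Decomposing convolution and self-attention: a standard k x k kernel is decomposed into multiple 1x1 kernels, one per kernel position, and the generation of query, key and value in self-attention is likewise expressed as 1x1 convolutions.
Reconstructing a hybrid module: the decomposed convolution and self-attention operations are rebuilt into one unified hybrid module that has both the spatial feature extraction of convolution and the information aggregation of self-attention.
Improving efficiency: decomposition and reconstruction remove redundant computation, which improves efficiency and lowers model complexity.

The accompanying figure shows the structure of the hybrid module proposed by ACmix. It contains:
(a) Convolution: a 3x3 convolution decomposed into 1x1 convolutions, showing how the feature maps are transformed.
(b) Self-attention: the input features are first transformed into query, key and value using 1x1 convolutions, and attention weights are computed by similarity matching.
(c)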
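ACmix combines the characteristics of (a) and (b): in the first stage it projects the input feature map with three 1x1 convolutions, and in the second stage it adds the features obtained from the two paths to form the final output. The right-hand side of the figure shows the flow of the ACmix module, emphasizing the fusion of the two mechanisms and giving the computational complexity of each operation block.

3. ACmix Core Code

See the next section for how to use it.

import torch
import torch.nn as nn

__all__ = ['ACmix', 'C2PSA_ACmix']


def position(H, W, type, is_cuda=True):
    # Build a (1, 2, H, W) grid of normalized x/y coordinates in [-1, 1].
    if is_cuda:
        loc_w = torch.linspace(-1.0, 1.0, W).cuda().unsqueeze(0).repeat(H, 1).to(type)
        loc_h = torch.linspace(-1.0, 1.0, H).cuda().unsqueeze(1).repeat(1, W).to(type)
    else:
        loc_w = torch.linspace(-1.0, 1.0, W).unsqueeze(0).repeat(H, 1)
        loc_h = torch.linspace(-1.0, 1.0, H).unsqueeze(1).repeat(1, W)
    loc = torch.cat([loc_w.unsqueeze(0), loc_h.unsqueeze(0)], 0).unsqueeze(0)
    return loc


def stride(x, stride):
    # Subsample the spatial dimensions by the given stride.
    b, c, h, w = x.shape
    return x[:, :, ::stride, ::stride]


def init_rate_half(tensor):
    if tensor is not None:
        tensor.data.fill_(0.5)


def init_rate_0(tensor):
    if tensor is not None:
        tensor.data.fill_(0.)


class ACmix(nn.Module):
    def __init__(self, in_planes, kernel_att=7, head=4, kernel_conv=3, stride=1, dilation=1):
        super(ACmix, self).__init__()
        if head == 0:  # guard against head == 0 to avoid errors
            head = 1  # fall back to a single head when head is 0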
        out_planes = in_planes
        self.in_planes = in_planes
        self.out_planes = out_planes
        self.head = head
        self.kernel_att = kernel_att
        self.kernel_conv = kernel_conv
        self.stride = stride
        self.dilation = dilation
        # Learnable weights for mixing the attention path and the convolution path.
        self.rate1 = torch.nn.Parameter(torch.Tensor(1))
        self.rate2 = torch.nn.Parameter(torch.Tensor(1))
        self.head_dim = self.out_planes // self.head

        # Shared 1x1 projections (q, k, v) plus the positional-encoding projection.
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv3 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv_p = nn.Conv2d(2, self.head_dim, kernel_size=1)

        self.padding_att = (self.dilation * (self.kernel_att - 1) + 1) // 2
        self.pad_att = torch.nn.ReflectionPad2d(self.padding_att)
        self.unfold = nn.Unfold(kernel_size=self.kernel_att, padding=0, stride=self.stride)
        self.softmax = torch.nn.Softmax(dim=1)

        self.fc = nn.Conv2d(3 * self.head, self.kernel_conv * self.kernel_conv, kernel_size=1, bias=False)
        self.dep_conv = nn.Conv2d(self.kernel_conv * self.kernel_conv * self.head_dim, out_planes,
                                  kernel_size=self.kernel_conv, bias=True, groups=self.head_dim, padding=1,
                                  stride=stride)

        self.reset_parameters()

    def reset_parameters(self):
        init_rate_half(self.rate1)
        init_rate_half(self.rate2)
        # One-hot kernels: channel i of each group picks out kernel position i (a pure shift).
        kernel = torch.zeros(self.kernel_conv * self.kernel_conv, self.kernel_conv, self.kernel_conv)
        for i in range(self.kernel_conv * self.kernel_conv):
            kernel[i, i // self.kernel_conv, i % self.kernel_conv] = 1.
        kernel = kernel.squeeze(0).repeat(self.out_planes, 1, 1, 1)
        self.dep_conv.weight = nn.Parameter(data=kernel, requires_grad=True)
        # init_rate_0 fills in place and returns None, so this leaves dep_conv without a bias.
        self.dep_conv.bias = init_rate_0(self.dep_conv.bias)

    def forward(self, x):
        q, k, v = self.conv1(x), self.conv2(x), self.conv3(x)
        scaling = float(self.head_dim) ** -0.5
        b, c, h, w = q.shape
        h_out, w_out = h // self.stride, w // self.stride

        # ### att
        # ## positional encoding
        pe = self.conv_p(position(h, w, x.dtype, x.is_cuda))

        q_att = q.view(b * self.head, self.head_dim, h, w) * scaling
        k_att = k.view(b * self.head, self.head_dim, h, w)
        v_att = v.view(b * self.head, self.head_dim, h, w)

        if self.stride > 1:
            q_att = stride(q_att, self.stride)
            q_pe = stride(pe, self.stride)
        else:
            q_pe = pe

        unfold_k = self.unfold(self.pad_att(k_att)).view(b * self.head, self.head_dim,
                                                         self.kernel_att * self.kernel_att, h_out,
                                                         w_out)  # b*head, head_dim, k_att^2, h_out, w_out
        unfold_rpe = self.unfold(self.pad_att(pe)).view(1, self.head_dim, self.kernel_att * self.kernel_att, h_out,
                                                        w_out)  # 1, head_dim, k_att^2, h_out, w_out

        # (b*head, head_dim, 1, h_out, w_out) * (b*head, head_dim, k_att^2, h_out, w_out)
        # -> (b*head, k_att^2, h_out, w_out)
        att = (q_att.unsqueeze(2) * (unfold_k + q_pe.unsqueeze(2) - unfold_rpe)).sum(1)
        att = self.softmax(att)

        out_att = self.unfold(self.pad_att(v_att)).view(b * self.head, self.head_dim,
                                                        self.kernel_att * self.kernel_att, h_out, w_out)
        out_att = (att.unsqueeze(1) * out_att).sum(2).view(b, self.out_planes, h_out, w_out)

        # ## conv
        f_all = self.fc(torch.cat(
            [q.view(b, self.head, self.head_dim, h * w), k.view(b, self.head, self.head_dim, h * w),
             v.view(b, self.head, self.head_dim, h * w)], 1))
        f_conv = f_all.permute(0, 2, 1, 3).reshape(x.shape[0], -1, x.shape[-2], x.shape[-1])

        out_conv = self.dep_conv(f_conv)

        # Weighted sum of the attention path and the convolution path.
        return self.rate1 * out_att + self.rate2 * out_conv


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only, skipping batch normalization (used once BN is fused)."""
        return self.act(self.conv(x))


class PSABlock(nn.Module):
    """
    PSABlock class implementing a Position-Sensitive Attention block for neural networks.

    This class encapsulates the functionality for applying attention and feed-forward neural network layers
    with optional shortcut connections. Here the attention module is ACmix.

    Attributes:
        attn (ACmix): Attention module.
        ffn (nn.Sequential): Feed-forward neural network module.
        add (bool): Flag indicating whether to add shortcut connections.

    Methods:
        forward: Performs a forward pass through the PSABlock, applying attention and feed-forward layers.
    """

    def __init__(self, c, attn_ratio=0.5, num_heads=4, shortcut=True) -> None:
        """Initializes the PSABlock with attention and feed-forward layers for enhanced feature extraction."""
        super().__init__()

        self.attn = ACmix(c, head=num_heads)
        self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False))
        self.add = shortcut

    def forward(self, x):
        """Executes a forward pass through PSABlock, applying attention and feed-forward layers to the input tensor."""
        x = x + self.attn(x) if self.add else self.attn(x)
        x = x + self.ffn(x) if self.add else self.ffn(x)
        return x


class C2PSA_ACmix(nn.Module):
    """
    C2PSA module with attention mechanism for enhanced feature extraction and processing.

    This module implements a convolutional block with attention mechanisms to enhance feature extraction and
    processing capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations.

    Attributes:
        c (int): Number of hidden channels.
        cv1 (Conv): 1x1 convolution layer to reduce the number of input channels to 2*c.
        cv2 (Conv): 1x1 convolution layer to reduce the number of output channels to c.
        m (nn.Sequential): Sequential container of PSABlock modules for attention and feed-forward operations.

    Methods:
        forward: Performs a forward pass through the C2PSA module, applying attention and feed-forward operations.

    Notes:
        This module essentially is the same as PSA module, but refactored to allow stacking more PSABlock modules.
    """

    def __init__(self, c1, c2, n=1, e=0.5):
        """Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
        super().__init__()
        assert c1 == c2
        self.c = int(c1 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv(2 * self.c, c1, 1)

        # self.c // 64 is 0 when self.c < 64; ACmix then falls back to a single head.
        self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n)))

    def forward(self, x):
        """Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor."""
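        a, b = self.cv1(x).split((self.c, self.c), dim=1)
        b = self.m(b)
        return self.cv2(torch.cat((a, b), 1))


if __name__ == "__main__":
    # Generate a sample input
    image_size = (1, 64, 240, 240)
    image = torch.rand(*image_size)

    # Build the module and check the output shape
    model = C2PSA_ACmix(64, 64)

    out = model(image)
    print(out.size())

4. Step-by-Step: Adding ACmix

If you are not comfortable with the steps below or would rather skip them, you can contact the author for the source code of the project with all of this column's modifications already added, which can be trained directly.

4.1 Modification 1

First, create the files. Under the ultralytics/nn folder, create a directory named Addmodules.

4.2 Modification 2

Inside the Addmodules folder, create a new .py file and paste in the core code from Section 3 of this article.

4.3 Modification 3

In the same directory, create another .py file named __init__.py and import our file inside it, as shown in the figure.

4.4 Modification 4

Open ultralytics/nn/tasks.py, then import and register our module. (This only needs to be done once. If you also use my other improvements, you do not need to repeat this step.)

4.5 Modification 5

Inside the parse_model function in ultralytics/nn/tasks.py, at around line 1500, add the module at the position shown in the figure. This step requires some judgment on your part; if you are unsure, contact the author for a video tutorial.

4.6 Modification 6

Inside the parse_model function in ultralytics/nn/tasks.py, at around line 1550, add the following at the position shown in the figure. Make sure the position and indentation match exactly, otherwise errors are very likely.

        elif m in {fill in the name of this article's module here}:
            c2 = ch[f]
            args = [c2, *args]

5. Training

That completes the modifications. You can copy the yaml files below and run them. For more usage options, contact the author for a video; this article only lists the common ones.

5.1 yaml Files

5.1.1 yaml File 1

Training info: YOLO26-Att-ACmix summary: 269 layers, 2,523,962 parameters, 2,523,962 gradients, 6.0 GFLOPs

# Ultralytics AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e.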
  # 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)

  - [16, 1, ACmix, []] # 23
  # - [19, 1, ACmix, []] # 24
  # - [22, 1, ACmix, []] # 25
  # Usage note: of the three attention layers above, only layer 23 is currently enabled.
  # To use layer 24, uncomment it and change 19 to 24 in the Detect head below.
  # Likewise, to use layer 25, uncomment it and change 22 to 25 in the Detect head below.
  # This is somewhat involved; if you get stuck, contact the blogger Snu77 for a video tutorial.

  - [[23, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

5.1.2 yaml File 2

Training info: YOLO26-C2PSA-ACmix summary: 261 layers, 2,514,964 parameters, 2,514,964 gradients, 5.8 GFLOPs

# Ultralytics AGPL-3.0 License -
# https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA_ACmix, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] #
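  # Detect(P3, P4, P5)

5.2 Training Code

Create a .py file, paste in the code below, set your own file paths, and run it.

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('path to the model config file, i.e. where you saved the yaml from Section 5.1')
    # To switch model scale, change the yaml name above: yolo26s.yaml selects the 26s model.
    # Likewise, if a modified config is named yolo26-XXX.yaml, pass yolo26l-XXX.yaml to use the l scale.
    # Only the name passed to YOLO(...) changes; the config file on disk is not renamed.
    # model.load('yolo26n.pt')  # optionally load pretrained weights; not recommended for research, since it makes accuracy gains hard to obtain
    model.train(
        data=r'path to the dataset file',
        # For other tasks, edit `task` in ultralytics/cfg/default.yaml: detect, segment, classify or pose
        cache=False,
        imgsz=640,
        epochs=20,
        single_cls=False,  # whether this is single-class detection
        batch=16,
        close_mosaic=0,
        workers=0,
        device='0',
        optimizer='MuSGD',  # using SGD/MuSGD
        # resume='',  # path to last.pt goes here
        amp=True,  # if the training loss becomes NaN, try turning amp off
        project='runs/train',
        name='exp',
    )

5.3 Training Screenshot

[Training screenshot]

6. Summary

That concludes the main content of this article. I would like to recommend my YOLOv26 Effective Accuracy-Gain Column. The column is newly launched and currently has an average quality score of 98. Going forward I will reproduce papers from the latest top conferences and also fill in some older improvement mechanisms. If this article helped you, please subscribe to the column and follow along for further updates.

Column link: YOLOv26 Effective Accuracy-Gain Column (covers Conv, attention mechanisms, backbone, loss functions, optimizers, post-processing and other improvements)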
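Appendix: a small check of the decomposition idea from Section 2.1.2

As a footnote to Section 2.1.2, the sketch below illustrates the identity that the convolution path relies on: a 3x3 convolution gives the same result as nine 1x1 projections whose outputs are shifted by their kernel offsets and summed. It is not the ACmix implementation. It assumes a single-channel input, stride 1 and zero padding, and uses plain Python lists so that it runs without PyTorch. The function names `conv3x3_direct` and `conv3x3_shift_sum`, as well as the sample image and kernel, are hypothetical and exist only for this illustration.

```python
def conv3x3_direct(img, kernel):
    """Plain 3x3 cross-correlation with zero padding and stride 1."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(3):
                for q in range(3):
                    y, x = i + p - 1, j + q - 1
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += kernel[p][q] * img[y][x]
            out[i][j] = acc
    return out


def conv3x3_shift_sum(img, kernel):
    """Same result in two stages: nine 1x1 projections, then shift and sum."""
    h, w = len(img), len(img[0])
    # Stage 1: one 1x1 "convolution" per kernel position. With a single
    # channel, a 1x1 convolution is just a scalar multiply of the whole map.
    projected = {
        (p, q): [[kernel[p][q] * v for v in row] for row in img]
        for p in range(3)
        for q in range(3)
    }
    # Stage 2: shift each projected map by its kernel offset and accumulate.
    out = [[0.0] * w for _ in range(h)]
    for (p, q), fmap in projected.items():
        for i in range(h):
            for j in range(w):
                y, x = i + p - 1, j + q - 1
                if 0 <= y < h and 0 <= x < w:
                    out[i][j] += fmap[y][x]
    return out


# Hypothetical sample data, chosen only for the demonstration.
img = [[1.0, 2.0, 3.0, 4.0],
       [5.0, 6.0, 7.0, 8.0],
       [9.0, 10.0, 11.0, 12.0]]
kernel = [[0.5, -1.0, 2.0],
          [1.5, 3.0, -2.0],
          [0.0, 1.0, -0.5]]

direct = conv3x3_direct(img, kernel)
staged = conv3x3_shift_sum(img, kernel)
max_diff = max(abs(a - b) for ra, rb in zip(direct, staged) for a, b in zip(ra, rb))
print(max_diff < 1e-9)
```

Running the script should print True. In the ACmix code of Section 3, the role of stage 1 is played by the shared 1x1 projections together with `fc`, and the role of stage 2 by `dep_conv`, whose weights `reset_parameters` initializes to one-hot shift kernels.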