SSD Algorithm Study and PyTorch Code Analysis [1] - Overall Framework

    xiaoxiao  2022-06-30

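    SSD (Single Shot MultiBox Detector) is a representative one-stage object detection algorithm: it is fast while remaining reasonably accurate. Here we study it by analyzing an SSD PyTorch implementation. This post walks through the overall SSD network to give a rough picture of its structure. Some output-size formulas used along the way: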

    Output size (standard, rounding down): $o = \left\lfloor \frac{i - k + 2p}{s} \right\rfloor + 1$

    Output size (ceil_mode, rounding up; in PyTorch this option belongs to pooling layers such as MaxPool2d): $o = \left\lceil \frac{i - k + 2p}{s} \right\rceil + 1$

    Output size of a dilated convolution with dilation $d$: $o = \left\lfloor \frac{i - k + 2p - (k - 1)(d - 1)}{s} \right\rfloor + 1$

    Here $i$ is the input size, $k$ the kernel size, $p$ the padding, $s$ the stride, and $d$ the dilation rate (number of gaps between adjacent kernel elements + 1).
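The three formulas above can be folded into one helper and checked against the sizes that appear later in this post. This is a minimal sketch in plain Python; the function name `out_size` and its parameter names are assumptions made for illustration and are not part of the SSD code. It also ignores the extra boundary rule PyTorch applies to pooling windows in ceil_mode, which does not affect the layers discussed here.

```python
import math

def out_size(i, k, s=1, p=0, d=1, ceil_mode=False):
    """Spatial output size of a conv or pooling layer.

    i: input size, k: kernel size, s: stride, p: padding,
    d: dilation (gaps + 1), ceil_mode: round up instead of down.
    """
    effective_k = k + (k - 1) * (d - 1)      # kernel extent after dilation
    span = (i + 2 * p - effective_k) / s
    rounded = math.ceil(span) if ceil_mode else math.floor(span)
    return rounded + 1

print(out_size(300, 2, s=2))                  # 150 (2x2 max pool, stride 2)
print(out_size(75, 2, s=2, ceil_mode=True))   # 38  (same pool with ceil_mode)
print(out_size(19, 3, p=6, d=6))              # 19  (3x3 conv, dilation 6)
```

With `d=1` the effective kernel equals `k`, so the dilated formula reduces to the standard one.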

    1. VGG Part {conv1_2, conv2_2, conv3_3, conv4_3, conv5_3, fc6(conv6), fc7(conv7)}

    # Input image size (C, H, W)
    input_size: (3, 300, 300)

    # conv1_2
    Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    # Size after the conv1_2 block and how it is computed; every image_size line below follows the same pattern
    image_size: (300-2+2*0)/2+1=150    (64, 150, 150)

    # conv2_2
    Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    image_size: (150-2+2*0)/2+1=75    (128, 75, 75)

    # conv3_3
    Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    image_size: ceil[(75-2+2*0)/2+1]=38    (256, 38, 38)

    # conv4_3
    Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))  #-->
    ReLU(inplace)
    MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    image_size: (38-2+2*0)/2+1=19    (512, 19, 19)

    # conv5_3
    Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ReLU(inplace)
    MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
    image_size: (19-3+2*1)/1+1=19    (512, 19, 19)

    # conv6, dilated convolution
    Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6))
    ReLU(inplace)
    image_size: (19-3+2*6-(3-1)*(6-1))/1+1=19    (1024, 19, 19)

    # conv7
    Conv2d(1024, 1024, kernel_size=(1, 1), stride=(1, 1))  #-->
    ReLU(inplace)
    image_size: (19-1+2*0)/1+1=19    (1024, 19, 19)
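The size bookkeeping for the VGG part can be replayed in a few lines. The sketch below is plain Python rather than PyTorch; the helper `out_size`, the table `VGG_LAYERS`, and the labels `pool1` to `pool5` are hypothetical names chosen for this example. The 3x3 convolutions with stride 1 and padding 1 keep the spatial size unchanged, so only the pooling layers, conv6, and conv7 are listed.

```python
import math

def out_size(i, k, s=1, p=0, d=1, ceil_mode=False):
    # Effective kernel extent grows with dilation d.
    effective_k = k + (k - 1) * (d - 1)
    span = (i + 2 * p - effective_k) / s
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1

# (label, kernel, stride, padding, dilation, ceil_mode)
VGG_LAYERS = [
    ("pool1", 2, 2, 0, 1, False),
    ("pool2", 2, 2, 0, 1, False),
    ("pool3", 2, 2, 0, 1, True),    # ceil_mode turns 37 into 38
    ("pool4", 2, 2, 0, 1, False),
    ("pool5", 3, 1, 1, 1, False),   # stride 1 keeps 19x19
    ("conv6", 3, 1, 6, 6, False),   # dilated, padding 6 keeps 19x19
    ("conv7", 1, 1, 0, 1, False),
]

def trace(size, layers):
    """Return [(label, output_size), ...] for a square input of `size`."""
    sizes = []
    for label, k, s, p, d, ceil_mode in layers:
        size = out_size(size, k, s, p, d, ceil_mode)
        sizes.append((label, size))
    return sizes

vgg_sizes = trace(300, VGG_LAYERS)
for label, size in vgg_sizes:
    print(f"{label}: {size}x{size}")
```

The conv4_3 feature map marked with #--> is taken before pool4, so its spatial size is the 38x38 produced by pool3.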

    2. Extra Feature Layers {conv8_2, conv9_2, conv10_2, conv11_2}

    input_size: (19, 19)

    # conv8_2
    Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))  #-->
    image_size: (19-3+2*1)/2+1=10    (10, 10)

    # conv9_2
    Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
    Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))  #-->
    image_size: floor[(10-3+2*1)/2]+1=5    (5, 5)

    # conv10_2
    Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
    Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1))  #-->
    image_size: (5-3+2*0)/1+1=3    (3, 3)

    # conv11_2
    Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
    Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1))  #-->
    image_size: (3-3+2*0)/1+1=1    (1, 1)

    Here #--> marks a layer whose output is fed to the detection layers, which perform localization and confidence classification.
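The extra layers can be traced the same way, which also gives the spatial sizes of all six feature maps marked with #-->. This is a sketch in plain Python; `conv_out`, `EXTRAS`, and the `_1` labels for the 1x1 convolutions are names assumed for this example.

```python
def conv_out(i, k, s=1, p=0):
    """Standard (floor) convolution output size."""
    return (i + 2 * p - k) // s + 1

# (label, kernel, stride, padding), in the order listed above
EXTRAS = [
    ("conv8_1", 1, 1, 0), ("conv8_2", 3, 2, 1),
    ("conv9_1", 1, 1, 0), ("conv9_2", 3, 2, 1),
    ("conv10_1", 1, 1, 0), ("conv10_2", 3, 1, 0),
    ("conv11_1", 1, 1, 0), ("conv11_2", 3, 1, 0),
]

size = 19                     # output size of conv7
extra_sizes = {}
for label, k, s, p in EXTRAS:
    size = conv_out(size, k, s, p)
    extra_sizes[label] = size

# conv4_3 (38) and conv7 (19) come from the VGG part.
source_sizes = [38, 19] + [
    extra_sizes[name] for name in ("conv8_2", "conv9_2", "conv10_2", "conv11_2")
]
print(source_sizes)           # [38, 19, 10, 5, 3, 1]
```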

    3. Loc Layer

    Conv2d(512, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(1024, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(512, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(256, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(256, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(256, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
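The output channel counts of the loc layers look like the number of default boxes per feature-map cell times 4 box offsets. The sketch below works under that reading; the per-cell box counts `[4, 6, 6, 6, 4, 4]` are inferred from the listing (16 = 4 * 4, 24 = 6 * 4) rather than quoted from the code, and all names are assumptions for illustration.

```python
# Values inferred from the loc layer listing above.
SOURCE_CHANNELS = [512, 1024, 512, 256, 256, 256]   # in_channels of each loc conv
BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]             # assumed default boxes per cell
COORDS_PER_BOX = 4                                  # four offsets per box

def loc_out_channels(boxes_per_location):
    """Output channels of each loc conv: boxes per cell * 4 offsets."""
    return [n * COORDS_PER_BOX for n in boxes_per_location]

loc_channels = loc_out_channels(BOXES_PER_LOCATION)
for c_in, c_out in zip(SOURCE_CHANNELS, loc_channels):
    print(f"Conv2d({c_in}, {c_out}, kernel_size=3, padding=1)")
```

Because each loc layer uses a 3x3 kernel with padding 1 and stride 1, its output keeps the spatial size of the feature map it reads from.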

    4. Conf Layer

    Conv2d(512, 84, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(1024, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(512, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(256, 126, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(256, 84, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    Conv2d(256, 84, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
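The conf layer channels follow the same pattern with the number of classes in place of the 4 offsets: 84 = 4 * 21 and 126 = 6 * 21. The sketch below assumes 21 classes (which would match 20 Pascal VOC classes plus background, although the listing itself does not say so) and the feature-map sizes 38, 19, 10, 5, 3, 1 derived earlier. It also counts the total number of default boxes these settings imply.

```python
NUM_CLASSES = 21                              # assumed: inferred from 84 / 4
BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]       # assumed default boxes per cell
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]     # spatial size of each source map

# Output channels of each conf conv: boxes per cell * class scores per box.
conf_channels = [n * NUM_CLASSES for n in BOXES_PER_LOCATION]

# Default boxes contributed by each feature map: H * W * boxes per cell.
boxes_per_map = [f * f * n for f, n in zip(FEATURE_MAP_SIZES, BOXES_PER_LOCATION)]
total_boxes = sum(boxes_per_map)

print(conf_channels)   # [84, 126, 126, 126, 84, 84]
print(boxes_per_map)   # [5776, 2166, 600, 150, 36, 4]
print(total_boxes)     # 8732
```

Under these assumptions the network scores 8732 default boxes per 300x300 image, each with 4 offsets from the loc layers and 21 class scores from the conf layers.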
