Glenn Jocher; Alex Stoken; Jirka Borovec; NanoCode012; ChristopherSTAN; Liu Changyu; Laughing; tkianai; yxNONG; Adam Hogan; lorenzomammana; AlexWang1900; Ayush Chaurasia; Laurentiu Diaconu; Marc; wanghaoyang0106; ml5ah; Doug; Durgesh; Francisco Ingham; Frederik; Guilhen; Adrien Colmagro; Hu Ye; Jacobsolawetz; Jake Poznanski; Jiacong Fang; Junghoon Kim; Khiem Doan; Lijun Yu 于力军
This release implements two architecture changes to YOLOv5, as well as various bug fixes and performance improvements.

Breaking Changes
The latest models are all slightly smaller due to the removal of one convolution within each bottleneck. The bottlenecks have been renamed C3() modules, reflecting the 3 I/O convolutions each one performs versus the 4 in the standard CSP bottleneck. The previous manual concatenation and LeakyReLU(0.1) activations have both been removed, simplifying the architecture, reducing parameter count, and better exploiting the .fuse() operation at inference time.
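As a rough illustration of where the parameter savings come from, the sketch below counts trainable parameters for one CSP bottleneck and one C3() module of the same width. The exact layer layout (which convolutions carry BatchNorm, a hidden width of half the output channels, the helper names) is an assumption made for this example and is not spelled out in these notes.

```python
# Minimal sketch: trainable-parameter counts for a CSP bottleneck vs. a C3 module.
# ASSUMPTION: the layer layout below is illustrative, not taken from the release notes.

def conv_bn_params(c_in, c_out, k=1):
    """Bias-free k x k convolution followed by BatchNorm (scale + shift)."""
    return c_in * c_out * k * k + 2 * c_out

def plain_conv_params(c_in, c_out, k=1):
    """Bias-free k x k convolution with no BatchNorm."""
    return c_in * c_out * k * k

def inner_bottleneck_params(c):
    """One inner block: 1x1 conv then 3x3 conv, both with BatchNorm."""
    return conv_bn_params(c, c, 1) + conv_bn_params(c, c, 3)

def csp_bottleneck_params(c1, c2, n=1):
    """Four I/O convolutions plus a standalone BatchNorm on the concatenation."""
    c_ = c2 // 2  # assumed hidden width
    return (
        conv_bn_params(c1, c_)        # cv1
        + plain_conv_params(c1, c_)   # cv2
        + plain_conv_params(c_, c_)   # cv3: the convolution dropped in C3
        + conv_bn_params(2 * c_, c2)  # cv4
        + 2 * (2 * c_)                # BatchNorm applied after the concatenation
        + n * inner_bottleneck_params(c_)
    )

def c3_params(c1, c2, n=1):
    """Three I/O convolutions, each with its own BatchNorm."""
    c_ = c2 // 2  # assumed hidden width
    return (
        conv_bn_params(c1, c_)        # cv1
        + conv_bn_params(c1, c_)      # cv2
        + conv_bn_params(2 * c_, c2)  # cv3
        + n * inner_bottleneck_params(c_)
    )

saved = csp_bottleneck_params(128, 128, n=3) - c3_params(128, 128, n=3)
print(saved)  # parameters saved per module under these assumptions
```

Under these assumptions the saving per module is the dropped 1x1 convolution plus a small BatchNorm difference, and it does not depend on the number of inner blocks.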
nn.SiLU() activations replace nn.LeakyReLU(0.1) and nn.Hardswish() activations throughout the model, simplifying the architecture: a single activation function is now used everywhere rather than the two types used before.
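For readers unfamiliar with the three activations, here is a minimal scalar sketch of their standard definitions in plain Python. The function names are chosen for this example; the models themselves use the PyTorch modules named above.

```python
import math

def silu(x):
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def leaky_relu(x, slope=0.1):
    """LeakyReLU with the 0.1 negative slope used previously."""
    return x if x > 0 else slope * x

def hardswish(x):
    """Hardswish: x * ReLU6(x + 3) / 6, a piecewise approximation of swish."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(silu(x), 4), round(leaky_relu(x), 4), round(hardswish(x), 4))
```

SiLU is smooth everywhere, whereas the two functions it replaces are piecewise.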
In general, the changes result in smaller models (89.0M params -> 87.7M for YOLOv5x), faster inference times (6.9ms -> 6.0ms), and improved mAP (49.2 -> 50.1) for all models except YOLOv5s, whose mAP dropped slightly (37.0 -> 36.8). The largest models benefit the most from this update. YOLOv5x in particular is now above 50.0 mAP at --img-size 640, which may be the first time this has been achieved at 640 resolution by any architecture I'm aware of (correct me if I'm wrong though).
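To put the quoted before/after figures on a common scale, the following small sketch computes relative changes for parameter count and latency and absolute changes for mAP. The helper name and dictionary layout are illustrative only; the numbers are the ones quoted above.

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0

# Figures quoted in the paragraph above
changes = {
    "params_pct": relative_change(89.0, 87.7),   # parameters, millions
    "latency_pct": relative_change(6.9, 6.0),    # inference time, ms
    "map_delta_x": 50.1 - 49.2,                  # mAP points, larger model
    "map_delta_s": 36.8 - 37.0,                  # mAP points, YOLOv5s
}

for name, value in changes.items():
    print(f"{name}: {value:+.1f}")
```

This works out to roughly 1.5% fewer parameters and about 13% lower latency, alongside a gain of 0.9 mAP points and a loss of 0.2 points for YOLOv5s.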
<img src="https://user-images.githubusercontent.com/26833433/103594689-455e0e00-4eae-11eb-9cdf-7d753e2ceeeb.png" width="1000">

** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.

Pretrained Checkpoints

| Model | size | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>V100</sub> | FPS<sub>V100</sub> | params | GFLOPS |
|---|---|---|---|---|---|---|---|---|
| YOLOv5s | 640 | 36.8 | 36.8 | 55.6 | 2.2ms | 455 | 7.3M | 17.0 |
| YOLOv5m | 640 | 44.5 | 44.5 | 63.1 | 2.9ms | 345 | 21.4M | 51.3 |
| YOLOv5l | 640 | 48.1 | 48.1 | 66.4 | 3.8ms | 264 | 47.0M | 115.4 |
| YOLOv5x | 640 | 50.1 | 50.1 | 68.7 | 6.0ms | 167 | 87.7M | 218.8 |
| YOLOv5x + TTA | 832 | 51.9 | 51.9 | 69.6 | 24.9ms | 40 | 87.7M | 1005.3 |
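The FPS column appears consistent with 1000 divided by the per-image latency in milliseconds, up to rounding of the published latencies. That relationship is an inference from the numbers, not something the notes state; the sketch below checks it against the checkpoint figures.

```python
# (model, published latency in ms, published FPS) from the checkpoint figures above
rows = [
    ("YOLOv5s", 2.2, 455),
    ("YOLOv5m", 2.9, 345),
    ("YOLOv5l", 3.8, 264),
    ("YOLOv5x", 6.0, 167),
    ("YOLOv5x + TTA", 24.9, 40),
]

def fps_from_latency(ms):
    """Images per second implied by a per-image latency in milliseconds."""
    return 1000.0 / ms

# Largest disagreement between implied and published FPS
max_gap = max(abs(fps_from_latency(ms) - fps) for _, ms, fps in rows)

for model, ms, fps in rows:
    print(f"{model}: implied {fps_from_latency(ms):.1f} FPS, published {fps}")
```

Every row agrees to within one frame per second, which is what rounding the latency to one decimal place would produce.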