There is a newer version of this record available.

Software Open Access

ultralytics/yolov5: v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration

Glenn Jocher; Alex Stoken; Jirka Borovec; NanoCode012; ChristopherSTAN; Liu Changyu; Laughing; tkianai; yxNONG; Adam Hogan; lorenzomammana; AlexWang1900; Ayush Chaurasia; Laurentiu Diaconu; Marc; wanghaoyang0106; ml5ah; Doug; Durgesh; Francisco Ingham; Frederik; Guilhen; Adrien Colmagro; Hu Ye; Jacobsolawetz; Jake Poznanski; Jiacong Fang; Junghoon Kim; Khiem Doan; Lijun Yu 于力军

This release implements two architecture changes to YOLOv5, as well as various bug fixes and performance improvements.

Breaking Changes
  • nn.SiLU() activations replace the nn.LeakyReLU(0.1) and nn.Hardswish() activations used in previous versions. nn.SiLU() was introduced in PyTorch 1.7.0, and because that release is so recent, certain export pipelines (possibly CoreML) may be temporarily unavailable until the associated tools (e.g. coremltools) are updated.
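For reference, the three activations named above can be written as scalar functions. The following is a minimal pure-Python sketch of the standard formulas, not the PyTorch implementation; the function names are chosen here for illustration only.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x)."""
    return x * sigmoid(x)


def leaky_relu(x: float, negative_slope: float = 0.1) -> float:
    """LeakyReLU with the 0.1 slope used in earlier YOLOv5 versions."""
    return x if x >= 0 else negative_slope * x


def hardswish(x: float) -> float:
    """Hardswish: x * ReLU6(x + 3) / 6, a piecewise approximation of swish."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


if __name__ == "__main__":
    # Print the three activations side by side at a few sample points.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(
            f"x={x:+.1f}  silu={silu(x):+.4f}  "
            f"leaky_relu={leaky_relu(x):+.4f}  hardswish={hardswish(x):+.4f}"
        )
```

Unlike LeakyReLU, SiLU is smooth everywhere, and unlike Hardswish it has no piecewise-linear clipping, so a single function covers both roles.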
Bug Fixes
  • Multi-GPU --resume #1810
  • leaf Variable inplace bug fix #1759
  • Various bug fixes contained in PRs #1235 through #1837
Added Functionality
  • Weights & Biases (W&B) Feature Addition #1235
  • Utils reorganization #1392
  • PyTorch Hub and autoShape update #1415
  • W&B artifacts feature addition #1712
  • Various other feature additions contained in PRs #1235 through #1837
Updated Results

The latest models are all slightly smaller due to the removal of one convolution within each bottleneck. These bottlenecks have been renamed C3() modules, reflecting the 3 I/O convolutions each one performs versus the 4 in the standard CSP bottleneck. The previous manual concatenation and LeakyReLU(0.1) activations have both been removed, simplifying the architecture, reducing parameter count, and better exploiting the .fuse() operation at inference time.
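To see why dropping one I/O convolution shrinks each module, the sketch below counts 1x1 convolution weights for a 4-convolution block and a 3-convolution block. The channel layout is an assumption made for illustration (a hidden width of half the output channels, with the removed convolution mapping hidden to hidden channels); it is not taken from the release notes, and it ignores batch norm parameters and the inner bottleneck layers.

```python
def conv1x1_weights(c_in: int, c_out: int) -> int:
    """Weight count of a bias-free 1x1 convolution."""
    return c_in * c_out


def csp_io_weights(c1: int, c2: int, e: float = 0.5) -> list:
    """I/O convolutions of a 4-conv CSP-style block (assumed layout)."""
    c_ = int(c2 * e)  # hypothetical hidden width
    return [
        conv1x1_weights(c1, c_),      # main-branch input conv
        conv1x1_weights(c1, c_),      # skip-branch input conv
        conv1x1_weights(c_, c_),      # extra conv, assumed to be the one removed
        conv1x1_weights(2 * c_, c2),  # output conv after concatenation
    ]


def c3_io_weights(c1: int, c2: int, e: float = 0.5) -> list:
    """I/O convolutions of a 3-conv C3-style block (assumed layout)."""
    c_ = int(c2 * e)
    return [
        conv1x1_weights(c1, c_),
        conv1x1_weights(c1, c_),
        conv1x1_weights(2 * c_, c2),
    ]


if __name__ == "__main__":
    c1 = c2 = 128  # example channel counts, chosen arbitrarily
    csp, c3 = csp_io_weights(c1, c2), c3_io_weights(c1, c2)
    print(f"CSP: {len(csp)} I/O convs, {sum(csp)} weights")
    print(f"C3:  {len(c3)} I/O convs, {sum(c3)} weights")
    print(f"saved per module: {sum(csp) - sum(c3)} weights")
```

Under these assumptions the saving per module equals the square of the hidden width, which is small per block but adds up across every bottleneck in the network.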

nn.SiLU() activations replace nn.LeakyReLU(0.1) and nn.Hardswish() activations throughout the model, simplifying the architecture: a single activation function is now used everywhere rather than the two types used before.

In general, the changes result in smaller models (YOLOv5x: 89.0M -> 87.7M params), faster inference (6.9ms -> 6.0ms), and improved mAP (49.2 -> 50.1) for all models except YOLOv5s, whose mAP dropped slightly (37.0 -> 36.8). The largest models benefit the most from this update. YOLOv5x in particular is now above 50.0 mAP at --img-size 640, which may be the first time any architecture has achieved this at 640 resolution, as far as I'm aware (correct me if I'm wrong though).
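The figures quoted above can be turned into relative changes with a little arithmetic. This is a small sketch using only the numbers stated in the paragraph; the dictionary keys and function name are illustrative.

```python
def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100.0


# Before/after values quoted in the release notes.
CHANGES = {
    "params (M)": (89.0, 87.7),
    "inference (ms)": (6.9, 6.0),
    "mAP": (49.2, 50.1),
    "YOLOv5s mAP": (37.0, 36.8),
}

if __name__ == "__main__":
    for name, (old, new) in CHANGES.items():
        print(
            f"{name:>15}: {old} -> {new} "
            f"({new - old:+.1f} absolute, {relative_change(old, new):+.2f}%)"
        )
```

In relative terms the inference-time reduction (roughly 13%) is much larger than the parameter reduction (roughly 1.5%).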

GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.

Pretrained Checkpoints

| Model | size | AP<sup>val</sup> | AP<sup>test</sup> | AP<sub>50</sub> | Speed<sub>V100</sub> | FPS<sub>V100</sub> | params | GFLOPS |
|---|---|---|---|---|---|---|---|---|
| YOLOv5s | 640 | 36.8 | 36.8 | 55.6 | 2.2ms | 455 | 7.3M | 17.0 |
| YOLOv5m | 640 | 44.5 | 44.5 | 63.1 | 2.9ms | 345 | 21.4M | 51.3 |
| YOLOv5l | 640 | 48.1 | 48.1 | 66.4 | 3.8ms | 264 | 47.0M | 115.4 |
| YOLOv5x | 640 | 50.1 | 50.1 | 68.7 | 6.0ms | 167 | 87.7M | 218.8 |
| YOLOv5x + TTA | 832 | 51.9 | 51.9 | 69.6 | 24.9ms | 40 | 87.7M | 1005.3 |
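The Speed and FPS columns are two views of the same measurement: frames per second is the reciprocal of the per-image time. A minimal sketch of that conversion, using the latencies listed in the table (the function and variable names are illustrative):

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert per-image latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Per-image V100 latencies (ms) as listed in the checkpoint table.
SPEED_MS = {
    "YOLOv5s": 2.2,
    "YOLOv5m": 2.9,
    "YOLOv5l": 3.8,
    "YOLOv5x": 6.0,
    "YOLOv5x + TTA": 24.9,
}

if __name__ == "__main__":
    for model, ms in SPEED_MS.items():
        print(f"{model:>14}: {ms:5.1f} ms -> {fps_from_latency_ms(ms):6.1f} FPS")
```

The computed values land within one frame per second of the listed FPS figures; the small differences are presumably because the published latencies are rounded to one decimal place.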
